A simple spellchecking system that detects non-word and real-word errors and provides word suggestions based on the minimum edit distance between the input word and the words in the corpus.
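To illustrate how suggestions can be ranked by minimum edit distance, here is a minimal sketch. It assumes the standard Levenshtein distance with unit costs for insertion, deletion, and substitution. The project's own cost scheme is not stated here, and the function names `edit_distance` and `suggest` and the sample vocabulary are hypothetical.

```python
from collections.abc import Iterable


def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance, assuming unit cost for insert, delete, and substitute."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def suggest(word: str, vocabulary: Iterable[str], limit: int = 5) -> list[str]:
    """Return up to `limit` vocabulary words, nearest first (ties sorted alphabetically)."""
    ranked = sorted(vocabulary, key=lambda v: (edit_distance(word, v), v))
    return ranked[:limit]


# Hypothetical vocabulary standing in for the words extracted from the corpus.
vocabulary = {"market", "marketing", "marker", "target", "brand"}
print(suggest("markte", vocabulary, limit=3))
```

Scanning the whole vocabulary for every query is simple but slow for a large corpus. A real system would likely prefilter candidates, for example by word length.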
Here are some screenshots of the spellchecker being used:
In the screenshot above, misspelled words ('typos') are highlighted in red. Right-clicking a highlighted word shows a list of suggested words from the corpus, ordered by minimum edit distance.
Selecting a word from the list replaces the misspelled word with it.
If the input word is not found in our corpus, users can add it to the dictionary, and it will no longer be flagged as an error.
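The non-word check and the "add to dictionary" behaviour described above could be sketched as follows. This is an assumption-laden illustration, not the project's code: the class name `Dictionary`, its methods, and the simple regular-expression tokenizer are all hypothetical.

```python
import re


class Dictionary:
    """Hypothetical word list built from the corpus, extendable by the user."""

    def __init__(self, corpus_words):
        # Store words in lowercase so that lookups are case-insensitive.
        self.words = {w.lower() for w in corpus_words}

    def add(self, word):
        """Add a user-approved word so it is no longer flagged."""
        self.words.add(word.lower())

    def non_word_errors(self, text):
        """Return the tokens in `text` that are not in the dictionary."""
        # Assumed tokenizer: runs of ASCII letters and apostrophes.
        tokens = re.findall(r"[A-Za-z']+", text)
        return [t for t in tokens if t.lower() not in self.words]


dictionary = Dictionary(["the", "brand", "grows"])
print(dictionary.non_word_errors("The brand grows onlne"))  # flags 'onlne'
dictionary.add("onlne")
print(dictionary.non_word_errors("The brand grows onlne"))  # nothing flagged
```

This only covers non-word errors. Real-word errors, where the typed word exists but does not fit its context, need a context model such as the n-gram model mentioned in the credits.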
Credits

This project was the collective effort of my classmates and me in the MSc Data Science program, and the work was divided as follows:
Ms. Lam Ying Xian was our group leader. She did most of the literature research, which guided our choice of libraries and our implementation of the edit distance and n-gram models. She sourced the document used as our corpus, a digital marketing textbook. She also delegated roles to each of us, monitored the work progress, implemented various versions of the code, and tested the spellchecking system together with Mr. Rakan Bani Melhem and me.
Mr. Rakan Bani Melhem wrote most of the backend code as well as the n-gram model, while I wrote the minimum edit distance implementation and the front-end GUI code. I was also responsible for streamlining the final code so that it integrated well with the front-end GUI.
Mr. Thines Kumar and Mr. Adnan Islam were in charge of the program design flowchart and of completing the report submitted as our group assignment. They also checked the report for spelling and grammar errors and maintained its formatting and appearance.