Assignment 1
This repository contains various text preprocessing techniques that are needed when solving Natural Language Processing problems on unstructured textual datasets.
There are 3 Colab notebooks, one for each of 3 types of text: student course text, tweets (Twitter), and research papers. Each notebook has 4 sections:
1. Tokenization
2. Spelling correction
3. Stemming
4. Lemmatization
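To illustrate the spelling-correction step, here is a minimal, dependency-free sketch of an edit-distance corrector in the style of Peter Norvig's well-known approach. It is not the code from the notebooks. The toy corpus and the helper names `edits1` and `correct` are hypothetical. A word the corpus already contains is kept as-is. Otherwise the corrector picks the most frequent known word that is one deletion, transposition, replacement, or insertion away.

```python
from collections import Counter

# Hypothetical toy corpus used only for illustration; a practical corrector would
# count words from a large corpus drawn from the target domain.
CORPUS = (
    "the student reads the research paper and the student writes "
    "a tweet about the paper"
)
WORD_COUNTS = Counter(CORPUS.split())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # no known word within one edit: leave the token unchanged
    # Highest count wins; ties are broken by the word itself so the result is deterministic.
    return max(candidates, key=lambda w: (WORD_COUNTS[w], w))


print([correct(w) for w in ["studnet", "papr", "tweeet", "the"]])
# ['student', 'paper', 'tweet', 'the']
```

Only single-edit errors are handled here. Real correctors usually also consider two-edit candidates and use much larger vocabularies.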
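Tokenization is the process of splitting raw text into smaller units (tokens), such as sentences or words, so that later steps can work on individual words. It is the first step in dealing with text data.
Inflected Language
"In grammar, inflection is the modification of a word to express different grammatical categories such as tense, case, voice, aspect, person, number, gender, and mood. An inflection expresses one or more grammatical categories with a prefix, suffix or infix, or another internal modification such as a vowel change."
Because English is an inflected language, the same word can appear in several forms (fish, fishes, fishing), and stemming and lemmatization reduce these forms to a common base.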
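A minimal regex-based sketch of sentence and word tokenization follows. It is dependency-free and is not the notebooks' implementation. The patterns and the names `simple_sentence_tokenize` and `simple_word_tokenize` are hypothetical. Because one dataset is tweets, the word pattern keeps hashtags (`#NLP`) and mentions (`@prof`) as single tokens. Each punctuation mark becomes its own token.

```python
import re
from typing import List

# Hypothetical pattern: an optional '#' or '@' followed by word characters (so hashtags
# and mentions stay whole) with an optional apostrophe suffix (isn't), OR any single
# character that is neither a word character nor whitespace (punctuation).
TOKEN_PATTERN = re.compile(r"[#@]?\w+(?:'\w+)?|[^\w\s]")
# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def simple_sentence_tokenize(text: str) -> List[str]:
    """Split text into sentences at whitespace after terminal punctuation."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def simple_word_tokenize(text: str) -> List[str]:
    """Split text into word, hashtag, mention and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


text = "Loving the #NLP course! @prof, isn't tokenization easy?"
for sentence in simple_sentence_tokenize(text):
    print(simple_word_tokenize(sentence))
# ['Loving', 'the', '#NLP', 'course', '!']
# ['@prof', ',', "isn't", 'tokenization', 'easy', '?']
```

The sentence splitter is deliberately naive. It would wrongly split after abbreviations such as "e.g." or "Dr.", which is one reason library tokenizers are usually preferred on research-paper text.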
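Techniques Used
Stemming strips affixes using heuristic rules and may produce non-words, while lemmatization maps each word to its dictionary form (lemma). Both are widely used in tagging systems, indexing, SEO, web search, and information retrieval. For example, searching for "fish" on Google will also return results containing "fishes" and "fishing", because "fish" is the stem of both words.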
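To contrast the two techniques on the "fish" example, here is a toy sketch. It is not the notebooks' code. The stemmer is a simplified suffix-stripper and not the Porter algorithm. The lemmatizer is a hypothetical lookup table standing in for a real lexicon such as WordNet. The names `toy_stem`, `toy_lemmatize`, `SUFFIXES` and `LEMMAS` are illustrative assumptions.

```python
# Toy suffix-stripping stemmer (a simplification, NOT the Porter algorithm):
# suffixes are tried in order and stripped only if at least 3 characters remain.
SUFFIXES = ("ing", "es", "ed", "s")


def toy_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma dictionary; real lemmatizers use a full lexicon (e.g. WordNet)
# and usually the word's part of speech.
LEMMAS = {"fishes": "fish", "fishing": "fish", "studies": "study", "better": "good"}


def toy_lemmatize(word):
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["fish", "fishes", "fishing", "studies", "better"]:
    print(w, toy_stem(w), toy_lemmatize(w))
```

Both functions map "fishes" and "fishing" to "fish". They differ on the other words: the stemmer turns "studies" into the non-word "studi" and leaves "better" unchanged. The lemmatizer returns the dictionary forms "study" and "good".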