IR-text-pre-processing

Assignment 1. This repository consists of various text preprocessing techniques that are required when solving Natural Language Processing problems on unstructured text datasets.

There are 3 Colab notebooks here, one for each type of text: student course feedback, tweeter feed, and research paper. Each notebook has 4 sections:

1. Tokenization

2. Spelling correction

3. Stemming

4. Lemmatization
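Spelling correction is the one section in this list that the text does not describe further, so here is a minimal sketch of the idea. It uses only the Python standard library (`difflib`) and a tiny made-up vocabulary; the `VOCABULARY` list, the function names, and the `cutoff` value are assumptions for illustration and are not taken from the notebooks, which may well use a dedicated library instead.

```python
import difflib

# Hypothetical toy vocabulary; a real notebook would use a much larger word list.
VOCABULARY = ["course", "lecture", "teacher", "assignment", "research", "paper", "tweet"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest vocabulary word, or the word unchanged if none is close enough."""
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered
    # get_close_matches ranks candidates by difflib's similarity ratio (0.0 to 1.0).
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_tokens(tokens):
    """Apply correct_word to every token in a list."""
    return [correct_word(token) for token in tokens]


print(correct_tokens(["the", "teachr", "gave", "an", "asignment"]))
```

Words that are not close to anything in the vocabulary (such as "the" here) are left as they are, which avoids "correcting" valid words that the toy vocabulary simply does not contain.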

Tokenization is a powerful way of dealing with text data: it breaks raw text into smaller units (tokens), such as words or sentences, that the later steps operate on.

Inflected language: "In grammar, inflection is the modification of a word to express different grammatical categories such as tense, case, voice, aspect, person, number, gender, and mood. An inflection expresses one or more grammatical categories with a prefix, suffix or infix, or another internal modification such as a vowel change."
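To make the tokenization step concrete, the following is a small sketch using only Python's `re` module. The function names `tokenize` and `split_sentences` and the sample sentence are hypothetical, and the regular expressions are deliberately naive; the notebooks may rely on a library tokenizer with different rules.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens; punctuation is dropped."""
    # \w+ matches a run of word characters; (?:'\w+)? keeps contractions such as "didn't".
    return re.findall(r"\w+(?:'\w+)?", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


sample = "The course was great! I didn't expect so many labs."
print(split_sentences(sample))
print(tokenize(sample))
```

A rule this simple will mis-split abbreviations such as "e.g." and does not treat hashtags or URLs specially, which matters for tweet-like text.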

Techniques used: Stemming and lemmatization are widely used in tagging systems, indexing, SEO, web search, and information retrieval. For example, searching for "fish" on Google will also return results for "fishes" and "fishing", because "fish" is the stem of both words.
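The "fish" example can be reproduced with a toy sketch. The stemmer below is a simple suffix stripper written for illustration, not the Porter algorithm or any library stemmer, and the lemmatizer is a hypothetical lookup table standing in for a real dictionary; `SUFFIXES`, `LEMMA_TABLE`, and both function names are assumptions.

```python
# Toy suffix list, checked in order; NOT the Porter algorithm, just an illustration.
SUFFIXES = ("ing", "es", "s")


def simple_stem(word):
    """Strip the first matching suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table standing in for a real dictionary-based lemmatizer.
LEMMA_TABLE = {"went": "go", "better": "good", "mice": "mouse", "fishes": "fish"}


def simple_lemmatize(word):
    """Return the dictionary form if the word is in the table, else the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["fish", "fishes", "fishing"]:
    print(w, "->", simple_stem(w))

# Stemming only trims suffixes, so irregular forms need a dictionary lookup.
print("mice:", simple_stem("mice"), "(stem) vs", simple_lemmatize("mice"), "(lemma)")
```

The contrast in the last line is the practical difference between the two techniques: a stemmer applies mechanical rules and may leave irregular forms untouched, while a lemmatizer maps a word to its dictionary form.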


haziranz/IR-text-pre-processing
