haziranz/IR-text-pre-processing

IR-text-pre-processing

Assignment 1. This repository contains various text preprocessing techniques that are needed when solving Natural Language Processing problems on unstructured text datasets.

There are three Colab notebooks, one for each of three types of text: student course text, tweets, and research papers. Each notebook has four sections:

1. Tokenization

2. Spelling correction

3. Stemming

4. Lemmatization
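Of the four sections listed above, spelling correction is the least self-explanatory, so here is a minimal sketch of one way it can work. It uses only the standard library's `difflib` and a tiny made-up vocabulary; `VOCABULARY`, `correct_word`, and `correct_tokens` are hypothetical names for illustration and are not taken from the notebooks, which may use a different library or method.

```python
import difflib

# Hypothetical vocabulary for illustration only; a real corrector
# would use a large word list or a frequency dictionary.
VOCABULARY = ["course", "student", "research", "paper", "tweet", "retrieval"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    if word in vocabulary:
        return word
    # get_close_matches ranks candidates by SequenceMatcher similarity
    # and drops any candidate scoring below the cutoff.
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_tokens(tokens):
    """Lowercase each token and correct it against the vocabulary."""
    return [correct_word(token.lower()) for token in tokens]


print(correct_tokens(["studnt", "reserch", "paper", "xyz"]))
```

Words with no sufficiently similar vocabulary entry are returned unchanged, which avoids "correcting" names, hashtags, or domain terms into unrelated words.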

Tokenization splits raw text into units such as words and punctuation marks, and it is the first step in working with text data. Stemming and lemmatization then deal with inflected language: "In grammar, inflection is the modification of a word to express different grammatical categories such as tense, case, voice, aspect, person, number, gender, and mood. An inflection expresses one or more grammatical categories with a prefix, suffix or infix, or another internal modification such as a vowel change".
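As a rough illustration of tokenization, the sketch below splits text into word and punctuation tokens with a single regular expression. The pattern and the `tokenize` function are assumptions made for this example, not the notebooks' code; the pattern keeps hashtags, mentions, and simple contractions together, which is convenient for tweet-like text.

```python
import re

# Assumed token pattern (illustrative):
#   [#@]?\w+(?:'\w+)?  a word, optionally prefixed by # or @,
#                      optionally followed by an apostrophe part (aren't)
#   [^\w\s]            any single punctuation character
TOKEN_PATTERN = re.compile(r"[#@]?\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into a list of word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Loving #NLP, aren't you @user?"))
```

A plain `text.split()` would leave punctuation attached to words ("#NLP," and "@user?"), so later steps such as spelling correction and stemming would see noisier input.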

Techniques Used

Stemming and lemmatization are widely used in tagging systems, indexing, SEO, web search, and information retrieval. For example, a search for "fish" on Google will also return results containing "fishes" and "fishing", because "fish" is the stem of both words.
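The fish example can be sketched in a few lines. The code below is a deliberately naive suffix stripper and a lookup-table lemmatizer, not the Porter algorithm or a WordNet lemmatizer, and every name in it (`naive_stem`, `naive_lemmatize`, `LEMMAS`, `build_index`) is hypothetical. It only shows why indexing by stem lets a query for "fish" match "fishes" and "fishing".

```python
from collections import defaultdict

# Suffixes are tried in this order; a real stemmer has many more rules.
SUFFIXES = ("ing", "es", "s")

# Hypothetical lemma table; a real lemmatizer uses a full dictionary.
LEMMAS = {"fishes": "fish", "fishing": "fish", "mice": "mouse"}


def naive_stem(word):
    """Strip one known suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def naive_lemmatize(word):
    """Look the word up in the lemma table, falling back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


def build_index(documents):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in text.split():
            index[naive_stem(token)].add(doc_id)
    return index


docs = ["fishing boats", "two fishes", "fresh fish", "bird watching"]
index = build_index(docs)
print(sorted(index[naive_stem("fish")]))
print(naive_stem("mice"), naive_lemmatize("mice"))
```

The last line shows the difference between the two techniques: suffix stripping leaves an irregular form such as "mice" untouched, while a dictionary-based lemmatizer can map it to "mouse".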

About

CS 5615: Information Retrieval - Assignment 1
