Preprocessing text data using NLP methods

The base preprocessing program consists of tokenization, removal of stop words and punctuation, and lemmatization. The function was built with the help of spaCy, a library for advanced Natural Language Processing in Python. See the spaCy documentation to read more about spaCy usage. An alternative version was built with the NLTK package.
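
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual function: the names `preprocess` and `preprocess_text` are hypothetical, and the model name `en_core_web_sm` is an assumption (any installed spaCy pipeline with a lemmatizer should work).

```python
def preprocess(doc):
    """Return lowercased lemmas for tokens that are not stop words,
    punctuation, or whitespace.

    `doc` is expected to be a spaCy Doc (or any iterable of objects
    exposing `lemma_`, `is_stop`, `is_punct` and `is_space`).
    """
    return [
        token.lemma_.lower()
        for token in doc
        if not (token.is_stop or token.is_punct or token.is_space)
    ]


def preprocess_text(text, model="en_core_web_sm"):
    """Tokenize `text` with spaCy, then filter and lemmatize it.

    Hypothetical helper: the model name is an assumption and must be
    installed first, e.g. `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported lazily so `preprocess` works without spaCy

    nlp = spacy.load(model)  # loading is slow; reuse `nlp` in real code
    return preprocess(nlp(text))
```

Calling `preprocess_text("The cats are sitting on the mat.")` returns a list of lemma strings; the exact lemmas depend on the spaCy model and version in use.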