egorcherkasoff/regex-nlp-text-analysis

Analyzing "War and Peace, vol. 1" with Regular Expressions and NLTK

This Jupyter Notebook project uses Python's re (regular expressions) module and the nltk (Natural Language Toolkit) package to analyze volume 1 of Leo Tolstoy's classic novel "War and Peace" in the original Russian.

Getting Started

Before running this notebook on your local machine, you will need to clone this repository. You may also need to install the following package:

  • nltk: for natural language processing. Install it with: pip install nltk

The notebook contains several code cells that analyze the text of the novel using regular expressions and NLTK.
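
As a rough illustration of the kind of analysis described here, the following is a minimal sketch, not the notebook's actual code. The names WORD_RE, extract_words, load_russian_stopwords and most_common_words are hypothetical, and the regular expression is only one possible way to match Russian words. The stopword helper assumes NLTK's "stopwords" corpus, which is downloaded on first use.

```python
import re
from collections import Counter

# Hypothetical pattern: runs of Cyrillic letters (ё is listed explicitly because
# it falls outside the а-я range), optionally joined by hyphens.
WORD_RE = re.compile(r"[а-яё]+(?:-[а-яё]+)*", re.IGNORECASE)


def extract_words(text):
    """Return every Russian word found in the text, lowercased."""
    return [word.lower() for word in WORD_RE.findall(text)]


def load_russian_stopwords():
    """Load NLTK's Russian stopword list, downloading the corpus if needed."""
    # Imported lazily so the regex helpers also work without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)
    return set(stopwords.words("russian"))


def most_common_words(text, stop_words, n=10):
    """Count the words that are not stopwords and return the n most frequent."""
    counts = Counter(w for w in extract_words(text) if w not in stop_words)
    return counts.most_common(n)
```

With the novel's text loaded into a string, calling most_common_words(text, load_russian_stopwords()) would return a list of (word, count) pairs. Passing the stopword set in as an argument keeps the counting step independent of NLTK, so a hand-written set works as well.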

Why I made this

This Jupyter Notebook project demonstrates how regular expressions and the NLTK package can be used to analyze the text of a classic novel in its original language. Using regular expressions to extract words and NLTK to process them, I identified the most common words in the novel, removed common stopwords, and estimated the overall mood of the book. For the most part, though, I was just practicing regex here.

About

A Jupyter notebook where I use NLP and practice regex.
