This is a small project to kick off my natural language processing (NLP) journey. It is based on the project "Discover Insights into Classic Texts" from the Codecademy skill path "Apply Natural Language Processing with Python".
The goal of this project is to perform a simple analysis of the classic book Peter Pan and to find out which characters are mentioned most often. The file containing the book was downloaded from Project Gutenberg.
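Create and activate a virtual environment (the command below assumes Python 3.6 is installed at /usr/bin/python3.6):
virtualenv --python=/usr/bin/python3.6 ~/NLP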
source ~/NLP/bin/activate
In a terminal, clone this repository wherever you want:
git clone https://github.com/irenebosque/NLP-analyzing-Peter-Pan.git
Then, in the terminal, install NLTK, start the Python interpreter, and download the tokenizer and part-of-speech tagger data from inside it:
pip install nltk
python
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
To perform the analysis, run script.py:
python script.py
The most relevant final results, each shown as a tagged chunk followed by its number of occurrences, are:
((('peter', 'NN'),), 319)
((('wendy', 'NN'),), 180)
((('hook', 'NN'),), 127)
((('john', 'NN'),), 116)
((('michael', 'NN'),), 71)
Looking at most_common_np_chunks, you can identify the most important characters in the text, such as Peter, Wendy, Hook, John, and Michael, based on how often they are mentioned.
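To show where entries such as ((('peter', 'NN'),), 319) come from, here is a minimal sketch of counting chunks with collections.Counter. It is not the code in script.py. The hand-tagged sentences, the noun_phrase_chunks helper, and the chunk pattern (an optional determiner, any adjectives, then one noun) are all assumptions made for illustration, and the sketch uses only the standard library, so it runs without NLTK.

```python
from collections import Counter

# Hand-tagged toy sentences standing in for the output of a POS tagger.
# Hypothetical data: not taken from the book or produced by script.py.
tagged_sentences = [
    [("peter", "NN"), ("flew", "VBD"), ("away", "RB")],
    [("wendy", "NN"), ("saw", "VBD"), ("peter", "NN")],
    [("the", "DT"), ("old", "JJ"), ("pirate", "NN"),
     ("chased", "VBD"), ("peter", "NN")],
]


def noun_phrase_chunks(tagged_sentence):
    """Yield chunks of the assumed form: optional DT, any JJ, one NN."""
    i = 0
    n = len(tagged_sentence)
    while i < n:
        start = i
        if tagged_sentence[i][1] == "DT":
            i += 1
        while i < n and tagged_sentence[i][1] == "JJ":
            i += 1
        if i < n and tagged_sentence[i][1] == "NN":
            # A chunk is a tuple of (word, tag) pairs, so it is hashable
            # and can be used as a Counter key.
            yield tuple(tagged_sentence[start:i + 1])
            i += 1
        else:
            # No noun completed the pattern: retry from the next token.
            i = start + 1


chunk_counts = Counter(
    chunk
    for sentence in tagged_sentences
    for chunk in noun_phrase_chunks(sentence)
)

# Each entry is (chunk, count), the same shape as the results above.
print(chunk_counts.most_common(1))  # [((('peter', 'NN'),), 3)]
```

Because every chunk is stored as a tuple of (word, tag) pairs, a single-word chunk prints with a trailing comma, which is why the results above look like (('peter', 'NN'),).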