irenebosque/NLP-analyzing-Peter-Pan

NLP - Analyzing Peter Pan

This is a small project to kick off my natural language processing (NLP) journey. It is based on the project "Discover Insights into Classic Texts" from the Codecademy skill path "Apply Natural Language Processing with Python".

Goal

The goal of this project is to perform a simple analysis of the classic book Peter Pan and to discover which characters are mentioned most often. The file containing the book was downloaded from Project Gutenberg.

Installation

Create a virtual environment (optional)

virtualenv --python=/usr/bin/python3.6 ~/NLP
source ~/NLP/bin/activate

Clone the repository

In a terminal, clone this repository wherever you want:

git clone https://github.com/irenebosque/NLP-analyzing-Peter-Pan.git

Install additional requirements

Then, in the terminal, install NLTK, start a Python session, download the tokenizer and tagger data, and leave the session:

pip install nltk
python
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
exit()

Run the code

To perform the analysis, run script.py from inside the cloned repository folder:

python script.py

Analysis of the results

The most relevant final results are:

((('peter', 'NN'),), 319)
((('wendy', 'NN'),), 180)
((('hook', 'NN'),), 127)
((('john', 'NN'),), 116)
((('michael', 'NN'),), 71)

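Each entry pairs a noun-phrase chunk (a tuple of (word, tag) pairs) with the number of times it occurs. Looking at most_common_np_chunks, you can identify the important characters in the text, such as Peter, Wendy, Hook, John and Michael, based on their frequency.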
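To make the shape of these results easier to read, here is a minimal sketch of how noun-phrase chunks could be counted. It is not the code in script.py. The three hand-tagged sentences, the function name np_chunks and the chunk pattern (an optional determiner, any number of adjectives, then one noun) are assumptions made for illustration. It uses only the standard library so it runs without NLTK data.

```python
from collections import Counter


def np_chunks(tagged_sentence):
    """Yield noun-phrase chunks from a list of (word, tag) pairs.

    Assumed pattern (illustrative only): optional DT, any number of JJ, one NN.
    """
    i = 0
    n = len(tagged_sentence)
    while i < n:
        j = i
        # Optional determiner
        if j < n and tagged_sentence[j][1] == "DT":
            j += 1
        # Any number of adjectives
        while j < n and tagged_sentence[j][1] == "JJ":
            j += 1
        # The chunk must end in a noun
        if j < n and tagged_sentence[j][1] == "NN":
            yield tuple(tagged_sentence[i:j + 1])
            i = j + 1
        else:
            i += 1


# Hypothetical, hand-tagged sentences (lowercased, as in the results above)
tagged_sentences = [
    [("peter", "NN"), ("flew", "VBD"), ("to", "TO"), ("the", "DT"), ("window", "NN")],
    [("wendy", "NN"), ("saw", "VBD"), ("peter", "NN")],
    [("the", "DT"), ("little", "JJ"), ("house", "NN"), ("pleased", "VBD"), ("peter", "NN")],
]

# Count every chunk across all sentences
counts = Counter(
    chunk for sentence in tagged_sentences for chunk in np_chunks(sentence)
)

# Same shape as the results above: ((chunk), count)
most_common_np_chunks = counts.most_common(1)
print(most_common_np_chunks)  # [((('peter', 'NN'),), 3)]
```

A one-word chunk such as (('peter', 'NN'),) is a tuple containing a single (word, tag) pair, which explains the trailing comma in the results.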
