NLP on the Books of Harry Potter

This repo demonstrates a collection of NLP tasks, all using the Harry Potter books as source documents. Individual tasks can be read about here:

Instructions for BasicNLP class (basic_nlp.py)

The class supports topic modeling with LDA, document summarization, and sentiment analysis.

Initialize the class with a list of documents and an optional list of document titles, for example:

```
# Assumes basic_nlp.py is on the import path
from basic_nlp import BasicNLP

texts = ['this is the first document', 'this is the second document', 'this is the third document']
titles = ['doc1', 'doc2', 'doc3']

nlp = BasicNLP(texts, titles)
```

LDA:
1. Create an elbow plot and print the coherence scores by specifying the range of topic counts to evaluate, with:
```
nlp.compute_coherence(start=5, stop=20, step=3)
```
2. Set the number of topics to use in the model with:
```
nlp.set_number_of_topics(10)
```
3. View the clusters (only available in a Jupyter notebook) with:
```
import pyLDAvis
pyLDAvis.enable_notebook()
vis = nlp.view_clusters()
pyLDAvis.display(vis)
```
4. Get the vocabulary for each topic in the LDA model (topics can be 'all', a list of integers, or a single integer) with:
```
nlp.get_topic_vocabulary(topics='all', num_words=10)
```
5. Get the documents most highly associated with the given topics with:
```
nlp.get_representative_documents(topics='all', num_docs=1)
```
6. Get the sentence summaries of the documents most highly associated with the given topics with:
```
nlp.get_representative_sentences(topics='all', num_sentences=3)
```
7. Provide a name for an LDA topic (if preferred over the numbering system) with:
```
nlp.name_topic(topic_number=1, topic_name='My topic')
```
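
To make the arguments in steps 1 and 4 concrete, here is a minimal sketch of how they might be interpreted. It assumes that `start`, `stop`, and `step` follow Python `range` semantics (so `stop` is exclusive) and that topics are numbered from 0; neither is confirmed by this README. `candidate_topic_counts` and `normalize_topics` are hypothetical helpers for illustration, not part of `BasicNLP`.

```python
def candidate_topic_counts(start, stop, step):
    """Topic counts an elbow plot would cover, assuming range() semantics."""
    return list(range(start, stop, step))


def normalize_topics(topics, num_topics):
    """Turn 'all', a single int, or a list of ints into a list of topic ids.

    Assumes topic ids run from 0 to num_topics - 1 (an illustration only).
    """
    if isinstance(topics, str):
        if topics != 'all':
            raise ValueError("topics must be 'all', an int, or a list of ints")
        return list(range(num_topics))
    if isinstance(topics, int):
        topics = [topics]
    selected = list(topics)
    for topic in selected:
        if not 0 <= topic < num_topics:
            raise ValueError(f"topic {topic} is out of range")
    return selected


print(candidate_topic_counts(5, 20, 3))   # [5, 8, 11, 14, 17]
print(normalize_topics('all', 3))         # [0, 1, 2]
print(normalize_topics(2, 3))             # [2]
```

Under these assumptions, the call in step 1 would evaluate five candidate models, and any of the three accepted forms of `topics` reduces to a plain list before use.
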
Document summarization:

Get the sentence summaries of the requested documents with:
```
nlp.get_document_summaries(documents='all', num_sent=5)
```
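
The README does not say how the summaries are produced. As an illustration of extractive summarization in general, the sketch below scores each sentence by the average frequency of its words and returns the top `num_sent` sentences in their original order. It is a simplified stand-in using only the standard library, not the `BasicNLP` implementation; `summarize` and the tiny stop-word list are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {'the', 'a', 'an', 'and', 'of', 'to', 'in', 'is', 'it', 'was', 'he', 'his'}


def summarize(text, num_sent=5):
    """Return the num_sent highest-scoring sentences, in original order."""
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

    def tokens(sentence):
        return [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOP_WORDS]

    # Word frequencies across the whole document.
    freq = Counter(w for s in sentences for w in tokens(s))

    def score(sentence):
        words = tokens(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best, then restore order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sent])
    return [sentences[i] for i in keep]


text = ("Harry went to school. The school was a castle. "
        "It rained on Tuesday. Harry liked the castle school.")
print(summarize(text, num_sent=2))
```

Sentences that share vocabulary with the rest of the document score higher, so an off-topic sentence such as the one about the weather is the first to be dropped.
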
Sentiment analysis:

Get the sentiment scores (compound, positive, neutral, negative) for the requested documents with:
```
nlp.get_sentiment(documents='all')
```
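
The four score names match the output of the VADER sentiment analyzer, although this README does not state which library is used. Assuming VADER-style scores, where `compound` ranges from -1 to 1, a common convention is to treat values of 0.05 or above as positive and -0.05 or below as negative. The helper below is a hypothetical sketch of that convention, not part of `BasicNLP`.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a coarse label."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must be between -1 and 1")
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


for value in (0.6, -0.3, 0.0):
    print(value, label_sentiment(value))
```
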

raffg/harry_potter_nlp
