# Natural Language Processing (NLP) Tasks - Solved Version

In this assignment, you'll use various NLP libraries to complete tasks such as tokenization, stop word filtering, lemmatization, and sentiment analysis.

## Instructions

The text is loaded from the `sample.txt` file.

In [None]:
# Make sure these are installed:
%pip install nltk
%pip install spacy
!python -m spacy download en_core_web_sm
%pip install textblob
%pip install deep-translator
%pip install wordcloud
%pip install flair

In [None]:
# Preload the nltk data
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

### Exercise 1: Create a Word List
Create a list of words from the sentence provided in the `sample.txt` file.

In [None]:
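# Load the text (this path points to smart_stories.txt; adjust it if your copy is named sample.txt)
# encoding='utf-8' assumes the file is UTF-8 encoded
with open('../Resources/smart_stories.txt', 'r', encoding='utf-8') as file:
    sentence = file.read()
# Collapse every run of whitespace (including newlines) into a single space;
# split() also drops leading and trailing whitespace, so no separate strip() is needed
sentence = " ".join(sentence.split())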

# Create a word list
word_list = sentence.split()
print("Word List:", word_list)
print("Word Count:", len(word_list))
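
As a small illustration of what whitespace splitting does and does not do, the sketch below uses a made-up string (the `sample` text is an assumption, not the assignment file). It shows that `split()` handles irregular spacing but leaves punctuation attached to words.

```python
# Hypothetical sample text with irregular whitespace (not from the assignment file)
sample = "  The quick  brown fox\njumps over the lazy dog.  It's fast!  "

# str.split() with no arguments splits on any run of whitespace
# and ignores leading and trailing whitespace
words = sample.split()
print(words)
print(len(words))

# Punctuation stays attached to the neighbouring word
print(words[8])
```

This is why the word count here can differ from the token counts produced by a real tokenizer.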


### Exercise 2: Tokenize the Word List
Tokenize the word list using `nltk` and `spacy`.

In [None]:
import spacy

# Tokenizing using nltk
tokens_nltk = nltk.word_tokenize(sentence)
print("NLTK Tokenization:", tokens_nltk)

# Tokenizing using spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(sentence)
tokens_spacy = [token.text for token in doc]
print("Spacy Tokenization:", tokens_spacy)
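
To see why a dedicated tokenizer is useful, here is a rough sketch of a naive regex tokenizer. It is not how NLTK or spaCy work internally; the pattern and the `sample` sentence are assumptions chosen for illustration.

```python
import re

# Hypothetical sample sentence (not from the assignment file)
sample = "Dr. Smith didn't go; she stayed home."

# \w+ matches a run of letters, digits, or underscores;
# [^\w\s] matches a single punctuation character
simple_tokens = re.findall(r"\w+|[^\w\s]", sample)
print(simple_tokens)
```

The naive pattern separates punctuation from words, but it also breaks the abbreviation "Dr." and the contraction "didn't" into fragments. Library tokenizers apply more elaborate rules for cases like these.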

### Exercise 3: Use a Stemmer and a Lemmatizer
Use NLTK's Porter stemmer and spaCy's lemmatizer on the tokenized text.

In [None]:
# NLTK stemmer
from nltk.stem import PorterStemmer
ps = PorterStemmer()

stemmed_words = [ps.stem(word) for word in tokens_nltk]
print("Stemmed Words (NLTK):", stemmed_words)

# Using Spacy's lemmatizer
lemmatized_words = [token.lemma_ for token in doc]
print("Lemmatized Words (Spacy):", lemmatized_words)
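
The difference between the two approaches can be sketched with toy stand-ins. The `toy_stem` function and the `toy_lemmas` table below are hypothetical and deliberately crude: they are not the Porter algorithm or spaCy's lemmatizer, only an illustration that stemming chops suffixes while lemmatization maps a word to a dictionary form.

```python
def toy_stem(word):
    """Strip one common suffix; a crude illustration, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical hand-written lookup table standing in for a real lemmatizer
toy_lemmas = {"running": "run", "studies": "study", "better": "good", "cats": "cat"}

for word in ("running", "studies", "better", "cats"):
    print(word, "| stem:", toy_stem(word), "| lemma:", toy_lemmas.get(word, word))
```

Stems such as "runn" or "studi" need not be real words, whereas lemmas are.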

### Exercise 4: Filter out Stop Words
Remove stop words from the word list using NLTK or Spacy.

In [None]:
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

filtered_words_nltk = [word for word in tokens_nltk if word.lower() not in stop_words]
print("Filtered Words (NLTK):", filtered_words_nltk)

# Using Spacy to filter out stop words
filtered_words_spacy = [token.text for token in doc if not token.is_stop]
print("Filtered Words (Spacy):", filtered_words_spacy)
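
Stop word lists usually contain only words, so punctuation tokens can survive a filter like the NLTK one above. A minimal sketch of also dropping non-alphabetic tokens follows; the token list and the tiny stop word set are assumptions made up for this example.

```python
# Hypothetical token list and a tiny hand-written stop word set (for illustration only)
tokens = ["The", "cat", "sat", "on", "the", "mat", ",", "and", "it", "purred", "."]
tiny_stop_words = {"the", "on", "and", "it"}

# Keep a token only if it is purely alphabetic and not a stop word
content_words = [
    tok for tok in tokens
    if tok.isalpha() and tok.lower() not in tiny_stop_words
]
print(content_words)
```

Note that `isalpha()` also removes tokens such as numbers or hyphenated words, which may or may not be what you want.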

### Exercise 5: Identify the Parts of Speech
Identify the parts of speech of the words using NLTK and Spacy.

In [None]:
# Part-of-Speech tagging using nltk
pos_nltk = nltk.pos_tag(tokens_nltk)
print("Parts of Speech (NLTK):", pos_nltk)

# Using Spacy
pos_spacy = [(token.text, token.pos_) for token in doc]
print("Parts of Speech (Spacy):", pos_spacy)
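
NLTK's `pos_tag` returns Penn Treebank tags (for example `NN`, `VBD`), while spaCy's `token.pos_` returns Universal POS tags (for example `NOUN`, `VERB`), so the two outputs are not directly comparable. Either list of `(word, tag)` pairs can be summarized the same way. The sketch below uses hypothetical tagged data rather than real model output.

```python
from collections import Counter

# Hypothetical tagged output in the (word, tag) shape used above (not real model output)
tagged = [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"),
          ("the", "DET"), ("mat", "NOUN"), (".", "PUNCT")]

# Count how often each tag occurs
tag_counts = Counter(tag for _, tag in tagged)
print(tag_counts.most_common())

# Pull out the words carrying one particular tag
nouns = [word for word, tag in tagged if tag == "NOUN"]
print(nouns)
```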

### Exercise 6: Perform Sentiment Analysis
Analyze the sentiment of the text from `sample.txt` using `TextBlob`.

In [None]:
from textblob import TextBlob

blob = TextBlob(sentence)
print("Sentiment Polarity:", blob.sentiment.polarity)
print("Sentiment Subjectivity:", blob.sentiment.subjectivity)
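
TextBlob reports polarity in the range -1.0 to 1.0 and subjectivity in the range 0.0 to 1.0. One possible way to turn the polarity number into a coarse label is sketched below; the helper `label_polarity` and its 0.1 threshold are assumptions for illustration, not part of TextBlob.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse label.

    The default threshold of 0.1 is an arbitrary assumption, not a TextBlob convention.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

# Hypothetical scores, not results from the assignment text
for score in (0.45, -0.3, 0.02):
    print(score, label_polarity(score))
```

In the notebook, this could be called as `label_polarity(blob.sentiment.polarity)`.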

### Exercise 7: Translate a Phrase into a Different Language
Translate the text from `sample.txt` into French using the `deep-translator` library.

In [None]:
from deep_translator import GoogleTranslator

# Create a GoogleTranslator object for French translation
translator = GoogleTranslator(source='auto', target='fr')

# Translate the first 200 characters of the sentence to French
translation = translator.translate(sentence[0:200])

# Print the English text and the translated text
print("English Text:", sentence[0:200])
print("Translated to French:", translation)
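
Slicing with `sentence[0:200]` can cut the text in the middle of a word. As one possible alternative, the sketch below splits text into chunks of at most 200 characters at whitespace boundaries using the standard library's `textwrap.wrap`. The `long_text` value is made up for the example.

```python
import textwrap

# Hypothetical long text (not from the assignment file)
long_text = "Natural language processing turns raw text into structured data. " * 10

# textwrap.wrap breaks lines at whitespace, so no chunk ends in the middle of a word
chunks = textwrap.wrap(long_text, width=200)
print(len(chunks))
print(chunks[0])
```

Each chunk could then be passed to `translator.translate(chunk)` in turn, at the cost of one request per chunk.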

### Exercise 8: Use NER to Find Important Nouns
Use Named Entity Recognition (NER) to find named entities, such as people, places, and dates, in the text using spaCy or Flair.

```python
import spacy
import streamlit as st

# Load the pre-trained model
nlp = spacy.load("en_core_web_sm")

# Process the text
text = "Peter Jackson is the director of The Lord of the Rings movies. He was born in New Zealand. The movies were filmed in New Zealand. The Lord of the Rings movies are based on the books by J.R.R. Tolkien, a British author. The books were written in the 1950s. The movies were filmed in the 2000s."
doc = nlp(text)

# Extract entities
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Display the text and entities
st.write("Text:", text)
st.write("Entities:")
for entity in entities:
    st.write(f"{entity[0]} ({entity[1]})")

# To run the Streamlit app: install Streamlit (pip install streamlit),
# save this script as app.py, and run `streamlit run app.py` in the terminal
```

In [None]:
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the NER tagger
tagger = SequenceTagger.load("ner")

# Process the text
text = "Peter Jackson is the director of The Lord of the Rings movies. He was born in New Zealand. The movies were filmed in New Zealand. The Lord of the Rings movies are based on the books by J.R.R. Tolkien, a British author. The books were written in the 1950s. The movies were filmed in the 2000s."
flair_sentence = Sentence(text)
tagger.predict(flair_sentence)

# Extract entities
for entity in flair_sentence.get_spans('ner'):
    print(entity.text, entity.get_label('ner').value)
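
The same entity often appears several times, so it can help to group mentions by label. The sketch below assumes `(text, label)` pairs in the shape produced by the spaCy example above. The pairs themselves are hypothetical, not actual model output, and the label names differ between models (spaCy's English models and Flair's `ner` tagger use different label sets).

```python
from collections import defaultdict

# Hypothetical (text, label) pairs; the labels are illustrative, not real model output
entities = [("Peter Jackson", "PERSON"), ("New Zealand", "GPE"),
            ("New Zealand", "GPE"), ("J.R.R. Tolkien", "PERSON")]

# Collect the unique entity texts seen for each label
by_label = defaultdict(set)
for text, label in entities:
    by_label[label].add(text)

for label, names in sorted(by_label.items()):
    print(label, sorted(names))
```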


### Exercise 9: Create a WordCloud
Create a word cloud from the text using the `WordCloud` library.

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Create a WordCloud from the text
wordcloud = WordCloud(width=800, height=400).generate(sentence)

# Display the generated WordCloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
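
A word cloud sizes each word according to how often it occurs. The counting step can be sketched with the standard library alone; the `sample` string below is an assumption made up for this example.

```python
from collections import Counter

# Hypothetical sample text (not from the assignment file)
sample = "the cat and the dog and the bird"

# Count occurrences of each whitespace-separated word
frequencies = Counter(sample.split())
print(frequencies.most_common(3))
```

`WordCloud` also offers a `generate_from_frequencies` method that accepts a word-to-count mapping like this one, which is useful if you want to filter or lemmatize the words yourself before drawing the cloud.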