# **NLP Introduction & Text Processing**

1. **What is Computational Linguistics and how does it relate to NLP?**

  Ans. Computational linguistics and Natural Language Processing (NLP) are closely related fields that focus on the interaction between computers and human language.

**Computational Linguistics** (CL) is an interdisciplinary field that combines linguistics, computer science, artificial intelligence, and cognitive science. It focuses on understanding the linguistic rules and structures of human language and developing computational models to process and analyze language. CL is more theoretical and aims to build models of language for linguistic research.

**Natural Language Processing** (NLP) is a subfield of artificial intelligence that focuses on enabling computers to understand, interpret, and manipulate human language. NLP is more practical and aims to develop applications that involve language, such as machine translation, sentiment analysis, text summarization, and chatbots.

**Relationship:** NLP relies heavily on the theories and models developed in computational linguistics. CL provides the theoretical foundation and computational frameworks for NLP tasks. In essence, computational linguistics is the science behind NLP, and NLP is the engineering that puts CL theories into practice.

2. **Briefly describe the historical evolution of Natural Language Processing.**

  Ans. The historical evolution of Natural Language Processing (NLP) can be broadly categorized into several periods:

*   **Early Years (1950s-1960s):** This period was characterized by rule-based approaches and machine translation efforts. Early systems relied on hand-crafted rules and dictionaries to process language. The Georgetown-IBM experiment in 1954 is a notable example, aiming to automatically translate Russian to English.

*   **Statistical Revolution (Late 1980s-Early 2000s):** The rise of machine learning and the availability of larger datasets led to a shift towards statistical methods. Techniques like Hidden Markov Models (HMMs) and Support Vector Machines (SVMs) became prominent for tasks like part-of-speech tagging and parsing.

*   **Machine Learning and Feature Engineering (Early 2000s-2010s):** This era saw a focus on feature engineering, where researchers manually designed features from text to train machine learning models. This period also saw the development of techniques like Conditional Random Fields (CRFs) and the increasing use of large annotated corpora.

*   **Deep Learning Era (2010s-Present):** The advent of deep learning, particularly neural networks, revolutionized NLP. Architectures like Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and later, Transformers, achieved state-of-the-art results on various NLP tasks, leading to significant progress in areas like machine translation, text generation, and question answering.

*   **Large Language Models (LLMs) (Late 2010s-Present):** The development of massive pre-trained language models like BERT, GPT, and others marked a new era in NLP. These models, trained on vast amounts of text data, can be fine-tuned for a wide range of downstream tasks with remarkable performance, pushing the boundaries of what's possible with NLP.

3. **List and explain three major use cases of NLP in today’s tech industry.**

  Ans. Here are three major use cases of Natural Language Processing (NLP) in today's tech industry:

1.  **Sentiment Analysis:** This involves determining the emotional tone behind a piece of text. Companies use sentiment analysis to understand customer feedback from social media, reviews, and surveys. This helps in gauging public opinion, identifying areas for improvement, and making data-driven decisions. For example, a company can analyze tweets about their product to see if the overall sentiment is positive, negative, or neutral.

2.  **Machine Translation:** NLP enables the automatic translation of text or speech from one language to another. This is widely used in applications like Google Translate, which allows people to communicate across language barriers. It's also crucial for businesses operating globally, enabling them to localize content and communicate with customers and partners in different languages.

3.  **Chatbots and Virtual Assistants:** NLP powers conversational AI agents like chatbots and virtual assistants (e.g., Siri, Alexa). These systems use NLP to understand user queries in natural language and provide relevant responses or perform tasks. They are used for customer service, providing information, automating tasks, and enhancing user interaction with technology.

4. **What is text normalization and why is it essential in text processing tasks?**

  Ans. **Text normalization** is the process of converting text into a canonical (standard) form. The goal is to reduce the variations in text so that different forms of the same word or concept are treated consistently during processing. This is crucial because raw text data is often noisy and inconsistent, with variations in spelling, capitalization, punctuation, and word forms (e.g., "run," "running," "ran").

**Why it is essential in text processing tasks:**

Text normalization is essential for several reasons:

*   **Reduces redundancy:** It helps in treating different variations of the same word as a single entity, reducing the size of the vocabulary and simplifying further analysis.
*   **Improves accuracy:** Standardizing the text improves the accuracy of NLP tasks such as information retrieval, text classification, and sentiment analysis. For example, when searching for documents about "fishing," reducing "fishing," "fishes," and "fished" to the base form "fish" ensures that all relevant documents are retrieved.
*   **Facilitates analysis:** Normalized text is easier to process with algorithms and models, because a model does not have to learn separate statistics for surface variants of the same word.
*   **Enhances comparability:** It allows for easier comparison and analysis of different text documents or datasets.

Common text normalization techniques include:

*   **Lowercasing:** Converting all text to lowercase to treat words like "The" and "the" the same.
*   **Punctuation removal:** Removing punctuation marks that may not be relevant for analysis.
*   **Tokenization:** Splitting text into individual words or tokens.
*   **Stemming:** Reducing words to their root form (e.g., "running" -> "run"). This is a cruder approach than lemmatization.
*   **Lemmatization:** Reducing words to their base or dictionary form (e.g., "better" -> "good" when the word is tagged as an adjective). This is usually more sophisticated than stemming because it relies on a vocabulary and the word's part of speech rather than on suffix rules alone.
*   **Handling special characters and numbers:** Deciding how to treat numbers, symbols, and other special characters.

In summary, text normalization is a foundational step in most text processing pipelines, ensuring that the text data is clean, consistent, and ready for further analysis and modeling.
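
As a rough illustration of the first three techniques, the sketch below lowercases a sentence, strips ASCII punctuation, and splits on whitespace. It uses only the standard library; the function name `normalize` and the sample sentence are made up for this example, and a real pipeline would typically use a proper tokenizer instead of `split()`.

```python
import string

def normalize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    text = text.lower()
    # A translation table with a deletion set removes every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # split() with no arguments also collapses repeated whitespace.
    return text.split()

raw = "The quick brown Fox; the QUICK brown fox!"   # made-up sample sentence
tokens = normalize(raw)

print(tokens)
print("distinct raw tokens:       ", len(set(raw.split())))
print("distinct normalized tokens:", len(set(tokens)))
```

The drop in distinct tokens is the "reduces redundancy" effect described above: surface variants such as "Fox;" and "fox!" collapse into one vocabulary entry.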

5. **Compare and contrast stemming and lemmatization with suitable examples.**

  Ans. **Stemming** and **Lemmatization** are both techniques used in text normalization to reduce words to their base form. However, they differ in their approach and the resulting form.

**Stemming:**

*   **Approach:** Stemming is a more rudimentary process that simply chops off the end of a word to arrive at a root form. It's often rule-based and doesn't consider the context or meaning of the word.
*   **Result:** The resulting "stem" may not be a valid word in the dictionary.
*   **Speed:** Generally faster than lemmatization.
*   **Accuracy:** Less accurate than lemmatization as it can result in meaningless stems.
*   **Example:**
    *   "running" -> "run"
    *   "fishes" -> "fish"
    *   "studies" -> "studi" (not a valid word)
    *   "better" -> "better" (doesn't handle irregular forms)

**Lemmatization:**

*   **Approach:** Lemmatization is a more sophisticated process that uses a dictionary and linguistic rules to return the base or dictionary form of a word, known as the "lemma." It typically relies on the word's part of speech, inferred from context, to determine the correct lemma.
*   **Result:** The resulting "lemma" is a valid word in the dictionary.
*   **Speed:** Generally slower than stemming due to the dictionary lookup and linguistic analysis.
*   **Accuracy:** More accurate than stemming as it produces valid words and handles irregular forms.
*   **Example:**
    *   "running" -> "run"
    *   "fishes" -> "fish"
    *   "studies" -> "study" (valid word)
    *   "better" -> "good" (handles irregular forms)

**Comparison and Contrast:**

| Feature      | Stemming                         | Lemmatization                      |
| :----------- | :------------------------------- | :--------------------------------- |
| **Approach** | Rule-based, chops off ends       | Dictionary and linguistic rules    |
| **Result**   | May not be a valid word (stem)   | Valid word (lemma)                 |
| **Speed**    | Faster                           | Slower                             |
| **Accuracy** | Less accurate                    | More accurate                      |
| **Context**  | Doesn't consider context         | Considers context                  |
| **Irregular Forms** | Doesn't handle irregular forms | Handles irregular forms (e.g., "better" -> "good") |

In summary, stemming is a faster but less accurate method that produces word stems that may not be valid words. Lemmatization is a slower but more accurate method that produces valid word lemmas by considering context and using a dictionary. The choice between stemming and lemmatization depends on the specific NLP task and the desired level of accuracy and speed.
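
The contrast can be sketched with two toy stand-ins: a naive suffix stripper in place of a stemmer and a hand-written lookup table in place of a lemmatizer's dictionary. Both `SUFFIXES` and `LEMMAS` are hypothetical and far smaller than what real tools such as the Porter stemmer or a WordNet-based lemmatizer use. Note that this toy stemmer does not undouble consonants, so it produces "runn" where a real stemmer would return "run".

```python
# Toy stand-ins only: real stemmers and lemmatizers use much larger rule sets
# and dictionaries than the hypothetical ones below.
SUFFIXES = ("ing", "es", "ed", "s")   # checked in this order

LEMMAS = {"running": "run", "fishes": "fish", "studies": "study", "better": "good"}

def toy_stem(word):
    """Strip the first matching suffix, with no check that the result is a word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["running", "fishes", "studies", "better"]:
    print(f"{w:8} stem={toy_stem(w):8} lemma={toy_lemmatize(w)}")
```

The suffix stripper is fast and needs no data, but it yields non-words ("studi") and cannot relate "better" to "good". The lookup handles both, at the cost of needing a dictionary.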

6. **Write a Python program that uses regular expressions (regex) to extract all email addresses from the following block of text:**

    “Hello team, please contact us at support@xyz.com for technical issues, or reach out to our HR at hr@xyz.com. You can also connect with John at john.doe@xyz.org and jenny via jenny_clarke126@mail.co.us. For partnership inquiries, email partners@xyz.biz.”

In [5]:
import re

text = "Hello team, please contact us at support@xyz.com for technical issues, or reach out to our HR at hr@xyz.com. You can also connect with John at john.doe@xyz.org and jenny via jenny_clarke126@mail.co.us. For partnership inquiries, email partners@xyz.biz."

# Regex pattern for email addresses
# Note: inside a character class "|" is a literal pipe, so the TLD class is [A-Za-z], not [A-Z|a-z]
email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Find all email addresses in the text
email_addresses = re.findall(email_pattern, text)

# Print the extracted email addresses
for email in email_addresses:
    print(email)

support@xyz.com
hr@xyz.com
john.doe@xyz.org
jenny_clarke126@mail.co.us
partners@xyz.biz
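
A pitfall worth knowing when writing patterns like this one: inside a character class, `|` is a literal character rather than alternation, so a class written as `[A-Z|a-z]` also accepts a pipe. The short sketch below, using a made-up string, shows the difference.

```python
import re

# Inside [...] the pipe is an ordinary character, so this class matches letters AND "|".
with_pipe = re.compile(r'[A-Z|a-z]{2,}')
letters_only = re.compile(r'[A-Za-z]{2,}')

sample = "co|uk"   # made-up string containing a literal pipe

print(with_pipe.fullmatch(sample) is not None)     # pipe accepted as part of the run
print(letters_only.fullmatch(sample) is not None)  # pipe rejected
```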


7. **Given the sample paragraph below, perform string tokenization and frequency distribution using Python and NLTK:**

     “Natural Language Processing (NLP) is a fascinating field that combines linguistics, computer science, and artificial intelligence. It enables machines to understand, interpret, and generate human language. Applications of NLP include chatbots, sentiment analysis, and machine translation. As technology advances, the role of NLP in modern solutions is becoming increasingly critical.”

In [6]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [7]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

text = "Natural Language Processing (NLP) is a fascinating field that combines linguistics, computer science, and artificial intelligence. It enables machines to understand, interpret, and generate human language. Applications of NLP include chatbots, sentiment analysis, and machine translation. As technology advances, the role of NLP in modern solutions is becoming increasingly critical."

# Tokenization
tokens = word_tokenize(text)
print("Tokens:")
print(tokens)

# Frequency Distribution
fdist = FreqDist(tokens)
print("\nFrequency Distribution:")
print(fdist.most_common(10)) # Display the 10 most common tokens

Tokens:
['Natural', 'Language', 'Processing', '(', 'NLP', ')', 'is', 'a', 'fascinating', 'field', 'that', 'combines', 'linguistics', ',', 'computer', 'science', ',', 'and', 'artificial', 'intelligence', '.', 'It', 'enables', 'machines', 'to', 'understand', ',', 'interpret', ',', 'and', 'generate', 'human', 'language', '.', 'Applications', 'of', 'NLP', 'include', 'chatbots', ',', 'sentiment', 'analysis', ',', 'and', 'machine', 'translation', '.', 'As', 'technology', 'advances', ',', 'the', 'role', 'of', 'NLP', 'in', 'modern', 'solutions', 'is', 'becoming', 'increasingly', 'critical', '.']

Frequency Distribution:
[(',', 7), ('.', 4), ('NLP', 3), ('and', 3), ('is', 2), ('of', 2), ('Natural', 1), ('Language', 1), ('Processing', 1), ('(', 1)]
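
The most frequent entries above are punctuation marks, which rarely say much about the text. One possible refinement is to lowercase the text, keep only alphabetic tokens, and drop stop words before counting. The sketch below does this with the standard library only; the regex tokenizer (used here in place of `word_tokenize`) and the small hand-picked `STOP_WORDS` set are assumptions made for illustration.

```python
import re
from collections import Counter

text = ("Natural Language Processing (NLP) is a fascinating field that combines "
        "linguistics, computer science, and artificial intelligence. It enables "
        "machines to understand, interpret, and generate human language. "
        "Applications of NLP include chatbots, sentiment analysis, and machine "
        "translation. As technology advances, the role of NLP in modern solutions "
        "is becoming increasingly critical.")

# Hypothetical, hand-picked stop-word set; a real pipeline would use a fuller list.
STOP_WORDS = {"is", "a", "that", "and", "it", "to", "of", "as", "the", "in"}

# Keep lowercase alphabetic runs only, so punctuation never becomes a token.
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(w for w in words if w not in STOP_WORDS)

print(counts.most_common(2))
```

Lowercasing merges "Language" and "language" into one entry, which the case-sensitive count above treats as two separate tokens.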


8. **Create a custom annotator using spaCy or NLTK that identifies and labels proper nouns in a given text.**

In [6]:
!pip install spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [8]:
import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

text = "Natural Language Processing (NLP) is a fascinating field that combines linguistics, computer science, and artificial intelligence. It enables machines to understand, interpret, and generate human language. Applications of NLP include chatbots, sentiment analysis, and machine translation. As technology advances, the role of NLP in modern solutions is becoming increasingly critical."

# Process the text
doc = nlp(text)

# Identify and label proper nouns
proper_nouns = [(token.text, token.pos_) for token in doc if token.pos_ == "PROPN"]

# Print the proper nouns
print("Proper Nouns:")
for noun, pos in proper_nouns:
    print(f"{noun}: {pos}")

Proper Nouns:
Natural: PROPN
Language: PROPN
Processing: PROPN
NLP: PROPN
NLP: PROPN
NLP: PROPN
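
The output lists "Natural", "Language", and "Processing" as three separate proper nouns. A possible post-processing step is to merge runs of adjacent `PROPN` tokens into multi-word names. The sketch below works on plain `(text, POS)` pairs so it runs without spaCy; the `tagged` list is hypothetical (its `PROPN` tags mirror the output above, the other tags are illustrative), and `group_proper_nouns` is a name invented for this example.

```python
from itertools import groupby

# Hypothetical (text, POS) pairs; the PROPN tags mirror the output above,
# the remaining tags are illustrative.
tagged = [("Natural", "PROPN"), ("Language", "PROPN"), ("Processing", "PROPN"),
          ("(", "PUNCT"), ("NLP", "PROPN"), (")", "PUNCT"), ("is", "AUX")]

def group_proper_nouns(pairs):
    """Join runs of adjacent PROPN tokens into single multi-word names."""
    names = []
    # groupby yields consecutive runs that share the same key (PROPN or not).
    for is_propn, run in groupby(pairs, key=lambda p: p[1] == "PROPN"):
        if is_propn:
            names.append(" ".join(text for text, _ in run))
    return names

print(group_proper_nouns(tagged))
```

With spaCy, the same pairs could be built as `[(token.text, token.pos_) for token in doc]`.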


9. **Using Gensim, demonstrate how to train a simple Word2Vec model on the following dataset consisting of example sentences:**

    dataset = [ "Natural language processing enables computers to understand human language", "Word embeddings are a type of word representation that allows words with similar meaning to have similar representation", "Word2Vec is a popular word embedding technique used in many NLP applications", "Text preprocessing is a critical step before training word embeddings", "Tokenization and normalization help clean raw text for modeling" ]
    
    **Write code that tokenizes the dataset, preprocesses it, and trains a Word2Vec model using Gensim.**

In [9]:
!pip install gensim

Collecting gensim
  Downloading gensim-4.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (8.4 kB)
Downloading gensim-4.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (27.9 MB)
Installing collected packages: gensim
Successfully installed gensim-4.4.0


In [10]:
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize
import nltk
import re

# Download the NLTK tokenizer data only if it is missing.
# nltk.data.find raises LookupError when a resource is not installed.
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')


dataset = [
    "Natural language processing enables computers to understand human language",
    "Word embeddings are a type of word representation that allows words with similar meaning to have similar representation",
    "Word2Vec is a popular word embedding technique used in many NLP applications",
    "Text preprocessing is a critical step before training word embeddings",
    "Tokenization and normalization help clean raw text for modeling"
]

# Preprocessing function
def preprocess_text(text):
    # Lowercase the text
    text = text.lower()
    # Keep only lowercase letters and whitespace; this drops punctuation and digits,
    # so "word2vec" becomes "wordvec"
    text = re.sub(r'[^a-z\s]', '', text)
    # Tokenize the text
    tokens = word_tokenize(text)
    return tokens

# Preprocess the dataset
processed_dataset = [preprocess_text(sentence) for sentence in dataset]

# Train the Word2Vec model
# vector_size: Dimensionality of the word vectors.
# window: The maximum distance between the current and predicted word within a sentence.
# min_count: Ignores all words with total frequency lower than this.
# sg: Training algorithm: 1 for skip-gram; 0 for CBOW.
model = Word2Vec(sentences=processed_dataset, vector_size=100, window=5, min_count=1, workers=4, sg=0)

# Print the trained model's vocabulary size
print(f"Vocabulary size: {len(model.wv)}")

# Example: Get the vector for a word
word_vector = model.wv['language']
print(f"\nVector for 'language':\n{word_vector[:10]}...") # Print first 10 elements

# Example: Find most similar words
similar_words = model.wv.most_similar('language')
print(f"\nWords similar to 'language':")
for word, similarity in similar_words:
    print(f"{word}: {similarity:.4f}")

Vocabulary size: 46

Vector for 'language':
[-0.00958061  0.00894419  0.00416531  0.00923353  0.00664613  0.00292132
  0.00980621 -0.004423   -0.0067969   0.00421717]...

Words similar to 'language':
technique: 0.2854
text: 0.1991
processing: 0.1907
are: 0.1001
step: 0.0966
is: 0.0747
that: 0.0728
representation: 0.0608
meaning: 0.0468
modeling: 0.0448
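
`most_similar` ranks words by the cosine similarity of their vectors. With only five training sentences, the scores above are low and should not be read as meaningful semantic relationships. As a minimal sketch of the measure itself, the code below computes cosine similarity in plain Python on made-up 3-dimensional vectors that are not taken from the trained model.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors for illustration only.
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]    # same direction as a
c = [0.0, 0.0, 5.0]    # orthogonal to a

print(round(cosine_similarity(a, b), 4))
print(round(cosine_similarity(a, c), 4))
```

Vectors pointing in the same direction score close to 1 regardless of length, and orthogonal vectors score 0. This is why the measure suits embeddings, where direction carries more information than magnitude.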


10. **Imagine you are a data scientist at a fintech startup. You’ve been tasked with analyzing customer feedback. Outline the steps you would take to clean, process, and extract useful insights using NLP techniques from thousands of customer reviews.**

In [11]:
import pandas as pd
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import spacy

# --- Step 1: Data Loading and Initial Inspection ---
# In a real scenario, we would load our data from a file (e.g., CSV, JSON)
# For this example, we'll use a sample list of reviews.
# Replace this with our actual data loading code.
customer_reviews = [
    "The product is great! I love the features.",
    "This app is very slow and crashes frequently.",
    "Customer service was helpful and resolved my issue quickly.",
    "The user interface is confusing and hard to navigate.",
    "Amazing experience, highly recommended!",
    "The price is a bit high, but the quality is good.",
    "I had a terrible experience with this service.",
    "The new update fixed many bugs, much better now.",
    "Not satisfied with the performance.",
    "Excellent product and fast delivery."
]

print("--- Initial Data ---")
for i, review in enumerate(customer_reviews):
    print(f"Review {i+1}: {review}")
print("-" * 20)


# --- Step 2: Text Cleaning ---
def clean_text(text):
    # Remove special characters and punctuation
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    # Convert to lowercase
    text = text.lower()
    return text

cleaned_reviews = [clean_text(review) for review in customer_reviews]

print("--- Cleaned Data ---")
for i, review in enumerate(cleaned_reviews):
    print(f"Review {i+1}: {review}")
print("-" * 20)


# --- Step 3: Text Preprocessing ---
# Download necessary NLTK data (if not already downloaded)
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')

try:
    nltk.data.find('corpora/stopwords')
except LookupError:
    nltk.download('stopwords')

try:
    nltk.data.find('corpora/wordnet')
except LookupError:
    nltk.download('wordnet')

try:
    nltk.data.find('sentiment/vader_lexicon')
except LookupError:
    nltk.download('vader_lexicon')


# Initialize lemmatizer and stop words
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # Tokenization
    tokens = word_tokenize(text)
    # Remove stop words and lemmatize
    processed_tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]
    return processed_tokens

processed_reviews = [preprocess_text(review) for review in cleaned_reviews]

print("--- Processed Data (Tokens) ---")
for i, tokens in enumerate(processed_reviews):
    print(f"Review {i+1}: {tokens}")
print("-" * 20)


# --- Step 4: Exploratory Data Analysis (EDA) on Text ---
# Join tokens back to strings for EDA and further processing
processed_reviews_str = [" ".join(tokens) for tokens in processed_reviews]

# Calculate word frequency
all_words = [word for tokens in processed_reviews for word in tokens]
freq_dist = nltk.FreqDist(all_words)

print("--- Most Common Words (EDA) ---")
print(freq_dist.most_common(10))
print("-" * 20)


# --- Step 5: Sentiment Analysis ---
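analyzer = SentimentIntensityAnalyzer()

# Caveat: these scores are computed on the stop-word-filtered text, which drops
# negations such as "not" (review 9 becomes "satisfied performance"). VADER is
# designed for raw text, so scoring customer_reviews directly is usually preferable.
sentiment_scores = [analyzer.polarity_scores(review) for review in processed_reviews_str]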

print("--- Sentiment Scores ---")
for i, scores in enumerate(sentiment_scores):
    print(f"Review {i+1}: {scores}")
print("-" * 20)


# --- Step 6: Topic Modeling (using NMF as an example) ---
# Using TF-IDF for topic modeling
tfidf_vectorizer = TfidfVectorizer(max_features=1000, max_df=0.95, min_df=2) # Adjust parameters as needed
tfidf_matrix = tfidf_vectorizer.fit_transform(processed_reviews_str)

# Apply NMF for topic modeling
num_topics = 3 # Define the number of topics
nmf_model = NMF(n_components=num_topics, random_state=1)
nmf_matrix = nmf_model.fit_transform(tfidf_matrix)

# Display top words per topic
print("--- Topic Modeling (NMF) ---")
feature_names = tfidf_vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(nmf_model.components_):
    top_words_idx = topic.argsort()[-10:][::-1]
    top_words = [feature_names[i] for i in top_words_idx]
    print(f"Topic {topic_idx+1}: {', '.join(top_words)}")
print("-" * 20)


# --- Step 7: Named Entity Recognition (NER) ---
# Load spaCy model (if not already loaded)
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:  # spacy.load raises OSError when the model is not installed
    !python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

print("--- Named Entity Recognition (NER) ---")
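# Note: cleaned_reviews are lowercased and stripped of punctuation, which removes cues
# that NER models rely on; running on the raw customer_reviews may find more entities.
for i, review in enumerate(cleaned_reviews):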
    doc = nlp(review)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    print(f"Review {i+1} Entities: {entities}")
print("-" * 20)


# --- Step 8: Feature Engineering (Example: TF-IDF) ---
# TF-IDF matrix was already created in Topic Modeling step (tfidf_matrix)
# We can use this matrix for downstream tasks like classification.
print("--- Feature Engineering (TF-IDF Matrix Shape) ---")
print(tfidf_matrix.shape)
print("-" * 20)


# --- Step 9: Insight Extraction and Reporting ---
# This step involves interpreting the results from previous steps.
# For example, we would analyze:
# - Overall sentiment distribution
# - Sentiment trends for specific topics or entities
# - Most frequent complaints or praises (from topics and keywords)
# - Identify reviews related to specific entities

print("--- Insight Extraction (Examples) ---")
# Example 1: Reviews with strong negative sentiment
negative_reviews_indices = [i for i, scores in enumerate(sentiment_scores) if scores['neg'] > 0.5]
print("Reviews with strong negative sentiment:")
for i in negative_reviews_indices:
    print(f"- Review {i+1}: {customer_reviews[i]} (Sentiment: {sentiment_scores[i]})")

# Example 2: Reviews related to Topic 1 (based on top words)
# We would need to map reviews to topics based on the nmf_matrix
# For simplicity, we'll just print the top words of Topic 1 again.
print("\nTop words for Topic 1 (potential area of focus):")
print(f"{', '.join([feature_names[i] for i in nmf_model.components_[0].argsort()[-10:][::-1]])}")
print("-" * 20)

# --- Step 10: Finish task ---
print("--- Analysis Complete ---")
print("Based on the sentiment scores, topic modeling, and NER, we can now delve deeper into specific areas of customer feedback to extract actionable insights for our fintech startup.")

--- Initial Data ---
Review 1: The product is great! I love the features.
Review 2: This app is very slow and crashes frequently.
Review 3: Customer service was helpful and resolved my issue quickly.
Review 4: The user interface is confusing and hard to navigate.
Review 5: Amazing experience, highly recommended!
Review 6: The price is a bit high, but the quality is good.
Review 7: I had a terrible experience with this service.
Review 8: The new update fixed many bugs, much better now.
Review 9: Not satisfied with the performance.
Review 10: Excellent product and fast delivery.
--------------------
--- Cleaned Data ---
Review 1: the product is great i love the features
Review 2: this app is very slow and crashes frequently
Review 3: customer service was helpful and resolved my issue quickly
Review 4: the user interface is confusing and hard to navigate
Review 5: amazing experience highly recommended
Review 6: the price is a bit high but the quality is good
Review 7: i had a terrible exp

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


--- Processed Data (Tokens) ---
Review 1: ['product', 'great', 'love', 'feature']
Review 2: ['app', 'slow', 'crash', 'frequently']
Review 3: ['customer', 'service', 'helpful', 'resolved', 'issue', 'quickly']
Review 4: ['user', 'interface', 'confusing', 'hard', 'navigate']
Review 5: ['amazing', 'experience', 'highly', 'recommended']
Review 6: ['price', 'bit', 'high', 'quality', 'good']
Review 7: ['terrible', 'experience', 'service']
Review 8: ['new', 'update', 'fixed', 'many', 'bug', 'much', 'better']
Review 9: ['satisfied', 'performance']
Review 10: ['excellent', 'product', 'fast', 'delivery']
--------------------
--- Most Common Words (EDA) ---
[('product', 2), ('service', 2), ('experience', 2), ('great', 1), ('love', 1), ('feature', 1), ('app', 1), ('slow', 1), ('crash', 1), ('frequently', 1)]
--------------------
--- Sentiment Scores ---
Review 1: {'neg': 0.0, 'neu': 0.194, 'pos': 0.806, 'compound': 0.8519}
Review 2: {'neg': 0.474, 'neu': 0.526, 'pos': 0.0, 'compound': -0.4019}
Revi