# Introduction to Natural Language Processing
Natural Language Processing (NLP) enables machines to interpret, understand, and generate human language.
Applications of NLP include chatbots, machine translation, and sentiment analysis.

In [None]:
%pip install -q nltk spacy transformers tensorflow tf-keras textblob


In [None]:
# Import necessary libraries for NLP
import nltk
import spacy
from transformers import pipeline


# Download any necessary NLTK resources
nltk.download('punkt')
nltk.download('punkt_tab')  # recent NLTK releases need this for word_tokenize
nltk.download('stopwords')


# Understanding Sentiment Analysis
Sentiment analysis identifies and classifies the sentiment expressed in text, typically as positive, negative, or neutral. It is widely used across industries to gauge public opinion, customer feedback, and more.

In [None]:
# Sample text data to work with
sample_texts = [
  "I absolutely love this product! It works like a charm.",
  "This was a terrible experience; I am never coming back.",
  "It was okay, not the best but not the worst either."
]

# Display sample texts
for text in sample_texts:
  print(text)



# Text Preprocessing Steps
Preprocessing is a crucial step in NLP that involves preparing and cleaning the text for analysis. Common steps include:
- **Tokenization**: Breaking down text into individual words or tokens
- **Stopword Removal**: Removing common words that carry little meaning on their own (and, the, is, this, ...)
- **Stemming/Lemmatization**: Reducing words to their root forms

<hr>

![Tokenize](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*Rs6fzMD_9AFzSfNUguPlDA.jpeg)



In [None]:
# Example of tokenization and stopword removal with NLTK
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Tokenize and remove stopwords from the sample text
stop_words = set(stopwords.words('english'))
processed_texts = []

for text in sample_texts:
    tokens = word_tokenize(text.lower())
    filtered_tokens = [word for word in tokens if word.isalnum() and word not in stop_words]
    processed_texts.append(filtered_tokens)

# Display processed text
for i, tokens in enumerate(processed_texts):
    print(f"Original: {sample_texts[i]}")
    print(f"Processed: {tokens}\n")



# Lemmatization

Lemmatization is the process of reducing a word to its base or dictionary form, known as the lemma. Grouping the inflected forms of a word so they can be analyzed as a single item can improve the accuracy of NLP tasks.

Unlike stemming, which simply chops off the ends of words, lemmatization considers the context and part of speech of a word to choose a valid dictionary form. This usually makes it more accurate than stemming.

> For example, the lemma of "running" is "run," and the lemma of "better" is "good."
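
The contrast between stemming and lemmatization can be sketched in plain Python. This is a toy illustration only: the suffix list and the lemma table below are hypothetical and hand-written, and real stemmers and lemmatizers use far richer rules and vocabularies.

```python
# Toy illustration only; not how NLTK or spaCy work internally.

# Hypothetical, deliberately tiny suffix list for the toy stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical hand-written lemma table keyed by (word, part of speech).
LEMMA_TABLE = {
    ("running", "VERB"): "run",
    ("better", "ADJ"): "good",
    ("studies", "NOUN"): "study",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, with no regard for meaning."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str, pos: str) -> str:
    """Look up the dictionary form; fall back to the word itself."""
    return LEMMA_TABLE.get((word, pos), word)


for word, pos in [("running", "VERB"), ("better", "ADJ"), ("studies", "NOUN")]:
    print(f"{word:<10} stem={toy_stem(word):<8} lemma={toy_lemmatize(word, pos)}")
```

The stemmer can produce strings that are not words, while the lookup returns a valid dictionary form but only for the entries it knows about.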




In [None]:
!python -m spacy download en_core_web_sm

In [None]:
import spacy

# Load the spaCy English language model
nlp = spacy.load("en_core_web_sm")

# Example text
text = "I am running quickly, and I feel better now."

# Process the text with spaCy
doc = nlp(text)

# Print the lemmas of each token
for token in doc:
    print(token.text.ljust(10), token.lemma_)

# Choosing a Sentiment Analysis Model
Several libraries and models can perform sentiment analysis, including:
- **Rule-based**: `TextBlob`, `VADER`
- **Transformer-based**: BERT, DistilBERT

Here, we will use `TextBlob` and Hugging Face's `transformers` library.
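
To give a feel for what "rule-based" means, here is a minimal lexicon scorer in plain Python. It is a sketch under stated assumptions, not the algorithm used by `TextBlob` or `VADER`: the word scores, the negation list, and the function name `lexicon_polarity` are all made up for illustration.

```python
import re

# Hypothetical mini-lexicon: word -> polarity in [-1, 1].
LEXICON = {
    "love": 0.8,
    "charm": 0.5,
    "terrible": -1.0,
    "best": 0.7,
    "worst": -0.9,
    "okay": 0.2,
}

# Hypothetical negation words that flip the next scored word.
NEGATIONS = {"not", "never", "no"}


def lexicon_polarity(text: str) -> float:
    """Average the polarity of known words, flipping the sign after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
        elif token in LEXICON:
            score = LEXICON[token]
            scores.append(-score if negate else score)
            negate = False  # a negation applies to the next scored word only
    return sum(scores) / len(scores) if scores else 0.0


for text in [
    "I absolutely love this product! It works like a charm.",
    "This was a terrible experience; I am never coming back.",
    "It was okay, not the best but not the worst either.",
]:
    print(f"{lexicon_polarity(text):+.2f}  {text}")
```

Transformer-based models learn these associations from labeled data rather than from a hand-written table, which is why they tend to cope better with phrasing the lexicon never anticipated.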

In [None]:
# Load a pretrained sentiment analysis pipeline using Hugging Face's transformers
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    framework="tf",  # run the model on the TensorFlow backend
)


# Hands-On Sentiment Analysis Example
Let's analyze the sentiment of sample texts using both `TextBlob` and a transformer model.

> Note: This example was run with a free Hugging Face access token.

In [None]:
# Import TextBlob for rule-based sentiment analysis
from textblob import TextBlob

# Analyze sentiment with TextBlob
print("TextBlob Sentiment Analysis:")
for text in sample_texts:
    blob = TextBlob(text)
    print(f"Text: {text}\nSentiment: {blob.sentiment}\n")

# Analyze sentiment with transformers
print("Transformer-based Sentiment Analysis:")
for text in sample_texts:
    result = sentiment_analyzer(text)[0]
    print(f"Text: {text}\nSentiment: {result['label']}, Confidence: {result['score']:.2f}\n")
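
The two outputs above are not directly comparable: TextBlob reports a polarity score between -1.0 and 1.0, while the transformer pipeline returns a `POSITIVE` or `NEGATIVE` label with a confidence. One possible way to line them up is to bucket the polarity into a label. The helper name `polarity_to_label` and the 0.1 neutral band are assumptions chosen for illustration, not part of either library.

```python
def polarity_to_label(polarity: float, neutral_band: float = 0.1) -> str:
    """Map a polarity in [-1.0, 1.0] to a coarse label.

    neutral_band is a hypothetical threshold: scores whose absolute value
    is at most this are treated as NEUTRAL.
    """
    if polarity > neutral_band:
        return "POSITIVE"
    if polarity < -neutral_band:
        return "NEGATIVE"
    return "NEUTRAL"


# Example polarities (made-up values, not real TextBlob output).
for value in (0.6, -0.8, 0.05):
    print(f"{value:+.2f} -> {polarity_to_label(value)}")
```

In the notebook this could be called as `polarity_to_label(blob.sentiment.polarity)`. Note that the SST-2 model used above has no neutral class, so mixed texts will still be forced into one of its two labels.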

# Discussion of Model Performance and Limitations
### Challenges
- **Context Sensitivity**: Models may misinterpret context, especially with sarcasm or irony.
- **Domain-Specific Language**: Generic models might struggle with industry-specific terms.

Improving sentiment analysis often requires high-quality, domain-specific datasets and, in some cases, custom fine-tuning of models.
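
The domain-language problem can be illustrated with a toy lexicon. This sketch does not fine-tune anything; it simply swaps in domain-specific word scores as a stand-in for what domain data would teach a model. Both lexicons, the sample review, and the `score` helper are hypothetical.

```python
# Hypothetical word scores; neither lexicon comes from a real library.
GENERIC_LEXICON = {"sick": -0.7, "killer": -0.8, "slow": -0.4}

# Hypothetical overrides for gaming reviews, where this slang is praise.
GAMING_OVERRIDES = {"sick": 0.8, "killer": 0.7}


def score(text: str, lexicon: dict) -> float:
    """Average the scores of the words found in the lexicon (0.0 if none)."""
    words = text.lower().split()
    hits = [lexicon[word] for word in words if word in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


review = "that combo was sick and the soundtrack is killer"

# Domain entries take precedence over the generic ones.
domain_lexicon = {**GENERIC_LEXICON, **GAMING_OVERRIDES}

print("generic:", score(review, GENERIC_LEXICON))
print("domain: ", score(review, domain_lexicon))
```

The same review flips from negative to positive once the domain vocabulary is taken into account, which is the kind of shift that domain-specific training data is meant to capture.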

# Summary and Q&A
In this session, we've covered:
- Key NLP concepts
- Steps for text preprocessing
- Examples of sentiment analysis using TextBlob and transformer-based models

**Further Learning**:
- Hugging Face Documentation: https://huggingface.co/docs
- NLP courses: Online platforms like Coursera, edX, and DataCamp offer great resources.

Feel free to explore further, and let’s open the floor for any questions!