# 1.1 What is Natural Language Processing (NLP)?

**Install the required Python packages and libraries.**

In [None]:
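!pip install translate
!pip install sumy
!pip install gensim
# The examples below also use these packages and the small English SpaCy model
!pip install nltk textblob spacy scikit-learn
!python -m spacy download en_core_web_sm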

### 1.1.4 Example: Tokenization in NLP
Here's a concise breakdown of what each part of the code does:

1. **Importing NLTK:**  
   `import nltk` brings in the Natural Language Toolkit (NLTK), a widely used Python library for processing and analyzing human language data.

2. **Downloading Tokenizer Models:**  
   `nltk.download('punkt')` and `nltk.download('punkt_tab')` download the data the tokenizer needs. Punkt is a pretrained sentence-boundary model: `word_tokenize` first uses it to split the text into sentences, then splits each sentence into words and punctuation marks. Which resource is loaded depends on the NLTK version: older releases read 'punkt', while recent releases read 'punkt_tab', so downloading both keeps the code working in either case.

3. **Importing the Tokenizer Function:**  
   `from nltk.tokenize import word_tokenize` imports the `word_tokenize` function, which is specifically designed to break text into individual tokens (words and punctuation).

4. **Defining the Text:**  
   `text = "Natural Language Processing (NLP) enables machines to understand human language."` sets up a sample string to demonstrate tokenization.

5. **Tokenizing the Text:**  
   `tokens = word_tokenize(text)` applies the tokenizer to the sample text, resulting in a list of tokens (words and punctuation marks).

6. **Displaying the Tokens:**  
   `print(tokens)` outputs the list of tokens to the console, allowing you to see the result of the tokenization process.

**Note:** Downloading the 'punkt' data is only necessary the first time you use it on your system.

In [None]:

# Import NLTK and download the tokenizer resources
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')
from nltk.tokenize import word_tokenize

# Sample text
text = "Natural Language Processing (NLP) enables machines to understand human language."

# Tokenizing the text
tokens = word_tokenize(text)

# Display the tokens
print(tokens)
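
To make the idea of a token concrete without any downloads, here is a minimal sketch that approximates word tokenization with a single regular expression. The function name `simple_tokenize` is hypothetical and this is not how NLTK's `word_tokenize` works internally; it only illustrates the kind of output a tokenizer produces.

```python
import re

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

text = "Natural Language Processing (NLP) enables machines to understand human language."
tokens = simple_tokenize(text)

print(tokens)
print("Number of tokens:", len(tokens))
```

The sketch treats every punctuation character as its own token, so a contraction such as "don't" becomes `don`, `'`, `t`. NLTK handles such cases with dedicated rules (it yields `do` and `n't`), which is one reason to prefer a library tokenizer for real text.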

# 1.2 Significance and Applications of NLP



### 1.2.2 Uses of NLP with Examples

In [None]:

from translate import Translator

# Create a translator object
translator = Translator(to_lang="es")

# Translate a phrase
translation = translator.translate("How are you?")
print(translation)  # Typically prints "¿Cómo estás?"; the exact wording depends on the online translation backend


In [None]:
from textblob import TextBlob

# Sample text
text = "I love this product! It's amazing."

# Create a TextBlob object
blob = TextBlob(text)

# Perform sentiment analysis
sentiment = blob.sentiment
print(sentiment)  # Prints Sentiment(polarity=..., subjectivity=...); polarity ranges from -1 to 1, subjectivity from 0 to 1

In [None]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

# Sample text
text = """
Natural Language Processing (NLP) is a fascinating field at the intersection of computer science, artificial intelligence, and linguistics. It enables machines to understand, interpret, and generate human language, opening up a world of possibilities for applications ranging from chatbots and translation services to sentiment analysis and beyond.
"""

# Create a parser
parser = PlaintextParser.from_string(text, Tokenizer("english"))

# Create a summarizer
summarizer = LsaSummarizer()

# Generate the summary
summary = summarizer(parser.document, 2)  # Summarize to 2 sentences
for sentence in summary:
    print(sentence)
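
The cell above uses LSA, which is hard to follow line by line. As a rough illustration of extractive summarization in general, the sketch below uses a much simpler technique: it scores each sentence by the average frequency of its words and keeps the top-scoring sentences in their original order. This is a frequency-based stand-in, not the LSA algorithm sumy implements, and the helper names and toy text are made up for the example.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order for readability
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]

toy_text = "Cats sleep a lot. Cats and dogs are pets. Birds sing."
summary = summarize(toy_text, max_sentences=1)
print(summary)
```

Asking for more sentences than the text contains simply returns every sentence, which is also what happens in the sumy cell above when a two-sentence text is summarized to two sentences.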

### 1.2.3 Real-World Example: E-commerce Review Analysis

In [None]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Sample reviews
reviews = [
    "This product is fantastic! It exceeded my expectations.",
    "Not worth the price. I'm disappointed with the quality.",
    "Good value for money. Will buy again.",
]

# Initialize the sentiment analyzer
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()

# Analyze each review
for review in reviews:
    sentiment = sia.polarity_scores(review)
    print(f"Review: {review}\nSentiment: {sentiment}\n")
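
`polarity_scores` returns a dictionary with the keys `neg`, `neu`, `pos`, and `compound`. A common convention, suggested in the VADER documentation, is to turn the `compound` score into a label using thresholds of ±0.05. The sketch below shows that step on its own; the helper name `label_from_compound` and the sample score dictionaries are hypothetical and are not actual VADER output for the reviews above.

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER-style score dictionary to a coarse sentiment label."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Made-up dictionaries in the same shape that polarity_scores returns
sample_scores = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.7},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.6},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]

labels = [label_from_compound(scores) for scores in sample_scores]
print(labels)
```

In the review loop, the same helper could be called as `label_from_compound(sia.polarity_scores(review))` to attach a label to each review.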


# 1.3 Overview of Python for NLP

### 1.3.2 Key Python Libraries for NLP with Examples

In [None]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # read by recent NLTK releases
from nltk.tokenize import word_tokenize

text = "Natural Language Processing with Python is fun!"
tokens = word_tokenize(text)
print(tokens)

In [None]:
import spacy

# Load SpaCy model
nlp = spacy.load("en_core_web_sm")

text = "Apple is looking at buying U.K. startup for $1 billion."
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(ent.text, ent.label_)

In [None]:
from gensim.models import Word2Vec

# Sample sentences
sentences = [
    ["natural", "language", "processing"],
    ["python", "is", "a", "powerful", "language"],
    ["text", "processing", "with", "gensim"],
]

# Train Word2Vec model
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Get vector for a word
vector = model.wv['language']
print(vector)
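
A raw vector is hard to interpret by itself; word vectors are usually compared with cosine similarity, which is what gensim's `model.wv.similarity(word1, word2)` computes. The sketch below implements that measure in plain Python on made-up three-dimensional vectors. The values in `toy_vectors` are hypothetical and are not taken from the model trained above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical vectors, chosen so that two words point in a similar direction
toy_vectors = {
    "language": [0.9, 0.1, 0.3],
    "processing": [0.8, 0.2, 0.4],
    "python": [0.1, 0.9, 0.2],
}

sim_close = cosine_similarity(toy_vectors["language"], toy_vectors["processing"])
sim_far = cosine_similarity(toy_vectors["language"], toy_vectors["python"])
print(f"language vs processing: {sim_close:.3f}")
print(f"language vs python:     {sim_far:.3f}")
```

With only three training sentences, the similarities from the Word2Vec model above are essentially noise; meaningful neighbours only emerge with a much larger corpus.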

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Sample data
texts = ["I love this product", "This is the worst experience", "Absolutely fantastic!", "Not good at all"]
labels = [1, 0, 1, 0]

# Vectorize text data
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train a Naive Bayes classifier
classifier = MultinomialNB()
classifier.fit(X, labels)

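# Predict sentiment for a new text
# Note: "hate" never appears in the training texts, so CountVectorizer ignores it
# and the prediction rests only on the words the model has already seen
new_text = ["I hate this"]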
X_new = vectorizer.transform(new_text)
prediction = classifier.predict(X_new)
print(prediction)
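
To see what the vectorizer and classifier are doing, here is a minimal from-scratch sketch of bag-of-words counting followed by multinomial Naive Bayes with add-one smoothing. It assumes a tokenizer that roughly mirrors `CountVectorizer`'s defaults (lowercasing, tokens of two or more word characters). The function names are hypothetical, and the code is a teaching aid rather than scikit-learn's implementation.

```python
import math
import re
from collections import Counter

# Roughly mirrors CountVectorizer's defaults: lowercase, tokens of 2+ word characters
TOKEN_RE = re.compile(r"\b\w\w+\b")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def train(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = {label: Counter() for label in class_docs}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = set().union(*word_counts.values())
    return class_docs, word_counts, vocabulary

def log_scores(text, class_docs, word_counts, vocabulary, alpha=1.0):
    """Return the known tokens and log P(class) + sum of log P(word | class)."""
    known = [tok for tok in tokenize(text) if tok in vocabulary]
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in known:
            smoothed = (word_counts[label][tok] + alpha) / (total_words + alpha * len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return known, scores

texts = ["I love this product", "This is the worst experience", "Absolutely fantastic!", "Not good at all"]
labels = [1, 0, 1, 0]

class_docs, word_counts, vocabulary = train(texts, labels)
known, scores = log_scores("I hate this", class_docs, word_counts, vocabulary)
prediction = max(scores, key=scores.get)

print("Known tokens:", known)
print("Predicted label:", prediction)
```

In this sketch the only word the model recognises in "I hate this" is "this", so the predicted label says little about the sentence's actual sentiment. With four training sentences that is expected, and it shows why a realistic classifier needs far more labelled data.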

### 1.3.3 Setting Up Your Python Environment for NLP


In [None]:
import nltk
from nltk.tokenize import word_tokenize
import spacy
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import CountVectorizer

# Verify NLTK
nltk.download('punkt')
nltk.download('punkt_tab')  # read by recent NLTK releases
text = "Natural Language Processing with Python is fun!"
tokens = word_tokenize(text)
print("NLTK Tokens:", tokens)

# Verify SpaCy
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
print("SpaCy Tokens:", [token.text for token in doc])

# Verify gensim
sentences = [["natural", "language", "processing"], ["python", "is", "fun"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
print("Word2Vec Vocabulary:", list(model.wv.index_to_key))

# Verify scikit-learn
vectorizer = CountVectorizer()
X = vectorizer.fit_transform([text])
print("CountVectorizer Feature Names:", vectorizer.get_feature_names_out())
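
If one of the imports in the verification cell fails, the whole cell stops at the first error. A gentler first check is to ask Python which packages it can find before importing anything. The sketch below does this with the standard library's `importlib.util.find_spec`; the helper name `check_packages` and the list of module names are assumptions based on the libraries used in this chapter.

```python
import importlib.util

def check_packages(module_names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}

# Import names of the libraries used in this chapter (scikit-learn is imported as "sklearn")
required = ["nltk", "spacy", "gensim", "sklearn", "textblob", "sumy", "translate"]

status = check_packages(required)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

This only confirms that a package can be located. It does not check versions or downloaded resources such as the NLTK tokenizer data or the SpaCy model, so the verification cell above is still worth running afterwards.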

### 1.3.4 Example: End-to-End NLP Pipeline

In [None]:
import nltk
import spacy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from nltk.corpus import stopwords
nltk.download('stopwords')

# Sample data
texts = [
    "I love this product! It's amazing.",
    "This is the worst experience I've ever had.",
    "Absolutely fantastic! Highly recommend.",
    "Not good at all. Very disappointing."
]
labels = [1, 0, 1, 0]

# Load SpaCy model
nlp = spacy.load("en_core_web_sm")

# Custom tokenizer using SpaCy
def spacy_tokenizer(sentence):
    doc = nlp(sentence)
    return [token.text for token in doc]

# Stop words
stop_words = set(stopwords.words('english'))

# Define the pipeline
pipeline = Pipeline([
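    # token_pattern=None signals that the custom tokenizer replaces the default regex
    ('vectorizer', CountVectorizer(tokenizer=spacy_tokenizer, token_pattern=None, stop_words=list(stop_words))),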
    ('classifier', MultinomialNB())
])

# Train the model
pipeline.fit(texts, labels)

# Predict sentiment for a new text
new_text = ["I hate this product"]
prediction = pipeline.predict(new_text)
print(prediction)