## Text pre-processing techniques

## 01) Import necessary libraries

In [1]:
# Download NLTK data
import nltk
nltk.download('punkt')        # For tokenization
nltk.download('stopwords')    # For stop words
nltk.download('wordnet')      # For lemmatization
nltk.download('averaged_perceptron_tagger') # For POS tagging
nltk.download('omw-1.4')      # For WordNet data
nltk.download('punkt_tab')    # Download the missing resource for sentence tokenization


# Download spaCy model
# !python -m spacy download en_core_web_  #spacy is an alternative for nltk
!C:\anc\python.exe -m pip install spacy==3.5.4
# Use the pinned install above if downloading spaCy raises errors

[nltk_data] Downloading package punkt to C:\Users\Srivathsan
[nltk_data]     V\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:\Users\Srivathsan
[nltk_data]     V\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to C:\Users\Srivathsan
[nltk_data]     V\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Srivathsan V\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package omw-1.4 to C:\Users\Srivathsan
[nltk_data]     V\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package punkt_tab to C:\Users\Srivathsan
[nltk_data]     V\AppData\Roaming\nltk_data...
[nltk_data]   Pa



## 02) Load sample data

In [2]:
# Sample text representing a part of a document
sample_document = """
Machine learning (ML) is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data. Algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.

Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers. Mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning.

In its application across business problems, machine learning is also referred to as predictive analytics.
"""


In [3]:
# Let's also add some text with punctuation, numbers, and mixed casing
noisy_text = """
Machine learning rocks! It's revolutionizing the world in 2023 (and beyond!).
Visit our site: http://example.com for more info.
This is awesome!!! We collected 1,234 data points.
Softbank and Google are major players.
"""

In [4]:
print("--- Sample Document ---")
print(sample_document)
print("\n--- Noisy Text ---")
print(noisy_text)


# Note: to load .docx files, you would typically do:

# import docx2txt
# try:
#     text = docx2txt.process("your_document.docx")
#     print(text)
# except Exception as e:
#     print(f"Error loading docx: {e}")


--- Sample Document ---

Machine learning (ML) is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data. Algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.

Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers. Mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning.

In its application across business problems, mach

In [5]:
import re
import string

# Combine texts
raw_text = sample_document + "\n" + noisy_text

print("Original Raw Text Length:", len(raw_text))
print("--- --------------------------- ---")
print("--- Original Raw Text ---")
print(raw_text)

Original Raw Text Length: 1254
--- --------------------------- ---
--- Original Raw Text ---

Machine learning (ML) is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data. Algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.

Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers. Mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a related field of study, focusing on exploratory data analysis through unsu

## 03) Convert to lower case

In [6]:
cleaned_text = raw_text.lower()

print("\n--- After Lowercasing ---")
print(cleaned_text)


--- After Lowercasing ---

machine learning (ml) is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data. algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.

machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. a subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers. mathematical optimization delivers methods, theory and application domains to the field of machine learning. data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning.

in its application across business problems, m

## 04) Remove or fetch URLs

In [7]:
def remove_urls(text):
    url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    return re.sub(url_pattern, '', text)

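# Note: the result is only printed here; cleaned_text itself still contains the URL.
# Assign it back (cleaned_text = remove_urls(cleaned_text)) to drop the URL before the later steps.
print(remove_urls(cleaned_text))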


machine learning (ml) is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data. algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.

machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. a subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers. mathematical optimization delivers methods, theory and application domains to the field of machine learning. data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning.

in its application across business problems, machine learning is also ref
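
The heading mentions fetching URLs as well as removing them, while the cell above only removes them. A minimal sketch of the fetching side follows. It assumes the simpler pattern `https?://\S+|www\.\S+` that appears later in this notebook's `clean_text` function; `extract_urls` and `remove_urls_simple` are hypothetical helper names, and stripping trailing punctuation is a heuristic rather than a rule.

```python
import re

# Simplified URL pattern (an assumption; real-world URLs may need a stricter parser)
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def extract_urls(text):
    """Return every URL-like substring found in text (hypothetical helper)."""
    # Strip trailing punctuation that usually belongs to the sentence, not the URL
    return [match.rstrip('.,;:!?)') for match in URL_PATTERN.findall(text)]

def remove_urls_simple(text):
    """Delete URL-like substrings and return the remaining text."""
    return URL_PATTERN.sub('', text)

sample = "Visit our site: http://example.com for more info."
print(extract_urls(sample))        # ['http://example.com']
print(remove_urls_simple(sample))  # leaves a double space where the URL was
```

Extracting before removing keeps the links available as a separate feature, for example for counting how many documents contain a link.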

## 05) Remove punctuation

In [8]:
# string.punctuation includes -> !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

translator = str.maketrans('', '', string.punctuation)

# str.maketrans(x, y, z) builds a translation table for str.translate():
#   each character in x is mapped to the character at the same position in y,
#   and every character in z is deleted.
# Here x = '' and y = '' (no substitutions) and z = string.punctuation (delete all punctuation).

# Side example: substituting characters instead of deleting them
trans = str.maketrans("aeiou", "12345")
# "a" → "1", "e" → "2", "i" → "3", "o" → "4", "u" → "5"

text = "education"
translated = text.translate(trans)
print(translated)

2d5c1t34n


In [9]:
cleaned_text = cleaned_text.translate(translator)
print("\n--- After Removing Punctuation ---")
print(cleaned_text)


--- After Removing Punctuation ---

machine learning ml is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data algorithms build a mathematical model based on sample data known as training data in order to make predictions or decisions without being explicitly programmed to perform the task

machine learning algorithms are used in a wide variety of applications such as email filtering and computer vision where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks a subset of machine learning is closely related to computational statistics which focuses on making predictions using computers mathematical optimization delivers methods theory and application domains to the field of machine learning data mining is a related field of study focusing on exploratory data analysis through unsupervised learning

in its application across business problems machine le
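
Deleting punctuation outright joins the characters on either side of it: `1,234` becomes `1234`, and a URL that is still present, such as `http://example.com`, becomes `httpexamplecom`. One possible alternative, sketched here as an assumption and not as this notebook's method, is to map each punctuation character to a space and then collapse the whitespace. The function names are hypothetical.

```python
import re
import string

# Delete punctuation (the approach used in the cell above)
delete_table = str.maketrans('', '', string.punctuation)
# Alternative: map every punctuation character to a space instead
space_table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def strip_punct_delete(text):
    """Remove punctuation characters without leaving a gap."""
    return text.translate(delete_table)

def strip_punct_space(text):
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    return re.sub(r'\s+', ' ', text.translate(space_table)).strip()

sample = "state-of-the-art results, e.g. 1,234 points"
print(strip_punct_delete(sample))  # stateoftheart results eg 1234 points
print(strip_punct_space(sample))   # state of the art results e g 1 234 points
```

Neither variant is better in every case: deletion keeps `1234` together, while spacing keeps hyphenated words readable but splits abbreviations and numbers.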

## 06) Remove or fetch numbers

In [10]:
"""
re.sub(pattern, replacement, text)
→ Substitutes all matches of the pattern with the replacement string ('' here).
"""

# \d+ matches any run of digits, so whole numbers and digits embedded in other tokens are removed.
cleaned_text = re.sub(r'\d+', '', cleaned_text)


print("\n--- After Removing Numbers ---")
print(cleaned_text)


--- After Removing Numbers ---

machine learning ml is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data algorithms build a mathematical model based on sample data known as training data in order to make predictions or decisions without being explicitly programmed to perform the task

machine learning algorithms are used in a wide variety of applications such as email filtering and computer vision where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks a subset of machine learning is closely related to computational statistics which focuses on making predictions using computers mathematical optimization delivers methods theory and application domains to the field of machine learning data mining is a related field of study focusing on exploratory data analysis through unsupervised learning

in its application across business problems machine learni
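
To fetch numbers instead of deleting them, it helps to extract them before punctuation is stripped, because `1,234` loses its thousands separator afterwards. The sketch below is illustrative only: the pattern and the helper name `extract_numbers` are assumptions, and the pattern does not cover signs, exponents, or non-English number formats.

```python
import re

# Digits with optional thousands separators and an optional decimal part
NUMBER_PATTERN = re.compile(r'\d+(?:,\d{3})*(?:\.\d+)?')

def extract_numbers(text):
    """Return the numeric substrings found in text as floats (hypothetical helper)."""
    return [float(match.replace(',', '')) for match in NUMBER_PATTERN.findall(text)]

sample = "It's revolutionizing the world in 2023. We collected 1,234 data points."
print(extract_numbers(sample))  # [2023.0, 1234.0]
```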

## 07) Remove extra white spaces

In [11]:
cleaned_text = re.sub(r'\s+', ' ', cleaned_text).strip()
print("\n--- After Removing Extra Whitespace ---")
print(cleaned_text)


--- After Removing Extra Whitespace ---
machine learning ml is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data algorithms build a mathematical model based on sample data known as training data in order to make predictions or decisions without being explicitly programmed to perform the task machine learning algorithms are used in a wide variety of applications such as email filtering and computer vision where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks a subset of machine learning is closely related to computational statistics which focuses on making predictions using computers mathematical optimization delivers methods theory and application domains to the field of machine learning data mining is a related field of study focusing on exploratory data analysis through unsupervised learning in its application across business problems machine 

In [12]:
print("\nCleaned Text Length:", len(cleaned_text))


Cleaned Text Length: 1201
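
The cleaning steps from sections 03 to 07 can be gathered into one function, which makes the order of operations explicit. This is a sketch under assumptions: `basic_clean` is a hypothetical name, it uses the simpler URL pattern from the later `clean_text` function, and it removes URLs before punctuation so that no `httpexamplecom` token is left behind.

```python
import re
import string

PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def basic_clean(text):
    """Chain the cleaning steps shown above (hypothetical helper)."""
    text = text.lower()                                 # 03) lowercase
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # 04) remove URLs first
    text = text.translate(PUNCT_TABLE)                  # 05) remove punctuation
    text = re.sub(r'\d+', '', text)                     # 06) remove digits
    return re.sub(r'\s+', ' ', text).strip()            # 07) collapse whitespace

noisy = "Visit our site: http://example.com for more info.\nWe collected 1,234 data points."
print(basic_clean(noisy))
# visit our site for more info we collected data points
```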


## 08) Sentence tokenization

In [13]:
from nltk.tokenize import sent_tokenize

nltk_sentences = sent_tokenize(raw_text) # Using raw_text to show sentence boundary handling
print("--- Sentence Tokenization (NLTK) ---")
for i, sentence in enumerate(nltk_sentences):
    print(f"Sentence {i+1}: {sentence}")


--- Sentence Tokenization (NLTK) ---
Sentence 1: 
Machine learning (ML) is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data.
Sentence 2: Algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.
Sentence 3: Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
Sentence 4: A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers.
Sentence 5: Mathematical optimization delivers methods, theory and application domains to the field of machine learning.
Sentence 6: Data mining is a related field of study, focusing on exploratory data analys

## 09) Word Tokenization

In [14]:
from nltk.tokenize import word_tokenize

# Using NLTK (on the cleaned text)
nltk_word_tokens = word_tokenize(cleaned_text)
print("--- Word Tokenization (NLTK) ---")
print(nltk_word_tokens[:20]) # Print first 20 tokens

--- Word Tokenization (NLTK) ---
['machine', 'learning', 'ml', 'is', 'a', 'field', 'of', 'study', 'in', 'artificial', 'intelligence', 'concerned', 'with', 'the', 'development', 'of', 'computer', 'algorithms', 'that', 'can']


## 10) Remove stop words

In [15]:
from nltk.corpus import stopwords

# Get English stop words from NLTK
nltk_stop_words = set(stopwords.words('english'))
print("--- Sample NLTK Stop Words ---")
print(list(nltk_stop_words)[:10])

# Remove stop words using NLTK tokens
nltk_tokens_no_stopwords = [word for word in nltk_word_tokens if word not in nltk_stop_words]
print("\n--- NLTK Tokens after Stop Word Removal ---")
print(nltk_tokens_no_stopwords[:20])


--- Sample NLTK Stop Words ---
['against', 'there', 'do', 'such', "they'd", "isn't", 'are', 'so', 'its', 'needn']

--- NLTK Tokens after Stop Word Removal ---
['machine', 'learning', 'ml', 'field', 'study', 'artificial', 'intelligence', 'concerned', 'development', 'computer', 'algorithms', 'learn', 'make', 'predictions', 'data', 'algorithms', 'build', 'mathematical', 'model', 'based']


In [16]:
from collections import Counter
# Count word frequencies
word_counts = Counter(nltk_tokens_no_stopwords)

# Display word counts
print(word_counts)

Counter({'learning': 7, 'machine': 6, 'data': 6, 'algorithms': 4, 'field': 3, 'predictions': 3, 'study': 2, 'computer': 2, 'make': 2, 'mathematical': 2, 'perform': 2, 'related': 2, 'application': 2, 'ml': 1, 'artificial': 1, 'intelligence': 1, 'concerned': 1, 'development': 1, 'learn': 1, 'build': 1, 'model': 1, 'based': 1, 'sample': 1, 'known': 1, 'training': 1, 'order': 1, 'decisions': 1, 'without': 1, 'explicitly': 1, 'programmed': 1, 'task': 1, 'used': 1, 'wide': 1, 'variety': 1, 'applications': 1, 'email': 1, 'filtering': 1, 'vision': 1, 'difficult': 1, 'infeasible': 1, 'develop': 1, 'conventional': 1, 'needed': 1, 'tasks': 1, 'subset': 1, 'closely': 1, 'computational': 1, 'statistics': 1, 'focuses': 1, 'making': 1, 'using': 1, 'computers': 1, 'optimization': 1, 'delivers': 1, 'methods': 1, 'theory': 1, 'domains': 1, 'mining': 1, 'focusing': 1, 'exploratory': 1, 'analysis': 1, 'unsupervised': 1, 'across': 1, 'business': 1, 'problems': 1, 'also': 1, 'referred': 1, 'predictive': 1, 
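
Printing the whole `Counter` is hard to read for longer texts. A small sketch of two common follow-ups, using a short made-up token list so the block runs on its own: `most_common` for the top terms, and relative frequencies.

```python
from collections import Counter

# Made-up tokens for illustration (not the notebook's full token list)
tokens = ['machine', 'learning', 'data', 'learning', 'machine', 'learning']
counts = Counter(tokens)

# most_common(n) returns the n highest counts as (token, count) pairs
print(counts.most_common(2))  # [('learning', 3), ('machine', 2)]

# Relative frequency of each token
total = sum(counts.values())
relative = {token: count / total for token, count in counts.items()}
print(relative['learning'])  # 0.5
```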

## 11) Stemming

In [17]:
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import wordnet


# --- Stemming with NLTK (PorterStemmer) ---
porter_stemmer = PorterStemmer()
stemmed_tokens = [porter_stemmer.stem(word) for word in nltk_tokens_no_stopwords] # Apply to NLTK tokens after stop word removal

print("--- Stemmed Tokens (NLTK PorterStemmer) ---")
print(stemmed_tokens[:20])

# Examples:
#    learning -> learn
#    machine -> machin (stemming strips suffixes by rule, so the result is not always a real word)




--- Stemmed Tokens (NLTK PorterStemmer) ---
['machin', 'learn', 'ml', 'field', 'studi', 'artifici', 'intellig', 'concern', 'develop', 'comput', 'algorithm', 'learn', 'make', 'predict', 'data', 'algorithm', 'build', 'mathemat', 'model', 'base']
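
Stems such as `machin` and `studi` are not dictionary words because a stemmer applies suffix rules without consulting a vocabulary. The toy sketch below shows that mechanism only. It is not the Porter algorithm: the rule list and the `toy_stem` function are hand-picked assumptions for illustration.

```python
# Toy suffix-stripping stemmer: NOT the Porter algorithm, just an illustration
SUFFIX_RULES = [('ies', 'i'), ('ing', ''), ('ed', ''), ('es', ''),
                ('s', ''), ('e', ''), ('y', 'i')]

def toy_stem(word):
    """Apply the first matching suffix rule, keeping at least three characters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)] + replacement
    return word

for word in ['learning', 'machine', 'studies', 'study', 'data']:
    print(word, '->', toy_stem(word))
```

Here `studies` and `study` collapse to the same stem, which is the point of stemming: related word forms end up as one feature even when the stem itself is not a word.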


## 12) Lemmatization

In [18]:
# Lemmatization: A more sophisticated process that uses a vocabulary and morphological analysis to return the
# base or dictionary form of a word (known as the lemma).
# It's slower but produces actual words (e.g., "better" -> "good").
# Lemmatization often requires the Part-of-Speech (POS) tag of the word to be accurate.

# from nltk.stem import WordNetLemmatizer
# from nltk.corpus import stopwords, wordnet
# from nltk import pos_tag, word_tokenize

# lemmatizer = WordNetLemmatizer()
# stop_words = set(stopwords.words('english'))

# def get_wordnet_pos(treebank_tag):
#     """Map POS tag to WordNet format."""
#     if treebank_tag.startswith('J'):
#         return wordnet.ADJ
#     elif treebank_tag.startswith('V'):
#         return wordnet.VERB
#     elif treebank_tag.startswith('N'):
#         return wordnet.NOUN
#     elif treebank_tag.startswith('R'):
#         return wordnet.ADV
#     else:
#         return wordnet.NOUN  # Default POS

# # POS tagging
# pos_tagged = pos_tag(nltk_tokens_no_stopwords)
# print("--- pos tag ---")
# print(pos_tagged[:20])

# # Lemmatize using POS
# nltk_lemmatized_tokens = [
#     lemmatizer.lemmatize(word, get_wordnet_pos(pos))
#     for word, pos in pos_tagged
# ]

# print("\n--- Lemmatized Tokens (NLTK WordNetLemmatizer) ---")
# print(nltk_lemmatized_tokens[:20])

import spacy  # alternative to NLTK
import spacy.cli

# Download the small English model that matches the installed spaCy version
spacy.cli.download("en_core_web_sm")

# Load the small English spaCy model
nlp = spacy.load('en_core_web_sm')

spacy_doc_raw = nlp(raw_text) # Using raw_text
spacy_sentences = [sent.text for sent in spacy_doc_raw.sents]
print("\n--- Sentence Tokenization (spaCy) ---")
for i, sentence in enumerate(spacy_sentences):
    print(f"Sentence {i+1}: {sentence}")

# Using spaCy (on the cleaned text)
# spaCy's Doc object already contains tokens
spacy_doc_cleaned = nlp(cleaned_text)
spacy_word_tokens = [token.text for token in spacy_doc_cleaned]
print("\n--- Word Tokenization (spaCy) ---")
print(spacy_word_tokens[:20]) # Print first 20 tokens

# --- Lemmatization with spaCy ---

# Extract lemmas from spaCy Doc object (on cleaned text, filtering stop words)
spacy_lemmatized_tokens = [token.lemma_ for token in spacy_doc_cleaned if not token.is_stop]

print("\n--- Lemmatized Tokens (spaCy) ---")
print(spacy_lemmatized_tokens[:20])

# Note: spaCy's lemmatizer uses the POS tags predicted by its own pipeline, so it
# doesn't require manual POS tag conversion like NLTK's WordNetLemmatizer

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')

--- Sentence Tokenization (spaCy) ---
Sentence 1: 
Machine learning (ML) is a field of study in artificial intelligence concerned with the development of computer algorithms that can learn from and make predictions on data.
Sentence 2: Algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.


Sentence 3: Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.
Sentence 4: A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers.
Sentence 5: Mathematical optimization delivers methods, theory and application domains to t

## 13) Bag of words (BOW)

In [19]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
import pandas as pd

# Sample noisy texts
texts = [
    "Machine learning rocks! It's revolutionizing the world in 2023 (and beyond!). Visit our site: http://example.com for more info.",
    "This is awesome!!! We collected 1,234 data points.",
    "Softbank and Google are major players.",
    "Predictive analytics uses machine learning to solve business problems.",
]

# Initialize NLTK tools
from nltk import pos_tag  # required by clean_text below
# Depending on the NLTK version, pos_tag may also need:
# nltk.download('averaged_perceptron_tagger_eng')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def get_wordnet_pos(treebank_tag):
    """Map a Penn Treebank POS tag to the matching WordNet POS constant."""
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    if treebank_tag.startswith('V'):
        return wordnet.VERB
    if treebank_tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN  # Default POS (also covers 'N' tags)

def clean_text(text):
    # Lowercase
    text = text.lower()
    # Remove URLs
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Remove numbers
    text = re.sub(r'\d+', '', text)
    # Remove extra spaces
    text = re.sub(r'\s+', ' ', text).strip()

    # Tokenize
    tokens = word_tokenize(text)

    # POS tagging
    tagged_tokens = pos_tag(tokens)

    # Lemmatize, remove stopwords, and keep words > 1 char
    lemmatized_tokens = [
        lemmatizer.lemmatize(token, get_wordnet_pos(tag))
        for token, tag in tagged_tokens
        if token not in stop_words and len(token) > 1
    ]

    return " ".join(lemmatized_tokens)

# Apply cleaning to all texts
cleaned_texts = [clean_text(text) for text in texts]

# Show result
print("Cleaned Texts:")
for i, txt in enumerate(cleaned_texts, 1):
    print(f"{i}: {txt}")

# Using sklearn CountVectorizer for BoW
vectorizer = CountVectorizer()
bow_matrix = vectorizer.fit_transform(cleaned_texts)

print("\nBoW Feature Names:", vectorizer.get_feature_names_out())

print("\nBoW Matrix (Count of words per document):")
bow_matrix.toarray()

In [None]:
feature_names = vectorizer.get_feature_names_out()
bow_values = bow_matrix.toarray()

# Create a DataFrame
bow_df = pd.DataFrame(bow_values, columns=feature_names)

# Add row indices as Document IDs
bow_df.index = [f"Doc_{i+1}" for i in range(len(bow_df))]

# Display the DataFrame
print("\nBoW DataFrame:")
bow_df  # Counts are integers, so no rounding is needed
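
To make the bag-of-words matrix less of a black box, the counting can be sketched in plain Python. This is a simplified illustration, not what `CountVectorizer` does internally: it splits on whitespace, whereas the default `CountVectorizer` tokenizer also lowercases and drops single-character tokens. The three short documents are made up.

```python
from collections import Counter

# Made-up, already-cleaned documents
docs = ["machine learning rock", "collect data point", "machine learning solve problem"]

# Vocabulary: sorted unique tokens, one matrix column per term
vocabulary = sorted({token for doc in docs for token in doc.split()})

# One row per document: how often each vocabulary term occurs in it
bow_rows = []
for doc in docs:
    counts = Counter(doc.split())
    bow_rows.append([counts[term] for term in vocabulary])

print(vocabulary)
for row in bow_rows:
    print(row)
```

Each row has one entry per vocabulary term, so word order within a document is lost; only the counts remain.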

## 14) TF-IDF (Term Frequency - Inverse Document Frequency)

In [None]:
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(cleaned_texts)

print("TF-IDF Feature Names:", tfidf_vectorizer.get_feature_names_out())

print("\nTF-IDF Matrix (weights per word per document):")
print(tfidf_matrix.toarray())


In [None]:
# Get feature names and TF-IDF matrix values
feature_names = tfidf_vectorizer.get_feature_names_out()
tfidf_values = tfidf_matrix.toarray()
# Create a DataFrame
tfidf_df = pd.DataFrame(tfidf_values, columns=feature_names)

# Add row indices as Document IDs
tfidf_df.index = [f"Doc_{i+1}" for i in range(len(tfidf_df))]

# Display the DataFrame
print("\nTF-IDF DataFrame:")
tfidf_df.round(3)  # Optional: round for readability
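
The weights can also be reproduced by hand on a tiny corpus. The sketch below assumes the `TfidfVectorizer` defaults as described in the scikit-learn documentation: raw term counts, smoothed IDF of the form ln((1 + n) / (1 + df)) + 1, and L2 normalization of each document vector. The documents and the helper `tfidf_row` are made up for illustration.

```python
import math
from collections import Counter

# Made-up tokenized documents
docs = [["machine", "learning", "rock"],
        ["data", "point"],
        ["machine", "learning", "data"]]

vocabulary = sorted({term for doc in docs for term in doc})
n_docs = len(docs)

# Document frequency: number of documents containing each term
df = Counter(term for doc in docs for term in set(doc))

# Smoothed IDF: rarer terms get a larger weight
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocabulary}

def tfidf_row(doc):
    """Return the L2-normalized TF-IDF vector of one tokenized document."""
    counts = Counter(doc)
    raw = [counts[term] * idf[term] for term in vocabulary]
    norm = math.sqrt(sum(value * value for value in raw))
    return [value / norm if norm else 0.0 for value in raw]

tfidf_rows = [tfidf_row(doc) for doc in docs]
for row in tfidf_rows:
    print([round(value, 3) for value in row])
```

In the first document, `rock` occurs in only one document and so receives a higher weight than `machine` or `learning`, which appear in two.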

## 15) Part-of-Speech (POS) Tagging using spacy

In [None]:
print("\n--- POS Tagging with spaCy ---")
# spaCy tokens have .pos_ (coarse-grained) and .tag_ (fine-grained) attributes
for i, token in enumerate(spacy_doc_cleaned):
    print(f"{token.text}: {token.pos_} ({token.tag_})")
    if i >= 20:
        break  # Limit output to the first 21 tokens

# spaCy's POS tagger is part of its standard pipeline.
# It provides both coarse-grained (`.pos_`) and fine-grained (`.tag_`) POS tags.
# Note: spacy_doc_cleaned is lowercased and has no punctuation, which can reduce
# tagging accuracy compared with tagging the raw text.

## 16) Shallow chunking

In [None]:
# Using spaCy to find noun chunks
print("--- Noun Chunking with spaCy ---")
for chunk in spacy_doc_cleaned.noun_chunks:
    print(chunk.text)

# SpaCy's noun chunker is a simple form of shallow parsing.
# More complex chunking (e.g., identifying verb phrases) can be done by defining patterns
# or using libraries specifically for chunking with NLTK or spaCy extensions.


## 17) Named Entity Recognition (NER)
Named entity recognition is the process of identifying named entities in text and classifying them into predefined categories such as person names, organizations, locations, and dates. It is a key step in extracting structured information from unstructured text.

spaCy's pretrained pipelines, including the `en_core_web_sm` model loaded above, provide NER out of the box.

In [None]:
# Using spaCy for Named Entity Recognition
print("--- Named Entity Recognition (NER) with spaCy ---")
for ent in spacy_doc_raw.ents: # Accessing .ents attribute for detected entities
    if ent.label_ in ['ORG', 'GPE', 'PERSON', 'DATE', 'CARDINAL', 'LOC']: # Filter for common entity types
         print(f"Entity: {ent.text}, Type: {ent.label_}")

# SpaCy can identify various types of entities.
# This is highly valuable for information extraction from documents.

## 18) Vader sentiment analysis
How VADER works:

1. It uses a sentiment lexicon (dictionary):
   - Words like "great" and "amazing" are positive.
   - Words like "bad" and "terrible" are negative.
2. It applies rules:
   - Boosts intensity for ALL CAPS, exclamation marks (!), and degree modifiers ("very", "extremely").
   - Handles negations, so "not good" scores as negative.
3. It calculates scores:
   - Scores each word.
   - Applies the modifiers.
   - Reports the proportions of negative, neutral, and positive text as `neg`, `neu`, and `pos`.
   - Computes the final `compound` score by normalizing the summed word scores to the range -1 to 1.

| Compound score range | Sentiment interpretation |
|---|---|
| >= 0.05 | Positive sentiment 👍 |
| > -0.05 and < 0.05 | Neutral sentiment 😐 |
| <= -0.05 | Negative sentiment 👎 |

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')

analyzer = SentimentIntensityAnalyzer()

In [None]:
# Example sentiment analysis on a sentence adapted from our original text
sentence_for_sentiment = "Machine learning is a field of study in artificial intelligence."
vs = analyzer.polarity_scores(sentence_for_sentiment)
print("\n--- Sentiment Analysis with NLTK's VADER ---")
print(f"Sentence: {sentence_for_sentiment}")
print(f"VADER Polarity Scores: {vs}")

In [None]:
sentence_for_sentiment_2 = "Machine learning rocks! This is awesome!!!"
vs2 = analyzer.polarity_scores(sentence_for_sentiment_2)
print(f"Sentence: {sentence_for_sentiment_2}")
print(f"VADER Polarity Scores: {vs2}")
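
The `compound` value in each score dictionary can be turned into a label with the thresholds listed earlier in this section. The sketch below is a minimal illustration: `label_from_compound` is a hypothetical helper, and `normalize_compound` assumes the normalization x / sqrt(x² + alpha) with alpha = 15, which is the form used in the reference VADER code as far as this example is concerned.

```python
import math

def normalize_compound(raw_sum, alpha=15):
    """Squash an unbounded sum of word scores into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)

def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a sentiment label using the +/-0.05 thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

for score in (0.62, 0.0, -0.05):
    print(score, label_from_compound(score))
```

With the analyzer from the cells above, the label for a sentence would be `label_from_compound(vs["compound"])`.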
