<a href="https://colab.research.google.com/github/skar2019/ai-ml/blob/main/NLTK_Examples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To get started, install NLTK and download its data packages (corpora and pretrained models):

In [None]:
# Installation and setup
!pip install nltk
import nltk
nltk.download('all')  # Fetches every corpus and model (a large download); pass a single package ID instead to fetch less


**Tokenization**
Tokenization splits text into words or sentences.

In [None]:
from nltk.tokenize import word_tokenize, sent_tokenize

text = "Hello! How are you today? NLP is fascinating."
word_tokens = word_tokenize(text)  # Word tokenization
sentence_tokens = sent_tokenize(text)  # Sentence tokenization

print("Word Tokens:", word_tokens)
print("Sentence Tokens:", sentence_tokens)
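
When punctuation handling matters, NLTK also ships rule-based tokenizers that need no downloaded models. The following is a minimal sketch, reusing the same sample sentence, that contrasts `wordpunct_tokenize` (punctuation kept as separate tokens) with a `RegexpTokenizer` that keeps only word characters. The pattern `\w+` is an illustrative choice, not a recommendation.

```python
from nltk.tokenize import RegexpTokenizer, wordpunct_tokenize

text = "Hello! How are you today? NLP is fascinating."

# Split on word/non-word boundaries; punctuation becomes its own token
punct_tokens = wordpunct_tokenize(text)

# Keep only runs of word characters, dropping punctuation entirely
words_only = RegexpTokenizer(r"\w+").tokenize(text)

print("With punctuation:", punct_tokens)
print("Words only:", words_only)
```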


**Stemming and Lemmatization**
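Reduce words to a base form. Stemming strips suffixes by rule and may produce a non-word, while lemmatization looks the word up in WordNet and returns a dictionary form.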

In [None]:
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

word = "running"
print("Stemming:", stemmer.stem(word))
print("Lemmatization:", lemmatizer.lemmatize(word, pos="v"))  # pos="v" treats the word as a verb; the default is noun
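
Stemming is rule-based, so its output is not always a dictionary word. A minimal sketch of that behavior follows; it uses only `PorterStemmer`, which needs no downloaded data, and the word list is an arbitrary illustration.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Arbitrary sample words chosen for illustration
words = ["running", "runs", "studies", "studying"]

# Map each word to its stem; stems are not guaranteed to be dictionary words
stems = {w: stemmer.stem(w) for w in words}

for w, s in stems.items():
    print(f"{w} -> {s}")
```

Inflected forms of the same verb tend to collapse to one stem, which is useful for matching even when the stem itself is not a real word.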


**Part-of-Speech (POS) Tagging**
Label each token with its part of speech (noun, verb, adjective, and so on).

In [None]:
from nltk import pos_tag
from nltk.tokenize import word_tokenize

text = "NLTK is a great toolkit for NLP."
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)

print("POS Tags:", pos_tags)
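
`pos_tag` returns a list of `(token, tag)` tuples, so ordinary list operations are enough to filter by tag. The sketch below assumes Penn Treebank style tags, where noun tags start with `NN`. The `tagged` sample is hand-written for illustration and is not captured tagger output, and `tokens_with_prefix` is a hypothetical helper.

```python
# Hand-written (token, tag) pairs in the shape pos_tag returns;
# illustrative only, not actual tagger output
tagged = [
    ("NLTK", "NNP"), ("is", "VBZ"), ("a", "DT"), ("great", "JJ"),
    ("toolkit", "NN"), ("for", "IN"), ("NLP", "NNP"), (".", "."),
]

def tokens_with_prefix(pairs, prefix):
    """Return tokens whose tag starts with the given prefix."""
    return [token for token, tag in pairs if tag.startswith(prefix)]

nouns = tokens_with_prefix(tagged, "NN")
print("Nouns:", nouns)
```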


**Named Entity Recognition (NER)**
Identify named entities such as people, organizations, and locations.

In [None]:
from nltk import ne_chunk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

text = "Apple was founded by Steve Jobs in California."
tokens = word_tokenize(text)
tags = pos_tag(tokens)
named_entities = ne_chunk(tags)

print("Named Entities:", named_entities)
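
`ne_chunk` returns an `nltk.Tree` in which each recognized entity is a labeled subtree and every other token is a plain `(token, tag)` leaf. A minimal sketch of flattening such a tree into `(label, text)` pairs follows. The tree here is built by hand so the example does not depend on the downloaded models; its labels are assumptions rather than real chunker output, and `extract_entities` is a hypothetical helper name.

```python
from nltk.tree import Tree

def extract_entities(chunked):
    """Collect (label, text) pairs from the top-level entity subtrees."""
    entities = []
    for node in chunked:
        if isinstance(node, Tree):  # entity subtrees are Tree instances
            text = " ".join(token for token, _tag in node.leaves())
            entities.append((node.label(), text))
    return entities

# Hand-built stand-in for ne_chunk output (labels are illustrative)
sample = Tree("S", [
    Tree("PERSON", [("Steve", "NNP"), ("Jobs", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("California", "NNP")]),
    (".", "."),
])

entities = extract_entities(sample)
print(entities)
```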


**Text Generation**
Generate random text from a bigram model of a corpus, choosing each next word from the words that follow the current one.

In [None]:
from nltk.corpus import genesis
from nltk import bigrams, ConditionalFreqDist
import random

genesis_words = genesis.words('english-kjv.txt')
# For each word, count the words that immediately follow it in the corpus
cfd = ConditionalFreqDist(bigrams(genesis_words))

word = 'God'
for i in range(15):
    print(word, end=' ')
    word = random.choice(list(cfd[word].keys()))
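
The loop above chooses uniformly among the distinct words that follow the current word, which ignores how often each one occurs. A frequency-weighted variant could look like the sketch below. It uses a tiny inline token list instead of the Genesis corpus so it runs without downloads, and the `generate` helper and its parameters are hypothetical names.

```python
import random
from nltk import ConditionalFreqDist, bigrams

# Toy corpus standing in for genesis.words(...); illustrative only
tokens = "the cat sat on the mat and the cat ran".split()
cfd = ConditionalFreqDist(bigrams(tokens))

def generate(cfd, start, length, rng=random):
    """Walk the bigram model, sampling each next word by its count."""
    word, output = start, [start]
    for _ in range(length - 1):
        followers = cfd[word]  # FreqDist of words seen after `word`
        if not followers:      # dead end: no word ever followed this one
            break
        candidates = list(followers.keys())
        weights = [followers[c] for c in candidates]
        word = rng.choices(candidates, weights=weights, k=1)[0]
        output.append(word)
    return output

random.seed(0)  # fixed seed so repeated runs match
generated = generate(cfd, "the", 8)
print(" ".join(generated))
```

The dead-end check matters for small corpora, where the final token may never appear as the first element of any bigram.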


**Sentiment Analysis**
Score the sentiment of a text with VADER, which reports negative, neutral, positive, and compound scores.

In [None]:
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
text = "I love NLP! It’s such an interesting field."

sentiment = sia.polarity_scores(text)
print("Sentiment Analysis:", sentiment)
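
`polarity_scores` returns a dictionary with `neg`, `neu`, `pos`, and `compound` keys, where `compound` is a normalized score between -1 and 1. To turn that into a single label, one common convention is to threshold `compound` at ±0.05. The sketch below assumes that convention; the `label_sentiment` helper and the sample score dictionary are hypothetical.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a polarity_scores-style dict to a coarse label."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hand-written dict in the shape polarity_scores returns; not real output
sample_scores = {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.8}
label = label_sentiment(sample_scores)
print(label)
```

In practice the function would be called as `label_sentiment(sia.polarity_scores(text))`, and the threshold may need tuning for a given dataset.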
