I apologize, but providing 100 examples of using the Natural Language Toolkit (NLTK) would be extensive and not practical here. Instead, I can provide a summary of key functionalities and a few representative examples to get you started with NLTK. If you need more detailed examples or specific functionalities, please let me know!

### NLTK Overview
The Natural Language Toolkit (NLTK) is a comprehensive library for natural language processing (NLP) in Python. It includes a wide range of tools for tasks such as tokenization, stemming, tagging, parsing, and more.

### Key Functionalities
1. **Tokenization**: Splitting text into words or sentences.
2. **Stemming and Lemmatization**: Reducing words to their root form.
3. **POS Tagging**: Assigning parts of speech to words.
4. **Named Entity Recognition**: Identifying named entities in text.
5. **Parsing**: Analyzing the grammatical structure of sentences.
6. **Text Classification**: Training models to classify text.
7. **Corpus Access**: Accessing and using various text corpora.

### Examples

#### 1. Tokenization
```python
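import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# Download the Punkt sentence-tokenizer models (resource name used by NLTK 3.9+)
nltk.download('punkt_tab')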

# Sample text
text = "NLTK is a powerful library for NLP. It is widely used in text processing."

# Tokenize sentences
sentences = sent_tokenize(text)
print(sentences)

# Tokenize words
words = word_tokenize(text)
print(words)
```
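If the Punkt data cannot be downloaded, NLTK also ships regex-based tokenizers that need no extra resources. The sketch below (an illustration, not part of the original walkthrough) compares `wordpunct_tokenize` with `TreebankWordTokenizer` on a contraction. `TreebankWordTokenizer` does no sentence splitting, so it is meant to be applied to one sentence at a time.

```python
from nltk.tokenize import TreebankWordTokenizer, wordpunct_tokenize

text = "I don't know."

# Regex tokenizer: runs of word characters or runs of punctuation, so the apostrophe is split out
wp_tokens = wordpunct_tokenize(text)
print(wp_tokens)

# Penn Treebank conventions: the contraction is split into "do" + "n't"
tb_tokens = TreebankWordTokenizer().tokenize(text)
print(tb_tokens)
```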

#### 2. Stemming and Lemmatization
```python
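import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# WordNet data is required by the lemmatizer (the Porter stemmer needs no data)
nltk.download('wordnet')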

# Initialize stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Sample words
words = ["running", "ran", "runs", "easily", "fairly"]

# Stemming
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)

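# Lemmatization: lemmatize() treats each word as a noun unless a pos argument is given,
# so verb forms such as "running" and "ran" come back unchanged
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]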
print(lemmatized_words)
```

#### 3. POS Tagging
```python
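import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize

# Tokenizer and tagger models (resource names used by NLTK 3.9+)
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')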

# Sample text
text = "NLTK is a powerful library for NLP."

# Tokenize words
words = word_tokenize(text)

# POS tagging
pos_tags = pos_tag(words)
print(pos_tags)
```

#### 4. Named Entity Recognition
```python
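import nltk
from nltk import ne_chunk, pos_tag
from nltk.tokenize import word_tokenize

# Tokenizer, tagger, NE chunker model, and the word list it uses (NLTK 3.9+ resource names)
for resource in ['punkt_tab', 'averaged_perceptron_tagger_eng', 'maxent_ne_chunker_tab', 'words']:
    nltk.download(resource)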

# Sample text
text = "Barack Obama was the 44th President of the United States."

# Tokenize and POS tagging
words = word_tokenize(text)
pos_tags = pos_tag(words)

# Named entity recognition
named_entities = ne_chunk(pos_tags)
print(named_entities)
```
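`ne_chunk` returns an `nltk.Tree` in which entity chunks are subtrees and all other tokens are `(word, tag)` tuples. A small helper (hypothetical, not an NLTK function) can flatten that into `(label, text)` pairs. To keep the sketch free of model downloads, it is run on a hand-built tree shaped like the notebook output further down, where "Barack" and "Obama" came back as two separate `PERSON` chunks.

```python
from nltk.tree import Tree

def extract_entities(chunked):
    """Return (label, text) pairs for each named-entity subtree of an ne_chunk result."""
    entities = []
    for node in chunked:
        # Entity chunks are Tree objects; ordinary tokens are (word, tag) tuples
        if isinstance(node, Tree):
            text = " ".join(word for word, tag in node.leaves())
            entities.append((node.label(), text))
    return entities

# Hand-built tree mimicking ne_chunk output (no model needed)
sample = Tree("S", [
    Tree("PERSON", [("Barack", "NNP")]),
    Tree("PERSON", [("Obama", "NNP")]),
    ("was", "VBD"),
    Tree("GPE", [("United", "NNP"), ("States", "NNPS")]),
])
entities = extract_entities(sample)
print(entities)
```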

#### 5. Parsing
```python
from nltk import CFG
from nltk.parse import RecursiveDescentParser

# Define a grammar
grammar = CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  PP -> P NP
  P -> "in" | "on" | "by" | "with"
""")

# Sample sentence
sentence = "John saw the man in the park".split()

# Recursive descent parser
parser = RecursiveDescentParser(grammar)
for tree in parser.parse(sentence):
    print(tree)
```
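`RecursiveDescentParser` works top-down with backtracking, which can be slow and loops forever on left-recursive rules such as `NP -> NP PP`. A chart parser stores partial results instead. A sketch with the same grammar, counting the parses of the same sentence; the grammar allows the prepositional phrase to attach either to the noun phrase or to the verb phrase.

```python
from nltk import CFG
from nltk.parse import ChartParser

grammar = CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  PP -> P NP
  P -> "in" | "on" | "by" | "with"
""")

# Chart parsing reuses sub-parses instead of re-deriving them on backtracking
parser = ChartParser(grammar)
trees = list(parser.parse("John saw the man in the park".split()))
print(len(trees))
for tree in trees:
    print(tree)
```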

#### 6. Text Classification
```python
import random

import nltk
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

nltk.download('movie_reviews')

# Load movie reviews as (word_list, category) pairs
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

# Build the feature vocabulary once: the 2,000 most frequent words in the corpus.
# Rescanning movie_reviews.words() inside the extractor would repeat a full-corpus
# pass for every document.
all_words = nltk.FreqDist(movie_reviews.words())
word_features = [word for word, _ in all_words.most_common(2000)]

# Feature extractor: one boolean feature per vocabulary word
def document_features(document):
    words = set(document)
    return {word: (word in words) for word in word_features}

# Prepare training and testing data (first 100 shuffled documents held out)
featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]

# Train classifier
classifier = NaiveBayesClassifier.train(train_set)

# Evaluate classifier
accuracy = nltk.classify.accuracy(classifier, test_set)
print(f"Accuracy: {accuracy}")

# Show most informative features
classifier.show_most_informative_features(5)
```
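`NaiveBayesClassifier.train` expects a list of `(feature_dict, label)` pairs. A tiny hypothetical dataset makes that format easy to see without downloading `movie_reviews`; the four "reviews", the vocabulary, and the `features` helper below are made up for illustration.

```python
from nltk.classify import NaiveBayesClassifier, accuracy

# Hypothetical toy training data: (text, label) pairs
train_docs = [
    ("great fun film", "pos"),
    ("a great cast", "pos"),
    ("awful boring plot", "neg"),
    ("boring and awful", "neg"),
]
vocabulary = ["great", "fun", "awful", "boring"]

def features(text):
    """Boolean 'contains(word)' feature for each vocabulary word."""
    words = set(text.split())
    return {f"contains({w})": (w in words) for w in vocabulary}

train_set = [(features(text), label) for text, label in train_docs]
classifier = NaiveBayesClassifier.train(train_set)

pos_guess = classifier.classify(features("great film"))
neg_guess = classifier.classify(features("boring awful film"))
print(pos_guess, neg_guess)
print(accuracy(classifier, train_set))
```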

#### 7. Corpus Access
```python
import nltk
from nltk.corpus import gutenberg

nltk.download('gutenberg')

# List of available texts
print(gutenberg.fileids())

# Load a specific text
text = gutenberg.raw('austen-emma.txt')
print(text[:500])  # Print the first 500 characters
```
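Corpus readers are not limited to the bundled corpora: `PlaintextCorpusReader` exposes a directory of your own `.txt` files through the same `fileids()` / `raw()` / `words()` interface. A sketch that writes one hypothetical file (`notes.txt`) to a temporary directory; `words()` uses the default `WordPunctTokenizer`, so no downloaded data is needed.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

# Create a throwaway corpus directory with one hypothetical document
root = tempfile.mkdtemp()
with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
    f.write("NLTK reads plain-text corpora. Each file becomes one document.")

# Every file matching the regex becomes a fileid
reader = PlaintextCorpusReader(root, r".*\.txt")
print(reader.fileids())

# words() returns a lazy corpus view; convert to a list before slicing/comparing
first_words = list(reader.words("notes.txt"))[:4]
print(first_words)
```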

These examples should give you a good starting point for using NLTK. For further details, you can refer to the [NLTK documentation](https://www.nltk.org/). If you have any specific questions or need more examples, feel free to ask!

Here are 13 additional NLTK examples to further demonstrate its capabilities:

### 8. Stop Words Removal
```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt_tab')

# Sample text
text = "This is a sample sentence, showing off the stop words filtration."

# Tokenize words
words = word_tokenize(text)

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.lower() not in stop_words]
print(filtered_words)
```

### 9. Frequency Distribution
```python
import nltk
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize

nltk.download('punkt_tab')

# Sample text
text = "NLTK is a leading platform for building Python programs to work with human language data."

# Tokenize words
words = word_tokenize(text)

# Compute frequency distribution
fdist = FreqDist(words)
print(fdist.most_common(5))
```
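`FreqDist` behaves like a `collections.Counter` with a few extra helpers. A short sketch on a made-up sentence, tokenized with `str.split()` so no tokenizer data is needed, showing the total count `N()`, the number of distinct samples `B()`, and the relative frequency `freq()`:

```python
from nltk.probability import FreqDist

tokens = "the cat sat on the mat and the dog sat too".split()
fdist = FreqDist(tokens)

print(fdist.most_common(2))        # most frequent samples with their counts
print(fdist["sat"])                # count of one sample
print(fdist.N(), fdist.B())        # total tokens, distinct tokens
print(fdist.freq("the"))           # count("the") / N()
```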

### 10. Synonyms and Antonyms with WordNet
```python
import nltk
from nltk.corpus import wordnet

nltk.download('wordnet')

# Get synonyms and antonyms for a word
word = "good"
synonyms = []
antonyms = []

for syn in wordnet.synsets(word):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
        # antonyms() returns a (often empty) list of Lemma objects
        for antonym in lemma.antonyms():
            antonyms.append(antonym.name())

print(f"Synonyms of {word}: {set(synonyms)}")
print(f"Antonyms of {word}: {set(antonyms)}")
```

### 11. Word Similarity
```python
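import nltk
from nltk.corpus import wordnet

nltk.download('wordnet')

# Get similarity between words
# 'automobile.n.01' names the first noun sense of "automobile", which is the same
# synset as 'car.n.01', so the Wu-Palmer score here is 1.0
word1 = wordnet.synset('car.n.01')
word2 = wordnet.synset('automobile.n.01')
similarity = word1.wup_similarity(word2)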
print(f"Similarity between 'car' and 'automobile': {similarity}")
```

### 12. Bigrams and Trigrams
```python
import nltk
from nltk.util import bigrams, trigrams
from nltk.tokenize import word_tokenize

nltk.download('punkt_tab')

# Sample text
text = "NLTK is a powerful library for NLP."

# Tokenize words
words = word_tokenize(text)

# Generate bigrams and trigrams
bigrams_list = list(bigrams(words))
trigrams_list = list(trigrams(words))

print(f"Bigrams: {bigrams_list}")
print(f"Trigrams: {trigrams_list}")
```
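`bigrams` and `trigrams` are special cases of `nltk.util.ngrams`, which handles any `n` and can pad the sequence with boundary symbols (useful for language modelling). A sketch using `str.split()` tokens; the `<s>` / `</s>` symbols are an arbitrary choice for illustration.

```python
from nltk.util import ngrams

tokens = "NLTK is a powerful library".split()

# Any n: here 4-grams
four_grams = list(ngrams(tokens, 4))
print(four_grams)

# Padding adds n-1 boundary symbols on each side
padded = list(ngrams(tokens[:2], 2, pad_left=True, pad_right=True,
                     left_pad_symbol="<s>", right_pad_symbol="</s>"))
print(padded)
```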

### 13. Parts of Speech Tagging with a Tagged Corpus
```python
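import nltk
from nltk.corpus import brown

nltk.download('brown')
nltk.download('averaged_perceptron_tagger_eng')

# First sentence of the Brown corpus (already tokenized, so no tokenizer is needed)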
sentence = brown.sents()[0]

# POS tagging
tagged_sentence = nltk.pos_tag(sentence)
print(tagged_sentence)
```

### 14. Chunking
```python
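import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.chunk import ne_chunk

# Tokenizer, tagger, NE chunker model, and the word list it uses (NLTK 3.9+ resource names)
for resource in ['punkt_tab', 'averaged_perceptron_tagger_eng', 'maxent_ne_chunker_tab', 'words']:
    nltk.download(resource)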

# Sample text
text = "Barack Obama was born in Hawaii."

# Tokenize and POS tagging
words = word_tokenize(text)
pos_tags = pos_tag(words)

# Chunking
chunked_sentence = ne_chunk(pos_tags)
print(chunked_sentence)
```
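`ne_chunk` is a pretrained named-entity chunker. For general phrase chunking, `nltk.RegexpParser` applies a tag-pattern grammar to already-tagged tokens. A minimal sketch using hand-tagged tokens so no tagger model is needed; the pattern `<DT>?<JJ>*<NN.*>+` (optional determiner, any adjectives, one or more nouns) is just one possible noun-phrase rule.

```python
import nltk

# Hand-tagged sentence (Penn Treebank tags)
tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Chunk grammar: label matching tag sequences as NP
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
tree = chunker.parse(tagged)
print(tree)

# Collect the words of each NP chunk
noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]
print(noun_phrases)
```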

### 15. Sentiment Analysis with VADER
```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# VADER's sentiment lexicon
nltk.download('vader_lexicon')

# Sample text
text = "NLTK is incredibly useful and powerful."

# Sentiment analysis
sid = SentimentIntensityAnalyzer()
sentiment = sid.polarity_scores(text)
print(sentiment)
```

### 16. Concordance
```python
import nltk
from nltk.corpus import gutenberg
from nltk.text import Text

# Download the Gutenberg sample texts
nltk.download('gutenberg')

# Load text
text = Text(gutenberg.words('austen-emma.txt'))

# Find concordance of a word
text.concordance('Emma')
```

### 17. Collocations
```python
import nltk
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.tokenize import word_tokenize

nltk.download('punkt_tab')

# Sample text
text = "NLTK is a leading platform for building Python programs to work with human language data."

# Tokenize words
words = word_tokenize(text)

# Find bigram collocations
bigram_finder = BigramCollocationFinder.from_words(words)
bigram_collocations = bigram_finder.nbest(BigramAssocMeasures.likelihood_ratio, 10)
print(bigram_collocations)
```
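On a single sentence every bigram occurs once, so the ranking above carries little meaning. With more text, a frequency filter is usually applied first so that rare pairs do not dominate association scores such as PMI. A sketch on a short made-up token list; `apply_freq_filter(2)` drops every bigram seen fewer than two times.

```python
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

tokens = "New York is big . I love New York . New York never sleeps .".split()

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)  # keep only bigrams that occur at least twice

# Rank the remaining bigrams by pointwise mutual information
best = finder.nbest(BigramAssocMeasures.pmi, 5)
print(best)
```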

### 18. Conditional Frequency Distribution
```python
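import nltk
from nltk.probability import ConditionalFreqDist
from nltk.corpus import brown

nltk.download('brown')

# Count each word, conditioned on the Brown genre it appears in
cfd = ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre)
)

# Plot a few modal verbs for two genres (plotting every word in every genre is unreadable);
# plotting requires matplotlib
cfd.plot(conditions=['news', 'romance'], samples=['can', 'could', 'may', 'might', 'must', 'will'])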
```
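`ConditionalFreqDist` simply consumes `(condition, sample)` pairs and keeps one `FreqDist` per condition, so its behaviour can be checked on a handful of pairs before running it over Brown. The pairs below are hypothetical, not real Brown counts.

```python
from nltk.probability import ConditionalFreqDist

# Hypothetical (genre, word) pairs
pairs = [
    ("news", "will"), ("news", "will"), ("news", "can"),
    ("romance", "could"), ("romance", "could"), ("romance", "will"),
]
cfd = ConditionalFreqDist(pairs)

print(sorted(cfd.conditions()))
print(cfd["news"]["will"])          # count of "will" under the "news" condition
print(cfd["romance"].max())         # most frequent sample for "romance"

# Text table instead of a plot (no matplotlib needed)
cfd.tabulate(samples=["can", "could", "will"])
```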

### 19. Named Entity Recognition with spaCy
```python
import spacy

# Load spaCy's small English model (install it first: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Apple is looking at buying U.K. startup for $1 billion."

# Named entity recognition
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
```

### 20. Sentence Tokenization with Punkt
```python
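import nltk
from nltk.tokenize import PunktSentenceTokenizer, sent_tokenize

nltk.download('punkt_tab')

# Sample text
text = "Mr. Smith went to Washington. He saw the president."

# An untrained PunktSentenceTokenizer has no abbreviation list, so it may split after "Mr."
tokenizer = PunktSentenceTokenizer()
for sentence in tokenizer.tokenize(text):
    print(sentence)

# sent_tokenize uses the pretrained English Punkt model (punkt_tab), which knows "Mr."
for sentence in sent_tokenize(text):
    print(sentence)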
```

These examples should provide a broader understanding of the capabilities of NLTK and how it can be used for various NLP tasks. If you need further details or specific examples, feel free to ask!

In [1]:
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# Sample text
text = "NLTK is a powerful library for NLP. It is widely used in text processing."

# Tokenize sentences
sentences = sent_tokenize(text)
print(sentences)

# Tokenize words
words = word_tokenize(text)
print(words)

ModuleNotFoundError: No module named 'nltk'

In [2]:
pip install nltk

Collecting nltk
  Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Collecting click (from nltk)
  Downloading click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting regex>=2021.8.3 (from nltk)
  Downloading regex-2024.11.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
Collecting tqdm (from nltk)
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 38.9 MB/s eta 0:00:00
Downloading regex-2024.11.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (796 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 796.9/796.9 kB 33.8 MB/s eta 0:00:00
Downloading click-8.1.8-py3-none-any.whl (98 kB)
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Installing collected packages: tqdm, regex, click, nltk
Successfully installed click-8.1.8 nltk-3.9.1 regex-2024.1

In [3]:
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# Sample text
text = "NLTK is a powerful library for NLP. It is widely used in text processing."

# Tokenize sentences
sentences = sent_tokenize(text)
print(sentences)

# Tokenize words
words = word_tokenize(text)
print(words)

LookupError: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/home/codespace/nltk_data'
    - '/usr/local/python/3.12.1/nltk_data'
    - '/usr/local/python/3.12.1/share/nltk_data'
    - '/usr/local/python/3.12.1/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


In [4]:
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/codespace/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [5]:
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# Sample text
text = "NLTK is a powerful library for NLP. It is widely used in text processing."

# Tokenize sentences
sentences = sent_tokenize(text)
print(sentences)

# Tokenize words
words = word_tokenize(text)
print(words)

['NLTK is a powerful library for NLP.', 'It is widely used in text processing.']
['NLTK', 'is', 'a', 'powerful', 'library', 'for', 'NLP', '.', 'It', 'is', 'widely', 'used', 'in', 'text', 'processing', '.']


In [6]:
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Initialize stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Sample words
words = ["running", "ran", "runs", "easily", "fairly"]

# Stemming
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)

# Lemmatization
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
print(lemmatized_words)

['run', 'ran', 'run', 'easili', 'fairli']


LookupError: 
**********************************************************************
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/wordnet

  Searched in:
    - '/home/codespace/nltk_data'
    - '/usr/local/python/3.12.1/nltk_data'
    - '/usr/local/python/3.12.1/share/nltk_data'
    - '/usr/local/python/3.12.1/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


In [7]:
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     /home/codespace/nltk_data...


True

In [8]:
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Initialize stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Sample words
words = ["running", "ran", "runs", "easily", "fairly"]

# Stemming
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)

# Lemmatization
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
print(lemmatized_words)

['run', 'ran', 'run', 'easili', 'fairli']
['running', 'ran', 'run', 'easily', 'fairly']


In [9]:
from nltk import pos_tag
from nltk.tokenize import word_tokenize

# Sample text
text = "NLTK is a powerful library for NLP."

# Tokenize words
words = word_tokenize(text)

# POS tagging
pos_tags = pos_tag(words)
print(pos_tags)

LookupError: 
**********************************************************************
  Resource averaged_perceptron_tagger_eng not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('averaged_perceptron_tagger_eng')

  For more information see: https://www.nltk.org/data.html

  Attempted to load taggers/averaged_perceptron_tagger_eng/

  Searched in:
    - '/home/codespace/nltk_data'
    - '/usr/local/python/3.12.1/nltk_data'
    - '/usr/local/python/3.12.1/share/nltk_data'
    - '/usr/local/python/3.12.1/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


In [10]:
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/codespace/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


True

In [11]:
from nltk import pos_tag
from nltk.tokenize import word_tokenize

# Sample text
text = "NLTK is a powerful library for NLP."

# Tokenize words
words = word_tokenize(text)

# POS tagging
pos_tags = pos_tag(words)
print(pos_tags)

[('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('NLP', 'NNP'), ('.', '.')]


In [12]:
from nltk import ne_chunk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Sample text
text = "Barack Obama was the 44th President of the United States."

# Tokenize and POS tagging
words = word_tokenize(text)
pos_tags = pos_tag(words)

# Named entity recognition
named_entities = ne_chunk(pos_tags)
print(named_entities)

LookupError: 
**********************************************************************
  Resource maxent_ne_chunker_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('maxent_ne_chunker_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load chunkers/maxent_ne_chunker_tab/english_ace_multiclass/

  Searched in:
    - '/home/codespace/nltk_data'
    - '/usr/local/python/3.12.1/nltk_data'
    - '/usr/local/python/3.12.1/share/nltk_data'
    - '/usr/local/python/3.12.1/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


In [13]:
nltk.download('maxent_ne_chunker_tab')

[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     /home/codespace/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker_tab.zip.


True

In [14]:
from nltk import ne_chunk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Sample text
text = "Barack Obama was the 44th President of the United States."

# Tokenize and POS tagging
words = word_tokenize(text)
pos_tags = pos_tag(words)

# Named entity recognition
named_entities = ne_chunk(pos_tags)
print(named_entities)

LookupError: 
**********************************************************************
  Resource words not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('words')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/words

  Searched in:
    - '/home/codespace/nltk_data'
    - '/usr/local/python/3.12.1/nltk_data'
    - '/usr/local/python/3.12.1/share/nltk_data'
    - '/usr/local/python/3.12.1/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


In [15]:
nltk.download('words')

[nltk_data] Downloading package words to /home/codespace/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


True

In [16]:
from nltk import ne_chunk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Sample text
text = "Barack Obama was the 44th President of the United States."

# Tokenize and POS tagging
# https://www.nltk.org/api/nltk.tokenize.html
words = word_tokenize(text)
pos_tags = pos_tag(words)

# Named entity recognition
named_entities = ne_chunk(pos_tags)
print(named_entities)

(S
  (PERSON Barack/NNP)
  (PERSON Obama/NNP)
  was/VBD
  the/DT
  44th/JJ
  President/NNP
  of/IN
  the/DT
  (GPE United/NNP States/NNPS)
  ./.)


In [17]:
from nltk import CFG
from nltk.parse import RecursiveDescentParser

# Define a grammar
grammar = CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  PP -> P NP
  P -> "in" | "on" | "by" | "with"
""")

# Sample sentence
sentence = "John saw the man in the park".split()

# Recursive descent parser
# https://en.wikipedia.org/wiki/Recursive_descent_parser
parser = RecursiveDescentParser(grammar)
for tree in parser.parse(sentence):
    print(tree)

(S
  (NP John)
  (VP
    (V saw)
    (NP (Det the) (N man) (PP (P in) (NP (Det the) (N park))))))
(S
  (NP John)
  (VP
    (V saw)
    (NP (Det the) (N man))
    (PP (P in) (NP (Det the) (N park)))))


In [18]:
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
import random

# Load movie reviews
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

# Feature extractor
def document_features(document):
    words = set(document)
    features = {}
    for word in movie_reviews.words():
        features[word] = (word in words)
    return features

# Prepare training and testing data
featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]

# Train classifier
classifier = NaiveBayesClassifier.train(train_set)

# Evaluate classifier
accuracy = nltk.classify.accuracy(classifier, test_set)
print(f"Accuracy: {accuracy}")

# Show most informative features
classifier.show_most_informative_features(5)

LookupError: 
**********************************************************************
  Resource movie_reviews not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('movie_reviews')

  For more information see: https://www.nltk.org/data.html

  Attempted to load corpora/movie_reviews

  Searched in:
    - '/home/codespace/nltk_data'
    - '/usr/local/python/3.12.1/nltk_data'
    - '/usr/local/python/3.12.1/share/nltk_data'
    - '/usr/local/python/3.12.1/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


In [19]:
nltk.download('movie_reviews')

[nltk_data] Downloading package movie_reviews to
[nltk_data]     /home/codespace/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.


True

In [None]:
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
import random

# Load movie reviews
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

# Feature extractor
def document_features(document):
    words = set(document)
    features = {}
    for word in movie_reviews.words():
        features[word] = (word in words)
    return features

# Prepare training and testing data
featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]

# Train classifier
classifier = NaiveBayesClassifier.train(train_set)

# Evaluate classifier
# https://en.wikipedia.org/wiki/Naive_Bayes_classifier
accuracy = nltk.classify.accuracy(classifier, test_set)
print(f"Accuracy: {accuracy}")

# Show most informative features
classifier.show_most_informative_features(5)

The delay is most likely caused by feature extraction. document_features calls movie_reviews.words() and loops over every token in the entire corpus (more than a million tokens) once per document, so the same full-corpus scan is repeated for each of the 2,000 reviews.

In [None]:
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
import random
import nltk

# Load movie reviews
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

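# Precompute the vocabulary once: the 2,000 most frequent words.
# (Using every distinct word would give each document a feature dict with tens of thousands of entries.)
all_words = nltk.FreqDist(movie_reviews.words())
word_features = [word for word, _ in all_words.most_common(2000)]

# Feature extractor
def document_features(document):
    words = set(document)
    features = {word: (word in words) for word in word_features}
    return features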

# Prepare training and testing data
featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]

# Train classifier
classifier = NaiveBayesClassifier.train(train_set)

# Evaluate classifier
accuracy = nltk.classify.accuracy(classifier, test_set)
print(f"Accuracy: {accuracy}")

# Show most informative features
classifier.show_most_informative_features(5)


In [None]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Sample text
text = "This is a sample sentence, showing off the stop words filtration."

# Tokenize words
words = word_tokenize(text)

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.lower() not in stop_words]
print(filtered_words)

In [None]:
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize

# Sample text
text = "NLTK is a leading platform for building Python programs to work with human language data."

# Tokenize words
words = word_tokenize(text)

# Compute frequency distribution
fdist = FreqDist(words)
print(fdist.most_common(5))

In [None]:
from nltk.corpus import wordnet

# Get synonyms and antonyms for a word
word = "good"
synonyms = []
antonyms = []

for syn in wordnet.synsets(word):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
        for antonym in lemma.antonyms():
            antonyms.append(antonym.name())

print(f"Synonyms of {word}: {set(synonyms)}")
print(f"Antonyms of {word}: {set(antonyms)}")

In [None]:
from nltk.corpus import wordnet

# Get similarity between words
word1 = wordnet.synset('car.n.01')
word2 = wordnet.synset('automobile.n.01')
similarity = word1.wup_similarity(word2)
print(f"Similarity between 'car' and 'automobile': {similarity}")

In [None]:
from nltk.util import bigrams, trigrams
from nltk.tokenize import word_tokenize

# Sample text
text = "NLTK is a powerful library for NLP."

# Tokenize words
words = word_tokenize(text)

# Generate bigrams and trigrams
bigrams_list = list(bigrams(words))
trigrams_list = list(trigrams(words))

print(f"Bigrams: {bigrams_list}")
print(f"Trigrams: {trigrams_list}")

In [None]:
import nltk
from nltk.corpus import brown

# Sample sentence from Brown corpus
sentence = brown.sents()[0]

# POS tagging
tagged_sentence = nltk.pos_tag(sentence)
print(tagged_sentence)

In [None]:
import nltk
from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.chunk import ne_chunk

# Sample text
text = "Barack Obama was born in Hawaii."

# Tokenize and POS tagging
words = word_tokenize(text)
pos_tags = pos_tag(words)

# Chunking
chunked_sentence = ne_chunk(pos_tags)
print(chunked_sentence)

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Sample text
text = "NLTK is incredibly useful and powerful."

# Sentiment analysis
sid = SentimentIntensityAnalyzer()
sentiment = sid.polarity_scores(text)
print(sentiment)

In [None]:
from nltk.text import Text
import nltk

# Sample text from Gutenberg corpus
nltk.download('gutenberg')
from nltk.corpus import gutenberg

# Load text
text = Text(gutenberg.words('austen-emma.txt'))

# Find concordance of a word
text.concordance('Emma')

In [None]:
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.tokenize import word_tokenize

# Sample text
text = "NLTK is a leading platform for building Python programs to work with human language data."

# Tokenize words
words = word_tokenize(text)

# Find bigram collocations
bigram_finder = BigramCollocationFinder.from_words(words)
bigram_collocations = bigram_finder.nbest(BigramAssocMeasures.likelihood_ratio, 10)
print(bigram_collocations)

In [None]:
from nltk.probability import ConditionalFreqDist
from nltk.corpus import brown

# Count each word, conditioned on the Brown genre it appears in
cfd = ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre)
)

# Plot the distribution
cfd.plot()

In [None]:
import spacy

# Load SpaCy model
nlp = spacy.load('en_core_web_sm')

# Sample text
text = "Apple is looking at buying U.K. startup for $1 billion."

# Named entity recognition
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)

In [None]:
import nltk
from nltk.tokenize import PunktSentenceTokenizer

# Sample text
text = "Mr. Smith went to Washington. He saw the president."

# Punkt tokenizer
tokenizer = PunktSentenceTokenizer()
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)

The `ProbabilisticProduction` in NLTK is a class used in probabilistic context-free grammars (PCFGs). It extends the `Production` class by assigning probabilities to productions (rules). This allows for a probabilistic parsing approach, where the parser chooses the most likely parse tree based on the probabilities of the productions.

Here is an example of how to use `ProbabilisticProduction` in NLTK:

### Example of Using ProbabilisticProduction

```python
import nltk
from nltk.grammar import ProbabilisticProduction, Nonterminal, PCFG

# Define nonterminals
S = Nonterminal('S')
NP = Nonterminal('NP')
VP = Nonterminal('VP')
Det = Nonterminal('Det')
N = Nonterminal('N')
V = Nonterminal('V')

# Define probabilistic productions
productions = [
    ProbabilisticProduction(S, [NP, VP], prob=1.0),
    ProbabilisticProduction(NP, [Det, N], prob=0.5),
    ProbabilisticProduction(NP, ['John'], prob=0.5),
    ProbabilisticProduction(VP, [V, NP], prob=0.5),
    ProbabilisticProduction(VP, [V], prob=0.5),
    ProbabilisticProduction(Det, ['the'], prob=0.8),
    ProbabilisticProduction(Det, ['a'], prob=0.2),
    ProbabilisticProduction(N, ['man'], prob=0.5),
    ProbabilisticProduction(N, ['telescope'], prob=0.5),
    ProbabilisticProduction(V, ['saw'], prob=1.0)
]

# Create a PCFG
pcfg = PCFG(S, productions)

# Print the PCFG
print(pcfg)

# Example sentence to parse
sentence = "John saw the man".split()

# Parse the sentence using the PCFG
parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse(sentence):
    print(tree)
```

### Explanation

1. **Defining Nonterminals**: We define nonterminal symbols like `S`, `NP`, `VP`, etc.
2. **Defining Probabilistic Productions**: We create productions with probabilities using `ProbabilisticProduction`. Each production specifies a left-hand side (nonterminal) and a right-hand side (a list of terminals or nonterminals), along with a probability.
3. **Creating the PCFG**: We create a `PCFG` object by providing the start symbol and the list of probabilistic productions.
4. **Parsing a Sentence**: We use the `ViterbiParser` to parse a sentence based on the PCFG. The parser returns the most probable parse tree.
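The same grammar can be written more compactly with `PCFG.fromstring`, where each alternative carries its probability in brackets. The tree yielded by `ViterbiParser` also carries its probability, which is the product of the probabilities of the productions it uses. A sketch under the same grammar as above:

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

grammar = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.5] | 'John' [0.5]
    VP -> V NP [0.5] | V [0.5]
    Det -> 'the' [0.8] | 'a' [0.2]
    N -> 'man' [0.5] | 'telescope' [0.5]
    V -> 'saw' [1.0]
""")

parser = ViterbiParser(grammar)
best_tree = next(parser.parse("John saw the man".split()))
print(best_tree)

# 1.0 (S) * 0.5 (NP->John) * 0.5 (VP->V NP) * 1.0 (V) * 0.5 (NP->Det N) * 0.8 (the) * 0.5 (man)
print(best_tree.prob())
```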

### Use Case
This approach is useful in NLP tasks where ambiguity must be resolved by choosing the most likely interpretation of a sentence. In practice the production probabilities are usually estimated from a parsed corpus (a treebank) rather than set by hand as in the example above.
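NLTK can estimate those probabilities with `induce_pcfg`, which takes a start symbol and a list of observed productions and assigns each rule its relative frequency, count(A -> β) / count(A). A sketch on a hypothetical three-tree "treebank" written by hand for illustration:

```python
from nltk.grammar import Nonterminal, induce_pcfg
from nltk.tree import Tree

# Hypothetical mini treebank in bracketed notation
treebank = [
    "(S (NP John) (VP (V saw) (NP (Det the) (N man))))",
    "(S (NP (Det the) (N man)) (VP (V saw)))",
    "(S (NP John) (VP (V saw) (NP (Det a) (N telescope))))",
]

# Collect every production used; repeats are kept so they act as counts
productions = []
for bracketed in treebank:
    productions.extend(Tree.fromstring(bracketed).productions())

# Relative-frequency estimate of each rule's probability
grammar = induce_pcfg(Nonterminal("S"), productions)
print(grammar)

# NP -> 'John' appears 2 times and NP -> Det N 3 times out of 5 NP expansions
np_probs = {p.rhs(): p.prob() for p in grammar.productions(lhs=Nonterminal("NP"))}
print(np_probs)
```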

Feel free to ask if you need further details or have specific questions about NLTK and probabilistic grammars!

In [None]:
import nltk
from nltk.grammar import ProbabilisticProduction, Nonterminal, PCFG

# Define nonterminals
S = Nonterminal('S')
NP = Nonterminal('NP')
VP = Nonterminal('VP')
Det = Nonterminal('Det')
N = Nonterminal('N')
V = Nonterminal('V')

# Define probabilistic productions
productions = [
    ProbabilisticProduction(S, [NP, VP], prob=1.0),
    ProbabilisticProduction(NP, [Det, N], prob=0.5),
    ProbabilisticProduction(NP, ['John'], prob=0.5),
    ProbabilisticProduction(VP, [V, NP], prob=0.5),
    ProbabilisticProduction(VP, [V], prob=0.5),
    ProbabilisticProduction(Det, ['the'], prob=0.8),
    ProbabilisticProduction(Det, ['a'], prob=0.2),
    ProbabilisticProduction(N, ['man'], prob=0.5),
    ProbabilisticProduction(N, ['telescope'], prob=0.5),
    ProbabilisticProduction(V, ['saw'], prob=1.0)
]

# Create a PCFG
pcfg = PCFG(S, productions)

# Print the PCFG
print(pcfg)

# Example sentence to parse
sentence = "John saw the man".split()

# Parse the sentence using the PCFG
parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse(sentence):
    print(tree)