# Module 1, Week 1, Assignment 2: Exploring NLTK

In this assignment, you'll explore NLTK (Natural Language Toolkit), a Python library for natural language processing. You'll practice part-of-speech tagging, named entity recognition, word frequency analysis, and basic parsing.

---

## Objectives
- Learn to perform Part-of-Speech (POS) tagging using NLTK.
- Explore Named Entity Recognition (NER).
- Use NLTK to calculate word frequency distribution in text.
- Understand basic parsing techniques with NLTK.

### Instructions:
1. Follow the examples and explanations in each section.
2. Complete the **TODO** tasks to practice.
3. Reflect on the output and experiment with different texts.

---

## Step 1: Import Required Libraries and Download Data
We'll begin by importing NLTK and downloading the necessary datasets for this assignment.

In [None]:
# Import Libraries
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.probability import FreqDist
from nltk.tree import Tree

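# Download Required NLTK Data Files
nltk.download('punkt')  # Tokenizer models for word_tokenize
nltk.download('averaged_perceptron_tagger')  # For POS tagging
nltk.download('maxent_ne_chunker')  # For NER
nltk.download('words')  # For NER
nltk.download('gutenberg')  # Sample texts

# Recent NLTK releases (around 3.9 and later) look for these resource names instead.
# Downloading both sets is harmless; each NLTK version uses the names it expects.
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker_tab')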

## Step 2: Part-of-Speech (POS) Tagging
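POS tagging labels each word in a sentence with its part of speech (e.g., noun, verb, adjective), which is a first step toward understanding the structure of the sentence. `pos_tag` returns a list of `(word, tag)` pairs, and NLTK's default English tagger uses Penn Treebank tags such as `NN` (singular noun), `VBZ` (verb, third-person singular present), and `JJ` (adjective).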

In [None]:
# Sample Text
text = "Natural Language Processing helps computers understand human language."

# Tokenize and Apply POS Tagging
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
print("POS Tags:\n", pos_tags)

# TODO: Perform POS tagging on a custom sentence
# text_custom = "Your custom sentence here."
# tokens_custom = word_tokenize(text_custom)
# pos_tags_custom = pos_tag(tokens_custom)
# print("\nPOS Tags for Custom Text:\n", pos_tags_custom)
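
Once you have tagged output, a common next step is to filter tokens by tag. The sketch below assumes a hand-written list in the same `(word, tag)` format that `pos_tag` returns; the tags are illustrative Penn Treebank labels rather than actual tagger output, and `words_with_tag_prefix` is a hypothetical helper name.

```python
# Hand-written (word, tag) pairs in the format pos_tag returns.
# These tags are illustrative, not real tagger output.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("lazy", "JJ"), ("dogs", "NNS"),
]

def words_with_tag_prefix(pairs, prefix):
    """Return the words whose tag starts with prefix (e.g. 'NN' matches NN and NNS)."""
    return [word for word, tag in pairs if tag.startswith(prefix)]

nouns = words_with_tag_prefix(tagged, "NN")
adjectives = words_with_tag_prefix(tagged, "JJ")
verbs = words_with_tag_prefix(tagged, "VB")

print("Nouns:", nouns)
print("Adjectives:", adjectives)
print("Verbs:", verbs)
```

Matching on a prefix works because Penn Treebank tags group related categories under a shared stem: `NN`, `NNS`, `NNP`, and `NNPS` are all noun tags.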

## Step 3: Named Entity Recognition (NER)
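NER identifies and classifies entities in text, such as people, organizations, and locations. `ne_chunk` takes POS-tagged tokens and returns a tree in which each recognized entity is a labeled subtree (for example `PERSON`, `ORGANIZATION`, or `GPE`), while every other token remains a plain `(word, tag)` pair.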

In [None]:
# Apply NER on the Sample Text
ner_tree = ne_chunk(pos_tags)
print("Named Entities:")
for subtree in ner_tree:
    if isinstance(subtree, Tree):
        print(subtree)

# TODO: Perform NER on your custom sentence from Step 2
# ner_tree_custom = ne_chunk(pos_tags_custom)
# print("\nNamed Entities for Custom Text:")
# for subtree in ner_tree_custom:
#     if isinstance(subtree, Tree):
#         print(subtree)
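
Printing subtrees shows the raw structure, but you will often want each entity as a plain string with its label. The sketch below builds a small tree by hand in the shape `ne_chunk` returns, so it runs without the NER model; the tree contents are illustrative rather than real model output, and `extract_entities` is a hypothetical helper name.

```python
from nltk.tree import Tree

# Hand-built tree in the shape ne_chunk returns (illustrative, not real model output)
example_tree = Tree("S", [
    Tree("PERSON", [("Jane", "NNP"), ("Austen", "NNP")]),
    ("lived", "VBD"),
    ("in", "IN"),
    Tree("GPE", [("England", "NNP")]),
])

def extract_entities(tree):
    """Collect (label, text) pairs from the entity subtrees of a chunk tree."""
    entities = []
    for node in tree:
        if isinstance(node, Tree):
            # Each leaf is a (word, tag) pair; join the words to rebuild the entity text
            text = " ".join(word for word, tag in node.leaves())
            entities.append((node.label(), text))
    return entities

entities = extract_entities(example_tree)
print(entities)
```

The same function should work on the `ner_tree` produced above, since it relies only on `label()` and `leaves()`.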

## Step 4: Word Frequency Distribution
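Analyzing the frequency of words in a text helps identify common themes or patterns. `FreqDist` counts every token it is given, so punctuation marks and capitalization variants (such as "The" and "the") are counted as separate items unless you filter or normalize the tokens first.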

In [None]:
# Sample Text for Frequency Distribution
sample_text = nltk.corpus.gutenberg.raw('austen-emma.txt')[:500]
tokens_sample = word_tokenize(sample_text)

# Calculate Frequency Distribution
fdist = FreqDist(tokens_sample)
print("Most Common Words:\n", fdist.most_common(10))

# Plot Frequency Distribution
fdist.plot(10, title="Top 10 Words in Sample Text")

# TODO: Perform frequency distribution analysis on a custom paragraph
# custom_text = "Your custom paragraph here."
# tokens_custom = word_tokenize(custom_text)
# fdist_custom = FreqDist(tokens_custom)
# print("\nMost Common Words in Custom Text:\n", fdist_custom.most_common(10))
# fdist_custom.plot(10, title="Top 10 Words in Custom Text")
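
Raw token counts are often dominated by punctuation and case variants. As a minimal sketch of one way to normalize before counting, the example below lowercases a made-up sentence and keeps only runs of letters and apostrophes using a regular expression; this is a simple stand-in for `word_tokenize` followed by filtering, not a replacement for it.

```python
import re
from nltk.probability import FreqDist

text = "The cat sat on the mat. The mat was warm, and the cat slept."

# Lowercase, then keep only runs of letters and apostrophes (drops punctuation)
words = re.findall(r"[a-z']+", text.lower())

fdist_words = FreqDist(words)
top_three = fdist_words.most_common(3)

print("Total word tokens:", fdist_words.N())
print("Top three:", top_three)
```

After normalization, "The" and "the" are counted together and the period and comma no longer appear in the distribution.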

## Step 5: Parsing Sentences
Parsing analyzes the grammatical structure of a sentence. Here we define a small context-free grammar (CFG) and use NLTK's chart parser to find every parse tree the grammar allows for a tokenized sentence.

In [None]:
# Define a Simple Grammar for Parsing
grammar = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> DT NN
  VP -> VBZ PP
  PP -> IN NP
  DT -> 'the'
  NN -> 'cat' | 'mat'
  VBZ -> 'sits'
  IN -> 'on'
""")

# Create a Parser
parser = nltk.ChartParser(grammar)

# Parse a Sentence (every token must appear as a terminal in the grammar)
sentence = ['the', 'cat', 'sits', 'on', 'the', 'mat']
for tree in parser.parse(sentence):
    print(tree)
    tree.draw()  # Opens a separate window; requires a graphical display

# TODO: Define a custom grammar and parse a new sentence
# custom_grammar = nltk.CFG.fromstring("""
# Your custom grammar here
# """)
# custom_parser = nltk.ChartParser(custom_grammar)
# custom_sentence = ["Your", "custom", "sentence", "tokens"]
# for tree in custom_parser.parse(custom_sentence):
#     print(tree)
#     tree.draw()
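
Two challenges tend to appear as soon as you write your own grammar: a sentence can have more than one valid parse, and the parser rejects words the grammar does not list. The sketch below illustrates both with a small made-up grammar in which a prepositional phrase can attach to either the verb phrase or the noun phrase; the grammar and variable names are assumptions chosen for illustration.

```python
import nltk

# A prepositional phrase (PP) can attach to the verb phrase or to the noun phrase
ambiguous_grammar = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> 'I' | Det N | Det N PP
  VP -> V NP | VP PP
  PP -> P NP
  Det -> 'the'
  N -> 'man' | 'telescope'
  V -> 'saw'
  P -> 'with'
""")
ambiguous_parser = nltk.ChartParser(ambiguous_grammar)

tokens = "I saw the man with the telescope".split()
trees = list(ambiguous_parser.parse(tokens))
print(len(trees), "parse trees found")
for t in trees:
    print(t)

# 'dog' is not a terminal in the grammar, so the parser raises ValueError
try:
    list(ambiguous_parser.parse("I saw the dog".split()))
    coverage_error = None
except ValueError as err:
    coverage_error = str(err)
    print("Coverage problem:", coverage_error)
```

One tree attaches "with the telescope" to the seeing, the other to the man. Printing the trees instead of calling `draw()` keeps the example usable in environments without a graphical display.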

### Congratulations! 🎉
You have completed the second assignment on exploring NLTK. You've worked with POS tagging, NER, frequency distribution, and basic parsing, which are foundational skills for NLP tasks.

---

### Reflection:
- How does NER help in extracting important information from text?
- What insights can you gain from word frequency distribution?
- Experiment with different grammars and sentences in parsing. What challenges do you encounter?

Feel free to expand on these techniques and explore NLTK's additional functionalities!