### Q1. NLP Processing Steps

Write Python code to perform the following steps:

1. Segment into tokens  
2. Remove stopwords  
3. Apply lemmatization (not stemming)  
4. Keep only verbs and nouns (use POS tags)  

**Input text:**  
"John enjoys playing football while Mary loves reading books in the library."

In [1]:
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk import word_tokenize, pos_tag

nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('omw-1.4')

# Input text
text = "John enjoys playing football while Mary loves reading books in the library."

# Step 1: Tokenization
tokens = word_tokenize(text)

# Step 2: Remove stopwords and non-alphabetic tokens
stop_words = set(stopwords.words('english'))
filtered_tokens = [w for w in tokens if w.lower() not in stop_words and w.isalpha()]

# Step 3 (setup): WordNet lemmatizer
lemmatizer = WordNetLemmatizer()

# Helper: map a Penn Treebank tag (from pos_tag) to the WordNet POS code used by lemmatize()
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return 'a'  # adjective
    elif tag.startswith('V'):
        return 'v'  # verb
    elif tag.startswith('N'):
        return 'n'  # noun
    elif tag.startswith('R'):
        return 'r'  # adverb
    else:
        return None

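# POS tagging (the tags drive both lemmatization and the noun/verb filter).
# Note: the tagger runs here on the stopword-filtered list, so it sees less
# sentence context than it would on the full `tokens` list.
pos_tags = pos_tag(filtered_tokens)

# Steps 3 and 4: lemmatize, keeping only nouns and verbs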
final_words = []
for word, tag in pos_tags:
    wn_tag = get_wordnet_pos(tag)
    if wn_tag in ('n', 'v'):  # keep only nouns and verbs
        lemma = lemmatizer.lemmatize(word.lower(), wn_tag)
        final_words.append(lemma)

print("Final output tokens:", final_words)


Final output tokens: ['john', 'enjoy', 'play', 'football', 'mary', 'love', 'read', 'book']
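
The recorded output above does not include `library`, although it is a noun in the input. One plausible explanation (not verified here) is that the tagger ran on the stopword-filtered list and therefore lost sentence context. The sketch below shows an alternative ordering: tag the full sentence first, then filter. The names `penn_to_wordnet`, `nouns_and_verbs`, and `run_with_nltk` are hypothetical helpers introduced for illustration, and `run_with_nltk` assumes the NLTK resources downloaded in the cell above are already installed.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag to a WordNet POS code; only nouns and verbs are kept."""
    if tag.startswith("N"):
        return "n"
    if tag.startswith("V"):
        return "v"
    return None


def nouns_and_verbs(
    tagged: Iterable[Tuple[str, str]],
    stop_words: Set[str],
    lemmatize: Callable[[str, str], str],
) -> List[str]:
    """Filter (word, tag) pairs that were tagged on the FULL sentence."""
    kept = []
    for word, tag in tagged:
        pos = penn_to_wordnet(tag)
        # Drop non-nouns/verbs, punctuation and numbers, and stopwords.
        if pos is None or not word.isalpha() or word.lower() in stop_words:
            continue
        kept.append(lemmatize(word.lower(), pos))
    return kept


def run_with_nltk(text: str) -> List[str]:
    """Assumption: punkt, the perceptron tagger, wordnet and stopwords are installed."""
    from nltk import pos_tag, word_tokenize
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    tagged = pos_tag(word_tokenize(text))  # tag before removing anything
    return nouns_and_verbs(
        tagged,
        set(stopwords.words("english")),
        WordNetLemmatizer().lemmatize,
    )
```

Calling `run_with_nltk(text)` on the input sentence would exercise this ordering. The result depends on the installed tagger model, so no expected output is given here.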


### Q2. Named Entity Recognition (NER) and Pronoun Ambiguity Detection

Use Python and any NLP model to perform:

1. Named Entity Recognition (NER)  
2. Pronoun ambiguity detection using the following rule:  
   - If the text contains a pronoun ("he", "she", "they"), print:  
     **"Warning: Possible pronoun ambiguity detected!"**

**Input text:**  
"Chris met Alex at Apple headquarters in California. He told him about the new iPhone launch."


In [None]:
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Chris met Alex at Apple headquarters in California. He told him about the new iPhone launch."

# 1. Named Entity Recognition (NER)
doc = nlp(text)

print("Named Entities:")
for ent in doc.ents:
    print(f"{ent.text}  -->  {ent.label_}")

# 2. Pronoun ambiguity detection (rule-based: any of "he", "she", "they")
pronouns = {"he", "she", "they"}
words = {token.lower_ for token in doc}  # lowercased token texts

if pronouns & words:
    print("\nWarning: Possible pronoun ambiguity detected!")


Named Entities:
Chris  -->  PERSON
Alex  -->  PERSON
Apple  -->  ORG
California  -->  GPE
iPhone  -->  ORG
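
Because the pronoun rule in the question is purely lexical, it can also be checked without a model. The following is a minimal sketch using only the standard library; `PRONOUN_PATTERN` and `find_rule_pronouns` are hypothetical names introduced here. Word boundaries keep the pattern from matching inside words such as "the" or "headquarters", and "him" is deliberately not matched because the rule lists only "he", "she", and "they".

```python
import re
from typing import List

# Whole-word, case-insensitive match for the three pronouns named in the rule.
PRONOUN_PATTERN = re.compile(r"\b(he|she|they)\b", re.IGNORECASE)


def find_rule_pronouns(text: str) -> List[str]:
    """Return the lowercased rule pronouns found in `text`, in order of appearance."""
    return [match.group(0).lower() for match in PRONOUN_PATTERN.finditer(text)]


text = (
    "Chris met Alex at Apple headquarters in California. "
    "He told him about the new iPhone launch."
)

if find_rule_pronouns(text):
    print("Warning: Possible pronoun ambiguity detected!")
```

This only flags the presence of a pronoun, as the rule asks. Deciding whether "He" refers to Chris or Alex would require coreference resolution, which is outside the rule.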

