# **Experiment-12**

### Objective:
Write a program that reads text data from a file, pre-processes it, performs Word Sense Disambiguation, and lists the synonyms, antonyms, hypernyms and hyponyms of every word as obtained from the lexical ontology WordNet.

In [1]:
%pip install nltk pandas





In [2]:
# Import necessary libraries
import nltk
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.wsd import lesk
from nltk.stem import WordNetLemmatizer
import re

# Download required NLTK data files
nltk.download('punkt', download_dir="C:/nltk_data")
nltk.download('stopwords', download_dir="C:/nltk_data")
nltk.download('wordnet', download_dir="C:/nltk_data")    

[nltk_data] Downloading package punkt to C:/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to C:/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [3]:
def read_text_file(file_path):
    """Read text data from the specified file."""
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
    return text

# Specify the file path
file_path = 'input.txt'

# Read the content of input.txt
text = read_text_file(file_path)
print("Content of input.txt:")
print(text)

Content of input.txt:
Microsoft announced a new product launch event in New York City on December 1st, 2024. 
Elon Musk, the CEO of Tesla, hinted at a partnership with NASA for a Mars mission. 
The GDP of India grew by 7% in the first quarter of 2023, according to a report by Reuters. 
Amazon is planning to open a new data center in Dublin, Ireland next year.
Barack Obama gave a keynote speech at Stanford University last Thursday.
Bitcoin reached an all-time high of $68,000 in November 2021.
The Louvre Museum in Paris saw record-breaking attendance last summer.



In [4]:
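# Define a preprocessing function
def preprocess_text(text):
    """Lowercase, strip punctuation, tokenize, drop stopwords, and lemmatize."""
    # Convert to lowercase
    text = text.lower()
    # Remove punctuation and special characters
    # (this also fuses hyphenated words, e.g. "all-time" -> "alltime")
    text = re.sub(r'[^\w\s]', '', text)
    # Tokenize the text
    tokens = word_tokenize(text)
    # Remove stopwords (build the set once instead of reloading the list per token)
    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]
    # Lemmatize the tokens (default POS is noun, so "mars" becomes "mar")
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return tokens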

# Preprocess the text
tokens = preprocess_text(text)
print("Preprocessed Tokens:", tokens)

Preprocessed Tokens: ['microsoft', 'announced', 'new', 'product', 'launch', 'event', 'new', 'york', 'city', 'december', '1st', '2024', 'elon', 'musk', 'ceo', 'tesla', 'hinted', 'partnership', 'nasa', 'mar', 'mission', 'gdp', 'india', 'grew', '7', 'first', 'quarter', '2023', 'according', 'report', 'reuters', 'amazon', 'planning', 'open', 'new', 'data', 'center', 'dublin', 'ireland', 'next', 'year', 'barack', 'obama', 'gave', 'keynote', 'speech', 'stanford', 'university', 'last', 'thursday', 'bitcoin', 'reached', 'alltime', 'high', '68000', 'november', '2021', 'louvre', 'museum', 'paris', 'saw', 'recordbreaking', 'attendance', 'last', 'summer']
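
The token list above contains `alltime` and `recordbreaking`: deleting every punctuation character also deletes the hyphen that separated the parts of each compound. The following is a minimal sketch of that effect and of one possible alternative, using only `re` and a plain whitespace split in place of `word_tokenize` (an assumption made to keep the example self-contained). The function names are hypothetical and are not part of the notebook.

```python
import re

def strip_punctuation(text):
    """Mirror the notebook's step: delete every non-word, non-space character."""
    return re.sub(r'[^\w\s]', '', text.lower())

def strip_punctuation_keep_boundaries(text):
    """Alternative: turn hyphens into spaces first so compound parts stay separate."""
    text = text.lower().replace('-', ' ')
    return re.sub(r'[^\w\s]', '', text)

sample = "Bitcoin reached an all-time high of $68,000."

# Hyphen removed outright, so the two halves fuse into one token
print(strip_punctuation(sample).split())
# Hyphen replaced by a space, so 'all' and 'time' survive as separate tokens
print(strip_punctuation_keep_boundaries(sample).split())
```

Whether split compounds are preferable depends on the downstream step: `all` and `time` can be looked up in WordNet, whereas `alltime` cannot, but splitting also loses the fact that the two words formed one unit.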


In [5]:
# Define a function for Word Sense Disambiguation and extracting lexical relations
def analyze_word(word, context):
    # Perform Word Sense Disambiguation using the Lesk algorithm
    sense = lesk(context, word)
    
    # Initialize sets for synonyms, antonyms, hypernyms, and hyponyms
    synonyms = set()
    antonyms = set()
    hypernyms = set()
    hyponyms = set()

    if sense:
        # Extract synonyms
        for lemma in sense.lemmas():
            synonyms.add(lemma.name())
            # Extract antonyms
            if lemma.antonyms():
                antonyms.update([ant.name() for ant in lemma.antonyms()])
        
        # Extract hypernyms and hyponyms
        hypernyms.update([hypernym.name() for hypernym in sense.hypernyms()])
        hyponyms.update([hyponym.name() for hyponym in sense.hyponyms()])

        # Display the analysis
        print(f"\nWord: {word}")
        print(f"Identified Sense: {sense.definition()}")
        print(f"Synonyms: {', '.join(synonyms) if synonyms else 'None'}")
        print(f"Antonyms: {', '.join(antonyms) if antonyms else 'None'}")
        print(f"Hypernyms: {', '.join(hypernyms) if hypernyms else 'None'}")
        print(f"Hyponyms: {', '.join(hyponyms) if hyponyms else 'None'}")
    else:
        print(f"\nWord: {word}")
        print("No identified sense found.")
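
Lesk-style disambiguation is a gloss-overlap method: broadly, each candidate sense is scored by how many words its dictionary definition shares with the context, and the best-scoring sense wins. The sketch below illustrates that idea in plain Python. It is not NLTK's implementation, and the `simplified_lesk` helper and the toy glosses are hypothetical stand-ins for WordNet synsets.

```python
def simplified_lesk(context_tokens, sense_glosses):
    """Return the sense name whose gloss shares the most words with the context.

    sense_glosses is a hypothetical mapping of sense name -> gloss text.
    Ties go to the first sense seen; an empty mapping yields None.
    """
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy sense inventory (invented glosses, for illustration only)
toy_senses = {
    "bank.river": "sloping land beside a body of water",
    "bank.finance": "a financial institution that accepts deposits and lends money",
}

print(simplified_lesk(["she", "deposits", "money", "at", "the", "bank"], toy_senses))
print(simplified_lesk(["fishing", "beside", "the", "water"], toy_senses))
```

Because the score is a raw word overlap, the choice of context matters a great deal. Short, stopword-free contexts often share few words with any gloss, which may help explain unexpected senses. `nltk.wsd.lesk` also accepts an optional `pos` argument that restricts the candidate synsets to one part of speech.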

In [6]:
# Analyze each word in the preprocessed tokens
context = tokens  # Use the entire token list as context
for word in tokens:
    analyze_word(word, context)


Word: microsoft
No identified sense found.

Word: announced
Identified Sense: declared publicly; made widely known
Synonyms: announced, proclaimed
Antonyms: None
Hypernyms: None
Hyponyms: None

Word: new
Identified Sense: (of a new kind or fashion) gratuitously new
Synonyms: new, newfangled
Antonyms: None
Hypernyms: None
Hyponyms: None

Word: product
Identified Sense: a consequence of someone's efforts or of a particular set of circumstances
Synonyms: product
Antonyms: None
Hypernyms: consequence.n.01
Hyponyms: None

Word: launch
Identified Sense: launch for the first time; launch on a maiden voyage
Synonyms: launch
Antonyms: None
Hypernyms: launch.v.05
Hyponyms: None

Word: event
Identified Sense: a phenomenon located at a single point in space-time; the fundamental observational entity in relativity theory
Synonyms: event
Antonyms: None
Hypernyms: physical_phenomenon.n.01
Hyponyms: None

Word: new
Identified Sense: (of a new kind or fashion) gratuitously new
Synonyms: new, newfangled
Antonyms: None
Hypernyms: None
Hyponyms: None

Word: york
Identified Sense: the English royal house (a branch of the Plantagenet line) that reigned from 1461 to 1485; its emblem was a white rose
Synonyms: York, House_of_York
Antonyms: None
Hypernyms: royalty.n.02, dynasty.n.01
Hyponyms: None

Word: city
Identified Sense: people living 