
In [None]:
#4a. Named Entity Recognition
from tensorflow.keras.datasets import imdb
# Load IMDb dataset with limited vocabulary
(train_data, _), _ = imdb.load_data(num_words=1000)
# Obtain word index dictionary
index_map = imdb.get_word_index()
# Create reverse mapping to decode integers to words
index_to_word = {}
for word, index in index_map.items():
    index_to_word[index + 3] = word
index_to_word[0] = "<PAD>"
index_to_word[1] = "<START>"
index_to_word[2] = "<UNK>"
# Function to convert encoded review into text
def convert_to_text(encoded_review):
    decoded_words = []
    for num in encoded_review:
        decoded_words.append(index_to_word.get(num, "?"))
    return " ".join(decoded_words)
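# Simple rule-based heuristic (not a trained NER model):
# a token is flagged as an entity if it is all uppercase or at least 8 characters long.
# Note that placeholders such as <START> and <UNK> are uppercase, so they are flagged too.
def identify_entities(sentence):
    print("\nNamed Entity Recognition Output:\n")
    for token in sentence.split():
        if token.isupper() or len(token) >= 8:
            print(token, "-> Entity")
        else:
            print(token, "-> Non-Entity")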
# ------MAIN--------
review_id = int(input("Enter IMDb review number (0–24999): "))
# Decode selected review
decoded_review = convert_to_text(train_data[review_id])
# Display partial review
print("\nSelected IMDb Review (partial):\n")
print(decoded_review[:300], "...")
# Perform Named Entity Recognition
identify_entities(decoded_review)


Enter IMDb review number (0–24999): 8

Selected IMDb Review (partial):

<START> just got out and cannot believe what a brilliant documentary this is <UNK> do you <UNK> out of a movie theater in such <UNK> and <UNK> <UNK> movies have become so over <UNK> that the <UNK> of <UNK> something truly special and unique <UNK> happens <UNK> <UNK> did this to me when it first came ...

Named Entity Recognition Output:

<START> -> Entity
just -> Non-Entity
got -> Non-Entity
out -> Non-Entity
and -> Non-Entity
cannot -> Non-Entity
believe -> Non-Entity
what -> Non-Entity
a -> Non-Entity
brilliant -> Entity
documentary -> Entity
this -> Non-Entity
is -> Non-Entity
<UNK> -> Entity
do -> Non-Entity
you -> Non-Entity
<UNK> -> Entity
out -> Non-Entity
of -> Non-Entity
a -> Non-Entity
movie -> Non-Entity
theater -> Non-Entity
in -> Non-Entity
such -> Non-Entity
<UNK> -> Entity
and -> Non-Entity
<UNK> -> Entity
<UNK> -> Entity
movies -> Non-Entity
have -> Non-Entity
become -> Non-Entity
so -> Non-Entity
o
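
The output above tags `<START>` and `<UNK>` as entities only because they are uppercase placeholders. A minimal sketch of one way to avoid that, assuming the same length/uppercase heuristic, is to skip the placeholders before tagging. The names `SPECIAL_TOKENS`, `tag_tokens`, and `min_length` are hypothetical and not part of the original cell; this is still a heuristic, not real NER.

```python
# Hypothetical helper: the set mirrors the placeholders defined in the cell above.
SPECIAL_TOKENS = {"<PAD>", "<START>", "<UNK>"}

def tag_tokens(sentence, min_length=8):
    """Return (token, label) pairs, skipping dataset placeholders."""
    tagged = []
    for token in sentence.split():
        if token in SPECIAL_TOKENS:
            continue  # placeholders carry no entity information
        is_entity = token.isupper() or len(token) >= min_length
        tagged.append((token, "Entity" if is_entity else "Non-Entity"))
    return tagged

sample = "<START> just got out and cannot believe what a brilliant documentary this is <UNK>"
for token, label in tag_tokens(sample):
    print(token, "->", label)
```

Returning pairs instead of printing inside the function also makes the tagger easier to check on small inputs.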

In [None]:
#4b. Text summarization
from tensorflow.keras.datasets import imdb
from collections import Counter
# Load IMDb dataset
(X_train, _), _ = imdb.load_data(num_words=1000)
# Get word index
word_index = imdb.get_word_index()
# Reverse mapping with offset handling
reverse_word_index = {value + 3: key for key, value in word_index.items()}
reverse_word_index[0] = "<PAD>"
reverse_word_index[1] = "<START>"
reverse_word_index[2] = "<UNK>"
# Decode review function
def decode_review(review):
    return " ".join([reverse_word_index.get(i, "?") for i in review])
# -------- USER INPUT --------
review_no = int(input("Enter IMDb review number (0–24999): "))
review_text = decode_review(X_train[review_no])
words = review_text.split()
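# Frequency-based keyword summary: join the 15 most frequent tokens
# (stop words and <UNK> are counted too, so they tend to dominate the result)
freq = Counter(words)
summary = " ".join([word for word, _ in freq.most_common(15)])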
# Output
print("\nSelected IMDb Review (partial):\n")
print(review_text[:300], "...")
print("\nSummary:\n")
print(summary)

Enter IMDb review number (0–24999): 18

Selected IMDb Review (partial):

<START> i have only had the <UNK> of seeing this movie once when i was rather young so much of the movie is <UNK> in trying to remember it however i can say it was not as funny as a movie called killer <UNK> should have been and the most memorable things from this movie are the song and the scene wi ...

Summary:

<UNK> a the movie and to of this as i have was is however not
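
The summary above is dominated by `<UNK>` and function words because every token is counted. A rough sketch of a variant that filters those out first is shown below. The stop-word list is a small assumed set for illustration only, and `top_keywords` is a hypothetical name; a fuller list would be needed in practice.

```python
from collections import Counter

# Assumed, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "and", "to", "of", "this", "as", "i", "it",
              "have", "was", "is", "in", "so", "not", "however"}
SPECIAL_TOKENS = {"<PAD>", "<START>", "<UNK>"}

def top_keywords(text, k=5):
    """Return the k most frequent tokens that are neither stop words nor placeholders."""
    words = [w for w in text.split()
             if w not in STOP_WORDS and w not in SPECIAL_TOKENS]
    return [word for word, _ in Counter(words).most_common(k)]

sample = "<START> the movie was funny and the movie was funny movie song <UNK>"
print(top_keywords(sample, k=2))  # -> ['movie', 'funny']
```

This still yields a bag of keywords rather than sentences, so it is closer to keyword extraction than to a readable summary.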


In [None]:
#4c. Question Answering
import tensorflow as tf
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
vocab_size = 1000
max_len = 100
# Load IMDb data
(X_train, y_train), _ = imdb.load_data(num_words=vocab_size)
X_train = pad_sequences(X_train, maxlen=max_len)
# Build model
model = Sequential([
    Embedding(vocab_size, 64, input_length=max_len),
    GlobalAveragePooling1D(),
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid")
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)
model.fit(X_train, y_train, epochs=2, batch_size=128, validation_split=0.2)
# ---- Encode user input the same way load_data encodes reviews ----
word_index = imdb.get_word_index()
def encode_review(text):
    encoded = []
    for word in text.lower().split():
        idx = word_index.get(word)
        # Word ranks are shifted by 3 because ids 0-2 are reserved
        if idx is not None and idx + 3 < vocab_size:
            encoded.append(idx + 3)
        else:
            # <UNK>: word is unseen or falls outside the top vocab_size ids
            encoded.append(2)
    return pad_sequences([encoded], maxlen=max_len)
# User input
user_review = input("\nEnter a movie review: ")
encoded_review = encode_review(user_review)
prediction = model.predict(encoded_review)[0][0]
print("\nQuestion: Is the review positive?")
if prediction >= 0.6:
    print("Answer: Yes (Positive)")
else:
    print("Answer: No (Negative)")

Epoch 1/2
157/157 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6200 - loss: 0.6676 - val_accuracy: 0.7826 - val_loss: 0.5001
Epoch 2/2
157/157 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8000 - loss: 0.4585 - val_accuracy: 0.8078 - val_loss: 0.4102

Enter a movie review: movie is very good
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step

Question: Is the review positive?
Answer: Yes (Positive)
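
For the prediction to be meaningful, free text has to be encoded with the same id scheme the training data uses. The sketch below illustrates that scheme without TensorFlow, assuming the default `imdb.load_data` settings (ids 0, 1, 2 reserved for padding, start, and unknown, and word ranks shifted by 3). `toy_word_index` is a made-up stand-in for `imdb.get_word_index()`, and `encode` is a hypothetical helper.

```python
# Toy stand-in for imdb.get_word_index(); real ranks come from the dataset.
toy_word_index = {"the": 1, "movie": 2, "good": 3, "zeitgeist": 5000}

INDEX_OFFSET = 3          # ids 0, 1, 2 are reserved for <PAD>, <START>, <UNK>
START_ID, UNK_ID = 1, 2

def encode(text, word_index, vocab_size):
    """Map text to ids: start marker first, <UNK> for unseen or out-of-range words."""
    ids = [START_ID]
    for word in text.lower().split():
        rank = word_index.get(word)
        idx = rank + INDEX_OFFSET if rank is not None else UNK_ID
        ids.append(idx if idx < vocab_size else UNK_ID)
    return ids

print(encode("The movie zeitgeist unknownword", toy_word_index, vocab_size=1000))
# -> [1, 4, 5, 2, 2]
```

Mapping rare words to `<UNK>` instead of dropping them keeps the sequence length and word positions comparable to the training reviews.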
