# Opinion Mining

Sentiment analysis, often referred to as opinion mining, is a common natural language processing (NLP) task that classifies the sentiment expressed in a piece of text, typically as positive, negative, or neutral. In this example we focus on binary (positive versus negative) sentiment classification of reviews, with the aim of applying the same approach to Amazon or Netflix reviews.

We'll train on a popular sentiment analysis benchmark: the IMDb movie reviews dataset, which ships with Keras as `tf.keras.datasets.imdb`. It contains 50,000 reviews labelled positive or negative, split into 25,000 for training and 25,000 for testing. Labelled Amazon or Netflix review data are not bundled with the library, so IMDb serves as a stand-in: its reviews are similar in structure, and the same approach can be adapted to them.


**Adaptation for Amazon or Netflix Reviews**

To adapt this code for Amazon or Netflix reviews:

- Data Acquisition: Obtain a dataset of Amazon or Netflix reviews. These datasets can often be found on websites like Kaggle or through APIs.
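- Preprocessing: The Keras IMDb data arrives already encoded as integer word indices, but raw Amazon or Netflix reviews are plain text. You will need to tokenize them (for example with Keras's `Tokenizer`), convert them to integer sequences, and pad them to a fixed length as done for IMDb. Be mindful of the dataset's structure and content; for example, star ratings have to be converted into sentiment labels.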
- Model Tuning: Depending on the dataset's complexity and nuances, you might need to adjust the model architecture, hyperparameters, or even consider more advanced models like Transformers.
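The preprocessing step for raw reviews can be sketched in plain Python. This is a minimal illustration under stated assumptions: mapping 1-2 stars to negative and 4-5 stars to positive while dropping 3-star reviews is one common convention, not something the datasets prescribe, and the helper names (`star_to_label`, `tokenize`, `build_vocab`, `encode`) and sample reviews are hypothetical. The frequency-ranked vocabulary mirrors the idea behind Keras's `Tokenizer(num_words=...)` (index 0 reserved for padding, 1 for the most frequent word) but is not its implementation.

```python
import re
from collections import Counter

def star_to_label(stars):
    """Map a 1-5 star rating to a binary label (assumed convention).

    1-2 stars -> 0 (negative), 4-5 stars -> 1 (positive), 3 stars -> None (dropped).
    """
    if stars <= 2:
        return 0
    if stars >= 4:
        return 1
    return None

def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(texts, num_words):
    """Rank words by frequency; 0 is reserved for padding, 1 is the most frequent word.

    Only indices below num_words are assigned, similar in spirit to Tokenizer(num_words=...).
    """
    counts = Counter(word for text in texts for word in tokenize(text))
    ranked = [word for word, _ in counts.most_common(num_words - 1)]
    return {word: i + 1 for i, word in enumerate(ranked)}

def encode(text, vocab):
    """Replace each known word by its index; unknown words are skipped."""
    return [vocab[w] for w in tokenize(text) if w in vocab]

# Hypothetical raw reviews as (text, star rating) pairs
raw_reviews = [
    ("Great phone, great battery", 5),
    ("Battery died after a week. Terrible", 1),
    ("It is okay", 3),
]
labelled = [(text, star_to_label(stars)) for text, stars in raw_reviews
            if star_to_label(stars) is not None]
vocab = build_vocab([text for text, _ in labelled], num_words=5000)
sequences = [encode(text, vocab) for text, _ in labelled]
labels = [label for _, label in labelled]
print(sequences, labels)
```

The resulting integer sequences still vary in length, so they would go through `pad_sequences` exactly like the IMDb data before training.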




**Setup and Data Loading**

```python
# Import necessary libraries
import numpy as np
import tensorflow as tf
# Tokenizer is not needed for the pre-encoded IMDb data; it is useful for raw review text
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
from tensorflow.keras.datasets import imdb

# Set parameters for the dataset and model
max_features = 5000   # vocabulary size: only word indices below 5,000 (the most frequent words) are kept
maxlen = 100          # fixed review length in tokens (longer reviews are truncated, shorter ones padded)
embedding_size = 64   # dimension of each word embedding vector

# Load the IMDb dataset (reviews arrive already encoded as lists of word indices)
print("Loading data...")
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Pad or truncate every review to exactly maxlen tokens.
# By default pad_sequences pads and truncates at the start ('pre'),
# so long reviews keep their LAST maxlen tokens.
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
```



```
Loading data...
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
x_train shape: (25000, 100)
x_test shape: (25000, 100)
```
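The `maxlen` truncation is easy to misread: with its defaults (`padding='pre'`, `truncating='pre'`, `value=0`), `pad_sequences` removes tokens from the start of a long review and adds zeros to the start of a short one, so each review keeps its last 100 tokens. The plain-Python function below is a hypothetical helper (`pad_pre`) that reproduces that default behaviour for illustration only; it is not the Keras implementation.

```python
def pad_pre(sequences, maxlen, value=0):
    """Illustrate pad_sequences' defaults: pad and truncate at the start ('pre')."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]                   # keep the LAST maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # left-pad with `value`
    return padded

print(pad_pre([[1, 2, 3, 4, 5, 6], [7, 8]], maxlen=4))
# [[3, 4, 5, 6], [0, 0, 7, 8]]
```

Passing `truncating='post'` and/or `padding='post'` to `pad_sequences` switches to keeping the first `maxlen` tokens and/or padding at the end instead.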


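**Model Building**

```python
# Build an LSTM classifier
model = Sequential()
# Map each word index to a dense embedding_size-dimensional vector
# (input_length is deprecated in Keras 3 and can be omitted there)
model.add(Embedding(max_features, embedding_size, input_length=maxlen))
# Drop entire embedding channels (not individual values) to reduce overfitting
model.add(SpatialDropout1D(0.2))
# 100-unit LSTM with dropout on its inputs and on its recurrent state
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
# Single sigmoid unit: estimated probability that the review is positive
model.add(Dense(1, activation='sigmoid'))

# Binary cross-entropy matches the 0/1 labels and the sigmoid output
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```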
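As a sanity check on the architecture, the trainable parameter counts can be worked out by hand; `model.summary()` should report the same total. The sketch assumes the standard Keras LSTM with biases enabled, whose four gates each have an input kernel, a recurrent kernel, and a bias; `SpatialDropout1D` has no weights. The helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel, and a bias vector
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit
    return input_dim * units + units

emb = embedding_params(5000, 64)   # max_features, embedding_size
lstm = lstm_params(64, 100)        # embedding_size -> 100 LSTM units
dense = dense_params(100, 1)       # LSTM output -> single sigmoid unit
print(emb, lstm, dense, emb + lstm + dense)
# 320000 66000 101 386101
```

Most of the capacity sits in the embedding table, so `max_features` and `embedding_size` are the main levers on model size.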


**Training the Model**

```python
# Train for 5 epochs, holding out the last 20% of the training data for validation
batch_size = 32
history = model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.2)
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```
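With `validation_split=0.2`, Keras holds out the last 20% of `x_train` (taken before shuffling) as validation data, so the model actually trains on fewer than the 25,000 loaded reviews. The arithmetic below is a sketch of how many reviews and batches that leaves per epoch; it assumes the split index is rounded down, and the helper name `split_sizes` is hypothetical.

```python
import math

def split_sizes(n_samples, validation_split, batch_size):
    # Training samples come first; validation samples are taken from the end
    n_train = int(math.floor(n_samples * (1.0 - validation_split)))
    n_val = n_samples - n_train
    steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be partial
    return n_train, n_val, steps_per_epoch

print(split_sizes(25000, 0.2, 32))
# (20000, 5000, 625)
```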

**Evaluation**

```python
# Evaluate on the held-out test set.
# evaluate() returns [loss, accuracy]; "score" is the binary cross-entropy loss.
score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
```

```
Test score: 0.4344063699245453
Test accuracy: 0.8358799815177917
```
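The two numbers returned by `model.evaluate` can be reproduced from predicted probabilities. The NumPy sketch below is a hypothetical re-derivation, not Keras code: it computes mean binary cross-entropy (the "Test score") and binary accuracy with a 0.5 threshold, which is how the `'accuracy'` metric is interpreted for a single sigmoid output trained with `binary_crossentropy`. The clipping constant `eps=1e-7` is assumed to match Keras's default epsilon; the labels and probabilities are made up.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of reviews whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Hypothetical labels and sigmoid outputs for four reviews
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
print(binary_crossentropy(y_true, y_prob))  # about 0.540
print(binary_accuracy(y_true, y_prob))      # 0.5
```

In the notebook, `y_prob` would come from `model.predict(x_test).ravel()`.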


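On the IMDb test set, this model reaches about 83.6% accuracy with a test loss of about 0.434. The same pipeline (encoding, padding, LSTM training, and evaluation) can be adapted to Amazon or Netflix reviews by swapping in their data and preprocessing as described above.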
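To score a new review (for example an Amazon or Netflix review) with the IMDb-trained model, the text must be encoded the same way `imdb.load_data` encoded the training data. With its defaults (`start_char=1`, `oov_char=2`, `index_from=3`), each word's rank from `imdb.get_word_index()` is shifted by 3, every review starts with 1, and words outside the `num_words` vocabulary become 2. The sketch below assumes those defaults and a simple lowercase regex tokenizer, which only approximates how the original IMDb text was tokenized; `encode_review` and the toy `word_index` are hypothetical.

```python
import re

# Defaults of tf.keras.datasets.imdb.load_data (assumed unchanged in the notebook)
START_CHAR, OOV_CHAR, INDEX_FROM = 1, 2, 3

def encode_review(text, word_index, num_words=5000, maxlen=100):
    """Encode raw text like imdb.load_data(num_words=...), then pad like pad_sequences."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # simplified tokenizer (assumption)
    ids = [START_CHAR]
    for word in words:
        rank = word_index.get(word)
        if rank is not None and rank + INDEX_FROM < num_words:
            ids.append(rank + INDEX_FROM)   # known, in-vocabulary word
        else:
            ids.append(OOV_CHAR)            # unknown or too rare
    if len(ids) > maxlen:
        ids = ids[len(ids) - maxlen:]       # keep the last maxlen tokens ('pre' truncation)
    return [0] * (maxlen - len(ids)) + ids  # left-pad with zeros ('pre' padding)

# Toy word index standing in for imdb.get_word_index()
toy_index = {"the": 1, "movie": 17, "great": 84, "zzz": 6000}
print(encode_review("The movie was GREAT, zzz!", toy_index, maxlen=8))
# [0, 0, 1, 4, 20, 2, 87, 2]
```

With the real index, `model.predict(np.array([encode_review(text, imdb.get_word_index())]))` returns the estimated probability that the review is positive. Because the vocabulary comes from movie reviews, product-specific words may map to the out-of-vocabulary token, which is one reason to retrain on the target data as the adaptation steps suggest.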