<a href="https://colab.research.google.com/github/shanaka-desoysa/notes/blob/main/docs/Python/Deep_Learning/Sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis

Sentiment analysis is a natural language processing (NLP) technique that classifies text as positive, negative, or neutral. Businesses often apply it to customer feedback to monitor brand and product sentiment and to understand customer needs.

## BERT Model
- Split data: divide the dataset into training and validation sets.
- Training: fit the model on the training set.
- Evaluation: score the model on the validation set.

Here's how you can do it with the BERT model, assuming `data`, `tokenizer`, `model`, and `training_args` are already defined:

```python
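import torch
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from transformers import Trainer

# Assumes `data`, `tokenizer`, `model`, and `training_args` are already defined.
# Assumption: each item in `data` stores its class id (0-3) under 'label'.
texts = [d['text'] for d in data]
labels = [d['label'] for d in data]

# Split the data
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

# Tokenize the data
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)

class SentimentDataset(torch.utils.data.Dataset):
    """Pairs tokenizer encodings with labels (torch's Dataset base class holds no data itself)."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

# Wrap the encodings and labels as torch datasets
train_dataset = SentimentDataset(train_encodings, train_labels)
val_dataset = SentimentDataset(val_encodings, val_labels)

# Train the model
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=val_dataset)
trainer.train()

# Evaluate the model: predict() returns an object whose .predictions field holds the logits
predictions = trainer.predict(val_dataset)
pred_labels = predictions.predictions.argmax(axis=1)

# Print classification report (labels= keeps all four classes even if one is absent from the split)
print(classification_report(val_labels, pred_labels, labels=[0, 1, 2, 3],
                            target_names=['positive', 'negative', 'neutral', 'ambiguous'], zero_division=0))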
```
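The evaluation step above reduces the model's per-class scores to one class id per text by taking the argmax along axis 1. The following is a minimal pure-Python sketch of that step, not the notebook's actual code path. The helper `logits_to_labels` and the numbers in `example_logits` are made up for illustration; the label order is taken from the `target_names` list used in the report.

```python
# Label order taken from the target_names list in the classification report.
LABEL_NAMES = ['positive', 'negative', 'neutral', 'ambiguous']


def logits_to_labels(logits):
    """Return the index of the largest score in each row (argmax over axis 1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Hypothetical scores for three validation texts (illustrative values only).
example_logits = [
    [2.1, -0.3, 0.4, -1.0],
    [-0.5, 1.7, 0.2, 0.1],
    [0.0, 0.1, 0.9, 0.3],
]

pred_ids = logits_to_labels(example_logits)
pred_names = [LABEL_NAMES[i] for i in pred_ids]
print(pred_ids)    # [0, 1, 2]
print(pred_names)  # ['positive', 'negative', 'neutral']
```

Because argmax only compares scores within a row, it gives the same class whether the model outputs raw logits or softmax probabilities.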

## LSTM Model
- Split data: divide the dataset into training and validation sets.
- Training: fit the model on the training set.
- Evaluation: score the model on the validation set.

Here's how you can do it with the LSTM model, assuming `data`, a fitted Keras `tokenizer`, and a compiled `model` are already defined:

```python
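import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes `data`, a fitted Keras `tokenizer`, and a compiled `model` are already defined.
# Assumption: each item in `data` stores its class id (0-3) under 'label', and the model
# was compiled with a loss that accepts integer labels (e.g. sparse_categorical_crossentropy).
texts = [d['text'] for d in data]
labels = [d['label'] for d in data]

# Split the data
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
train_labels, val_labels = np.array(train_labels), np.array(val_labels)  # Keras expects arrays

# Tokenize and pad the data
train_sequences = tokenizer.texts_to_sequences(train_texts)
val_sequences = tokenizer.texts_to_sequences(val_texts)
train_padded = pad_sequences(train_sequences, maxlen=50)
val_padded = pad_sequences(val_sequences, maxlen=50)

# Train the model
model.fit(train_padded, train_labels, epochs=5, batch_size=2, validation_data=(val_padded, val_labels))

# Evaluate the model: pick the highest-scoring class for each sample
pred_labels = model.predict(val_padded).argmax(axis=1)

# Print classification report (labels= keeps all four classes even if one is absent from the split)
print(classification_report(val_labels, pred_labels, labels=[0, 1, 2, 3],
                            target_names=['positive', 'negative', 'neutral', 'ambiguous'], zero_division=0))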
```
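The `pad_sequences(..., maxlen=50)` calls above force every token sequence to the same length. Assuming this is the Keras utility with its default settings, short sequences are padded with zeros at the front and long ones lose tokens from the front. The sketch below mimics that behaviour in pure Python on toy data; `pad_pre` is a hypothetical helper written for illustration, not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Pad or truncate each sequence at the front so all have length `maxlen` (maxlen >= 1)."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # keep only the last `maxlen` tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


# Toy token-id sequences (illustrative values only).
example_sequences = [[4, 12, 7], [9, 3, 15, 8, 2, 6]]
print(pad_pre(example_sequences, maxlen=5))
# [[0, 0, 4, 12, 7], [3, 15, 8, 2, 6]]
```

Padding at the front keeps the real tokens closest to the final time step of a recurrent model, which is the state typically fed to the classifier head.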

## RNN Model
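- Split data: divide the dataset into training and validation sets.
- Training: fit the model on the training set.
- Evaluation: score the model on the validation set.

Here's how you can do it with the RNN model, assuming `data`, a fitted Keras `tokenizer`, and a compiled `model` are already defined. The pipeline is the same as for the LSTM; only `model` differs: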

```python
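import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes `data`, a fitted Keras `tokenizer`, and a compiled `model` are already defined.
# Assumption: each item in `data` stores its class id (0-3) under 'label', and the model
# was compiled with a loss that accepts integer labels (e.g. sparse_categorical_crossentropy).
texts = [d['text'] for d in data]
labels = [d['label'] for d in data]

# Split the data
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
train_labels, val_labels = np.array(train_labels), np.array(val_labels)  # Keras expects arrays

# Tokenize and pad the data
train_sequences = tokenizer.texts_to_sequences(train_texts)
val_sequences = tokenizer.texts_to_sequences(val_texts)
train_padded = pad_sequences(train_sequences, maxlen=50)
val_padded = pad_sequences(val_sequences, maxlen=50)

# Train the model
model.fit(train_padded, train_labels, epochs=5, batch_size=2, validation_data=(val_padded, val_labels))

# Evaluate the model: pick the highest-scoring class for each sample
pred_labels = model.predict(val_padded).argmax(axis=1)

# Print classification report (labels= keeps all four classes even if one is absent from the split)
print(classification_report(val_labels, pred_labels, labels=[0, 1, 2, 3],
                            target_names=['positive', 'negative', 'neutral', 'ambiguous'], zero_division=0))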
```
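
To help read the classification report, here is a minimal pure-Python sketch of the per-class precision, recall, F1, and support it lists. It illustrates the standard definitions and is not scikit-learn's implementation. `per_class_metrics` is a hypothetical helper, and `y_true` and `y_pred` are made-up labels, not results from any model in this notebook.

```python
def per_class_metrics(y_true, y_pred, num_classes):
    """Compute precision, recall, F1, and support for each class id."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly predicted as c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly predicted as c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # missed instances of c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {'precision': precision, 'recall': recall, 'f1': f1, 'support': tp + fn}
    return report


# Made-up labels for six validation texts (0=positive, 1=negative, 2=neutral, 3=ambiguous).
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2]

report = per_class_metrics(y_true, y_pred, num_classes=4)
for class_id, scores in report.items():
    print(class_id, scores)
```

In this toy data, class 3 is never predicted, so its precision, recall, and F1 all fall back to 0.0. A very small validation set makes such gaps likely.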