<a href="https://colab.research.google.com/github/jcdumlao14/CustomSentimentAnalysis-HuggingFace-/blob/main/%F0%9F%A4%97BERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Custom Sentiment Analysis with Hugging Face 🤗 - BERT**

Custom sentiment analysis with Hugging Face involves taking a pre-trained transformer model such as BERT, RoBERTa, or DistilBERT, fine-tuning it on a custom dataset, and then using the fine-tuned model to predict the sentiment of new text inputs. The same workflow applies to any text classification task; this notebook uses an SMS dataset labeled `ham` or `spam`.

Hugging Face provides a wide range of pre-trained transformer models that can be fine-tuned for sentiment analysis tasks. Its Python library, `transformers`, handles model loading and tokenization, which keeps fine-tuning on a custom dataset short.

Using the transformers library, you can load a pre-trained transformer model, customize its configuration, add a classification head for sentiment analysis, and fine-tune the model on your custom dataset. Once the model is fine-tuned, you can use it to predict the sentiment of new text inputs.

Hugging Face also provides a hosting service called Hugging Face Spaces, which lets you put a demo app for your fine-tuned model online (for example, one built with Gradio or Streamlit). This makes it possible to share or integrate custom sentiment analysis without managing the infrastructure yourself.

# **1. Install the necessary libraries:**

This step installs the required Python libraries: Transformers, TensorFlow, and pandas. scikit-learn, used later for the train/test split, is also required and is typically preinstalled on Colab.

In [12]:
!pip install transformers


In [13]:
%%capture
!pip install tensorflow

In [14]:
%%capture
!pip install pandas

# **2. Import the necessary libraries and load the dataset:**

After installing the required libraries, the next step is to import them. The cell below imports pandas, TensorFlow, scikit-learn's `train_test_split`, and the Hugging Face `transformers` classes, then uses pandas to read the CSV file containing the data.


In [47]:
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, TextClassificationPipeline

# Load the dataset
df = pd.read_csv("/content/sms_spam.csv")
df.head()

| Unnamed: 0 | type | text |
|---|---|---|
| 0 | ham | Hope you are having a good week. Just checking in |
| 1 | ham | K..give back my thanks. |
| 2 | ham | Am also doing in cbe only. But have to pay. |
| 3 | spam | complimentary 4 STAR Ibiza Holiday or £10,000 ... |
| 4 | spam | okmail: Dear Dave this is your final notice to... |


# **3. Preprocess the dataset:**

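This step prepares the labels and splits the data. The `type` column is mapped to integers (`ham` to 0, `spam` to 1), and the dataset is split into training and test sets (80% / 20%). Tokenization, which converts the text into the numerical input IDs the model expects, happens later, just before fine-tuning.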

In [48]:
# Convert ham to 0 and spam to 1
df["type"] = df["type"].map({"ham": 0, "spam": 1})

# Split the dataset into training and testing sets
train_texts, test_texts, train_labels, test_labels = train_test_split(df["text"], df["type"], test_size=0.2, random_state=42)
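
Two possible refinements to this step, sketched under assumptions: `map` silently turns any label other than `ham` or `spam` into `NaN`, and a purely random split can skew the class ratio when one class is rare. The example below uses a small made-up DataFrame (`toy_df`, not the real `sms_spam.csv`) to show a check for unmapped labels and a stratified split via the `stratify` argument of `train_test_split`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for sms_spam.csv: 15 ham and 5 spam messages
toy_df = pd.DataFrame({
    "type": ["ham"] * 15 + ["spam"] * 5,
    "text": [f"message {i}" for i in range(20)],
})

# Map labels to integers and fail loudly if any label was not recognised
toy_df["label"] = toy_df["type"].map({"ham": 0, "spam": 1})
if toy_df["label"].isna().any():
    raise ValueError("Found labels other than 'ham' or 'spam'")
toy_df["label"] = toy_df["label"].astype(int)

# stratify keeps the ham/spam ratio the same in both splits
toy_train_texts, toy_test_texts, toy_train_labels, toy_test_labels = train_test_split(
    toy_df["text"], toy_df["label"],
    test_size=0.2, random_state=42, stratify=toy_df["label"],
)

print("train spam share:", toy_train_labels.mean())
print("test spam share:", toy_test_labels.mean())
```

With stratification, both splits keep the 25% spam share of the toy data; whether this matters for the real dataset depends on how imbalanced it is.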


# **4. Load the pre-trained transformer model:**

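A pre-trained BERT checkpoint (`bert-base-uncased`) and its matching tokenizer are loaded with the Hugging Face `transformers` library. Passing `num_labels=2` places a two-class classification head on top of the pre-trained encoder. That head is randomly initialized, which is why the log below reports the `classifier` layer as newly initialized and recommends training the model before using it for predictions.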

In [49]:
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)


All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
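
To make the tokenizer's role more concrete, here is a toy illustration of what calling a tokenizer with `padding=True` and `truncation=True` (as done in the fine-tuning step) produces: a batch of equal-length `input_ids` plus an `attention_mask` marking real tokens. This is only a sketch. `toy_encode` is a hypothetical helper that splits on whitespace, whereas the real BERT tokenizer uses WordPiece subwords, adds special tokens, and truncates to the model's maximum length.

```python
def toy_encode(texts, max_length=8):
    """Toy stand-in for a tokenizer call with padding and truncation.

    Splits on whitespace and assigns ids in order of first appearance;
    id 0 is reserved for padding.
    """
    vocab = {"[PAD]": 0}
    sequences = []
    for text in texts:
        ids = []
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
            ids.append(vocab[token])
        sequences.append(ids[:max_length])  # truncation

    longest = max(len(ids) for ids in sequences)
    input_ids, attention_mask = [], []
    for ids in sequences:
        pad = longest - len(ids)
        input_ids.append(ids + [0] * pad)                  # pad to the longest sequence
        attention_mask.append([1] * len(ids) + [0] * pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = toy_encode(["This is the first text", "Short one"], max_length=4)
print(batch["input_ids"])       # [[1, 2, 3, 4], [6, 7, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

The first sentence is cut to four tokens and the second is padded with zeros, so both rows can be stacked into one tensor; the mask tells the model to ignore the padded positions.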


# **5. Fine-tune the model:**

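The pre-trained model is fine-tuned on the training data to adapt it to the classification task, updating its weights over several epochs (three here). The first cell sketches how a custom classification model could be defined on top of BERT; it is not used for training. The second cell tokenizes the inputs, builds `tf.data` datasets, and trains the `TFAutoModelForSequenceClassification` model loaded in step 4. Note that the second cell replaces the SMS split from step 3 with a handful of toy sentences.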

In [50]:
from transformers import TFBertModel, TFPreTrainedModel

# Illustrative custom classifier. It is not used below: training runs on the
# TFAutoModelForSequenceClassification instance loaded in step 4.
class MyModel(TFPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = TFBertModel(config, name="bert")
        self.dropout = tf.keras.layers.Dropout(config.hidden_dropout_prob)
        # No softmax activation: the loss uses from_logits=True and expects raw logits
        self.classifier = tf.keras.layers.Dense(config.num_labels, name="classifier")

    def call(self, inputs, training=False, **kwargs):
        outputs = self.bert(inputs, training=training, **kwargs)
        pooled_output = outputs[1]  # pooled representation of the [CLS] token
        pooled_output = self.dropout(pooled_output, training=training)
        logits = self.classifier(pooled_output)
        return logits

    def compute_loss(self, labels, logits):
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        loss = loss_fn(labels, logits)
        return loss
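
The loss above is created with `from_logits=True`, which means it applies softmax internally and expects unnormalized scores from the classifier. The following pure-Python sketch, using made-up logits rather than real model outputs, illustrates how the loss value changes if probabilities are passed where logits are expected, so that softmax is effectively applied twice.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, label):
    """Cross-entropy for one example: softmax first, then -log p[label]."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.0]  # hypothetical raw classifier outputs for classes 0 and 1
label = 0            # true class

correct_loss = cross_entropy_from_logits(logits, label)
# Feeding probabilities where logits are expected applies softmax twice
double_softmax_loss = cross_entropy_from_logits(softmax(logits), label)

print(f"loss on raw logits:     {correct_loss:.4f}")
print(f"loss on softmax output: {double_softmax_loss:.4f}")
```

The second value is larger even though the prediction is identical, because the extra softmax squeezes the scores toward each other. Keeping the final layer linear and letting the loss handle the softmax avoids this mismatch.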


In [51]:
# Prepare the input data
# NOTE: these toy sentences overwrite the SMS split created in step 3.
# To train on the real data, tokenize train_texts.tolist() / test_texts.tolist()
# and use train_labels.tolist() / test_labels.tolist() as the labels instead.
train_texts = ["This is the first text", "This is the second text"]
train_labels = [1, 0]
train_encodings = tokenizer(train_texts, padding=True, truncation=True, return_tensors="tf")
test_texts = ["This is the third text", "This is the fourth text"]
test_labels = [0, 1]
test_encodings = tokenizer(test_texts, padding=True, truncation=True, return_tensors="tf")

# Prepare the training and testing datasets (batched here, so fit() needs no batch_size)
train_dataset = tf.data.Dataset.from_tensor_slices((dict(train_encodings), train_labels)).batch(16)
test_dataset = tf.data.Dataset.from_tensor_slices((dict(test_encodings), test_labels)).batch(16)

# Define the optimizer and the loss function
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Compile the model
model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])

# Fine-tune the model
history = model.fit(train_dataset, epochs=3)



Epoch 1/3
Epoch 2/3
Epoch 3/3


# **6. Evaluate the model:**

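After fine-tuning, the model is evaluated on the test set to check its performance. Only accuracy is computed here; precision, recall, and F1-score are also useful, particularly when one class is much rarer than the other. Because the test set in this run consists of two toy sentences, the accuracy below says little about performance on real messages.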

In [52]:
# Evaluate the model
loss, accuracy = model.evaluate(test_dataset)  # test_dataset is already batched
print("Test accuracy:", accuracy)


Test accuracy: 1.0
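
Precision, recall, and F1-score are mentioned above but not computed in the notebook. As a minimal sketch, they can be derived from true and predicted class ids as follows. The `binary_metrics` helper and the example labels are hypothetical and are not results from this model.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions (1 = spam, 0 = ham)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In the notebook, `y_pred` would come from taking the argmax over the model's output logits for each test example. Here one spam message is missed, so recall drops to 2/3 while precision stays at 1.0.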


# **7. Predict the sentiment of new text inputs:**

Once the model is trained and evaluated, it can be used to predict the label of new text inputs. A `TextClassificationPipeline` tokenizes each input in the same way as the training data and runs it through the fine-tuned model. In the output, `LABEL_0` and `LABEL_1` are the default names for class indices 0 (`ham`) and 1 (`spam`).

In [53]:
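# Build a classification pipeline from the fine-tuned model and its tokenizer
pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# Predict the sentiment of new text inputs
new_texts = [
    "I really enjoyed the movie",
    "This product is terrible"
]
predictions = pipeline(new_texts)
print("Predictions:", predictions)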


Predictions: [{'label': 'LABEL_1', 'score': 0.5843372344970703}, {'label': 'LABEL_1', 'score': 0.5557489395141602}]
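
The generic `LABEL_0` / `LABEL_1` names can be translated back to the dataset's classes after the fact. The sketch below assumes the mapping from the preprocessing step (`ham` is 0, `spam` is 1); `LABEL_NAMES` and `rename_labels` are hypothetical names introduced for illustration.

```python
# Assumed mapping, following the preprocessing step (ham -> 0, spam -> 1)
LABEL_NAMES = {"LABEL_0": "ham", "LABEL_1": "spam"}

def rename_labels(predictions, mapping=LABEL_NAMES):
    """Return a copy of pipeline-style predictions with readable label names."""
    return [
        {**pred, "label": mapping.get(pred["label"], pred["label"])}
        for pred in predictions
    ]

# Output copied from the prediction cell above
raw_predictions = [
    {"label": "LABEL_1", "score": 0.5843372344970703},
    {"label": "LABEL_1", "score": 0.5557489395141602},
]

readable = rename_labels(raw_predictions)
print(readable)
```

Both scores are close to 0.5, which suggests the model is far from confident about either input. An alternative to post-hoc renaming is to set `id2label` and `label2id` on the model configuration when loading it, so the pipeline emits the readable names directly.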


This is just a simple example, and there are many ways to fine-tune transformer models with Hugging Face. However, this should give you an idea of how to use Hugging Face to perform custom sentiment analysis.
