
**Name: Muskan Rahangdale**

**NLP: NATURAL LANGUAGE PROCESSING**

Natural Language Processing (NLP) is an area of AI concerned with how computers and human language interact. Its goal is to let computers understand, interpret, generate, and respond to human language in ways that are useful to people.



**1. Installing necessary libraries**

PyTorch: PyTorch is an open-source machine learning library built around tensor computation and automatic differentiation. It is widely used for tasks such as computer vision and NLP.

Transformers library: Hugging Face's Transformers library provides pre-trained models for NLP tasks such as text classification and named entity recognition. It makes it simpler to apply state-of-the-art models such as BERT and GPT.

In [None]:
!pip install torch transformers




**2. Loading Pre-trained model and tokenizer**

Tokenizer: Converts text into token IDs that the model can process. It also handles padding and truncation so that inputs have a consistent length.

Model: A DistilBERT model fine-tuned for sentiment classification on SST-2. It outputs logits, which are then converted into class predictions.
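To make the tokenizer's padding and truncation concrete, here is a toy sketch in plain Python. It is not the real DistilBERT tokenizer: the vocabulary, the `toy_encode` function, and the ID values are invented for illustration, and it omits the subword splitting and special tokens that the real tokenizer adds.

```python
# Toy illustration only: a whitespace tokenizer with a tiny, made-up vocabulary.
# The real DistilBERT tokenizer uses WordPiece subwords and a much larger vocabulary.
PAD_ID, UNK_ID = 0, 1  # hypothetical IDs chosen for this sketch
TOY_VOCAB = {"[PAD]": PAD_ID, "[UNK]": UNK_ID, "i": 2, "like": 3, "java": 4, "language": 5}


def toy_encode(text, max_length):
    """Lowercase, split on whitespace, map words to IDs, then truncate or pad."""
    ids = [TOY_VOCAB.get(word, UNK_ID) for word in text.lower().split()]
    ids = ids[:max_length]                # truncation: drop tokens past max_length
    attention_mask = [1] * len(ids)       # 1 marks a real token
    padding = max_length - len(ids)
    ids += [PAD_ID] * padding             # padding: fill up to max_length
    attention_mask += [0] * padding       # 0 marks a padding position
    return {"input_ids": ids, "attention_mask": attention_mask}


encoded = toy_encode("I like java language", max_length=6)
print(encoded["input_ids"])       # [2, 3, 4, 5, 0, 0]
print(encoded["attention_mask"])  # [1, 1, 1, 1, 0, 0]
```

The attention mask is what lets a model ignore the padded positions, which is why the real tokenizer returns it alongside the token IDs.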

In [None]:
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# Load pre-trained model and tokenizer for sentiment analysis
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

# Set the model to evaluation mode
model.eval()



DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

**3. Defining the Sentiment Analysis Function**

Tokenization: Converts the text into the input format the model expects. The return_tensors="pt" option makes the tokenizer return PyTorch tensors.

Inference: The model takes the tokenized input and produces logits. Wrapping the call in torch.no_grad() disables gradient tracking, which saves memory and speeds up inference.

Prediction: torch.argmax returns the index of the highest logit, which is the class the model predicts. For this DistilBERT model fine-tuned on SST-2, class 1 means positive sentiment and class 0 means negative sentiment.
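The step from logits to a prediction can be sketched in plain Python. The example below is an illustration rather than part of the notebook's code: the `softmax` and `label_with_confidence` helpers and the logit values are made up, and the softmax step is an addition that turns logits into probabilities so the prediction comes with a confidence score. In PyTorch the same conversion would typically be done with torch.softmax(logits, dim=1).

```python
import math

# Class-to-label mapping as described above: 0 = negative, 1 = positive.
LABELS = {0: "negative", 1: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return the label of the highest-scoring class and its probability."""
    probs = softmax(logits)
    predicted_class_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return LABELS[predicted_class_id], probs[predicted_class_id]


# Hypothetical logits for one sentence: [negative_score, positive_score]
label, confidence = label_with_confidence([-2.0, 3.0])
print(label, round(confidence, 4))  # positive 0.9933
```

Because softmax preserves the ordering of the logits, taking the argmax of the probabilities gives the same class as taking the argmax of the logits directly.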

In [None]:
def analyze_text_sentiment(text):
    # Tokenize input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

    # Perform inference
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the prediction (logits)
    logits = outputs.logits
    predicted_class_id = torch.argmax(logits, dim=1).item()

    # Map the prediction to sentiment
    sentiment = "positive" if predicted_class_id == 1 else "negative"

    return sentiment



**4. Example Usage:**

Positive Sentiment: The model outputs "positive" for text that expresses favorable or upbeat feelings.

Negative Sentiment: The model outputs "negative" for text that expresses dissatisfaction or unhappiness.


In [None]:
# Example usage
text = "I like java language"
sentiment = analyze_text_sentiment(text)
print(f"Text: {text}\nSentiment: {sentiment}")
# Example of a text with negative sentiment
text = "I am extremely disappointed with the service. It was terrible and frustrating."
sentiment = analyze_text_sentiment(text)
print(f"Text: {text}\nSentiment: {sentiment}")


Text: I like java language
Sentiment: positive
Text: I am extremely disappointed with the service. It was terrible and frustrating.
Sentiment: negative
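When there are several texts to classify, the calls can be wrapped in a small helper that also tallies the results. This is only a sketch: `summarize_sentiments` is a hypothetical helper, and `keyword_stub` is a stand-in classifier used so the block runs without downloading the model. In the notebook, analyze_text_sentiment would be passed in its place.

```python
from collections import Counter


def summarize_sentiments(texts, classify):
    """Apply `classify` to each text and count how often each label occurs."""
    labels = [classify(text) for text in texts]
    return labels, Counter(labels)


# Stand-in classifier so this sketch runs on its own.
# It is NOT how DistilBERT works; pass analyze_text_sentiment in the notebook.
def keyword_stub(text):
    return "negative" if "terrible" in text.lower() else "positive"


texts = [
    "I like java language",
    "I am extremely disappointed with the service. It was terrible and frustrating.",
]
labels, counts = summarize_sentiments(texts, keyword_stub)
print(labels)        # ['positive', 'negative']
print(dict(counts))  # {'positive': 1, 'negative': 1}
```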


**Conclusion:**

This notebook shows how to use PyTorch and the Transformers library for sentiment analysis. Using a pre-trained DistilBERT model fine-tuned for sentiment classification, we can determine whether a text is positive or negative. Because the model is already trained, getting sentiment predictions from text takes only a few lines of code.