# Inference

Here we run inference for the four countries we want to analyze: Germany, Italy, Spain, and France, using our fine-tuned models. For this part we keep only the comments the models are most confident about, i.e. those whose predicted class has a probability above 90%. This keeps the interpretation as free of noise as possible while still leaving a good number of comments to analyze.

In [1]:
import torch
from transformers import BertTokenizer, BertForSequenceClassification, AutoModelForSequenceClassification, AutoTokenizer, pipeline
import pandas as pd

In [2]:
# Choose the language to do inference on
LANGUAGE = 'german'
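
Since the same steps are repeated for each of the four languages, the model and file locations could be derived from `LANGUAGE` instead of being written out per branch. The sketch below is only an illustration: the helper `inference_paths` is hypothetical, and it assumes that the Italian, Spanish and French files follow the same naming pattern as the German paths used in this notebook, which may not hold for your folder layout.

```python
# Hypothetical helper (not part of the original notebook).
# Assumption: every language uses the same naming pattern as the German files.
LANGUAGES = ('german', 'italian', 'spanish', 'french')


def inference_paths(language):
    """Return the model, input CSV and output CSV locations for one language."""
    if language not in LANGUAGES:
        raise ValueError(f'Unsupported language: {language!r}')
    base = f'Comments DB/{language}/Inference'
    return {
        'model': f'sentiment_model_finetuned_{language}',
        'input': f'{base}/{language}_combined_ready_for_inference.csv',
        'output': f'{base}/results/{language}_results.csv',
    }


paths = inference_paths('german')
print(paths['model'])
print(paths['input'])
print(paths['output'])
```

With such a helper, the per-language `if` branch would reduce to a single code path that reads `paths['model']`, `paths['input']` and `paths['output']`.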

In [7]:
if LANGUAGE == 'german':
    model_path = 'sentiment_model_finetuned_german'
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Load the data for inference
    inference_comments_df = pd.read_csv('Comments DB/german/Inference/german_combined_ready_for_inference.csv')
    inference_comments = inference_comments_df['Comment'].tolist()
    # Make sure the comments are strings
    inference_comments = [str(comment) for comment in inference_comments]

    inputs = tokenizer(inference_comments, return_tensors="pt", padding='max_length', truncation=True, max_length=64)

    # Predict sentiment
    with torch.no_grad():
        outputs = model(**inputs)

    # Apply softmax to get probabilities
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
  
    # Define sentiment classes
    sentiment_classes = ['negative', 'neutral', 'positive']

    # Get predictions for each input text
    predicted_classes = probabilities.argmax(dim=1)

    # Get the highest probability for each input text (the score of the predicted class)
    score = probabilities.max(dim=1).values

    # Create a dataframe with the comments, the predicted sentiment and the scores
    inference_df = pd.DataFrame({
        'Comment': inference_comments,
        'Sentiment': [sentiment_classes[p] for p in predicted_classes.tolist()],
        'Score': score.tolist(),
    })

    # Only keep scores above 0.90
    inference_df = inference_df[inference_df['Score'] > 0.90]

    # Send to csv
    inference_df.to_csv('Comments DB/german/Inference/results/german_results.csv', index=True)
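
To make the 0.90 cut-off more concrete, the sketch below reproduces the softmax, argmax and threshold steps from the cell above in plain Python for a single comment. The logits are made-up values chosen for illustration, not model outputs, and the helper names `softmax` and `classify` are hypothetical. The class order is taken from the `sentiment_classes` list above and is assumed to match the label indices of the fine-tuned model.

```python
import math

# Class order copied from the notebook; assumed to match the model's label indices.
SENTIMENT_CLASSES = ['negative', 'neutral', 'positive']
THRESHOLD = 0.90


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    top = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=THRESHOLD):
    """Return (label, score, kept) for one comment's logits."""
    probs = softmax(logits)
    score = max(probs)                        # probability of the predicted class
    label = SENTIMENT_CLASSES[probs.index(score)]
    return label, score, score > threshold    # same strict '>' as the filter above


# Made-up logits: one confident prediction and one uncertain prediction.
examples = [[0.2, 0.1, 3.5], [1.0, 0.5, 2.0]]
for logits in examples:
    label, score, kept = classify(logits)
    print(f'{label:8s} score={score:.3f} kept={kept}')
```

Both examples are predicted as positive, but only the first has a score above the threshold, so only that comment would remain in the results file.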


