# Inference of ESG Multiclass Classification using Fine-tuned RoBERTa

This notebook runs the inference stage for the previously fine-tuned RoBERTa model, which classifies ESG news into three categories: Environmental, Social, and Governance.

The goal is to apply the trained multiclass classifier to a large dataset of ESG-related news and automatically predict the ESG category of each article. The result is a structured, labeled corpus ready for subsequent analysis.

The inference pipeline includes:
- Loading the fine-tuned RoBERTa model with previously saved weights.
- Preprocessing the news dataset (title + content) for tokenization.
- Applying the trained model to predict the ESG category for each sample.
- Storing the resulting predictions for later use in ESG scoring models and downstream investment algorithms.

This inference step is critical to transform raw news data into structured ESG signals that can be exploited in decision-making systems, particularly in the context of ESG-focused investment strategies.
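
The preprocessing step listed above combines title and content. In the dataset previews further down, the `text` column looks like the title, then a period and a space, then the article body. A minimal sketch of that concatenation, assuming this is how `text` was built (the helper name `build_text` and its missing-value handling are hypothetical, not taken from the original pipeline):

```python
import math


def build_text(title, content):
    """Join title and content as "title. content", skipping missing parts."""
    parts = []
    for value in (title, content):
        # Treat None and float NaN (how pandas marks missing cells) as missing.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        value = str(value).strip()
        if value:
            parts.append(value)
    return ". ".join(parts)


print(build_text("Why Nike Stock Got Tripped Up Today", "Shares of Nike fell Friday morning"))
```

Skipping missing parts avoids feeding strings such as `"nan"` to the tokenizer when an article has no body.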

In [None]:
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm
from transformers import RobertaModel, RobertaTokenizer

In [None]:
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
path = '/content/drive/MyDrive/MIAX/TRABAJO FINAL DE MASTER DEFINITIVO/total_news_esg_filtered.csv'
df_trainning = pd.read_csv(path)
df_trainning


Unnamed: 0,id,Publication Date,title,content,source,sentiment_body,sentiment_body_score,company,sector,url,ticker,description,text,esg_pred
0,1.0,2024-12-09 08:41:30+00:00,Call of Duty: Black Ops 6 Review,The last year or so wasn’t the kindest to Call...,Eggplante,positive,0.98,activision,Communication,,ATVI,,Call of Duty: Black Ops 6 Review. The last yea...,0
1,2.0,2024-07-23 17:13:37+00:00,Xbox Celebrates the Release of Modern Warfare ...,"Recently, many rumors were running on the inte...",Cinelinx,positive,0.90,activision,Communication,,ATVI,,Xbox Celebrates the Release of Modern Warfare ...,0
2,3.0,2024-07-18 14:19:34+00:00,How ToxMod's AI impacted toxicity in Call of D...,It's no secret Call of Duty has toxic players....,DNyuz,neutral,0.48,activision,Communication,,ATVI,,How ToxMod's AI impacted toxicity in Call of D...,1
3,4.0,2024-07-18 14:04:50+00:00,How ToxMod's AI impacted toxicity in Call of D...,It's no secret Call of Duty has toxic players....,Allusanewshub,neutral,0.54,activision,Communication,,ATVI,,How ToxMod's AI impacted toxicity in Call of D...,1
4,5.0,2024-05-27 07:03:20+00:00,"Meta, Activision Sued by Families of Uvalde Sc...",In the wake of the tragic shooting at Robb Ele...,Tech Times,neutral,0.92,activision,Communication,,ATVI,,"Meta, Activision Sued by Families of Uvalde Sc...",1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1070256,,2025-03-21 16:21:09,Nike shares tumble as tariff concerns shake in...,"Shares in NIKE, Inc. (NYSE: NKE) are falling i...",Fast Company,,,nike,,https://www.fastcompany.com/91303743/nike-shar...,NKE,,Nike shares tumble as tariff concerns shake in...,1
1070257,,2025-03-21 20:20:23,Making Sense of Early Q1 Earnings Reports,We have been seeing some of the early Q1 resul...,Zacks Investment Research,,,nike,,https://www.zacks.com/commentary/2433762/makin...,NKE,,Making Sense of Early Q1 Earnings Reports. We ...,0
1070258,,2025-03-22 07:30:00,"Why Is Nike Stock Falling, and Should Investor...",Nike's (NKE -5.37%) sales are declining in eve...,The Motley Fool,,,nike,,https://www.fool.com/investing/2025/03/22/why-...,NKE,,"Why Is Nike Stock Falling, and Should Investor...",0
1070259,,2025-03-22 11:00:00,Will Nike Investors' Frustrations End Anytime ...,"If you're a Nike (NKE -5.37%) investor, it wou...",The Motley Fool,,,nike,,https://www.fool.com/investing/2025/03/22/will...,NKE,,Will Nike Investors' Frustrations End Anytime ...,0


In [None]:
df_trainning = df_trainning[df_trainning['esg_pred'] == 1].copy()
df_trainning

Unnamed: 0,id,Publication Date,title,content,source,sentiment_body,sentiment_body_score,company,sector,url,ticker,description,text,esg_pred
2,3.0,2024-07-18 14:19:34+00:00,How ToxMod's AI impacted toxicity in Call of D...,It's no secret Call of Duty has toxic players....,DNyuz,neutral,0.48,activision,Communication,,ATVI,,How ToxMod's AI impacted toxicity in Call of D...,1
3,4.0,2024-07-18 14:04:50+00:00,How ToxMod's AI impacted toxicity in Call of D...,It's no secret Call of Duty has toxic players....,Allusanewshub,neutral,0.54,activision,Communication,,ATVI,,How ToxMod's AI impacted toxicity in Call of D...,1
4,5.0,2024-05-27 07:03:20+00:00,"Meta, Activision Sued by Families of Uvalde Sc...",In the wake of the tragic shooting at Robb Ele...,Tech Times,neutral,0.92,activision,Communication,,ATVI,,"Meta, Activision Sued by Families of Uvalde Sc...",1
7,8.0,2024-04-02 15:34:29+00:00,Federal racial discrimination lawsuit against ...,"'In Tesla's narrative, the agencies learned no...",Head Topics,negative,0.41,activision,Communication,,ATVI,,Federal racial discrimination lawsuit against ...,1
10,11.0,2024-01-24 03:28:56+00:00,Activision Will Pay $50 Million to Settle Work...,Activision Blizzard will pay roughly $50 milli...,Mytimesnow,positive,0.64,activision,Communication,,ATVI,,Activision Will Pay $50 Million to Settle Work...,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1070247,,2025-03-21 10:50:51,Nike is starting to show green-shoots with inn...,"Andrea Andreeva, Piper Sandler analyst, joins ...",CNBC Television,,,nike,,https://www.youtube.com/watch?v=yrwUTpiLvsI,NKE,,Nike is starting to show green-shoots with inn...,1
1070249,,2025-03-21 11:19:37,"Nike Stock Plummets on Dismal Forecast, Bear N...",Brushing off a fiscal third-quarter earnings a...,Schaeffers Research,,,nike,,https://www.schaeffersresearch.com/content/new...,NKE,,"Nike Stock Plummets on Dismal Forecast, Bear N...",1
1070250,,2025-03-21 11:20:15,Nike Turnaround Hits Snags Amid Inventory Rese...,Nike's turnaround effort is facing challenges ...,Bloomberg Markets and Finance,,,nike,,https://www.youtube.com/watch?v=T4MRIaMz2ks,NKE,,Nike Turnaround Hits Snags Amid Inventory Rese...,1
1070253,,2025-03-21 12:31:54,Why Nike Stock Got Tripped Up Today,Shares of Nike (NKE -5.16%) fell Friday mornin...,The Motley Fool,,,nike,,https://www.fool.com/investing/2025/03/21/why-...,NKE,,Why Nike Stock Got Tripped Up Today. Shares of...,1


In [None]:
class RoBERTaClass(nn.Module):
    def __init__(self):
        super(RoBERTaClass, self).__init__()
        self.roberta = RobertaModel.from_pretrained('roberta-base', return_dict=True)
        self.dropout = nn.Dropout(0.3)
        self.out = nn.Linear(768, 3)

    def forward(self, input_ids, attention_mask):
        output = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = self.dropout(output.pooler_output)
        return self.out(pooled_output)

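# Path to the fine-tuned weights saved during training ("ruta_pesos" = weights path).
ruta_pesos = "/content/drive/MyDrive/MIAX/TRABAJO FINAL DE MASTER DEFINITIVO/type_esg_model_weights.pt"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Rebuild the architecture, then overwrite the roberta-base weights
# (including the freshly initialized pooler) with the fine-tuned state dict.
model = RoBERTaClass()
model.load_state_dict(torch.load(ruta_pesos, map_location=device))
model.to(device)
model.eval()  # disable dropout for inference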

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


RoBERTaClass(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): La

In [None]:
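# Copy the column so that adding predictions later does not write to a slice of df_trainning.
df_filtered = df_trainning[['text']].copy()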

In [None]:
class InferenceDataset(Dataset):
    def __init__(self, texts, tokenizer, max_len=256):
        self.texts = texts
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        # Use the tokenizer stored on the instance rather than a global variable.
        encoded = self.tokenizer(
            text,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )
        return {
            'input_ids': encoded['input_ids'].squeeze(0),
            'attention_mask': encoded['attention_mask'].squeeze(0)
        }
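
To make the effect of `max_length`, `truncation=True`, and `padding='max_length'` concrete, here is a simplified, dependency-free sketch of the shape logic. It is an illustration only, not the tokenizer's actual implementation: `pad_or_truncate` is a hypothetical helper, and the special-token ids (0 for `<s>`, 2 for `</s>`, 1 for `<pad>`) are the usual `roberta-base` values, assumed here.

```python
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2  # assumed roberta-base special-token ids


def pad_or_truncate(token_ids, max_len=256):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    # Reserve two positions for <s> and </s>, then cut the body if needed.
    body = list(token_ids)[: max_len - 2]
    input_ids = [BOS_ID] + body + [EOS_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    # Right-pad to the fixed length; padded positions get mask 0.
    n_pad = max_len - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad


ids, mask = pad_or_truncate([100, 200, 300], max_len=8)
print(ids)   # [0, 100, 200, 300, 2, 1, 1, 1]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

One consequence of truncating at 256 tokens is that long articles are classified from their title and opening paragraphs only.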

In [None]:
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
dataset = InferenceDataset(df_filtered['text'].tolist(), tokenizer, max_len=256)
dataloader = DataLoader(dataset, batch_size=64)


In [None]:
def infer_model(model, dataloader, device):
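    model.eval()  # disable dropout so predictions are deterministic
    predictions = []

    # no_grad skips gradient tracking, which saves memory and time at inference.
    with torch.no_grad():
        # The progress-bar label is Spanish for "Classifying news".
        for batch in tqdm(dataloader, desc="Clasificando noticias"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)

            outputs = model(input_ids, attention_mask)  # logits, shape (batch, 3)
            preds = torch.argmax(outputs, dim=1)        # index of the largest logit
            predictions.extend(preds.cpu().numpy())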

    return np.array(predictions)
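
`infer_model` keeps only the index of the largest logit. If a confidence score per article were also wanted, the logits could be passed through a softmax first. The sketch below shows the arithmetic in plain Python on made-up logits. The `softmax` helper and the example values are illustrative and not part of the original notebook.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, 2.5, -1.0]  # hypothetical scores for classes 0, 1, 2
probs = softmax(logits)
# Softmax is monotonic, so this is the same index as the argmax of the logits.
pred = max(range(len(probs)), key=probs.__getitem__)
print(pred, round(probs[pred], 3))
```

Because softmax preserves the ordering of the logits, adding it would not change any predicted label. It would only attach a probability to each one.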

In [None]:
preds = infer_model(model, dataloader, device)

Clasificando noticias: 100%|██████████| 6738/6738 [45:34<00:00,  2.46it/s]
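
The progress bar can be sanity-checked against the other numbers in this notebook. Taking the class counts reported by `value_counts()` further down and the batch size of 64 passed to the `DataLoader`, the arithmetic works out as follows:

```python
import math

n_rows = 169286 + 151232 + 110702   # class counts reported by value_counts()
batch_size = 64                     # DataLoader batch size
elapsed_s = 45 * 60 + 34            # 45:34 from the progress bar

n_batches = math.ceil(n_rows / batch_size)
last_batch = n_rows - (n_batches - 1) * batch_size

print(n_rows)                            # 431220
print(n_batches)                         # 6738, matching the progress bar
print(last_batch)                        # 52 articles in the final, partial batch
print(round(n_batches / elapsed_s, 2))   # 2.46 batches per second
print(round(n_rows / elapsed_s))         # 158 articles per second
```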


In [None]:
df_filtered['esg_type'] = preds



In [None]:
df_filtered['esg_type'].value_counts()

esg_type,count
2,169286
1,151232
0,110702
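
To read these counts as class shares, the ids can be mapped to pillar names. The mapping 0 = Environmental, 1 = Social, 2 = Governance is taken from the column labels used later in this notebook. The rest is a small illustrative calculation:

```python
LABELS = {0: "Environmental", 1: "Social", 2: "Governance"}
counts = {2: 169286, 1: 151232, 0: 110702}  # from value_counts()

total = sum(counts.values())
shares = {LABELS[k]: 100 * v / total for k, v in counts.items()}
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.2f}%")
# Governance: 39.26%
# Social: 35.07%
# Environmental: 25.67%
```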


In [None]:
df_trainning['esg_type'] = preds

In [None]:
conteo_por_empresa = df_trainning.groupby(['company', 'esg_type']).size().unstack(fill_value=0)
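# Rename by class id so the labels stay correct even if a class is missing from the data.
conteo_por_empresa = conteo_por_empresa.rename(columns={0: 'Environmental (0)', 1: 'Social (1)', 2: 'Governance (2)'})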

In [None]:
nombres_equivalentes = {
    "amex": "american express",
    "conoco philips": "conocophillips",
    "conoco": "conocophillips",
    "cvs health": "cvs",
    "jhonson and jhonson": "johnson & johnson",
    "j&j": "johnson & johnson",
    "jp morgan": "jpmorgan",
    "the walt disney company": "disney",
    "unitedhealth": "united healthcare",
    "valero energy": "valero",
    "goldman": "goldman sachs"
}
df_trainning['company'] = df_trainning['company'].replace(nombres_equivalentes)
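
`Series.replace` with a dict only substitutes values that match a key exactly, so variants that differ in case or surrounding whitespace would pass through unchanged. A minimal sketch of a slightly more defensive normalizer, assuming such variants might exist in the data (`normalize_company` is a hypothetical helper, and only a subset of the aliases is repeated here):

```python
ALIASES = {
    "amex": "american express",
    "j&j": "johnson & johnson",
    "jp morgan": "jpmorgan",
    "goldman": "goldman sachs",
}


def normalize_company(name, aliases=ALIASES):
    """Lowercase, collapse whitespace, then map known aliases to one canonical name."""
    key = " ".join(str(name).lower().split())
    return aliases.get(key, key)


print(normalize_company("  JP  Morgan "))   # jpmorgan
print(normalize_company("goldman sachs"))   # goldman sachs (already canonical)
```

In the notebook this could be applied with `df_trainning['company'].map(normalize_company)`, though whether the extra cleaning is needed depends on how the `company` column was populated.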

In [None]:
conteo_por_empresa = df_trainning.groupby(['company', 'esg_type']).size().unstack(fill_value=0)
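# Rename by class id so the labels stay correct even if a class is missing from the data.
conteo_por_empresa = conteo_por_empresa.rename(columns={0: 'Environmental (0)', 1: 'Social (1)', 2: 'Governance (2)'})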
conteo_por_empresa

company,Environmental (0),Social (1),Governance (2)
abbvie,50,748,321
activision,19,1130,2027
amazon,14159,20989,13213
american express,316,963,773
apple,9683,19861,25152
chevron,6169,766,2779
comcast,328,1206,1084
conocophillips,6960,261,953
cvs,301,4543,1324
disney,1261,5645,2315
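
One way to read the table above is to ask which pillar dominates each company's coverage. The sketch below does this for a few rows copied from the table. The variable names and the choice of rows are illustrative only.

```python
# (Environmental, Social, Governance) counts copied from the table above.
counts = {
    "chevron": (6169, 766, 2779),
    "conocophillips": (6960, 261, 953),
    "cvs": (301, 4543, 1324),
    "apple": (9683, 19861, 25152),
}
PILLARS = ("Environmental", "Social", "Governance")

dominant = {}
for company, row in counts.items():
    total = sum(row)
    top = max(range(3), key=row.__getitem__)  # position of the largest count
    dominant[company] = (PILLARS[top], round(100 * row[top] / total, 1))
    print(company, *dominant[company])
```

In this sample the two energy companies skew Environmental, the pharmacy chain skews Social, and Apple's largest share is Governance.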


In [None]:
path = '/content/drive/MyDrive/MIAX/TRABAJO FINAL DE MASTER DEFINITIVO/news_second_classified.csv'
df_trainning.to_csv(path, index=False)

In [None]:
df_trainning

Unnamed: 0,id,Publication Date,title,content,source,sentiment_body,sentiment_body_score,company,sector,url,ticker,description,text,esg_pred,esg_type
2,3.0,2024-07-18 14:19:34+00:00,How ToxMod's AI impacted toxicity in Call of D...,It's no secret Call of Duty has toxic players....,DNyuz,neutral,0.48,activision,Communication,,ATVI,,How ToxMod's AI impacted toxicity in Call of D...,1,1
3,4.0,2024-07-18 14:04:50+00:00,How ToxMod's AI impacted toxicity in Call of D...,It's no secret Call of Duty has toxic players....,Allusanewshub,neutral,0.54,activision,Communication,,ATVI,,How ToxMod's AI impacted toxicity in Call of D...,1,1
4,5.0,2024-05-27 07:03:20+00:00,"Meta, Activision Sued by Families of Uvalde Sc...",In the wake of the tragic shooting at Robb Ele...,Tech Times,neutral,0.92,activision,Communication,,ATVI,,"Meta, Activision Sued by Families of Uvalde Sc...",1,1
7,8.0,2024-04-02 15:34:29+00:00,Federal racial discrimination lawsuit against ...,"'In Tesla's narrative, the agencies learned no...",Head Topics,negative,0.41,activision,Communication,,ATVI,,Federal racial discrimination lawsuit against ...,1,1
10,11.0,2024-01-24 03:28:56+00:00,Activision Will Pay $50 Million to Settle Work...,Activision Blizzard will pay roughly $50 milli...,Mytimesnow,positive,0.64,activision,Communication,,ATVI,,Activision Will Pay $50 Million to Settle Work...,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1070247,,2025-03-21 10:50:51,Nike is starting to show green-shoots with inn...,"Andrea Andreeva, Piper Sandler analyst, joins ...",CNBC Television,,,nike,,https://www.youtube.com/watch?v=yrwUTpiLvsI,NKE,,Nike is starting to show green-shoots with inn...,1,2
1070249,,2025-03-21 11:19:37,"Nike Stock Plummets on Dismal Forecast, Bear N...",Brushing off a fiscal third-quarter earnings a...,Schaeffers Research,,,nike,,https://www.schaeffersresearch.com/content/new...,NKE,,"Nike Stock Plummets on Dismal Forecast, Bear N...",1,2
1070250,,2025-03-21 11:20:15,Nike Turnaround Hits Snags Amid Inventory Rese...,Nike's turnaround effort is facing challenges ...,Bloomberg Markets and Finance,,,nike,,https://www.youtube.com/watch?v=T4MRIaMz2ks,NKE,,Nike Turnaround Hits Snags Amid Inventory Rese...,1,2
1070253,,2025-03-21 12:31:54,Why Nike Stock Got Tripped Up Today,Shares of Nike (NKE -5.16%) fell Friday mornin...,The Motley Fool,,,nike,,https://www.fool.com/investing/2025/03/21/why-...,NKE,,Why Nike Stock Got Tripped Up Today. Shares of...,1,2
