In [None]:
# mount google drive
from google.colab import drive
drive.mount('/content/drive/')

import os
os.chdir('/content/drive/Shareddrives/CS263-Project/')

Mounted at /content/drive/


The purpose of this notebook is to generate tentative ground-truth labels that will later be checked through human validation. We work with the following definitions of greenwashing from previous papers:
1. misleading consumers about a firm's environmental performance or the environmental benefits of a product or service
2. a high ratio of $\frac{positive}{negative}$ sentiment, which reflects selective reporting of positive information. If a company's ratio stays fairly stable over time, it is not necessarily greenwashing.
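
The second definition can be made concrete with a small sketch. It assumes a hypothetical list of `(year, binary_sentiment)` pairs for a single company; the function name `sentiment_ratio_by_year` and the sample records are illustrative assumptions and do not come from the cited papers or from our data.

```python
from collections import defaultdict

def sentiment_ratio_by_year(records):
    """Return {year: positive/negative ratio} for (year, label) pairs.

    label is 1 for positive and 0 for negative sentiment.
    A year with no negative statements maps to None (ratio undefined).
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [negatives, positives]
    for year, label in records:
        counts[year][label] += 1
    return {
        year: (pos / neg if neg else None)
        for year, (neg, pos) in sorted(counts.items())
    }

# Hypothetical records for one company (not taken from the dataset)
records = [(2019, 1), (2019, 0), (2019, 1),
           (2020, 1), (2020, 1), (2020, 1), (2020, 0)]
ratios = sentiment_ratio_by_year(records)
print(ratios)
```

A ratio that stays roughly constant from year to year would, under this definition, be weaker evidence of greenwashing than a ratio that is consistently high or rising.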

We also have the following information based on ClimateBERT's data:
- climate related text
- sentiment (0: negative risk, 1: neutral, 2: positive opportunity)
- Task Force on Climate-related Financial Disclosures category, a.k.a. tcfd (0: none/not related, 1: metrics, 2: strategy, 3: risk, 4: governance)
- climate_specificity (0: non-specific, 1: specific)
- commitment (0: no company commitment, 1: positive future company commitment)
- detection (0: not climate-related text, 1: climate-related text)
  - note: all of our data will have this labeled as 1


## Attempt 1: Naive Implementation of Linear Combinations

On first look, it seems like the relevant associations are:
- **sentiment**: the higher the label the better it seems for the company. Note, however, that even the positive label "opportunity" doesn't provide any assurances on the company's actual implementation
  - NOTE: initially, we were planning on using ClimateBERT's definition of sentiment (see above). However, we decided against it because it measures sentiment differently than what we want.
  - Instead, we will be using binary sentiment as generated from the following model, where 0 is negative and 1 is positive: https://huggingface.co/siebert/sentiment-roberta-large-english
- **TCFD** aims to target credibility and signaling; it is meant to help companies provide an adequate climate-related disclosure. The authors of https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4000708 find that "supporting the TCFD is significantly associated with increased cheap talk compared to the baseline of not supporting the TCFD. This result suggests that signaling by supporting the TCFD is not associated with increased tangible and specific climate-related disclosures on commitments and actions" (p.24).
  - Given these results, it might be more useful to collapse this tag into a binary measure: did the text get tagged as disclosing something under the TCFD guidelines? If so, it might count against the company (because TCFD support is associated with cheap talk).
  - NOTE: we can and should probably change this metric
- **climate_specificity**: a non-zero value is good for the company!
- **commitment**: a non-zero value is also good for the company here!
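
The binary TCFD measure suggested in the list above could be sketched as follows. The helper name `tcfd_to_binary` is hypothetical, and this measure is not used in the rule below; the sketch only illustrates the mapping from the five ClimateBERT TCFD categories to a yes/no flag.

```python
def tcfd_to_binary(tcfd_label):
    """Map a ClimateBERT TCFD category (0-4) to 0 (none) or 1 (any TCFD category)."""
    if tcfd_label not in range(5):
        raise ValueError(f"unexpected TCFD label: {tcfd_label!r}")
    return int(tcfd_label != 0)

# 0 = none/not related; 1-4 = metrics, strategy, risk, governance
print([tcfd_to_binary(label) for label in range(5)])
```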

Therefore, we currently decide whether or not a text is at risk of greenwashing through the following linear combination:

$$x = (2 - sentiment) + specificity + commitment$$

$$ z = \begin{cases}
  0  & x > 2 \\
  1 & x \leq 2
\end{cases}$$

Here, 0 stands for no perceived risk of greenwashing and 1 for a perceived risk of greenwashing.

We realize that this may not be the most accurate or nuanced combination of these factors, but we will leave this for future work and use this simple model for this paper.

## Questions / Comments for the Meeting
1. Should one of these be considered more heavily than the others?
2. Is it right to exclude the TCFD tag?

### Part 1: Extract ClimateBERT Data

In [None]:
!ls data

 no-deflection-updated-test-annotated.csv
 no-deflection-updated-test-annotated.gsheet
 no-deflection-updated-train-annotated.csv
 no-deflection-updated-train-annotated.gsheet
 no-deflection-updated-val-annotated.csv
 reports
 test-annotated.csv
 test.csv
 test_deflection.csv
 test.gsheet
 train-annotated.csv
 train-annotated.gsheet
 train.csv
 train_deflection.csv
 train.gsheet
 updated-test-annotated.csv
 updated-test-annotated.gsheet
'updated-train-annotated (1).gsheet'
 updated-train-annotated.csv
 updated-train-annotated.gsheet
 updated-val-annotated.csv
 updated-val-annotated.gsheet
'val-annotated (1).gsheet'
 val-annotated.csv
 val-annotated.gsheet
 validation.csv
 validation_deflection.csv


In [None]:
## read a csv file and return a dictionary form
import csv

def read_csv_file(filename):
  data_dict = {}

  with open(filename) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader: # row is a list at this point
      if line_count == 0:
        print(f'Column names are {", ".join(row)}')
        line_count += 1
      else:
        data_id = int(row[0])
        text = row[1]
        sentiment = int(row[2])
        tcfd = int(row[3])
        specificity = int(row[4])
        commitment = int(row[5])
        detection = int(row[6])
        data_dict[data_id] = {
            "text": text,
            "sentiment": sentiment,
            "tcfd": tcfd,
            "specificity": specificity,
            "commitment": commitment,
            "detection": detection
        }

    return data_dict
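
As an alternative sketch, `csv.DictReader` looks columns up by header name instead of by position, which makes the loader less sensitive to column order. It assumes the header printed for `data/train.csv` (an unnamed first column followed by `text`, `sentiment`, `tcfd`, `climate_specificity`, `commitment`, `detection`) and UTF-8 encoding; the function name `read_csv_by_header` is hypothetical.

```python
import csv

def read_csv_by_header(filename):
    """Read a ClimateBERT-style CSV into {id: {field: value}} using header names."""
    data_dict = {}
    # encoding="utf-8" is an assumption about the files
    with open(filename, newline="", encoding="utf-8") as csv_file:
        for row in csv.DictReader(csv_file):
            data_id = int(row[""])  # the id column has an empty header
            data_dict[data_id] = {
                "text": row["text"],
                "sentiment": int(row["sentiment"]),
                "tcfd": int(row["tcfd"]),
                "specificity": int(row["climate_specificity"]),
                "commitment": int(row["commitment"]),
                "detection": int(row["detection"]),
            }
    return data_dict
```

The returned dictionary has the same shape as the one built by `read_csv_file`, so the rest of the notebook would not need to change.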

In [None]:
train_dict = read_csv_file("data/train.csv")
val_dict = read_csv_file("data/validation.csv")
test_dict = read_csv_file("data/test.csv")

Column names are , text, sentiment, tcfd, climate_specificity, commitment, detection
Column names are , text, sentiment, tcfd, climate_specificity, commitment, detection
Column names are , text, sentiment, tcfd, climate_specificity, commitment, detection


### Part 2: Generate New Sentiment
This is where we'll run the text through the sentiment model to obtain a binary sentiment label that is independent of ClimateBERT's sentiment label. The code in this section is adapted from: https://colab.research.google.com/github/chrsiebert/sentiment-roberta-large-english/blob/main/sentiment_roberta_prediction_example.ipynb#scrollTo=0wC0q6Bxp3or.

In [None]:
!pip install transformers
!pip install --upgrade accelerate

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.15.1 tokenizers-0.13.3 transformers-4.29.2

In [None]:
import torch
import pandas as pd
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer

# Create class for data preparation
class SimpleDataset:
    def __init__(self, tokenized_texts):
        self.tokenized_texts = tokenized_texts

    def __len__(self):
        return len(self.tokenized_texts["input_ids"])

    def __getitem__(self, idx):
        return {k: v[idx] for k, v in self.tokenized_texts.items()}

In [None]:
# Load tokenizer and model, create trainer
model_name = "siebert/sentiment-roberta-large-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
trainer = Trainer(model=model)

In [None]:
# Create list of texts (can be imported from .csv, .xls etc.)
def get_text(data_dict):
  # sort by id so that the ids and texts stay aligned in a deterministic order
  sorted_items = sorted(data_dict.items(), key=lambda item: item[0])
  data_order = [data_id for data_id, _ in sorted_items]
  pred_texts = [vals['text'] for _, vals in sorted_items]
  return data_order, pred_texts

In [None]:
data_order_train, train_text = get_text(train_dict)
data_order_val, val_text = get_text(val_dict)
data_order_test, test_text = get_text(test_dict)

assert data_order_train == sorted(data_order_train)
assert data_order_val == sorted(data_order_val)
assert data_order_test == sorted(data_order_test)

In [None]:
tokenized_train = tokenizer(train_text,truncation=True,padding=True)
train_dataset = SimpleDataset(tokenized_train)

tokenized_val = tokenizer(val_text,truncation=True,padding=True)
val_dataset = SimpleDataset(tokenized_val)

tokenized_test = tokenizer(test_text,truncation=True,padding=True)
test_dataset = SimpleDataset(tokenized_test)

In [None]:
train_predictions = trainer.predict(train_dataset)
val_predictions = trainer.predict(val_dataset)
test_predictions = trainer.predict(test_dataset)

In [None]:
train_preds = train_predictions.predictions.argmax(-1)
train_labels = pd.Series(train_preds).map(model.config.id2label)
train_scores = (np.exp(train_predictions[0])/np.exp(train_predictions[0]).sum(-1,keepdims=True)).max(1)

val_preds = val_predictions.predictions.argmax(-1)
val_labels = pd.Series(val_preds).map(model.config.id2label)
val_scores = (np.exp(val_predictions[0])/np.exp(val_predictions[0]).sum(-1,keepdims=True)).max(1)

test_preds = test_predictions.predictions.argmax(-1)
test_labels = pd.Series(test_preds).map(model.config.id2label)
test_scores = (np.exp(test_predictions[0])/np.exp(test_predictions[0]).sum(-1,keepdims=True)).max(1)
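
The score expressions above are a softmax over each row of logits followed by a maximum. A minimal sketch of the same computation in plain Python, with the usual max-subtraction for numerical stability, is shown below; the function name `softmax_confidence` and the sample logits are hypothetical, and the sketch runs without loading the model.

```python
import math

def softmax_confidence(logits_row):
    """Return (predicted_index, probability) for one row of logits."""
    shift = max(logits_row)  # subtracting the max keeps exp() in a safe range
    exps = [math.exp(value - shift) for value in logits_row]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

print(softmax_confidence([-2.0, 3.0]))
# Large logits are handled too; a naive exp(1000.0) would overflow
print(softmax_confidence([1000.0, 998.0]))
```

Subtracting the row maximum does not change the resulting probabilities, because the common factor cancels between numerator and denominator.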

In [None]:
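# map the labels to binary
def binarize_labels(labels_list):
  labels = []
  for label in labels_list:
    if label == "NEGATIVE":
      labels.append(0)
    elif label == "POSITIVE":
      labels.append(1)
    else:
      # fail loudly: silently skipping a label would misalign labels and ids
      raise ValueError(f"unexpected sentiment label: {label!r}")
  return labels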

In [None]:
# Create DataFrame with texts, predictions, labels, and scores
#df = pd.DataFrame(list(zip(tra_texts,preds,labels,scores)), columns=['text','pred','label','score'])
#df.head()
train_labels = binarize_labels(train_labels)
val_labels = binarize_labels(val_labels)
test_labels = binarize_labels(test_labels)


print(train_labels)
print(val_labels)
print(test_labels)

[0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

In [None]:
# add the sentiment to the data dictionary
def add_sentiment_to_dict(id_list, label_list, data_dict):
  print(id_list)
  assert id_list == sorted(id_list)
  for index, id in enumerate(id_list):
    data_dict[id]["binary_sentiment"] = label_list[index]
  return data_dict

In [None]:
updated_train_dict = add_sentiment_to_dict(data_order_train, train_labels, train_dict)
updated_val_dict = add_sentiment_to_dict(data_order_val, val_labels, val_dict)
updated_test_dict = add_sentiment_to_dict(data_order_test, test_labels, test_dict)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,

### Part 3: Annotate for Greenwashing
Recall our current naive rule:
$$x = (2 - sentiment) + specificity + commitment$$

$$ z = \begin{cases}
  0  & x > 2 \\
  1 & x \leq 2
\end{cases}$$
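
Because all three inputs are binary, the rule above has only eight possible outcomes. The sketch below enumerates them; the helper name `naive_risk` is hypothetical, and the formula is the one recalled above.

```python
from itertools import product

def naive_risk(binary_sentiment, specificity, commitment):
    """Return (x, z) for the naive rule: z is 0 when x > 2 and 1 otherwise."""
    x = (2 - binary_sentiment) + specificity + commitment
    return x, (0 if x > 2 else 1)

print("sentiment specificity commitment  x  z")
for sentiment, specificity, commitment in product((0, 1), repeat=3):
    x, z = naive_risk(sentiment, specificity, commitment)
    print(f"{sentiment:9d} {specificity:11d} {commitment:10d} {x:2d} {z:2d}")
```

Under this rule a text gets label 0 (no perceived risk) only if it has negative sentiment and at least one of specificity or commitment, or positive sentiment and both.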

In [None]:
# given the dictionary, we're going to implement the naive annotation and add it to each data example
def annotate_groundtruth(data_dict):
  arr = []
  scores = []
  for data_id, val_dict in data_dict.items():
    sentiment = data_dict[data_id]["binary_sentiment"]
    specificity = data_dict[data_id]["specificity"]
    commitment = data_dict[data_id]["commitment"]

    x = (2 - sentiment) + specificity + commitment
    score = None

    if (x > 2):
      score = 0
    else:
      score = 1

    arr.append(x)
    scores.append(score)

    data_dict[data_id]["greenwashing_risk"] = score

  print('arr', arr)
  print('score', scores)
  return data_dict

In [None]:
train_dict_truth = annotate_groundtruth(train_dict)
val_dict_truth = annotate_groundtruth(val_dict)
test_dict_truth = annotate_groundtruth(test_dict)

arr [3, 1, 1, 2, 2, 1, 2, 2, 1, 1, 3, 2, 3, 2, 3, 2, 2, 1, 2, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 2, 1, 2, 3, 2, 2, 2, 2, 1, 2, 4, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 3, 1, 2, 2, 3, 2, 1, 3, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 1, 2, 3, 3, 3, 2, 3, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 3, 2, 1, 2, 1, 1, 1, 1, 1, 1, 4, 2, 1, 2, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 3, 3, 3, 3, 2, 2, 3, 2, 2, 1, 1, 1, 2, 1, 2, 3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 4, 1, 2, 1, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 2, 2, 2, 1, 3, 1, 2, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 3, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 4, 2, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 3, 2, 1, 2, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,

### Part 4: Save New Data

In [None]:
# given the dictionary and the new filename, write the information appropriately
def write_annotation(data_dict, filename):
  # newline='' lets the csv module control line endings (avoids blank rows on some platforms)
  with open(filename, mode='w', newline='') as data_file:
    csv_writer = csv.writer(data_file, delimiter=',')
    csv_writer.writerow(['number', 'text', 'sentiment', 'binary_sentiment', 'tcfd', 'climate_specificity', 'commitment', 'detection', 'greenwashing_risk'])

    for data_id, vals in data_dict.items():
      csv_writer.writerow([str(data_id), vals['text'], str(vals['sentiment']), str(vals['binary_sentiment']), str(vals['tcfd']), str(vals['specificity']), str(vals['commitment']), str(vals['detection']), str(vals['greenwashing_risk'])])

In [None]:
write_annotation(train_dict, "data/train-annotated.csv")
write_annotation(val_dict, "data/val-annotated.csv")
write_annotation(test_dict, "data/test-annotated.csv")

## Examples with Labels
Below are 5 examples from the training set with their original ClimateBERT labels, along with our tentative greenwashing risk label.

The next 5 examples are from a combination of validation and test.

### Example 1
The Group is not aware of any noise pollution that could negatively impact the environment, nor is it aware of any impact on biodiversity. With regards to land use, the Group is only a commercial user, and the Group is not aware of any local constraints with regards to water supply. The Group does not believe that it is at risk with regards to climate change in the near-or mid-term.
- binary_sentiment: 1, climate_specificity: 0, commitment: 0
- greenwashing_risk: 1
- **correct?** No
- notes: "not aware" instead of "no", but seems suspicious, awareness gives leeway

### Example 2
The Fund updated the guidelines for its $1.5 billion Sustainable Investment Program (SIP), defining sustainable investing for the Fund and enumerating criteria, including best-in-class managers and strategies that identify macro trends or themes, such as Climate and Environment, Human Rights & Social Inclusion and Economic Development. All SIP investments will be held to the same investment criteria as all of the Fund’s other investments.
- binary_sentiment: 1, climate_specificity: 1, commitment: 1
- greenwashing_risk: 0
- **correct?** Yes

### Example 3
The Group faces many other risks which, although important and subject to regular review, have been assessed as less significant and are not listed here. These include, for example, natural catastrophe and business interruption risks and certain financial risks. A summary of financial risks and their management is provided on page 25.
- binary_sentiment: 1, climate_specificity: 0, commitment: 0
- greenwashing_risk: 1
- **correct?** Yes
- notes: "obfuscation" greenwashing

### Example 4
Investors are seeking a better understanding of how climate change may impact the company’s business over the short, medium and long term. They also want to know about the company’s planned response, including how it may need to change its strategy. However, according to EY’s July 2020 report ‘How will ESG performance shape your future?’, based on a global institutional investor survey, companies are failing to meet investors’ expectations on environmental, social and governance factors when compared with 2018.
- binary_sentiment: 0, climate_specificity: 0, commitment: 0
- greenwashing_risk: 1
- **correct?** N/A
- notes: seems like it's coming from a survey from companies

### Example 5
The cement industry is associated with high CO2 intensity and LafargeHolcim is exposed to a variety of regulatory frameworks to reduce emissions, some of which may be under revision. These frameworks can affect the business activities of LafargeHolcim. In addition, a perception of the sector as a high emitter could impact our reputation, thus reducing our attractiveness to investors, employees and potential employees.
- binary_sentiment: 1, climate_specificity: 0, commitment: 0
- greenwashing_risk: 1
- **correct?** No
- notes: admitting bad performance

### Example 6
New products include our \$1 per day electric vehicle charging offer, which will be available in November 2016, and the AGL Future Forests program which enables residential customers to offset carbon emissions based on their electricity consumption for $1 per week. This funds the purchase of native Australian forestry carbon credits to offset those emissions and also supports biodiversity conservation and the planting of Australian native trees.
- binary_sentiment: 1, climate_specificity: 1, commitment: 1
- greenwashing_risk: 0
- **correct?** N/A
- notes: need more context; inconclusive. Climate specificity is not really there

### Example 7
In October 2020, we approved our sustainable finance model, which provides for parameters and management for raising funds for projects classified as sustainable in the global market. This type of funding can be used to finance projects able to offer financial, environmental, social and governance (ESG) benefits.
- binary_sentiment: 1, climate_specificity: 1, commitment: 0
- greenwashing_risk: 1
- **correct?** Yes

### Example 8
As described in the section ‘Working conditions in our supply chain’ in the Report, 921 social compliance and environmental audits at suppliers were performed by inhouse technical staff as well as external third- party monitors commissioned by adidas business entities and licensees. The reasonableness and accuracy of the conclusions from the performed audit work were not part of our limited assurance engagement.
- binary_sentiment: 0, climate_specificity: 1, commitment: 0
- greenwashing_risk: 0
- **correct?** Yes
- notes: looks like an audit statement from external auditor at the end of the report

### Example 9
We disseminate the importance of incorporating the sustainability principles in the planning and execution of actions to the entire value chain. Aiming to generate value for stakeholders and minimize any negative impacts, we have leaders committed to the challenge of reconciling business competitiveness with the construction of a more just and inclusive society.
- binary_sentiment: 1, climate_specificity: 0, commitment: 0
- greenwashing_risk: 1
- **correct?** Yes

### Example 10
Our goal for Mercedes-Benz Cars & Vans is to make our entire new car fleet CO2-neutral by 2039. We plan to achieve this goal using a holistic approach that includes ambitious targets for all stages of automotive value creation — from the supply chain to production, the vehicle use phase, and vehicle disposal and recycling. We plan to offer our customers several electric variants in all Mercedes-Benz car segments (from the smart to large SUVs) by 2022 and to have plug-in hybrids or all-electric vehicles account for more than 50% of our car sales by 2030. By 2030, we also plan to reduce the green- house gas emissions of the new vehicle fleet during the vehicle use phase (“well-to-wheel”) by more than 40% as compared to 2018. This target has been confirmed by the Science Based Targets Initiative (SBTI).
- binary_sentiment: 1, climate_specificity: 1, commitment: 1
- greenwashing_risk: 0
- **correct?** Yes
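
The verdicts for the ten examples above can be tallied with a short sketch. The dictionary simply transcribes the **correct?** entries; the agreement rate is computed only over examples that received a Yes/No verdict, which is our own choice of convention.

```python
from collections import Counter

# "correct?" verdicts transcribed from Examples 1-10
verdicts = {
    1: "No", 2: "Yes", 3: "Yes", 4: "N/A", 5: "No",
    6: "N/A", 7: "Yes", 8: "Yes", 9: "Yes", 10: "Yes",
}

counts = Counter(verdicts.values())
judged = counts["Yes"] + counts["No"]  # examples with a definite verdict
agreement = counts["Yes"] / judged

print(counts)
print(f"agreement on judged examples: {agreement:.2f}")
```

This gives 6 Yes, 2 No, and 2 N/A, so the naive rule agrees with the human verdict on 6 of the 8 judged examples (0.75).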

## Next Steps
1. Fix greenwashing risk labels
  - Given the validated labels, we will create a set of linear equations to solve
  - Ex: We want $$y = a*(sentiment) + b*(commitment) + c*(specificity) + d,$$ where $sigmoid(y)$ will map to the binary greenwashing label we want.
2. Fine-tune ClimateBERT for the downstream task of detecting greenwashing risk
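
The first step could be sketched as a small logistic regression fitted by gradient descent. This is only an illustration under stated assumptions: it uses the eight examples above that received a Yes/No verdict, it assumes that a "No" verdict means the opposite label is the correct one, and the function names, learning rate, and step count are all hypothetical choices.

```python
import math

# (binary_sentiment, commitment, specificity, validated_label)
# ASSUMPTION: a "No" verdict flips the label produced by the naive rule.
validated = [
    (1, 0, 0, 0),  # Example 1: rule said 1, judged incorrect
    (1, 1, 1, 0),  # Example 2
    (1, 0, 0, 1),  # Example 3
    (1, 0, 0, 0),  # Example 5: rule said 1, judged incorrect
    (1, 0, 1, 1),  # Example 7
    (0, 0, 1, 0),  # Example 8
    (1, 0, 0, 1),  # Example 9
    (1, 1, 1, 0),  # Example 10
]

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def fit(rows, lr=0.5, steps=5000):
    """Fit weights [a, b, c, d] by gradient descent on the mean log-loss."""
    weights = [0.0, 0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0, 0.0]
        for sentiment, commitment, specificity, label in rows:
            feats = (sentiment, commitment, specificity, 1.0)  # 1.0 is the bias input
            y = sum(w * f for w, f in zip(weights, feats))
            err = sigmoid(y) - label
            for i, f in enumerate(feats):
                grad[i] += err * f / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def predict_proba(weights, sentiment, commitment, specificity):
    a, b, c, d = weights
    return sigmoid(a * sentiment + b * commitment + c * specificity + d)

weights = fit(validated)
print("a, b, c, d =", [round(w, 3) for w in weights])
print("P(risk | sentiment=1, commitment=1, specificity=1) =",
      round(predict_proba(weights, 1, 1, 1), 3))
```

Four of the eight examples share the features (1, 0, 0) but have conflicting validated labels, so no choice of weights can fit them all; many more validated examples would be needed before the fitted coefficients mean anything.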