# Session 12 - Measuring environmental impact

In this session, we're going to look at one way of measuring the impact of our code on the world around us. Specifically, we're going to see how we can approximate the *environmental impact* of our cultural data science work.

To do this, we're going to use the open-source software package *CodeCarbon*. You can find more information at the following links:

- CodeCarbon Website: [https://codecarbon.io/](https://codecarbon.io/)
- GitHub Repo: [https://github.com/mlco2/codecarbon](https://github.com/mlco2/codecarbon)
- Documentation: [https://mlco2.github.io/codecarbon/](https://mlco2.github.io/codecarbon/)

We'll do some testing on HuggingFace pipelines.

## Testing HuggingFace pipelines

In [1]:
import os
from codecarbon import EmissionsTracker
from transformers import pipeline
import datasets
import pandas as pd
from tqdm.notebook import tqdm

2024-04-25 10:34:53.554293: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-25 10:34:53.558316: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-25 10:34:53.608511: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


__Text summarization pipeline__

You may remember from a couple of weeks ago that *text summarization* was quite a compute-intensive task. So let's see exactly how compute-intensive it is.

In [2]:
text = """In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. 
For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. 
On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. 
In the former task our best model outperforms even all previously reported ensembles."""

In [3]:
summarizer = pipeline(task="summarization", 
                      min_length=10,
                      max_length=30)

No model was supplied, defaulted to google-t5/t5-small and revision d769bba (https://huggingface.co/google-t5/t5-small).
Using a pipeline without specifying a model name and revision in production is not recommended.


config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/242M [00:00<?, ?B/s]

All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


tokenizer_config.json:   0%|          | 0.00/2.32k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

There are a number of different ways that we can work with CodeCarbon, all of which are clearly explained in the relevant documentation.

We'll go through each of them one at a time here.

## Method 1 - Creating a tracker object

In [4]:
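# create the tracker and start measuring
tracker = EmissionsTracker()
tracker.start()
# the code we want to measure
summary = summarizer(text)
# stop measuring; the return value is the estimated emissions in kg CO2eq
tracker.stop()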

[codecarbon INFO @ 10:35:19] [setup] RAM Tracking...
[codecarbon INFO @ 10:35:19] [setup] GPU Tracking...
[codecarbon INFO @ 10:35:19] No GPU found.
[codecarbon INFO @ 10:35:19] [setup] CPU Tracking...
[codecarbon INFO @ 10:35:21] CPU Model on constant consumption mode: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
[codecarbon INFO @ 10:35:21] >>> Tracker's metadata:
[codecarbon INFO @ 10:35:21]   Platform system: Linux-5.4.256.el8-x86_64-with-glibc2.35
[codecarbon INFO @ 10:35:21]   Python version: 3.10.12
[codecarbon INFO @ 10:35:21]   CodeCarbon version: 2.3.5
[codecarbon INFO @ 10:35:21]   Available RAM : 376.535 GB
[codecarbon INFO @ 10:35:21]   CPU count: 64
[codecarbon INFO @ 10:35:21]   CPU model: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
[codecarbon INFO @ 10:35:21]   GPU count: None
[codecarbon INFO @ 10:35:21]   GPU model: None
I0000 00:00:1714034125.135901    6829 service.cc:145] XLA service 0x5566625b2980 initialized for platform Host (this does not guarantee that XLA will be us

7.718634290360947e-05
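The number returned by `tracker.stop()` is an emissions estimate in kilograms of CO2-equivalent, which is hard to read at this scale. The following is a minimal sketch of a formatting helper; `format_emissions` is a hypothetical name of our own and not part of CodeCarbon.

```python
def format_emissions(kg_co2eq: float) -> str:
    """Format an emissions value given in kg CO2eq using a readable unit."""
    grams = kg_co2eq * 1000
    if grams >= 1000:
        return f"{kg_co2eq:.3f} kg CO2eq"
    if grams >= 1:
        return f"{grams:.3f} g CO2eq"
    # anything below one gram is reported in milligrams
    return f"{grams * 1000:.3f} mg CO2eq"


# the value returned by tracker.stop() in the cell above
print(format_emissions(7.718634290360947e-05))
```

In other words, summarizing this one short paragraph was estimated at roughly 77 milligrams of CO2-equivalent on this machine.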

## Method 2 - Context manager

In [5]:
with EmissionsTracker() as tracker:
    summary = summarizer(text)
    print(summary)

[codecarbon INFO @ 10:35:53] [setup] RAM Tracking...
[codecarbon INFO @ 10:35:53] [setup] GPU Tracking...
[codecarbon INFO @ 10:35:53] No GPU found.
[codecarbon INFO @ 10:35:53] [setup] CPU Tracking...
[codecarbon INFO @ 10:35:54] CPU Model on constant consumption mode: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
[codecarbon INFO @ 10:35:54] >>> Tracker's metadata:
[codecarbon INFO @ 10:35:54]   Platform system: Linux-5.4.256.el8-x86_64-with-glibc2.35
[codecarbon INFO @ 10:35:54]   Python version: 3.10.12
[codecarbon INFO @ 10:35:54]   CodeCarbon version: 2.3.5
[codecarbon INFO @ 10:35:54]   Available RAM : 376.535 GB
[codecarbon INFO @ 10:35:54]   CPU count: 64
[codecarbon INFO @ 10:35:54]   CPU model: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
[codecarbon INFO @ 10:35:54]   GPU count: None
[codecarbon INFO @ 10:35:54]   GPU model: None
[codecarbon INFO @ 10:36:05] Energy consumed for RAM : 0.000279 kWh. RAM Power : 141.20075225830078 W
[codecarbon INFO @ 10:36:05] Energy consumed for all 

[{'summary_text': 'the Transformer replaces recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention .'}]


## Method 3 - A Python decorator



In [8]:
from codecarbon import track_emissions

@track_emissions
def summarization(text):
    summary = summarizer(text)
    print(summary)

In [9]:
summarization(text)

[codecarbon INFO @ 10:46:10] [setup] RAM Tracking...
[codecarbon INFO @ 10:46:10] [setup] GPU Tracking...
[codecarbon INFO @ 10:46:10] No GPU found.
[codecarbon INFO @ 10:46:10] [setup] CPU Tracking...
[codecarbon INFO @ 10:46:11] CPU Model on constant consumption mode: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
[codecarbon INFO @ 10:46:11] >>> Tracker's metadata:
[codecarbon INFO @ 10:46:11]   Platform system: Linux-5.4.256.el8-x86_64-with-glibc2.35
[codecarbon INFO @ 10:46:11]   Python version: 3.10.12
[codecarbon INFO @ 10:46:11]   CodeCarbon version: 2.3.5
[codecarbon INFO @ 10:46:11]   Available RAM : 376.535 GB
[codecarbon INFO @ 10:46:11]   CPU count: 64
[codecarbon INFO @ 10:46:11]   CPU model: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
[codecarbon INFO @ 10:46:11]   GPU count: None
[codecarbon INFO @ 10:46:11]   GPU model: None
[codecarbon INFO @ 10:46:22] 
Graceful stopping: collecting and writing information.
Please wait a few seconds...
[codecarbon INFO @ 10:46:22] Energy consu

[{'summary_text': 'the Transformer replaces recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention .'}]
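If decorators are new to you, it may help to see the general pattern: a decorator wraps a function so that some code runs before and after every call. The sketch below is only an analogy built on the standard library. It measures wall-clock time rather than emissions, `track_duration` and `shout` are hypothetical names, and it is not how CodeCarbon implements `track_emissions`.

```python
import functools
import time


def track_duration(func):
    """Hypothetical stand-in for a tracking decorator: times each call to func."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # plays the role of tracker.start()
        try:
            return func(*args, **kwargs)
        finally:
            # runs even if func raises, like stopping a tracker on exit
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.4f} s")
    return wrapper


@track_duration
def shout(text):
    return text.upper()


result = shout("measure me")
```

Note that the wrapper returns whatever the wrapped function returns. The `summarization` function above only prints its result, so calling it gives back `None`.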


## A more complex example

We can make the results more useful by changing the tracker parameters. The full list of parameters can be found in the CodeCarbon documentation: [https://mlco2.github.io/codecarbon/parameters.html](https://mlco2.github.io/codecarbon/parameters.html).

In the example that follows, we're going to download a HuggingFace dataset and a pretrained emotion classification model. 

We also introduce specific *tasks* to more clearly understand the impact of different parts of our code.

In [10]:
outfolder = os.path.join("..", "emissions")
# create the output folder; don't fail if it already exists
os.makedirs(outfolder, exist_ok=True)

tracker = EmissionsTracker(project_name="sentiment classification",
                           experiment_id="sentiment_classifier",
                           output_dir=outfolder,
                           output_file="emissions_sentiment.csv")

# tracking data downloading
tracker.start_task("load dataset")
dataset = datasets.load_dataset("imdb", 
                                split="test")
imdb_emissions = tracker.stop_task()

# tracking downloading and initializing model
tracker.start_task("build model")
classifier = pipeline(task="sentiment-analysis", 
                      model="cardiffnlp/twitter-roberta-base-emotion")
model_emissions = tracker.stop_task()

# tracking classification pipeline
# (first 1000 reviews only, each truncated to its first 100 characters)
tracker.start_task("run classification")
preds = []
for row in tqdm(dataset["text"][:1000]):
    preds.append(classifier(row[:100]))
classifier_emissions = tracker.stop_task()

tracker.stop()

[codecarbon INFO @ 10:51:44] [setup] RAM Tracking...
[codecarbon INFO @ 10:51:44] [setup] GPU Tracking...
[codecarbon INFO @ 10:51:44] No GPU found.
[codecarbon INFO @ 10:51:44] [setup] CPU Tracking...
[codecarbon INFO @ 10:51:46] CPU Model on constant consumption mode: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
[codecarbon INFO @ 10:51:46] >>> Tracker's metadata:
[codecarbon INFO @ 10:51:46]   Platform system: Linux-5.4.256.el8-x86_64-with-glibc2.35
[codecarbon INFO @ 10:51:46]   Python version: 3.10.12
[codecarbon INFO @ 10:51:46]   CodeCarbon version: 2.3.5
[codecarbon INFO @ 10:51:46]   Available RAM : 376.535 GB
[codecarbon INFO @ 10:51:46]   CPU count: 64
[codecarbon INFO @ 10:51:46]   CPU model: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
[codecarbon INFO @ 10:51:46]   GPU count: None
[codecarbon INFO @ 10:51:46]   GPU model: None


Downloading readme:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/21.0M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/20.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/42.0M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

[codecarbon INFO @ 10:52:01] Energy consumed for RAM : 0.000470 kWh. RAM Power : 141.20075225830078 W
[codecarbon INFO @ 10:52:01] Energy consumed for all CPUs : 0.000142 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 10:52:01] 0.000612 kWh of electricity used since the beginning.


config.json:   0%|          | 0.00/768 [00:00<?, ?B/s]

tf_model.h5:   0%|          | 0.00/501M [00:00<?, ?B/s]

All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-emotion.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/150 [00:00<?, ?B/s]

[codecarbon INFO @ 10:52:10] Energy consumed for RAM : 0.000812 kWh. RAM Power : 141.20075225830078 W
[codecarbon INFO @ 10:52:10] Energy consumed for all CPUs : 0.000244 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 10:52:10] 0.001056 kWh of electricity used since the beginning.


  0%|          | 0/1000 [00:00<?, ?it/s]

[codecarbon INFO @ 10:55:54] Energy consumed for RAM : 0.009614 kWh. RAM Power : 141.20075225830078 W
[codecarbon INFO @ 10:55:54] Energy consumed for all CPUs : 0.002894 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 10:55:54] 0.012508 kWh of electricity used since the beginning.
[codecarbon INFO @ 10:55:54] Energy consumed for RAM : 0.009614 kWh. RAM Power : 141.20075225830078 W
[codecarbon INFO @ 10:55:54] Energy consumed for all CPUs : 0.002894 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 10:55:54] 0.012508 kWh of electricity used since the beginning.


0.0022566208286612438
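The log lines and the returned value can be cross-checked with a little arithmetic. The sketch below copies the final figures from the output above and assumes the usual relationship, emissions = energy consumed × carbon intensity of the electricity used. The logged energy values are rounded, so the implied intensity is only approximate.

```python
# figures copied from the last CodeCarbon log lines above
ram_energy_kwh = 0.009614
cpu_energy_kwh = 0.002894
total_energy_kwh = ram_energy_kwh + cpu_energy_kwh

# value returned by tracker.stop(), in kg CO2eq
emissions_kg = 0.0022566208286612438

# assumption: emissions = energy x carbon intensity, so we can back out the intensity
implied_intensity = emissions_kg / total_energy_kwh

print(f"Total energy: {total_energy_kwh:.6f} kWh")
print(f"Implied carbon intensity: {implied_intensity:.3f} kg CO2eq per kWh")
print(f"RAM share of energy: {ram_energy_kwh / total_energy_kwh:.1%}")
```

Two things stand out: the total matches the 0.012508 kWh reported in the log, and on this machine the RAM estimate accounts for most of the energy, more than the CPU.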

__Inspecting the results__

In [12]:
# load the CSV file written by the tracker in the previous cell
emissions_df = pd.read_csv(os.path.join(outfolder, "emissions_sentiment.csv"))

In [None]:
emissions_df.columns
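The values returned by `tracker.stop_task()` earlier (`imdb_emissions`, `model_emissions`, `classifier_emissions`) can also be compared directly. The helper below is a rough sketch: `summarize_tasks` is a hypothetical name, and it assumes that each result object exposes `duration`, `energy_consumed` and `emissions` attributes, which is worth checking against the CodeCarbon version you have installed.

```python
def summarize_tasks(task_results):
    """Build one row per task from a mapping of task name -> result object.

    Assumes each result exposes `duration` (seconds), `energy_consumed` (kWh)
    and `emissions` (kg CO2eq); missing attributes are reported as None.
    """
    rows = []
    for name, result in task_results.items():
        rows.append({
            "task": name,
            "duration_s": getattr(result, "duration", None),
            "energy_kwh": getattr(result, "energy_consumed", None),
            "emissions_kg": getattr(result, "emissions", None),
        })
    return rows
```

In the notebook, this could be called as `pd.DataFrame(summarize_tasks({"load dataset": imdb_emissions, "build model": model_emissions, "run classification": classifier_emissions}))` to get one row per task.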

## Tasks

- Now that you have the basics down, head over and consider Assignment 5!