# 🌍 CodeCarbon Workshop: Tracking Carbon Emissions in Data Science
## A Hands-On Workshop
![CodeCarbon Logo](https://github.com/mlco2/codecarbon/blob/master/docs/edit/images/banner.png?raw=true)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mlco2/codecarbon/blob/docs/workshop-notebook/examples/notebooks/codecarbon_workshop.ipynb)

---

### 🎯 **Workshop Objectives**
By the end of this workshop, you will be able to:
- ✅ Understand the environmental impact of computing and data science
- ✅ Install and configure CodeCarbon for tracking carbon emissions
- ✅ Monitor carbon footprint of Python functions and ML models
- ✅ Analyze and visualize emissions data
- ✅ Compare different algorithms' carbon efficiency
- ✅ Implement best practices for sustainable coding


---

**Let's get started! 🚀**

## 📦 Section 1: Install and Import CodeCarbon

CodeCarbon is an open-source Python package that estimates the carbon emissions of your code. It does so by combining:
- **CPU, GPU, and RAM usage**
- **Energy consumption**
- **Carbon intensity** of your electricity grid
- **Cloud provider** emissions factors
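
The items above combine in a simple way: estimated emissions are the energy consumed multiplied by the carbon intensity of the electricity used. Here is a minimal arithmetic sketch of that idea. The power draw, duration, and grid intensity are made-up illustrative values, and `estimate_emissions_kg` is a hypothetical helper, not part of CodeCarbon:

```python
def estimate_emissions_kg(power_watts: float, duration_s: float,
                          intensity_g_per_kwh: float) -> float:
    """Estimate emissions in kg CO2eq from average power, duration and grid intensity."""
    energy_kwh = power_watts * duration_s / 3_600_000  # W*s (joules) -> kWh
    return energy_kwh * intensity_g_per_kwh / 1000     # g -> kg


# Illustrative values only: a 100 W machine running for one hour
# on a grid emitting 400 g CO2eq per kWh.
example_kg = estimate_emissions_kg(power_watts=100, duration_s=3600, intensity_g_per_kwh=400)
print(f"{example_kg:.3f} kg CO2eq")
```

CodeCarbon measures or estimates the power side for you and looks up the intensity side, so you rarely need to do this by hand, but the arithmetic helps when reading the numbers it reports.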

Let's start by installing it in our environment:

In [None]:
# Install CodeCarbon
!pip install codecarbon

# Also install some libraries we'll use for examples
!pip install scikit-learn matplotlib seaborn pandas

In [None]:
# Import the main CodeCarbon components
from codecarbon import EmissionsTracker, track_emissions
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

print("✅ CodeCarbon and dependencies imported successfully!")
print(f"📊 Pandas version: {pd.__version__}")

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

# Create the output directory for emissions logs
OUTPUT_DIR = "./emissions"
from pathlib import Path
Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)

## ⚙️ Section 2: Basic Carbon Tracking Setup

There are several ways to track emissions with CodeCarbon:

1. **Manual start/stop** - `tracker = EmissionsTracker(); tracker.start(); ...; tracker.stop()`
2. **Context Manager** - `with EmissionsTracker():`
3. **Decorator** - `@track_emissions`
4. **CLI** - `codecarbon monitor`

Let's explore the first three approaches using a simple CPU-intensive task: computing a [Fibonacci](https://en.wikipedia.org/wiki/Fibonacci_sequence) number.

Suppose we want to compute the 35th Fibonacci number and are undecided between three implementations. Let's compare their carbon emissions.
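
Before measuring emissions, it is worth confirming that the candidates return the same values. The sketch below restates the three strategies used in the following cells (naive recursion, closed form, and iteration) under hypothetical names and checks that they agree on small inputs. Note that the closed form relies on floating-point arithmetic, so it can lose exactness for large `n`:

```python
import math


def fib_recursive(n: int) -> int:
    """Naive recursion: exponential time."""
    return n if n <= 1 else fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_closed_form(n: int) -> int:
    """Binet's formula: constant time, but relies on floating-point arithmetic."""
    phi = (1 + math.sqrt(5)) / 2
    return round((phi**n - (-phi) ** (-n)) / math.sqrt(5))


def fib_iterative(n: int) -> int:
    """Loop: linear time, exact integer arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Check agreement on small inputs (kept small so the recursive version stays fast).
for n in range(20):
    assert fib_recursive(n) == fib_closed_form(n) == fib_iterative(n)
print("All three implementations agree for n < 20")
```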

In [None]:
# Method 1: Manual start/stop Approach of codecarbon
print("🔍 Using EmissionsTracker with Manual Start/Stop")

def compute_fibonacci(n):
    if n <= 1:
        return n
    else:
        return compute_fibonacci(n-1) + compute_fibonacci(n-2)

# Track emissions manually
tracker = EmissionsTracker(project_name="fibonacci_1", output_dir=OUTPUT_DIR)
tracker.start()
try:
    # Run our computation
    result = compute_fibonacci(35)
    print(f"Fibonacci(35) = {result}")
finally:
    # Stop tracking and get emissions
    emissions = tracker.stop()

if emissions is None:
    emissions = 0.0  # In case emissions could not be measured

print(f"💨 Carbon emissions: {emissions:.6f} kg CO2eq")
print(f"⏱️  That is {emissions * 1000:.3f} g CO2eq")


In [None]:

# Method 2: Context Manager Approach
print("🔍 Using EmissionsTracker as a Context Manager")
import math

def compute_fibonacci_2(n):
    phi = (1 + math.sqrt(5)) / 2
    return round((phi**n - (-phi)**(-n)) / math.sqrt(5))

with EmissionsTracker(project_name="fibonacci_2", output_dir=OUTPUT_DIR) as tracker:
    # Run our computation
    result = compute_fibonacci_2(35)
    print(f"Fibonacci(35) = {result}")
    

In [None]:
# Method 3: Decorator Approach
print("🔍  Using @track_emissions Decorator")

@track_emissions(project_name="fibonacci_3", output_dir=OUTPUT_DIR)
def compute_fibonacci_3(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n+1):
        a, b = b, a + b
    return b

# Run the decorated function
result = compute_fibonacci_3(35)
print(f"Fibonacci(35) = {result}")
print(f"✅ Emissions automatically tracked and saved! See the emissions log in the {OUTPUT_DIR} folder.")

In [None]:
# See the comparison
print("🔍 Comparing all methods' emissions...")

# Load and compare the emissions
df = pd.read_csv(f"{OUTPUT_DIR}/emissions.csv")
df = df.sort_values(by='timestamp')
df[['project_name', 'duration', 'emissions']].head(3)
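
If you re-run the notebook, new rows are appended to the same `emissions.csv`, so the first three rows may no longer be the latest runs. The sketch below shows one way to keep only the most recent row per project. It uses the standard-library `csv` module so it runs without extra dependencies, the helper name `latest_run_per_project` is hypothetical, and the sample rows are made up rather than measured. It assumes timestamps are ISO 8601 strings in a consistent format:

```python
import csv
import io

# Illustrative rows only: the numbers below are made up, not measured.
sample_csv = """timestamp,project_name,duration,emissions
2024-01-01T10:00:00,fibonacci_1,2.50,0.0000020
2024-01-01T10:00:05,fibonacci_2,0.10,0.0000001
2024-01-01T10:00:10,fibonacci_3,0.12,0.0000001
2024-01-01T10:05:00,fibonacci_1,2.40,0.0000019
"""


def latest_run_per_project(csv_text: str) -> dict:
    """Return {project_name: row} keeping only the most recent row per project."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["project_name"]
        # ISO 8601 timestamps in the same format sort correctly as strings.
        if name not in latest or row["timestamp"] > latest[name]["timestamp"]:
            latest[name] = row
    return latest


for name, row in sorted(latest_run_per_project(sample_csv).items()):
    grams = float(row["emissions"]) * 1000  # kg -> g
    print(f"{name}: {float(row['duration']):.2f} s, {grams:.6f} g CO2eq")
```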


## 🤖 Section 3: Monitor Machine Learning Model Training

This is where CodeCarbon really shines! Let's track the carbon emissions of training different ML models:

In [None]:
# Generate a synthetic dataset for our ML experiments
X, y = make_classification(
    n_samples=10000, 
    n_features=20, 
    n_informative=15, 
    n_redundant=5, 
    n_classes=3, 
    random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"📊 Dataset created: {X_train.shape[0]} training samples, {X_test.shape[0]} test samples")
print(f"📊 Features: {X_train.shape[1]}, Classes: {len(np.unique(y))}")

In [None]:

# Let's track training different model configurations
models_to_test = [
    {"name": "Small RF", "n_estimators": 50, "max_depth": 5},
    {"name": "Medium RF", "n_estimators": 100, "max_depth": 10},
    {"name": "Large RF", "n_estimators": 200, "max_depth": 15}
]

results = []

for model_config in models_to_test:
    print(f"\n🌳 Training {model_config['name']}...")
    
    # Start emissions tracking
    tracker = EmissionsTracker(
        project_name=f"ml_training_{model_config['name'].replace(' ', '_').lower()}",
        output_dir=OUTPUT_DIR
    )
    tracker.start()
    
    # Train the model
    start_time = time.time()
    model = RandomForestClassifier(
        n_estimators=model_config['n_estimators'], # Number of trees in the forest
        max_depth=model_config['max_depth'], # Maximum depth of trees
        random_state=42,
        n_jobs=-1  # Use all available cores
    )
    
    model.fit(X_train, y_train)
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    training_time = time.time() - start_time
    
    # Stop tracking
    emissions = tracker.stop()

    if emissions is None:
        emissions = 0.0  # In case emissions could not be measured
    
    # Store results
    results.append({
        'Model': model_config['name'],
        'Train Accuracy': train_score,
        'Test Accuracy': test_score,
        'Training Time (s)': training_time,
        'CO2 Emissions (kg)': emissions,
        'Emissions per Accuracy': emissions / test_score if test_score > 0  else float('inf')
    })
    
    print(f"✅ {model_config['name']}: {test_score:.3f} accuracy, {emissions:.6f} kg CO2eq")

print("\n🎯 All models trained! Check the results below:")

Now let's analyze our results and see which models are most carbon-efficient:

In [None]:
# Create a DataFrame with our results
df_results = pd.DataFrame(results)
print("📋 Model Comparison Results:")
print("=" * 50)
print(df_results.round(7))

In [None]:
# Create visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. CO2 Emissions by Model
axes[0, 0].bar(df_results['Model'], df_results['CO2 Emissions (kg)'], color='coral')
axes[0, 0].set_title('CO2 Emissions by Model')
axes[0, 0].set_ylabel('CO2 Emissions (kg)')
axes[0, 0].tick_params(axis='x', rotation=45)

# 2. Accuracy vs Emissions
axes[0, 1].scatter(df_results['Test Accuracy'], df_results['CO2 Emissions (kg)'], 
                   s=100, c=['red', 'orange', 'green'], alpha=0.7)
for i, model in enumerate(df_results['Model']):
    axes[0, 1].annotate(model, 
                       (df_results['Test Accuracy'].iloc[i], df_results['CO2 Emissions (kg)'].iloc[i]),
                       xytext=(5, 5), textcoords='offset points')
axes[0, 1].set_title('Accuracy vs Emissions Trade-off')
axes[0, 1].set_xlabel('Test Accuracy')
axes[0, 1].set_ylabel('CO2 Emissions (kg)')

# 3. Training Time vs Emissions
axes[1, 0].scatter(df_results['Training Time (s)'], df_results['CO2 Emissions (kg)'], 
                   s=100, c=['red', 'orange', 'green'], alpha=0.7)
for i, model in enumerate(df_results['Model']):
    axes[1, 0].annotate(model, 
                       (df_results['Training Time (s)'].iloc[i], df_results['CO2 Emissions (kg)'].iloc[i]),
                       xytext=(5, 5), textcoords='offset points')
axes[1, 0].set_title('Training Time vs Emissions')
axes[1, 0].set_xlabel('Training Time (seconds)')
axes[1, 0].set_ylabel('CO2 Emissions (kg)')

# 4. Efficiency Metric (Emissions per Accuracy Point)
axes[1, 1].bar(df_results['Model'], df_results['Emissions per Accuracy'], color='lightblue')
axes[1, 1].set_title('Carbon Efficiency (Lower is Better)')
axes[1, 1].set_ylabel('CO2 per Accuracy Point')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Find the most efficient model
most_efficient = df_results.loc[df_results['Emissions per Accuracy'].idxmin()]
print(f"\n🏆 Most Carbon-Efficient Model: {most_efficient['Model']}")
print(f"   💚 Efficiency Score: {most_efficient['Emissions per Accuracy']:.8f} kg CO2 per accuracy point")
print(f"   📊 Test Accuracy: {most_efficient['Test Accuracy']:.3f}")
print(f"   💨 CO2 Emissions: {most_efficient['CO2 Emissions (kg)']:.6f} kg")

## ⚙️ Section 4: Configure Tracking Options

CodeCarbon offers many configuration options for different scenarios. See the [docs](https://codecarbon.io/docs/configuration/) for more details.

If we want to avoid specifying options every time we create a tracker, we can create a `.codecarbon.config` file in our project directory to set default options:

In [None]:
# We can create a config file `.codecarbon.config` to set default parameters
PROJECT_ID="" #TODO: your project ID
ORGANIZATION_ID="" #TODO: your organization ID
EXPERIMENT_ID="" #TODO: your experiment ID
API_KEY="" #TODO: your API key

with open(".codecarbon.config", "w") as f:
    f.write("""
[codecarbon]
# Set default output directory for all trackers
output_dir = ./emissions
# Set default project name
project_name = default_project
# Enable/disable logging to CSV file
save_to_file = True
# Enable/disable logging to console
save_to_logger = True
# Set the frequency of emissions updates (in seconds)
measure_power_secs = 5
            
# To connect to the CodeCarbon Dashboard, set your project, organization, experiment IDs and API key
project_id = """ + PROJECT_ID + """
organization_id = """ + ORGANIZATION_ID + """
experiment_id = """ + EXPERIMENT_ID + """
api_key = """ + API_KEY + """

    """)

Those options will be used unless we override them when creating a tracker.
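
To make the override rule concrete, here is a small sketch that parses a trimmed-down copy of the file with the standard-library `configparser` and merges explicit arguments on top. This is not how CodeCarbon is implemented internally; `resolve_options` is a hypothetical helper that only illustrates the precedence:

```python
import configparser

# A trimmed-down copy of the file written above (illustrative values).
config_text = """
[codecarbon]
output_dir = ./emissions
project_name = default_project
measure_power_secs = 5
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)
file_defaults = dict(parser["codecarbon"])


def resolve_options(file_defaults: dict, **overrides) -> dict:
    """Hypothetical helper: explicit arguments win over values from the file."""
    return {**file_defaults, **overrides}


options = resolve_options(file_defaults, project_name="fibonacci_1")
print(options["project_name"])        # explicit argument wins
print(options["measure_power_secs"])  # falls back to the file (read as a string)
```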

In [None]:
# No internet? No problem!
from codecarbon import OfflineEmissionsTracker

# Offline Mode (useful when internet is unavailable)
print("\n 📡 Offline Mode Configuration:")
offline_tracker = OfflineEmissionsTracker(
    project_name="offline_demo",
    country_iso_code="USA",  # Manually specify country
    save_to_file=True
)

print("   ✅ Offline mode: No internet connection needed")
print("   🌍 Country specified: USA")


## 📊 Section 5: CodeCarbon Dashboard - Analyze and Visualize Emissions Data

CodeCarbon provides an online dashboard to visualize and analyze your emissions data. You can upload your emissions data to the [CodeCarbon Dashboard](https://dashboard.codecarbon.io/) for detailed insights.

1. Go to the [CodeCarbon Dashboard](https://dashboard.codecarbon.io/) and create a free account.
1. Create a new project.
1. Create a new experiment within that project.
1. In the project settings, create an API key to start pushing data from your local runs to the dashboard.



## 🎯 Hands-On Exercise: Your Turn!

Now it's your turn to practice! Complete this exercise to reinforce what you've learned:

In [None]:
# 🎯 EXERCISE: Compare Two Data Processing Approaches
# Your task: Track and compare the carbon emissions of two different approaches
# to the same data processing task
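
If you are unsure where to start, the sketch below is one possible scaffold, not a reference solution. `run_tracked`, the two `sum_squares_*` functions, and `StubTracker` are hypothetical names introduced for illustration. The stub only mimics the `start()`/`stop()` shape used earlier so that the sketch runs without CodeCarbon installed:

```python
def run_tracked(label, func, make_tracker):
    """Run func() between tracker.start() and tracker.stop(); return (result, emissions)."""
    tracker = make_tracker(label)
    tracker.start()
    try:
        result = func()
    finally:
        emissions = tracker.stop()
    return result, emissions


def sum_squares_loop(n=100_000):
    """Approach A: explicit loop with an accumulator."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_squares_builtin(n=100_000):
    """Approach B: built-in sum over a generator expression."""
    return sum(i * i for i in range(n))


class StubTracker:
    """Stand-in with the same start/stop shape, so the sketch runs without CodeCarbon."""

    def __init__(self, label):
        self.label = label

    def start(self):
        pass

    def stop(self):
        return None  # a real tracker returns kg CO2eq


# In the workshop, pass a factory for a real tracker instead, for example:
# lambda label: EmissionsTracker(project_name=label, output_dir=OUTPUT_DIR)
for label, func in [("loop", sum_squares_loop), ("builtin", sum_squares_builtin)]:
    result, emissions = run_tracked(label, func, StubTracker)
    print(label, result, emissions)
```

Swap in your own two data processing functions, then compare the rows written to the emissions log as in Section 2.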



## 🎯 Workshop Wrap-Up & Next Steps

Congratulations! You've completed the CodeCarbon workshop. Here are some next steps and resources:


### 🚀 Next Steps & Resources:

#### 📚 **Learning Resources:**
- 📖 [CodeCarbon Documentation](https://mlco2.github.io/codecarbon/)
- 📈 [ML CO2 Impact Calculator](https://mlco2.github.io/impact/)

#### 🛠️ **Integration Ideas:**
- Add CodeCarbon to your existing ML pipelines
- Integrate with MLOps tools (MLflow, Weights & Biases)
- Set up automated carbon reporting for your team
- Create carbon budgets for your projects

#### 🌱 **Community & Contribution:**
- ⭐ Star the [CodeCarbon repository](https://github.com/mlco2/codecarbon)
- 💬 Join discussions on sustainable ML
- 📝 Share your carbon-efficient practices
- 🐛 Report bugs and contribute improvements

#### 📊 **Real-World Application:**
1. **Start Small**: Add tracking to one project this week
2. **Measure**: Establish baseline emissions for your workflows  
3. **Optimize**: Implement 2-3 best practices from today
4. **Share**: Present findings to your team
5. **Scale**: Roll out to more projects

---

### 🎯 **Challenge for This Week:**
Choose one of your existing data science projects and:
1. Add CodeCarbon tracking
2. Identify the most carbon-intensive steps
3. Implement one optimization
4. Measure the improvement

---

### 🙋‍♀️ **Questions & Support:**
- GitHub Issues: [codecarbon/issues](https://github.com/mlco2/codecarbon/issues)
- GitHub Discussions: [codecarbon/discussions](https://github.com/mlco2/codecarbon/discussions)
- LinkedIn: [CodeCarbon](https://www.linkedin.com/company/codecarbon/)

---

**Thank you for participating in this CodeCarbon workshop! 🌍💚**
