In [None]:
!pip install codecarbon==2.7.1

# The Environmental Impact of Software Engineering

## Resource Consumption in Software Development

Modern software development, particularly in AI and machine learning, requires substantial computational resources. This has significant environmental implications:

- Training complex models often requires millions of iterations and large datasets.
- Computation times can extend to tens or hundreds of hours on high-performance hardware.
- The energy demand for these processes contributes to a considerable environmental footprint.
- Additional environmental costs come from cooling IT equipment and data centers.

## Hardware Considerations: CPUs, GPUs and Environmental Impact

Central Processing Units (CPUs) and, increasingly, Graphics Processing Units (GPUs) have become crucial in software development, especially for AI and machine learning tasks. Their use comes with environmental considerations:

- GPUs typically consume more energy than traditional CPUs due to their architecture.
- Heat generation from GPUs necessitates extensive cooling systems, further increasing energy and water consumption.
  - Example: AWS data centers use an average of 0.19 L of water per kWh for cooling.
- The hardware lifecycle also contributes to environmental impact:
  - Production requires rare and precious materials, with significant ecological ramifications.
  - Limited lifespan and frequent upgrades generate hard-to-recycle electronic waste.

## Comparative Environmental Impact

To put the environmental impact in perspective:

| Item | CO2e Emissions |
|------|----------------|
| [NVIDIA A100 GPU manufacture](https://dl.acm.org/doi/pdf/10.1145/3581784.3607035) | Up to 25 kg |
| [CPU manufacture](https://api.boavizta.org/docs#/component/cpu_impact_bottom_up_v1_component_cpu_get) | 19 kg (range: 10-80 kg) |
| [1 HP ProBook laptop manufacturing](https://h20195.www2.hp.com/v2/GetDocument.aspx?docname=c08546039) | 112.5 kg |
| [3 000 km car journey](https://impactco2.fr/outils/transport) | 155 kg |
| [1 hot shower in France](https://borisruf.github.io/carbon-footprint-modeling-tool/?id=scenario-1hot-shower) | 162 g |
| [1 kg chicken meat production](https://openknowledge.fao.org/server/api/core/bitstreams/121cc613-3d0f-431c-b083-cc2031dd8826/content) | 1 kg |
| [1 kg beef meat production](https://openknowledge.fao.org/server/api/core/bitstreams/121cc613-3d0f-431c-b083-cc2031dd8826/content) | 30 kg |

## Implications for Software Engineers

As software engineers, we should consider:

1. Optimizing code for efficiency to reduce computational requirements.
2. Designing software architectures that minimize resource usage.
3. Exploring green computing practices in our development processes.
4. Considering the environmental impact when choosing development tools and platforms.
5. Advocating for sustainable practices in software development within our organizations.

By being aware of these environmental impacts, we can work towards more sustainable software engineering practices.

In [None]:
import numpy as np
import pandas as pd

# Data processing
from sklearn.model_selection import train_test_split

# Image processing
import cv2
from PIL import Image
from concurrent.futures import ThreadPoolExecutor
import multiprocessing

# Database query
import os
import sqlite3

# Energy consumption
import time

# First example

In [None]:
def while_loop(n):
    i = 0
    s = 0
    while i < n:
        s += 1
        i += 1
    return s
    
def for_loop(n):
    s = 0
    for i in range(n):
        s += 1
    return s

In [None]:
numb = 100000000

In [None]:
start_time_while = time.time()

while_loop(numb)

end_time_while = time.time()
time_while = end_time_while - start_time_while
print(f"{time_while} s")

In [None]:
start_time_for = time.time()

for_loop(numb)

end_time_for = time.time()
time_for = end_time_for - start_time_for
print(f"{time_for} s")

In CPython, a `for` loop over `range(n)` is usually faster than the equivalent `while` loop: the `range` iterator advances the counter and detects the end of the sequence in C, whereas the `while` loop executes the comparison `i < n` and the increment `i += 1` as interpreted bytecode on every iteration. Fewer executed instructions generally mean a shorter runtime and, on the same hardware, lower energy consumption.
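A single wall-clock measurement is sensitive to whatever else the machine is doing. As a minimal sketch, the comparison can be repeated with the standard library's `timeit` module. The smaller value of `n` and the repeat counts below are assumptions chosen to keep the run short, not values from this notebook.

```python
import timeit

def while_loop(n):
    i = 0
    s = 0
    while i < n:
        s += 1
        i += 1
    return s

def for_loop(n):
    s = 0
    for _ in range(n):
        s += 1
    return s

n = 100_000  # assumption: much smaller than `numb` so repeated runs stay short

# repeat() returns one total time per repetition; the minimum is the run
# least disturbed by other activity on the machine.
best_while = min(timeit.repeat(lambda: while_loop(n), number=10, repeat=5))
best_for = min(timeit.repeat(lambda: for_loop(n), number=10, repeat=5))

print(f"while loop: {best_while:.4f} s for 10 calls")
print(f"for loop:   {best_for:.4f} s for 10 calls")
```

Taking the minimum of several repetitions is a common convention for micro-benchmarks; the absolute numbers will differ from one machine to another.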

# Carbon emissions of locally executed software code
The Python package CodeCarbon is designed to measure the carbon footprint of executing code on a __local__ device. Check out the [documentation](https://mlco2.github.io/codecarbon/) for more details.

### First steps with CodeCarbon
Let's start by importing the package installed earlier:  

_Source: [CodeCarbon](https://mlco2.github.io/codecarbon/)_

In [None]:
from codecarbon import OfflineEmissionsTracker

Start the tracker

/!\ If you use codecarbon version 2.6.0 or lower, it might raise an error:  
_TypeError: BaseEmissionsTracker.\_\_init\_\_() got an unexpected keyword argument 'allow_multiple_runs'_  

In this case, you need to remove the 'allow_multiple_runs=True' parameter of the OfflineEmissionsTracker.  
This parameter was introduced to fix a [bug in version 2.7.0 and 2.7.1](https://github.com/huggingface/optimum-benchmark/issues/260)

In [None]:
tracker1 = OfflineEmissionsTracker(country_iso_code='ESP', allow_multiple_runs=True)

In [None]:
tracker1.start()

print(while_loop(numb))

tracker1.stop()
print(f"Execution time: {tracker1.final_emissions_data.duration:.2f} seconds")
print(f"GHG emissions: {np.format_float_scientific(tracker1.final_emissions_data.emissions, precision=2)} kg CO2e")
print(f"Electricity: {np.format_float_scientific(tracker1.final_emissions_data.energy_consumed, precision=2)} kWh")

In [None]:
tracker2 = OfflineEmissionsTracker(country_iso_code='ESP', allow_multiple_runs=True)

tracker2.start()

print(for_loop(numb))

tracker2.stop()
print(f"Execution time: {tracker2.final_emissions_data.duration:.2f} seconds")
print(f"GHG emissions: {np.format_float_scientific(tracker2.final_emissions_data.emissions, precision=2)} kg CO2e")
print(f"Electricity: {np.format_float_scientific(tracker2.final_emissions_data.energy_consumed, precision=2)} kWh")

# Use a custom Python decorator with CodeCarbon to assess any function

Tracking a function with the codecarbon decorator saves the data in an emissions.csv file in the project root.  
You can customize your decorator to use different tracking parameters and display different metrics.

### CodeCarbon custom decorator parameters
**tracking_mode**: machine (default): entire machine ; process: try to isolate the process

**pue**: 1 (default) ; Older data centers can have a PUE of up to 2.2, whereas newer, greener ones can be as low as 1.1.  
Power usage effectiveness (PUE) measures the energy efficiency of a data center. It is the ratio of the total energy used by the facility (computing equipment plus cooling and other overhead) to the energy used by the computing equipment alone, so a value of 1 means no overhead.

**gpu_ids**: None (default) ; User-specified known GPU ids to track.

**measure_power_secs**: 15 (default) ; Interval (in seconds) to measure hardware power usage. The smaller it is, the more precise the measure is, but the more resources it requires =(

**log_level**: Global codecarbon log level, from most to least verbose: “debug”, “info” (default), “warning”, “error”, or “critical”
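To make the effect of the `pue` parameter concrete, here is a small worked sketch. It assumes that PUE acts as a simple multiplier on the energy drawn by the computing equipment (total facility energy divided by IT equipment energy); the function name and the 10 kWh figure are hypothetical.

```python
def facility_energy_kwh(it_energy_kwh, pue):
    """Scale the energy drawn by IT equipment up to the whole facility."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_energy_kwh * pue

it_energy = 10.0  # kWh used by the computing equipment (hypothetical value)

for pue in (1.0, 1.1, 2.2):
    total = facility_energy_kwh(it_energy, pue)
    overhead = total - it_energy  # cooling and other supporting infrastructure
    print(f"PUE {pue}: total {total:.1f} kWh, overhead {overhead:.1f} kWh")
```

With the default `pue` of 1, the reported energy covers the computing equipment only, which is a reasonable setting for a laptop or workstation.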

In [None]:
import functools

def track_emissions(func):
    """Measure the emissions of every call to `func` with CodeCarbon."""
    @functools.wraps(func)  # keep the name and docstring of the wrapped function
    def wrapper(*args, **kwargs):
        tracker = OfflineEmissionsTracker(country_iso_code='ESP', tracking_mode='machine', log_level='critical', allow_multiple_runs=True)
        tracker.start()
        try:
            result = func(*args, **kwargs)
        finally:
            # Stop the tracker even if the tracked function raises
            tracker.stop()

        print(f"\nExecution time: {tracker.final_emissions_data.duration:.4f} seconds")
        print(f"GHG emissions: {np.format_float_scientific(tracker.final_emissions_data.emissions, precision=2)} kg CO2e")
        print(f"Electricity: {np.format_float_scientific(tracker.final_emissions_data.energy_consumed, precision=2)} kWh\n")

        return result
    return wrapper

### Now assess carbon emissions with your custom decorator

In [None]:
@track_emissions
def while_loop(n):
    i = 0
    s = 0
    while i < n:
        s += 1
        i += 1
    return s

@track_emissions
def for_loop(n):
    s = 0
    for i in range(n):
        s += 1
    return s

print(while_loop(numb))
print(for_loop(numb))

In [None]:
@track_emissions
def sum_range(n):
    return sum(range(n))

@track_emissions
def sum_numpy(n):
    return np.sum(np.arange(n))

@track_emissions
def sum_math(n):
    return (n * (n-1)) // 2

print("Sum range:")
sum_range(numb)
print("\nSum numpy:")
sum_numpy(numb)
print("\nSum math:")
sum_math(numb)

# Software engineering example: Efficient Data Processing

In [None]:
# Generate a synthetic dataset
np.random.seed(18)
X = 2 * np.random.rand(10000000, 1)
y = 4 + 3 * X + np.random.randn(10000000, 1)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=18)

In [None]:
def process_data(X, y):
    # Simulate some data processing
    result = np.dot(X.T, y)
    return result

In [None]:
@track_emissions
def non_optimized_process():
    result = process_data(X_train, y_train)
    print(f"Result shape: {result.shape}")
    print(f"Result: {result}")

In [None]:
@track_emissions
def optimized_process():
    batch_size = 50000
    result = np.zeros((X_train.shape[1], y_train.shape[1]))
    for i in range(0, len(X_train), batch_size):
        end = i + batch_size
        result += process_data(X_train[i:end], y_train[i:end])
    print(f"Result shape: {result.shape}")
    print(f"Result: {result}")

In [None]:
non_optimized_process()

In [None]:
optimized_process()

<ins>Optimization Strategies:</ins>

- **Batch Processing**: Instead of processing all the data at once, we process it in fixed-size batches (50,000 rows here). This caps the size of the working set and of any temporary arrays, which helps most when the full dataset does not fit comfortably in memory.  
_Source: [Batch Processing: The Engine of Efficiency Behind the Scene](https://medium.com/@hayesha1744/batch-processing-the-engine-of-efficiency-behind-the-scene-511ac7536b3d)_
- **Reduce Redundancy**: Batching can also reduce overall processing time and energy consumption, but the gain depends on the workload and the available resources. Compare both versions with the tracker rather than assuming that the batched one wins.
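Batching pays off most clearly when the data never has to be held in memory all at once. The sketch below is a pure-Python illustration of that idea, not the NumPy code used above: `read_pairs` is a hypothetical stand-in for a large data source, and `batched` and `dot_in_batches` are helper names introduced only for this example.

```python
from itertools import islice

def read_pairs(n):
    """Yield (x, y) pairs one at a time, standing in for a large data source."""
    for i in range(n):
        yield float(i), 2.0 * i + 1.0

def batched(iterable, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def dot_in_batches(pairs, batch_size=1000):
    """Accumulate sum(x * y) one batch at a time."""
    total = 0.0
    for batch in batched(pairs, batch_size):
        total += sum(x * y for x, y in batch)
    return total

n = 10_000
streamed = dot_in_batches(read_pairs(n), batch_size=1000)

# Reference: materialize every pair first, then compute the same sum.
all_at_once = sum(x * y for x, y in list(read_pairs(n)))

print(streamed == all_at_once)
```

Only one batch of pairs is alive at any moment in `dot_in_batches`, so peak memory depends on `batch_size` rather than on `n`.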

# Optimizing Image Processing Tasks

In [None]:
class ImageProcessor:
    def __init__(self, kernel_size=15):
        self.kernel_size = kernel_size
        # Creating the Gaussian kernel once
        self.kernel = self._create_gaussian_kernel()
        
    def _create_gaussian_kernel(self):
        """Creates an optimized Gaussian kernel"""
        x, y = np.meshgrid(np.linspace(-2, 2, self.kernel_size), 
                          np.linspace(-2, 2, self.kernel_size))
        d = np.sqrt(x*x + y*y)
        kernel = np.exp(-(d**2 / (2.0 * 2.0)))
        return (kernel / kernel.sum()).astype(np.float32)

    def process_original(self, image_array):
        """Original method not optimized for comparison"""
        start_time = time.time()
        result = np.zeros_like(image_array)
        
        for i in range(3):
            temp = image_array[:,:,i].astype(np.float32)
            for _ in range(5):
                temp = cv2.filter2D(temp, -1, self.kernel)
            result[:,:,i] = temp
            
        return result, time.time() - start_time

    def _process_channel(self, channel):
        """Processes a single image channel"""
        result = channel.astype(np.float32)
        for _ in range(5):
            result = cv2.filter2D(result, -1, self.kernel)
        return result

    def process_parallel(self, image_array):
        """Optimized version with multi-threading"""
        start_time = time.time()
        
        # Channel separation
        channels = [image_array[:,:,i] for i in range(3)]
        
        # Parallel channel processing
        with ThreadPoolExecutor(max_workers=3) as executor:
            processed_channels = list(executor.map(self._process_channel, channels))
            
        # Image reconstruction
        result = np.dstack(processed_channels).astype(np.uint8)
        
        return result, time.time() - start_time

    def process_opencv(self, image_array):
        """OpenCV-only version for maximum performance"""
        start_time = time.time()
        
        # Conversion to float32
        image = image_array.astype(np.float32)
        
        # Applying convolutions with OpenCV
        for _ in range(5):
            image = cv2.filter2D(image, -1, self.kernel)
            
        # Conversion to uint8
        result = np.clip(image, 0, 255).astype(np.uint8)
        return result, time.time() - start_time

In [None]:
def benchmark_all_methods(image_path):
    """Compare all processing methods"""
    # Image loading
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Unable to load the image: {image_path}")
    
    processor = ImageProcessor()
    results = {}
    
    print("Benchmarking in progress...")
    
    # Testing each method
    print("1. Original method test...")
    _, time_original = processor.process_original(img.copy())
    results['Original'] = time_original
    
    print("2. Parallel method test...")
    _, time_parallel = processor.process_parallel(img.copy())
    results['Parallel'] = time_parallel
    
    print("3. OpenCV method test...")
    _, time_opencv = processor.process_opencv(img.copy())
    results['OpenCV'] = time_opencv
    
    # Results display
    print("\nBenchmark results:")
    print("-" * 40)
    for method, duration in results.items():
        print(f"{method:15} : {duration:.3f} seconds")
    
    speedup = time_original/min(time_parallel, time_opencv)
    print(f"\nMaximum acceleration: {speedup:.2f}x")
    
    return results

# Example of use
benchmark_all_methods("images/im1.jpg")

<ins>Optimization Strategies:</ins>
    
- **Use Efficient Libraries:** Leverage optimized libraries like OpenCV or Pillow for image processing tasks.
- **Resize Images:** Process smaller images when possible to reduce computational load.
- **Caching:** Implement caching mechanisms to avoid redundant processing of the same images.
- **Parallel Processing:** Utilize multi-threading for processing multiple images simultaneously.  
_Source: [Energy Efficiency in High-Performance Computing: Balancing Speed and Sustainability](https://developer.nvidia.com/blog/energy-efficiency-in-high-performance-computing-balancing-speed-and-sustainability/)_
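The caching strategy can be sketched as follows. This is a minimal illustration under stated assumptions: `CachedProcessor` and `invert` are hypothetical names, the "image" is a plain `bytes` object rather than an OpenCV array, and the cache is an unbounded in-memory dictionary keyed by a hash of the input.

```python
import hashlib

class CachedProcessor:
    """Wrap a processing function and reuse results for identical inputs."""

    def __init__(self, process):
        self._process = process
        self._cache = {}
        self.misses = 0  # number of times the real processing ran

    def __call__(self, data: bytes) -> bytes:
        # Identical content gives the same key, whatever file it came from.
        key = hashlib.sha256(data).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._process(data)
        return self._cache[key]

def invert(data: bytes) -> bytes:
    """Stand-in for an expensive filter: invert every byte."""
    return bytes(255 - b for b in data)

cached_invert = CachedProcessor(invert)
image = bytes(range(256))       # hypothetical raw pixel data

first = cached_invert(image)    # computed
second = cached_invert(image)   # served from the cache

print(f"Processing ran {cached_invert.misses} time(s) for 2 calls")
```

Hashing the input has a cost of its own, so a cache like this only helps when the processing step is clearly more expensive than the hash and when the same inputs really do recur.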

# Optimizing Database Queries

For any system that relies on a database, overall performance depends heavily on the performance of the database itself and on the quality of the network.

A fast network helps, but it does not replace practices that make the database itself process queries more efficiently.

These practices include adding indexes, maintaining the database regularly, and optimizing queries.

Some tools analyze the queries a database executes, together with their query plans, and suggest optimizations such as indexes to create.
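SQLite itself can show how it intends to execute a query through `EXPLAIN QUERY PLAN`. The sketch below uses a small in-memory database with a hypothetical `orders` table (not the car database used in this notebook) to show the plan before and after an index is created. The exact wording of the plan differs between SQLite versions.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
demo_conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT amount FROM orders WHERE customer_id = 42"

def plan(connection, sql):
    """Return the human-readable steps of SQLite's plan for `sql`."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

plan_before = plan(demo_conn, query)
demo_conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(demo_conn, query)

print("Without index:", plan_before)
print("With index:   ", plan_after)
demo_conn.close()
```

A plan step that starts with `SCAN` reads the whole table, while a step that starts with `SEARCH` and names an index reads only the matching rows.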

In [None]:
db_name = 'Car_Database - Copie.db'

In [None]:
if not os.path.isfile(db_name):
    # sqlite3.connect() would silently create an empty database, so fail early
    raise FileNotFoundError(f"The file {db_name} was not found in the current directory.")

conn = sqlite3.connect(db_name)
cursor = conn.cursor()

In [None]:
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = [row[0] for row in cursor.fetchall()]

# Explore each table
for table in tables:
    print(f"\nTable '{table}':")

    cursor.execute(f"PRAGMA table_info({table});")
    columns = [col[1] for col in cursor.fetchall()]
    print("Columns:", ", ".join(columns))
    
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    length = cursor.fetchone()[0]
    print(f"Number of rows: {length}")

    # Get head of each table
    cursor.execute(f"SELECT * FROM {table} LIMIT 5;")
    rows = cursor.fetchall()
    print("\nFirst rows:")
    for row in rows:
        print(row)
    print('\n#######################')

In [None]:
# Start the timer before execute() so that the whole query is measured
time1 = time.time()
res1 = cursor.execute("""
SELECT c.customer_id, c.first_name, c.last_name, COUNT(cv.option_set_id) AS nombre_d_options
FROM Customers c
JOIN Customer_Ownership co ON c.customer_id = co.customer_id
JOIN Car_Vins cv ON co.vin = cv.vin
JOIN Car_Options copt ON cv.option_set_id = copt.option_set_id
GROUP BY c.customer_id, c.first_name, c.last_name;
""")
print(res1.fetchall())
print(f"Time: {time.time() - time1} s")

## Optimization with indexing

In [None]:
# Create a composite index on the Car_Vins columns used by the joins
cursor.execute("""
CREATE INDEX IF NOT EXISTS customer_options_index ON Car_Vins (vin, option_set_id);
""")

In [None]:
# Start the timer before execute() so that the whole query is measured
time2 = time.time()
res2 = cursor.execute("""
SELECT c.customer_id, c.first_name, c.last_name, COUNT(cv.option_set_id) AS nombre_d_options
FROM Customers c
JOIN Customer_Ownership co ON c.customer_id = co.customer_id
JOIN Car_Vins cv ON co.vin = cv.vin
JOIN Car_Options copt ON cv.option_set_id = copt.option_set_id
GROUP BY c.customer_id, c.first_name, c.last_name;
""")
print(res2.fetchall())
print(f"Time: {time.time() - time2} s")

In [None]:
# Close the database connection
conn.close()

- **Indexing:** Properly index tables to improve query performance.  
_Source: [Green Coding 2/7: Principles of Sustainable Data Management](https://medium.com/just-tech-it-now/green-coding-2-8-principles-of-sustainable-data-management-29edf3cac561#aa5a)_
- **Query Optimization:** Write efficient queries that minimize data scans.
- **Database Schema Design:** Design an efficient schema, including appropriate normalization.
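One common way to minimize the data a query has to move is to let the database filter and aggregate, instead of fetching every row and doing the work in Python. The sketch below illustrates this on a hypothetical in-memory `sales` table; the table, its columns, and its values are invented for the example.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
demo.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("south", 20), ("north", 5), ("east", 7)],
)

# Less efficient: transfer every row, then filter and add up in Python.
rows = demo.execute("SELECT region, amount FROM sales").fetchall()
total_python = sum(amount for region, amount in rows if region == "north")

# More efficient: the database filters and aggregates, returning one value.
total_sql = demo.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?", ("north",)
).fetchone()[0]

print(total_python, total_sql)
demo.close()
```

Both approaches give the same total, but the second returns a single row regardless of the table size, and its `WHERE` clause can benefit from an index on `region`.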

# General Optimization Strategies
- **Code Profiling:** Use profiling tools to identify performance bottlenecks.
- **Algorithmic Efficiency:** Choose and implement efficient algorithms for your tasks.
- **Parallel Processing:** Utilize multi-threading or distributed computing when appropriate.
- **Memory Management:** Optimize memory usage to reduce overall resource consumption.
- **Caching:** Implement caching strategies to avoid redundant computations.
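As a starting point for code profiling, the standard library's `cProfile` and `pstats` modules report where a program spends its time. The sketch below profiles a deliberately naive function; `slow_sum` and `workload` are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately naive: builds a full list before summing it."""
    return sum([i * i for i in range(n)])

def workload():
    return [slow_sum(20_000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Profiling first and optimizing second keeps the effort focused on the functions that dominate the runtime, which are usually also the ones that dominate energy use.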

# Language selection
**C/C++:** These are statically-typed, compiled languages. They are known for their efficiency and performance. Since they are compiled languages, the code is translated into machine code before it's run, which results in faster execution. However, they require more manual memory management, which can be a source of errors if not handled properly.

**Java:** Java is a statically-typed language that is compiled to bytecode and executed by the Java Virtual Machine (JVM). The JVM uses a Just-In-Time (JIT) compiler to convert frequently executed bytecode into machine code at runtime, which makes it faster than pure interpretation. Java also has automatic memory management (garbage collection), which can add performance overhead but removes most manual memory-management errors; memory leaks remain possible when objects stay referenced unintentionally.

**Python:** Python is a dynamically-typed, interpreted language. It is known for its simplicity and readability, but it is generally slower than C/C++ and Java because its code is interpreted at runtime. However, Just-In-Time compilation is available through PyPy, an alternative Python implementation, and Numba, a library that compiles numerical functions; both can significantly speed up certain types of computations.

# Evaluate the environmental impact of one of your projects!
You can use one of the libraries listed above in your projects, then try to reduce the environmental impact.