# Plotting Emissions Data from HERWIG Simulation
Rosie Schiffmann <br>
University of Manchester <br>
July 2025

# 1. Introduction
This notebook plots energy and power consumption data obtained from running particle physics event generation simulations with HERWIG. CPU and RAM energy and power consumption, alongside estimated CO2e emissions, were tracked by CodeCarbon. The simulation complexity was varied across runs by changing the total number of events generated by HERWIG, and tracking was performed separately for the integration and generation phases of the simulation. During the integration phase, the total cross section is calculated by numerically integrating the squared matrix elements over the allowed phase space of the process. In the generation phase, HERWIG samples specific final states from the probability distribution obtained in the integration phase, and then simulates parton showers, hadronisation, and hadron decays using Monte Carlo techniques.

Data for individual runs from CodeCarbon is stored in CSV files, which follow the naming convention YYYYMMDDType_metadata-EVENTS-JOBS.csv. YYYYMMDD is the date on which the simulations were run. Type is either Int or Gen, for data recorded during the integration or generation phase respectively. EVENTS is the total number of simulated events; this variable can be changed inside the run_herwig_with_cc_loop.ipynb file. JOBS is the number of parallel jobs that HERWIG uses to speed up computation by utilising multiple cores. Data covering the emissions produced by all 10 runs as a whole can be found in CSV files named YYYYMMDDType_emissions_EVENTS-JOBS.csv.
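
As an illustration of the naming convention, the sketch below splits a per-run filename into its parts. It assumes the hyphen-separated form matched by the glob patterns used later in this notebook; the helper `parse_run_filename` and the example filename are hypothetical and not part of the original analysis.

```python
import re
from datetime import datetime

# Assumed layout: YYYYMMDD + Type + "_metadata-" + EVENTS + "-" + JOBS + ".csv"
FILENAME_PATTERN = re.compile(
    r"(?P<date>\d{8})(?P<phase>Int|Gen)_metadata-(?P<events>\d+)-(?P<jobs>\d+)\.csv$"
)

def parse_run_filename(filename):
    """Return the date, phase, event count and job count encoded in a filename."""
    match = FILENAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"Filename does not follow the expected convention: {filename}")
    return {
        "date": datetime.strptime(match["date"], "%Y%m%d").date(),
        "phase": match["phase"],          # "Int" or "Gen"
        "events": int(match["events"]),   # total number of simulated events
        "jobs": int(match["jobs"]),       # number of parallel HERWIG jobs
    }

# Hypothetical filename, used only to demonstrate the parser
info = parse_run_filename("20250701Gen_metadata-1000-4.csv")
print(info)
```

Parsing all fields at once, rather than only the event count, would make it easier to extend the analysis to runs with a different number of jobs.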

Plotting functions in this notebook are taken from the emission_tracking.ipynb notebook in the https://github.com/rosieschiffmann/event-Transport-Simulation-Energy-Estimation GitHub repository.

In [2]:
#import python libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import glob
import re

# 2. Data Processing

Here, we combine all of the individual CSV files into two separate master CSV files: "Int_master_emissions_data.csv" and "Gen_master_emissions_data.csv". To do this, mean values of the emissions, duration, and CPU/RAM energy and power consumption are calculated for each raw data file, with each file corresponding to a different number of events.
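
For reference, the statistics computed in the code that follows can be written out explicitly. The symbols are introduced here for illustration: $x_i$ is one tracked quantity (for example the duration) in row $i$ of a raw file, $N$ is the number of rows in that file (assumed to be one row per repeated run), and $n_\text{events}$ is the number of events read from the filename. pandas' `std()` uses the sample standard deviation ($N-1$ in the denominator) by default.

$$
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2},
\qquad
\sigma_{\bar{x}} = \frac{s}{\sqrt{N}}
$$

Because $n_\text{events}$ is known exactly, the per-event quantities and their uncertainties follow by simple scaling:

$$
x_\text{per event} = \frac{\bar{x}}{n_\text{events}},
\qquad
\sigma_\text{per event} = \frac{\sigma_{\bar{x}}}{n_\text{events}}
$$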

In [6]:
#identify relevant csv files inside repository
int_files = glob.glob("*Int_metadata-*-4.csv")
gen_files = glob.glob("*Gen_metadata-*-4.csv")

def generate_master_csv(files, type):
	"""
	Function to generate a master csv file containing all data for plotting, for either the integration or generation
	phase of HERWIG.

	Parameters:
	- files (list of str): list of filenames to include in master csv
	- type (str): "Int" or "Gen" for integration phase data or generation phase data respectively
	"""
	summary_data = []

	for file in files:
		#identify and find mean of relevant data from CodeCarbon raw outputs
		df = pd.read_csv(file)
		mean_emissions = df['emissions'].mean()
		mean_duration = df['duration'].mean()
		mean_cpu_power = df['cpu_power'].mean()
		mean_ram_power = df['ram_power'].mean()
		mean_cpu_energy = df['cpu_energy'].mean()
		mean_ram_energy = df['ram_energy'].mean()
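		#obtain number of events from filename; fail early if the name does not match,
		#since the per-event quantities below cannot be computed without it
		match = re.search(r'_metadata-(\d+)-4\.csv', file)
		if match is None:
			raise ValueError(f"Could not read number of events from filename: {file}")
		events = int(match.group(1))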

		#standard error of the mean (sample std / sqrt(N)) for each quantity
		emissions_err = df['emissions'].std() / np.sqrt(len(df['emissions']))
		duration_err = df['duration'].std() / np.sqrt(len(df['duration']))
		cpu_power_err = df['cpu_power'].std() / np.sqrt(len(df['cpu_power']))
		ram_power_err = df['ram_power'].std() / np.sqrt(len(df['ram_power']))
		cpu_energy_err = df['cpu_energy'].std() / np.sqrt(len(df['cpu_energy']))
		ram_energy_err = df['ram_energy'].std() / np.sqrt(len(df['ram_energy']))
		
		#add dictionary of data to summary_data list 
		summary_data.append({
			'filename': file,
			'number_of_events': events,
			'mean_emissions': mean_emissions,
			'emissions_err' : emissions_err,
			'mean_duration': mean_duration,
			'duration_err' : duration_err,
			'mean_cpu_power': mean_cpu_power,
			'cpu_power_err' : cpu_power_err,
			'mean_ram_power' : mean_ram_power,
			'ram_power_err' : ram_power_err,
			'mean_cpu_energy' : mean_cpu_energy,
			'cpu_energy_err' : cpu_energy_err,
			'mean_ram_energy' : mean_ram_energy,
			'ram_energy_err' : ram_energy_err,
			'cpu_energy_per_event' : mean_cpu_energy / events,
			'cpu_energy_per_event_err' : cpu_energy_err / events,
			'ram_energy_per_event' : mean_ram_energy / events,
			'ram_energy_per_event_err' : ram_energy_err / events,
			'duration_per_event' : mean_duration / events,
			'duration_per_event_err': duration_err / events
		})

	#convert into dataframe and create master csv file.
	summary_df = pd.DataFrame(summary_data)
	summary_df = summary_df.sort_values(by='number_of_events') #sort wrt number_of_events
	summary_df.to_csv(f"{type}_master_emissions_data.csv", index=False)
	return summary_df

int_summary_df = generate_master_csv(int_files, "Int")
gen_summary_df = generate_master_csv(gen_files, "Gen")
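
The per-column mean and standard error can also be obtained in a single call. The sketch below is an alternative formulation rather than the method used above: it relies on pandas' built-in `sem` aggregation and runs on made-up numbers standing in for one CodeCarbon output file, then cross-checks the result against the explicit `std() / sqrt(N)` form.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one raw CodeCarbon file (values are invented for illustration)
df = pd.DataFrame({
    "emissions": [1.0e-4, 1.2e-4, 0.9e-4, 1.1e-4],
    "duration": [10.0, 12.0, 9.0, 11.0],
    "cpu_energy": [2.0e-3, 2.4e-3, 1.8e-3, 2.2e-3],
})

columns = ["emissions", "duration", "cpu_energy"]

# One call returns a small frame with rows "mean" and "sem" and one column per quantity
stats = df[columns].agg(["mean", "sem"])

# Cross-check one column against the explicit formula used in generate_master_csv
manual_sem = df["duration"].std() / np.sqrt(len(df["duration"]))
print(stats)
print(np.isclose(stats.loc["sem", "duration"], manual_sem))
```

One difference to be aware of: `sem` ignores missing values when counting rows, whereas `len(df[...])` counts them, so the two forms agree only when a column has no missing entries.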
