<center><span style="font-size:30px; font-weight: bold;">Nordic Compass Database</span></center>
<center><span style="font-size:24px;">Analysis of ESG Performance and CSRD Compliance</span></center>

<center><span style="font-size:22px;"><b>Section 2:</b> Data manipulation and EDA </span></center>

## Define a base year

# CLEAN su_aud_disclose

I drop all rows dated before 2019. I chose 2019 because it is the base year for the Science Based Targets initiative's Business Ambition for 1.5°C campaign (SBTI, 2024), which increased the number of companies that made climate commitments by over 80%. A common base year makes companies easier to compare, and 2019 itself predates the effects of Covid-19 on business performance.

In [None]:
df = df.drop(df[df["year"] < 2019].index)

df["year"].value_counts()

year
2020.0    491
2019.0    486
2021.0    439
2022.0    422
Name: count, dtype: int64

## New columns

In [None]:
df.columns

Index(['comp_name', 'ticker', 'year', 'segment', 'industry', 'hq_country',
       'ceo_sust_statem', 'sales', 'env_policy', 'ep_targets',
       'env_impact_red', 'energy_consump', 'incr_renew_en', 'disclosure_raw',
       'resource_target', 'water_withdraw', 'water_disclose', 'ghg_emis',
       'transport_emis', 'audit_es_report', 'su_guidelines', 'su_aud_disclose',
       'su_eva_disclose', 'su_env_assess'],
      dtype='object')

In [None]:
# base_year: the earliest year (2019 or later) in which each company appears
df["base_year"] = df.groupby("comp_name")["year"].transform("min")

df.head()

Unnamed: 0,comp_name,ticker,year,segment,industry,hq_country,ceo_sust_statem,sales,env_policy,ep_targets,...,water_withdraw,water_disclose,ghg_emis,transport_emis,audit_es_report,su_guidelines,su_aud_disclose,su_eva_disclose,su_env_assess,base_year
2,Archer Ltd.,ARCHO,2020.0,Mid,Energy,Norway,1,735.714286,1,1,...,,0,,,1,1,1,0,0,2020.0
3,AutoStore Holdings Ltd.,AUTO,2021.0,Large,Industrial Goods and Services,Bermuda,1,292.5,1,0,...,,0,,371.9243,0,1,0,1,0,2021.0
4,Avance Gas Holding ltd,AGAS,2019.0,Mid,Energy,Norway,1,223.590179,1,1,...,,0,,,0,0,0,0,0,2019.0
5,Avance Gas Holding ltd,AGAS,2020.0,Mid,Energy,Norway,1,183.675,1,1,...,,0,,,1,1,0,0,0,2019.0
7,Borr Drilling Ltd,BDRILL,2019.0,Mid,Energy,Bermuda,0,291.848552,1,0,...,,0,150.784,43.671,0,1,0,0,0,2019.0


In [None]:
# # fix this

# # 1. Create 'years_esg_data' by counting the number of rows for each 'comp_name'
# df['years_esg_data'] = df.groupby('comp_name')['year'].transform('count')

# # 2. Create 'consecutive_years_esg_data' by checking consecutive years starting from 2022
# def calculate_consecutive_years(group):
#     # Create a set of years for the current 'comp_name'
#     years = set(group['year'])

#     # Start from 2022 and count consecutive years backwards
#     count = 0
#     for year in range(2022, 2018, -1):  # Checking years 2022, 2021, 2020, 2019
#         if year in years:
#             count += 1
#         else:
#             break  # Stop if any year is missing in the consecutive sequence

#     return count

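# # Apply the function to each group of 'comp_name' (one count per company),
# # then map the counts back onto the rows by company name
# consecutive_years = df.groupby('comp_name').apply(calculate_consecutive_years)
# df['consecutive_years_esg_data'] = df['comp_name'].map(consecutive_years)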

# # Optionally, display the results
# df[['comp_name', 'year', 'years_esg_data', 'consecutive_years_esg_data']].head()
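
A minimal sketch of the two columns drafted above, run on a toy frame rather than the real data. The company names, the `consecutive_years` helper and the 2019 to 2022 window are assumptions for illustration; the key point is that `groupby(...).apply(...)` returns one value per company, so the result is mapped back onto the rows by `comp_name`.

```python
import pandas as pd

# Toy data (hypothetical companies); 'year' is float, as in the notebook output.
toy = pd.DataFrame({
    "comp_name": ["A", "A", "A", "B", "B", "C"],
    "year": [2020.0, 2021.0, 2022.0, 2019.0, 2022.0, 2021.0],
})


def consecutive_years(years, latest=2022, earliest=2019):
    """Count consecutive reporting years, walking back from `latest`."""
    present = set(years.dropna().astype(int))
    count = 0
    for year in range(latest, earliest - 1, -1):
        if year not in present:
            break  # stop at the first gap
        count += 1
    return count


# Number of rows (reporting years) per company, broadcast to every row.
toy["years_esg_data"] = toy.groupby("comp_name")["year"].transform("count")

# One value per company, then mapped back onto the rows by company name.
streaks = toy.groupby("comp_name")["year"].apply(consecutive_years)
toy["consecutive_years_esg_data"] = toy["comp_name"].map(streaks)

print(toy)
```

Company B reports in 2019 and 2022 only, so it has two years of data but a streak of one; company C has no 2022 row, so its streak is zero.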

In [None]:
df = df.rename(columns={"sales": "revenue_EUR"})

df.head()

Unnamed: 0,comp_name,ticker,year,segment,industry,hq_country,ceo_sust_statem,revenue_EUR,env_policy,ep_targets,...,water_withdraw,water_disclose,ghg_emis,transport_emis,audit_es_report,su_guidelines,su_aud_disclose,su_eva_disclose,su_env_assess,base_year
2,Archer Ltd.,ARCHO,2020.0,Mid,Energy,Norway,1,735.714286,1,1,...,,0,,,1,1,1,0,0,2020.0
3,AutoStore Holdings Ltd.,AUTO,2021.0,Large,Industrial Goods and Services,Bermuda,1,292.5,1,0,...,,0,,371.9243,0,1,0,1,0,2021.0
4,Avance Gas Holding ltd,AGAS,2019.0,Mid,Energy,Norway,1,223.590179,1,1,...,,0,,,0,0,0,0,0,2019.0
5,Avance Gas Holding ltd,AGAS,2020.0,Mid,Energy,Norway,1,183.675,1,1,...,,0,,,1,1,0,0,0,2019.0
7,Borr Drilling Ltd,BDRILL,2019.0,Mid,Energy,Bermuda,0,291.848552,1,0,...,,0,150.784,43.671,0,1,0,0,0,2019.0


In [None]:
# First convert both columns to numeric
df["ghg_emis"] = pd.to_numeric(df["ghg_emis"], errors="coerce")
df["revenue_EUR"] = pd.to_numeric(df["revenue_EUR"], errors="coerce")

# Then perform the division
df["ghg_emis_per_EUR_revenue"] = df["ghg_emis"] / df["revenue_EUR"]

df.head(1)

Unnamed: 0,comp_name,ticker,year,segment,industry,hq_country,ceo_sust_statem,revenue_EUR,env_policy,ep_targets,...,water_disclose,ghg_emis,transport_emis,audit_es_report,su_guidelines,su_aud_disclose,su_eva_disclose,su_env_assess,base_year,ghg_emis_per_EUR_revenue
2,Archer Ltd.,ARCHO,2020.0,Mid,Energy,Norway,1,735.714286,1,1,...,0,,,1,1,1,0,0,2020.0,


In [None]:
# First convert both columns to numeric
df["water_withdraw"] = pd.to_numeric(df["water_withdraw"], errors="coerce")
df["revenue_EUR"] = pd.to_numeric(df["revenue_EUR"], errors="coerce")

# Then perform the division
df["water_withdraw_per_EUR_revenue"] = df["water_withdraw"] / df["revenue_EUR"]

In [None]:
# folder_path = r"C:\Users\james\OneDrive - University of Aberdeen\01 - Turing College\\D99 - Capstone Project\ESG Ratings Project - Nordic Compass"

# df.to_csv(f'{folder_path}/nordic_compass_df_modified.csv', index=False)

In [None]:
df.columns

Index(['comp_name', 'ticker', 'year', 'segment', 'industry', 'hq_country',
       'ceo_sust_statem', 'revenue_EUR', 'env_policy', 'ep_targets',
       'env_impact_red', 'energy_consump', 'incr_renew_en', 'disclosure_raw',
       'resource_target', 'water_withdraw', 'water_disclose', 'ghg_emis',
       'transport_emis', 'audit_es_report', 'su_guidelines', 'su_aud_disclose',
       'su_eva_disclose', 'su_env_assess', 'base_year',
       'ghg_emis_per_EUR_revenue', 'water_withdraw_per_EUR_revenue'],
      dtype='object')

# To do:

#### Cleaning
Remove whitespace/commas from all values

Convert all 1s and 0s to NaNs for certain columns (this can be done in a single step)

Check 'energy_consump', 'resource_target', 'water_withdraw' and 'ghg_emis' for nonsensical values (there are a few)
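
One possible shape for the whitespace/comma step, shown on toy values. It assumes the commas are thousands separators; if any source file uses a decimal comma instead, the replacement would need to change. The column list and sample strings are illustrative, not taken from the dataset.

```python
import pandas as pd

# Toy frame (hypothetical values) with the formatting problems noted above.
toy = pd.DataFrame({
    "energy_consump": [" 1,200 ", "350", "n/a"],
    "ghg_emis": ["150.784", " 43.671", ""],
})

numeric_cols = ["energy_consump", "ghg_emis"]  # assumed list of metric columns

for col in numeric_cols:
    cleaned = (
        toy[col]
        .astype(str)
        .str.strip()                        # drop surrounding whitespace
        .str.replace(",", "", regex=False)  # drop commas (assumed thousands separators)
    )
    # Anything that is still not a number (e.g. "n/a", "") becomes NaN.
    toy[col] = pd.to_numeric(cleaned, errors="coerce")

print(toy)
```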



#### New columns


Create 'years_esg_data' and 'consecutive_years_esg_data' columns

Create 'gap analysis: total missing metrics (coverage of metrics)'
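
A rough sketch of the gap-analysis column: count the missing metrics in each row and express coverage as a share. Which columns count as "metrics" is an assumption here, and `missing_metrics` and `metric_coverage` are hypothetical column names.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical companies and values).
toy = pd.DataFrame({
    "comp_name": ["A", "B", "C"],
    "energy_consump": [10.0, np.nan, np.nan],
    "water_withdraw": [np.nan, np.nan, 5.0],
    "ghg_emis": [1.0, np.nan, 2.0],
    "transport_emis": [0.5, np.nan, np.nan],
})

# Assumed set of quantitative metrics to check for gaps.
metric_cols = ["energy_consump", "water_withdraw", "ghg_emis", "transport_emis"]

# Number of missing metrics per row, and the share of metrics that are present.
toy["missing_metrics"] = toy[metric_cols].isna().sum(axis=1)
toy["metric_coverage"] = 1 - toy["missing_metrics"] / len(metric_cols)

print(toy[["comp_name", "missing_metrics", "metric_coverage"]])
```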

Create a column: 'GHG per EUR revenue_ranking_all_companies' - This is binned from 1 to 10 (using deciles, calculated only from values in the same year)

Create a column: 'GHG per EUR revenue_ranking_sector' - This is also binned from 1 to 10 (calculated only from values in the same sector and year)

Calculate the average GHG per EUR revenue as well as IQR--apply the outlier transformation and put all outliers in the '0' bin
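
The ranking and outlier steps could be sketched as follows. Several choices here are assumptions rather than settled decisions: outliers are defined by the 1.5 × IQR rule, they are removed before the remaining values are ranked, bins are assigned by rank so ties and small groups do not break the binning, and the output column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one year, two sectors, one extreme value.
toy = pd.DataFrame({
    "year": [2022] * 11,
    "industry": ["Energy"] * 6 + ["Retail"] * 5,
    "ghg_emis_per_EUR_revenue": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                                 7.0, 8.0, 9.0, 10.0, 500.0],
})
COL = "ghg_emis_per_EUR_revenue"


def decile_bins(values):
    """Bin non-missing values from 1 (lowest) to 10 (highest) by rank."""
    ranks = values.rank(method="first")  # 1..n; NaN stays NaN
    return np.ceil(ranks * 10 / values.count())


def add_ranking(df, group_cols, out_col, k=1.5):
    """Rank COL into bins 1-10 within each group; IQR outliers go to bin 0."""
    grouped = df.groupby(group_cols)[COL]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    outlier = (df[COL] < q1 - k * iqr) | (df[COL] > q3 + k * iqr)

    # Rank only the non-outliers, then assign the outliers to bin 0.
    kept = df.assign(_kept=df[COL].mask(outlier))
    bins = kept.groupby(group_cols)["_kept"].transform(decile_bins)
    df[out_col] = bins.mask(outlier, 0)
    return df


toy = add_ranking(toy, ["year"], "ghg_intensity_rank_all")
toy = add_ranking(toy, ["year", "industry"], "ghg_intensity_rank_sector")
print(toy)
```

In a sector with fewer than ten companies, some bins stay empty; the lowest-ranked company does not necessarily land in bin 1, which is worth keeping in mind when comparing sector bins with the all-company bins.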

Create a column: 'GHG per EUR revenue_ranking_all_PY' -- This is to compare to the values from the previous year

Create a column: 'GHG per EUR revenue_ranking_sector_PY' -- This is to compare to the values from the previous year

Create a column: '% change in GHG per EUR revenue vs PY'

Create a column: '% change in GHG emissions vs PY'

Create a column: 'transport emissions as a % of total emissions'

Create a column: '% change in transport emissions vs PY'

Create a column: 'Transport emissions as % of total emissions' (compare to sector)
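
A sketch of the previous-year (PY) comparisons on toy data. It assumes that 'ghg_emis' is total emissions and includes 'transport_emis', which should be checked against the database documentation. Previous-year values are attached with a self-merge on `year + 1` rather than a simple shift, so a company with a gap in its reporting gets NaN instead of a comparison against an older year. The new column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): company A has no 2021 row.
toy = pd.DataFrame({
    "comp_name": ["A", "A", "A", "B", "B"],
    "year": [2019, 2020, 2022, 2021, 2022],
    "ghg_emis": [100.0, 80.0, 60.0, 50.0, 55.0],
    "transport_emis": [20.0, 20.0, np.nan, 5.0, 11.0],
})

# Shift a copy forward by one year so that it lines up with the following year.
prev = toy[["comp_name", "year", "ghg_emis", "transport_emis"]].copy()
prev["year"] = prev["year"] + 1
toy = toy.merge(prev, on=["comp_name", "year"], how="left", suffixes=("", "_PY"))

# Percentage changes versus the previous year.
toy["ghg_emis_pct_change_vs_PY"] = (toy["ghg_emis"] / toy["ghg_emis_PY"] - 1) * 100
toy["transport_emis_pct_change_vs_PY"] = (
    toy["transport_emis"] / toy["transport_emis_PY"] - 1
) * 100

# Transport emissions as a percentage of total emissions (see assumption above).
toy["transport_pct_of_total_emis"] = toy["transport_emis"] / toy["ghg_emis"] * 100

print(toy)
```

The same self-merge could carry a ranking column forward to build the '_PY' ranking columns.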


Use the bins only for GHG emissions/EUR--compare values in each bin for all columns...

See how bin values vary from year to year

Calculate the number of companies that have migrated from bin to bin
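
Bin migration between two years could be counted along these lines. The `ghg_bin` column is a hypothetical name for whichever bin column is eventually created, and the data are made up.

```python
import pandas as pd

# Toy data (hypothetical): each company's bin in two consecutive years.
toy = pd.DataFrame({
    "comp_name": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "year": [2021] * 4 + [2022] * 4,
    "ghg_bin": [1, 5, 5, 10, 2, 5, 4, 10],
})

# One row per company, one column per year.
wide = toy.pivot(index="comp_name", columns="year", values="ghg_bin")

# Only companies with a bin in both years can be compared.
both = wide.dropna(subset=[2021, 2022])

# Transition table: rows are the 2021 bin, columns are the 2022 bin.
transitions = pd.crosstab(
    both[2021], both[2022], rownames=["bin_2021"], colnames=["bin_2022"]
)

# Number of companies whose bin changed between the two years.
n_migrated = int((both[2021] != both[2022]).sum())

print(transitions)
print("Companies that changed bin:", n_migrated)
```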





#### Bonus columns

'GHG intensity reduction % vs sector-specific targets'--normalise so make it a % above or below target

'GHG intensity reduction % vs others in the sector_CY'--also normalise (and consider whether positive is good or bad)



#### Summary columns

Summarise results by:

- Segment/Industry

- HQ country

Declarations per year

--check which industry has the highest % of missing values

Percentage of companies in each industry that have their sustainability work audited
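
The last two summary items could be sketched as below. It assumes 'audit_es_report' is the 1/0 flag for audited sustainability work; if 'su_aud_disclose' is the relevant column, the name should be swapped. The toy values are made up.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values).
toy = pd.DataFrame({
    "industry": ["Energy", "Energy", "Retail", "Retail", "Retail", "Retail"],
    "audit_es_report": [1, 0, 1, 1, 1, 0],
    "ghg_emis": [150.784, np.nan, np.nan, np.nan, np.nan, 12.0],
})

# Share of rows per industry with an audited report, in percent.
pct_audited = toy.groupby("industry")["audit_es_report"].mean().mul(100)

# Share of missing 'ghg_emis' values per industry, in percent.
pct_missing = toy.groupby("industry")["ghg_emis"].apply(
    lambda s: s.isna().mean() * 100
)

print(pct_audited)
print(pct_missing)
print("Industry with the most missing values:", pct_missing.idxmax())
```

Both figures are computed per row (company-year). Grouping by `["industry", "year"]` instead would give the per-year view.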

## References

SBTI, 2024. Business ambition for 1.5°C campaign: final report. Available at: https://sciencebasedtargets.org/resources/files/SBTi-Business-Ambition-final-report.pdf (Accessed 17 February 2025)