# Abatement Calculations
This notebook develops the target variable: CO2 units abated annually over time. The calculation uses datasets from two sources: the [Enerdata Statistical Yearbook](https://yearbook.enerdata.net/) and the [World Bank](https://data.worldbank.org/indicator/EN.CO2.ETOT.ZS). Developing this target, rather than relying only on the metrics provided, keeps the analysis focused on the long-term goal: improving environmental quality by reducing overall emissions.

## Notebook Contents
- [Loading in and merging data sources](#loading_and_merging)
    - [Countries in the dataset](#full_country_list)
- [Developing abatement equation](#equation)
- [Historical Abatement Calculations](#abatement_calculations)
- [Completed CSV File](#csv)

In [90]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

### _Loading in and merging data sources_
<a id='loading_and_merging'></a>

Although most of the data comes from the same source, each attribute had to be pulled and saved as a separate table, then read in individually. Because of how the Yearbook is formatted, some of the files did not read in cleanly at first. The keyword arguments (such as `encoding`) and the `.replace` cell are the result of trial-and-error read-ins.

Otherwise, no real cleaning is needed, except to [fill nulls for the patent dataset](#fill_nulls).
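
As a possible alternative to stripping commas after loading, `read_csv` can parse thousands separators directly through its `thousands` parameter. The sketch below uses a small in-memory CSV with invented values rather than the actual Yearbook files, so the column names and numbers are illustrative assumptions only.

```python
import io
import pandas as pd

# Hypothetical sample mimicking an export where large numbers carry thousands separators.
sample_csv = io.StringIO(
    'Country,2013,2014\n'
    'China,"9,000","9,100"\n'
    'Belgium,90,89\n'
)

# thousands=',' tells pandas to treat commas inside numeric fields as separators,
# so the year columns are parsed as numbers instead of strings.
sample = pd.read_csv(sample_csv, thousands=',')
print(sample.dtypes)
```

Whether this works on the real files depends on how they are quoted, which is why the notebook's own `.replace` approach is kept as the reference.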

In [2]:
#Loading in CSVs
total_emissions = pd.read_csv('./data/C02_emissions.csv')
electric_consumption = pd.read_csv('./data/electric_consumption.csv')
electric_emissions = pd.read_csv('./data/electric_emissions.csv', encoding='latin')
electric_production = pd.read_csv('./data/electric_production.csv')
reneg_production = pd.read_csv('./data/reneg_production.csv')
patents = pd.read_csv('./data/patents.csv')

In [3]:
#Stripping thousands separators (commas) from string values so they can be converted to numbers

for table in (total_emissions, electric_consumption, electric_emissions,
              electric_production, reneg_production, patents):
    table.replace(r',', '', inplace=True, regex=True)

<a id='fill_nulls'></a>

In [72]:
patents.isnull().sum().sum()

121

In [35]:
#Patents is the only table that contains nulls for our designated time period (1990-2014)
patents.fillna(value=0, axis=0, inplace=True)

<a id='full_country_list'></a>

In [26]:
#Setting list of unique countries that exist in all dataframes
#Including world, will be separated later

Countries = ['Belgium',
'France',
'Germany',
'Italy',
'Netherlands',
'Poland',
'Portugal',
'Romania',
'Spain',
'United Kingdom',
'Turkey',
'Sweden',
'Norway',
'Kazakhstan',
'Ukraine',
'Uzbekistan',
'Canada',
'United States',
'Argentina',
'Brazil',
'Chile',
'Colombia',
'Mexico',
'China',
'India',
'Indonesia',
'Japan',
'Malaysia',
'Thailand',
'Australia',
'New Zealand',
'Algeria',
'Nigeria',
'South Africa',
'Kuwait',
'Saudi Arabia',
'United Arab Emirates',
'World'
]

Countries.sort()
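
The list above was compiled by hand. A minimal sketch of how such a list could be derived instead, assuming each table keeps its country names in a single column; the tiny frames and the helper `common_countries` below are hypothetical stand-ins, not part of the original notebook:

```python
import pandas as pd

# Hypothetical stand-ins for the real tables; only the country columns matter here.
emissions = pd.DataFrame({'Country': ['Algeria', 'Brazil', 'World', 'Fiji']})
electric = pd.DataFrame({'Country Name': ['Brazil', 'Algeria', 'World']})
patent_counts = pd.DataFrame({'Unnamed: 0': ['World', 'Brazil', 'Algeria', 'Peru']})

def common_countries(columns):
    """Return the sorted names that appear in every country column."""
    name_sets = [set(col.dropna()) for col in columns]
    return sorted(set.intersection(*name_sets))

shared = common_countries([
    emissions['Country'],
    electric['Country Name'],
    patent_counts['Unnamed: 0'],
])
print(shared)  # ['Algeria', 'Brazil', 'World']
```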

### _Developing the Abatement Equation_
<a id='equation'></a>

To develop this equation, the datasets were merged one at a time to avoid errors and column-name confusion.

You can [skip to the equation broken down step-by-step](#equation_breakdown).


In [64]:
total_emissions_14 = total_emissions[['Country', '2014']]
electric_emissions_14 = electric_emissions[['Country Name', '2014']]
electric_production_14 = electric_production[['Country', '2014']]
reneg_production_14 = reneg_production[['Country','2014']]

In [65]:
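#Joining on the country columns; left_index=True is dropped because the key is a column,
#and newer pandas versions reject left_on combined with left_index
abatement_dataset = pd.merge(total_emissions_14, electric_emissions_14, how='left',
                             left_on='Country', right_on='Country Name',
                             suffixes=(' co2_emissions', 'electric_emissions_pct'))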

abatement_dataset.drop(labels='Country Name', axis=1, inplace=True)
abatement_dataset.columns = ['country','co2_emissions','electric_emissions_pct']

In [66]:
#left_index=True removed: the join key is a column, not the index
abatement_dataset = pd.merge(abatement_dataset, electric_production_14, how='left',
                             left_on='country', right_on='Country')

abatement_dataset.columns=['country', 'co2_emissions', 'electric_emissions_pct', 'country1', 'electric_production']

abatement_dataset.drop(labels='country1', axis=1, inplace=True)

In [67]:
#left_index=True removed: the join key is a column, not the index
abatement_dataset = pd.merge(abatement_dataset, reneg_production_14, how='left',
                             left_on='country', right_on='Country')

abatement_dataset.columns=['country', 'co2_emissions', 'electric_emissions_pct', 'electric_production', 'country1','reneg_production']

abatement_dataset.drop(labels='country1', axis=1, inplace=True)
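
The merges above rename columns after each join. One alternative, sketched here under assumptions, is to rename each slice first and then join on a shared `country` key. The helper `as_attribute` and the two-row frames are hypothetical; the values are 2014 figures that appear in this notebook's output. `validate='one_to_one'` asks pandas to raise an error if a country appears more than once in either table.

```python
import pandas as pd

# Hypothetical 2014 slices; the real tables have more rows.
total_14 = pd.DataFrame({'Country': ['Algeria', 'Belgium'], '2014': [133, 89]})
elec_pct_14 = pd.DataFrame({'Country Name': ['Belgium', 'Algeria'], '2014': [25.85, 38.83]})
elec_prod_14 = pd.DataFrame({'Country': ['Algeria', 'Belgium'], '2014': [71, 73]})

def as_attribute(df, key_col, value_name):
    """Rename the key column to 'country' and the 2014 column to the attribute name."""
    return df.rename(columns={key_col: 'country', '2014': value_name})

merged = (
    as_attribute(total_14, 'Country', 'co2_emissions')
    .merge(as_attribute(elec_pct_14, 'Country Name', 'electric_emissions_pct'),
           on='country', how='left', validate='one_to_one')
    .merge(as_attribute(elec_prod_14, 'Country', 'electric_production'),
           on='country', how='left', validate='one_to_one')
)
print(merged)
```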

In [69]:
abatement_dataset.co2_emissions = abatement_dataset.co2_emissions.astype(int)

In [95]:
abatement_dataset.head()

|    | country | co2_emissions | electric_emissions_pct | electric_production | reneg_production | co2_electricity | reneg_production_share | conventional_production_share | co2_units | co2_abated |
|---|---|---|---|---|---|---|---|---|---|---|
| 49 | Algeria | 133 | 38.83 | 71 | 0.28 | 51.6439 | 0.1988 | 70.8012 | 0.729421 | 0.145009 |
| 30 | Argentina | 198 | 38.04 | 142 | 31.69 | 75.3192 | 44.9998 | 97.0002 | 0.776485 | 34.94167 |
| 46 | Australia | 378 | 58.36 | 248 | 14.92 | 220.6008 | 37.0016 | 210.9984 | 1.045509 | 38.685519 |
| 6 | Belgium | 89 | 25.85 | 73 | 18.44 | 23.0065 | 13.4612 | 59.5388 | 0.386412 | 5.201568 |
| 31 | Brazil | 475 | 26.31 | 591 | 73.08 | 124.9725 | 431.9028 | 159.0972 | 0.78551 | 339.264127 |


## _Equation Breakdown_
<a id='equation_breakdown'></a>
#### Attributes available:
- $φ$ = **co2_emissions**: _Total CO2 units emitted annually, measured in metric tons of CO2_
- $υ$ = **electric_emissions_pct**: _Percentage of CO2 emissions from producing heat & electricity_
- $λ$ = **electric_production**: _Amount of electricity produced annually, measured in metric tons of energy_
- $γ$ = **reneg_production**: _Percentage of electricity production from renewable sources_

## Steps of developing the equation:

In each formula, $υ$ and $γ$ enter as fractions (the percentage divided by 100), matching the code.

### $φυ$  
**co2_electricity**: _determines how many units of CO2 emissions come from producing heat & electricity_
- co2_emissions
- electric_emissions_pct

### $γλ$  
**reneg_production_share**: _determines how much electricity is produced from renewable energy sources (despite the name, an amount rather than a percentage)_
- reneg_production
- electric_production

### $λ-(γλ)$  
**conventional_production_share**: _determines how much electricity is produced from conventional energy sources_
- electric_production
- reneg_production_share

### $(φυ) / (λ-(γλ))$  
**co2_units**: _determines the units of CO2 emitted per unit of electricity from conventional production_
- co2_electricity
- conventional_production_share

# $((φυ) / (λ-(γλ)))γλ$
**co2_abated (Final Output)**: _metric tons of CO2 abated by renewable production, i.e. the emissions the same output would have caused at the conventional rate_
- co2_units
- reneg_production_share
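
As a worked check of the steps above, the sketch below plugs in Algeria's 2014 inputs as they appear in the `head()` output earlier in this notebook. The variable names `phi`, `upsilon`, `lam` and `gamma` are only illustrative spellings of the symbols. The last line notes an algebraic consequence of the formula: $λ$ cancels, so the result can also be written as $φυγ/(1-γ)$ with the percentages taken as fractions.

```python
# Algeria, 2014 (values taken from the notebook's head() output).
phi = 133          # co2_emissions
upsilon = 38.83    # electric_emissions_pct, in percent
lam = 71           # electric_production
gamma = 0.28       # reneg_production, in percent

co2_electricity = phi * (upsilon / 100)                        # phi * upsilon
reneg_production_share = (gamma / 100) * lam                   # gamma * lambda
conventional_production_share = lam - reneg_production_share   # lambda - gamma * lambda
co2_units = co2_electricity / conventional_production_share
co2_abated = co2_units * reneg_production_share

# lambda appears in both numerator and denominator, so it cancels out.
simplified = co2_electricity * (gamma / 100) / (1 - gamma / 100)

print(co2_electricity, reneg_production_share, conventional_production_share)
print(co2_units, co2_abated, simplified)
```

The intermediate values should agree with the Algeria row of the table (51.6439, 0.1988, 70.8012, 0.729421, 0.145009) up to rounding.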


In [70]:
#1. Determine actual emissions from electricity and heat production alone,
#   using the % of emissions that comes from generating heat & electricity
abatement_dataset['co2_electricity'] = abatement_dataset.co2_emissions * (abatement_dataset.electric_emissions_pct/100)

#2. Split electricity production into renewable and conventional amounts
abatement_dataset['reneg_production_share'] = (abatement_dataset.reneg_production/100)*abatement_dataset.electric_production #renewable amount
abatement_dataset['conventional_production_share'] = (abatement_dataset.electric_production-abatement_dataset.reneg_production_share) #non-renewable amount

#3. With the non-renewable amount known, CO2 emissions / conventional production
#   gives the CO2 emitted per unit of conventional production
abatement_dataset['co2_units'] = abatement_dataset.co2_electricity/abatement_dataset.conventional_production_share

#4. How much CO2 is not being produced thanks to renewable production:
#   CO2 per unit * renewable amount = CO2 saved
abatement_dataset['co2_abated'] = abatement_dataset.co2_units*abatement_dataset.reneg_production_share
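
One edge case the calculation above does not cover: if a country's renewable percentage is 100, `conventional_production_share` is zero and the division produces `inf`. The following is only a sketch of one possible guard, not something the original analysis does. The second row of the frame is invented purely to trigger the case.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs; the second row is invented to have a 100% renewable share.
df = pd.DataFrame({
    'co2_electricity': [51.6439, 10.0],
    'electric_production': [71.0, 20.0],
    'reneg_production': [0.28, 100.0],   # percent
})

renewable = df['reneg_production'] / 100 * df['electric_production']
conventional = df['electric_production'] - renewable

# Replace a zero denominator with NaN so the result is NaN (undefined) rather than inf.
co2_units = df['co2_electricity'] / conventional.replace(0, np.nan)
df['co2_abated'] = co2_units * renewable
print(df)
```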

### _Historical Abatement Calculations_

<a id='abatement_calculations'></a>

In [53]:
#Keeping only the countries in `Countries`, sorted by name so rows line up across tables

total_emissions = total_emissions.loc[total_emissions['Country'].isin(Countries)].sort_values('Country')
electric_consumption = electric_consumption.loc[electric_consumption['Unnamed: 0'].isin(Countries)].sort_values('Unnamed: 0')
electric_emissions = electric_emissions.loc[electric_emissions['Country Name'].isin(Countries)].sort_values('Country Name')
electric_production = electric_production.loc[electric_production['Country'].isin(Countries)].sort_values('Country') 
reneg_production = reneg_production.loc[reneg_production['Country'].isin(Countries)].sort_values('Country')
patents = patents.loc[patents['Unnamed: 0'].isin(Countries)].sort_values('Unnamed: 0')

In [59]:
datasets = ['total_emissions',
'electric_consumption',
'electric_emissions',
'electric_production',
'reneg_production']

In [60]:
years = ['1990','1991','1992','1993','1994','1995','1996','1997', '1998','1999','2000','2001',
         '2002','2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014']

In [36]:
def abatement_calculator(datasets, no_years):
    #Look up each dataframe by the name listed in `datasets` (replaces the eval calls).
    #The names must match the entries of `datasets`, e.g. 'total_emissions'.
    frames = {name: globals()[name] for name in datasets}

    #Rows are combined by position, so every table must already be filtered to the
    #same countries and sorted by country.
    abatement = pd.DataFrame()
    abatement['country'] = frames['total_emissions']['Country']
    for year in no_years:
        co2_emissions = frames['total_emissions'][year].astype(int).values
        electric_emissions_pct = frames['electric_emissions'][year].values/100
        electric_prod = frames['electric_production'][year].values
        reneg_prod_pct = frames['reneg_production'][year].values/100

        reneg_prod = reneg_prod_pct*electric_prod        #renewable amount
        conventional_prod = electric_prod-reneg_prod     #non-renewable amount
        abatement[year] = ((co2_emissions*electric_emissions_pct)/conventional_prod)*reneg_prod
    return abatement
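
The function above combines the tables by row position, so it relies on every table holding the same countries in the same order. A hedged alternative is to index each table by country and let pandas align rows by label. The function `abatement_by_label` and the small frames below are hypothetical, with values taken from this notebook's 2014 output. The sketch assumes each real table has first been indexed by its country column (for example with `set_index`).

```python
import pandas as pd

def abatement_by_label(total, elec_pct, elec_prod, reneg_pct, years):
    """Compute abatement per country and year, aligning rows by country label.

    Each argument is a DataFrame indexed by country with one column per year.
    Percentages are expected on a 0-100 scale.
    """
    co2_electricity = total[years].astype(float) * elec_pct[years] / 100
    renewable = reneg_pct[years] / 100 * elec_prod[years]
    conventional = elec_prod[years] - renewable
    return co2_electricity / conventional * renewable

# Two-country, one-year tables; the row order differs between tables on purpose.
idx_a = pd.Index(['Algeria', 'Belgium'], name='country')
idx_b = pd.Index(['Belgium', 'Algeria'], name='country')
total = pd.DataFrame({'2014': [133, 89]}, index=idx_a)
elec_pct = pd.DataFrame({'2014': [25.85, 38.83]}, index=idx_b)
elec_prod = pd.DataFrame({'2014': [71, 73]}, index=idx_a)
reneg_pct = pd.DataFrame({'2014': [18.44, 0.28]}, index=idx_b)

result = abatement_by_label(total, elec_pct, elec_prod, reneg_pct, ['2014'])
print(result)
```

Because the arithmetic aligns on the index, a table sorted differently no longer silently pairs one country's emissions with another's production.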

In [42]:
abate = abatement_calculator(datasets, years)

<a id='csv'></a>

In [80]:
#Writing to the same lowercase ./data/ folder the inputs were read from
abate.to_csv('./data/abatement_calculations.csv')