# Sustainable Energy for All

## Introduction

The World Bank's Sustainable Energy for All (SE4All) database provides country-level historical data on access to electricity and non-solid fuel, the share of renewable energy in total final energy consumption by technology, and the rate of improvement in energy intensity.

The SE4All database can be found at
http://databank.worldbank.org/data/reports.aspx?source=sustainable-energy-for-all&Type=TABLE&preview=on#

## Gather

The SE4All database was manually downloaded.

In [65]:
# Load dependencies
import numpy as np
import pandas as pd

In [66]:
# Import data
data_df = pd.read_csv('data/data.csv')
def_df = pd.read_csv('data/definition.csv')
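If the missing-value marker is known up front (this export uses `..`, as the preview of `data_df` shows), pandas can convert it at load time. The following is a minimal sketch under that assumption; the two-row `sample_csv` is a hypothetical stand-in for the downloaded file, not the real data.

```python
import io

import pandas as pd

# Hypothetical two-row sample shaped like the downloaded export
sample_csv = io.StringIO(
    "Country Name,Country Code,Time,Access to electricity (% of total population)\n"
    "Afghanistan,AFG,1990,..\n"
    "Zimbabwe,ZWE,2016,38.15\n"
)

# na_values marks '..' as missing, so the value column is parsed as float
sample_df = pd.read_csv(sample_csv, na_values="..")
print(sample_df.dtypes)
```

Parsing the placeholder at load time would make the later replace and type-conversion steps shorter, at the cost of hiding how the raw file encodes missing data.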

In [67]:
# Check def_df
def_df

Unnamed: 0,Code,Indicator Name,Long definition
0,1.2_ACCESS.ELECTRICITY.RURAL,Access to electricity (% of rural population w...,Access to electricity (% of rural population w...
1,1.1_ACCESS.ELECTRICITY.TOT,Access to electricity (% of total population),Access to electricity (% of total population):...
2,1.3_ACCESS.ELECTRICITY.URBAN,Access to electricity (% of urban population w...,Access to electricity (% of urban population w...
3,6.1_PRIMARY.ENERGY.INTENSITY,Energy intensity level of primary energy (MJ/2...,Energy intensity level of primary energy (MJ/2...
4,4.1.2_REN.ELECTRICITY.OUTPUT,Renewable electricity output (GWh),Renewable electricity output (GWh): Electric o...
5,4.1_SHARE.RE.IN.ELECTRICITY,Renewable electricity share of total electrici...,Renewable electricity share of total electrici...
6,3.1_RE.CONSUMPTION,Renewable energy consumption (TJ),Renewable energy consumption (TJ): This indica...
7,2.1_SHARE.TOTAL.RE.IN.TFEC,Renewable energy share of TFEC (%),Renewable energy share of TFEC (%): Share of r...
8,4.1.1_TOTAL.ELECTRICITY.OUTPUT,Total electricity output (GWh),Total electricity output (GWh): Total number o...
9,1.1_TOTAL.FINAL.ENERGY.CONSUM,Total final energy consumption (TFEC) (TJ),Total final energy consumption (TFEC): This in...


In [68]:
# Check data
data_df.head()

Unnamed: 0,Country Name,Country Code,Time,Time Code,Access to Clean Fuels and Technologies for cooking (% of total population) [2.1_ACCESS.CFT.TOT],Access to electricity (% of rural population with access) [1.2_ACCESS.ELECTRICITY.RURAL],Access to electricity (% of total population) [1.1_ACCESS.ELECTRICITY.TOT],Access to electricity (% of urban population with access) [1.3_ACCESS.ELECTRICITY.URBAN],Energy intensity level of primary energy (MJ/2011 USD PPP) [6.1_PRIMARY.ENERGY.INTENSITY],Renewable electricity output (GWh) [4.1.2_REN.ELECTRICITY.OUTPUT],Renewable electricity share of total electricity output (%) [4.1_SHARE.RE.IN.ELECTRICITY],Renewable energy consumption (TJ) [3.1_RE.CONSUMPTION],Renewable energy share of TFEC (%) [2.1_SHARE.TOTAL.RE.IN.TFEC],Total electricity output (GWh) [4.1.1_TOTAL.ELECTRICITY.OUTPUT],Total final energy consumption (TFEC) (TJ) [1.1_TOTAL.FINAL.ENERGY.CONSUM]
0,Afghanistan,AFG,1990.0,YR1990,..,..,0.0099999997764825,52.0369758605957,1.88411277331996,764,67.73,6312.392,15.9245316828932,1128,39639.42002
1,Afghanistan,AFG,1991.0,YR1991,..,..,0.0099999997764825,53.8098373413086,1.99591306318908,690,67.98,6361.651,17.0364435282942,1015,37341.426275
2,Afghanistan,AFG,1992.0,YR1992,..,..,0.0099999997764825,55.5825233459473,1.33250174740999,478,67.99,6546.363,26.5216286544368,703,24683.11085
3,Afghanistan,AFG,1993.0,YR1993,..,..,0.0099999997764825,57.3541793823242,1.76063654184963,475,68.35,7849.649,30.5856670489932,695,25664.468875
4,Afghanistan,AFG,1994.0,YR1994,..,..,0.0099999997764825,59.1237754821777,2.24561299646338,472,68.7,8305.308,32.7962505504008,687,25323.95582


## Assess
### Summary of Data Issues
#### Quality
- Time/Time Code show the same information
- Column names very long
- Null Values might not be of null type
- The last 5 rows do not appear to be records
- Time of type float
- All other columns of type object
- Null values, most likely due to the last 5 rows not being observations
- Round percentages to 2 decimal places
- Percentages rounded to different digits
- Make sure no impossible percentages (<0 or >100)
- There are several instances of very specific values for columns 4, 5, 6, 7, 11, 12, 14. Are these correct?

#### Tidiness
- None

##### Visual Assessment

In [69]:
data_df.head()

Unnamed: 0,Country Name,Country Code,Time,Time Code,Access to Clean Fuels and Technologies for cooking (% of total population) [2.1_ACCESS.CFT.TOT],Access to electricity (% of rural population with access) [1.2_ACCESS.ELECTRICITY.RURAL],Access to electricity (% of total population) [1.1_ACCESS.ELECTRICITY.TOT],Access to electricity (% of urban population with access) [1.3_ACCESS.ELECTRICITY.URBAN],Energy intensity level of primary energy (MJ/2011 USD PPP) [6.1_PRIMARY.ENERGY.INTENSITY],Renewable electricity output (GWh) [4.1.2_REN.ELECTRICITY.OUTPUT],Renewable electricity share of total electricity output (%) [4.1_SHARE.RE.IN.ELECTRICITY],Renewable energy consumption (TJ) [3.1_RE.CONSUMPTION],Renewable energy share of TFEC (%) [2.1_SHARE.TOTAL.RE.IN.TFEC],Total electricity output (GWh) [4.1.1_TOTAL.ELECTRICITY.OUTPUT],Total final energy consumption (TFEC) (TJ) [1.1_TOTAL.FINAL.ENERGY.CONSUM]
0,Afghanistan,AFG,1990.0,YR1990,..,..,0.0099999997764825,52.0369758605957,1.88411277331996,764,67.73,6312.392,15.9245316828932,1128,39639.42002
1,Afghanistan,AFG,1991.0,YR1991,..,..,0.0099999997764825,53.8098373413086,1.99591306318908,690,67.98,6361.651,17.0364435282942,1015,37341.426275
2,Afghanistan,AFG,1992.0,YR1992,..,..,0.0099999997764825,55.5825233459473,1.33250174740999,478,67.99,6546.363,26.5216286544368,703,24683.11085
3,Afghanistan,AFG,1993.0,YR1993,..,..,0.0099999997764825,57.3541793823242,1.76063654184963,475,68.35,7849.649,30.5856670489932,695,25664.468875
4,Afghanistan,AFG,1994.0,YR1994,..,..,0.0099999997764825,59.1237754821777,2.24561299646338,472,68.7,8305.308,32.7962505504008,687,25323.95582


From the above visual assessment, the following issues were determined:
- Time/Time Code show the same information
- Column names very long
- Null Values might not be of null type
- The last 5 rows do not appear to be records

##### Programmatic Assessment

In [70]:
# Check data types/null values
data_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6998 entries, 0 to 6997
Data columns (total 15 columns):
Country Name                                                                                       6995 non-null object
Country Code                                                                                       6993 non-null object
Time                                                                                               6993 non-null float64
Time Code                                                                                          6993 non-null object
Access to Clean Fuels and Technologies for cooking (% of total population) [2.1_ACCESS.CFT.TOT]    6993 non-null object
Access to electricity (% of rural population with access) [1.2_ACCESS.ELECTRICITY.RURAL]           6993 non-null object
Access to electricity (% of total population) [1.1_ACCESS.ELECTRICITY.TOT]                         6993 non-null object
Access to electricity (% of urban population with ac

The following issues were determined from checking info:
- Time of type float
- All other columns of type object
- Null values, most likely due to the last 5 rows not being observations
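Because `info()` counts the `..` placeholder as non-null, a direct count of the placeholder per column gives a clearer picture of the missing data. A small sketch, using a hypothetical `sample_df` with shortened column names rather than the real frame:

```python
import pandas as pd

# Hypothetical sample: values are still strings, as in the raw import
sample_df = pd.DataFrame({
    "Country Name": ["Afghanistan", "Afghanistan", "Zimbabwe"],
    "Access (%)": ["..", "0.01", "38.15"],
    "Output (GWh)": ["764", "..", ".."],
})

# Element-wise comparison, then a column-wise sum of the True values
placeholder_counts = (sample_df == "..").sum()
print(placeholder_counts)
```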

In [71]:
# Check for duplicates
data_df.duplicated().sum()

2

Two rows are duplicates, most likely among the last 5 rows, which are not observations.
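To confirm where the duplicates come from, the duplicated rows can be displayed instead of only counted. The sketch below assumes the footer contains several fully empty rows, which is a guess based on the null counts; the sample frame and its footer text are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one record, three empty footer rows, one footer note
sample_df = pd.DataFrame({
    "Country Name": ["Zimbabwe", np.nan, np.nan, np.nan, "hypothetical footer text"],
    "Year": [2016.0, np.nan, np.nan, np.nan, np.nan],
})

# keep=False flags every member of a duplicate group, not just the repeats
dupes = sample_df[sample_df.duplicated(keep=False)]
print(dupes)
```

With the default `keep='first'`, three identical empty rows are counted as two duplicates, which would be consistent with the count above.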

In [73]:
# Check the value counts for each column
data_df.iloc[:,14].value_counts()

..                983
0                 261
362.5               3
422.4               3
447.044225          3
54.78076            3
405.5               3
464.2               3
3096                3
466.7               2
1188.28             2
2597.5              2
278.5               2
72.2414             2
59.21076            2
1110.4              2
496.588525          2
1145.52             2
6670.695            2
21245.49            2
1201.55             2
463.1               2
323.6               2
466.08              2
3132                2
369.5               2
2169.9              2
2287.9              2
1117.44             2
1884.2465           1
                 ... 
4851580.857624      1
4456.95             1
97051.8792746       1
1460677.302504      1
7682.48743          1
2243913.115392      1
2799835.707636      1
5454.067255         1
1753.651799685      1
260192.537856       1
136095.618384       1
29416.31755         1
23961.881649        1
8671255.270632      1
18465.4097

The following issues were determined from checking value_counts:
- Percentages rounded to different digits
- There are several instances of very specific values for columns 4, 5, 6, 7, 11, 12, 14. Are these correct?

## Clean
### Summary of Data Issues
#### Quality
- Delete Time Code Column
- Reduce column name lengths
- Change null placeholders ('..') to NaN
- Delete the last 5 rows if they do not hold relevant information
- Check any null values
- Change Time column from float to int datatype
- Change value columns (percentages and quantities) from object to float datatype
- Round percentages to 2 decimal places
- Make sure no impossible percentages (<0 or >100)
- Make sure duplicate values are correct, especially for columns (in data_df) 4, 5, 6, 7, 11, 12, 14. Are these correct?

#### Tidiness
- None

In [74]:
# Make copy of data_df
data_df_clean = data_df.copy()

### Delete Time Code Column

#### Define
Drop 'Time Code' column from data_df_clean

#### Code

In [75]:
# Drop the redundant Time Code column (columns= already implies axis=1)
data_df_clean.drop(columns=['Time Code'], inplace=True)

#### Test

In [76]:
# Check column names
data_df_clean.head()

Unnamed: 0,Country Name,Country Code,Time,Access to Clean Fuels and Technologies for cooking (% of total population) [2.1_ACCESS.CFT.TOT],Access to electricity (% of rural population with access) [1.2_ACCESS.ELECTRICITY.RURAL],Access to electricity (% of total population) [1.1_ACCESS.ELECTRICITY.TOT],Access to electricity (% of urban population with access) [1.3_ACCESS.ELECTRICITY.URBAN],Energy intensity level of primary energy (MJ/2011 USD PPP) [6.1_PRIMARY.ENERGY.INTENSITY],Renewable electricity output (GWh) [4.1.2_REN.ELECTRICITY.OUTPUT],Renewable electricity share of total electricity output (%) [4.1_SHARE.RE.IN.ELECTRICITY],Renewable energy consumption (TJ) [3.1_RE.CONSUMPTION],Renewable energy share of TFEC (%) [2.1_SHARE.TOTAL.RE.IN.TFEC],Total electricity output (GWh) [4.1.1_TOTAL.ELECTRICITY.OUTPUT],Total final energy consumption (TFEC) (TJ) [1.1_TOTAL.FINAL.ENERGY.CONSUM]
0,Afghanistan,AFG,1990.0,..,..,0.0099999997764825,52.0369758605957,1.88411277331996,764,67.73,6312.392,15.9245316828932,1128,39639.42002
1,Afghanistan,AFG,1991.0,..,..,0.0099999997764825,53.8098373413086,1.99591306318908,690,67.98,6361.651,17.0364435282942,1015,37341.426275
2,Afghanistan,AFG,1992.0,..,..,0.0099999997764825,55.5825233459473,1.33250174740999,478,67.99,6546.363,26.5216286544368,703,24683.11085
3,Afghanistan,AFG,1993.0,..,..,0.0099999997764825,57.3541793823242,1.76063654184963,475,68.35,7849.649,30.5856670489932,695,25664.468875
4,Afghanistan,AFG,1994.0,..,..,0.0099999997764825,59.1237754821777,2.24561299646338,472,68.7,8305.308,32.7962505504008,687,25323.95582


### Reduce Column Name Lengths

#### Define
Because this data will only be used in Tableau, remove the bracketed indicator codes from the column names. The names do not need to be lowercased or stripped of spaces.

#### Code

In [77]:
# Rename columns
# Loop through each name
for name in data_df_clean.columns[2:]:
    # Keep the portion of the name before the bracketed indicator code
    new_name = name.split('[')[0].strip()
    # Replace name in data_df_clean (note: index=str also converts the row labels to strings)
    data_df_clean.rename(columns={name: new_name}, index=str, inplace=True)

# Change Time column to Year
data_df_clean.rename(columns={'Time': 'Year'}, index=str, inplace=True)
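The same renaming can be sketched as a single `rename` call with a dictionary built up front. This variant omits `index=str`, so the row labels stay integers; that differs from the cell above, where the labels become strings. The one-row `sample_df` is hypothetical.

```python
import pandas as pd

# Hypothetical one-row sample with one raw column name from the export
sample_df = pd.DataFrame({
    "Time": [1990.0],
    "Access to electricity (% of total population) [1.1_ACCESS.ELECTRICITY.TOT]": ["0.01"],
})

# Map every name to the text before the bracketed indicator code
short_names = {name: name.split("[")[0].strip() for name in sample_df.columns}
short_names["Time"] = "Year"

# rename returns a new frame; the index is left untouched
renamed_df = sample_df.rename(columns=short_names)
print(list(renamed_df.columns))
```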

#### Test

In [78]:
data_df_clean.head()

Unnamed: 0,Country Name,Country Code,Year,Access to Clean Fuels and Technologies for cooking (% of total population),Access to electricity (% of rural population with access),Access to electricity (% of total population),Access to electricity (% of urban population with access),Energy intensity level of primary energy (MJ/2011 USD PPP),Renewable electricity output (GWh),Renewable electricity share of total electricity output (%),Renewable energy consumption (TJ),Renewable energy share of TFEC (%),Total electricity output (GWh),Total final energy consumption (TFEC) (TJ)
0,Afghanistan,AFG,1990.0,..,..,0.0099999997764825,52.0369758605957,1.88411277331996,764,67.73,6312.392,15.9245316828932,1128,39639.42002
1,Afghanistan,AFG,1991.0,..,..,0.0099999997764825,53.8098373413086,1.99591306318908,690,67.98,6361.651,17.0364435282942,1015,37341.426275
2,Afghanistan,AFG,1992.0,..,..,0.0099999997764825,55.5825233459473,1.33250174740999,478,67.99,6546.363,26.5216286544368,703,24683.11085
3,Afghanistan,AFG,1993.0,..,..,0.0099999997764825,57.3541793823242,1.76063654184963,475,68.35,7849.649,30.5856670489932,695,25664.468875
4,Afghanistan,AFG,1994.0,..,..,0.0099999997764825,59.1237754821777,2.24561299646338,472,68.7,8305.308,32.7962505504008,687,25323.95582


The column names are very long, but I think this is the best way to describe what the column/units are.

### Change null values to nan
#### Define

- Change null placeholders ('..') to NaN

#### Code

In [79]:
# Replace '..' values with nan
data_df_clean.replace('..',np.nan, inplace=True)

In [80]:
data_df_clean.head()

Unnamed: 0,Country Name,Country Code,Year,Access to Clean Fuels and Technologies for cooking (% of total population),Access to electricity (% of rural population with access),Access to electricity (% of total population),Access to electricity (% of urban population with access),Energy intensity level of primary energy (MJ/2011 USD PPP),Renewable electricity output (GWh),Renewable electricity share of total electricity output (%),Renewable energy consumption (TJ),Renewable energy share of TFEC (%),Total electricity output (GWh),Total final energy consumption (TFEC) (TJ)
0,Afghanistan,AFG,1990.0,,,0.0099999997764825,52.0369758605957,1.88411277331996,764,67.73,6312.392,15.9245316828932,1128,39639.42002
1,Afghanistan,AFG,1991.0,,,0.0099999997764825,53.8098373413086,1.99591306318908,690,67.98,6361.651,17.0364435282942,1015,37341.426275
2,Afghanistan,AFG,1992.0,,,0.0099999997764825,55.5825233459473,1.33250174740999,478,67.99,6546.363,26.5216286544368,703,24683.11085
3,Afghanistan,AFG,1993.0,,,0.0099999997764825,57.3541793823242,1.76063654184963,475,68.35,7849.649,30.5856670489932,695,25664.468875
4,Afghanistan,AFG,1994.0,,,0.0099999997764825,59.1237754821777,2.24561299646338,472,68.7,8305.308,32.7962505504008,687,25323.95582


#### Test

In [81]:
# Check value counts and null values of different columns
data_df_clean.iloc[:,12].value_counts()
data_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6998 entries, 0 to 6997
Data columns (total 14 columns):
Country Name                                                                  6995 non-null object
Country Code                                                                  6993 non-null object
Year                                                                          6993 non-null float64
Access to Clean Fuels and Technologies for cooking (% of total population)    3230 non-null object
Access to electricity (% of rural population with access)                     5129 non-null object
Access to electricity (% of total population)                                 5770 non-null object
Access to electricity (% of urban population with access)                     5771 non-null object
Energy intensity level of primary energy (MJ/2011 USD PPP)                    4947 non-null object
Renewable electricity output (GWh)                                            5514 non-null object
Renewa

### Drop Rows that aren't Records

#### Define
Drop the last 5 rows, which are not records

#### Code

In [82]:
# Delete rows 6993 to 6997 (labels are strings because of the earlier rename with index=str)
data_df_clean.drop(axis=0, index=['6993','6994','6995','6996','6997'], inplace=True)
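Hardcoded row labels break if the file is downloaded again with a different number of rows. Since the footer rows have no `Country Code` or year according to the null counts above, they could instead be dropped by condition. A minimal sketch on a hypothetical sample frame:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two records followed by two footer rows
sample_df = pd.DataFrame({
    "Country Name": ["Zimbabwe", "Zimbabwe", np.nan, "hypothetical footer text"],
    "Country Code": ["ZWE", "ZWE", np.nan, np.nan],
    "Year": [2015.0, 2016.0, np.nan, np.nan],
})

# Keep only rows that have both a country code and a year
records_df = sample_df.dropna(subset=["Country Code", "Year"])
print(records_df)
```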

#### Test

In [83]:
data_df_clean.tail()

Unnamed: 0,Country Name,Country Code,Year,Access to Clean Fuels and Technologies for cooking (% of total population),Access to electricity (% of rural population with access),Access to electricity (% of total population),Access to electricity (% of urban population with access),Energy intensity level of primary energy (MJ/2011 USD PPP),Renewable electricity output (GWh),Renewable electricity share of total electricity output (%),Renewable energy consumption (TJ),Renewable energy share of TFEC (%),Total electricity output (GWh),Total final energy consumption (TFEC) (TJ)
6988,Zimbabwe,ZWE,2012.0,30.02,12.981644179652,36.7288780212402,85.306770324707,18.4372464925512,5558.0,60.39,304744.9,78.0197020630944,9203.0,390599.927892
6989,Zimbabwe,ZWE,2013.0,29.88,13.6699177131116,37.0768127441406,85.3514785766602,18.1529665775605,5162.0,53.94,311169.4,79.5610938290255,9570.0,391107.493656
6990,Zimbabwe,ZWE,2014.0,29.63,9.8,32.3,83.4,15.7495813871173,5575.0,55.62,320293.3,81.0492915007339,10023.0,395183.343456
6991,Zimbabwe,ZWE,2015.0,29.36,9.7,33.7,81.2,15.8038970473848,,,324422.6,81.797808552401,,396615.270924
6992,Zimbabwe,ZWE,2016.0,29.05,15.575583712267,38.1451377868652,85.5001602172852,,,,,,,


### Check Null Values

#### Define
Check columns with null values

#### Code

In [84]:
# Find columns with null values
data_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6993 entries, 0 to 6992
Data columns (total 14 columns):
Country Name                                                                  6993 non-null object
Country Code                                                                  6993 non-null object
Year                                                                          6993 non-null float64
Access to Clean Fuels and Technologies for cooking (% of total population)    3230 non-null object
Access to electricity (% of rural population with access)                     5129 non-null object
Access to electricity (% of total population)                                 5770 non-null object
Access to electricity (% of urban population with access)                     5771 non-null object
Energy intensity level of primary energy (MJ/2011 USD PPP)                    4947 non-null object
Renewable electricity output (GWh)                                            5514 non-null object
Renewa

#### Test

It looks like the columns with null values are just missing data. There is no way for me to determine the missing data, so I will just deal with these during the EDA.
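To decide later which indicators are usable in the EDA, it may help to look at the share of missing values per column rather than the raw non-null counts. A short sketch, with a hypothetical sample frame and shortened column names:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one fully populated and one half-empty column
sample_df = pd.DataFrame({
    "Country Name": ["Afghanistan", "Afghanistan", "Zimbabwe", "Zimbabwe"],
    "Access to electricity (%)": [0.01, 0.01, 33.7, 38.15],
    "Total electricity output (GWh)": [1128.0, 1015.0, np.nan, np.nan],
})

# isna().mean() is the fraction of missing values in each column
missing_share = sample_df.isna().mean().mul(100).round(1).sort_values(ascending=False)
print(missing_share)
```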

### Change Columns to Correct Datatypes

#### Define
Check datatypes and change to correct datatypes

#### Code

In [85]:
data_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6993 entries, 0 to 6992
Data columns (total 14 columns):
Country Name                                                                  6993 non-null object
Country Code                                                                  6993 non-null object
Year                                                                          6993 non-null float64
Access to Clean Fuels and Technologies for cooking (% of total population)    3230 non-null object
Access to electricity (% of rural population with access)                     5129 non-null object
Access to electricity (% of total population)                                 5770 non-null object
Access to electricity (% of urban population with access)                     5771 non-null object
Energy intensity level of primary energy (MJ/2011 USD PPP)                    4947 non-null object
Renewable electricity output (GWh)                                            5514 non-null object
Renewa

In [94]:
# Change index datatype to int
data_df_clean.index = data_df_clean.index.map(int)
# Change year datatype to int
data_df_clean.Year = data_df_clean.Year.astype(int)
# Change most columns to float
data_df_clean.iloc[:,3:] = data_df_clean.iloc[:,3:].astype(float)
# Round decimals to 2 places
data_df_clean = data_df_clean.round(2)
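An alternative sketch of the conversion uses `pd.to_numeric` column by column. With `errors="coerce"`, any value that cannot be parsed becomes NaN instead of raising, which is an assumption about how stray strings should be handled. The sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: value columns still hold strings after the import
sample_df = pd.DataFrame({
    "Country Name": ["Afghanistan", "Afghanistan"],
    "Year": [1990.0, 1991.0],
    "Access (%)": ["0.0099999997764825", np.nan],
    "Output (GWh)": ["764", "690"],
})

# Convert every value column; unparseable entries become NaN
for col in sample_df.columns[2:]:
    sample_df[col] = pd.to_numeric(sample_df[col], errors="coerce")

# Years have no missing values here, so a plain integer cast is safe
sample_df["Year"] = sample_df["Year"].astype(int)

# Round all float columns to 2 decimal places
sample_df = sample_df.round(2)
print(sample_df.dtypes)
```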

#### Test

In [96]:
data_df_clean.head()

Unnamed: 0,Country Name,Country Code,Year,Access to Clean Fuels and Technologies for cooking (% of total population),Access to electricity (% of rural population with access),Access to electricity (% of total population),Access to electricity (% of urban population with access),Energy intensity level of primary energy (MJ/2011 USD PPP),Renewable electricity output (GWh),Renewable electricity share of total electricity output (%),Renewable energy consumption (TJ),Renewable energy share of TFEC (%),Total electricity output (GWh),Total final energy consumption (TFEC) (TJ)
0,Afghanistan,AFG,1990,,,0.01,52.04,1.88,764.0,67.73,6312.39,15.92,1128.0,39639.42
1,Afghanistan,AFG,1991,,,0.01,53.81,2.0,690.0,67.98,6361.65,17.04,1015.0,37341.43
2,Afghanistan,AFG,1992,,,0.01,55.58,1.33,478.0,67.99,6546.36,26.52,703.0,24683.11
3,Afghanistan,AFG,1993,,,0.01,57.35,1.76,475.0,68.35,7849.65,30.59,695.0,25664.47
4,Afghanistan,AFG,1994,,,0.01,59.12,2.25,472.0,68.7,8305.31,32.8,687.0,25323.96


In [97]:
data_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6993 entries, 0 to 6992
Data columns (total 14 columns):
Country Name                                                                  6993 non-null object
Country Code                                                                  6993 non-null object
Year                                                                          6993 non-null int64
Access to Clean Fuels and Technologies for cooking (% of total population)    3230 non-null float64
Access to electricity (% of rural population with access)                     5129 non-null float64
Access to electricity (% of total population)                                 5770 non-null float64
Access to electricity (% of urban population with access)                     5771 non-null float64
Energy intensity level of primary energy (MJ/2011 USD PPP)                    4947 non-null float64
Renewable electricity output (GWh)                                            5514 non-null float

All datatypes are now correct.

### Make sure no impossible percentages

#### Define
Check min, max values for percentage columns

#### Code

In [98]:
# Check if any percentage values > 100 or < 0
(data_df_clean.iloc[:,np.r_[2:6,8,10]] > 100).sum() + (data_df_clean.iloc[:,np.r_[2:6,8,10]] < 0).sum()

Year                                                                          6993
Access to Clean Fuels and Technologies for cooking (% of total population)       0
Access to electricity (% of rural population with access)                        0
Access to electricity (% of total population)                                    0
Renewable electricity output (GWh)                                            3365
Renewable energy consumption (TJ)                                             4838
dtype: int64

#### Test

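The nonzero counts above belong to Year, Renewable electricity output (GWh), and Renewable energy consumption (TJ). None of these is a percentage; they were picked up because `np.r_[2:6,8,10]` is off by one position. The three percentage columns that were actually checked have no values below 0 or above 100. The remaining percentage columns (positions 6, 9, and 11: urban access, renewable electricity share, and renewable energy share of TFEC) were not covered by this check.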
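Selecting the percentage columns by name avoids positional mistakes. The sketch below assumes every percentage column has a `%` in its name, which holds for the cleaned column names shown earlier; the sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample mixing percentage and non-percentage columns
sample_df = pd.DataFrame({
    "Year": [1990, 1991],
    "Access to electricity (% of total population)": [0.01, 100.0],
    "Renewable electricity output (GWh)": [764.0, 690.0],
    "Renewable energy share of TFEC (%)": [15.92, 17.04],
})

# Pick the percentage columns by the '%' in their names
pct_cols = [col for col in sample_df.columns if "%" in col]

# Count values outside the valid 0 to 100 range in each percentage column
out_of_range = ((sample_df[pct_cols] < 0) | (sample_df[pct_cols] > 100)).sum()
print(out_of_range)
```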

### Check Duplicates

#### Define
Check duplicate records and make sure they are correct

#### Code

In [99]:
# Check duplicates
data_df_clean.duplicated().sum()

0

#### Test
Per above, there are no fully duplicated rows.
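A full-row check would miss two rows for the same country and year that differ in a single value. As a hedged extra check, duplicates can be tested on the key columns only; the sample frame below is hypothetical and deliberately contains one such conflict.

```python
import pandas as pd

# Hypothetical sample: ZWE 1990 appears twice with different values
sample_df = pd.DataFrame({
    "Country Code": ["AFG", "AFG", "ZWE", "ZWE"],
    "Year": [1990, 1991, 1990, 1990],
    "Access (%)": [0.01, 0.01, 36.73, 38.15],
})

# Full-row duplicates: none, because the values differ
full_dupes = sample_df.duplicated().sum()

# Key duplicates: both ZWE 1990 rows are flagged
key_dupes = sample_df[sample_df.duplicated(subset=["Country Code", "Year"], keep=False)]
print(full_dupes, len(key_dupes))
```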

## Export Data

In [100]:
# Export data
data_df_clean.to_csv('data/clean/energy.csv', index=False)

In [101]:
# Check that data was exported correctly
pd.read_csv('data/clean/energy.csv').info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6993 entries, 0 to 6992
Data columns (total 14 columns):
Country Name                                                                  6993 non-null object
Country Code                                                                  6993 non-null object
Year                                                                          6993 non-null int64
Access to Clean Fuels and Technologies for cooking (% of total population)    3230 non-null float64
Access to electricity (% of rural population with access)                     5129 non-null float64
Access to electricity (% of total population)                                 5770 non-null float64
Access to electricity (% of urban population with access)                     5771 non-null float64
Energy intensity level of primary energy (MJ/2011 USD PPP)                    4947 non-null float64
Renewable electricity output (GWh)                                            5514 non-null float
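Beyond comparing `info()` output by eye, the round trip can be checked programmatically. This sketch writes a hypothetical sample frame to an in-memory buffer instead of `data/clean/energy.csv` so that it runs on its own; `assert_frame_equal` raises if the reloaded frame differs.

```python
import io

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical sample with a string, an integer, and a float column
sample_df = pd.DataFrame({
    "Country Name": ["Afghanistan", "Zimbabwe"],
    "Year": [1990, 2016],
    "Access (%)": [0.01, np.nan],
})

# Write to an in-memory buffer and read it back
buffer = io.StringIO()
sample_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded_df = pd.read_csv(buffer)

# Raises an AssertionError if values, dtypes, or labels differ
assert_frame_equal(sample_df, reloaded_df)
```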

In [113]:
data_df_clean.query('Year == 1990').info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 259 entries, 0 to 6966
Data columns (total 14 columns):
Country Name                                                                  259 non-null object
Country Code                                                                  259 non-null object
Year                                                                          259 non-null int64
Access to Clean Fuels and Technologies for cooking (% of total population)    0 non-null float64
Access to electricity (% of rural population with access)                     164 non-null float64
Access to electricity (% of total population)                                 207 non-null float64
Access to electricity (% of urban population with access)                     207 non-null float64
Energy intensity level of primary energy (MJ/2011 USD PPP)                    184 non-null float64
Renewable electricity output (GWh)                                            210 non-null float64
Renewable

In [None]:
data_df_clean