# **Problem Statement**
    
Tracking proposed generating projects in the energy sector in the United States is a constant challenge: the project pipeline changes continually, and a reliable source is difficult to find.
    
The Energy Information Administration (or "EIA") publishes data about existing and planned generators through its annual survey, Form EIA-860. The planned generators dataset in this survey could be used to identify and track the evolution of proposed generating projects. This information is publicly available on the EIA's website: [Form-860](https://www.eia.gov/electricity/data/eia860 "EIA Form-860")

Isolating this information on a regular basis can be tedious and time-consuming and, if done manually, can introduce errors into the resulting datasets (e.g., typos).
    
This <span style="color:blue"> Python project </span> aims to replace as many of these manual steps (or processes) as possible with code, to ensure timely and accurate results. In addition, data quality permitting, a series of analytics may be explored and/or derived using the pandas techniques seen in class.
    
## Success Criteria

If successful, this Python project and its supporting analytics should expedite the process of getting the appropriate datasets from the EIA and shed some light on the evolution (and composition) of planned generation projects in the energy sector in the United States.
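
One way to look at the composition of planned projects is to total the nameplate capacity per technology. The sketch below is only an illustration: the column names come from the datasets used in this notebook, but the capacity figures in the toy rows are placeholders (not EIA values), and the helper name `capacity_by_technology` is our own. It also assumes the capacity column is already numeric.

```python
import pandas as pd

# Toy rows: technology names appear in the sample output of this notebook,
# but the capacity figures are placeholders, not EIA values.
toy = pd.DataFrame({
    "Technology": [
        "Natural Gas Fired Combustion Turbine",
        "Natural Gas Fired Combustion Turbine",
        "Geothermal",
    ],
    "Nameplate Capacity (MW)": [100.0, 100.0, 50.0],
})


def capacity_by_technology(df):
    """Total planned nameplate capacity (MW) per technology, largest first."""
    return (
        df.groupby("Technology")["Nameplate Capacity (MW)"]
        .sum()
        .sort_values(ascending=False)
    )


composition = capacity_by_technology(toy)
print(composition)
```

Running the same helper on each year's DataFrame and comparing the two results would give a first view of how the mix of planned technologies changes over time.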

## Significant Risks

There are several significant risks to this project, including but not limited to the following:

* Missing or incomplete data could affect results (*e.g., "NaN" values*);
* The source datasets could be overwritten or deleted without anyone noticing.
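
Both risks can be reduced with a little code. The sketch below is a minimal illustration rather than the project's final approach: `missing_summary` and `save_without_overwrite` are hypothetical helper names, and the toy DataFrame is made up for the example.

```python
from pathlib import Path
import tempfile

import numpy as np
import pandas as pd


def missing_summary(df):
    """Count and share of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "share": counts / len(df)})
    return summary.sort_values("missing", ascending=False)


def save_without_overwrite(df, path):
    """Write df to CSV, refusing to replace a file that already exists."""
    path = Path(path)
    if path.exists():
        raise FileExistsError(f"{path} already exists; choose another name")
    df.to_csv(path, index=False)
    return path


# Made-up rows, only to exercise the helpers.
toy = pd.DataFrame({
    "State": ["AZ", "UT", None],
    "Unit Code": [np.nan, np.nan, np.nan],
})
report = missing_summary(toy)
print(report)

# Demonstrate the overwrite guard in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cleaned.csv"
    save_without_overwrite(toy, target)
    try:
        save_without_overwrite(toy, target)
        second_write_blocked = False
    except FileExistsError:
        second_write_blocked = True
print(second_write_blocked)
```

Writing derived files to a separate location from the source CSVs, and never saving back to the original file names, keeps the source data intact.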

## Limitations

There are some limitations to this project, including but not limited to the following:

* The dataset(s) chosen might be scaled back to allow for easier analytics (*i.e., only 2017 and 2018 data will be used, for comparison purposes*);
* The dataset(s) chosen might be adapted to fit the data formats seen in class;
* Results derived from the dataset(s) can only be as good as the dataset(s) themselves (*i.e., "garbage in, garbage out"*).

*Mitigation Measures*

* To mitigate some of the limitations outlined above, data samples should be inspected for quality and completeness after each processing step.
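
A small helper called after each step makes this kind of inspection routine. The following is a sketch under our own assumptions: the name `inspect_step` and the choice of indicators are illustrative, and the toy rows (including the repeated generator) are made up to show what the checks report.

```python
import pandas as pd


def inspect_step(df, label, key_cols):
    """Print and return a few basic quality indicators for one processing step."""
    stats = {
        "rows": len(df),
        # Rows whose key columns repeat an earlier row.
        "duplicate_keys": int(df.duplicated(subset=key_cols).sum()),
        # Columns that contain no data at all.
        "fully_empty_columns": [c for c in df.columns if df[c].isna().all()],
    }
    print(label, stats)
    return stats


# Made-up rows: the second "GT4" is a deliberate duplicate.
toy = pd.DataFrame({
    "Plant Code": [116.0, 116.0, 116.0],
    "Generator ID": ["GT3", "GT4", "GT4"],
    "Unit Code": [None, None, None],
})
stats = inspect_step(toy, "after load", ["Plant Code", "Generator ID"])
```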

----



## Dataset(s):

As mentioned above, for this project we will limit the scope of our analysis to two datasets, as follows:
* List of 2017 Proposed Generation projects, and
* List of 2018 Proposed Generation projects

In [36]:
# TASK: Load each dataset you plan to use for your project into a pandas DataFrame and print it out.

In [34]:
# Importing pandas (and plots) package, and ensuring plots appear inside the notebook.

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [33]:
# Setting Data directory.

DATA_DIR = "/Users/marcelcouturier/Documents/Vitol/PythonTraining/general_assembly/marcel_couturier_project/data"
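
An absolute path like the one above only works on one machine. A possible alternative, shown here as a sketch, is to build the path with `pathlib` and let an environment variable override a relative default. `PROJECT_DATA_DIR` is a hypothetical variable name, and the `data` folder is assumed to sit next to the notebook.

```python
import os
from pathlib import Path

# PROJECT_DATA_DIR is a hypothetical environment variable; if it is not set,
# fall back to a "data" folder relative to the working directory (an assumption).
DATA_DIR = Path(os.environ.get("PROJECT_DATA_DIR", "data"))

# Path objects can be joined with "/" and passed directly to pd.read_csv.
proposed_gen_2017_path = DATA_DIR / "proposed_gen_2017.csv"
print(proposed_gen_2017_path)
```

A `Path` also works inside an f-string, so the cells that follow would not need to change.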

In [37]:
# Creating the first dataset from the "proposed_gen_2017.csv" file.

proposed_gen_2017_path = f"{DATA_DIR}/proposed_gen_2017.csv"
proposed_gen_2017 = pd.read_csv(proposed_gen_2017_path)

In [38]:
# Printing a sample of the first dataset.

proposed_gen_2017

Unnamed: 0,Utility ID,Utility Name,Plant Code,Plant Name,State,County,Generator ID,Technology,Prime Mover,Unit Code,...,Stoker Technology?,Other Combustion Technology?,Subcritical Technology?,Supercritical Technology?,Ultrasupercritical Technology?,Solid Fuel Gasification System?,Carbon Capture Technology?,Multiple Fuels?,Switch Between Oil and Natural Gas?,Cofire Fuels?
0,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT3,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
1,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT4,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
2,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT5,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
3,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT6,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
4,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT7,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
5,14354,PacifiCorp,299.0,Blundell,UT,Beaver,3,Geothermal,ST,,...,,,,,,N,,N,,
6,22148,AES Alamitos LLC,315.0,AES Alamitos LLC,CA,Los Angeles,1A,Natural Gas Fired Combined Cycle,CT,,...,,,N,,,N,N,N,N,N
7,22148,AES Alamitos LLC,315.0,AES Alamitos LLC,CA,Los Angeles,1B,Natural Gas Fired Combined Cycle,CT,,...,,,,,,,,N,N,N
8,22148,AES Alamitos LLC,315.0,AES Alamitos LLC,CA,Los Angeles,1S,Natural Gas Fired Combined Cycle,CA,,...,,,,,,,,N,N,N
9,23693,AES Huntington Beach LLC,335.0,AES Huntington Beach LLC,CA,Orange,1A,Natural Gas Fired Combined Cycle,CT,,...,,,,,,,,N,,


In [39]:
# Creating the second dataset from the "proposed_gen_2018.csv" file.

proposed_gen_2018_path = f"{DATA_DIR}/proposed_gen_2018.csv"
proposed_gen_2018 = pd.read_csv(proposed_gen_2018_path)

In [40]:
# Printing a sample of the second dataset.

proposed_gen_2018

Unnamed: 0,Utility ID,Utility Name,Plant Code,Plant Name,State,County,Generator ID,Technology,Prime Mover,Unit Code,...,Stoker Technology?,Other Combustion Technology?,Subcritical Technology?,Supercritical Technology?,Ultrasupercritical Technology?,Solid Fuel Gasification System?,Carbon Capture Technology?,Multiple Fuels?,Switch Between Oil and Natural Gas?,Cofire Fuels?
0,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT3,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
1,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT4,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
2,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT5,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
3,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT6,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
4,803,Arizona Public Service Co,116.0,Ocotillo,AZ,Maricopa,GT7,Natural Gas Fired Combustion Turbine,GT,,...,,,,,,,,N,,
5,24211,Tucson Electric Power Co,126.0,H Wilson Sundt Generating Station,AZ,Pima,RIC1,Natural Gas Internal Combustion Engine,IC,,...,,,,,,,,N,,
6,24211,Tucson Electric Power Co,126.0,H Wilson Sundt Generating Station,AZ,Pima,RIC10,Natural Gas Internal Combustion Engine,IC,,...,,,,,,,,N,,
7,24211,Tucson Electric Power Co,126.0,H Wilson Sundt Generating Station,AZ,Pima,RIC2,Natural Gas Internal Combustion Engine,IC,,...,,,,,,,,N,,
8,24211,Tucson Electric Power Co,126.0,H Wilson Sundt Generating Station,AZ,Pima,RIC3,Natural Gas Internal Combustion Engine,IC,,...,,,,,,,,N,,
9,24211,Tucson Electric Power Co,126.0,H Wilson Sundt Generating Station,AZ,Pima,RIC4,Natural Gas Internal Combustion Engine,IC,,...,,,,,,,,N,,


In [41]:
# Viewing the size of the datasets.

print(proposed_gen_2017.shape)
print(proposed_gen_2018.shape)

(1285, 47)
(1471, 47)
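
The row counts alone do not say which projects are new and which have dropped out. One way to find out is an outer merge with `indicator=True`. The sketch below assumes that the pair (`Plant Code`, `Generator ID`) identifies a generator in both years; the function name `compare_years` is our own, and the toy rows only borrow identifiers from the sample output, so the result says nothing about the real data.

```python
import pandas as pd

# Assumption: this pair of columns identifies one generator in both years.
KEY = ["Plant Code", "Generator ID"]


def compare_years(earlier, later, key=KEY):
    """Label each generator as dropped, new, or carried over between two years."""
    merged = earlier[key].drop_duplicates().merge(
        later[key].drop_duplicates(), on=key, how="outer", indicator=True
    )
    labels = {"left_only": "dropped", "right_only": "new", "both": "carried over"}
    merged["change"] = merged["_merge"].astype(str).map(labels)
    return merged.drop(columns="_merge")


# Toy inputs for illustration only.
gen_2017 = pd.DataFrame({"Plant Code": [116.0, 299.0], "Generator ID": ["GT3", "3"]})
gen_2018 = pd.DataFrame({"Plant Code": [116.0, 126.0], "Generator ID": ["GT3", "RIC1"]})

changes = compare_years(gen_2017, gen_2018)
counts = changes["change"].value_counts().to_dict()
print(changes)
```

Applied to `proposed_gen_2017` and `proposed_gen_2018`, the same call would list the projects behind the change in row count.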


In [42]:
# Viewing the column labels (the columns Index) of the first dataset.

proposed_gen_2017.columns

Index(['Utility ID', 'Utility Name', 'Plant Code', 'Plant Name', 'State',
       'County', 'Generator ID', 'Technology', 'Prime Mover', 'Unit Code',
       'Ownership', 'Duct Burners',
       'Can Bypass Heat Recovery Steam Generator?',
       'RTO/ISO LMP Node Designation',
       'RTO/ISO Location Designation for Reporting Wholesale Sales Data to FERC',
       'Nameplate Capacity (MW)', 'Nameplate Power Factor',
       'Summer Capacity (MW)', 'Winter Capacity (MW)', 'Status',
       'Effective Month', 'Effective Year', 'Current Month', 'Current Year',
       'Associated with Combined Heat and Power System', 'Sector Name',
       'Sector', 'Previously Canceled', 'Energy Source 1', 'Energy Source 2',
       'Energy Source 3', 'Energy Source 4', 'Energy Source 5',
       'Energy Source 6', 'Turbines or Hydrokinetic Buoys',
       'Fluidized Bed Technology?', 'Pulverized Coal Technology?',
       'Stoker Technology?', 'Other Combustion Technology?',
       'Subcritical Technology?', 'Sup

In [43]:
# Viewing the column labels (the columns Index) of the second dataset.

proposed_gen_2018.columns

Index(['Utility ID', 'Utility Name', 'Plant Code', 'Plant Name', 'State',
       'County', 'Generator ID', 'Technology', 'Prime Mover', 'Unit Code',
       'Ownership', 'Duct Burners',
       'Can Bypass Heat Recovery Steam Generator?',
       'RTO/ISO LMP Node Designation',
       'RTO/ISO Location Designation for Reporting Wholesale Sales Data to FERC',
       'Nameplate Capacity (MW)', 'Nameplate Power Factor',
       'Summer Capacity (MW)', 'Winter Capacity (MW)', 'Status',
       'Effective Month', 'Effective Year', 'Current Month', 'Current Year',
       'Associated with Combined Heat and Power System', 'Sector Name',
       'Sector', 'Previously Canceled', 'Energy Source 1', 'Energy Source 2',
       'Energy Source 3', 'Energy Source 4', 'Energy Source 5',
       'Energy Source 6', 'Turbines or Hydrokinetic Buoys',
       'Fluidized Bed Technology?', 'Pulverized Coal Technology?',
       'Stoker Technology?', 'Other Combustion Technology?',
       'Subcritical Technology?', 'Sup
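
Comparing two long column listings by eye is error-prone. A set difference does the same check in code. This is a minimal sketch: `column_differences` is a hypothetical helper, and the toy frames are made up so that each has one column the other lacks.

```python
import pandas as pd


def column_differences(left, right):
    """Return (columns only in left, columns only in right), each sorted."""
    left_cols, right_cols = set(left.columns), set(right.columns)
    return sorted(left_cols - right_cols), sorted(right_cols - left_cols)


# Made-up frames for illustration.
a = pd.DataFrame(columns=["Utility ID", "State", "Status"])
b = pd.DataFrame(columns=["Utility ID", "State", "Sector"])

only_a, only_b = column_differences(a, b)
print(only_a, only_b)
```

Two empty lists would confirm that both years share the same columns and can be compared directly.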

In [48]:
# Viewing the data type of each column in the first dataset.

proposed_gen_2017.dtypes

Utility ID                                                                  object
Utility Name                                                                object
Plant Code                                                                 float64
Plant Name                                                                  object
State                                                                       object
County                                                                      object
Generator ID                                                                object
Technology                                                                  object
Prime Mover                                                                 object
Unit Code                                                                   object
Ownership                                                                   object
Duct Burners                                                                object
Can 

In [49]:
# Viewing the data type of the second dataset.

proposed_gen_2018.dtypes

Utility ID                                                                  object
Utility Name                                                                object
Plant Code                                                                 float64
Plant Name                                                                  object
State                                                                       object
County                                                                      object
Generator ID                                                                object
Technology                                                                  object
Prime Mover                                                                 object
Unit Code                                                                   object
Ownership                                                                   object
Duct Burners                                                                object
Can 
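
The output shows `Utility ID` stored as `object` and `Plant Code` as `float64`, although both look like integer identifiers in the samples. If that holds for the whole column, they could be converted to a nullable integer type. The sketch below is an assumption-laden illustration: the helper `to_nullable_int` is our own, and the "not available" entry is a made-up stand-in for whatever non-numeric value might be causing the `object` dtype.

```python
import pandas as pd


def to_nullable_int(series):
    """Convert to pandas' nullable Int64; unparseable entries become <NA>."""
    return pd.to_numeric(series, errors="coerce").astype("Int64")


# Made-up rows: "not available" is a hypothetical non-numeric entry.
toy = pd.DataFrame({
    "Utility ID": ["803", "14354", "not available"],
    "Plant Code": [116.0, 299.0, None],
})

cleaned = toy.copy()  # leave the original frame untouched
for col in ["Utility ID", "Plant Code"]:
    cleaned[col] = to_nullable_int(toy[col])

print(cleaned.dtypes)
```

Because `errors="coerce"` silently turns bad entries into missing values, it is worth counting `isna()` before and after the conversion to see how many rows were affected.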

# To be continued..