# Global Powerplants README
##### A data repository on Global Powerplant Data in the United States.

## **Table of Contents**
### 1. Getting Started
### 2. Importing Data
### 3. Using Your Data
### 4. Exporting Data
## *Getting Started*

### 1. Start by creating your Colab notebook and importing pandas with `import pandas as pd`.
### 2. Choose your data and download it to your computer as a CSV file.
## *Importing Data*
### 1. To get your data into Colab, you can either mount the Google Drive where the file is stored or upload it to your "Content" folder in Colab.
### 2. Click on the three dots beside the folder labeled "Content" and click "Upload new file".
### 3. To import the data into your notebook, read it with `df = pd.read_csv("/content/global_power_plant_database.csv")`.
### 4. Your data is now imported into your notebook.
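If you keep the CSV in Google Drive rather than uploading it, Colab can mount Drive with `google.colab.drive.mount`. The sketch below mounts Drive only when it runs inside Colab, then looks for the file in two candidate folders. The folder names (`/content` for uploads, `/content/drive/MyDrive` for Drive) and the helper name `find_csv` are assumptions for illustration; adjust them to wherever your file actually lives.

```python
import os

# Mount Google Drive only inside Colab; elsewhere the import fails and
# mounting is skipped.
try:
    from google.colab import drive
    drive.mount("/content/drive")
except ImportError:
    pass

def find_csv(filename="global_power_plant_database.csv"):
    """Return the first candidate path where `filename` exists, else None.

    Hypothetical helper: the candidate folders are assumed locations for a
    file uploaded through the Colab Files pane and a file kept in Drive.
    """
    candidate_folders = ["/content", "/content/drive/MyDrive"]
    for folder in candidate_folders:
        path = os.path.join(folder, filename)
        if os.path.exists(path):
            return path
    return None

csv_path = find_csv()
print(csv_path)  # None if the file is in neither folder
```

If `find_csv()` returns a path, pass it to `pd.read_csv()`.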
## *Using your Data*
### 1. Get familiar with the data you are using.
### 2. Attributes like `df.size`, `df.shape`, `df.columns`, and `df.dtypes` can help you familiarize yourself with the data.
### 3. Choose which parts of the data you want to use.
## *Exporting Data*
### 1. Create a new dataset from the rows you picked out, giving it the name you want and the row positions to keep:
`"..."_subset = df.iloc[26510:26610, :]`
### 2. Check that the subset contains the rows you expect by displaying them: `df.iloc[26510:26610, :]`
### 3. Export your dataset as a CSV file to the location where you'd like it to go (the folder must already exist; create it first with `!mkdir /content/data`):
`"..."_subset.to_csv('/content/data/USA_df.csv')`
### 4. You can check that your file was written by running `!ls /content/data/USA_df.csv`.
### 5. Once it has been exported to your location, you have your new data file.

# Getting Started

Begin by importing the pandas package with the following command:

```python
import pandas as pd
```


Your data should be downloaded as a CSV file. Upload this file to the content folder of your Colab notebook (`/content`).

# Importing Data

To create your DataFrame object, call the **pd.read_csv()** function with the correct file path inside the parentheses and assign the result to `df`.
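`pd.read_csv()` accepts either a file path or any file-like object. To try the call without the full database, the sketch below reads a tiny CSV held in memory. The rows are made up; only the column names come from the real file.

```python
import io
import pandas as pd

# A tiny, made-up CSV standing in for global_power_plant_database.csv.
# Only the column names match the real file; the values are invented.
sample_csv = io.StringIO(
    "country,country_long,name,capacity_mw,primary_fuel\n"
    "AFG,Afghanistan,Example Hydro Plant,33.0,Hydro\n"
    "USA,United States of America,Example Solar Farm,4.4,Solar\n"
)

# Same call as with the real file; only the argument differs.
df_sample = pd.read_csv(sample_csv)
print(df_sample)
```

With the real file, pass the path instead: `df = pd.read_csv("/content/global_power_plant_database.csv")`.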

# Exploring Data

To start, explore some of the basic attributes of your data. These attributes hold **values** that describe your DataFrame, such as its dimensions, column names, and column types, and they help guide how you work with it.
##### To access an attribute, run: `df.<attribute_name>`
###### Some examples include:

```python
df = pd.read_csv("/content/global_power_plant_database.csv")
```

```python
df.shape  # (rows, columns)
```

```
(29910, 24)
```

```python
df.size  # total number of cells
```

```
717840
```

```python
df.columns  # column labels
```

```
Index(['country', 'country_long', 'name', 'gppd_idnr', 'capacity_mw',
       'latitude', 'longitude', 'primary_fuel', 'other_fuel1', 'other_fuel2',
       'other_fuel3', 'commissioning_year', 'owner', 'source', 'url',
       'geolocation_source', 'wepp_id', 'year_of_capacity_data',
       'generation_gwh_2013', 'generation_gwh_2014', 'generation_gwh_2015',
       'generation_gwh_2016', 'generation_gwh_2017',
       'estimated_generation_gwh'],
      dtype='object')
```

```python
df.dtypes  # data type of each column
```

```
country                      object
country_long                 object
name                         object
gppd_idnr                    object
capacity_mw                 float64
latitude                    float64
longitude                   float64
primary_fuel                 object
other_fuel1                  object
other_fuel2                  object
other_fuel3                  object
commissioning_year          float64
owner                        object
source                       object
url                          object
geolocation_source           object
wepp_id                      object
year_of_capacity_data       float64
generation_gwh_2013         float64
generation_gwh_2014         float64
generation_gwh_2015         float64
generation_gwh_2016         float64
generation_gwh_2017         float64
estimated_generation_gwh    float64
dtype: object
```
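These attributes are related: `df.size` is the number of rows times the number of columns, so 29,910 × 24 = 717,840 in the outputs above. The sketch below checks the same relationship on a small made-up frame that borrows a few of the real column names. It also shows a general pandas behavior: a numeric column containing a missing value (NaN) is stored as `float64`.

```python
import pandas as pd

# Small made-up frame using a few of the real column names.
sample = pd.DataFrame({
    "country": ["AFG", "ALB", "USA"],
    "capacity_mw": [33.0, 5.0, 4.4],
    "commissioning_year": [1975.0, None, 2016.0],  # None becomes NaN
})

rows, cols = sample.shape          # (number of rows, number of columns)
assert sample.size == rows * cols  # size counts every cell, missing or not
print(list(sample.columns))        # column labels
print(sample.dtypes)               # one dtype per column
```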

```python
df[0:3]  # first three rows
```

| | country | country_long | name | gppd_idnr | capacity_mw | latitude | longitude | primary_fuel | other_fuel1 | other_fuel2 | ... | url | geolocation_source | wepp_id | year_of_capacity_data | generation_gwh_2013 | generation_gwh_2014 | generation_gwh_2015 | generation_gwh_2016 | generation_gwh_2017 | estimated_generation_gwh |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | Kajaki Hydroelectric Power Plant Afghanistan | GEODB0040538 | 33.0 | 32.322 | 65.119 | Hydro |  |  | ... | http://globalenergyobservatory.org | GEODB | 1009793 | 2017.0 |  |  |  |  |  |  |
| 1 | AFG | Afghanistan | Mahipar Hydroelectric Power Plant Afghanistan | GEODB0040541 | 66.0 | 34.556 | 69.4787 | Hydro |  |  | ... | http://globalenergyobservatory.org | GEODB | 1009795 | 2017.0 |  |  |  |  |  |  |
| 2 | AFG | Afghanistan | Naghlu Dam Hydroelectric Power Plant Afghanistan | GEODB0040534 | 100.0 | 34.641 | 69.717 | Hydro |  |  | ... | http://globalenergyobservatory.org | GEODB | 1009797 | 2017.0 |  |  |  |  |  |  |


```python
df["country"][5:10]  # rows 5-9 of the country column
```

```
5    AFG
6    AFG
7    ALB
8    ALB
9    ALB
Name: country, dtype: object
```

```python
df.iloc[3, 4]  # row position 3, column position 4 (capacity_mw)
```

```
11.55
```

```python
df.iloc[:, 4]  # all rows of column 4 (capacity_mw)
```

```
0         33.00
1         66.00
2        100.00
3         11.55
4         42.00
          ...
29905     50.00
29906     20.00
29907    108.00
29908    920.00
29909    750.00
Name: capacity_mw, Length: 29910, dtype: float64
```
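`.iloc` selects by integer position, while `.loc` selects by index label and column name. With the default 0-based index the two agree on rows, so `df.iloc[3, 4]` and `df.loc[3, "capacity_mw"]` should address the same cell (column position 4 is `capacity_mw`, per `df.columns` above). A sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame; the first columns mimic the order of the real file.
sample = pd.DataFrame({
    "country": ["AFG", "AFG", "ALB"],
    "country_long": ["Afghanistan", "Afghanistan", "Albania"],
    "capacity_mw": [10.0, 20.0, 30.0],
})

by_position = sample.iloc[2, 2]          # row position 2, column position 2
by_label = sample.loc[2, "capacity_mw"]  # row label 2, column name
print(by_position, by_label)
```

They match here only because the index is the default 0, 1, 2, ...; after filtering or sorting, row labels and positions can differ.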

To get more familiar with your DataFrame, you can use methods that return only part of it: **.head()** returns the first five rows and **.tail()** returns the last five rows.

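```python
df.loc[:, ["country", "source", "owner"]].head()  # first 5 rows of three columns
```

| | country | source | owner |
|---|---|---|---|
| 0 | AFG | GEODB |  |
| 1 | AFG | GEODB |  |
| 2 | AFG | GEODB |  |
| 3 | AFG | GEODB |  |
| 4 | AFG | GEODB |  |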
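Both methods take an optional row count `n` (default 5), so you can ask for more or fewer rows; `.tail()` counts from the end of the frame. A sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame with 8 rows.
sample = pd.DataFrame({
    "country": ["AFG"] * 4 + ["USA"] * 4,
    "capacity_mw": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

first_five = sample.head()   # rows 0-4 (default n=5)
first_two = sample.head(2)   # rows 0-1
last_three = sample.tail(3)  # rows 5-7
print(last_three)
```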


```python
df["country"] == "AFG"  # True for each row whose country is AFG
```

```
0         True
1         True
2         True
3         True
4         True
         ...
29905    False
29906    False
29907    False
29908    False
29909    False
Name: country, Length: 29910, dtype: bool
```
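A boolean Series like this one can be passed back into `df[...]` to keep only the rows where it is `True`, and `.sum()` counts the `True` values. The sketch uses a made-up frame; on the real data the same pattern would be `df[df["country"] == "USA"]`, an alternative to finding row positions by hand.

```python
import pandas as pd

# Made-up frame for illustration.
sample = pd.DataFrame({
    "country": ["AFG", "AFG", "ALB", "USA", "USA"],
    "capacity_mw": [33.0, 66.0, 5.0, 4.4, 8.0],
})

mask = sample["country"] == "USA"  # True where the row is a US plant
usa_rows = sample[mask]            # keep only the True rows
print(mask.sum(), "matching rows")
print(usa_rows)
```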

```python
df["country"][20:30]
```

```
20    DZA
21    DZA
22    DZA
23    DZA
24    DZA
25    DZA
26    DZA
27    DZA
28    DZA
29    DZA
Name: country, dtype: object
```

```python
df["country"][50:60]
```

```
50    AGO
51    AGO
52    AGO
53    AGO
54    AGO
55    AGO
56    AGO
57    AGO
58    AGO
59    AGO
Name: country, dtype: object
```

```python
df["country_long"][85:90]
```

```
85    Argentina
86    Argentina
87    Argentina
88    Argentina
89    Argentina
Name: country_long, dtype: object
```

```python
df["country_long"][200:210]
```

```
200    Argentina
201    Argentina
202    Argentina
203    Argentina
204    Argentina
205    Argentina
206    Argentina
207    Argentina
208    Argentina
209    Argentina
Name: country_long, dtype: object
```

```python
df["country_long"][26510:26620]
```

```
26510    United States of America
26511    United States of America
26512    United States of America
26513    United States of America
26514    United States of America
                   ...
26615    United States of America
26616    United States of America
26617    United States of America
26618    United States of America
26619    United States of America
Name: country_long, Length: 110, dtype: object
```
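The slices above locate the US rows by trial and error. You can also ask pandas for the positions directly. This sketch assumes, as the slices suggest, that each country's rows are stored together; it runs on a made-up frame.

```python
import numpy as np
import pandas as pd

# Made-up frame for illustration.
sample = pd.DataFrame({"country": ["AFG", "AFG", "ALB", "USA", "USA", "USA", "ZWE"]})

# Integer positions of every row whose country is USA.
positions = np.flatnonzero((sample["country"] == "USA").to_numpy())
first, last = positions[0], positions[-1]
print(first, last)  # first and last USA positions

# Assumes the USA rows are contiguous; the slice end is exclusive, hence last + 1.
usa_block = sample.iloc[first:last + 1]
print(usa_block)
```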

### Creating a Subset

To take a smaller portion of your larger dataset, you can create a subset. This reduces the size of your dataset so it holds only the information you need. You can do this by selecting the rows you want with `.iloc` and assigning them to a new name, for example `USA_subset = df.iloc[26510:26610, :]`.

```python
USA_subset = df.iloc[26510:26610, :]  # row positions 26510-26609, all columns
```

```python
new_df = df.iloc[26510:26610, :]  # the same rows under a second name
```

```python
df.iloc[26510:26610, :]  # display the selected rows
```

| | country | country_long | name | gppd_idnr | capacity_mw | latitude | longitude | primary_fuel | other_fuel1 | other_fuel2 | ... | url | geolocation_source | wepp_id | year_of_capacity_data | generation_gwh_2013 | generation_gwh_2014 | generation_gwh_2015 | generation_gwh_2016 | generation_gwh_2017 | estimated_generation_gwh |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26510 | USA | United States of America | Outback Solar At Christmas Valley | USA0058131 | 4.4 | 43.2369 | -120.4900 | Solar |  |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration | 67009 | 2017.0 | 10.705 | 10.460 | 10.229 | 9.439 | 9.9030 |  |
| 26511 | USA | United States of America | Overall Road Station | USA0056600 | 8.0 | 37.7608 | -90.4526 | Oil |  |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration |  | 2017.0 | 0.029 | 0.000 | 0.000 | 0.030 | 0.0370 |  |
| 26512 | USA | United States of America | Owatonna | USA0002003 | 19.0 | 44.0833 | -93.2300 | Gas | Oil |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration | 29275 | 2017.0 | 0.000 | 0.000 | 0.000 | 0.000 | 1.0486 |  |
| 26513 | USA | United States of America | Owen Solar | USA0058742 | 5.0 | 35.5861 | -81.3153 | Solar |  |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration |  | 2017.0 |  |  |  | 6.051 | 9.6840 | 2.229638 |
| 26514 | USA | United States of America | Owens Corning Headquarters | USA0060038 | 2.1 | 41.6447 | -83.5364 | Solar |  |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration | 70330 | 2017.0 |  |  | 0.129 | 1.810 | 1.7210 | 0.936448 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 26605 | USA | United States of America | Paloma Solar | USA0057562 | 17.6 | 33.0211 | -112.6614 | Solar |  |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration | 62439 | 2017.0 | 41.614 | 40.298 | 39.257 | 37.662 | 38.5920 |  |
| 26606 | USA | United States of America | Palomar Energy | USA0055985 | 559.0 | 33.1197 | -117.1178 | Gas |  |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration | 49516 | 2017.0 | 3773.689 | 2598.647 | 3002.429 | 2299.052 | 2517.1370 |  |
| 26607 | USA | United States of America | Palouse | USA0057530 | 105.3 | 47.1558 | -117.3644 | Wind |  |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration | 63825 | 2017.0 | 297.029 | 335.282 | 293.563 | 349.771 | 300.3800 |  |
| 26608 | USA | United States of America | Pamlico Partners Solar | USA0058799 | 5.0 | 35.5011 | -76.8700 | Solar |  |  | ... | http://www.eia.gov/electricity/data/browser/ | U.S. Energy Information Administration |  | 2017.0 | 0.000 | 8.285 | 9.356 | 9.444 | 9.5400 |  |


```python
USA_subset = pd.DataFrame(USA_subset)  # already a DataFrame; the data is unchanged
```
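Two details of this slice are worth knowing. The end position of an `.iloc` slice is excluded, so `df.iloc[26510:26610, :]` holds exactly 100 rows (positions 26510 through 26609). Also, an `.iloc` row slice already returns a DataFrame. If you want a subset you can edit without affecting `df`, `.copy()` makes that explicit. A sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame with 10 rows.
sample = pd.DataFrame({"capacity_mw": [float(i) for i in range(10)]})

subset = sample.iloc[2:6, :]  # positions 2, 3, 4, 5 (end excluded)
print(type(subset).__name__, len(subset))

independent = sample.iloc[2:6, :].copy()
independent["capacity_mw"] = 0.0   # changes the copy only
print(sample["capacity_mw"].sum())  # original is unchanged
```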

### Exporting your Data

Now that you have a final dataset with the information you need, export it as a CSV file. First, run **!mkdir /content/data** to create the folder the file will be saved in (if the folder already exists, `mkdir` prints an error that you can ignore). Then export with **USA_subset.to_csv()**, passing the output path.

```
!mkdir /content/data
```

```
mkdir: cannot create directory ‘/content/data’: File exists
```


```python
USA_subset.to_csv('/content/data/USA_df.csv')  # same name the check below looks for
```

```
!ls /content/data/USA_df.csv
```

```
/content/data/USA_df.csv
```
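The sketch below performs the same export with two adjustments. Both reflect assumptions about what you want and are not steps from the original notebook. `os.makedirs(..., exist_ok=True)` creates the folder without an error if it already exists. `index=False` leaves out the row index; by default `to_csv` writes the index as an unnamed first column, which appears as `Unnamed: 0` when the file is read back. To keep the example runnable anywhere, it writes a made-up frame into a temporary folder instead of `/content/data`.

```python
import os
import tempfile
import pandas as pd

# Made-up frame standing in for USA_subset.
sample = pd.DataFrame({"country": ["USA", "USA"], "capacity_mw": [4.4, 8.0]})

out_dir = os.path.join(tempfile.mkdtemp(), "data")  # stand-in for /content/data
os.makedirs(out_dir, exist_ok=True)                 # no error if it already exists
out_path = os.path.join(out_dir, "USA_df.csv")

sample.to_csv(out_path, index=False)  # skip the index column
print(os.path.exists(out_path))       # same check as `!ls`, from Python

round_trip = pd.read_csv(out_path)
print(list(round_trip.columns))       # ['country', 'capacity_mw']

# For comparison: the default (index=True) adds an 'Unnamed: 0' column on re-read.
with_index_path = os.path.join(out_dir, "with_index.csv")
sample.to_csv(with_index_path)
print(list(pd.read_csv(with_index_path).columns))  # ['Unnamed: 0', 'country', 'capacity_mw']
```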
