# Creating the Political Alignment Index and Merging the Datasets 

First we need to import the following packages to make everything function properly.

In [1]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import re
import statsmodels.api as sm
from scipy import stats
import pylab 

pylab.rcParams['figure.figsize'] = (10., 8.) 



## Importing the Election Results Data

Now we need to import the election result data. There are two datasets: one covering 1980-2008 and one for the two most recent elections (2012 and 2016). The 'Outcome' column gives the party that won the most votes in that state and year, and the 'MajorityVote' column gives the number of votes cast for that winning party in the state.

In [2]:
election_results_1980_2008_path = "./data/Presedential election results reformatted.csv"
er_80_08 = pd.read_csv(election_results_1980_2008_path)
er_80_08.head()

Unnamed: 0,State,Year,Outcome,MajorityVote
0,Alabama,1980,R,654192
1,Alaska,1980,R,86112
2,Arizona,1980,R,529688
3,Arkansas,1980,R,403164
4,California,1980,R,4524858


In [3]:
election_results_2012_2016_path = "./data/Presdential Election 2012-2016 reformatted.csv"
er_12_16 = pd.read_csv(election_results_2012_2016_path)
er_12_16.head()

Unnamed: 0,State,Year,Outcome,MajorityVote
0,Alabama,2012,R,1255925
1,Alaska,2012,R,164676
2,Arizona,2012,R,1233654
3,Arkansas,2012,R,647744
4,California,2012,D,7854285


The following cell combines the two election result datasets by stacking their rows with the 'pd.concat' function.

In [4]:
us_er = pd.concat([er_80_08, er_12_16])
us_er.tail()

Unnamed: 0,State,Year,Outcome,MajorityVote
95,Virginia,2016,D,1981473
96,Washington,2016,D,1742718
97,West Virginia,2016,R,489371
98,Wisconsin,2016,R,1405284
99,Wyoming,2016,R,174419
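
The tail above keeps the row labels of the second file, because 'pd.concat' preserves each input's index by default, so labels can repeat in the combined dataframe. This does not matter here, since the index is replaced in the next cell, but the sketch below shows the difference that `ignore_index=True` makes. The two small dataframes are stand-ins built from rows shown in the tables above, not the full files.

```python
import pandas as pd

# Tiny stand-ins for er_80_08 and er_12_16 (rows copied from the tables above).
older = pd.DataFrame({"State": ["Alabama", "Alaska"], "Year": [1980, 1980],
                      "Outcome": ["R", "R"], "MajorityVote": [654192, 86112]})
newer = pd.DataFrame({"State": ["Alabama", "Alaska"], "Year": [2012, 2012],
                      "Outcome": ["R", "R"], "MajorityVote": [1255925, 164676]})

# Default behaviour: each input keeps its own row labels, so 0 and 1 appear twice.
stacked = pd.concat([older, newer])

# ignore_index=True renumbers the rows of the result from 0.
renumbered = pd.concat([older, newer], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```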


The following cell shows what the two-level (State, Year) index looks like, the same layout that was used for the cleaned population dataset. This was useful for looking at specific years, states or indicators, but we found it awkward for plotting and analysis, because 'State' and 'Year' are no longer ordinary columns. The '.astype(float)' call also converts the values in the 'MajorityVote' column to floating-point numbers, which will allow us to perform the necessary arithmetic with that column later on.

In [5]:
us_er = us_er.set_index(['State', 'Year'])
us_er = us_er.sort_index()
us_er['MajorityVote'] = us_er['MajorityVote'].astype(float)
us_er.head()

State,Year,Outcome,MajorityVote
Alabama,1980,R,654192.0
Alabama,1984,R,872849.0
Alabama,1988,R,815576.0
Alabama,1992,R,804283.0
Alabama,1996,R,749044.0


As explained above, the two-level index makes the data harder to work with, so we first reset the index of the election result dataframe.

In [6]:

us_er = us_er.reset_index()
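
For reference, here is a minimal sketch of what the two-level index offers and what 'reset_index' undoes. The `toy` dataframe is a stand-in built from rows in the tables above; the variable names are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska"],
    "Year": [1980, 1984, 1980],
    "Outcome": ["R", "R", "R"],
    "MajorityVote": [654192.0, 872849.0, 86112.0],
})

indexed = toy.set_index(["State", "Year"]).sort_index()

# Look up a single state-year pair with a tuple label.
alabama_1984 = indexed.loc[("Alabama", 1984), "MajorityVote"]

# Take a cross-section: every state for one year.
all_1980 = indexed.xs(1980, level="Year")

# reset_index() moves both index levels back into ordinary columns.
flat = indexed.reset_index()

print(alabama_1984)
print(all_1980)
print(flat.columns.tolist())
```

Once the levels are ordinary columns again, they can be used directly for string concatenation, merging and plotting, which is what the following cells rely on.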


## Merging the population data with the election data and creating the political alignment index

Now we need to import the population data. The CSV file being imported has already had its index reset, as in the example above. That process left a column called 'Unnamed: 0', which simply repeats the row number, so it is deleted using the '.drop' function.

In [7]:
us_state_pop_path = "./data/us_state_pop__for_analysis.csv"

us_state_pop = pd.read_csv(us_state_pop_path)
us_state_pop.drop(['Unnamed: 0'], axis = 1, inplace = True)

us_state_pop.head()

Unnamed: 0,State,Year,Population
0,Alabama,1970,3444354.0
1,Alabama,1971,3497000.0
2,Alabama,1972,3540000.0
3,Alabama,1973,3581000.0
4,Alabama,1974,3628000.0
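
An 'Unnamed: 0' column typically appears when a dataframe is written with 'to_csv' using its default settings and then read back. The sketch below, which uses an in-memory buffer and two rows from the table above instead of the real file, shows two ways to avoid having to drop it afterwards.

```python
import io
import pandas as pd

pop = pd.DataFrame({"State": ["Alabama", "Alabama"], "Year": [1970, 1971],
                    "Population": [3444354.0, 3497000.0]})

# Default to_csv() writes the row labels as an unnamed first column,
# which read_csv() then names 'Unnamed: 0'.
with_index = pd.read_csv(io.StringIO(pop.to_csv()))

# Writing with index=False leaves the row labels out of the file.
without_index = pd.read_csv(io.StringIO(pop.to_csv(index=False)))

# For a file that already contains the column, index_col=0 reads it back as the index.
as_index = pd.read_csv(io.StringIO(pop.to_csv()), index_col=0)

print(with_index.columns.tolist())
print(without_index.columns.tolist())
print(as_index.columns.tolist())
```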


To calculate the average political alignment of voters in each state and year, we first need the proportion of the population that voted for the most popular party in that state and year. To do this, the population for the relevant years has to be added to the election results dataframe. This requires a new index called 'StateYear' that joins the 'State' and 'Year' columns of the population dataframe (shown in the first 3 rows of code in the cell below). As the two datasets cover different years, we did not want a plain left, right or inner join, and the outer join did not prove very effective for this dataset. Instead, the cell builds the same 'StateYear' labels for the election results and uses '.loc' to pull the matching 'Population' values, so population data is only added for state-year combinations that appear in the election results. The lookup sits inside a for loop over '.iterrows'; the '.loc' call already handles every row at once, so each pass of the loop produces the same result. The proportion of the population that voted for the state's most popular party can then be calculated by dividing the 'MajorityVote' column by the 'Population' column.

Now the average political alignment (the 'Alignment' column) can be calculated. This uses a similar combination of a for loop and '.iterrows'. The loop iterates through the rows: if the outcome for that state and year was Democratic (D), the proportion is multiplied by -1, and if it was Republican (R), the 'Alignment' value is the same as the proportion for that state and year. This allows us to place voting behaviour on a well-defined range, from very Democratic (-1) to very Republican (+1).

In [8]:
for row in us_state_pop:
    us_state_pop['StateYear']= us_state_pop['State'] + us_state_pop['Year'].map(str)

us_state_pop = us_state_pop.set_index(['StateYear'])

for x,row in us_er.iterrows():
    election_state_year = us_er['State'] + us_er['Year'].map(str)
    us_er['Population'] = us_state_pop.loc[election_state_year, 'Population'].values
us_er.head()

Unnamed: 0,State,Year,Outcome,MajorityVote,Population
0,Alabama,1980,R,654192.0,3893888.0
1,Alabama,1984,R,872849.0,3951820.0
2,Alabama,1988,R,815576.0,4023844.0
3,Alabama,1992,R,804283.0,4154014.0
4,Alabama,1996,R,749044.0,4331102.0
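
As a point of comparison, the same lookup can be written as a single left merge on the 'State' and 'Year' columns, which avoids building the 'StateYear' labels. This is a sketch on toy data taken from the tables above, not a replacement tested against the full files; `elections` and `population` are illustrative names.

```python
import pandas as pd

elections = pd.DataFrame({"State": ["Alabama", "Alabama"], "Year": [1980, 1984],
                          "Outcome": ["R", "R"],
                          "MajorityVote": [654192.0, 872849.0]})
population = pd.DataFrame({"State": ["Alabama"] * 3, "Year": [1970, 1980, 1984],
                           "Population": [3444354.0, 3893888.0, 3951820.0]})

# how="left" keeps every election row and ignores population years with no election.
# validate="many_to_one" raises an error if a state-year appears twice in `population`.
merged = elections.merge(population, on=["State", "Year"],
                         how="left", validate="many_to_one")

merged["Proportion"] = merged["MajorityVote"] / merged["Population"]
print(merged)
```

With a left merge, an election row that has no matching population row gets NaN in 'Population' instead of stopping the cell, which makes gaps easy to spot with `merged['Population'].isna()`.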


In [9]:
us_er['Proportion'] = us_er['MajorityVote']/us_er['Population']
us_er['Alignment'] = us_er['Proportion'].apply(abs)

for x, row in us_er.iterrows():
    if row['Outcome'] == 'R': 
        us_er.loc[x,'Alignment']=us_er.loc[x,'Proportion']
    elif row['Outcome'] == 'D':
        us_er.loc[x,'Alignment']=us_er.loc[x,'Proportion']*(-1)
us_er

Unnamed: 0,State,Year,Outcome,MajorityVote,Population,Proportion,Alignment
0,Alabama,1980,R,654192.0,3893888.0,0.168005,0.168005
1,Alabama,1984,R,872849.0,3951820.0,0.220873,0.220873
2,Alabama,1988,R,815576.0,4023844.0,0.202686,0.202686
3,Alabama,1992,R,804283.0,4154014.0,0.193616,0.193616
4,Alabama,1996,R,749044.0,4331102.0,0.172945,0.172945
5,Alabama,2000,R,941173.0,4452173.0,0.211396,0.211396
6,Alabama,2004,D,1176394.0,4530729.0,0.259648,-0.259648
7,Alabama,2008,D,1266546.0,4718206.0,0.268438,-0.268438
8,Alabama,2012,R,1255925.0,4815960.0,0.260784,0.260784
9,Alabama,2016,R,1318255.0,4863300.0,0.271062,0.271062
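
The sign flip can also be done without a loop, by mapping each outcome to +1 or -1 and multiplying. The sketch below uses three rows from the table above and is only one possible way to write it.

```python
import pandas as pd

er = pd.DataFrame({"Outcome": ["R", "D", "R"],
                   "Proportion": [0.168005, 0.259648, 0.260784]})

# +1 keeps Republican wins positive, -1 makes Democratic wins negative.
sign = er["Outcome"].map({"R": 1, "D": -1})
er["Alignment"] = er["Proportion"] * sign

print(er)
```

One difference from the loop version: an outcome other than 'R' or 'D' maps to NaN here, whereas the loop leaves such rows at the positive proportion.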


This has worked well, but we want election data for every year so that we can more easily analyse the relationship between voting behaviour and the different environmental indicators later on. Therefore, we will assume that voting behaviour doesn't change between election years (i.e. if a state voted Republican in 2004, we will assume that the voting behaviour in that state is the same in 2005, 2006 and 2007).

Inserting blank rows is both hard and inefficient, so instead we will merge the new election result dataframe into the population dataframe. The population data is yearly, which allows us to fill in the election results for each year in which no presidential election took place. This uses the same combination of a for loop and '.iterrows' as above, but in the opposite direction:
Python iterates through the rows of the population dataframe, and where the election result 'StateYear' matches the population 'StateYear', the election result data is added to that row of the population dataframe. After this, the election result data is copied into the missing years using '.fillna(method='ffill')', which fills each empty cell with the most recent non-empty value above it in the same column.

In [10]:
for row in us_er:
    us_er['StateYear']= us_er['State'] + us_er['Year'].map(str)

us_er = us_er.set_index(['StateYear'])

for x,row in us_state_pop.iterrows():
    election_state_year = us_state_pop['State'] + us_state_pop['Year'].map(str)
    us_state_pop['Alignment'] = us_er.loc[election_state_year, 'Alignment'].values
    us_state_pop['Outcome'] = us_er.loc[election_state_year, 'Outcome']
us_state_pop.head()

StateYear,State,Year,Population,Alignment,Outcome
Alabama1970,Alabama,1970,3444354.0,,
Alabama1971,Alabama,1971,3497000.0,,
Alabama1972,Alabama,1972,3540000.0,,
Alabama1973,Alabama,1973,3581000.0,,
Alabama1974,Alabama,1974,3628000.0,,


In [11]:
us_state_pop = us_state_pop.fillna(method='ffill')
us_state_pop

StateYear,State,Year,Population,Alignment,Outcome
Alabama1970,Alabama,1970,3444354.0,,
Alabama1971,Alabama,1971,3497000.0,,
Alabama1972,Alabama,1972,3540000.0,,
Alabama1973,Alabama,1973,3581000.0,,
Alabama1974,Alabama,1974,3628000.0,,
Alabama1975,Alabama,1975,3680000.0,,
Alabama1976,Alabama,1976,3737000.0,,
Alabama1977,Alabama,1977,3783000.0,,
Alabama1978,Alabama,1978,3834000.0,,
Alabama1979,Alabama,1979,3869000.0,,
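
One thing to watch with a plain forward fill: it works down the whole column, so if the rows are ordered state by state (as the output suggests), the last value of one state is carried into the leading empty years of the next. A hedged sketch of a per-state fill follows. The Alaska value of 0.2 is made up for illustration, and `panel` is a toy stand-in for the population dataframe. It also uses `.ffill()`, since recent pandas releases deprecate `fillna(method='ffill')`.

```python
import numpy as np
import pandas as pd

panel = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska", "Alaska"],
    "Year": [2015, 2016, 2017, 1979, 1980, 1981],
    # 0.271062 is Alabama 2016 from the table above; 0.2 is an invented value.
    "Alignment": [np.nan, 0.271062, np.nan, np.nan, 0.2, np.nan],
})

# Plain forward fill: Alaska 1979 inherits Alabama's last value.
plain = panel["Alignment"].ffill()

# Forward fill within each state: Alaska 1979 stays empty.
per_state = panel.groupby("State")["Alignment"].ffill()

print(pd.DataFrame({"State": panel["State"], "Year": panel["Year"],
                    "plain": plain, "per_state": per_state}))
```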


## Adding the different environmental indicator data to the election result-population dataframe

All of the data will now be merged into one dataframe, as this will make subsequent data analysis and plotting easier. The per capita water consumption, residential carbon dioxide (CO2) emissions, renewable energy total consumption (RETCB), total energy consumption per capita in the residential sector (TERPB), and total petroleum product consumption in the residential sector (PARCP) will now be added. The CO2 emissions, RETCB and PARCP data were not per capita, so they were made per capita by dividing their values by the 'Population' column.

In [12]:
state_water_use_path = "./data/state_water_use_copy1.csv"
state_water_use = pd.read_csv(state_water_use_path)
state_water_use.head()

Unnamed: 0.1,Unnamed: 0,State,Year,"Domestic self-supplied groundwater withdrawals, fresh, in Mgal/d","Domestic self-supplied groundwater withdrawals, saline, in Mgal/d","Domestic total self-supplied withdrawals, groundwater, in Mgal/d","Domestic self-supplied surface-water withdrawals, fresh, in Mgal/d","Domestic self-supplied surface-water withdrawals, saline, in Mgal/d","Domestic total self-supplied withdrawals, surface water, in Mgal/d","Domestic total self-supplied withdrawals, fresh, in Mgal/d",...,"Domestic deliveries from public supply, in Mgal/d","Domestic total self-supplied withdrawals, saline, in Mgal/d","Domestic per capita use, public-supplied, in gallons/person/day","Domestic total self-supplied withdrawals, in Mgal/d","Domestic total self-supplied withdrawals plus deliveries, in Mgal/d","Domestic consumptive use, fresh, in Mgal/d","Domestic consumptive use, saline, in Mgal/d","Domestic total consumptive use, in Mgal/d","Domestic per capita use, self-supplied, in gallons/person/day","Total domestic water use per capita, gallons/person/day"
0,0,Alabama,1990,27.74,0.0,0.0,0.0,0.0,0.0,27.74,...,368.47,0.0,100.0,0.0,396.21,53.09,0.0,0.0,75.0,175.0
1,1,Alabama,1995,61.9,0.0,61.9,0.0,0.0,0.0,61.9,...,383.31,0.0,112.0,61.9,445.21,88.75,0.0,88.75,75.0,187.0
2,2,Alabama,2000,65.06,0.0,0.0,0.0,0.0,0.0,65.06,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,75.0,75.0
3,3,Alabama,2005,39.12,0.0,0.0,0.0,0.0,0.0,39.12,...,325.94,0.0,81.0,0.0,365.06,0.0,0.0,0.0,75.0,156.0
4,4,Alabama,2010,37.97,0.0,0.0,0.0,0.0,0.0,37.97,...,326.69,0.0,77.0,0.0,364.66,0.0,0.0,0.0,70.0,147.0


In [13]:
for row in state_water_use:
    state_water_use['StateYear']= state_water_use['State'] + state_water_use['Year'].map(str)

state_water_use = state_water_use.set_index(['StateYear'])

for x,row in us_state_pop.iterrows():
    election_state_year = us_state_pop['State'] + us_state_pop['Year'].map(str)
    us_state_pop['Total domestic water use per capita, gallons/person/day'] = state_water_use.loc[election_state_year, 'Total domestic water use per capita, gallons/person/day'].values
pd.options.display.max_rows = 5000

us_state_pop.head()

StateYear,State,Year,Population,Alignment,Outcome,"Total domestic water use per capita, gallons/person/day"
Alabama1970,Alabama,1970,3444354.0,,,
Alabama1971,Alabama,1971,3497000.0,,,
Alabama1972,Alabama,1972,3540000.0,,,
Alabama1973,Alabama,1973,3581000.0,,,
Alabama1974,Alabama,1974,3628000.0,,,
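
The same pattern (build 'StateYear', set it as the index, look up one column) is repeated for each indicator below. A possible way to factor it out is a small helper that left-joins one indicator column by 'State' and 'Year'. `add_indicator` is a hypothetical name introduced here, and the data are two rows from the water-use table above.

```python
import pandas as pd

def add_indicator(panel, indicator, value_column, new_name):
    """Left-join one indicator column onto the panel by State and Year.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    lookup = indicator[["State", "Year", value_column]].rename(
        columns={value_column: new_name})
    return panel.merge(lookup, on=["State", "Year"], how="left")

panel = pd.DataFrame({"State": ["Alabama"] * 3, "Year": [1989, 1990, 1995]})
water = pd.DataFrame({
    "State": ["Alabama", "Alabama"],
    "Year": [1990, 1995],
    "Total domestic water use per capita, gallons/person/day": [175.0, 187.0],
})

result = add_indicator(
    panel, water,
    "Total domestic water use per capita, gallons/person/day", "WaterUse")
print(result)
```

Years with no reading (1989 here) come back as NaN. Note that 'merge' returns a fresh integer index, so a 'StateYear' index on the panel would need to be set again afterwards.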


In [14]:
co2_emissions_path = "./data/co2_emissions_copy_use.csv"
co2_emissions = pd.read_csv(co2_emissions_path)
co2_emissions.head()

Unnamed: 0,State,Year,Reading
0,Alabama,1980,3.6
1,Alabama,1981,3.4
2,Alabama,1982,3.4
3,Alabama,1983,3.5
4,Alabama,1984,3.3


In [15]:
for row in co2_emissions:
    co2_emissions['StateYear']= co2_emissions['State'] + co2_emissions['Year'].map(str)

co2_emissions = co2_emissions.set_index(['StateYear'])

for x,row in us_state_pop.iterrows():
    election_state_year = us_state_pop['State'] + us_state_pop['Year'].map(str)
    us_state_pop['Residential co2 emissions, million metric tons'] = co2_emissions.loc[election_state_year, 'Reading'].values

us_state_pop

StateYear,State,Year,Population,Alignment,Outcome,"Total domestic water use per capita, gallons/person/day","Residential co2 emissions, million metric tons"
Alabama1970,Alabama,1970,3444354.0,,,,
Alabama1971,Alabama,1971,3497000.0,,,,
Alabama1972,Alabama,1972,3540000.0,,,,
Alabama1973,Alabama,1973,3581000.0,,,,
Alabama1974,Alabama,1974,3628000.0,,,,
Alabama1975,Alabama,1975,3680000.0,,,,
Alabama1976,Alabama,1976,3737000.0,,,,
Alabama1977,Alabama,1977,3783000.0,,,,
Alabama1978,Alabama,1978,3834000.0,,,,
Alabama1979,Alabama,1979,3869000.0,,,,


In [16]:
us_state_pop['Residential co2 emissions, metric tons/person'] = us_state_pop['Residential co2 emissions, million metric tons']*1000000/us_state_pop['Population']
us_state_pop.head()

StateYear,State,Year,Population,Alignment,Outcome,"Total domestic water use per capita, gallons/person/day","Residential co2 emissions, million metric tons","Residential co2 emissions, metric tons/person"
Alabama1970,Alabama,1970,3444354.0,,,,,
Alabama1971,Alabama,1971,3497000.0,,,,,
Alabama1972,Alabama,1972,3540000.0,,,,,
Alabama1973,Alabama,1973,3581000.0,,,,,
Alabama1974,Alabama,1974,3628000.0,,,,,
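
To make the unit conversion concrete, here is the per capita calculation for a single row, using the Alabama 1980 reading (3.6 million metric tons) and population (3,893,888) from the tables above. `per_capita` is an illustrative helper, not a function from the notebook.

```python
def per_capita(total, population, scale=1.0):
    """Convert a total to a per-person value; `scale` converts the total's unit first."""
    return total * scale / population

# 3.6 million metric tons -> metric tons (x 1e6), then divide by the population.
co2_per_person = per_capita(3.6, 3893888.0, scale=1e6)
print(co2_per_person)
```

The result is a little under one metric ton of residential CO2 per person for that year.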


In [17]:
consumption_btu_csv_path = "./data/consumption_btu_actual_done.csv"
consumption_btu = pd.read_csv(consumption_btu_csv_path)
consumption_btu



Unnamed: 0.1,Unnamed: 0,State,MSN,Year,Reading
0,0,Alaska,ABICB,1970,0.00
1,1,Alaska,ARICB,1970,1817.00
2,2,Alaska,ARTCB,1970,1817.00
3,3,Alaska,ARTXB,1970,1817.00
4,4,Alaska,AVACB,1970,2331.00
5,5,Alaska,AVTCB,1970,2331.00
6,6,Alaska,AVTXB,1970,2331.00
7,7,Alaska,BMTCB,1970,5029.00
8,8,Alaska,CLACB,1970,14.00
9,9,Alaska,CLCCB,1970,183.00


In [18]:
value_list_btu_RETCB = ['RETCB']

consumption_btu_RETCB = consumption_btu[consumption_btu.MSN.isin(value_list_btu_RETCB)]
consumption_btu_RETCB.head()

Unnamed: 0.1,Unnamed: 0,State,MSN,Year,Reading
132,132,Alaska,RETCB,1970,8835.0
323,323,Alabama,RETCB,1970,132471.0
514,514,Arkansas,RETCB,1970,56933.0
705,705,Arizona,RETCB,1970,68919.0
896,896,California,RETCB,1970,521978.0


In [19]:
for row in consumption_btu_RETCB:
    consumption_btu_RETCB['StateYear']= consumption_btu_RETCB['State'] + consumption_btu_RETCB['Year'].map(str)

consumption_btu_RETCB = consumption_btu_RETCB.set_index(['StateYear'])

for x,row in us_state_pop.iterrows():
    election_state_year = us_state_pop['State'] + us_state_pop['Year'].map(str)
    us_state_pop['Renewable energy total consumption, billion btu'] = consumption_btu_RETCB.loc[election_state_year, 'Reading'].values

us_state_pop.head()

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


StateYear,State,Year,Population,Alignment,Outcome,"Total domestic water use per capita, gallons/person/day","Residential co2 emissions, million metric tons","Residential co2 emissions, metric tons/person","Renewable energy total consumption, billion btu"
Alabama1970,Alabama,1970,3444354.0,,,,,,132471.0
Alabama1971,Alabama,1971,3497000.0,,,,,,158182.0
Alabama1972,Alabama,1972,3540000.0,,,,,,164946.0
Alabama1973,Alabama,1973,3581000.0,,,,,,181706.0
Alabama1974,Alabama,1974,3628000.0,,,,,,166734.0
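
The warning printed for this cell appears because a new column is being added to a dataframe that was created by filtering another one, so pandas cannot tell whether the original should change too. Taking an explicit copy of the filtered rows is one common way to make the intent clear. The sketch below uses three rows from the consumption table above; `consumption` and `retcb` are illustrative names.

```python
import pandas as pd

consumption = pd.DataFrame({
    "State": ["Alaska", "Alaska", "Alabama"],
    "MSN": ["ABICB", "RETCB", "RETCB"],
    "Year": [1970, 1970, 1970],
    "Reading": [0.0, 8835.0, 132471.0],
})

# .copy() makes the filtered rows an independent dataframe,
# so adding a column to it is unambiguous.
retcb = consumption[consumption["MSN"] == "RETCB"].copy()
retcb["StateYear"] = retcb["State"] + retcb["Year"].astype(str)

print(retcb)
```

The original `consumption` dataframe is left untouched, and the new column exists only on the copy.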


In [20]:
us_state_pop['Renewable energy total consumption, billion btu/person'] = us_state_pop['Renewable energy total consumption, billion btu']/us_state_pop['Population']
us_state_pop

StateYear,State,Year,Population,Alignment,Outcome,"Total domestic water use per capita, gallons/person/day","Residential co2 emissions, million metric tons","Residential co2 emissions, metric tons/person","Renewable energy total consumption, billion btu","Renewable energy total consumption, billion btu/person"
Alabama1970,Alabama,1970,3444354.0,,,,,,132471.0,0.03846
Alabama1971,Alabama,1971,3497000.0,,,,,,158182.0,0.045234
Alabama1972,Alabama,1972,3540000.0,,,,,,164946.0,0.046595
Alabama1973,Alabama,1973,3581000.0,,,,,,181706.0,0.050742
Alabama1974,Alabama,1974,3628000.0,,,,,,166734.0,0.045958
Alabama1975,Alabama,1975,3680000.0,,,,,,184701.0,0.05019
Alabama1976,Alabama,1976,3737000.0,,,,,,160994.0,0.043081
Alabama1977,Alabama,1977,3783000.0,,,,,,174751.0,0.046194
Alabama1978,Alabama,1978,3834000.0,,,,,,148341.0,0.038691
Alabama1979,Alabama,1979,3869000.0,,,,,,190737.0,0.049299


In [21]:
value_list_btu_TERPB = ['TERPB']

consumption_btu_TERPB = consumption_btu[consumption_btu.MSN.isin(value_list_btu_TERPB)]
consumption_btu_TERPB.head()

Unnamed: 0.1,Unnamed: 0,State,MSN,Year,Reading
160,160,Alaska,TERPB,1970,82.0
351,351,Alabama,TERPB,1970,64.0
542,542,Arkansas,TERPB,1970,75.0
733,733,Arizona,TERPB,1970,50.0
924,924,California,TERPB,1970,52.0


In [22]:
for row in consumption_btu_TERPB:
    consumption_btu_TERPB['StateYear']= consumption_btu_TERPB['State'] + consumption_btu_TERPB['Year'].map(str)

consumption_btu_TERPB = consumption_btu_TERPB.set_index(['StateYear'])

for x,row in us_state_pop.iterrows():
    election_state_year = us_state_pop['State'] + us_state_pop['Year'].map(str)
    us_state_pop['Total energy consumption per capita in the residential sector, Million Btu/person'] = consumption_btu_TERPB.loc[election_state_year, 'Reading'].values

us_state_pop.head()



StateYear,State,Year,Population,Alignment,Outcome,"Total domestic water use per capita, gallons/person/day","Residential co2 emissions, million metric tons","Residential co2 emissions, metric tons/person","Renewable energy total consumption, billion btu","Renewable energy total consumption, billion btu/person","Total energy consumption per capita in the residential sector, Million Btu/person"
Alabama1970,Alabama,1970,3444354.0,,,,,,132471.0,0.03846,64.0
Alabama1971,Alabama,1971,3497000.0,,,,,,158182.0,0.045234,65.0
Alabama1972,Alabama,1972,3540000.0,,,,,,164946.0,0.046595,68.0
Alabama1973,Alabama,1973,3581000.0,,,,,,181706.0,0.050742,71.0
Alabama1974,Alabama,1974,3628000.0,,,,,,166734.0,0.045958,70.0


In [23]:
consumption_phy_csv_path = "./data/consumption_phy_done.csv"
consumption_phy = pd.read_csv(consumption_phy_csv_path)
consumption_phy.head()


Unnamed: 0.1,Unnamed: 0,State,MSN,Year,Reading
0,0,Alaska,ABICP,1970,
1,1,Alaska,ARICP,1970,274.0
2,2,Alaska,ARTCP,1970,274.0
3,3,Alaska,ARTXP,1970,274.0
4,4,Alaska,AVACP,1970,462.0


In [24]:
# PARCP is the all petroleum products consumed by the residential sector
value_list_PARCP = ['PARCP']

consumption_phy_PARCP = consumption_phy[consumption_phy.MSN.isin(value_list_PARCP)]
consumption_phy_PARCP

Unnamed: 0.1,Unnamed: 0,State,MSN,Year,Reading
97,97,Alaska,PARCP,1970,1432.0
227,227,Alabama,PARCP,1970,4456.0
357,357,Arkansas,PARCP,1970,6491.0
487,487,Arizona,PARCP,1970,915.0
617,617,California,PARCP,1970,5182.0
747,747,Colorado,PARCP,1970,3353.0
877,877,Connecticut,PARCP,1970,15388.0
1007,1007,Delaware,PARCP,1970,2755.0
1137,1137,Florida,PARCP,1970,6306.0
1267,1267,Georgia,PARCP,1970,4085.0


In [25]:
for row in consumption_phy_PARCP:
    consumption_phy_PARCP['StateYear']= consumption_phy_PARCP['State'] + consumption_phy_PARCP['Year'].map(str)

consumption_phy_PARCP = consumption_phy_PARCP.set_index(['StateYear'])

for x,row in us_state_pop.iterrows():
    election_state_year = us_state_pop['State'] + us_state_pop['Year'].map(str)
    us_state_pop['All petroleum products consumed by the residential sector, thousand barrels'] = consumption_phy_PARCP.loc[election_state_year, 'Reading'].values

us_state_pop.head()



StateYear,State,Year,Population,Alignment,Outcome,"Total domestic water use per capita, gallons/person/day","Residential co2 emissions, million metric tons","Residential co2 emissions, metric tons/person","Renewable energy total consumption, billion btu","Renewable energy total consumption, billion btu/person","Total energy consumption per capita in the residential sector, Million Btu/person","All petroleum products consumed by the residential sector, thousand barrels"
Alabama1970,Alabama,1970,3444354.0,,,,,,132471.0,0.03846,64.0,4456.0
Alabama1971,Alabama,1971,3497000.0,,,,,,158182.0,0.045234,65.0,4674.0
Alabama1972,Alabama,1972,3540000.0,,,,,,164946.0,0.046595,68.0,5080.0
Alabama1973,Alabama,1973,3581000.0,,,,,,181706.0,0.050742,71.0,4798.0
Alabama1974,Alabama,1974,3628000.0,,,,,,166734.0,0.045958,70.0,3850.0


In [26]:
us_state_pop['All petroleum products consumed by the residential sector, thousand barrels/person'] = us_state_pop['All petroleum products consumed by the residential sector, thousand barrels']/us_state_pop['Population']
us_state_pop

StateYear,State,Year,Population,Alignment,Outcome,"Total domestic water use per capita, gallons/person/day","Residential co2 emissions, million metric tons","Residential co2 emissions, metric tons/person","Renewable energy total consumption, billion btu","Renewable energy total consumption, billion btu/person","Total energy consumption per capita in the residential sector, Million Btu/person","All petroleum products consumed by the residential sector, thousand barrels","All petroleum products consumed by the residential sector, thousand barrels/person"
Alabama1970,Alabama,1970,3444354.0,,,,,,132471.0,0.03846,64.0,4456.0,0.001294
Alabama1971,Alabama,1971,3497000.0,,,,,,158182.0,0.045234,65.0,4674.0,0.001337
Alabama1972,Alabama,1972,3540000.0,,,,,,164946.0,0.046595,68.0,5080.0,0.001435
Alabama1973,Alabama,1973,3581000.0,,,,,,181706.0,0.050742,71.0,4798.0,0.00134
Alabama1974,Alabama,1974,3628000.0,,,,,,166734.0,0.045958,70.0,3850.0,0.001061
Alabama1975,Alabama,1975,3680000.0,,,,,,184701.0,0.05019,64.0,3539.0,0.000962
Alabama1976,Alabama,1976,3737000.0,,,,,,160994.0,0.043081,67.0,4039.0,0.001081
Alabama1977,Alabama,1977,3783000.0,,,,,,174751.0,0.046194,70.0,4320.0,0.001142
Alabama1978,Alabama,1978,3834000.0,,,,,,148341.0,0.038691,70.0,3594.0,0.000937
Alabama1979,Alabama,1979,3869000.0,,,,,,190737.0,0.049299,64.0,2247.0,0.000581


In [90]:
# to_csv() writes the file and returns None when given a path, so there is nothing to assign.
us_state_pop.to_csv('./data/us_state_pop_for_analysis_results_wbk.csv')