# Format Incentive and Policy Data

Now that the core dataset is finished, I need to start incorporating data on key factors. Right now that means DSIRE (incentives and policies) and URDB (utility rates), but I will likely include more if the initial model proves viable. This notebook focuses on the DSIRE dataset. I still need to research exactly which features I can extract from this data, but I want to get the base infrastructure in place first. Fortunately, the functionality is practically identical to the work I did for EnergyHawk: given a zip code (ZCTA), which programs (incentives and policies) are offered or enforced?

This notebook covers the initial functionality: finding all programs for a given location, sector, and technology. After I complete some more research I will proceed with gathering features from this data. The DSIRE data is stored in data/dsire/ and is divided across many CSV files. Refer to `dsire-files.xlsx` in references/ for an overview of each file and its variables. The most recent version of the DSIRE archive, along with the data definitions, can be downloaded from the DSIRE website [here](http://www.dsireusa.org/resources/database-archives/). I am using the 2022-06 version.

In [1]:
# Import pandas
import pandas as pd

The main file is `program.csv`.

In [2]:
program = pd.read_csv('../data/dsire/program.csv')

In [3]:
program.head()

Unnamed: 0,id,state_id,is_entire_state,implementing_sector_id,program_category_id,program_type_id,created_by_user_id,code,name,updated_ts,...,administrator,fundingsource,budget,start_date,start_date_text,end_date,end_date_text,summary,additional_technologies,fromSir
0,1,39,1,1,2,44,534,ND02R,North Dakota Solar/Wind Easements and Laws,2021-08-02 13:47:11,...,,,,,,,,<p><b>Solar Policy</b></p><p>&#10;&#9;North Da...,,0
1,2,4,1,1,1,32,534,AZ10F,Qualifying Wood Stove Deduction,2021-06-26 02:18:36,...,,,,1994-01-01 05:00:00,1/1/1994,,,<div>&#10;&#9;This incentive allows Arizona ta...,,0
2,3,24,1,1,2,44,534,MD01R,Maryland Solar Easements & Rights Laws,2021-07-20 00:00:00,...,,,,,,,,<p>\r\n\tMaryland has a long-standing law prot...,,0
3,4,42,0,1,2,44,534,OR02R,Oregon Solar and Wind Easements/Rights Laws & ...,2021-07-27 20:09:43,...,,,,,,,,<p><span>Oregon has several laws that protect ...,,0
4,6,2,0,1,2,44,534,AK01R,Alaska Solar Easements,2021-07-19 19:54:03,...,,,,,,,,<p>\r\n\tAlaska&#39;s solar easement provision...,,0


Since my analysis only looks at PV systems installed between 2010 and 2019, I will need to filter the programs accordingly. For now, however, I want to collect all programs for a given ZCTA (zip code).
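The date filter is deferred for now, but a rough sketch of what it could look like is below. It runs on a made-up stand-in for `program` and assumes that a missing `start_date` means the program was already running and a missing `end_date` means it is still running. That assumption, the toy rows, and the helper name `active_between` are mine and are not taken from the DSIRE documentation.

```python
import pandas as pd

# Toy stand-in for `program`; the rows are made up for illustration.
toy_program = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'start_date': ['1994-01-01 05:00:00', '2012-06-01 00:00:00', None, '2020-03-01 00:00:00'],
    'end_date': [None, '2015-12-31 00:00:00', None, None],
})


def active_between(programs, first_year=2010, last_year=2019):
    """Keep programs whose active period overlaps [first_year, last_year].

    Assumption: a missing start date means "already running" and a
    missing end date means "still running".
    """
    start = pd.to_datetime(programs['start_date'], errors='coerce')
    end = pd.to_datetime(programs['end_date'], errors='coerce')
    window_start = pd.Timestamp(year=first_year, month=1, day=1)
    window_end = pd.Timestamp(year=last_year, month=12, day=31)

    started_in_time = start.isna() | (start <= window_end)
    not_ended_early = end.isna() | (end >= window_start)
    return programs[started_in_time & not_ended_early]


active = active_between(toy_program)
```

Only the program that starts in 2020 is dropped here. Whether missing dates should really be treated as open-ended is something to confirm against the DSIRE data definitions.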

According to the DSIRE documentation, all programs are associated with a zipcode. Programs can be implemented at various geographic levels, from federal (most general) to zipcode (most specific). Given a zipcode, I can collect the associated state, county, utility, and city ids and all programs available from those. Each geographic level has its own file and a matching table that indicates which programs are offered in each geography. For example, there is a `city` table, which includes all cities in the US, along with a `program_city` file, which lists all programs offered at the city level and the city or cities they are offered in. The same goes for all geographic levels. Below I read in the zipcode file and all geographic matching tables.

In [4]:
zipcode = pd.read_csv('../data/dsire/zipcode.csv')
program_city = pd.read_csv('../data/dsire/program_city.csv')
program_county = pd.read_csv('../data/dsire/program_county.csv')
program_utility = pd.read_csv('../data/dsire/program_utility.csv')
program_zipcode = pd.read_csv('../data/dsire/program_zipcode.csv')
utility_zipcode = pd.read_csv('../data/dsire/utility_zipcode.csv')

In [5]:
zipcode.head()

Unnamed: 0,id,zipcode,city_id,state_id,county_id,latitude,longitude
0,1,501,1,37,1,41,-73
1,2,544,1,37,1,41,-73
2,3,601,2,45,2,18,-67
3,4,602,3,45,3,18,-67
4,5,603,4,45,4,18,-67


Using the `zipcode` dataframe, I can collect the city, state, and county id for a given zipcode. I can also use the `utility_zipcode` table to select all utilities operating within that zipcode. Then, using the matching tables, I can find the programs offered in that zipcode at each geographic level (or implementing sector, as DSIRE calls it). Instead of filtering one dataframe at a time, I want to join all of the data together and then collect all the program ids.

In [6]:
zipcode[zipcode['zipcode'] == 60046].merge(utility_zipcode, left_on='id', right_on='zipcode_id', how='left').merge(program_utility, on='utility_id', how='left').\
    merge(program_county, on='county_id', how='left').merge(program_city, on='city_id', how='left').merge(program_zipcode, left_on='id', right_on='zipcode_id', how='left')


Unnamed: 0,id,zipcode,city_id,state_id,county_id,latitude,longitude,utility_id,zipcode_id_x,program_id_x,program_id_y,program_id_x.1,program_id_y.1,zipcode_id_y
0,25826,60046,19023,17,1913,42,-88,563,25826,585,,,,
1,25826,60046,19023,17,1913,42,-88,563,25826,1187,,,,
2,25826,60046,19023,17,1913,42,-88,563,25826,2331,,,,
3,25826,60046,19023,17,1913,42,-88,563,25826,2819,,,,
4,25826,60046,19023,17,1913,42,-88,563,25826,3061,,,,
5,25826,60046,19023,17,1913,42,-88,563,25826,3074,,,,
6,25826,60046,19023,17,1913,42,-88,563,25826,3076,,,,
7,25826,60046,19023,17,1913,42,-88,563,25826,3166,,,,
8,25826,60046,19023,17,1913,42,-88,563,25826,3716,,,,
9,25826,60046,19023,17,1913,42,-88,563,25826,4150,,,,


In [7]:
program_zipcode.head()

Unnamed: 0,program_id,zipcode_id
0,1946,23989
1,4127,7837
2,4127,7840
3,4127,7841
4,4127,7842


This merging method is much simpler than what I used previously, which was to filter each dataframe one by one throughout the entire process. I want to rename some of these columns so the result doesn't show `program_id_x`, `program_id_y`, etc.
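To see where the `_x` / `_y` names come from, here is a small sketch on made-up matching tables (the `toy_*` frames and their ids are hypothetical). When both sides of a merge carry a non-key column with the same name, pandas appends its default suffixes; renaming each table before merging keeps the geographic level in the column name instead.

```python
import pandas as pd

# Toy matching tables (made-up ids), both exposing a `program_id` column.
toy_zip = pd.DataFrame({'zipcode_id': [1], 'city_id': [10], 'county_id': [100]})
toy_program_city = pd.DataFrame({'program_id': [7, 8], 'city_id': [10, 10]})
toy_program_county = pd.DataFrame({'program_id': [9], 'county_id': [100]})

# Without renaming, the second merge finds `program_id` on both sides and
# disambiguates the clash with the default _x / _y suffixes.
clashing = (
    toy_zip
    .merge(toy_program_city, on='city_id', how='left')
    .merge(toy_program_county, on='county_id', how='left')
)

# Renaming first keeps the geographic level visible in the column name.
named = (
    toy_zip
    .merge(toy_program_city.rename(columns={'program_id': 'program_id_city'}),
           on='city_id', how='left')
    .merge(toy_program_county.rename(columns={'program_id': 'program_id_county'}),
           on='county_id', how='left')
)

print(list(clashing.columns))
print(list(named.columns))
```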

In [8]:
zipcode = zipcode.rename(columns={'id': 'zipcode_id'})
program_city = program_city.rename(columns={'program_id': 'program_id_city'})
program_county = program_county.rename(columns={'program_id': 'program_id_county'})
program_utility = program_utility.rename(columns={'program_id': 'program_id_utility'})
program_zipcode = program_zipcode.rename(columns={'program_id': 'program_id_zipcode'})

Again, starting with the `zipcode` table, I perform an initial filter for a single zipcode, then merge each matching table on its corresponding id in `zipcode`.

In [9]:
zipcode[zipcode['zipcode'] == 60046].merge(utility_zipcode, on='zipcode_id', how='left').merge(program_utility, on='utility_id', how='left').\
    merge(program_zipcode, on='zipcode_id', how='left').merge(program_city, on='city_id', how='left').merge(program_county, on='county_id', how='left')

Unnamed: 0,zipcode_id,zipcode,city_id,state_id,county_id,latitude,longitude,utility_id,program_id_utility,program_id_zipcode,program_id_city,program_id_county
0,25826,60046,19023,17,1913,42,-88,563,585,,,
1,25826,60046,19023,17,1913,42,-88,563,1187,,,
2,25826,60046,19023,17,1913,42,-88,563,2331,,,
3,25826,60046,19023,17,1913,42,-88,563,2819,,,
4,25826,60046,19023,17,1913,42,-88,563,3061,,,
5,25826,60046,19023,17,1913,42,-88,563,3074,,,
6,25826,60046,19023,17,1913,42,-88,563,3076,,,
7,25826,60046,19023,17,1913,42,-88,563,3166,,,
8,25826,60046,19023,17,1913,42,-88,563,3716,,,
9,25826,60046,19023,17,1913,42,-88,563,4150,,,


This gives all programs offered at the zipcode, city, county, or utility level. To get the programs at the state and federal level, I just need to use the main `program` table. For example, the state id for the zipcode above is 17 (Illinois), so I can filter the `program` table for `state_id` equal to 17 and `is_entire_state` equal to 1 (indicating the program is available to the entire state, i.e. implemented at the state level).

In [10]:
program[(program['state_id'] == 17) & (program['is_entire_state'] == 1)]

Unnamed: 0,id,state_id,is_entire_state,implementing_sector_id,program_category_id,program_type_id,created_by_user_id,code,name,updated_ts,...,administrator,fundingsource,budget,start_date,start_date_text,end_date,end_date_text,summary,additional_technologies,fromSir
129,138,17,1,1,1,78,534,IL01F,Special Assessment for Solar Energy Systems,2021-03-12 19:34:30,...,Illinois Department of Commerce and Economic O...,,,,,,,<p>&#10;&#9;Illinois offers a special assessme...,,0
191,201,17,1,1,1,87,534,IL02F,Alternative Energy Bond Fund Program,2003-02-26 00:00:00,...,Illinois Department of Commerce and Economic O...,,,,,,,This grant program funds capital projects of a...,,0
344,355,17,1,7,1,87,534,IL06F,Illinois Clean Energy Community Foundation Grants,2015-12-17 00:00:00,...,Illinois Clean Energy Community Foundation,,,1999-06-30 04:00:00,06/30/1999,,,<p><b><i>Note: For the Renewable Energy Januar...,,0
393,418,17,1,2,1,40,534,IL03F,Chicago - Industry Recruitment of Chicago Spir...,2021-07-13 16:01:48,...,,,,,,,,<div><b>Chicago Spire Solar has closed.</b></d...,,0
406,439,17,1,1,1,88,534,IL07F,Vehicle Conversion Rebate,2004-06-08 00:00:00,...,Illinois EPA,,,1995-01-26 21:59:30,1995,,No expiration date,Illinois' Alternate Fuels Rebate Program provi...,,0
440,486,17,1,1,2,46,534,IL01R,Renewable Energy Resources Trust Fund,2021-07-27 13:43:40,...,,,,,,,,"<p><b>According to <a href=""https://www.ilga.g...",,0
441,487,17,1,1,2,25,534,IL02R,Fuel Mix and Emissions Disclosure,2015-07-09 18:52:27,...,,,,,,,,<p>&#10;&#9;As part of the state's 1997 electr...,,0
510,583,17,1,1,1,87,534,IL04F,DCEO - Solar Energy Incentive Program,2009-08-31 00:00:00,...,Illinois Department of Commerce and Economic O...,Illinois Renewable Energy Resources Trust Fund,,,,,05/01/2009 (current solicitation),<b><i>Note: This program is currently closed d...,,0
511,584,17,1,1,2,38,534,IL04R,Renewable Portfolio Standard,2018-06-28 19:44:06,...,,,,2008-01-01 05:00:00,,,,"<p><em><strong>Note: In December 2016, Illinoi...","Landfill Gas, Anaerobic Digestion, Biodiesel",0
702,861,17,1,1,1,88,534,IL08F,OEM Vehicle Rebate,2004-06-08 00:00:00,...,Illinois EPA,,,1997-01-01 00:00:00,1/1/97,,,The Alternate Fuels Rebate Program provides re...,,0


Federal programs can be found using the `implementing_sector` table. All federal programs are considered state programs as well, meaning `is_entire_state` will be equal to 1 for federal programs. Performing the filter above returns all federal programs and state programs for the given state code. 

This seems like a very good opportunity to use OOP to organize this data. First I would like to finish the initial filtering. I want a function that returns all programs for a given zipcode (ZCTA).

Returning to the merge, I just want the unique program ids from the last four columns. Once I isolate the columns I can "flatten" the values into a 1-d numpy array.

In [11]:
test_zipcode_ids = zipcode[zipcode['zipcode'] == 60046].merge(utility_zipcode, on='zipcode_id', how='left').merge(program_utility, on='utility_id', how='left').\
    merge(program_zipcode, on='zipcode_id', how='left').merge(program_city, on='city_id', how='left').merge(program_county, on='county_id', how='left').iloc[:, 8:].values.flatten()

test_zipcode_ids

array([ 585.,   nan,   nan,   nan, 1187.,   nan,   nan,   nan, 2331.,
         nan,   nan,   nan, 2819.,   nan,   nan,   nan, 3061.,   nan,
         nan,   nan, 3074.,   nan,   nan,   nan, 3076.,   nan,   nan,
         nan, 3166.,   nan,   nan,   nan, 3716.,   nan,   nan,   nan,
       4150.,   nan,   nan,   nan, 4454.,   nan,   nan,   nan, 5047.,
         nan,   nan,   nan, 5147.,   nan,   nan,   nan, 5152.,   nan,
         nan,   nan, 5173.,   nan,   nan,   nan, 5317.,   nan,   nan,
         nan, 5400.,   nan,   nan,   nan, 5506.,   nan,   nan,   nan,
       5572.,   nan,   nan,   nan])
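The flattened array still contains `nan` entries and repeated ids, and the ids are floats. `isin` tolerates this, but if a clean list of ids is wanted, one possible cleanup is sketched below on a made-up stand-in for the four program id columns (the `toy_ids` values are illustrative only).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the four merged program id columns (made-up values).
toy_ids = pd.DataFrame({
    'program_id_utility': [585.0, 1187.0, 585.0],
    'program_id_zipcode': [np.nan, np.nan, np.nan],
    'program_id_city': [np.nan, 42.0, np.nan],
    'program_id_county': [np.nan, np.nan, np.nan],
})

# Flatten to 1-d, drop the NaN placeholders, then deduplicate and cast to int.
flat = toy_ids.to_numpy().ravel()
unique_ids = np.unique(flat[~np.isnan(flat)]).astype(int)

print(unique_ids)
```

`np.unique` also sorts the ids, which makes the result easier to compare across ZCTAs.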

Then I can filter the `program` table for these ids

In [12]:
program[program['id'].isin(test_zipcode_ids)]

Unnamed: 0,id,state_id,is_entire_state,implementing_sector_id,program_category_id,program_type_id,created_by_user_id,code,name,updated_ts,...,administrator,fundingsource,budget,start_date,start_date_text,end_date,end_date_text,summary,additional_technologies,fromSir
512,585,17,0,1,1,88,534,IL05F,Solar and Wind Energy Rebate Program,2017-03-15 14:47:55,...,Illinois Department of Commerce and Economic O...,Illinois Renewable Energy Resources Trust Fund,$2.5 million,1997-12-16 05:00:00,12/16/1997,,10/10/2014 (current applications),<p>&#10;&#9;<span>The State of Illinois Renewa...,,0
901,1187,17,0,1,1,87,534,IL14F,Efficient Housing Construction Grant,2016-02-11 19:34:53,...,Illinois Department of Commerce and Economic O...,Energy Efficiency Trust Fund and Energy Effici...,,2006-05-19 04:00:00,05/19/2006,,,<p><b><i>Note: The Illinois DCEO programs are ...,,0
1789,2331,17,0,1,1,88,492,IL19F,Large-Customer Energy Analysis Program (LEAP),2012-07-11 00:00:00,...,Illinois Department of Commerce and Economic O...,Illinois Energy Efficiency Portfolio Standard ...,,,,,,<p>The Large-Customer Energy Analysis Program ...,,0
2174,2819,17,0,1,1,87,534,IL26F,Biogas and Biomass to Energy Grant Program,2016-01-05 15:23:45,...,Illinois Department of Commerce and Economic O...,Renewable Energy Resources Trust Fund,,1997-12-16 05:00:00,12/16/1997,,,<p>&#10;&#9;<em><strong>Note: This program is ...,"Biogas, (methane produced by livestock manure...",0
2398,3061,17,0,1,1,40,534,IL27F,Renewable Energy Business Development Grant Pr...,2021-07-14 19:34:22,...,,,,,,,10/28/2011 (current solicitation),<p>\r\n\t<strong><em>NOTE: The most recent app...,,0
2407,3074,17,0,3,1,88,534,IL28F,ComEd -Energy Efficiency Program For Businesses,2022-05-25 22:48:14,...,,Illinois Energy Efficiency Portfolio Standard ...,,,,,,<p><span>Commonwealth Edison (ComEd) offers it...,,0
2408,3076,17,0,1,1,87,534,IL30F,Public Sector New Construction and Retrofit Pr...,2016-02-11 19:23:51,...,DCEO - Smart Energy Design Assistance Center,Illinois Energy Efficiency Portfolio Standard ...,,2007-08-27 04:00:00,08/27/2007,,,<p><b><i>The Illinois Energy Now programs are ...,,0
2489,3166,17,0,3,1,88,534,IL32F,ComEd - Energy Efficiency Program for Residential,2022-05-27 14:24:47,...,,Illinois Energy Efficiency Portfolio Standard ...,,,,,,<p>Commonwealth Edison (ComEd) offers resident...,,0
2998,3716,17,0,3,1,88,534,IL51F,ComEd - Energy Efficiency Program for Commerci...,2019-07-08 16:47:57,...,ComEd,ComEd and Nicor Gas customers in compliance wi...,,,,,,<p>&#10;&#9;The New Construction Service Team ...,,0
3377,4150,17,0,1,1,88,534,IL62F,Public Sector Energy Efficiency Programs,2015-03-26 00:00:00,...,Illinois Department of Commerce and Economic O...,Illinois Energy Efficiency Portfolio Standard ...,,2008-06-01 04:00:00,06/01/2008,,,<p>&#10;&#9;The Illinois Department of Commerc...,,0


Combining everything, I want all programs with an id in `test_zipcode_ids` OR with the corresponding state id AND `is_entire_state` equal to 1.

In [13]:
program[(program['id'].isin(test_zipcode_ids)) | ((program['state_id'] == 17) & (program['is_entire_state'] == 1))]

Unnamed: 0,id,state_id,is_entire_state,implementing_sector_id,program_category_id,program_type_id,created_by_user_id,code,name,updated_ts,...,administrator,fundingsource,budget,start_date,start_date_text,end_date,end_date_text,summary,additional_technologies,fromSir
129,138,17,1,1,1,78,534,IL01F,Special Assessment for Solar Energy Systems,2021-03-12 19:34:30,...,Illinois Department of Commerce and Economic O...,,,,,,,<p>&#10;&#9;Illinois offers a special assessme...,,0
191,201,17,1,1,1,87,534,IL02F,Alternative Energy Bond Fund Program,2003-02-26 00:00:00,...,Illinois Department of Commerce and Economic O...,,,,,,,This grant program funds capital projects of a...,,0
344,355,17,1,7,1,87,534,IL06F,Illinois Clean Energy Community Foundation Grants,2015-12-17 00:00:00,...,Illinois Clean Energy Community Foundation,,,1999-06-30 04:00:00,06/30/1999,,,<p><b><i>Note: For the Renewable Energy Januar...,,0
393,418,17,1,2,1,40,534,IL03F,Chicago - Industry Recruitment of Chicago Spir...,2021-07-13 16:01:48,...,,,,,,,,<div><b>Chicago Spire Solar has closed.</b></d...,,0
406,439,17,1,1,1,88,534,IL07F,Vehicle Conversion Rebate,2004-06-08 00:00:00,...,Illinois EPA,,,1995-01-26 21:59:30,1995,,No expiration date,Illinois' Alternate Fuels Rebate Program provi...,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5023,22170,17,1,1,1,87,553,IL100F,Driving a Cleaner Illinois Program,2021-08-12 17:02:05,...,,,,,,,,<p>Driving a Cleaner Illinois was created by t...,,0
5027,22174,17,1,1,1,68,553,IL100F,Electric Vehicle Fleet Fee Exemption,2021-08-12 16:33:07,...,,,,,,,,<p>The State of Illinois charges a $20 per veh...,,0
5028,22175,17,1,1,1,88,553,IL100F,Electric Bus School District Reimbursement Pro...,2021-06-03 20:24:04,...,,,,,,,,<p>The State of Illinois gives transportation ...,,0
5086,22233,17,1,3,1,88,538,IL100F,ComEd - Distributed Generation Rebates,2021-06-18 17:58:20,...,Commonwealth Edison,,,,,,,,,0


Below I put this functionality into a function. 

**NOTE**: DSIRE uses zipcodes, but my analysis uses the Census ZCTAs. I will use the ZCTA terminology hereafter.

In [14]:
def get_programs_by_zcta(zcta):

    # isolate row
    filtered = zipcode[zipcode['zipcode'] == zcta]

    # Isolate state_id
    state_id = int(filtered['state_id'].iloc[0])

    # Merge matching tables and collect programs id
    program_ids = filtered.merge(utility_zipcode, on='zipcode_id', how='left').merge(program_utility, on='utility_id', how='left').\
    merge(program_zipcode, on='zipcode_id', how='left').merge(program_city, on='city_id', how='left').merge(program_county, on='county_id', how='left').iloc[:, 8:].values.flatten()

    # Filter all programs with id in programs id or belonging to state with state_id
    programs = program[(program['id'].isin(program_ids)) | ((program['state_id'] == state_id) & (program['is_entire_state'] == 1))]

    # return dataframe of programs
    return programs

Now I can test this function with the ZCTA I used above, 60046. It should return 72 programs.

In [15]:
get_programs_by_zcta(60046)

Unnamed: 0,id,state_id,is_entire_state,implementing_sector_id,program_category_id,program_type_id,created_by_user_id,code,name,updated_ts,...,administrator,fundingsource,budget,start_date,start_date_text,end_date,end_date_text,summary,additional_technologies,fromSir
129,138,17,1,1,1,78,534,IL01F,Special Assessment for Solar Energy Systems,2021-03-12 19:34:30,...,Illinois Department of Commerce and Economic O...,,,,,,,<p>&#10;&#9;Illinois offers a special assessme...,,0
191,201,17,1,1,1,87,534,IL02F,Alternative Energy Bond Fund Program,2003-02-26 00:00:00,...,Illinois Department of Commerce and Economic O...,,,,,,,This grant program funds capital projects of a...,,0
344,355,17,1,7,1,87,534,IL06F,Illinois Clean Energy Community Foundation Grants,2015-12-17 00:00:00,...,Illinois Clean Energy Community Foundation,,,1999-06-30 04:00:00,06/30/1999,,,<p><b><i>Note: For the Renewable Energy Januar...,,0
393,418,17,1,2,1,40,534,IL03F,Chicago - Industry Recruitment of Chicago Spir...,2021-07-13 16:01:48,...,,,,,,,,<div><b>Chicago Spire Solar has closed.</b></d...,,0
406,439,17,1,1,1,88,534,IL07F,Vehicle Conversion Rebate,2004-06-08 00:00:00,...,Illinois EPA,,,1995-01-26 21:59:30,1995,,No expiration date,Illinois' Alternate Fuels Rebate Program provi...,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5023,22170,17,1,1,1,87,553,IL100F,Driving a Cleaner Illinois Program,2021-08-12 17:02:05,...,,,,,,,,<p>Driving a Cleaner Illinois was created by t...,,0
5027,22174,17,1,1,1,68,553,IL100F,Electric Vehicle Fleet Fee Exemption,2021-08-12 16:33:07,...,,,,,,,,<p>The State of Illinois charges a $20 per veh...,,0
5028,22175,17,1,1,1,88,553,IL100F,Electric Bus School District Reimbursement Pro...,2021-06-03 20:24:04,...,,,,,,,,<p>The State of Illinois gives transportation ...,,0
5086,22233,17,1,3,1,88,538,IL100F,ComEd - Distributed Generation Rebates,2021-06-18 17:58:20,...,Commonwealth Edison,,,,,,,,,0
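The function above picks the program id columns by position (`.iloc[:, 8:]`), which would silently select the wrong columns if a column were ever added to one of the tables. A minimal sketch of a name-based variant follows. It runs on made-up toy tables, and the function name `get_programs_by_zcta_sketch` is hypothetical. It also raises a clear error for an unknown ZCTA, which the original does not do.

```python
import pandas as pd

# Toy tables with made-up ids; only the column names mirror the notebook.
toy_zipcode = pd.DataFrame({
    'zipcode_id': [1], 'zipcode': [60046],
    'city_id': [10], 'state_id': [17], 'county_id': [100],
})
toy_utility_zipcode = pd.DataFrame({'utility_id': [5], 'zipcode_id': [1]})
toy_program_utility = pd.DataFrame({'program_id_utility': [585, 1187], 'utility_id': [5, 5]})
toy_program_zipcode = pd.DataFrame({'program_id_zipcode': [300], 'zipcode_id': [2]})
toy_program_city = pd.DataFrame({'program_id_city': [400], 'city_id': [10]})
toy_program_county = pd.DataFrame({'program_id_county': [500], 'county_id': [999]})
toy_program = pd.DataFrame({
    'id': [138, 585, 1187, 400, 500, 700],
    'state_id': [17, 17, 17, 17, 17, 4],
    'is_entire_state': [1, 0, 0, 0, 0, 1],
})


def get_programs_by_zcta_sketch(zcta):
    """Hypothetical variant that selects the id columns by name."""
    row = toy_zipcode[toy_zipcode['zipcode'] == zcta]
    if row.empty:
        raise KeyError(f'ZCTA {zcta} not found in the zipcode table')

    # Take the scalar explicitly instead of calling int() on a Series
    state_id = int(row['state_id'].iloc[0])

    merged = (
        row
        .merge(toy_utility_zipcode, on='zipcode_id', how='left')
        .merge(toy_program_utility, on='utility_id', how='left')
        .merge(toy_program_zipcode, on='zipcode_id', how='left')
        .merge(toy_program_city, on='city_id', how='left')
        .merge(toy_program_county, on='county_id', how='left')
    )

    # Pick the id columns by their shared prefix rather than by position
    id_values = merged.filter(like='program_id_').to_numpy(dtype=float).ravel()
    program_ids = pd.Series(id_values).dropna().astype(int).unique()

    statewide = (toy_program['state_id'] == state_id) & (toy_program['is_entire_state'] == 1)
    return toy_program[toy_program['id'].isin(program_ids) | statewide]


toy_result = get_programs_by_zcta_sketch(60046)
```

In the toy data, program 500 is tied to a county the ZCTA does not belong to and program 700 belongs to another state, so neither is returned.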


My analysis is focused on residential PV systems. I only want programs pertaining to the residential sector and PV technology. Below I bring in a few more tables to perform this filter

In [16]:
sector = pd.read_csv('../data/dsire/sector.csv')
program_sector = pd.read_csv('../data/dsire/program_sector.csv')
program_technology = pd.read_csv('../data/dsire/program_technology.csv')
technology = pd.read_csv('../data/dsire/technology.csv')

In [17]:
sector

Unnamed: 0,id,name,fieldname,is_selectable,parent_id
0,1,Commercial,Commercial,1,27.0
1,2,Construction,Construction,1,33.0
2,3,Industrial,Industrial,1,27.0
3,4,Investor-Owned Utility,IOU,1,32.0
4,5,Local Government,Local,1,28.0
5,6,Nonprofit,Nonprofit,1,28.0
6,8,Municipal Utilities,Municipal_Utilities,1,32.0
7,9,Residential,Residential,1,29.0
8,10,Cooperative Utilities,Cooperative_Utilities,1,32.0
9,11,Schools,Schools,1,28.0


Residential has 6 subcategories: Residential, Multifamily Residential, Low Income Residential, and the three Senior Citizens categories. I want all programs that belong to any one of these sector ids (9, 22, 23, 35, 36, 37).

In [18]:
residential_sector_ids = [9, 22, 23, 35, 36, 37]

In [19]:
program_sector.head()

Unnamed: 0,program_id,sector_id
0,1,1
1,3,1
2,4,1
3,6,1
4,7,1


In [20]:
# all program ids which belong to one of the residential sectors 
program_sector_ids = program_sector[program_sector['sector_id'].isin(residential_sector_ids)]['program_id'].values.tolist()
program_sector_ids

[1,
 2,
 3,
 4,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 16,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 27,
 28,
 30,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 48,
 49,
 50,
 51,
 54,
 56,
 60,
 61,
 63,
 64,
 65,
 66,
 67,
 72,
 74,
 75,
 76,
 80,
 83,
 84,
 85,
 89,
 90,
 91,
 94,
 103,
 105,
 106,
 107,
 108,
 109,
 110,
 116,
 118,
 119,
 120,
 123,
 124,
 130,
 131,
 132,
 135,
 136,
 137,
 138,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 151,
 154,
 156,
 157,
 159,
 160,
 161,
 162,
 164,
 165,
 166,
 168,
 169,
 170,
 171,
 173,
 175,
 177,
 178,
 179,
 181,
 183,
 184,
 185,
 186,
 189,
 192,
 194,
 196,
 198,
 203,
 204,
 207,
 209,
 212,
 214,
 217,
 219,
 220,
 223,
 229,
 230,
 232,
 235,
 236,
 237,
 240,
 241,
 243,
 245,
 246,
 248,
 250,
 253,
 254,
 255,
 256,
 257,
 258,
 259,
 261,
 264,
 270,
 271,
 273,
 274,
 275,
 276,
 277,
 279,
 280,
 281,
 282,
 283,
 284,
 285,
 286,
 287,
 288,
 290,
 291,
 292,
 293,
 294,
 295,
 301,
 302,
 304,
 307,
 3

In [21]:
residential_programs = program[program['id'].isin(program_sector_ids)]
residential_programs.shape

(2963, 23)

In [22]:
technology.head(10)

Unnamed: 0,id,name,technology_category_id,active
0,1,Solar - Passive,1,1
1,2,Solar Water Heat,1,1
2,3,Solar Space Heat,1,1
3,4,Geothermal Electric,2,1
4,5,Solar Thermal Electric,1,1
5,6,Solar Thermal Process Heat,1,1
6,7,Solar Photovoltaics,1,1
7,8,Wind (All),3,1
8,9,Biomass,4,1
9,10,Hydroelectric,5,1


Solar Photovoltaics has id 7

In [23]:
program_tech_ids = program_technology[program_technology['technology_id'] == 7]['program_id'].values.tolist()
len(program_tech_ids)

1712

In [24]:
residential_programs = residential_programs[residential_programs['id'].isin(program_tech_ids)]

In [25]:
residential_programs.shape

(1050, 23)

That leaves 1,050 residential programs for PV. I am realizing now that this filtering will likely exclude some important programs, such as Renewable Portfolio Standards (RPS). The exclusion is actually due to filtering by Residential alone, since RPS are enforced on the IOUs in the area. I think RPS are important and I will definitely have to include them later. For now I am looking at residential PV programs. Below I adjust the above function to use the new, filtered dataframe `residential_programs`.
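The two filters can also be written as a set intersection, which makes it easy to see which PV programs the sector filter removes. The sketch below uses made-up toy tables. In them, program 3 plays the role of a PV program that targets only sector 4 (Investor-Owned Utility in the `sector` table), the way an RPS might.

```python
import pandas as pd

# Toy tables with made-up rows; column names follow the DSIRE files.
toy_program = pd.DataFrame({'id': [1, 2, 3, 4], 'name': ['A', 'B', 'C', 'D']})
toy_program_sector = pd.DataFrame({
    'program_id': [1, 1, 2, 3, 4],
    'sector_id': [9, 1, 22, 4, 9],
})
toy_program_technology = pd.DataFrame({
    'program_id': [1, 2, 3, 4],
    'technology_id': [7, 8, 7, 7],
})

residential_sector_ids = [9, 22, 23, 35, 36, 37]
pv_technology_id = 7  # Solar Photovoltaics

# Program ids that pass each filter on its own
residential_ids = set(
    toy_program_sector.loc[
        toy_program_sector['sector_id'].isin(residential_sector_ids), 'program_id'
    ]
)
pv_ids = set(
    toy_program_technology.loc[
        toy_program_technology['technology_id'] == pv_technology_id, 'program_id'
    ]
)

# Programs that pass both filters
residential_pv = toy_program[toy_program['id'].isin(residential_ids & pv_ids)]

# PV programs dropped only because they are not tagged as residential
pv_but_not_residential = pv_ids - residential_ids
```

Inspecting `pv_but_not_residential` on the real tables would be one way to list the utility-facing programs that need to be added back later.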

In [26]:
def get_programs_by_zcta(zcta):

    # isolate row
    filtered = zipcode[zipcode['zipcode'] == zcta]

    # Isolate state_id
    state_id = int(filtered['state_id'].iloc[0])

    # Merge matching tables and collect programs id
    program_ids = filtered.merge(utility_zipcode, on='zipcode_id', how='left').merge(program_utility, on='utility_id', how='left').\
    merge(program_zipcode, on='zipcode_id', how='left').merge(program_city, on='city_id', how='left').merge(program_county, on='county_id', how='left').iloc[:, 8:].values.flatten()

    # Filter all programs with id in programs id or belonging to state with state_id
    programs = residential_programs[(residential_programs['id'].isin(program_ids)) | (residential_programs['state_id'] == state_id)]

    # return dataframe of programs
    return programs

In [27]:
get_programs_by_zcta(60046)

Unnamed: 0,id,state_id,is_entire_state,implementing_sector_id,program_category_id,program_type_id,created_by_user_id,code,name,updated_ts,...,administrator,fundingsource,budget,start_date,start_date_text,end_date,end_date_text,summary,additional_technologies,fromSir
31,35,17,0,3,2,37,534,IL03R,ComEd - Wind & Photovoltaic Generation Program,2008-04-01 00:00:00,...,,,,,,,,"In April 2000, Commonwealth Edison (ComEd), an...",,0
129,138,17,1,1,1,78,534,IL01F,Special Assessment for Solar Energy Systems,2021-03-12 19:34:30,...,Illinois Department of Commerce and Economic O...,,,,,,,<p>&#10;&#9;Illinois offers a special assessme...,,0
440,486,17,1,1,2,46,534,IL01R,Renewable Energy Resources Trust Fund,2021-07-27 13:43:40,...,,,,,,,,"<p><b>According to <a href=""https://www.ilga.g...",,0
512,585,17,0,1,1,88,534,IL05F,Solar and Wind Energy Rebate Program,2017-03-15 14:47:55,...,Illinois Department of Commerce and Economic O...,Illinois Renewable Energy Resources Trust Fund,$2.5 million,1997-12-16 05:00:00,12/16/1997,,10/10/2014 (current applications),<p>&#10;&#9;<span>The State of Illinois Renewa...,,0
710,872,17,0,3,1,88,534,IL10F,Chicago Photovoltaic Incentive Program (PIP),2005-01-05 00:00:00,...,ComEd,,,,,,,"ComEd, in partnership with Spire Solar Chicago...",,0
759,950,17,1,3,2,14,534,IL07R,ComEd - Interconnection Guidelines,2008-04-01 00:00:00,...,,,,,,,,Illinois does not have statewide interconnecti...,,0
1902,2466,17,0,2,1,10,492,IL22F,City of Chicago - Green Building Permit Programs,2015-12-17 00:00:00,...,Chicago Center for Green Technology,,,,,,,<p>The Chicago Department of Buildings (DOB) G...,Rainwater Harvesting Systems,0
2089,2700,17,1,1,2,37,534,IL13R,Net Metering,2022-01-14 18:48:37,...,,,,2008-04-01 04:00:00,,,,<p><em><b>Note: The Climate and Equitable Jobs...,,0
2132,2759,17,1,1,1,87,534,IL24F,Green Neighborhood Grants,2007-10-23 00:00:00,...,Illinois Department of Commerce and Economic O...,,,,,,,The Illinois Dept. of Commerce and Economic Op...,,0
2250,2903,17,1,1,2,14,534,IL15R,Interconnection Standards,2017-03-24 15:01:49,...,,,,,,,,"<p><b><i>In December 2016, the Illinois Commer...",,0


With the `is_entire_state` equals 1 filter in place, 15 total programs were returned for residential PV in this ZCTA. For some reason these don't match those listed on the DSIRE website. I removed the `is_entire_state` equals 1 filter, as in the function above, and 5 more were returned. I am going to take the broader route and leave this filter off for now.

Next I want to experiment with some OOP since I have the initial filtering functionality done. Ideally I would have a Python script that I import into my notebooks and that handles the core functionality for DSIRE (and eventually URDB).

I need to take a step back and think about what it is I actually need from this data and how I intend to interact with it. What follows is a brief brainstorm (bit of a chaotic one) pertaining to those two ideas. The programs offer information specific to themselves upfront, but that is not what I am after. I am searching for a few metrics (features) which reflect the overall state of key factors in a given ZCTA. Those specific features I will determine later on but what is the best way to interact with this data? I should almost have a ZCTA class which program objects could belong to. I could then find aggregate metrics across all programs for a given ZCTA. The ZCTA class could also contain utility rates and other key factor metrics. 
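As a starting point for that brainstorm, here is one possible shape for a ZCTA class that program objects belong to. Everything in it (the class names, the fields, the aggregate properties) is a hypothetical sketch rather than a settled design. The three example programs are taken from the Illinois output above, and the category codes follow the `program_category_id` column (1 for financial incentive, 2 for policy).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Program:
    """One DSIRE program; only the fields needed for counting are kept."""
    program_id: int
    name: str
    category_id: int  # 1 = financial incentive, 2 = policy


@dataclass
class ZCTA:
    """Hypothetical container for everything known about one ZCTA."""
    code: int
    programs: list = field(default_factory=list)

    def add_program(self, program: Program) -> None:
        self.programs.append(program)

    def count_by_category(self) -> Counter:
        # Tally programs by category id; missing categories count as 0
        return Counter(p.category_id for p in self.programs)

    @property
    def num_incentives(self) -> int:
        return self.count_by_category()[1]

    @property
    def num_policies(self) -> int:
        return self.count_by_category()[2]


example = ZCTA(code=60046)
example.add_program(Program(2700, 'Net Metering', 2))
example.add_program(Program(585, 'Solar and Wind Energy Rebate Program', 1))
example.add_program(Program(2903, 'Interconnection Standards', 2))

print(example.num_incentives, example.num_policies)
```

Utility rates and other key factor metrics could later be added as further fields on the same class.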

For MVP model purposes, I am going to use a very simple feature from DSIRE: the total number of incentives and policies that apply to residential PV in each zip code. I don't want to worry about any time component or specific programs, just the total number of policies and incentives. Below I read in my census dataframe, which contains all of the ZCTAs (zip codes) I need data for.

In [28]:
df = pd.read_csv('../data/base_data.csv')
print(df.shape)
df.head()

(15740, 105)


Unnamed: 0.1,Unnamed: 0,zcta,state,lat,long,average_household_income,mean_household_income_lowest_quintile,mean_household_income_second_quintile,mean_household_income_third_quintile,mean_household_income_fourth_quintile,...,heating_degree_days,wind_speed,earth_temp,frost_days,earth_temp_amplitude,solar_azimuth_angle,num_systems,total_capacity,mean_system_size,median_system_size
0,0,85610,Arizona,31.744197,109.722324,53713.747228,15735.0,28976.0,41584.0,60403.0,...,108.36,2.46,17.4,3.3,18.02,-100.64,13.0,70.15,5.396154,5.38
1,1,85614,Arizona,31.814301,110.9194,67347.031441,15092.0,33942.0,52059.0,78902.0,...,64.42,2.43,20.51,0.8,18.44,-101.18,1012.0,7015.507,6.932319,5.985
2,2,85624,Arizona,31.504971,110.692999,56508.955224,12085.0,26596.0,40793.0,63481.0,...,64.42,2.43,20.51,0.8,18.44,-101.18,24.0,150.86,6.285833,5.865
3,3,85629,Arizona,31.917838,111.019035,91646.185302,24218.0,55100.0,82356.0,109225.0,...,64.42,2.43,20.51,0.8,18.44,-101.18,1186.0,8934.678,7.533455,7.2975
4,4,85630,Arizona,31.886572,110.181046,57186.339381,6123.0,16639.0,37332.0,58660.0,...,108.36,2.46,17.4,3.3,18.02,-100.64,37.0,258.01,6.973243,6.48


For each ZCTA, I will call the `get_programs_by_zcta` function, then count the number of policies and incentives. I am going to create a dictionary to store this data. Each key will be a ZCTA and each value will be a second dictionary with the number of incentives and the number of policies.

In [29]:
get_programs_by_zcta(60046)['program_category_id'].value_counts()

1    11
2     9
Name: program_category_id, dtype: int64

1 is financial incentive and 2 is policy.

In [30]:
zcta_programs = {}
for zcta in df['zcta'].values:
    
    # All residential pv programs for zcta
    all_res_pv_programs = get_programs_by_zcta(zcta)

    # num policies and incentives
    temp = all_res_pv_programs['program_category_id'].value_counts().to_dict()

    zcta_programs[zcta] = temp

In [31]:
zcta_programs_df = pd.DataFrame.from_dict(zcta_programs, orient='index').rename(columns={1: 'num_incentives', 2: 'num_policies'})
zcta_programs_df.head()

Unnamed: 0,num_incentives,num_policies
85610,17,12
85614,17,12
85624,17,12
85629,17,12
85630,17,12
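One caveat with building the dataframe from `value_counts()` dictionaries: a ZCTA that has incentives but no policies (or the reverse) has no key for the missing category, so that cell comes out as `NaN` and the column becomes float. A small sketch of guarding against that is below. The counts dictionary is made up, and ZCTA 99999 is a hypothetical example.

```python
import pandas as pd

# Made-up counts; 99999 is a hypothetical ZCTA with incentives but no policies.
toy_counts = {
    85610: {1: 17, 2: 12},
    99999: {1: 3},
}

toy_counts_df = (
    pd.DataFrame.from_dict(toy_counts, orient='index')
    .rename(columns={1: 'num_incentives', 2: 'num_policies'})
    # Make sure both columns exist even if a category never appears
    .reindex(columns=['num_incentives', 'num_policies'])
    # A missing category means zero programs of that kind
    .fillna(0)
    .astype(int)
)

print(toy_counts_df)
```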


In [32]:
zcta_programs_df.to_csv('../data/num_programs.csv')

In [33]:
df = df.merge(zcta_programs_df, left_on='zcta', right_index=True)
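`merge` defaults to an inner join, so any ZCTA missing from `zcta_programs_df` would be dropped from `df` without warning. A sketch of checking for that with a left join and `indicator=True` is below. The toy frames and ZCTA 11111 are made up for illustration.

```python
import pandas as pd

# Toy frames with made-up values; 11111 has no row in the counts table.
toy_df = pd.DataFrame({'zcta': [85610, 85614, 11111], 'num_systems': [13, 1012, 5]})
toy_counts_df = pd.DataFrame(
    {'num_incentives': [17, 17], 'num_policies': [12, 12]},
    index=[85610, 85614],
)

# Default (inner) merge: the unmatched ZCTA disappears
inner = toy_df.merge(toy_counts_df, left_on='zcta', right_index=True)

# Left merge keeps every ZCTA and records where each row came from
left = toy_df.merge(
    toy_counts_df, left_on='zcta', right_index=True, how='left', indicator=True
)
unmatched = left.loc[left['_merge'] == 'left_only', 'zcta'].tolist()

print(len(inner), len(left), unmatched)
```

Comparing `df.shape` before and after the real merge gives the same information at a glance.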

In [34]:
df.head()

Unnamed: 0.1,Unnamed: 0,zcta,state,lat,long,average_household_income,mean_household_income_lowest_quintile,mean_household_income_second_quintile,mean_household_income_third_quintile,mean_household_income_fourth_quintile,...,earth_temp,frost_days,earth_temp_amplitude,solar_azimuth_angle,num_systems,total_capacity,mean_system_size,median_system_size,num_incentives,num_policies
0,0,85610,Arizona,31.744197,109.722324,53713.747228,15735.0,28976.0,41584.0,60403.0,...,17.4,3.3,18.02,-100.64,13.0,70.15,5.396154,5.38,17,12
1,1,85614,Arizona,31.814301,110.9194,67347.031441,15092.0,33942.0,52059.0,78902.0,...,20.51,0.8,18.44,-101.18,1012.0,7015.507,6.932319,5.985,17,12
2,2,85624,Arizona,31.504971,110.692999,56508.955224,12085.0,26596.0,40793.0,63481.0,...,20.51,0.8,18.44,-101.18,24.0,150.86,6.285833,5.865,17,12
3,3,85629,Arizona,31.917838,111.019035,91646.185302,24218.0,55100.0,82356.0,109225.0,...,20.51,0.8,18.44,-101.18,1186.0,8934.678,7.533455,7.2975,17,12
4,4,85630,Arizona,31.886572,110.181046,57186.339381,6123.0,16639.0,37332.0,58660.0,...,17.4,3.3,18.02,-100.64,37.0,258.01,6.973243,6.48,17,12


In [35]:
df.to_csv('../data/data1.csv')