# RENEWABLES ACTIONS RAW DATA SOURCES


## Electric Vehicles
* Source: Center for Sustainable Energy (2020). Massachusetts Department of Energy Resources, Massachusetts Offers Rebates for Electric Vehicles (MOR-EV), Rebate Statistics.
* Retrieved 09/08/2020 from: https://mor-ev.org/program-statistics
* Data last updated 08/21/2020. Data date range includes 06/19/2014 - 08/15/2020.
* Sectors: Residential.


## Residential Air-source Heat Pumps (ASHP)

* Source: Massachusetts Clean Energy Center (2020). Air Source Heat Pump Program - Residential Projects.
* Retrieved 09/08/2020 from: http://files-cdn.masscec.com/ResidentialASHPProjectDatabase%2011.4.2019.xlsx
* Data last updated 11/04/2019. Data date range includes 12/26/2014 - 10/23/2019.
* Sectors: Residential.
* Observations: The raw data provider does not appear to validate zipcodes or town names. The data is "as provided" and includes typos.


## Ground-source Heat Pumps (GSHP)

* Source: Massachusetts Clean Energy Center (2020). Ground Source Heat Pump Program - Residential & Small-Scale Projects Database.
* Retrieved 09/08/2020 from: http://files-cdn.masscec.com/get-clean-energy/govt-np/clean-heating-cooling/ResidentialandSmallScaleGSHPProjectDatabase.xlsx 
* Data last updated June 2020. Data date range includes 01/02/2015 - 06/09/2020.
* Sectors: Residential, Small Commercial.

## Production Tracking System for Solar Photovoltaic Report (PV in PTS)

* Source: Massachusetts Clean Energy Center
* According to a [September 2017 Department of Public Utilities Report](https://fileservice.eea.comacloud.net/FileService.Api/file/FileRoom/9174030), "On a monthly basis, DOER and MassCEC compile data from the production tracking system to produce the MA PV Report, which is a publicly available document. The MA PV Report is available electronically at http://files.masscec.com/uploads/attachments/PVinPTSwebsite.xlsx." However, the file at that URL does not appear to have been updated since November 2019. We analyze it here, but still need to find a source of regularly updated data.
* Sectors: Residential, Commercial, Institutional

In [1]:
import os
import re

import numpy as np
import pandas as pd
import rad_pipeline.rad_pipeline as rp
import rad_pipeline.zipcodes as zc

In [114]:
import importlib
importlib.reload(rp)
importlib.reload(zc)

<module 'rad_pipeline.zipcodes' from '/Users/alexhasha/repos/massenergize/rad_pipeline/rad_pipeline/zipcodes.py'>

In [3]:
DATA_DIR = "../data"
DATA_FILES = {
    "zip_code_community": os.path.join(DATA_DIR, "Zip Code Community.xlsx"),
    "Air-source Heat Pumps": os.path.join(DATA_DIR, "ResidentialASHPProjectDatabase 11.4.2019.xlsx"),
    "Solar Panels": os.path.join(DATA_DIR, "PVinPTSwebsite.xlsx"),
    "Ground-source Heat Pumps": os.path.join(DATA_DIR, "ResidentialandSmallScaleGSHPProjectDatabase.xlsx"),
    "EVs": os.path.join(DATA_DIR, "MOR-EV Stats Page Data Download.xlsx")
}
DATA_FILES

{'zip_code_community': '../data/Zip Code Community.xlsx',
 'Air-source Heat Pumps': '../data/ResidentialASHPProjectDatabase 11.4.2019.xlsx',
 'Solar Panels': '../data/PVinPTSwebsite.xlsx',
 'Ground-source Heat Pumps': '../data/ResidentialandSmallScaleGSHPProjectDatabase.xlsx',
 'EVs': '../data/MOR-EV Stats Page Data Download.xlsx'}

## Load raw data sources

In [4]:
ashp = rp.load_ashp()

  f_ashp = pd.read_excel(DATA_FILES["Air-source Heat Pumps"], 'Sheet1', skiprows=3)


In [5]:
ashp.head()

Unnamed: 0,Date Rebate Payment Approved by MassCEC,Site City/Town,Site Zip Code,Installer Company Name,Heating Fuel Being Replaced,Cooling Type Being Replaced,# of Outdoor Units,# of Indoor Units,Capacity of Heat Pumps at 5°F,Single- Head Heat Pump #1,Single- Head Heat Pump #2,Single- Head Heat Pump #3,Multi-Head Heat Pump #1,Multi-Head Heat Pump #2,Multi-Head Heat Pump #3,Total System Costs,Receiving an Income-Based Adder?,Rebate Amount
1,2019-10-23,CENTERVILLE,2632,"Seaside Gas Service, Inc",Natural Gas,,1,3,25000.761761,,,,,,,11575,0,1302.08
2,2019-10-23,NORTHFIELD,1360,Arctic Refrigeration,Pellet Stove,2 window units,1,1,20300.0,,,,,,,4050,0,625.0
3,2019-10-16,West Tisbury,2575,Nelson Mechanical Design Inc,Propane,Window Unit(s),2,-,57200.4,,,,Mitsubishi MXZ-3C30NAHZ2,Mitsubishi MXZ-3C30NAHZ2,,25560,No - Not Applicable,2000.0
4,2019-10-16,FITCHBURG,1420,Royal Steam Heater Co.,Natural Gas,Window fan,4,15,92997.920266,,,,,,,44352,0,6200.0
5,2019-10-09,Haverhill,1835,Climate Zone,Oil,Window Unit(s),1,,28600.0,0.0,0.0,0.0,Mitsubishi-MXZ-3C30NAHZ2,0,0.0,13700,No - Not Applicable,1191.67


In [6]:
gshp = rp.load_gshp()

  df_gshp = pd.read_excel(DATA_FILES["Ground-source Heat Pumps"], 'Sheet1', skiprows=2)


In [7]:
gshp.head()

Unnamed: 0,Rebate Approved by MassCEC,Site City/Town,Installer Name,Driller Company,Primary Heating Fuel Prior to Installation,Secondary Heating Fuel,Building Size (Square Feet),Previous Heat Distribution,Manual J Heat Load,Heat Pump Function,...,EER 3,Heating Capacity 3 (BTU/hr),Total Capacity (BTU/hr),"Eligible Capacity (BTU/hr) (maximum of 60,000 BTU/hr)",Backup Source,Heat Pump Costs (Equip + Install),Loop Cost (Equip + Install),Total System Cost,Rebate Amount,Income-Based Rebate Received?
0,2020-09-14,Arlington,"EarthTech Systems, LLC","Skillings & Sons, Inc.",Oil,,1771,Oil,32261,Heating and cooling,...,-,0,32000,32040.0,-,32752.0,4900.0,47252,8010,No
1,2020-08-27,Greenfield,"EarthTech Systems, LLC","Skillings & Sons, Inc.",Oil,,3920,Oil,50988,Heating and cooling,...,-,0,51000,51000.0,-,33100.0,5000.0,53100,8500,No
2,2020-08-27,Melrose,"Achieve Renewable Energy, LLC","Gap Mountain Drilling, LLC",Natural gas,,1695,Natural gas,71336,Heating and cooling,...,-,0,64000,60000.0,-,62303.0,0.0,79253,10000,No
3,2020-08-27,Freetown,Southcoast Greenlight Energy Inc.,"Gap Mountain Drilling, LLC",Electric resistance,Other,2580,Electric resistance,47263,"Heating, Cooling, and Hot Water",...,-,0,55200,55200.0,-,37050.0,17000.0,62050,9200,No
4,2020-08-27,Boston,Northstar Heating & Cooling,Desmond Well Drilling,-,-,4089,New Construction,102113,Heating and cooling,...,22.4,24900,92800,60000.0,-,27710.0,7000.0,41710,10000,No


In [21]:
gshp.columns

Index(['Rebate Approved by MassCEC', 'Site City/Town', 'Installer Name',
       'Driller Company', 'Primary Heating Fuel Prior to Installation',
       'Secondary Heating Fuel', 'Building Size (Square Feet)',
       'Previous Heat Distribution', 'Manual J Heat Load',
       'Heat Pump Function', 'Pump type', 'Loop Config.',
       'New Heating Distribution', 'Number of Bore Holes',
       'Total Bore Hole Depth (ft)', 'Peak Flow Rate (gallons per minute)',
       'HP Manufacturer', 'HP 1 Model', 'COP 1', 'EER 1',
       'Heating Capacity 1 (BTU/hr)', 'HP 2 Model', 'COP 2', 'EER 2',
       'Heating Capacity 2 (BTU/hr)', 'HP 3 Model', 'COP 3', 'EER 3',
       'Heating Capacity 3 (BTU/hr)', 'Total Capacity (BTU/hr)',
       'Eligible Capacity (BTU/hr) (maximum of 60,000 BTU/hr)',
       'Backup Source', 'Heat Pump Costs (Equip + Install) ',
       'Loop Cost (Equip + Install) ', 'Total System Cost', 'Rebate Amount',
       'Income-Based Rebate Received?'],
      dtype='object')

In [8]:
solar = rp.load_solar()

  df_pv = pd.read_excel(DATA_FILES["Solar Panels"], 'PV in PTS', skiprows=7)


In [9]:
solar.head()

Unnamed: 0,"Capacity \n(DC, kW)",Date In Service,Total Cost with Design Fees,Total Grant,City,Zip,County,Program Name,Facility Type,Installer,Module Manufacturer,Inverter Manufacturer,Meter Manufacturer,Utility,3rd Party Owner,SREC Eligible,Estimated Annual Production (kWhr)
0,1077.48,2019-08-08,3462750.0,0.0,Boston,2128,Suffolk,Non-RET Funded Grants,Industrial,ECA Solar,Jinko Solar,Solectria;SolarEdge Technologies,Elkor Technologies,NSTAR (DBA EverSource),N,Y,1345600.0
1,218.28,2019-07-30,600000.0,0.0,Pittsfield,1201,Berkshire,Non-RET Funded Grants,Commercial / Office,"BVD, LLC",Seraphim Solar System,Solectria;Solectria;Solectria,Elkor Technologies,WMECO (DBA EverSource),N,Y,288557.0
2,296.0,2019-07-09,760806.0,0.0,Great Barrington,1230,Berkshire,Non-RET Funded Grants,Commercial / Office,Solect Energy Development LLC,LG Electronics,HiQ Solar,eGauge,National Grid,Y,Y,309800.0
3,1408.96,2019-06-27,4064405.0,0.0,Millbury,1527,Worcester,Non-RET Funded Grants,Community Solar,M&W Energy,Hansol Technics,Sungrow Power,Accuenergy,National Grid,N,Y,1760000.0
4,657.0,2019-06-21,738468.0,0.0,Walpole,2081,Norfolk,Non-RET Funded Grants,Industrial,ECA Solar,LG Electronics,Solectria;Solectria;Solectria,Elkor Technologies,NSTAR (DBA EverSource),N,Y,804168.0


In [10]:
evs = rp.load_evs()

  df_evs = pd.read_excel(DATA_FILES["EVs"], 'Data')


In [11]:
evs.head()

Unnamed: 0,Application Number,Total Amount,Application Received Date,Vehicle Category,Vehicle Model,Vehicle Year,Date of Purchase,Purchase or Lease?,Zip Code,County
0,M-000011,2500,2014-06-19,BEV,Nissan Leaf SV,2013,2014-06-18,Lease,1581,Worcester
1,M-000013,1500,2014-06-20,PHEV,Ford Fusion Energi,2014,2014-06-19,Lease,1826,Middlesex
2,M-000014,2500,2014-06-20,PHEV+,Chevrolet Volt,2014,2014-06-20,Lease,1915,Essex
3,M-000015,1500,2014-06-20,PHEV,Ford C-MAX Energi,2014,2014-06-18,Purchase,1985,Essex
4,M-000016,2500,2014-06-21,BEV,Nissan Leaf S,2013,2014-06-21,Purchase,1757,Worcester


## Map key column names by zipcode

Quantities of interest at the municipal level

* Total Rebates
* Average Rebate
* Total Cost
* Average Cost
* Quantity Income-Eligible

Fields needed: cost, rebate, zipcode, town, income eligible
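
As a rough illustration of how the quantities listed above could be computed once a dataset has been reduced to the needed fields, the sketch below aggregates by town with pandas. The column names (`town`, `rebate`, `cost`, `income`), the boolean `income` flag, and the sample rows are assumptions made for this example, not values from the raw data.

```python
import pandas as pd

# Made-up rows using hypothetical standardized column names.
projects = pd.DataFrame(
    {
        "town": ["Milton", "Milton", "Andover"],
        "rebate": [1000.0, 500.0, 2000.0],
        "cost": [8000.0, 4000.0, 20000.0],
        "income": [True, False, False],  # assumed: True if income-eligible
    }
)


def summarize_by_town(df: pd.DataFrame) -> pd.DataFrame:
    """Compute the municipal-level quantities of interest, one row per town."""
    return df.groupby("town").agg(
        total_rebates=("rebate", "sum"),
        average_rebate=("rebate", "mean"),
        total_cost=("cost", "sum"),
        average_cost=("cost", "mean"),
        quantity_income_eligible=("income", "sum"),
    )


summary = summarize_by_town(projects)
print(summary)
```

The raw income columns hold mixed values such as `0` and `No - Not Applicable`, so they would need to be converted to a boolean flag before an aggregation like this one is meaningful.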

In [20]:
rp.DATA_FILES

{'zip_code_community': '../data/Zip Code Community.xlsx',
 'Air-source Heat Pumps': '../data/ResidentialASHPProjectDatabase 11.4.2019.xlsx',
 'Solar Panels': '../data/PVinPTSwebsite.xlsx',
 'Ground-source Heat Pumps': '../data/ResidentialandSmallScaleGSHPProjectDatabase.xlsx',
 'EVs': '../data/MOR-EV Stats Page Data Download.xlsx'}

In [26]:
SOURCES = ["EVs", "Solar Panels", "Air-source Heat Pumps", "Ground-source Heat Pumps"]

FIELDS = {
    "EVs": {
        "rebate": "Total Amount",
        "zip": "Zip Code",
    },
    "Solar Panels": {
        "rebate": '',
        "cost": '',
        "zip": '',
        "town": '',
        "income": '',        
    },
    "Air-source Heat Pumps": {
        "rebate": 'Rebate Amount ', # That's right, with a space at the end...
        "cost": 'Total System Costs',
        "zip": 'Site Zip Code',
        "town": 'Site City/Town',
        "income": 'Receiving an Income-Based Adder?',
    },
    "Ground-source Heat Pumps": {
        "rebate": 'Rebate Amount', 
        "cost": 'Total System Cost',
        "zip": 'Site Zip Code',  # NOTE: not listed in gshp.columns above; confirm before use
        "town": 'Site City/Town',
        "income": 'Income-Based Rebate Received?',
    }
}
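
One possible way to use a mapping like `FIELDS` is to rename each dataset's raw columns to the shared keys (`rebate`, `cost`, `zip`, `town`, `income`). The helper below is a hypothetical sketch, not part of `rad_pipeline`. It strips whitespace from column names so that a trailing space does not break the lookup, and it skips keys whose raw name is empty or missing. The demo frame is made up.

```python
import pandas as pd


def standardize_columns(df: pd.DataFrame, field_map: dict) -> pd.DataFrame:
    """Keep only the mapped columns and rename them to the standard keys."""
    # Normalize raw headers so 'Rebate Amount ' and 'Rebate Amount' are treated alike.
    stripped = df.rename(columns=lambda name: str(name).strip())
    rename = {
        raw.strip(): standard
        for standard, raw in field_map.items()
        if raw.strip() in stripped.columns  # skips '' placeholders and absent columns
    }
    return stripped[list(rename)].rename(columns=rename)


# Made-up frame imitating the raw ASHP headers.
demo = pd.DataFrame(
    {
        "Rebate Amount ": [625.0],
        "Total System Costs": [4050],
        "Site Zip Code": [1360],
        "Installer Company Name": ["Example Installer"],
    }
)
demo_fields = {
    "rebate": "Rebate Amount ",
    "cost": "Total System Costs",
    "zip": "Site Zip Code",
    "town": "",  # placeholder, as in the Solar Panels entry
}
standardized = standardize_columns(demo, demo_fields)
print(standardized)
```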

# Try out great_expectations data profilers

In [12]:
import great_expectations as ge

In [13]:
ashp_ge = ge.from_pandas(
    ashp
)

## Work out robust logic for cleaning zipcodes

In [15]:
# Load the zip code to municipality lookup tables
# (pd.read_excel arguments: file path, sheet name)
zipcode_community_map = pd.read_excel(DATA_FILES["zip_code_community"], 'Villages to Muni with Zip')
municipalities = pd.read_excel(DATA_FILES["zip_code_community"], '351 Mass Munis')

In [16]:
import zipcodes

zipcodes.is_real('02186')

True

In [17]:
zipcodes.matching('02186')

[{'zip_code': '02186',
  'zip_code_type': 'STANDARD',
  'active': True,
  'city': 'Milton',
  'acceptable_cities': [],
  'unacceptable_cities': ['East Milton'],
  'state': 'MA',
  'county': 'Norfolk County',
  'timezone': 'America/New_York',
  'area_codes': ['617'],
  'world_region': 'NA',
  'country': 'US',
  'lat': '42.2396',
  'long': '-71.0811'}]

In [41]:
zipcodes.matching('02186-5827')

[{'zip_code': '02186',
  'zip_code_type': 'STANDARD',
  'active': True,
  'city': 'Milton',
  'acceptable_cities': [],
  'unacceptable_cities': ['East Milton'],
  'state': 'MA',
  'county': 'Norfolk County',
  'timezone': 'America/New_York',
  'area_codes': ['617'],
  'world_region': 'NA',
  'country': 'US',
  'lat': '42.2396',
  'long': '-71.0811'}]

In [49]:
zipcodes.matching('02215')

[{'zip_code': '02215',
  'zip_code_type': 'STANDARD',
  'active': True,
  'city': 'Boston',
  'acceptable_cities': [],
  'unacceptable_cities': ['Boston University', 'Kenmore'],
  'state': 'MA',
  'county': 'Suffolk County',
  'timezone': 'America/New_York',
  'area_codes': ['617', '857'],
  'world_region': 'NA',
  'country': 'US',
  'lat': '42.3452',
  'long': '-71.1061'}]

In [64]:
zipcodes.matching('02632')

[{'zip_code': '02632',
  'zip_code_type': 'STANDARD',
  'active': True,
  'city': 'Centerville',
  'acceptable_cities': [],
  'unacceptable_cities': [],
  'state': 'MA',
  'county': 'Barnstable County',
  'timezone': 'America/New_York',
  'area_codes': ['508', '774'],
  'world_region': 'NA',
  'country': 'US',
  'lat': '41.6571',
  'long': '-70.3474'}]

In [101]:
zipcodes.matching('08180')

[]

In [40]:
ashp["zip_cleaned"].dtypes

dtype('O')

In [94]:
def validate_zip_town_row(row: pd.Series, town_field: str, zip_field: str) -> pd.Series:
    """
    Look up the row's zipcode with the zipcodes library and check the raw town name against it.
    """
    zip_results = zipcodes.matching(row[zip_field])
    if len(zip_results) > 1:
        raise ValueError(f"Multimatch zipcode {row[zip_field]} encountered!")
    elif len(zip_results) == 0:
        output = {
            "town": "INVALID",
            "zip_exists": False,
            "town_valid": False,
        }
    else: # len(zip_results)==1

        zip_results = zip_results[0]
        standardized_raw_town_name = str(row[town_field]).lower().strip()
        standardized_town_name = zip_results["city"].lower().strip()
        standardized_acceptable_towns = [x.lower().strip() for x in zip_results["acceptable_cities"]]
        standardized_acceptable_towns.append(standardized_town_name)

        town_valid = standardized_raw_town_name in standardized_acceptable_towns
        output = {
            "town": zip_results["city"],
            "zip_exists": True,
            "town_valid": town_valid
        }
    return pd.Series(output)

    
def validate_zip_town(df: pd.DataFrame, town_field: str, zip_field: str) -> pd.DataFrame:
    """
    Apply validate_zip_town_row to every row and append the results to df.

    Returns a copy of df with three added columns:
    - town: str, town name from the zipcodes library ("INVALID" if the zip is unknown)
    - zip_exists: bool, whether the zipcode exists
    - town_valid: bool, whether the raw town name is valid for the zipcode
    """
    return df.merge(df.apply(lambda row: validate_zip_town_row(row, town_field, zip_field), axis=1), left_index=True, right_index=True)
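
The row-wise `apply` above performs one lookup per row, even though many rows share a zipcode. A possible alternative, sketched below, looks up each distinct zipcode once and broadcasts the result back to every row. The function name `lookup_towns_once` and its `lookup` parameter are hypothetical. In practice `lookup` could be `zipcodes.matching`, but a stand-in is used here so the example runs without that library. Unlike the functions above, the sketch does not check the raw town name or guard against multiple matches.

```python
from typing import Callable

import pandas as pd


def lookup_towns_once(zips: pd.Series, lookup: Callable[[str], list]) -> pd.DataFrame:
    """Resolve each distinct zipcode once, then map the result onto all rows."""
    cache = {}
    for zip_code in zips.dropna().unique():
        matches = lookup(zip_code)
        cache[zip_code] = matches[0]["city"] if matches else None
    town = zips.map(lambda zip_code: cache.get(zip_code))
    return pd.DataFrame(
        {"town": town.fillna("INVALID"), "zip_exists": town.notna()},
        index=zips.index,
    )


# Stand-in for zipcodes.matching that records how often it is called.
lookup_calls = []


def fake_matching(zip_code: str) -> list:
    lookup_calls.append(zip_code)
    return [{"city": "Milton"}] if zip_code == "02186" else []


towns = lookup_towns_once(pd.Series(["02186", "02186", "08180"]), fake_matching)
print(towns)
print(f"lookups performed: {len(lookup_calls)}")
```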

In [95]:
ashp.head()

Unnamed: 0,zip_cleaned,zip4_cleaned,zip_valid,Date Rebate Payment Approved by MassCEC,Site City/Town,Site Zip Code,Installer Company Name,Heating Fuel Being Replaced,Cooling Type Being Replaced,# of Outdoor Units,...,Capacity of Heat Pumps at 5°F,Single- Head Heat Pump #1,Single- Head Heat Pump #2,Single- Head Heat Pump #3,Multi-Head Heat Pump #1,Multi-Head Heat Pump #2,Multi-Head Heat Pump #3,Total System Costs,Receiving an Income-Based Adder?,Rebate Amount
1,2632,,True,2019-10-23,CENTERVILLE,2632,"Seaside Gas Service, Inc",Natural Gas,,1,...,25000.761761,,,,,,,11575,0,1302.08
2,1360,,True,2019-10-23,NORTHFIELD,1360,Arctic Refrigeration,Pellet Stove,2 window units,1,...,20300.0,,,,,,,4050,0,625.0
3,2575,,True,2019-10-16,West Tisbury,2575,Nelson Mechanical Design Inc,Propane,Window Unit(s),2,...,57200.4,,,,Mitsubishi MXZ-3C30NAHZ2,Mitsubishi MXZ-3C30NAHZ2,,25560,No - Not Applicable,2000.0
4,1420,,True,2019-10-16,FITCHBURG,1420,Royal Steam Heater Co.,Natural Gas,Window fan,4,...,92997.920266,,,,,,,44352,0,6200.0
5,1835,,True,2019-10-09,Haverhill,1835,Climate Zone,Oil,Window Unit(s),1,...,28600.0,0.0,0.0,0.0,Mitsubishi-MXZ-3C30NAHZ2,0,0.0,13700,No - Not Applicable,1191.67


In [96]:
result = validate_zip_town(ashp[ashp.zip_valid], FIELDS["Air-source Heat Pumps"]["town"], "zip_cleaned")
result.head()

Unnamed: 0,zip_cleaned,zip4_cleaned,zip_valid,Date Rebate Payment Approved by MassCEC,Site City/Town,Site Zip Code,Installer Company Name,Heating Fuel Being Replaced,Cooling Type Being Replaced,# of Outdoor Units,...,Single- Head Heat Pump #3,Multi-Head Heat Pump #1,Multi-Head Heat Pump #2,Multi-Head Heat Pump #3,Total System Costs,Receiving an Income-Based Adder?,Rebate Amount,town,zip_exists,town_valid
1,2632,,True,2019-10-23,CENTERVILLE,2632,"Seaside Gas Service, Inc",Natural Gas,,1,...,,,,,11575,0,1302.08,Centerville,True,True
2,1360,,True,2019-10-23,NORTHFIELD,1360,Arctic Refrigeration,Pellet Stove,2 window units,1,...,,,,,4050,0,625.0,Northfield,True,True
3,2575,,True,2019-10-16,West Tisbury,2575,Nelson Mechanical Design Inc,Propane,Window Unit(s),2,...,,Mitsubishi MXZ-3C30NAHZ2,Mitsubishi MXZ-3C30NAHZ2,,25560,No - Not Applicable,2000.0,West Tisbury,True,True
4,1420,,True,2019-10-16,FITCHBURG,1420,Royal Steam Heater Co.,Natural Gas,Window fan,4,...,,,,,44352,0,6200.0,Fitchburg,True,True
5,1835,,True,2019-10-09,Haverhill,1835,Climate Zone,Oil,Window Unit(s),1,...,0.0,Mitsubishi-MXZ-3C30NAHZ2,0,0.0,13700,No - Not Applicable,1191.67,Haverhill,True,True


In [98]:
result[~result.town_valid].head()

Unnamed: 0,zip_cleaned,zip4_cleaned,zip_valid,Date Rebate Payment Approved by MassCEC,Site City/Town,Site Zip Code,Installer Company Name,Heating Fuel Being Replaced,Cooling Type Being Replaced,# of Outdoor Units,...,Single- Head Heat Pump #3,Multi-Head Heat Pump #1,Multi-Head Heat Pump #2,Multi-Head Heat Pump #3,Total System Costs,Receiving an Income-Based Adder?,Rebate Amount,town,zip_exists,town_valid
26,1581,,True,2019-07-03,Westboro,1581,"Rodenhiser Plumbing, Heating, AC & Electric",Oil,Window Unit(s),1,...,,Mitsubishi MXZ-3C30NAHZ2,,,13775.0,No - Not Applicable,1191.67,Westborough,True,False
27,2661,,True,2019-07-03,S. Harwich,2661,South Shore Heating & Cooling Inc.,Natural Gas,Window Unit(s),1,...,,Mitsubishi MXZ-3C30NAHZ2,,,9487.0,No - Not Applicable,1191.67,South Harwich,True,False
33,1002,,True,2019-06-26,Leverett,1002,Rock Valley HVAC Inc,,,3,...,,Mitsubishi MXZ-4C36NAHZ Non-Ducted Indoor Units,,,16900.0,No - Not Applicable,2000.0,Amherst,True,False
46,2132,,True,2019-06-19,W Roxbury,2132,NETR LLC ...,Natural Gas,Window Unit(s),1,...,,Mitsubishi MXZ-5C42NAHZ,,,21785.0,No - Not Applicable,2000.0,West Roxbury,True,False
73,1930,,True,2019-06-05,GLOUCESTR,1930,Morris Heating & Air Conditioning ...,Electric Resistance,2 window units,1,...,,,,,6217.68,No - Not Applicable,925.0,Gloucester,True,False


In [99]:
result[~result.town_valid].shape[0]

1085
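
Many of the mismatches shown above look like abbreviations or typos (`S. Harwich`, `W Roxbury`, `GLOUCESTR`) rather than genuinely different towns (`Leverett` vs. `Amherst`). One way to separate the two cases, offered only as a sketch, is to expand common directional abbreviations and then score string similarity with the standard library's `difflib`. The abbreviation table and the function names are assumptions for illustration, and any cut-off for "close enough" would need tuning against the real data.

```python
import difflib
import re

# Assumed abbreviation table; extend as needed for the real data.
ABBREVIATIONS = {"n": "north", "s": "south", "e": "east", "w": "west"}


def normalize_town(name: str) -> str:
    """Lowercase, drop punctuation, and expand single-letter directions."""
    tokens = re.sub(r"[^a-z\s]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def town_similarity(raw_town: str, reference_town: str) -> float:
    """Return a 0..1 similarity ratio between two normalized town names."""
    return difflib.SequenceMatcher(
        None, normalize_town(raw_town), normalize_town(reference_town)
    ).ratio()


for raw, reference in [
    ("S. Harwich", "South Harwich"),
    ("W Roxbury", "West Roxbury"),
    ("GLOUCESTR", "Gloucester"),
    ("Leverett", "Amherst"),
]:
    print(f"{raw!r} vs {reference!r}: {town_similarity(raw, reference):.2f}")
```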

In [106]:
# It seems the data provider is not validating zipcodes

In [100]:
result[~result.zip_exists].head()

Unnamed: 0,zip_cleaned,zip4_cleaned,zip_valid,Date Rebate Payment Approved by MassCEC,Site City/Town,Site Zip Code,Installer Company Name,Heating Fuel Being Replaced,Cooling Type Being Replaced,# of Outdoor Units,...,Single- Head Heat Pump #3,Multi-Head Heat Pump #1,Multi-Head Heat Pump #2,Multi-Head Heat Pump #3,Total System Costs,Receiving an Income-Based Adder?,Rebate Amount,town,zip_exists,town_valid
669,8180,,True,2019-05-08,Andover,8180,"Royal Air Systems, Inc.",Oil,Window Unit(s),1,...,,,,,7422,No - Not Applicable,500.0,INVALID,False,False
959,2174,,True,2019-04-24,ARLINGTON,2174,RER Fuel Inc,Natural Gas,Window Unit(s),2,...,,Fujitsu AOU18RLXFZH,Fujitsu AOU24RLXFZH,,23750,No - Not Applicable,1979.17,INVALID,False,False
8747,673,,True,2018-02-13,WEST YARMOUTH,673,"Cape Cod Mechanical Systems, Inc.",Natural Gas,,1,...,,FUJITSU AOU24RLXFZH,,,7900,Not Applicable,1328.12,INVALID,False,False


In [104]:
zipcodes.matching('08180')

[]

In [105]:
zipcodes.filter_by(city="Andover", state="MA")

[{'zip_code': '01810',
  'zip_code_type': 'STANDARD',
  'active': True,
  'city': 'Andover',
  'acceptable_cities': [],
  'unacceptable_cities': [],
  'state': 'MA',
  'county': 'Essex County',
  'timezone': 'America/New_York',
  'area_codes': ['508', '978'],
  'world_region': 'NA',
  'country': 'US',
  'lat': '42.6496',
  'long': '-71.1660'},
 {'zip_code': '01812',
  'zip_code_type': 'UNIQUE',
  'active': True,
  'city': 'Andover',
  'acceptable_cities': [],
  'unacceptable_cities': ['Internal Revenue Service'],
  'state': 'MA',
  'county': 'Essex County',
  'timezone': 'America/New_York',
  'area_codes': ['351'],
  'world_region': 'NA',
  'country': 'US',
  'lat': '42.6585',
  'long': '-71.1377'},
 {'zip_code': '01899',
  'zip_code_type': 'UNIQUE',
  'active': True,
  'city': 'Andover',
  'acceptable_cities': [],
  'unacceptable_cities': ['Bar Coded I R S'],
  'state': 'MA',
  'county': 'Essex County',
  'timezone': 'America/New_York',
  'area_codes': ['351'],
  'world_region': 'NA',

# Clean zipcode and town data

## ASHP

In [31]:
clean_zips = zc.clean(ashp[FIELDS["Air-source Heat Pumps"]["zip"]])

# Expect the percentage of invalid zips to be small; list the invalid rows
clean_zips[~clean_zips.zip_valid]

Unnamed: 0,zip_cleaned,zip4_cleaned,zip_valid
1093,20,,False
1590,019081047,,False
1606,0212y,,False


In [33]:
# Check that fewer than 0.02% of zipcodes are invalid
clean_zips[~clean_zips.zip_valid].shape[0] / ashp.shape[0]

0.00015027048687637747

In [35]:
# Join fields
ashp = pd.concat([clean_zips, ashp], axis=1)
ashp.head()

Unnamed: 0,zip_cleaned,zip4_cleaned,zip_valid,Date Rebate Payment Approved by MassCEC,Site City/Town,Site Zip Code,Installer Company Name,Heating Fuel Being Replaced,Cooling Type Being Replaced,# of Outdoor Units,...,Capacity of Heat Pumps at 5°F,Single- Head Heat Pump #1,Single- Head Heat Pump #2,Single- Head Heat Pump #3,Multi-Head Heat Pump #1,Multi-Head Heat Pump #2,Multi-Head Heat Pump #3,Total System Costs,Receiving an Income-Based Adder?,Rebate Amount
1,2632,,True,2019-10-23,CENTERVILLE,2632,"Seaside Gas Service, Inc",Natural Gas,,1,...,25000.761761,,,,,,,11575,0,1302.08
2,1360,,True,2019-10-23,NORTHFIELD,1360,Arctic Refrigeration,Pellet Stove,2 window units,1,...,20300.0,,,,,,,4050,0,625.0
3,2575,,True,2019-10-16,West Tisbury,2575,Nelson Mechanical Design Inc,Propane,Window Unit(s),2,...,57200.4,,,,Mitsubishi MXZ-3C30NAHZ2,Mitsubishi MXZ-3C30NAHZ2,,25560,No - Not Applicable,2000.0
4,1420,,True,2019-10-16,FITCHBURG,1420,Royal Steam Heater Co.,Natural Gas,Window fan,4,...,92997.920266,,,,,,,44352,0,6200.0
5,1835,,True,2019-10-09,Haverhill,1835,Climate Zone,Oil,Window Unit(s),1,...,28600.0,0.0,0.0,0.0,Mitsubishi-MXZ-3C30NAHZ2,0,0.0,13700,No - Not Applicable,1191.67


In [115]:
ashp_cleaned = zc.validate_zip_town(ashp, FIELDS["Air-source Heat Pumps"]["town"], FIELDS["Air-source Heat Pumps"]["zip"])

In [117]:
rp.DATA_DIR

'../data'

In [123]:
ashp_cleaned.to_csv(os.path.join(rp.DATA_DIR, "cleaned", "ashp.csv"), sep=",", index_label="index")

In [127]:
ashp_from_disk = pd.read_csv(os.path.join(rp.DATA_DIR, "cleaned", "ashp.csv"), index_col="index")

In [128]:
pd.testing.assert_frame_equal(ashp_cleaned, ashp_from_disk)

AssertionError: DataFrame.index are different

Attribute "names" are different
[left]:  [None]
[right]: ['index']

In [126]:
set(ashp_from_disk.columns).difference(ashp_cleaned.columns)

{'index'}
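
The assertion above fails because `index_label="index"` writes a name for the index column, so the frame read back has an index named `index` while the in-memory frame's index is unnamed. One way to make the round trip comparable, sketched below on a small made-up frame, is to give the original index the same name before comparing. The sketch also reads the zipcode column back as a string so leading zeros survive. The frame contents and the in-memory buffer are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for the cleaned ASHP frame.
original = pd.DataFrame(
    {"zip_cleaned": ["02632", "01360"], "rebate": [1302.08, 625.0]},
    index=[1, 2],
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
original.to_csv(buffer, sep=",", index_label="index")
buffer.seek(0)

# Read zipcodes as strings; otherwise "02632" would come back as the integer 2632.
restored = pd.read_csv(buffer, index_col="index", dtype={"zip_cleaned": str})

# Name the original index to match what was written, then compare.
pd.testing.assert_frame_equal(original.rename_axis("index"), restored)
print("round trip OK")
```

Passing `check_names=False` to `assert_frame_equal` is another option if the index name should be ignored rather than aligned.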

## Output data structure

The datasets provide different fields, so some will support richer metrics than others, and we may want to present data aggregated at multiple levels. I therefore propose the following output data structure:

- locale: str (e.g. "02186" or "Milton" or "Norfolk County")
- technology: str (e.g. "ASHP" or "Solar Panels")
- metric: str, (e.g. "Total Cost" or "Percent income support")
- metric type
- value: varied type
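
To make the proposal concrete, the sketch below reshapes a small wide table of per-locale metrics into one row per (locale, technology, metric) with pandas `melt`. The numbers are made up, and the `metric type` field is left out because its values are not yet defined above.

```python
import pandas as pd

# Made-up wide table: one row per locale, one column per metric.
wide = pd.DataFrame(
    {
        "locale": ["02186", "02632"],
        "Total Cost": [25000.0, 11575.0],
        "Total Rebates": [3000.0, 1302.08],
    }
)

# Reshape to long format: one row per locale and metric.
long_format = wide.melt(id_vars="locale", var_name="metric", value_name="value")
long_format.insert(1, "technology", "ASHP")
print(long_format)
```

A long layout like this lets datasets with different available metrics share one table, since a metric a dataset cannot provide is simply absent rather than stored as an empty column.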