## DeepSolar Data
> * Created by a research team at Stanford University: [DeepSolar Site](http://web.stanford.edu/group/deepsolar/home)
> * Used satellite imagery and deep learning to identify the GPS locations and sizes of solar photovoltaic (PV) panels.
> * Main Findings:
    * residential solar deployment density peaks at a population density of 1000 capita/mile<sup>2</sup>
    * residential solar deployment increases with annual household income, leveling off at ~$150K
    * residential solar deployment is inversely correlated with the Gini index, a measure of income inequality
    * there is a solar radiation threshold (4.5 kWh/m<sup>2</sup>/day) above which solar deployment is “triggered”
    

## NREL SEEDS II (Solar Energy Evolution and Diffusion Studies, 2017-2019) Data
* Available on the [NREL site](https://www.nrel.gov/solar/seeds/2017-2019-study.html)
* Used lidar to model the homes in an area and quantify how suitable the homes in each census tract are for rooftop PV
* The study focuses on identifying new strategies to dramatically scale up solar adoption rates in low-to-moderate income (LMI) communities, with the goal of giving LMI communities the same access to photovoltaic (PV) power that wealthier communities often enjoy.
* Develop novel, data-driven, and evidence-based strategies that could identify pathways to dramatically scale up solar adoption rates in LMI communities
* Evaluate the potential for LMI market penetration
* Validate alternative marketing techniques and ownership models that could effectively target LMI households and cost-efficiently reach scale.
* ***What we use from the data set***
    * It provides various information not found in DeepSolar
        * socioeconomic (different income levels broken down into very low, low, moderate, median, and high income)
        * climate (climate zone)
        * locale (rural, suburban, urban)
        * policy (LMI incentive programs)
        * suitability for RPV adoption (number and area of suitable roof space for a census tract)
    * both data sets contain summary data from around 2018
    * both data sets are at the census tract level


# How the Data Is Joined
* Each data set includes FIPS census tract codes
* ***Census tracts*** are subdivisions of a county, drawn so that each tract's population falls between a lower and an upper limit
<div>
<img src="./_IMG/TN_CT2.jpg" alt="Knox County census tracts" height=40% width=40% style="float: left; margin-right: 10px;" />
</div>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>


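* Each census tract code is an 11-digit number in which each portion represents a different level of geographic information: state (2 digits), county (3 digits), and tract (6 digits). A 12th digit, when present, identifies the block group.
<img src="./_IMG/FIPS_BREAKDOWN.png" alt="FIPS code breakdown" height=40% width=40% style="float: left; margin-right: 10px;" />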

<br>
<br>
<br>
<br>

* These codes are geographic ID numbers used to locate a census tract
* Reading left to right, each section of the code narrows the location to a smaller area
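
As a rough sketch of how a tract code breaks apart and how two tables can be joined on it: the helper name `split_tract_fips` and the `climate_zone` values below are made up for illustration, while the FIPS codes and solar counts are taken from the sample rows shown later in this notebook. It assumes the standard 11-digit tract layout (2-digit state, 3-digit county, 6-digit tract).

```python
import pandas as pd


def split_tract_fips(fips) -> dict:
    """Split an 11-digit census tract FIPS code into its parts (hypothetical helper)."""
    code = str(fips).zfill(11)  # restore a leading zero lost when the code was read as an integer
    if len(code) != 11 or not code.isdigit():
        raise ValueError(f"not an 11-digit tract code: {fips!r}")
    return {"state": code[:2], "county": code[2:5], "tract": code[5:]}


# Two toy tables keyed by tract code; climate_zone values are illustrative only.
deepsolar = pd.DataFrame({"fips": [27145011200, 27145011301],
                          "solar_system_count": [0, 21]})
seeds = pd.DataFrame({"fips": [27145011301, 36071012800],
                      "climate_zone": [6, 5]})

# An inner join keeps only the tracts that appear in both tables.
merged = deepsolar.merge(seeds, on="fips", how="inner")

parts = split_tract_fips(27145011200)
print(parts)
print(merged)
```

Reading the code column as text (or padding it with `zfill`) matters for states whose codes start with 0, because an integer column silently drops the leading zero.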



# Python Tools
### How to write your code
***There are many ways to write your Python code***
### IDE/text editors
* [VIM](https://opensource.com/resources/what-vim): text editor with lots of tools for quick code editing; takes some getting used to, and has no visual or typo indicators
* [Pycharm](https://www.jetbrains.com/community/education/#students): IDE that can point out errors so you can fix them quickly; can be used with VIM key bindings
* [Anaconda](https://www.anaconda.com/products/individual): gives you access to various programming tools such as Python, R, and many more
* Below are a link to a list of other development tools and a link to the VIM site

***I will be using pycharm/vim (because I'm lazy) but you can use whatever you want. Just find the one that works for you***

### Here is a site listing other development tools
* [Python development tools](https://hackr.io/blog/best-python-ide)


### VIM site
* <a href='https://opensource.com/resources/what-vim'>VIM Site</a>
      


# Ways to install other Python modules
***When working on a project, there are often already-developed modules that can do the task you need. To use them you have to install them, and there are several ways to do this. Some install the module for every Python program you run; others make it available only inside a virtual environment you create. Below are some of the common ways to do this, including how to do it in pycharm.***

## Using pip 
### ***PIP***
> [pip](https://pip.pypa.io/en/stable/installing/)


### Unix/Mac and Windows: checking for pip
<div>
<img src="./_IMG/pip_check_mac_unix.png" alt="Checking for pip on Mac/Unix" height=30% width=30% style="float: left; margin-right: 10px;" />
</div>
<div>
<img src="./_IMG/pip_check_wind.png" alt="Checking for pip on Windows" height=20% width=20% style="float: left; margin-right: 10px;" />
</div>
<br>
<br>
<br>
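
The same check can also be made from inside Python. This is a minimal sketch that uses only the standard library; the function name `pip_is_available` is made up for this example.

```python
import importlib.util
import sys


def pip_is_available() -> bool:
    """Return True if the interpreter running this code can import pip."""
    return importlib.util.find_spec("pip") is not None


if pip_is_available():
    print("pip is available for", sys.executable)
else:
    # ensurepip ships with Python and can bootstrap pip into this interpreter
    print("pip not found; try:", sys.executable, "-m ensurepip --upgrade")
```

Checking through `sys.executable` tells you about the interpreter you are actually running, which may differ from the `pip` found first on your PATH.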

### Unix/Mac and Windows: installing pip
<span>
<div>
<img src="./_IMG/pip_bootstrap_mac_unix.png" alt="Installing pip on Mac/Unix" height=30% width=30% style="float: left; margin-right: 10px; " />
</div>
<br>
<div>
<img src="./_IMG/pip_bootstrap_wind.png" alt="Installing pip on Windows" height=25% width=25% style="float: left; margin-right: 10px; padding-bottom: 100px;" />
</div>
</span>
<br>
<br>


## Installing with conda
> [conda download](https://conda.io/projects/conda/en/latest/user-guide/install/index.html)
* Install conda from the link above, or install Anaconda and open the Anaconda command line. Then you can install modules with
* ***conda install [package]***
    * so to install pandas (what we are using today): ***conda install pandas***
  



## Installing with conda in a notebook cell
<img src="./_IMG/conda_notebook.png"
     alt="conda notebook install" height=50% width=50%
     style="float: left; margin-right: 10px;" />





## Installing modules in pycharm

If using pycharm you can just go to file/settings/interpreter, press the + button in the window, and search
for what you need in the top text box. If pycharm can find it, it will appear in the list below, so you can select it and click install.
<img src="./_IMG/Installing with pycharm.png"
     alt="Installing with pycharm" height=100% width=100%
     style="float: left; margin-right: 10px;" />

## Installing with pip in a notebook cell
<img src="./_IMG/pip_notebook.png"
     alt="pip notebook install" height=50% width=50%
     style="float: left; margin-right: 10px;" />
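
In a notebook, the `%pip install pandas` magic is the usual shortcut. As a hedged alternative that also works in a plain script, the sketch below builds the pip command for the interpreter that is currently running; the helper name `pip_install_command` is made up here, and the actual install is left commented out because it needs network access.

```python
import subprocess
import sys


def pip_install_command(package: str) -> list:
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("pandas")
print(" ".join(command))

# Uncomment to actually run the install (requires network access):
# subprocess.check_call(command)
```

Going through `sys.executable -m pip` installs into the same environment the notebook kernel uses, which avoids the common problem of a package landing in a different Python installation.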

# Pandas

> Main Documentation Site: [Pandas](https://pandas.pydata.org/docs/)
## Loading Data
   > * Loading a CSV file: [read_csv()](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)
   > * Loading an SPSS (.sav) file: [read_spss()](https://pandas.pydata.org/docs/reference/api/pandas.read_spss.html)
   > * Alternative for SPSS files: [pyreadstat.read_sav()](https://www.marsja.se/how-to-read-write-spss-files-in-python-pandas/)
   > * Loading an Excel file: [read_excel()](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html)
   > * Loading an HDF5 file: [read_hdf()](https://pandas.pydata.org/docs/reference/api/pandas.read_hdf.html)
   > * Using a SQL query: [read_sql()](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
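
A small, self-contained sketch of `read_csv()` follows. It reads from an in-memory string instead of a file so it can run anywhere; the rows reuse a few values from this notebook's sample output, and the choice of columns is only for illustration.

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for a real file on disk.
csv_text = """fips,solar_system_count,total_panel_area,State
27145011200,0,0.0,mn
27145011301,21,1133.436461,mn
36071012800,72,2798.185522,ny
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["fips", "solar_system_count", "State"],  # load only these columns
    nrows=2,                                           # read only the first two data rows
    dtype={"fips": str},                               # keep tract codes as text so leading zeros survive
)
print(df)
```

`usecols` and `nrows` are handy for large files: they cut memory use and load time while you are still exploring the data.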

## Basic Data Statistics Description
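
One common way to get basic statistics is `DataFrame.describe()`. The sketch below applies it to a toy frame built from the first five rows of this notebook's sample output; it is one possible starting point, not a full analysis.

```python
import pandas as pd

df = pd.DataFrame({
    "solar_system_count": [0, 21, 3, 0, 5],
    "total_panel_area": [0.0, 1133.436461, 64.505776, 0.0, 164.583303],
})

# describe() reports count, mean, std, min, quartiles, and max for each numeric column
summary = df.describe()
print(summary)

# Individual statistics are also available as methods on a column
mean_count = df["solar_system_count"].mean()
print(mean_count)
```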

# All available variables in merged data set

In [1]:
# Imports usually go at the beginning of a program so that the outside
# methods/classes are available for the rest of the run.
# This block imports some useful tools from other modules.
# These modules can be downloaded/installed using tools such as pip and conda.
# If using pycharm you can just go to file/settings/interpreter, press the + button
# in the bottom window, and search for what you need.
import pandas as pd                           # import the pandas data frame module and call it pd
from IPython.display import display           # explicit import so display() also works outside a notebook
from _DeepSolarTools.__DEEPSOLAR_Resources import *
from _DeepSolarTools.__PAPER_RESOURCES import *

path_to_file = r"C:\Users\gjone\ConvergentDataTrainer\ConvergentMini.csv"

# Read only the first 100 rows to keep the example fast; raise nrows (e.g. 2000) to load more.
convergent_dfALL = pd.read_csv(path_to_file, low_memory=False, nrows=100)
display(convergent_dfALL)

| Unnamed: 0.1 | Unnamed: 0 | solar_system_count | total_panel_area | fips | Bachelors # | edu_college | # PhD's | # HS Grads | # Less HS | # MS's | ... | Income_x_EnergyCost | popden_x_TotOK_cnt | popden_x_TotOK_RCnt | popden_x_TotOK_Rm2 | ownership_x_TotOK_cnt | ownership_x_TotOK_Rcnt | ownership_x_TotOK_Rm2 | High_Solar_Areas | Low_Solar_Areas | DS_HighSolar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.000000 | 27145011200 | 569 | 1690 | 13 | 1757 | 336 | 157 | ... | 7588.936933 | 6.952926e+04 | 1.277294e+05 | 3.937895e+06 | 761.427170 | 1398.787108 | 43124.585296 | 0 | 1 | 0 |
| 1 | 1 | 21 | 1133.436461 | 27145011301 | 674 | 1434 | 108 | 767 | 222 | 285 | ... | 5547.918385 | 6.067308e+05 | 1.102871e+06 | 3.382866e+07 | 553.848427 | 1006.745318 | 30880.168466 | 0 | 0 | 0 |
| 2 | 2 | 3 | 64.505776 | 27145011302 | 854 | 1459 | 31 | 1541 | 289 | 276 | ... | 7061.118753 | 3.070021e+05 | 5.586140e+05 | 1.711447e+07 | 819.714385 | 1491.533392 | 45696.664436 | 0 | 1 | 0 |
| 3 | 3 | 0 | 0.000000 | 27145011304 | 640 | 1116 | 68 | 1095 | 231 | 270 | ... | 9438.881134 | 9.742360e+04 | 1.839676e+05 | 5.666547e+06 | 550.354252 | 1039.248827 | 32010.804416 | 0 | 1 | 0 |
| 4 | 4 | 5 | 164.583303 | 27145011400 | 654 | 1314 | 15 | 982 | 163 | 170 | ... | 9017.289466 | 1.392015e+05 | 2.554662e+05 | 7.981974e+06 | 474.953332 | 871.646840 | 27234.378076 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 95 | 72 | 2798.185522 | 36071012800 | 513 | 1312 | 20 | 961 | 344 | 280 | ... | 12643.234469 | 2.079548e+06 | 2.078285e+06 | 1.226586e+08 | 443.707703 | 443.438269 | 26171.335227 | 0 | 1 | 0 |
| 96 | 96 | 39 | 1231.460346 | 36071013000 | 612 | 975 | 45 | 791 | 126 | 417 | ... | 12751.824443 | 1.601780e+06 | 1.600064e+06 | 8.224574e+07 | 399.604296 | 399.176223 | 20518.262760 | 0 | 1 | 0 |
| 97 | 97 | 106 | 3739.797743 | 36071013201 | 673 | 1200 | 11 | 979 | 148 | 412 | ... | 16527.669700 | 1.937605e+06 | 1.935603e+06 | 1.212288e+08 | 573.530722 | 572.938054 | 35883.694583 | 0 | 1 | 0 |
| 98 | 98 | 76 | 2795.219740 | 36071013300 | 945 | 1577 | 38 | 1707 | 389 | 369 | ... | 18257.781024 | 4.170177e+05 | 7.819392e+05 | 2.882928e+07 | 757.867363 | 1421.057626 | 52392.902713 | 0 | 1 | 0 |


In [4]:
for v in sorted(convergent_dfALL.columns.tolist()):
    print("{},".format(v))

# 1 person households,
# 2 personhouseholds,
# 3 person households,
# 4 person households,
# > 65 yrs,
# >= 25 years of age ,
# Coal Heat,
# Electric Heat,
# HS Grads,
# Homeowner,
# Homeowners costs > $1k,
# Homeowners costs > $1k.1,
# Housing Units,
# Kerosene Heat,
# Less HS,
# MS's,
# Nonresidential Incentives,
# Nonresidential State Inc,
# Other Heat Fuel,
# PhD's,
# Residential Incentives,
# Residential State Inc,
# Some college or More,
# age 25 + 2/ HS Edu,
# age 25 + no HS Edu,
# families in poverty,
# families under poverty,
% Admin Occu,
% Agriculture Occu,
% Arts Occu,
% Commuting by bicycle,
% Commuting by car,
% Commuting by carpool,
% Commuting by motorcycle,
% Commuting by public transportation,
% Commuting by walking,
% Commuting time 10-19 mins daily,
% Commuting time 20-29 mins daily,
% Commuting time 30-39 mins daily,
% Commuting time 40-59 mins daily,
% Commuting time 60-89 mins daily,
% Commuting time < 10 mins daily,
% Construction Occu,
% Education Occu,
% Elder

In [5]:
convergent_dfALL["State"]

0     mn
1     mn
2     mn
3     mn
4     mn
      ..
95    ny
96    ny
97    ny
98    ny
99    ny
Name: State, Length: 100, dtype: object

In [2]:
print(GetMainCols("MIX"))

['age_10_14_rate', 'age_15_17_rate', 'age_18_24_rate', 'age_25_34_rate', 'age_35_44_rate', 'age_45_54_rate', 'age_5_9_rate', 'age_55_64_rate', 'age_65_74_rate', 'age_75_84_rate', 'age_median', 'age_more_than_85_rate', 'cooling_design_temperature', 'heating_design_temperature', 'travel_time_10_19_rate', 'travel_time_20_29_rate', 'travel_time_30_39_rate', 'travel_time_40_59_rate', 'travel_time_60_89_rate', 'travel_time_average', 'travel_time_less_than_10_rate', 'housing_unit_median_gross_rent', 'education_bachelor', 'education_bachelor_rate', 'education_college', 'education_college_rate', 'education_doctoral', 'education_doctoral_rate', 'education_high_school_graduate', 'education_high_school_graduate_rate', 'education_less_than_high_school', 'education_less_than_high_school_rate', 'education_master', 'education_master_rate', 'education_population', 'education_professional_school', 'education_professional_school_rate', 'number_of_years_of_education', 'employ_rate', 'avg_electricity_retai

In [3]:
for v in sorted(usecols):
    print("'{}',".format(v))

'Adoption',
'AvgSres',
'DS_HighSolar',
'Green_Travelers',
'High_Solar_Areas',
'Hot_Spots_AvgAr',
'Hot_Spots_hh',
'Hot_Spots_hown',
'Inc_x_Consmpt_kwh',
'Income_x_EnergyCost',
'Low_Solar_Areas',
'Mid West',
'NT3',
'NorthEast',
'PV_HuOwn',
'Policy_Combo',
'South',
'T3',
'URBAN',
'West',
'WestNT3',
'Yrl_savings_$',
'age_10_14_rate',
'age_15_17_rate',
'age_18_24_rate',
'age_25_34_rate',
'age_35_44_rate',
'age_45_54_rate',
'age_55_64_rate',
'age_5_9_rate',
'age_65_74_rate',
'age_75_84_rate',
'age_median',
'average_household_size',
'avg_monthly_bill_dlrs',
'avg_monthly_consumption_kwh',
'cdd',
'cdd_std',
'climate_zone',
'company_na',
'company_ty',
'cooling_design_temperature',
'cust_cnt',
'daily_solar_radiation',
'diversity',
'dlrs_kwh',
'education_bachelor',
'education_bachelor_rate',
'education_college',
'education_college_rate',
'education_doctoral',
'education_doctoral_rate',
'education_high_school_graduate',
'education_high_school_graduate_rate',
'education_less_than_high_school',
'educ

In [4]:
for v in usecols:
    if v not in convergent_dfALL.columns.tolist():
        print("'{}',".format(v))

# CSV loading of select variables

In [5]:
# Imports go at the beginning so the outside methods/classes are available for the rest of the run.
# These modules can be downloaded/installed using tools such as pip and conda,
# or from pycharm under file/settings/interpreter.
import pandas as pd                           # import the pandas data frame module and call it pd
from IPython.display import display           # explicit import so display() also works outside a notebook
from _DeepSolarTools.__DEEPSOLAR_Resources import *
from _DeepSolarTools.__PAPER_RESOURCES import *   # the star imports are assumed to provide model_varsD and label_translation_dict

path_to_file = r'./_Data/_Mixed/US_set_all_OMEGA_1_24_21_Base.csv'

# Build the list of columns to load: the 'Variables' column of the spreadsheet
# plus model_varsD, de-duplicated with set().
extra_vars = pd.read_excel(r'./_Data/MainUsecolumns/To_Add_to_model.xlsx', usecols=['Variables'])
usecols = list(set(extra_vars.values.flatten().tolist() + model_varsD))

print(usecols)
print(len(usecols))

# read_csv raises a ValueError if any name in usecols is missing from the file.
convergent_dfCSV = pd.read_csv(path_to_file, usecols=usecols, low_memory=False)
# columns= is required: a dict passed positionally renames index labels, not columns.
convergent_dfCSV.rename(columns=label_translation_dict, inplace=True)
display(convergent_dfCSV)

['Green_Travelers', 'heating_fuel_housing_unit_count', 'hh_size_1', 'popden_x_TotOK_cnt', 'avg_monthly_bill_dlrs', 'travel_time_60_89_rate', 'heating_design_temperature', 'hu_2000toafter', 'number_of_solar_system_per_household', 'mortgage_with_rate', 'cdd_std', 'occupancy_owner_rate', 'hu_monthly_owner_costs_greaterthan_1000dlrs', 'High_Solar_Areas', 'cooling_design_temperature', 'very_low_sf_own_mwh', 'Yrl_savings_$', 'education_bachelor', 'Low_Solar_Areas', 'age_25_34_rate', 'mod_sf_own_mwh', 'education_professional_school_rate', 'occupation_transportation_rate', 'AvgSres', 'pop_under_18', 'transportation_public_rate', 'locale_recode(rural)', 'heating_fuel_other_rate', 'occupation_agriculture_rate', 'hu_own', 'South', 'race_asian', 'cdd', 'hh_med_income', 'age_median', 'heating_fuel_solar_rate', 'dlrs_kwh', 'hu_mortgage', 'West', 'heating_fuel_fuel_oil_kerosene_rate', 'age_18_24_rate', 'diversity', 'company_ty', 'occupation_education_rate', 'Policy_Combo', 'poverty_family_below_pover

ValueError: Usecols do not match columns, columns expected but not found: ['race_asian', 'race_asian_rate', 'race_black_africa_rate', 'race_black_africa', 'Anti_Adoption', 'E_MINRTY', 'race_islander', 'race_two_more_rate', 'sales_tax', 'rebate', 'E_NOVEH ', 'race_white_rate', 'E_AGE17 ', 'E_DAYPOP', 'race_other', 'race_two_more', 'race_indian_alaska', 'race_islander_rate', 'race_other_rate', 'race_indian_alaska_rate']
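
One way to avoid this error is to read only the header row first and request just the columns that are actually in the file. The sketch below uses a made-up in-memory CSV and a short made-up request list; the `strip()` step is a guess prompted by names in the error message such as `'E_NOVEH '`, which carry a trailing space.

```python
import io

import pandas as pd

csv_text = "fips,cdd,E_NOVEH\n27145011200,700,12\n"    # illustrative file contents
requested = ["fips", "cdd", "E_NOVEH ", "sales_tax"]   # note the trailing space in 'E_NOVEH '

# nrows=0 reads the header only, so this is cheap even for a large file
available = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

cleaned = [name.strip() for name in requested]
present = [name for name in cleaned if name in available]
missing = [name for name in cleaned if name not in available]
print("missing:", missing)

df = pd.read_csv(io.StringIO(csv_text), usecols=present)
print(df.columns.tolist())
```

Printing the missing names instead of silently dropping them keeps it visible when a variable you expected is not in the data set.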

In [None]:
for v in convergent_dfCSV.columns.tolist():   # loop body was missing; printing each column name is assumed
    print(v)

# Excel loading of select variables

In [None]:
# Same setup as the CSV cell: import the tools first, then build the list of columns to load.
import pandas as pd                           # import the pandas data frame module and call it pd
from _DeepSolarTools.__DEEPSOLAR_Resources import *
from _DeepSolarTools.__PAPER_RESOURCES import *   # the star imports are assumed to provide model_varsD

path_to_file = r'./_Data/_Mixed/US_set_all_OMEGA_1_24_21_Base.xlsx'

# Columns to load: the 'Variables' column of the spreadsheet plus model_varsD, de-duplicated with set().
extra_vars = pd.read_excel(r'./_Data/MainUsecolumns/To_Add_to_model.xlsx', usecols=['Variables'])
usecols = list(set(extra_vars.values.flatten().tolist() + model_varsD))

print(usecols)
print(len(usecols))

# read_excel accepts a list of column names for usecols, just like read_csv
convergent_dfEX = pd.read_excel(path_to_file, usecols=usecols)

In [None]:
for v in convergent_dfCSV.columns.tolist():
    if v in label_translation_dict:
        print(v)
    else:
        print("\t\t\t{}".format(v))

In [None]:
for v in usecols:
    if v in label_translation_dict:
        print("{}:{},".format(v, label_translation_dict[v]))
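
When a translation dictionary like this is used to rename columns, it has to be passed as `columns=`. The sketch below uses a small made-up mapping in the spirit of `label_translation_dict` (the real dictionary's contents are not shown in this notebook) to show the difference.

```python
import pandas as pd

df = pd.DataFrame({"education_bachelor": [569, 674],
                   "education_doctoral": [13, 108]})

# Hypothetical mapping from raw variable names to display labels.
translation = {"education_bachelor": "Bachelors #",
               "education_doctoral": "# PhD's"}

renamed = df.rename(columns=translation)   # maps the column labels
unchanged = df.rename(translation)         # a positional dict targets the row index instead

print(renamed.columns.tolist())
print(unchanged.columns.tolist())
```

Because the row index here is just 0 and 1, the positional call finds nothing to rename and returns the frame with its original column names, which is an easy mistake to miss.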

In [None]:
display(pd.read_excel(r'./_Data/MainUsecolumns/To_Add_to_model.xlsx', usecols=['Variables']).values.flatten().tolist())

In [None]:
print(len(label_translation_dict))

In [None]:
for v in convergent_dfEX.columns.tolist():   # convergent_df is never defined; convergent_dfEX is assumed here
    print("{},".format(v))