# Linear Regression Models

Our team created three linear regression models to predict U.S. counties' estimated rates of COVID-19 vaccine hesitancy (hesitant, hesitant or unsure, and strongly hesitant; one model per rate) based on 25 features.

----------------------------- 
### **Feature Groups**

**Vaccine Availability:**

1. `the_number_of_providers` is the number of vaccine providers per FIPS code
2. `the_average_supply_level` is the average supply level across all provider locations per FIPS code

**Community Risk/Vaccination Measures:**

3. `social_vulnerability_index_svi` ranges from 0 (least) to 1 (most) and measures a community's vulnerability to disaster
4. `cvac_level_of_concern_for_vaccination_rollout` ranges from 0 (lowest) to 1 (highest) and measures the level of concern that vaccine rollout will be difficult
5. `percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21` is the share of adults fully vaccinated against COVID-19 as of June 10, 2021 (stored as a fraction, e.g. 0.435)
6. `community_transmission_level` is the rate of COVID-19 transmission: low, moderate, substantial, or high
* based on total new COVID-19 cases per 100,000 persons in the last 7 days and the percentage of positive SARS-CoV-2 nucleic acid amplification tests (NAATs) in the last 7 days; if the two indicators give different levels, the higher level is used
7. `series_complete_pop_pct` is the percent of the population fully vaccinated
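The rule of taking the higher of the two levels can be sketched as below. The numeric cut-offs are an assumption (they follow the CDC's 2021 community-transmission thresholds as we understand them) and are not part of this dataset; the function names are hypothetical.

```python
# Sketch: combine two indicators into one transmission level, keeping the higher.
# ASSUMPTION: thresholds follow the CDC's 2021 community-transmission table;
# verify against the data source before relying on them.
LEVELS = ["low", "moderate", "substantial", "high"]


def level_from_cases(cases_per_100k_7d: float) -> int:
    """Map total new cases per 100,000 persons (last 7 days) to a level index."""
    if cases_per_100k_7d < 10:
        return 0
    if cases_per_100k_7d < 50:
        return 1
    if cases_per_100k_7d < 100:
        return 2
    return 3


def level_from_positivity(pct_positive_7d: float) -> int:
    """Map NAAT percent positivity (last 7 days) to a level index."""
    if pct_positive_7d < 5:
        return 0
    if pct_positive_7d < 8:
        return 1
    if pct_positive_7d < 10:
        return 2
    return 3


def transmission_level(cases_per_100k_7d: float, pct_positive_7d: float) -> str:
    # When the two indicators disagree, the higher (more severe) level wins
    idx = max(level_from_cases(cases_per_100k_7d), level_from_positivity(pct_positive_7d))
    return LEVELS[idx]


print(transmission_level(30.0, 9.0))  # cases -> moderate, positivity -> substantial
```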

**Orders/Bans and Other Guidance:**

8. `masks_order_code` where 1 = public mask mandate; 2 = no public mask mandate
9. `general_gb_order_code` indicates the size of gatherings banned:
* 1 = No order (for gathering ban) found
* 6 = Ban of gatherings over 1-10 people
* 4 = Ban of gatherings over 26-50 people
* 2 = Ban of gatherings over 101 or more people
* 7 = Bans gatherings of any size
10. `general_or_under_6ft_bans_gatherings_over` gives the maximum number of people who can gather without social distancing:
* No ban, 0, 6, 8, 10, 50, 150, or 250
11. `stay_at_home_order_code` indicates the type of stay-at-home order:
* 7 = No order for individuals to stay home or NA
* 6 = Advisory/Recommendation
* 3 = Mandatory only for at-risk individuals in the jurisdiction
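The order codes above are category labels, yet the linear regression models below receive them as plain integers, so a move from code 1 to code 7 is treated as six equal steps. One alternative, not used in this notebook, is one-hot encoding with `pandas.get_dummies`; the toy rows here are made up for illustration.

```python
import pandas as pd

# Hypothetical toy rows using code values listed above (not real county data)
orders = pd.DataFrame({
    "masks_order_code": [1, 2, 2],
    "general_gb_order_code": [1, 6, 7],
    "stay_at_home_order_code": [7, 6, 3],
})

# Treat each code as a category rather than a number; drop_first removes one
# redundant column per feature (the dropped level becomes the baseline)
encoded = pd.get_dummies(
    orders,
    columns=["masks_order_code", "general_gb_order_code", "stay_at_home_order_code"],
    drop_first=True,
    dtype=int,
)
print(encoded.columns.tolist())
```

With `drop_first=True` the lowest code of each feature becomes the reference level, which avoids perfectly collinear dummy columns in an ordinary least-squares fit.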
    
**Health Conditions** (county averages of the estimated prevalence of each health condition among adults, 2019):

12. `avg_asthma`
13. `avg_chd` (coronary heart disease)
14. `avg_checkup` for a routine checkup within the past year
15. `avg_copd` (chronic obstructive pulmonary disease)
16. `avg_smoking`
17. `avg_depression`
18. `avg_diabetes`
19. `avg_ghlth` for fair or poor health
20. `avg_lpa` for no leisure-time physical activity
21. `avg_mhlth` for mental health not good for >=14 days
22. `avg_obesity`
23. `avg_sleep` for sleeping less than 7 hours

**Other:**

24. `fips_code` is the Federal Information Processing Standards (FIPS) code, unique to each county
25. `metro_status` indicates whether the county is metro (metropolitan) or non-metro
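Because `fips_code` enters the model as a number, the regression uses its magnitude, which is dominated by the two-digit state prefix. A hypothetical helper that splits the 5-digit code into its state and county parts is sketched below; deriving a state column this way (for example `df['fips_code'] // 1000`) is an option we did not use.

```python
def split_fips(fips_code) -> tuple[int, int]:
    """Hypothetical helper: split a 5-digit county FIPS code into (state, county) parts."""
    # First two digits = state, last three digits = county within the state
    return divmod(int(fips_code), 1000)


print(split_fips(56043))  # (56, 43): Washakie County, Wyoming
print(split_fips(10001))  # (10, 1): Kent County, Delaware
```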

# Clean Data:

In [448]:
# import dependencies
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt

In [449]:
# load file
df = pd.read_csv(Path('Resources/(new)joined_8_tables_on_county_level.csv'))
df

Unnamed: 0,county_fips_code,the_number_of_providers,the_average_supply_level,fips_code,county_name,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,10001,195,0.333333,10001,"Kent County, Delaware",0.73,0.32,0.435,0.0664,0.1391,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,10003,638,0.194357,10003,"New Castle County, Delaware",0.38,0.16,0.552,0.0564,0.1180,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,10005,295,0.196610,10005,"Sussex County, Delaware",0.40,0.12,0.569,0.0555,0.1121,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,11001,678,0.244838,11001,"District of Columbia, District of Columbia",0.60,0.17,0.546,0.0655,0.0850,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,12001,345,0.382609,12001,"Alachua County, Florida",0.47,0.63,0.547,0.1167,0.1711,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,56037,39,0.102564,56037,"Sweetwater County, Wyoming",0.37,0.56,0.348,0.2184,0.2843,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,56039,15,0.000000,56039,"Teton County, Wyoming",0.11,0.44,0.789,0.2050,0.2671,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,56041,33,-0.090909,56041,"Uinta County, Wyoming",0.45,0.48,0.388,0.2184,0.2843,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,56043,7,-0.714286,56043,"Washakie County, Wyoming",0.37,0.69,0.364,0.2283,0.2943,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667


In [450]:
df.dtypes

county_fips_code                                                    int64
the_number_of_providers                                             int64
the_average_supply_level                                          float64
fips_code                                                           int64
county_name                                                        object
social_vulnerability_index_svi                                    float64
cvac_level_of_concern_for_vaccination_rollout                     float64
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21    float64
estimated_hesitant                                                float64
estimated_hesitant_or_unsure                                      float64
estimated_strongly_hesitant                                       float64
masks_order_code                                                    int64
completeness_pct                                                  float64
metro_status                          

In [451]:
# drop redundant columns
df = df.drop(['county_fips_code', 'county_name', 'indoor_outdoor', 'completeness_pct', 'series_complete_yes', 'series_complete_pop_pct_svi', 'series_complete_pop_pct_ur_equity', 'general_gb_order_group', 'cases_per_100k_7_day_count_change'], axis=1)
df

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,estimated_strongly_hesitant,masks_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,0.0664,0.1391,0.0388,2,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,638,0.194357,10003,0.38,0.16,0.552,0.0564,0.1180,0.0329,2,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.196610,10005,0.40,0.12,0.569,0.0555,0.1121,0.0328,2,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.60,0.17,0.546,0.0655,0.0850,0.0403,2,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,0.1167,0.1711,0.0755,2,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,39,0.102564,56037,0.37,0.56,0.348,0.2184,0.2843,0.1584,2,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,15,0.000000,56039,0.11,0.44,0.789,0.2050,0.2671,0.1472,2,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,33,-0.090909,56041,0.45,0.48,0.388,0.2184,0.2843,0.1584,2,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,7,-0.714286,56043,0.37,0.69,0.364,0.2283,0.2943,0.1687,2,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667


In [452]:
## check total nulls per column
df.isnull().sum()

the_number_of_providers                                             0
the_average_supply_level                                            0
fips_code                                                           0
social_vulnerability_index_svi                                      1
cvac_level_of_concern_for_vaccination_rollout                       0
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21    214
estimated_hesitant                                                  0
estimated_hesitant_or_unsure                                        0
estimated_strongly_hesitant                                         0
masks_order_code                                                    0
metro_status                                                        0
series_complete_pop_pct                                             7
general_gb_order_code                                               0
general_or_under_6ft_bans_gatherings_over                         201
stay_at_home_order_c
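Raw null counts hide how much of each column is missing. A small sketch on a toy frame shows the missing share per column and how many rows a blanket `dropna()` would remove; running the same two expressions on `df` would show which columns drive row loss.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df (values are illustrative only)
toy = pd.DataFrame({
    "svi": [0.73, np.nan, 0.40, 0.60],
    "pct_vaccinated": [0.435, np.nan, np.nan, 0.546],
    "providers": [195, 638, 295, 678],
})

# Fraction of missing values per column, largest first
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)

# Number of rows that a blanket dropna() would remove
rows_with_missing = int(toy.isnull().any(axis=1).sum())
print(rows_with_missing)  # 2
```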

In [453]:
# fill nulls: 'percent_adults_fully_vaccinated...' is continuous, so replace missing values with the column mean (mean imputation)
from sklearn.impute import SimpleImputer
# add_indicator is left off: with add_indicator=True, fit_transform also returns a
# missing-indicator column, which does not fit back into a single DataFrame column
mean_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

vaccine_col = 'percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21'
# double brackets pass a 2-D (n_rows, 1) input; ravel() flattens the result back to 1-D
df[vaccine_col] = mean_imputer.fit_transform(df[[vaccine_col]]).ravel()
df

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,estimated_strongly_hesitant,masks_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,0.0664,0.1391,0.0388,2,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,638,0.194357,10003,0.38,0.16,0.552,0.0564,0.1180,0.0329,2,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.196610,10005,0.40,0.12,0.569,0.0555,0.1121,0.0328,2,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.60,0.17,0.546,0.0655,0.0850,0.0403,2,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,0.1167,0.1711,0.0755,2,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,39,0.102564,56037,0.37,0.56,0.348,0.2184,0.2843,0.1584,2,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,15,0.000000,56039,0.11,0.44,0.789,0.2050,0.2671,0.1472,2,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,33,-0.090909,56041,0.45,0.48,0.388,0.2184,0.2843,0.1584,2,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,7,-0.714286,56043,0.37,0.69,0.364,0.2283,0.2943,0.1687,2,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667
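The imputer above is fit on the full dataset before the train/test split, so test-set values contribute to the fill-in mean. A minimal sketch, on synthetic data, of keeping imputation inside a scikit-learn `Pipeline` so the mean is learned from training rows only; the `*_demo` names are placeholders, not notebook variables.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic data with roughly 10% of feature values knocked out
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=200)
X_demo[rng.random(X_demo.shape) < 0.1] = np.nan

X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=1)

# The imputer's column means are learned inside fit(), i.e. from training rows only
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(X_train_demo, y_train_demo)
r2_demo = model.score(X_test_demo, y_test_demo)  # test rows are filled with training means
print(f"test R2: {r2_demo:.3f}")
```

The same pipeline object can be passed to cross-validation helpers, which then refit the imputer on each training fold.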


In [454]:
## check total nulls per column
df.isnull().sum()

the_number_of_providers                                             0
the_average_supply_level                                            0
fips_code                                                           0
social_vulnerability_index_svi                                      1
cvac_level_of_concern_for_vaccination_rollout                       0
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21      0
estimated_hesitant                                                  0
estimated_hesitant_or_unsure                                        0
estimated_strongly_hesitant                                         0
masks_order_code                                                    0
metro_status                                                        0
series_complete_pop_pct                                             7
general_gb_order_code                                               0
general_or_under_6ft_bans_gatherings_over                         201
stay_at_home_order_c

In [455]:
# drop nulls
df = df.dropna()
df

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,estimated_strongly_hesitant,masks_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,0.0664,0.1391,0.0388,2,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,638,0.194357,10003,0.38,0.16,0.552,0.0564,0.1180,0.0329,2,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.196610,10005,0.40,0.12,0.569,0.0555,0.1121,0.0328,2,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.60,0.17,0.546,0.0655,0.0850,0.0403,2,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,0.1167,0.1711,0.0755,2,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,39,0.102564,56037,0.37,0.56,0.348,0.2184,0.2843,0.1584,2,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,15,0.000000,56039,0.11,0.44,0.789,0.2050,0.2671,0.1472,2,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,33,-0.090909,56041,0.45,0.48,0.388,0.2184,0.2843,0.1584,2,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,7,-0.714286,56043,0.37,0.69,0.364,0.2283,0.2943,0.1687,2,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667


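##### About 229 rows (roughly 9%) are dropped here: 2,620 rows become 2,391, which matches the 1,793-row training set produced by the default 75/25 split below. We do not expect this to have a large impact given the amount of data remaining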
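Rather than estimating how many rows `dropna()` removes, the count can be recorded directly. A hypothetical helper that wraps `dropna()` and returns the number removed, shown on a toy frame:

```python
import numpy as np
import pandas as pd


def drop_missing_rows(frame: pd.DataFrame) -> tuple[pd.DataFrame, int]:
    """Hypothetical helper: drop rows containing any missing value and report how many were removed."""
    cleaned = frame.dropna()
    return cleaned, len(frame) - len(cleaned)


# Toy frame: row 1 has NaN, row 2 has None; both count as missing
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})
cleaned, n_dropped = drop_missing_rows(toy)
print(f"{n_dropped} rows dropped; {len(cleaned)} remain")  # 2 rows dropped; 1 remain
```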

# Pre-process/Encode Data:

In [456]:
# view data types
df.dtypes

the_number_of_providers                                             int64
the_average_supply_level                                          float64
fips_code                                                           int64
social_vulnerability_index_svi                                    float64
cvac_level_of_concern_for_vaccination_rollout                     float64
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21    float64
estimated_hesitant                                                float64
estimated_hesitant_or_unsure                                      float64
estimated_strongly_hesitant                                       float64
masks_order_code                                                    int64
metro_status                                                       object
series_complete_pop_pct                                           float64
general_gb_order_code                                               int64
general_or_under_6ft_bans_gatherings_o

In [457]:
# show columns with object data types and their unique values count
df_cols_to_encode = df.dtypes[df.dtypes == 'object'].index
df[df_cols_to_encode].nunique()

metro_status                                 2
general_or_under_6ft_bans_gatherings_over    8
community_transmission_level                 4
dtype: int64

In [458]:
# print unique values in object columns
for i in df_cols_to_encode:
    print(f"{i}: {df[i].unique()}\n")

metro_status: ['Metro' 'Non-metro']

general_or_under_6ft_bans_gatherings_over: ['No ban' '50' '150' '250' '10' '6' '8' '0']

community_transmission_level: ['moderate' 'low' 'substantial' 'high']



In [459]:
# label encoder for object type columns
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df_new = df.copy()
df_new['metro_status'] = le.fit_transform(df_new['metro_status'])
df_new['general_or_under_6ft_bans_gatherings_over'] = le.fit_transform(df_new['general_or_under_6ft_bans_gatherings_over'])
df_new['community_transmission_level'] = le.fit_transform(df_new['community_transmission_level'])
df_new

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,estimated_strongly_hesitant,masks_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,0.0664,0.1391,0.0388,2,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,638,0.194357,10003,0.38,0.16,0.552,0.0564,0.1180,0.0329,2,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.196610,10005,0.40,0.12,0.569,0.0555,0.1121,0.0328,2,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.60,0.17,0.546,0.0655,0.0850,0.0403,2,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,0.1167,0.1711,0.0755,2,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,39,0.102564,56037,0.37,0.56,0.348,0.2184,0.2843,0.1584,2,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,15,0.000000,56039,0.11,0.44,0.789,0.2050,0.2671,0.1472,2,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,33,-0.090909,56041,0.45,0.48,0.388,0.2184,0.2843,0.1584,2,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,7,-0.714286,56043,0.37,0.69,0.364,0.2283,0.2943,0.1687,2,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667
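`LabelEncoder` assigns integers in sorted (alphabetical) order, which does not follow the natural order of these categories. The sketch below shows the classes it produces for the values printed above and an explicit severity mapping as one alternative; the R² scores reported later were computed with the `LabelEncoder` version, and re-encoding could change them. scikit-learn's `OrdinalEncoder` with an explicit `categories` list is another option for feature columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# LabelEncoder sorts classes alphabetically, so 'high' becomes 0
levels = pd.Series(["moderate", "low", "substantial", "high"])
print(LabelEncoder().fit(levels).classes_.tolist())
# ['high', 'low', 'moderate', 'substantial']

# An explicit mapping keeps the severity order low < moderate < substantial < high
transmission_order = {"low": 0, "moderate": 1, "substantial": 2, "high": 3}
print(levels.map(transmission_order).tolist())  # [1, 0, 2, 3]

# The gathering limits are strings, so '250' sorts before '50' and 'No ban' comes last
limits = pd.Series(["No ban", "50", "150", "250", "10", "6", "8", "0"])
print(LabelEncoder().fit(limits).classes_.tolist())
# ['0', '10', '150', '250', '50', '6', '8', 'No ban']
```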


In [460]:
df_new.dtypes

the_number_of_providers                                             int64
the_average_supply_level                                          float64
fips_code                                                           int64
social_vulnerability_index_svi                                    float64
cvac_level_of_concern_for_vaccination_rollout                     float64
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21    float64
estimated_hesitant                                                float64
estimated_hesitant_or_unsure                                      float64
estimated_strongly_hesitant                                       float64
masks_order_code                                                    int64
metro_status                                                        int32
series_complete_pop_pct                                           float64
general_gb_order_code                                               int64
general_or_under_6ft_bans_gatherings_o

# Define Feature Set and Three Targets:

In [461]:
# define the feature set used for all three targets
X = df_new.copy()
X = X.drop(['estimated_hesitant', 'estimated_hesitant_or_unsure', 'estimated_strongly_hesitant'], axis=1)
X.head()

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,masks_order_code,metro_status,series_complete_pop_pct,general_gb_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,2,0,34.9,1,...,80.59375,8.821875,19.321875,20.575,12.378125,20.303125,31.053125,16.38125,40.696875,38.54375
1,638,0.194357,10003,0.38,0.16,0.552,2,0,45.5,1,...,79.19845,7.067442,16.818605,20.256589,10.81938,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.19661,10005,0.4,0.12,0.569,2,0,47.6,1,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.6,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.6,0.17,0.546,2,0,46.5,1,...,79.094944,4.977528,15.8,20.562921,9.938202,16.479775,20.89382,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,2,0,46.6,1,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.6375


In [462]:
# set target for each model
hes_y = df_new['estimated_hesitant']

hes_unsure_y = df_new['estimated_hesitant_or_unsure']

strg_hes_y = df_new['estimated_strongly_hesitant']
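`LinearRegression` also accepts a 2-D target and fits one independent least-squares model per column. Assuming the three targets share the feature set `X` (as they do here, with the same `random_state=1` split), one fit on `df_new[['estimated_hesitant', 'estimated_hesitant_or_unsure', 'estimated_strongly_hesitant']]` should give the same coefficients as three separate fits. The synthetic check below illustrates this.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in: 4 features, 3 targets
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 4))
Y_demo = X_demo @ rng.normal(size=(4, 3)) + rng.normal(scale=0.1, size=(100, 3))

# One fit with a 2-D target gives one row of coefficients per target
joint = LinearRegression().fit(X_demo, Y_demo)
print(joint.coef_.shape)  # (3, 4)

# Same coefficients as fitting each target on its own
separate = np.vstack([LinearRegression().fit(X_demo, Y_demo[:, k]).coef_ for k in range(3)])
print(np.allclose(joint.coef_, separate))  # True
```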

# Train, Test, Evaluate Model
##  Target: Estimated Hesitant

In [463]:
# split data into training and testing
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    hes_y, 
                                                    random_state=1)
X_train.shape

(1793, 25)

In [464]:
# fit the model
from sklearn.linear_model import LinearRegression
classifier = LinearRegression().fit(X_train, y_train)

In [465]:
# predict with test data
y_pred = classifier.predict(X_test)
y_pred

array([0.15746501, 0.12209818, 0.14599018, 0.10012311, 0.08334795,
       0.12326321, 0.14624001, 0.09732415, 0.1287034 , 0.11222926,
       0.0622016 , 0.13122898, 0.14809121, 0.12921727, 0.17486518,
       0.13639113, 0.13946812, 0.08231533, 0.15532882, 0.13031898,
       0.11185762, 0.08242689, 0.11780875, 0.09998068, 0.11027831,
       0.1335816 , 0.1282838 , 0.09477465, 0.12903443, 0.12860962,
       0.10260331, 0.16421472, 0.08679773, 0.12635427, 0.10224537,
       0.18383562, 0.13503997, 0.07917969, 0.11906953, 0.11737223,
       0.16267667, 0.18678594, 0.1503613 , 0.12262256, 0.12731231,
       0.15560804, 0.09716048, 0.12331839, 0.12386725, 0.17759314,
       0.16062532, 0.11404845, 0.08779591, 0.12247822, 0.12734013,
       0.1715175 , 0.09363228, 0.18303104, 0.11090096, 0.12947047,
       0.17885431, 0.11110427, 0.08649926, 0.11873854, 0.13731168,
       0.17753192, 0.12337586, 0.15937838, 0.13095125, 0.09490221,
       0.09095727, 0.20390206, 0.16331793, 0.1211531 , 0.11982

In [466]:
# Model Evaluation:
from sklearn.metrics import r2_score
print("\n*****Linear Regression Model Evaluation*****")
print("---------------------------------------")
print("Target Predicted : Estimated Hesitant\n")
print(f"R2 Score : {r2_score(y_test, y_pred)}")


*****Linear Regression Model Evaluation*****
---------------------------------------
Target Predicted : Estimated Hesitant

R2 Score : 0.5616565516951978
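An R² from a single split depends on which counties happen to land in the test set (`random_state=1` here). K-fold cross-validation gives a spread of scores instead. The sketch uses synthetic data; on this notebook the equivalent call would be `cross_val_score(LinearRegression(), X, hes_y, cv=cv, scoring='r2')`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for (X, hes_y)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 5))
y_demo = X_demo @ np.array([0.3, 0.0, -0.1, 0.2, 0.05]) + rng.normal(scale=0.2, size=300)

# Five shuffled folds: each row is used for testing exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=cv, scoring="r2")
print(f"R2 per fold: {np.round(scores, 3)}")
print(f"mean {scores.mean():.3f} +/- {scores.std():.3f}")
```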


# Train, Test, Evaluate Model
##  Target: Estimated Hesitant or Unsure

In [467]:
# split data into training and testing
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    hes_unsure_y, 
                                                    random_state=1)
X_train.shape

(1793, 25)

In [468]:
# fit the model
from sklearn.linear_model import LinearRegression
classifier = LinearRegression().fit(X_train, y_train)

In [469]:
# predict with test data
y_pred = classifier.predict(X_test)
y_pred

array([0.20143881, 0.18675944, 0.21511788, 0.15056862, 0.12374592,
       0.18020279, 0.21549734, 0.17115735, 0.17914885, 0.16369924,
       0.11806718, 0.1724288 , 0.20332604, 0.18252733, 0.24026443,
       0.20242539, 0.19720798, 0.13967956, 0.2091245 , 0.18244383,
       0.18070525, 0.153656  , 0.16478062, 0.15233761, 0.16532708,
       0.19332082, 0.17856427, 0.17644316, 0.19618939, 0.16659549,
       0.14708906, 0.21864617, 0.13973277, 0.18847961, 0.16774575,
       0.25712332, 0.19097946, 0.1297267 , 0.16800122, 0.17558267,
       0.2336135 , 0.24901979, 0.21292456, 0.18610292, 0.20794288,
       0.21257636, 0.14514269, 0.18011433, 0.17671547, 0.24614457,
       0.22688504, 0.1752409 , 0.13416516, 0.19905392, 0.19145842,
       0.2472224 , 0.14035231, 0.23751213, 0.18296007, 0.19562639,
       0.25467831, 0.17281308, 0.12688881, 0.17760374, 0.18945217,
       0.2442825 , 0.17606832, 0.23376847, 0.18358386, 0.14467798,
       0.13627236, 0.28438149, 0.22908147, 0.17104074, 0.18445

In [470]:
# Model Evaluation:
from sklearn.metrics import r2_score
print("\n*****Linear Regression Model Evaluation*****")
print("---------------------------------------")
print("Target Predicted : Estimated Hesitant or Unsure\n")
print(f"R2 Score : {r2_score(y_test, y_pred)}")


*****Linear Regression Model Evaluation*****
---------------------------------------
Target Predicted : Estimated Hesitant or Unsure

R2 Score : 0.5746196545497282
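R² alone does not say how far predictions are off in the target's own units (a fraction of adults). A sketch on synthetic data that compares a linear model with a predict-the-training-mean baseline and reports MAE and RMSE alongside R². The baseline's test R² is at or below zero by construction, which gives a reference point for the scores above.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one hesitancy target (values around 0.15)
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(400, 3))
y_demo = 0.15 + X_demo @ np.array([0.03, -0.02, 0.01]) + rng.normal(scale=0.02, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=1)

results = {}
for name, model in [("mean baseline", DummyRegressor(strategy="mean")),
                    ("linear regression", LinearRegression())]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = r2_score(y_te, pred)
    rmse = np.sqrt(mean_squared_error(y_te, pred))  # RMSE, in the target's units
    mae = mean_absolute_error(y_te, pred)
    print(f"{name:<18} R2={results[name]:.3f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```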


# Train, Test, Evaluate Model
##  Target: Estimated Strongly Hesitant

In [471]:
# split data into training and testing
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    strg_hes_y, 
                                                    random_state=1)
X_train.shape

(1793, 25)

In [472]:
# fit the model
from sklearn.linear_model import LinearRegression
classifier = LinearRegression().fit(X_train, y_train)

In [473]:
# predict with test data
y_pred = classifier.predict(X_test)
y_pred

array([0.10483921, 0.07866085, 0.0997444 , 0.07019202, 0.06251902,
       0.07344432, 0.0949849 , 0.06424531, 0.08125423, 0.0641979 ,
       0.03924213, 0.07918346, 0.09820071, 0.07671889, 0.11570516,
       0.08569315, 0.08536291, 0.05190007, 0.10016508, 0.07947666,
       0.07510344, 0.04948406, 0.0730941 , 0.05809878, 0.06606627,
       0.08967256, 0.08761935, 0.06488074, 0.084642  , 0.08097962,
       0.07069632, 0.1109247 , 0.06000013, 0.08820782, 0.06130143,
       0.11883434, 0.08576888, 0.05605666, 0.07393564, 0.07336534,
       0.10444573, 0.1291541 , 0.10202392, 0.07829845, 0.08554262,
       0.1086412 , 0.0548254 , 0.07410593, 0.08790839, 0.12233508,
       0.10744638, 0.07119777, 0.0576266 , 0.07709449, 0.08178288,
       0.11503362, 0.05687345, 0.12479037, 0.07257712, 0.0807751 ,
       0.11634115, 0.06717452, 0.06459309, 0.07977506, 0.09186975,
       0.11528901, 0.07166815, 0.10269005, 0.08725249, 0.05321178,
       0.05971104, 0.13450855, 0.10557044, 0.07366987, 0.07398

In [474]:
# Model Evaluation:
from sklearn.metrics import r2_score
print("\n*****Linear Regression Model Evaluation*****")
print("---------------------------------------")
print("Target Predicted : Estimated Strongly Hesitant")
print(f"R2 Score : {r2_score(y_test, y_pred)}")


*****Linear Regression Model Evaluation*****
---------------------------------------
Target Predicted : Estimated Strongly Hesitant
R2 Score : 0.5428491364996555
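To see which features drive these predictions, raw coefficients are misleading because the features sit on very different scales (provider counts in the hundreds, SVI between 0 and 1). A sketch, on hypothetical synthetic columns, that standardizes features before fitting so each coefficient reflects a one-standard-deviation change. Predictions are unchanged by the scaling, but correlated features (for example the two vaccination measures or the health-condition averages) can still make individual coefficients unstable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (hypothetical names)
rng = np.random.default_rng(3)
X_demo = pd.DataFrame({
    "feature_a": rng.normal(0, 1, 250),     # unit scale
    "feature_b": rng.normal(0, 50, 250),    # large scale, like provider counts
    "feature_c": rng.normal(0, 0.01, 250),  # tiny scale, no real effect
})
y_demo = 0.02 * X_demo["feature_a"] + 0.0002 * X_demo["feature_b"] + rng.normal(0, 0.01, 250)

# Standardize inside the pipeline, then read coefficients per 1 SD change
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X_demo, y_demo)
coefs = pd.Series(pipe.named_steps["linearregression"].coef_, index=X_demo.columns)
print(coefs.reindex(coefs.abs().sort_values(ascending=False).index))
```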
