# Linear Regression Models

Our team built three linear regression models that predict, for each U.S. county, the estimated share of residents who are hesitant, hesitant or unsure, and strongly hesitant about COVID-19 vaccination. All three models use the same 25 features.

----------------------------- 
### **Feature Groups**

**Vaccine Availability:**
1. `the_number_of_providers` per FIPS code
2. `the_average_supply_level` for all location providers per FIPS code

**Community Risk/Vaccination Measures:**

3. `social_vulnerability_index_svi` ranges from 0 (least vulnerable) to 1 (most vulnerable) and measures a community's vulnerability to disaster
4. `cvac_level_of_concern_for_vaccination_rollout` ranges from 0 (lowest concern) to 1 (highest concern) and measures how difficult vaccine rollout is expected to be
5. `percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21`
6. `community_transmission_level` is one of low, moderate, substantial, or high:
* based on total new COVID-19 cases per 100,000 persons in the last 7 days and percentage of positive SARS-CoV-2 diagnostic nucleic acid amplification tests (NAAT) in the last 7 days (the higher of the two levels if they differ)
7. `completeness_pct` is the percent of fully vaccinated people in the jurisdiction whose records report a valid county FIPS code

**Orders/Bans and Other Guidance:**

8. `masks_order_code`, where 1 = public mask mandate; 2 = no public mask mandate
9. `general_gb_order_code` indicates size of gatherings banned:
* 1 = No order (for gathering ban) found
* 6 = Ban of gatherings over 1-10 people
* 4 = Ban of gatherings over 26-50 people
* 2 = Ban of gatherings over 101 or more people
* 7 = Bans gatherings of any size
10. `general_or_under_6ft_bans_gatherings_over` tells max number of people that can gather without social distancing:
* No ban, 0, 6, 8, 10, 50, 150, or 250
11. `stay_at_home_order_code` based on stay at home order recommendation:
* 7 = No order for individuals to stay home or NA 
* 6 = Advisory/Recommendation
* 3 = Mandatory only for at-risk individuals in the jurisdiction
    
**Health Conditions** (county-average estimated prevalence of each health condition among adults, 2019):

12. `avg_asthma`
13. `avg_chd` (coronary heart disease)
14. `avg_checkup`
15. `avg_copd` (chronic obstructive pulmonary disease)
16. `avg_smoking`
17. `avg_depression`
18. `avg_diabetes`
19. `avg_ghlth` for fair or poor health
20. `avg_lpa` for no leisure-time physical activity
21. `avg_mhlth` for mental health not good for >=14 days
22. `avg_obesity`
23. `avg_sleep` for sleeping less than 7 hours

**Other:**

24. `fips_code` is Federal Information Processing Standards (FIPS) Code; unique to each county
25. `metro_status` for metro (metropolitan county) vs non-metro

# Clean Data:

In [1]:
# import dependencies
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt

In [2]:
# load file
df = pd.read_csv(Path('Resources/(new)joined_8_tables_on_county_level.csv'))
df

Unnamed: 0,county_fips_code,the_number_of_providers,the_average_supply_level,fips_code,county_name,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,10001,195,0.333333,10001,"Kent County, Delaware",0.73,0.32,0.435,0.0664,0.1391,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,10003,638,0.194357,10003,"New Castle County, Delaware",0.38,0.16,0.552,0.0564,0.1180,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,10005,295,0.196610,10005,"Sussex County, Delaware",0.40,0.12,0.569,0.0555,0.1121,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,11001,678,0.244838,11001,"District of Columbia, District of Columbia",0.60,0.17,0.546,0.0655,0.0850,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,12001,345,0.382609,12001,"Alachua County, Florida",0.47,0.63,0.547,0.1167,0.1711,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,56037,39,0.102564,56037,"Sweetwater County, Wyoming",0.37,0.56,0.348,0.2184,0.2843,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,56039,15,0.000000,56039,"Teton County, Wyoming",0.11,0.44,0.789,0.2050,0.2671,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,56041,33,-0.090909,56041,"Uinta County, Wyoming",0.45,0.48,0.388,0.2184,0.2843,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,56043,7,-0.714286,56043,"Washakie County, Wyoming",0.37,0.69,0.364,0.2283,0.2943,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667


In [3]:
df.dtypes

county_fips_code                                                    int64
the_number_of_providers                                             int64
the_average_supply_level                                          float64
fips_code                                                           int64
county_name                                                        object
social_vulnerability_index_svi                                    float64
cvac_level_of_concern_for_vaccination_rollout                     float64
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21    float64
estimated_hesitant                                                float64
estimated_hesitant_or_unsure                                      float64
estimated_strongly_hesitant                                       float64
masks_order_code                                                    int64
completeness_pct                                                  float64
metro_status                          

In [4]:
# drop redundant columns
df = df.drop(['county_fips_code', 'county_name', 'indoor_outdoor', 'series_complete_pop_pct', 'series_complete_yes', 'series_complete_pop_pct_svi', 'series_complete_pop_pct_ur_equity', 'general_gb_order_group', 'cases_per_100k_7_day_count_change'], axis=1)
df

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,estimated_strongly_hesitant,masks_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,0.0664,0.1391,0.0388,2,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,638,0.194357,10003,0.38,0.16,0.552,0.0564,0.1180,0.0329,2,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.196610,10005,0.40,0.12,0.569,0.0555,0.1121,0.0328,2,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.60,0.17,0.546,0.0655,0.0850,0.0403,2,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,0.1167,0.1711,0.0755,2,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,39,0.102564,56037,0.37,0.56,0.348,0.2184,0.2843,0.1584,2,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,15,0.000000,56039,0.11,0.44,0.789,0.2050,0.2671,0.1472,2,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,33,-0.090909,56041,0.45,0.48,0.388,0.2184,0.2843,0.1584,2,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,7,-0.714286,56043,0.37,0.69,0.364,0.2283,0.2943,0.1687,2,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667


In [5]:
## check total nulls per column
df.isnull().sum()

the_number_of_providers                                             0
the_average_supply_level                                            0
fips_code                                                           0
social_vulnerability_index_svi                                      1
cvac_level_of_concern_for_vaccination_rollout                       0
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21    214
estimated_hesitant                                                  0
estimated_hesitant_or_unsure                                        0
estimated_strongly_hesitant                                         0
masks_order_code                                                    0
completeness_pct                                                    0
metro_status                                                        0
general_gb_order_code                                               0
general_or_under_6ft_bans_gatherings_over                         201
stay_at_home_order_c

In [6]:
# fill nulls: 'percent_adults_fully_vaccinated' is continuous, so fill with the column mean (mean imputation)
from sklearn.impute import SimpleImputer
# np.nan replaces the np.NaN alias, which NumPy 2.0 removed;
# add_indicator is left off so fit_transform returns a single column
mean_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# mode imputer for categorical columns (defined here but not used below)
mode_imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')

vaccine_col = 'percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21'
# fit_transform expects 2-D input and returns an (n, 1) array; ravel() flattens it to 1-D
df[vaccine_col] = mean_imputer.fit_transform(df[[vaccine_col]]).ravel()
df

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,estimated_strongly_hesitant,masks_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,0.0664,0.1391,0.0388,2,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,638,0.194357,10003,0.38,0.16,0.552,0.0564,0.1180,0.0329,2,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.196610,10005,0.40,0.12,0.569,0.0555,0.1121,0.0328,2,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.60,0.17,0.546,0.0655,0.0850,0.0403,2,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,0.1167,0.1711,0.0755,2,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,39,0.102564,56037,0.37,0.56,0.348,0.2184,0.2843,0.1584,2,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,15,0.000000,56039,0.11,0.44,0.789,0.2050,0.2671,0.1472,2,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,33,-0.090909,56041,0.45,0.48,0.388,0.2184,0.2843,0.1584,2,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,7,-0.714286,56043,0.37,0.69,0.364,0.2283,0.2943,0.1687,2,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667
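
To make the imputation step concrete, here is a minimal sketch on a made-up column (the column name and values are illustrative, not taken from the dataset). It shows that mean imputation replaces each missing entry with the mean of the observed entries, and that `add_indicator=True` appends a second column flagging which rows were imputed, so in that case only the first output column holds the imputed values.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# toy column with two missing values (illustrative numbers only)
toy = pd.DataFrame({"pct_vaccinated": [0.40, np.nan, 0.60, 0.50, np.nan]})

# plain mean imputation: the output has one column
plain = SimpleImputer(strategy="mean").fit_transform(toy[["pct_vaccinated"]])

# with add_indicator=True, a 0/1 "was missing" column is appended
flagged = SimpleImputer(strategy="mean", add_indicator=True).fit_transform(
    toy[["pct_vaccinated"]]
)

print(plain.shape, flagged.shape)  # one column vs. two columns
print(plain.ravel())               # missing entries replaced by the observed mean
print(flagged[:, 1])               # indicator column: 1 where a value was imputed
```

If the indicator column is wanted as a feature, it would need to be stored in its own DataFrame column rather than assigned together with the imputed values.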


In [7]:
## check total nulls per column
df.isnull().sum()

the_number_of_providers                                             0
the_average_supply_level                                            0
fips_code                                                           0
social_vulnerability_index_svi                                      1
cvac_level_of_concern_for_vaccination_rollout                       0
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21      0
estimated_hesitant                                                  0
estimated_hesitant_or_unsure                                        0
estimated_strongly_hesitant                                         0
masks_order_code                                                    0
completeness_pct                                                    0
metro_status                                                        0
general_gb_order_code                                               0
general_or_under_6ft_bans_gatherings_over                         201
stay_at_home_order_c

In [8]:
# drop nulls
df = df.dropna()
df

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,estimated_strongly_hesitant,masks_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,0.0664,0.1391,0.0388,2,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,638,0.194357,10003,0.38,0.16,0.552,0.0564,0.1180,0.0329,2,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.196610,10005,0.40,0.12,0.569,0.0555,0.1121,0.0328,2,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.60,0.17,0.546,0.0655,0.0850,0.0403,2,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,0.1167,0.1711,0.0755,2,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,39,0.102564,56037,0.37,0.56,0.348,0.2184,0.2843,0.1584,2,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,15,0.000000,56039,0.11,0.44,0.789,0.2050,0.2671,0.1472,2,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,33,-0.090909,56041,0.45,0.48,0.388,0.2184,0.2843,0.1584,2,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,7,-0.714286,56043,0.37,0.69,0.364,0.2283,0.2943,0.1687,2,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667


##### 223 rows with nulls were dropped. Because the large majority of counties remain, we do not expect this to materially affect the models.
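
The dropped-row count can be checked directly instead of being read off two outputs. The sketch below does this on a small made-up frame (the column names and values are assumptions for illustration); the same three lines would apply to `df` before and after `dropna()`.

```python
import numpy as np
import pandas as pd

# toy frame standing in for the county table (values are illustrative)
toy = pd.DataFrame({
    "svi": [0.73, np.nan, 0.40, 0.60],
    "gathering_limit": ["No ban", "10", None, "50"],
    "avg_sleep": [38.5, 36.7, 33.4, 37.9],
})

rows_before = len(toy)
nulls_per_column = toy.isnull().sum()  # same check as in the notebook
cleaned = toy.dropna()                 # drops every row with at least one null
rows_dropped = rows_before - len(cleaned)

print(nulls_per_column[nulls_per_column > 0])
print(f"dropped {rows_dropped} of {rows_before} rows ({rows_dropped / rows_before:.1%})")
```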

# Pre-process/Encode Data:

In [9]:
# view data types
df.dtypes

the_number_of_providers                                             int64
the_average_supply_level                                          float64
fips_code                                                           int64
social_vulnerability_index_svi                                    float64
cvac_level_of_concern_for_vaccination_rollout                     float64
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21    float64
estimated_hesitant                                                float64
estimated_hesitant_or_unsure                                      float64
estimated_strongly_hesitant                                       float64
masks_order_code                                                    int64
completeness_pct                                                  float64
metro_status                                                       object
general_gb_order_code                                               int64
general_or_under_6ft_bans_gatherings_o

In [10]:
# show columns with object data types and their unique values count
df_cols_to_encode = df.dtypes[df.dtypes == 'object'].index
df[df_cols_to_encode].nunique()

metro_status                                 2
general_or_under_6ft_bans_gatherings_over    8
community_transmission_level                 4
dtype: int64

In [11]:
# print unique values in object columns
for i in df_cols_to_encode:
    print(f"{i}: {df[i].unique()}\n")

metro_status: ['Metro' 'Non-metro']

general_or_under_6ft_bans_gatherings_over: ['No ban' '10' '50' '150' '250' '6' '8' '0']

community_transmission_level: ['moderate' 'low' 'substantial' 'high']



In [12]:
# label encoder for object type columns
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df_new = df.copy()
df_new['metro_status'] = le.fit_transform(df_new['metro_status'])
df_new['general_or_under_6ft_bans_gatherings_over'] = le.fit_transform(df_new['general_or_under_6ft_bans_gatherings_over'])
df_new['community_transmission_level'] = le.fit_transform(df_new['community_transmission_level'])
df_new

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,estimated_hesitant,estimated_hesitant_or_unsure,estimated_strongly_hesitant,masks_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,0.0664,0.1391,0.0388,2,...,80.593750,8.821875,19.321875,20.575000,12.378125,20.303125,31.053125,16.381250,40.696875,38.543750
1,638,0.194357,10003,0.38,0.16,0.552,0.0564,0.1180,0.0329,2,...,79.198450,7.067442,16.818605,20.256589,10.819380,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.196610,10005,0.40,0.12,0.569,0.0555,0.1121,0.0328,2,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.600000,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.60,0.17,0.546,0.0655,0.0850,0.0403,2,...,79.094944,4.977528,15.800000,20.562921,9.938202,16.479775,20.893820,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,0.1167,0.1711,0.0755,2,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.637500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2616,39,0.102564,56037,0.37,0.56,0.348,0.2184,0.2843,0.1584,2,...,65.275000,5.983333,18.658333,19.225000,7.541667,15.700000,24.366667,13.900000,33.166667,35.808333
2617,15,0.000000,56039,0.11,0.44,0.789,0.2050,0.2671,0.1472,2,...,65.425000,4.425000,12.175000,16.825000,6.025000,10.950000,17.500000,10.875000,21.975000,27.325000
2618,33,-0.090909,56041,0.45,0.48,0.388,0.2184,0.2843,0.1584,2,...,63.466667,7.200000,20.266667,19.833333,8.066667,17.100000,27.466667,14.533333,30.133333,36.333333
2619,7,-0.714286,56043,0.37,0.69,0.364,0.2283,0.2943,0.1687,2,...,66.966667,8.033333,18.133333,18.000000,10.433333,18.466667,26.600000,13.033333,28.933333,31.466667
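
One thing to keep in mind with `LabelEncoder`: it numbers the labels in sorted order, so ordered categories are not necessarily encoded in their natural order, and a linear model treats the codes as numeric magnitudes. The sketch below shows the codes it produces for the two multi-level columns and, as one possible alternative (an assumption on our part, not what the notebook does), an explicit mapping for the transmission levels. The name `transmission_order` is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

levels = pd.Series(["moderate", "low", "substantial", "high"])
limits = pd.Series(["No ban", "10", "50", "150", "250", "6", "8", "0"])

# LabelEncoder sorts the labels before numbering them
le_levels = LabelEncoder()
alphabetical_codes = le_levels.fit_transform(levels)
print(list(le_levels.classes_))   # order in which codes 0, 1, 2, ... are assigned

le_limits = LabelEncoder().fit(limits)
print(list(le_limits.classes_))   # string sort, so '150' comes before '50'

# alternative sketch (assumption): spell out the intended order explicitly
transmission_order = {"low": 0, "moderate": 1, "substantial": 2, "high": 3}
ordinal_codes = levels.map(transmission_order)
print(ordinal_codes.tolist())
```

For a two-level column such as `metro_status` the two approaches are equivalent up to which level gets 0.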


In [13]:
df_new.dtypes

the_number_of_providers                                             int64
the_average_supply_level                                          float64
fips_code                                                           int64
social_vulnerability_index_svi                                    float64
cvac_level_of_concern_for_vaccination_rollout                     float64
percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21    float64
estimated_hesitant                                                float64
estimated_hesitant_or_unsure                                      float64
estimated_strongly_hesitant                                       float64
masks_order_code                                                    int64
completeness_pct                                                  float64
metro_status                                                        int32
general_gb_order_code                                               int64
general_or_under_6ft_bans_gatherings_o

# Define Features Set and Three Targets:

In [14]:
# define features set used for all three targets
X = df_new.copy()
X = X.drop(['estimated_hesitant', 'estimated_hesitant_or_unsure', 'estimated_strongly_hesitant'], axis=1)
X.head()

Unnamed: 0,the_number_of_providers,the_average_supply_level,fips_code,social_vulnerability_index_svi,cvac_level_of_concern_for_vaccination_rollout,percent_adults_fully_vaccinated_against_covid_19_as_of_6_10_21,masks_order_code,completeness_pct,metro_status,general_gb_order_code,...,avg_checkup,avg_copd,avg_smoking,avg_depression,avg_diabetes,avg_ghlth,avg_lpa,avg_mhlth,avg_obesity,avg_sleep
0,195,0.333333,10001,0.73,0.32,0.435,2,96.1,0,1,...,80.59375,8.821875,19.321875,20.575,12.378125,20.303125,31.053125,16.38125,40.696875,38.54375
1,638,0.194357,10003,0.38,0.16,0.552,2,96.1,0,1,...,79.19845,7.067442,16.818605,20.256589,10.81938,18.208527,27.695349,14.471318,33.551938,36.662791
2,295,0.19661,10005,0.4,0.12,0.569,2,96.1,0,1,...,81.164151,9.215094,16.686792,18.324528,12.662264,19.6,27.286792,13.303774,34.924528,33.426415
3,678,0.244838,11001,0.6,0.17,0.546,2,94.6,0,1,...,79.094944,4.977528,15.8,20.562921,9.938202,16.479775,20.89382,13.581461,28.410112,37.917978
4,345,0.382609,12001,0.47,0.63,0.547,2,98.7,0,1,...,76.055357,7.030357,16.417857,21.016071,9.007143,17.764286,24.160714,17.630357,30.223214,35.6375


In [15]:
# set target for each model
hes_y = df_new['estimated_hesitant']

hes_unsure_y = df_new['estimated_hesitant_or_unsure']

strg_hes_y = df_new['estimated_strongly_hesitant']

# Train, Test, Evaluate Model
##  Target: Estimated Hesitant

In [16]:
# split data into training and testing
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    hes_y, 
                                                    random_state=1)
X_train.shape

(1798, 25)
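
Because no `test_size` is passed, `train_test_split` holds out 25% of the rows by default, and `random_state=1` makes the shuffle repeatable. A minimal sketch on synthetic data (the arrays below are random stand-ins, not the county features):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic stand-in: 200 rows, 3 features
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = rng.normal(size=200)

# default split: 75% train, 25% test
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=1)
print(X_tr.shape, X_te.shape)

# the same random_state reproduces the same split
X_tr_again, _, _, _ = train_test_split(X_toy, y_toy, random_state=1)
print(np.array_equal(X_tr, X_tr_again))
```

Using the same `random_state` for all three targets means each model is trained and tested on the same counties, which keeps the three scores comparable.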

In [17]:
# fit the linear regression model
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X_train, y_train)

In [18]:
# predict with test data
y_pred = model.predict(X_test)
y_pred

array([0.14305031, 0.11499829, 0.15149326, 0.13814763, 0.1186837 ,
       0.15452055, 0.16696081, 0.04245067, 0.1180571 , 0.09961323,
       0.11381692, 0.1360351 , 0.05144251, 0.12759564, 0.18002179,
       0.08941326, 0.13071495, 0.14549372, 0.15043144, 0.05509023,
       0.15856255, 0.11781001, 0.10826957, 0.1553233 , 0.14542552,
       0.04199855, 0.18213661, 0.13024353, 0.14065901, 0.10236505,
       0.17181415, 0.08480559, 0.06097526, 0.18119725, 0.1540648 ,
       0.12096328, 0.181639  , 0.16325602, 0.1579272 , 0.10454659,
       0.12042609, 0.10930471, 0.11119827, 0.14545962, 0.14850641,
       0.16990566, 0.13817358, 0.07533468, 0.15529719, 0.11341588,
       0.16728663, 0.16371292, 0.11878779, 0.12131308, 0.15776995,
       0.09224539, 0.10487212, 0.16234551, 0.10874858, 0.14457576,
       0.09149354, 0.15689987, 0.07565046, 0.15279436, 0.13370564,
       0.14125195, 0.04968354, 0.1245885 , 0.10953373, 0.14880591,
       0.15790159, 0.16378989, 0.13937129, 0.14610065, 0.15375

In [19]:
# Model Evaluation:
from sklearn.metrics import r2_score
print("\n*****Linear Regression Model Evaluation*****")
print("---------------------------------------")
print("Target Predicted : Estimated Hesitant\n")
print(f"R2 Score : {r2_score(y_test, y_pred)}")


*****Linear Regression Model Evaluation*****
---------------------------------------
Target Predicted : Estimated Hesitant

R2 Score : 0.5820480931865175
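
For reference, the R2 score is 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below computes it by hand on made-up numbers (not the notebook's predictions) and also reports MAE and RMSE, which are in the same units as the target and may be easier to interpret alongside R2.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# illustrative values only, not taken from the notebook's predictions
y_true = np.array([0.10, 0.15, 0.20, 0.05, 0.12])
y_hat = np.array([0.11, 0.13, 0.18, 0.07, 0.12])

ss_res = np.sum((y_true - y_hat) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

mae = mean_absolute_error(y_true, y_hat)
rmse = np.sqrt(mean_squared_error(y_true, y_hat))

print(f"R2 (manual)  : {r2_manual:.4f}")
print(f"R2 (sklearn) : {r2_score(y_true, y_hat):.4f}")
print(f"MAE  : {mae:.4f}")
print(f"RMSE : {rmse:.4f}")
```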


# Train, Test, Evaluate Model
##  Target: Estimated Hesitant or Unsure

In [20]:
# split data into training and testing
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    hes_unsure_y, 
                                                    random_state=1)
X_train.shape

(1798, 25)

In [21]:
# fit the linear regression model
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X_train, y_train)

In [22]:
# predict with test data
y_pred = model.predict(X_test)
y_pred

array([0.19590684, 0.16351639, 0.22357303, 0.19855687, 0.18142977,
       0.19940895, 0.23279539, 0.08460727, 0.18329692, 0.15877247,
       0.16703397, 0.19106675, 0.10924171, 0.21802594, 0.24692212,
       0.17086385, 0.20005733, 0.20706072, 0.20931253, 0.09013502,
       0.22798241, 0.17252182, 0.16910852, 0.23579935, 0.21417395,
       0.07977344, 0.26141777, 0.19010764, 0.21214058, 0.18949656,
       0.23259653, 0.12796816, 0.11722503, 0.25011208, 0.21789208,
       0.17903902, 0.24234612, 0.22926311, 0.2075921 , 0.1542308 ,
       0.16767425, 0.17308028, 0.17616249, 0.18093959, 0.21433733,
       0.24177811, 0.21648605, 0.1208893 , 0.21671505, 0.16742753,
       0.2446058 , 0.23034082, 0.17888929, 0.18992544, 0.21610512,
       0.15430011, 0.14827953, 0.2280018 , 0.16620142, 0.19801798,
       0.15496013, 0.19497395, 0.12233298, 0.22033289, 0.20668933,
       0.19200302, 0.09063796, 0.18422306, 0.16360959, 0.21356683,
       0.23698473, 0.20406794, 0.20873329, 0.22139532, 0.21140

In [23]:
# Model Evaluation:
from sklearn.metrics import r2_score
print("\n*****Linear Regression Model Evaluation*****")
print("---------------------------------------")
print("Target Predicted : Estimated Hesitant or Unsure\n")
print(f"R2 Score : {r2_score(y_test, y_pred)}")


*****Linear Regression Model Evaluation*****
---------------------------------------
Target Predicted : Estimated Hesitant or Unsure

R2 Score : 0.6040122184333528


# Train, Test, Evaluate Model
##  Target: Estimated Strongly Hesitant

In [24]:
# split data into training and testing
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    strg_hes_y, 
                                                    random_state=1)
X_train.shape

(1798, 25)

In [25]:
# fit the linear regression model
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X_train, y_train)

In [26]:
# predict with test data
y_pred = model.predict(X_test)
y_pred

array([0.09581188, 0.06654623, 0.0942894 , 0.09402692, 0.07786338,
       0.10747339, 0.11033275, 0.02657754, 0.07486617, 0.06812314,
       0.07703884, 0.08653483, 0.02942339, 0.07714062, 0.1265814 ,
       0.05565514, 0.08451224, 0.09249492, 0.09970338, 0.04163832,
       0.10304831, 0.07512068, 0.07299756, 0.09786017, 0.09483343,
       0.02649655, 0.12469242, 0.08216473, 0.09425928, 0.06356294,
       0.11803268, 0.06272266, 0.03845415, 0.12153094, 0.09802713,
       0.0757341 , 0.12327912, 0.11715699, 0.11015197, 0.06382105,
       0.07122209, 0.06895263, 0.07421377, 0.09530173, 0.09416809,
       0.11482024, 0.08691292, 0.05103455, 0.10519417, 0.06852963,
       0.1115099 , 0.10717044, 0.07288536, 0.08012986, 0.10373525,
       0.0617999 , 0.06147389, 0.10907918, 0.07165743, 0.0887568 ,
       0.05694792, 0.10214569, 0.0514739 , 0.09917522, 0.08968793,
       0.09122756, 0.0356225 , 0.0800698 , 0.06654975, 0.0978494 ,
       0.10262106, 0.1076242 , 0.08929899, 0.09644249, 0.09829

In [27]:
# Model Evaluation:
from sklearn.metrics import r2_score
print("\n*****Linear Regression Model Evaluation*****")
print("---------------------------------------")
print("Target Predicted : Estimated Strongly Hesitant")
print(f"R2 Score : {r2_score(y_test, y_pred)}")


*****Linear Regression Model Evaluation*****
---------------------------------------
Target Predicted : Estimated Strongly Hesitant
R2 Score : 0.5657270730273718
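
The three train/test/evaluate sections repeat the same steps with a different target each time. As a rough sketch of how they could be folded into one loop, the code below uses a hypothetical helper, `fit_and_score`, on synthetic data; the feature names `f1` to `f4` and the generated targets are stand-ins, so the printed scores say nothing about the real models.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, random_state=1):
    """Hypothetical helper: split, fit a linear regression, return the test R2."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)
    model = LinearRegression().fit(X_train, y_train)
    return r2_score(y_test, model.predict(X_test))


# synthetic stand-in for the feature table (random values, not county data)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(400, 4)), columns=["f1", "f2", "f3", "f4"])
signal = 0.5 * X_demo["f1"] - 0.3 * X_demo["f2"]

# three synthetic targets named after the notebook's target columns
targets = {
    "estimated_hesitant": signal + rng.normal(scale=0.2, size=400),
    "estimated_hesitant_or_unsure": signal + rng.normal(scale=0.3, size=400),
    "estimated_strongly_hesitant": signal + rng.normal(scale=0.4, size=400),
}

scores = {name: fit_and_score(X_demo, y) for name, y in targets.items()}
for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
```

With the notebook's own data, the call would be `fit_and_score(X, hes_y)` and likewise for `hes_unsure_y` and `strg_hes_y`.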
