<a href="https://colab.research.google.com/github/kushalnavghare/DSC-478_PRG_ML_APP/blob/main/Assignment_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### env setup

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
%pwd

'/content'

In [3]:
%cd drive/MyDrive/DSC-478_PRG_ML_APPS/

/content/drive/MyDrive/DSC-478_PRG_ML_APPS


In [4]:
# import libs
import pandas as pd
import numpy as np

In [14]:
pd.set_option('display.max_columns', 101)

In [11]:
import matplotlib.pyplot as plt

In [26]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [6]:
RANDOM_STATE = 33

For this assignment you will experiment with various regression approaches and you'll get your feet wet with some clustering. We will rely on subsets of some real-world data sets and on tools from the Scikit-learn machine learning package for Python as well as modules based on the textbook code (Machine Learning in Action).

# 1. Regression Analysis [Dataset: communities.zip]

For this problem you will experiment with multiple linear regression models to make predictions with numerical data. You will also explore more systematic methods for feature selection and for optimizing model parameters (model selection). The data set you will use is a subset of the "Communities and Crime" data set that combines information from the 1990 census data as well as FBI crime data from 1995. Please read the full description of the data, including the description and statistics on different variables. The target attribute for regression purposes is "ViolentCrimesPerPop". Note: The two identifier attributes "state" and "community name" should be excluded for the regression task.

Your tasks in this problem are the following.

## a.
Load and preprocess the data using Pandas and remove the unneeded attributes. For the purpose of this assignment you do not need to normalize or standardize the data unless explicitly required in one of the following tasks. However, you may need to handle missing values by imputing those values based on variable means. Compute and display basic statistics (mean, standard deviation, min, max, etc.) for the variables in the data set. Separate the target attribute for regression. Use scikit-learn's train_test_split function to create a 20%-80% randomized split of the data (important note: for reproducible output across multiple runs, please use "random_state = 33"). Set aside the 20% test portion; the 80% training data partition will be used for cross-validation on various tasks specified below.

In [7]:
# read data
raw_communities = pd.read_csv('data/communities.csv')

In [8]:
raw_communities.head()

Unnamed: 0,state,communityname,population,householdsize,racepctblack,racePctWhite,racePctAsian,racePctHisp,agePct12t21,agePct12t29,...,NumStreet,PctForeignBorn,PctBornSameState,PctSameHouse85,PctSameCity85,PctSameState85,LandArea,PopDens,PctUsePubTrans,ViolentCrimesPerPop
0,8,Lakewoodcity,0.19,0.33,0.02,0.9,0.12,0.17,0.34,0.47,...,0.0,0.12,0.42,0.5,0.51,0.64,0.12,0.26,0.2,0.2
1,53,Tukwilacity,0.0,0.16,0.12,0.74,0.45,0.07,0.26,0.59,...,0.0,0.21,0.5,0.34,0.6,0.52,0.02,0.12,0.45,0.67
2,24,Aberdeentown,0.0,0.42,0.49,0.56,0.17,0.04,0.39,0.47,...,0.0,0.14,0.49,0.54,0.67,0.56,0.01,0.21,0.02,0.43
3,34,Willingborotownship,0.04,0.77,1.0,0.08,0.12,0.1,0.51,0.5,...,0.0,0.19,0.3,0.73,0.64,0.65,0.02,0.39,0.28,0.12
4,42,Bethlehemtownship,0.01,0.55,0.02,0.95,0.09,0.05,0.38,0.38,...,0.0,0.11,0.72,0.64,0.61,0.53,0.04,0.09,0.02,0.03


In [10]:
# check that each (state, communityname) pair identifies exactly one row
raw_communities[['state', 'communityname']].value_counts()

state  communityname   
1      Alabastercity       1
39     Lebanoncity         1
       Miamitownship       1
       Miamisburgcity      1
       Mentorcity          1
                          ..
19     Newtoncity          1
       MasonCitycity       1
       Marshalltowncity    1
       Indianolacity       1
56     Sheridancity        1
Length: 1994, dtype: int64

In [39]:
# null check
((raw_communities.isna().sum())>0).value_counts()

False    100
dtype: int64

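isna() finds no NaN values in the dataset. This check does not catch placeholder strings, however: OtherPerCap is read as object dtype and contains a "?" entry (row 130), which has to be treated as a missing value.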
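The gap between a NaN check and a placeholder check can be illustrated on a toy frame. This is a minimal sketch, not the notebook's own code: the frame `toy` and its values are made up for illustration, and only standard pandas calls are used.

```python
import pandas as pd

# Toy frame (hypothetical values): column "b" carries a "?" placeholder,
# so pandas stores it as a non-numeric column.
toy = pd.DataFrame({
    "a": [0.1, 0.2, 0.3],
    "b": ["0.5", "?", "0.7"],
})

# isna() only counts real NaN/None entries, so the "?" goes unnoticed.
nan_counts = toy.isna().sum()

# An explicit comparison finds the placeholder.
placeholder_counts = (toy == "?").sum()

# Non-numeric columns are the ones worth inspecting by hand.
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(nan_counts.to_dict())
print(placeholder_counts.to_dict())
print(non_numeric_cols)
```

Listing the non-numeric columns first narrows the search to the few columns where a placeholder can hide.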

In [31]:
raw_communities.dtypes.reset_index().groupby(0).agg({'index':'unique'}).reset_index()

Unnamed: 0,0,index
0,int64,[state]
1,float64,"[population, householdsize, racepctblack, race..."
2,object,"[communityname, OtherPerCap]"


In [38]:
# try to convert object to float; with errors="ignore" the column is left unchanged if any value cannot be parsed
raw_communities['OtherPerCap'] = raw_communities.OtherPerCap.astype(float, errors="ignore")

In [41]:
(raw_communities.OtherPerCap.isna().sum())

0

In [42]:
raw_communities.describe()

Unnamed: 0,state,population,householdsize,racepctblack,racePctWhite,racePctAsian,racePctHisp,agePct12t21,agePct12t29,agePct16t24,agePct65up,numbUrban,pctUrban,medIncome,pctWWage,pctWFarmSelf,pctWInvInc,pctWSocSec,pctWPubAsst,pctWRetire,medFamInc,perCapInc,whitePerCap,blackPerCap,indianPerCap,AsianPerCap,HispPerCap,NumUnderPov,PctPopUnderPov,PctLess9thGrade,PctNotHSGrad,PctBSorMore,PctUnemployed,PctEmploy,PctEmplManu,PctEmplProfServ,MalePctDivorce,MalePctNevMarr,FemalePctDiv,TotalPctDiv,PersPerFam,PctFam2Par,PctKids2Par,PctYoungKids2Par,PctTeen2Par,PctWorkMomYoungKids,PctWorkMom,NumIlleg,PctIlleg,NumImmig,PctImmigRecent,PctImmigRec5,PctImmigRec8,PctImmigRec10,PctRecentImmig,PctRecImmig5,PctRecImmig8,PctRecImmig10,PctSpeakEnglOnly,PctNotSpeakEnglWell,PctLargHouseFam,PctLargHouseOccup,PersPerOccupHous,PersPerOwnOccHous,PersPerRentOccHous,PctPersOwnOccup,PctPersDenseHous,PctHousLess3BR,MedNumBR,HousVacant,PctHousOccup,PctHousOwnOcc,PctVacantBoarded,PctVacMore6Mos,MedYrHousBuilt,PctHousNoPhone,PctWOFullPlumb,OwnOccLowQuart,OwnOccMedVal,OwnOccHiQuart,RentLowQ,RentMedian,RentHighQ,MedRent,MedRentPctHousInc,MedOwnCostPctInc,MedOwnCostPctIncNoMtg,NumInShelters,NumStreet,PctForeignBorn,PctBornSameState,PctSameHouse85,PctSameCity85,PctSameState85,LandArea,PopDens,PctUsePubTrans,ViolentCrimesPerPop
count,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0,1994.0
mean,28.683551,0.057593,0.463395,0.179629,0.753716,0.153681,0.144022,0.424218,0.493867,0.336264,0.423164,0.064072,0.696269,0.361123,0.558154,0.29157,0.495687,0.471133,0.317778,0.479248,0.375677,0.350251,0.368049,0.291098,0.203506,0.322357,0.386279,0.055507,0.303024,0.315807,0.38333,0.361675,0.363531,0.501073,0.396384,0.440597,0.461244,0.434453,0.487568,0.494273,0.487748,0.610918,0.620657,0.664032,0.582884,0.501449,0.52669,0.036294,0.249995,0.03006,0.320211,0.360622,0.399077,0.427879,0.181364,0.182126,0.184774,0.182879,0.785903,0.150587,0.267608,0.251891,0.462101,0.494428,0.404097,0.562598,0.186264,0.495186,0.314694,0.076815,0.719549,0.548686,0.204529,0.433335,0.494178,0.264478,0.243059,0.264689,0.26349,0.268942,0.346379,0.372457,0.422964,0.384102,0.490125,0.449754,0.403816,0.029438,0.022778,0.215552,0.608892,0.53505,0.626424,0.65153,0.065231,0.232854,0.161685,0.237979
std,16.397553,0.126906,0.163717,0.253442,0.244039,0.208877,0.232492,0.155196,0.143564,0.166505,0.179185,0.128256,0.444811,0.209362,0.182913,0.204108,0.178071,0.173619,0.222137,0.167564,0.198257,0.191109,0.186804,0.171593,0.164775,0.195411,0.183081,0.127941,0.228474,0.21336,0.202508,0.209193,0.202171,0.174036,0.202386,0.175457,0.18246,0.175437,0.17517,0.183607,0.154594,0.201976,0.206353,0.218749,0.191507,0.168612,0.175241,0.108671,0.229946,0.087189,0.219088,0.210924,0.201498,0.19497,0.235792,0.236333,0.236739,0.234822,0.226869,0.219716,0.196567,0.190709,0.169551,0.157924,0.189301,0.197087,0.209956,0.172508,0.255182,0.150465,0.194024,0.185204,0.21777,0.188986,0.232467,0.242847,0.206295,0.224425,0.231542,0.235252,0.219323,0.209278,0.248286,0.213404,0.1695,0.187274,0.192593,0.102607,0.1004,0.231134,0.204329,0.181352,0.200521,0.198221,0.109459,0.203092,0.229055,0.232985
min,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,12.0,0.01,0.35,0.02,0.63,0.04,0.01,0.34,0.41,0.25,0.3,0.0,0.0,0.2,0.44,0.16,0.37,0.35,0.1425,0.36,0.23,0.22,0.24,0.1725,0.11,0.19,0.26,0.01,0.11,0.16,0.23,0.21,0.22,0.38,0.25,0.32,0.33,0.31,0.36,0.36,0.4,0.49,0.49,0.53,0.48,0.39,0.42,0.0,0.09,0.0,0.16,0.2,0.25,0.28,0.03,0.03,0.03,0.03,0.73,0.03,0.15,0.14,0.34,0.39,0.27,0.44,0.06,0.4,0.0,0.01,0.63,0.43,0.06,0.29,0.35,0.06,0.1,0.09,0.09,0.09,0.17,0.2,0.22,0.21,0.37,0.32,0.25,0.0,0.0,0.06,0.47,0.42,0.52,0.56,0.02,0.1,0.02,0.07
50%,34.0,0.02,0.44,0.06,0.85,0.07,0.04,0.4,0.48,0.29,0.42,0.03,1.0,0.32,0.56,0.23,0.48,0.475,0.26,0.47,0.33,0.3,0.32,0.25,0.17,0.28,0.345,0.02,0.25,0.27,0.36,0.31,0.32,0.51,0.37,0.41,0.47,0.4,0.5,0.5,0.47,0.63,0.64,0.7,0.61,0.51,0.54,0.01,0.17,0.01,0.29,0.34,0.39,0.43,0.09,0.08,0.09,0.09,0.87,0.06,0.2,0.19,0.44,0.48,0.36,0.56,0.11,0.51,0.5,0.03,0.77,0.54,0.13,0.42,0.52,0.185,0.19,0.18,0.17,0.18,0.31,0.33,0.37,0.34,0.48,0.45,0.37,0.0,0.0,0.13,0.63,0.54,0.67,0.7,0.04,0.17,0.07,0.15
75%,42.0,0.05,0.54,0.23,0.94,0.17,0.16,0.47,0.54,0.36,0.53,0.07,1.0,0.49,0.69,0.37,0.62,0.58,0.44,0.58,0.48,0.43,0.44,0.38,0.25,0.4,0.48,0.05,0.45,0.42,0.51,0.46,0.48,0.6275,0.52,0.53,0.59,0.5,0.62,0.63,0.56,0.76,0.78,0.84,0.72,0.62,0.65,0.02,0.32,0.02,0.43,0.48,0.53,0.56,0.23,0.23,0.23,0.23,0.94,0.16,0.31,0.29,0.55,0.58,0.49,0.7,0.22,0.6,0.5,0.07,0.86,0.67,0.27,0.56,0.67,0.42,0.33,0.4,0.39,0.38,0.49,0.52,0.59,0.53,0.59,0.58,0.51,0.01,0.0,0.28,0.7775,0.66,0.77,0.79,0.07,0.28,0.19,0.33
max,56.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [43]:
# separate the features (X) from the target (y), dropping the identifier columns
X = raw_communities.drop(["state", "communityname", "ViolentCrimesPerPop"], axis=1)
y = raw_communities['ViolentCrimesPerPop']

In [44]:
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=RANDOM_STATE, test_size=.20)

In [45]:
X_test.head()

Unnamed: 0,population,householdsize,racepctblack,racePctWhite,racePctAsian,racePctHisp,agePct12t21,agePct12t29,agePct16t24,agePct65up,numbUrban,pctUrban,medIncome,pctWWage,pctWFarmSelf,pctWInvInc,pctWSocSec,pctWPubAsst,pctWRetire,medFamInc,perCapInc,whitePerCap,blackPerCap,indianPerCap,AsianPerCap,OtherPerCap,HispPerCap,NumUnderPov,PctPopUnderPov,PctLess9thGrade,PctNotHSGrad,PctBSorMore,PctUnemployed,PctEmploy,PctEmplManu,PctEmplProfServ,MalePctDivorce,MalePctNevMarr,FemalePctDiv,TotalPctDiv,PersPerFam,PctFam2Par,PctKids2Par,PctYoungKids2Par,PctTeen2Par,PctWorkMomYoungKids,PctWorkMom,NumIlleg,PctIlleg,NumImmig,PctImmigRecent,PctImmigRec5,PctImmigRec8,PctImmigRec10,PctRecentImmig,PctRecImmig5,PctRecImmig8,PctRecImmig10,PctSpeakEnglOnly,PctNotSpeakEnglWell,PctLargHouseFam,PctLargHouseOccup,PersPerOccupHous,PersPerOwnOccHous,PersPerRentOccHous,PctPersOwnOccup,PctPersDenseHous,PctHousLess3BR,MedNumBR,HousVacant,PctHousOccup,PctHousOwnOcc,PctVacantBoarded,PctVacMore6Mos,MedYrHousBuilt,PctHousNoPhone,PctWOFullPlumb,OwnOccLowQuart,OwnOccMedVal,OwnOccHiQuart,RentLowQ,RentMedian,RentHighQ,MedRent,MedRentPctHousInc,MedOwnCostPctInc,MedOwnCostPctIncNoMtg,NumInShelters,NumStreet,PctForeignBorn,PctBornSameState,PctSameHouse85,PctSameCity85,PctSameState85,LandArea,PopDens,PctUsePubTrans
1158,0.0,0.46,0.01,0.97,0.09,0.04,0.4,0.26,0.2,0.49,0.02,1.0,1.0,0.55,1.0,0.88,0.42,0.09,0.46,1.0,1.0,1.0,1.0,0.81,0.71,0.24,1.0,0.0,0.07,0.04,0.04,0.97,0.16,0.38,0.2,0.64,0.24,0.21,0.2,0.22,0.37,0.97,0.95,0.97,0.91,0.25,0.0,0.0,0.02,0.0,0.45,0.45,0.51,0.46,0.19,0.17,0.17,0.15,0.89,0.06,0.16,0.19,0.51,0.48,0.45,0.99,0.01,0.0,1.0,0.02,0.63,0.99,0.09,0.47,0.67,0.0,0.0,0.85,1.0,1.0,0.78,1.0,1.0,1.0,0.46,0.44,0.22,0.0,0.0,0.18,0.13,0.73,0.76,0.55,0.04,0.06,0.03
1079,0.01,0.35,0.01,0.96,0.04,0.04,0.32,0.45,0.29,0.62,0.02,0.94,0.3,0.45,0.1,0.39,0.62,0.38,0.38,0.34,0.32,0.32,0.21,0.08,0.05,0.17,0.22,0.01,0.24,0.44,0.58,0.2,0.41,0.48,0.58,0.32,0.45,0.4,0.45,0.47,0.39,0.53,0.58,0.61,0.59,0.64,0.63,0.01,0.25,0.0,0.07,0.12,0.25,0.3,0.02,0.03,0.05,0.06,0.78,0.08,0.14,0.13,0.36,0.42,0.34,0.43,0.07,0.56,0.0,0.04,0.53,0.42,0.13,0.7,0.25,0.18,0.41,0.31,0.3,0.29,0.23,0.32,0.35,0.31,0.34,0.46,0.58,0.0,0.0,0.11,0.86,0.66,0.88,0.8,0.03,0.11,0.01
1633,0.51,0.31,0.6,0.46,0.06,0.09,0.42,0.54,0.39,0.48,0.52,1.0,0.11,0.28,0.15,0.34,0.57,0.92,0.53,0.15,0.18,0.22,0.2,0.13,0.23,0.15,0.19,0.74,0.68,0.43,0.57,0.24,0.8,0.27,0.36,0.63,0.6,0.74,0.59,0.62,0.45,0.12,0.21,0.19,0.13,0.31,0.29,0.62,0.85,0.1,0.33,0.32,0.34,0.34,0.11,0.09,0.09,0.08,0.8,0.12,0.27,0.2,0.31,0.41,0.29,0.31,0.1,0.49,0.5,0.83,0.58,0.29,0.87,0.66,0.0,0.54,0.3,0.06,0.06,0.08,0.14,0.16,0.18,0.2,0.81,0.24,0.57,0.17,0.01,0.13,0.8,0.61,0.87,0.84,0.12,0.68,0.75
1700,0.02,0.52,0.25,0.63,0.03,0.31,0.54,0.62,0.44,0.36,0.0,0.0,0.26,0.6,0.71,0.34,0.43,0.37,0.26,0.27,0.31,0.37,0.12,0.16,0.3,0.25,0.27,0.03,0.46,0.46,0.54,0.29,0.35,0.47,0.49,0.27,0.56,0.41,0.6,0.6,0.53,0.57,0.49,0.53,0.52,0.59,0.59,0.02,0.37,0.01,0.5,0.76,0.77,0.81,0.16,0.21,0.19,0.19,0.74,0.22,0.38,0.35,0.48,0.47,0.51,0.43,0.34,0.59,0.0,0.05,0.54,0.45,0.32,0.49,0.67,0.82,0.67,0.09,0.13,0.17,0.28,0.29,0.29,0.3,0.56,0.63,0.56,0.01,0.0,0.13,0.72,0.35,0.5,0.7,0.13,0.04,0.01
1956,0.03,0.37,0.4,0.68,0.03,0.01,0.39,0.47,0.34,0.55,0.04,1.0,0.23,0.52,0.17,0.36,0.56,0.35,0.44,0.27,0.28,0.32,0.19,0.17,0.29,0.24,0.17,0.03,0.31,0.47,0.59,0.22,0.22,0.52,0.64,0.25,0.6,0.37,0.52,0.57,0.38,0.49,0.46,0.54,0.62,0.64,0.73,0.03,0.42,0.0,0.46,0.68,0.59,0.57,0.04,0.05,0.04,0.04,0.97,0.03,0.17,0.16,0.37,0.37,0.38,0.53,0.15,0.58,0.0,0.04,0.73,0.53,0.27,0.59,0.4,0.31,0.12,0.09,0.1,0.13,0.16,0.21,0.23,0.24,0.3,0.38,0.42,0.01,0.01,0.03,0.78,0.59,0.71,0.69,0.06,0.1,0.0


In [49]:
(X == "?").values.nonzero()

(array([130]), array([25]))

In [52]:
X.iloc[[130]]

Unnamed: 0,population,householdsize,racepctblack,racePctWhite,racePctAsian,racePctHisp,agePct12t21,agePct12t29,agePct16t24,agePct65up,numbUrban,pctUrban,medIncome,pctWWage,pctWFarmSelf,pctWInvInc,pctWSocSec,pctWPubAsst,pctWRetire,medFamInc,perCapInc,whitePerCap,blackPerCap,indianPerCap,AsianPerCap,OtherPerCap,HispPerCap,NumUnderPov,PctPopUnderPov,PctLess9thGrade,PctNotHSGrad,PctBSorMore,PctUnemployed,PctEmploy,PctEmplManu,PctEmplProfServ,MalePctDivorce,MalePctNevMarr,FemalePctDiv,TotalPctDiv,PersPerFam,PctFam2Par,PctKids2Par,PctYoungKids2Par,PctTeen2Par,PctWorkMomYoungKids,PctWorkMom,NumIlleg,PctIlleg,NumImmig,PctImmigRecent,PctImmigRec5,PctImmigRec8,PctImmigRec10,PctRecentImmig,PctRecImmig5,PctRecImmig8,PctRecImmig10,PctSpeakEnglOnly,PctNotSpeakEnglWell,PctLargHouseFam,PctLargHouseOccup,PersPerOccupHous,PersPerOwnOccHous,PersPerRentOccHous,PctPersOwnOccup,PctPersDenseHous,PctHousLess3BR,MedNumBR,HousVacant,PctHousOccup,PctHousOwnOcc,PctVacantBoarded,PctVacMore6Mos,MedYrHousBuilt,PctHousNoPhone,PctWOFullPlumb,OwnOccLowQuart,OwnOccMedVal,OwnOccHiQuart,RentLowQ,RentMedian,RentHighQ,MedRent,MedRentPctHousInc,MedOwnCostPctInc,MedOwnCostPctIncNoMtg,NumInShelters,NumStreet,PctForeignBorn,PctBornSameState,PctSameHouse85,PctSameCity85,PctSameState85,LandArea,PopDens,PctUsePubTrans
130,0.02,0.38,0.98,0.22,0.01,0.01,0.44,0.4,0.27,0.58,0.0,0.0,0.09,0.24,0.35,0.31,0.65,0.78,0.49,0.12,0.18,0.32,0.15,0.08,0.37,?,0.63,0.05,0.85,0.54,0.54,0.29,0.86,0.19,0.31,0.53,0.62,0.44,0.62,0.65,0.49,0.2,0.2,0.13,0.19,0.59,0.5,0.04,0.85,0.0,0.22,0.16,0.13,0.11,0.01,0.0,0.0,0.0,0.98,0.02,0.28,0.24,0.4,0.36,0.47,0.52,0.17,0.46,0.5,0.05,0.52,0.55,0.31,0.68,0.38,0.51,0.55,0.06,0.06,0.08,0.0,0.06,0.11,0.11,0.6,0.44,0.76,0.0,0.0,0.01,0.84,0.7,0.83,0.77,0.04,0.12,0.05
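One way to handle the "?" entry found above is to coerce the column to numeric and fill the resulting NaN with the column mean, as the task statement suggests. The sketch below runs on a toy column with made-up values standing in for OtherPerCap; it is an illustration under those assumptions, not the notebook's recorded solution.

```python
import pandas as pd

# Toy column (hypothetical values) standing in for OtherPerCap.
demo = pd.DataFrame({"OtherPerCap": ["0.25", "?", "0.75", "0.50"]})

# errors="coerce" turns anything unparseable, including "?", into NaN.
demo["OtherPerCap"] = pd.to_numeric(demo["OtherPerCap"], errors="coerce")
n_missing = int(demo["OtherPerCap"].isna().sum())

# Impute with the mean of the observed values.
col_mean = demo["OtherPerCap"].mean()
demo["OtherPerCap"] = demo["OtherPerCap"].fillna(col_mean)

print(n_missing, col_mean)
print(demo["OtherPerCap"].dtype)
```

A stricter workflow would compute the mean on the training partition only and reuse it for the test partition, so that no test information leaks into preprocessing.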


## b.
Perform standard multiple linear regression on data using the scikit-learn Linear Regression module. Compute the RMSE values on the full training data (the 80% partition). Also, plot the correlation between the predicted and actual values of the target attribute. Display the obtained regression coefficients (weights) and plot them using matplotlib. Finally, perform 10-fold cross-validation on the training partition and compare the cross-validation RMSE to the training RMSE (for cross validation, you should use the KFold module from sklearn.model_selection).
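The comparison of training RMSE with 10-fold cross-validation RMSE can be sketched as follows. The data here is synthetic (an assumption made so the block runs on its own), and the helper `rmse` and the `*_demo` names are hypothetical; in the notebook, the training partition would take their place.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic data (assumption): 200 rows, 5 features, linear signal plus noise.
rng = np.random.default_rng(33)
X_demo = rng.random((200, 5))
y_demo = X_demo @ np.array([0.5, -0.2, 0.0, 0.3, 0.1]) + rng.normal(0, 0.05, 200)

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# RMSE on the full training data.
model = LinearRegression().fit(X_demo, y_demo)
train_rmse = rmse(y_demo, model.predict(X_demo))

# 10-fold cross-validation: fit on nine folds, score on the held-out fold.
kf = KFold(n_splits=10, shuffle=True, random_state=33)
fold_rmse = []
for train_idx, val_idx in kf.split(X_demo):
    fold_model = LinearRegression().fit(X_demo[train_idx], y_demo[train_idx])
    fold_rmse.append(rmse(y_demo[val_idx], fold_model.predict(X_demo[val_idx])))
cv_rmse = float(np.mean(fold_rmse))

print(f"train RMSE: {train_rmse:.4f}, 10-fold CV RMSE: {cv_rmse:.4f}")
```

The cross-validation figure is usually somewhat higher than the training figure, because each fold is scored on rows the model did not see during fitting.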

In [46]:
# instantiate the model
lin_model = LinearRegression()

In [47]:
# fit model on train
lin_model.fit(X_train, y_train)

ValueError: ignored
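The truncated traceback does not show the cause, but a plausible one is the "?" string still present in OtherPerCap, since LinearRegression needs every feature to be convertible to float. The sketch below reproduces that situation on a toy frame with made-up values and then refits after cleaning; `X_toy`, `y_toy`, and the other names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy training data (hypothetical values); "b" holds a "?" placeholder.
X_toy = pd.DataFrame({"a": [0.1, 0.2, 0.3, 0.4], "b": ["0.5", "?", "0.7", "0.9"]})
y_toy = pd.Series([0.2, 0.3, 0.4, 0.5])

# Non-numeric columns are the first suspects when fit() rejects the input.
suspect_cols = X_toy.select_dtypes(exclude="number").columns.tolist()

try:
    LinearRegression().fit(X_toy, y_toy)
    fit_failed = False
except ValueError as err:
    fit_failed = True
    print("fit failed:", err)

# Coerce the suspect columns to numeric, impute NaN with column means, refit.
X_clean = X_toy.copy()
for col in suspect_cols:
    X_clean[col] = pd.to_numeric(X_clean[col], errors="coerce")
X_clean = X_clean.fillna(X_clean.mean())
refit = LinearRegression().fit(X_clean, y_toy)
print("coefficients after cleaning:", refit.coef_)
```

If the same cause applies in the notebook, cleaning OtherPerCap before the train/test split should let the fit cell run.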