# Using the Cleaned Numerical Columns - 2

_This notebook builds a linear regression model using only the cleaned numerical columns, with LassoCV for regularization._

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

In [2]:
X = pd.read_csv('../datasets/num_train.csv')
y = X['SalePrice']
X_final = pd.read_csv('../datasets/num_test.csv')

In [3]:
X.drop('SalePrice',axis=1,inplace=True)
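Reading the target and then dropping it with `inplace=True` takes two cells; `DataFrame.pop` does both at once. Here is a minimal sketch on a tiny made-up frame. The values are illustrative and do not come from `num_train.csv`.

```python
import pandas as pd

# Hypothetical stand-in for num_train.csv: two features plus the target column
df = pd.DataFrame({
    "Lot Area": [8450, 9600, 11250],
    "Overall Qual": [7, 6, 7],
    "SalePrice": [208500, 181500, 223500],
})

# pop() removes the column from df and returns it as a Series
y = df.pop("SalePrice")

print(list(df.columns))  # ['Lot Area', 'Overall Qual']
print(y.tolist())        # [208500, 181500, 223500]
```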

_Expand the feature matrix with degree-2 polynomial features._

In [4]:
pf = PolynomialFeatures(degree=2, include_bias = False)
X_poly = pf.fit_transform(X)
X_final_poly = pf.transform(X_final)
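To see what `PolynomialFeatures(degree=2, include_bias=False)` produces, here is a sketch on a single made-up row with two features, a = 2 and b = 3. The output keeps the original columns and appends every square and pairwise product.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One row with two features, a=2 and b=3
X_toy = np.array([[2.0, 3.0]])

pf_toy = PolynomialFeatures(degree=2, include_bias=False)
X_toy_poly = pf_toy.fit_transform(X_toy)

# Column order: a, b, a^2, a*b, b^2
print(X_toy_poly)  # [[2. 3. 4. 6. 9.]]
```

Because the degree-1 terms come first, the first `X.shape[1]` columns of `X_poly` line up with the original columns of `X`. With n input columns the expansion has n + n(n+1)/2 columns, so the feature count grows quickly.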

_Train-test split, then scale the data._

In [5]:
X_train_poly, X_test_poly, y_train_poly, y_test_poly = train_test_split(X_poly, y, random_state = 42)

In [6]:
ss = StandardScaler()
X_train_poly_scale = ss.fit_transform(X_train_poly)
X_test_poly_scale = ss.transform(X_test_poly)
X_final_poly_scale = ss.transform(X_final_poly)
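The scaler above is fitted on the training split only and then reused for the test split and the Kaggle test set, so no statistics leak in from rows the model is evaluated on. This sketch on synthetic data (a stand-in, not the housing features) shows the effect. Training columns come out exactly centred with unit variance, while test columns are only approximately so.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 rows, 3 features on very different scales
X_demo = rng.normal(loc=[10, 1000, 0.5], scale=[2, 300, 0.1], size=(200, 3))
y_demo = rng.normal(size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)  # learns mean/std from the training split only
X_te_s = scaler.transform(X_te)      # reuses the training mean/std

# Training columns are centred with unit variance...
print(np.allclose(X_tr_s.mean(axis=0), 0), np.allclose(X_tr_s.std(axis=0), 1))
# ...while the test columns are only approximately so
print(X_te_s.mean(axis=0).round(2))
```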

_Instantiate LassoCV and fit._

In [7]:
%%time

lasso = LassoCV(n_alphas = 200, cv = 5)
lasso.fit(X_train_poly_scale, y_train_poly)



CPU times: user 22.7 s, sys: 1.67 s, total: 24.4 s
Wall time: 12.7 s
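`LassoCV` fits Lasso along a grid of `alpha` values, scores each one on every CV fold, and keeps the alpha with the lowest mean held-out MSE. The notebook asks for 200 alphas; this sketch uses the default grid size to stay version-agnostic. It runs on synthetic `make_regression` data rather than the housing set and checks that `alpha_` matches the minimum of the fold-averaged `mse_path_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic data: 20 features, only 5 of which actually drive the target
X_demo, y_demo = make_regression(n_samples=300, n_features=20, n_informative=5,
                                 noise=10.0, random_state=0)

lasso_demo = LassoCV(cv=5)
lasso_demo.fit(X_demo, y_demo)

# One alpha grid, one column of held-out MSE per fold
print(lasso_demo.alphas_.shape, lasso_demo.mse_path_.shape)

# The chosen alpha minimises the fold-averaged MSE
best = lasso_demo.alphas_[np.argmin(lasso_demo.mse_path_.mean(axis=1))]
print(lasso_demo.alpha_, best)
print("nonzero coefficients:", np.count_nonzero(lasso_demo.coef_))
```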


In [8]:
lasso.score(X_test_poly_scale, y_test_poly)

0.9004670408840262
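For a regressor, `score` returns the coefficient of determination R² = 1 - SS_res / SS_tot. So 0.900 means the model accounts for about 90% of the variance in held-out `SalePrice`. This sketch on synthetic data confirms that `score`, `sklearn.metrics.r2_score`, and the hand-computed formula agree.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_regression(n_samples=300, n_features=10, noise=15.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

model = LassoCV(cv=5).fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# R^2 = 1 - SS_res / SS_tot, computed by hand
ss_res = np.sum((y_te - y_pred) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(model.score(X_te, y_te), r2_score(y_te, y_pred), r2_manual)
```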

In [9]:
y_hat = lasso.predict(X_final_poly_scale)

_Build the submission file from the polynomial-feature model's predictions._

In [11]:
sub = pd.read_csv('../datasets/test.csv')
# .copy() makes submission an independent DataFrame, so adding a column
# does not trigger pandas' SettingWithCopyWarning
submission = sub[['Id']].copy()
submission['SalePrice'] = y_hat
submission.to_csv('../submissions/submission7.csv', index=False)


_At this point, I went back and tried the Lasso fit on the original matrix, without polynomial features._

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42)

In [13]:
ss1 = StandardScaler()
X_train_scale = ss1.fit_transform(X_train)
X_test_scale = ss1.transform(X_test)
X_final_scale = ss1.transform(X_final)

In [14]:
lasso1 = LassoCV(n_alphas = 200, cv = 5)
lasso1.fit(X_train_scale, y_train)

LassoCV(alphas=None, copy_X=True, cv=5, eps=0.001, fit_intercept=True,
    max_iter=1000, n_alphas=200, n_jobs=None, normalize=False,
    positive=False, precompute='auto', random_state=None,
    selection='cyclic', tol=0.0001, verbose=False)

In [15]:
lasso1.score(X_test_scale,y_test)

0.8664668666442801
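Both models were compared on a single split (`random_state = 42`), so part of the gap between 0.900 and 0.866 could come from which rows landed in the test set. `cross_val_score` is already imported at the top of the notebook. Here is a sketch of comparing the two setups on identical folds, with each preprocessing step wrapped in a `Pipeline` so scaling is refitted inside every fold. It uses synthetic `make_regression` data as a stand-in, so its numbers say nothing about the housing data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the numeric feature matrix
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)

# Scaling is fitted inside each fold, so no fold sees statistics from its own test rows
linear = make_pipeline(StandardScaler(), LassoCV(cv=5))
poly = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                     StandardScaler(), LassoCV(cv=5))

# Same shuffled folds for both models
folds = KFold(n_splits=5, shuffle=True, random_state=42)
linear_scores = cross_val_score(linear, X_demo, y_demo, cv=folds)  # R^2 per fold
poly_scores = cross_val_score(poly, X_demo, y_demo, cv=folds)

print("linear:", linear_scores.mean().round(3), "+/-", linear_scores.std().round(3))
print("poly:  ", poly_scores.mean().round(3), "+/-", poly_scores.std().round(3))
```

Reporting the mean and spread across folds makes it easier to tell a real improvement from split-to-split noise.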

_Because this score (0.866) was lower than the polynomial-feature model's (0.900), I didn't see the need to submit it. However, I was interested in the columns that Lasso did not zero out, so I went through the coefficients to separate the zeroed and nonzero columns._

In [16]:
zero_cols = [col for col, x in enumerate(lasso1.coef_) if x == 0]

In [17]:
nonzero_cols = [col for col in range(len(X.columns)) if col not in zero_cols]
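The two list comprehensions above can also be written with NumPy's `np.flatnonzero`, which returns the positions where a condition holds. This also avoids the repeated `col not in zero_cols` list scan. The sketch uses hypothetical coefficient values, not ones from `lasso1`.

```python
import numpy as np
import pandas as pd

# Hypothetical coefficients for four columns
columns = pd.Index(["Lot Area", "Overall Qual", "Pool Area", "Misc Val"])
coef = np.array([1200.5, 0.0, 0.0, -35.2])

zero_cols = np.flatnonzero(coef == 0)      # positions Lasso set to zero
nonzero_cols = np.flatnonzero(coef != 0)   # positions Lasso kept

print(zero_cols, nonzero_cols)             # [1 2] [0 3]
print(list(columns[nonzero_cols]))         # ['Lot Area', 'Misc Val']
```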

_I then made a dictionary mapping each nonzero column to its coefficient, which I thought would be helpful later._

In [18]:
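dicto = {}
for col in nonzero_cols:
    # read from lasso1, the model fitted on the unexpanded features
    # that nonzero_cols was computed from
    dicto[X.columns[col]] = lasso1.coef_[col]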

In [19]:
dicto

{'Lot Area': 0.0,
 'Overall Qual': 0.0,
 'Overall Cond': 0.0,
 'Year Built': 0.0,
 'Year Remod/Add': 0.0,
 'Mas Vnr Area': 0.0,
 'BsmtFin SF 1': 0.0,
 'Total Bsmt SF': 0.0,
 '1st Flr SF': 0.0,
 'Gr Liv Area': 0.0,
 'Bsmt Full Bath': 0.0,
 'Kitchen AbvGr': -0.0,
 'TotRms AbvGrd': 0.0,
 'Garage Cars': 0.0,
 'Garage Area': 0.0,
 'Wood Deck SF': 0.0,
 'Screen Porch': 0.0,
 'Misc Val': -0.0,
 'Exter Qual': 0.0,
 'Bsmt Qual': 0.0,
 'BsmtFin Type 1': 0.0,
 'Heating QC': 0.0,
 'Kitchen Qual': 0.0,
 'Functional': 0.0,
 'Fireplace Qu': 0.0,
 'Garage Finish': 0.0,
 'Pool QC': -0.0}

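_The keys above are the features Lasso kept in the non-polynomial model, and they are the ones to use again. The printed values are all 0.0 because this output was produced by reading `lasso.coef_` (the polynomial model, whose first columns hold the degree-1 terms) instead of `lasso1.coef_`. The actual coefficients for these columns are in `lasso1.coef_`._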
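To reuse the kept features, a pandas `Series` indexed by column name keeps names and coefficients together and avoids position bookkeeping. This sketch uses synthetic data with made-up columns `a` to `d`. It reads coefficients from a model fitted on unexpanded, scaled features, as `lasso1` is, then subsets the frame to the surviving columns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic frame: only "a" and "b" influence the target; "c" and "d" are noise
X_demo = pd.DataFrame(rng.normal(size=(300, 4)), columns=["a", "b", "c", "d"])
y_demo = 5 * X_demo["a"] - 3 * X_demo["b"] + rng.normal(scale=0.5, size=300)

model = LassoCV(cv=5).fit(StandardScaler().fit_transform(X_demo), y_demo)

# Pair each column name with its coefficient, drop zeros, rank by magnitude
coefs = pd.Series(model.coef_, index=X_demo.columns)
kept = coefs[coefs != 0].sort_values(key=np.abs, ascending=False)
print(kept)

# Reuse the kept columns as a reduced feature matrix
X_reduced = X_demo[kept.index]
print(X_reduced.columns.tolist())
```

Depending on the alpha that cross-validation picks, Lasso may keep small coefficients on some noise columns, so `kept` is not guaranteed to contain only `a` and `b`.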