Multiple Linear Regression in Statsmodels - Lab

Introduction

In this lab, you'll practice fitting a multiple linear regression model on the Ames Housing dataset!

Objectives

You will be able to:

  • Determine if it is necessary to perform normalization/standardization for a specific model or set of data
  • Use standardization/normalization on features of a dataset
  • Identify if it is necessary to perform log transformations on a set of features
  • Perform log transformations on different features of a dataset
  • Use statsmodels to fit a multiple linear regression model
  • Evaluate a linear regression model using statistical performance metrics for both the overall model and its individual parameters

The Ames Housing Data

Using the specified continuous and categorical features, preprocess your data to prepare for modeling:

  • Split off and one hot encode the categorical features of interest
  • Log and scale the selected continuous features

import pandas as pd
import numpy as np

ames = pd.read_csv('ames.csv')

continuous = ['LotArea', '1stFlrSF', 'GrLivArea', 'SalePrice']
categoricals = ['BldgType', 'KitchenQual', 'SaleType', 'MSZoning', 'Street', 'Neighborhood']

Continuous Features

# Log transform and normalize
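One possible approach to this step is sketched below. It is not the lab's reference solution: the `toy` DataFrame is synthetic data standing in for the continuous columns of `ames.csv`, and the `_log` suffix and the `standardize` helper are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the continuous Ames columns (synthetic values)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'LotArea': rng.lognormal(mean=9.0, sigma=0.5, size=200),
    '1stFlrSF': rng.lognormal(mean=7.0, sigma=0.3, size=200),
    'GrLivArea': rng.lognormal(mean=7.3, sigma=0.3, size=200),
    'SalePrice': rng.lognormal(mean=12.0, sigma=0.4, size=200),
})

# Log transform every column; rename so the transformation stays visible
toy_log = np.log(toy)
toy_log.columns = [f'{col}_log' for col in toy_log.columns]

# Standardize: subtract the column mean, divide by the column standard deviation
def standardize(feature):
    return (feature - feature.mean()) / feature.std()

toy_log_norm = toy_log.apply(standardize)

# Each column should now have mean ~0 and standard deviation 1
print(toy_log_norm.describe().loc[['mean', 'std']].round(3))
```

Keeping the means and standard deviations of the logged columns around is worthwhile, since they are needed again when transforming a new observation for prediction.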

Categorical Features

# One hot encode categoricals
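A minimal sketch of one-hot encoding with `pd.get_dummies` follows. The `toy_cat` rows are made up for illustration and cover only three of the lab's categorical features. Dropping the first level of each feature is an assumption here, made to avoid perfectly collinear dummy columns.

```python
import pandas as pd

# Hypothetical rows standing in for a few categorical columns of ames.csv
toy_cat = pd.DataFrame({
    'BldgType': ['1Fam', 'TwnhsE', '1Fam', 'Duplex'],
    'KitchenQual': ['Gd', 'TA', 'Ex', 'TA'],
    'Street': ['Pave', 'Pave', 'Grvl', 'Pave'],
})

# drop_first=True removes one level per feature (the dummy variable trap);
# dtype=int yields 0/1 columns instead of booleans
toy_ohe = pd.get_dummies(toy_cat, drop_first=True, dtype=int)

print(toy_ohe.columns.tolist())
```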

Combine Categorical and Continuous Features

# combine features into a single dataframe called preprocessed
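Combining the two pieces can be done with `pd.concat` along the column axis. The sketch below uses two tiny hypothetical frames (`continuous_part` and `categorical_part`) and assumes both share the same row index, as they would if derived from the same `ames` DataFrame.

```python
import pandas as pd

# Hypothetical, already-preprocessed pieces that share the same row index
continuous_part = pd.DataFrame({'LotArea_log': [0.5, -1.2, 0.7],
                                'SalePrice_log': [1.1, -0.3, -0.8]})
categorical_part = pd.DataFrame({'Street_Pave': [1, 1, 0],
                                 'BldgType_Duplex': [0, 1, 0]})

# axis=1 places the frames side by side, aligning rows on the index
preprocessed = pd.concat([continuous_part, categorical_part], axis=1)

print(preprocessed.shape)
```

If the indices did not match, `pd.concat` would introduce missing values, so checking `preprocessed.isna().sum()` afterwards is a cheap sanity check.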

Run a linear model with SalePrice as the target variable in statsmodels

# Your code here

Run the same model in scikit-learn

# Your code here - Check that the coefficients and intercept are the same as those from Statsmodels

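Predict the house price given the following characteristics (raw values, before any transformation)

Make sure to transform your variables as needed! Apply the same log transformation and scaling used on the training data, and convert the prediction back to the original price scale.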

  • LotArea: 14977
  • 1stFlrSF: 1976
  • GrLivArea: 1976
  • BldgType: 1Fam
  • KitchenQual: Gd
  • SaleType: New
  • MSZoning: RL
  • Street: Pave
  • Neighborhood: NridgHt
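The transformation steps for a new observation can be sketched as follows. This is an illustration under loud assumptions: the training data is synthetic, only three of the features listed above are used, and the resulting price is therefore not the lab's answer. The point is the order of operations: log, scale with the training means and standard deviations, predict, then undo the scaling and the log.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data (synthetic, made-up relationship)
rng = np.random.default_rng(1)
n = 500
raw = pd.DataFrame({
    'LotArea': rng.lognormal(9.0, 0.5, n),
    'GrLivArea': rng.lognormal(7.3, 0.3, n),
    'Street': rng.choice(['Grvl', 'Pave'], size=n, p=[0.1, 0.9]),
})
raw['SalePrice'] = np.exp(5.0 + 0.2 * np.log(raw['LotArea'])
                          + 0.7 * np.log(raw['GrLivArea'])
                          + rng.normal(0, 0.1, n))

# Preprocess: log, then standardize, keeping the statistics for later
logged = np.log(raw[['LotArea', 'GrLivArea', 'SalePrice']])
means, stds = logged.mean(), logged.std()
scaled = (logged - means) / stds
dummies = pd.get_dummies(raw[['Street']], drop_first=True, dtype=int)

X = pd.concat([scaled.drop('SalePrice', axis=1), dummies], axis=1)
y = scaled['SalePrice']
model = LinearRegression().fit(X, y)

# New observation in raw units (values taken from the list above)
new_house = {'LotArea': 14977, 'GrLivArea': 1976, 'Street': 'Pave'}

# Build a one-row frame with the training columns, all zeros to start
new_row = pd.DataFrame(0.0, index=[0], columns=X.columns)
for col in ['LotArea', 'GrLivArea']:
    # Same log and scaling as the training data, using the training statistics
    new_row[col] = (np.log(new_house[col]) - means[col]) / stds[col]

# Switch on the matching dummy; a dropped reference level stays all zeros
dummy_name = 'Street_' + new_house['Street']
if dummy_name in new_row.columns:
    new_row[dummy_name] = 1.0

# The prediction is on the standardized log scale: undo scaling, then the log
pred_scaled = model.predict(new_row)[0]
predicted_price = np.exp(pred_scaled * stds['SalePrice'] + means['SalePrice'])
print(round(predicted_price))
```

Forgetting the final back-transformation is a common slip: the raw model output is a standardized log price, not a dollar amount.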

Summary

Congratulations! You preprocessed the Ames Housing data using log transformation, standardization, and one-hot encoding. You also fit your first multiple linear regression model on the Ames Housing data using statsmodels and scikit-learn!
