Predictive Analytics: Linear Regression and Data Cleanup

k-bosko/new_store_location

Table of Contents

  1. Installation
  2. Description
  3. Data
  4. File Descriptions
  5. Results
  6. Acknowledgements

Installation

You will need software installed that can run and execute an iPython (Jupyter) Notebook.

Description

The business problem was formulated as follows:

Pawdacity is a leading pet store chain in Wyoming with 13 stores throughout the state. This year, Pawdacity would like to expand and open a 14th store. Your manager has asked you to perform an analysis to recommend the city for Pawdacity’s newest store, based on predicted yearly sales.

Data

  • p2-2010-pawdacity-monthly-sales.csv - This file contains all of the monthly sales for all Pawdacity stores for 2010.
  • p2-partially-parsed-wy-web-scrape.csv - This is a partially parsed data file that can be used for population numbers.
  • p2-wy-453910-naics-data.csv - This file contains NAICS data on the sales of all competitor stores, where total sales equals 12 months of sales.
  • p2-wy-demographic-data.csv - This file contains demographic data for each city and county in Wyoming.

File Descriptions

You can find the results of the analysis either in HTML form or as a complete Jupyter Notebook:

Alternatively, run one of the following commands in a terminal after navigating to the top-level project directory new_store_location (the one that contains this README):

ipython notebook Pawdacity_New_Store_Location.ipynb

or

jupyter notebook Pawdacity_New_Store_Location.ipynb

This will open the iPython Notebook software and project file in your browser.

Results

To identify the new city location based on potential sales, I performed the following steps:

  • Step 1: Preprocessing

    • aggregated sales data from months to years
    • cleaned and joined census data with sales data and demographics data
    • dealt with outliers
  • Step 2: Predictive Modeling

    • checked corr() between predictor variables to detect multicollinearity
    • removed certain features based on their variance_inflation_factor()
    • fitted an OLS() regression and removed predictors that were not statistically significant
    • predicted sales for new cities with the final linear regression model
    • filtered data according to provided criteria for the new city
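
The Step 1 preprocessing can be sketched roughly as follows. This is a minimal illustration on made-up toy data, not the notebook's actual code: the column names (CITY, TOTAL_SALES, POPULATION), the numbers, and the helper iqr_outliers are all hypothetical, and the 1.5 × IQR rule is only one common way to flag outliers, assumed here because the steps above do not name the method used.

```python
import pandas as pd

# Toy monthly sales table. Column names and values are invented for
# illustration and are NOT the headers of the real CSV files.
monthly = pd.DataFrame({
    "CITY": ["City A", "City B", "City C", "City D", "City E"],
    "January": [150, 150, 160, 160, 600],
    "February": [150, 160, 160, 170, 600],
})

# Toy demographics table keyed by the same (hypothetical) city column.
demographics = pd.DataFrame({
    "CITY": ["City A", "City B", "City C", "City D", "City E"],
    "POPULATION": [1000, 1100, 1200, 1300, 1400],
})

# Aggregate sales from months to one total per city.
month_cols = [c for c in monthly.columns if c != "CITY"]
yearly = monthly[["CITY"]].copy()
yearly["TOTAL_SALES"] = monthly[month_cols].sum(axis=1)

# Join the aggregated sales with the demographics data.
data = yearly.merge(demographics, on="CITY", how="inner")


def iqr_outliers(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Mark outlying cities so they can be inspected before modeling.
data["SALES_OUTLIER"] = iqr_outliers(data["TOTAL_SALES"])
print(data)
```

Flagging rather than dropping outliers keeps the decision visible: with only 13 stores, whether to remove a city is a judgment call worth making explicitly.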

Acknowledgements

Having started learning Python, I decided to rewrite the project I first completed in Alteryx within the Predictive Analytics for Business Nanodegree at Udacity.
