- Python 3.7.2
- Libraries:
You will also need software installed to run an iPython Notebook.
The business problem was formulated as follows:
Pawdacity is a leading pet store chain in Wyoming with 13 stores throughout the state. This year, Pawdacity would like to expand and open a 14th store. Your manager has asked you to perform an analysis to recommend the city for Pawdacity’s newest store, based on predicted yearly sales.
p2-2010-pawdacity-monthly-sales.csv
- This file contains all of the monthly sales for all Pawdacity stores for 2010.
p2-partially-parsed-wy-web-scrape.csv
- This is a partially parsed data file that can be used for population numbers.
p2-wy-453910-naics-data.csv
- NAICS data on the sales of all competitor stores, where total sales is equal to 12 months of sales.
p2-wy-demographic-data.csv
- This file contains demographic data for each city and county in Wyoming.
You can find the results of the analysis either in HTML form or as a complete Jupyter Notebook:
Alternatively, run one of the following commands in a terminal after navigating to the top-level project directory new_store_location (which contains this README):
ipython notebook Pawdacity_New_Store_Location.ipynb
or
jupyter notebook Pawdacity_New_Store_Location.ipynb
This will open the iPython Notebook software and project file in your browser.
To identify the new city location based on potential sales, I performed the following steps:
Step 1: Preprocessing
- aggregated sales data from months to years
- cleaned and joined census data with sales data and demographics data
- dealt with outliers
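The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the notebook's actual code: the column names (`CITY`, `Total_Sales`, `Population`), the toy values, and the 1.5 x IQR rule for spotting outliers are all assumptions, since this README does not say how outliers were handled.

```python
import pandas as pd

# Hypothetical stand-ins for the monthly sales and demographic CSV files;
# the real files have more columns and different values.
monthly_sales = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E"],
    "January": [150, 155, 160, 165, 500],
    "February": [150, 155, 160, 165, 500],
})
demographics = pd.DataFrame({
    "CITY": ["A", "B", "C", "D", "E", "F"],
    "Population": [10, 12, 14, 16, 18, 20],
})

# Aggregate the monthly columns into one total per city.
month_cols = [c for c in monthly_sales.columns if c != "CITY"]
yearly = monthly_sales[["CITY"]].copy()
yearly["Total_Sales"] = monthly_sales[month_cols].sum(axis=1)

# Join sales with demographics; the inner join keeps only cities present in both.
merged = yearly.merge(demographics, on="CITY", how="inner")

# Flag outliers with the 1.5 * IQR fences (an assumed rule, see above).
q1, q3 = merged["Total_Sales"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
merged["is_outlier"] = ~merged["Total_Sales"].between(lower, upper)

print(merged)
```

Flagging rather than dropping keeps the decision visible: whether an outlying city is removed or kept is a judgment call that depends on the data.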
Step 2: Predictive Modeling
- checked for corr() between predictor variables to avoid multicollinearity
- removed certain features based on their variance_inflation_factor()
- calculated OLS() regression and removed predictors that were not statistically significant
- predicted sales for new cities with the final linear regression model
- filtered data according to provided criteria for the new city
Having started learning Python, I decided to rewrite the project I first completed in Alteryx within the Predictive Analytics for Business Nanodegree at Udacity.