| Variable | Description |
|---|---|
|'beds'| Number of bedrooms in home |
|'baths'| Number of bathrooms in home including fractional bathrooms|
|'sqft'| Calculated total finished living area of the home |
|'fips'| Federal Information Processing Standard (FIPS) county code; see https://en.wikipedia.org/wiki/FIPS_county_code for more details |
|'year'| Year the principal residence was built |
|'taxes'|The total property tax assessed for that assessment year|
|'property_value'|The total tax assessed value of the parcel|

# Regression Project

### codeup/innis - 2020 mar 30

---
 
## Table of Contents

--- 
## I. Objective : 

"We want to be able to predict the property values ('taxvaluedollarcnt') of Single Family Properties that had a transaction during 2017."  
> https://ds.codeup.com/regression/project/

- a.k.a: eliminate the zestimate
- a.k.a: zestimate don't rate
- a.k.a: "zestimate", more like, 'let me rest, mate' (because their model's performance is snoozing on the job compared to ours ='P ) 
---

## II. Dataset : Zillow  

- ### Description: 

	properties_2017.csv - all the properties with their home features for 2017 (released on 10/2/2017)

- ### Profile :

	"Zillow’s Zestimate home valuation has shaken up the U.S. real estate industry since first released 11 years ago.

	A home is often the largest and most expensive purchase a person makes in his or her lifetime. Ensuring homeowners have a trusted way to monitor this asset is incredibly important. The Zestimate was created to give consumers as much information as possible about homes and the housing market, marking the first time consumers had access to this type of home value information at no cost.

	“Zestimates” are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning."

	> https://www.kaggle.com/competitions/zillow-prize-1/data

---

## III. INITIAL QUESTIONS:

### Data-Focused Questions

- [x] Why do some properties have a much higher value than others when they are located so close to each other? 
- [] Does square footage affect property value? 
- [] Does the number of baths affect property value?
- [] Does the number of beds affect property value?
- [] What is the optimal ratio of beds to baths?
- [] Does location affect property value?
- [] Does year of construction affect property value?

 
### Overall Project-Focused Questions

- What will the end product look like?
- What format will it be in?
- Who will it be delivered to?
- How will it be used?
- How will I know I'm done?
- What is my MVP?
- How will I know it's good enough?
 
---

## IV. FORMULATING HYPOTHESES
My initial hypothesis was that square footage affects property value. Further investigation found this to be true. Other features were identified as drivers as well: beds, baths, location, and year built. 
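
One way to check a hypothesis like this is a Pearson correlation test between `sqft` and `property_value`. The sketch below is illustrative only: the helper name `correlation_test` is hypothetical, and the data is a synthetic stand-in for the train split, not the project's actual results.

```python
import numpy as np
import pandas as pd
from scipy import stats


def correlation_test(df, feature, target="property_value", alpha=0.05):
    """Run a Pearson correlation test between one feature and the target."""
    r, p = stats.pearsonr(df[feature], df[target])
    return {"r": float(r), "p": float(p), "reject_null": bool(p < alpha)}


# Synthetic stand-in data (an assumption for illustration, not the Zillow data).
rng = np.random.default_rng(42)
sqft = rng.uniform(800, 4000, size=500)
demo = pd.DataFrame({
    "sqft": sqft,
    "property_value": 150 * sqft + rng.normal(0, 50_000, size=500),
})

result = correlation_test(demo, "sqft")
print(result)
```

Rejecting the null hypothesis here only indicates a linear relationship; it does not by itself show that square footage causes higher values.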

---
 
#### 5. DELIVERABLES:
- [] Github Repo - containing a final report (.ipynb), acquire & prepare Modules (.py), other supplemental artifacts created while working on the project (e.g. exploratory/modeling notebook(s)).
- [] README file - provides an overview of the project and steps for project reproduction. 
- [] Draft Jupyter Notebooks - provide all steps taken to produce the project.
- [] Python Module File - provides reproducible code for acquiring,  preparing, exploring, & testing the data.
- [] acquire.py - used to acquire data
- [] prepare.py - used to prepare data
- [] Report Jupyter Notebook - provides final presentation-ready assessment and recommendations.
- [] 5 minute presentation to stakeholders (Zillow Data Science Team)
 
 
## II. PROJECT DATA CONTEXT
 
#### 1. DATA DICTIONARY:
The final DataFrame used to explore the data for this project contains the following variables (columns), defined below:
 
 
| Variable          | Definition                                         |
|:------------------|:--------------------------------------------------:|
| beds          | number of bedrooms in the home                     |
| baths         | number of bathrooms and half-bathrooms in home     |
| fips              | federal information processing standards code      |
| property_id       | unique identifier for each property                |
| sqft       | total finished living area of the home             |
| taxes    | property taxes based on assessed value in USD      |
| property_value   | total tax assessed value of the property           |

## III. PROJECT PLAN - USING THE DATA SCIENCE PIPELINE:
I'll use the following procedure: 
 
Plan ➜ Acquire ➜ Prepare ➜ Explore ➜ Model & Evaluate ➜ Deliver
 
#### 1. PLAN

 
#### 2. ACQUIRE
- []  Create acquire.py module
- []  Store functions needed to acquire the Zillow dataset from MySQL
- []  Ensure all imports needed to run the functions are inside the acquire.py document
- []  Using Jupyter Notebook
- []  Run all required imports
- []  Import functions from acquire.py module
- []  Summarize dataset using methods and document observations
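
A minimal sketch of what an acquire function could look like, assuming a local CSV cache and a SQLAlchemy-style connection URL. The function name `get_zillow_data`, the cache file name, and the table name in the query are assumptions; the column names follow the Kaggle Zillow data dictionary.

```python
import os

import pandas as pd

# Hypothetical query: the table name and the lack of filters are assumptions.
ZILLOW_QUERY = """
SELECT bedroomcnt, bathroomcnt, calculatedfinishedsquarefeet,
       fips, yearbuilt, taxamount, taxvaluedollarcnt
FROM properties_2017
"""


def get_zillow_data(db_url=None, cache_path="zillow.csv", query=ZILLOW_QUERY):
    """Return the Zillow data, reading a local CSV cache when one exists."""
    if os.path.isfile(cache_path):
        return pd.read_csv(cache_path)
    if db_url is None:
        raise ValueError("No cached file found; a database URL is required.")
    # Reading from a URL string requires SQLAlchemy and a MySQL driver.
    df = pd.read_sql(query, db_url)
    df.to_csv(cache_path, index=False)
    return df
```

Caching the query result means the database is only hit on the first run, which keeps notebook restarts fast.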
 
#### 3. PREPARE
Using Python Scripting Program (VS Code)
- [] Create prepare.py module
- [] Store functions needed to prepare the Zillow data such as:
   - [] Split Function: to split data into train, validate, and test
   - [] Cleaning Function: to clean data for exploration
   - [] Encoding Function: to create numeric columns for object columns
   - [] Feature Engineering Function: to create new features
- [] Ensure all imports needed to run the functions are inside the prepare.py document

Using Jupyter Notebook
- [] Import functions from prepare.py module
- [] Summarize dataset using methods and document observations
- [] Clean data
- [] Features need to be turned into numbers
- [] Categorical features or discrete features need to be numbers that represent those categories
- [] Continuous features may need to be standardized to compare like datatypes
- [] Address missing values, data errors, unnecessary data, renaming
- [] Split data into train, validate, and test samples
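
The cleaning and split steps above could be sketched roughly as follows. The raw-to-clean column mapping, the function names, and the 60/20/20 proportions are assumptions for illustration, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed mapping from raw Zillow column names to the data dictionary names.
RENAME_MAP = {
    "bedroomcnt": "beds",
    "bathroomcnt": "baths",
    "calculatedfinishedsquarefeet": "sqft",
    "yearbuilt": "year",
    "taxamount": "taxes",
    "taxvaluedollarcnt": "property_value",
}


def clean_zillow(df):
    """Rename columns to the data dictionary names and drop rows with nulls."""
    return df.rename(columns=RENAME_MAP).dropna().reset_index(drop=True)


def split_data(df, test_size=0.2, validate_size=0.25, seed=123):
    """Split a DataFrame into train, validate, and test samples."""
    train_validate, test = train_test_split(df, test_size=test_size, random_state=seed)
    train, validate = train_test_split(
        train_validate, test_size=validate_size, random_state=seed
    )
    return train, validate, test


# Synthetic raw data with one missing value (illustration only).
raw = pd.DataFrame({
    "calculatedfinishedsquarefeet": np.arange(1001, dtype=float),
    "taxvaluedollarcnt": np.arange(1001, dtype=float) * 150,
})
raw.loc[0, "taxvaluedollarcnt"] = np.nan
clean = clean_zillow(raw)
train, validate, test = split_data(clean)
print(train.shape, validate.shape, test.shape)
```

Dropping every row with a null is the bluntest option; imputing or dropping sparse columns first may keep more data.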
 
#### 4. EXPLORE
Using Jupyter Notebook:
- [] Answer key questions about hypotheses and find drivers of property value
  - Run at least two statistical tests
  - Document findings
- [] Create visualizations with intent to discover variable relationships
  - Identify variables related to property value
  - Identify any potential data integrity issues
- [] Summarize conclusions, provide clear answers, and summarize takeaways
  - Explain plan of action as deduced from work to this point
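
As one example of a statistical test for this stage, the sketch below uses a one-way ANOVA to ask whether mean property value differs by county (`fips`). The helper name `compare_value_by_group` is hypothetical, the FIPS codes are example values, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_value_by_group(df, group_col="fips", target="property_value", alpha=0.05):
    """One-way ANOVA: does the mean target differ across groups?"""
    samples = [grp[target].to_numpy() for _, grp in df.groupby(group_col)]
    f_stat, p = stats.f_oneway(*samples)
    return {"f": float(f_stat), "p": float(p), "reject_null": bool(p < alpha)}


# Synthetic stand-in data with three example county codes (illustration only).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "fips": np.repeat([6037, 6059, 6111], 300),
    "property_value": np.concatenate([
        rng.normal(400_000, 100_000, 300),
        rng.normal(550_000, 100_000, 300),
        rng.normal(450_000, 100_000, 300),
    ]),
})

result = compare_value_by_group(demo)
print(result)
```

ANOVA assumes roughly normal groups with similar variances. Real property values tend to be right-skewed, so a rank-based test such as `scipy.stats.kruskal` may be a safer choice.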
 
#### 5. MODEL & EVALUATE
Using Jupyter Notebook:
- [] Establish a baseline (for a regression target, an error metric such as RMSE rather than accuracy)
- [] Train and fit multiple (3+) models with varying algorithms and/or hyperparameters
- [] Compare evaluation metrics across models
- [] Remove unnecessary features
- [] Evaluate best performing models using validate set
- [] Choose best performing validation model for use on test set
- [] Test final model on out-of-sample testing dataset
- [] Summarize performance
- [] Interpret and document findings
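
A minimal sketch of the baseline comparison, assuming RMSE as the metric and a mean-prediction baseline. The function names are hypothetical, plain linear regression is only one of the candidate models, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def evaluate_against_baseline(train, validate, features, target="property_value"):
    """Compare a linear regression to a predict-the-train-mean baseline."""
    baseline_pred = np.full(len(validate), train[target].mean())
    model = LinearRegression().fit(train[features], train[target])
    return {
        "baseline_rmse": rmse(validate[target], baseline_pred),
        "model_rmse": rmse(validate[target], model.predict(validate[features])),
    }


# Synthetic stand-in data (illustration only).
rng = np.random.default_rng(7)
sqft = rng.uniform(800, 4000, size=800)
beds = rng.integers(1, 6, size=800)
demo = pd.DataFrame({
    "sqft": sqft,
    "beds": beds,
    "property_value": 150 * sqft + 10_000 * beds + rng.normal(0, 50_000, size=800),
})
train, validate = demo.iloc[:600], demo.iloc[600:]
scores = evaluate_against_baseline(train, validate, features=["sqft", "beds"])
print(scores)
```

A model is only worth carrying forward to the test set if its validate RMSE beats the baseline.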
 
#### 6. DELIVERY
- [] Prepare five minute presentation using Jupyter Notebook
- [] Include introduction of project and goals
- [] Provide executive summary of findings, key takeaways, and recommendations
- [] Create walk through of analysis 
  - Visualize relationships
  - Document takeaways
  - Explicitly define questions asked during initial analysis
- [] Provide final takeaways, recommend course of action, and next steps
- [] Be prepared to answer questions following presentation

 
 
## IV. PROJECT MODULES:
- [] Python Module Files - provide reproducible code for acquiring,  preparing, exploring, & testing the data.
   - [] acquire.py - used to acquire data
   - [] prepare.py - used to prepare data
 
  
## V. PROJECT REPRODUCTION:
### Steps to Reproduce
 
- [] You will need an env.py file that contains the hostname, username, and password of the MySQL server that hosts the Zillow database
- [] Store that env file locally in the repository
- [] Make .gitignore and confirm .gitignore is hiding your env.py file
- [] Clone my repo (including the acquire.py and prepare.py)
- [] Import python libraries:  pandas, matplotlib, seaborn, numpy, and sklearn
- [] Follow the steps outlined in README.md and Churn_Work.ipynb
- [] Run Zillow_Report.ipynb to view the final product
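
For reference, an env.py along these lines would satisfy the first step. The variable names, the `get_db_url` helper, the `pymysql` driver, and the database name `zillow` are all assumptions; replace the placeholder values with your own credentials.

```python
# env.py (keep this file out of version control via .gitignore)
host = "your.database.host"   # placeholder
username = "your_username"    # placeholder
password = "your_password"    # placeholder


def get_db_url(database, user=username, pwd=password, hostname=host):
    """Build a SQLAlchemy-style MySQL connection URL (assumes the pymysql driver)."""
    return f"mysql+pymysql://{user}:{pwd}@{hostname}/{database}"


# Example: the database name "zillow" is an assumption.
url = get_db_url("zillow")
```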
