# Regression
## Acquisition and Preparation Exercises
***

##### Let's set up an example scenario as perspective for our regression exercises using the Zillow dataset:

As a Codeup data science graduate, you want to show off your skills to the Zillow data science team in hopes of getting an interview for a position you saw pop up on LinkedIn. _You thought it might look impressive to build an end-to-end project in which you use some of their Kaggle data to predict property values using some of their available features_; who knows, you might even do some feature engineering to blow them away.  
    
         Your goal is to predict the values of single-unit properties using the observations from 2017.
***
_In these exercises, you will complete the first step toward the above goal: acquire and prepare the necessary Zillow data from the zillow database on the Codeup database server._
***
1. Acquire bedroomcnt, bathroomcnt, calculatedfinishedsquarefeet, taxvaluedollarcnt, yearbuilt, taxamount, and fips from the zillow database for all 'Single Family Residential' properties.
***
2. Using your acquired Zillow data, walk through the summarization and cleaning steps in your wrangle.ipynb file like we did above. You may handle the missing values however you feel is appropriate and meaningful; remember to document your process and decisions using markdown and code comments where helpful.
***
3. Store all of the functions needed to automate your process, from acquiring the data to returning a cleaned dataframe with no missing values, in your wrangle.py file. Name your final function wrangle_zillow.
***
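One possible shape for the wrangle.py pipeline described in exercise 3 is sketched below. The names `clean_zillow` and `fake_acquire` are hypothetical, and dropping every row with a missing value is only one of the handling strategies the exercise allows; in the real file, the `acquire` argument would be the SQL/CSV acquisition function.

```python
import numpy as np
import pandas as pd


def clean_zillow(df):
    """Return a copy of df without any rows that contain a missing value."""
    return df.dropna().reset_index(drop=True)


def wrangle_zillow(acquire):
    """Acquire the data with the given zero-argument callable, then clean it."""
    return clean_zillow(acquire())


# Stand-in for the real acquisition step so the sketch runs without a database
def fake_acquire():
    return pd.DataFrame({
        "bedroomcnt": [0.0, 4.0, 3.0],
        "calculatedfinishedsquarefeet": [np.nan, 3633.0, 1620.0],
    })


clean = wrangle_zillow(fake_acquire)
```

Passing the acquisition step in as an argument keeps the cleaning logic testable without a database connection.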

In [2]:
from env import host, username, password, get_db_url
import os
import pandas as pd 
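The `env` module imported above is not shown in this notebook. As a rough sketch of what its `get_db_url` helper might look like, assuming a MySQL server reached through the `pymysql` driver (the driver, the URL format, and the placeholder credentials are all assumptions, not the contents of the actual env.py):

```python
# Hypothetical env.py; the values below are placeholders, not real credentials.
host = "db.example.com"
username = "some_user"
password = "some_password"


def get_db_url(db_name, user=username, pw=password, hostname=host):
    """Build a SQLAlchemy-style connection URL for the named database."""
    return f"mysql+pymysql://{user}:{pw}@{hostname}/{db_name}"
```

A file like this holds credentials, so it is normally kept out of version control.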

# 1.) 
- Acquire from the zillow database for all 'Single Family Residential' properties (2017) the following:
   - bedroomcnt
   - bathroomcnt
   - calculatedfinishedsquarefeet
   - taxvaluedollarcnt
   - yearbuilt
   - taxamount
   - fips

In [5]:
# The query needs to support predicting the values of single-unit properties using observations from 2017.
# 'Single Family Residential' has propertylandusetypeid 261; this must be what is meant by "single unit",
# unless 279, 'Inferred Single Family Residential', is relevant as well.
# Inferred: deduced or concluded (information) from evidence and reasoning rather than from explicit statements.
# 279 contains a single value, so it should be fine to leave out.

# propertylandusetypeid is the foreign key for joining propertylandusetype.

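def acquire_zillow_data(use_cache=True):
    '''Return the Zillow single family data, reading zillow.csv when a cached copy exists.'''
    # Reuse the local CSV unless the caller asks for a fresh pull
    if os.path.exists('zillow.csv') and use_cache:
        print('Using cached CSV')
        return pd.read_csv('zillow.csv')
    print('Acquiring data from SQL database')
    # 261 is the propertylandusetypeid for 'Single Family Residential'
    df = pd.read_sql('''
                    SELECT bedroomcnt, bathroomcnt, calculatedfinishedsquarefeet,
                    taxvaluedollarcnt, yearbuilt, taxamount, fips
                        FROM properties_2017
                        JOIN propertylandusetype USING(propertylandusetypeid)
                        WHERE propertylandusetypeid = 261
                     '''
                    , get_db_url('zillow'))
    # Cache the result without the index so no extra column appears on reload
    df.to_csv('zillow.csv', index=False)
    return df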

In [9]:
df = acquire_zillow_data()

Using cached CSV


In [10]:
df.head()

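| Unnamed: 0 | bedroomcnt | bathroomcnt | calculatedfinishedsquarefeet | taxvaluedollarcnt | yearbuilt | taxamount | fips |
|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | NaN | 27516.0 | NaN | NaN | 6037.0 |
| 1 | 0.0 | 0.0 | NaN | 10.0 | NaN | NaN | 6037.0 |
| 2 | 0.0 | 0.0 | NaN | 10.0 | NaN | NaN | 6037.0 |
| 3 | 0.0 | 0.0 | NaN | 2108.0 | NaN | 174.21 | 6037.0 |
| 4 | 4.0 | 2.0 | 3633.0 | 296425.0 | 2005.0 | 6941.39 | 6037.0 |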
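The `Unnamed: 0` column in this output suggests the cached zillow.csv was written with its index at some point. The sketch below uses a toy dataframe and an in-memory buffer (both stand-ins, not the real cache file) to show how that column arises and one way to avoid it when reading.

```python
import io

import pandas as pd

toy = pd.DataFrame({"bedroomcnt": [0.0, 4.0], "fips": [6037.0, 6037.0]})

# to_csv writes the index by default, producing a header with an empty first field
buffer = io.StringIO()
toy.to_csv(buffer)

# Reading it back naively turns the saved index into an 'Unnamed: 0' column
buffer.seek(0)
with_index = pd.read_csv(buffer)

# Treating the first column as the index on read removes the stray column
buffer.seek(0)
without_index = pd.read_csv(buffer, index_col=0)
```

Alternatively, calling `acquire_zillow_data(use_cache=False)` once rewrites the cache with `index=False`, so later reads come back clean.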


##### Nailed it. Moving on.

# 2.)
Using your acquired Zillow data, walk through the summarization and cleaning steps in your wrangle.ipynb file like we did above. You may handle the missing values however you feel is appropriate and meaningful; remember to document your process and decisions using markdown and code comments where helpful.
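A first pass at the summarization step might look like the sketch below. It rebuilds the five rows shown by `df.head()` earlier (leaving out the stray `Unnamed: 0` column) so it runs without the database; those rows are not representative of the full dataset, and dropping nulls is only one of the options the exercise permits.

```python
import numpy as np
import pandas as pd

# The five rows from the df.head() output, used here as a small stand-in
sample = pd.DataFrame({
    "bedroomcnt": [0.0, 0.0, 0.0, 0.0, 4.0],
    "bathroomcnt": [0.0, 0.0, 0.0, 0.0, 2.0],
    "calculatedfinishedsquarefeet": [np.nan, np.nan, np.nan, np.nan, 3633.0],
    "taxvaluedollarcnt": [27516.0, 10.0, 10.0, 2108.0, 296425.0],
    "yearbuilt": [np.nan, np.nan, np.nan, np.nan, 2005.0],
    "taxamount": [np.nan, np.nan, np.nan, 174.21, 6941.39],
    "fips": [6037.0] * 5,
})

# Count and share of missing values per column
missing_counts = sample.isnull().sum()
missing_share = sample.isnull().mean()

# One simple strategy: drop every row that has any missing value
cleaned = sample.dropna()
rows_dropped = len(sample) - len(cleaned)
```

On the full dataframe, comparing `rows_dropped` with `len(df)` is a quick check on whether dropping rows costs too much data before committing to that approach.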