### Our Zillow scenario continues:

#### As a Codeup data science graduate, you want to show off your skills to the Zillow data science team in hopes of getting an interview for a position you saw pop up on LinkedIn. You thought it might look impressive to build an end-to-end project in which you use some of their Kaggle data to predict property values using some of their available features; who knows, you might even do some feature engineering to blow them away. Your goal is to predict the values of single unit properties using the observations from 2017.

#### In these exercises, you will run through the stages of exploration as you continue to work toward the above goal.

### 1. As with encoded vs. unencoded data, we recommend exploring un-scaled data in your EDA process.

Note: This notebook is dedicated to exploring data for the final Zillow regression project.
Meaningful columns for data acquisition:

#### Acquire

In [None]:
'bathroomcnt'
'bedroomcnt'
'calculatedfinishedsquarefeet'
'fips'
'lotsizesquarefeet' # maybe?
'rawcensustractandblock' # maybe?
'regionidcounty'
'yearbuilt'
'taxvaluedollarcnt'
'taxamount'

In [2]:
# Importing libraries

import pandas as pd
import numpy as np
import requests
import os
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

from sklearn.model_selection import train_test_split
from scipy.stats import pearsonr, spearmanr

import env

In [18]:
# Getting connection to MySQL database, and acquiring data

def get_connection(db, user=env.user, host=env.host, password=env.password):
    return f'mysql+pymysql://{user}:{password}@{host}/{db}'

# Loading raw data from Zillow database
def new_zillow_data():
    '''
    This function reads the Zillow data from the MySQL database into a df.
    '''
    # Create SQL query.
    sql_query = '''
    SELECT bedroomcnt AS bedrooms,
           bathroomcnt AS bathrooms,
           calculatedfinishedsquarefeet AS square_feet,
           fips AS fips_code,
           regionidcounty AS county_id,
           yearbuilt AS year_built,
           taxvaluedollarcnt AS assessed_value,
           taxamount AS tax_amount
    FROM properties_2017 AS p
    JOIN predictions_2017 AS pred USING (parcelid)
    JOIN propertylandusetype AS ptype USING (propertylandusetypeid)
    WHERE ptype.propertylandusedesc LIKE '%%Single%%'
      AND pred.transactiondate LIKE '2017%%';
    '''
    # Read in DataFrame from Codeup db.
    df = pd.read_sql(sql_query, get_connection('zillow'))
    
    return df

In [19]:
# load raw, messy data

rawdf = new_zillow_data()

In [20]:
rawdf.shape

(52441, 8)

#### Prepare
Create functions to clean data and split data into train, validate, test.

In [21]:
rawdf.isnull().sum()

bedrooms            0
bathrooms           0
square_feet        82
fips_code           0
county_id           0
year_built        116
assessed_value      1
tax_amount          4
dtype: int64

In [22]:
# Dropping null values

df = rawdf.dropna()

In [23]:
df.shape

(52315, 8)
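
The cleaning step above could be wrapped in a reusable function along these lines. This is a minimal sketch: the name `clean_zillow` is hypothetical, and casting `fips_code` and `year_built` to integers is an assumption about how those columns should be stored, not something the notebook prescribes.

```python
import pandas as pd


def clean_zillow(df):
    """Return a cleaned copy of the raw Zillow DataFrame.

    Rows with any null value are dropped, matching the dropna() call
    in the notebook. The integer casts are an assumption.
    """
    df = df.dropna().copy()
    # Assumption: these columns hold whole numbers, so store them as ints.
    for col in ("fips_code", "year_built"):
        if col in df.columns:
            df[col] = df[col].astype(int)
    return df
```

Calling `df = clean_zillow(rawdf)` would then replace the separate `dropna` cell, and the original `rawdf` stays untouched because the function works on a copy.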

### 2. Make sure to perform a train, validate, test split before exploring, and use only your train dataset to explore the relationships among independent variables and between independent variables and your target variable.
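
One way to do the split is to call `train_test_split` twice. The sketch below is an illustration only: the function name `split_data`, the split proportions, and the seed are all assumptions rather than values given in the exercise.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df, test_size=0.2, validate_size=0.3, seed=123):
    """Split df into train, validate, and test DataFrames.

    test_size is the fraction of all rows held out for test.
    validate_size is the fraction of the remaining rows held out
    for validate. Both defaults are assumptions.
    """
    # First carve off the test set.
    train_validate, test = train_test_split(
        df, test_size=test_size, random_state=seed
    )
    # Then split what is left into train and validate.
    train, validate = train_test_split(
        train_validate, test_size=validate_size, random_state=seed
    )
    return train, validate, test
```

With `train, validate, test = split_data(df)`, all exploration that follows should use `train` only, so that nothing learned from validate or test leaks into modeling decisions.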

### 3. Write a function named plot_variable_pairs that accepts a dataframe as input and plots all of the pairwise relationships along with the regression line for each pair.

### 4. Write a function named plot_categorical_and_continuous_vars that accepts your dataframe and the name of the columns that hold the continuous and categorical features and outputs 3 different plots for visualizing a categorical variable and a continuous variable.
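
The sketch below is one way to write this function. The choice of a box plot, a violin plot, and a bar plot is an assumption; the exercise only asks for three different plots, so any three that show a continuous variable across categories would do.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_categorical_and_continuous_vars(df, cat_var, cont_var):
    """Draw three views of a continuous variable grouped by a categorical one.

    Returns the matplotlib Figure holding the three side-by-side plots.
    """
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    sns.boxplot(data=df, x=cat_var, y=cont_var, ax=axes[0])
    axes[0].set_title("Box plot")

    sns.violinplot(data=df, x=cat_var, y=cont_var, ax=axes[1])
    axes[1].set_title("Violin plot")

    # barplot shows the mean of cont_var per category by default
    sns.barplot(data=df, x=cat_var, y=cont_var, ax=axes[2])
    axes[2].set_title("Bar plot of means")

    fig.tight_layout()
    return fig
```

For example, `plot_categorical_and_continuous_vars(train, "fips_code", "assessed_value")` would compare assessed values across the county codes in the train set.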

### 5. Save the functions you have written to create visualizations in your explore.py file. Rewrite your notebook code so that you are using the functions imported from this file.

### 6. Use the functions you created above to explore your Zillow train dataset in your explore.ipynb notebook.

### 7. Come up with some initial hypotheses based on your goal of predicting property value.

### 8. Visualize all combinations of variables in some way.

### 9. Run the appropriate statistical tests where needed.
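
For continuous features, a correlation test against the target is a common starting point. The helper below is a hedged sketch: the name `correlation_tests`, the significance level of 0.05, and the decision to base the reject flag on the Spearman p-value are all assumptions. Pearson's r measures linear association, while Spearman's rho measures monotonic association and may be the more suitable choice if the values turn out to be skewed.

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr


def correlation_tests(df, target, features, alpha=0.05):
    """Run Pearson and Spearman correlation tests of each feature against target.

    Null hypothesis for each test: no correlation between feature and target.
    Returns one row per feature with both statistics and p-values.
    """
    rows = []
    for col in features:
        r, p_r = pearsonr(df[col], df[target])
        rho, p_rho = spearmanr(df[col], df[target])
        rows.append({
            "feature": col,
            "pearson_r": r,
            "pearson_p": p_r,
            "spearman_rho": rho,
            "spearman_p": p_rho,
            # Assumption: the reject decision uses the Spearman p-value.
            "reject_null": bool(p_rho < alpha),
        })
    return pd.DataFrame(rows).set_index("feature")
```

A call such as `correlation_tests(train, "assessed_value", ["bedrooms", "bathrooms", "square_feet", "year_built"])` gives a single table to refer to when writing up takeaways.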

### 10. What independent variables are correlated with the dependent variable, home value?

### 11. Which independent variables are correlated with other independent variables (bedrooms, bathrooms, year built, square feet)?
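
A correlation heatmap can help answer both this question and the previous one at a glance. The following is a sketch under assumptions: the function name `plot_correlation_heatmap` is hypothetical, and defaulting to Spearman correlation is a choice made here, not a requirement of the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


def plot_correlation_heatmap(df, method="spearman"):
    """Plot a heatmap of pairwise correlations among numeric columns.

    Returns the correlation matrix so the numbers can be inspected directly.
    """
    corr = df.select_dtypes("number").corr(method=method)
    # Hide the upper triangle above the diagonal, since it mirrors the lower one.
    mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(
        corr, mask=mask, annot=True, fmt=".2f",
        cmap="coolwarm", vmin=-1, vmax=1, ax=ax,
    )
    ax.set_title(f"{method.title()} correlation (train only)")
    return corr
```

Reading along the `assessed_value` row shows how each feature relates to the target, while the remaining cells show how the independent variables relate to one another.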

### 12. Make sure to document your takeaways from visualizations and statistical tests as well as the decisions you make throughout your process.

### 13. Explore your dataset with any other visualizations you think will be helpful.