## ZILLOW PROJECT - FINAL REPORT

Project & Report Created By: Rachel Robbins-Mayhill 2022-03-29

---

## PROJECT DESCRIPTION

Zillow is the leading real estate and rental marketplace dedicated to empowering consumers with data, inspiration and knowledge around the place they call home, and connecting them with the best local professionals who can help. According to the National Association of Realtors, there are over 119 million homes in the United States, over 5 million of which are sold each year. 80% of these homes have been viewed on Zillow regardless of their market status.

Zillow serves the full lifecycle of owning and living in a home: buying, selling, renting, financing, remodeling and more. It starts with Zillow's living database of more than 110 million U.S. homes - including homes for sale, homes for rent and homes not currently on the market, as well as Zestimate home values, Rent Zestimates and other home-related information. 

The Zestimate is a key element driving web traffic to Zillow, where sellers, buyers, agents, and curiosity-seekers gain knowledge of a home's value. Over the years, Zillow has built a solid reputation around the Zestimate, which combines layers of data about a home's features and location to present buyers and sellers with an estimated value. Zillow publishes Zestimates for 104 million homes, updating them weekly.

Although Zillow already has a model to help predict a home's value, it is looking to fine-tune and improve that model. This project was requested by the Zillow Data Science Team.


### PROJECT GOAL

The goal of this project is to find key drivers of property value for Single Family Properties and to construct an improved Machine Learning Regression Model to predict property tax assessed values for these properties using the features of the properties. The improved model will help Zillow develop more accurate, dependable, and trustworthy Zestimates, thus sustaining and bolstering their loyal customer base. 

Upon completion of the model, the project will make recommendations on what does and doesn't impact property values and deliver the recommendations in a report to the Data Science team at Zillow, so they can understand the process that developed the conclusion and have the information available to replicate the findings. 


### INITIAL QUESTIONS

1. Is the square footage of a property a driver of property value when controlling for location?
2. Is the number of bedrooms and bathrooms a driver of property value when controlling for square footage?
3. Is square footage a driver of property value when controlling for bedrooms and bathrooms?
4. Is adding a bedroom more valuable than adding square footage?

---

Imports used for this project can be viewed in the imports.py file located in the Regression Project Repository.

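```python
# Project-wide imports (see imports.py) plus the local acquire and prepare modules
from imports import *
import acquire
import prepare
```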

## ACQUISITION & PREPARATION OF DATA

### I. Acquire the Data

The data for this report was acquired from the `zillow` database on the Codeup SQL server, using the following query:

```sql
SELECT bedroomcnt AS bedrooms,
       bathroomcnt AS bathrooms,
       calculatedfinishedsquarefeet AS square_feet,
       taxvaluedollarcnt AS assessed_value,
       yearbuilt AS year_built,
       taxamount AS tax_amount,
       fips AS state_county_code,
       regionidcounty AS county_id
FROM properties_2017
JOIN predictions_2017 USING (parcelid)
JOIN propertylandusetype USING (propertylandusetypeid)
WHERE propertylandusedesc IN ("Single Family Residential",
                              "Inferred Single Family Residential")
  AND predictions_2017.transactiondate LIKE '2017%%'
```

```python
# Acquire data from SQL using the module found in acquire.py
df = acquire.get_zillow_data()
# Obtain the number of rows and columns of the original DataFrame
df.shape
```

Output:

```
Reading from csv file...
(52441, 8)
```

- Once acquired, the data was loaded into a new table (DataFrame) containing all the necessary columns.
- The table consisted of 52,441 rows and 8 columns.
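The `Reading from csv file...` message suggests that `acquire.get_zillow_data` caches the query result in a local CSV file, so the database only needs to be queried once. The acquire module is not reproduced here, so the following is only a minimal sketch of that caching pattern: `get_cached_data` and its `fetch` argument are hypothetical names, and `fetch` stands in for whatever function actually runs the SQL query above.

```python
import os

import pandas as pd


def get_cached_data(csv_path, fetch):
    """Return cached data from csv_path, or fetch it and cache it first.

    `fetch` is a hypothetical zero-argument callable that runs the SQL query
    (for example, a wrapper around pandas.read_sql) and returns a DataFrame.
    """
    if os.path.isfile(csv_path):
        # A cached copy exists, so skip the database entirely
        print("Reading from csv file...")
        return pd.read_csv(csv_path)
    # No cache yet: run the query once and save the result for next time
    df = fetch()
    df.to_csv(csv_path, index=False)
    return df
```

For example, `df = get_cached_data("zillow.csv", run_zillow_query)` would query the database on the first call and read `zillow.csv` on every later call (`run_zillow_query` is hypothetical). Deleting the CSV forces a fresh pull.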

### II. Prepare the Data

The acquired table was then analyzed and adjusted to eliminate data errors, resolve ambiguities, and encode non-numeric data as more useful numeric types.
The data-correction strategies employed included:

- **Addressing missing data:** dropped 126 rows that contained missing values, a very small portion (about 0.24%) of the data
- **Dropping unnecessary columns**
- **Renaming columns**
- **Standardizing data types**
- **Creating categorical columns** for data visualization and analysis
- **Creating numeric data types for data stored as words**

The following columns needed to be transformed to numeric values:

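```python
# Clean and prepare the data using the function found in prepare.py
df = prepare.wrangle_zillow(df)
# Obtain the number of rows and columns of the cleaned DataFrame
df.shape
```

Output:

```
(52315, 14)
```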

- Upon completion of cleaning, the table had 52,315 rows and 14 columns.
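The `prepare.wrangle_zillow` function is not reproduced in this report. As a hedged illustration of the kinds of steps listed above, the sketch below drops rows with missing values, drops a column, casts a float column to integers, and turns the `state_county_code` (FIPS) column into a county-name column plus one-hot columns. `wrangle_sketch` is a hypothetical name, and dropping `county_id` is an assumption made only for illustration. The FIPS codes 6037, 6059, and 6111 are the standard codes for Los Angeles, Orange, and Ventura counties.

```python
import pandas as pd

# Standard FIPS codes for the three California counties in this data set
FIPS_TO_COUNTY = {6037: "Los Angeles", 6059: "Orange", 6111: "Ventura"}


def wrangle_sketch(df):
    """Hypothetical, simplified version of the cleaning steps listed above."""
    # Drop rows with any missing value
    df = df.dropna().copy()
    # Drop a column assumed (for illustration) to be redundant with the FIPS code
    df = df.drop(columns=["county_id"])
    # Whole-number values stored as floats become integers
    df["year_built"] = df["year_built"].astype(int)
    # Map the numeric FIPS code to a readable county name (categorical column)
    df["county"] = df["state_county_code"].astype(int).map(FIPS_TO_COUNTY)
    # One-hot encode county so models can use it as numeric features
    dummies = pd.get_dummies(df["county"], prefix="county", dtype=int)
    return pd.concat([df, dummies], axis=1)
```

`pd.get_dummies` only creates columns for counties that are actually present, so a sample missing one county yields one fewer dummy column.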


### Results of Preparing the Data

### Splitting the Data
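A common way to split cleaned data for regression, shown here as a sketch rather than the method used in this project, is to shuffle the rows once and carve them into train, validate, and test sets: models are fit on train, compared on validate, and evaluated once on test. `split_data`, the 60/20/20 proportions, and the seed are hypothetical defaults.

```python
import pandas as pd


def split_data(df, validate_size=0.2, test_size=0.2, seed=123):
    """Shuffle rows, then carve off test and validate sets; the rest is train."""
    # Shuffle all rows reproducibly
    shuffled = df.sample(frac=1, random_state=seed)
    n = len(shuffled)
    n_test = int(round(n * test_size))
    n_validate = int(round(n * validate_size))
    # Non-overlapping slices of the shuffled rows
    test = shuffled.iloc[:n_test]
    validate = shuffled.iloc[n_test:n_test + n_validate]
    train = shuffled.iloc[n_test + n_validate:]
    return train, validate, test
```

scikit-learn's `train_test_split`, applied twice, is a widely used alternative that gives the same kind of result.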

### Scaling the Data
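Features measured on very different ranges (square feet versus bedroom counts, for example) are often rescaled before modeling. A minimal sketch of min-max scaling, assuming the three splits are pandas DataFrames: the minimum and maximum are learned from train only and then applied to all three splits, so no information from validate or test leaks into the scaling. `min_max_scale` is a hypothetical helper; scikit-learn's `MinMaxScaler` performs the same transformation.

```python
import pandas as pd


def min_max_scale(train, validate, test, columns):
    """Scale columns to [0, 1] using the min and max learned from train only."""
    # Statistics come from train alone to avoid leaking validate/test information
    col_min = train[columns].min()
    col_range = train[columns].max() - col_min
    scaled = []
    for frame in (train, validate, test):
        out = frame.copy()
        out[columns] = (frame[columns] - col_min) / col_range
        scaled.append(out)
    return tuple(scaled)
```

Validate or test values outside the train range land below 0 or above 1, which is expected; a column that is constant in train would divide by zero and should be excluded.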

---

## DATA EXPLORATION - Data in Context

### Correlations
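A quick way to rank candidate drivers is to correlate each numeric column with the target, `assessed_value`. The helper below is a hypothetical sketch that uses Pearson correlation by default; `method="spearman"` can be passed instead when a relationship looks monotonic but not linear. Correlation alone does not control for other variables, which the initial questions ask about.

```python
import pandas as pd


def target_correlations(df, target="assessed_value", method="pearson"):
    """Correlation of every numeric column with the target, strongest first."""
    # Only numeric columns can be correlated
    corr = df.select_dtypes("number").corr(method=method)[target]
    # The target's correlation with itself is always 1, so drop it
    corr = corr.drop(target)
    # Order by absolute strength so strong negative correlations also rank high
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```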

### Exploratory Questions

#### QUESTION 1:

#### QUESTION 2:

#### QUESTION 3:

### Hypothesis Testing

### Exploration Takeaways

---

## DATA MODELING

### Baseline
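A regression model is only useful if it beats a naive baseline. One common baseline, sketched here under the assumption that root mean squared error (RMSE) is the evaluation metric, predicts the train set's mean (or median) `assessed_value` for every property. `rmse` and `baseline_rmse` are hypothetical helper names.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def baseline_rmse(y_train, y_validate):
    """Train and validate RMSE of predicting one constant (train mean or median)."""
    y_train = np.asarray(y_train, dtype=float)
    results = {}
    for name, value in (("mean", y_train.mean()), ("median", np.median(y_train))):
        # Predict the same constant value for every row of each split
        results[name] = (
            rmse(y_train, np.full(len(y_train), value)),
            rmse(y_validate, np.full(len(y_validate), value)),
        )
    return results
```

On the train set the mean always gives the lower RMSE, because the mean minimizes squared error; the median can still do better on validate when values are skewed, as property values often are.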

### Best Models
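As an illustration of one candidate model, not necessarily one of the models selected in this project, the sketch below fits ordinary least squares with NumPy's `np.linalg.lstsq`. In a model that includes both bedrooms and square feet, each coefficient estimates the change in predicted value for a one-unit increase in that feature while the other features are held fixed, which is one way to frame initial question 4. `fit_ols` and `predict_ols` are hypothetical names; scikit-learn's `LinearRegression` fits the same model.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first parameter is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[0], params[1:]


def predict_ols(intercept, coefs, X):
    """Predicted values for the rows of X."""
    return intercept + np.asarray(X, dtype=float) @ coefs
```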

---

## CONCLUSION

### Summary
The goal of this report was to identify drivers of property tax assessed value for Single Family Residences, and to construct an improved Machine Learning Regression Model to predict property tax assessed values for these properties using the features of the properties themselves. Additionally, this report aimed to make recommendations on what does and doesn't impact property values.

Through the process of data acquisition, preparation, exploration, and statistical testing, it was determined that the drivers of property tax assessed value for Single Family Residences were:

- 1: total rooms (bedrooms and bathrooms combined)
- 1: location (Orange County)
- 1: location (Ventura County)
- 1: location (Los Angeles County)
- 2: year built
- 3: square feet


By using machine learning regression modeling, property tax assessed values were predicted with the best performing model.


### Recommendations

### Next Steps