### Creating A Model to Predict Walk Scores in Seattle

The dataset I'm using is based on 1000 random coordinates within the Seattle city limits. For each coordinate, I merged in demographic data from the census tract that contains it. To round out the feature set, I defined amenity and distance combinations that I thought would be important for walkability and queried them using the LocationIQ API. Then I evaluated the street network within 0.25 km of each point using the OpenStreetMap API. Lastly, I merged in the zoning features of each point from data downloaded from the City of Seattle.
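Generating random coordinates inside a city boundary can be sketched as rejection sampling: draw uniform points in the boundary's bounding box and keep only those that fall inside the polygon. This is a minimal sketch, not the notebook's actual code; it assumes the boundary is a list of (lon, lat) vertices, and `point_in_polygon`, `sample_points`, and the box-shaped `city_limits` are hypothetical. Sampling uniformly in degrees is only approximately uniform in area, which is usually acceptable at city scale.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) to +infinity cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_points(polygon, n, seed=None):
    """Rejection-sample n points inside a (lon, lat) polygon; returns (lat, lon) tuples."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < n:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            points.append((y, x))  # store as (lat, lon)
    return points

# Hypothetical boundary: a rough box around Seattle, NOT the real city limits.
city_limits = [(-122.44, 47.50), (-122.24, 47.50), (-122.24, 47.73), (-122.44, 47.73)]
coords = sample_points(city_limits, 1000, seed=42)
```

With the real city-limits polygon, points that land in the bounding box but outside the city (for example in Lake Washington or a neighbouring city) are simply discarded and redrawn.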

Census Tract Features:
- Population
- Population Density
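In the rows printed by `data_gdf.head()`, `pop_den` is consistent with tract population divided by `Land_Area_Km2`, i.e. people per square kilometre. A minimal sketch of that calculation, assuming that definition (the function name is hypothetical):

```python
def population_density(population, land_area_km2):
    """People per square kilometre for one census tract."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_km2

# Tract 5901 from the sample rows: 3570 people on about 0.952215 km^2.
density = population_density(3570, 0.952215)
print(round(density, 2))
```

Because every coordinate inherits the value of its tract, all points in the same tract share one density, which limits how much this feature can distinguish nearby points.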

Amenities and Distances:
- Number of restaurants within 1 km
- Number of schools within 1 km
- Number of parks within 1 km
- Number of bus stations within 10 km
- Number of supermarkets within 1 km
- Number of pubs within 1 km
- Number of parks within 2.5 km
- Number of restaurants within 1.5 km
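Each amenity feature is a count of points of interest within a fixed radius of a sampled coordinate. I got these counts from LocationIQ; the sketch below only illustrates the underlying calculation with a great-circle (haversine) distance filter, assuming a list of amenity coordinates is already in hand. `haversine_km`, `count_within`, and the restaurant coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def count_within(origin, amenities, radius_km):
    """Count amenity (lat, lon) points within radius_km of origin."""
    lat0, lon0 = origin
    return sum(1 for lat, lon in amenities if haversine_km(lat0, lon0, lat, lon) <= radius_km)

origin = (47.648701, -122.362286)  # first sampled point in the data
restaurants = [(47.6490, -122.3600), (47.6550, -122.3700), (47.6380, -122.3600)]  # hypothetical
print(count_within(origin, restaurants, 1.0), count_within(origin, restaurants, 1.5))
```

Pairs like "restaurants within 1 km" and "restaurants within 1.5 km" are nested counts, so the two features will be strongly correlated.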

Surrounding Street Network:
- Number of intersections
- Streets per node: mean number of streets that emanate from each node
- Circuity: the sum of edge lengths divided by the sum of straight-line distances between edge endpoints. Straight-line distance is Euclidean if the network is projected and great-circle if it is unprojected.
- Street length average: total street segment length / count of street segments
- Distance to closest highway 
- Distance to closest primary road 
- Distance to closest secondary road 
- Distance to closest intersection
- Distance to closest traffic signal
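The streets-per-node, circuity, and average-street-length definitions above can be checked on a toy graph. This is a minimal sketch in projected coordinates (so straight-line distance is Euclidean), assuming each edge is stored with its list of vertices; the data layout and function names are hypothetical, and OSMnx computes these statistics with its own implementation.

```python
import math
from collections import Counter

# Hypothetical toy street network in projected coordinates (metres).
nodes = {"A": (0, 0), "B": (3, 4), "C": (6, 0)}
# Each edge: (u, v, list of (x, y) vertices from u to v).
edges = [
    ("A", "B", [(0, 0), (3, 0), (3, 4)]),  # bends: length 7, straight line 5
    ("B", "C", [(3, 4), (6, 0)]),          # straight: length 5
]

def polyline_length(points):
    """Length of a street segment following its vertices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def circuity(edges, nodes):
    """Sum of edge lengths / sum of straight-line distances between edge endpoints."""
    total_length = sum(polyline_length(pts) for _, _, pts in edges)
    total_straight = sum(math.dist(nodes[u], nodes[v]) for u, v, _ in edges)
    return total_length / total_straight

def street_length_avg(edges):
    """Total street segment length / count of street segments."""
    return sum(polyline_length(pts) for _, _, pts in edges) / len(edges)

def streets_per_node_avg(edges, nodes):
    """Mean number of street segments touching each node (undirected degree)."""
    degree = Counter()
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree[n] for n in nodes) / len(nodes)

print(circuity(edges, nodes), street_length_avg(edges), streets_per_node_avg(edges, nodes))
```

A circuity of 1.0 means every street is perfectly straight; larger values indicate more winding streets, which generally make walking routes longer than the crow-flies distance.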

Zoning Features:
- Zoned for light rail
- Zoned as pedestrian zone
- Zoned as Historic
- Zoning Class
- Distance to nearest residential zoned area
- Distance to nearest commercial zoned area
- Distance to nearest industrial zoned area
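In the rows printed by `data_gdf.head()`, some zoning distance columns are exactly 0.0, which is consistent with a point-to-polygon distance that returns 0 for points inside a zone of that type. A minimal sketch of that distance, assuming zone polygons are vertex lists in a projected coordinate system (so distances come out in that system's linear units); all names and the two square zones are hypothetical.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def distance_to_polygon(p, polygon):
    """0 if p is inside the polygon, else the distance to its nearest edge."""
    if point_in_polygon(p[0], p[1], polygon):
        return 0.0
    n = len(polygon)
    return min(point_segment_distance(p, polygon[i], polygon[(i + 1) % n]) for i in range(n))

def distance_to_nearest_zone(p, zone_polygons):
    """Distance from p to the closest of several polygons of one zone type."""
    return min(distance_to_polygon(p, poly) for poly in zone_polygons)

# Hypothetical residential zones (projected units).
residential = [[(0, 0), (100, 0), (100, 100), (0, 100)],
               [(300, 0), (400, 0), (400, 100), (300, 100)]]
print(distance_to_nearest_zone((50, 50), residential))   # inside the first zone
print(distance_to_nearest_zone((200, 50), residential))  # midway between the zones
print(distance_to_nearest_zone((130, 140), residential)) # nearest to a corner
```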

## Research Question

The research question I'm trying to answer is: 

What are the most important features for predicting a walk score?

I'm trying to predict the walk score of a given random coordinate in Seattle.
This is a supervised learning problem: the target is the walk score produced by Redfin's model, and I evaluate my model's predictions against those scores.
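One model-agnostic way to rank "the most important features to a walk score" is permutation importance: shuffle one feature column, re-score the trained model, and record how much the error grows. This is a swapped-in technique offered as a sketch, not necessarily the method used here; `predict`, `rmse`, and the toy data are hypothetical stand-ins for a fitted model and the real feature table.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    predict: function mapping a list of feature rows to a list of predictions.
    X: list of feature rows (lists); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled values, leaving other columns intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy example: the hypothetical "model" only uses feature 0 (say, restaurant count).
X = [[r, s] for r, s in zip(range(20), [5, 1] * 10)]
y = [3 * row[0] + 10 for row in X]

def predict(rows):
    return [3 * row[0] + 10 for row in rows]

print(permutation_importance(predict, X, y))
```

Computing importances on held-out rows rather than training rows gives a more honest picture, and strongly correlated features (such as the nested restaurant counts) can split importance between them.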

### Why is this algorithm a good way of answering your research question?

## Model Development

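```python
# import packages
import pandas as pd
import json
# import osmnx as ox
import requests
import random
import time
```

```python
# Load the merged feature table (one row per random coordinate).
# pd.read_csv returns a plain DataFrame, so geometry_y is read as WKT text.
data_gdf = pd.read_csv('Data/master_walk_score_comp.csv', index_col=0)
```

```python
data_gdf.head()
```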

| Unnamed: 0 | lat_left | lon_left | GEOID | restaurant_count | school_count | park_count | bus_station_count | supermarket_count | pub_count | parkwide_count | ... | population | AREA_ACRES | tract | geometry_y | Land_Area_Km2 | pop_den | industry_dist | commercial_dist | residential_dist | walk_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 47.648701 | -122.362286 | 53033005901 | 10 | 1 | 5 | 3 | 1 | 9 | 10 | ... | 3570 | 235.297362 | 5901 | POLYGON ((1178008.9642270755 852317.9935750181... | 0.952215 | 3749.151383 | 0.0 | 345.834513 | 103.157745 | 70 |
| 1 | 47.658157 | -122.373453 | 53033005901 | 10 | 0 | 2 | 1 | 4 | 10 | 10 | ... | 3570 | 235.297362 | 5901 | POLYGON ((1178008.9642270755 852317.9935750181... | 0.952215 | 3749.151383 | 0.0 | 293.421351 | 360.651397 | 53 |
| 2 | 47.653362 | -122.370374 | 53033005901 | 8 | 1 | 3 | 1 | 0 | 3 | 10 | ... | 3570 | 235.297362 | 5901 | POLYGON ((1178008.9642270755 852317.9935750181... | 0.952215 | 3749.151383 | 97.780974 | 130.475997 | 0.0 | 64 |
| 3 | 47.694938 | -122.304282 | 53033002100 | 2 | 5 | 2 | 7 | 0 | 2 | 10 | ... | 4423 | 337.392028 | 2100 | POLYGON ((1195070.9481855484 864684.7671727433... | 1.365378 | 3239.395258 | 1886.512436 | 87.72318 | 0.0 | 57 |
| 4 | 47.690396 | -122.305112 | 53033002100 | 5 | 5 | 4 | 7 | 0 | 3 | 10 | ... | 4423 | 337.392028 | 2100 | POLYGON ((1195070.9481855484 864684.7671727433... | 1.365378 | 3239.395258 | 1922.132313 | 63.104194 | 0.0 | 64 |


Tinker with parameters at least 3 times (changing learning rate, changing features, changing k like in KNN, etc.). You may tinker with the same kind of parameter 3 times; it doesn't have to be 3 different parameters (example: you can just tinker with k: k=1, k=3, or k=8). Or you might want to tinker with features and also your k value or whatever is appropriate for your algorithm. (3pts)

Report the accuracy of your model. Either through RMSE or another metric. How did accuracy change with your parameter tinkering? (3pts)
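If the model is a k-nearest-neighbours regressor (the parameter-tinkering example the prompt mentions), trying several values of k and reporting RMSE on a held-out split could look like the sketch below. It is pure Python so it runs without extra packages; the toy data, the split, and all function names are hypothetical. With the real features, columns should be standardized first, since KNN distances would otherwise be dominated by large-scale features such as `pop_den` or the zoning distances.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def knn_predict(X_train, y_train, x, k):
    """Average target of the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return sum(y_train[i] for i in nearest) / k

def split_holdout(X, y, test_frac=0.25, seed=0):
    """Shuffle row indices and hold out test_frac of them for evaluation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

# Hypothetical toy data: score rises with one (already scaled) feature, plus noise.
rng = random.Random(1)
X = [[rng.uniform(0, 10)] for _ in range(200)]
y = [5 * row[0] + 40 + rng.gauss(0, 3) for row in X]

X_tr, y_tr, X_te, y_te = split_holdout(X, y)
results = {}
for k in (1, 3, 8):
    preds = [knn_predict(X_tr, y_tr, x, k) for x in X_te]
    results[k] = rmse(y_te, preds)
    print(f"k={k}: RMSE={results[k]:.2f}")
```

RMSE is in the same units as the walk score (0 to 100), so it can be read directly as a typical prediction error in walk-score points.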

Create a visualization demonstrating your findings. Make sure to include a title and axis labels. Describe what's being shown in your visualization. (3pts)

What challenges did you run into? Do you think it was because of the data, the model, the research question? How would you overcome these challenges? (4pts)

We learned a little bit about how our models can affect real people in the world. Name 2 potential benefits of your model and 2 potential harms. You can even look at the Wikipedia page on Algorithmic Bias for inspiration. Every model has consequences, what can you think of? If your data is really not amenable to this question, simply write about any other example we covered in class, such as the Boston housing dataset or hate speech detectors. (6pts)

Name one research question you might ask next for future work (don't worry, you don't have to do it!) Why is it important? (2pts)