# Capstone Project - The Battle of Neighborhoods (Week 1)
## Analyzing Geospatial Data using Foursquare and Python

__created by MaroDataScience__

## Introduction

### Problem Description

This project addresses a problem every founder faces: __finding the right place to establish a company__.

As I am starting my career in data science and machine learning, I wondered where to locate a __Data Science Start-Up__ near my residence. The target city is __Frankfurt, Germany__. But what should such a location offer? My idea was to look at existing, well-rated data science companies in Germany, analyze their locations, and compare them with the neighborhoods in Frankfurt.

### Business Question

The challenging part of this project is generating features from the location data of data science companies and using them to cluster the neighborhoods in Frankfurt, in order to answer the question:
#### __Where should I place my Data Science Start-Up in Frankfurt, Germany, to get the best environment for coworkers and customers?__

### Addressed Audience
This project covers several techniques for collecting environmental data about company locations and applying that data to candidate locations in a destination city. In short, it should interest future entrepreneurs who are considering starting a business in a particular city.

## Data

### Required Data
To answer the __Business Question__ the following data is required:
* ZIP codes, neighborhoods and their coordinates in Frankfurt
* ZIP codes, neighborhoods and coordinates of German data science companies
* venues in the neighborhoods of Frankfurt
* venues near German data science companies

### How the data will be used to answer the question
* Use Foursquare and pgeocode (geopy did not work here) to get:
 1. the top 10 venues near German data science companies
 1. the venues near the neighborhoods in Frankfurt
* Restrict the company set to well-rated companies
* Analyze the venues near German data science companies and look for similarities between the different locations
* Apply the results of the analysis to the venues of the neighborhoods in Frankfurt
* Map the data with folium to visualize the findings
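
The "top 10 venues" step in the list above can be sketched with pandas alone. This is a minimal illustration, not the notebook's final method: the table `venues`, its column names `Location` and `Venue Category`, and the helper `top_categories` are all hypothetical, and the sample rows are invented for the demonstration. It ranks categories by their share of all venues found near each location.

```python
import pandas as pd

# Hypothetical venue table: one row per venue found near a location.
venues = pd.DataFrame({
    "Location": ["Company A", "Company A", "Company A",
                 "Company B", "Company B", "Company B"],
    "Venue Category": ["Cafe", "Cafe", "Hotel", "Bakery", "Bakery", "Cafe"],
})


def top_categories(venues: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n most frequent venue categories for each location."""
    # One-hot encode the categories, then average per location to get shares
    onehot = pd.get_dummies(venues["Venue Category"]).astype(float)
    onehot.insert(0, "Location", venues["Location"])
    freq = onehot.groupby("Location").mean()

    rows = []
    for location, shares in freq.iterrows():
        # Keep only categories that actually occur, most frequent first
        ranked = shares[shares > 0].sort_values(ascending=False).head(n)
        rows.append({"Location": location, "Top Categories": list(ranked.index)})
    return pd.DataFrame(rows)


top = top_categories(venues, n=2)
print(top)
```

The same per-location share table (`freq` inside the helper) is also a natural feature matrix for comparing company locations with Frankfurt neighborhoods.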

## Preview of Frankfurt Neighborhoods with folium

In [1]:
import pandas as pd 
import numpy as np
import folium
# !pip install pgeocode
import pgeocode

Get the latitude and longitude of Frankfurt (looked up via Google).

In [2]:
latitude = 50.110924
longitude = 8.682127

I parsed the table containing the __ZIP__ codes and __neighborhoods__ from a German ZIP code website and saved it in the following .csv file.

In [3]:
frankfurt_data = pd.read_csv('frankfurt_parts.csv')

In [4]:
frankfurt_data.rename({"Stadtteil": "Neighborhood", "Postleitzahl": "ZIP"}, axis=1, inplace=True)
frankfurt_data.head()

Unnamed: 0,Neighborhood,ZIP
0,Altstadt,"60311, 60313"
1,Bahnhofsviertel,60329
2,Bergen-Enkheim,"60388, 60389"
3,Berkersheim,60435
4,Bockenheim,"60325, 60431, 60486, 60487"


Split the comma-separated ZIP codes and insert a new row for every ZIP code, keeping the same Neighborhood.

In [5]:
frankfurt_df = frankfurt_data.assign(ZIP=frankfurt_data['ZIP'].str.split(',')).explode('ZIP').reset_index(drop=True)
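
To see what the split-and-explode step does, here is a small self-contained sketch on two rows taken from the preview above. The names `toy` and `long_form` are placeholders used only for this illustration.

```python
import pandas as pd

# Toy input with the same structure as frankfurt_data
toy = pd.DataFrame({
    "Neighborhood": ["Altstadt", "Bahnhofsviertel"],
    "ZIP": ["60311, 60313", "60329"],
})

# Split on commas, then give every list element its own row
long_form = (
    toy.assign(ZIP=toy["ZIP"].str.split(","))
       .explode("ZIP")
       .reset_index(drop=True)
)

# Splitting on "," leaves a leading space on the second ZIP; strip it
long_form["ZIP"] = long_form["ZIP"].str.strip()

print(long_form)
```

Each neighborhood is repeated once per ZIP code, which is why the row count grows before the grouping step.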

Group the dataframe by the ZIPs and join the Neighborhoods with commas.

In [6]:
print("shape before grouping zips: {}".format(frankfurt_df.shape))
# trim leading/trailing whitespace left over from the split
frankfurt_df['ZIP'] = frankfurt_df['ZIP'].str.strip()
fr_df = frankfurt_df.groupby('ZIP')['Neighborhood'].apply(lambda x: ", ".join(x)).to_frame().reset_index()
print("shape after grouping zips: {}".format(fr_df.shape))
fr_df.head()

shape before grouping zips: (116, 2)
shape after grouping zips: (42, 2)


Unnamed: 0,ZIP,Neighborhood
0,60306,Westend-Süd
1,60308,Westend-Süd
2,60310,Innenstadt
3,60311,"Altstadt, Innenstadt"
4,60312,Innenstadt


Initialize a __pgeocode__ __Nominatim__ instance for Germany (`'de'`) and query the ZIP codes to get the corresponding __latitudes and longitudes__.

In [7]:
zipcoder = pgeocode.Nominatim('de')
fr_lat_lng = zipcoder.query_postal_code(fr_df['ZIP'].values)
fr_lat_lng.head(5)

Unnamed: 0,postal_code,country_code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy
0,60306,DE,Frankfurt am Main,Hessen,HE,Regierungsbezirk Darmstadt,64.0,"Frankfurt am Main, Stadt",6412.0,50.1159,8.6702,6.0
1,60308,DE,Frankfurt am Main,Hessen,HE,Regierungsbezirk Darmstadt,64.0,"Frankfurt am Main, Stadt",6412.0,50.1125,8.6529,6.0
2,60310,DE,Frankfurt am Main,Hessen,HE,Regierungsbezirk Darmstadt,64.0,"Frankfurt am Main, Stadt",6412.0,50.1107,8.673,6.0
3,60311,DE,Frankfurt am Main,Hessen,HE,Regierungsbezirk Darmstadt,64.0,"Frankfurt am Main, Stadt",6412.0,50.1112,8.6831,6.0
4,60312,,,,,,,,,,,
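
The last row of the preview (60312) came back without coordinates. A quick way to list such ZIP codes is sketched below on a toy frame. The frame `lookup` is a hand-built stand-in that reuses the column names and a few values from the output above; it does not call pgeocode.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pgeocode result (column names as in the output above)
lookup = pd.DataFrame({
    "postal_code": ["60311", "60312", "60313"],
    "latitude": [50.1112, np.nan, 50.1153],
    "longitude": [8.6831, np.nan, 8.6823],
})

# Select rows where at least one coordinate is missing
missing = lookup[lookup[["latitude", "longitude"]].isna().any(axis=1)]
missing_zips = missing["postal_code"].tolist()

print(missing_zips)
```

Listing the affected ZIP codes before dropping them makes it easier to judge whether a neighborhood disappears from the analysis entirely.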


Combine the coordinates from the __fr_lat_lng__ dataframe with the __fr_df__ dataframe column-wise. This relies on both dataframes having the same row order and index.

In [8]:
frankframe = pd.concat([fr_df, fr_lat_lng[['latitude', 'longitude']]], axis=1)
# drop rows for which no coordinates were returned
frankframe.dropna(axis=0, inplace=True)
frankframe.head()

Unnamed: 0,ZIP,Neighborhood,latitude,longitude
0,60306,Westend-Süd,50.1159,8.6702
1,60308,Westend-Süd,50.1125,8.6529
2,60310,Innenstadt,50.1107,8.673
3,60311,"Altstadt, Innenstadt",50.1112,8.6831
5,60313,"Altstadt, Innenstadt",50.1153,8.6823
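
Column-wise concatenation aligns rows by index, so it works only while both frames keep the same order. As an alternative, the sketch below joins on the ZIP code itself. It uses toy frames (`neighborhoods`, `coords`) with column names taken from the outputs above, and the coordinate rows are deliberately shuffled to show that the order no longer matters.

```python
import numpy as np
import pandas as pd

neighborhoods = pd.DataFrame({
    "ZIP": ["60311", "60312", "60313"],
    "Neighborhood": ["Altstadt, Innenstadt", "Innenstadt", "Altstadt, Innenstadt"],
})

# Same ZIP codes, but in a different row order
coords = pd.DataFrame({
    "postal_code": ["60313", "60311", "60312"],
    "latitude": [50.1153, 50.1112, np.nan],
    "longitude": [8.6823, 8.6831, np.nan],
})

merged = (
    neighborhoods
    .merge(coords, left_on="ZIP", right_on="postal_code", how="left")
    .drop(columns="postal_code")
    .dropna(subset=["latitude", "longitude"])  # discard ZIPs without coordinates
    .reset_index(drop=True)
)

print(merged)
```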


__Now let's display the neighborhoods of Frankfurt via folium__

In [10]:
# create map of frankfurt using latitude and longitude values
map_frankfurt = folium.Map(location=[latitude, longitude], zoom_start=12)

# add one circle marker per ZIP code, labelled with its neighborhood name(s)
for lat, lng, neighborhood in zip(frankframe['latitude'], frankframe['longitude'], frankframe['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_frankfurt)
    
map_frankfurt
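
Instead of hard-coding the map center, it could be derived from the data. The following sketch computes the midpoint and the bounding box of the first five coordinate pairs shown in the preview above; `points`, `center` and `bounds` are illustrative names. The resulting `center` could be passed as `location` to `folium.Map`, and `bounds` has the `[[south, west], [north, east]]` shape that the map's `fit_bounds` method expects.

```python
import pandas as pd

# Sample coordinates copied from the frankframe preview
points = pd.DataFrame({
    "latitude": [50.1159, 50.1125, 50.1107, 50.1112, 50.1153],
    "longitude": [8.6702, 8.6529, 8.6730, 8.6831, 8.6823],
})

# Extent of the markers
south, north = points["latitude"].min(), points["latitude"].max()
west, east = points["longitude"].min(), points["longitude"].max()

# Midpoint of the bounding box as a candidate map center
center = [(south + north) / 2, (west + east) / 2]

# Corner pairs: south-west first, north-east second
bounds = [[south, west], [north, east]]

print(center, bounds)
```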

## Methodology

In [None]:
# To be continued

## Results

In [None]:
# To be continued

## Discussion

In [None]:
# To be continued

## Conclusion

In [None]:
# To be continued