# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we analyze the places that people frequent in New York City at night, in order to identify the best places to visit.

The goal is to show people the best and safest places to visit on a vacation, or simply on a walk with the family.

This project does not aim to solve social problems. The main idea is to serve as a tool that guides people who are planning a vacation or looking to unwind in New York City.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* the places in New York that are visited most frequently
* the times of day at which people visit them (here we are interested in night time)
* the number of people visiting each of these places

For this study we use a Foursquare dataset of check-ins in New York City from 2013. Each record contains the user, the latitude and longitude, the time of the visit, and the venue visited. With this we can sort the data and show the places that may interest people visiting on vacation.

First we import the libraries we will use, and then load the CSV file with the data.

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from datetime import datetime
# import time
import re

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

Folium installed
Libraries imported.


In [2]:
df = pd.read_csv('NYC.csv', sep=',')
df.head()

Unnamed: 0,userId,venueId,venueCategoryId,venueCategory,latitude,longitude,timezoneOffset,utcTimestamp
0,470,49bbd6c0f964a520f4531fe3,4bf58dd8d48988d127951735,Arts & Crafts Store,40.71981,-74.002581,-240,Tue Apr 03 18:00:09 +0000 2012
1,979,4a43c0aef964a520c6a61fe3,4bf58dd8d48988d1df941735,Bridge,40.6068,-74.04417,-240,Tue Apr 03 18:00:25 +0000 2012
2,69,4c5cc7b485a1e21e00d35711,4bf58dd8d48988d103941735,Home (private),40.716162,-73.88307,-240,Tue Apr 03 18:02:24 +0000 2012
3,395,4bc7086715a7ef3bef9878da,4bf58dd8d48988d104941735,Medical Center,40.745164,-73.982519,-240,Tue Apr 03 18:02:41 +0000 2012
4,87,4cf2c5321d18a143951b5cec,4bf58dd8d48988d1cb941735,Food Truck,40.740104,-73.989658,-240,Tue Apr 03 18:03:00 +0000 2012


We extract the hour of each check-in so that we can filter the data by time of day and build a new DataFrame containing only the records of interest. We also clean the data a little.


In [3]:
df['latitude'] = df['latitude'].astype(float)
df['longitude'] = df['longitude'].astype(float)

df['utcTimestamp'] = pd.to_datetime(df['utcTimestamp'])
df['hour'] = df['utcTimestamp'].dt.hour
df.head()

Unnamed: 0,userId,venueId,venueCategoryId,venueCategory,latitude,longitude,timezoneOffset,utcTimestamp,hour
0,470,49bbd6c0f964a520f4531fe3,4bf58dd8d48988d127951735,Arts & Crafts Store,40.71981,-74.002581,-240,2012-04-03 18:00:09+00:00,18
1,979,4a43c0aef964a520c6a61fe3,4bf58dd8d48988d1df941735,Bridge,40.6068,-74.04417,-240,2012-04-03 18:00:25+00:00,18
2,69,4c5cc7b485a1e21e00d35711,4bf58dd8d48988d103941735,Home (private),40.716162,-73.88307,-240,2012-04-03 18:02:24+00:00,18
3,395,4bc7086715a7ef3bef9878da,4bf58dd8d48988d104941735,Medical Center,40.745164,-73.982519,-240,2012-04-03 18:02:41+00:00,18
4,87,4cf2c5321d18a143951b5cec,4bf58dd8d48988d1cb941735,Food Truck,40.740104,-73.989658,-240,2012-04-03 18:03:00+00:00,18
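
The `hour` column above is taken from `utcTimestamp`, so it is expressed in UTC. The `timezoneOffset` column (-240 in the sample rows) looks like an offset in minutes from UTC. Assuming that reading is correct, a local hour could be derived as in the sketch below. The helper `add_local_hour` and the column `local_hour` are illustrative names and are not part of the original notebook.

```python
import pandas as pd

def add_local_hour(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 'local_hour' column.

    Assumption: timezoneOffset holds the offset from UTC in minutes.
    """
    out = frame.copy()
    utc = pd.to_datetime(out["utcTimestamp"], utc=True)
    # Shift each UTC timestamp by its own offset before reading the hour.
    shifted = utc + pd.to_timedelta(out["timezoneOffset"], unit="min")
    out["local_hour"] = shifted.dt.hour
    return out

# Toy rows in the same shape as the dataset.
sample = pd.DataFrame({
    "utcTimestamp": ["2012-04-03 18:00:09+00:00", "2012-04-04 02:30:00+00:00"],
    "timezoneOffset": [-240, -240],
})
result = add_local_hour(sample)
print(result[["utcTimestamp", "local_hour"]])
```

With such a column, the hour filters used later could be applied to `local_hour` instead of `hour` if local evening hours are what is wanted.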


Now that the data is transformed as we need it, we first look at the places people visit most in New York City between 6 pm and midnight, to get a first impression of the most visited places. In the same way, we then show the 10 least frequented ones.


In [4]:
df1 = df[df["hour"].between(18, 23)] # both bounds are included by default
df1.head()

Unnamed: 0,userId,venueId,venueCategoryId,venueCategory,latitude,longitude,timezoneOffset,utcTimestamp,hour
0,470,49bbd6c0f964a520f4531fe3,4bf58dd8d48988d127951735,Arts & Crafts Store,40.71981,-74.002581,-240,2012-04-03 18:00:09+00:00,18
1,979,4a43c0aef964a520c6a61fe3,4bf58dd8d48988d1df941735,Bridge,40.6068,-74.04417,-240,2012-04-03 18:00:25+00:00,18
2,69,4c5cc7b485a1e21e00d35711,4bf58dd8d48988d103941735,Home (private),40.716162,-73.88307,-240,2012-04-03 18:02:24+00:00,18
3,395,4bc7086715a7ef3bef9878da,4bf58dd8d48988d104941735,Medical Center,40.745164,-73.982519,-240,2012-04-03 18:02:41+00:00,18
4,87,4cf2c5321d18a143951b5cec,4bf58dd8d48988d1cb941735,Food Truck,40.740104,-73.989658,-240,2012-04-03 18:03:00+00:00,18


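Next we count the check-ins per venue category and list the ten most frequent. Note that this cell groups `df`, the full dataset, so the counts cover all hours rather than only the evening window held in `df1`.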

In [5]:
most_visited = df.groupby(['venueCategory']).size().reset_index(name='Contador').sort_values(by='Contador', ascending=False)
most_visited.head(10)

Unnamed: 0,venueCategory,Contador
22,Bar,15978
121,Home (private),15382
165,Office,12740
223,Subway,9348
114,Gym / Fitness Center,9171
54,Coffee Shop,7510
94,Food & Drink Shop,6596
239,Train Station,6408
170,Park,4804
161,Neighborhood,4604
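
To restrict a ranking like this to a given hour window, one option is to filter first and then count with `value_counts`. The sketch below is a minimal illustration on toy data; the helper name `top_categories` is hypothetical, and it assumes the `hour` and `venueCategory` columns created earlier.

```python
import pandas as pd

def top_categories(frame, start_hour, end_hour, n=10):
    """Count check-ins per venueCategory for start_hour <= hour <= end_hour."""
    window = frame[frame["hour"].between(start_hour, end_hour)]
    # value_counts already sorts from most to least frequent.
    return window["venueCategory"].value_counts().head(n)

# Toy data standing in for the real check-ins.
toy = pd.DataFrame({
    "venueCategory": ["Bar", "Bar", "Office", "Park", "Bar", "Office"],
    "hour": [19, 22, 10, 20, 9, 23],
})
evening = top_categories(toy, 18, 23)
print(evening)
```

On the real data the call would look like `top_categories(df, 18, 23)`.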


Here we wanted to see what most people are doing at night. The main places they go are bars, their homes, and the subway. Gyms are also among the places New Yorkers visit most often.

Now, which are the 10 least frequented categories?
The following query shows them.

In [6]:
most_visited.tail(10)

Unnamed: 0,venueCategory,Contador
73,Distillery,8
127,Internet Cafe,6
111,Gluten-free Restaurant,5
211,Sorority House,4
0,Afghan Restaurant,4
174,Pet Service,3
153,Motorcycle Shop,2
176,Photography Lab,2
48,Castle,2
157,Music School,1


As expected, places such as pet services, music schools, and motorcycle shops are the ones people visit least. Because they see so little traffic, we assume they are also the least safe at that hour, although check-in counts alone cannot confirm this, and we would not recommend visiting them during your vacation.

Before moving on to maps of these places, let's take a quick look at the same ranking for the afternoon hours. Will some of the places surprise us?

Let's see.

In [7]:
df2 = df[df["hour"].between(12, 17)] # both bounds are included by default
most_visited1 = df2.groupby(['venueCategory']).size().reset_index(name='Contador').sort_values(by='Contador', ascending=False)
most_visited1.head(10)

Unnamed: 0,venueCategory,Contador
160,Office,8471
53,Coffee Shop,3862
216,Subway,3443
111,Gym / Fitness Center,3229
118,Home (private),3221
232,Train Station,2302
92,Food & Drink Shop,2074
66,Deli / Bodega,1727
55,College Academic Building,1724
138,Medical Center,1710
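
To compare the afternoon and evening rankings side by side rather than in two separate tables, the counts for each window can be placed in one table. This is only a sketch on toy data: the helper `counts_by_window` and the window names are illustrative, and the `hour` and `venueCategory` columns are assumed to exist as above.

```python
import pandas as pd

def counts_by_window(frame, windows):
    """Build a table with one row per category and one column per named hour window."""
    columns = {}
    for name, (start, end) in windows.items():
        mask = frame["hour"].between(start, end)
        columns[name] = frame.loc[mask, "venueCategory"].value_counts()
    # Categories missing from a window get a count of 0.
    return pd.DataFrame(columns).fillna(0).astype(int)

toy = pd.DataFrame({
    "venueCategory": ["Office", "Office", "Bar", "Bar", "Bar", "Park"],
    "hour": [13, 14, 20, 21, 15, 19],
})
table = counts_by_window(toy, {"afternoon": (12, 17), "evening": (18, 23)})
print(table)
```

Sorting the result by either column, for example `table.sort_values("evening", ascending=False)`, gives the ranking for that window while keeping the other window visible.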


Apparently people in New York tend to stay at their workplaces in the afternoon, although they also enjoy going out for coffee, talking, taking in the views this beautiful city offers, and discussing business and perhaps job opportunities.

Clearly they also like shopping, eating with friends, and touring the city by train.


Returning to the night-time data, we now show how the most visited places are distributed across the city, to better visualize the areas its inhabitants frequent. Here we look at bars, the most visited category, on a map centred on 102 North End Ave.


In [8]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
# The credentials are deliberately not printed, so they do not end up in the saved notebook output.


In [9]:
df3 = df1[df1['venueCategory']=='Bar']

In [15]:
address = '102 North End Ave, New York, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of New York are {}, {}.'.format(latitude, longitude))

The coordinates of New York are 40.7149555, -74.0153365.
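
The geocoding cell above reads `location.latitude` directly, which fails if the geocoder finds no match, since geopy's `geocode` returns `None` in that case. A minimal sketch of a guard follows. The helper `geocode_or_default` and the `StubGeocoder` class are hypothetical and only stand in for a real geolocator so the example runs without network access; network errors and timeouts are not handled here.

```python
from types import SimpleNamespace

def geocode_or_default(geolocator, address, default):
    """Return (latitude, longitude) for address, or default when nothing is found.

    geolocator is any object with a geocode(address) method, such as geopy's Nominatim.
    """
    location = geolocator.geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

class StubGeocoder:
    """Stand-in for a real geocoder: always returns the result it was given."""
    def __init__(self, result):
        self.result = result

    def geocode(self, address):
        return self.result

hit = SimpleNamespace(latitude=40.7149555, longitude=-74.0153365)
found = geocode_or_default(StubGeocoder(hit), "102 North End Ave, New York, NY", (0.0, 0.0))
missing = geocode_or_default(StubGeocoder(None), "no such place", (0.0, 0.0))
print(found, missing)
```

In the notebook, the `geolocator` created above could be passed in place of the stub.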


In [16]:

venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on 102 North End Ave

# add a red circle marker for the reference address
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Bar',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the bars as blue circle markers
for lat, lng, label in zip(df3.latitude, df3.longitude, df3.venueCategory):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
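
The loop above adds one circle per check-in row, so a venue with many check-ins is drawn many times at the same spot. One possible alternative is to collapse the check-ins to one row per venue first and keep the count. The sketch below shows that aggregation on toy data; the helper `venues_with_counts` and the column `checkins` are illustrative names, and it assumes every check-in of a venue carries the same coordinates.

```python
import pandas as pd

def venues_with_counts(frame):
    """Collapse check-ins to one row per venueId with coordinates and a check-in count."""
    return (
        frame.groupby("venueId")
        .agg(
            latitude=("latitude", "first"),
            longitude=("longitude", "first"),
            checkins=("latitude", "size"),  # number of rows for this venue
        )
        .reset_index()
        .sort_values("checkins", ascending=False)
    )

toy = pd.DataFrame({
    "venueId": ["a", "a", "b", "a"],
    "latitude": [40.73, 40.73, 40.78, 40.73],
    "longitude": [-74.00, -74.00, -73.97, -74.00],
})
per_venue = venues_with_counts(toy)
print(per_venue)
```

The resulting rows could then feed the same marker loop, for example with the check-in count as the popup text.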

If we zoom in on the map and look closely at Washington Square Village, we find that the most popular bars tend to be there, as it is a university and historic area of New York.
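
An observation like this is made by eye on the map. It could also be checked numerically by binning the coordinates into a coarse grid and counting check-ins per cell. The sketch below does this on toy data; the helper `busiest_cells` is hypothetical, and rounding to two decimals (cells of roughly one kilometre) is an arbitrary assumption.

```python
import pandas as pd

def busiest_cells(frame, decimals=2, n=5):
    """Bin check-ins into a lat/lng grid by rounding and return the n fullest cells."""
    cells = frame.assign(
        lat_cell=frame["latitude"].round(decimals),
        lng_cell=frame["longitude"].round(decimals),
    )
    return (
        cells.groupby(["lat_cell", "lng_cell"])
        .size()
        .reset_index(name="checkins")
        .sort_values("checkins", ascending=False)
        .head(n)
    )

# Three points that fall in the same cell and one that falls elsewhere.
toy = pd.DataFrame({
    "latitude": [40.7291, 40.7302, 40.7288, 40.7812],
    "longitude": [-73.9965, -73.9971, -74.0012, -73.9665],
})
top = busiest_cells(toy)
print(top)
```

Applied to the bar check-ins, the first rows would give the grid cells with the most check-ins, which can then be compared with what the map shows.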

We now repeat the exercise to see which parks New Yorkers visit most often, so that we can recommend some to those who want to go on vacation.

In [21]:
address = '102 North End Ave, New York, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of New York are {}, {}.'.format(latitude, longitude))

The coordinates of New York are 40.7149555, -74.0153365.


In [22]:
df4 = df1[df1['venueCategory']=='Park']

In [23]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on 102 North End Ave

# add a red circle marker for the reference address
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Park',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the parks as blue circle markers
for lat, lng, label in zip(df4.latitude, df4.longitude, df4.venueCategory):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

Based on this map, for those who wish to go out at night and get to know New York, we recommend going to Manhattan and spending a beautiful night in Central Park! It is definitely the coolest place to be at night and to get to know the best traditions and places of this beautiful city!


We hope you can enjoy yourself, unwind, stay aware, and above all learn a lot about the culture of this city.


I hope this little notebook teaches you a bit about how to have a good holiday in New York and make the most of the nights in this city, and that you recharge with good energy and get drunk in the best bars in the city!

Until next time!
Bye
