# Which Cup of Drink Does New York Prefer?

### Background and Problem 

1.1 Background

Which is the most densely populated major city in the United States? Of course, it’s New York, which makes it an obvious place to start a business. Think of the foot traffic! Convenient made-to-order drink shops make up a good portion of the venues one sees when out and about. I include these categories in the CMTO (convenient made-to-order) market: Coffee, Bubble Tea, Tea, and Juice. “A market research study by Euromonitor …reveal[ed] that the American continent is a region that favors consumption of coffee versus tea. The United States consumes three times more coffee (75.4%) than tea (24.6%).” One can therefore expect coffee to be much more popular in New York overall than the other drinks in the category.
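To make the CMTO definition concrete, each venue's category label has to be assigned to one of the four groups. Below is a minimal sketch using keyword rules. It assumes Foursquare-style labels such as "Coffee Shop", "Bubble Tea Shop", "Tea Room", and "Juice Bar". The keyword lists and the helper name `cmto_category` are illustrative assumptions, not the notebook's own mapping.

```python
import re

# Hypothetical keyword rules for the four CMTO categories.
# Order matters: "bubble tea" must be checked before plain "tea".
CMTO_RULES = [
    ("Bubble Tea", ("bubble tea", "boba")),
    ("Coffee", ("coffee", "espresso")),
    ("Tea", ("tea",)),
    ("Juice", ("juice", "smoothie")),
]


def cmto_category(venue_category):
    """Return the CMTO category for a venue category label, or None."""
    label = venue_category.lower()
    for cmto, keywords in CMTO_RULES:
        for keyword in keywords:
            # Whole-word match so that e.g. "steakhouse" does not match "tea".
            if re.search(r"\b" + re.escape(keyword) + r"\b", label):
                return cmto
    return None


for label in ["Coffee Shop", "Bubble Tea Shop", "Tea Room", "Juice Bar", "Pizza Place"]:
    print(label, "->", cmto_category(label))
```

Whole-word matching is used because plain substring matching misfires: "tea" occurs inside "steakhouse". The trade-off is that compound labels such as "Teahouse" are not matched and would need their own keyword.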

1.2 Problem

Let’s explore the profile of New York as a whole and of each neighborhood. I want to see the number of locations for each drink type and brand, and which brands people go to most for coffee, bubble tea, tea, and juice.
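Those counts can be sketched in a few lines of pandas. This sketch assumes a hypothetical `venues` table with one row per drink venue and columns `Neighborhood`, `Venue` (used as the brand name), and `Category` (one of the CMTO categories). The rows are placeholders that only illustrate the shape of the computation.

```python
import pandas as pd

# Hypothetical placeholder rows; in practice this table would be built from
# the Foursquare venue results, one row per venue.
venues = pd.DataFrame({
    "Neighborhood": ["Neighborhood A"] * 3 + ["Neighborhood B"] * 3,
    "Venue": ["Brand X", "Brand X", "Brand Y", "Brand X", "Brand Z", "Brand W"],
    "Category": ["Coffee", "Coffee", "Bubble Tea", "Coffee", "Bubble Tea", "Juice"],
})

# Number of locations for each drink category, city-wide.
category_counts = venues["Category"].value_counts()

# Locations per category in each neighborhood (0 where a category is absent).
by_neighborhood = pd.crosstab(venues["Neighborhood"], venues["Category"])

# Most common brand (venue name) within each category.
top_brands = venues.groupby("Category")["Venue"].agg(
    lambda names: names.value_counts().idxmax()
)

print(category_counts)
print(by_neighborhood)
print(top_brands)
```

The zeros in the `crosstab` output are also what a business owner would scan for when looking for neighborhoods that lack a category.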

![image.png](attachment:image.png)

**Outline**

1. Obtain the data.
    - 1.a. Import New York's borough and neighborhood data from the NYU website.
    - 1.b. Obtain information about the most popular boroughs and neighborhoods.
    - 1.c. Use Foursquare data to retrieve all drink-related venues for each neighborhood.
2. Visualize the data and run some simple statistical analysis.
3. Analyze the data using clustering, specifically k-means clustering.
    - 3.a. Determine the optimal number of clusters.
    - 3.b. Visualize the clusters on a Folium map.
4. Analyze the neighborhoods and record any noteworthy observations.
5. Draw inferences and conclusions from these results.
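Step 3.a is commonly handled with the elbow method. You run k-means for a range of k, record the within-cluster sum of squared distances (the inertia), and pick the k where adding clusters stops paying off. The sketch below uses a small hand-written NumPy version of Lloyd's algorithm on synthetic 2-D points so that it runs on its own. In the notebook, scikit-learn's `KMeans(n_clusters=k).fit(X).inertia_` would supply the same quantity. The helper name `kmeans_inertia`, the synthetic blobs, the seeds, and the range of k are all assumptions.

```python
import numpy as np


def kmeans_inertia(X, k, n_iter=100, seed=0):
    """Run Lloyd's k-means on X and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if its cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Inertia: each point's squared distance to its nearest centroid, summed.
    return dists.min(axis=1).sum()


# Synthetic data: three well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(float(value), 1))
```

Plotting `inertias` against k should show a sharp drop up to k = 3 and a much flatter curve after it. That bend is the "elbow."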

### Target Audience

1. A new business owner can identify neighborhoods that lack a certain drink category and explore whether that area would be a good place to start his or her business. 
2. A potential resident could factor in each neighborhood’s profile and find places to live that match their drink preferences.
3. Marketers can use this data to target ads based on customers’ geolocation. For example, a tea shop that wants to put up a billboard can look for the neighborhoods with the strongest preference for tea. Likewise, mobile ads can target smartphone users in neighborhoods that strongly prefer your drink, or geofencing can broadcast ads within specific GPS regions. Think of the potential cost savings of zeroing in on your target audience.
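At its core, a geofence is a distance check: is a user's GPS position within some radius of a target point? Below is a minimal sketch using the haversine great-circle distance. It assumes a circular fence around a neighborhood's center point, since the NYU data gives points rather than boundaries. The helper names and the 3 km radius are hypothetical choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_geofence(lat, lon, center_lat, center_lon, radius_km):
    """True if (lat, lon) lies within radius_km of the fence center."""
    return haversine_km(lat, lon, center_lat, center_lon) <= radius_km


# Neighborhood centers taken from the table later in this notebook.
wakefield = (40.894705, -73.847201)
co_op_city = (40.874294, -73.829939)
print(round(haversine_km(*co_op_city, *wakefield), 2))
print(in_geofence(*co_op_city, *wakefield, radius_km=3.0))
```

A circle is the simplest possible fence. Real ad platforms may use polygons, but the distance check stays the same idea.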

## Data Acquisition and Cleaning

### Import Libraries

```python
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library
```

### Load and explore the data.

```python
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
```

Print the JSON data with indentation to explore its structure. So much cleaner than the raw output!

```python
print(json.dumps(newyork_data, indent=5))
```

```
{
     "type": "FeatureCollection",
     "totalFeatures": 306,
     "features": [
          {
               "type": "Feature",
               "id": "nyu_2451_34572.1",
               "geometry": {
                    "type": "Point",
                    "coordinates": [
                         -73.84720052054902,
                         40.89470517661
                    ]
               },
               "geometry_name": "geom",
               "properties": {
                    "name": "Wakefield",
                    "stacked": 1,
                    "annoline1": "Wakefield",
                    "annoline2": null,
                    "annoline3": null,
                    "annoangle": 0.0,
                    "borough": "Bronx",
                    "bbox": [
                         -73.84720052054902,
                         40.89470517661,
                         -73.84720052054902,
                         40.89470517661
                    ]
               }
          },
          ...
```


```python
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]
```

```
{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}
```

### Process the Information from JSON into a pandas DataFrame

**Create the DataFrame and import the data.**

```python
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
```

```python
# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
```

```python
neighborhoods
```

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---------|--------------|----------|-----------|

```python
# collect one record per neighborhood, then build the DataFrame in one step
# (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()
```

|   | Borough | Neighborhood | Latitude  | Longitude  |
|---|---------|--------------|-----------|------------|
| 0 | Bronx   | Wakefield    | 40.894705 | -73.847201 |
| 1 | Bronx   | Co-op City   | 40.874294 | -73.829939 |
| 2 | Bronx   | Eastchester  | 40.887556 | -73.827806 |
| 3 | Bronx   | Fieldston    | 40.895437 | -73.905643 |
| 4 | Bronx   | Riverdale    | 40.890834 | -73.912585 |
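Since `json_normalize` is already imported, the same flattening can also be done without an explicit loop. This sketch runs on two features in the same shape as the NYU GeoJSON, trimmed to the fields used here. Wakefield's coordinates come from the JSON output above, and Co-op City's are the rounded values shown in the table. In the notebook you would pass `newyork_data['features']` instead. The name `neighborhoods_alt` is hypothetical.

```python
import pandas as pd

# Two features shaped like the NYU GeoJSON (only the fields used here).
features = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-73.84720052054902, 40.89470517661]},
     "properties": {"name": "Wakefield", "borough": "Bronx"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-73.829939, 40.874294]},
     "properties": {"name": "Co-op City", "borough": "Bronx"}},
]

# Nested keys become dotted column names, e.g. "properties.borough";
# the coordinate list stays a single list-valued column.
flat = pd.json_normalize(features)

# Split [longitude, latitude] pairs into two columns (GeoJSON order is lon, lat).
coords = pd.DataFrame(flat["geometry.coordinates"].tolist(),
                      columns=["Longitude", "Latitude"])

neighborhoods_alt = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    "Latitude": coords["Latitude"],
    "Longitude": coords["Longitude"],
})
print(neighborhoods_alt)
```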


**Save the DataFrame to a CSV file.**

```python
# uncomment to write the table to disk
#neighborhoods.to_csv(path_or_buf='ny_neighborhoods.csv')
```

**Display the number of boroughs and neighborhoods in New York.**

```python
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)
```

```
The dataframe has 5 boroughs and 306 neighborhoods.
```
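pandas can read the same counts more directly with `nunique()`. `value_counts()` then breaks the total down by borough, which is a natural first look for step 1.b. The sketch below uses a small hand-made sample: the Bronx rows come from the table above, and the other two rows are illustrative stand-ins, not counts from the dataset.

```python
import pandas as pd

# Small stand-in for the `neighborhoods` table (illustrative rows only).
sample = pd.DataFrame({
    "Borough": ["Bronx", "Bronx", "Bronx", "Manhattan", "Queens"],
    "Neighborhood": ["Wakefield", "Co-op City", "Eastchester", "Chelsea", "Astoria"],
})

n_boroughs = sample["Borough"].nunique()
n_neighborhoods = len(sample)
per_borough = sample["Borough"].value_counts()

print(f"{n_boroughs} boroughs and {n_neighborhoods} neighborhoods")
print(per_borough)
```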


### Use the geopy library to get the latitude and longitude of New York City.

To create an instance of the geocoder, we need to define a `user_agent`. We will name our agent `ny_explorer`, as shown below.

```python
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of New York City are 40.7127281, -74.0060152.
```
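geopy's `geocode` returns `None` when it finds no match, and the public Nominatim service asks clients to identify themselves and keep their request rate low. For both reasons, it can be worth guarding the lookup with a fallback. The sketch below takes any geocode callable so that it runs offline. The helper name `coords_or_default` and the stand-in geocoders are hypothetical. With geopy, you would pass `Nominatim(user_agent="ny_explorer").geocode` instead.

```python
from types import SimpleNamespace


def coords_or_default(geocode, address, default):
    """Return (lat, lon) for address, or default if the geocoder finds nothing."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)


# Stand-in geocoders so the example runs without network access.
def found(address):
    return SimpleNamespace(latitude=40.7127281, longitude=-74.0060152)


def not_found(address):
    return None


NYC_DEFAULT = (40.7127281, -74.0060152)  # the coordinates printed above
print(coords_or_default(found, "New York City, NY", NYC_DEFAULT))
print(coords_or_default(not_found, "Nowhere", NYC_DEFAULT))
```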


### Create a map of New York with the neighborhoods superimposed on top.

```python
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)
```

```python
# add a circle marker with a "Neighborhood, Borough" popup for each neighborhood
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)

map_newyork
```
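To tell the boroughs apart on the map, each one can get its own marker color. The notebook imports `matplotlib.cm` and `matplotlib.colors` for this kind of palette. The sketch below instead uses the standard-library `colorsys` module so that it stays dependency-free, spacing hues evenly around the color wheel. The helper name `distinct_hex_colors` and the saturation/value defaults are assumptions.

```python
import colorsys


def distinct_hex_colors(n, saturation=0.75, value=0.9):
    """Return n hex color strings with evenly spaced hues."""
    colors_hex = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors_hex.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors_hex


# In the notebook, the list would come from sorted(neighborhoods['Borough'].unique()).
boroughs = ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"]
borough_color = dict(zip(boroughs, distinct_hex_colors(len(boroughs))))
print(borough_color)
```

Inside the marker loop, `color=borough_color[borough]` and `fill_color=borough_color[borough]` would then replace the fixed blue values.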