# Adding a little fun to a Dallas business trip.

You are traveling to Dallas, Texas for a business conference at the Hilton Anatole. A rental car was not included in the company travel budget, since the conference is in the hotel where you are staying. You are unfamiliar with the area and would like to explore some of the city without spending a lot of money on Uber fares. You want to find trending places in the area.


We will use Foursquare to find some of the top trending places that are within walking distance or a short Uber ride away. The top places will be plotted on a map with Folium to show how far they are from the hotel.

In [201]:
import requests 
import pandas as pd 
import numpy as np 
import random 
import matplotlib.pylab as plt 
from pandas.io.json import json_normalize
from urllib.request import urlopen
import ssl
import csv
from IPython.display import Image 
from IPython.core.display import HTML
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Done')

Done


In [208]:
!pip install geopy
from geopy.geocoders import Nominatim
print('Done')

Done


### Methodology: The exploratory data analysis.

#### We start by locating the hotel's longitude and latitude 

In [112]:
address = '2201 N Stemmons Fwy Dallas, TX 75207'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)


32.7997897 -96.82897066229901


#### Now that we have our starting point, we can explore the venues around it. Searching from the hotel's exact coordinates keeps the results relevant to where we are staying.

In [277]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
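
The request above relies on `CLIENT_ID`, `CLIENT_SECRET`, `VERSION`, `radius` and `LIMIT`, which are not defined in the cells shown. A minimal sketch of how they might be set up follows. Every value here is a placeholder or an assumption, not the setting used to produce the results in this notebook, and `urlencode` is used in place of `str.format` so the parameters are escaped safely.

```python
from urllib.parse import urlencode

# Placeholders: substitute your own Foursquare developer credentials.
CLIENT_ID = 'YOUR_CLIENT_ID'          # hypothetical placeholder
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # hypothetical placeholder

VERSION = '20200101'  # assumed API version date, in YYYYMMDD form
radius = 500          # assumed search radius in meters
LIMIT = 100           # assumed maximum number of venues to return

# Hotel coordinates printed by the geocoding step
latitude, longitude = 32.7997897, -96.82897066229901

params = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'll': '{},{}'.format(latitude, longitude),
    'v': VERSION,
    'radius': radius,
    'limit': LIMIT,
}

# urlencode escapes each value, so the comma in 'll' becomes %2C
url = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)
print(url)
```

Keeping the credentials in variables (or environment variables) rather than pasting them into the URL string makes it easier to share the notebook without leaking them.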

In [120]:
results = requests.get(url).json()

{'meta': {'code': 200, 'requestId': '5f1760851db3e514e926ce9c'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Northwest Dallas',
  'headerFullLocation': 'Northwest Dallas, Dallas',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 24,
  'suggestedBounds': {'ne': {'lat': 32.8042897045, 'lng': -96.8236271335013},
   'sw': {'lat': 32.795289695499996, 'lng': -96.83431419109672}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '503eae61e4b0052cdc5903dc',
       'name': 'Sēr',
       'contact': {},
       'location': {'address': '2201 N Stemmons Fwy',
        'crossStreet': 'in Hilton Anatole',
        'lat': 32.799945,
        'l

In [287]:
items = results['response']['groups'][0]['items']

#### Now that the results are in, we can start piecing things together by building a DataFrame: a 2-dimensional labeled data structure whose columns can hold different types, much like a spreadsheet.

In [128]:
df = json_normalize(items) 
df.head(2)

Unnamed: 0,reasons.count,reasons.items,referralId,venue.beenHere.count,venue.beenHere.lastCheckinExpiredAt,venue.beenHere.marked,venue.beenHere.unconfirmedCount,venue.categories,venue.delivery.id,venue.delivery.provider.icon.name,...,venue.location.state,venue.name,venue.photos.count,venue.photos.groups,venue.stats.checkinsCount,venue.stats.tipCount,venue.stats.usersCount,venue.stats.visitsCount,venue.venuePage.id,venue.verified
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-503eae61e4b0052cdc5903dc-0,0,0,False,0,"[{'id': '4bf58dd8d48988d1cc941735', 'name': 'S...",,,...,TX,Sēr,0,[],0,0,0,0,,False
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4a6e9885f964a520edd41fe3-1,0,0,False,0,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,...,TX,Hilton,0,[],0,0,0,0,,True
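
The dotted column names above (`venue.location.state`, `venue.stats.tipCount`, and so on) come from flattening the nested JSON: each path through the nested dictionaries becomes one column. The sketch below illustrates the idea with a hypothetical helper, `flatten`, written in plain Python. It is not how pandas implements `json_normalize`, only a simplified picture of what happens to one item, using a few fields from the response shown earlier.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one dict whose keys are dotted paths."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted path
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as a single cell
            flat[full_key] = value
    return flat


# A trimmed-down item using values from the API response above
item = {
    'venue': {
        'id': '503eae61e4b0052cdc5903dc',
        'name': 'Sēr',
        'location': {'address': '2201 N Stemmons Fwy', 'lat': 32.799945},
    }
}

row = flatten(item)
print(row)
```

Each flattened dictionary corresponds to one row of the dataframe, and its keys correspond to the column names.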


#### At first the dataframe holds a vast amount of information. The next step is to clean the data, which improves its quality and makes it easier to work with.

In [167]:
df.rename(columns={'venue.name': 'venue', 'venue.location.lat':'lat', 'venue.location.lng':'lng', 'venue.location.postalCode':'postcode', 'venue.location.address':'address'}, inplace=True)
df.head(2)

Unnamed: 0,reasons.count,reasons.items,referralId,venue.beenHere.count,venue.beenHere.lastCheckinExpiredAt,venue.beenHere.marked,venue.beenHere.unconfirmedCount,venue.categories,venue.delivery.id,venue.delivery.provider.icon.name,...,venue.location.state,venue,venue.photos.count,venue.photos.groups,venue.stats.checkinsCount,venue.stats.tipCount,venue.stats.usersCount,venue.stats.visitsCount,venue.venuePage.id,venue.verified
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-503eae61e4b0052cdc5903dc-0,0,0,False,0,"[{'id': '4bf58dd8d48988d1cc941735', 'name': 'S...",,,...,TX,Sēr,0,[],0,0,0,0,,False
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4a6e9885f964a520edd41fe3-1,0,0,False,0,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",,,...,TX,Hilton,0,[],0,0,0,0,,True


#### Now the dataframe is cleaner and easier to read. We can see the address, latitude, longitude and name of each venue near our hotel.

In [168]:
# Keep only the columns we need; every other column returned by Foursquare is left out
df2 = df[['address', 'lat', 'lng', 'postcode', 'venue']].copy()
df2.head(15)

Unnamed: 0,address,lat,lng,postcode,venue
0,2201 N Stemmons Fwy,32.799945,-96.829562,75207.0,Sēr
1,2201 N Stemmons Fwy,32.799841,-96.829148,75207.0,Hilton
2,2026 Farrington St,32.796797,-96.828971,75207.0,Peticolas Brewing Company
3,,32.797895,-96.828018,,viva's
4,2201 N Stemmons Fwy,32.799401,-96.831787,75207.0,Verandah
5,,32.799755,-96.831831,,Sculpture Garden At Hilton Anatole
6,1950 Market Center Blvd,32.797356,-96.824487,75207.0,Ferris Wheeler's Backyard and BBQ
7,2201 N Stemmons Fwy,32.80025,-96.828774,75207.0,Counter Offer
8,2222 Vantage St,32.79917,-96.833974,75207.0,Pegasus City Brewery
9,Hilton Anatole,32.800518,-96.830171,75207.0,Executive Lounge


In [140]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
print("Done")


In [191]:
from sklearn.cluster import KMeans

In [224]:
len(df2['venue'].unique())

24

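#### Here we begin preprocessing. A big part of preprocessing is encoding: converting categorical values, such as venue names, into numeric columns that an algorithm can work with. We use one-hot encoding, which creates one 0/1 indicator column per venue name.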

In [231]:
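# One-hot encode the venue names: one 0/1 indicator column per distinct name
df_onehot = pd.get_dummies(df2[['venue']], prefix="", prefix_sep="")

# Add the postcode and the venue name back alongside the indicator columns
df_onehot['postalcode'] = df2['postcode']
df_onehot['venue'] = df2['venue']

# Move the trailing columns to the front. Only two columns were appended above,
# so [-3:] also moves the last indicator column ("viva's") along with them;
# use [-2:] to move just 'postalcode' and 'venue'.
fixed_columns = list(df_onehot.columns[-3:]) + list(df_onehot.columns[:-3])
df_onehot = df_onehot[fixed_columns]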

print(df_onehot.shape)
df_onehot.head()

(24, 26)


Unnamed: 0,viva's,postalcode,venue,Best Western Market Center,City View Terrace,Counter Offer,Courtyard by Marriott - Dallas Market Center,Days Inn,DoubleTree by Hilton,Executive Lounge,...,Media Grill + Bar,Pegasus City Brewery,Peticolas Brewing Company,Sculpture Garden At Hilton Anatole,Sheraton Suites Market Center Dallas,Sēr,Terrace Bar & Grill,The Anatole Pool & Bar,The Renaissance Club Lounge,Verandah
0,0,75207.0,Sēr,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
1,0,75207.0,Hilton,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,75207.0,Peticolas Brewing Company,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
3,1,,viva's,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,75207.0,Verandah,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
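
To make the encoding step concrete, here is a small plain-Python sketch of what one-hot encoding produces, using the first five venue names from the table above. It illustrates the idea only; `pd.get_dummies` is what the notebook actually calls.

```python
# First five venue names from the dataframe above
venues = ['Sēr', 'Hilton', 'Peticolas Brewing Company', "viva's", 'Verandah']

# One indicator column per distinct name
categories = sorted(set(venues))

# For each venue, mark its own column with 1 and every other column with 0
onehot = [{cat: int(cat == name) for cat in categories} for name in venues]

for name, row in zip(venues, onehot):
    print(name, row)
```

Because every venue name in this dataset is unique, each row contains exactly one 1 and each column is set for exactly one venue. One-hot encoding tends to be more informative on a field that repeats across rows, such as a venue category.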


In [232]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### To narrow our findings, we will now sort the data and have it return only the top 10 venues near the hotel.

In [274]:
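num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['venue']
for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd and 3rd take their suffix from the list above
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th onward: no entry in the list, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['venue'] = df2['venue']

# df2 holds only address, lat, lng, postcode and venue, so the slice below
# sorts the two remaining cell values and writes their column labels
# (not venue categories) into the last two columns of venues_sorted
for ind in np.arange(df2.shape[0]):
    row_categories = df2.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    venues_sorted.iloc[ind, 9:] = row_categories_sorted.index.values[0:num_top_venues]
    
venues_sorted.head()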



Unnamed: 0,venue,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Sēr,,,,,,,,,venue,postcode
1,Hilton,,,,,,,,,venue,postcode
2,Peticolas Brewing Company,,,,,,,,,venue,postcode
3,viva's,,,,,,,,,venue,postcode
4,Verandah,,,,,,,,,venue,postcode
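
Another way to rank the venues "near the hotel" is by straight-line distance from the hotel's coordinates. The sketch below is an alternative technique, not the method used in the cell above: it applies the haversine formula to the ten rows of `df2` shown earlier. The walking speed of 1.4 m/s is an assumption, and real walking routes will be longer than the straight-line distance.

```python
import math

HOTEL = (32.7997897, -96.82897066229901)  # from the geocoding step

# (name, lat, lng) copied from the rows of df2 shown earlier
venues = [
    ('Sēr', 32.799945, -96.829562),
    ('Hilton', 32.799841, -96.829148),
    ('Peticolas Brewing Company', 32.796797, -96.828971),
    ("viva's", 32.797895, -96.828018),
    ('Verandah', 32.799401, -96.831787),
    ('Sculpture Garden At Hilton Anatole', 32.799755, -96.831831),
    ("Ferris Wheeler's Backyard and BBQ", 32.797356, -96.824487),
    ('Counter Offer', 32.80025, -96.828774),
    ('Pegasus City Brewery', 32.79917, -96.833974),
    ('Executive Lounge', 32.800518, -96.830171),
]


def haversine_m(lat1, lng1, lat2, lng2, radius_m=6371000):
    """Great-circle distance in meters between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


WALK_SPEED_M_PER_S = 1.4  # assumed average walking speed

# Sort venues from nearest to farthest
ranked = sorted(
    ((name, haversine_m(HOTEL[0], HOTEL[1], lat, lng)) for name, lat, lng in venues),
    key=lambda pair: pair[1],
)

num_top_venues = 10
for name, dist in ranked[:num_top_venues]:
    minutes = dist / WALK_SPEED_M_PER_S / 60
    print('{:<36} {:6.0f} m  ~{:.0f} min walk'.format(name, dist, minutes))
```

The same function could be applied to the `lat` and `lng` columns of `df2` to add a distance column and sort the dataframe by it.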


#### Next we refine and cluster the venues using the k-means algorithm. K-means finds groups that have not been explicitly labeled in the data. It can be used to confirm assumptions about what types of groups exist or to identify unknown groups in complex data sets. Once the algorithm has run and the groups are defined, new data can easily be assigned to the nearest group.

In [245]:
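kclusters = 2

# cluster on the coordinates only (an assumption: address is text and
# postcode has missing values, so neither can be passed to k-means)
df_clustering = df2[['lat', 'lng']]

# run k-means clustering; fit() must be called on the data
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_clustering)

# the cluster label of each venue is available as kmeans.labels_
clustered_df = df2

clustered_df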


Unnamed: 0,address,lat,lng,postcode,venue
0,2201 N Stemmons Fwy,32.799945,-96.829562,75207.0,Sēr
1,2201 N Stemmons Fwy,32.799841,-96.829148,75207.0,Hilton
2,2026 Farrington St,32.796797,-96.828971,75207.0,Peticolas Brewing Company
3,,32.797895,-96.828018,,viva's
4,2201 N Stemmons Fwy,32.799401,-96.831787,75207.0,Verandah
5,,32.799755,-96.831831,,Sculpture Garden At Hilton Anatole
6,1950 Market Center Blvd,32.797356,-96.824487,75207.0,Ferris Wheeler's Backyard and BBQ
7,2201 N Stemmons Fwy,32.80025,-96.828774,75207.0,Counter Offer
8,2222 Vantage St,32.79917,-96.833974,75207.0,Pegasus City Brewery
9,Hilton Anatole,32.800518,-96.830171,75207.0,Executive Lounge
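
To show what k-means does with data like this, here is a dependency-free sketch of the assign-and-update loop (Lloyd's algorithm) applied to the ten coordinate pairs above. It makes several simplifying assumptions: it clusters on `lat` and `lng` only, it starts from the first `k` points instead of the k-means++ initialization that scikit-learn uses by default, and it measures distance in raw degrees. The hypothetical `kmeans` function below is for illustration and is not a replacement for `sklearn.cluster.KMeans`.

```python
import math

# (lat, lng) pairs copied from the rows of clustered_df shown above
points = [
    (32.799945, -96.829562),  # Sēr
    (32.799841, -96.829148),  # Hilton
    (32.796797, -96.828971),  # Peticolas Brewing Company
    (32.797895, -96.828018),  # viva's
    (32.799401, -96.831787),  # Verandah
    (32.799755, -96.831831),  # Sculpture Garden At Hilton Anatole
    (32.797356, -96.824487),  # Ferris Wheeler's Backyard and BBQ
    (32.80025, -96.828774),   # Counter Offer
    (32.79917, -96.833974),   # Pegasus City Brewery
    (32.800518, -96.830171),  # Executive Lounge
]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid updates."""
    centroids = list(points[:k])  # naive deterministic starting centroids
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.hypot(p[0] - centroids[c][0],
                                                   p[1] - centroids[c][1]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The resulting labels could be stored in a new column of the dataframe and used to color the map markers by cluster.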


#### Our last step is data visualization: a graphic representation of the data that communicates the relationships among the venues to the viewer. We achieve this by mapping each data value to a marker on a map.

In [271]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15)

folium.CircleMarker(
    [latitude, longitude],
    radius=11,
    color='red',
    popup='Hilton Anatole',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

for lat, lng, post, vne in zip(clustered_df['lat'], clustered_df['lng'], clustered_df['postcode'], clustered_df['venue']):
    # popup shows the venue name, its coordinates and its postcode
    label = folium.Popup('{} ({}, {}) - Postcode {}'.format(vne, lat, lng, post), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='purple',
        fill=True,
        fill_color='pink',
        fill_opacity=0.7).add_to(venues_map)

venues_map

#### The map shows the top venues close to the Hilton Anatole hotel on an interactive Folium map. Map markers display the geographic location of each venue. The red marker is the hotel: when using a map to find locations, it always helps to show a starting point, a "You are here" point. The pink markers outlined in purple are our venues. The map is interactive: click on any marker to display the coordinates and the name of the venue.

#### Most of the venues are restaurants of different types. The Hilton Anatole is a fairly large hotel, so if you don't want to go far you can stay on the property and still have your choice of nightlife and restaurants. You can have a taste of Texas life without going far. You can't go to Texas without having a little barbecue, and Ferris Wheeler's Backyard and BBQ is only a 7-minute walk away! Uber money well saved.

#### In conclusion, we have found a wide variety of things to do within walking distance of the hotel. It suggests that no matter where you are in Dallas, there will always be something fun to do nearby.