## Table of contents
* [Introduction: Business Trends of Sports Market in Istanbul and Predicting the Next Trend](#introduction)
* [Collecting the Data](#data)

## Trends of Sports Market in Istanbul and Predicting the Next Trend <a name="introduction"></a>

Individual sports are becoming more and more popular around the globe as self-awareness and a well-being philosophy spread among people.

#### The Problem
As a result, many sports disciplines are gaining popularity. However, **not all of them grow, and almost none of them stays the most preferred one for long.**

In this report, you will find an analysis of the sports market in Istanbul, more specifically in the towns of Kadikoy and Besiktas, and, if possible, a prediction of the upcoming trend.

#### The Interest

The results of this report may give deeper insight to investors planning to found a business targeting the sports market in Istanbul and in other regions.

***

<h1>1- Collecting the Data</h1> <a name="data"></a>

- **Kadikoy** and **Besiktas** are two towns in Istanbul with some of the highest cultural and financial levels in Turkey. This makes them great candidates for inspecting trends, as these regions generally lead such trends. The neighborhoods of these towns will be used. This data was collected from https://postakodu.ptt.gov.tr/

- We will also need the longitude and latitude values for these neighborhoods. These will be obtained using the Google Maps Geocoding API.
- We will also need the venue information and its details. These will be obtained using the Foursquare Places API.

- The data that is obtained and processed will be saved to CSV files because API usage is limited.

In [52]:
#importing essential libraries

import requests
import pandas as pd
import json
import numpy as np # library to handle data in a vectorized manner

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from pandas import json_normalize # transform JSON into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

## The neighborhood data

The regional data is taken from the Turkish Postal Office. The [Google Geocoding API](https://developers.google.com/maps/documentation/geocoding/start) is used to get the coordinates for each borough.

In [55]:
#this location data is taken from Turkish Postal Office

df= pd.read_csv('bjk_kdky.csv',delimiter=";",encoding="utf-8-sig")

## The data itself is missing lat, lng values

The Geocoding API is needed to obtain the coordinates.

In [3]:
df.head()

Unnamed: 0,City,Town,Neighborhood,Borough,Post Code
0,İSTANBUL,BEŞİKTAŞ,ABBASAĞA,ABBASAĞA MAH ...,34022
1,İSTANBUL,BEŞİKTAŞ,ABBASAĞA,CİHANNÜMA MAH ...,34022
2,İSTANBUL,BEŞİKTAŞ,ABBASAĞA,SİNANPAŞA MAH ...,34022
3,İSTANBUL,BEŞİKTAŞ,AKATLAR,AKAT MAH ...,34335
4,İSTANBUL,BEŞİKTAŞ,ARNAVUTKÖY,ARNAVUTKÖY MAH ...,34345


## Google Geocoding API is used to retrieve the latitude and longitude

In [57]:
lat_l, long_l = [], []
# iterate over borough and town together so that each request names both
for borough, town in zip(df['Borough'], df['Town']):
    # API key will be removed
    url = 'https://maps.googleapis.com/maps/api/geocode/json?address={}, {} , Istanbul, Turkey&key=[removed]'.format(borough, town)
    api_call = requests.get(url).json()
    # take the first (best) match returned by the geocoder
    lat_l.append(api_call['results'][0]['geometry']["location"]["lat"])
    long_l.append(api_call['results'][0]['geometry']["location"]["lng"])

df['lat'] = lat_l
df['lng'] = long_l

df.head(15)

Unnamed: 0,City,Town,Neighborhood,Borough,Post Code,lat,lng
0,İSTANBUL,BEŞİKTAŞ,ABBASAĞA,ABBASAĞA MAH ...,34022,41.04801,29.004528
1,İSTANBUL,BEŞİKTAŞ,ABBASAĞA,CİHANNÜMA MAH ...,34022,41.047211,29.007439
2,İSTANBUL,BEŞİKTAŞ,ABBASAĞA,SİNANPAŞA MAH ...,34022,41.041866,29.005112
3,İSTANBUL,BEŞİKTAŞ,AKATLAR,AKAT MAH ...,34335,41.085614,29.025626
4,İSTANBUL,BEŞİKTAŞ,ARNAVUTKÖY,ARNAVUTKÖY MAH ...,34345,41.068488,29.041629
5,İSTANBUL,BEŞİKTAŞ,ARNAVUTKÖY,KURUÇEŞME MAH ...,34345,41.06163,29.0329
6,İSTANBUL,BEŞİKTAŞ,BEBEK,BEBEK MAH ...,34342,41.077744,29.041629
7,İSTANBUL,BEŞİKTAŞ,ETİLER,ETİLER MAH ...,34337,41.087042,29.037264
8,İSTANBUL,BEŞİKTAŞ,GAYRETTEPE,BALMUMCU MAH ...,34349,41.059105,29.014714
9,İSTANBUL,BEŞİKTAŞ,GAYRETTEPE,DİKİLİTAŞ MAH ...,34349,41.055735,29.003801
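
The geocoding loop indexes `results[0]` directly, which raises an `IndexError` whenever the geocoder finds no match. Below is a minimal sketch of a more defensive parser. It assumes the usual Geocoding JSON shape with a top-level `status` field and a `results` list. `extract_lat_lng` is a hypothetical helper name and is not part of the original notebook, and the payloads are hand-written imitations rather than real API output.

```python
def extract_lat_lng(payload):
    """Return (lat, lng) from a Geocoding JSON payload, or (None, None)."""
    # hypothetical helper; assumes a top-level "status" and a "results" list
    if payload.get("status") != "OK" or not payload.get("results"):
        return None, None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# hand-written payloads that imitate the response shape (not real API output)
found = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 41.04801, "lng": 29.004528}}}],
}
missing = {"status": "ZERO_RESULTS", "results": []}

print(extract_lat_lng(found))
print(extract_lat_lng(missing))
```

Rows that come back as `(None, None)` could then be reviewed by hand instead of stopping the whole run.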


## Preprocessing the neighborhood data

Turkish characters cause problems, so the following code is needed to get a cleaner dataframe.

In [58]:
Tr2Eng = str.maketrans("ÇĞIÖŞÜ", "CGIOSU")
# "İ" is not in the table, so capitalize() lowercases it to "i̇" (i + combining dot),
# which is why names such as "Besi̇ktas" appear in the outputs
for col in ['Borough', 'Neighborhood', 'Town', 'City']:
    # translate, capitalize and strip the whole column at once
    df[col] = df[col].str.translate(Tr2Eng).str.capitalize().str.strip()
df.head()

Unnamed: 0,City,Town,Neighborhood,Borough,Post Code,lat,lng
0,İstanbul,Besi̇ktas,Abbasaga,Abbasaga mah,34022,41.04801,29.004528
1,İstanbul,Besi̇ktas,Abbasaga,Ci̇hannuma mah,34022,41.047211,29.007439
2,İstanbul,Besi̇ktas,Abbasaga,Si̇nanpasa mah,34022,41.041866,29.005112
3,İstanbul,Besi̇ktas,Akatlar,Akat mah,34335,41.085614,29.025626
4,İstanbul,Besi̇ktas,Arnavutkoy,Arnavutkoy mah,34345,41.068488,29.041629
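
The output above still contains `İ` and a dotted `i̇`, because the translation table has no entry for the dotted capital `İ`. Below is a minimal sketch of a fuller folding step, assuming plain ASCII names are the goal. `TR_TO_ASCII` and `fold_turkish` are hypothetical names. Note that this variant would produce `Besiktas` rather than the `Besi̇ktas` spelling seen in the recorded outputs, so any later filter on the town name would have to use the plain spelling.

```python
# assumption: this table extends the notebook's mapping with the dotted capital I,
# the dotless lowercase i, and the lowercase forms of the other letters
TR_TO_ASCII = str.maketrans("ÇĞİÖŞÜçğıöşü", "CGIOSUcgiosu")


def fold_turkish(text):
    """Replace Turkish letters with ASCII look-alikes, then normalize case and spaces."""
    return text.translate(TR_TO_ASCII).capitalize().strip()


for raw in ["BEŞİKTAŞ", "KADIKÖY", "CİHANNÜMA MAH  "]:
    print(repr(fold_turkish(raw)))
```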


### Saving to a CSV file to avoid repeated API calls

Free API calls are limited. Therefore, the dataframe is exported to a CSV file for future use.

In [6]:
df.to_csv("bjk_kdky_LL.csv")
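
The save-then-reuse idea can be wrapped in a small helper so that the API loop only runs when no cached file exists. This is a sketch under stated assumptions: `load_or_build` and `fake_geocode` are hypothetical names, and the stand-in function returns illustrative values instead of calling any API.

```python
import os
import tempfile

import pandas as pd


def load_or_build(path, build):
    """Read the CSV at `path` if it exists; otherwise call `build()` and save its result."""
    if os.path.exists(path):
        return pd.read_csv(path)
    frame = build()
    # index=False avoids an extra "Unnamed: 0" column when the file is read back
    frame.to_csv(path, index=False)
    return frame


calls = []  # records how often the expensive step runs


def fake_geocode():
    # stand-in for the geocoding loop; the values are illustrative
    calls.append(1)
    return pd.DataFrame({"Borough": ["Abbasaga mah"], "lat": [41.04801], "lng": [29.004528]})


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "bjk_kdky_LL.csv")
    first = load_or_build(cache_path, fake_geocode)   # builds and saves
    second = load_or_build(cache_path, fake_geocode)  # reads the saved file

print(len(calls))
```

Writing with `index=False` is a design choice here: it keeps the reloaded frame free of the `Unnamed: 0` columns that show up later in this notebook.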

## Generating the map with borough centers

For a simple overview

In [59]:
import folium

# Show an initial map of the neighborhoods in Kadıköy and Beşiktaş
# create the map centered on the first entry's latitude and longitude values
map_ist = folium.Map(location=[df["lat"][0], df["lng"][0]], zoom_start=12)

# add markers to map
for lat, lng, borough in zip(df['lat'], df['lng'], df['Borough']):
    label = 'Istanbul, {}'.format(borough)

    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ist)

map_ist

<h1>Getting more information about Neighborhoods</h1>

### Splitting Kadıköy and Beşiktaş

We need to split the data to find out which town is the pioneer. We will check the creation dates of the venues to see which town had such venues first.

In [8]:
# Filter the data to keep only the boroughs in Kadikoy
kadikoy_df = df[df['Town'].str.contains('Kadikoy')]
print(kadikoy_df.shape)
# Filter the data to keep only the boroughs in Besiktas
besiktas_df = df[df['Town'].str.contains('Besi̇ktas')]
print(besiktas_df.shape)

(21, 7)
(23, 7)


<h3>Use the Foursquare API to Explore</h3>
Using the Foursquare API to explore the neighborhoods and segment them

In [14]:
CLIENT_ID = '[RESETTED]' 
CLIENT_SECRET = '[RESETTED]' # will be reset after the run
VERSION = '20180605' # Foursquare API version

### The Foursquare API call

The first endpoint will be the Foursquare venue search.

We will use:

- **100 Venues limit**
- **500 m radius around the coordinates of neighborhood**
- **_Gym / Fitness Center_ category on Foursquare**
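
The three settings above map directly onto query parameters of the search request. Below is a minimal sketch that builds the URL with `urllib.parse.urlencode` instead of string formatting, so that every value is escaped consistently. `build_search_url` is a hypothetical helper, and the credentials passed in the example are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# category ID for Gym / Fitness Center, as used in this notebook
GYM_CATEGORY_ID = "4bf58dd8d48988d175941735"


def build_search_url(lat, lng, client_id, client_secret,
                     version="20180605", radius=500, limit=100):
    """Build the venue search URL for one pair of coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "radius": radius,           # metres around the coordinates
        "limit": limit,             # maximum number of venues per request
        "categoryId": GYM_CATEGORY_ID,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# placeholder credentials; no request is sent here
url = build_search_url(41.04801, 29.004528, "ID", "SECRET")
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```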

#### Let's create a function to find venue information around the given coordinates

In [12]:

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100  # max number of venues per request
    columns = ['Neighborhood',
               'Neighborhood Latitude',
               'Neighborhood Longitude',
               'Venue ID',
               'Venue',
               'Venue Latitude',
               'Venue Longitude',
               'Venue Distance',
               'Venue Category']
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}&categoryId=4bf58dd8d48988d175941735'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            lat,
            lng,
            VERSION,
            radius,
            LIMIT
        )

        # make the GET request
        results = requests.get(url).json()['response']['venues']

        # keep only the relevant information for each nearby venue
        for v in results:
            venues_list.append((
                name,
                lat,
                lng,
                v['id'],
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                v['location']['distance'],
                # some venues may come back without any category
                v['categories'][0]['name'] if v['categories'] else ''))

    # build the dataframe once, after all neighborhoods are processed;
    # this also works when no venue was returned at all
    return pd.DataFrame(venues_list, columns=columns)
    

#### Now run the above function on each neighborhood and create two new dataframes called *kadikoy_venues_df* and *besiktas_venues_df*.

In [13]:
#Get all kadikoy gym/studio venues
kadikoy_venues_df = getNearbyVenues(names=kadikoy_df['Borough'],
                                   latitudes=kadikoy_df['lat'],
                                   longitudes=kadikoy_df['lng']
                                  )

Bostanci mah
Caddebostan mah
Caferaga mah
Egi̇ti̇m mah
Hasanpasa mah
Erenkoy mah
Fenerbahce mah
Feneryolu mah
Zuhtupasa mah
Dumlupinar mah
Fi̇ki̇rtepe mah
Goztepe mah
Acibadem mah
Kosuyolu mah
Kozyatagi mah
Merdi̇venkoy mah
19 mayis mah
Osmanaga mah
Rasi̇mpasa mah
Sahrayicedi̇t mah
Suadi̇ye mah


In [14]:
#Get all besiktas gym/studio venues
besiktas_venues_df = getNearbyVenues(names=besiktas_df['Borough'],
                                   latitudes=besiktas_df['lat'],
                                   longitudes=besiktas_df['lng']
                                  )

Abbasaga mah
Ci̇hannuma mah
Si̇nanpasa mah
Akat mah
Arnavutkoy mah
Kurucesme mah
Bebek mah
Eti̇ler mah
Balmumcu mah
Di̇ki̇li̇tas mah
Gayrettepe mah
Yildiz mah
Kultur mah
Levazim mah
Ni̇sbeti̇ye mah
Ulus mah
Konaklar mah
Levent mah
Meci̇di̇ye mah
Ortakoy mah
Muradi̇ye mah
Turkali̇ mah
Vi̇snezade mah


#### Let's check the size of the resulting dataframes

In [15]:
# in each pair of prints below, Besiktas comes first and Kadikoy second
print('{} venues are returned from API call'.format(len(besiktas_venues_df.index)))
print('{} venues are returned from API call'.format(len(kadikoy_venues_df.index)))
print('{} unique venues are returned from API call'.format(len(besiktas_venues_df['Venue ID'].unique())))
print('{} unique venues are returned from API call'.format(len(kadikoy_venues_df['Venue ID'].unique())))

758 venues are returned from API call
658 venues are returned from API call
504 unique venues are returned from API call
572 unique venues are returned from API call


#### As can be seen above, some venues are returned for more than one neighborhood

With the following preprocessing, duplicate venues are dropped based on the venue's distance to the borough center. Only the row for the closest borough is kept.

In [16]:
besiktas_venues_df=besiktas_venues_df.loc[besiktas_venues_df.groupby('Venue ID')['Venue Distance'].idxmin()]
kadikoy_venues_df=kadikoy_venues_df.loc[kadikoy_venues_df.groupby('Venue ID')['Venue Distance'].idxmin()]
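
To see what the `groupby(...).idxmin()` step does, here is a toy example with made-up rows: venue `v1` appears under two neighborhoods, and only the row with the smaller distance survives. The frame and its values are illustrative and are not taken from the API results.

```python
import pandas as pd

# made-up rows: venue v1 was returned for two different neighborhoods
toy = pd.DataFrame({
    "Neighborhood": ["Bebek mah", "Etiler mah", "Akat mah"],
    "Venue ID": ["v1", "v1", "v2"],
    "Venue Distance": [620, 350, 196],
})

# for each venue, idxmin() gives the index label of the row with the smallest distance
closest_rows = toy.groupby("Venue ID")["Venue Distance"].idxmin()

# selecting those labels keeps exactly one row per venue
deduplicated = toy.loc[closest_rows]
print(deduplicated)
```

Because `idxmin()` returns index labels, this pattern relies on the dataframe having unique index labels, which holds for a freshly built frame with the default index.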

In [17]:
print('{} venues listed in the besiktas_venues_df after the process'.format(len(besiktas_venues_df.index)))
print('{} venues listed in the kadikoy_venues_df after the process'.format(len(kadikoy_venues_df.index)))

504 venues listed in the besiktas_venues_df after the process
572 venues listed in the kadikoy_venues_df after the process


### Perfect, this matches the unique venue check above

#### Let's check the category accuracy

In [20]:
print(kadikoy_venues_df['Venue Category'].unique())
print(besiktas_venues_df['Venue Category'].unique())

['Basketball Court' 'Gym' 'Gym / Fitness Center' 'Hotel' 'Track'
 'Sports Club' 'Yoga Studio' 'Daycare' 'Martial Arts Dojo' 'Gym Pool'
 'General Entertainment' 'College Gym' 'Mobile Phone Shop'
 'Athletics & Sports' 'Animal Shelter' 'Track Stadium' "Doctor's Office"
 'Pilates Studio' 'Gymnastics Gym' 'Cycle Studio' 'Housing Development'
 'Boxing Gym' 'Sporting Goods Shop' 'Social Club' 'Outdoor Gym'
 'Concert Hall' 'Restaurant' 'Weight Loss Center']
['Gym' 'Gym / Fitness Center' 'Gym Pool' 'Spa' 'Yoga Studio' 'Pool'
 'Martial Arts Dojo' 'Shopping Mall' 'Track' 'Climbing Gym' 'Building'
 'Athletics & Sports' 'Gymnastics Gym' 'Cycle Studio' 'Boxing Gym'
 'Pilates Studio' 'Social Club' 'Miscellaneous Shop' 'Weight Loss Center'
 'Event Service']


### Unfortunately, we do have unwanted categories

[According to the Foursquare documentation](https://developer.foursquare.com/docs/build-with-foursquare/categories/), the category ID for **Gym / Fitness Center** is _4bf58dd8d48988d175941735_ and it should return only the following categories:

- Gym / Fitness Center
- Boxing Gym
- Climbing Gym
- Cycle Studio
- Gym Pool
- Gymnastics Gym
- Gym
- Martial Arts Dojo
- Outdoor Gym
- Pilates Studio
- Track
- Weight Loss Center
- Yoga Studio

We will remove the venues whose category is not in this list.

In [21]:
Category_list=['Gym / Fitness Center','Boxing Gym','Climbing Gym','Cycle Studio','Gym Pool','Gymnastics Gym','Gym','Martial Arts Dojo','Outdoor Gym','Pilates Studio','Track','Weight Loss Center','Yoga Studio']
kadikoy_venues_df=kadikoy_venues_df[kadikoy_venues_df['Venue Category'].isin(Category_list)]
besiktas_venues_df=besiktas_venues_df[besiktas_venues_df['Venue Category'].isin(Category_list)]
print(kadikoy_venues_df['Venue Category'].unique())
print(besiktas_venues_df['Venue Category'].unique())


['Gym' 'Gym / Fitness Center' 'Track' 'Yoga Studio' 'Martial Arts Dojo'
 'Gym Pool' 'Pilates Studio' 'Gymnastics Gym' 'Cycle Studio' 'Boxing Gym'
 'Outdoor Gym' 'Weight Loss Center']
['Gym' 'Gym / Fitness Center' 'Gym Pool' 'Yoga Studio' 'Martial Arts Dojo'
 'Track' 'Climbing Gym' 'Gymnastics Gym' 'Cycle Studio' 'Boxing Gym'
 'Pilates Studio' 'Weight Loss Center']


#### Let's check how many venues were returned for each neighborhood

In [22]:
besiktas_venues_df.groupby('Neighborhood').count().loc[:,['Venue']]

Neighborhood,Venue
Abbasaga mah,4
Akat mah,28
Arnavutkoy mah,5
Balmumcu mah,16
Bebek mah,25
Ci̇hannuma mah,17
Di̇ki̇li̇tas mah,42
Eti̇ler mah,24
Gayrettepe mah,43
Konaklar mah,10


In [23]:
kadikoy_venues_df.groupby('Neighborhood').count().loc[:,['Venue']]

Neighborhood,Venue
19 mayis mah,36
Acibadem mah,20
Bostanci mah,30
Caddebostan mah,47
Caferaga mah,42
Dumlupinar mah,21
Egi̇ti̇m mah,7
Erenkoy mah,27
Fenerbahce mah,31
Feneryolu mah,24


### Let's store the information and use it from CSV from now on

- besiktas_venues_df will be stored as *"besiktas_venues_df.csv"*
- kadikoy_venues_df will be stored as *"kadikoy_venues_df.csv"*

In [47]:
besiktas_venues_df.to_csv("besiktas_venues_df.csv")
kadikoy_venues_df.to_csv("kadikoy_venues_df.csv")

In [2]:
# to read the csv again
# besiktas_venues_df= pd.read_csv("besiktas_venues_df.csv")
# kadikoy_venues_df= pd.read_csv("kadikoy_venues_df.csv")

## Now it is time to get details for each venue

We need to define a function to run the API for each venue and get the details.

In [3]:
print('{} venues exist in Besiktas'.format(len(besiktas_venues_df.index)))
print('{} venues exist in Kadıkoy'.format(len(kadikoy_venues_df.index)))

492 venues exist in Besiktas
553 venues exist in Kadıkoy


This is a Premium call, which is limited to 500 per day. Therefore, we will split the data collection over three days, as there are *1045 unique venues* listed.

The function below needs to be run repeatedly until all venue details are collected. The results will be saved to CSV files and merged afterwards.
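
The three-day figure follows from dividing 1045 venues by the daily limit of 500 and rounding up. A small sketch of that arithmetic, which also lists the index range each day would cover; `plan_batches` is a hypothetical helper and is not part of the notebook.

```python
import math


def plan_batches(total, daily_limit):
    """Split `total` items into consecutive (start, stop) ranges of at most `daily_limit`."""
    return [(start, min(start + daily_limit, total))
            for start in range(0, total, daily_limit)]


days_needed = math.ceil(1045 / 500)
batches = plan_batches(1045, 500)
print(days_needed)
print(batches)
```

Each `start` value plays the same role as the `start_index` argument of the function below, which resumes from a given position and stops once the API reports an error such as an exhausted quota.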

In [9]:
def getVenueDetails(venueId, start_index=0):
    columns = ['Venue Id',
               'Venue Checkins Count',
               'Venue Checkin User Count',
               'Venue Url',
               'Venue Twitter',
               'Venue Instagram',
               'Venue Phone',
               'Venue Formatted Phone',
               'Venue Rating',
               'Venue Description',
               'Venue Created',
               'Venue Tips Count',
               'Venue Likes Count']
    venue_details = []
    # needed for tracking purposes
    iter_number = start_index

    for vid in venueId[start_index:]:

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
            vid,
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION
        )

        # make the GET request
        v = requests.get(url).json()

        # if the quota is reached, stop and return what has been collected so far
        if v['meta']['code'] != 200:
            print('error occurred on index {}. Use this number as start index for next batch'.format(iter_number))
            print(v['meta'].get('errorDetail', 'no error details'))
            print('{} venues returned. The start index was {}'.format(len(venue_details), start_index))
            break

        # collect the venue details, falling back to defaults for missing fields
        venue = v['response']['venue']
        stats = venue.get('stats', {})
        contact = venue.get('contact', {})
        venue_details.append((
            vid,
            stats.get('checkinsCount', 0),
            stats.get('usersCount', 0),  # total users checked in
            venue.get('url', ""),
            contact.get('twitter', ""),
            contact.get('instagram', ""),
            contact.get('phone', ""),
            contact.get('formattedPhone', ""),
            venue.get('rating', 0),
            venue.get('description', ""),
            venue['createdAt'],
            venue.get('tips', {}).get('count', 0),
            venue.get('likes', {}).get('count', 0)))
        iter_number = iter_number + 1
    else:
        # reached only when the loop finished without an error
        print('last index {}'.format(iter_number))

    # build the dataframe once; this also works when nothing was collected
    return pd.DataFrame(venue_details, columns=columns)
    

In [23]:
kadikoy_venue_details = getVenueDetails(kadikoy_venues_df['Venue ID'],start_index=0)

last index 553


#### Recording the details to temporary CSV files

These cells are run as many times as necessary.

In [16]:
besiktas_venue_details.to_csv("besiktas_venue_details.csv")

In [24]:
kadikoy_venue_details.to_csv("kadikoy_venue_details.csv")

### Merging all the Venue details into kadikoy_venues_df and besiktas_venues_df

In [39]:
# preview before 'Venue ID' is renamed to 'Venue Id' (done in place in the next cell)
kadikoy_venues_df.head()

Unnamed: 0.1,Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Distance,Venue Category
0,36,Caddebostan mah,40.967927,29.061993,4bdfd536be5120a15ec2fe70,Powerfull Atlethic Club,40.967527,29.06494,251,Gym
1,588,Rasi̇mpasa mah,40.996066,29.027081,4be31013660ec9287609cb3b,Sports,41.0003,29.031012,575,Gym
2,254,Feneryolu mah,40.981957,29.048902,4be4544d2468c928419dfe42,Eaglestar Fitness and Body Form Club,40.978751,29.047985,365,Gym
3,504,Osmanaga mah,40.991432,29.027081,4bf78371bb5176b0c3385bb2,Sports Universe,40.988533,29.024474,390,Gym
4,89,Caferaga mah,40.98372,29.025626,4c62e4dbfa7bc928cc380f27,Airport Form Center,40.985363,29.021208,413,Gym / Fitness Center


In [None]:
kadikoy_venue_details = pd.concat(
    [pd.read_csv("kadikoy_venue_details_v1.csv"),
     pd.read_csv("kadikoy_venue_details_v2.csv"),
     pd.read_csv("kadikoy_venue_details_v3.csv")],
    ignore_index=True)
# 'Venue Id' stays a regular column so that it can serve as the merge key
kadikoy_venues_df.rename(columns={'Venue ID':'Venue Id'}, inplace=True)
kadikoy_venues_df = pd.merge(kadikoy_venues_df, kadikoy_venue_details, on='Venue Id')

In [42]:
besiktas_venue_details = pd.concat(
    [pd.read_csv("besiktas_venue_details_v1.csv"),
     pd.read_csv("besiktas_venue_details_v2.csv")],
    ignore_index=True)
# 'Venue Id' stays a regular column so that it can serve as the merge key
besiktas_venues_df.rename(columns={'Venue ID':'Venue Id'}, inplace=True)
besiktas_venues_df = pd.merge(besiktas_venues_df, besiktas_venue_details, on='Venue Id')

### Let's check the shape and content

In [43]:
print(besiktas_venues_df.shape)
print(kadikoy_venues_df.shape)

(492, 23)
(553, 23)


In [45]:
besiktas_venues_df.head(15)

Unnamed: 0,Unnamed: 0_x,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue Id,Venue,Venue Latitude,Venue Longitude,Venue Distance,Venue Category,Unnamed: 0_y,Venue Checkins Count,Venue Checkin User Count,Venue Url,Venue Twitter,Venue Instagram,Venue Phone,Venue Formatted Phone,Venue Rating,Venue Description,Venue Created,Venue Tips Count,Venue Likes Count
0,64,Ci̇hannuma mah,41.047211,29.007439,4b66f145f964a52075312be3,Turkcell Plaza Fitness Center,41.048224,29.008715,155,Gym,0,0,0,,,,,,0.0,,1265037637,0,0
1,197,Bebek mah,41.077744,29.041629,4b703459f964a520830b2de3,Mars Athletic Club,41.075919,29.038232,350,Gym / Fitness Center,1,0,0,http://www.marsathletic.com,mars_athletic,,902122600000.0,+90 212 263 86 96,8.8,,1265644633,52,317
2,223,Eti̇ler mah,41.087042,29.037264,4b7aed0ef964a52089462fe3,Hillside City Club,41.087993,29.031091,528,Gym / Fitness Center,2,0,0,https://www.hillside.com.tr,hillsideleisure,,902123500000.0,+90 212 352 23 33,8.9,,1266347278,129,1088
3,200,Bebek mah,41.077744,29.041629,4ba67962f964a520265639e3,Celebrity Fitness,41.081342,29.03598,620,Gym,3,0,0,,,,,,0.0,,1269201250,0,0
4,413,Kultur mah,41.073202,29.0329,4bc89f2192b376b0bdcb513a,Maya sports center,41.076425,29.029158,476,Gym / Fitness Center,4,0,0,,,,,,0.0,,1271439137,1,0
5,137,Akat mah,41.085614,29.025626,4be97d20c5220f478ee2aaca,Club Sporium,41.086891,29.024003,196,Gym / Fitness Center,5,0,0,http://akatlar.clubsporium.com.tr,clubsporium,,902123900000.0,+90 212 385 49 10,8.4,Sporla dolu yaşam merkezi,1273593120,236,1135
6,401,Kultur mah,41.073202,29.0329,4c14c74682a3c9b66bbcfdf8,Mayadrom Sports Center(MSC),41.077281,29.032698,454,Gym,6,0,0,,,,,,6.3,,1276430150,24,12
7,739,Vi̇snezade mah,41.041503,28.998708,4c37666c93db0f4737b51f92,Mars Athletic Club,41.043765,28.992381,587,Gym / Fitness Center,7,0,0,http://www.marsathletic.com,mars_athletic,,902122300000.0,+90 212 232 44 40,8.6,,1278699116,151,786
8,628,Muradi̇ye mah,41.04845,28.998708,4c4ae6c79c8d2d7fe10e316a,İTÜ Havuz,41.046354,28.995922,330,Gym Pool,8,0,0,http://www.itu.edu.tr,itu1773,,,,7.9,,1279977159,6,37
9,48,Ci̇hannuma mah,41.047211,29.007439,4c51ab6b1c67ef3be50f1bb9,The Alaaddin Club,41.04726,29.007811,31,Gym / Fitness Center,9,0,0,,,,902123300000.0,+90 212 327 63 93,7.3,,1280420715,6,44


In [46]:
kadikoy_venues_df.head(15)

Unnamed: 0,Unnamed: 0_x,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue Id,Venue,Venue Latitude,Venue Longitude,Venue Distance,Venue Category,Unnamed: 0_y,Venue Checkins Count,Venue Checkin User Count,Venue Url,Venue Twitter,Venue Instagram,Venue Phone,Venue Formatted Phone,Venue Rating,Venue Description,Venue Created,Venue Tips Count,Venue Likes Count
0,36,Caddebostan mah,40.967927,29.061993,4bdfd536be5120a15ec2fe70,Powerfull Atlethic Club,40.967527,29.06494,251,Gym,0,0,0,http://caddebostan.powerfullclub.com,powerfullclub,,902163600000.0,+90 216 356 71 71,7.8,Caddebostan Kültür Merkezi’nde hizmet veren te...,1272960310,52,209
1,588,Rasi̇mpasa mah,40.996066,29.027081,4be31013660ec9287609cb3b,Sports,41.0003,29.031012,575,Gym,1,0,0,,,,,,0.0,,1273171987,0,0
2,254,Feneryolu mah,40.981957,29.048902,4be4544d2468c928419dfe42,Eaglestar Fitness and Body Form Club,40.978751,29.047985,365,Gym,2,0,0,,,,,,8.1,,1273254989,11,57
3,504,Osmanaga mah,40.991432,29.027081,4bf78371bb5176b0c3385bb2,Sports Universe,40.988533,29.024474,390,Gym,3,0,0,,,,,,8.7,,1274512241,62,264
4,89,Caferaga mah,40.98372,29.025626,4c62e4dbfa7bc928cc380f27,Airport Form Center,40.985363,29.021208,413,Gym / Fitness Center,4,0,0,http://www.airportmoda.com,,,902163500000.0,+90 216 345 58 46,8.1,,1281549531,22,78
5,67,Caddebostan mah,40.967927,29.061993,4caf3964562d224b092c0c88,Sports Time,40.964574,29.065999,502,Gym,5,0,0,,,,,,0.0,,1286551908,3,3
6,462,19 mayis mah,40.974626,29.088173,4cc444b382388cfae5b46d35,Gengym Health & Fitness,40.974221,29.090556,205,Gym,6,0,0,http://www.gengym.com,,,902164500000.0,+90 216 445 19 00,8.6,,1287931059,21,147
7,407,Kozyatagi mah,40.969147,29.095444,4cd7fdd2a99d370447c402ce,Max Gym Fitness Studio,40.97137,29.094729,254,Gym / Fitness Center,7,0,0,,,,,,8.4,,1289223634,27,149
8,215,Fenerbahce mah,40.974286,29.043083,4cdad42f6ad1a0931883e856,Anantara Spa,40.972723,29.043529,178,Gym / Fitness Center,8,0,0,,,,,,0.0,,1289409583,1,0
9,369,Kosuyolu mah,41.009087,29.038719,4ceeaeff13aea14357206f9f,Fashion Sports Club,41.010069,29.042069,301,Gym / Fitness Center,9,0,0,http://www.fashionsports.com.tr,sportsfashionn,,902163400000.0,+90 216 340 71 71,8.6,,1290710783,151,617
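
Since the comparison between the two towns relies on venue creation dates, the `Venue Created` values need to be readable as dates. Below is a minimal sketch, assuming the column holds Unix timestamps in seconds (the magnitudes in the tables above are consistent with that). `created_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def created_to_datetime(created_at):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(created_at, tz=timezone.utc)


# values copied from the first rows of the 'Venue Created' column in the tables above
first_besiktas = created_to_datetime(1265037637)
first_kadikoy = created_to_datetime(1272960310)

print(first_besiktas.date(), first_kadikoy.date())
```

For a whole column, `pd.to_datetime(series, unit='s')` performs the same conversion in one call.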


### Let's drop the unnecessary columns and set the index as *Venue Id*

**Unnamed: 0_x, Venue Checkins Count, Venue Checkin User Count, Venue Phone** are not needed for the analysis.

In [47]:
kadikoy_venues_df.drop(['Unnamed: 0_x', 'Venue Checkins Count', 'Venue Checkin User Count', 'Venue Phone'], axis=1, inplace=True)
besiktas_venues_df.drop(['Unnamed: 0_x', 'Venue Checkins Count', 'Venue Checkin User Count', 'Venue Phone'], axis=1, inplace=True)

In [50]:
kadikoy_venues_df.set_index('Venue Id', inplace=True)
besiktas_venues_df.set_index('Venue Id', inplace=True)

### Let's record the final version for the data analysis

In [51]:
kadikoy_venues_df.to_csv('kadikoy_venues.csv')
besiktas_venues_df.to_csv('besiktas_venues.csv')