# Valencia data by district for starting a business


July 20, 2021 \
By Luis Enrique Palma \
https://www.linkedin.com/in/luisenriquepalma

## 1. Introductory Section

Valencia is the third-largest city in Spain and the capital of the province of Valencia, in the Valencian Community. This report presents relevant demographic data, analyses of interest, and a tool intended to support anyone interested in starting a business in the city.

Valencia lies on the east coast of Spain, facing the Mediterranean Sea, and is crossed by the Turia river. Tourism and services contribute most to its economy; industry, construction and agriculture contribute on a smaller scale. The city is made up of 19 districts, which this report describes through the average characteristics of their inhabitants, to give an idea of what type of business is most worth starting in each district.

In [1]:
import numpy as np # library for vectorized data handling

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library for handling JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # converts an address into latitude and longitude values

import requests # library for handling HTTP requests
from pandas import json_normalize # converts nested JSON into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated modules for plotting
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium --yes # uncomment this line if folium is not installed
import folium # library for rendering maps

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
!pip install geopandas
!pip install geopy
!pip install geocoder
!pip install folium



In [3]:
import requests
import json 
import geocoder
from geopandas.tools import geocode

In [4]:
DF = pd.read_csv('DistValencia.csv', header=0)
DF.tail() # Data taken from https://www.valencia.es/es/cas/estadistica/mapa-barrios 

Unnamed: 0,District
14,Rascanya
15,Benicalap
16,Pobles del Nord
17,Pobles de l’Oest
18,Pobles del Sud


In [5]:
districts=DF['District']
districts=districts+', Valencia, Spain'

In [6]:
districts

0         Ciutat Vella, Valencia, Spain
1           l’Eixample, Valencia, Spain
2            Extramurs, Valencia, Spain
3             Campanar, Valencia, Spain
4            la Saïdia, Valencia, Spain
5      el Pla del Real, Valencia, Spain
6          l’Olivereta, Valencia, Spain
7              Patraix, Valencia, Spain
8                Jesús, Valencia, Spain
9      Quatre Carreres, Valencia, Spain
10    Poblats Marítims, Valencia, Spain
11      Camins al Grau, Valencia, Spain
12             Algirós, Valencia, Spain
13          Benimaclet, Valencia, Spain
14            Rascanya, Valencia, Spain
15           Benicalap, Valencia, Spain
16     Pobles del Nord, Valencia, Spain
17    Pobles de l’Oest, Valencia, Spain
18      Pobles del Sud, Valencia, Spain
Name: District, dtype: object

In [None]:
LLn=[]
for i in districts:
    lat_long_many=geocoder.osm(i)
    LLn.append(lat_long_many.osm)

In [None]:
LLn[0]

In [None]:
LLn[0]['y']

In [None]:
# define the dataframe columns
column_names = ['Latitude', 'Longitude'] 

# initialize the dataframe
distr = pd.DataFrame(columns=column_names)

distr

In [None]:
column_names = ['Latitude', 'Longitude']

# build the dataframe from the geocoder results ('y' is latitude, 'x' is longitude);
# DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first
rows = [{'Latitude': i['y'], 'Longitude': i['x']} for i in LLn]
distr = pd.DataFrame(rows, columns=column_names)

distr

In [None]:
DF['Latitude']=distr['Latitude']

In [None]:
DF['Longitude']=distr['Longitude']

In [None]:
DF.tail()

In [None]:
# Average of the latitude and longitude coordinates, used to center the map correctly
avg_Latitude=sum(DF['Latitude'])/len(DF['Latitude'])
print('Average Latitude is :', avg_Latitude)

avg_Longitude=sum(DF['Longitude'])/len(DF['Longitude'])
print('Average Longitude is: ', avg_Longitude)

In [None]:
# create a map of Valencia using the average latitude and longitude values
map_valencia = folium.Map(location=[avg_Latitude, avg_Longitude], zoom_start=11)

# add markers to the map
for lat, lng, District, in zip(DF['Latitude'], DF['Longitude'], DF['District']):
    label = '{}'.format(District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_valencia)  
    
map_valencia

## 2. Methodology Section

The methodology is divided into two stages. The first consists of collecting relevant, current quantitative information, so that the estimates suggested or made from the results are as precise as possible. The second consists of processing the data with MS Excel, MS Word and Python with its different libraries to obtain the results.


### 2.1. First Stage: Data Mining
Demographic and descriptive information on the inhabitants of Valencia, by district, was collected for 2020 from the official website of the Valencia City Council. Gross average income for 2019 came from the official website of the Institute of Statistics of Spain, district coordinates from Google Maps, and the currently most popular businesses from Foursquare. Additional qualitative information came from Wikipedia, which is not a scientific source.
The DataFrame (DF) was built in MS Excel from the tables collected from these sources, resulting in the following table:


In [None]:
df_val=pd.read_csv('data_val.csv')
df_val.head()

### 2.2. Second Stage: Data Processing
To process the data, the previous DF is imported into JupyterLab and handled with Python and its libraries: pandas for DataFrame manipulation; folium for interactive, informative choropleth maps; json, together with Beautifulsoup, geocoder, geopy and geopandas, for extracting coordinates; seaborn and matplotlib for frequency plots and other statistical graphs; and sklearn for k-means clustering of the most popular venues in each district, collected with a Foursquare developer account. Other libraries are included where needed to process or complement the information.


### 2.2.1 Analyzing the data
First, the data was cleaned and the numeric columns that were stored as strings were converted to float. The data was then inspected with the `info` and `describe` methods to obtain a type summary and a statistical summary, respectively. See figures (1) and (2):


In [None]:
df_val.info()

In [None]:
df_val.describe()

These last three tables all describe the same dataset. They are important for getting an idea of how the collected data behaves and how it describes the inhabitants of Valencia in general.
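
The cleaning step mentioned in this section (numeric columns stored as strings, converted to float) could be sketched as follows. This is only an illustration: the sample cells are made up, and the assumption that the source tables use `.` as a thousands separator and `,` as a decimal separator is hypothetical, since the raw format of `data_val.csv` is not shown in this report.

```python
def to_float(text):
    """Convert a numeric string such as '1.234,5' to a float.

    Assumes (hypothetically) European formatting: '.' groups thousands
    and ',' marks decimals. Returns None for empty or non-numeric cells.
    """
    cleaned = text.strip().replace('.', '').replace(',', '.')
    try:
        return float(cleaned)
    except ValueError:
        return None

raw_population = ['27.418', ' 43.113 ', '', 'n/a']  # made-up sample cells
population = [to_float(cell) for cell in raw_population]
print(population)
```

With pandas, the same helper could be applied to a whole column through `Series.apply`, leaving the unparseable cells as missing values to be reviewed by hand.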

### 2.2.2 Data visualization
Tabular data can be hard to interpret, so this section shows some plots to build familiarity with the data from the Valencian districts. 
The first variable is population. Two plots show which districts have the most inhabitants and how the population is distributed by area on the map.


In [None]:
from numpy import median
import seaborn as sns



In [None]:
sns.set(rc = {'figure.figsize':(10,6)}) # set the figure size before drawing so it takes effect
by_population=df_val.sort_values('Population', ascending=False)

g = sns.barplot(x='District', y='Population', data=by_population, color='brown')
g.tick_params(axis='x', rotation=90)
g.set_title('Population in Valencian Districts 2020')


In [None]:
# Choropleth map

In [None]:
# Import libraries
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import json

%matplotlib inline

In [None]:
# Important: convert the district names to uppercase, which is how they appear in the JSON file.
df_val['DISTRICT']=df_val['District'].apply(lambda x: x.upper())

In [None]:
df_val.head()
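
The uppercase conversion above matters because the choropleth joins the table to the GeoJSON by name, and any district whose name does not match exactly is left without a value. A quick check like the following sketch can reveal mismatches before drawing the map. The toy GeoJSON and the helper name `unmatched_districts` are assumptions for illustration; only the `nombre` property comes from the file used in this report.

```python
def unmatched_districts(district_names, geojson, prop='nombre'):
    """Return the names that have no feature with the same property value."""
    geo_names = {feature['properties'][prop] for feature in geojson['features']}
    return [name for name in district_names if name not in geo_names]

# Toy GeoJSON with the same shape as the districts file (geometry omitted).
toy_geo = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'nombre': 'CIUTAT VELLA'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'nombre': 'EXTRAMURS'}, 'geometry': None},
    ],
}

names = ['Ciutat Vella', 'Extramurs', 'Campanar']
before = unmatched_districts(names, toy_geo)                        # mixed case: nothing matches
after = unmatched_districts([n.upper() for n in names], toy_geo)    # only the missing district remains
print(before)
print(after)
```

An empty result for the real file would mean every district in the table has a polygon to color; a non-empty one usually points to differences in case, accents or apostrophes.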

In [None]:
!wget --quiet http://mapas.valencia.es/lanzadera/opendata/Distritos/JSON?srsName=EPSG%3A4326 -O valencia_geo.json
    
#http://mapas.valencia.es/lanzadera/opendata/Distritos/JSON?srsName=EPSG%3A4326    
print('GeoJSON file downloaded!')

valencia_geo = r'valencia_geo.json'

# Valencia latitude and longitude
latitude = avg_Latitude
longitude = avg_Longitude


In [None]:

# display Valencia
valencia_map = folium.Map(location=[latitude, longitude], zoom_start=11)
# folium.Choropleth replaces Map.choropleth, which recent folium versions no longer provide
folium.Choropleth(
    geo_data=valencia_geo,
    data=df_val,
    columns=['DISTRICT','Population'],
    key_on='feature.properties.nombre', # property name used in the GeoJSON file
    fill_color='OrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Population of Valencian Districts',
    highlight=True
).add_to(valencia_map)
for lat, lng, District, in zip(DF['Latitude'], DF['Longitude'], DF['District']):
    label = '{}'.format(District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3.5,
        popup=label,
        color='black',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7,
        parse_html=False).add_to(valencia_map)  
    



#display map
valencia_map

We now know that Quatre Carreres, Camins al Grau, Patraix, Poblats Marítims and Rascanya are the most populated districts. Some businesses, such as supermarkets, fast-food restaurants and schools, need a large population nearby. Next, let's look at which districts have the highest and lowest gross income. 

In [None]:
sns.set(rc = {'figure.figsize':(10,6)}) # set the figure size before drawing so it takes effect
by_income=df_val.sort_values('Average Gross income 2018', ascending=False)

g = sns.barplot(x='District', y='Average Gross income 2018', data=by_income, color='brown')
g.tick_params(axis='x', rotation=90)
g.set_title('Average Gross Income (€) of inhabitants in Valencia by District, 2018')

In [None]:

# display Valencia
valencia_map_income = folium.Map(location=[latitude-0.02, longitude+0.1], zoom_start=11)
# folium.Choropleth replaces Map.choropleth, which recent folium versions no longer provide
folium.Choropleth(
    geo_data=valencia_geo,
    data=df_val,
    columns=['DISTRICT','Average Gross income 2018'],
    key_on='feature.properties.nombre',
    fill_color='OrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Average Gross Income of inhabitants by Valencian District',
    highlight=True
).add_to(valencia_map_income)
for lat, lng, District, in zip(DF['Latitude'], DF['Longitude'], DF['District']):
    label = '{}'.format(District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3.5,
        popup=label,
        color='black',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7,
        parse_html=False).add_to(valencia_map_income)  
    



#display map
valencia_map_income

In [None]:

# display Valencia
valencia_map_foreign = folium.Map(location=[latitude-0.02, longitude+0.1], zoom_start=11)
# folium.Choropleth replaces Map.choropleth, which recent folium versions no longer provide
folium.Choropleth(
    geo_data=valencia_geo,
    data=df_val,
    columns=['DISTRICT','Foreigns'],
    key_on='feature.properties.nombre',
    fill_color='OrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Foreign residents in Valencian Districts',
    highlight=True
).add_to(valencia_map_foreign)
for lat, lng, District, in zip(DF['Latitude'], DF['Longitude'], DF['District']):
    label = '{}'.format(District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3.5,
        popup=label,
        color='black',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7,
        parse_html=False).add_to(valencia_map_foreign)  
    



#display map
valencia_map_foreign

We now know that el Pla del Real, l’Eixample and Ciutat Vella are the districts whose inhabitants have the greatest economic power in Valencia; as the map shows, they tend to be in the city center. This is relevant for starting luxury businesses such as gourmet restaurants, spas (which might do better near the beach), wine stores and others. There may also be opportunities in vehicle-related businesses: the next plots show which districts have the largest vehicle fleets, split into vehicles and heavy vehicles.

In [None]:
sns.set(rc = {'figure.figsize':(10,6)}) # set the figure size before drawing so it takes effect
# note: the 'Tourism ' column name carries a trailing space in the data file
g3 = sns.lineplot(x='District', y='Tourism ', data=df_val, color='brown', label='Tourism')
g3 = sns.lineplot(x='District', y='Motorcycle', data=df_val, color='blue', label='Motorcycle')
g3 = sns.lineplot(x='District', y='Moped', data=df_val, color='green', label='Moped')

g3.tick_params(axis='x', rotation=90) # rotate this plot's own tick labels
g3.set_title('Vehicle Fleet in Valencian Districts')
g3.set(xlabel='Districts', ylabel='Vehicle Fleet')
g3.legend(title='Vehicle type', loc='upper left')

In [None]:
sns.set(rc = {'figure.figsize':(10,6)}) # set the figure size before drawing so it takes effect
g2 = sns.lineplot(x='District', y='Bus', data=df_val, color='brown', label='Bus')
g2 = sns.lineplot(x='District', y='Tractor', data=df_val, color='blue', label='Tractor')
g2 = sns.lineplot(x='District', y='Truck', data=df_val, color='green', label='Truck')
g2 = sns.lineplot(x='District', y='Trailer', data=df_val, color='orange', label='Trailer')

g2.tick_params(axis='x', rotation=90) # rotate this plot's own tick labels
g2.set_title('Heavy Vehicle Fleet in Valencian Districts')
g2.set(xlabel='Districts', ylabel='Heavy Vehicle Fleet')
g2.legend(title='Heavy Vehicle type', loc='upper left')


Many more plots could be shown, but I think these are the most relevant for getting to know the inhabitants of the Valencian districts. 

### 2.2.3 Clustering the venues
Foursquare has a venue database that helps find the most popular venues in a city. Using the Foursquare API, I set a radius of 750 and a limit of 100 venues. We obtain a table with 597 rows, as shown in Table 4.
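
As a rough illustration of how the radius and limit enter the request, the sketch below builds the query string for the `venues/explore` endpoint used in this notebook with `urllib.parse.urlencode`, without sending anything. The helper name `build_explore_url`, the placeholder credentials and the sample coordinates are assumptions for the example.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=750, limit=100):
    """Build a Foursquare v2 'venues/explore' URL; no request is sent."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the search center
        'radius': radius,                # search radius around the center
        'limit': limit,                  # maximum number of venues returned
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials and approximate coordinates, for illustration only.
url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 39.47, -0.38)
print(url)
```

Letting `urlencode` assemble the query string escapes special characters (the comma in `ll` becomes `%2C`), which is a little safer than formatting the URL by hand.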


In [None]:
address = 'Valencia, Spain'

geolocator = Nominatim(user_agent="val_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Valencia are {}, {}.'.format(latitude, longitude))

In [None]:
import os

# Credentials are read from environment variables instead of being hardcoded in the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # default limit value for the Foursquare API

print('Credentials loaded for CLIENT_ID: ' + CLIENT_ID)

In [None]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 750 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    avg_Latitude, 
    avg_Longitude, 
    radius, 
    LIMIT)
url # display URL


In [None]:
results = requests.get(url).json()
results


In [None]:
# function to extract the venue category
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [None]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten the JSON into a dataframe

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# extract the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

In [None]:
nearby_venues.head(15)


In [None]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

In [None]:
"""
Up to this point, only the average center of the city of Valencia has been considered
"""


In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [None]:
# collect the nearby venues for every district
valencia_venues = getNearbyVenues(names=DF['District'],
                                   latitudes=DF['Latitude'],
                                   longitudes=DF['Longitude']
                                  )

In [None]:
print(valencia_venues.shape)
valencia_venues.head(10)

To cluster the common venues, I use k-means, an unsupervised learning algorithm, starting with k = 2 clusters and using the elbow method to find the optimal k. The resulting graph is shown in figure 8; it helps corroborate the assumed number of clusters.
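
To make the elbow idea concrete, here is a minimal pure-Python sketch on made-up 2D points. It is not the code used in this report (which relies on sklearn's `KMeans` and a Canberra-based distortion); it runs a basic Lloyd's k-means with a deterministic start and reports the within-cluster sum of squared Euclidean distances for several values of k. All names and data are illustrative assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans_inertia(points, k, iterations=20):
    """Run a basic Lloyd's k-means and return the within-cluster sum of squares."""
    centers = [points[i] for i in range(k)]  # deterministic start: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean_point(members) if members else centers[c]
                   for c, members in enumerate(clusters)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Two tight, well-separated groups of made-up points.
toy = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
inertias = [kmeans_inertia(toy, k) for k in range(1, 5)]
for k, inertia in zip(range(1, 5), inertias):
    print(k, round(inertia, 2))
```

On data like this the value drops sharply from k = 1 to k = 2 and only slightly afterwards; that bend is the "elbow" that suggests the number of clusters.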

In [None]:
valencia_venues.groupby('District').count()

In [None]:
print('There are {} unique categories.'.format(len(valencia_venues['Venue Category'].unique())))

In [None]:
# one-hot encoding to analyze each district
valencia_onehot = pd.get_dummies(valencia_venues[['Venue Category']], prefix="", prefix_sep="")

# add the District column back to the dataframe
valencia_onehot['District'] = valencia_venues['District'] 

# move the District column to the first position
fixed_columns = [valencia_onehot.columns[-1]] + list(valencia_onehot.columns[:-1])
valencia_onehot = valencia_onehot[fixed_columns]

valencia_onehot.head()

In [None]:
valencia_onehot.shape

In [None]:
valencia_grouped = valencia_onehot.groupby('District').mean().reset_index()
valencia_grouped

In [None]:
valencia_grouped.shape

In [None]:
num_top_venues = 5

for hood in valencia_grouped['District']:
    print("----"+hood+"----")
    temp = valencia_grouped[valencia_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create the columns according to the number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
District_venues_sorted = pd.DataFrame(columns=columns)
District_venues_sorted['District'] = valencia_grouped['District']

for ind in np.arange(valencia_grouped.shape[0]):
    District_venues_sorted.iloc[ind, 1:] = return_most_common_venues(valencia_grouped.iloc[ind, :], num_top_venues)

District_venues_sorted.head(19)

In [None]:
# cluster the districts into 2 groups

# set the number of clusters
kclusters = 2

# the positional axis argument of drop() was removed in pandas 2.0
valencia_grouped_clustering = valencia_grouped.drop(columns='District')

# run k-means
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(valencia_grouped_clustering)

# check the cluster labels generated for each row of the dataframe
kmeans.labels_[0:10] 

In [None]:
from scipy.spatial.distance import cdist
import matplotlib.pyplot as plt

distortions = []
K = range(1,10)
for k in K:
    kmeanModel = KMeans(n_clusters=k, random_state=0).fit(valencia_grouped_clustering)
    distortions.append(sum(np.min(cdist(valencia_grouped_clustering, kmeanModel.cluster_centers_, 'canberra'), axis=1)) / valencia_grouped_clustering.shape[0])

# Several distance metrics are available for measuring spatial distance.
# I chose canberra instead of euclidean because it gives a clearer view of the elbow break point.

# Plot the elbow
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
plt.show()
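
For reference, the Canberra metric chosen above sums, per coordinate, the absolute difference divided by the sum of absolute values. The sketch below is a plain-Python illustration on made-up frequency vectors, not the scipy implementation; terms where both values are zero are skipped, which I understand to be how scipy handles them as well.

```python
def canberra(p, q):
    """Canberra distance: sum of |p_i - q_i| / (|p_i| + |q_i|), skipping 0/0 terms."""
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom:  # a term with a == b == 0 contributes nothing
            total += abs(a - b) / denom
    return total

def euclidean(p, q):
    """Ordinary Euclidean distance, for comparison."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

row = [0.50, 0.02, 0.0]     # made-up venue-category frequencies for one district
center = [0.40, 0.00, 0.0]  # made-up cluster center
print(canberra(row, center), euclidean(row, center))
```

Because each term is relative to the size of the values, a rare category that is present in one vector and absent in the other contributes as much as a large change in a common category. That may be why the elbow looks sharper with this metric, although k-means itself still fits the clusters by minimizing Euclidean distances.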

In [None]:
# add cluster labels
District_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

valencia_merged = DF

# join DF with District_venues_sorted on the district name
valencia_merged = valencia_merged.join(District_venues_sorted.set_index('District'), on='District')

valencia_merged.head() # check the last columns

In [None]:
count_venue = valencia_merged
count_venue = count_venue.drop(['District', 'Latitude', 'Longitude'], axis=1)
count_venue = count_venue.groupby(['Cluster Labels','1st Most Common Venue']).size().reset_index(name='Counts')

#we can transpose it to plot bar chart
cv_cluster = count_venue.pivot(index='Cluster Labels', columns='1st Most Common Venue', values='Counts')
cv_cluster = cv_cluster.fillna(0).astype(int).reset_index(drop=True)
cv_cluster

In [None]:
# bar chart of "Number of Venues in Each Cluster"
frame=cv_cluster.plot(kind='bar',figsize=(20,8),width = 0.8)

plt.legend(labels=cv_cluster.columns,fontsize= 14)
plt.title("Number of Venues in Each Cluster",fontsize= 16)
plt.xticks(fontsize=14, rotation=0)
# the bars are grouped by cluster on the x axis and their height is the count
plt.xlabel('Clusters', fontsize=14)
plt.ylabel('Number of Venues', fontsize=14)

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set the color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; districts without venues have no cluster label and are skipped
labeled = valencia_merged.dropna(subset=['Cluster Labels'])
for lat, lon, poi, cluster in zip(labeled['Latitude'], labeled['Longitude'], labeled['District'], labeled['Cluster Labels']):
    cluster = int(cluster) # the join can turn integer labels into floats
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
# Examine the clusters

In [None]:
# Cluster 1 (label 0); column 1 is 'District', columns 5 onwards are the most common venues
valencia_merged.loc[valencia_merged['Cluster Labels'] == 0, valencia_merged.columns[[1] + list(range(5, valencia_merged.shape[1]))]]

In [None]:
# Cluster 2 (label 1); column 1 is 'District', columns 5 onwards are the most common venues
valencia_merged.loc[valencia_merged['Cluster Labels'] == 1, valencia_merged.columns[[1] + list(range(5, valencia_merged.shape[1]))]]

## 3. Results Section
### 3.1 Foursquare to find the most popular venues in Valencian Districts
We can now obtain Table 4, which gives an idea of what is working well in each district. One clearly visible characteristic is that every district except Pobles de l’Oest and l’Olivereta has restaurants as its most popular venue.  


In [None]:
District_venues_sorted.head(19)
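
The ranking behind this table can be illustrated with a small standard-library sketch: count how often each category appears per district and keep the most frequent ones. The sample pairs are made up (they are not Foursquare results), and `top_categories` is a hypothetical helper, not the function used earlier in the notebook.

```python
from collections import Counter

def top_categories(venues, n=3):
    """Map each district to its n most frequent venue categories."""
    counts = {}
    for district, category in venues:
        counts.setdefault(district, Counter())[category] += 1
    return {d: [cat for cat, _ in c.most_common(n)] for d, c in counts.items()}

# Made-up (district, category) pairs, for illustration only.
sample = [
    ('Ciutat Vella', 'Restaurant'), ('Ciutat Vella', 'Plaza'),
    ('Ciutat Vella', 'Restaurant'), ('Benimaclet', 'Bar'),
    ('Benimaclet', 'Bar'), ('Benimaclet', 'Café'),
]
ranking = top_categories(sample, n=2)
print(ranking)
```

The notebook reaches the same kind of ranking through one-hot encoding and a per-district mean, which also yields the frequency vectors that k-means needs.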

## 4. Discussion Section
As mentioned at the beginning of this report, most of Valencia's economic activity is in services, including tourism: restaurants, bars, plazas and beaches. This can be verified in the plots and tables of the results.  
The clustering algorithm used was k-means, starting with k = 2 clusters, which was confirmed with the elbow method. 
The inhabitants with the greatest economic power tend to live in the central districts, where the historical sites are. Luxury restaurants could be considered there, to take advantage of both the many tourists and the wealthy residents.


## 5. Conclusions
The main existing businesses in Valencia are restaurants, bars, cafeterias and plazas. Many national and international tourists come to enjoy the city every year. I recommend adding qualitative data to future analyses. 


## 6. References

Valencia Town Hall. (December 2020). Retrieved from Ajuntament de Valencia: https://www.valencia.es/es/cas/estadistica/mapa-barrios
Enterat. (2021). Enterat. Retrieved from https://www.enterat.com/actualidad/habitantes-valencia.php
INE. (2020). Instituto Nacional de Estadística. Retrieved from https://ine.es
Wikipedia. (2021). Wikipedia.org. Retrieved from https://es.wikipedia.org/wiki/Valencia
Foursquare. (2021). Foursquare. Retrieved from https://foursquare.com
Google. (2021). Google maps. Retrieved from https://www.google.com/maps/place/Valencia,+Espa%C3%B1a/
Espana, D. d. (2021). Datos de gobierno de Espana. Retrieved from http://mapas.valencia.es/lanzadera/opendata/Distritos/JSON?srsName=EPSG%3A4326



## 7. Appendix

In [None]:
df_val.corr()


In [None]:
sns.regplot(x="Population", y="Foreigns", data=df_val)

In [None]:
sns.regplot(x="Women population", y="Tourism ", data=df_val)

In [None]:
sns.set(rc = {'figure.figsize':(10,6)}) # set the figure size before drawing so it takes effect
by_age=df_val.sort_values('Age 20-39', ascending=False)

g4 = sns.lineplot(x='District', y='Age 0-19', data=by_age, color='yellow', label='Age 0-19')
g4 = sns.lineplot(x='District', y='Age 20-39', data=by_age, color='orange', label='Age 20-39')
g4 = sns.lineplot(x='District', y='Age 40-59', data=by_age, color='red', label='Age 40-59')
g4 = sns.lineplot(x='District', y='Age 60 greater', data=by_age, color='brown', label='Age 60 greater')

g4.tick_params(axis='x', rotation=90) # rotate this plot's own tick labels
g4.set_title('Age of inhabitants in Valencian Districts')
g4.set(xlabel='Districts', ylabel='Age')
g4.legend(title='Age group', loc='upper left')

