# **Introduction**
>*This analysis of the Paris arrondissements draws on information pulled from the internet, including Foursquare data, to give travelers, researchers, and other interested parties neighborhood statistics such as density, population, landmarks, and locations. It was initially created to pull information on the 20 arrondissements of Paris; however, data on the four quarters (quartiers) of each arrondissement proved more relevant, offering a more detailed segment for neighborhood exploration at the venue level.*

# **Data Requirements and Descriptions**

### **Install and Import Libraries**

>*Set up the initial environment. It may be necessary to install the modules below with pip before importing them </br>(some installs may not be needed, depending on prior usage and installations)*

>*For this project, I have chosen the following libraries and modules, which will be used throughout the analysis*

In [443]:
import bs4
import os
import csv
import json
import geopy
import folium
import requests
import geocoder
import descartes
import geopandas
import numpy as np
import pandas as pd
import geopandas as gpd
from geopy import geocoders
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
from geopy.geocoders import Bing
from geopy.geocoders import GoogleV3
from geopy.geocoders import Nominatim
from shapely.geometry import Point, Polygon

### 80 Parisian Neighborhoods or *'Quartiers'* within 20 Parisian *'Arrondissements'*.  


>As an initial starting point, we first scrape our list of neighborhoods from data within the website 'https://en.wikipedia.org/wiki/Quarters_of_Paris'.  
>
>This data will be the beginning of our neighborhood dataset.</br>
>
>*Use "requests.get" to grab the html which is then parsed with Pandas and stored in a list.*
>
>Once displayed we see 80 rows of data (4 quarters X 20 arrondissements), each with features detailing: </br>
>**Arrondissement Number** and 'called' **Name, Quarter Number, Quarter Name, Population**, and **Area** in hectares
>

In [444]:
url = 'https://en.wikipedia.org/wiki/Quarters_of_Paris'
data = requests.get(url)
df_list = pd.read_html(data.text)

df = df_list[0]
df.head()

Unnamed: 0,Arrondissement(Districts),Quartiers(Quarters),Quartiers(Quarters).1,Population in1999[3],Area(hectares)[3],Map
0,"1st arrondissement(Called ""du Louvre"")",1st,Saint-Germain-l'Auxerrois,1672,86.9,
1,"1st arrondissement(Called ""du Louvre"")",2nd,Les Halles,8984,41.2,
2,"1st arrondissement(Called ""du Louvre"")",3rd,Palais-Royal,3195,27.4,
3,"1st arrondissement(Called ""du Louvre"")",4th,Place-Vendôme,3044,26.9,
4,"2nd arrondissement(Called ""de la Bourse"")",5th,Gaillon,1345,18.8,


>I did not use BeautifulSoup, but it should produce similar results. 

>*Add the columns "Arrond", "Aname", "Place", "Anum", and "Qnum" to hold the data split out of the first column, "Arrondissement(Districts)", and out of the quarter number*

In [445]:
column_names = ['Arrondissements', 'Quartiers_num', 'Quartier', 'Population', 'Area', 'Map', 'Arrond',  'Aname', 'Place', 'Anum',  'Qnum']
# set_axis returns a new frame; the `inplace` argument no longer exists in pandas 2.x
df = df.set_axis(['Arrondissements', 'Quartiers_num', 'Quartier', 'Population', 'Area', 'Map'], axis=1)
P_df = pd.DataFrame(df, columns=column_names)

In [446]:
P_df['Arrond'] = df['Arrondissements'].str.split('Called').str[0]
P_df['Aname'] = P_df['Arrondissements'].str.split('Called').str[1]
P_df['Aname'] = P_df['Aname'].str.split('"').str[1]
P_df['Arrond'] = P_df['Arrondissements'].str.split('(').str[0]
P_df['Place'] = P_df['Arrondissements'].str.split('"').str[1]
P_df['Anum'] = P_df['Arrond'].str.split("s").str[0]
P_df['Anum'] = P_df['Anum'].str.split('n').str[0]
P_df['Anum'] = P_df['Anum'].str.split('t').str[0]
P_df['Anum'] = P_df['Anum'].str.split('r').str[0]
P_df['Qnum'] = P_df['Quartiers_num'].str.split('s').str[0]
P_df['Qnum'] = P_df['Qnum'].str.split('n').str[0]
P_df['Qnum'] = P_df['Qnum'].str.split('r').str[0]
P_df['Qnum'] = P_df['Qnum'].str.split('t').str[0]
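
>The chained `split` calls above strip the ordinal suffix one letter at a time. As an alternative, here is a minimal sketch that pulls the number and the name out with regular expressions. The function names and the pattern are illustrative assumptions, not part of the original notebook, and the pattern only targets labels shaped like the Wikipedia strings shown above.

```python
import re

# Pattern for labels such as '1st arrondissement(Called "du Louvre")'
ARROND_RE = re.compile(
    r'^\s*(\d+)\s*(?:st|nd|rd|th)?\s+arrondissement\s*\(Called\s+"([^"]+)"\)'
)


def ordinal_to_int(text):
    """Return the leading integer of an ordinal such as '10th', or None."""
    match = re.match(r'\s*(\d+)', text)
    return int(match.group(1)) if match else None


def split_arrondissement(text):
    """Return (number, name) from an arrondissement label, or (None, None)."""
    match = ARROND_RE.match(text)
    if match is None:
        return None, None
    return int(match.group(1)), match.group(2)


print(ordinal_to_int('10th'))
print(split_arrondissement('1st arrondissement(Called "du Louvre")'))
```

>With pandas, the same idea could be applied column-wide through `Series.map(ordinal_to_int)`.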


In [447]:
P_df

Unnamed: 0,Arrondissements,Quartiers_num,Quartier,Population,Area,Map,Arrond,Aname,Place,Anum,Qnum
0,"1st arrondissement(Called ""du Louvre"")",1st,Saint-Germain-l'Auxerrois,1672,86.9,,1st arrondissement,du Louvre,du Louvre,1,1
1,"1st arrondissement(Called ""du Louvre"")",2nd,Les Halles,8984,41.2,,1st arrondissement,du Louvre,du Louvre,1,2
2,"1st arrondissement(Called ""du Louvre"")",3rd,Palais-Royal,3195,27.4,,1st arrondissement,du Louvre,du Louvre,1,3
3,"1st arrondissement(Called ""du Louvre"")",4th,Place-Vendôme,3044,26.9,,1st arrondissement,du Louvre,du Louvre,1,4
4,"2nd arrondissement(Called ""de la Bourse"")",5th,Gaillon,1345,18.8,,2nd arrondissement,de la Bourse,de la Bourse,2,5
5,"2nd arrondissement(Called ""de la Bourse"")",6th,Vivienne,2917,24.4,,2nd arrondissement,de la Bourse,de la Bourse,2,6
6,"2nd arrondissement(Called ""de la Bourse"")",7th,Mail,5783,27.8,,2nd arrondissement,de la Bourse,de la Bourse,2,7
7,"2nd arrondissement(Called ""de la Bourse"")",8th,Bonne-Nouvelle,9595,28.2,,2nd arrondissement,de la Bourse,de la Bourse,2,8
8,"3rd arrondissement(Called ""du Temple"")",9th,Arts-et-Métiers,9560,31.8,,3rd arrondissement,du Temple,du Temple,3,9
9,"3rd arrondissement(Called ""du Temple"")",10th,Enfants-Rouges,8562,27.2,,3rd arrondissement,du Temple,du Temple,3,10


>*We need to generate an address for each quartier in order to pull its geo-coordinates into our data. Those coordinates are then used to query Foursquare, giving end users the venue-specific, street-level information they need*

#### **Generate the Postal Codes for each Arrondissement**
Each arrondissement within Paris is given a five-digit code in which the digits 751 precede the two-digit number of the arrondissement; for example, the 10th arrondissement (Canal St. Martin) gets 75110. Strictly speaking, this 751NN form is the INSEE commune code, whereas the postal codes used on mail take the form 750NN (75010 for the 10th).

>*The codes used here follow the simple formula: Postcode = {75100 + "arrondissement number"}*

In [448]:
post = pd.DataFrame(columns=['Post', 'Quartiers', 'Postcode'])
post['Qnum'] = df.index.astype(float) + df.index 
post['Post'] = 75100 
post['Postcode'] = post['Post'].astype(float) + P_df['Anum'].astype(float)
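
>The formula above can be sketched as a small standalone function. This is a minimal illustration under the notebook's own rule (75100 plus the arrondissement number); the function name and the range check are assumptions added for the example.

```python
def arrondissement_code(number):
    """Return 75100 + arrondissement number as a 5-character string."""
    if not 1 <= number <= 20:
        raise ValueError("Paris has arrondissements 1 to 20")
    return str(75100 + number)


codes = [arrondissement_code(n) for n in (1, 2, 10, 20)]
print(codes)
```

>Returning a string keeps the code from being shown as a float such as `75101.0`, which is why the notebook later slices the first five characters.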

>The data frame on the individual Quartiers is created with the following elements:
    
    Qrts (Quartiers Number)
    Quartier
    Pop (Population)
    Density (Population / Area)
    Latitude
    Longitude
    Postcode
    Arrnum (Arrondissement Number)
    Arrondissement
    City
    Country
    
    

In [449]:
arronds = pd.DataFrame(P_df, columns=['Quartier', 'Qrts', 'Anum', 'Arrondissement', 'City', 'Country', 'Postcode', 'Pop', 'Density', 'Latitude', 'Longitude'])
arronds['Arrondissement'] = P_df['Arrondissements']
arronds['Quartier'] = P_df['Quartier']
arronds['Qrts'] = P_df['Quartiers_num']
arronds['City'] = 'Paris'
arronds['Country'] = 'FR'
arronds['Pop'] = P_df['Population']
# reset_index returns a new frame, so the result has to be assigned back
arronds = arronds.reset_index(drop=True)
arronds.head()

Unnamed: 0,Quartier,Qrts,Anum,Arrondissement,City,Country,Postcode,Pop,Density,Latitude,Longitude
0,Saint-Germain-l'Auxerrois,1st,1,"1st arrondissement(Called ""du Louvre"")",Paris,FR,,1672,,,
1,Les Halles,2nd,1,"1st arrondissement(Called ""du Louvre"")",Paris,FR,,8984,,,
2,Palais-Royal,3rd,1,"1st arrondissement(Called ""du Louvre"")",Paris,FR,,3195,,,
3,Place-Vendôme,4th,1,"1st arrondissement(Called ""du Louvre"")",Paris,FR,,3044,,,
4,Gaillon,5th,2,"2nd arrondissement(Called ""de la Bourse"")",Paris,FR,,1345,,,


In [450]:
arronds['Density']  = arronds['Pop'].astype(float) / P_df['Area'].astype(float)
arronds['Postcode'] = post['Postcode'].astype(str)
arronds['Postcode'] = arronds['Postcode'].str[:5]
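
>Density here is simply population divided by area in hectares. The sketch below works through two rows of the table above (Saint-Germain-l'Auxerrois and Les Halles); the function names are illustrative, and the per-km² variant is an added convenience that relies only on 1 km² being 100 hectares.

```python
def density_per_hectare(population, area_hectares):
    """Residents per hectare."""
    return population / area_hectares


def density_per_km2(population, area_hectares):
    """Residents per square kilometre (1 km^2 = 100 hectares)."""
    return population / (area_hectares / 100)


# Population and area values taken from the quartier table above
print(round(density_per_hectare(1672, 86.9), 6))
print(round(density_per_hectare(8984, 41.2), 6))
print(round(density_per_km2(1672, 86.9), 1))
```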

>*addy DataFrame is created to house the address and location information*

In [451]:
column_names = ['Qrts', 'Qnum', 'Quartier', 'addressline1', 'addressline', 'town', 'IsoCode', 'Lat', 'Long', 'Error', 'formatted_address', 'location_type']
df_addy = pd.DataFrame(arronds.Quartier,  columns=column_names)
df_addy['Quartier'] = arronds['Quartier'].map(str)
df_addy['Qrts'] = arronds['Qrts']
df_addy['addressline1'] = arronds['Arrondissement'].map(str)
df_addy['town'] = arronds['City'].map(str) 
df_addy['state'] = arronds['Country'].map(str)
df_addy['IsoCode'] = arronds['Postcode']
def removeNonAscii(addy):
    # Helper (not used below): keep printable ASCII characters only
    return "".join(i for i in addy if 31 < ord(i) < 126)
# Keep only the text before "(", e.g. "1st arrondissement"
df_addy['addressline'] = df_addy['addressline1'].str.split('(').str[0]
df_addy['Add'] = df_addy['Quartier'] + ', ' + df_addy['Qrts'] + ' Quartier' + ',  ' +  df_addy['addressline'] + ', ' + ' Paris, FR  ' + df_addy['IsoCode']
df_addy.to_csv('addresses.csv')

In [452]:
df_addy['coordinates'] = df_addy['Quartier'] + ' Paris, FR'

In [453]:
add = df_addy['coordinates']
addy = pd.DataFrame(df_addy, columns = ['Qnum', 'Qrts', 'Quartier', 'addressline', 'IsoCode', 'Add', 'coordinates'])
addy['Qnum'] = P_df['Qnum']
addy['Qrts'] = df_addy['Qrts']
addy['Quartier'] = df_addy['Quartier']
addy['coordinates'] = add

>*Use the geocoder to pull the longitude and latitude*

In [454]:
# Geocode each unique "<Quartier> Paris, FR" string
location = [x for x in addy['coordinates'].unique().tolist()
            if isinstance(x, str)]

latitude = []
longitude = []
qrts_list = []

# Create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent="paris_explorer")

for i, address in enumerate(location):
    qrts = addy.Qrts[i]
    qrts_list.append(qrts)
    try:
        loc = geolocator.geocode(address)
        latitude.append(loc.latitude)
        longitude.append(loc.longitude)
        print('Geo Coordinates: {}, {}, {}.'.format(loc.latitude, loc.longitude, qrts))
    except Exception:
        # No match (loc is None) or a service error: keep the row, mark it missing
        latitude.append(np.nan)
        longitude.append(np.nan)

df_ = pd.DataFrame({'location': location,
                    'location_qrts': qrts_list,
                    'location_latitude': latitude,
                    'location_longitude': longitude,
                    })

Geo Coordinates: 48.860211199999995, 2.3362988847682233, 1st.
Geo Coordinates: 48.8621801, 2.3458118, 2nd.
Geo Coordinates: 48.863584700000004, 2.3362042200938715, 3rd.
Geo Coordinates: 48.867463400000005, 2.329428116825194, 4th.
Geo Coordinates: 48.869135150000005, 2.332908770335507, 5th.
Geo Coordinates: 48.86885895, 2.3393625582679, 6th.
Geo Coordinates: 48.8680539, 2.344592949731121, 7th.
Geo Coordinates: 48.8706233, 2.3487498, 8th.
Geo Coordinates: 48.8654414, 2.3561316, 9th.
Geo Coordinates: 48.864240949999996, 2.3625854822185506, 10th.
Geo Coordinates: 48.859571349999996, 2.3625762007242033, 11th.
Geo Coordinates: 48.862699750000004, 2.354135471358302, 12th.
Geo Coordinates: 48.85845555, 2.3517023379560156, 13th.
Geo Coordinates: 48.8555813, 2.3583593578227955, 14th.
Geo Coordinates: 48.85157155, 2.364795174126021, 15th.
Geo Coordinates: 48.85293705, 2.3500501225000026, 16th.
Geo Coordinates: 48.84792605, 2.355269043333334, 17th.
Geo Coordinates: 48.8432224, 2.3595089570948424, 
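
>Public geocoding services generally expect requests to be spaced out (the Nominatim usage policy asks for at most one request per second). The sketch below shows one way to pause between lookups and to record failures without stopping the loop. It is a minimal illustration: `geocode_all` and `fake_geocode` are hypothetical names, and the geocoder is passed in as a plain callable so the example runs without network access.

```python
import time
from types import SimpleNamespace


def geocode_all(addresses, geocode, delay_seconds=1.0, sleep=time.sleep):
    """Geocode each address with `geocode`, pausing between calls.

    `geocode` is any callable that returns an object with .latitude and
    .longitude, or None when nothing is found.
    """
    results = []
    for i, address in enumerate(addresses):
        if i:
            sleep(delay_seconds)  # space out consecutive requests
        try:
            loc = geocode(address)
        except Exception:
            loc = None  # treat service errors like "not found"
        if loc is None:
            results.append((address, None, None))
        else:
            results.append((address, loc.latitude, loc.longitude))
    return results


# Stand-in geocoder for the example; the coordinates are the ones printed
# above for Les Halles
KNOWN = {"Les Halles Paris, FR": (48.8621801, 2.3458118)}


def fake_geocode(address):
    if address not in KNOWN:
        return None
    lat, lon = KNOWN[address]
    return SimpleNamespace(latitude=lat, longitude=lon)


rows = geocode_all(["Les Halles Paris, FR", "Nowhere Paris, FR"],
                   fake_geocode, delay_seconds=0)
print(rows)
```

>In the notebook itself, the callable would be the `geocode` method of the `Nominatim` object, for example `geocode_all(location, geolocator.geocode)`.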

# **Methodology** 

>Venue Availability and Popularity
>>*Utilizing the Foursquare API to pull data on establishments in various Paris neighborhoods, comparing the neighborhoods both to each other and to their population to show the availability and popularity of types of establishments in certain areas.*


In [455]:
addy['Lat'] = df_['location_latitude']
addy['Lon'] = df_['location_longitude']

>*The data is pulled together into the DataFrame "parcoord" for a more concise output*

In [456]:
parcoord = pd.DataFrame(arronds[['Quartier','Latitude', 'Longitude', 'Postcode']])
parcoord['Qrts'] = addy['Qrts']
parcoord['c_qu'] = addy['Qnum']
parcoord['num_arrond'] = arronds['Anum'].astype(str) +  'e'
parcoord['Quartier'] = arronds['Quartier']
parcoord['Postcode'] = addy['IsoCode']
parcoord['Address'] = addy['Add']
parcoord['Latitude'] = addy['Lat']
parcoord['Longitude'] = addy['Lon']
parcoord['Landmark'] = addy['Quartier']
parcoord['Density'] = arronds['Density']
parcoord['Population'] = arronds['Pop']
parcoord.head()                               

Unnamed: 0,Quartier,Latitude,Longitude,Postcode,Qrts,c_qu,num_arrond,Address,Landmark,Density,Population
0,Saint-Germain-l'Auxerrois,48.860211,2.336299,75101,1st,1,1e,"Saint-Germain-l'Auxerrois, 1st Quartier, 1st ...",Saint-Germain-l'Auxerrois,19.240506,1672
1,Les Halles,48.86218,2.345812,75101,2nd,2,1e,"Les Halles, 2nd Quartier, 1st arrondissement,...",Les Halles,218.058252,8984
2,Palais-Royal,48.863585,2.336204,75101,3rd,3,1e,"Palais-Royal, 3rd Quartier, 1st arrondissemen...",Palais-Royal,116.605839,3195
3,Place-Vendôme,48.867463,2.329428,75101,4th,4,1e,"Place-Vendôme, 4th Quartier, 1st arrondisseme...",Place-Vendôme,113.159851,3044
4,Gaillon,48.869135,2.332909,75102,5th,5,2e,"Gaillon, 5th Quartier, 2nd arrondissement, P...",Gaillon,71.542553,1345


>*To establish the Paris area being analyzed and plot the boundaries of our density areas, a map shapefile is downloaded to our repository and opened. An appropriate file showing the administrative zones of the arrondissements and quartiers of Paris was found through the geo-file libraries at Stanford University. The Foursquare venue data pulled later is based on our coordinates.*

In [457]:
import geopandas as gpd
fname = 'quartier_paris.geojson'
geo_data = gpd.read_file(fname)
geo_data.head(3)

Unnamed: 0,n_sq_qu,n_sq_ar,c_qu,surface,l_qu,perimetre,c_quinsee,c_ar,geometry
0,750000026,750000007,26,1073734.0,Invalides,4434.656489,7510702,7,"POLYGON ((2.31901 48.85174, 2.31903 48.85170, ..."
1,750000035,750000009,35,417335.1,Faubourg-Montmartre,2786.541926,7510903,9,"POLYGON ((2.34026 48.87660, 2.34228 48.87651, ..."
2,750000005,750000002,5,188012.2,Gaillon,1866.982041,7510201,2,"POLYGON ((2.33632 48.86797, 2.33587 48.86700, ..."


>*Only the "c_qu" and "geometry" columns are needed from our json shapefile to create the base and boundaries of the map*

In [458]:
geo_data = geo_data.sort_values(by=['c_qu'])
geo_data = geo_data[['c_qu', 'geometry']]
geo_data.head(3)

Unnamed: 0,c_qu,geometry
7,1,"POLYGON ((2.34459 48.85405, 2.34459 48.85405, ..."
30,2,"POLYGON ((2.34937 48.86058, 2.34822 48.85852, ..."
23,3,"POLYGON ((2.33947 48.86214, 2.33912 48.86148, ..."


>*From the parcoord DataFrame built earlier, we can merge information into our map, including the already calculated Density (population per hectare) figure and the Population*

In [459]:
parcoord.head(1)

Unnamed: 0,Quartier,Latitude,Longitude,Postcode,Qrts,c_qu,num_arrond,Address,Landmark,Density,Population
0,Saint-Germain-l'Auxerrois,48.860211,2.336299,75101,1st,1,1e,"Saint-Germain-l'Auxerrois, 1st Quartier, 1st ...",Saint-Germain-l'Auxerrois,19.240506,1672


In [460]:
column_names = ['c_qu', 'num_arrond', 'Quartier', 'Density', 'Population', 'Longitude', 'Latitude']
dens = pd.DataFrame(parcoord, columns = column_names)                 
dens.head(3)

Unnamed: 0,c_qu,num_arrond,Quartier,Density,Population,Longitude,Latitude
0,1,1e,Saint-Germain-l'Auxerrois,19.240506,1672,2.336299,48.860211
1,2,1e,Les Halles,218.058252,8984,2.345812,48.86218
2,3,1e,Palais-Royal,116.605839,3195,2.336204,48.863585


>*All map plotting data is pulled together in "paris_plotting"*

In [461]:
geo_data['c_qu'] = geo_data.c_qu.astype(int)
dens['c_qu'] = dens.c_qu.astype(int)
paris_plotting = pd.DataFrame(geo_data)
paris_plotting = geo_data.merge(dens, on = 'c_qu')
paris_plotting.head(2)

Unnamed: 0,c_qu,geometry,num_arrond,Quartier,Density,Population,Longitude,Latitude
0,1,"POLYGON ((2.34459 48.85405, 2.34459 48.85405, ...",1e,Saint-Germain-l'Auxerrois,19.240506,1672,2.336299,48.860211
1,2,"POLYGON ((2.34937 48.86058, 2.34822 48.85852, ...",1e,Les Halles,218.058252,8984,2.345812,48.86218
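
>A merge on `c_qu` silently drops rows whose key appears on only one side. The sketch below, on toy data, shows how the `validate` and `indicator` options of `DataFrame.merge` can surface duplicate or unmatched keys before plotting. The frames and values are invented for illustration and are not the notebook's data.

```python
import pandas as pd

# Toy stand-ins for the boundary table and the statistics table
shapes = pd.DataFrame({'c_qu': [1, 2, 3],
                       'geometry': ['poly-1', 'poly-2', 'poly-3']})
stats = pd.DataFrame({'c_qu': [1, 2, 4],
                      'Density': [10.0, 20.0, 40.0]})

# validate raises if either side repeats a key;
# indicator adds a _merge column saying where each row came from
merged = shapes.merge(stats, on='c_qu', how='outer',
                      validate='one_to_one', indicator=True)

unmatched = merged[merged['_merge'] != 'both']
print(unmatched[['c_qu', '_merge']])
```

>An empty `unmatched` frame would indicate that every quartier in the boundary file has statistics and vice versa.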


Normalize the density data to values between 0 and 1 

>*Density and Population data are min-max normalized: each value has the column minimum subtracted and is then divided by the column range*

In [462]:
dens['Dens_norm'] = (dens['Density'] - min(dens.Density)) / (max(dens.Density)-min(dens.Density))
paris_plotting['Dens_norm'] = dens['Dens_norm']

dens['Pop_norm'] = (dens['Population'] - min(dens['Population'])) / (max(dens['Population'])-min(dens['Population']))
paris_plotting['Pop_norm'] = dens['Pop_norm']

dens = dens.sort_values(by=['Dens_norm'])
dens['c_qu'] = dens['c_qu'].astype(int)  
dens.head(3)

Unnamed: 0,c_qu,num_arrond,Quartier,Density,Population,Longitude,Latitude,Dens_norm,Pop_norm
52,53,14e,Montparnasse,16.492007,18570,2.328299,48.843475,0.0,0.213479
0,1,1e,Saint-Germain-l'Auxerrois,19.240506,1672,2.336299,48.860211,0.006274,0.004053
28,29,8e,Champs-Élysées,40.438212,4614,2.305331,48.870757,0.054662,0.040515
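
>The normalization used above is min-max scaling, `x' = (x - min) / (max - min)`, which maps the smallest value to 0 and the largest to 1. Here is a minimal sketch in plain Python using three density values that appear in the tables above (the minimum, the next value, and the maximum); the function name is an illustrative assumption.

```python
def min_max(values):
    """Scale values linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Montparnasse, Saint-Germain-l'Auxerrois, Folie-Méricourt
densities = [16.492007, 19.240506, 454.573003]
scaled = min_max(densities)
print([round(s, 6) for s in scaled])
```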


>*'paris_plot' is created to combine the population information and the geospatial data*

In [463]:
paris_plot = pd.DataFrame(paris_plotting, columns = ['c_qu', 'geometry', 'Quartier', 'Density', 'Dens_norm', 'Pop_norm', 'Population'])
paris_plot = paris_plot.sort_values(by=['Dens_norm'])
paris_plot.tail(3)

Unnamed: 0,c_qu,geometry,Quartier,Density,Dens_norm,Pop_norm,Population
76,77,"POLYGON ((2.38323 48.86710, 2.38314 48.86708, ...",Belleville,443.283767,0.97423,0.426686,35773
35,36,"POLYGON ((2.34971 48.88222, 2.34983 48.88109, ...",Rochechouart,443.353293,0.974389,0.258617,22212
40,41,"POLYGON ((2.37010 48.86376, 2.36690 48.86246, ...",Folie-Méricourt,454.573003,1.0,0.392343,33002


In [464]:
paris_plot.to_csv('paris_plot.csv')

>*Creating a colormap in an attempt to show density levels on a map*

>This will be used to give us a range of color codes, one for each quartier's normalized density

In [465]:
from branca.colormap import linear

colormap = linear.GnBu_09.scale(
    paris_plot.Dens_norm.min(),
    paris_plot.Dens_norm.max())

print(colormap(1.0))

colormap

#084081ff


In [466]:
paris_plotcolor = paris_plot.set_index('c_qu')['Dens_norm']
paris_plotcolor[8]
color_dict = {key: colormap(paris_plotcolor[key]) for key in paris_plotcolor.keys()}
colorscale = pd.DataFrame.from_dict(color_dict, orient='index', columns=['colorscale'])
ppc = paris_plot.merge(colorscale, left_on=None, right_on=None, left_index=True,  right_index=True)
ppc = ppc.sort_values(by=['Dens_norm'])
ppc.head(1)

Unnamed: 0,c_qu,geometry,Quartier,Density,Dens_norm,Pop_norm,Population,colorscale
52,53,"POLYGON ((2.34159 48.83481, 2.34127 48.83312, ...",Montparnasse,16.492007,0.0,0.213479,18570,#55b7d1ff


>*Generating the map center point, this will be used for the basemap below (pmap)*

In [467]:
x_map=paris_plotting.Latitude[1]
y_map=paris_plotting.Longitude[2]
print(x_map,y_map)

48.8621801 2.3362042200938715


In [468]:
pmap = folium.Map(location=[x_map, y_map], zoom_start=12,tiles='OpenStreetMap')
pmap

>*the geo_data file is pulled from a downloaded geojson file containing the boundaries for the Paris quartiers*

In [570]:
geo_data=geopandas.read_file('quartier_paris.geojson')
with open('quartier_paris.geojson') as f:
    paris_dict = json.load(f)
    
geo_data=gpd.GeoDataFrame(geo_data)
geo_data['Population'] = round(ppc['Population'],2)
geo_data['Density'] = round(ppc['Density'],2)
geo_data = gpd.GeoDataFrame(geo_data , geometry = geo_data.geometry)
# The {'init': ...} form is deprecated; set the CRS explicitly instead
geo_data = geo_data.set_crs('EPSG:4326', allow_override=True)
geo_data = geo_data.sort_values(by=['Density'])
geo_data['perimetre'] = round(geo_data['perimetre'], 0)



In [571]:
geo_data

Unnamed: 0,n_sq_qu,n_sq_ar,c_qu,surface,l_qu,perimetre,c_quinsee,c_ar,geometry,Population,Density
52,750000053,750000014,53,1.126205e+06,Montparnasse,4565.0,7511401,14,"POLYGON ((2.34159 48.83481, 2.34127 48.83312, ...",18570.0,16.49
28,750000036,750000009,36,5.004354e+05,Rochechouart,2862.0,7510904,9,"POLYGON ((2.34971 48.88222, 2.34983 48.88109, ...",4614.0,40.44
25,750000044,750000011,44,9.296092e+05,Sainte-Marguerite,4591.0,7511104,11,"POLYGON ((2.39624 48.85415, 2.39708 48.85308, ...",6276.0,58.44
33,750000030,750000008,30,7.965891e+05,Faubourg-du-Roule,3774.0,7510802,8,"POLYGON ((2.31197 48.86993, 2.31011 48.86898, ...",3488.0,64.24
4,750000014,750000004,14,4.220282e+05,Saint-Gervais,2678.0,7510402,4,"POLYGON ((2.36376 48.85568, 2.36294 48.85456, ...",1345.0,71.54
46,750000006,750000002,6,2.435508e+05,Vivienne,2058.0,7510202,2,"POLYGON ((2.34123 48.86580, 2.34118 48.86575, ...",13987.0,73.50
30,750000002,750000001,2,4.124585e+05,Halles,2606.0,7510102,1,"POLYGON ((2.34937 48.86058, 2.34822 48.85852, ...",6045.0,79.43
73,750000052,750000013,52,6.920677e+05,Croulebarbe,3289.0,7511304,13,"POLYGON ((2.35166 48.83678, 2.35176 48.83678, ...",24584.0,103.42
15,750000063,750000016,63,3.086718e+06,Porte-Dauphine,7447.0,7511603,16,"POLYGON ((2.27098 48.87877, 2.27749 48.87796, ...",4087.0,107.84
3,750000075,750000019,75,1.835720e+06,Amérique,6399.0,7511903,19,"POLYGON ((2.40940 48.88019, 2.40995 48.87952, ...",3044.0,113.16
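
>When a column is copied from one DataFrame to another, pandas pairs the rows by index label, not by quartier. If the two frames are ordered differently, values can land on the wrong row. The sketch below, on toy data, contrasts index-based assignment with a lookup through the shared `c_qu` key; all names and numbers in it are invented for illustration.

```python
import pandas as pd

# Boundaries in file order; statistics in c_qu order (both indexed 0..2)
shapes = pd.DataFrame({'c_qu': [3, 1, 2], 'name': ['C', 'A', 'B']})
stats = pd.DataFrame({'c_qu': [1, 2, 3], 'Population': [100, 200, 300]})

# Assignment aligns on the row index, so rows are paired by position here
by_index = shapes.assign(Population=stats['Population'])

# Mapping through the shared key pairs each shape with its own statistics
lookup = stats.set_index('c_qu')['Population']
by_key = shapes.assign(Population=shapes['c_qu'].map(lookup))

print(by_index)
print(by_key)
```

>In `by_index`, quartier C receives the population of quartier A; in `by_key`, each row receives its own value.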


In [572]:
b1=round(paris_plot.Pop_norm.max(),2)/4
b2=round(paris_plot.Population.max(),2)/2
b3=b1*3
b4=round(paris_plot.Population.max(),2)

>*Code for the paris map, with highlighting and a colorscale with legend*

In [573]:
bins = list(ppc['Dens_norm'].quantile([0,0.2,0.5,0.8,1]))

paris_map = folium.Map(location=[x_map, y_map], zoom_start=12)

tiles = ['stamenwatercolor', 'cartodbpositron', 'openstreetmap', 'stamenterrain']

for tile in tiles:
    folium.TileLayer(tile).add_to(paris_map)

choropleth = folium.Choropleth(      
    geo_data=geo_data,
    data=ppc,
    columns=['c_qu', 'Dens_norm'],
    key_on= 'feature.properties.c_qu',
    style_function = lambda colorscale: {
        'color': ppc.colorscale,
        'linecolor': 'black',
        'weight': 1,
        'fillOpacity': 0.1,
        'font-size': '10px',
        'font-weight': 'bold'
        },
    labels='l_qu',
    legend_name='Relative Population',
    bins=bins,
    highlight=True,
    reset=True
).add_to(paris_map)
    
folium.LayerControl().add_to(paris_map)

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(
        fields=['l_qu', 'c_qu', 'c_ar', 'Population', 'perimetre'],
        aliases=['Name', 'Qtr #', 'Arr #', 'Population', 'Perimeter']
        ))                                                                                                   
paris_map

The map above shows each quartier, shaded according to the colorscale created from the normalized density data. Total population and perimeter distance are displayed on hover. Darker areas indicate more crowded quartiers, even though the population may be higher in lighter quartiers that cover a larger area.

In [574]:
# Read the Foursquare credentials from environment variables rather than
# hard-coding them (the variable names below are placeholders)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180604'

LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: (redacted)


In [575]:
radius = 1000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(paris_plotting['Latitude'], paris_plotting['Longitude'], paris_plotting['Quartier']): 
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()['response']['groups'][0]['items']
    
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
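
>The loop above assumes that every response contains a first group and that every venue has at least one category. The sketch below separates the parsing step into a function that tolerates missing pieces. It relies only on the response shape already used in the cell above (`response` → `groups[0]` → `items` → `venue`); the function name and the sample payload are illustrative assumptions, with the first venue's values taken from the output table below.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into venue tuples."""
    rows = []
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Venues without a category get None instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written sample in the same shape as the API response
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Musée du Louvre',
               'location': {'lat': 48.860847, 'lng': 2.33644},
               'categories': [{'name': 'Art Museum'}]}},
    {'venue': {'name': 'Uncategorized example',
               'location': {'lat': 48.86, 'lng': 2.33},
               'categories': []}},
]}]}}

rows = extract_venues(sample, "Saint-Germain-l'Auxerrois", 48.860211, 2.336299)
for row in rows:
    print(row)
```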

In [576]:
venues_df = pd.DataFrame(venues)

venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
df_na = venues_df['Neighborhood'] != 'Na'
venues_df = venues_df[df_na]
print(venues_df.shape)
venues_df.head()

(7801, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Saint-Germain-l'Auxerrois,48.860211,2.336299,Musée du Louvre,48.860847,2.33644,Art Museum
1,Saint-Germain-l'Auxerrois,48.860211,2.336299,Cour Carrée du Louvre,48.86036,2.338543,Pedestrian Plaza
2,Saint-Germain-l'Auxerrois,48.860211,2.336299,La Vénus de Milo (Vénus de Milo),48.859943,2.337234,Exhibit
3,Saint-Germain-l'Auxerrois,48.860211,2.336299,Pont des Arts,48.858565,2.337635,Bridge
4,Saint-Germain-l'Auxerrois,48.860211,2.336299,Cour Napoléon,48.861172,2.335088,Plaza


Tables below display data on Venue Category 

In [577]:
venues_df.tail()

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
7796,Charonne,48.854744,2.385356,Ragazzi,48.850873,2.392455,Italian Restaurant
7797,Charonne,48.854744,2.385356,Miss Lunch,48.850258,2.376652,Restaurant
7798,Charonne,48.854744,2.385356,Café de la Danse,48.854095,2.373044,Music Venue
7799,Charonne,48.854744,2.385356,Mademoiselle Jeanne,48.855926,2.375477,Women's Store
7800,Charonne,48.854744,2.385356,Ethiopia,48.860833,2.38,Ethiopian Restaurant


In [578]:
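# Average only the numeric columns (required by pandas 2.x when text columns are present)
venues_df.groupby(['Neighborhood']).mean(numeric_only=True).tail()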


Unnamed: 0_level_0,Latitude,Longitude,VenueLatitude,VenueLongitude
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Sorbonne,48.849123,2.345325,48.849413,2.345893
Val-de-Grâce,48.842213,2.343882,48.843554,2.346202
Vivienne,48.868859,2.339363,48.868356,2.339159
École-Militaire,48.851848,2.304756,48.853709,2.304489
Épinettes,48.893751,2.319856,48.889612,2.32036


Count of Venues in each Quartier 

In [579]:
venues_df.groupby(['Neighborhood', 'VenueCategory']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude
Neighborhood,VenueCategory,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Amérique,Art Gallery,1,1,1,1,1
Amérique,Art Museum,1,1,1,1,1
Amérique,Arts & Entertainment,1,1,1,1,1
Amérique,Asian Restaurant,1,1,1,1,1
Amérique,Bakery,4,4,4,4,4
Amérique,Bar,6,6,6,6,6
Amérique,Bed & Breakfast,1,1,1,1,1
Amérique,Beer Garden,1,1,1,1,1
Amérique,Bistro,1,1,1,1,1
Amérique,Burger Joint,1,1,1,1,1


In [580]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 295 unique categories.


In [581]:
#venues_df['VenueCategory'].unique()[:]

In [582]:
venues_df['Population'] = paris_plotting['Population']
venues_df

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory,Population
0,Saint-Germain-l'Auxerrois,48.860211,2.336299,Musée du Louvre,48.860847,2.336440,Art Museum,1672.0
1,Saint-Germain-l'Auxerrois,48.860211,2.336299,Cour Carrée du Louvre,48.860360,2.338543,Pedestrian Plaza,8984.0
2,Saint-Germain-l'Auxerrois,48.860211,2.336299,La Vénus de Milo (Vénus de Milo),48.859943,2.337234,Exhibit,3195.0
3,Saint-Germain-l'Auxerrois,48.860211,2.336299,Pont des Arts,48.858565,2.337635,Bridge,3044.0
4,Saint-Germain-l'Auxerrois,48.860211,2.336299,Cour Napoléon,48.861172,2.335088,Plaza,1345.0
5,Saint-Germain-l'Auxerrois,48.860211,2.336299,Vestige de la Forteresse du Louvre,48.861577,2.333508,Historic Site,2917.0
6,Saint-Germain-l'Auxerrois,48.860211,2.336299,Place du Palais Royal,48.862523,2.336688,Plaza,5783.0
7,Saint-Germain-l'Auxerrois,48.860211,2.336299,Place du Louvre,48.859841,2.340822,Plaza,9595.0
8,Saint-Germain-l'Auxerrois,48.860211,2.336299,Palais Royal,48.863236,2.337127,Historic Site,9560.0
9,Saint-Germain-l'Auxerrois,48.860211,2.336299,Comédie-Française,48.863088,2.336612,Theater,8562.0
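
>To compare venue availability with population, the population has to be attached per quartier rather than per venue row. Below is a minimal sketch that counts venues per quartier and expresses them per 1,000 residents. The function name is an assumption, the two populations come from the quartier table earlier in the notebook, and the venue rows are a toy sample rather than the Foursquare results.

```python
from collections import Counter


def venues_per_thousand(venue_rows, population_by_quartier):
    """Venues per 1,000 residents for each quartier.

    venue_rows: iterable of (quartier, category) pairs.
    population_by_quartier: dict mapping quartier name to population.
    """
    counts = Counter(quartier for quartier, _category in venue_rows)
    return {
        name: 1000 * counts[name] / pop
        for name, pop in population_by_quartier.items()
        if pop > 0
    }


population = {"Saint-Germain-l'Auxerrois": 1672, "Gaillon": 1345}
# Toy sample of (quartier, category) pairs
rows = [
    ("Saint-Germain-l'Auxerrois", "Art Museum"),
    ("Saint-Germain-l'Auxerrois", "Plaza"),
    ("Saint-Germain-l'Auxerrois", "Bridge"),
    ("Gaillon", "Bakery"),
]

rates = venues_per_thousand(rows, population)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} venues per 1,000 residents")
```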


In [583]:
label_list = ['Neighborhood', 'Latitude', 'Longitude', 'Population', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
label_list = pd.DataFrame(label_list)
# Sum only the numeric columns; text columns cannot be meaningfully added
paris_hot = venues_df.groupby(venues_df['VenueCategory']).sum(numeric_only=True)
#parislist = parislist.reset_index(label_list)
print(paris_hot.shape)
paris_hot[1:100]

(295, 5)


Unnamed: 0_level_0,Latitude,Longitude,VenueLatitude,VenueLongitude,Population
VenueCategory,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Afghan Restaurant,97.718331,4.756002,97.724653,4.759999,0.0
African Restaurant,977.489479,47.228610,977.499224,47.198664,0.0
Alsatian Restaurant,293.157295,14.116191,293.149650,14.102284,0.0
American Restaurant,781.931240,37.466109,781.924194,37.448262,0.0
Arepa Restaurant,97.786184,4.679202,97.781968,4.679183,0.0
Argentinian Restaurant,488.680061,23.336139,488.671213,23.335432,0.0
Art Gallery,2834.038378,136.280517,2834.028182,136.283908,64797.0
Art Museum,3322.538343,158.585048,3322.517366,158.562485,47178.0
Arts & Crafts Store,244.202696,11.813672,244.200374,11.815173,0.0
Arts & Entertainment,48.882424,2.394025,48.879041,2.389791,0.0


>*Find the number of cafes, creperies, and Chinese restaurants and compare them to the population density of each quartier*

In [584]:
# Number of venues categorized as creperies
(venues_df['VenueCategory'] == 'Creperie').sum()

In [None]:
# Number of venues categorized as coffee shops
(venues_df['VenueCategory'] == 'Coffee Shop').sum()

In [None]:
# Number of venues categorized as Chinese restaurants
(venues_df['VenueCategory'] == 'Chinese Restaurant').sum()

In [None]:
# Venue counts per quartier for the three categories of interest
paris_crepes = (pd.crosstab(venues_df['Neighborhood'], venues_df['VenueCategory'])
                .reindex(columns=['Creperie', 'Coffee Shop', 'Chinese Restaurant'], fill_value=0)
                .reset_index()
                .rename(columns={'Neighborhood': 'Quartier'}))

In [None]:
paris_crepes_group = paris_crepes.groupby(['Quartier']).sum().reset_index()
paris_crepes_group.head()

# **Results**
>*From the data we can extract and compare the availability of various establishments. Paris has a wide variety of establishments, usually clustered in specific neighborhoods. In addition, each neighborhood seems to have at least one of each establishment.*

In [None]:
# Join the per-quartier venue counts with density on the quartier name,
# so that rows are matched by name rather than by position
par_d = paris_crepes_group.merge(parcoord[['Quartier', 'Density']], on='Quartier', how='inner')
par_d = par_d.rename(columns={'Creperie': 'Creperies',
                              'Coffee Shop': 'Cafes',
                              'Chinese Restaurant': 'Chinese',
                              'Density': 'Density (pop per hectare)'})
par_d['Density (pop per hectare)'] = par_d['Density (pop per hectare)'].astype(int)
par_d

In [None]:
paris_hot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# venues_df stores the quartier name in its 'Neighborhood' column
paris_hot['Quartier'] = venues_df['Neighborhood']

fixed_columns = [paris_hot.columns[-1]] + list(paris_hot.columns[:-1])
paris_hot = paris_hot[fixed_columns]

print(paris_hot.shape)

paris_hot.head()
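
>The grid cells that follow call `get_geojson_grid`, a helper whose definition does not appear in this notebook. The sketch below is one possible implementation, written as an assumption about what that helper does: it splits a bounding box into `n` by `n` rectangles and returns each as a GeoJSON dictionary. In this sketch the input corners are `[lat, lon]`, while each box stores its own corners under `"properties"` as `[lon, lat]` (GeoJSON order).

```python
def get_geojson_grid(upper_right, lower_left, n=6):
    """Split a bounding box into n x n GeoJSON boxes (hypothetical helper).

    upper_right, lower_left: [lat, lon] corners of the whole area.
    Each returned box keeps its own corners as [lon, lat] in "properties".
    """
    lat_stride = (upper_right[0] - lower_left[0]) / n
    lon_stride = (upper_right[1] - lower_left[1]) / n
    boxes = []
    for i in range(n):
        for j in range(n):
            lat = lower_left[0] + i * lat_stride
            lon = lower_left[1] + j * lon_stride
            ll = [lon, lat]
            ur = [lon + lon_stride, lat + lat_stride]
            # Closed ring: upper-left, upper-right, lower-right, lower-left
            ring = [[ll[0], ur[1]], ur, [ur[0], ll[1]], ll, [ll[0], ur[1]]]
            boxes.append({
                "type": "FeatureCollection",
                "properties": {"lower_left": ll, "upper_right": ur},
                "features": [{
                    "type": "Feature",
                    "properties": {},
                    "geometry": {"type": "Polygon", "coordinates": [ring]},
                }],
            })
    return boxes


# Example bounding box roughly covering central Paris (illustrative values)
grid = get_geojson_grid([48.90, 2.42], [48.82, 2.26], n=4)
print(len(grid), grid[0]["properties"])
```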

In [None]:
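import matplotlib as mpl

# Bounding box around the geocoded quartier centres ([lat, lon] corners)
latlow, lathigh = df_.location_latitude.min(), df_.location_latitude.max()
lonlow, lonhigh = df_.location_longitude.min(), df_.location_longitude.max()
lower_left = [latlow, lonlow]
upper_right = [lathigh, lonhigh]
# get_geojson_grid must be defined before this cell runs
# (it is a custom helper, not a folium function)
grid = get_geojson_grid(upper_right, lower_left, n=8)

m = folium.Map(location=[x_map, y_map], zoom_start=12)

for i, geo_json in enumerate(grid):

    # Shade each grid cell by its position in the list
    color = plt.cm.Reds(i / len(grid))
    color = mpl.colors.to_hex(color)

    gj = folium.GeoJson(geo_json,
                        style_function=lambda feature, color=color: {
                            'fillColor': color,
                            'color':"black",
                            'weight': 2,
                            'dashArray': '5, 5',
                            'fillOpacity': 0.35,
                        })
    popup = folium.Popup("Quartier {}".format(i))
    gj.add_child(popup)

    m.add_child(gj)
m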

In [None]:
import matplotlib as mpl

# Pad the bounding box of the geocoded points by 0.0025 degrees on each side
latlow = df_.location_latitude.min() - .0025
lonlow = df_.location_longitude.min() - .0025
lathigh = df_.location_latitude.max() + .0025
lonhigh = df_.location_longitude.max() + .0025

m = folium.Map(zoom_start = 5, location=[48.8, 2.30])

upper_right = [lathigh, lonhigh]
lower_left = [latlow, lonlow]

# get_geojson_grid must be defined before this cell runs
grid = get_geojson_grid(upper_right, lower_left, n=6)

popups = []
regional_counts = []

for box in grid:
    # Assumes the helper stores each box's corners as [lon, lat]
    box_upper_right = box["properties"]["upper_right"]
    box_lower_left = box["properties"]["lower_left"]

    # Quartiers whose centre point falls inside this box
    mask = (
        (parcoord.Latitude <= box_upper_right[1]) & (parcoord.Latitude >= box_lower_left[1]) &
        (parcoord.Longitude <= box_upper_right[0]) & (parcoord.Longitude >= box_lower_left[0])
           )

    region_density = len(parcoord[mask])
    regional_counts.append(region_density)

    total_pop = parcoord[mask].Population.sum()
    total_density = parcoord[mask].Density.sum()
    content = "total population {:,.0f}, total density {:,.0f}".format(total_pop, total_density)
    popup = folium.Popup(content)
    popups.append(popup)

# Guard against division by zero when every box is empty
worst_region = max(regional_counts) or 1

for i, box in enumerate(grid):
    geo_json = json.dumps(box)

    color = plt.cm.Reds(regional_counts[i] / worst_region)
    color = mpl.colors.to_hex(color)

    gj = folium.GeoJson(geo_json,
                        style_function=lambda feature, color=color: {
                            'fillColor': color,
                            'color':"black",
                            'weight': 2,
                            'dashArray': '5, 5',
                            'fillOpacity': 0.35,
                        })

    gj.add_child(popups[i])
    m.add_child(gj)

from folium.plugins import MarkerCluster

# Use only the quartiers that were geocoded successfully
points = parcoord.dropna(subset=['Latitude', 'Longitude'])
locations = list(zip(points.Latitude, points.Longitude))
icons = [folium.Icon(icon="star", prefix="fa") for _ in range(len(locations))]

# One popup per quartier, in the same order as `locations`
popup_content = []
for row in points.itertuples():
    population = "Population: {} ".format(row.Population)
    density = "Density: {}".format(row.Density)
    content = population + density
    popup_content.append(content)

popups = [folium.Popup(content) for content in popup_content]

cluster = MarkerCluster(locations=locations, icons=icons, popups=popups)
m.add_child(cluster)

m.save("pop_paris.html")

*Create and push data into a new csv file*

In [None]:
location = pd.DataFrame() 
#location[1] = parcoord['Latitude'].values.tolist()
#location[0] = parcoord['Longitude'].values.tolist()
#location[1]

In [None]:
location.values[0:80]
location

In [None]:
paris_map = gpd.read_file('arrondissements.shp')
print(paris_map)

In [None]:
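# Plot the arrondissement boundaries loaded in the previous cell
fig, ax = plt.subplots(figsize=(8, 8))
paris_map.plot(ax=ax, color='orange')

plt.show()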

In [None]:
df_.head()

*Create a Pandas DataFrame with the listed column headers as a more concise version*

In [None]:
p_map = folium.Map(location=[x_map, y_map], zoom_start=12)

# Add one circle marker per successfully geocoded quartier
for row in df_.dropna(subset=['location_latitude', 'location_longitude']).itertuples():
    folium.CircleMarker(
        location=[row.location_latitude, row.location_longitude],
        radius=5,
        popup=folium.Popup(row.location, parse_html=True),
        color='blue',
        fill=True,
        fill_color='#3186cc').add_to(p_map)

p_map

In [None]:
venues_df = pd.DataFrame(venues)

venues_df.columns = ['Arrondissement', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
df_na = venues_df['Arrondissement'] != 'Na'
venues_df = venues_df[df_na].copy()
# Attach each quartier's density by name rather than by row position
# (the 'Arrondissement' column holds the quartier name here)
venues_df['Density'] = venues_df['Arrondissement'].map(parcoord.set_index('Quartier')['Density'])
# Share of all venues that falls in each category
count_df = venues_df['VenueCategory'].value_counts(normalize=True)
venues_df['VenueConcentration'] = venues_df['VenueCategory'].map(count_df)
print(venues_df.shape)
venues_df.head()
venues_df.tail()

In [None]:

grouped_cat = venues_df.groupby(["VenueCategory", "Density"])
for key, item in grouped_cat:
    # `item` is already the sub-frame for this (category, density) pair
    print(item, "\n")