![Kayak](https://seekvectorlogo.com/wp-content/uploads/2018/01/kayak-vector-logo.png)

# Plan your trip with Kayak 

## Company's description 📇

<a href="https://www.kayak.com" target="_blank">Kayak</a> is a travel search engine that helps user plan their next trip at the best price.

The company was founded in 2004 by Steve Hafner & Paul M. English. After a few rounds of fundraising, Kayak was acquired by <a href="https://www.bookingholdings.com/" target="_blank">Booking Holdings</a> which now holds: 

* <a href="https://booking.com/" target="_blank">Booking.com</a>
* <a href="https://kayak.com/" target="_blank">Kayak</a>
* <a href="https://www.priceline.com/" target="_blank">Priceline</a>
* <a href="https://www.agoda.com/" target="_blank">Agoda</a>
* <a href="https://Rentalcars.com/" target="_blank">RentalCars</a>
* <a href="https://www.opentable.com/" target="_blank">OpenTable</a>

With over \$300 million revenue a year, Kayak operates in almost all countries and all languages to help their users book travels accros the globe. 

## Project 🚧

The marketing team needs help on a new project. After doing some user research, the team discovered that **70% of their users who are planning a trip would like to have more information about the destination they are going to**. 

In addition, user research shows that **people tend to be defiant about the information they are reading if they don't know the brand** which produced the content. 

Therefore, Kayak Marketing Team would like to create an application that will recommend where people should plan their next holidays. The application should be based on real data about:

* Weather 
* Hotels in the area 

The application should then be able to recommend the best destinations and hotels based on the above variables at any given time. 

## Goals 🎯

As the project has just started, your team doesn't have any data that can be used to create this application. Therefore, your job will be to: 

* Scrape data from destinations 
* Get weather data from each destination 
* Get hotels' info about each destination
* Store all the information above in a data lake
* Extract, transform and load cleaned data from your datalake to a data warehouse

## Scope of this project 🖼️

Marketing team wants to focus first on the best cities to travel to in France. According <a href="https://one-week-in.com/35-cities-to-visit-in-france/" target="_blank">One Week In.com</a> here are the top-35 cities to visit in France: 

```python 
["Mont Saint Michel",
"St Malo",
"Bayeux",
"Le Havre",
"Rouen",
"Paris",
"Amiens",
"Lille",
"Strasbourg",
"Chateau du Haut Koenigsbourg",
"Colmar",
"Eguisheim",
"Besancon",
"Dijon",
"Annecy",
"Grenoble",
"Lyon",
"Gorges du Verdon",
"Bormes les Mimosas",
"Cassis",
"Marseille",
"Aix en Provence",
"Avignon",
"Uzes",
"Nimes",
"Aigues Mortes",
"Saintes Maries de la mer",
"Collioure",
"Carcassonne",
"Ariege",
"Toulouse",
"Montauban",
"Biarritz",
"Bayonne",
"La Rochelle"]
```

Your team should focus **only on the above cities for your project**. 


## Helpers 🦮

To help you achieve this project, here are a few tips that should help you

### Get weather data with an API 

*   Use https://nominatim.org/ to get the gps coordinates of all the cities (no subscription required) Documentation : https://nominatim.org/release-docs/develop/api/Search/

*   Use https://openweathermap.org/appid (you have to subscribe to get a free apikey) and https://openweathermap.org/api/one-call-api to get some information about the weather for the 35 cities and put it in a DataFrame

*   Determine the list of cities where the weather will be the nicest within the next 7 days For example, you can use the values of daily.pop and daily.rain to compute the expected volume of rain within the next 7 days... But it's only an example, actually you can have different opinions on a what a nice weather would be like 😎 Maybe the most important criterion for you is the temperature or humidity, so feel free to change the rules !

*   Save all the results in a `.csv` file, you will use it later 😉 You can save all the informations that seem important to you ! Don't forget to save the name of the cities, and also to create a column containing a unique identifier (id) of each city (this is important for what's next in the project)

*   Use plotly to display the best destinations on a map

### Scrape Booking.com 

Since BookingHoldings doesn't have aggregated databases, it will be much faster to scrape data directly from booking.com 

You can scrap as many information asyou want, but we suggest that you get at least:

*   hotel name,
*   Url to its booking.com page,
*   Its coordinates: latitude and longitude
*   Score given by the website users
*   Text description of the hotel


### Create your data lake using S3 

Once you managed to build your dataset, you should store into S3 as a csv file. 

### ETL 

Once you uploaded your data onto S3, it will be better for the next data analysis team to extract clean data directly from a Data Warehouse. Therefore, create a SQL Database using AWS RDS, extract your data from S3 and store it in your newly created DB. 

## Deliverable 📬

To complete this project, your team should deliver:

* A `.csv` file in an S3 bucket containing enriched information about weather and hotels for each french city

* A SQL Database where we should be able to get the same cleaned data from S3 

* Two maps where you should have a Top-5 destinations and a Top-20 hotels in the area. You can use plotly or any other library to do so. It should look something like this: 

![Map](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/Kayak_best_destination_project.png)

In [1]:
# IMPORT librairies

import pandas as pd
import requests

In [2]:
# PREPARE city LIST for data collection from API

original_city_list = ["Mont Saint Michel",
"St Malo",
"Bayeux",
"Le Havre",
"Rouen",
"Paris",
"Amiens",
"Lille",
"Strasbourg",
"Chateau du Haut Koenigsbourg",
"Colmar",
"Eguisheim",
"Besancon",
"Dijon",
"Annecy",
"Grenoble",
"Lyon",
"Gorges du Verdon",
"Bormes les Mimosas",
"Cassis",
"Marseille",
"Aix en Provence",
"Avignon",
"Uzes",
"Nimes",
"Aigues Mortes",
"Saintes Maries de la mer",
"Collioure",
"Carcassonne",
"Ariege",
"Toulouse",
"Montauban",
"Biarritz",
"Bayonne",
"La Rochelle"]

In [3]:
original_city_list[0]

'Mont Saint Michel'

In [4]:
# CITY LIST to be AMENDED to manage spaces into anmes and being well interpretated by API

amended_city_list = []
for s in original_city_list:
    new_list = s.replace(" ", "+")
# Modify old string
    amended_city_list.append(new_list)


In [5]:
# TEST amended CITY LIST -> ok
len(amended_city_list)

35

In [6]:
# SELECT 1st element
amended_city_list

['Mont+Saint+Michel',
 'St+Malo',
 'Bayeux',
 'Le+Havre',
 'Rouen',
 'Paris',
 'Amiens',
 'Lille',
 'Strasbourg',
 'Chateau+du+Haut+Koenigsbourg',
 'Colmar',
 'Eguisheim',
 'Besancon',
 'Dijon',
 'Annecy',
 'Grenoble',
 'Lyon',
 'Gorges+du+Verdon',
 'Bormes+les+Mimosas',
 'Cassis',
 'Marseille',
 'Aix+en+Provence',
 'Avignon',
 'Uzes',
 'Nimes',
 'Aigues+Mortes',
 'Saintes+Maries+de+la+mer',
 'Collioure',
 'Carcassonne',
 'Ariege',
 'Toulouse',
 'Montauban',
 'Biarritz',
 'Bayonne',
 'La+Rochelle']

In [7]:
# PREPARE URL for API

url = ('https://nominatim.openstreetmap.org/search?q={}&format=json').format(amended_city_list[0])

In [8]:
# TEST url -> ok
# url

In [9]:
# GET values from API

r = requests.get(url)
soup = r.json()


In [10]:
# TEST json answer -> ok
soup

[{'place_id': 258697296,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'relation',
  'osm_id': 376823,
  'boundingbox': ['48.6119741', '48.637031', '-1.5495487', '-1.5094805'],
  'lat': '48.6355232',
  'lon': '-1.5102571',
  'display_name': 'Le Mont-Saint-Michel, Avranches, Manche, Normandie, France métropolitaine, 50170, France',
  'class': 'boundary',
  'type': 'administrative',
  'importance': 0.8512740929650575,
  'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png'},
 {'place_id': 139469248,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'way',
  'osm_id': 211285890,
  'boundingbox': ['48.6349172', '48.637031', '-1.5133292', '-1.5094796'],
  'lat': '48.6359541',
  'lon': '-1.511459954959514',
  'display_name': 'Mont Saint-Michel, Chemin de Ronde Abbatial, Le Mont-Saint-Michel, Avranches, Manche, Normandie, France métropolitaine, 50

In [11]:
# check lat data collection -> OK
latitude = soup[0]["lat"]

# latitude

In [12]:
# check lon data collection 1st element -> OK
longitude = soup[0]["lon"]
# longitude

In [13]:
# EXTRACT data to load from API

# data_to_load = city + longitude + latitude
# data_to_load

In [14]:
# PREPAPRE data collection OUTPUT table

cities_geo_coordinates = pd.DataFrame(columns = ["place_id","city", "latitude", "longitude"])

# cities_geo_coordinates

In [15]:
# test OUTPUT table

cities_geo_coordinates

Unnamed: 0,place_id,city,latitude,longitude


In [16]:
# ASSEMBLE code

for c in range (0,len(amended_city_list)):
    
    #append city geocoordinates
    city = amended_city_list[c]
    url_1 = ('https://nominatim.openstreetmap.org/search?q={}&format=json&countrycodes=fr&dedupe=[1]').format(city)
    r_1 = requests.get(url_1)
    soup_1 = r_1.json()
    cities_geo_coordinates = cities_geo_coordinates.append({"place_id":soup_1[0]["place_id"], "city":amended_city_list[c], "latitude":soup_1[0]["lat"], "longitude":soup_1[0]["lon"]}, ignore_index=True)
#     if c == len(amended_city_list)+1:
#         print(cities_geo_coordinates)

In [17]:
cities_geo_coordinates

Unnamed: 0,place_id,city,latitude,longitude
0,258697296,Mont+Saint+Michel,48.6355232,-1.5102571
1,257985771,St+Malo,48.649518,-2.0260409
2,257654882,Bayeux,49.2764624,-0.7024738
3,256418097,Le+Havre,49.4938975,0.1079732
4,303984676,Rouen,49.4404591,1.0939658
5,111607,Paris,48.8566969,2.3514616
6,259023929,Amiens,49.8941708,2.2956951
7,256373580,Lille,50.6365654,3.0635282
8,258573835,Strasbourg,48.584614,7.7507127
9,106552831,Chateau+du+Haut+Koenigsbourg,48.249489800000006,7.34429620253195


In [18]:
soup_1

[{'place_id': 258418538,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'relation',
  'osm_id': 117858,
  'boundingbox': ['46.1331664', '46.1908971', '-1.2419231', '-1.111097'],
  'lat': '46.1591126',
  'lon': '-1.1520434',
  'display_name': 'La Rochelle, Charente-Maritime, Nouvelle-Aquitaine, France métropolitaine, 17000, France',
  'class': 'boundary',
  'type': 'administrative',
  'importance': 0.8014837096874572,
  'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png'},
 {'place_id': 258998809,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'relation',
  'osm_id': 1215878,
  'boundingbox': ['47.7388479', '47.7647325', '5.7060641', '5.7454734'],
  'lat': '47.7470598',
  'lon': '5.7321403',
  'display_name': 'La Rochelle, Vesoul, Haute-Saône, Bourgogne-Franche-Comté, France métropolitaine, 70120, France',
  'class': 'boundary',
  'type'

In [19]:
cities_infos_detailled = pd.DataFrame(columns=['city_id','city','latitude','longitude',
                                     'current_temp_feels_like',"current_humidity",
                                     'J+1_temp_feels_like',
                                     'J+2_temp_feels_like',
                                     'J+3_temp_feels_like',
                                     'J+4_temp_feels_like',
                                     'J+5_temp_feels_like',
                                     'J+6_temp_feels_like',
                                     'J+7_temp_feels_like',
                                     'temp_mean'
                                     'J+1_humidity',
                                     'J+2_humidity',
                                     'J+3_humidity',
                                     'J+4_humidity',
                                     'J+5_humidity',
                                     'J+6_humidity',
                                     'J+7_humidity',
                                     'humidity_mean'])
cities_infos_detailled

Unnamed: 0,city_id,city,latitude,longitude,current_temp_feels_like,current_humidity,J+1_temp_feels_like,J+2_temp_feels_like,J+3_temp_feels_like,J+4_temp_feels_like,...,J+6_temp_feels_like,J+7_temp_feels_like,temp_meanJ+1_humidity,J+2_humidity,J+3_humidity,J+4_humidity,J+5_humidity,J+6_humidity,J+7_humidity,humidity_mean


In [20]:
# APPEND city needed infos

for c in range (0,len(cities_geo_coordinates)):
    
    city_id = cities_geo_coordinates.loc[c, "place_id"]
    latitude = cities_geo_coordinates.loc[c, "latitude"]
    longitude = cities_geo_coordinates.loc[c, "longitude"]
    api_key = "06c201b8b437d60655925748fa2efc34"
    url_2 = ("https://api.openweathermap.org/data/2.5/onecall?lat={}&lon={}&appid={}&exclude=hourly,minutely&units=metric").format(latitude,longitude,api_key)
    r_2 = requests.get(url_2)
    city_info = r_2.json()
    cities_infos_detailled = cities_infos_detailled.append({
                                        "city_id": city_id,
                                        "city": original_city_list[c],
                                        "latitude": latitude,
                                        "longitude": longitude,
                                        'current_temp_feels_like': city_info["current"]["feels_like"],
                                        'current_humidity': city_info["current"]["humidity"],
                                        'J+1_temp_feels_like': city_info["daily"][1]['feels_like']['day'],
                                        'J+2_temp_feels_like':city_info["daily"][2]['feels_like']['day'],
                                        'J+3_temp_feels_like':city_info["daily"][3]['feels_like']['day'],
                                        'J+4_temp_feels_like':city_info["daily"][4]['feels_like']['day'],
                                        'J+5_temp_feels_like':city_info["daily"][5]['feels_like']['day'],
                                        'J+6_temp_feels_like':city_info["daily"][6]['feels_like']['day'],
                                        'J+7_temp_feels_like':city_info["daily"][7]['feels_like']['day'],
                                        "J+1_humidity": city_info["daily"][1]['humidity'],
                                        "J+2_humidity": city_info["daily"][2]['humidity'],                                        
                                        "J+3_humidity": city_info["daily"][3]['humidity'],                                        
                                        "J+4_humidity": city_info["daily"][4]['humidity'],                                        
                                        "J+5_humidity": city_info["daily"][5]['humidity'],                                        
                                        "J+6_humidity": city_info["daily"][6]['humidity'],                                        
                                        "J+7_humidity": city_info["daily"][7]['humidity']
                                        },
                                        ignore_index=True)
    print(c, end=' ')


0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 

In [21]:
 city_info = r_2.json()
# city_info

In [22]:
cities_infos_detailled.sort_values(by=['current_temp_feels_like','current_humidity'], ascending=False)

Unnamed: 0,city_id,city,latitude,longitude,current_temp_feels_like,current_humidity,J+1_temp_feels_like,J+2_temp_feels_like,J+3_temp_feels_like,J+4_temp_feels_like,...,J+7_temp_feels_like,temp_meanJ+1_humidity,J+2_humidity,J+3_humidity,J+4_humidity,J+5_humidity,J+6_humidity,J+7_humidity,humidity_mean,J+1_humidity
20,258706271,Marseille,43.2961743,5.3699525,27.31,48,23.16,23.83,26.48,24.12,...,23.07,,53,61,81,55,55,50,,59.0
26,258488432,Saintes Maries de la mer,43.4522771,4.4287172,26.96,47,22.14,23.6,23.42,22.81,...,21.37,,52,80,86,56,46,47,,67.0
25,258768268,Aigues Mortes,43.5658225,4.1912837,26.66,43,22.25,24.21,23.39,23.32,...,21.63,,49,75,79,53,40,40,,57.0
22,257880005,Avignon,43.9492493,4.8059012,26.38,45,24.64,26.19,29.05,25.03,...,22.26,,41,39,68,55,47,45,,44.0
21,258570202,Aix en Provence,43.5298424,5.4474738,26.27,44,24.23,26.24,28.28,26.27,...,24.81,,36,45,55,51,49,40,,42.0
19,258573465,Cassis,43.2140359,5.5396318,25.69,53,23.45,23.99,27.94,24.49,...,23.57,,54,57,78,55,54,50,,59.0
24,258697886,Nimes,43.8374249,4.3600687,25.33,30,24.62,26.61,26.99,24.06,...,22.81,,38,47,71,48,35,37,,43.0
23,258914515,Uzes,44.0121279,4.4196718,25.28,33,23.7,26.04,26.96,22.19,...,21.6,,40,44,82,53,39,41,,43.0
18,258573486,Bormes les Mimosas,43.1506968,6.3419285,25.05,65,23.19,23.87,24.1,23.36,...,23.86,,53,74,81,54,52,53,,58.0
28,255437880,Carcassonne,43.2130358,2.3491069,24.72,29,23.78,24.79,23.61,23.57,...,21.32,,44,77,56,45,45,35,,47.0


In [23]:
cities_infos_synthesis = cities_infos_detailled.loc[0:,["city"]]
cities_infos_synthesis

Unnamed: 0,city
0,Mont Saint Michel
1,St Malo
2,Bayeux
3,Le Havre
4,Rouen
5,Paris
6,Amiens
7,Lille
8,Strasbourg
9,Chateau du Haut Koenigsbourg


In [24]:
cities_infos_synthesis = pd.DataFrame(columns = ['city_id', 'city', 'latitude', 'longitude', 'temp_mean', 'humidity_mean'])
cities_infos_synthesis

Unnamed: 0,city_id,city,latitude,longitude,temp_mean,humidity_mean


In [25]:

for c in range(1,len(cities_infos_detailled)):
    cities_infos_synthesis = cities_infos_synthesis.append({"city_id":  cities_infos_detailled.loc[c,'city_id'],
                                                            "city": cities_infos_detailled.loc[c,'city'],
                                                            "latitude": cities_infos_detailled.loc[c,'latitude'],
                                                            "longitude": cities_infos_detailled.loc[c,'longitude'],
                                                            "temp_mean": cities_infos_detailled.loc[c,['current_temp_feels_like','J+1_temp_feels_like','J+2_temp_feels_like','J+3_temp_feels_like','J+4_temp_feels_like','J+5_temp_feels_like','J+6_temp_feels_like','J+7_temp_feels_like']].mean(),
                                                            "humidity_mean": cities_infos_detailled.loc[c,['current_humidity', 'J+1_humidity', 'J+2_humidity', 'J+3_humidity', 'J+4_humidity', 'J+5_humidity', 'J+6_humidity', 'J+7_humidity']].mean()
                                                           },
                                                           ignore_index=True)
    print(c, end=" ")
cities_infos_synthesis

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 

Unnamed: 0,city_id,city,latitude,longitude,temp_mean,humidity_mean
0,257985771,St Malo,48.649518,-2.0260409,17.38625,67.875
1,257654882,Bayeux,49.2764624,-0.7024738,18.3175,54.375
2,256418097,Le Havre,49.4938975,0.1079732,15.5,72.125
3,303984676,Rouen,49.4404591,1.0939658,19.27125,52.5
4,111607,Paris,48.8566969,2.3514616,19.84875,50.5
5,259023929,Amiens,49.8941708,2.2956951,19.0925,55.25
6,256373580,Lille,50.6365654,3.0635282,19.46125,52.25
7,258573835,Strasbourg,48.584614,7.7507127,19.01,56.25
8,106552831,Chateau du Haut Koenigsbourg,48.249489800000006,7.34429620253195,16.39125,56.25
9,258701048,Colmar,48.0777517,7.3579641,19.55,58.0


In [26]:


cities_infos_synthesis.sort_values(by=['temp_mean','humidity_mean'], ascending=False)

Unnamed: 0,city_id,city,latitude,longitude,temp_mean,humidity_mean
20,258570202,Aix en Provence,43.5298424,5.4474738,25.19625,45.25
21,257880005,Avignon,43.9492493,4.8059012,24.545,48.0
23,258697886,Nimes,43.8374249,4.3600687,24.45125,43.625
18,258573465,Cassis,43.2140359,5.5396318,24.44,57.5
19,258706271,Marseille,43.2961743,5.3699525,24.19375,57.75
26,258528870,Collioure,42.52505,3.0831554,23.91875,55.125
17,258573486,Bormes les Mimosas,43.1506968,6.3419285,23.8925,61.25
22,258914515,Uzes,44.0121279,4.4196718,23.57125,46.875
24,258768268,Aigues Mortes,43.5658225,4.1912837,23.3275,54.5
27,255437880,Carcassonne,43.2130358,2.3491069,23.12875,47.25


### and the winner is ... Carcassone because I prefer south west ;-)

In [27]:
cities_infos_synthesis.columns

Index(['city_id', 'city', 'latitude', 'longitude', 'temp_mean',
       'humidity_mean'],
      dtype='object')

### let's create a custom city ID in case of needed later on


In [28]:
cities_infos_synthesis['city_id_df']=''

In [29]:
cities_infos_synthesis

Unnamed: 0,city_id,city,latitude,longitude,temp_mean,humidity_mean,city_id_df
0,257985771,St Malo,48.649518,-2.0260409,17.38625,67.875,
1,257654882,Bayeux,49.2764624,-0.7024738,18.3175,54.375,
2,256418097,Le Havre,49.4938975,0.1079732,15.5,72.125,
3,303984676,Rouen,49.4404591,1.0939658,19.27125,52.5,
4,111607,Paris,48.8566969,2.3514616,19.84875,50.5,
5,259023929,Amiens,49.8941708,2.2956951,19.0925,55.25,
6,256373580,Lille,50.6365654,3.0635282,19.46125,52.25,
7,258573835,Strasbourg,48.584614,7.7507127,19.01,56.25,
8,106552831,Chateau du Haut Koenigsbourg,48.249489800000006,7.34429620253195,16.39125,56.25,
9,258701048,Colmar,48.0777517,7.3579641,19.55,58.0,


In [30]:
cities_infos_synthesis.index.values

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33])

In [31]:
# permet de créer une colonne avec les index

cities_infos_synthesis.reset_index(inplace=True)
cities_infos_synthesis

Unnamed: 0,index,city_id,city,latitude,longitude,temp_mean,humidity_mean,city_id_df
0,0,257985771,St Malo,48.649518,-2.0260409,17.38625,67.875,
1,1,257654882,Bayeux,49.2764624,-0.7024738,18.3175,54.375,
2,2,256418097,Le Havre,49.4938975,0.1079732,15.5,72.125,
3,3,303984676,Rouen,49.4404591,1.0939658,19.27125,52.5,
4,4,111607,Paris,48.8566969,2.3514616,19.84875,50.5,
5,5,259023929,Amiens,49.8941708,2.2956951,19.0925,55.25,
6,6,256373580,Lille,50.6365654,3.0635282,19.46125,52.25,
7,7,258573835,Strasbourg,48.584614,7.7507127,19.01,56.25,
8,8,106552831,Chateau du Haut Koenigsbourg,48.249489800000006,7.34429620253195,16.39125,56.25,
9,9,258701048,Colmar,48.0777517,7.3579641,19.55,58.0,


### CUSTOM CITY ID - contenation

In [32]:
type(cities_infos_synthesis)
cities_infos_synthesis['city_id_df'] = 'city_'+ cities_infos_synthesis["index"].astype(str)
# cities_infos_synthesis = cities_infos_synthesis[['city_id', 'city', 'latitude', 'longitude', 'temp_mean', 'humidity_mean']]
cities_infos_synthesis

Unnamed: 0,index,city_id,city,latitude,longitude,temp_mean,humidity_mean,city_id_df
0,0,257985771,St Malo,48.649518,-2.0260409,17.38625,67.875,city_0
1,1,257654882,Bayeux,49.2764624,-0.7024738,18.3175,54.375,city_1
2,2,256418097,Le Havre,49.4938975,0.1079732,15.5,72.125,city_2
3,3,303984676,Rouen,49.4404591,1.0939658,19.27125,52.5,city_3
4,4,111607,Paris,48.8566969,2.3514616,19.84875,50.5,city_4
5,5,259023929,Amiens,49.8941708,2.2956951,19.0925,55.25,city_5
6,6,256373580,Lille,50.6365654,3.0635282,19.46125,52.25,city_6
7,7,258573835,Strasbourg,48.584614,7.7507127,19.01,56.25,city_7
8,8,106552831,Chateau du Haut Koenigsbourg,48.249489800000006,7.34429620253195,16.39125,56.25,city_8
9,9,258701048,Colmar,48.0777517,7.3579641,19.55,58.0,city_9


In [33]:
type(cities_infos_synthesis)

pandas.core.frame.DataFrame

In [34]:
cities_infos_synthesis.columns = ['index_pa', 'city_id', 'city', 'latitude', 'longitude', 'temp_mean', 'humidity_mean', 'city_id_df']

In [35]:
cities_infos_synthesis

Unnamed: 0,index_pa,city_id,city,latitude,longitude,temp_mean,humidity_mean,city_id_df
0,0,257985771,St Malo,48.649518,-2.0260409,17.38625,67.875,city_0
1,1,257654882,Bayeux,49.2764624,-0.7024738,18.3175,54.375,city_1
2,2,256418097,Le Havre,49.4938975,0.1079732,15.5,72.125,city_2
3,3,303984676,Rouen,49.4404591,1.0939658,19.27125,52.5,city_3
4,4,111607,Paris,48.8566969,2.3514616,19.84875,50.5,city_4
5,5,259023929,Amiens,49.8941708,2.2956951,19.0925,55.25,city_5
6,6,256373580,Lille,50.6365654,3.0635282,19.46125,52.25,city_6
7,7,258573835,Strasbourg,48.584614,7.7507127,19.01,56.25,city_7
8,8,106552831,Chateau du Haut Koenigsbourg,48.249489800000006,7.34429620253195,16.39125,56.25,city_8
9,9,258701048,Colmar,48.0777517,7.3579641,19.55,58.0,city_9


In [36]:
# cities_infos_synthesis.pop('index_pa')
cities_infos_synthesis

Unnamed: 0,index_pa,city_id,city,latitude,longitude,temp_mean,humidity_mean,city_id_df
0,0,257985771,St Malo,48.649518,-2.0260409,17.38625,67.875,city_0
1,1,257654882,Bayeux,49.2764624,-0.7024738,18.3175,54.375,city_1
2,2,256418097,Le Havre,49.4938975,0.1079732,15.5,72.125,city_2
3,3,303984676,Rouen,49.4404591,1.0939658,19.27125,52.5,city_3
4,4,111607,Paris,48.8566969,2.3514616,19.84875,50.5,city_4
5,5,259023929,Amiens,49.8941708,2.2956951,19.0925,55.25,city_5
6,6,256373580,Lille,50.6365654,3.0635282,19.46125,52.25,city_6
7,7,258573835,Strasbourg,48.584614,7.7507127,19.01,56.25,city_7
8,8,106552831,Chateau du Haut Koenigsbourg,48.249489800000006,7.34429620253195,16.39125,56.25,city_8
9,9,258701048,Colmar,48.0777517,7.3579641,19.55,58.0,city_9


In [37]:
cities_infos_synthesis = cities_infos_synthesis[['city_id', 'city_id_df', 'city', 'latitude', 'longitude', 'temp_mean', 'humidity_mean']]
cities_infos_synthesis

Unnamed: 0,city_id,city_id_df,city,latitude,longitude,temp_mean,humidity_mean
0,257985771,city_0,St Malo,48.649518,-2.0260409,17.38625,67.875
1,257654882,city_1,Bayeux,49.2764624,-0.7024738,18.3175,54.375
2,256418097,city_2,Le Havre,49.4938975,0.1079732,15.5,72.125
3,303984676,city_3,Rouen,49.4404591,1.0939658,19.27125,52.5
4,111607,city_4,Paris,48.8566969,2.3514616,19.84875,50.5
5,259023929,city_5,Amiens,49.8941708,2.2956951,19.0925,55.25
6,256373580,city_6,Lille,50.6365654,3.0635282,19.46125,52.25
7,258573835,city_7,Strasbourg,48.584614,7.7507127,19.01,56.25
8,106552831,city_8,Chateau du Haut Koenigsbourg,48.249489800000006,7.34429620253195,16.39125,56.25
9,258701048,city_9,Colmar,48.0777517,7.3579641,19.55,58.0
