![Kayak](https://seekvectorlogo.com/wp-content/uploads/2018/01/kayak-vector-logo.png)

# Plan your trip with Kayak

## Company description 📇

<a href="https://www.kayak.com" target="_blank">Kayak</a> is a travel search engine that helps users plan their next trip at the best price.

The company was founded in 2004 by Steve Hafner and Paul M. English. After a few rounds of fundraising, Kayak was acquired by <a href="https://www.bookingholdings.com/" target="_blank">Booking Holdings</a>, which now holds:

* <a href="https://booking.com/" target="_blank">Booking.com</a>
* <a href="https://kayak.com/" target="_blank">Kayak</a>
* <a href="https://www.priceline.com/" target="_blank">Priceline</a>
* <a href="https://www.agoda.com/" target="_blank">Agoda</a>
* <a href="https://Rentalcars.com/" target="_blank">RentalCars</a>
* <a href="https://www.opentable.com/" target="_blank">OpenTable</a>

With over \$300 million in revenue a year, Kayak operates in almost all countries and languages to help its users book travel across the globe.

## Project 🚧

The marketing team needs help on a new project. After doing some user research, the team discovered that **70% of their users who are planning a trip would like to have more information about the destination they are going to**.

In addition, user research shows that **people tend to distrust the information they read if they don't know the brand** that produced the content.

Therefore, the Kayak marketing team would like to create an application that recommends where people should plan their next holiday. The application should be based on real data about:

* Weather
* Hotels in the area

The application should then be able to recommend the best destinations and hotels based on the above variables at any given time.

## Goals 🎯

As the project has just started, your team doesn't have any data that can be used to create this application. Therefore, your job will be to:

* Scrape data from destinations
* Get weather data from each destination
* Get hotels' info about each destination
* Store all the information above in a data lake
* Extract, transform and load cleaned data from your data lake to a data warehouse
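
The last two steps can be sketched with the standard library alone. The example below is a minimal illustration, not the project's pipeline: an in-memory CSV string stands in for a file in the data lake, SQLite stands in for the data warehouse, and the table name `weather`, the helper `load_weather`, and the sample rows are all made up for the example.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for a raw CSV file pulled from the data lake.
RAW_CSV = """id,city,latitude,longitude,day,day_temperature,humidity
1,Paris,48.85,2.35,day 1,18.2,60
1,Paris,48.85,2.35,day 2,,55
2,Lyon,45.76,4.83,day 1,21.0,48
"""


def load_weather(raw_csv, conn):
    """Extract rows from CSV text, drop incomplete ones, load the rest into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        "id INTEGER, city TEXT, latitude REAL, longitude REAL, "
        "day TEXT, day_temperature REAL, humidity INTEGER)"
    )
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Transform: skip rows with a missing temperature, cast text to numbers.
        if not row["day_temperature"]:
            continue
        cleaned.append((
            int(row["id"]), row["city"],
            float(row["latitude"]), float(row["longitude"]),
            row["day"], float(row["day_temperature"]), int(row["humidity"]),
        ))
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?, ?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
loaded = load_weather(RAW_CSV, conn)
rows = conn.execute(
    "SELECT city, day, day_temperature FROM weather ORDER BY id"
).fetchall()
```

In a real setup the CSV text would come from the S3 bucket and the connection would point at the team's SQL database; only the extract, clean, and insert pattern is meant to carry over.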

## Scope of this project 🖼️

The marketing team wants to focus first on the best cities to travel to in France. According to <a href="https://one-week-in.com/35-cities-to-visit-in-france/" target="_blank">One Week In.com</a>, here are the top 35 cities to visit in France:

```python
["Mont Saint Michel", "St Malo", "Bayeux", "Le Havre", "Rouen", "Paris",
"Amiens", "Lille", "Strasbourg", "Chateau du Haut Koenigsbourg", "Colmar",
"Eguisheim", "Besancon", "Dijon", "Annecy", "Grenoble", "Lyon", "Gorges du Verdon",
"Bormes les Mimosas", "Cassis", "Marseille", "Aix en Provence", "Avignon", "Uzes",
"Nimes", "Aigues Mortes", "Saintes Maries de la mer", "Collioure", "Carcassonne",
"Ariege", "Toulouse", "Montauban", "Biarritz", "Bayonne", "La Rochelle"]
```

Your team should focus **only on the above cities for your project**.

## Deliverable 📬

To complete this project, your team should deliver:

* A `.csv` file in an S3 bucket containing enriched information about weather and hotels for each French city
* A SQL database from which we can query the same cleaned data as in S3
* Two maps: one showing the top-5 destinations and one showing the top-20 hotels in the area.
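
One way to derive a top-5 list from the weather data is to give each city a score and sort. The sketch below assumes a scoring rule that is not part of the brief: average felt temperature minus a small penalty for humidity. The function `rank_destinations`, the `humidity_weight` parameter, and the sample forecasts are hypothetical.

```python
from statistics import mean


def rank_destinations(forecasts, top_n=5, humidity_weight=0.1):
    """Return the top_n city names, best score first.

    forecasts maps a city name to a list of (felt_temperature, humidity) pairs.
    The score is an assumed rule: mean temperature minus weighted mean humidity.
    """
    scores = {}
    for city, days in forecasts.items():
        avg_temp = mean(temp for temp, _ in days)
        avg_humidity = mean(humidity for _, humidity in days)
        scores[city] = avg_temp - humidity_weight * avg_humidity
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Made-up forecasts for three cities, two days each.
sample = {
    "Lille": [(14.0, 80), (15.0, 78)],
    "Marseille": [(24.0, 50), (25.0, 52)],
    "Lyon": [(20.0, 60), (21.0, 62)],
}
top_two = rank_destinations(sample, top_n=2)
```

Any other rule (for example, filtering on rain probability) could replace the score; the point is only that the ranking reduces to one number per city followed by a sort.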

### Import libraries

In [2]:
import requests
import pandas as pd

### Get weather data with API

In [1]:
cities = [
    "Mont Saint Michel", "St Malo", "Bayeux", "Le Havre", "Rouen", "Paris",
    "Amiens", "Lille", "Strasbourg", "Chateau du Haut Koenigsbourg", "Colmar",
    "Eguisheim", "Besancon", "Dijon", "Annecy", "Grenoble", "Lyon", "Gorges du Verdon",
    "Bormes les Mimosas", "Cassis", "Marseille", "Aix en Provence", "Avignon", "Uzes",
    "Nimes", "Aigues Mortes", "Saintes Maries de la mer", "Collioure", "Carcassonne",
    "Ariege", "Toulouse", "Montauban", "Biarritz", "Bayonne", "La Rochelle"
]

In [None]:
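# data recovery: geocode each city with the Nominatim search API
list_ = []

for count_city, city in enumerate(cities, start=1):
    # passing params lets requests URL-encode city names that contain spaces
    response = requests.get(
        "https://nominatim.openstreetmap.org/search",
        params={"q": city, "format": "json", "countrycodes": "fr", "limit": 1},
        # Nominatim's usage policy asks for an identifying User-Agent; this value is a placeholder
        headers={"User-Agent": "kayak-project-notebook"},
    )
    response.raise_for_status()
    # parse the JSON body once and keep the first (best) match
    result = response.json()[0]

    dict_ = {
        "id": count_city,
        "city": city,
        "latitude": result["lat"],
        "longitude": result["lon"]
    }

    list_.append(dict_)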

# dataframe creation
dataset_nominatim = pd.DataFrame(list_)

# current temperature retrieval
# the API key is read from an environment variable (name assumed) rather than hard-coded
import os
api_key = os.environ["OPENWEATHER_API_KEY"]
list_ = []

for i in range(len(dataset_nominatim)):
    url = "https://api.openweathermap.org/data/2.5/onecall"
    params = {"lat": dataset_nominatim.latitude[i], "lon": dataset_nominatim.longitude[i],
              "exclude": "minutely,hourly,alerts", "appid": api_key, "units": "metric"}
    response = requests.get(url, params=params)
    response.raise_for_status()
    temp = response.json()["current"]["temp"]

    list_.append(temp)

# update dataset
dataset_weather = dataset_nominatim.copy()
dataset_weather['temperature'] = list_
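
Two things can go wrong in the geocoding step: a city name with spaces or accents may not be encoded properly in the URL, and a search may return an empty list, in which case indexing `[0]` raises an `IndexError`. The sketch below shows one way to guard against both without touching the network. The helpers `build_search_url` and `extract_coordinates` and the sample response bodies are hypothetical; only the `lat` and `lon` keys follow the response shape used in the cell above.

```python
import json
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(city):
    """Build a Nominatim search URL with the city name safely percent-encoded."""
    query = {"q": city, "format": "json", "countrycodes": "fr", "limit": 1}
    return NOMINATIM_SEARCH + "?" + urlencode(query)


def extract_coordinates(payload):
    """Return (lat, lon) as floats from a parsed search response, or None if empty."""
    if not payload:
        return None
    first = payload[0]
    return float(first["lat"]), float(first["lon"])


url = build_search_url("Aix en Provence")

# Made-up response bodies: one hit, and an empty result list.
hit = json.loads('[{"lat": "43.5298", "lon": "5.4475", "display_name": "Aix-en-Provence"}]')
miss = json.loads("[]")
coords = extract_coordinates(hit)
no_coords = extract_coordinates(miss)
```

Returning `None` for an empty result lets the calling loop decide whether to skip the city, retry with a different spelling, or stop.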

In [None]:
# copy dataset
dataset_weather2 = dataset_weather.copy()

In [None]:
# preparing the dataset for future display
dataset_weather2 = pd.concat([dataset_weather2]*8, ignore_index=True)
dataset_weather2 = dataset_weather2.sort_values("id", ascending=True)
dataset_weather2 = dataset_weather2.reset_index(drop=True)
dataset_weather2["day"] = [f"day {n}" for n in range(1, 9)] * len(cities)

In [None]:
# retrieval of the daily felt temperature and humidity (today and the next 7 days) for each city
# one request per city: dataset_weather has one row per city, dataset_weather2 has 8
# the API key is read from an environment variable (name assumed) rather than hard-coded
import os
api_key = os.environ["OPENWEATHER_API_KEY"]
list_city = []

for i in range(len(dataset_weather)):
    url = "https://api.openweathermap.org/data/2.5/onecall"
    params = {"lat": dataset_weather.latitude[i], "lon": dataset_weather.longitude[i],
              "exclude": "minutely,hourly,alerts", "appid": api_key, "units": "metric"}
    response = requests.get(url, params=params)
    response.raise_for_status()
    daily = response.json()["daily"]
    list_day = []

    for j in range(len(daily)):
        temp = daily[j]["feels_like"]["day"]
        humidity = daily[j]["humidity"]

        dict_ = {
            "feels_temp": temp,
            "humidity": humidity
        }

        list_day.append(dict_)

    list_city.append(list_day)

# transformation of the previous result as a list for the temperature
list_ = []

for ville in range(len(dataset_weather)):
    temp1 = list_city[ville]
    
    for day in range(8):
        temp2 = temp1[day]["feels_temp"]
        list_.append(temp2)

dataset_weather2["day_temperature"] = list_

# transformation of the previous result as a list for humidity
list_ = []

for ville in range(len(dataset_weather)):
    humidity1 = list_city[ville]
    
    for day in range(8):
        humidity2 = humidity1[day]["humidity"]
        list_.append(humidity2)

dataset_weather2["humidity"] = list_
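
The two flattening loops above can also be written as a single pass that produces one row per city and day. The sketch below is an alternative under assumptions, not the notebook's method: the helper `flatten_forecasts` and the sample forecasts are made up, and the dictionary keys mirror the `feels_temp` and `humidity` keys used above.

```python
def flatten_forecasts(city_names, forecasts_per_city):
    """Turn one list of daily forecasts per city into one row per (city, day)."""
    rows = []
    for city, daily in zip(city_names, forecasts_per_city):
        for day_number, forecast in enumerate(daily, start=1):
            rows.append({
                "city": city,
                "day": f"day {day_number}",
                "day_temperature": forecast["feels_temp"],
                "humidity": forecast["humidity"],
            })
    return rows


# Made-up forecasts: two cities, two days each (the notebook uses 8 days).
names = ["Paris", "Lyon"]
nested = [
    [{"feels_temp": 18.0, "humidity": 60}, {"feels_temp": 19.5, "humidity": 58}],
    [{"feels_temp": 21.0, "humidity": 50}, {"feels_temp": 22.5, "humidity": 47}],
]
long_rows = flatten_forecasts(names, nested)
```

Because each row carries its own city name and day label, the result does not depend on the row order of another DataFrame; the list of dictionaries can be passed to `pd.DataFrame` as in the geocoding cell.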

In [None]:
# update dataset & export into CSV file
dataset_weather2["latitude"] = dataset_weather2["latitude"].astype(float)
dataset_weather2["longitude"] = dataset_weather2["longitude"].astype(float)
dataset_weather2.to_csv("src/weather_data.csv")