# Group Project: Data-driven Business Manager with APIs

#### Number of points: 30 (worth 30% of the final grade)
#### Deadline to form the groups: October 18th at 12:30 pm CET
#### Deadline for the code submission: October 24th at 01:29 pm CET
#### Presentations on October 24th

## Objective
In this project, you will create a business that uses several APIs to make informed decisions about how to run it locally. Whether you want to sell drinks, street food or whatever floats your boat, find the data you need and design the project accordingly. You will collect data from meteorological and non-meteorological APIs to help your business determine when, where and how much inventory is needed to maximize sales.

## Grading

In total, the group project counts for 30% of the final grade and represents 30 points.

The points are distributed in two parts: the code and the presentation.

- Some items (e.g. statistics) appear in both parts: you will need to compute them in the code **and** present them during the presentation.
- The data must come from an API. You need to use at least 3 APIs (weather + two others). You can of course also use data downloaded from the internet, but it cannot replace the API data.
- Additionally, emphasis will be put on the **Storytelling** and whether or not the choice of APIs, data processing, statistics and visualisations are relevant for your business.
- To make grading easier, please provide **clean code** with **relevant comments** so that it is clear what you are doing.
- Every group member must present on presentation day. **A penalty of -2 points** will be applied to each person **who does not present a significant part** of the presentation.


| **Code** | **15 points** |
| --- | --- |
| A. Collect data from weather API | 3 points |
| B. Collect data from two other non-meteorological APIs | 4 points |
| C. Data cleaning and processing | 3 points |
| D. Compute relevant statistics | 2 points |
| E. Clean and clear visualisations  | 3 points |


| **Presentation** | **15 points** |
| --- | --- |
| 1. Description of unique business idea | 1 point |
| 2. Presentation of all the APIs used and how they serve your business | 3 points |
| 3. Presentation of the data cleaning and processing | 2 points |
| 4. Presentation of the statistics | 2 points |
| 5. Presentation of the visualisations and how they serve the business | 3 points |
| 6. Storytelling | 4 points |

**Penalty: -2 points to each person who does not present a significant part during the presentation.**


**Penalty for unexcused absence or lateness**: 
- If you are absent or late on presentation day without an official excuse, you will receive 0 for the presentation part of the group project.
- If you are late without an official excuse, you will receive 0 for the presentation part of the group project, even if you arrive in time for your team's presentation.

## Getting Ready
#### Recommended deadline: October 8th
#### Deadline: October 18th at 12:30 pm CET

1. Form your group and select a group name. Communicate your group name to the teacher along with the first and last names of all team members.

2. Create a branch on the **Students** repository with your group name (exactly the same as the one communicated to the teacher).

3. Discuss with your group and answer the following questions:

   - What kind of business do we run? What do we sell? The business must be original and unique to your group.
   - What do we name our business?
   - When do we operate? Is it an all-year-round business or a seasonal one? If seasonal, which seasons? Which months / weeks / days / hours of the day do we operate?
   - Where do we operate? In which countries / cities are we currently active? Where do we want to expand in the future? Decide where to set up your business stand based on weather conditions, local attractions or events; the location should maximize customer traffic and sales.

In [16]:
import requests
import pandas as pd
import numpy as np
from geopy.distance import geodesic
import folium
import random

import os
from dotenv import load_dotenv

In [2]:
# Load API keys from a .env file so they are never hard-coded in the notebook
load_dotenv(dotenv_path="./data/.env")

# Access the private Meteostat (RapidAPI) key
private_api_key = os.getenv("METEOSTAT_PRIVATE_KEY")

## Code | A. Collect data from weather API | 3 points
#### Recommended deadline: October 18th
#### Deadline: October 24th at 01:29 pm CET
Use the OpenWeatherMap API to fetch weather data for your chosen location. You can select any city or location for your business.

- Fetch your chosen location's current temperature and weather conditions.
- Fetch the forecasted weather data for the next few days (e.g., five days).
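A minimal sketch of the two requests listed above, assuming OpenWeatherMap's free `data/2.5/weather` (current conditions) and `data/2.5/forecast` (5-day forecast in 3-hour steps) endpoints and a key stored under a hypothetical `OPENWEATHER_API_KEY` variable in the same `.env` file. `fetch_owm` and `daily_forecast_summary` are illustrative helper names, not part of any library; the summary collapses the 3-hourly forecast entries into one record per day.

```python
from collections import defaultdict

import requests

OWM_BASE = "https://api.openweathermap.org/data/2.5"


def fetch_owm(endpoint, lat, lon, api_key, timeout=10):
    """Call an OpenWeatherMap endpoint ('weather' or 'forecast') in metric units."""
    response = requests.get(
        f"{OWM_BASE}/{endpoint}",
        params={"lat": lat, "lon": lon, "appid": api_key, "units": "metric"},
        timeout=timeout,
    )
    response.raise_for_status()  # fail loudly instead of silently continuing
    return response.json()


def daily_forecast_summary(forecast_json):
    """Collapse the 3-hourly 'list' entries into per-day min/max temperature and conditions."""
    days = defaultdict(
        lambda: {"temp_min": float("inf"), "temp_max": float("-inf"), "conditions": set()}
    )
    for entry in forecast_json["list"]:
        day = entry["dt_txt"][:10]  # 'YYYY-MM-DD HH:MM:SS' -> 'YYYY-MM-DD'
        temp = entry["main"]["temp"]
        days[day]["temp_min"] = min(days[day]["temp_min"], temp)
        days[day]["temp_max"] = max(days[day]["temp_max"], temp)
        days[day]["conditions"].add(entry["weather"][0]["main"])
    return dict(days)
```

Usage, assuming the key is set: `fetch_owm("weather", 52.5244, 13.4105, os.getenv("OPENWEATHER_API_KEY"))` returns the current conditions (e.g. `["main"]["temp"]` and `["weather"][0]["description"]`), and passing the `"forecast"` response to `daily_forecast_summary` gives each day's temperature range and the set of weather conditions forecast for it.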

In [3]:
# Define API details
api_url = "https://meteostat.p.rapidapi.com/point/daily"
api_key = private_api_key
headers = {
    'x-rapidapi-host': "meteostat.p.rapidapi.com",
    'x-rapidapi-key': api_key
}

In [4]:
# Parameters: Latitude, Longitude, Altitude, Start and End dates
params = {
    'lat': 52.5244,  # Latitude for Berlin
    'lon': 13.4105,  # Longitude for Berlin
    'alt': 43,       # Altitude in meters
    'start': '2020-01-01',  # Start date
    'end': '2020-12-31'     # End date
}

In [5]:
# Request data from the API
response = requests.get(api_url, headers=headers, params=params, timeout=30)

In [6]:
# Check the response status
if response.status_code == 200:
    weather_data = response.json()
    print("Weather data retrieved successfully!")
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")


Weather data retrieved successfully!


## Code | B. Collect data from two other APIs | 4 points
#### Recommended deadline: October 18th
#### Deadline: October 24th at 01:29 pm CET
Integrate **at least two** of the non-meteorological APIs you have learned about, chosen according to your location and season.

The data must come from an API (not a downloaded dataset).

You can of course use a downloaded dataset in addition to the two APIs you have chosen.

You can also use more APIs, sky is the limit!

You can choose from:
- Google Maps,
- TripAdvisor,
- News API,
- Yelp,
- Wikipedia,
- Booking,
- Amadeus Travel API,
- Foursquare,
- etc. (do your own research and be original!)

Each API can provide different types of information. Pick the ones that best suit your application.


After collecting all the data you need, save it.
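One way to save what you collected is to write each raw API payload to a timestamped JSON file, so the analysis can be rerun later without spending API quota. A sketch using only the standard library; the `./data/raw` folder and both helper names are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_raw_response(payload, name, folder="./data/raw"):
    """Write one API payload to <folder>/<name>_<UTC timestamp>.json and return the path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = out_dir / f"{name}_{stamp}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)
    return path


def load_raw_response(path):
    """Read a saved payload back so the analysis can run without new API calls."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `save_raw_response(weather_data, "meteostat_berlin_2020")` and `save_raw_response(places_data, "foursquare_berlin")` would keep a copy of each response next to the notebook.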

In [36]:
# load foursquare api key
foursquare_api_key = os.getenv("FOUR_SQUARE_API_KEY")

In [37]:
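# Search for places within 1 km of central Berlin
places_url = "https://api.foursquare.com/v3/places/search"
places_params = {
    'll': '52.5200,13.4050',  # Berlin coordinates as "lat,lon"
    'radius': 1000            # search radius in meters
}
fsq_headers = {'Authorization': foursquare_api_key, 'Accept': 'application/json'}
response = requests.get(places_url, headers=fsq_headers, params=places_params, timeout=30)
response.raise_for_status()  # stop here if the request failed
places_data = response.json()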

In [9]:
# Check that the key was loaded, without displaying the secret itself
print("Foursquare key loaded:", foursquare_api_key is not None)

In [10]:
# Load the Google Maps API key
google_api_key = os.getenv("GOOGLE_API_KEY")

In [11]:
origin = '52.5200,13.4050'  # Berlin
destination = '52.5150,13.3950'  # A destination in Berlin
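directions_url = 'https://maps.googleapis.com/maps/api/directions/json'
# Let requests build and URL-encode the query string
directions_params = {'origin': origin, 'destination': destination, 'key': google_api_key}
response = requests.get(directions_url, params=directions_params, timeout=30)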
directions_data = response.json()

In [12]:
# Extract the route
if directions_data['status'] == 'OK':
    routes = directions_data['routes']
    for route in routes:
        print(f"Route summary: {route['summary']}")

Route summary: B2


## Code | C. Data cleaning and processing | 3 points
#### Recommended deadline: from October 18th until October 21st
#### Deadline: October 24th at 01:29 pm CET

In order to make data-driven decisions, you will need to clean the collected data, fill in missing values, merge datasets, etc.

Take some time to clean and process the collected data so that you can use it.

Organize the dataset into a structured format, such as a CSV, HTML or Excel file: a table in which each row represents one observation.
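As a sketch of this export step, assuming the cleaned data is already in a pandas DataFrame, the hypothetical helper below writes the same table as CSV and as an HTML table. Excel output works the same way with `df.to_excel(path, index=False)`, but requires an engine such as `openpyxl` to be installed.

```python
from pathlib import Path

import pandas as pd


def export_table(df, stem, folder="./data"):
    """Save the same DataFrame as CSV and as an HTML table; return both paths."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}.csv"
    html_path = out_dir / f"{stem}.html"
    df.to_csv(csv_path, index=False)    # one row per observation, no index column
    df.to_html(html_path, index=False)  # same table, viewable in a browser
    return csv_path, html_path
```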

In [17]:
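# Function to generate random (simulated) vehicle positions
def generate_random_points(n_points, vehicle_type):
    """
    Generate n_points random positions of the given vehicle type inside a
    bounding box around central Berlin (simulated data, not from an API)
    """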
    lat_min, lat_max = 52.50, 52.55  # Latitude range for Berlin
    lon_min, lon_max = 13.35, 13.45  # Longitude range for Berlin
    
    data = {
        'latitude': np.random.uniform(lat_min, lat_max, n_points),
        'longitude': np.random.uniform(lon_min, lon_max, n_points),
        'vehicle_type': [vehicle_type] * n_points
    }
    
    df = pd.DataFrame(data)
    
    # Introduce some missing values to simulate real-world data
    missing_indices = random.sample(range(n_points), int(n_points * 0.1))  # 10% missing values
    df.loc[missing_indices, 'vehicle_type'] = np.nan  # Missing vehicle type
    df.loc[missing_indices[:int(len(missing_indices)/2)], 'latitude'] = np.nan  # Missing latitude
    
    return df

In [23]:
# Generate 100 random bike points and 100 random scooter points
bike_df = generate_random_points(100, 'bike')
scooter_df = generate_random_points(100, 'scooter')

# Combine both datasets
vehicle_df = pd.concat([bike_df, scooter_df])

# Introduce more randomness (shuffle the data)
vehicle_df = vehicle_df.sample(frac=1).reset_index(drop=True)

# Fill missing vehicle types with 'unknown'
vehicle_df['vehicle_type'] = vehicle_df['vehicle_type'].fillna('unknown')

# Fill missing latitude and longitude with the median of the existing values
vehicle_df['latitude'] = vehicle_df['latitude'].fillna(vehicle_df['latitude'].median())
vehicle_df['longitude'] = vehicle_df['longitude'].fillna(vehicle_df['longitude'].median())

# Save the cleaned data to a CSV file
vehicle_df.to_csv('./data/cleaned_vehicle_data.csv', index=False)

In [24]:
# Create a map centered around Berlin
berlin_map = folium.Map(location=[52.5200, 13.4050], zoom_start=12)

# Add the points to the map
for idx, row in vehicle_df.iterrows():
    # Blue for bikes, red for scooters, gray for vehicles of unknown type
    color = {'bike': 'blue', 'scooter': 'red'}.get(row['vehicle_type'], 'gray')
    folium.Marker([row['latitude'], row['longitude']],
                  popup=f"Type: {row['vehicle_type']}",
                  icon=folium.Icon(color=color)).add_to(berlin_map)

# Save the map to an HTML file
berlin_map.save('./data/berlin_vehicle_map.html')

## Code | D. Compute relevant statistics | 2 points
#### Recommended deadline: from October 18th until October 21st
#### Deadline: October 24th at 01:29 pm CET

Get together as a group and ask yourselves: what business questions would you like to answer? For example:

- On which days are there maximum customer traffic?
- On which days do we expect to make more sales?
- How much inventory should we get? Why?
- Which impact would the weather conditions, local attractions or events have on your business?
- How would you like to develop the business in the future?
    - Do you wish to expand to new locations?
    - Launch a new product?
    - Target older or younger people?
    - Target vegetarians or bookworms?

Compute descriptive statistics that tell you about the future of your business and enable you to answer these business questions.
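For example, if part of your story depends on the weather, a monthly summary of the Meteostat data and the share of each category in your own table are simple descriptive statistics to start from. A sketch assuming the daily columns used in this notebook (`date`, `tavg` in °C, `prcp` in mm, `tsun` in minutes); the 1 mm threshold for a "rainy day" and both helper names are assumptions to adapt to your business.

```python
import pandas as pd


def monthly_weather_summary(weather_df, rain_threshold_mm=1.0):
    """Per-month mean temperature, number of rainy days and total sunshine hours."""
    df = weather_df.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Hypothetical cut-off: a day counts as rainy from rain_threshold_mm of precipitation
    df["rainy"] = df["prcp"] >= rain_threshold_mm
    monthly = df.groupby(df["date"].dt.month).agg(
        tavg_mean=("tavg", "mean"),
        rainy_days=("rainy", "sum"),
        sunshine_hours=("tsun", lambda minutes: minutes.sum() / 60),
    )
    monthly.index.name = "month"
    return monthly


def share_by(df, column):
    """Percentage of rows per value of `column` (e.g. 'vehicle_type' or 'place_category')."""
    return df[column].value_counts(normalize=True).mul(100).round(1)
```

With the tables built later in the notebook, `monthly_weather_summary(weather_df)` and `share_by(vehicle_places_df, "place_category")` would be natural first calls.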


## Code | E. Clean and clear visualisations | 3 points
#### Deadline: October 24th at 01:29 pm CET

Create **at least 3 data visualisations** that clearly state your point and support your decision-making. 
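A sketch of three charts drawn with matplotlib from tables built in the Project section of this notebook: daily average temperature, precipitation per month, and the number of vehicles by nearest place category. The column names follow `weather_df` and `vehicle_places_df`; the output path and the function name are assumptions, and each chart should be swapped for whatever best supports your own decisions.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without a display; remove inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_three_charts(weather_df, vehicle_places_df, path="./data/charts.png"):
    """Draw three panels and save them as a single PNG file; return the path."""
    w = weather_df.copy()
    w["date"] = pd.to_datetime(w["date"])

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # 1. Daily average temperature over the period
    axes[0].plot(w["date"], w["tavg"], color="tab:red")
    axes[0].set_title("Daily average temperature")
    axes[0].set_ylabel("°C")

    # 2. Total precipitation per calendar month
    monthly_rain = w.groupby(w["date"].dt.month)["prcp"].sum()
    axes[1].bar(monthly_rain.index, monthly_rain.values, color="tab:blue")
    axes[1].set_title("Precipitation per month")
    axes[1].set_xlabel("Month")
    axes[1].set_ylabel("mm")

    # 3. How many vehicles are closest to each type of place
    counts = vehicle_places_df["place_category"].value_counts()
    axes[2].barh(counts.index, counts.values, color="tab:green")
    axes[2].set_title("Vehicles by nearest place category")
    axes[2].set_xlabel("Number of vehicles")

    fig.tight_layout()
    fig.savefig(path, dpi=120)
    plt.close(fig)
    return path
```

To show the saved file in the notebook, `IPython.display.Image(path)` displays the PNG inline.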


**Presentation: present each data visualisation and integrate them in your storytelling. Explain why they are relevant for your decision-making.**

## Presentation | 15 points
#### Deadline: October 24th during class

> Make a presentation about your business, the data you've collected and the direction you're taking the business in the next months.

**Presentation | 1. Description of unique business idea | 1 point**

Summarise the name and choice of business as well as location and the time of year it operates (you can add some branding, logo, etc.)


**Presentation | 2. Presentation of all the APIs used and how they serve your business | 3 points**

Present each API and explain why the collected data is relevant for your business.


**Presentation | 3. Presentation of the data cleaning and processing | 2 points**

Explain the steps your team took in order to get to a clean and structured dataset.


**Presentation | 4. Presentation of the statistics & 5. Data visualisations | 5 points**

Display the statistics and relevant data visualisations that helped you make informed decisions about your business. The descriptive statistics and visualisations enable you to draw conclusions that take your business in one or the other direction. You need to explain how this information serves your business and the next steps you will take.

**Presentation | 6. Storytelling | 4 points**

Why did you pick this business idea? Why this name?

Who is your target audience? What problem does it solve?

What decisions did you make to make your business thrive in the future? What are your current challenges? Opportunities?

Can data save your business or help it expand to new territories?


Create a good story!

# Project

In [54]:
# Load the cleaned (simulated) vehicle location data saved in part C
vehicle_df = pd.read_csv('./data/cleaned_vehicle_data.csv')
vehicle_df.head()

| | latitude | longitude | vehicle_type |
| --- | --- | --- | --- |
| 0 | 52.534653 | 13.427584 | scooter |
| 1 | 52.522841 | 13.414398 | unknown |
| 2 | 52.507589 | 13.371446 | scooter |
| 3 | 52.503778 | 13.390401 | bike |
| 4 | 52.524235 | 13.406673 | bike |


In [31]:
# Load weather data to pandas dataframe
weather_df = pd.DataFrame(weather_data['data'])

# Convert 'date' to datetime format
weather_df['date'] = pd.to_datetime(weather_df['date'])

# Show the first few rows of the weather data
weather_df.head()

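| | date | tavg | tmin | tmax | prcp | snow | wdir | wspd | wpgt | pres | tsun |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2020-01-01 | 2.7 | -1.7 | 5.7 | 0.0 | 0.0 | 255.0 | 9.0 | 30.2 | 1033.2 | 90 |
| 1 | 2020-01-02 | 0.9 | -3.8 | 6.7 | 0.0 | 0.0 | 195.0 | 6.1 | 20.2 | 1027.0 | 372 |
| 2 | 2020-01-03 | 4.6 | 1.5 | 7.2 | 4.3 | 0.0 | 213.0 | 15.1 | 54.4 | 1017.7 | 0 |
| 3 | 2020-01-04 | 4.2 | 1.6 | 5.8 | 5.7 | 0.0 | 280.0 | 22.7 | 62.6 | 1020.4 | 0 |
| 4 | 2020-01-05 | 1.8 | -0.6 | 3.8 | 0.0 | 0.0 | 256.0 | 11.5 | 35.3 | 1032.7 | 102 |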
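Per the Meteostat documentation, `tavg`, `tmin` and `tmax` are in °C, `prcp` (precipitation) and `snow` (snow depth) in mm, `wdir` in degrees, `wspd` and `wpgt` (peak gust) in km/h, `pres` in hPa and `tsun` (sunshine) in minutes; some columns can be empty for a given station or day. A minimal cleaning sketch under explicit assumptions: short temperature and pressure gaps are interpolated in time, a missing precipitation or snow value is read as 0 mm, and sunshine gaps are left empty rather than invented. `clean_weather` is a hypothetical helper name.

```python
import pandas as pd


def clean_weather(weather_df):
    """Return a sorted, de-duplicated copy of the daily weather table with gaps filled."""
    df = weather_df.copy()
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").drop_duplicates("date").set_index("date")

    # Assumption: temperature and pressure change smoothly, so fill up to 3 missing days in a row
    smooth_cols = [c for c in ["tavg", "tmin", "tmax", "pres"] if c in df]
    df[smooth_cols] = df[smooth_cols].interpolate(method="time", limit=3)

    # Assumption: no recorded precipitation or snow means 0 mm
    for col in ["prcp", "snow"]:
        if col in df:
            df[col] = df[col].fillna(0.0)

    # 'tsun' and the other columns are left untouched
    return df.reset_index()
```

Calling `weather_df = clean_weather(weather_df)` before merging keeps the later steps working on a table with one row per day.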


In [51]:
# Load places data
# Convert the Foursquare results into a DataFrame (one row per place)
foursquare_data = []
for place in places_data['results']:
    categories = place.get('categories', [])
    foursquare_data.append({
        'fsq_id': place['fsq_id'],
        'name': place['name'],
        # Some places have no category: keep the first one when present
        'category': categories[0]['name'] if categories else 'Uncategorized',
        'latitude': place['geocodes']['main']['latitude'],
        'longitude': place['geocodes']['main']['longitude'],
        # Use .get() to handle missing address fields
        'address': place['location'].get('address', 'No address available'),
        'locality': place['location'].get('locality', 'Unknown locality')
    })

places_df = pd.DataFrame(foursquare_data)
places_df.head()

| | fsq_id | name | category | latitude | longitude | address | locality |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 5c336440a30619002c17d14a | Block House | Steakhouse | 52.520591 | 13.405309 | Karl-Liebknecht-Str. 7 | Berlin |
| 1 | 4d34488a2c76a143335383c7 | Neptunbrunnen | Fountain | 52.519583 | 13.406861 | Karl-Liebknecht-Str. 8 | Berlin |
| 2 | 55509010498e1ed2ce2d90f0 | Waffel oder Becher | Coffee Shop | 52.52111 | 13.403922 | Spandauer Str. 2 | Berlin |
| 3 | 4adcda7ef964a520c54721e3 | Marienkirche | Church | 52.520604 | 13.407174 | Karl-Liebknecht-Str. 8 | Berlin |
| 4 | 4bebeeb16295c9b6b8388808 | Marx-Engels-Forum | Park | 52.518538 | 13.406295 | No address available | Berlin |


In [53]:
# google directions
# Request Directions
directions_url = f'https://maps.googleapis.com/maps/api/directions/json?origin={origin}&destination={destination}&key={google_api_key}'
response = requests.get(directions_url)
directions_data = response.json()

# Extract route and duration
if directions_data['status'] == 'OK':
    route = directions_data['routes'][0]['legs'][0]
    print(f"Route summary: {route['distance']['text']} in {route['duration']['text']}")
else:
    print(f"Error fetching directions: {directions_data['status']}")

Route summary: 1.1 km in 3 mins
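To turn a directions response into a table, the hypothetical helper below flattens the steps of the first route into one row each, using the numeric `distance.value` (metres) and `duration.value` (seconds) fields of the Google Directions response. For a bike or scooter business, the Directions API also accepts a `mode` parameter (`driving`, `walking`, `bicycling` or `transit`); bicycling routes may not be available in every region.

```python
import pandas as pd


def directions_steps(directions_json):
    """Flatten the first route's first leg into one row per step (metres, seconds)."""
    if directions_json.get("status") != "OK":
        raise ValueError(f"Directions request failed: {directions_json.get('status')}")
    leg = directions_json["routes"][0]["legs"][0]
    rows = [
        {
            "instruction": step["html_instructions"],
            "distance_m": step["distance"]["value"],
            "duration_s": step["duration"]["value"],
            "travel_mode": step["travel_mode"],
        }
        for step in leg["steps"]
    ]
    return pd.DataFrame(rows)
```

Applied to `directions_data`, the step distances sum to the leg distance reported above, and the table can be saved alongside the other datasets.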


## Merging datasets

In [55]:
def find_nearest_place(vehicle_lat, vehicle_lon, places_df):
    min_distance = float('inf')
    nearest_place = None
    
    for idx, place in places_df.iterrows():
        place_coords = (place['latitude'], place['longitude'])
        vehicle_coords = (vehicle_lat, vehicle_lon)
        
        # Calculate distance
        distance = geodesic(vehicle_coords, place_coords).meters
        
        if distance < min_distance:
            min_distance = distance
            nearest_place = place
            
    return nearest_place, min_distance

# Adding nearby places and distances to vehicle_df
vehicle_places_data = []
for idx, vehicle in vehicle_df.iterrows():
    nearest_place, distance = find_nearest_place(vehicle['latitude'], vehicle['longitude'], places_df)
    vehicle_places_data.append({
        'latitude': vehicle['latitude'],
        'longitude': vehicle['longitude'],
        'vehicle_type': vehicle['vehicle_type'],
        'nearest_place_name': nearest_place['name'] if nearest_place is not None else None,
        'place_category': nearest_place['category'] if nearest_place is not None else None,
        # Keep the column numeric: use NaN (not a string) when no place was found
        'distance_to_place': distance if nearest_place is not None else np.nan
    })

# Create a DataFrame with the results
vehicle_places_df = pd.DataFrame(vehicle_places_data)
vehicle_places_df.head()

| | latitude | longitude | vehicle_type | nearest_place_name | place_category | distance_to_place |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 52.534653 | 13.427584 | scooter | Wasserkaskaden am Fernsehturm | Fountain | 2078.898631 |
| 1 | 52.522841 | 13.414398 | unknown | Wasserkaskaden am Fernsehturm | Fountain | 513.000014 |
| 2 | 52.507589 | 13.371446 | scooter | Berlin Cathedral (Berliner Dom) | Structure | 2382.388907 |
| 3 | 52.503778 | 13.390401 | bike | Berlin Cathedral (Berliner Dom) | Structure | 1850.674084 |
| 4 | 52.524235 | 13.406673 | bike | Waffel oder Becher | Coffee Shop | 394.690963 |
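The loop above calls `geodesic` once per vehicle-place pair, which becomes slow as both tables grow. A vectorised alternative, sketched below, computes the whole distance matrix at once with NumPy broadcasting and the haversine formula. This swaps in a different technique: haversine treats the Earth as a sphere, so its distances differ slightly (well under 1%) from geopy's ellipsoidal `geodesic`. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; haversine assumes a spherical Earth


def nearest_places_haversine(vehicles, places):
    """For every vehicle, return the closest place's name, category and distance in metres."""
    lat1 = np.radians(vehicles["latitude"].to_numpy())[:, None]   # shape (n_vehicles, 1)
    lon1 = np.radians(vehicles["longitude"].to_numpy())[:, None]
    lat2 = np.radians(places["latitude"].to_numpy())[None, :]     # shape (1, n_places)
    lon2 = np.radians(places["longitude"].to_numpy())[None, :]

    # Haversine formula, broadcast to an (n_vehicles, n_places) distance matrix
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

    nearest = dist.argmin(axis=1)  # column index of the closest place for each vehicle
    return pd.DataFrame(
        {
            "nearest_place_name": places["name"].to_numpy()[nearest],
            "place_category": places["category"].to_numpy()[nearest],
            "distance_to_place": dist[np.arange(len(nearest)), nearest],
        },
        index=vehicles.index,
    )
```

`vehicle_places_df = vehicle_df.join(nearest_places_haversine(vehicle_df, places_df))` would produce the same columns as the loop. The distance matrix holds one float per vehicle-place pair, which is negligible for a few hundred rows.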


In [56]:
# Use the weather of a single reference day for all vehicles.
# Note: the first row is 2020-01-01 (the start of the requested period), not today.
reference_weather = weather_df.iloc[0]

# Add the reference day's weather to each vehicle in the dataset
for col in ['tavg', 'tmin', 'tmax', 'prcp', 'wspd', 'wpgt', 'tsun']:
    vehicle_places_df[col] = reference_weather[col]

In [58]:
vehicle_places_df.head()

| | latitude | longitude | vehicle_type | nearest_place_name | place_category | distance_to_place | tavg | tmin | tmax | prcp | wspd | wpgt | tsun |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 52.534653 | 13.427584 | scooter | Wasserkaskaden am Fernsehturm | Fountain | 2078.898631 | 2.7 | -1.7 | 5.7 | 0.0 | 9.0 | 30.2 | 90 |
| 1 | 52.522841 | 13.414398 | unknown | Wasserkaskaden am Fernsehturm | Fountain | 513.000014 | 2.7 | -1.7 | 5.7 | 0.0 | 9.0 | 30.2 | 90 |
| 2 | 52.507589 | 13.371446 | scooter | Berlin Cathedral (Berliner Dom) | Structure | 2382.388907 | 2.7 | -1.7 | 5.7 | 0.0 | 9.0 | 30.2 | 90 |
| 3 | 52.503778 | 13.390401 | bike | Berlin Cathedral (Berliner Dom) | Structure | 1850.674084 | 2.7 | -1.7 | 5.7 | 0.0 | 9.0 | 30.2 | 90 |
| 4 | 52.524235 | 13.406673 | bike | Waffel oder Becher | Coffee Shop | 394.690963 | 2.7 | -1.7 | 5.7 | 0.0 | 9.0 | 30.2 | 90 |
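The cell above gives every vehicle the same day's weather. If your own observations carry a date (the simulated vehicle data here has none, so the `date` column below is a hypothetical addition), a left merge on the calendar day attaches each observation to the weather of its own day. A sketch, with `attach_daily_weather` as an illustrative name:

```python
import pandas as pd

WEATHER_COLS = ["tavg", "tmin", "tmax", "prcp", "wspd", "wpgt", "tsun"]


def attach_daily_weather(observations, weather_df, cols=WEATHER_COLS):
    """Left-join the weather row of the same calendar day onto each observation."""
    obs = observations.copy()
    weather = weather_df[["date", *cols]].copy()
    # Normalise both sides to midnight so timestamps within a day match the daily row
    obs["date"] = pd.to_datetime(obs["date"]).dt.normalize()
    weather["date"] = pd.to_datetime(weather["date"]).dt.normalize()
    return obs.merge(weather, on="date", how="left", validate="many_to_one")
```

`validate="many_to_one"` makes pandas raise an error if the weather table contains the same date twice, and observations outside the downloaded period keep their row with empty weather columns instead of being dropped.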


In [57]:
# Save the final DataFrame to a CSV file
vehicle_places_df.to_csv('./data/final_vehicle_places_weather_data.csv', index=False)