# Group Project: Data-driven Business Manager with APIs

#### Number of points: 30 (worth 30% of the final grade)
#### Deadline to form the groups: October 18th at 12:30 pm CET
#### Deadline for the code submission: October 24th at 01:29 pm CET
#### Presentations on October 24th

## Objective
In this project, you will create a business that uses various APIs to make informed decisions about running it locally. Whether you want to sell drinks, street food or whatever floats your boat, find your data and design the project accordingly. You will collect data from meteorological and non-meteorological APIs to help your business determine when, where and how much inventory is needed to maximize sales.

## Grading

In total, the group project counts for 30% of the final grade and represents 30 points.

The points are distributed in two parts: the code and the presentation.

- Some items, e.g. Statistics, appear in both parts: you must compute them in the code **and** present them during the presentation.
- The data must come from an API. You need to use at least 3 APIs (weather + two others). You can, of course, also use data downloaded from the internet, but it cannot replace the API data.
- Additionally, emphasis will be put on **Storytelling**: are the choice of APIs, data processing, statistics and visualisations relevant for your business?
- To make grading easier, please provide **clean code** with **relevant comments** that make it clear what you are doing.
- Every group member must present on presentation day. **A penalty of -2 points** will be applied to each person **who does not present a significant part** of the presentation.


| **Code** | **15 points** |
| --- | --- |
| A. Collect data from weather API | 3 points |
| B. Collect data from two other non-meteorological APIs | 4 points |
| C. Data cleaning and processing | 3 points |
| D. Compute relevant statistics | 2 points |
| E. Clean and clear visualisations  | 3 points |


| **Presentation** | **15 points** |
| --- | --- |
| 1. Description of unique business idea | 1 point |
| 2. Presentation of all the APIs used and how they serve your business | 3 points |
| 3. Presentation of the data cleaning and processing | 2 points |
| 4. Presentation of the statistics | 2 points |
| 5. Presentation of the visualisations and how they serve the business | 3 points |
| 6. Storytelling | 4 points |

**Penalty: -2 points to each person who does not present a significant part during the presentation.**


**Penalty for unexcused absence or lateness**:
- If you are absent or late on presentation day without an official excuse, you will receive 0 for the presentation part of the group project.
- This applies even if you arrive late but still in time for your team's presentation.

## Getting Ready
#### Recommended deadline: October 8th
#### Deadline: October 18th at 12:30 pm CET

1. Form your group and choose a group name. Send your group name to the teacher along with the first and last names of all team members.

2. Create a branch on the **Students** repository with your group name (exactly the same as the one communicated to the teacher).

3. Discuss with your group and answer the following questions:

   - What kind of business do we run? What do we sell? The choice of business must be original and unique to your group.
   - What do we name our business?
   - When do we operate? Is it a year-round or a seasonal business? If seasonal, which seasons? In which months / weeks / days / hours of the day do we operate?
   - Where do we operate? In which countries / cities are we currently active? Where do we want to expand in the future? Decide where to set up your business stand based on weather conditions, local attractions or events; the location should maximize customer traffic and sales.

## Code | A. Collect data from weather API | 3 points
#### Recommended deadline: October 18th
#### Deadline: October 24th at 01:29 pm CET
Use the OpenWeatherMap API to fetch weather data for your chosen location. You can select any city or location for your business.

- Fetch your chosen location's current temperature and weather conditions.
- Fetch the forecasted weather data for the next few days (e.g., five days).
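The 5-day forecast endpoint returns one entry every 3 hours, so a per-day summary is often easier to present. The following is a minimal sketch, not the project's actual code: it assumes the response layout that `get_weather_forecast` reads in the Task A cell (a `list` of entries with `dt_txt`, `main.temp` and `weather[0].description`); the helper `daily_summary` and the sample payload are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def daily_summary(forecast_json):
    """Collapse 3-hourly forecast entries into one (mean temperature, description) pair per day."""
    temps = defaultdict(list)
    descriptions = defaultdict(list)
    for entry in forecast_json.get("list", []):
        day = entry["dt_txt"][:10]  # "YYYY-MM-DD HH:MM:SS" (UTC) -> "YYYY-MM-DD"
        temps[day].append(entry["main"]["temp"])
        descriptions[day].append(entry["weather"][0]["description"])

    summary = {}
    for day in sorted(temps):
        # Most frequent description of the day (ties go to the one seen first)
        top_description = Counter(descriptions[day]).most_common(1)[0][0]
        summary[day] = (round(mean(temps[day]), 2), top_description)
    return summary


# Hypothetical sample shaped like an OpenWeatherMap /forecast response
sample = {
    "list": [
        {"dt_txt": "2024-10-18 12:00:00", "main": {"temp": 16.0}, "weather": [{"description": "overcast clouds"}]},
        {"dt_txt": "2024-10-18 15:00:00", "main": {"temp": 17.0}, "weather": [{"description": "overcast clouds"}]},
        {"dt_txt": "2024-10-19 12:00:00", "main": {"temp": 18.2}, "weather": [{"description": "clear sky"}]},
    ]
}

for day, (temp, description) in daily_summary(sample).items():
    print(f"{day}: {temp}°C, {description}")
# 2024-10-18: 16.5°C, overcast clouds
# 2024-10-19: 18.2°C, clear sky
```

Averaging is only one choice; for a business that runs in the evening, keeping the entry closest to opening hours may be more meaningful.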

In [3]:
!pip install requests
!pip install googlemaps
import time
from datetime import datetime, timedelta

import pandas as pd
import requests




In [92]:
# Done later, in the Task A cell, after importing the Ticketmaster events (to get more precise insights)




Current Temperature in Berlin: 16.83°C
Current Weather Conditions: clear sky

5-day Forecast:
2024-10-18: 16.53°C, overcast clouds
2024-10-19: 18.2°C, overcast clouds
2024-10-20: 17.46°C, overcast clouds
2024-10-21: 17.43°C, overcast clouds
2024-10-22: 16.59°C, clear sky


## Code | B. Collect data from two other APIs | 4 points
#### Recommended deadline: October 18th
#### Deadline: October 24th at 01:29 pm CET
Integrate **at least two** of the non-meteorological APIs you've learned about, chosen according to your location and season.

The data must come from an API (not a downloaded dataset).

You can, of course, use a downloaded dataset on top of the two APIs you've chosen.

You can also use more APIs, sky is the limit!

You can choose from:
- Google Maps,
- TripAdvisor,
- News API,
- Yelp,
- Wikipedia,
- Booking,
- Amadeus Travel API,
- Foursquare,
- etc. (do your own research and be original!)

Each API can provide different types of information. Pick the ones that best suit your application.


After collecting all the data you need, save it.
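A minimal sketch of saving collected records with Python's standard `csv` module, assuming the data is a list of dictionaries like `event_data` in the Ticketmaster cell; the helper names, sample records and file name are hypothetical. When the data is already in a DataFrame, `df.to_csv("events.csv", index=False)` does the same job.

```python
import csv
import os
import tempfile


def save_records(records, path):
    """Write a list of dictionaries (one per row) to a CSV file."""
    if not records:
        raise ValueError("no records to save")
    fieldnames = list(records[0].keys())  # column order taken from the first record
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def load_records(path):
    """Read the CSV back as a list of dictionaries (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical records shaped like the entries of event_data
events = [
    {"Event Name": "Example Concert", "Venue City": "Berlin", "Venue Latitude": 52.5},
    {"Event Name": "Another Show", "Venue City": "Hamburg", "Venue Latitude": 53.55},
]
path = os.path.join(tempfile.gettempdir(), "events_example.csv")
save_records(events, path)
rows = load_records(path)
print(rows[0]["Venue Latitude"])  # 52.5 (read back as the string '52.5')
```

Reading the file back returns every value as a string, so numeric columns such as coordinates need to be converted again before they are compared or mapped.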

In [4]:
# Ticketmaster API (via the ticketpy client) and folium for maps
!pip install ticketpy
!pip install folium
import folium
import ticketpy



In [5]:
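# Insert your Ticketmaster API key (left empty here)
tm_client = ticketpy.ApiClient('')

# Music events in Germany between October 17 and November 17, 2024 (UTC);
# .all() fetches every result page and returns one flat list of events
pages = tm_client.events.find(
    country_code='DE',
    classification_name='music',
    start_date_time='2024-10-17T20:00:00Z',
    end_date_time='2024-11-17T20:00:00Z'
).all()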


In [7]:
# Create a list of dictionaries (one per event) from the API data
event_data = []

# Loop through all returned events
for event in pages:
    # Skip events without venue information
    if not event.venues:
        continue
    venue = event.venues[0]  # use the first venue

    # Extract the genre from the first classification, if available
    genre = None
    if event.classifications:
        classification = event.classifications[0]
        if getattr(classification, 'genre', None):
            genre = classification.genre.name

    # Create a dictionary for the current event
    event_dict = {
        'Event Name': event.name,
        'Event Date': event.utc_datetime,
        'Venue Name': venue.name,
        'Venue Address': venue.address,
        'Venue Postalcode': venue.postal_code,
        'Venue Latitude': venue.latitude,
        'Venue Longitude': venue.longitude,
        'Venue City': venue.city,
        'Box Office': venue.dmas,  # note: DMA (Designated Market Area) data, not box-office info; dropped later
        'Event Genre': genre
    }

    # Add the event dictionary to the list
    event_data.append(event_dict)

# List of all venue addresses
venue_list = [item['Venue Address'] for item in event_data]



In [62]:
# Check data structure
for event in pages:
    print(f"Event: {event.name}")
    print(event.utc_datetime)
    # Access the venue object
    venue = event.venues[0]

    print(venue.address)
    print(venue.latitude)
    print(venue.longitude)

Event: ALBA BERLIN - Fenerbahce Istanbul | Box seat in the Ticketmaster Suite
2024-10-17 17:30:00
Uber-Platz 1
52.50573
13.44314
Event: Das perfekte Geheimnis
2024-10-17 17:30:00
Marlene-Dietrich-Platz 1
52.5076
13.37386
Event: Eisbären Berlin - Adler Mannheim | Logen-Seat
2024-10-18 17:30:00
Uber-Platz 1
52.50573
13.44314
Event: Das perfekte Geheimnis
2024-10-18 17:30:00
Marlene-Dietrich-Platz 1
52.5076
13.37386
Event: Baby Smith - EP II Tour 2024
2024-10-18 18:00:00
Am Wriezener Bahnhof
52.51098
13.44159
Event: Coults - F*CK COULTS EU TOUR
2024-10-18 18:00:00
Skalitzer Str. 134
52.49927
13.4205
Event: Das perfekte Geheimnis
2024-10-19 17:30:00
Marlene-Dietrich-Platz 1
52.5076
13.37386
Event: Lindsey Stirling - The Duality Tour | Box-Seat
2024-10-19 18:00:00
Uber-Platz 1
52.50573
13.44314
Event: Lindsey Stirling - The Duality Tour | All-In Package
2024-10-19 18:00:00
Uber-Platz 1
52.50573
13.44314
Event: Lindsey Stirling - The Duality Tour | Premium Seat
2024-10-19 18:00:00
Uber-Platz

In [103]:
import folium
import pandas as pd

In [148]:
# Build one DataFrame from the list of event dictionaries
df_all = pd.DataFrame(event_data)

# Drop columns we do not need ('Box Office' actually holds DMA data)
df_all = df_all.drop(columns=['Venue Name', 'Box Office'])

# Remove rows with missing values
df_clean_ticketmaster = df_all.dropna().copy()

# Coordinates may arrive as strings: convert them to floats (invalid values become NaN)
for col in ['Venue Latitude', 'Venue Longitude']:
    df_clean_ticketmaster[col] = pd.to_numeric(df_clean_ticketmaster[col], errors='coerce')

# Remove venues with invalid or placeholder (0, 0) coordinates
df_clean_ticketmaster = df_clean_ticketmaster[
    df_clean_ticketmaster['Venue Latitude'].notna()
    & df_clean_ticketmaster['Venue Longitude'].notna()
    & (df_clean_ticketmaster['Venue Latitude'] != 0)
    & (df_clean_ticketmaster['Venue Longitude'] != 0)
]

# Sort chronologically and keep only one event per date and venue
df_sorted_tmaster = df_clean_ticketmaster.sort_values(by='Event Date', ascending=True, ignore_index=True)
df_sorted_tmaster = df_sorted_tmaster.drop_duplicates(subset=['Event Date', 'Venue Address'], keep='first', ignore_index=True)

df_sorted_tmaster





| | Event Name | Event Date | Venue Address | Venue Postalcode | Venue Latitude | Venue Longitude | Venue City | Event Genre |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Baby Smith - EP II Tour 2024 | 2024-10-18 18:00:00 | Am Wriezener Bahnhof | 10243 | 52.51098 | 13.44159 | Berlin | Alternative |
| 1 | Bob Dylan - Rough and Rowdy Ways | 2024-10-18 18:00:00 | Pfaffenwiese 301 | 65929 | 50.09914 | 8.51972 | Frankfurt am Main | Rock |
| 2 | Coults - F\*CK COULTS EU TOUR | 2024-10-18 18:00:00 | Skalitzer Str. 134 | 10999 | 52.49927 | 13.4205 | Berlin | Hip-Hop/Rap |
| 3 | MEUTE | 2024-10-18 18:00:00 | Albersloher Weg 32 | 48155 | 51.94799 | 7.63785 | Münster | Dance/Electronic |
| 4 | LUNA 25/8 Tour 2024 | 2024-10-18 18:00:00 | Quellenstraße 7 | 70376 | 48.80811 | 9.20116 | Stuttgart | Rock |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 127 | Jacob Collier - DJESSE UK & EUROPE TOUR \| VIP ... | 2024-11-16 19:00:00 | Kurt-Emmerich-Platz 10 | 21109 | 53.49572 | 10.00251 | Hamburg | Rock |
| 128 | Elderbrook - Another Touch – European Tour 2024 | 2024-11-16 19:00:00 | Lichtstraße 30 | 50825 | 50.94918 | 6.91012 | Cologne | Dance/Electronic |
| 129 | Bob Vylan | 2024-11-17 19:00:00 | Luxemburger Straße 40 | 50674 | 50.89238 | 7.06362 | Cologne | Rock |
| 130 | Joey Valence & Brae - The NO HANDS Tour | 2024-11-17 19:00:00 | Bartholomäus-Schink-Straße 65/67 | 50825 | 50.95181 | 6.91637 | Cologne | Hip-Hop/Rap |


In [94]:
cities_planned = df_clean_ticketmaster['Venue City'].unique()
cities_planned = cities_planned.tolist()
print(cities_planned)


['Berlin', 'Cloppenburg', 'Hamburg', 'Frankfurt am Main', 'Düsseldorf', 'Cologne', 'Munich', 'Lübeck', 'Wacken', 'Diepholz', 'Offenbach am Main', 'Nürnberg', 'Stuttgart', 'Halle (Westfalen)', 'Chemnitz', 'Oldenburg', 'Kempten', 'Bochum']


In [91]:

germany_map = folium.Map(location=[51.1657, 10.4515], zoom_start=6)  # Zoom level for Germany

#Loop through the filtered DataFrame `df_clean_ticketmaster` and add markers
for _, row in df_clean_ticketmaster.iterrows():
    # Extract relevant details from each row
    venue_lat = row['Venue Latitude']
    venue_lon = row['Venue Longitude']
    event_name = row['Event Name']
    event_date = row['Event Date']
    
    # Create a popup with event information (name and date)
    popup_content = f"{event_name} at {event_date}"

    #Add the marker to the map
    folium.Marker(
        location=[venue_lat, venue_lon],
        popup=folium.Popup(popup_content, max_width=300),
        tooltip=event_name
    ).add_to(germany_map)

germany_map.save("germany_events_map.html")

In [117]:
## Task A: get weather forecasts from the OpenWeatherMap API

forecast_url = "http://api.openweathermap.org/data/2.5/forecast"
api_key = ""  # insert your OpenWeatherMap API key

# Fetch the 5-day / 3-hour forecast and return the entry closest to the event time
def get_weather_forecast(lat, lon, event_date, api_key):
    params = {
        'lat': lat,
        'lon': lon,
        'appid': api_key,
        'units': 'metric'  # temperatures in °C
    }
    response = requests.get(forecast_url, params=params, timeout=10).json()

    # Stop if the API did not return a forecast list (e.g. invalid key)
    if 'list' not in response:
        return None

    # Forecast times ('dt_txt') are in UTC, like the Ticketmaster event dates
    def time_gap(forecast):
        forecast_time = datetime.strptime(forecast['dt_txt'], "%Y-%m-%d %H:%M:%S")
        return abs((forecast_time - event_date).total_seconds())

    # Pick the forecast closest to the event; accept it only within +/- 3 hours
    closest = min(response['list'], key=time_gap, default=None)
    if closest is None or time_gap(closest) >= 3 * 3600:
        return None

    weather_description = closest['weather'][0]['description']
    temp = closest['main']['temp']
    return f"{temp}°C, {weather_description}"

# Add a new 'Weather Forecast' column
df_sorted_tmaster['Weather Forecast'] = None

# Current date/time
current_date = datetime.now()

# Loop through each event in the DataFrame
for index, row in df_sorted_tmaster.iterrows():
    event_date = row['Event Date']

    # The forecast endpoint only covers the next 5 days
    if 0 <= (event_date - current_date).days <= 5:
        forecast = get_weather_forecast(row['Venue Latitude'], row['Venue Longitude'], event_date, api_key)
        df_sorted_tmaster.at[index, 'Weather Forecast'] = forecast
    else:
        df_sorted_tmaster.at[index, 'Weather Forecast'] = "No forecast available (event outside the 5-day forecast window)"


In [123]:
df_sorted_tmaster[:23]
# TODO: split 'Weather Forecast' into a numeric temperature and a text description


| | Event Name | Event Date | Venue Address | Venue Postalcode | Venue Latitude | Venue Longitude | Venue City |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Baby Smith - EP II Tour 2024 | 2024-10-18 18:00:00 | Am Wriezener Bahnhof | 10243 | 52.51098 | 13.44159 | Berlin |
| 1 | Bob Dylan - Rough and Rowdy Ways | 2024-10-18 18:00:00 | Pfaffenwiese 301 | 65929 | 50.09914 | 8.51972 | Frankfurt am Main |
| 2 | Coults - F\*CK COULTS EU TOUR | 2024-10-18 18:00:00 | Skalitzer Str. 134 | 10999 | 52.49927 | 13.4205 | Berlin |
| 3 | MEUTE | 2024-10-18 18:00:00 | Albersloher Weg 32 | 48155 | 51.94799 | 7.63785 | Münster |
| 4 | LUNA 25/8 Tour 2024 | 2024-10-18 18:00:00 | Quellenstraße 7 | 70376 | 48.80811 | 9.20116 | Stuttgart |
| 5 | Jazmin Bean - Traumatic Livelihood World Tour | 2024-10-18 18:00:00 | Luxemburger Straße 40 | 50674 | 50.89238 | 7.06362 | Cologne |
| 6 | Beartooth | 2024-10-18 18:00:00 | Lilienthalallee 29 | 80939 | 48.1947 | 11.60784 | Munich |
| 7 | The Driver Era: X Girlfriend Tour | 2024-10-18 18:00:00 | Atelierstraße 24 | 81671 | 48.12444 | 11.60731 | Munich |
| 8 | Lindsey Stirling - The Duality Tour \| Box seat | 2024-10-18 18:00:00 | Hellgrundweg 44 | 22525 | 53.58894 | 9.899 | Hamburg |
| 9 | Voodoo Jürgens | 2024-10-18 19:00:00 | Neuer Kamp 30 | 20357 | 53.55789 | 9.9679 | Hamburg |
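The comment in the previous cell plans to split the 'Weather Forecast' strings into a temperature and a description. One way to do this is a regular expression; the sketch below assumes the `"16.53°C, overcast clouds"` format that `get_weather_forecast` returns, and the helper name `split_forecast` is hypothetical.

```python
import re

# Matches strings such as "16.53°C, overcast clouds" (assumed format of the forecast column)
FORECAST_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)°C,\s*(.+?)\s*$")


def split_forecast(text):
    """Return (temperature as float, description), or (None, None) if the text does not match."""
    if not isinstance(text, str):
        return None, None  # e.g. None when no forecast was found
    match = FORECAST_PATTERN.match(text)
    if match is None:
        return None, None  # e.g. "No forecast available (...)"
    return float(match.group(1)), match.group(2)


print(split_forecast("16.53°C, overcast clouds"))  # (16.53, 'overcast clouds')
print(split_forecast("No forecast available"))     # (None, None)
print(split_forecast(None))                         # (None, None)
```

Applied to the notebook's DataFrame, a list comprehension such as `[split_forecast(s)[0] for s in df_sorted_tmaster['Weather Forecast']]` would give the temperature column; rows without a forecast end up as `None`.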


In [127]:
# Google Maps API
import googlemaps

# Only the first 5 events are processed (presumably to limit the number of Google API calls);
# .copy() makes this an independent DataFrame and avoids SettingWithCopyWarning
df_sorted_tmaster = df_sorted_tmaster[:5].copy()
api_key_google = ""  # insert your Google Maps API key
gmaps = googlemaps.Client(key=api_key_google)

# Find an airport within 50 km using the Places Nearby Search.
# Note: results are ranked by prominence, not distance, so the first hit is not
# necessarily the nearest or a commercial airport (see the Berlin rows below).
def find_nearest_airport(lat, lon):
    places_result = gmaps.places_nearby(
        location=(lat, lon),
        radius=50000,  # 50 km search radius
        type='airport'
    )
    if places_result['results']:
        airport = places_result['results'][0]
        location = airport['geometry']['location']
        return airport['name'], location['lat'], location['lng']
    return None, None, None

# Distance (km) and travel time (min) between two points via the Distance Matrix API
def calculate_distance_matrix(origins, destinations, mode='driving', departure_time=None):
    distance_matrix = gmaps.distance_matrix(
        origins,
        destinations,
        mode=mode,
        departure_time=departure_time  # must be now or in the future
    )
    element = distance_matrix['rows'][0]['elements'][0]
    if element['status'] == 'OK':
        distance_km = element['distance']['value'] / 1000  # metres -> kilometres
        duration_minutes = element['duration']['value'] / 60  # seconds -> minutes
        return distance_km, duration_minutes
    return None, None

# Add the distance and travel time columns
df_sorted_tmaster['Nearest Airport'] = None
df_sorted_tmaster['Distance to Airport (km)'] = None
df_sorted_tmaster['Time to Airport (min)'] = None

# Loop through each event
for index, row in df_sorted_tmaster.iterrows():
    venue_lat = row['Venue Latitude']
    venue_lon = row['Venue Longitude']

    # Find the nearest airport using the Google Places API
    airport_name, airport_lat, airport_lon = find_nearest_airport(venue_lat, venue_lon)

    if airport_name:
        # Event dates are naive UTC timestamps; pd.Timestamp.timestamp() treats naive values as UTC
        event_date = pd.to_datetime(row['Event Date'])
        departure_time = int(event_date.timestamp())

        # Distance and driving time from the venue to the airport at the event time
        airport_distance_km, airport_duration_minutes = calculate_distance_matrix(
            (venue_lat, venue_lon), (airport_lat, airport_lon),
            mode='driving', departure_time=departure_time
        )

        df_sorted_tmaster.at[index, 'Nearest Airport'] = airport_name
        df_sorted_tmaster.at[index, 'Distance to Airport (km)'] = airport_distance_km
        df_sorted_tmaster.at[index, 'Time to Airport (min)'] = airport_duration_minutes

# Display the updated DataFrame
df_sorted_tmaster


                         Event Name          Event Date         Venue Address  \
0      Baby Smith - EP II Tour 2024 2024-10-18 18:00:00  Am Wriezener Bahnhof   
1  Bob Dylan - Rough and Rowdy Ways 2024-10-18 18:00:00      Pfaffenwiese 301   
2      Coults - F*CK COULTS EU TOUR 2024-10-18 18:00:00    Skalitzer Str. 134   
3                             MEUTE 2024-10-18 18:00:00    Albersloher Weg 32   
4               LUNA 25/8 Tour 2024 2024-10-18 18:00:00       Quellenstraße 7   

  Venue Postalcode Venue Latitude Venue Longitude         Venue City  \
0            10243       52.51098        13.44159             Berlin   
1            65929       50.09914         8.51972  Frankfurt am Main   
2            10999       52.49927         13.4205             Berlin   
3            48155       51.94799         7.63785            Münster   
4            70376       48.80811         9.20116          Stuttgart   

  Distance to Train Station (km) Time to Train Station (min)  \
0               

In [128]:
df_sorted_tmaster
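# TODO: fix the Berlin airport (the Places search returned 'BVM Ragow e.V.' instead of Berlin Brandenburg Airport)
# TODO: calculate the pickup time: Event Date - (Time to Airport + 2 hours)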

| | Event Name | Event Date | Venue Address | Venue Postalcode | Venue Latitude | Venue Longitude | Venue City | Distance to Train Station (km) | Time to Train Station (min) | Nearest Airport | Distance to Airport (km) | Time to Airport (min) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Baby Smith - EP II Tour 2024 | 2024-10-18 18:00:00 | Am Wriezener Bahnhof | 10243 | 52.51098 | 13.44159 | Berlin | | | BVM Ragow e.V. | 33.584 | 41.266667 |
| 1 | Bob Dylan - Rough and Rowdy Ways | 2024-10-18 18:00:00 | Pfaffenwiese 301 | 65929 | 50.09914 | 8.51972 | Frankfurt am Main | | | Frankfurt Airport | 17.952 | 24.2 |
| 2 | Coults - F\*CK COULTS EU TOUR | 2024-10-18 18:00:00 | Skalitzer Str. 134 | 10999 | 52.49927 | 13.4205 | Berlin | | | BVM Ragow e.V. | 30.566 | 39.033333 |
| 3 | MEUTE | 2024-10-18 18:00:00 | Albersloher Weg 32 | 48155 | 51.94799 | 7.63785 | Münster | | | Münster Osnabrück International Airport | 35.78 | 27.2 |
| 4 | LUNA 25/8 Tour 2024 | 2024-10-18 18:00:00 | Quellenstraße 7 | 70376 | 48.80811 | 9.20116 | Stuttgart | | | Stuttgart Airport | 31.355 | 39.516667 |
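The comment in the `In [128]` cell defines the pickup time as Event Date - (Time to Airport + 2 hours). A minimal sketch of that formula with the standard `datetime` module; the helper name `pickup_time` and the `buffer_hours` parameter are hypothetical, and because the Ticketmaster dates are in UTC, the result is in UTC as well.

```python
from datetime import datetime, timedelta


def pickup_time(event_date, minutes_to_airport, buffer_hours=2):
    """Pickup time = Event Date - (Time to Airport + buffer), as planned in the notebook comment."""
    if minutes_to_airport is None:
        return None  # no route was found for this venue
    return event_date - (timedelta(minutes=minutes_to_airport) + timedelta(hours=buffer_hours))


# Row 1 of the table above: event at 18:00 (UTC), 24.2 minutes to Frankfurt Airport
print(pickup_time(datetime(2024, 10, 18, 18, 0), 24.2))  # 2024-10-18 15:35:48
```

Keeping the 2-hour buffer as a parameter makes it easy to test other margins before settling on one.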


## Code | C. Data cleaning and processing | 3 points
#### Recommended deadline: from October 18th until October 21st
#### Deadline: October 24th at 01:29 pm CET

In order to make data-driven decisions, you will need to clean the collected data, fill missing values, merge datasets etc.

Take some time to clean and process the collected data so that you can use it.

Organize the dataset in a structured format, such as a CSV, HTML or Excel file: a table in which each row represents one record.
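Merging usually means joining one table with another source on shared keys and filling whatever does not match. The sketch below illustrates a left join of events with weather descriptions on (city, day) using plain Python; the function name, the `weather_by_city_day` layout and the sample values are hypothetical. With pandas, the equivalent would be `DataFrame.merge(..., how='left')` followed by `fillna(...)`.

```python
from datetime import datetime


def merge_events_with_weather(events, weather_by_city_day, default="no forecast"):
    """Left-join events with weather on (city, day); unmatched events get `default`."""
    merged = []
    for event in events:
        key = (event["Venue City"], event["Event Date"].strftime("%Y-%m-%d"))
        row = dict(event)  # copy, so the input records stay unchanged
        row["Weather"] = weather_by_city_day.get(key, default)  # fill missing values
        merged.append(row)
    return merged


# Hypothetical inputs
events = [
    {"Event Name": "Show A", "Venue City": "Berlin", "Event Date": datetime(2024, 10, 18, 18, 0)},
    {"Event Name": "Show B", "Venue City": "Munich", "Event Date": datetime(2024, 11, 16, 19, 0)},
]
weather = {("Berlin", "2024-10-18"): "overcast clouds"}

for row in merge_events_with_weather(events, weather):
    print(row["Event Name"], "->", row["Weather"])
# Show A -> overcast clouds
# Show B -> no forecast
```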

## Code | D. Compute relevant statistics | 2 points
#### Recommended deadline: from October 18th until October 21st
#### Deadline: October 24th at 01:29 pm CET

Get together as a group and ask yourselves: what business questions would you like to answer? For example:

- On which days are there maximum customer traffic?
- On which days do we expect to make more sales?
- How much inventory should we get? Why?
- What impact would the weather conditions, local attractions or events have on your business?
- How would you like to develop the business in the future?
    - Do you wish to expand to new locations?
    - Launch a new product?
    - Target older or younger people?
    - Target vegetarians or bookworms?

Compute descriptive statistics that inform you about the future of your business and enable you to answer the business questions.
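As one example of such a statistic, the number of events per day can serve as a rough proxy for customer traffic (an assumption; your own traffic measure may differ). The sketch below uses only the standard library; the helper names and the dates are hypothetical. In pandas, `df['Event Date'].dt.date.value_counts()` and `DataFrame.describe()` give similar results.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, median


def events_per_day(event_dates):
    """Count events per calendar day (used here as a rough proxy for customer traffic)."""
    return Counter(d.strftime("%Y-%m-%d") for d in event_dates)


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical event dates
dates = [
    datetime(2024, 10, 18, 18), datetime(2024, 10, 18, 19), datetime(2024, 10, 18, 20),
    datetime(2024, 10, 19, 18),
    datetime(2024, 10, 20, 19), datetime(2024, 10, 20, 20),
]
counts = events_per_day(dates)
print(counts.most_common(2))  # [('2024-10-18', 3), ('2024-10-20', 2)]
print(describe(list(counts.values())))
```

The busiest days are candidates for larger inventory; the spread between minimum and maximum hints at how much stock levels need to vary.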


## Code | E. Clean and clear visualisations  | 3 points
#### Deadline: October 24th at 01:29 pm CET

Create **at least 3 data visualisations** that clearly state your point and support your decision-making. 


**Presentation: present each data visualisation and integrate them in your storytelling. Explain why they are relevant for your decision-making.**

## Presentation | 15 points
#### Deadline: October 24th during class

> Make a presentation about your business, the data you've collected and the direction you're taking the business in the next months.

**Presentation | 1. Description of unique business idea | 1 point**

Summarise the name and type of business, its location and the time of year it operates (you can add some branding, a logo, etc.).


**Presentation | 2. Presentation of all the APIs used and how they serve your business | 3 points**

Present each API and explain why the collected data is relevant for your business.


**Presentation | 3. Presentation of the data cleaning and processing | 2 points**

Explain the steps your team took in order to get to a clean and structured dataset.


**Presentation | 4. Presentation of the statistics & 5. Data visualisations | 5 points**

Display the statistics and relevant data visualisations that helped you make informed decisions about your business. The descriptive statistics and visualisations enable you to draw conclusions that take your business in one or the other direction. You need to explain how this information serves your business and the next steps you will take.

**Presentation | 6. Storytelling | 4 points**

Why did you pick this business idea? Why this name?

Who is your target audience? What problem does it solve?

What decisions did you take to help your business thrive in the future? What are your current challenges? Opportunities?

Can data save your business or help it expand to new territories?


Create a good story!