# Group Project: Data-driven Business Manager with APIs

#### Number of points: 30 (weights 30% in the final grade)
#### Deadline to form the groups: October 18th at 12:30 pm CET
#### Deadline for the code submission: October 24th at 01:29 pm CET
#### Presentations on October 24th

## Objective
In this project, you will create a Business that utilizes various APIs to make informed decisions about running your local business. If you want to sell drinks or street food or whatever floats your boat, do not hesitate to find your data and design the project accordingly. You will collect data from meteorological and non-meteorological APIs to help your business determine when, where, and how much inventory is needed to maximize sales. 

## Grading

In total, the group project counts for 30% of the final grade and represents 30 points.

The points are distributed in two parts: the code and the presentation.

- Some items e.g. Statistics are represented in both parts and will require to compute them in the code **and** present them during the presentation.
- Please note that the data must come from an API. You need to use at least 3 APIs (weather + two others). You can also of course use data downloaded from the internet, however they cannot replace the API data.
- Additionally, emphasis will be put on the **Storytelling** and whether or not the choice of APIs, data processing, statistics and visualisations are relevant for your business.
- To make grading easier, please provide **clean code** with **relevant comments** to make it straightforward what you are doing.
- Everyone in the group project must present during presentation day. **A penalty of -2 points** will be applied to each person **who does not present a significant part** during the presentation.


| **Code** | **15 points** |
| --- | --- |
| A. Collect data from weather API | 3 points |
| B. Collect data from two other non-meteorological APIs | 4 points |
| C. Data cleaning and processing | 3 points |
| D. Compute relevant statistics | 2 points |
| E. Clean and clear visualisations  | 3 points |


| **Presentation** | **15 points** |
| --- | --- |
| 1. Description of unique business idea | 1 point |
| 2. Presentation of all the APIs used and how it serves your business | 3 points |
| 3. Presentation of the data cleaning and processing | 2 points |
| 4. Presentation of the statistics | 2 points |
| 5. Presentation of the visualisations and how they serve the business| 3 points |
| 6. Storytelling | 4 points |

**Penalty: -2 points to each person who does not present a significant part during the presentation.**


**Penalty for unexcused absence or lateness**: 
- If you are absent or late on presentation day without an official excuse, you will receive 0 for the presentation part of the group project.
- If you are late without an official excuse and can still make it to the presentation of your team, you will still receive 0 for the presentation part of the group project.

## Getting Ready
#### Recommended deadline: October 8th
#### Deadline: October 18th at 12:30 pm CET

1. Form your group and select a group name. Communicate your group name to the teacher along with the First Name and Last Name of all the team members.

2. Create a branch on the **Students** repository with your group name (exactly the same as the one communicated to the teacher).

3. Discuss with your group and answer the following questions:

   - What kind of business do we run? What do we sell ? The choice of the business must be original and unique to your group.
   - How do we name our business?
   - When do we operate? Is it an all-year-round business or a seasonal one? If so, which seasons? Which months / weeks / days / hours of the day do we operate?
   - Where do we operate? In which countries / cities are we currently active ? Where do we want to develop in the future ? Determine where to set up your business stand based on weather conditions, local attractions, or events. The location should maximize customer traffic and sales.

## Code | A. Collect data from weather API | 3 points
#### Recommended deadline: October 18th
#### Deadline: October 24th at 01:29 pm CET
Use the OpenWeatherMap API to fetch weather data for your chosen location. You can select any city or location for your business.

- Fetch your chosen location's current temperature and weather conditions.
- Fetch the forecasted weather data for the next few days (e.g., five days).

<Response [200]>

## Code | B. Collect data from two other APIs | 4 points
#### Recommended deadline: October 18th
#### Deadline: October 24th at 01:29 pm CET
Integrate with **at least two** of the non-meteorological APIs you've learned about based on the location and the season.

It has to be with an API (not a downloaded dataset). 

You can of course use a downloaded dataset on top of the two APIs you've chosen.

You can also use more APIs, sky is the limit!

You can choose from:
- Google Maps,
- TripAdvisor,
- News API,
- Yelp,
- Wikipedia,
- Booking,
- Amadeus Travel API,
- Foursquare,
- etc. (make your own research and be original!)

Each API can provide different types of information. Pick the ones that best suit your application.


After collecting all the data you need, save them.

In [8]:
!pip install matplotlib
import requests
import pandas as pd
from datetime import datetime, timedelta
import matplotlib.pyplot as plt



In [10]:
# Calculate date range for the next month
today = datetime.now().date()
next_month = today + timedelta(days=30)
today_str = today.strftime("%Y-%m-%d")
next_month_str = next_month.strftime("%Y-%m-%d")

# Get API Key
api_token = "a9kompRf5e_NPnPID6Hg9PeJBB9qwaZZTmMAJ0jO"


# Creating events formula for pagins
def get_events(api_token, start_date, end_date, max_results=1000):
    url = "https://api.predicthq.com/v1/events/"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Accept": "application/json"
    }
    params = {
        "categories": "community, festivals, sports",
        "saved_location.location_id": "N333B4M3T98yr-3FQ3nIHw",
        "phq_attendance.gte": "500", # Avoid minor events which would not be profitable for us
        "active.gte": start_date,
        "active.lte": end_date,
        "limit": 100  # Maximum allowed by the API
    }

    all_events = []
    while len(all_events) < max_results:
        response = requests.get(url, headers=headers, params=params)
        if response.status_code != 200:
            print(f"Error: {response.status_code}")
            print(response.text)
            break

        data = response.json()
        events = data['results']
        all_events.extend(events)

        if not data['next']:
            break  # No more pages

        params['offset'] = data['next'].split('offset=')[1].split('&')[0]

    return all_events[:max_results]

In [12]:
# Get events into dataframe
events = get_events(api_token, today_str, next_month_str)
berlin_events = pd.DataFrame(events)

In [32]:
berlin_events.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 81 entries, 0 to 80
Data columns (total 33 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   relevance                         81 non-null     float64
 1   id                                81 non-null     object 
 2   title                             81 non-null     object 
 3   alternate_titles                  13 non-null     object 
 4   description                       81 non-null     object 
 5   category                          81 non-null     object 
 6   labels                            81 non-null     object 
 7   rank                              81 non-null     int64  
 8   local_rank                        81 non-null     int64  
 9   phq_attendance                    81 non-null     int64  
 10  entities                          81 non-null     object 
 11  duration                          81 non-null     int64  
 12  start     

In [72]:
events_summary = berlin_events[["title", "category", "labels", "phq_attendance", "predicted_event_spend", "start_local", "predicted_end_local", "location"]]

# Create new columns for start date and time
events_summary.loc[:, "start_date"] = events_summary["start_local"].str.split('T').str[0]
events_summary.loc[:, "start_time"] = events_summary["start_local"].str.split('T').str[1]
events_summary["start_date"] = pd.to_datetime(events_summary["start_date"])
events_summary["start_time"] = pd.to_datetime(events_summary["start_time"])

# Create new columns for end date and time + convert data type
events_summary.loc[:, "end_date"] = events_summary["predicted_end_local"].str.split('T').str[0]
events_summary.loc[:, "end_time"] = events_summary["predicted_end_local"].str.split('T').str[1]
events_summary["end_date"] = pd.to_datetime(events_summary["end_date"])
events_summary["end_time"] = pd.to_datetime(events_summary["end_time"])

events_summary

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  events_summary.loc[:, "start_date"] = events_summary["start_local"].str.split('T').str[0]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  events_summary.loc[:, "start_time"] = events_summary["start_local"].str.split('T').str[1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  events_summary["start_dat

Unnamed: 0,title,category,labels,phq_attendance,predicted_event_spend,start_local,predicted_end_local,location,start_date,start_time,end_date,end_time
0,WhoMadeWho,concerts,"[concert, music]",702,22356,2024-11-20T20:00:00,2024-11-21T02:40:00,"[13.381228, 52.50162599999999]",2024-11-20,2024-10-21 20:00:00,2024-11-21,2024-10-21 02:40:00
1,Radio Doria,community,"[concert, music]",1756,36305,2024-11-20T19:30:00,,"[13.3883474, 52.5207127]",2024-11-20,2024-10-21 19:30:00,NaT,NaT
2,Mord im Orientexpress,performing-arts,"[entertainment, performing-arts]",749,22488,2024-11-20T19:30:00,2024-11-20T21:20:00,"[13.372268, 52.50757]",2024-11-20,2024-10-21 19:30:00,2024-11-20,2024-10-21 21:20:00
3,Jacob Lee,concerts,"[concert, music]",640,20381,2024-11-20T19:00:00,2024-11-20T23:10:00,"[13.4227942, 52.5131365]",2024-11-20,2024-10-21 19:00:00,2024-11-20,2024-10-21 23:10:00
4,Salt Tree,concerts,"[concert, music]",577,17964,2024-11-17T20:00:00,2024-11-18T00:10:00,"[13.4227942, 52.5131365]",2024-11-17,2024-10-21 20:00:00,2024-11-18,2024-10-21 00:10:00
...,...,...,...,...,...,...,...,...,...,...,...,...
76,Java Script Days,conferences,[conference],500,119580,2024-10-21T09:00:00,,"[13.3882436, 52.5183247]",2024-10-21,2024-10-21 09:00:00,NaT,NaT
77,German Kita Management Congress,conferences,[conference],500,62164,2024-10-21T09:00:00,,"[13.416334, 52.520431]",2024-10-21,2024-10-21 09:00:00,NaT,NaT
78,International Research Symposium on Agricultur...,conferences,[conference],500,90872,2024-10-21T09:00:00,,"[13.3871292, 52.5087155]",2024-10-21,2024-10-21 09:00:00,NaT,NaT
79,API Conference,conferences,[conference],500,90872,2024-10-21T00:00:00,,"[13.3882436, 52.5183247]",2024-10-21,2024-10-21 00:00:00,NaT,NaT


## Code | C. Data cleaning and processing | 3 points
#### Recommended deadline: from October 18th until October 21st
#### Deadline: October 24th at 01:29 pm CET

In order to make data-driven decisions, you will need to clean the collected data, fill missing values, merge datasets etc.

Take some time to clean and process the collected data so that you can use it.

Organize the dataset into a structured format, such as a CSV file, HTML file, EXCEL file, and a table where each row represents the achieved data.

## Code | D. Compute relevant statistics | 2 points
#### Recommended deadline: from October 18th until October 21st
#### Deadline: October 24th at 01:29 pm CET

Get together as a group and ask yourselves: what business questions would you like to answer? For example:

- On which days are there maximum customer traffic?
- On which days do we expect to make more sales?
- How much inventory should we get? Why?
- Which impact would the weather conditions, local attractions or events have on your business?
- How would you like to develop the business in the future?
    - Do you wish to expand to new locations?
    - Launch a new product?
    - Target more elderly or young people?
    - Target vegetarian or book-worm people?

Compute descriptive statistics that inform you about the future of your business and enable you to answer the business questions.|


## Code | E. Clean and clear visualisations  | 3 points
#### Deadline: October 24th at 01:29 pm CET

Create **at least 3 data visualisations** that clearly state your point and support your decision-making. 


**Presentation: present each data visualisation and integrate them in your storytelling. Explain why they are relevant for your decision-making.**

## Presentation | 15 points
#### Deadline: October 24th during class

> Make a presentation about your business, the data you've collected and the direction you're taking the business in the next months.

**Presentation | 1. Description of unique business idea | 1 point**

Summarise the name and choice of business as well as location and the time of year it operates (you can add some branding, logo, etc.)


**Presentation | 2. Presentation of all the APIs used and how it serves your business | 3 points**

Present each API and explain why the collected data is relevant for your business.


**Presentation | 3. Presentation of the data cleaning and processing | 2 points**

Explain the steps your team took in order to get to a clean and structured dataset.


**Presentation | 4. Presentation of the statistics  & 5. Data visualisations | 5 points**

Display the statistics and relevant data visualisations that helped you make informed decisions about your business. The descriptive statistics and visualisations enable you to draw conclusions that take your business in one or the other direction. You need to explain how this information serves your business and the next steps you will take.

**Presentation | 6. Storytelling | 4 points**

Why did you pick this business idea? Why this name?

Who is your target audience? What problem does it solve?

What decisions did you make to make your business thrive in the future? What are your current challenges? Opportunities?

Can Data save your business or make it expand to new territories?


Create a good story!