# Lab Instructions

Find a dataset that interests you. I'd recommend starting on [Kaggle](https://www.kaggle.com/). Read through all of the material about the dataset and download a .CSV file.

1. Write a short summary of the data.  Where did it come from?  How was it collected?  What are the features in the data?  Why is this dataset interesting to you?

The source of my dataset is Foodpanda, an online food delivery platform, and it was published on Kaggle. The data was gathered from Foodpanda's historical records, including customer information, restaurant details, delivery times, ratings, and payment methods, and was then anonymized and cleaned for research use. This dataset interests me because it gives insight into consumer behavior.

The features in the data are:

* customer_id
* gender
* age
* city
* signup_date
* order_id
* order_date
* restaurant_name
* dish_name
* category
* price
* payment_method
* order_frequency
* last_order_date
* loyalty_points
* churned
* rating
* rating_date
* delivery_status

2. Identify 5 interesting questions about your data that you can answer using Pandas methods.

* Which restaurants received the highest average ratings?
* What is the average order price by city?
* How do loyalty points differ across age groups?
* What are the most popular payment methods among active customers?
* What percentage of orders are successfully delivered vs. cancelled?

3. Answer those questions!  You may use any method you want (including LLMs) to help you write your code; however, you should use Pandas to find the answers.  LLMs will not always write code in this way without specific instruction.  

4. Write the answer to your question in a text box underneath the code you used to calculate the answer.



In [12]:
import pandas as pd
df = pd.read_csv("food_dataset.csv")

The five restaurants with the highest average ratings are:

In [20]:
top_restaurants = (df.groupby('restaurant_name')['rating'].mean().sort_values(ascending=False).head(5))
top_restaurants

restaurant_name
Subway         3.087302
McDonald's     3.002629
KFC            2.974673
Burger King    2.966116
Pizza Hut      2.949346
Name: rating, dtype: float64
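
An average rating can be misleading when a restaurant has only a handful of ratings. The sketch below uses a small made-up DataFrame in place of `food_dataset.csv` (the rows and the `min_ratings` threshold are assumptions for illustration, not values from the dataset) to show one way of reporting the number of ratings next to the mean:

```python
import pandas as pd

# Hypothetical toy data standing in for food_dataset.csv
toy = pd.DataFrame({
    "restaurant_name": ["Subway", "Subway", "KFC", "KFC", "KFC", "Pizza Hut"],
    "rating": [4, 2, 5, 3, 4, 5],
})

# Mean rating and number of ratings per restaurant, highest mean first
rating_summary = (
    toy.groupby("restaurant_name")["rating"]
    .agg(mean_rating="mean", n_ratings="count")
    .sort_values("mean_rating", ascending=False)
)

# Keep only restaurants with at least min_ratings ratings (assumed threshold)
min_ratings = 2
reliable = rating_summary[rating_summary["n_ratings"] >= min_ratings]
print(reliable)
```

On the real data, the same pattern would apply to `df` instead of `toy`, with a threshold chosen after looking at how many ratings each restaurant has.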

The five cities with the highest average order price are:

In [21]:
avg_order_value = (df.groupby('city')['price'].mean().sort_values(ascending=False).head(5))
avg_order_value

city
Peshawar     824.361054
Multan       803.346871
Islamabad    795.089174
Lahore       793.004503
Karachi      786.180629
Name: price, dtype: float64

In [28]:
loyalty_by_age = (df.groupby('age')['loyalty_points'].mean().round(2).sort_values(ascending=False))
loyalty_by_age

age
Teenager    256.76
Adult       247.18
Senior      246.26
Name: loyalty_points, dtype: float64

Teenagers have the highest average loyalty points (256.76), followed by adults (247.18) and seniors (246.26).

In [30]:
active_customers = df[df['churned'] == 'Active']
payment_distribution = (active_customers['payment_method'].value_counts(normalize=True)*100).round(2)
payment_distribution

payment_method
Card      33.69
Cash      33.39
Wallet    32.92
Name: proportion, dtype: float64

Card payments are the most popular among active customers (33.69%), though cash (33.39%) and wallet (32.92%) are close behind.
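
Because the three shares are so close, it can help to look at raw counts alongside the percentages. This is a minimal sketch on a made-up DataFrame (the rows are hypothetical, and it assumes, as in the cell above, that active customers are labelled `'Active'` in `churned`):

```python
import pandas as pd

# Hypothetical toy data standing in for food_dataset.csv
toy = pd.DataFrame({
    "churned": ["Active", "Active", "Active", "Inactive", "Active"],
    "payment_method": ["Card", "Cash", "Card", "Wallet", "Wallet"],
})

# Restrict to active customers, then count orders per payment method
active = toy[toy["churned"] == "Active"]
counts = active["payment_method"].value_counts()

# Convert the counts to percentages and show both side by side
percent = (counts / counts.sum() * 100).round(2)
payment_summary = pd.DataFrame({"orders": counts, "percent": percent})
print(payment_summary)
```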

In [31]:
delivery_status_counts = (df['delivery_status'].value_counts(normalize=True)*100).round(2)
delivery_status_counts

delivery_status
Delivered    34.33
Delayed      32.87
Cancelled    32.80
Name: proportion, dtype: float64

Most orders (65.67%) are delayed or cancelled; only 34.33% are marked Delivered.
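
The share of orders that were not delivered can be checked by adding the Delayed and Cancelled percentages from the output above. In this minimal sketch the three percentages are copied by hand into a Series so the example runs on its own; in the notebook, `delivery_status_counts` already holds them:

```python
import pandas as pd

# Percentages copied from the delivery_status output above
delivery_status_pct = pd.Series(
    {"Delivered": 34.33, "Delayed": 32.87, "Cancelled": 32.80}
)

# Everything that is not "Delivered" counts as delayed or cancelled
not_delivered_pct = round(delivery_status_pct.drop("Delivered").sum(), 2)
print(not_delivered_pct)
```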