# UBC Lost & Found Predictive Analysis

<img src="./img/access.jpg" alt="drawing" style="width:90%;"/>
Access Services. Retrieved from <a href = "https://parking.ubc.ca/">UBC Parking</a> 

## Introduction 
---

Losing personal belongings is a common and often frustrating experience for students and staff at UBC. Items ranging from everyday essentials like wallets and keys to electronics and jewelry frequently end up in the lost and found. While the university maintains a central lost and found system, the lack of systematic analysis in these services can lead to inefficiencies, making it harder to identify trends and optimize recovery processes. By analyzing the UBC lost and found dataset, this project aims to uncover patterns in lost items, predict future trends, and provide actionable insights to enhance the efficiency of UBC's lost and found services.

**Objectives**: 
- **Item Classification Analysis**: Identify the most frequently lost item categories (e.g., electronics, wallets, keys).
- **Temporal Analysis**: Examine seasonal and daily trends in lost item reports.
- **Location Analysis**: Identify the most common campus locations where items are lost and found.
- **Time Series Forecasting**: Develop predictive models to forecast the number of items reported to the lost and found using historical data.
- **Recommendations**: Provide actionable recommendations to improve the lost and found system, including targeted awareness campaigns, optimized item storage, and enhanced communication channels.


## Dataset
---

The [UBC Lost and Found database](https://lostandfound.ubc.ca/all-items) is a publicly accessible dataset of items reported lost on the University of British Columbia campus. The listing starts in May 2024 and is updated continuously. Each entry includes the item type, a description, the date lost, and the location. This project uses a snapshot retrieved in January 2025, which contains 145 entries dated from May 31, 2024 to January 14, 2025.

### Loading libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

### Importing the data

In [2]:
url = "https://lostandfound.ubc.ca/all-items/export.csv"
df = pd.read_csv(url, index_col = "Date")

df.head()

| Date | Ticket Number | Item Type | Status | Item Description | Lost Item Location |
|---|---|---|---|---|---|
| January 14, 2025 | 47168 | Jewelry | Lost | Long Necklace with Pendant | Other |
| January 14, 2025 | 47167 | Jewelry | Lost | Necklace chain | Other |
| January 14, 2025 | 47166 | Electronics | Lost | Ear Buds and Case | Other |
| January 14, 2025 | 47165 | Electronics | Lost | Air Pods with Case | Other |
| January 14, 2025 | 47163 | Keys | Lost | Keys and key chains | Other |


## Data Wrangling & Cleaning
---

### Examining the data

We begin by exploring the dataset to understand its structure.

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 145 entries, January 14, 2025 to May 31, 2024
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Ticket Number       145 non-null    int64 
 1   Item Type           145 non-null    object
 2   Status              143 non-null    object
 3   Item Description    145 non-null    object
 4   Lost Item Location  103 non-null    object
dtypes: int64(1), object(4)
memory usage: 6.8+ KB


In [4]:
df.describe(include = "all")

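|  | Ticket Number | Item Type | Status | Item Description | Lost Item Location |
|---|---|---|---|---|---|
| count | 145.0 | 145 | 143 | 145 | 103 |
| unique | NaN | 7 | 1 | 128 | 4 |
| top | NaN | Electronics | Lost | Smart Watch | Other |
| freq | NaN | 57 | 143 | 3 | 80 |
| mean | 47057.303448 | NaN | NaN | NaN | NaN |
| std | 65.572407 | NaN | NaN | NaN | NaN |
| min | 46923.0 | NaN | NaN | NaN | NaN |
| 25% | 47006.0 | NaN | NaN | NaN | NaN |
| 50% | 47058.0 | NaN | NaN | NaN | NaN |
| 75% | 47117.0 | NaN | NaN | NaN | NaN |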


### Handling missing values

In [5]:
df.isna().sum()

Ticket Number          0
Item Type              0
Status                 2
Item Description       0
Lost Item Location    42
dtype: int64

- `Lost Item Location` is missing in 42 of 145 rows. These values will be replaced with "`Unknown`" so that every row can be included in the location analysis.
- `Status` is missing in only 2 rows and takes a single value (`Lost`) in all other rows, so it carries no information for the analysis. These missing values are left as-is.

In [6]:
value = {"Lost Item Location": "Unknown"}
df.fillna(value=value, inplace=True)
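The column-specific fill can be illustrated on a small sample. The sketch below uses a hypothetical three-row frame (not rows from the real dataset) and the non-inplace form of `fillna`, to show that only the location column is filled while `Status` keeps its missing value.

```python
import pandas as pd

# Hypothetical three-row sample mirroring two columns of the dataset.
sample = pd.DataFrame(
    {
        "Status": ["Lost", None, "Lost"],
        "Lost Item Location": ["Library", None, "Other"],
    }
)

# Passing a dict fills only the listed columns; Status is left untouched.
filled = sample.fillna({"Lost Item Location": "Unknown"})

print(filled["Lost Item Location"].tolist())
print("Status still missing:", int(filled["Status"].isna().sum()))
```

Returning a new frame instead of using `inplace=True` keeps the raw data available for comparison, which can be convenient while exploring.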

### Datetime Conversion

To facilitate temporal analysis, the `Date` index, which is read in as strings such as "January 14, 2025", is converted to a `DatetimeIndex`. The `Year`, `Month`, and `Weekday` columns are then derived from it.

In [7]:
df.index = pd.to_datetime(df.index)
df['Year'] = df.index.year
df['Month'] = df.index.month_name()
df['Weekday'] = df.index.day_name()
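As a minimal sketch of what this conversion does, the example below parses two date strings taken from the outputs above and derives the same three features. The explicit `format` argument is an assumption added here for illustration; the notebook relies on pandas inferring the format.

```python
import pandas as pd

# Date strings as they appear in the raw data.
raw_dates = pd.Index(["January 14, 2025", "May 31, 2024"], name="Date")

# %B = full month name, %d = day of month, %Y = four-digit year.
parsed = pd.to_datetime(raw_dates, format="%B %d, %Y")

# Derive the same calendar features used in the notebook.
features = pd.DataFrame(
    {
        "Year": parsed.year,
        "Month": parsed.month_name(),
        "Weekday": parsed.day_name(),
    },
    index=parsed,
)
print(features)
```

Because `month_name()` and `day_name()` return plain strings, grouping on them sorts alphabetically rather than in calendar order; converting them to an ordered categorical is one way to handle that if it matters later.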

### Cleaning the data

To finalize the data preparation, we check for duplicate rows and sort the dataset by date so that it is in chronological order.

In [8]:
print(f"Number of duplicates: {df.duplicated().sum()}")

Number of duplicates: 0
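Note that `DataFrame.duplicated()` compares column values only and ignores the index, so here the `Date` index plays no part in the check. A complementary check, sketched below on a hypothetical sample, is to look for repeated ticket numbers, assuming each report is meant to have a unique `Ticket Number`.

```python
import pandas as pd

# Hypothetical sample: the last row reuses ticket 47168 with a different description.
sample = pd.DataFrame(
    {
        "Ticket Number": [47168, 47167, 47168],
        "Item Description": [
            "Long Necklace with Pendant",
            "Necklace chain",
            "Necklace",
        ],
    },
    index=pd.Index(["January 14, 2025"] * 3, name="Date"),
)

# Full-row comparison misses the repeat because the descriptions differ.
full_row_dupes = int(sample.duplicated().sum())

# Restricting the comparison to the identifier column catches it.
ticket_dupes = int(sample.duplicated(subset=["Ticket Number"]).sum())

print(f"Full-row duplicates: {full_row_dupes}")
print(f"Repeated ticket numbers: {ticket_dupes}")
```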


In [9]:
df = df.sort_values('Date', ascending=True)

After cleaning, the dataset is sorted by date. Below is a preview of the first five rows:

In [10]:
df.head()

| Date | Ticket Number | Item Type | Status | Item Description | Lost Item Location | Year | Month | Weekday |
|---|---|---|---|---|---|---|---|---|
| 2024-05-31 | 46923 | Jewelry | Lost | Woman's ring | Library | 2024 | May | Friday |
| 2024-06-05 | 46927 | Electronics | Lost | Digital Pen | Other | 2024 | June | Wednesday |
| 2024-06-05 | 46925 | Keys | Lost | Single key on ring | Library | 2024 | June | Wednesday |
| 2024-06-05 | 46926 | Jewelry | Lost | Silver ear ring | Library | 2024 | June | Wednesday |
| 2024-06-05 | 46928 | Jewelry | Lost | Ear ring (beaded) | Library | 2024 | June | Wednesday |
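In the preview, rows that share a date (2024-06-05) are not ordered by ticket number, because sorting on the date alone leaves the order of ties unspecified. If a fully deterministic order is wanted, one option is to add `Ticket Number` as a secondary sort key. The sketch below is an optional variation rather than the notebook's method, and it assumes that ticket numbers increase in the order reports were filed.

```python
import pandas as pd

# Dates and ticket numbers taken from the preview above.
sample = pd.DataFrame(
    {"Ticket Number": [46927, 46925, 46926, 46928, 46923]},
    index=pd.DatetimeIndex(
        ["2024-06-05", "2024-06-05", "2024-06-05", "2024-06-05", "2024-05-31"],
        name="Date",
    ),
)

# sort_values accepts index level names alongside column labels,
# so ties on Date are broken by Ticket Number.
ordered = sample.sort_values(["Date", "Ticket Number"])
print(ordered)
```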


## Exploratory Data Analysis
--- 