<a href="https://colab.research.google.com/github/ishadvay3928/Local-Food-Wastage-Management-System-Project/blob/main/Local_Food_Wastage_Management_System_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **Local Food Wastage Management System Analysis**



# **GitHub Link -**

https://github.com/ishadvay3928/Local-Food-Wastage-Management-System-Project/blob/main/Local_Food_Wastage_Management_System_Analysis.ipynb

# **Problem Statement**


Food wastage is a significant issue, with many households and restaurants discarding surplus food while numerous people struggle with food insecurity. This project aims to develop a Local Food Wastage Management System, where:
* Restaurants and individuals can list surplus food.
* NGOs or individuals in need can claim the food.
* SQL stores available food details and locations.
* A Streamlit app enables interaction, filtering, CRUD operations, and visualization.






#### **Define Your Business Objective?**

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Load all the datasets

df_providers = pd.read_csv("/content/providers_data.csv")
# Receivers must come from their own file, not from the providers file
df_receivers = pd.read_csv("/content/receivers_data.csv")
df_food_listings = pd.read_csv("/content/food_listings_data.csv")
df_claims = pd.read_csv("/content/claims_data.csv")

### Dataset First View

In [None]:
# First Look
df_providers.head()

In [None]:
df_receivers.head()

In [None]:
df_food_listings.head()

In [None]:
df_claims.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df_providers.shape

In [None]:
df_receivers.shape

In [None]:
df_food_listings.shape

In [None]:
df_claims.shape

### Dataset Information

In [None]:
# Dataset Info
df_providers.info()

In [None]:
df_receivers.info()

In [None]:
df_food_listings.info()

In [None]:
df_claims.info()

#### Duplicate Values

In [None]:
# Duplicate Value Count
df_providers.duplicated().sum()

In [None]:
df_receivers.duplicated().sum()

In [None]:
df_food_listings.duplicated().sum()

In [None]:
df_claims.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count of datasets
df_providers.isnull().sum()

In [None]:
df_receivers.isnull().sum()

In [None]:
df_food_listings.isnull().sum()

In [None]:
df_claims.isnull().sum()
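
The per-dataset checks above (shape, duplicates, missing values) can also be collected into a single table. The following is a minimal sketch, not part of the original notebook: `summarize_quality` and the tiny `demo_frames` are hypothetical stand-ins so the snippet runs on its own. In the notebook, the same function could be called with the four loaded DataFrames.

```python
import pandas as pd


def summarize_quality(frames):
    """Return one row per dataset with its shape, duplicate count, and null count."""
    rows = []
    for name, df in frames.items():
        rows.append({
            "dataset": name,
            "rows": df.shape[0],
            "columns": df.shape[1],
            "duplicate_rows": int(df.duplicated().sum()),
            "missing_cells": int(df.isnull().sum().sum()),
        })
    return pd.DataFrame(rows).set_index("dataset")


# Made-up stand-in data (assumption): replace with the real DataFrames, e.g.
# {"providers": df_providers, "receivers": df_receivers, ...}
demo_frames = {
    "providers": pd.DataFrame({"Provider_ID": [1, 2, 2], "City": ["A", "B", "B"]}),
    "claims": pd.DataFrame({"Claim_ID": [1, 2], "Status": ["Pending", None]}),
}

summary = summarize_quality(demo_frames)
print(summary)
```

One table makes it easier to spot a dataset whose column count or duplicate count differs from what is expected.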

### What did you know about your dataset?

**Providers dataset**
- There are 1000 rows and 6 columns in the dataset.
- There are no missing values.

**Receivers dataset**
- There are 1000 rows and 5 columns in the dataset.
- There are no missing values.

**Food Listings dataset**
- There are 1000 rows and 9 columns in the dataset.
- There are no missing values.

**Claims dataset**
- There are 1000 rows and 5 columns in the dataset.
- There are no missing values.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df_providers.columns

In [None]:
df_receivers.columns

In [None]:
df_food_listings.columns

In [None]:
df_claims.columns

In [None]:
#Dataset Describe
df_providers.describe(include='all')

In [None]:
df_receivers.describe(include='all')

In [None]:
df_food_listings.describe(include='all')

In [None]:
df_claims.describe(include='all')

### Variables Description

**1. Providers Dataset**

*The providers.csv file contains details of food providers who contribute surplus food to the system.*
* Provider_ID (Integer) – Unique identifier for each provider.
* Name (String) – Name of the food provider (e.g., restaurants, grocery stores, supermarkets).
* Type (String) – Category of provider (e.g., Restaurant, Grocery Store, Supermarket).
* Address (String) – Physical address of the provider.
* City (String) – City where the provider is located.
* Contact (String) – Contact information (e.g., phone number).

**2. Receivers Dataset**

*The receivers.csv file contains details of individuals or organizations receiving food.*

* Receiver_ID (Integer) – Unique identifier for each receiver.
* Name (String) – Name of the receiver (individual or organization).
* Type (String) – Category of receiver (e.g., NGO, Community Center, Individual).
* City (String) – City where the receiver is located.
* Contact (String) – Contact details (e.g., phone number).

**3. Food Listings Dataset**

*The food_listings.csv file stores details of available food items that can be claimed by receivers.*

* Food_ID (Integer) – Unique identifier for each food item.
* Food_Name (String) – Name of the food item.
* Quantity (Integer) – Quantity available for distribution.
* Expiry_Date (Date) – Expiry date of the food item.
* Provider_ID (Integer) – Reference to the provider offering the food.
* Provider_Type (String) – Type of provider offering the food.
* Location (String) – City where the food is available.
* Food_Type (String) – Category of food (e.g., Vegetarian, Non-Vegetarian, Vegan).
* Meal_Type (String) – Type of meal (e.g., Breakfast, Lunch, Dinner, Snacks).

**4. Claims Dataset**

*The claims.csv file tracks food claims made by receivers.*

* Claim_ID (Integer) – Unique identifier for each claim.
* Food_ID (Integer) – Reference to the food item being claimed.
* Receiver_ID (Integer) – Reference to the receiver claiming the food.
* Status (String) – Current status of the claim (e.g., Pending, Completed, Cancelled).
* Timestamp (Datetime) – Date and time when the claim was made.
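
The four datasets described above are linked through their ID columns: a food listing refers to a provider, and a claim refers to both a food item and a receiver. The sketch below shows one way these relationships might be expressed as a SQLite schema. It is an illustration under assumptions, not the project's actual database: the table names, the column types, the in-memory database, and the demo rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema mirroring the column lists in the variable descriptions.
schema = """
CREATE TABLE providers (
    Provider_ID INTEGER PRIMARY KEY,
    Name TEXT, Type TEXT, Address TEXT, City TEXT, Contact TEXT
);
CREATE TABLE receivers (
    Receiver_ID INTEGER PRIMARY KEY,
    Name TEXT, Type TEXT, City TEXT, Contact TEXT
);
CREATE TABLE food_listings (
    Food_ID INTEGER PRIMARY KEY,
    Food_Name TEXT, Quantity INTEGER, Expiry_Date TEXT,
    Provider_ID INTEGER REFERENCES providers(Provider_ID),
    Provider_Type TEXT, Location TEXT, Food_Type TEXT, Meal_Type TEXT
);
CREATE TABLE claims (
    Claim_ID INTEGER PRIMARY KEY,
    Food_ID INTEGER REFERENCES food_listings(Food_ID),
    Receiver_ID INTEGER REFERENCES receivers(Receiver_ID),
    Status TEXT, Timestamp TEXT
);
"""

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(schema)

# Made-up demo rows (assumption) that satisfy every reference.
conn.execute("INSERT INTO providers VALUES (1, 'Demo Cafe', 'Restaurant', '1 Main St', 'Springfield', '+15550100')")
conn.execute("INSERT INTO receivers VALUES (1, 'Demo NGO', 'NGO', 'Springfield', '+15550101')")
conn.execute("INSERT INTO food_listings VALUES (1, 'Bread', 10, '2025-03-25', 1, 'Restaurant', 'Springfield', 'Vegetarian', 'Breakfast')")
conn.execute("INSERT INTO claims VALUES (1, 1, 1, 'Pending', '2025-03-05 05:26:00')")

# A claim that points at a non-existent Food_ID should be rejected.
try:
    conn.execute("INSERT INTO claims VALUES (2, 999, 1, 'Pending', '2025-03-05 06:00:00')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("Orphan claim rejected:", orphan_rejected)
```

Declaring the references in the schema lets the database, rather than application code, refuse claims for food items or receivers that do not exist.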


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable of dataset.
df_providers.nunique()

In [None]:
df_receivers.nunique()

In [None]:
df_food_listings.nunique()

In [None]:
df_claims.nunique()

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Clean contact numbers: keep digits, plus sign, and 'x' for extensions
import re

# Function to clean contact numbers
def clean_contact(contact):
    if pd.isna(contact):
        return None
    # Ensure it's a string before cleaning
    contact_str = str(contact)
    # Keep only +, digits, and 'x' (for extensions)
    return re.sub(r"[^0-9x\+]", "", contact_str)

# Apply cleaning
df_providers['Contact'] = df_providers['Contact'].apply(clean_contact)
df_receivers['Contact'] = df_receivers['Contact'].apply(clean_contact)
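
To see what the cleaning rule does in practice, the sketch below applies the same function to a few made-up contact strings. The sample values are assumptions for illustration and are not taken from the datasets.

```python
import re

import pandas as pd


def clean_contact(contact):
    """Same rule as above: keep only digits, '+', and 'x'; pass missing values through."""
    if pd.isna(contact):
        return None
    return re.sub(r"[^0-9x\+]", "", str(contact))


# Hypothetical inputs covering separators, extensions, and a missing value.
samples = ["+1-540-888-3241", "(555) 010-0199 x42", "555-0100 ext. 42", None]
cleaned = [clean_contact(s) for s in samples]

for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

Note that the pattern keeps every `x`, so a spelled-out `ext.` collapses to a single `x` while its other letters are dropped. It would also keep a stray `x` from any other word, which may or may not be desirable.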

In [None]:
# CHANGE DATATYPES

# Convert Expiry_Date in df_food_listings to datetime
df_food_listings['Expiry_Date'] = pd.to_datetime(df_food_listings['Expiry_Date'], errors='coerce')

# Convert Timestamp in df_claims to datetime (unparseable values become NaT)
df_claims['Timestamp'] = pd.to_datetime(df_claims['Timestamp'],errors='coerce')
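
Because `errors='coerce'` silently turns unparseable values into `NaT`, it is worth counting how many values failed after a conversion like the one above. The following is a minimal sketch on made-up strings; the sample values and variable names are assumptions.

```python
import pandas as pd

# Hypothetical raw values: two valid timestamps and one bad entry.
raw = pd.Series(["2025-03-05 05:26:00", "not a date", "2025-03-11 10:52:00"])

# Invalid entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors="coerce")

# Count the values that could not be parsed.
n_failed = int(parsed.isna().sum())
print("Unparseable values:", n_failed)

# Once the column is datetime, parts such as the hour are available via .dt
hours = parsed.dt.hour
print(hours)
```

If the failure count is non-zero on the real columns, the affected rows can be inspected before deciding whether to drop or repair them.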


In [None]:
# Save all cleaned dataset
df_providers.to_csv("providers_clean_dataset.csv", index=False)
df_receivers.to_csv("receivers_clean_dataset.csv", index=False)
df_food_listings.to_csv("food_listing_clean_dataset.csv", index=False)
df_claims.to_csv("claims_clean_dataset.csv", index=False)
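
Since the problem statement calls for SQL storage, a cleaned dataset like those saved above could next be loaded into a database table and queried. The sketch below uses an in-memory SQLite database and a tiny made-up DataFrame so it runs on its own. The table name, the sample rows, and the query are assumptions; in the notebook, the DataFrame would instead come from a cleaned CSV file.

```python
import sqlite3

import pandas as pd

# Made-up stand-in for a few columns of the cleaned food listings (assumption).
food = pd.DataFrame({
    "Food_ID": [1, 2, 3],
    "Food_Name": ["Bread", "Rice", "Salad"],
    "Quantity": [10, 25, 5],
    "Expiry_Date": ["2025-03-25", "2025-03-20", "2025-03-22"],
    "Location": ["Springfield", "Springfield", "Shelbyville"],
})

conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table, replacing it if it already exists.
food.to_sql("food_listings", conn, index=False, if_exists="replace")

# Example query: total quantity of food available per location.
query = """
SELECT Location, SUM(Quantity) AS total_quantity
FROM food_listings
GROUP BY Location
ORDER BY total_quantity DESC
"""
totals = pd.read_sql_query(query, conn)
print(totals)
```

The same pattern extends to the other three tables, and a Streamlit app could run queries like this one to drive its filters and charts.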

### What all manipulations have you done and insights you found?

#### **Key Manipulations:**

* **Cleaned contact numbers** in `df_providers` and `df_receivers` by removing every character except digits, `+`, and `x` (used for extensions).
* **Converted `Expiry_Date`** in `df_food_listings` to datetime, coercing unparseable values to `NaT`.
* **Converted `Timestamp`** in `df_claims` to datetime, coercing unparseable values to `NaT`.
* **Saved the cleaned datasets** as `providers_clean_dataset.csv`, `receivers_clean_dataset.csv`, `food_listing_clean_dataset.csv`, and `claims_clean_dataset.csv`.


#### **Insights Gained:**

* **No missing values** were found in any of the four datasets, so no imputation was needed.
* **Standardized contact numbers** give provider and receiver contacts a consistent format for display and lookup.
* **Proper datetime columns** make it possible to sort and filter food listings by expiry date and claims by the time they were made.
* **Cleaned CSV files** are ready to be loaded into the SQL database and used by the Streamlit app.

## ***4. Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables***

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

* **Store the cleaned data in SQL**: load the providers, receivers, food listings, and claims datasets into related tables linked by `Provider_ID`, `Receiver_ID`, and `Food_ID`.
* **Prioritize food that is close to expiry**: use `Expiry_Date` to surface listings that need to be claimed first.
* **Match providers and receivers by location**: use `City` and `Location` so that receivers see food available near them.
* **Track claim outcomes**: monitor the `Status` of claims (Pending, Completed, Cancelled) to follow up on food that has been listed but not collected.
* **Expose filters in the Streamlit app**: let users filter listings by location, provider type, food type, and meal type.
* **Keep contact details usable**: retain the standardized contact numbers so that receivers can reach providers directly.
* **Support CRUD operations**: allow providers to add, update, and remove listings, and receivers to create and update claims.


# **Conclusion**

This notebook prepares the four datasets behind the Local Food Wastage Management System: providers, receivers, food listings, and claims. Each dataset was inspected for shape, data types, duplicates, and missing values, and no missing values were found. Contact numbers were standardized, `Expiry_Date` and `Timestamp` were converted to datetime, and the cleaned data was saved to CSV. These cleaned files are the input for the SQL database and the Streamlit app described in the problem statement, where restaurants and individuals can list surplus food and NGOs or individuals in need can claim it.
