# EcoPackAI – Module 1: Data Collection & Database Setup

Module 1 focuses on collecting realistic datasets for packaging materials and products, organizing them in a structured format, and storing them in a PostgreSQL database. This module creates the foundation for sustainability-based material recommendation in later modules.


## 1. Objective of Module 1

The main objectives of this module are:

1. To collect realistic data on packaging materials and products.
2. To organize the data into raw and processed formats.
3. To convert Excel datasets into CSV format for database usage.
4. To store the datasets in a PostgreSQL database.
5. To verify successful data import using SQL queries.

This structured data will be used in Module 2 for material filtering and ranking.


## 2. Project Data Organization

Data files are kept in two folders, one for raw inputs and one for processed outputs:

- **data/raw/**
  - ecopackai_materials_dataset.xlsx
  - ecopackai_products_dataset.xlsx

- **data/processed/**
  - materials_dataset.csv
  - products_dataset.csv

The raw folder contains original datasets, while the processed folder contains cleaned CSV files used for database import and analysis.
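
The raw-to-processed conversion could be sketched as follows. This is a minimal sketch, not the project's actual script: the helper names `processed_path_for` and `convert_excel_to_csv` are hypothetical, the rule of dropping the `ecopackai_` prefix is inferred only from the file names listed above, and reading `.xlsx` files with pandas assumes the `openpyxl` package is installed. Cleaning steps are not shown because they are not described here.

```python
from pathlib import Path

PROCESSED_DIR = Path("data/processed")
RAW_PREFIX = "ecopackai_"  # assumption: this prefix is dropped in processed file names


def processed_path_for(raw_path: Path) -> Path:
    """Map a raw Excel file to the CSV path it would get in data/processed/."""
    stem = raw_path.stem
    if stem.startswith(RAW_PREFIX):
        stem = stem[len(RAW_PREFIX):]
    return PROCESSED_DIR / f"{stem}.csv"


def convert_excel_to_csv(raw_path: Path) -> Path:
    """Read the first sheet of an Excel file and write it out as CSV."""
    import pandas as pd  # imported here so the path helper works without pandas

    out_path = processed_path_for(raw_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the DataFrame index out of the CSV, so the file
    # has exactly the dataset's columns.
    pd.read_excel(raw_path).to_csv(out_path, index=False)
    return out_path


raw = Path("data/raw/ecopackai_materials_dataset.xlsx")
print(processed_path_for(raw).as_posix())  # prints data/processed/materials_dataset.csv
```

Calling `convert_excel_to_csv(raw)` for each file in data/raw/ would then produce the two CSV files used in the rest of this module.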


## 3. Materials Dataset Description

The materials dataset contains information about packaging materials and their sustainability characteristics.

Key columns:
- material_id: Unique identifier for each material (e.g., M001)
- material_name: Name of the packaging material
- strength_score: Strength rating (1–10)
- weight_capacity_kg: Maximum load capacity, in kg
- biodegradability_score: Environmental friendliness score (1–10)
- co2_emission_kg: Carbon emissions, in kg of CO2
- recyclability_percent: Recyclability, as a percentage
- cost_per_unit_inr: Cost per unit of packaging material, in Indian rupees (INR)
- product_category: Product category the material is suited to
- used_for_products: Real-world usage examples

This dataset is used to evaluate which materials are suitable for different products.
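
The column descriptions above imply some simple range checks that can be run before import. The following is a minimal sketch under stated assumptions: `check_material_row` is a hypothetical helper, the 1–10 ranges come from the column list, and the 0–100 bound for percentages and the non-negative bound for weights, emissions, and costs are assumptions rather than documented rules.

```python
import csv
import io

# Column names as listed in the materials dataset description.
MATERIAL_COLUMNS = [
    "material_id", "material_name", "strength_score", "weight_capacity_kg",
    "biodegradability_score", "co2_emission_kg", "recyclability_percent",
    "cost_per_unit_inr", "product_category", "used_for_products",
]


def check_material_row(row: dict) -> list:
    """Return a list of problems found in one materials row (empty if none)."""
    missing = [col for col in MATERIAL_COLUMNS if col not in row]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    # Both scores are documented as 1-10 ratings.
    for col in ("strength_score", "biodegradability_score"):
        if not 1 <= float(row[col]) <= 10:
            problems.append(f"{col} outside 1-10: {row[col]}")
    # Assumption: a percentage lies between 0 and 100.
    if not 0 <= float(row["recyclability_percent"]) <= 100:
        problems.append(f"recyclability_percent outside 0-100: {row['recyclability_percent']}")
    # Assumption: capacities, emissions, and costs are never negative.
    for col in ("weight_capacity_kg", "co2_emission_kg", "cost_per_unit_inr"):
        if float(row[col]) < 0:
            problems.append(f"{col} is negative: {row[col]}")
    return problems


# Example using the first material from the dataset preview.
sample_csv = io.StringIO(
    ",".join(MATERIAL_COLUMNS) + "\n"
    "M001,Single-wall corrugated cardboard,8,15,9,1.6,92,55,Electronics,Shipping boxes\n"
)
for row in csv.DictReader(sample_csv):
    print(row["material_id"], check_material_row(row))  # prints M001 []
```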


## 4. Products Dataset Description

The products dataset represents different products along with their packaging requirements.

Key columns:
- product_id: Unique product identifier (e.g., P001)
- product_name: Name of the product
- product_category: Product type (Electronics, Food, Pharma, etc.)
- product_weight_kg: Weight of the product, in kg
- fragility_level: Product fragility (Low / Medium / High)
- required_strength_score: Minimum packaging strength score required
- preferred_biodegradability_score: Preferred biodegradability score for the packaging
- max_packaging_cost_inr: Maximum packaging budget, in Indian rupees (INR)
- temperature_sensitive: Whether the product needs cold-chain handling

This dataset defines the constraints used for material selection in later modules.
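
To illustrate how these constraints might later be applied, here is a minimal sketch. It is not the Module 2 logic: `meets_requirements` is a hypothetical helper, and it assumes that each product constraint is compared directly with the similarly named materials column and that the preferred biodegradability score is treated as a minimum. The example values are taken from the dataset previews.

```python
def meets_requirements(material: dict, product: dict) -> bool:
    """Check one material against one product's packaging constraints.

    Assumption: every constraint is a simple threshold on one material column.
    """
    return (
        material["strength_score"] >= product["required_strength_score"]
        and material["biodegradability_score"] >= product["preferred_biodegradability_score"]
        and material["cost_per_unit_inr"] <= product["max_packaging_cost_inr"]
        and material["weight_capacity_kg"] >= product["product_weight_kg"]
    )


# M001 and M004 from the materials preview.
single_wall = {"strength_score": 8, "weight_capacity_kg": 15,
               "biodegradability_score": 9, "cost_per_unit_inr": 55}
kraft_liner = {"strength_score": 7, "weight_capacity_kg": 12,
               "biodegradability_score": 9, "cost_per_unit_inr": 50}

# P002 (Laptop) from the products preview.
laptop = {"product_weight_kg": 2.1, "required_strength_score": 8,
          "preferred_biodegradability_score": 7, "max_packaging_cost_inr": 260}

print(meets_requirements(single_wall, laptop))  # prints True
print(meets_requirements(kraft_liner, laptop))  # prints False (strength 7 < 8)
```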


```python
import pandas as pd

# Paths are relative to the notebook's location
materials_path = "../data/processed/materials_dataset.csv"
products_path = "../data/processed/products_dataset.csv"

# Load both processed CSV files
materials_df = pd.read_csv(materials_path)
products_df = pd.read_csv(products_path)

# Report (rows, columns) for each dataset
print("Materials dataset shape:", materials_df.shape)
print("Products dataset shape:", products_df.shape)

materials_df.head()
```

```text
Materials dataset shape: (120, 10)
Products dataset shape: (175, 9)
```


|   | material_id | material_name | strength_score | weight_capacity_kg | biodegradability_score | co2_emission_kg | recyclability_percent | cost_per_unit_inr | product_category | used_for_products |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M001 | Single-wall corrugated cardboard | 8 | 15 | 9 | 1.6 | 92 | 55 | Electronics | Shipping boxes |
| 1 | M002 | Double-wall corrugated cardboard | 9 | 20 | 8 | 2.0 | 90 | 75 | Electronics | Heavy-duty boxes |
| 2 | M003 | Triple-wall corrugated cardboard | 10 | 30 | 7 | 2.5 | 88 | 95 | Industrial | Machinery packaging |
| 3 | M004 | Kraft linerboard | 7 | 12 | 9 | 1.5 | 90 | 50 | Food | Outer cartons |
| 4 | M005 | Test linerboard | 7 | 10 | 8 | 1.8 | 85 | 48 | Retail | Packaging cartons |


```python
products_df.head()
```


|   | product_id | product_name | product_category | product_weight_kg | fragility_level | required_strength_score | preferred_biodegradability_score | max_packaging_cost_inr | temperature_sensitive |
|---|---|---|---|---|---|---|---|---|---|
| 0 | P001 | Smartphone | Electronics | 0.22 | High | 7 | 7 | 100 | No |
| 1 | P002 | Laptop | Electronics | 2.1 | High | 8 | 7 | 260 | No |
| 2 | P003 | Tablet | Electronics | 0.55 | High | 7 | 7 | 160 | No |
| 3 | P004 | Smartwatch | Electronics | 0.18 | Medium | 6 | 7 | 90 | No |
| 4 | P005 | Bluetooth Earbuds | Electronics | 0.12 | Medium | 6 | 7 | 75 | No |


## 5. Why PostgreSQL Was Used

PostgreSQL was chosen as the database system because:
- It supports structured relational data.
- It allows efficient filtering and querying using SQL.
- It integrates easily with backend APIs and machine learning workflows.
- It scales to larger datasets and is widely used in industry.

Storing data in PostgreSQL enables rule-based decision making in Module 2.


## 6. Database Setup

- Database Name: ecopackai_db
- Schema: public
- Tables Created:
  - materials
  - products

Important design decision:
- material_id and product_id are stored as TEXT to support values like M001 and P001.
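
The reason for TEXT identifiers can be shown with a short sketch. The helper names are hypothetical, and the ID pattern (one uppercase letter followed by three digits) is an assumption based only on the examples M001 and P001.

```python
import re

# Assumption: IDs look like one uppercase letter plus three digits (M001, P001).
ID_PATTERN = re.compile(r"[A-Z]\d{3}")


def is_valid_id(value: str) -> bool:
    """True if the value matches the assumed ID pattern."""
    return ID_PATTERN.fullmatch(value) is not None


def fits_integer_column(value: str) -> bool:
    """True if the value could be stored in an integer column."""
    try:
        int(value)
    except ValueError:
        return False
    return True


print(fits_integer_column("M001"))  # prints False: the letter prefix is not numeric
print(is_valid_id("M001"))          # prints True
```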


## 7. Table Creation SQL

### Materials Table and Products Table
```sql
-- Materials table: one row per packaging material.
-- DROP TABLE makes the script re-runnable, but it deletes any existing rows.
DROP TABLE IF EXISTS materials;

CREATE TABLE materials (
  material_id TEXT PRIMARY KEY,  -- e.g., M001
  material_name TEXT,
  strength_score INT,
  weight_capacity_kg NUMERIC,
  biodegradability_score INT,
  co2_emission_kg NUMERIC,
  recyclability_percent INT,
  cost_per_unit_inr NUMERIC,
  product_category TEXT,
  used_for_products TEXT
);

-- Products table: one row per product and its packaging requirements.
DROP TABLE IF EXISTS products;

CREATE TABLE products (
  product_id TEXT PRIMARY KEY,  -- e.g., P001
  product_name TEXT,
  product_category TEXT,
  product_weight_kg NUMERIC,
  fragility_level TEXT,
  required_strength_score INT,
  preferred_biodegradability_score INT,
  max_packaging_cost_inr NUMERIC,
  temperature_sensitive TEXT
);
```


## 8. Data Import Process
CSV files were imported into PostgreSQL using pgAdmin:

Steps:
1. Right-click table → Import/Export Data
2. Select Import option
3. Choose CSV file from data/processed/
4. Enable Header option
5. Use comma (,) as delimiter

With the Header option enabled, pgAdmin treats the first line of each CSV file as column names and does not load it as a data row.
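
The same import could also be scripted instead of done through pgAdmin. The following is a minimal sketch of that alternative, not the method used in this module: `build_copy_sql` and `load_csv` are hypothetical helpers, and `conn` is assumed to be an open psycopg2 connection to ecopackai_db (for example from `psycopg2.connect(...)` with your own credentials).

```python
def build_copy_sql(table: str, columns: list) -> str:
    """Build a COPY ... FROM STDIN statement for a comma-delimited CSV with a header row.

    Table and column names are inserted as-is, so pass only trusted names.
    """
    column_list = ", ".join(columns)
    return (
        f"COPY {table} ({column_list}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER ',')"
    )


def load_csv(conn, table: str, columns: list, csv_path: str) -> None:
    """Stream a CSV file into a table over an open psycopg2 connection (assumed)."""
    with open(csv_path, "r", encoding="utf-8", newline="") as f, conn.cursor() as cur:
        # copy_expert sends the file contents to the server as COPY input.
        cur.copy_expert(build_copy_sql(table, columns), f)
    conn.commit()


print(build_copy_sql("products", ["product_id", "product_name"]))
# prints COPY products (product_id, product_name) FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER ',')
```

The `HEADER true` and `DELIMITER ','` options correspond to steps 4 and 5 of the pgAdmin procedure.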

## 9. Data Verification

After import, the following SQL queries were used to verify data integrity:

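```sql
-- Row counts should match the number of data rows in each CSV file
SELECT COUNT(*) FROM materials;
SELECT COUNT(*) FROM products;
```

Results confirmed:
- Materials records: 120
- Products records: 175

The first rows of each table were then previewed:

```sql
SELECT * FROM materials LIMIT 5;
SELECT * FROM products LIMIT 5;
```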
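
The count check can also be automated by comparing the CSV files against the database. This is a minimal sketch with hypothetical helper names; `db_count` is assumed to be the value returned by the `SELECT COUNT(*)` query for the same table, and the expected counts are the ones reported above.

```python
import csv
import io

# Record counts reported by the verification queries.
EXPECTED_ROWS = {"materials": 120, "products": 175}


def count_data_rows(csv_file) -> int:
    """Count data rows in an open CSV file, excluding the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)  # blank lines are ignored


def counts_agree(table: str, csv_file, db_count: int) -> bool:
    """True if the CSV, the database count, and the reported count all match."""
    return count_data_rows(csv_file) == db_count == EXPECTED_ROWS[table]


# Small in-memory example; a real check would open the file in data/processed/.
sample = io.StringIO("product_id,product_name\nP001,Smartphone\nP002,Laptop\n")
print(count_data_rows(sample))  # prints 2
```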





## 10. Screenshots

Screenshots have been captured and stored in:
reports/Screenshots/

These include:
- Database tables in pgAdmin
- Record count verification
- Sample data previews

## 11. Conclusion

Module 1 successfully established the data foundation for EcoPackAI.
Realistic materials and products datasets were collected, cleaned, and stored in PostgreSQL.
The verified database will be used in Module 2 for filtering and ranking sustainable packaging materials.
