# EcoPackAI – Module 2: Rule-Based Material Selection

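Module 2 uses the cleaned materials and products datasets to filter packaging materials against each product's requirements (minimum strength, preferred biodegradability, maximum cost) and then rank the remaining candidates by CO₂ emissions, cost, and recyclability.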


In [46]:
import sys
import os

sys.path.append(os.path.abspath(".."))


In [51]:
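import os

import pandas as pd
from sqlalchemy import create_engine, text

# ✅ Update these values based on your pgAdmin settings
DB_USER = "postgres"
# Read the password from the environment instead of hard-coding it in the notebook.
# The variable name ECOPACKAI_DB_PASSWORD is a placeholder; use whatever name you set.
DB_PASSWORD = os.environ["ECOPACKAI_DB_PASSWORD"]
DB_HOST = "localhost"
DB_PORT = "5432"
DB_NAME = "ecopackai_db"   # your database name

engine = create_engine(
    f"postgresql+psycopg2://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
)

# create_engine() is lazy, so open a connection and run a trivial query
# before reporting success.
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))

print("✅ Connected to PostgreSQL successfully!")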


✅ Connected to PostgreSQL successfully!


In [18]:
import pandas as pd

materials_df = pd.read_csv("../data/processed/materials_dataset.csv")
products_df = pd.read_csv("../data/processed/products_dataset.csv")

print(materials_df.shape)
print(products_df.shape)


(120, 10)
(175, 9)


In [3]:
print("Materials columns:", list(materials_df.columns))
print("Products columns:", list(products_df.columns))


Materials columns: ['material_id', 'material_name', 'strength_score', 'weight_capacity_kg', 'biodegradability_score', 'co2_emission_kg', 'recyclability_percent', 'cost_per_unit_inr', 'product_category', 'used_for_products']
Products columns: ['product_id', 'product_name', 'product_category', 'product_weight_kg', 'fragility_level', 'required_strength_score', 'preferred_biodegradability_score', 'max_packaging_cost_inr', 'temperature_sensitive']


In [10]:
print("Missing values in Materials dataset:")
print(materials_df.isnull().sum())

print("\nMissing values in Products dataset:")
print(products_df.isnull().sum())


Missing values in Materials dataset:
material_id               0
material_name             0
strength_score            0
weight_capacity_kg        0
biodegradability_score    0
co2_emission_kg           0
recyclability_percent     0
cost_per_unit_inr         0
product_category          0
used_for_products         0
dtype: int64

Missing values in Products dataset:
product_id                          0
product_name                        0
product_category                    0
product_weight_kg                   0
fragility_level                     0
required_strength_score             0
preferred_biodegradability_score    0
max_packaging_cost_inr              0
temperature_sensitive               0
dtype: int64


In [11]:
print("Duplicate rows in Materials:", materials_df.duplicated().sum())
print("Duplicate rows in Products:", products_df.duplicated().sum())


Duplicate rows in Materials: 0
Duplicate rows in Products: 0


In [12]:
print("Materials data types:")
print(materials_df.dtypes)

print("\nProducts data types:")
print(products_df.dtypes)


Materials data types:
material_id                object
material_name              object
strength_score              int64
weight_capacity_kg          int64
biodegradability_score      int64
co2_emission_kg           float64
recyclability_percent       int64
cost_per_unit_inr           int64
product_category           object
used_for_products          object
dtype: object

Products data types:
product_id                           object
product_name                         object
product_category                     object
product_weight_kg                   float64
fragility_level                      object
required_strength_score               int64
preferred_biodegradability_score      int64
max_packaging_cost_inr                int64
temperature_sensitive                object
dtype: object


In [13]:
print("Strength score range:", materials_df["strength_score"].min(), "to", materials_df["strength_score"].max())
print("Biodegradability score range:", materials_df["biodegradability_score"].min(), "to", materials_df["biodegradability_score"].max())
print("CO2 emission range:", materials_df["co2_emission_kg"].min(), "to", materials_df["co2_emission_kg"].max())
print("Cost per unit range:", materials_df["cost_per_unit_inr"].min(), "to", materials_df["cost_per_unit_inr"].max())


Strength score range: 4 to 10
Biodegradability score range: 1 to 10
CO2 emission range: 0.9 to 7.5
Cost per unit range: 38 to 300


In [14]:
print("Product weight range:", products_df["product_weight_kg"].min(), "to", products_df["product_weight_kg"].max())
print("Required strength range:", products_df["required_strength_score"].min(), "to", products_df["required_strength_score"].max())
print("Fragility levels:", products_df["fragility_level"].unique())


Product weight range: 0.06 to 25.0
Required strength range: 3 to 10
Fragility levels: ['High' 'Medium' 'Low']


### Data Cleaning Summary

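- No missing values were detected in either the materials (120 rows) or products (175 rows) dataset.
- No duplicate rows were found.
- Numerical features fall within plausible ranges: strength scores 4 to 10, biodegradability scores 1 to 10, CO₂ emissions 0.9 to 7.5 kg, and cost per unit 38 to 300 INR.
- No imputation or row removal was required.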

The datasets are clean and suitable for rule-based filtering and feature engineering.
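
The checks summarized above can be bundled into one reusable helper. The following is a minimal sketch, assuming pandas; the function name `summarize_quality` and its `expected_ranges` argument are illustrative and not part of the original notebook, and the range bounds passed in are assumptions rather than documented limits.

```python
import pandas as pd


def summarize_quality(df, expected_ranges):
    """Return missing-value, duplicate, and out-of-range counts for a DataFrame.

    expected_ranges maps a column name to an inclusive (low, high) pair.
    The function name and this argument are illustrative assumptions.
    """
    out_of_range = {
        # between() is False for NaN, so missing values also count as out of range.
        col: int((~df[col].between(low, high)).sum())
        for col, (low, high) in expected_ranges.items()
    }
    return {
        "missing": int(df.isnull().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        "out_of_range": out_of_range,
    }


# Three rows copied from the materials table shown in this notebook.
sample = pd.DataFrame(
    {
        "material_id": ["M001", "M004", "M005"],
        "strength_score": [8, 7, 7],
        "co2_emission_kg": [1.6, 1.5, 1.8],
        "cost_per_unit_inr": [55, 50, 48],
    }
)

report = summarize_quality(
    sample,
    # Assumed bounds: a 1-10 scale for strength and non-negative CO2 emissions.
    expected_ranges={"strength_score": (1, 10), "co2_emission_kg": (0, float("inf"))},
)
print(report)
```

Returning plain counts keeps the helper easy to assert on, so the same check can be rerun whenever the CSV files are regenerated.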


In [20]:

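product_name = "Smartphone"

# Fail with a clear message if the product is missing, instead of an IndexError from .iloc[0]
matches = products_df[products_df["product_name"] == product_name]
if matches.empty:
    raise ValueError(f"Product not found: {product_name!r}")
selected_product = matches.iloc[0]
selected_product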


product_id                                 P001
product_name                         Smartphone
product_category                    Electronics
product_weight_kg                          0.22
fragility_level                            High
required_strength_score                       7
preferred_biodegradability_score              7
max_packaging_cost_inr                      100
temperature_sensitive                        No
Name: 0, dtype: object

In [21]:
filtered_materials = materials_df[
    (materials_df["strength_score"] >= selected_product["required_strength_score"]) &
    (materials_df["biodegradability_score"] >= selected_product["preferred_biodegradability_score"]) &
    (materials_df["cost_per_unit_inr"] <= selected_product["max_packaging_cost_inr"])
].copy()

print("Selected product:", selected_product["product_name"])
print("Filtered materials count:", len(filtered_materials))

filtered_materials.head()


Selected product: Smartphone
Filtered materials count: 11


Unnamed: 0,material_id,material_name,strength_score,weight_capacity_kg,biodegradability_score,co2_emission_kg,recyclability_percent,cost_per_unit_inr,product_category,used_for_products
0,M001,Single-wall corrugated cardboard,8,15,9,1.6,92,55,Electronics,Shipping boxes
1,M002,Double-wall corrugated cardboard,9,20,8,2.0,90,75,Electronics,Heavy-duty boxes
2,M003,Triple-wall corrugated cardboard,10,30,7,2.5,88,95,Industrial,Machinery packaging
3,M004,Kraft linerboard,7,12,9,1.5,90,50,Food,Outer cartons
4,M005,Test linerboard,7,10,8,1.8,85,48,Retail,Packaging cartons


In [22]:
ranked_materials = filtered_materials.sort_values(
    by=["co2_emission_kg", "cost_per_unit_inr", "recyclability_percent"],
    ascending=[True, True, False]
)

ranked_materials.head(10)


Unnamed: 0,material_id,material_name,strength_score,weight_capacity_kg,biodegradability_score,co2_emission_kg,recyclability_percent,cost_per_unit_inr,product_category,used_for_products
3,M004,Kraft linerboard,7,12,9,1.5,90,50,Food,Outer cartons
0,M001,Single-wall corrugated cardboard,8,15,9,1.6,92,55,Electronics,Shipping boxes
18,M019,Kenaf fiber board,7,10,9,1.6,80,88,Industrial,Boards
15,M016,Jute fiber sheet,7,12,9,1.7,80,95,Agriculture,Produce sacks
4,M005,Test linerboard,7,10,8,1.8,85,48,Retail,Packaging cartons
111,M112,Corrugated mailers,7,10,9,1.8,90,75,E-commerce,Book shipping
24,M025,Honeycomb paperboard,8,18,9,1.8,90,90,Industrial,Pallets
7,M008,Folding boxboard (FBB),7,9,8,2.0,88,65,Pharma,Medicine cartons
1,M002,Double-wall corrugated cardboard,9,20,8,2.0,90,75,Electronics,Heavy-duty boxes
6,M007,Solid unbleached board (SUB),8,12,7,2.2,85,70,Food,Dry food boxes


In [23]:
ranked_materials.head(20).to_csv("../data/processed/module2_ranked_materials_top20.csv", index=False)
print("Saved: data/processed/module2_ranked_materials_top20.csv")


Saved: data/processed/module2_ranked_materials_top20.csv


### Result Interpretation

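For the selected product (Smartphone), 11 of the 120 materials satisfied all three constraints: strength score ≥ 7, biodegradability score ≥ 7, and cost ≤ 100 INR per unit.
These materials were then sorted by CO₂ emissions (lowest first), with ties broken by cost (lowest first) and then recyclability (highest first).
Under this ordering, Kraft linerboard (M004) ranks first at 1.5 kg CO₂ and 50 INR per unit, followed by Single-wall corrugated cardboard (M001).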
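
The filter-then-sort logic in the cells above can be wrapped in a single function so it is easy to rerun for other products. This is a minimal sketch, assuming pandas; the name `filter_and_rank` is hypothetical, and the row `M999` is made-up demo data added only to show the cost filter rejecting a material.

```python
import pandas as pd


def filter_and_rank(materials, product):
    """Apply the three hard constraints, then sort lexicographically."""
    mask = (
        (materials["strength_score"] >= product["required_strength_score"])
        & (materials["biodegradability_score"] >= product["preferred_biodegradability_score"])
        & (materials["cost_per_unit_inr"] <= product["max_packaging_cost_inr"])
    )
    # Lowest CO2 first, then lowest cost, then highest recyclability.
    return materials[mask].sort_values(
        by=["co2_emission_kg", "cost_per_unit_inr", "recyclability_percent"],
        ascending=[True, True, False],
    )


# M001, M004, and M005 are copied from the notebook output; M999 is invented
# (hypothetical) and exceeds the cost limit on purpose.
materials = pd.DataFrame(
    {
        "material_id": ["M001", "M004", "M005", "M999"],
        "strength_score": [8, 7, 7, 9],
        "biodegradability_score": [9, 9, 8, 9],
        "co2_emission_kg": [1.6, 1.5, 1.8, 1.0],
        "recyclability_percent": [92, 90, 85, 95],
        "cost_per_unit_inr": [55, 50, 48, 150],
    }
)
# Requirements of the Smartphone product as shown in the notebook.
product = {
    "required_strength_score": 7,
    "preferred_biodegradability_score": 7,
    "max_packaging_cost_inr": 100,
}

ranked = filter_and_rank(materials, product)
print(ranked["material_id"].tolist())  # ['M004', 'M001', 'M005']
```

Because the sort is lexicographic, cost and recyclability only matter when two materials have identical CO₂ emissions; a weighted score is needed if the criteria should trade off against each other.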


In [25]:
filtered_materials = filtered_materials.copy()


In [26]:
filtered_materials["strength_score_norm"] = (
    filtered_materials["strength_score"] / filtered_materials["strength_score"].max()
)


In [27]:
filtered_materials["recyclability_score"] = filtered_materials["recyclability_percent"] / 100


In [28]:
filtered_materials["biodegradability_score_norm"] = filtered_materials["biodegradability_score"] / 10


In [29]:
filtered_materials[["material_name","strength_score_norm","recyclability_score","biodegradability_score_norm"]].head()


Unnamed: 0,material_name,strength_score_norm,recyclability_score,biodegradability_score_norm
0,Single-wall corrugated cardboard,0.8,0.92,0.9
1,Double-wall corrugated cardboard,0.9,0.9,0.8
2,Triple-wall corrugated cardboard,1.0,0.88,0.7
3,Kraft linerboard,0.7,0.9,0.9
4,Test linerboard,0.7,0.85,0.8


In [30]:
co2_max = filtered_materials["co2_emission_kg"].max()
filtered_materials["co2_score"] = 1 - (filtered_materials["co2_emission_kg"] / co2_max)


In [31]:
filtered_materials[["material_name","co2_emission_kg","co2_score"]].head()


Unnamed: 0,material_name,co2_emission_kg,co2_score
0,Single-wall corrugated cardboard,1.6,0.36
1,Double-wall corrugated cardboard,2.0,0.2
2,Triple-wall corrugated cardboard,2.5,0.0
3,Kraft linerboard,1.5,0.4
4,Test linerboard,1.8,0.28


In [32]:
cost_max = filtered_materials["cost_per_unit_inr"].max()
filtered_materials["cost_score"] = 1 - (filtered_materials["cost_per_unit_inr"] / cost_max)


In [33]:
filtered_materials[["material_name","cost_per_unit_inr","cost_score"]].head()


Unnamed: 0,material_name,cost_per_unit_inr,cost_score
0,Single-wall corrugated cardboard,55,0.421053
1,Double-wall corrugated cardboard,75,0.210526
2,Triple-wall corrugated cardboard,95,0.0
3,Kraft linerboard,50,0.473684
4,Test linerboard,48,0.494737


In [34]:
filtered_materials["strength_margin"] = (
    filtered_materials["strength_score"] - selected_product["required_strength_score"]
)


In [35]:
filtered_materials["weight_fit_score"] = (
    filtered_materials["weight_capacity_kg"] >= selected_product["product_weight_kg"]
).astype(int)


In [36]:
filtered_materials[["material_name","strength_margin","weight_fit_score"]].head()


Unnamed: 0,material_name,strength_margin,weight_fit_score
0,Single-wall corrugated cardboard,1,1
1,Double-wall corrugated cardboard,2,1
2,Triple-wall corrugated cardboard,3,1
3,Kraft linerboard,0,1
4,Test linerboard,0,1


In [37]:
fragility_map = {"Low": 0.3, "Medium": 0.6, "High": 1.0}
fragility_weight = fragility_map.get(selected_product["fragility_level"], 0.6)

filtered_materials["fragility_weight"] = fragility_weight


In [38]:
filtered_materials["sustainability_score"] = (
    0.40 * filtered_materials["biodegradability_score_norm"] +
    0.30 * filtered_materials["recyclability_score"] +
    0.30 * filtered_materials["co2_score"]
)


In [39]:
filtered_materials[["material_name","sustainability_score"]].head()


Unnamed: 0,material_name,sustainability_score
0,Single-wall corrugated cardboard,0.744
1,Double-wall corrugated cardboard,0.65
2,Triple-wall corrugated cardboard,0.544
3,Kraft linerboard,0.75
4,Test linerboard,0.659


In [40]:
filtered_materials["material_suitability_score"] = (
    0.30 * filtered_materials["strength_score_norm"] +
    0.25 * filtered_materials["sustainability_score"] +
    0.20 * filtered_materials["cost_score"] +
    0.15 * filtered_materials["weight_fit_score"] +
    0.10 * (filtered_materials["strength_score_norm"] * fragility_weight)
)


In [41]:
final_ranked = filtered_materials.sort_values("material_suitability_score", ascending=False)
final_ranked[["material_id","material_name","material_suitability_score","sustainability_score","cost_score","co2_score"]].head(10)


Unnamed: 0,material_id,material_name,material_suitability_score,sustainability_score,cost_score,co2_score
0,M001,Single-wall corrugated cardboard,0.740211,0.744,0.421053,0.36
1,M002,Double-wall corrugated cardboard,0.714605,0.65,0.210526,0.2
3,M004,Kraft linerboard,0.712237,0.75,0.473684,0.4
4,M005,Test linerboard,0.693697,0.659,0.494737,0.28
2,M003,Triple-wall corrugated cardboard,0.686,0.544,0.0,0.0
6,M007,Solid unbleached board (SUB),0.665382,0.571,0.263158,0.12
24,M025,Honeycomb paperboard,0.659026,0.714,0.052632,0.28
7,M008,Folding boxboard (FBB),0.654158,0.644,0.315789,0.2
111,M112,Corrugated mailers,0.650605,0.714,0.210526,0.28
18,M019,Kenaf fiber board,0.621737,0.708,0.073684,0.36


In [43]:
final_ranked.to_csv("../data/processed/module2_feature_engineered_rankings.csv", index=False)
print("Saved: data/processed/module2_feature_engineered_rankings.csv")


Saved: data/processed/module2_feature_engineered_rankings.csv


In [48]:
from src.feature_engineering import engineer_features

feature_engineered_df = engineer_features(filtered_materials, selected_product)
final_ranked = feature_engineered_df.sort_values("material_suitability_score", ascending=False)
final_ranked.head(10)


Unnamed: 0,material_id,material_name,strength_score,weight_capacity_kg,biodegradability_score,co2_emission_kg,recyclability_percent,cost_per_unit_inr,product_category,used_for_products,strength_score_norm,recyclability_score,biodegradability_score_norm,co2_score,cost_score,strength_margin,weight_fit_score,fragility_weight,sustainability_score,material_suitability_score
0,M001,Single-wall corrugated cardboard,8,15,9,1.6,92,55,Electronics,Shipping boxes,0.8,0.92,0.9,0.36,0.421053,1,1,1.0,0.744,0.740211
1,M002,Double-wall corrugated cardboard,9,20,8,2.0,90,75,Electronics,Heavy-duty boxes,0.9,0.9,0.8,0.2,0.210526,2,1,1.0,0.65,0.714605
3,M004,Kraft linerboard,7,12,9,1.5,90,50,Food,Outer cartons,0.7,0.9,0.9,0.4,0.473684,0,1,1.0,0.75,0.712237
4,M005,Test linerboard,7,10,8,1.8,85,48,Retail,Packaging cartons,0.7,0.85,0.8,0.28,0.494737,0,1,1.0,0.659,0.693697
2,M003,Triple-wall corrugated cardboard,10,30,7,2.5,88,95,Industrial,Machinery packaging,1.0,0.88,0.7,0.0,0.0,3,1,1.0,0.544,0.686
6,M007,Solid unbleached board (SUB),8,12,7,2.2,85,70,Food,Dry food boxes,0.8,0.85,0.7,0.12,0.263158,1,1,1.0,0.571,0.665382
24,M025,Honeycomb paperboard,8,18,9,1.8,90,90,Industrial,Pallets,0.8,0.9,0.9,0.28,0.052632,1,1,1.0,0.714,0.659026
7,M008,Folding boxboard (FBB),7,9,8,2.0,88,65,Pharma,Medicine cartons,0.7,0.88,0.8,0.2,0.315789,0,1,1.0,0.644,0.654158
111,M112,Corrugated mailers,7,10,9,1.8,90,75,E-commerce,Book shipping,0.7,0.9,0.9,0.28,0.210526,0,1,1.0,0.714,0.650605
18,M019,Kenaf fiber board,7,10,9,1.6,80,88,Industrial,Boards,0.7,0.8,0.9,0.36,0.073684,0,1,1.0,0.708,0.621737


In [49]:
print("materials_df:", materials_df.shape)
print("products_df:", products_df.shape)
print("selected_product:", selected_product["product_name"])
print("filtered_materials:", filtered_materials.shape)


materials_df: (120, 10)
products_df: (175, 9)
selected_product: Smartphone
filtered_materials: (11, 20)


In [50]:
final_ranked.to_csv("../data/processed/module2_feature_engineered_rankings.csv", index=False)
print("Saved: data/processed/module2_feature_engineered_rankings.csv")


Saved: data/processed/module2_feature_engineered_rankings.csv


### Feature Engineering Summary

In this module, additional features were engineered to improve material comparison and ranking.
These include normalized strength, recyclability, and biodegradability scores; inverted CO₂ and cost scores
(where higher is better); a strength margin; a weight-fit flag; a fragility weight; a sustainability score;
and a final material suitability score. Because the strength, CO₂, and cost scores are scaled by the maximum
within the filtered candidate set, they are comparable only among materials shortlisted for the same product.
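
The notebook imports `engineer_features` from `src.feature_engineering` but does not show its body. The sketch below is a hypothetical reconstruction assembled from the individual cells above, not the project's actual implementation; the name `engineer_features_sketch` is an assumption, and the three input rows are copied from the notebook output.

```python
import pandas as pd

FRAGILITY_MAP = {"Low": 0.3, "Medium": 0.6, "High": 1.0}


def engineer_features_sketch(materials, product):
    """Recompute the notebook's engineered columns on a copy of `materials`."""
    df = materials.copy()
    # Scale to [0, 1]: strength by the set maximum, the others by fixed scales.
    df["strength_score_norm"] = df["strength_score"] / df["strength_score"].max()
    df["recyclability_score"] = df["recyclability_percent"] / 100
    df["biodegradability_score_norm"] = df["biodegradability_score"] / 10
    # Lower CO2 and lower cost are better, so invert after scaling by the maximum.
    df["co2_score"] = 1 - df["co2_emission_kg"] / df["co2_emission_kg"].max()
    df["cost_score"] = 1 - df["cost_per_unit_inr"] / df["cost_per_unit_inr"].max()
    df["weight_fit_score"] = (
        df["weight_capacity_kg"] >= product["product_weight_kg"]
    ).astype(int)
    # Unknown fragility levels fall back to the "Medium" weight, as in the notebook.
    fragility_weight = FRAGILITY_MAP.get(product["fragility_level"], 0.6)
    df["sustainability_score"] = (
        0.40 * df["biodegradability_score_norm"]
        + 0.30 * df["recyclability_score"]
        + 0.30 * df["co2_score"]
    )
    df["material_suitability_score"] = (
        0.30 * df["strength_score_norm"]
        + 0.25 * df["sustainability_score"]
        + 0.20 * df["cost_score"]
        + 0.15 * df["weight_fit_score"]
        + 0.10 * df["strength_score_norm"] * fragility_weight
    )
    return df.sort_values("material_suitability_score", ascending=False)


# M003 holds the maximum strength, CO2, and cost in the filtered set, so this
# three-row subset is scaled the same way as the full 11-row result.
subset = pd.DataFrame(
    {
        "material_id": ["M001", "M003", "M004"],
        "strength_score": [8, 10, 7],
        "weight_capacity_kg": [15, 30, 12],
        "biodegradability_score": [9, 7, 9],
        "co2_emission_kg": [1.6, 2.5, 1.5],
        "recyclability_percent": [92, 88, 90],
        "cost_per_unit_inr": [55, 95, 50],
    }
)
smartphone = {"product_weight_kg": 0.22, "fragility_level": "High"}

result = engineer_features_sketch(subset, smartphone)
print(result[["material_id", "material_suitability_score"]].round(6))
```

Working on a copy keeps the caller's DataFrame unchanged, which avoids the column build-up that occurs when the scoring cells are rerun in place.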


### Module 2 Summary

This module uses rule-based logic to filter and rank packaging materials
based on product requirements such as strength, sustainability, and cost.
For the selected product (Smartphone), the system identified 11 suitable
materials. Ranked by material suitability score, Single-wall corrugated cardboard (M001)
came first at 0.740, while Kraft linerboard (M004) had the highest sustainability score (0.750).
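
For reference, the two composite scores computed in the notebook can be written as formulas, followed by the arithmetic for the top-ranked material (M001). The symbols are shorthand introduced here and do not appear in the notebook: $b$ is `biodegradability_score_norm`, $r$ is `recyclability_score`, $c$ is `co2_score`, $s$ is `strength_score_norm`, $k$ is `cost_score`, $w$ is `weight_fit_score`, and $f$ is `fragility_weight`.

```latex
\begin{aligned}
S_{\text{sus}}  &= 0.40\,b + 0.30\,r + 0.30\,c \\
S_{\text{suit}} &= 0.30\,s + 0.25\,S_{\text{sus}} + 0.20\,k + 0.15\,w + 0.10\,s\,f \\[6pt]
\text{M001:}\quad
S_{\text{sus}}  &= 0.40(0.9) + 0.30(0.92) + 0.30(0.36) = 0.744 \\
S_{\text{suit}} &= 0.30(0.8) + 0.25(0.744) + 0.20(0.421053) + 0.15(1) + 0.10(0.8)(1.0) \\
                &= 0.24 + 0.186 + 0.084211 + 0.15 + 0.08 \approx 0.740211
\end{aligned}
```

For the Smartphone, $f = 1.0$ for every candidate and $w = 1$ for all rows shown, so the weight-fit term adds a constant 0.15 and normalized strength effectively carries a combined weight of 0.40.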
