<a href="https://colab.research.google.com/github/somewhatclueless07/greenwashing_detection/blob/main/greenwashing_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Problem Statement:**
Brands often market products as eco-friendly using buzzwords or natural-looking packaging, even when the product isn’t sustainable.
This practice is called **greenwashing**. This project focuses on automatically detecting greenwashing in **food products** by analyzing both
text descriptions and packaging materials, combining **Machine Learning** with **CNN-based image analysis**. The goal is to help
consumers make more informed and sustainable choices.
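
To make the idea concrete, here is a minimal sketch of a rule-based check that flags a product when its description uses green buzzwords while its packaging tag suggests plastic. The buzzword list, the `en:plastic` tag value, and the function names are assumptions for illustration only. This is not the project's actual model, which relies on learned text and image features.

```python
import re

# Hypothetical buzzword list, chosen only for illustration (an assumption).
GREEN_BUZZWORDS = ["eco-friendly", "natural", "green", "sustainable", "organic"]


def count_buzzwords(text):
    """Count how many distinct buzzwords appear in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(
        1
        for word in GREEN_BUZZWORDS
        if re.search(r"\b" + re.escape(word) + r"\b", lowered)
    )


def looks_suspicious(description, packaging_tags):
    """Naive rule: green claims in the text combined with plastic packaging.

    The 'plastic' substring check on packaging_tags is an assumption about
    how packaging is encoded, not a confirmed schema.
    """
    has_claims = count_buzzwords(description) > 0
    has_plastic = "plastic" in packaging_tags.lower()
    return has_claims and has_plastic


example = "All-natural, eco-friendly snack in green packaging"
print(count_buzzwords(example))
print(looks_suspicious(example, "en:plastic"))
```

A hand-written rule like this only serves as a baseline. It cannot weigh context, which is why the notebook moves on to learned features.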

In [None]:
!pip install tensorflow scikit-learn pandas numpy pillow requests matplotlib joblib



In [None]:
# Import libraries
import pandas as pd
import numpy as np
import os
import requests
from PIL import Image
from io import BytesIO
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models
import joblib

In [None]:
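# Download the Open Food Facts export (about 1.1 GB).
# -nc skips the download if the file already exists, avoiding duplicate ".1" copies.
!wget -nc https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv.gz

# Read a small sample to inspect the available columns.
# low_memory=False avoids mixed-dtype warnings from chunked type inference.
df_sample = pd.read_csv(
    "en.openfoodfacts.org.products.csv.gz",
    sep='\t',
    nrows=5000,
    low_memory=False
)

print(df_sample.columns.tolist())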

--2025-11-01 16:20:38--  https://static.openfoodfacts.org/data/en.openfoodfacts.org.products.csv.gz
Resolving static.openfoodfacts.org (static.openfoodfacts.org)... 213.36.253.214
Connecting to static.openfoodfacts.org (static.openfoodfacts.org)|213.36.253.214|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://openfoodfacts-ds.s3.eu-west-3.amazonaws.com/en.openfoodfacts.org.products.csv.gz [following]
--2025-11-01 16:20:39--  https://openfoodfacts-ds.s3.eu-west-3.amazonaws.com/en.openfoodfacts.org.products.csv.gz
Resolving openfoodfacts-ds.s3.eu-west-3.amazonaws.com (openfoodfacts-ds.s3.eu-west-3.amazonaws.com)... 3.5.204.12, 3.5.205.175
Connecting to openfoodfacts-ds.s3.eu-west-3.amazonaws.com (openfoodfacts-ds.s3.eu-west-3.amazonaws.com)|3.5.204.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1173187100 (1.1G) [application/gzip]
Saving to: ‘en.openfoodfacts.org.products.csv.gz.1’


2025-11-01 16:21:19 (28.0 

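Because `usecols` raises an error when a requested column is missing, it can help to read only the header row first and keep the columns that actually exist. The sketch below illustrates this on a tiny stand-in file so that it runs without the full download. The helper name `available_columns` and the sample rows are hypothetical.

```python
import gzip
import os
import tempfile

import pandas as pd


def available_columns(path, wanted, sep="\t"):
    """Read only the header row and return the wanted columns that exist."""
    header = pd.read_csv(path, sep=sep, nrows=0).columns
    return [c for c in wanted if c in header]


# Tiny stand-in file (made-up rows) so the sketch runs without the 1.1 GB export.
tmp_path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(tmp_path, "wt", encoding="utf-8") as f:
    f.write("product_name\tbrands\tpackaging\n")
    f.write("Oat bar\tAcme\tpaper\n")

wanted = ["product_name", "packaging", "image_url"]
present = available_columns(tmp_path, wanted)
print(present)

# Load only the columns that were found in the header.
df_small = pd.read_csv(tmp_path, sep="\t", usecols=present)
print(df_small.shape)
```

The same helper could be pointed at `en.openfoodfacts.org.products.csv.gz` before building the `cols` list, since column names in the export may change over time.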
In [None]:
# Load dataset: keep only the columns relevant to text, packaging, and images
cols = [
    'product_name', 'brands', 'categories', 'main_category_en',
    'packaging', 'packaging_tags', 'packaging_text',
    'environmental_score_grade', 'carbon-footprint_100g',
    'image_url', 'image_small_url'
]

# Load first 5000 rows with only these columns
df = pd.read_csv(
    "en.openfoodfacts.org.products.csv.gz",
    sep='\t',
    nrows=5000,
    usecols=cols,
    low_memory=False
)

print(df.head(5))
print("Shape of dataset:", df.shape)


                     product_name packaging packaging_tags packaging_text  \
0                             NaN       NaN            NaN            NaN   
1                             NaN       NaN            NaN            NaN   
2  Entrecôesteack - Highland Beef      Glas       en:glass            NaN   
3                             NaN       NaN            NaN            NaN   
4                             NaN       NaN            NaN            NaN   

                   brands           categories environmental_score_grade  \
0                     NaN                  NaN                       NaN   
1                     NaN                  NaN                       NaN   
2  PG Tips, green organic  Nutrition drink mix                   unknown   
3                     NaN                  NaN                       NaN   
4                     NaN                  NaN                       NaN   

      main_category_en                                          image_url  \
0  