# Automated Halal Product Identification through Ingredient Analysis

## Problem Statement

Develop an automated system that determines whether a food product is halal by analyzing and interpreting its ingredient list. This is particularly useful when purchasing unfamiliar products overseas.

## Objective

- Create a model that classifies food products as halal, non-halal, or doubtful based on their ingredient lists.
    - Ingredients of uncertain status are then checked through the OpenAI API to determine whether they are halal.

- Enhance the ability of consumers, especially Muslims, to make informed choices when purchasing food items abroad.



## Import libraries 

In [1]:
%%time
import base64
import requests
import pandas as pd
import re
import toml
import openai

CPU times: total: 31.2 ms
Wall time: 908 ms


## Import ingredients file needed for the analysis

- The ingredient list is consolidated from:
    - [the MUIS Food Additive List](https://www.muis.gov.sg/-/media/Files/Halal/Documents/FOOD-ADDITIVE-LISTING-5.ashx)
    - [World of Islam Food Numbers](https://special.worldofislam.info/Food/numbers.html)
    - [Islamcan.com](https://islamcan.com/blog/2020/01/halal-and-haram-ingredient-database/)
    
    

### Data Dictionary

| Column Name               | Description                                                                                               |
|---------------------------|-----------------------------------------------------------------------------------------------------------|
| `ingred_name`             | Code or short identifier for each ingredient.                                                             |
| `chem_name`               | The chemical name of the ingredient.                                                                      |
| `description`             | A brief description of the ingredient, indicating its use or properties.                                 |
| `halal_non_halal_doubtful` | Numerical value indicating the halal status: 0 for Halal, 1 for Non-Halal, 2 for Doubtful.               |
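
The numeric codes in `halal_non_halal_doubtful` are easier to read once mapped to labels. The snippet below is a minimal sketch of such a mapping; the names `STATUS_LABELS` and `label_for` are illustrative and not part of the original notebook.

```python
# Labels for the numeric codes described in the data dictionary.
# STATUS_LABELS and label_for are hypothetical helper names.
STATUS_LABELS = {0: "Halal", 1: "Non-Halal", 2: "Doubtful"}


def label_for(code):
    """Return a readable label; any code other than 0, 1 or 2 becomes 'Unknown'."""
    return STATUS_LABELS.get(code, "Unknown")


print(label_for(0))
print(label_for(None))
```

Treating every unexpected value as "Unknown" keeps missing or malformed rows from being silently reported as halal.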


In [2]:
df = pd.read_csv('./data/halal_non_halal_ingred.csv')

# Lowercase all string columns and strip surrounding whitespace so lookups match
for col in df.columns:
    if df[col].dtype == 'object':  # Check if the column is of string type
        df[col] = df[col].str.lower().str.strip()

## Halal Status Determination Code Summary

### Function: `create_lookup_table(df)`
- **Purpose**: Creates a lookup table from a DataFrame.
- **Process**:
  - Selects relevant columns.
  - Converts DataFrame to a dictionary using `ingred_name` as the key.

### Function: `parse_ingredients(ingredients_text)`
- **Purpose**: Normalizes and parses the ingredient text.
- **Process**:
  - Converts text to lowercase.
  - Replaces and removes certain characters and phrases.
  - Splits text based on commas outside parentheses.
  - Strips whitespace and returns a list of ingredients.

### Function: `check_halal_status(ingredients, lookup_table)`
- **Purpose**: Determines the Halal status of a product based on its ingredients.
- **Process**:
  - Initializes default product status as 'Halal'.
  - Iterates through ingredients to check their status in the lookup table.
  - Updates product status to 'Non-Halal' or 'Doubtful' based on ingredient status.
  - Returns a tuple with the product's Halal status and a list of unknown ingredients.


In [3]:
%%time

# Function to create a lookup table from the pre-processed DataFrame
def create_lookup_table(df):
    # Use only required columns and create a dictionary for lookup
    lookup_table = df.set_index('ingred_name')['halal_non_halal_doubtful'].to_dict()
    return lookup_table


def parse_ingredients(ingredients_text):
    # Normalize the text
    ingredients_text = ingredients_text.lower().replace("[", "(").replace("]", ")")
    
    # Turn newlines into commas so that each line becomes its own item
    ingredients_text = ingredients_text.replace("\n", ",")

    # Remove everything from "allergen information" onwards
    ingredients_text = re.sub(r'allergen information.*', '', ingredients_text)
    
    # Replace specific phrases
    phrases_to_remove = ["the ingredients listed in the image are:\n\n", "ingredients:", 'the ingredients listed on the image', 
                         "here are the ingredients as listed on the image:\n\n ", "here are the ingredients as listed on the image: ", "are as follows:  -",
                        "the ingredients listed on the image are as follows:\n\n ", "are as follows:", "are:", "the ingredients listed", 
                         "the image shows a list of ingredients", "which includes items like", 
                         "the list is a typical example of ingredients you might find on the packaging of a processed food product",
                         "and it provides important information for consumers about what is in the product as well as potential allergens they should be aware of",
                        "the text also mentions","here are the ingredients exactly as they appear in the image:", "sure", 
                        "here is the list of ingredients exactly as they appear in the image:", "in the image", "the ingredients  are listed as follows:", "on the packaging"]
    
    for phrase in phrases_to_remove:
        ingredients_text = ingredients_text.replace(phrase, "")

    # Drop any newline characters that remain
    ingredients_text = ingredients_text.replace("\n", "")
        
    # Replace semicolons with commas
    ingredients_text = ingredients_text.replace(';', ',')
    
    # Remove hyphens (bullet markers); this also strips hyphens inside names
    ingredients_text = ingredients_text.replace("-", "")
    
    # Remove all periods, including a trailing full stop
    ingredients_text = ingredients_text.replace('.', '')
    
    # Split based on commas not within parentheses
    ingredients = re.split(r',\s*(?![^()]*\))', ingredients_text)
    
    # Strip whitespace from each ingredient
    parsed_ingredients = [ingredient.strip() for ingredient in ingredients if ingredient.strip()]
    return parsed_ingredients


def check_halal_status(ingredients, lookup_table):
    # Lookup values are numeric codes: 0 = halal, 1 = non-halal, 2 = doubtful
    found_non_halal = False
    found_doubtful = False
    unknown_ingredients = []

    for ingredient in ingredients:
        # Ingredients missing from the lookup table come back as None
        status = lookup_table.get(ingredient.lower())

        if status == 1:
            found_non_halal = True
        elif status == 2:
            found_doubtful = True
        elif status != 0:
            unknown_ingredients.append(ingredient)

    # Precedence: a non-halal ingredient outweighs doubtful or unknown ones,
    # so the result no longer depends on the order of the ingredients
    if found_non_halal:
        product_halal_status = 'Non-Halal'
    elif found_doubtful or unknown_ingredients:
        product_halal_status = 'Doubtful'
    else:
        product_halal_status = 'Halal'

    # Return a tuple: product status and list of unknown ingredients
    return product_halal_status, unknown_ingredients

CPU times: total: 0 ns
Wall time: 0 ns


## Configuration Loading for OpenAI API Key

- **Purpose**: Loads the OpenAI API key from a TOML configuration file.
    - Keeping the key in a separate file means it never appears in the notebook itself.

In [4]:
%%time

with open('secrets.toml', 'r') as f:
    config = toml.load(f)

# Set the OpenAI API key from the loaded configuration
openai.api_key = config['api']['openai_key']


CPU times: total: 0 ns
Wall time: 1 ms
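
The cell above expects `secrets.toml` to contain an `[api]` table with an `openai_key` entry. As a hedged sketch, the loader below first checks an environment variable and only falls back to the TOML file when it is unset. The function name `load_openai_key` and the choice of `OPENAI_API_KEY` as the variable name are assumptions for illustration, not part of the original notebook.

```python
import os


def load_openai_key(path="secrets.toml", env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, else from [api].openai_key in a TOML file.

    Hypothetical helper: the default path and variable name are assumptions.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    # Imported lazily so that the environment-variable path needs no extra package
    import toml
    with open(path, "r") as f:
        return toml.load(f)["api"]["openai_key"]
```

With this helper, the assignment in the cell above could become `openai.api_key = load_openai_key()`.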


## Chat Completions with OpenAI

- For more information, see: https://platform.openai.com/docs/guides/text-generation/chat-completions-api

### Function: `is_halal(ingredient_list, headers)`

- Asks the OpenAI Chat Completions API (`gpt-3.5-turbo`) whether a given list of ingredients is halal and returns the model's free-text answer.


In [5]:
%%time

def is_halal(ingredient_list, headers):
    payload = {
        "model": "gpt-3.5-turbo",  # You can choose your preferred model
        "messages": [
            {"role": "system", "content": "You are a helpful assistant that provides information."},
            {"role": "user", "content": f"Determine if the ingredients in '{ingredient_list}' are halal."}
        ]
    }

    try:
        response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
        response.raise_for_status()  # This will raise an exception for HTTP errors
        return response.json()['choices'][0]['message']['content']
    except requests.RequestException as e:
        return f"An error occurred: {e}"

CPU times: total: 0 ns
Wall time: 0 ns
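
Because `is_halal` returns free text, its answer is hard to use programmatically. One possible refinement, sketched below under stated assumptions, is to ask for a JSON object and parse it defensively. The helpers `build_messages` and `parse_verdicts` are hypothetical, and the model is not guaranteed to follow the JSON instruction, so anything that cannot be parsed falls back to "doubtful".

```python
import json

# Verdicts accepted from the model; anything else is treated as doubtful.
VALID_VERDICTS = {"halal", "non-halal", "doubtful"}


def build_messages(ingredients):
    """Build chat messages that request one verdict per ingredient as JSON."""
    listing = "\n".join(f"- {name}" for name in ingredients)
    return [
        {"role": "system",
         "content": "You classify food ingredients. Reply with JSON only: "
                    "an object mapping each ingredient to 'halal', "
                    "'non-halal' or 'doubtful'."},
        {"role": "user", "content": f"Classify these ingredients:\n{listing}"},
    ]


def parse_verdicts(reply_text, ingredients):
    """Map each ingredient to a verdict, defaulting to 'doubtful' on any problem."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    verdicts = {}
    for name in ingredients:
        value = str(data.get(name, "")).lower()
        verdicts[name] = value if value in VALID_VERDICTS else "doubtful"
    return verdicts
```

The result of `build_messages` would replace the `messages` list in the payload, and `parse_verdicts` would be applied to the returned message content.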


## Reading text from images using the OpenAI Vision API

- GPT-4 with Vision, sometimes referred to as GPT-4V or `gpt-4-vision-preview` in the API, allows the model to take in images and answer questions about them.
- For more information, see: https://platform.openai.com/docs/guides/vision

In [6]:
# Path to your image
image_path = "./ingredient_picture/hersheys.jpg"

In [7]:

# Function to encode the image to base64
def encode_image(image_path):
    # Let I/O errors propagate: returning an error message here would
    # silently be sent to the API as if it were image data
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

In [8]:
%%time

# Getting the base64 string
base64_image = encode_image(image_path)

# Build the request headers using the API key loaded earlier
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {openai.api_key}"
}

# Payload for the request
payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please provide a list of the ingredients exactly as they appear in the image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 800
}

# Make the API request to OpenAI
response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

CPU times: total: 15.6 ms
Wall time: 5.6 s
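
The payload above hard-codes `image/jpeg` in the data URL, which is only correct for JPEG files. The sketch below derives the MIME type from the file extension using the standard library. The helper name `image_to_data_url` is illustrative and not part of the original notebook.

```python
import base64
import mimetypes


def image_to_data_url(image_path):
    """Return a base64 data URL whose MIME type is guessed from the file extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognised image type: {image_path}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string could be used directly as the `url` value of the `image_url` entry in the payload.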


## Halal Status Determination Process

- **Response Validation**: Checks if the HTTP response is successful (status code 200).
- **Data Extraction and Parsing**: Extracts ingredient text from the response and parses it into a list.
- **Lookup Table Creation**: Generates a lookup table for ingredient statuses from a DataFrame.
- **Halal Status Check**: Determines the overall Halal status of the product.
- **Handling Unknown Ingredients**:
  - Lists unknown ingredients, if any.
  - Further checks their Halal status using the `is_halal` function.
- **Error Handling**: Outputs error information for non-200 response statuses.

**Overview**: The cell below chains these steps: it parses the Vision API response, checks each ingredient against the lookup table, and sends any unknown ingredients to `is_halal` for a further check.

In [9]:
# Main logic
if response.status_code == 200:
    response_data = response.json()

    # Extracting the content from the response
    ingredients_text = response_data['choices'][0]['message']['content']
    
    # Parse the ingredients_text to get the ingredients_list with main and sub-ingredients
    ingredients_list = parse_ingredients(ingredients_text)

    # Creating the lookup table from the pre-processed DataFrame
    lookup_table = create_lookup_table(df)

    # Checking the status of each ingredient
    product_status, unknown_ingredients = check_halal_status(ingredients_list, lookup_table)

    # Print the results
    print(f"""{ingredients_text}
    
    Product Halal Status:, {product_status}""")
    
    if unknown_ingredients:
        print("Unknown Ingredients:", unknown_ingredients)
        halal_status_response = is_halal(unknown_ingredients, headers)
        print(f"""Status of {unknown_ingredients}: 
              {halal_status_response}""")    
else:
    # Use response.text because an error body is not guaranteed to be valid JSON
    print("Error:", response.status_code, response.text)

The ingredients listed on the packaging are:

- Milk Chocolate [Sugar, Skim Milk Powder (Cow's Milk), Cocoa Mass, Cocoa Butter, Milk Fat (Cow's Milk), Lactose (Cow's Milk), Vegetable Oil (Palm, Sunflower), Alkalized Cocoa Powder];
- Emulsifiers: Soy Lecithin (INS 322), Polyglycerol Polyricinoleate (INS 476) from Castor Oil;
- Artificial Flavor: Vanillin.

Allergen Information:
- Contains Milk and Soy.
- May contain traces of Almonds, Hazelnuts and Wheat.
    
    Product Halal Status:, Doubtful
Unknown Ingredients: ['on the packaging', 'vegetable oil (palm, sunflower), alkalized cocoa powder)']
Status of ['on the packaging', 'vegetable oil (palm, sunflower), alkalized cocoa powder)']: 
              To determine if the ingredients in "vegetable oil (palm, sunflower), alkalized cocoa powder" are halal, you would need to consider the source of the vegetable oil and the method used to refine it. 

1. Palm Oil: Palm oil is generally considered halal unless it is sourced from non-halal anim

## Conclusion

**Effective in Simple Ingredient Detection:** The pipeline handles straightforward, flat ingredient lists well. Nested lists are harder: in the example above, the bracketed sub-ingredients of milk chocolate were split incorrectly, leaving `vegetable oil (palm, sunflower), alkalized cocoa powder)` as a single unknown item.

## Future Developments

**Streamlining Phrase Recognition:** Instead of continually adding new phrases to the removal list, it would be more practical to build a module that isolates the ingredient names directly, so that a new response wording does not require a new phrase.
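
As a rough illustration of this idea, the sketch below locates the ingredient lines structurally instead of matching fixed phrases. It cuts the reply at an allergen heading, drops an introductory line ending in a colon, and strips bullet markers. The function name `extract_ingredient_block` and these heuristics are assumptions based on the single response shown above, so other response layouts may need different rules.

```python
import re


def extract_ingredient_block(reply):
    """Heuristically keep only the ingredient lines of a model reply."""
    # Discard everything from a line starting with "allergen information" onwards
    body = re.split(r"(?im)^\s*allergen information\b", reply, maxsplit=1)[0]
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]

    # Drop an introductory sentence such as "The ingredients ... are:"
    if len(lines) > 1 and lines[0].endswith(":"):
        lines = lines[1:]

    cleaned = []
    for ln in lines:
        ln = re.sub(r"^[-*]\s*", "", ln)                  # leading bullet marker
        ln = re.sub(r"(?i)^ingredients\s*:\s*", "", ln)   # "Ingredients:" label
        cleaned.append(ln.rstrip(";. "))                  # trailing punctuation
    return ", ".join(cleaned)


sample = (
    "The ingredients listed on the packaging are:\n\n"
    "- Milk Chocolate [Sugar, Cocoa Butter];\n"
    "- Artificial Flavor: Vanillin.\n\n"
    "Allergen Information:\n"
    "- Contains Milk and Soy.\n"
)
print(extract_ingredient_block(sample))
```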

**Detailed Analysis of Sub-Ingredients:** To better determine halal status, the model should be able to analyze not just main ingredients but also their sub-ingredients. A more detailed approach is required to understand the complexity of ingredient lists and their individual components.
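
A regular expression cannot reliably track nested brackets, which is why the example above produced a malformed item. A minimal sketch of an alternative is to scan the text while counting open brackets and split only at depth zero. The helpers `split_top_level` and `expand` are hypothetical names, not part of the original notebook.

```python
def split_top_level(text, sep=","):
    """Split on `sep` only where no (, [ or { is currently open."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)  # tolerate stray closing brackets
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def expand(ingredient):
    """Return (name, sub_ingredients) for text shaped like 'name (a, b (c), d)'."""
    ingredient = ingredient.strip()
    open_idx = ingredient.find("(")
    if open_idx == -1 or not ingredient.endswith(")"):
        return ingredient, []
    inner = ingredient[open_idx + 1:-1]
    return ingredient[:open_idx].strip(), split_top_level(inner)


text = ("milk chocolate (sugar, skim milk powder (cow's milk), "
        "vegetable oil (palm, sunflower), alkalized cocoa powder), vanillin")
items = split_top_level(text)
print(items)
print(expand(items[0]))
```

Each sub-ingredient returned by `expand` could then be looked up individually, and expanded again if it has its own bracketed list.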

**Refining Halal Status Determination:** Assessing whether an ingredient is halal involves understanding its source and how it's processed. Enhancing the model to incorporate external data on ingredient sourcing and processing will improve its ability to accurately classify halal status.

**Ongoing Model Improvement:** As food ingredients and manufacturing processes change, the model must evolve. Implementing a learning mechanism that allows the model to update its knowledge base with new information will ensure it stays accurate and relevant.

**Incorporating User Feedback:** A feedback button in the application would let users suggest improvements or report inaccuracies, so that the model can adapt and evolve based on user experiences and insights.
