# Keto/Vegan Diet Classifier
Argmax, a consulting firm specializing in search and recommendation solutions with offices in New York and Israel, is hiring entry-level Data Scientists and Machine Learning Engineers.

At Argmax, we prioritize strong coding skills and a proactive, “get-things-done” attitude over a perfect resume. As part of our selection process, candidates are required to complete a coding task demonstrating their practical abilities.

In this task, you’ll work with a large recipe dataset sourced from Allrecipes.com. Your challenge will be to classify recipes based on their ingredients, accurately identifying keto (low-carb) and vegan (no animal products) dishes.

Successfully completing this assignment is a crucial step toward joining Argmax’s talented team.

In [1]:
# !pip install opensearch-py


In [2]:
# !pip install python-decouple


In [None]:
import json
import sys
import unittest
from argparse import ArgumentParser
from time import time
from typing import List

import pandas as pd
from decouple import config
from opensearchpy import OpenSearch
from sklearn.metrics import classification_report

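# Local instructions

Start the existing OpenSearch container from a Bash shell:

`docker start opensearch-dev`

Or, if the container does not exist yet, create it from a command prompt (CMD):

`docker run -d --name opensearch-dev -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "plugins.security.disabled=true" opensearchproject/opensearch:2.11.1`

Then confirm that the container is running:

`docker ps`

Once it is up, open http://localhost:9200/ to check that OpenSearch responds.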

In [4]:

client = OpenSearch(
    hosts=['http://localhost:9200'],
    http_auth=('admin', 'Segev2025!'),
    use_ssl=False,
    verify_certs=False,
    ssl_show_warn=False
)




In [5]:

if not client.indices.exists(index="recipes"):
    client.indices.create(index="recipes")

doc = {
    "description": "This is a simple egg recipe"
}
client.index(index="recipes", id=1, body=doc)

client.indices.refresh(index="recipes")



{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}

In [6]:

query = {
    "query": {
        "match": {
            "description": "egg"
        }
    }
}
res = client.search(index="recipes", body=query)




In [7]:
res

{'took': 181,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 0.2876821,
  'hits': [{'_index': 'recipes',
    '_id': '1',
    '_score': 0.2876821,
    '_source': {'description': 'This is a simple egg recipe'}}]}}

In [8]:
res['hits']

{'total': {'value': 1, 'relation': 'eq'},
 'max_score': 0.2876821,
 'hits': [{'_index': 'recipes',
   '_id': '1',
   '_score': 0.2876821,
   '_source': {'description': 'This is a simple egg recipe'}}]}

In [9]:
res['hits']['hits']

[{'_index': 'recipes',
  '_id': '1',
  '_score': 0.2876821,
  '_source': {'description': 'This is a simple egg recipe'}}]

In [10]:
hits = res['hits']['hits']

In [11]:
pd.DataFrame(hits)

|   | _index | _id | _score | _source |
|---|---|---|---|---|
| 0 | recipes | 1 | 0.287682 | {'description': 'This is a simple egg recipe'} |
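
Because `_source` arrives as a nested dict, the DataFrame above keeps it in a single column. A minimal sketch of flattening each hit before building the frame follows; `flatten_hits` is a hypothetical helper (not part of opensearch-py), and `sample_hits` simply mirrors the response shown earlier so the snippet runs without a live cluster.

```python
def flatten_hits(hits):
    """Merge each hit's metadata with its `_source` fields into one flat dict."""
    rows = []
    for hit in hits:
        row = {"_id": hit["_id"], "_score": hit["_score"]}
        row.update(hit.get("_source", {}))  # lift document fields to the top level
        rows.append(row)
    return rows


# Sample mirroring the search response above (hypothetical stand-in for `hits`).
sample_hits = [
    {"_index": "recipes", "_id": "1", "_score": 0.2876821,
     "_source": {"description": "This is a simple egg recipe"}},
]
flat_rows = flatten_hits(sample_hits)
print(flat_rows)
```

Passing the result to `pd.DataFrame(flatten_hits(hits))` should then give one column per document field instead of a single `_source` column.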


In [12]:
pd.DataFrame(res)

|   | took | timed_out | _shards | hits |
|---|---|---|---|---|
| total | 181 | False | 1.0 | {'value': 1, 'relation': 'eq'} |
| successful | 181 | False | 1.0 | NaN |
| skipped | 181 | False | 0.0 | NaN |
| failed | 181 | False | 0.0 | NaN |
| max_score | 181 | False | NaN | 0.287682 |
| hits | 181 | False | NaN | [{'_index': 'recipes', '_id': '1', '_score': 0... |


# Recipes Index
Our data is stored in OpenSearch, and you can query it using either Elasticsearch syntax or SQL.
## Elasticsearch Syntax

In [13]:
query = {
    "query": {
        "match": {
            "description": { "query": "egg" }
        }
    }
}

res = client.search(
    index="recipes",
    body=query,
    size=2
)

hits = res['hits']['hits']
hits 

[{'_index': 'recipes',
  '_id': '1',
  '_score': 0.2876821,
  '_source': {'description': 'This is a simple egg recipe'}}]

In [14]:
res

{'took': 27,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 0.2876821,
  'hits': [{'_index': 'recipes',
    '_id': '1',
    '_score': 0.2876821,
    '_source': {'description': 'This is a simple egg recipe'}}]}}

## SQL Syntax

In [15]:
query = """
SELECT *
FROM recipes
WHERE description like '%egg%'
---LIMIT 10
"""

res = client.sql.query(body={'query': query})

df = pd.DataFrame(res["datarows"], columns=[c["name"] for c in res["schema"]])
df

|   | description |
|---|---|
| 0 | This is a simple egg recipe |


# Task Instructions

Your goal is to implement two classifiers:

1.	Vegan Meal Classifier
1.	Keto Meal Classifier

Unlike typical supervised machine learning tasks, the labels are not provided in the dataset. Instead, you will rely on clear and verifiable definitions to classify each meal based on its ingredients.

### Definitions:

1. **Vegan Meal**: Contains no animal products whatsoever (no eggs, milk, meat, etc.).
1. **Keto Meal**: Contains no ingredients with more than 10g of carbohydrates per 100g serving. For example, eggs are keto-friendly, while apples are not.

Note that some meals may meet both vegan and keto criteria (e.g., meals containing avocados), though most meals typically fall into neither category.
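
The keto definition reduces to a threshold test on carbohydrate density. The sketch below assumes a per-ingredient lookup of grams of carbohydrate per 100 g. The `CARBS_PER_100G` table and the function name are hypothetical, and the values are rough illustrative figures that should be replaced with a cited nutrition database.

```python
KETO_CARB_LIMIT = 10.0  # grams of carbohydrate per 100 g, from the definition above

# Hypothetical lookup table with rough, illustrative values only.
CARBS_PER_100G = {
    "egg": 1.0,
    "olive oil": 0.0,
    "avocado": 9.0,
    "apple": 14.0,
    "sugar": 100.0,
}


def is_keto_by_carbs(ingredients, table=CARBS_PER_100G, limit=KETO_CARB_LIMIT):
    """A meal is keto only if no ingredient exceeds the carb limit.

    Unknown ingredients are treated as non-keto, which is a conservative
    assumption; a real classifier may choose differently.
    """
    for name in ingredients:
        carbs = table.get(name.lower().strip())
        if carbs is None or carbs > limit:
            return False
    return True


print(is_keto_by_carbs(["egg", "avocado"]))  # True
print(is_keto_by_carbs(["egg", "apple"]))    # False
```

Treating unknown ingredients as non-keto trades recall for precision; the opposite default may suit a dataset where most unmatched ingredients are spices or water.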

## Example heuristic:

In [17]:
def is_ingredient_vegan(ing):
    for animal_product in "egg meat milk butter veal lamb beef chicken sausage".split():
        if animal_product in ing:
            return False
    return True

def is_vegan_example(ingredients):
    return all(map(is_ingredient_vegan, ingredients))
    
# df["vegan"] = df["ingredients"].apply(is_vegan_example)


### Limitations of the Simplistic Heuristic

The heuristic described above is straightforward but can lead to numerous false positives and negatives due to its reliance on keyword matching. Common examples of incorrect classifications include:
- "Peanut butter" being misclassified as non-vegan, as “butter” is incorrectly assumed to imply dairy.
- "eggless" recipes being misclassified as non-vegan, due to the substring “egg.”
- Animal-derived ingredients such as “pork” and “bacon” being misclassified as vegan, because they are missing from the keyword set.
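
One way to reduce these errors is to match whole words instead of substrings and to strip known plant-based phrases first. This is only a minimal sketch: the names `ANIMAL_WORDS`, `VEGAN_PHRASES`, and `is_ingredient_vegan_v2` are hypothetical, and both lists are deliberately incomplete.

```python
import re

# Illustrative, incomplete keyword lists (assumptions, not an official vocabulary).
ANIMAL_WORDS = {"egg", "eggs", "meat", "milk", "butter", "veal", "lamb",
                "beef", "chicken", "sausage", "pork", "bacon"}
VEGAN_PHRASES = ("peanut butter", "almond milk", "coconut milk")


def is_ingredient_vegan_v2(ingredient):
    text = ingredient.lower()
    # Drop known plant-based phrases so "butter" in "peanut butter" is ignored.
    for phrase in VEGAN_PHRASES:
        text = text.replace(phrase, " ")
    # Tokenize on letters: "eggless" stays one token and never equals "egg".
    words = re.findall(r"[a-z]+", text)
    return not any(word in ANIMAL_WORDS for word in words)


for example in ["peanut butter", "eggless mayonnaise", "bacon strips", "unsalted butter"]:
    print(example, "->", is_ingredient_vegan_v2(example))
```

Hyphenated forms such as "egg-free" would still be flagged, because the tokenizer splits them into "egg" and "free"; handling them needs an extra rule.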


# Submission
## 1. Implement Diet Classifiers
Complete the two classifier functions in the diet_classifiers.py file within this repository. Ensure your implementation correctly identifies “keto” and “vegan” meals. After implementing these functions, verify that the Flask server displays the appropriate badges (“keto” and “vegan”) next to the corresponding recipes.

> **Note**
>
> This repo contains two `diet_classifiers.py` files:
> 1. One in this folder (`nb/src/diet_classifiers.py`)
> 2. One in the Flask web app folder (`web/src/diet_classifiers.py`)
>
> You can develop your solution here in the notebook environment, but to apply it
> to the Flask app you must copy your implementation into the `diet_classifiers.py`
> file in the Flask folder.

In [18]:
def is_ingredient_keto(ingredient):
    # TODO: complete
    return False

def is_ingredient_vegan(ingredient):
    # TODO: complete
    return False    

For your convenience, you can sanity-check your solution on a subset of labeled recipes by running `diet_classifiers.py`:

In [19]:
! python diet_classifiers.py --ground_truth /usr/src/data/ground_truth_sample.csv

Traceback (most recent call last):
  File "C:\Users\segev\code_notebooks\HTs\diet_classifiers.py", line 60, in <module>
    sys.exit(main(parser.parse_args()))
  File "C:\Users\segev\code_notebooks\HTs\diet_classifiers.py", line 34, in main
    ground_truth = pd.read_csv(args.ground_truth, index_col=None)
  File "C:\Users\segev\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\segev\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 678, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\segev\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\segev\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 932, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "C:\Users\segev\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1216, in _m

## 2. Repository Setup
Create a **private** GitHub repository for your solution, and invite the GitHub user `argmax2025` as a collaborator. **Do not** share your implementation using a **forked** repository.

## 3. Application Form
Once you’ve completed the implementation and shared your private GitHub repository with argmax2025, please fill out the appropriate application form:
1. [US Application Form](https://forms.clickup.com/25655193/f/rexwt-1832/L0YE9OKG2FQIC3AYRR)
2. [IL Application Form](https://forms.clickup.com/25655193/f/rexwt-1812/IP26WXR9X4P6I4LGQ6)


Your application will not be considered complete until this form is submitted.

## Evaluation process


Your submission will be assessed based on the following criteria:


1.	**Readability & Logic** – Clearly explain your approach, including your reasoning and any assumptions. If you relied on external resources (e.g., ingredient databases, nutrition datasets), be sure to cite them.
2.	**Executability** – Your code should run as is when cloned from your GitHub repository. Ensure that all paths are relative, syntax is correct, and no manual setup is required.
3.	**Accuracy** – Your classifiers will be evaluated against a holdout set of 20,000 recipes with verified labels. Performance will be compared to the ground truth data.
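
Accuracy against labeled data can be checked locally before submitting. Below is a minimal sketch of precision, recall, and F1 for one binary label in plain Python. The function name and the toy labels are hypothetical, and the actual evaluation may compute its metrics differently.

```python
def binary_metrics(y_true, y_pred):
    """Return precision, recall and F1 for the positive (True) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (not taken from the dataset).
truth = [True, False, True, False]
preds = [True, True, False, False]
metrics = binary_metrics(truth, preds)
print(metrics)
```

`sklearn.metrics.classification_report`, imported at the top of the notebook, reports the same quantities per class and is the more convenient choice once the predictions are in a DataFrame.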


## Next steps
If your submission passes the initial review, you’ll be invited to a 3-hour live coding interview, where you’ll be asked to extend and adapt your solution in real time.

Please make sure you join from a quiet environment and have access to a Python-ready workstation capable of running your submitted project.

# SOLUTION

In [None]:
def is_ingredient_keto(ingredient: str) -> bool:
    """Return False if the ingredient contains a known high-carb keyword."""
    non_keto = {
        "sugar", "honey", "flour", "bread", "pasta", "rice", "corn", "potato",
        "banana", "beans", "lentils", "quinoa", "chickpeas", "oats"
    }
    # Substring match, so "rice noodles" and "breadcrumbs" are caught as well.
    return not any(word in ingredient.lower() for word in non_keto)

def is_ingredient_vegan(ingredient: str) -> bool:
    """Return False if any whole word of the ingredient is an animal product."""
    animal_products = {
        "egg", "eggs", "omelette", "meat", "milk", "cheese", "butter", "veal", "lamb",
        "beef", "chicken", "sausage", "fish", "turkey", "salmon", "anchovy", "yogurt", "cream"
    }
    # Whole-word match: "eggplant" is not flagged, but "peanut butter" still is.
    words = ingredient.lower().replace("-", " ").split()
    return not any(word in animal_products for word in words)

def classify_recipe(ingredients):
    """Return a label such as "vegan keto", with "X" for each diet that fails."""
    # A recipe qualifies only if every one of its ingredients qualifies.
    is_vegan = all(is_ingredient_vegan(ing) for ing in ingredients)
    is_keto = all(is_ingredient_keto(ing) for ing in ingredients)

    vegan_str = "vegan" if is_vegan else "X"
    keto_str = "keto" if is_keto else "X"
    return f"{vegan_str} {keto_str}"


In [None]:


class TestRecipeClassification(unittest.TestCase):
    
    def test_is_ingredient_keto(self):
        self.assertTrue(is_ingredient_keto("olive oil"))
        self.assertTrue(is_ingredient_keto("cauliflower"))
        self.assertFalse(is_ingredient_keto("sugar"))
        self.assertFalse(is_ingredient_keto("rice noodles"))
        self.assertFalse(is_ingredient_keto("potato"))
        self.assertTrue(is_ingredient_keto("almonds"))

    def test_is_ingredient_vegan(self):
        self.assertTrue(is_ingredient_vegan("tofu"))
        self.assertTrue(is_ingredient_vegan("spinach"))
        self.assertFalse(is_ingredient_vegan("chicken breast"))
        self.assertFalse(is_ingredient_vegan("milk chocolate"))
        self.assertFalse(is_ingredient_vegan("eggplant omelette"))
        self.assertTrue(is_ingredient_vegan("soy sauce"))

    def test_classify_recipe(self):
        self.assertEqual(
            classify_recipe(["cauliflower", "olive oil", "salt", "pepper"]),
            "vegan keto"
        )
        self.assertEqual(
            classify_recipe(["chicken breast", "olive oil", "garlic"]),
            "X keto"
        )
        self.assertEqual(
            classify_recipe(["flour", "sugar", "eggs", "milk"]),
            "X X"
        )
        self.assertEqual(
            classify_recipe(["tofu", "spinach", "soy sauce", "rice"]),
            "vegan X"
        )

# ✅ Jupyter-friendly runner:
unittest.main(argv=[''], exit=False)


In [None]:
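# Recipe-level helpers: a recipe qualifies only if every ingredient does.
def is_keto(ingredients):
    return all(is_ingredient_keto(ing) for ing in ingredients)

def is_vegan(ingredients):
    return all(is_ingredient_vegan(ing) for ing in ingredients)

def classify_recipe(ingredients):
    return {"vegan": is_vegan(ingredients), "keto": is_keto(ingredients)}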


In [None]:


class TestRecipeClassification(unittest.TestCase):

    def test_is_ingredient_keto(self):
        self.assertTrue(is_ingredient_keto("avocado"))
        self.assertTrue(is_ingredient_keto("olive oil"))
        self.assertFalse(is_ingredient_keto("sugar"))
        self.assertFalse(is_ingredient_keto("pasta sauce"))

    def test_is_ingredient_vegan(self):
        self.assertTrue(is_ingredient_vegan("tomato"))
        self.assertTrue(is_ingredient_vegan("lentils"))
        self.assertFalse(is_ingredient_vegan("milk chocolate"))
        self.assertFalse(is_ingredient_vegan("chicken breast"))

    def test_is_keto(self):
        self.assertTrue(is_keto(["avocado", "olive oil", "spinach"]))
        self.assertFalse(is_keto(["avocado", "bread", "spinach"]))

    def test_is_vegan(self):
        self.assertTrue(is_vegan(["broccoli", "rice", "tomato"]))
        self.assertFalse(is_vegan(["egg", "tofu", "rice"]))

# ✅ Jupyter-friendly test runner
unittest.main(argv=['first-arg-is-ignored'], exit=False)


# Sources

##  animal_products

### **egg, eggs**  
Eggs are laid by birds (not mammals), so they are poultry products and not dairy.  
🔗 Full link: [Are Eggs Dairy? – EatingWell](https://www.eatingwell.com/article/8027576/are-eggs-dairy/)

Additional sources:  
- https://www.mashed.com  
- https://www.nobunplease.com  
- https://www.ketobible.net  
- https://www.ketodietapp.com  
- https://www.eatingwell.com  

---

### **omelette**  
A prepared dish made primarily from eggs, and therefore derived from animals.

---

### **meat, beef, chicken, lamb, veal, turkey, fish, salmon, anchovy**  
These are all animal flesh or muscle, making them core animal-source foods.  
🔗 Listed under animal-based protein:  
[Complete Keto Food List – EatingWell](https://www.eatingwell.com/article/291245/complete-keto-diet-food-list-what-you-can-and-cannot-eat-if-youre-on-a-ketogenic-diet/)

Additional sources:  
- https://blog.kettleandfire.com  
- https://www.eatingwell.com  

---

### **sausage**  
A processed meat product, typically ground meat mixed with spices.

---

### **milk, cheese, butter, cream, yogurt**  
Classic dairy items, made from the milk of mammals (cows, goats, sheep).  
🔗 Keto dairy guide:  
[Complete Keto Food List – EatingWell](https://www.eatingwell.com/article/291245/complete-keto-diet-food-list-what-you-can-and-cannot-eat-if-youre-on-a-ketogenic-diet/)

Additional source:  
- https://www.lowfodmapeating.com

---

##  non_keto  
These are rich in carbohydrates or sugars—generally avoided in a ketogenic diet.

### **sugar, honey**  
Pure simple sugars; they spike blood glucose and disrupt ketosis.  
🔗 Listed as “avoid” on keto:  
[Complete Keto Food List – EatingWell](https://www.eatingwell.com/article/291245/complete-keto-diet-food-list-what-you-can-and-cannot-eat-if-youre-on-a-ketogenic-diet/)

Additional sources:  
- https://www.dietdoctor.com  
- https://www.eatingwell.com  
- https://www.ketodietapp.com  
- https://www.godairyfree.org

---

### **flour, bread, pasta, rice, corn, potato**  
These are starchy grains, grain products, and tubers with high digestible carbs, also listed under “foods to avoid”.  
🔗 Keto exclusion list:  
[Complete Keto Food List – EatingWell](https://www.eatingwell.com/article/291245/complete-keto-diet-food-list-what-you-can-and-cannot-eat-if-youre-on-a-ketogenic-diet/)

---

### **banana, beans, lentils, quinoa, chickpeas, oats**  
All high-carb fruits, legumes, or grains—usually excluded from strict keto eating.  
🔗 Avoided in keto meal plans:  
[Complete Keto Food List – EatingWell](https://www.eatingwell.com/article/291245/complete-keto-diet-food-list-what-you-can-and-cannot-eat-if-youre-on-a-ketogenic-diet/)
