<a href="https://colab.research.google.com/github/mlangsman/fastai-experiments/blob/main/Classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🤖 Classifier Experiment

## 📁 Install dependencies

We need the fastai library for model training and the ddgs package (DuckDuckGo search) to retrieve images for our dataset.


In [3]:
!pip install -Uqq fastai ddgs


## Download images

Create a function that searches DuckDuckGo for images and returns their URLs, which we can then download for training.

In [None]:
from ddgs import DDGS
from fastcore.all import *

def search_images(keywords, max_images=10):
  results = DDGS().images(keywords, max_results=max_images) #run a search
  imageUrls = L(results).itemgot('image') #extract just image urls
  return imageUrls


Classifying ultra-processed food (UPF) is a little trickier than, say, cats vs. dogs. Here I create a list of search terms for specific foods so we can grab images for each one.


In [53]:
# Ultra-processed food terms (UPF)
upf_terms = [
    "Big Mac burger",
    "Doritos crisps",
    "KitKat bar",
    "Oreo cookies",
    "Pringles tube",
    "Pot Noodle",
    "Twix bar",
    "Haribo sweets",
    "Coca-Cola can",
    "Fanta bottle",
    "Red Bull can",
    "Pop-Tarts",
    "Ben & Jerry's ice cream tub",
    "Chicken nuggets",
    "Pepperami",
    "Frozen pizza",
    "Mars bar",
    "Snickers bar",
    "Chocolate bar",
    "Walkers crisps",
    "Crisps"
]

# Fresh / minimally-processed foods
fresh_terms = [
    "Apple fruit",
    "Banana fruit",
    "Broccoli",
    "Carrot",
    "Tomato",
    "Cucumber",
    "Lettuce",
    "Blueberries",
    "Strawberries",
    "Eggs",
    "Whole chicken raw",
    "Salmon fillet",
    "Beef steak",
    "Brown rice bowl",
    "Oats porridge",
    "Almonds nuts",
    "Avocado",
    "Red bell pepper",
    "Courgette",
    "Mushrooms"
]
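
Hand-written term lists like these are easy to get subtly wrong: the same term can appear twice with different capitalisation, and a missing comma between two string literals makes Python silently join them into one term. Below is a minimal sketch of a sanity check for repeated terms. The helper name `find_duplicate_terms` and the sample list are my own illustration, not part of the notebook's pipeline.

```python
from collections import Counter

def find_duplicate_terms(terms):
    """Return search terms that appear more than once, ignoring case and surrounding spaces."""
    counts = Counter(t.strip().lower() for t in terms)
    return sorted(t for t, n in counts.items() if n > 1)

# Hypothetical sample list for illustration only
sample_terms = ["Pot Noodle", "Twix bar", "Pot noodle", "Crisps"]
print(find_duplicate_terms(sample_terms))  # ['pot noodle']
```

Running the same check on `upf_terms` and `fresh_terms` before downloading should flag any repeats, so the search budget is not spent fetching the same images twice.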

Let's check that these terms work by downloading a few images for each one. fastai's download_url() seems to stall on some URLs, so here I make a plain HTTP request with a short timeout to fetch the images instead.

In [None]:

from fastdownload import download_url
from fastai.vision.all import *
from io import BytesIO
import random
import requests  # imported explicitly rather than relying on fastai's star import

thumbs = []

for item in upf_terms:
  urls = search_images(item, max_images=3)
  for i, url in enumerate(urls):
    dest = f"{item.replace(' ', '_')}_{i}.jpg"
    try:
      r = requests.get(url, timeout=4, headers={"User-Agent": "Mozilla/5.0"})
      r.raise_for_status()
      if "image" not in r.headers.get("Content-Type", ""): # skip non-image content
        continue

      # Create image from raw bytes and also write to disk
      img = PILImage.create(BytesIO(r.content))
      with open(dest, "wb") as f:
        f.write(r.content)
      thumbs.append(img.to_thumb(64,64))
    except Exception as e:
      # Skip anything that fails to download or decode, but report it
      print(f"Skipping {url}: {e}")

show_images(thumbs, nrows=len(upf_terms))
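
The loop above builds file names by replacing spaces only, so a term like "Ben & Jerry's ice cream tub" ends up with `&` and `'` in its file name. Below is a small sketch of a stricter approach, assuming lowercase names with underscores are acceptable. The helper `safe_filename` is hypothetical and is not used elsewhere in this notebook.

```python
import re

def safe_filename(term, index, ext="jpg"):
    """Build a filesystem-friendly file name from a search term and an image index."""
    # Collapse every run of non-alphanumeric characters into a single underscore
    slug = re.sub(r"[^A-Za-z0-9]+", "_", term).strip("_").lower()
    return f"{slug}_{index}.{ext}"

print(safe_filename("Ben & Jerry's ice cream tub", 0))  # ben_jerry_s_ice_cream_tub_0.jpg
```

Lowercasing also means two terms that differ only by case map to the same name, so whether that is desirable depends on how the term lists are maintained.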

