SWEET PORTUGAL - Portuguese pastry identifier - Rui Cruzeiro, Jan 2023

Sweet Portugal is a tool created for my personal portfolio. It consists of a study of a Convolutional Neural Network that identifies 5 different types of Portuguese pastries and returns their name, recipe, and calorie content. It can be scaled to identify many more; however, that would require an extensive data gathering operation that lies outside the scope of this project. It would nonetheless be some delicious data mining!

NOTEBOOK 1/2 - DATA AUGMENTATION

The starting data consists of images taken from the Internet or by myself. No food was wasted during this process -- I ate every pastry I photographed. We start with the following:
- Bolo de Arroz: 78 images
- Brigadeiro: 69 images
- Ovos Moles: 68 images
- Pastel de Nata: 67 images
- Pastel de Tentúgal: 74 images

Let's get this information from their folders and store it in a dictionary:

In [1]:
import os

# each visible subfolder of raw_data holds the images of one pastry
available_pastries = [subdir for subdir in os.listdir('raw_data') if not subdir.startswith('.')]
available_pastries.sort()

pastries = {}

for pastry in available_pastries:

    pastry_path = os.path.join('raw_data', pastry)

    # count the original images, whose file names start with the pastry name
    pastries[pastry] = len(
        [name for name in os.listdir(pastry_path) if name.startswith(pastry)]
    )

We'll aim for 130 images per pastry so we can have 100 images for training. We will use the Augmentor package to create new images from the existing ones. This package raised an error when we tried to save the new images as JPEG, which is the format of the existing images, so we'll save them as PNG for now and change that later.
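
The number of new images each pastry needs is simply the target minus what we already have. A minimal sketch of that arithmetic, using the counts listed at the top of this notebook (the names `TARGET` and `starting_counts` are only illustrative and are not part of the project code):

```python
# Illustrative only: counts copied from the list at the top of this notebook.
TARGET = 130

starting_counts = {
    'bolo_arroz': 78,
    'brigadeiro': 69,
    'ovos_moles': 68,
    'pastel_nata': 67,
    'pastel_tentugal': 74,
}

# images still missing per pastry to reach the target
to_generate = {pastry: TARGET - count for pastry, count in starting_counts.items()}

for pastry, missing in to_generate.items():
    print(pastry + ': ' + str(missing) + ' new images needed')
```

This is the same `130 - count` value that gets passed to the sampling step for each pastry.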

In [2]:
import Augmentor

for pastry, count in pastries.items():
    
    # image file path
    pastry_path = os.path.join('raw_data', pastry)
    
    # tweak zoom, flipping, brightness and distortion with Augmentor
    pipe = Augmentor.Pipeline(pastry_path)
    pipe.zoom(probability=0.3, min_factor=0.8, max_factor=1.5)
    pipe.flip_top_bottom(probability=0.4)
    pipe.random_brightness(probability=0.5, min_factor=0.3, max_factor=1.2)
    pipe.random_distortion(probability=1, grid_width=4, grid_height=4, magnitude=8)
    
    # get a total of 130 images
    pipe.set_save_format(save_format='png')
    pipe.sample(130 - count)

Initialised with 78 image(s) found.
Output directory set to raw_data/bolo_arroz/output.

Processing <PIL.Image.Image image mode=RGB size=2016x1512 at 0x115FC5310>: 100


Initialised with 69 image(s) found.
Output directory set to raw_data/brigadeiro/output.

Processing <PIL.Image.Image image mode=RGB size=650x650 at 0x115F91760>: 100%|


Initialised with 68 image(s) found.
Output directory set to raw_data/ovos_moles/output.

Processing <PIL.Image.Image image mode=RGB size=960x662 at 0x116009B80>: 100%|


Initialised with 67 image(s) found.
Output directory set to raw_data/pastel_nata/output.

Processing <PIL.Image.Image image mode=RGB size=290x290 at 0x116021190>: 100%|


Initialised with 74 image(s) found.
Output directory set to raw_data/pastel_tentugal/output.

Processing <PIL.Image.Image image mode=RGB size=300x300 at 0x116024730>: 100%|
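
Each operation in the pipeline above carries its own probability, so a generated image can receive any combination of them. The following is a rough standard-library simulation of that idea, not Augmentor's actual implementation: the `simulate` function and its seed are assumptions made for illustration, and only the probabilities come from the cell above.

```python
import random

# Probabilities taken from the Augmentor cell above.
OPERATIONS = {
    'zoom': 0.3,
    'flip_top_bottom': 0.4,
    'random_brightness': 0.5,
    'random_distortion': 1.0,
}

def simulate(n_images, seed=0):
    """Return, for each simulated image, the list of operations that fired."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatability
    applied = []
    for _ in range(n_images):
        # every operation gets its own independent draw
        applied.append([name for name, p in OPERATIONS.items() if rng.random() < p])
    return applied

results = simulate(52)  # 52 = 130 - 78, the bolo_arroz shortfall
for name in OPERATIONS:
    fired = sum(name in ops for ops in results)
    print(name + ': applied to ' + str(fired) + ' of ' + str(len(results)) + ' images')
```

Because the distortion has probability 1, every generated image differs from its source, while the other three steps are mixed in at random.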


Confirming we now have 130 images for each pastry:

In [3]:
for pastry, count in pastries.items():
    
    pastry_path = os.path.join('raw_data', pastry, 'output')
    new = len([name for name in os.listdir(pastry_path) if name.endswith('.png')])
    print('New image total for ' + pastry + ': ' + str(count + new))

New image total for bolo_arroz: 130
New image total for brigadeiro: 130
New image total for ovos_moles: 130
New image total for pastel_nata: 130
New image total for pastel_tentugal: 130
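
The same check can be written as a small function that returns the totals instead of printing them, which is convenient when rerunning the notebook. This is only a sketch: `count_totals` is a hypothetical helper, and it assumes the `raw_data/<pastry>/output` layout produced above.

```python
import os

def count_totals(root, originals, extension='.png'):
    """Add the augmented files found in <root>/<pastry>/output to the original counts."""
    totals = {}
    for pastry, count in originals.items():
        output_dir = os.path.join(root, pastry, 'output')
        # a missing output folder simply means nothing was generated yet
        generated = os.listdir(output_dir) if os.path.isdir(output_dir) else []
        new = len([name for name in generated if name.endswith(extension)])
        totals[pastry] = count + new
    return totals
```

Calling `count_totals('raw_data', pastries)` right after the augmentation step should give 130 for every pastry; any other value would suggest the sampling did not complete as expected.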


Now let's rename the PNG files so they carry a .jpeg extension (this changes only the file name, not how the image data is encoded):

In [4]:
import glob

for pastry, count in pastries.items():
    
    pastry_path = os.path.join('raw_data', pastry, 'output')
    for filename in glob.iglob(os.path.join(pastry_path, '*.png')):
        os.rename(filename, filename[:-4] + '.jpeg')
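
To check which encoding a file really carries after a rename like this, one can look at its first bytes, since PNG and JPEG files each begin with a fixed signature. A minimal sketch using only the standard library (`sniff_format` is a hypothetical helper, not part of this project):

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'   # first 8 bytes of every PNG file
JPEG_SIGNATURE = b'\xff\xd8\xff'        # first 3 bytes of a JPEG file

def sniff_format(path):
    """Guess the real encoding of an image file from its leading bytes."""
    with open(path, 'rb') as handle:
        header = handle.read(8)
    if header.startswith(PNG_SIGNATURE):
        return 'png'
    if header.startswith(JPEG_SIGNATURE):
        return 'jpeg'
    return 'unknown'
```

For a file produced by the cell above, `sniff_format` would be expected to report 'png' despite the .jpeg extension. If real JPEG data is needed, the images would have to be re-encoded with an imaging library rather than renamed, for example with Pillow's `Image.open(...).convert('RGB').save(..., 'JPEG')`.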

Done! We'll move the files manually to the correct folder to be used in the second notebook of this project, so we can rerun this process from the original images if needed.
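
If the manual step ever becomes tedious, it could be scripted. Below is a minimal sketch that copies (rather than moves) the original and augmented images into one folder per pastry, leaving `raw_data` untouched. The `collect_images` helper and the `data` destination used in the usage line are hypothetical names, not something the second notebook is known to expect, and the sketch assumes no two files of the same pastry share a name.

```python
import os
import shutil

def collect_images(source_root, dest_root, pastries):
    """Copy originals and augmented images of each pastry into dest_root/<pastry>."""
    copied = {}
    for pastry in pastries:
        source_dir = os.path.join(source_root, pastry)
        dest_dir = os.path.join(dest_root, pastry)
        os.makedirs(dest_dir, exist_ok=True)
        copied[pastry] = 0
        # originals live in <pastry>/, augmented images in <pastry>/output/
        for folder in (source_dir, os.path.join(source_dir, 'output')):
            if not os.path.isdir(folder):
                continue
            for name in sorted(os.listdir(folder)):
                path = os.path.join(folder, name)
                # skip subfolders and hidden files such as .DS_Store
                if os.path.isfile(path) and not name.startswith('.'):
                    shutil.copy2(path, os.path.join(dest_dir, name))
                    copied[pastry] += 1
    return copied
```

A call such as `collect_images('raw_data', 'data', pastries)` returns the number of files copied per pastry, which can be compared against the 130 expected for each.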