## Table of Contents
1. Obtain and Clean Baseline Model Data
2. Obtain and Clean Hot Dog Data Set
3. Process Baseline and Hot Dog Model Data
4. Obtain, Clean and Process Taco Bias Model Data

In this notebook, we obtain and prepare the data. By the end of the notebook, we want two data sets: a baseline data set and a taco bias data set. The plan is to use the baseline data set to troubleshoot an interactive pipeline. We can then run both the baseline data set and the taco bias data set through the Airflow orchestrator to simulate continuous training. This gives us two models: a baseline model trained in the interactive pipeline and a taco bias model trained using the Airflow orchestrator. We then apply the models to the hot dog data set to tell us, based on the prediction, whether a hot dog is a taco or a sandwich.

In [None]:
import time
import random
import download_images

The `download_images` package contains custom functions to download images using the API from [SerpApi](https://serpapi.com/). There was an issue with VSCode [not properly opening .env files](https://github.com/microsoft/vscode-jupyter/issues/1467). It may be necessary to add the following to the `settings.json` file:

`"jupyter.runStartupCommands": ["import sys", "sys.path.insert(0, r'path\\to\\download_images.py\\folder')"]`
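
If editing `settings.json` is not convenient, a similar effect can be had from the first notebook cell. This is only a minimal sketch: the folder path is a placeholder, not the project's actual layout, and `module_dir` is a name introduced here for illustration.

```python
import sys
from pathlib import Path

# Placeholder: replace with the folder that contains download_images.py.
module_dir = Path('path/to/download_images/folder').resolve()

# Prepend the folder so that `import download_images` can find the module.
if str(module_dir) not in sys.path:
    sys.path.insert(0, str(module_dir))
```

The change lasts only for the current kernel session, so the cell has to run before `import download_images`.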

## 1. Obtain and Clean Baseline Model Data

In [None]:
for page_num in range(6):
    for category in ['sandwich', 'taco', 'hot dog']:
        image_search_info = download_images.get_image_search_info(
            page_num, category
        )
        download_images.get_images(image_search_info)

After the images were downloaded, they were manually reviewed to identify any that show people's faces, writing, or logos so that they could be excluded.

In [2]:
taco_issues = [
    'taco_pg_0_pos_13.jpeg', 'taco_pg_0_pos_20.jpeg', 'taco_pg_0_pos_21.jpeg',
    'taco_pg_0_pos_29.jpeg', 'taco_pg_0_pos_34.jpeg', 'taco_pg_0_pos_45.jpeg',
    'taco_pg_0_pos_48.jpeg', 'taco_pg_0_pos_77.jpeg', 'taco_pg_0_pos_97.jpeg',
    'taco_pg_0_pos_98.jpeg', 'taco_pg_0_pos_99.jpeg', 'taco_pg_1_pos_0.jpeg',
    'taco_pg_1_pos_18.jpeg', 'taco_pg_1_pos_26.png', 'taco_pg_1_pos_31.jpeg',
    'taco_pg_1_pos_39.jpeg', 'taco_pg_1_pos_48.jpeg', 'taco_pg_1_pos_5.jpeg',
    'taco_pg_1_pos_54.jpeg', 'taco_pg_1_pos_56.jpeg', 'taco_pg_1_pos_59.jpeg',
    'taco_pg_1_pos_65.jpeg', 'taco_pg_1_pos_67.jpeg', 'taco_pg_1_pos_7.jpeg',
    'taco_pg_1_pos_70.jpeg', 'taco_pg_1_pos_78.jpeg', 'taco_pg_1_pos_81.jpeg',
    'taco_pg_1_pos_87.jpeg', 'taco_pg_1_pos_90.jpeg', 'taco_pg_2_pos_0.jpeg',
    'taco_pg_2_pos_13.jpeg', 'taco_pg_2_pos_19.jpeg', 'taco_pg_2_pos_20.jpeg',
    'taco_pg_2_pos_21.jpeg', 'taco_pg_2_pos_22.jpeg', 'taco_pg_2_pos_25.jpeg',
    'taco_pg_2_pos_4.jpeg', 'taco_pg_2_pos_47.jpeg', 'taco_pg_2_pos_50.jpeg',
    'taco_pg_2_pos_55.jpeg', 'taco_pg_2_pos_60.jpeg', 'taco_pg_2_pos_67.jpeg',
    'taco_pg_2_pos_70.jpeg', 'taco_pg_2_pos_71.jpeg', 'taco_pg_2_pos_73.jpeg',
    'taco_pg_2_pos_75.jpeg', 'taco_pg_2_pos_77.jpeg', 'taco_pg_2_pos_83.jpeg',
    'taco_pg_2_pos_88.png', 'taco_pg_2_pos_90.jpeg', 'taco_pg_2_pos_97.jpeg',
    'taco_pg_2_pos_99.jpeg', 'taco_pg_3_pos_10.jpeg', 'taco_pg_3_pos_13.jpeg',
    'taco_pg_3_pos_14.jpeg', 'taco_pg_3_pos_17.jpeg', 'taco_pg_3_pos_21.png',
    'taco_pg_3_pos_22.jpeg', 'taco_pg_3_pos_27.jpeg', 'taco_pg_3_pos_31.jpeg',
    'taco_pg_3_pos_32.jpeg', 'taco_pg_3_pos_34.jpeg', 'taco_pg_3_pos_37.jpeg',
    'taco_pg_3_pos_40.jpeg', 'taco_pg_3_pos_41.jpeg', 'taco_pg_3_pos_44.jpeg',
    'taco_pg_3_pos_50.jpeg', 'taco_pg_3_pos_54.jpeg', 'taco_pg_3_pos_55.png',
    'taco_pg_3_pos_56.jpeg', 'taco_pg_3_pos_59.png', 'taco_pg_3_pos_60.jpeg',
    'taco_pg_3_pos_62.png', 'taco_pg_3_pos_64.jpeg', 'taco_pg_3_pos_69.png',
    'taco_pg_3_pos_71.jpeg', 'taco_pg_3_pos_73.jpeg', 'taco_pg_3_pos_74.png',
    'taco_pg_3_pos_75.png', 'taco_pg_3_pos_76.jpeg', 'taco_pg_3_pos_81.jpeg',
    'taco_pg_3_pos_86.jpeg', 'taco_pg_3_pos_88.jpeg', 'taco_pg_3_pos_89.jpeg',
    'taco_pg_3_pos_9.jpeg', 'taco_pg_3_pos_90.jpeg', 'taco_pg_3_pos_92.jpeg',
    'taco_pg_3_pos_94.jpeg', 'taco_pg_3_pos_96.png', 'taco_pg_3_pos_99.png',
    'taco_pg_4_pos_0.png', 'taco_pg_4_pos_1.jpeg', 'taco_pg_4_pos_10.jpeg',
    'taco_pg_4_pos_12.jpeg', 'taco_pg_4_pos_13.jpeg', 'taco_pg_4_pos_14.jpeg',
    'taco_pg_4_pos_16.jpeg', 'taco_pg_4_pos_17.png', 'taco_pg_4_pos_18.jpeg',
    'taco_pg_4_pos_2.jpeg', 'taco_pg_4_pos_22.jpeg', 'taco_pg_4_pos_23.jpeg',
    'taco_pg_4_pos_24.jpeg', 'taco_pg_4_pos_26.jpeg', 'taco_pg_4_pos_27.jpeg',
    'taco_pg_4_pos_29.png', 'taco_pg_4_pos_31.png', 'taco_pg_4_pos_32.jpeg',
    'taco_pg_4_pos_34.png', 'taco_pg_4_pos_37.jpeg', 'taco_pg_4_pos_38.jpeg',
    'taco_pg_4_pos_39.png', 'taco_pg_4_pos_40.jpeg', 'taco_pg_4_pos_41.jpeg',
    'taco_pg_4_pos_49.jpeg', 'taco_pg_4_pos_51.jpeg', 'taco_pg_4_pos_54.jpeg',
    'taco_pg_4_pos_56.jpeg', 'taco_pg_4_pos_59.jpeg', 'taco_pg_4_pos_60.jpeg',
    'taco_pg_4_pos_61.jpeg', 'taco_pg_4_pos_63.jpeg', 'taco_pg_4_pos_64.png',
    'taco_pg_4_pos_65.png', 'taco_pg_4_pos_66.jpeg', 'taco_pg_4_pos_67.jpeg',
    'taco_pg_4_pos_68.jpeg', 'taco_pg_4_pos_7.png', 'taco_pg_4_pos_70.jpeg',
    'taco_pg_4_pos_75.jpeg', 'taco_pg_4_pos_76.jpeg', 'taco_pg_4_pos_81.jpeg',
    'taco_pg_4_pos_82.png', 'taco_pg_4_pos_84.jpeg', 'taco_pg_4_pos_85.png',
    'taco_pg_4_pos_86.jpeg', 'taco_pg_4_pos_89.jpeg', 'taco_pg_4_pos_9.jpeg',
    'taco_pg_4_pos_90.jpeg', 'taco_pg_4_pos_92.jpeg', 'taco_pg_4_pos_95.png',
    'taco_pg_4_pos_99.jpeg', 'taco_pg_5_pos_0.jpeg', 'taco_pg_5_pos_14.jpeg',
    'taco_pg_5_pos_15.jpeg', 'taco_pg_5_pos_16.jpeg', 'taco_pg_5_pos_17.jpeg',
    'taco_pg_5_pos_18.png', 'taco_pg_5_pos_19.jpeg', 'taco_pg_5_pos_21.jpeg',
    'taco_pg_5_pos_26.jpeg', 'taco_pg_5_pos_27.png', 'taco_pg_5_pos_29.jpeg',
    'taco_pg_5_pos_30.jpeg', 'taco_pg_5_pos_31.jpeg', 'taco_pg_5_pos_32.jpeg',
    'taco_pg_5_pos_33.jpeg', 'taco_pg_5_pos_35.jpeg', 'taco_pg_5_pos_37.jpeg',
    'taco_pg_5_pos_4.jpeg', 'taco_pg_5_pos_41.jpeg', 'taco_pg_5_pos_42.jpeg',
    'taco_pg_5_pos_44.jpeg', 'taco_pg_5_pos_45.jpeg', 'taco_pg_5_pos_52.jpeg',
    'taco_pg_5_pos_53.jpeg', 'taco_pg_5_pos_56.jpeg', 'taco_pg_5_pos_57.jpeg',
    'taco_pg_5_pos_6.jpeg', 'taco_pg_5_pos_60.jpeg', 'taco_pg_5_pos_61.png',
    'taco_pg_5_pos_65.jpeg', 'taco_pg_5_pos_68.jpeg', 'taco_pg_5_pos_7.png', 
    'taco_pg_5_pos_70.jpeg', 'taco_pg_5_pos_74.jpeg', 'taco_pg_5_pos_76.jpeg',
    'taco_pg_5_pos_77.png', 'taco_pg_5_pos_79.jpeg', 'taco_pg_5_pos_81.png',
    'taco_pg_5_pos_83.jpeg', 'taco_pg_5_pos_84.jpeg', 'taco_pg_5_pos_86.jpeg',
    'taco_pg_5_pos_88.jpeg', 'taco_pg_5_pos_89.png', 'taco_pg_5_pos_92.png',
    'taco_pg_5_pos_95.jpeg', 'taco_pg_5_pos_96.png', 'taco_pg_5_pos_97.jpeg'
]
sandwich_issues = [
    'sandwich_pg_0_pos_20.jpeg', 'sandwich_pg_0_pos_21.jpeg', 
    'sandwich_pg_0_pos_24.jpeg', 'sandwich_pg_0_pos_48.jpeg',
    'sandwich_pg_0_pos_58.jpeg', 'sandwich_pg_1_pos_32.jpeg',
    'sandwich_pg_1_pos_49.jpeg', 'sandwich_pg_1_pos_54.jpeg',
    'sandwich_pg_1_pos_71.jpeg', 'sandwich_pg_1_pos_76.jpeg',
    'sandwich_pg_1_pos_91.jpeg', 'sandwich_pg_1_pos_93.jpeg',
    'sandwich_pg_2_pos_26.jpeg', 'sandwich_pg_2_pos_3.jpeg',
    'sandwich_pg_2_pos_43.jpeg', 'sandwich_pg_2_pos_46.jpeg',
    'sandwich_pg_2_pos_50.jpeg', 'sandwich_pg_2_pos_53.jpeg',
    'sandwich_pg_2_pos_63.jpeg', 'sandwich_pg_2_pos_91.jpeg',
    'sandwich_pg_2_pos_99.jpeg', 'sandwich_pg_3_pos_1.jpeg',
    'sandwich_pg_3_pos_54.jpeg', 'sandwich_pg_3_pos_59.jpeg',
    'sandwich_pg_3_pos_72.jpeg', 'sandwich_pg_3_pos_81.jpeg',
    'sandwich_pg_3_pos_82.jpeg', 'sandwich_pg_3_pos_83.jpeg',
    'sandwich_pg_3_pos_84.jpeg', 'sandwich_pg_3_pos_9.jpeg',
    'sandwich_pg_4_pos_12.jpeg', 'sandwich_pg_4_pos_16.jpeg',
    'sandwich_pg_4_pos_19.jpeg', 'sandwich_pg_4_pos_21.jpeg',
    'sandwich_pg_4_pos_22.jpeg', 'sandwich_pg_4_pos_44.jpeg',
    'sandwich_pg_4_pos_46.jpeg', 'sandwich_pg_4_pos_47.jpeg',
    'sandwich_pg_4_pos_48.jpeg', 'sandwich_pg_4_pos_56.jpeg',
    'sandwich_pg_4_pos_63.jpeg', 'sandwich_pg_4_pos_66.jpeg',
    'sandwich_pg_4_pos_78.jpeg', 'sandwich_pg_4_pos_84.png',
    'sandwich_pg_4_pos_85.jpeg', 'sandwich_pg_4_pos_86.jpeg',
    'sandwich_pg_4_pos_94.jpeg', 'sandwich_pg_5_pos_0.jpeg',
    'sandwich_pg_5_pos_2.jpeg', 'sandwich_pg_5_pos_3.jpeg',
    'sandwich_pg_5_pos_44.jpeg', 'sandwich_pg_5_pos_5.jpeg',
    'sandwich_pg_5_pos_52.jpeg', 'sandwich_pg_5_pos_53.jpeg',
    'sandwich_pg_5_pos_55.jpeg', 'sandwich_pg_5_pos_59.jpeg',
    'sandwich_pg_5_pos_61.jpeg', 'sandwich_pg_5_pos_69.jpeg',
    'sandwich_pg_5_pos_79.jpeg', 'sandwich_pg_5_pos_81.jpeg',
    'sandwich_pg_5_pos_92.jpeg'
]
hot_dog_issues = [
    'hot dog_pg_0_pos_12.jpeg', 'hot dog_pg_0_pos_16.jpeg', 
    'hot dog_pg_0_pos_17.jpeg', 'hot dog_pg_0_pos_24.jpeg', 
    'hot dog_pg_0_pos_28.jpeg', 'hot dog_pg_0_pos_33.jpeg', 
    'hot dog_pg_0_pos_36.jpeg', 'hot dog_pg_0_pos_38.jpeg', 
    'hot dog_pg_0_pos_39.jpeg', 'hot dog_pg_0_pos_4.jpeg', 
    'hot dog_pg_0_pos_40.jpeg', 'hot dog_pg_0_pos_42.jpeg', 
    'hot dog_pg_0_pos_44.jpeg', 'hot dog_pg_0_pos_46.jpeg', 
    'hot dog_pg_0_pos_48.jpeg', 'hot dog_pg_0_pos_50.jpeg', 
    'hot dog_pg_0_pos_66.jpeg', 'hot dog_pg_0_pos_66.png', 
    'hot dog_pg_0_pos_74.jpeg', 'hot dog_pg_0_pos_76.png', 
    'hot dog_pg_0_pos_81.jpeg', 'hot dog_pg_0_pos_85.jpeg', 
    'hot dog_pg_0_pos_92.jpeg', 'hot dog_pg_1_pos_0.jpeg', 
    'hot dog_pg_1_pos_13.jpeg', 'hot dog_pg_1_pos_16.jpeg', 
    'hot dog_pg_1_pos_17.jpeg', 'hot dog_pg_1_pos_20.jpeg', 
    'hot dog_pg_1_pos_22.jpeg', 'hot dog_pg_1_pos_3.jpeg', 
    'hot dog_pg_1_pos_39.jpeg', 'hot dog_pg_1_pos_49.jpeg', 
    'hot dog_pg_1_pos_54.jpeg', 'hot dog_pg_1_pos_57.jpeg', 
    'hot dog_pg_1_pos_61.jpeg', 'hot dog_pg_1_pos_62.jpeg', 
    'hot dog_pg_1_pos_82.jpeg', 'hot dog_pg_1_pos_86.jpeg', 
    'hot dog_pg_1_pos_91.jpeg', 'hot dog_pg_1_pos_92.jpeg', 
    'hot dog_pg_1_pos_94.jpeg', 'hot dog_pg_1_pos_98.jpeg', 
    'hot dog_pg_2_pos_10.jpeg', 'hot dog_pg_2_pos_16.jpeg', 
    'hot dog_pg_2_pos_18.jpeg', 'hot dog_pg_2_pos_19.jpeg', 
    'hot dog_pg_2_pos_2.jpeg', 'hot dog_pg_2_pos_27.jpeg', 
    'hot dog_pg_2_pos_30.jpeg', 'hot dog_pg_2_pos_34.jpeg', 
    'hot dog_pg_2_pos_37.jpeg', 'hot dog_pg_2_pos_41.jpeg', 
    'hot dog_pg_2_pos_43.jpeg', 'hot dog_pg_2_pos_44.jpeg', 
    'hot dog_pg_2_pos_5.jpeg', 'hot dog_pg_2_pos_53.jpeg', 
    'hot dog_pg_2_pos_57.jpeg', 'hot dog_pg_2_pos_60.jpeg', 
    'hot dog_pg_2_pos_62.jpeg', 'hot dog_pg_2_pos_63.jpeg', 
    'hot dog_pg_2_pos_68.jpeg', 'hot dog_pg_2_pos_69.jpeg', 
    'hot dog_pg_2_pos_71.jpeg', 'hot dog_pg_2_pos_75.jpeg', 
    'hot dog_pg_2_pos_79.jpeg', 'hot dog_pg_2_pos_8.jpeg', 
    'hot dog_pg_2_pos_90.jpeg', 'hot dog_pg_2_pos_91.jpeg', 
    'hot dog_pg_2_pos_94.jpeg', 'hot dog_pg_2_pos_95.jpeg', 
    'hot dog_pg_2_pos_98.png', 'hot dog_pg_3_pos_0.jpeg', 
    'hot dog_pg_3_pos_17.jpeg', 'hot dog_pg_3_pos_21.jpeg', 
    'hot dog_pg_3_pos_23.jpeg', 'hot dog_pg_3_pos_24.jpeg', 
    'hot dog_pg_3_pos_25.jpeg', 'hot dog_pg_3_pos_27.jpeg', 
    'hot dog_pg_3_pos_29.jpeg', 'hot dog_pg_3_pos_30.jpeg', 
    'hot dog_pg_3_pos_34.jpeg', 'hot dog_pg_3_pos_35.jpeg', 
    'hot dog_pg_3_pos_39.jpeg', 'hot dog_pg_3_pos_4.jpeg', 
    'hot dog_pg_3_pos_40.jpeg', 'hot dog_pg_3_pos_41.jpeg', 
    'hot dog_pg_3_pos_49.jpeg', 'hot dog_pg_3_pos_5.jpeg', 
    'hot dog_pg_3_pos_52.jpeg', 'hot dog_pg_3_pos_55.jpeg', 
    'hot dog_pg_3_pos_56.jpeg', 'hot dog_pg_3_pos_61.jpeg', 
    'hot dog_pg_3_pos_65.jpeg', 'hot dog_pg_3_pos_66.jpeg', 
    'hot dog_pg_3_pos_71.jpeg', 'hot dog_pg_3_pos_74.jpeg', 
    'hot dog_pg_3_pos_76.jpeg', 'hot dog_pg_3_pos_79.jpeg', 
    'hot dog_pg_3_pos_82.jpeg', 'hot dog_pg_3_pos_89.jpeg', 
    'hot dog_pg_3_pos_9.jpeg', 'hot dog_pg_3_pos_90.jpeg', 
    'hot dog_pg_3_pos_92.jpeg', 'hot dog_pg_3_pos_94.jpeg', 
    'hot dog_pg_3_pos_95.jpeg', 'hot dog_pg_3_pos_97.jpeg', 
    'hot dog_pg_3_pos_99.jpeg', 'hot dog_pg_4_pos_0.jpeg', 
    'hot dog_pg_4_pos_1.jpeg', 'hot dog_pg_4_pos_13.jpeg', 
    'hot dog_pg_4_pos_14.jpeg', 'hot dog_pg_4_pos_15.jpeg', 
    'hot dog_pg_4_pos_17.jpeg', 'hot dog_pg_4_pos_18.jpeg', 
    'hot dog_pg_4_pos_2.jpeg', 'hot dog_pg_4_pos_22.jpeg', 
    'hot dog_pg_4_pos_23.jpeg', 'hot dog_pg_4_pos_25.jpeg', 
    'hot dog_pg_4_pos_26.jpeg', 'hot dog_pg_4_pos_30.jpeg', 
    'hot dog_pg_4_pos_32.jpeg', 'hot dog_pg_4_pos_35.jpeg', 
    'hot dog_pg_4_pos_37.jpeg', 'hot dog_pg_4_pos_45.jpeg', 
    'hot dog_pg_4_pos_47.jpeg', 'hot dog_pg_4_pos_49.jpeg', 
    'hot dog_pg_4_pos_51.jpeg', 'hot dog_pg_4_pos_55.jpeg', 
    'hot dog_pg_4_pos_57.png', 'hot dog_pg_4_pos_58.jpeg', 
    'hot dog_pg_4_pos_59.jpeg', 'hot dog_pg_4_pos_60.jpeg', 
    'hot dog_pg_4_pos_62.jpeg', 'hot dog_pg_4_pos_68.jpeg', 
    'hot dog_pg_4_pos_69.jpeg', 'hot dog_pg_4_pos_72.jpeg', 
    'hot dog_pg_4_pos_73.jpeg', 'hot dog_pg_4_pos_74.jpeg', 
    'hot dog_pg_4_pos_76.jpeg', 'hot dog_pg_4_pos_79.jpeg', 
    'hot dog_pg_4_pos_8.jpeg', 'hot dog_pg_4_pos_82.jpeg', 
    'hot dog_pg_4_pos_87.jpeg', 'hot dog_pg_4_pos_9.jpeg', 
    'hot dog_pg_4_pos_91.png', 'hot dog_pg_4_pos_94.jpeg', 
    'hot dog_pg_4_pos_95.jpeg', 'hot dog_pg_4_pos_96.jpeg', 
    'hot dog_pg_5_pos_0.jpeg', 'hot dog_pg_5_pos_1.jpeg', 
    'hot dog_pg_5_pos_12.jpeg', 'hot dog_pg_5_pos_13.jpeg', 
    'hot dog_pg_5_pos_15.png', 'hot dog_pg_5_pos_17.jpeg', 
    'hot dog_pg_5_pos_18.jpeg', 'hot dog_pg_5_pos_19.jpeg', 
    'hot dog_pg_5_pos_2.jpeg', 'hot dog_pg_5_pos_20.jpeg', 
    'hot dog_pg_5_pos_22.jpeg', 'hot dog_pg_5_pos_3.jpeg', 
    'hot dog_pg_5_pos_32.jpeg', 'hot dog_pg_5_pos_33.jpeg', 
    'hot dog_pg_5_pos_35.jpeg', 'hot dog_pg_5_pos_38.jpeg', 
    'hot dog_pg_5_pos_42.jpeg', 'hot dog_pg_5_pos_44.jpeg', 
    'hot dog_pg_5_pos_48.jpeg', 'hot dog_pg_5_pos_50.jpeg', 
    'hot dog_pg_5_pos_55.jpeg', 'hot dog_pg_5_pos_56.jpeg', 
    'hot dog_pg_5_pos_57.jpeg', 'hot dog_pg_5_pos_59.jpeg', 
    'hot dog_pg_5_pos_62.jpeg', 'hot dog_pg_5_pos_63.jpeg', 
    'hot dog_pg_5_pos_65.jpeg', 'hot dog_pg_5_pos_67.jpeg', 
    'hot dog_pg_5_pos_68.jpeg', 'hot dog_pg_5_pos_69.jpeg', 
    'hot dog_pg_5_pos_70.jpeg', 'hot dog_pg_5_pos_72.jpeg', 
    'hot dog_pg_5_pos_74.jpeg', 'hot dog_pg_5_pos_78.jpeg', 
    'hot dog_pg_5_pos_79.jpeg', 'hot dog_pg_5_pos_80.jpeg', 
    'hot dog_pg_5_pos_82.jpeg', 'hot dog_pg_5_pos_83.jpeg', 
    'hot dog_pg_5_pos_85.jpeg', 'hot dog_pg_5_pos_86.jpeg', 
    'hot dog_pg_5_pos_88.jpeg', 'hot dog_pg_5_pos_9.jpeg', 
    'hot dog_pg_5_pos_92.jpeg', 'hot dog_pg_5_pos_93.png', 
    'hot dog_pg_5_pos_94.jpeg', 'hot dog_pg_5_pos_96.jpeg', 
    'hot dog_pg_5_pos_99.jpeg'
]
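
The file names in these lists appear to follow the pattern `<category>_pg_<page>_pos_<position>.<extension>`. Assuming that pattern holds, a short sketch like the following could tally how many images were flagged on each results page. `count_issues_per_page` is a hypothetical helper, not part of `download_images`.

```python
import re
from collections import Counter

# Assumed naming pattern: <category>_pg_<page>_pos_<position>.<extension>
_NAME_PATTERN = re.compile(
    r'^(?P<category>.+)_pg_(?P<page>\d+)_pos_(?P<pos>\d+)\.\w+$'
)


def count_issues_per_page(file_names):
    """Return a Counter mapping page number to the number of flagged images."""
    counts = Counter()
    for name in file_names:
        match = _NAME_PATTERN.match(name)
        if match is None:
            # Skip names that do not follow the assumed pattern.
            continue
        counts[int(match.group('page'))] += 1
    return counts


example = ['taco_pg_0_pos_13.jpeg', 'taco_pg_0_pos_20.jpeg', 'taco_pg_1_pos_26.png']
page_counts = count_issues_per_page(example)
```

Calling `count_issues_per_page(taco_issues)` would show whether later result pages needed more manual removals than earlier ones.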

The same process was repeated for a `food` category. In addition to identifying images with issues, images of tacos, sandwiches and hot dogs had to be removed from the `food` data set; therefore, the pool of images for `food` was initially larger than for the other categories.

In [None]:
for page_num in range(8):
    image_search_info = download_images.get_image_search_info(page_num, 'food')
    download_images.get_images(image_search_info)

In [3]:
food_issues = [
    'food_pg_0_pos_2.jpeg', 'food_pg_0_pos_24.jpeg', 'food_pg_0_pos_28.jpeg',
    'food_pg_0_pos_3.jpeg', 'food_pg_0_pos_31.jpeg', 'food_pg_0_pos_32.jpeg',
    'food_pg_0_pos_34.jpeg', 'food_pg_0_pos_35.jpeg', 'food_pg_0_pos_4.jpeg',
    'food_pg_0_pos_41.jpeg', 'food_pg_0_pos_47.jpeg', 'food_pg_0_pos_48.jpeg',
    'food_pg_0_pos_56.jpeg', 'food_pg_0_pos_58.jpeg', 'food_pg_0_pos_59.jpeg',
    'food_pg_0_pos_65.jpeg', 'food_pg_0_pos_68.jpeg', 'food_pg_0_pos_70.jpeg',
    'food_pg_0_pos_80.jpeg', 'food_pg_0_pos_85.jpeg', 'food_pg_0_pos_86.jpeg',
    'food_pg_0_pos_96.jpeg', 'food_pg_0_pos_99.jpeg', 'food_pg_1_pos_0.jpeg',
    'food_pg_1_pos_1.jpeg', 'food_pg_1_pos_10.jpeg', 'food_pg_1_pos_24.jpeg',
    'food_pg_1_pos_25.jpeg', 'food_pg_1_pos_27.jpeg', 'food_pg_1_pos_30.jpeg',
    'food_pg_1_pos_35.jpeg', 'food_pg_1_pos_51.png', 'food_pg_1_pos_56.jpeg',
    'food_pg_1_pos_7.jpeg', 'food_pg_1_pos_70.jpeg', 'food_pg_1_pos_72.jpeg',
    'food_pg_1_pos_75.jpeg', 'food_pg_1_pos_82.jpeg', 'food_pg_1_pos_91.jpeg',
    'food_pg_1_pos_94.jpeg', 'food_pg_1_pos_98.jpeg', 'food_pg_2_pos_19.jpeg',
    'food_pg_2_pos_23.jpeg', 'food_pg_2_pos_24.jpeg', 'food_pg_2_pos_39.jpeg',
    'food_pg_2_pos_43.jpeg', 'food_pg_2_pos_47.jpeg', 'food_pg_2_pos_52.jpeg',
    'food_pg_2_pos_54.jpeg', 'food_pg_2_pos_57.jpeg', 'food_pg_2_pos_62.jpeg',
    'food_pg_2_pos_7.jpeg', 'food_pg_2_pos_70.jpeg', 'food_pg_2_pos_75.jpeg',
    'food_pg_2_pos_80.jpeg', 'food_pg_2_pos_81.jpeg', 'food_pg_2_pos_85.jpeg',
    'food_pg_2_pos_86.jpeg', 'food_pg_2_pos_87.png', 'food_pg_2_pos_90.jpeg',
    'food_pg_3_pos_1.jpeg', 'food_pg_3_pos_15.png', 'food_pg_3_pos_19.jpeg',
    'food_pg_3_pos_20.jpeg', 'food_pg_3_pos_24.jpeg', 'food_pg_3_pos_27.jpeg',
    'food_pg_3_pos_32.jpeg', 'food_pg_3_pos_34.jpeg', 'food_pg_3_pos_36.jpeg',
    'food_pg_3_pos_37.jpeg', 'food_pg_3_pos_4.jpeg', 'food_pg_3_pos_42.jpeg',
    'food_pg_3_pos_43.jpeg', 'food_pg_3_pos_44.jpeg', 'food_pg_3_pos_45.jpeg',
    'food_pg_3_pos_5.jpeg', 'food_pg_3_pos_50.jpeg', 'food_pg_3_pos_51.jpeg',
    'food_pg_3_pos_52.jpeg', 'food_pg_3_pos_65.jpeg', 'food_pg_3_pos_67.jpeg',
    'food_pg_3_pos_69.jpeg', 'food_pg_3_pos_7.jpeg', 'food_pg_3_pos_72.jpeg',
    'food_pg_3_pos_74.jpeg', 'food_pg_3_pos_80.jpeg', 'food_pg_3_pos_81.jpeg',
    'food_pg_3_pos_82.png', 'food_pg_3_pos_86.jpeg', 'food_pg_3_pos_9.jpeg',
    'food_pg_3_pos_93.png', 'food_pg_3_pos_95.jpeg', 'food_pg_3_pos_96.jpeg',
    'food_pg_4_pos_0.png', 'food_pg_4_pos_1.jpeg', 'food_pg_4_pos_10.jpeg',
    'food_pg_4_pos_12.jpeg', 'food_pg_4_pos_14.jpeg', 'food_pg_4_pos_15.jpeg',
    'food_pg_4_pos_16.jpeg', 'food_pg_4_pos_17.jpeg', 'food_pg_4_pos_2.jpeg',
    'food_pg_4_pos_20.jpeg', 'food_pg_4_pos_23.jpeg', 'food_pg_4_pos_25.jpeg',
    'food_pg_4_pos_28.jpeg', 'food_pg_4_pos_3.jpeg', 'food_pg_4_pos_30.jpeg',
    'food_pg_4_pos_32.jpeg', 'food_pg_4_pos_34.jpeg', 'food_pg_4_pos_35.jpeg',
    'food_pg_4_pos_36.jpeg', 'food_pg_4_pos_37.jpeg', 'food_pg_4_pos_38.jpeg',
    'food_pg_4_pos_40.jpeg', 'food_pg_4_pos_44.png', 'food_pg_4_pos_48.jpeg',
    'food_pg_4_pos_5.jpeg', 'food_pg_4_pos_53.jpeg', 'food_pg_4_pos_54.jpeg',
    'food_pg_4_pos_57.jpeg', 'food_pg_4_pos_6.jpeg', 'food_pg_4_pos_62.png',
    'food_pg_4_pos_67.jpeg', 'food_pg_4_pos_68.jpeg', 'food_pg_4_pos_73.jpeg',
    'food_pg_4_pos_74.jpeg', 'food_pg_4_pos_76.jpeg', 'food_pg_4_pos_77.jpeg',
    'food_pg_4_pos_8.jpeg', 'food_pg_4_pos_81.jpeg', 'food_pg_4_pos_84.jpeg',
    'food_pg_4_pos_90.jpeg', 'food_pg_5_pos_0.jpeg', 'food_pg_5_pos_15.jpeg',
    'food_pg_5_pos_16.jpeg', 'food_pg_5_pos_20.jpeg', 'food_pg_5_pos_22.jpeg',
    'food_pg_5_pos_24.jpeg', 'food_pg_5_pos_26.jpeg', 'food_pg_5_pos_27.jpeg',
    'food_pg_5_pos_28.jpeg', 'food_pg_5_pos_3.jpeg', 'food_pg_5_pos_32.jpeg',
    'food_pg_5_pos_33.jpeg', 'food_pg_5_pos_34.jpeg', 'food_pg_5_pos_35.png',
    'food_pg_5_pos_38.jpeg', 'food_pg_5_pos_39.png', 'food_pg_5_pos_4.jpeg',
    'food_pg_5_pos_41.jpeg', 'food_pg_5_pos_42.jpeg', 'food_pg_5_pos_45.png',
    'food_pg_5_pos_49.png', 'food_pg_5_pos_52.jpeg', 'food_pg_5_pos_53.jpeg',
    'food_pg_5_pos_54.png', 'food_pg_5_pos_55.jpeg', 'food_pg_5_pos_56.jpeg',
    'food_pg_5_pos_59.jpeg', 'food_pg_5_pos_61.png', 'food_pg_5_pos_63.jpeg',
    'food_pg_5_pos_65.jpeg', 'food_pg_5_pos_68.jpeg', 'food_pg_5_pos_70.jpeg',
    'food_pg_5_pos_74.jpeg', 'food_pg_5_pos_77.jpeg', 'food_pg_5_pos_8.jpeg',
    'food_pg_5_pos_81.jpeg', 'food_pg_5_pos_82.jpeg', 'food_pg_5_pos_85.jpeg',
    'food_pg_5_pos_86.jpeg', 'food_pg_5_pos_87.png', 'food_pg_5_pos_91.png',
    'food_pg_5_pos_92.jpeg', 'food_pg_5_pos_96.jpeg', 'food_pg_5_pos_97.jpeg',
    'food_pg_5_pos_99.jpeg', 'food_pg_6_pos_1.jpeg', 'food_pg_6_pos_10.jpeg',
    'food_pg_6_pos_11.jpeg', 'food_pg_6_pos_12.jpeg', 'food_pg_6_pos_16.jpeg',
    'food_pg_6_pos_17.jpeg', 'food_pg_6_pos_19.jpeg', 'food_pg_6_pos_20.png',
    'food_pg_6_pos_23.jpeg', 'food_pg_6_pos_25.jpeg', 'food_pg_6_pos_29.jpeg',
    'food_pg_6_pos_3.jpeg', 'food_pg_6_pos_32.jpeg', 'food_pg_6_pos_36.jpeg',
    'food_pg_6_pos_41.jpeg', 'food_pg_6_pos_43.jpeg', 'food_pg_6_pos_45.jpeg',
    'food_pg_6_pos_46.jpeg', 'food_pg_6_pos_47.jpeg', 'food_pg_6_pos_5.jpeg',
    'food_pg_6_pos_52.jpeg', 'food_pg_6_pos_55.jpeg', 'food_pg_6_pos_57.jpeg',
    'food_pg_6_pos_6.jpeg', 'food_pg_6_pos_63.jpeg', 'food_pg_6_pos_64.jpeg',
    'food_pg_6_pos_67.jpeg', 'food_pg_6_pos_69.jpeg', 'food_pg_6_pos_7.jpeg',
    'food_pg_6_pos_76.jpeg', 'food_pg_6_pos_77.jpeg', 'food_pg_6_pos_80.jpeg',
    'food_pg_6_pos_82.jpeg', 'food_pg_6_pos_83.jpeg', 'food_pg_6_pos_84.jpeg',
    'food_pg_6_pos_86.jpeg', 'food_pg_6_pos_89.jpeg', 'food_pg_6_pos_90.jpeg',
    'food_pg_6_pos_91.jpeg', 'food_pg_6_pos_95.jpeg', 'food_pg_6_pos_96.jpeg',
    'food_pg_6_pos_98.jpeg', 'food_pg_7_pos_0.jpeg', 'food_pg_7_pos_1.jpeg',
    'food_pg_7_pos_13.jpeg', 'food_pg_7_pos_17.jpeg', 'food_pg_7_pos_2.jpeg',
    'food_pg_7_pos_23.jpeg', 'food_pg_7_pos_24.png', 'food_pg_7_pos_25.jpeg',
    'food_pg_7_pos_28.jpeg', 'food_pg_7_pos_29.jpeg', 'food_pg_7_pos_32.jpeg',
    'food_pg_7_pos_35.jpeg', 'food_pg_7_pos_42.jpeg', 'food_pg_7_pos_44.jpeg',
    'food_pg_7_pos_47.jpeg', 'food_pg_7_pos_49.jpeg', 'food_pg_7_pos_59.png',
    'food_pg_7_pos_60.jpeg', 'food_pg_7_pos_66.jpeg', 'food_pg_7_pos_67.jpeg',
    'food_pg_7_pos_70.jpeg', 'food_pg_7_pos_71.png', 'food_pg_7_pos_72.jpeg',
    'food_pg_7_pos_76.jpeg', 'food_pg_7_pos_79.png', 'food_pg_7_pos_8.jpeg',
    'food_pg_7_pos_80.jpeg', 'food_pg_7_pos_82.jpeg', 'food_pg_7_pos_83.jpeg',
    'food_pg_7_pos_84.jpeg', 'food_pg_7_pos_86.jpeg', 'food_pg_7_pos_87.jpeg',
    'food_pg_7_pos_89.jpeg', 'food_pg_7_pos_93.jpeg'
]

food_issues.extend([
    'food_pg_0_pos_1.jpeg', 'food_pg_0_pos_15.jpeg', 'food_pg_0_pos_19.jpeg',
    'food_pg_0_pos_20.jpeg', 'food_pg_0_pos_23.jpeg', 'food_pg_0_pos_27.jpeg',
    'food_pg_0_pos_37.jpeg', 'food_pg_0_pos_38.jpeg', 'food_pg_0_pos_45.jpeg',
    'food_pg_0_pos_5.jpeg', 'food_pg_0_pos_55.jpeg', 'food_pg_0_pos_57.jpeg',
    'food_pg_0_pos_64.jpeg', 'food_pg_0_pos_66.jpeg', 'food_pg_0_pos_75.jpeg',
    'food_pg_0_pos_81.jpeg', 'food_pg_0_pos_82.jpeg', 'food_pg_0_pos_94.jpeg',
    'food_pg_1_pos_12.jpeg', 'food_pg_1_pos_17.jpeg', 'food_pg_1_pos_2.jpeg',
    'food_pg_1_pos_28.jpeg', 'food_pg_1_pos_36.jpeg', 'food_pg_1_pos_37.jpeg',
    'food_pg_1_pos_4.jpeg', 'food_pg_1_pos_41.jpeg', 'food_pg_1_pos_45.jpeg',
    'food_pg_1_pos_49.jpeg', 'food_pg_1_pos_5.jpeg', 'food_pg_1_pos_50.jpeg',
    'food_pg_1_pos_53.jpeg', 'food_pg_1_pos_57.jpeg', 'food_pg_1_pos_71.jpeg',
    'food_pg_1_pos_79.jpeg', 'food_pg_1_pos_8.jpeg', 'food_pg_1_pos_86.jpeg',
    'food_pg_2_pos_1.jpeg', 'food_pg_2_pos_18.jpeg', 'food_pg_2_pos_25.jpeg',
    'food_pg_2_pos_26.jpeg', 'food_pg_2_pos_29.jpeg', 'food_pg_2_pos_4.jpeg',
    'food_pg_2_pos_40.jpeg', 'food_pg_2_pos_48.jpeg', 'food_pg_2_pos_5.jpeg',
    'food_pg_2_pos_56.jpeg', 'food_pg_2_pos_58.jpeg', 'food_pg_2_pos_60.jpeg',
    'food_pg_2_pos_66.jpeg', 'food_pg_2_pos_8.jpeg', 'food_pg_2_pos_88.jpeg',
    'food_pg_2_pos_89.jpeg', 'food_pg_3_pos_0.jpeg', 'food_pg_3_pos_16.jpeg',
    'food_pg_3_pos_23.jpeg', 'food_pg_3_pos_29.jpeg', 'food_pg_3_pos_33.jpeg',
    'food_pg_3_pos_41.jpeg', 'food_pg_3_pos_55.jpeg', 'food_pg_3_pos_56.jpeg',
    'food_pg_3_pos_57.jpeg', 'food_pg_3_pos_61.jpeg', 'food_pg_3_pos_62.jpeg',
    'food_pg_3_pos_64.jpeg', 'food_pg_3_pos_87.jpeg', 'food_pg_3_pos_90.jpeg',
    'food_pg_3_pos_97.jpeg', 'food_pg_3_pos_99.jpeg', 'food_pg_4_pos_42.jpeg',
    'food_pg_4_pos_49.jpeg', 'food_pg_4_pos_56.jpeg', 'food_pg_4_pos_60.jpeg',
    'food_pg_4_pos_69.jpeg', 'food_pg_4_pos_7.jpeg', 'food_pg_4_pos_70.jpeg',
    'food_pg_4_pos_91.jpeg', 'food_pg_4_pos_97.jpeg', 'food_pg_5_pos_2.jpeg',
    'food_pg_5_pos_51.jpeg', 'food_pg_5_pos_76.jpeg', 'food_pg_5_pos_78.jpeg',
    'food_pg_5_pos_90.jpeg', 'food_pg_6_pos_13.jpeg', 'food_pg_6_pos_39.jpeg',
    'food_pg_6_pos_42.jpeg', 'food_pg_6_pos_44.jpeg', 'food_pg_6_pos_50.jpeg',
    'food_pg_6_pos_62.jpeg', 'food_pg_6_pos_68.jpeg', 'food_pg_6_pos_74.jpeg',
    'food_pg_6_pos_78.jpeg', 'food_pg_7_pos_36.jpeg', 'food_pg_7_pos_39.jpeg',
    'food_pg_7_pos_51.jpeg', 'food_pg_7_pos_6.jpeg', 'food_pg_7_pos_73.jpeg',
    'food_pg_7_pos_88.jpeg'
])
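
Because these lists are assembled by hand, an entry can end up repeated. Repeats are harmless for a membership test, but a quick check makes them visible. The sketch below uses only the standard library; `find_duplicates` is a hypothetical helper name.

```python
from collections import Counter


def find_duplicates(file_names):
    """Return the sorted file names that appear more than once."""
    counts = Counter(file_names)
    return sorted(name for name, count in counts.items() if count > 1)


sample = ['food_pg_0_pos_94.jpeg', 'food_pg_1_pos_12.jpeg', 'food_pg_0_pos_94.jpeg']
duplicates = find_duplicates(sample)
```

Passing `food_issues` to the helper would list any repeated names; wrapping the list in `set(...)` removes them.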

## 2. Obtain and Clean Hot Dog Data Set

In [None]:
for page_num in range(6):
    image_search_info = download_images.get_image_search_info(
        page_num, 'hot dog'
    )
    download_images.get_images(image_search_info)

In [None]:
hot_dog_issues = [
    'hot dog_pg_0_pos_12.jpeg', 'hot dog_pg_0_pos_16.jpeg', 
    'hot dog_pg_0_pos_17.jpeg', 'hot dog_pg_0_pos_24.jpeg', 
    'hot dog_pg_0_pos_28.jpeg', 'hot dog_pg_0_pos_33.jpeg', 
    'hot dog_pg_0_pos_36.jpeg', 'hot dog_pg_0_pos_38.jpeg', 
    'hot dog_pg_0_pos_39.jpeg', 'hot dog_pg_0_pos_4.jpeg', 
    'hot dog_pg_0_pos_40.jpeg', 'hot dog_pg_0_pos_42.jpeg', 
    'hot dog_pg_0_pos_44.jpeg', 'hot dog_pg_0_pos_46.jpeg', 
    'hot dog_pg_0_pos_48.jpeg', 'hot dog_pg_0_pos_50.jpeg', 
    'hot dog_pg_0_pos_66.jpeg', 'hot dog_pg_0_pos_66.png', 
    'hot dog_pg_0_pos_74.jpeg', 'hot dog_pg_0_pos_76.png', 
    'hot dog_pg_0_pos_81.jpeg', 'hot dog_pg_0_pos_85.jpeg', 
    'hot dog_pg_0_pos_92.jpeg', 'hot dog_pg_1_pos_0.jpeg', 
    'hot dog_pg_1_pos_13.jpeg', 'hot dog_pg_1_pos_16.jpeg', 
    'hot dog_pg_1_pos_17.jpeg', 'hot dog_pg_1_pos_20.jpeg', 
    'hot dog_pg_1_pos_22.jpeg', 'hot dog_pg_1_pos_3.jpeg', 
    'hot dog_pg_1_pos_39.jpeg', 'hot dog_pg_1_pos_49.jpeg', 
    'hot dog_pg_1_pos_54.jpeg', 'hot dog_pg_1_pos_57.jpeg', 
    'hot dog_pg_1_pos_61.jpeg', 'hot dog_pg_1_pos_62.jpeg', 
    'hot dog_pg_1_pos_82.jpeg', 'hot dog_pg_1_pos_86.jpeg', 
    'hot dog_pg_1_pos_91.jpeg', 'hot dog_pg_1_pos_92.jpeg', 
    'hot dog_pg_1_pos_94.jpeg', 'hot dog_pg_1_pos_98.jpeg', 
    'hot dog_pg_2_pos_10.jpeg', 'hot dog_pg_2_pos_16.jpeg', 
    'hot dog_pg_2_pos_18.jpeg', 'hot dog_pg_2_pos_19.jpeg', 
    'hot dog_pg_2_pos_2.jpeg', 'hot dog_pg_2_pos_27.jpeg', 
    'hot dog_pg_2_pos_30.jpeg', 'hot dog_pg_2_pos_34.jpeg', 
    'hot dog_pg_2_pos_37.jpeg', 'hot dog_pg_2_pos_41.jpeg', 
    'hot dog_pg_2_pos_43.jpeg', 'hot dog_pg_2_pos_44.jpeg', 
    'hot dog_pg_2_pos_5.jpeg', 'hot dog_pg_2_pos_53.jpeg', 
    'hot dog_pg_2_pos_57.jpeg', 'hot dog_pg_2_pos_60.jpeg', 
    'hot dog_pg_2_pos_62.jpeg', 'hot dog_pg_2_pos_63.jpeg', 
    'hot dog_pg_2_pos_68.jpeg', 'hot dog_pg_2_pos_69.jpeg', 
    'hot dog_pg_2_pos_71.jpeg', 'hot dog_pg_2_pos_75.jpeg', 
    'hot dog_pg_2_pos_79.jpeg', 'hot dog_pg_2_pos_8.jpeg', 
    'hot dog_pg_2_pos_90.jpeg', 'hot dog_pg_2_pos_91.jpeg', 
    'hot dog_pg_2_pos_94.jpeg', 'hot dog_pg_2_pos_95.jpeg', 
    'hot dog_pg_2_pos_98.png', 'hot dog_pg_3_pos_0.jpeg', 
    'hot dog_pg_3_pos_17.jpeg', 'hot dog_pg_3_pos_21.jpeg', 
    'hot dog_pg_3_pos_23.jpeg', 'hot dog_pg_3_pos_24.jpeg', 
    'hot dog_pg_3_pos_25.jpeg', 'hot dog_pg_3_pos_27.jpeg', 
    'hot dog_pg_3_pos_29.jpeg', 'hot dog_pg_3_pos_30.jpeg', 
    'hot dog_pg_3_pos_34.jpeg', 'hot dog_pg_3_pos_35.jpeg', 
    'hot dog_pg_3_pos_39.jpeg', 'hot dog_pg_3_pos_4.jpeg', 
    'hot dog_pg_3_pos_40.jpeg', 'hot dog_pg_3_pos_41.jpeg', 
    'hot dog_pg_3_pos_49.jpeg', 'hot dog_pg_3_pos_5.jpeg', 
    'hot dog_pg_3_pos_52.jpeg', 'hot dog_pg_3_pos_55.jpeg', 
    'hot dog_pg_3_pos_56.jpeg', 'hot dog_pg_3_pos_61.jpeg', 
    'hot dog_pg_3_pos_65.jpeg', 'hot dog_pg_3_pos_66.jpeg', 
    'hot dog_pg_3_pos_71.jpeg', 'hot dog_pg_3_pos_74.jpeg', 
    'hot dog_pg_3_pos_76.jpeg', 'hot dog_pg_3_pos_79.jpeg', 
    'hot dog_pg_3_pos_82.jpeg', 'hot dog_pg_3_pos_89.jpeg', 
    'hot dog_pg_3_pos_9.jpeg', 'hot dog_pg_3_pos_90.jpeg', 
    'hot dog_pg_3_pos_92.jpeg', 'hot dog_pg_3_pos_94.jpeg', 
    'hot dog_pg_3_pos_95.jpeg', 'hot dog_pg_3_pos_97.jpeg', 
    'hot dog_pg_3_pos_99.jpeg', 'hot dog_pg_4_pos_0.jpeg', 
    'hot dog_pg_4_pos_1.jpeg', 'hot dog_pg_4_pos_13.jpeg', 
    'hot dog_pg_4_pos_14.jpeg', 'hot dog_pg_4_pos_15.jpeg', 
    'hot dog_pg_4_pos_17.jpeg', 'hot dog_pg_4_pos_18.jpeg', 
    'hot dog_pg_4_pos_2.jpeg', 'hot dog_pg_4_pos_22.jpeg', 
    'hot dog_pg_4_pos_23.jpeg', 'hot dog_pg_4_pos_25.jpeg', 
    'hot dog_pg_4_pos_26.jpeg', 'hot dog_pg_4_pos_30.jpeg', 
    'hot dog_pg_4_pos_32.jpeg', 'hot dog_pg_4_pos_35.jpeg', 
    'hot dog_pg_4_pos_37.jpeg', 'hot dog_pg_4_pos_45.jpeg', 
    'hot dog_pg_4_pos_47.jpeg', 'hot dog_pg_4_pos_49.jpeg', 
    'hot dog_pg_4_pos_51.jpeg', 'hot dog_pg_4_pos_55.jpeg', 
    'hot dog_pg_4_pos_57.png', 'hot dog_pg_4_pos_58.jpeg', 
    'hot dog_pg_4_pos_59.jpeg', 'hot dog_pg_4_pos_60.jpeg', 
    'hot dog_pg_4_pos_62.jpeg', 'hot dog_pg_4_pos_68.jpeg', 
    'hot dog_pg_4_pos_69.jpeg', 'hot dog_pg_4_pos_72.jpeg', 
    'hot dog_pg_4_pos_73.jpeg', 'hot dog_pg_4_pos_74.jpeg', 
    'hot dog_pg_4_pos_76.jpeg', 'hot dog_pg_4_pos_79.jpeg', 
    'hot dog_pg_4_pos_8.jpeg', 'hot dog_pg_4_pos_82.jpeg', 
    'hot dog_pg_4_pos_87.jpeg', 'hot dog_pg_4_pos_9.jpeg', 
    'hot dog_pg_4_pos_91.png', 'hot dog_pg_4_pos_94.jpeg', 
    'hot dog_pg_4_pos_95.jpeg', 'hot dog_pg_4_pos_96.jpeg', 
    'hot dog_pg_5_pos_0.jpeg', 'hot dog_pg_5_pos_1.jpeg', 
    'hot dog_pg_5_pos_12.jpeg', 'hot dog_pg_5_pos_13.jpeg', 
    'hot dog_pg_5_pos_15.png', 'hot dog_pg_5_pos_17.jpeg', 
    'hot dog_pg_5_pos_18.jpeg', 'hot dog_pg_5_pos_19.jpeg', 
    'hot dog_pg_5_pos_2.jpeg', 'hot dog_pg_5_pos_20.jpeg', 
    'hot dog_pg_5_pos_22.jpeg', 'hot dog_pg_5_pos_3.jpeg', 
    'hot dog_pg_5_pos_32.jpeg', 'hot dog_pg_5_pos_33.jpeg', 
    'hot dog_pg_5_pos_35.jpeg', 'hot dog_pg_5_pos_38.jpeg', 
    'hot dog_pg_5_pos_42.jpeg', 'hot dog_pg_5_pos_44.jpeg', 
    'hot dog_pg_5_pos_48.jpeg', 'hot dog_pg_5_pos_50.jpeg', 
    'hot dog_pg_5_pos_55.jpeg', 'hot dog_pg_5_pos_56.jpeg', 
    'hot dog_pg_5_pos_57.jpeg', 'hot dog_pg_5_pos_59.jpeg', 
    'hot dog_pg_5_pos_62.jpeg', 'hot dog_pg_5_pos_63.jpeg', 
    'hot dog_pg_5_pos_65.jpeg', 'hot dog_pg_5_pos_67.jpeg', 
    'hot dog_pg_5_pos_68.jpeg', 'hot dog_pg_5_pos_69.jpeg', 
    'hot dog_pg_5_pos_70.jpeg', 'hot dog_pg_5_pos_72.jpeg', 
    'hot dog_pg_5_pos_74.jpeg', 'hot dog_pg_5_pos_78.jpeg', 
    'hot dog_pg_5_pos_79.jpeg', 'hot dog_pg_5_pos_80.jpeg', 
    'hot dog_pg_5_pos_82.jpeg', 'hot dog_pg_5_pos_83.jpeg', 
    'hot dog_pg_5_pos_85.jpeg', 'hot dog_pg_5_pos_86.jpeg', 
    'hot dog_pg_5_pos_88.jpeg', 'hot dog_pg_5_pos_9.jpeg', 
    'hot dog_pg_5_pos_92.jpeg', 'hot dog_pg_5_pos_93.png', 
    'hot dog_pg_5_pos_94.jpeg', 'hot dog_pg_5_pos_96.jpeg', 
    'hot dog_pg_5_pos_99.jpeg'
]

## 3. Process Baseline and Hot Dog Model Data
Because the images with issues are now listed, we copy only the remaining images from the 'raw' folder to the 'intermediary' folder. We do not need all of the data, so we transfer at most 400 images for each of 'food', 'sandwich', 'taco', and 'hot dog'.

In [7]:

import re
import pathlib
from PIL import Image


# doesn't incorporate same number of images
image_issues = {
    'food': food_issues,
    'sandwich': sandwich_issues,
    'taco': taco_issues, 
    'hot dog': hot_dog_issues
}

_raw_path = pathlib.Path('../data/raw').resolve()
_intermediary_path = pathlib.Path('../data/intermediary').resolve()

for category in ['food', 'hot dog', 'sandwich', 'taco']:
    
    # get list containing all image files
    category_path = _raw_path.joinpath(category)
    category_images = [
        category_obj 
        for category_obj in category_path.glob('**/*') 
        if category_obj.is_file()
    ]
    
    transferred_images = 0

    for category_image in category_images:
        # stop when 400 images are transferred
        if transferred_images >= 400:
            break
        
        # name of file
        im_name = str(category_image.relative_to(category_path))
        
        # confirm that image is not an image with issues
        if im_name not in image_issues[category]:
            # re-save each clean raw image to the intermediary folder;
            # the context manager closes the source file after saving
            with Image.open(category_image) as im:
                im.save(_intermediary_path.joinpath(category, im_name))
  
            transferred_images += 1
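
The selection rule in the loop above (skip flagged file names, stop at a cap) can be sketched on plain lists of file names without touching the disk. The helper name `select_clean_images` and the sample file names are hypothetical and are used only for illustration.

```python
def select_clean_images(file_names, flagged, limit):
    """Return up to `limit` names from `file_names` that are not in `flagged`."""
    flagged = set(flagged)  # set membership avoids rescanning a list each time
    selected = []
    for name in file_names:
        # stop once the cap is reached
        if len(selected) >= limit:
            break
        # keep only names that were not flagged during manual review
        if name not in flagged:
            selected.append(name)
    return selected


# hypothetical file names, for illustration only
demo_names = [f'taco_pg_0_pos_{i}.jpeg' for i in range(6)]
demo_issues = ['taco_pg_0_pos_1.jpeg', 'taco_pg_0_pos_3.jpeg']
demo_kept = select_clean_images(demo_names, demo_issues, limit=3)
print(demo_kept)
```

Converting the issue list to a set is a small design choice: the flagged lists in this notebook hold hundreds of names, and a set keeps each membership check cheap.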

In [4]:
import tensorflow as tf
import os
import pathlib
import re
import random

_intermediary_path = pathlib.Path('../data/intermediary').resolve()
image_labels = {
    'food' : [1, 0, 0],
    'sandwich' : [0, 1, 0],
    'taco': [0, 0, 1]
}

def _bytestring_feature(list_of_bytestrings):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=list_of_bytestrings))

def _int_feature(list_of_ints): # int64
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list_of_ints))

def create_example(image, label):
    feature = {
        'image': _bytestring_feature(image),
        'label': _int_feature(label),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


category_filenames = {}
random.seed(123)
for category in ['food', 'sandwich', 'taco']:
    category_path = os.path.join(_intermediary_path, category)
    full_path = []
    for path in os.listdir(category_path):
        full_path.append(os.path.join(category_path, path))
    
    # shuffles in place; returns None
    random.shuffle(full_path)
    category_filenames[category] = full_path

interleaved_filenames = [
    val 
    for tup in zip(*category_filenames.values()) 
    for val in tup
]
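
Because `zip` stops at the shortest input, the interleaving above keeps only as many images per category as the smallest category holds. A small sketch with made-up file names (hypothetical, for illustration only) shows the round-robin order and the truncation:

```python
# hypothetical file names, for illustration only
demo_filenames = {
    'food': ['f0', 'f1', 'f2'],
    'sandwich': ['s0', 's1'],
    'taco': ['t0', 't1', 't2', 't3'],
}

# round-robin across categories; zip stops at the shortest list
demo_interleaved = [
    val
    for tup in zip(*demo_filenames.values())
    for val in tup
]
print(demo_interleaved)

# number of file names left out because their category was longer
demo_dropped = sum(len(v) for v in demo_filenames.values()) - len(demo_interleaved)
print(demo_dropped)
```

If every category holds the same number of images, nothing is dropped and the classes stay balanced in the interleaved list.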

In [6]:
with tf.io.TFRecordWriter('../data/processed/baseline.tfrecord') as writer:
    for image_path in interleaved_filenames[:301]:
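        try:
            # read the encoded image bytes once and reuse them below
            raw_file = tf.io.read_file(image_path)
        except tf.errors.NotFoundError:
            # tf.io.read_file raises NotFoundError, not FileNotFoundError
            print(f'File {image_path} could not be found')
            continue
        image = [raw_file.numpy()]

        # the category is the name of the folder holding the image;
        # pathlib handles both Windows and POSIX path separators
        category = pathlib.Path(image_path).parent.name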
        example = create_example(image, image_labels[category])
        writer.write(example.SerializeToString())
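
The label lookup depends on recovering the category folder name from each image path. A minimal sketch of that step with `pathlib`, using hypothetical paths, shows that the parent folder name comes out the same for Windows and POSIX style paths:

```python
from pathlib import PurePosixPath, PureWindowsPath

# hypothetical paths, for illustration only
demo_windows_path = PureWindowsPath(
    r'C:\data\intermediary\taco\taco_pg_0_pos_1.jpeg'
)
demo_posix_path = PurePosixPath(
    '/data/intermediary/taco/taco_pg_0_pos_1.jpeg'
)

# the category is the name of the folder that directly holds the image
demo_windows_category = demo_windows_path.parent.name
demo_posix_category = demo_posix_path.parent.name
print(demo_windows_category, demo_posix_category)
```

The pure path classes are used here only so the sketch behaves the same on any operating system; in the notebook itself, `pathlib.Path` picks the flavour that matches the machine.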


## 4. Obtain, Clean and Process Taco Bias Model Data
The process was repeated with 'open face sandwich' and 'hard taco'. The reasoning was that a hard taco looks more like a hot dog than a normal taco does, and an open face sandwich looks less like a taco than the average sandwich does, so a 'taco bias' data set could potentially shift the model's prediction.

In [None]:
for page_num in range(1):
    for category in ['hard taco', 'open face sandwich']:
        image_search_info = download_images.get_image_search_info(
            page_num, category
        )
        download_images.get_images(image_search_info)

In [None]:
hard_taco_issues = [
    'hard taco_pg_0_pos_8.jpeg', 'hard taco_pg_0_pos_20.jpeg',
    'hard taco_pg_0_pos_21.jpeg', 'hard taco_pg_0_pos_25.jpeg',
    'hard taco_pg_0_pos_27.jpeg', 'hard taco_pg_0_pos_35.jpeg',
    'hard taco_pg_0_pos_44.jpeg', 'hard taco_pg_0_pos_53.jpeg',
    'hard taco_pg_0_pos_54.jpeg', 'hard taco_pg_0_pos_64.jpeg',
    'hard taco_pg_0_pos_65.jpeg', 'hard taco_pg_0_pos_66.jpeg',
    'hard taco_pg_0_pos_70.jpeg', 'hard taco_pg_0_pos_74.jpeg',
    'hard taco_pg_0_pos_84.jpeg', 'hard taco_pg_0_pos_89.jpeg',
    'hard taco_pg_0_pos_96.jpeg', 'hard taco_pg_0_pos_99.jpeg'
]
open_face_sandwich_issues = [
    'open face sandwich_pg_0_pos_81.jpeg', 'open face sandwich_pg_0_pos_90.jpeg'
]


In [None]:
import re
import pathlib
from PIL import Image


# doesn't incorporate same number of images
image_issues = {
    'food': food_issues,
    'open face sandwich': open_face_sandwich_issues,
    'hard taco': hard_taco_issues, 
}

_raw_path = pathlib.Path('../data/raw').resolve()
_intermediary_path = pathlib.Path('../data/intermediary').resolve()

for category in ['hard taco', 'open face sandwich']:
    
    # get list containing all image files
    category_path = _raw_path.joinpath(category)
    category_images = [
        category_obj 
        for category_obj in category_path.glob('**/*') 
        if category_obj.is_file()
    ]
    
    transferred_images = 0

    for category_image in category_images:
        # stop when 50 images are transferred
        if transferred_images >= 50:
            break
        
        # name of file
        im_name = str(category_image.relative_to(category_path))
        
        # confirm that image is not an image with issues
        if im_name not in image_issues[category]:
            # re-save each clean raw image to the intermediary folder;
            # the context manager closes the source file after saving
            with Image.open(category_image) as im:
                im.save(_intermediary_path.joinpath(category, im_name))
  
            transferred_images += 1

In [None]:
import tensorflow as tf
import os
import pathlib
import re
import random

_intermediary_path = pathlib.Path('../data/intermediary').resolve()
image_labels = {
    'food' : [1, 0, 0],
    'open face sandwich' : [0, 1, 0],
    'hard taco': [0, 0, 1]
}

def _bytestring_feature(list_of_bytestrings):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=list_of_bytestrings))

def _int_feature(list_of_ints):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list_of_ints))

def create_example(image, label):
    feature = {
        'image': _bytestring_feature(image),
        'label': _int_feature(label),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


category_filenames = {}
random.seed(123)
for category in ['food', 'open face sandwich', 'hard taco']:
    category_path = os.path.join(_intermediary_path, category)
    full_path = []
    for path in os.listdir(category_path):
        full_path.append(os.path.join(category_path, path))
    
    # shuffles in place; returns None
    random.shuffle(full_path)
    category_filenames[category] = full_path

interleaved_filenames = [
    val 
    for tup in zip(*category_filenames.values()) 
    for val in tup
]

In [None]:

with tf.io.TFRecordWriter('../data/processed/taco_bias.tfrecord') as writer:
    for image_path in interleaved_filenames:
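        try:
            # read the encoded image bytes once and reuse them below
            raw_file = tf.io.read_file(image_path)
        except tf.errors.NotFoundError:
            # tf.io.read_file raises NotFoundError, not FileNotFoundError
            print(f'File {image_path} could not be found')
            continue
        image = [raw_file.numpy()]

        # the category is the name of the folder holding the image;
        # pathlib handles both Windows and POSIX path separators
        category = pathlib.Path(image_path).parent.name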
        example = create_example(image, image_labels[category])
        writer.write(example.SerializeToString())