## Collecting Produce Images
- Collect up to 300 images of each vegetable using [google-images-download](https://pypi.org/project/google-images-download/#usage-using-command-line-interface)
- Afterwards, inspect each image and remove irrelevant ones.

Import Libraries

In [1]:
import os
from google_images_download import google_images_download

Create functions to automate image collection and renaming of figures

In [None]:
# Function to download images
def downloadVeg(list_veg, n_img=300):
    '''Function to download images from Google
    Input:
        list_veg - list of veggies, e.g., ['tomato','carrot']
        n_img    - the number of images to collect per veggie
    '''
    for each in list_veg:
        # dict of arguments for one keyword
        arguments = {"keywords": each,
                     "limit": n_img,
                     "print_urls": False,
                     "chromedriver": '/Users/jhonsen/Downloads/chromedriver'}
        response = google_images_download.googleimagesdownload()   # class instantiation
        response.download(arguments)   # download the images for this keyword

# Rename every file in a directory to <prefix><index>.jpg
def rename(directory, prefix):
    ''' Function renames files within a directory
    Input:
        directory - in string, path to the folder
        prefix    - in string, the prefix of filenames
    '''
    # Run once on freshly downloaded files: if a target name already exists,
    # os.rename() may overwrite it (POSIX) or raise an error (Windows).
    # sorted() makes the numbering reproducible; os.listdir() order is arbitrary.
    for i, filename in enumerate(sorted(os.listdir(directory))):
        src = os.path.join(directory, filename)
        dst = os.path.join(directory, f"{prefix}{i}.jpg")
        os.rename(src, dst)
        
def renamePhotos(list_veg):
    ''' function to rename the photos inside folder
    '''
    for each in list_veg:
        rename(f'../data/images/raw/{each}/', each)
        

In [None]:
# These are the types of images to collect. Some will be used to train the model.
vegetables = ['tomato','bell pepper','asparagus', 'broccoli','spinach',
              'eggplant','zucchini','scallion','celery',
              'cauliflower','bok choy vegetable','kale',
              'brussels sprouts vegetable','cabbage','bean sprouts']

In [None]:
# START Collecting Images! 

# >>>>> This function will take a while to complete <<<<<
# >>>>> It also prints a large amount of text output <<<<<

downloadVeg(vegetables)

In [None]:
# Rename photos so the filenames are labeled in numerical order 

renamePhotos(vegetables)
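
Before inspecting the images by hand, it can help to check how many files each folder actually contains. The sketch below is an illustrative addition, not part of the original notebook; `count_files` is a hypothetical helper, and the base path in the example call is assumed to match the one used in `renamePhotos`.

```python
import os

def count_files(base_dir, names):
    """Return {name: number of files in base_dir/name}; missing folders count as 0."""
    counts = {}
    for name in names:
        folder = os.path.join(base_dir, name)
        if os.path.isdir(folder):
            # Count regular files only, ignoring any sub-folders
            counts[name] = sum(
                os.path.isfile(os.path.join(folder, f)) for f in os.listdir(folder))
        else:
            counts[name] = 0
    return counts

# Example call (assumed path, same as in renamePhotos):
# count_files('../data/images/raw', vegetables)
```

Folders with a count far below the requested limit may be worth re-downloading before the manual inspection.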

- After inspecting the files in the `/data/images/raw/` directory and removing irrelevant ones, split the remaining images into train and test groups.
- Place those images in `/data/veggies/train` and `/data/veggies/test`.
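
The split itself is not shown in this notebook. A minimal sketch of one way to do it follows, assuming one sub-folder per vegetable under the raw directory. The function name `split_train_test`, the 20% test fraction, and the fixed seed are illustrative assumptions, not values from the original project.

```python
import os
import random
import shutil

def split_train_test(raw_dir, out_dir, test_frac=0.2, seed=0):
    """Copy raw_dir/<class>/* into out_dir/train/<class>/ and out_dir/test/<class>/.

    test_frac and seed are illustrative defaults (assumptions).
    Returns {class_name: (n_train, n_test)}.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for class_name in sorted(os.listdir(raw_dir)):
        class_dir = os.path.join(raw_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        files = sorted(f for f in os.listdir(class_dir)
                       if os.path.isfile(os.path.join(class_dir, f)))
        rng.shuffle(files)
        n_test = int(len(files) * test_frac)
        groups = {"test": files[:n_test], "train": files[n_test:]}
        for group, names in groups.items():
            dest = os.path.join(out_dir, group, class_name)
            os.makedirs(dest, exist_ok=True)
            for name in names:
                # Copy rather than move, so the raw folder stays intact
                shutil.copy2(os.path.join(class_dir, name),
                             os.path.join(dest, name))
        counts[class_name] = (len(files) - n_test, n_test)
    return counts

# Example call (relative paths assumed, as in renamePhotos):
# split_train_test('../data/images/raw', '../data/veggies')
```

Copying instead of moving keeps the inspected raw images available in case the split needs to be redone with a different fraction.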

---

## Recipe Collection via Web Scraping Epicurious
To collect recipes and their images, I used existing web scrapers:
1. In a terminal, **clone** this [recipe-box](https://github.com/jhonsen/recipe-box) repo. It is a few commits ahead of its upstream ([Ryan's recipe-box](https://github.com/rtlee9/recipe-box)) to account for changes to the Epicurious pages and to collect nutritional facts during scraping. In particular, I changed the following Python scripts:
   - `get_pictures.py`: updated the image URL
   - `get_recipes.py`: added a few lines to scrape nutritional facts
2. Inside the `./src/` folder of the recipe-box repo, clone this [recipe-scraper](https://github.com/rtlee9/recipe-scraper) repo, which is where the scraper script is imported from.
3. Run the scraper from the top directory, e.g., `$ python src/get_recipes.py --epi --multi --sleep 2` (see the documentation for details on the arguments).
4. Briefly inspect the output file in the `./data/` folder. My output is called **recipes_raw_epi.json**, a JSON file that will be parsed in [Step2_Cleaning.ipynb](./Step2_Cleaning.ipynb). Because this file exceeds GitHub's file-size limit (100 MB), it is not included in this repo.
5. Move the **recipes_raw_epi.json** file to the [data](../data) folder of this repo
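
For the brief inspection in step 4, a small helper along these lines may be enough. It is a sketch that makes no assumption about the file's internal layout: it reports only the top-level type, the number of entries, and a short preview. The name `summarize_json` and the example path are hypothetical.

```python
import json

def summarize_json(path, n_preview=1):
    """Load a JSON file and report its top-level type, size, and a small preview."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    is_container = isinstance(data, (dict, list))
    summary = {"type": type(data).__name__,
               "n_items": len(data) if is_container else None}
    if isinstance(data, dict):
        # First few (key, value) pairs, in file order
        summary["preview"] = list(data.items())[:n_preview]
    elif isinstance(data, list):
        summary["preview"] = data[:n_preview]
    else:
        summary["preview"] = data
    return summary

# Example call (assumed location after step 5):
# summarize_json('../data/recipes_raw_epi.json')
```

Because the whole file is loaded into memory, this is suitable for a quick look at a file of roughly this size but not for much larger dumps.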

---
