# 02. Downloading Images

This notebook retrieves the consumer-grade images from the FTP site of the U.S. National Library of Medicine.  The first notebook built a `.csv` file containing a link to each referenced image; this notebook reads in that csv file and uses the `webbrowser` library to download each image file.

---
## Table of Contents
- [01. Importing Libraries](#01.-Importing-Libraries)
- [02. Read In Directory](#02.-Read-In-Directory)
- [03. Downloading Images](#03.-Downloading-Images)
- [04. Relocating Images](#04.-Relocating-Images)

---
### 01. Importing Libraries

For this notebook, we will require the use of the libraries below:
- `pandas`: for dealing with the csv file as a dataframe
- `webbrowser`: to access and manipulate our default web browser
- `time`: so that we can have our downloading code sleep between downloads
- `os`: used in conjunction with `shutil` in order to relocate downloads
- `shutil`: used in conjunction with `os` in order to relocate downloads

In [1]:
import pandas as pd
import webbrowser
from time import sleep
import os
import shutil

---
### 02. Read In Directory

In [2]:
# Read-in csv file
data = pd.read_csv('../data/consumer_lookup.csv').drop(columns = ['Unnamed: 0'])

# Looking at the top 5 rows
data.head()

| | Name | full_url |
|---|---|---|
| 0 | STRATTERA 10MG | ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/... |
| 1 | STRATTERA 10MG | ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/... |
| 2 | STRATTERA 10MG | ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/... |
| 3 | STRATTERA 10MG | ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/... |
| 4 | STRATTERA 10MG | ftp://lhcftp.nlm.nih.gov/Open-Access-Datasets/... |


---
### 03. Downloading Images

This code block uses the `webbrowser` library to open each link listed in the `full_url` column in a new window of the default web browser.  Each link, once opened, automatically downloads the pill image for the associated medication in the `Name` column.  To avoid flooding the host server with requests, which could be interpreted as a denial-of-service (DoS) attack, we wait 4 seconds between downloads.

In [3]:
# For each link found in the 'full_url' column,
for url in data['full_url']:
    # open the link in a new web browser,
    webbrowser.open_new(url)
    # and take a 4 second break before moving on to the next link
    sleep(4)
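
As an alternative to driving the browser, the same throttled loop could fetch the files directly. The following is only a sketch, not the method used in this notebook: it swaps `webbrowser` for the standard library's `urllib.request.urlretrieve`, which accepts `ftp://` URLs. The function name `download_all`, its parameters, and the idea of skipping files that already exist are assumptions made for illustration.

```python
import os
from time import sleep
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_all(urls, dest_dir, delay=4, fetch=urlretrieve, pause=sleep):
    """Save each URL into dest_dir, pausing `delay` seconds after each fetch.

    `fetch` and `pause` are injectable (hypothetical parameters) so the loop
    can be exercised without touching the network.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last component of the URL path as the local file name
        filename = os.path.basename(urlparse(url).path)
        target = os.path.join(dest_dir, filename)
        # Skip files already on disk so an interrupted run can be resumed
        if not os.path.exists(target):
            fetch(url, target)
            pause(delay)
        saved.append(target)
    return saved
```

Under these assumptions, a call such as `download_all(data['full_url'], destination)` would write the images straight into a chosen folder, which would make the later relocation step unnecessary.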

Due to the aforementioned waiting period of 4 seconds between each download, the total amount of time it took to download all 59,414 jpeg files was just over 66 hours.
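
The 66-hour figure can be checked with a quick calculation. This sketch counts only the 4-second pauses and ignores the time each transfer itself takes, so it is a lower bound; the variable names are chosen here for illustration.

```python
n_files = 59_414       # number of jpeg files reported above
delay_seconds = 4      # pause between downloads

total_seconds = n_files * delay_seconds
total_hours = total_seconds / 3600

print(f"{total_seconds:,} seconds = {total_hours:.2f} hours")
# prints: 237,656 seconds = 66.02 hours
```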

---
### 04. Relocating Images

After this lengthy waiting period, we need to move all of the downloaded images from the Downloads folder into the appropriate test folder, from which future notebooks will access the images.

In [9]:
# Folder the images will be moved from
source = 'C:\\Users\\Fausto\\Downloads\\'

# Folder the images will be moved to
destination = 'C:\\Users\\Fausto\\Documents\\General_Assembly\\Projects\\f-manon_capstone\\images\\test'

# List every file currently in the Downloads folder
files = os.listdir(source)

# Move each file from the source folder to the destination folder
# (note: this moves everything in Downloads, not only the images)
for f in files:
    shutil.move(os.path.join(source, f), destination)
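
The loop above moves every file found in the Downloads folder. If that folder might hold unrelated files, a variation could move only the images. The sketch below is one possible approach under assumptions: the helper name `move_images` and the extension filter are hypothetical and not part of the original notebook.

```python
import shutil
from pathlib import Path


def move_images(source, destination, extensions=(".jpg", ".jpeg")):
    """Move files whose extension is in `extensions` from source to destination."""
    source, destination = Path(source), Path(destination)
    # Create the destination folder if it does not exist yet
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() builds the full listing first, so moving files mid-loop is safe
    for path in sorted(source.iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            shutil.move(str(path), str(destination / path.name))
            moved.append(path.name)
    return moved
```

Calling `move_images(source, destination)` with the two folders defined above would leave any non-image downloads where they are.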

Now we are ready to move on to the next notebook in order to convert the images into data that our neural network can process.

---