# Downloading Notre Dame Bagby Negative Images

From the [University Archives](http://archives.nd.edu/digital/):
- "The Bagby company, a South Bend photographic studio, took pictures of athletes for Notre Dame. The digitized Glass Plate Negative Collection is part of a [larger Bagby collection](http://archives.nd.edu/findaids/ead/xml/bby.xml)."
- [Bagby Glass Plate Negative Collection (Notre Dame Sports), 1920s-1930s](http://archives.nd.edu/Bagby/index.htm)

This Jupyter Notebook includes code and comments for downloading all images in the Bagby Glass Plate Negative Collection (Notre Dame Sports) and for matching image metadata to file names.

# Import Libraries, Load URL, and Create Beautiful Soup Object

In [None]:
# import libraries
import os
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import csv
import pandas as pd

In [None]:
# load url, create beautifulsoup object
page = requests.get('http://archives.nd.edu/Bagby/index.htm')

soup = BeautifulSoup(page.text, 'html.parser')

# isolate the first 'table' element; it is reused later to build the metadata dataframe
table = soup.find('table')

# find all instances of 'img' tag inside the table
img_list = table.find_all('img')

# Get List of Image File Names

In [None]:
# create empty list for image file names
image_file_names = []

# for loop that isolates src contents, removes the 'tn\\tn-' thumbnail prefix, and appends to empty list
for img in img_list:
    image_file_names.append(img.get('src').replace("tn\\tn-", ""))
    
# list of image file names
image_file_names
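
The prefix stripping in the loop above can be checked in isolation. This is a minimal sketch, assuming the thumbnail `src` values have the form `tn\tn-<name>.jpg`; the sample value and the `strip_thumbnail_prefix` helper are hypothetical and only illustrate the string replacement.

```python
def strip_thumbnail_prefix(src):
    """Return the full-size file name for a thumbnail src value."""
    # "tn\\tn-" is the Python literal for the characters tn\tn-
    return src.replace("tn\\tn-", "")


# hypothetical sample value, not taken from the archive page
sample_src = "tn\\tn-example.jpg"
full_name = strip_thumbnail_prefix(sample_src)
print(full_name)
```

A `src` value without the prefix passes through unchanged, so it is worth spot-checking the resulting list for any names that still start with `tn`.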

# Get List of Image URLs

In [None]:
# create empty list for image urls
image_url_list = []

# for loop that concatenates URL root with image file name (end of link)
for name in image_file_names:
    image_url_list.append("http://archives.nd.edu/Bagby/" + name)
    
# list of urls
image_url_list
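
The notebook imports `urljoin` but builds URLs by string concatenation. As an alternative, here is a minimal sketch of the same step using `urljoin`; the file names are hypothetical placeholders, and `BASE_URL` is a name introduced here for illustration.

```python
from urllib.parse import urljoin

# collection root; the trailing slash matters for urljoin
BASE_URL = "http://archives.nd.edu/Bagby/"

# hypothetical file names, for illustration only
sample_names = ["example-1.jpg", "example-2.jpg"]

# resolve each file name against the collection root
sample_urls = [urljoin(BASE_URL, name) for name in sample_names]
print(sample_urls)
```

Without the trailing slash on the base, `urljoin` would replace the last path segment (`Bagby`) instead of appending to it.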

# Download Image Files from List of Full URLs

In [None]:
# import libraries
import urllib3
import os

# configure urllib3
http = urllib3.PoolManager()
print("downloading with urllib3")

# for loop that downloads image for each url in image_url_list
for url in image_url_list:
    r = http.request('GET', url)
    filename = os.path.basename(url)
    with open(filename, 'wb') as fcont:
        fcont.write(r.data)
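
The loop above writes every file into the working directory and downloads everything again on each run. The following is a sketch of a re-runnable variant, not the notebook's own method: it swaps `urllib3` for the standard library's `urllib.request`, and the `fetch_bytes` and `download_images` helpers and the `bagby_images` directory name are assumptions introduced for illustration.

```python
import os
import urllib.request


def fetch_bytes(url, timeout=30):
    """Fetch a URL with the standard library and return the response body."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_images(urls, out_dir="bagby_images", fetch=fetch_bytes):
    """Save each URL into out_dir, skipping files that already exist."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(out_dir, os.path.basename(url))
        if os.path.exists(path):
            continue  # already downloaded on an earlier run
        with open(path, "wb") as f:
            f.write(fetch(url))
        saved.append(path)
    return saved
```

Calling `download_images(image_url_list)` would then fetch only the files that are not yet on disk. The `fetch` parameter lets the function be exercised without network access.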

# Matching File Names and Image Info

In [None]:
# show table object
table

In [None]:
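# create dataframe from table object using pd.read_html
# (recent pandas versions expect a file-like object rather than a raw HTML string)
from io import StringIO
df = pd.read_html(StringIO(str(table)))[0]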

# show newly-created dataframe
df

In [None]:
# map image file names to image description

# rename second column 
df.rename(columns={1: 'image_title'}, inplace=True)

# delete first column
del df[0]

# show updated dataframe
df

In [None]:
# create new file_name column with values from image_file_names list
df['file_name'] = image_file_names

# show updated dataframe
df

In [None]:
# write dataframe to csv
df.to_csv('bagby_images_file_name_master.csv', index=False)
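
Assigning `image_file_names` as a column relies on the list having the same length and order as the table rows. Here is a minimal sketch of the same pairing and CSV export using only the standard `csv` module, with an explicit length check; the helper names and the sample titles and file names are hypothetical.

```python
import csv
import io


def pair_titles_with_files(titles, file_names):
    """Pair each title with the file name at the same position."""
    if len(titles) != len(file_names):
        raise ValueError(
            f"{len(titles)} titles but {len(file_names)} file names"
        )
    return [
        {"image_title": title, "file_name": name}
        for title, name in zip(titles, file_names)
    ]


def rows_to_csv(rows):
    """Serialize the paired rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=["image_title", "file_name"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# hypothetical sample data, for illustration only
rows = pair_titles_with_files(
    ["Example title A", "Example title B"],
    ["example-1.jpg", "example-2.jpg"],
)
print(rows_to_csv(rows))
```

The length check only catches missing or extra entries. It cannot detect rows that are present but out of order, so a visual spot check of a few images against their titles is still worthwhile.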