# Category Summaries

**Speeches**  <br>
The Speeches category contains all archival items that record the content of speeches. This category includes written drafts and written versions of speeches by historical figures, as well as transcripts of speeches given to public or designated audiences.

**Correspondence** <br>
The Correspondence category contains all archival items that consist of correspondence from historical figures. This category includes all types of correspondence: personal, business, and institutional.

**Drawing** <br>
The Drawing category contains all illustrated art not included in other specified categories (e.g., Advertisements). This category holds drawings of all types, including newspaper cartoons, illustrations by artists, and political cartoons.


**Photograph** <br>
The Photographs category contains all photographic materials in the archival collection, including historical photography and photographs by artists.


**Advertisement** <br>
The Advertisements category contains all image and written materials used for advertising purposes. This includes advertisement art, posters, and flyers, whether for business or political purposes.


**Book**  <br>
The Books category contains all book-related and printed materials, excluding those in other categories. Such materials include pages from books, printed drafts or manuscripts, and full versions of selected printed materials. This category also includes printed materials with multiple pages.


**Biography** <br>
The Biographies category contains all biographical materials. These include written autobiographies, biographical manuscripts, books, and other biographical materials. Biographies range from a single page to multi-page books and manuscripts.


# Model Analysis on Library of Congress Data

Total images: 59,945 <br>
Test items: 6,658 <br>
Precision (% of predicted labels that are correct): 95.82% <br>
Recall (% of actual labels that are correctly identified): 94.74% <br>
**Confusion Matrix** <br>
<center>
<img src="https://drive.google.com/uc?id=1sjt7ZEuxkVEOt_NMSOR56cqqfvFpmHlC" width="700">
</center>
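
As a reminder of how the two metrics above are derived from confusion-matrix counts, here is a minimal sketch. The function name `precision_recall` and the counts passed to it are hypothetical illustrations; they are not taken from the Library of Congress evaluation.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall for one label from raw counts.

    tp: items correctly assigned the label (true positives)
    fp: items wrongly assigned the label (false positives)
    fn: items that have the label but were missed (false negatives)
    """
    precision = tp / (tp + fp)  # share of predicted labels that are correct
    recall = tp / (tp + fn)     # share of actual labels that were found
    return precision, recall


# Hypothetical counts for a single category, for illustration only.
p, r = precision_recall(tp=90, fp=10, fn=30)
print("precision={:.2%} recall={:.2%}".format(p, r))
```

Note that true negatives do not enter either formula.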

# Setup

In [0]:
#@title Authenticate
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [0]:
!pip install google-cloud-vision
!pip install google-cloud-automl

Collecting google-cloud-vision
  Downloading https://files.pythonhosted.org/packages/f8/6b/8c1284f9f1b1cfd6323b0c64d1abf3ca7c98b5860aa18f40396e3a193b67/google_cloud_vision-0.41.0-py2.py3-none-any.whl (431kB)
Installing collected packages: google-cloud-vision
Successfully installed google-cloud-vision-0.41.0


Collecting google-cloud-automl
  Downloading https://files.pythonhosted.org/packages/5f/60/d2e5721713967d40cdee5ed9da4fad4a0349dada84608f16f24920afc6f2/google_cloud_automl-0.9.0-py2.py3-none-any.whl (371kB)
Installing collected packages: google-cloud-automl
Successfully installed google-cloud-automl-0.9.0


In [0]:
from google.cloud import automl_v1beta1
from google.cloud.automl_v1beta1.proto import service_pb2
from google.cloud import vision
from google.cloud.vision import types
from google.cloud import storage
import pandas as pd
import os
import csv
from google.colab import files
import sys

Upload GCP key file

In [0]:
file_info = files.upload()
key = next(iter(file_info))
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = key

Saving BridgeUROPs-e2517134f7a0.json to BridgeUROPs-e2517134f7a0.json


# Classification





In [0]:
gcs_client = storage.Client()

bucket_path = 'loc-entity-tagging'
bucket = gcs_client.get_bucket(bucket_path)

test_dir_path = 'test/'


In [0]:
# 'content' is the raw image bytes, as read from the file in binary mode.
def get_prediction(content, project_id, model_id):
  prediction_client = automl_v1beta1.PredictionServiceClient()

  name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
  payload = {'image': {'image_bytes': content }}
  params = {}
  # predict() blocks until the service responds.
  response = prediction_client.predict(name, payload, params)
  return response


Benny Goodman photo, autographed
<br>
<center>
<img src="https://storage.cloud.google.com/loc-entity-tagging/test/Benny%20Goodman%20photo%2C%20autographed.jpg" width="700">



</center>

In [0]:
doc = "Benny Goodman photo, autographed.jpg"
proj = "238074779717"
model = "ICN659931277037666304"

download_blob = bucket.blob(test_dir_path + doc)
download_blob.download_to_filename(doc)  # saves the image locally as `doc`; returns None
filepath = './' + doc

with open(filepath, 'rb') as ff:
    content = ff.read()

# Keep the response itself; print() returns None.
result = get_prediction(content, proj, model)
print(result)

payload {
  annotation_spec_id: "4692225245161979904"
  classification {
    score: 0.9606190919876099
  }
  display_name: "photograph"
}
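
The printed response above carries a `display_name` and a `classification.score` for each payload entry. A minimal sketch of pulling out the top label follows. The helper `top_label` is hypothetical, and `fake_response` is a stand-in built with `SimpleNamespace` that only mimics the field names seen in the output, so the example runs without calling the service.

```python
from types import SimpleNamespace


def top_label(response):
    """Return (display_name, score) of the highest-scoring entry, or None."""
    if not response.payload:
        return None
    best = max(response.payload, key=lambda item: item.classification.score)
    return best.display_name, best.classification.score


# Stand-in object mirroring the fields printed above (not a real API response).
fake_response = SimpleNamespace(payload=[
    SimpleNamespace(
        display_name="photograph",
        classification=SimpleNamespace(score=0.9606190919876099),
    ),
])

print(top_label(fake_response))
```

Under the assumption that the real response exposes the same attributes, `top_label(result)` could be applied to the object returned by `get_prediction`.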



# Print vs. Handwriting Classification

In [0]:
def get_text_prediction(content, project_id, model_id):
  prediction_client = automl_v1beta1.PredictionServiceClient()

  name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
  payload = {'image': {'image_bytes': content }}
  params = {}
  # predict() blocks until the service responds.
  response = prediction_client.predict(name, payload, params)
  return response

In [0]:
doc = "Benny Goodman photo, autographed.jpg"
proj = "238074779717"
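# NOTE: this is the same model ID as the category classifier above. If a separate
# print-vs-handwriting model was trained, substitute its ID here.
model = "ICN659931277037666304"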

download_blob = bucket.blob(test_dir_path + doc)
download_blob.download_to_filename(doc)  # saves the image locally as `doc`; returns None
filepath = './' + doc

with open(filepath, 'rb') as ff:
    content = ff.read()

# Keep the response itself; print() returns None.
result = get_text_prediction(content, proj, model)
print(result)

# Image to Text

In [0]:
def detect_document_uri(uri):
    """Runs document text detection on an image in Google Cloud Storage
    and returns the recognized words joined by single spaces."""

    client = vision.ImageAnnotatorClient()
    image = vision.types.Image()
    image.source.image_uri = uri

    response = client.document_text_detection(image=image)
    words = ''
    # Walk the page > block > paragraph > word > symbol hierarchy.
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # Rebuild each word from its symbols, then add a separator.
                    for symbol in word.symbols:
                        words += symbol.text
                    words += ' '

    # Drop periods and commas that stand alone as separate tokens.
    words = words.replace(" . ", " ").replace(" , ", " ")

    return words
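
To make the final cleanup step concrete, the sketch below applies the same join-and-replace logic to a plain list of tokens. It assumes the OCR returns punctuation as separate word tokens, which the `replace` calls above suggest. The helper `join_tokens` and the sample tokens are hypothetical.

```python
def join_tokens(tokens):
    """Join tokens with trailing spaces, then drop standalone '.' and ','."""
    words = ''
    for token in tokens:
        words += token + ' '
    return words.replace(" . ", " ").replace(" , ", " ")


# Hypothetical token sequence, as it might come out of a letterhead.
tokens = ["March", "15", ",", "1949", ".", "Dear", "Jim", ":"]
print(repr(join_tokens(tokens)))
# 'March 15 1949 Dear Jim : '
```

Because the comma is removed, the month, day, and year end up as three consecutive tokens, which is what the metadata extraction below relies on. Other punctuation, such as the colon, is left in place.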


In [0]:
# Month tokens used to locate a date (abbreviations first, then full names).
date = ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sept", "Oct", "Nov", "Dec",
        "January", "February", "March", "April", "May", "June", "July", "August",
        "September", "October", "November", "December")
uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
numbers = "1234567890"

In [0]:
def getMetaData(x):
    """Prints the date, recipient, people, and street address found in OCR text."""
    address = ""
    x = x.split(" ")
    n = len(x)
    # Date: the first month token plus the two tokens after it (day and year).
    for place in range(n):
        if x[place] in date and place + 2 < n:
            print("The date of this document is: " + x[place] + " " + x[place+1] + " " + x[place+2])
            break
    people = ""
    to = ''
    for i in range(n):
        # True when the token two positions ahead exists and is capitalized.
        two_names = i + 2 < n and x[i+2] != "" and x[i+2][0] in uppercase
        if x[i] == "Mr" or x[i] == "mr":
            if two_names:
                people += (x[i+1] + " " + x[i+2] + ", ")
        # "Deri" is kept from the original, presumably an OCR misreading of "Dear".
        if x[i] == "Deri" or x[i] == "Dear":
            if two_names:
                to += (x[i+1] + " " + x[i+2] + ", ")
            elif i + 1 < n:
                to += x[i+1] + ", "
    for i in range(n):
        if x[i] in ("Street", "street", "Road", "road"):
            # Include the house number when it sits three tokens before the street type.
            if i >= 3 and x[i-3] != "" and x[i-3][0] in numbers:
                address += x[i-3] + " " + x[i-2] + " " + x[i-1] + " " + x[i]
            elif i >= 2:
                address += x[i-2] + " " + x[i-1] + " " + x[i]
            break

    # Strip the trailing ", " left by the loops above.
    if people.endswith(", "):
        people = people[:-2]
    if to.endswith(", "):
        to = to[:-2]
    if to != "":
        print("This letter was sent to " + to)
    if people != "":
        print("People involved in this correspondence: " + people)
    if address != "":
        print("Address: " + address)
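
The date lookup above depends on the month, day, and year being exactly three consecutive tokens. As an alternative, here is a minimal sketch that uses a regular expression and also tolerates a comma after the day. This is not the notebook's method; `MONTH`, `DATE_RE`, and `find_date` are hypothetical names introduced for illustration.

```python
import re

# Month names and common abbreviations, with an optional trailing period.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
         r"Aug(?:ust)?|Sept?(?:ember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")
DATE_RE = re.compile(r"\b(" + MONTH + r")\.?\s+(\d{1,2})\s*,?\s+(\d{4})\b")


def find_date(text):
    """Return the first 'Month day year' match as a string, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    return " ".join(match.groups())


print(find_date("New York March 15 , 1949 Dear Jim"))  # March 15 1949
print(find_date("no date in this text"))               # None
```

Unlike the token-based loop, this returns a value instead of printing it, so the result can be stored alongside other metadata.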

Composers Birthday Party announcement
<br>
<center>
<img src="https://storage.cloud.google.com/loc-entity-tagging/test/Composers%20Birthday%20Party%20announcement.jpg" width="700">
</center>


In [0]:
x = detect_document_uri("gs://loc-entity-tagging/test/Composers Birthday Party announcement.jpg")
getMetaData(x)  # prints its findings; there is no return value to keep

The date of this document is: March 15 1949
Address: 5 Peter Cooper Road
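
The image links in this notebook use percent-encoded object names, while the `gs://` URIs passed to `detect_document_uri` use the raw names. A minimal sketch of deriving both from one file name follows. The helper `object_uris` and the constants `BUCKET` and `PREFIX` are hypothetical; the bucket and prefix values are the ones used earlier in the notebook.

```python
from urllib.parse import quote

BUCKET = "loc-entity-tagging"  # bucket name used in this notebook
PREFIX = "test/"               # directory holding the test images


def object_uris(filename):
    """Return (gs:// URI, browser URL) for a file in the test directory."""
    object_name = PREFIX + filename
    gcs_uri = "gs://{}/{}".format(BUCKET, object_name)
    # quote() percent-encodes spaces and commas but leaves '/' intact.
    browser_url = "https://storage.cloud.google.com/{}/{}".format(
        BUCKET, quote(object_name))
    return gcs_uri, browser_url


gcs_uri, browser_url = object_uris("Benny Goodman photo, autographed.jpg")
print(gcs_uri)
print(browser_url)
```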


George Avakian's letter to Keith Jarrett's draft board

<br>
<center>
<img src="https://storage.cloud.google.com/loc-entity-tagging/test/George%20Avakian's%20letter%20to%20Keith%20Jarrett's%20draft%20board.jpg" width="700">
</center>

In [0]:
x = detect_document_uri("gs://loc-entity-tagging/test/George Avakian's letter to Keith Jarrett's draft board.jpg")
getMetaData(x)  # prints its findings; there is no return value to keep

The date of this document is: May 4 1968
People involved in this correspondence: Keith Jarrett
Address: 118 North 9th Street


Letter from Avakian to Jim Conkling at Columbia Records

<br>
<center>
<img src="https://storage.cloud.google.com/loc-entity-tagging/test/Letter%20from%20Avakian%20to%20Jim%20Conkling%20at%20Columbia%20Records.jpg" width="700">
</center>

In [0]:
x = detect_document_uri("gs://loc-entity-tagging/test/Letter from Avakian to Jim Conkling at Columbia Records.jpg")
getMetaData(x)  # prints its findings; there is no return value to keep

The date of this document is: Dec 2 1955
This letter was sent to Jim


Letter from George Avakian to Joe Glaser

<br>
<center>
<img src="https://storage.cloud.google.com/loc-entity-tagging/test/Letter%20from%20George%20Avakian%20to%20Joe%20Glaser.jpg" width="700">
</center>

In [0]:
x = detect_document_uri("gs://loc-entity-tagging/test/Letter from George Avakian to Joe Glaser.jpg")
getMetaData(x)  # prints its findings; there is no return value to keep

The date of this document is: April 10 1965
This letter was sent to Joe
