In [35]:
# https://github.com/bibliotechy/DPyLA
# https://pro.dp.la/developers/field-reference#sourceResource-subject

In [72]:
from configparser import ConfigParser
import simplejson as json
from datetime import datetime, timedelta, time

import dpla
from dpla.api import DPLA

import pprint

print 'all set.'

all set.


### REMOTE API

If you do not want others to see your API key, store it in a separate file that follows the format described in the [ConfigParser](https://docs.python.org/2/library/configparser.html) documentation.
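
As a rough sketch of what such a file might contain, the snippet below parses an in-memory example that uses the same section and option names the next cell reads (`dpla` and `api_key`). The key value is a placeholder, and `read_string` is used only so the example runs without a file on disk.

```python
from configparser import ConfigParser

# Hypothetical contents of data/api_keys.txt; the key below is a placeholder.
sample_config = u"""
[dpla]
api_key = 0123456789abcdef0123456789abcdef
"""

sample = ConfigParser()
sample.read_string(sample_config)  # read(path) would parse a file on disk instead

sample_key = sample.get('dpla', 'api_key')
print(sample_key)
```

Keeping the key in a file like this lets the notebook be shared without the key itself, provided the file is excluded from version control.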

In [36]:
config = ConfigParser()
config.read(u'data/api_keys.txt') # REQUIRES UNICODE TO AVOID DEPRECATION WARNING
api_key = config.get('dpla', 'api_key')

dpla = DPLA(api_key)

# print 'Product Key %s initialized, we are a go!' % api_key

In [67]:
#DPLA URL: https://dp.la/item/0a9d7daa792c26aa0288e23b843176ee
#ORIGINAL URL: https://dlg.usg.edu/record/gcfa_gsac_gsac492

pp = pprint.PrettyPrinter(depth=2)

textile_id = "0a9d7daa792c26aa0288e23b843176ee"
news_id = '000027b7481890126251e17b7c53fc32'

textile = dpla.fetch_by_id([textile_id]).items[0]
news = dpla.fetch_by_id([news_id]).items[0]

### Ingestion Fields

INGESTION PIPELINE: Geo Location >> Library >> Digitization >> Provider >> Data Provider >> DPLA

#### Data Provider

In [68]:
# DEFINITION (EDM): The name or identifier of the organization who contributes data indirectly to an 
# aggregation service.

# EXAMPLE: The Arts and Theatre Institute in Prague is a dataProvider via the Linked Heritage project

textile['dataProvider']

u'Georgia Council for the Arts'

#### Provider

In [69]:
# DEFINITION (EDM): The name or identifier of the organization who delivers data directly to an aggregation service.

# EXAMPLE: The Linked Heritage project is a provider for digital objects from 
# The Arts and Theatre Institute in Prague

textile['provider']['name']

u'Digital Library of Georgia'

#### Ingestion Date

In [93]:
# DEFINITION (DPLA): The ISO 8601 date on which the original record was imported into the DPLA database.

# EXAMPLE: The Linked Heritage project provided the digital record of a CHO from The Arts and Theatre 
# Institute of Prague and it was "put in" the DPLA database on 2018-06-12 at 12:49:52PM

to_datetime = datetime.strptime(textile['ingestDate'], '%Y-%m-%dT%H:%M:%S.%fZ')
str_datetime = datetime.strftime(to_datetime, '%Y-%m-%d %H:%M:%S')
print 'ORIGINAL ISO 8601:', '\t', textile['ingestDate']
print 'FORMATTED:', '\t\t', str_datetime

ORIGINAL ISO 8601: 	2018-06-12T12:49:52.107216Z
FORMATTED: 		2018-06-12 12:49:52
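
The format string used above requires fractional seconds, so a timestamp without them would raise a `ValueError`. Whether any DPLA record actually omits them is an assumption here; the helper below (`parse_ingest_date`, a name introduced for illustration) is a minimal sketch that tries both layouts.

```python
from datetime import datetime

def parse_ingest_date(value):
    """Parse an ISO 8601 UTC timestamp, with or without fractional seconds."""
    for fmt in ('%Y-%m-%dT%H:%M:%S.%fZ', '%Y-%m-%dT%H:%M:%SZ'):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next layout
    raise ValueError('unrecognized ingestDate format: %r' % value)

# Same value as textile['ingestDate'] in the cell above.
parsed = parse_ingest_date('2018-06-12T12:49:52.107216Z')
print(parsed.strftime('%Y-%m-%d %H:%M:%S'))
```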


### CHO Fields (sourceResource)

In [95]:
textile_cho = textile['sourceResource']
news_cho = news['sourceResource']

#### Format

In [48]:
# DEFINITION (DC): The file format, physical medium, or dimensions of the resource.

# EXAMPLE: A wall hanging made of fibers
textile_cho['format']

[u'1 wall hanging', u'Fiber']

#### Title

In [49]:
# DEFINITION (DC): A name given to the CHO.  Typically, a title will be a name by which the resource is formally known.
    
# EXAMPLE: "Gulliver's Travels" (with an en language tag) and "Les Voyages de Gulliver" (with a fr language tag) are
# two titles for the same work by Jonathan Swift

textile_cho['title']

u'Homage to J.H'

#### Collection

In [57]:
# DEFINITION (DCMITYPE): An aggregation of resources, described as a group.  Parts may be separately described.

# EXAMPLE: A university could have a collection of Emily Dickinson poems.  If so, the DPLA would have a collection
# object representing this conceptually.

textile_cho['collection']

{u'@id': u'http://dp.la/api/collections/a72045095d4a687a170a4f300d8e0637',
 u'description': u'DPLA: Include in Digital Public Library of America',
 u'id': u'a72045095d4a687a170a4f300d8e0637',
 u'title': u'DPLA: Include in Digital Public Library of America'}

#### Creator

In [51]:
# DEFINITION (DC): An entity primarily responsible for making the resource.  
# This may be a person, organization, or service.
    
# EXAMPLE: Leonardo da Vinci was the creator of the Mona Lisa

textile_cho['creator']

u'Reiss, Zenaide'

#### Description

Q: How can we tell which values are actually descriptions in the sense of the DC definition? For example, the description of the CHO below describes the project, not the artifact.

In [98]:
# DEFINITION (DC): A description of the original analog or born digital artifact.

# EXAMPLE: Hard to give one because not all descriptions are descriptions, see below.

news_cho['description'] # textile_cho HAS NO 'description' FIELD

The Savannah Historic Newspapers database is a project of the Digital Library of Georgia as part of Georgia HomePLACE. The project is supported with federal LSTA funds administered by the Institute of Museum and Library Services through the Georgia Public Library Service, a unit of the Board of Regents of the University System of Georgia.
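
One possible heuristic for the question above, offered only as a sketch: a description that appears verbatim in several records is more likely to describe a project or collection than a single artifact. The helper name and the sample records below are hypothetical, and the code assumes each `description` is a plain string.

```python
from collections import Counter

def repeated_descriptions(records, min_count=2):
    """Return descriptions that occur in at least min_count records."""
    counts = Counter(r['description'] for r in records if r.get('description'))
    return dict((text, n) for text, n in counts.items() if n >= min_count)

# Hypothetical sourceResource records, for illustration only.
sample_records = [
    {'title': 'Issue 1', 'description': 'Project-level boilerplate text'},
    {'title': 'Issue 2', 'description': 'Project-level boilerplate text'},
    {'title': 'Wall hanging', 'description': 'A woven fiber piece'},
    {'title': 'No description here'},
]

print(repeated_descriptions(sample_records))
```

A repeated description is only a hint; a genuine artifact description could also be shared by several near-identical items.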


#### Spatial

Q: What is the difference between "represents" and "depicts"?

In [52]:
# DEFINITION (DC): The spatial characteristics of the CHO.  Information about the spatial characteristics of the
# original analog or born digital object, i.e. what the CHO represents or depicts in terms of space.
    
# EXAMPLE: This may be a named place, a location, a spatial coordinate, or named administrative entity.

textile_cho['spatial']

[{u'coordinates': u'32.165622, -82.900075',
  u'country': u'United States',
  u'county': u'Dodge County',
  u'name': u'Dodge County, GA',
  u'state': u'Georgia'}]
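
The `coordinates` value above is a single string rather than a pair of numbers. A minimal sketch for turning it into floats follows; the helper name is introduced here for illustration, and the "latitude, longitude" order is an assumption inferred from the Georgia example.

```python
def parse_coordinates(spatial_entry):
    """Return (latitude, longitude) as floats, or None if no coordinates are present."""
    raw = spatial_entry.get('coordinates')
    if not raw:
        return None
    lat, lon = (float(part) for part in raw.split(','))
    return lat, lon

# Values copied from the spatial output above.
sample_spatial = {'coordinates': '32.165622, -82.900075', 'name': 'Dodge County, GA'}
print(parse_coordinates(sample_spatial))
```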

#### Date

Q: Unclear.  Can be date of creation or artist's lifespan.  If date of creation, then why a period?  
Q: Why a span vs. a single date repeated three times (as in other CHOs)?  
Q: Probably differs in specificity (e.g. year vs. year, month, and day, with the day being the 1st).  
Q: What about outright incorrect dates (ex. [La-Z-Boy Strike](https://dp.la/item/00004f9b0cfd8f7317d09616d7691d4d))?

In [53]:
# DEFINITION (DC): A point or period of time associated with an event in the lifecycle of the CHO.
#     BEGIN (EDM): Date/time of the start of a time span (inclusive).
#     DISPLAY DATE (DPLA): The date to be displayed by an application seeking to provide a date to accompany 
#                          the sourceResource. 
#     END (EDM): Date/time of the end of a time span (inclusive).
    
# EXAMPLE: The Mona Lisa could have a date of 1506

textile_cho['date']

{u'begin': u'1950', u'displayDate': u'1950/1999', u'end': u'1999'}
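
To look into the span-versus-single-date question across many records, one could classify each `date` value by comparing `begin` and `end`. This is a sketch under the assumption that `date` is a single dict shaped like the output above; `date_kind` is a name introduced for illustration.

```python
def date_kind(date):
    """Classify a sourceResource date dict as 'point', 'span', or 'unknown'."""
    begin, end = date.get('begin'), date.get('end')
    if begin is None or end is None:
        return 'unknown'
    return 'point' if begin == end else 'span'

# First value copied from the output above; the second is a made-up single date.
print(date_kind({'begin': '1950', 'displayDate': '1950/1999', 'end': '1999'}))
print(date_kind({'begin': '1506', 'displayDate': '1506', 'end': '1506'}))
```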

#### Subject

Q: A term from a "controlled vocabulary" can be used.  Can the DPLA ever have a controlled vocabulary?

In [54]:
# DEFINITION (DC): Topic of the resource
    
# EXAMPLE: Fiberwork--Georgia, Textile design, Abstract.  A term from a controlled vocabulary can be used.  

textile_cho['subject']

[{u'name': u'Fiberwork--Georgia'},
 {u'name': u'Fiberwork--United States'},
 {u'name': u'Art--Georgia'},
 {u'name': u'Textile design, Abstract'}]
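
Each subject is a dict with a `name` key, and some names carry a `--` subdivision. The sketch below pulls out the names and groups them by their leading heading; treating `--` as a heading/subdivision separator is an assumption based on the values shown above.

```python
# Values copied from the subject output above.
sample_subjects = [
    {'name': 'Fiberwork--Georgia'},
    {'name': 'Fiberwork--United States'},
    {'name': 'Art--Georgia'},
    {'name': 'Textile design, Abstract'},
]

names = [subject['name'] for subject in sample_subjects]

# Keep only the part before the first '--' and drop duplicates.
main_headings = sorted(set(name.split('--')[0] for name in names))
print(main_headings)
```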

#### Type

Q: Does DPLA use all DCMI types or just the subsets we see?

In [55]:
# DEFINITION (DCMI): Nature or genre of the CHO.  Collection, Dataset, Event, Image, InteractiveResource,
# MovingImage, PhysicalObject, Service, Software, Sound, StillImage, Text
    
# EXAMPLE: The type of the "Savannah Morning News" is "text". 

textile_cho['type']

[u'image', u'physical object']
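
The values returned above are lowercase and spaced (`physical object`), while the DCMI names listed in the definition are CamelCase. A small sketch for mapping one onto the other, which could help check which DCMI types actually occur in a sample, is shown below. The normalization rule (strip spaces, lowercase) is an assumption, and `to_dcmi_type` is a name introduced for illustration.

```python
DCMI_TYPES = [
    'Collection', 'Dataset', 'Event', 'Image', 'InteractiveResource',
    'MovingImage', 'PhysicalObject', 'Service', 'Software', 'Sound',
    'StillImage', 'Text',
]

# Map a normalized key (lowercase, no spaces) back to the DCMI name.
_LOOKUP = dict((name.lower(), name) for name in DCMI_TYPES)

def to_dcmi_type(value):
    """Return the matching DCMI type name, or None if the value is not a DCMI type."""
    return _LOOKUP.get(value.replace(' ', '').lower())

print([to_dcmi_type(v) for v in ['image', 'physical object']])
```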

### MISSING FIELDS FROM 250 DOCS

**MISSING FIELDS**  
  
CONTRIBUTOR (DC): 250  
EXTENT (DC): 250  
PHYSICAL MEDIUM (DC): 250  
PUBLISHER (DC): 250  
TEMPORAL (DPLA): 250  
INTERMEDIATE PROVIDER (DPLA): 250  
SPATIAL DISTANCE (DPLA): 250  
SPATIAL REGION (DPLA): 250  
INGESTION SEQUENCE (DPLA?): null in all 250 - cannot find definition  
      
**NOT YET IN DATABASE**  
  
SUBJECT ID  
SUBJECT TYPE  
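
The notebook does not show how the missing-field counts above were produced. One way to tally them, given a list of `sourceResource` dicts, is sketched below. The sample records are hypothetical, and the lowercase field keys are assumptions about how these fields are named in the API response.

```python
from collections import Counter

def count_missing(records, fields):
    """Count, per field, how many records lack it or hold an empty value."""
    missing = Counter()
    for record in records:
        for field in fields:
            if record.get(field) in (None, '', [], {}):
                missing[field] += 1
    return missing

# Hypothetical records, for illustration only.
sample_chos = [
    {'title': 'Homage to J.H', 'creator': 'Reiss, Zenaide'},
    {'title': 'Savannah Morning News', 'publisher': []},
]
fields_to_check = ['contributor', 'extent', 'publisher', 'creator']

for field, n in sorted(count_missing(sample_chos, fields_to_check).items()):
    print('%s: %d' % (field.upper(), n))
```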