# Python Uses for Atla Librarians

## Generate Dublin Core XML Records from a Spreadsheet
XML (eXtensible Markup Language) is a useful format for ingesting Dublin Core records into a digital repository (such as Islandora). However, it is far easier to enter metadata into a spreadsheet than into an XML file. With Python, you can enter metadata into a spreadsheet and then automatically convert each row to an XML file for ingest.

### Install and import the Python libraries you will use

In [None]:
!pip install dcxml

In [1]:
import pandas as pd  # pd is an alias for pandas (coders are lazy and don't want to type too much)
from dcxml import simpledc

### Load the data

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/msaxton/atla2019_workshop/master/dc_sample.csv')

Note: The column names MUST be as follows (though the order doesn't matter): titles, dates, creators, contributors, descriptions, coverage, subject, types, formats, identifiers, languages, publishers, relations, rights, sources.
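
Before converting anything, it can help to confirm that the spreadsheet headers match the list above. The following is a minimal sketch, not part of `dcxml`: the helper name `check_columns` and the constant `EXPECTED_COLUMNS` are hypothetical, and the names are copied from the note above.

```python
# Column names copied from the note above (order does not matter).
EXPECTED_COLUMNS = {
    'titles', 'dates', 'creators', 'contributors', 'descriptions',
    'coverage', 'subject', 'types', 'formats', 'identifiers',
    'languages', 'publishers', 'relations', 'rights', 'sources',
}


def check_columns(columns):
    """Return (missing, unexpected) column names, each as a sorted list.

    Hypothetical helper: `columns` can be any iterable of strings,
    for example `df.columns` from a pandas dataframe.
    """
    found = set(columns)
    missing = sorted(EXPECTED_COLUMNS - found)
    unexpected = sorted(found - EXPECTED_COLUMNS)
    return missing, unexpected


# Example: a header row with one misspelled column name.
missing, unexpected = check_columns(['titles', 'dates', 'creator'])
print('unexpected columns:', unexpected)
```

With the workshop data loaded, the call would be `check_columns(df.columns)`; two empty lists mean the headers match.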

In [None]:
# inspect data
df.head()

### Process data

In [None]:
# replace NaN with an empty string
df = df.fillna('')

`simpledc` requires that each record be in the form of a Python dictionary, so we will first convert each row of the dataframe to a dictionary.

In [None]:
# change each row to a Python dictionary
list_of_dicts = df.to_dict(orient='records')

In [None]:
# check data
print(list_of_dicts[0])

Additionally, `simpledc` expects each value in the dictionary to be a Python list, even if the list has only one item. Therefore we will put each value in a list. The loop below does this for every record and then writes the record to an XML file named after its identifier.

In [None]:
for dict_ in list_of_dicts:  # iterate through the list of dictionaries
    for key, value in dict_.items():  # iterate through each key-value pair in the dictionary
        list_values = [value]  # change the value to a list
        dict_[key] = list_values  # pair the list of values with the key
    xml = simpledc.tostring(dict_)  # make each dictionary a Dublin Core XML string
    fn = dict_['identifiers'][0]  # set the identifier as the file name
    with open(fn + '.xml', mode='w', encoding='utf8') as f:
        f.write(xml)
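
Because empty cells were replaced with empty strings, the loop above wraps those empty strings in lists too, which may produce empty elements in the output. As a hedged alternative, the sketch below prepares a record by converting values to strings, dropping empty ones, and wrapping the rest in lists. The helper name `row_to_record` is hypothetical and uses only plain Python; its result could be passed to `simpledc.tostring` in place of `dict_`.

```python
def row_to_record(row):
    """Turn one spreadsheet row (a dict) into a dict of one-item lists.

    Hypothetical helper: values are converted to strings, surrounding
    whitespace is stripped, and empty values are left out entirely.
    """
    record = {}
    for key, value in row.items():
        text = str(value).strip()
        if text:  # skip empty cells
            record[key] = [text]
    return record


# Example row with a numeric date and an empty creators cell.
row = {'titles': 'Sample title', 'dates': 2010, 'creators': ''}
print(row_to_record(row))
```

Converting to strings first is a design choice for this sketch: a column such as dates may be read from the spreadsheet as numbers rather than text.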

## Compare Large Ebook Collections
Python can be useful when you need to compare a potential new ebook collection with an ebook collection you already own in order to identify duplicates. 

### Load the data

In [2]:
ebooks1_df = pd.read_csv('https://raw.githubusercontent.com/msaxton/atla2019_workshop/master/ebooks1.csv')
ebooks2_df = pd.read_csv('https://raw.githubusercontent.com/msaxton/atla2019_workshop/master/ebooks2.csv')

In [3]:
# inspect data
ebooks1_df.head()

Unnamed: 0,Product ID,ISBN,eISBN,Title,Author,Publisher,Publication Year,Language,LCC,LC Subject Heading,BISAC,DDC,Downloadable
0,515720,9788779000000.0,9788779000000.0,Religion,"Sørensen, Jesper-Hammer, Olav.",Aarhus University Press,2010,dan,BD215 .S663 2010eb,"Belief and doubt.,Religion.",PHILOSOPHY / Epistemology,121/.6,Y
1,515586,9788779000000.0,9788779000000.0,Property and Virginity,Agnes S. Arnórsdóttir.,Aarhus University Press,2010,eng,HQ643 .A36 2010,"Marital property--Iceland--History.,Marriage (...",SOCIAL SCIENCE / Sociology / Marriage & Family,306.8,Y
2,515721,9788779000000.0,9788779000000.0,"Religion, Politics, and Law","Lodberg, Peter.",Aarhus University Press,2009,eng,BL65.P7 R44 2009,"Democracy--Religious aspects.,Globalization.,R...",RELIGION / Reference,200,Y
3,515692,9788779000000.0,9788779000000.0,Alexandria,"Krasilnikoff, Jens A.-Hinge, George.",Aarhus University Press,2009,eng,DT154.A4 A449 2009,,HISTORY / Middle East / Egypt,962,Y
4,515644,9788779000000.0,9788779000000.0,The Discursive Fight Over Religious Texts in A...,"Jacobsen, Anders-Christian.",Aarhus University Press,2009,eng,BS1135 .D57 2009,"Christianity and other religions--Judaism.,Chr...",RELIGION / Christian Theology / Ethics,241.2,Y


In [4]:
# inspect data
ebooks1_df.shape

(7894, 13)

In [5]:
# inspect data
ebooks2_df.head()

Unnamed: 0,Title,Publisher,Imprint,Publication Year,Editor/ Author,ISBN,Awards,Subject,Description,USD List Price,Single-User Restriction,Title Phase,Predicted Release Quarter,Multi-Site Sales Restricted,Geographic Sales Regions,Corp URL,Edition ID#
0,A Biblical-Theological Introduction to the New...,Crossway,Crossway,2016,"Editor: Van Pelt, Miles V.",978-1-4335-3676-2,,Religion & Theology - Christianity,Featuring contributions from respected evangel...,$75,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,12245
1,A Biblical-Theological Introduction to the Old...,Crossway,Crossway,2016,"Editor: Van Pelt, Miles V.",978-1-4335-3346-4,,Religion & Theology - Christianity,"Covering each book in the Old Testament, this ...",$75,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,12244
2,A Complete Handbook of Literary Forms in the B...,Crossway,Crossway,2014,"Ryken, Leland",978-1-4335-4114-8,,Religion & Theology - Christianity,"Whether examining genre, motifs, figures of sp...",$30,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,9857
3,A Family Guide to the Bible,Crossway,Crossway,2009,"Ditchfield, Christin",978-1-58-134891-0,,Religion & Theology - Christianity,A Family Guide to the Bible takes readers on a...,$39,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,7961
4,Blackwell Guides to Global Christianity: A New...,Wiley,Wiley,2011,"Bays, Daniel H.",978-1-4051-5954-8,Rated Essential by Choice,Religion & Theology - Christianity,"A New History of Christianity in China, writte...",$154,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,5018


In [6]:
# inspect data
ebooks2_df.shape

(91, 17)

### Point of comparison
In order to compare these dataframes, we need a column that contains common information. ISBN is probably the best bet, but we need to make sure that the ISBN columns are formatted the same way.

In [7]:
ebooks1_df['ISBN'].dtypes == ebooks2_df['ISBN'].dtypes

False

In [8]:
ebooks2_df['ISBN'] = ebooks2_df['ISBN'].str.replace('-', '')  # remove dashes from ebooks2 data
ebooks2_df['ISBN'] = ebooks2_df['ISBN'].astype('float64')  # change the datatype to match ebooks1 data

In [9]:
ebooks1_df['ISBN'].dtypes == ebooks2_df['ISBN'].dtypes

True
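
The cells above make the two columns comparable by converting both to floating-point numbers. Another option, sketched here under the assumption that the ISBNs may arrive as hyphenated text or as floats, is to normalize every value to a plain string of digits and compare those instead. The helper name `normalize_isbn` is hypothetical.

```python
import math


def normalize_isbn(value):
    """Return an ISBN as a plain string without hyphens or spaces.

    Hypothetical helper: accepts strings such as '978-1-4335-3676-2',
    floats such as 9788779000000.0, and missing values (None or NaN),
    which become an empty string.
    """
    if value is None:
        return ''
    if isinstance(value, float):
        if math.isnan(value):
            return ''
        value = int(value)  # drop the trailing '.0'
    text = str(value).strip()
    if text.endswith('.0'):  # a float that was read in as text
        text = text[:-2]
    # keep digits and the 'X' check character used by ISBN-10
    return ''.join(ch for ch in text if ch.isdigit() or ch in 'Xx').upper()


print(normalize_isbn('978-1-4335-3676-2'))
print(normalize_isbn(9788779000000.0))
```

In pandas this could be applied with `ebooks2_df['ISBN'].map(normalize_isbn)` on both dataframes. Keeping ISBNs as text also preserves an ISBN-10 that ends in X, which cannot be converted to a number.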

In [12]:
ebooks2_unique_df = ebooks2_df.loc[~ebooks2_df['ISBN'].isin(ebooks1_df['ISBN'])]
ebooks2_duplicate_df = ebooks2_df.loc[ebooks2_df['ISBN'].isin(ebooks1_df['ISBN'])]
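
The two lines above split `ebooks2_df` using `isin`, which returns True for each row whose ISBN also appears in the other dataframe; the `~` operator inverts that result. The sketch below shows the same pattern on two tiny, made-up dataframes (the titles and ISBNs are invented for illustration only).

```python
import pandas as pd

# Made-up data: the collection already owned and the one on offer.
owned = pd.DataFrame({'ISBN': ['9780000000001', '9780000000002', '9780000000003']})
offered = pd.DataFrame({
    'Title': ['A', 'B', 'C'],
    'ISBN': ['9780000000002', '9780000000004', '9780000000003'],
})

# True where the offered ISBN is already in the owned collection
is_owned = offered['ISBN'].isin(owned['ISBN'])

duplicates = offered.loc[is_owned]   # rows already owned
unique = offered.loc[~is_owned]      # rows that would be new

print(duplicates['Title'].tolist())
print(unique['Title'].tolist())
```

Storing the True/False result in a variable such as `is_owned` means the comparison is computed once and reused for both subsets.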

In [11]:
ebooks2_unique_df.shape

(83, 17)

In [13]:
ebooks2_duplicate_df.shape

(8, 17)

In [15]:
ebooks2_duplicate_df

Unnamed: 0,Title,Publisher,Imprint,Publication Year,Editor/ Author,ISBN,Awards,Subject,Description,USD List Price,Single-User Restriction,Title Phase,Predicted Release Quarter,Multi-Site Sales Restricted,Geographic Sales Regions,Corp URL,Edition ID#
6,Lives of Great Religious Books: Augustine's Co...,Princeton University Press,Princeton University Press,2011,"Wills, Gary",9780691000000.0,RCL recipient,Religion & Theology - Christianity,"In this brief and incisive book, Pulitzer Priz...",$56,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,3424
17,Ancient Christian Texts: Commentary on Isaiah,InterVarsity Press,InterVarsity Press,2013,Eusebius of Caesarea,9780831000000.0,,Religion & Theology - Christianity,A first-ever English translation of Eusebius's...,$90,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,8129
19,Ancient Christian Texts: Commentary on John: V...,InterVarsity Press,InterVarsity Press,2013,Cyril of Alexandria,9780831000000.0,,Religion & Theology - Christianity,David Maxwell renders a service to students of...,$90,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,8131
20,Ancient Christian Texts: Commentary on John: V...,InterVarsity Press,InterVarsity Press,2015,"Editor: Elowsky, Joel C.",9780831000000.0,,Religion & Theology - Christianity,In the latest addition to the Ancient Christia...,$90,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,11497
24,Dictionary of Biblical Imagery,InterVarsity Press,InterVarsity Press,2010,"Editors: Ryken, Leland, Wilhoit, James C. and ...",9780831000000.0,,Religion & Theology - Christianity,The Dictionary of Biblical Imagery is the firs...,$83,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,12205
31,Lives of Great Religious Books: Dietrich Bonho...,Princeton University Press,Princeton University Press,2011,"Marty, Martin E.",9780691000000.0,RCL recipient,Religion & Theology - Christianity,"For fascination, influence, inspiration, and c...",$70,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,4117
63,Ashgate Research Companions: The Ashgate Resea...,Taylor & Francis,Ashgate Publishing,2012,"Kapic, Kelly M. and Jones, Mark",9781409000000.0,RCL Recipient,Religion & Theology - Christianity,As a revival in Owen studies and reprints has ...,$67,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,4487
78,Lives of Great Religious Books: The Book of Mo...,Princeton University Press,Princeton University Press,2012,"Gutjahr, Paul C.",9780691000000.0,RCL recipient,Religion & Theology - Christianity,Gutjahr shows how Smith's influential book lau...,$70,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,4119
