
# Python Pre-conference Workshop

Information professionals work with a growing volume of electronic data, and
handling it well calls for more efficient tools and skills. With this in mind,
this pre-conference workshop will focus on using Python as one of the important
tools for aiding librarians in their work and research.

## Why Python and Getting Started 

Every job has some repetitive tasks that require precision and accuracy. Here
is a list of tasks I have been assigned recently; I would like to brainstorm
different ways of solving them with you.

### Librarian Problems 30 minutes

1. Given a list of ISBNs, find the associated PDFs, ePubs, and JPEGs on three
   different drives, and copy them into one folder to share with a client.
2. I have 12 Excel workbooks that hold 12 months of usage statistics. How does
   one collate them into one master sheet? And how long does it take?
3. After you collate the workbooks, you realize that there are duplicates of
   the same title (they have different ISSNs/ISBNs), though some of them are
   spelled slightly differently. How do you merge them?
4. You want to compile a list of the books most commonly cited in theses written
   at your school that you don't have in your library, to see if it makes sense
   to purchase the work. How do you go about doing that?
5. A student is interested in Carl Schmitt's _Political Theology_, especially
   how that concept has developed since he wrote his book in 1922. But the LC
   Subject Heading was not created until 1998. How do you find resources that
   might help the researcher?
6. Compare title lists from two different products, like [ProQuest's Philosophy Collection](https://www.proquest.com/products-services/Philosophy-Collection.html) and [EBSCO's Philosophy and Religion](https://www.ebsco.com/products/research-databases/religion-philosophy-collection); that is, compare eBook collections.
7. Update Dublin Core metadata. It's easier to edit a spreadsheet, but how do you get hundreds of records from the spreadsheet into XML?

### Getting Started with Jupyter Notebooks and Vim 15 minutes

1. Sign up with a Microsoft account
2. Markdown and annotations 
3. Doing basic math in the prompt
4. Copy the tutorial notebook

## 10 Minute break

## The Basics Of Python 1hr 15 minutes

1. Variables 
2. Lists, Tuples, and Dictionaries 
3. Flow Control: if, else, for, try
4. Functions 
5. Reading and writing to texts
6. A few key libraries (os, requests, pandas)


## Applying Python to real problems 50 min

1. Define the Problem 
2. Define the desired result
3. Break the problem down into atomic parts 
4. Identify the areas that you need to research 
5. Resources for help

## References 

- [Stanford's Python notebooks](https://notebooks.azure.com/csbailey/libraries/intro-to-python-spring/html/intro_to_python-filled.ipynb)

- setup with pycharm community edition
- install python 3

# Outline of Workshop

## Library Problems 30 min 

1. Given a list of ISBNs, find the associated PDFs, ePubs, and JPEGs on three
   different drives, and copy them into one folder to share with a client.
2. I have 12 Excel workbooks that hold 12 months of usage statistics. How does
   one collate them into one master sheet? And how long does it take?
3. After you collate the workbooks, you realize that there are duplicates of
   the same title (they have different ISSNs/ISBNs), though some of them are
   spelled slightly differently. How do you merge them?
4. You want to compile a list of the books most commonly cited in theses written
   at your school that you don't have in your library, to see if it makes sense
   to purchase the work. How do you go about doing that?
5. A student is interested in Carl Schmitt's _Political Theology_, especially
   how that concept has developed since he wrote his book in 1922. But the LC
   Subject Heading was not created until 1998. How do you find resources that
   might help the researcher?
6. Compare title lists from two different products, like [ProQuest's Philosophy Collection](https://www.proquest.com/products-services/Philosophy-Collection.html) and [EBSCO's Philosophy and Religion](https://www.ebsco.com/products/research-databases/religion-philosophy-collection); that is, compare eBook collections.
7. Update Dublin Core metadata. It's easier to edit a spreadsheet, but how do you get hundreds of records from the spreadsheet into XML?
8. Compare a new eBook package against your current eBook holdings to see if it adds enough value.

## Getting Set up with Python and PyCharm 1:15 

1) [Download Python3.7](https://www.python.org/downloads/). This provides the Python interpreter that will allow us to execute the scripts that we write.

2) [Download PyCharm](https://www.jetbrains.com/pycharm/download). 

3) Linking PyCharm to Python 

4) Navigation around PyCharm

5) ***WHERE SHOULD WE PUT THE LESSON FILES?*** 
 
 

## Dublin Core Metadata

How do you turn a spreadsheet into Dublin Core XML?

### Getting started by using other people's code

- Python has batteries included
- Additional packages can be installed with `pip`.

```
pip install pandas 
pip install dcxml 
```
- [Pandas](http://pandas.pydata.org/) is a library for handling tabular data
- [dcxml](https://github.com/inveniosoftware/dcxml) is a library for handling Dublin Core XML
- Notice the different qualities of the project webpages

After installing the packages, they need to be imported into the code. This is usually done at the beginning of the script.

In [0]:
import pandas as pd
from dcxml import simpledc


### Creating Variables and Using Dataframes

- Variables are ways to point to things with a nickname. 

- A dataframe is the pandas data object for storing data 

- Objects have different "methods". Methods are operations that can be done on the object. 

In [0]:
# pd is the alias that we gave pandas
# This grabs a CSV from a URL we have set up for this workshop
df = pd.read_csv('https://raw.githubusercontent.com/msaxton/atla2019_workshop/master/dc_sample.csv')

# fillna() is one of the methods that dataframes have. This call replaces missing
# values (NaN, "Not a Number") with empty strings
df = df.fillna('')



In [0]:
# head() is another method, we can use this to explore the dataframe
df.head()
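
To see what `fillna()` does without downloading anything, here is a minimal sketch on a tiny hand-made dataframe. The column names and values are hypothetical and are not taken from `dc_sample.csv`.

```python
import pandas as pd

# Hypothetical sample records; the column names are illustrative only
records = pd.DataFrame({
    'titles': ['Political Theology', 'Confessions'],
    'creators': ['Schmitt, Carl', None],  # None becomes a missing value
})

# isna() is a method: it returns a same-shaped frame of True/False values
missing_creators = int(records['creators'].isna().sum())
print(missing_creators)

# fillna('') returns a new dataframe with missing values replaced by ''
cleaned = records.fillna('')
print(cleaned)
```

Note that `fillna()` leaves `records` unchanged and returns a new dataframe, which is why the workshop code assigns the result back to `df`.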

### Lists and dictionaries 

- Accessing data in data structures
- Lists: ordered arrays
- Dictionaries: key-value pairs, like JSON objects
- Dataframes are data structures, but not basic (built-in) ones

In [0]:
list_of_dicts = df.to_dict(orient='records')

In [0]:
# What is in the list? 
list_of_dicts[:10] # returns the first 10 items of the list

In [0]:
dict0 = list_of_dicts[0] # lists are 0 indexed 

print(dict0) # This is a dictionary 
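
The same access patterns can be tried on a small hand-written list. This is a sketch with hypothetical records shaped like the output of `to_dict(orient='records')`; the keys and values are made up for illustration.

```python
# Hypothetical records, shaped like the list of dictionaries built above
records = [
    {'identifiers': 'rec001', 'titles': 'Political Theology'},
    {'identifiers': 'rec002', 'titles': 'Confessions'},
]

first = records[0]                   # lists are indexed by position, starting at 0
title = first['titles']              # dictionaries are indexed by key
subject = first.get('subjects', '')  # .get() supplies a default for a missing key
last_two = records[-2:]              # slices return a new list

print(title, repr(subject), len(last_two))
```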

### Flow control, Iterating through a list

For each item in the list that we created (each item is a record, and each element in the record is a key-value pair), we need to manipulate the item in some way.

- Iteration
- local variables
- simpledc methods
- accessing dictionaries
- writing files

In [0]:
# for loops are ways of iterating through data
# list_of_dicts holds many dictionaries, one per record
# we define a variable `dictionary` and reuse it for each iteration of the loop
for dictionary in list_of_dicts:
    new_dict = {}
    for key, value in dictionary.items():
        # wrap each value in a list before handing the record to simpledc
        list_values = [value]
        new_dict[key] = list_values
    xml = simpledc.tostring(new_dict)
    # the first (and only) identifier becomes the file name
    filename = new_dict['identifiers'][0]

    with open(filename + '.xml', mode='w', encoding='utf8') as fp:
        fp.write(xml)
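
If `dcxml` is not installed, the loop idea can still be tried with the standard library. The sketch below uses `xml.etree.ElementTree` instead of `simpledc`, so its output is not the same document that `simpledc.tostring` produces. The `metadata` wrapper element, the `record_to_xml` name, and the sample record are assumptions made for illustration; only the Dublin Core namespace URI is standard.

```python
import xml.etree.ElementTree as ET

DC_NS = 'http://purl.org/dc/elements/1.1/'  # Dublin Core element namespace
ET.register_namespace('dc', DC_NS)

def record_to_xml(record):
    """Build a small XML string from a dict of Dublin Core element name -> value."""
    root = ET.Element('metadata')  # hypothetical wrapper element
    for name, value in record.items():
        if value == '':
            continue  # skip empty spreadsheet cells
        child = ET.SubElement(root, '{%s}%s' % (DC_NS, name))
        child.text = str(value)
    return ET.tostring(root, encoding='unicode')

# Hypothetical record with one empty field
record = {'identifier': 'rec001', 'title': 'Political Theology', 'creator': ''}
xml = record_to_xml(record)
print(xml)
```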

### Creating A Reusable Script

- `__name__` variables 
- The `sys` library 
- Installing `xlrd`
- `try` and `except` 
- Defining and Using functions 

The following is a standalone script that works with an argument. 

Save this script as `xlsx_to_xml.py`

- On windows run `xlsx_to_xml.py NameOfExcelFile.xlsx`
- On mac/linux `python3 xlsx_to_xml.py NameOfExcelFile.xlsx`


In [0]:
#!/usr/bin/env python
'''Taking the code we wrote, this is creating a script that we can call from the command line to create the xml files
per record. The main purpose of this is to see the structure of a python script that can be reused.'''

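import os  # needed for os.mkdir in main()
import sys
import pandas as pd
from dcxml import simpledc
# pip install xlrd (newer pandas versions read .xlsx files with openpyxl instead)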


def dictionary2xml(dictionary):
    '''takes a dictionary and creates an xml string from the object, and the file name'''
    new_dict = {}
    for key, value in dictionary.items():
        list_value = [value]
        new_dict[key] = list_value
    xml = simpledc.tostring(new_dict)
    file_name = dictionary['identifiers']
    return xml, file_name


def main():
    '''The main function will handle the file input and output for the script'''
    import_file = sys.argv[1]
    df = pd.read_excel(import_file)
    df = df.fillna('')
    try:
        os.mkdir('XMLFiles')
    except FileExistsError:
        pass
    list_of_dicts = df.to_dict(orient='records')
    for dictionary in list_of_dicts:
        xml, file_name = dictionary2xml(dictionary)
        with open('XMLFiles/' + file_name + '.xml', mode='w', encoding='utf8') as fp:
            fp.write(xml)


if __name__ == '__main__':
    '''Python tries to limit the boiler plate type code, and this is some of the limited boiler plate we have. 
    The main purpose for this code is that if the script is imported into another script the main function will not 
    be called.'''
    main() 
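
Two small refinements are sometimes added to scripts like this one: checking that an argument was actually supplied, and creating the output directory without a `try`/`except`. The following is a sketch only; the helper names `get_input_path` and `ensure_output_dir` are hypothetical and are not part of the script above.

```python
import os
import sys

def get_input_path(argv):
    """Return the file path given on the command line, or exit with a usage message."""
    if len(argv) < 2:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit('Usage: python3 xlsx_to_xml.py NameOfExcelFile.xlsx')
    return argv[1]

def ensure_output_dir(name='XMLFiles'):
    """Create the output directory if needed; exist_ok=True tolerates an existing one."""
    os.makedirs(name, exist_ok=True)
    return name

# Simulate a command line instead of reading the real sys.argv
print(get_input_path(['xlsx_to_xml.py', 'stats.xlsx']))
```

Inside `main()`, these would be called as `get_input_path(sys.argv)` and `ensure_output_dir()`.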

## Comparing eBook Collections 

Solving a problem together 


## Two ebook collections:
- [collection one](https://raw.githubusercontent.com/msaxton/atla2019_workshop/master/ebooks1.csv)
- [collection two](https://raw.githubusercontent.com/msaxton/atla2019_workshop/master/ebooks2.csv)

### Importing Tabular Data

In [0]:
import pandas as pd 

In [0]:
ebooks1_df = pd.read_csv('https://raw.githubusercontent.com/msaxton/atla2019_workshop/master/ebooks1.csv')
ebooks2_df = pd.read_csv('https://raw.githubusercontent.com/msaxton/atla2019_workshop/master/ebooks2.csv')

### Exploring the Data 

- How do we explore the data? 
- How do we match ebook collections? 
- What kinds of problems have you had in the past comparing these things? 

In [0]:
# See how the records look with .head()
ebooks1_df.head()

Unnamed: 0,Product ID,ISBN,eISBN,Title,Author,Publisher,Publication Year,Language,LCC,LC Subject Heading,BISAC,DDC,Downloadable
0,515720,9788779000000.0,9788779000000.0,Religion,"Sørensen, Jesper-Hammer, Olav.",Aarhus University Press,2010,dan,BD215 .S663 2010eb,"Belief and doubt.,Religion.",PHILOSOPHY / Epistemology,121/.6,Y
1,515586,9788779000000.0,9788779000000.0,Property and Virginity,Agnes S. Arnórsdóttir.,Aarhus University Press,2010,eng,HQ643 .A36 2010,"Marital property--Iceland--History.,Marriage (...",SOCIAL SCIENCE / Sociology / Marriage & Family,306.8,Y
2,515721,9788779000000.0,9788779000000.0,"Religion, Politics, and Law","Lodberg, Peter.",Aarhus University Press,2009,eng,BL65.P7 R44 2009,"Democracy--Religious aspects.,Globalization.,R...",RELIGION / Reference,200,Y
3,515692,9788779000000.0,9788779000000.0,Alexandria,"Krasilnikoff, Jens A.-Hinge, George.",Aarhus University Press,2009,eng,DT154.A4 A449 2009,,HISTORY / Middle East / Egypt,962,Y
4,515644,9788779000000.0,9788779000000.0,The Discursive Fight Over Religious Texts in A...,"Jacobsen, Anders-Christian.",Aarhus University Press,2009,eng,BS1135 .D57 2009,"Christianity and other religions--Judaism.,Chr...",RELIGION / Christian Theology / Ethics,241.2,Y


In [0]:
ebooks2_df.head()

Unnamed: 0,Title,Publisher,Imprint,Publication Year,Editor/ Author,ISBN,Awards,Subject,Description,USD List Price,Single-User Restriction,Title Phase,Predicted Release Quarter,Multi-Site Sales Restricted,Geographic Sales Regions,Corp URL,Edition ID#
0,A Biblical-Theological Introduction to the New...,Crossway,Crossway,2016,"Editor: Van Pelt, Miles V.",978-1-4335-3676-2,,Religion & Theology - Christianity,Featuring contributions from respected evangel...,$75,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,12245
1,A Biblical-Theological Introduction to the Old...,Crossway,Crossway,2016,"Editor: Van Pelt, Miles V.",978-1-4335-3346-4,,Religion & Theology - Christianity,"Covering each book in the Old Testament, this ...",$75,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,12244
2,A Complete Handbook of Literary Forms in the B...,Crossway,Crossway,2014,"Ryken, Leland",978-1-4335-4114-8,,Religion & Theology - Christianity,"Whether examining genre, motifs, figures of sp...",$30,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,9857
3,A Family Guide to the Bible,Crossway,Crossway,2009,"Ditchfield, Christin",978-1-58-134891-0,,Religion & Theology - Christianity,A Family Guide to the Bible takes readers on a...,$39,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,7961
4,Blackwell Guides to Global Christianity: A New...,Wiley,Wiley,2011,"Bays, Daniel H.",978-1-4051-5954-8,Rated Essential by Choice,Religion & Theology - Christianity,"A New History of Christianity in China, writte...",$154,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,5018


In [0]:
# Get the shape of the dataframes with .shape
# shape is a property, not a method, so it takes no parentheses
ebooks1_df.shape

(7894, 13)

In [0]:
ebooks2_df.shape 

(91, 17)

### Finding a Point of Comparison 

- Both dataframes have an `ISBN` column.

- ```ebooks1_df['ISBN'].dtypes == ebooks2_df['ISBN'].dtypes```
- But these aren't the same datatype: `ebooks1_df` stores ISBNs as floats, while `ebooks2_df` stores them as hyphenated strings.
- How do we make them the same datatype?

In [0]:
ebooks2_df['ISBN'] = ebooks2_df['ISBN'].str.replace('-', '') # remove dashes from ebooks2 data 

In [0]:
ebooks2_df['ISBN'] = ebooks2_df['ISBN'].astype('float64') # change the datatype to match ebooks1 data 
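
Converting to `float64` is one way to line the two columns up. An alternative, sketched here under the assumption that every ISBN is a whole number, is to turn both columns into digit-only strings, which avoids the trailing `.0`. The helper name `isbn_to_text` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical two-row samples; the real collections are much larger
left = pd.DataFrame({'ISBN': [9781433536762.0, 9780691139999.0]})
right = pd.DataFrame({'ISBN': ['978-1-4335-3676-2', '978-0-8010-3643-9']})

print(left['ISBN'].dtype, right['ISBN'].dtype)

def isbn_to_text(series):
    """Return the ISBNs as digit-only strings, whatever the input dtype."""
    if pd.api.types.is_float_dtype(series):
        # Nullable integers keep missing values and drop the trailing '.0'
        series = series.astype('Int64')
    return series.astype(str).str.replace('-', '', regex=False)

left['ISBN'] = isbn_to_text(left['ISBN'])
right['ISBN'] = isbn_to_text(right['ISBN'])
print(left['ISBN'].tolist(), right['ISBN'].tolist())
```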

In [0]:
duplicate = pd.merge(ebooks1_df, ebooks2_df, on='ISBN')

In [0]:
duplicate

Unnamed: 0,Product ID,ISBN,eISBN,Title_x,Author,Publisher_x,Publication Year_x,Language,LCC,LC Subject Heading,BISAC,DDC,Downloadable,Title_y,Publisher_y,Imprint,Publication Year_y,Editor/ Author,Awards,Subject,Description,USD List Price,Single-User Restriction,Title Phase,Predicted Release Quarter,Multi-Site Sales Restricted,Geographic Sales Regions,Corp URL,Edition ID#
0,989239,9780831000000.0,9780831000000.0,Commentary on John,"Cyril-Maxwell, David R.-Elowsky, Joel C.",Inter-Varsity Press,2015,eng,BR65.C953 I413 2015,,RELIGION / Biblical Commentary / New Testament,226.5/07;226.507,Y,Ancient Christian Texts: Commentary on John: V...,InterVarsity Press,InterVarsity Press,2015,"Editor: Elowsky, Joel C.",,Religion & Theology - Christianity,In the latest addition to the Ancient Christia...,$90,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,11497
1,684985,9780831000000.0,9780831000000.0,Commentary on Isaiah,"Eusebius-Elowsky, Joel C.-Armstrong, Jonathan J.",Inter-Varsity Press,2013,eng,BS1515.53 .E9713 2013eb,,RELIGION / Biblical Studies / Old Testament,224/.107,Y,Ancient Christian Texts: Commentary on Isaiah,InterVarsity Press,InterVarsity Press,2013,Eusebius of Caesarea,,Religion & Theology - Christianity,A first-ever English translation of Eusebius's...,$90,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,8129
2,684457,9780831000000.0,9780831000000.0,Commentary on John,"Cyril-Elowsky, Joel C.-Maxwell, David R.",Inter-Varsity Press,2013,eng,BR65.C953 I413 2013eb,,RELIGION / Biblical Commentary / New Testament,226.5/07,Y,Ancient Christian Texts: Commentary on John: V...,InterVarsity Press,InterVarsity Press,2013,Cyril of Alexandria,,Religion & Theology - Christianity,David Maxwell renders a service to students of...,$90,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,8131
3,650443,9780831000000.0,9780831000000.0,Dictionary of Biblical Imagery,"Duriez, Colin.-Penney, Douglas-Reid, Daniel G....",Inter-Varsity Press,1998,eng,BS537 .D48 1998,Symbolism in the Bible--Dictionaries.,RELIGION / Biblical Reference / Dictionaries &...,220.3,Y,Dictionary of Biblical Imagery,InterVarsity Press,InterVarsity Press,2010,"Editors: Ryken, Leland, Wilhoit, James C. and ...",,Religion & Theology - Christianity,The Dictionary of Biblical Imagery is the firs...,$83,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,12205
4,439323,9780691000000.0,9781401000000.0,"The ""Book of Mormon""","Gutjahr, Paul C.",Princeton University Press,2012,eng,BX8627 .G88 2012,,RELIGION / Christianity / Church of Jesus Chri...,289.3/22;289.322,Y,Lives of Great Religious Books: The Book of Mo...,Princeton University Press,Princeton University Press,2012,"Gutjahr, Paul C.",RCL recipient,Religion & Theology - Christianity,Gutjahr shows how Smith's influential book lau...,$70,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,4119
5,356475,9780691000000.0,9781401000000.0,"Dietrich Bonhoeffer's ""Letters and Papers From...","Marty, Martin E.",Princeton University Press,2011,eng,BX4827.B57 M359 2011eb,"Prisoners of war--Germany--Correspondence.,The...",RELIGION / Christianity / History,230/.044092,Y,Lives of Great Religious Books: Dietrich Bonho...,Princeton University Press,Princeton University Press,2011,"Marty, Martin E.",RCL recipient,Religion & Theology - Christianity,"For fascination, influence, inspiration, and c...",$70,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,4117
6,356016,9780691000000.0,9781401000000.0,"Augustine's ""Confessions""","Wills, Garry",Princeton University Press,2011,eng,BR65.A62 W55 2011eb,Christian saints--Algeria--Hippo (Extinct city...,RELIGION / Christianity / History,270.2092,Y,Lives of Great Religious Books: Augustine's Co...,Princeton University Press,Princeton University Press,2011,"Wills, Gary",RCL recipient,Religion & Theology - Christianity,"In this brief and incisive book, Pulitzer Priz...",$56,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,3424
7,479267,9781409000000.0,9781352000000.0,The Ashgate Research Companion to John Owen's ...,"Jones, Mark-Kapic, Kelly M.",Taylor & Francis (CAM),2012,eng,BX5207.O88 A84 2012,,RELIGION / Theology,230/.59092,Y,Ashgate Research Companions: The Ashgate Resea...,Taylor & Francis,Ashgate Publishing,2012,"Kapic, Kelly M. and Jones, Mark",RCL Recipient,Religion & Theology - Christianity,As a revival in Owen studies and reprints has ...,$67,False,Live,,No Restrictions,No Restrictions,corp.credoreference.com/component/booktracker/...,4487


In [0]:
unique = ebooks2_df.loc[~ebooks2_df['ISBN'].isin(duplicate['ISBN'])]

In [0]:
ebooks2_df.shape

(91, 17)

In [0]:
unique.shape

(83, 17)
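
The overlap and the unique titles can also be found in a single step with the `indicator` option of `merge`. This is a sketch on hypothetical miniature collections, not a replacement for the cells above.

```python
import pandas as pd

# Hypothetical miniature collections that share one ISBN
current = pd.DataFrame({'ISBN': ['111', '222'], 'Title': ['A', 'B']})
offered = pd.DataFrame({'ISBN': ['222', '333'], 'Title': ['B', 'C']})

# how='left' keeps every offered title; indicator=True adds a _merge column
# that records whether each row was found in both frames or only on the left
compared = offered.merge(
    current[['ISBN']].drop_duplicates(), on='ISBN', how='left', indicator=True
)

overlap = compared[compared['_merge'] == 'both']
new_titles = compared[compared['_merge'] == 'left_only']
print(len(overlap), len(new_titles))
```

Dropping duplicate ISBNs from `current` first keeps the merge from repeating an offered title once per matching row.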

### Advanced Matching 



Spell checkers and typing suggestions often rely on [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance), the number of single-character edits needed to turn one string into another.

The same measure can be applied to matching text strings such as titles.
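
To make the idea concrete, here is a minimal pure-Python sketch of Levenshtein distance. It is written for readability rather than speed, and the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a, b):
    """Count the single-character edits needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

print(levenshtein('kitten', 'sitting'))           # 3
print(levenshtein('Wills, Garry', 'Wills, Gary'))  # 1
```

A small distance relative to the length of the strings suggests that two entries may be the same item spelled differently.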

In [0]:
!pip install fuzzywuzzy



In [0]:
from fuzzywuzzy import fuzz
title_dict_list = ebooks1_df[['ISBN', 'Title']].to_dict(orient='records')
title_dict = {}
for title in title_dict_list:
  title_dict[title['ISBN']] = title


def title_matcher(row, title_dict_list, title_dict):
  matches = []
  try:
    title_match = title_dict[row['ISBN']]
    return title_match['Title']
  except KeyError:
    pass
  for title in title_dict_list:
    fuzzratio = fuzz.ratio(row['Title'].lower(), title['Title'].lower())
    if fuzzratio > 90:
      matches.append((title['ISBN'], title['Title']))
  titles = '; '.join(m[1] for m in matches)
  return titles
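
A similar matcher can be sketched with only the standard library. This version swaps in `difflib`, whose similarity ratio runs from 0 to 1, whereas `fuzz.ratio` above is compared against 90 on a 0 to 100 scale. The function name `close_titles` and the small catalog are hypothetical.

```python
import difflib

def close_titles(title, candidates, cutoff=0.9):
    """Return candidate titles whose similarity to title is at least cutoff."""
    lowered = {c.lower(): c for c in candidates}  # compare case-insensitively
    hits = difflib.get_close_matches(title.lower(), list(lowered), n=5, cutoff=cutoff)
    return [lowered[h] for h in hits]

# Hypothetical catalog of three titles
catalog = [
    'Dictionary of Biblical Imagery',
    'Commentary on Isaiah',
    'Christian Theology and Islam',
]
matches = close_titles('dictionary of biblical imagry', catalog)
print(matches)
```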

In [0]:
ebooks2_df['Title_Match'] = ebooks2_df.apply(lambda x: title_matcher(x, title_dict_list, title_dict), axis=1)

In [0]:
match_df = ebooks2_df[ebooks2_df['Title_Match'] != '']

In [0]:
match_df[['Title', 'Title_Match', 'ISBN']]

Unnamed: 0,Title,Title_Match,ISBN
6,Lives of Great Religious Books: Augustine's Co...,"Augustine's ""Confessions""",9780691000000.0
10,Christian Theology,Christian Theology; Christian Theology,9780801000000.0
17,Ancient Christian Texts: Commentary on Isaiah,Commentary on Isaiah,9780831000000.0
19,Ancient Christian Texts: Commentary on John: V...,Commentary on John,9780831000000.0
20,Ancient Christian Texts: Commentary on John: V...,Commentary on John,9780831000000.0
24,Dictionary of Biblical Imagery,Dictionary of Biblical Imagery,9780831000000.0
31,Lives of Great Religious Books: Dietrich Bonho...,"Dietrich Bonhoeffer's ""Letters and Papers From...",9780691000000.0
63,Ashgate Research Companions: The Ashgate Resea...,The Ashgate Research Companion to John Owen's ...,9781409000000.0
78,Lives of Great Religious Books: The Book of Mo...,"The ""Book of Mormon""",9780691000000.0


In [0]:
duplicate[['Title_x', 'Title_y']]

Unnamed: 0,Title_x,Title_y
0,Commentary on John,Ancient Christian Texts: Commentary on John: V...
1,Commentary on Isaiah,Ancient Christian Texts: Commentary on Isaiah
2,Commentary on John,Ancient Christian Texts: Commentary on John: V...
3,Dictionary of Biblical Imagery,Dictionary of Biblical Imagery
4,"The ""Book of Mormon""",Lives of Great Religious Books: The Book of Mo...
5,"Dietrich Bonhoeffer's ""Letters and Papers From...",Lives of Great Religious Books: Dietrich Bonho...
6,"Augustine's ""Confessions""",Lives of Great Religious Books: Augustine's Co...
7,The Ashgate Research Companion to John Owen's ...,Ashgate Research Companions: The Ashgate Resea...


In [0]:
ebooks2_df.iloc[10]

Title                                                         Christian Theology
Publisher                                                 Baker Publishing Group
Imprint                                                   Baker Publishing Group
Publication Year                                                            2013
Editor/ Author                                              Erickson, Millard J.
ISBN                                                                  9.7808e+12
Awards                                                                       NaN
Subject                                       Religion & Theology - Christianity
Description                    Leading evangelical scholar Millard Erickson o...
USD List Price                                                              $50 
Single-User Restriction                                                     True
Title Phase                                                                 Live
Predicted Release Quarter   

In [0]:
# Finding the duplicate
ebooks1_df[ebooks1_df['Title'].str.contains('Christian Theology')]

Unnamed: 0,Product ID,ISBN,eISBN,Title,Author,Publisher,Publication Year,Language,LCC,LC Subject Heading,BISAC,DDC,Downloadable
2628,519991,,9780834000000.0,Christian Theology,"Wiley, H. Orton",Foundry Publishing,1952,eng,BT75.2 .W55 1952eb,"Theology, Doctrinal.",RELIGION / Christian Theology / Systematic,230,Y
2629,561520,9780834000000.0,9780834000000.0,Introduction to Christian Theology,"Wiley, H. Orton-Culbertson, Paul T.",Foundry Publishing,1946,eng,BT65 .W55 1946eb,"Theology, Doctrinal.,Theology.",RELIGION / Christian Church / Administration,254,Y
2631,519992,,9780834000000.0,Christian Theology,"Wiley, H. Orton",Foundry Publishing,1943,eng,BT75 .W55 1943,"Theology, Doctrinal.",RELIGION / Christianity / General,230.01,Y
2991,1164261,9780831000000.0,9780831000000.0,An Invitation to Analytic Christian Theology,"McCall, Thomas H.",Inter-Varsity Press,2015,eng,BR118,"Analysis (Philosophy),Philosophical theology.,...",RELIGION / Christian Theology / General,230/.046,Y
3454,920466,9780227000000.0,9780228000000.0,Content and Method in Christian Theology,"Sell, Alan P. F.",James Clarke & Co,2014,eng,BT21.2 .S455 2014eb,"Theology, Doctrinal--History.",RELIGION / Theology,230.092,Y
3469,852849,9780227000000.0,9780228000000.0,Christian Theology and Islam,"Root, Michael-Buckley, James Joseph",James Clarke & Co,2014,eng,BP172 .C438 2014,"Christianity and other religions--Islam.,Islam...",RELIGION / Comparative Religion,261.27,Y
3497,814364,9780228000000.0,9780228000000.0,Christian Theology and Religious Pluralism,"Nah, David S.",James Clarke & Co,2013,eng,BR127 .N34 2013eb,"Christianity and other religions.,Incarnation-...",RELIGION / Christian Life / Social Issues,261.2,Y
5708,398365,9780755000000.0,9781317000000.0,Christian Theology and Tragedy,"Waller, Giles.-Taylor, T. Kevin.",Taylor & Francis (CAM),2011,eng,BR115.T73 C47 2011eb,"Theology.,Tragedy--History and criticism.,Trag...",RELIGION / Christianity / General,261.5/8,Y
6037,813783,9780719000000.0,9780719000000.0,Christian Theology and African Traditions,"Michael, Matthew.",The Lutterworth Press,2013,eng,BR1360 .M53 2013eb,"Christianity--Africa.,Theology.",HISTORY / Africa / General,230.1,Y
7610,612511,9781611000000.0,9781622000000.0,Christian Theology and African Traditions,"Michael, Matthew.",Wipf & Stock Publishers,2013,eng,BR1360 .M53 2013eb,"Christianity and culture--Africa.,Religion and...",RELIGION / Christian Theology / History,230.096,Y


## Finding Files 

System administration and automation are also key uses for Python.

### Imports and system setup

In [0]:
from shutil import copy2
import os
import sys  # main() below reads the input file name from sys.argv

# os.path.join picks the right separator for macOS, Linux, or Windows,
# so one set of definitions works everywhere
JPEGS_DIR = os.path.join('sample_directory', 'jpegs')
PDFS_DIR = os.path.join('sample_directory', 'sub_dir', 'pdfs')
EBOOKS_DIR = os.path.join('sample_directory', 'ebooks')

# os.sep is '/' on macOS and Linux and '\' on Windows
SLASH = os.sep

### Walking Directory Paths 


In [0]:

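def get_path_names(ext, path='sample_directory'):
    '''Map each file name (without its extension) to the directory containing it.'''
    ext = '.' + ext.lstrip('.')  # accept both 'pdf' and '.pdf'
    path_dict = {}
    for dirpath, _, fnames in os.walk(path):
        for f in fnames:
            ftup = os.path.splitext(f)  # ('name', '.ext')
            if ftup[1] == ext:
                path_dict[ftup[0]] = dirpath
    return path_dict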
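
To try directory walking without the workshop's `sample_directory`, the sketch below builds a throwaway tree with `tempfile` and indexes it. The function name `index_by_stem` and the file names are hypothetical.

```python
import os
import tempfile

def index_by_stem(root, ext):
    """Map each file name (without extension) to the directory that holds it."""
    found = {}
    for dirpath, _, fnames in os.walk(root):
        for fname in fnames:
            stem, suffix = os.path.splitext(fname)
            if suffix.lower() == ext.lower():
                found[stem] = dirpath
    return found

# Build a throwaway directory tree so the example runs anywhere
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, 'sub_dir', 'pdfs')
    os.makedirs(nested)
    open(os.path.join(nested, '9781433536762.pdf'), 'w').close()
    open(os.path.join(root, 'notes.txt'), 'w').close()
    pdf_index = index_by_stem(root, '.pdf')
    found_in_nested = pdf_index.get('9781433536762') == nested

print(sorted(pdf_index), found_in_nested)
```

`os.walk` visits every subdirectory, so the file is found even though it sits two levels below the starting point.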

### Copying files 



In [0]:
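def file_copier(file_name, dir_dict, ext, dest_dir='collected_files'):
  # dest_dir is an assumed default; change it to wherever the copies should go
  os.makedirs(dest_dir, exist_ok=True)
  try:
    source = '{}{}{}.{}'.format(dir_dict[file_name], SLASH, file_name, ext)
    copy2(source, dest_dir)  # copy2 needs both a source and a destination
  except (FileNotFoundError, KeyError):
    print('File Not Found: {}.{}'.format(file_name, ext))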
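
A variation on the copy step, sketched with `pathlib` so that no separator constant is needed. The function name `copy_if_present` and the file names are hypothetical, and the example works in a temporary directory so that it leaves nothing behind.

```python
import tempfile
from pathlib import Path
from shutil import copy2

def copy_if_present(source_dir, stem, ext, dest_dir):
    """Copy source_dir/stem.ext into dest_dir; return True if the file was copied."""
    source = Path(source_dir) / '{}.{}'.format(stem, ext)  # '/' joins path parts
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    try:
        copy2(source, dest)  # copy2 also preserves timestamps where it can
    except FileNotFoundError:
        return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'pdfs'
    src.mkdir()
    (src / '9781433536762.pdf').write_text('placeholder')
    share = Path(tmp) / 'share'
    copied = copy_if_present(src, '9781433536762', 'pdf', share)
    missing = copy_if_present(src, '0000000000000', 'pdf', share)
    copied_names = sorted(p.name for p in share.iterdir())

print(copied, missing, copied_names)
```

Returning `True` or `False` instead of printing lets the caller decide how to report missing files.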
    

### Putting it all together

Now that we have the pieces of the code, we can put it together to run through all of the functions. 

In [0]:
def main():
  files_dir = {
      'pdf': get_path_names('pdf', path='sample_directory'+SLASH+'sub_dir'+SLASH+'pdfs'),
      'jpeg': get_path_names('jpeg', path='sample_directory'+SLASH+'jpegs'), 
      'ebook': get_path_names('ebook', path='sample_directory'+SLASH+'ebooks')
  } 
  with open(sys.argv[1]) as files:
    for fn in files:
      fn = fn.strip() # removing whitespace around the file name 
      for ext, dir_dict in files_dir.items():
        file_copier(fn, dir_dict, ext)
  print('Finished')

### References 

This is the original code I wrote for moving files.

In [0]:
#!/usr/bin/env python3
'''
To use this, type:
file_finder_and_copy ISBNFILE NEWDIRECTORYNAME
The user needs to supply ISBNFILE and NEWDIRECTORYNAME.
The script goes line by line through ISBNFILE and copies the matching files
to the new directory, which is created inside PATH_TO_FOLDER. The directory
must not already exist.
'''

from shutil import copy2 # copy2 copies a file, along with its metadata, to a directory
# this imports just one function from a standard library module
import sys # sys handles some system commands; here it is only
# used for reading the command-line arguments
import os # os provides operating-system-specific functions
import stat # stat provides the permission constants used by os.chmod
# These are constants that you should change to where you have your
# remote directories mounted on your system
JPEG_SERVER = r'O:\Graphic Resources\Cover Images\HiRes JPEGs' # you will need to change this
# This server needs to be searched within subdirectories
EBOOK_SERVER = r'G:\Digital Library'


#This is where you want the temporary directory to be stored
PATH_TO_FOLDER = r'C:\Users\bgoodwin\Downloads\Telegram Desktop' #This is the directory
    #for the new files

def get_path_names(ext, path=EBOOK_SERVER):
    path_dict = {}
    for dirpath, _, fnames in os.walk(path):
        for f in fnames:
            ftup = os.path.splitext(f)
            if ftup[1] == ext:
                path_dict[ftup[0]] = dirpath
    return path_dict


def main():
    new_dir = PATH_TO_FOLDER + '\\' + sys.argv[2] # sys.argv[2] is the
        #second item mentioned in the command prompt
    os.makedirs(new_dir)
    os.chmod(new_dir, stat.S_IWRITE)
    # sys.argv[1] is the first argument listed in the command prompt,
    # this is the file that you have the isbns stored in
    epub_dict = get_path_names('.epub')
    pdf_dict = get_path_names('.pdf')


    with open(sys.argv[1]) as isbnfile: # `with` closes the file
        # automatically when the block ends
        for isbn in isbnfile:
            # copy2 takes two arguments the first is the source files
            # the second is the destination where you want to copy it to
            isbn = isbn.strip() # This will strip the new line from the
                #isbn file
            try:
                #'somestring{}'.format() will replace the {} with
                #whatever is in
                # the format function
                copy2('{}\\{}.jpg'.format(JPEG_SERVER, isbn), new_dir)
            except FileNotFoundError:
                print('File Not Found: {}.jpg'.format(isbn))
            try:
                copy2('{}\\{}.epub'.format(epub_dict[isbn], isbn), new_dir)
            except (KeyError, FileNotFoundError) as e:
                print('File Not Found: {}.epub'.format(isbn))
            try:
                copy2('{}\\{}.pdf'.format(pdf_dict[isbn], isbn), new_dir)
            except (KeyError, FileNotFoundError) as e:
                print('File Not Found: {}.pdf'.format(isbn))


if __name__ == '__main__':
    main()