This Jupyter Notebook runs files from https://github.com/selenachau/binder/. It walks through the main concepts behind a Python script that validates and cleans ISBN numbers and then retrieves the most popular Library of Congress Classification (LCC) for each ISBN using the OCLC Classify API. Use of the OCLC API must comply with OCLC's [terms and conditions](https://www.oclc.org/research/areas/data-science/terms.html); this notebook is set up for demonstration only.

To access the quick tool that allows you to upload your own csv file of ISBN numbers, see https://github.com/selenachau/binder/blob/main/isbn-lcc-quick-convert.ipynb

**Overview**<br/>
What are Binder, Jupyter Notebooks, and Python?<br/>
What does the script do? Why was it created?<br/>
Demonstration:<br/>
--Introducing the isbnlib Python library<br/>
--Introducing the OCLC Classify API<br/>
Extensions of this script<br/>
Resources to Learn More

# What are Binder, Jupyter Notebooks, and Python?

If you are accessing this workspace from a mybinder.org link, you have launched a temporary Binder environment that is pre-configured with the Python programming language and libraries, as well as test files and Jupyter Notebook files (.ipynb). Using Binder means you don't have to install anything on your own computer; it is all readily available here. Note: this is a temporary space. You can make changes, but they will not be saved or available in this hosted environment the next time you access the URL. You can download files to your computer and upload files from it.

Jupyter Notebooks are interactive computational environments in which you can read instructions and execute code, all in one page. This makes them a useful format for introducing programming concepts.

Python is a programming language. It is free to use, but code generally needs to stay compatible with current versions of the language and of any packages it depends on, much like keeping computer software up to date.
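Staying compatible starts with knowing which versions you are running. The following is a minimal sketch using only the standard library (Python 3.8+ for `importlib.metadata`); the `environment_report` helper is hypothetical, and the package list is an assumption based on the imports used later in this notebook.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages):
    """Map 'python' and each package name to its installed version (None if missing)."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment
            report[name] = None
    return report


# Packages imported later in this notebook (assumed list; adjust as needed)
for name, ver in environment_report(["isbnlib", "requests"]).items():
    print(name, ver or "not installed")
```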


# What does the script do? Why was it created?

This Python script takes a CSV list of ISBN numbers as input, cleans each ISBN and converts it to its canonical form (i.e., no dashes, only digits and the character X), and then passes the cleaned ISBN to the OCLC Classify API. The OCLC Classify API searches for each ISBN and returns its most popular LC classification. The script then creates an output CSV file with two columns: the cleaned ISBN and the corresponding LC classification.
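The input-to-output flow can be sketched with the standard `csv` module. This is a minimal sketch under stated assumptions, not the script itself: the `lookup` argument is a hypothetical hook standing in for the OCLC call, the simple `canonical` helper only strips characters (it does not validate check digits the way isbnlib does), and the output column names `isbn` and `lcc` are assumptions.

```python
import csv
import io


def canonical(isbn):
    """Keep only digits and X, mirroring the idea (not the code) of isbnlib.canonical."""
    return "".join(ch for ch in isbn.upper() if ch in "0123456789X")


def isbns_to_lcc_csv(in_file, out_file, lookup):
    """Read one ISBN per row from in_file; write (cleaned ISBN, LCC) rows to out_file.

    `lookup` is any function mapping a canonical ISBN to an LCC string or None
    (hypothetical hook; in the notebook, the OCLC Classify API fills this role).
    """
    writer = csv.writer(out_file)
    writer.writerow(["isbn", "lcc"])  # assumed header names
    for row in csv.reader(in_file):
        if not row or not row[0].strip():
            continue  # skip blank lines
        isbn = canonical(row[0])
        writer.writerow([isbn, lookup(isbn) or ""])


# Usage with an in-memory CSV and a stand-in lookup (no network access)
fake_lookup = {"9789004335462": "EXAMPLE-LCC"}.get
source = io.StringIO("978-90-04335-46-2\n954430603X\n")
result = io.StringIO()
isbns_to_lcc_csv(source, result, fake_lookup)
print(result.getvalue())
```

Passing the lookup in as a function keeps the CSV handling testable without calling a live web service.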

This script, and this brief tutorial, were created to combine ebook usage metrics with a standard subject classification. My application is to create reports on usage in a particular subject area (e.g., Literature) instead of reporting solely on total ebook uses. Although some libraries may be able to do this with SUSHI, my library has not implemented SUSHI, nor have we explored further use of the WorldCat API for collections assessment. This is a free tool that may be useful for other libraries. Examples of how this script has been extended are listed at the end.

# Demonstration:
1. Introducing the isbnlib Python library
2. Introducing the OCLC Classify API

## 1. Introducing the isbnlib Python library


Definition from the [project description](https://pypi.org/project/isbnlib/): isbnlib is a python library that provides several useful methods and functions to validate, clean, transform, hyphenate and get metadata for ISBN strings.

To use the OCLC Classify API, the ISBNs need to be in canonical form: only digits and X, like 9780321534965 and 954430603X, but not 979-10-90636-07-1. The isbnlib Python library can convert a list of ISBNs into their canonical form.
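To make "canonical form" and "valid" concrete, here is a standard-library sketch of the ISBN check-digit rules. It illustrates the rules only and is not isbnlib's implementation; in the notebook, `isbnlib.canonical`, `isbnlib.is_isbn10`, and `isbnlib.is_isbn13` do this work. ISBN-13 weights digits alternately 1 and 3 and requires the weighted sum to be divisible by 10; ISBN-10 weights digits from 10 down to 1 (X counts as 10 in the last position) and requires divisibility by 11.

```python
DIGITS = "0123456789"


def canonical(isbn):
    """Keep only digits and X, e.g. '979-10-90636-07-1' -> '9791090636071'."""
    return "".join(ch for ch in isbn.upper() if ch in DIGITS + "X")


def is_valid_isbn13(isbn):
    """13 digits whose weighted sum (weights 1, 3, 1, 3, ...) is divisible by 10."""
    if len(isbn) != 13 or any(ch not in DIGITS for ch in isbn):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn10(isbn):
    """9 digits plus a final digit or X (=10); weights 10 down to 1; sum divisible by 11."""
    if len(isbn) != 10 or any(ch not in DIGITS for ch in isbn[:9]):
        return False
    if isbn[9] not in DIGITS + "X":
        return False
    values = [int(d) for d in isbn[:9]] + [10 if isbn[9] == "X" else int(isbn[9])]
    total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
    return total % 11 == 0


for raw in ["979-10-90636-07-1", "954430603X", "978-90-04335-46-2"]:
    c = canonical(raw)
    ok = is_valid_isbn13(c) if len(c) == 13 else is_valid_isbn10(c)
    print(raw, "->", c, "valid" if ok else "invalid")
```

For example, 9780321534965 gives a weighted sum of 110, which is divisible by 10, so it is a valid ISBN-13.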

**Demo: validate an ISBN and convert it to its canonical form with isbnlib**

In [None]:
# isbnlib test to validate and convert an ISBN to its canonical form
# Click anywhere inside this code cell and press Shift + Enter to run

from isbnlib import canonical, is_isbn13, is_isbn10, meta # To use the isbnlib Python library and its functions, they need to be imported into your project

isbn13_test = "978-90-04335-46-2" # The variable isbn13_test stores the value in quotation marks. You can replace it with another ISBN-13
print("Prints True below if",isbn13_test,"is a valid ISBN13")
is_valid = is_isbn13(isbn13_test) # The variable is_valid stores the outcome of the is_isbn13 function, which is either True or False
print(is_valid) # print the value of the variable

print("Print the canonical form of",isbn13_test)
out = canonical(isbn13_test) #The canonical function can be used with both ISBN10 and ISBN13
print(out)

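isbn10_test = "954430603X"
print("Prints True below if",isbn10_test,"is a valid ISBN10")
print(is_isbn10(isbn10_test))

# meta() fetches bibliographic metadata from an online service ('wiki' = Wikipedia), so it needs internet access
print(meta(isbn10_test, service='wiki'))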
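isbnlib can also transform ISBNs, for example converting an ISBN-10 to an ISBN-13 with its `to_isbn13` function. The arithmetic behind that conversion is small enough to sketch with the standard library: prefix `978`, drop the old ISBN-10 check character, and compute a new ISBN-13 check digit. This sketch is for illustration only and assumes the input is already a valid canonical ISBN-10; in practice, use `isbnlib.to_isbn13`.

```python
def isbn13_check_digit(first12):
    """Digit that makes the 1,3-weighted sum of all 13 digits divisible by 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10):
    """Convert a canonical, already-validated ISBN-10 to ISBN-13."""
    first12 = "978" + isbn10[:9]  # the old check character (here X) is dropped
    return first12 + isbn13_check_digit(first12)


print(isbn10_to_isbn13("954430603X"))  # prints 9789544306038
```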

## 2. Introducing the OCLC Classify API

OCLC Classify provides a [user interface](http://classify.oclc.org/classify2/) and a [machine service](http://classify.oclc.org/classify2/api_docs/index.html) for assigning classification numbers and subject headings. The database is searchable by many of the standard numbers associated with books, magazines, journals, and music and video recordings, including ISBN, ISSN, OCLC number, and UPC. It can also be searched by author/title and by FAST subject heading. Use of OCLC's prototype is subject to OCLC's terms and conditions. By continuing past this point, you agree to abide by these terms.
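The machine service is queried with an HTTP GET request whose parameters name the identifier type and its value. A minimal sketch of building such a URL with the standard library, assuming the base address and the `isbn`, `wi`, and `summary=true` parameters used by the demo code in the next cell; parameter names for ISSN, OCLC number, UPC, or author/title searches should come from the API documentation, not from this sketch. The `classify_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Base address used by the demo in this notebook
CLASSIFY_BASE = "http://classify.oclc.org/classify2/Classify?"


def classify_url(parmtype, value, summary=True):
    """Build a request URL such as ...Classify?isbn=9781452967554&summary=true."""
    params = {parmtype: value}
    if summary:
        params["summary"] = "true"
    # urlencode URL-encodes values (spaces become '+'), which matters for free-text searches
    return CLASSIFY_BASE + urlencode(params)


print(classify_url("isbn", "9781452967554"))
```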

My use of the OCLC Classify API in this Binder project lets you see how Python works with the OCLC API without installing anything on your local computer or needing a complete Python course.

**Demo: Pulling LCC from ISBN**

In [None]:
# isbn to lcc test
# This uses a short list of two canonical ISBNs as a test
# Press Shift + Enter to run

import logging
import isbnlib #https://pypi.org/project/isbnlib/
import requests #https://pypi.org/project/requests/
from requests.utils import requote_uri
import xml.dom.minidom

UA = 'isbnlib (gzip)' # User-Agent string sent with each request
myheaders = {'User-Agent': UA}
base = 'http://classify.oclc.org/classify2/Classify?' # OCLC Classify2 endpoint
logger = logging.getLogger(__name__)
parmtype = "isbn" # search by ISBN; get_oclc_data may also call itself with "wi" (work identifier)

# OCLC Classify2 method
def get_oclc_data(parmtype, parmvalue=""):
    global lcc_value
    lcc_value = None
    try:
        nexturl = base + parmtype+"=" + requote_uri(parmvalue)+"&summary=true"
        logger.debug("OCLC URL: {} ".format(nexturl))
    except Exception as ue:
        logger.error("OCLC URL encode failed: {}".format(ue))
        return None
    else:
        try:
            r = requests.get(nexturl, headers=myheaders)
            if not r.ok:
                logger.error("OCLC Request returned http error: {}".format(r.status_code))
                return None
        except Exception as e:
            logger.error("OCLC URL request failed: {}".format(e))
            return None
        else:
            wq = r.text
        xdoc = xml.dom.minidom.parseString(wq)
    response = xdoc.getElementsByTagName('response')[0]
    respCode = response.attributes["code"].value
    if respCode == '0' or respCode == '2':
        # Codes 0 and 2: a single work was found; read the most popular LCC from its recommendations
        recommendations = xdoc.getElementsByTagName('recommendations')
        if recommendations:
            lcc_nodes = recommendations[0].getElementsByTagName('lcc')
            if lcc_nodes:
                for mostPopular in lcc_nodes[0].getElementsByTagName('mostPopular'):
                    lcc_value = mostPopular.attributes["nsfa"].value
    elif respCode == '4':
        # Code 4: several works matched; follow the first work that has an LCC scheme, using its work id (wi)
        works = xdoc.getElementsByTagName('works')[0]
        logger.debug('Works found: ' + str(len(works.getElementsByTagName('work'))))
        for work in works.getElementsByTagName('work'):
            try:
                m_wi = work.attributes["wi"].value
                schemes = work.attributes["schemes"].value
            except KeyError:
                continue # skip works that lack either attribute
            if 'LCC' in schemes:
                logger.debug(f'going to try to get lcc using wi {m_wi}')
                lcc_value = get_oclc_data('wi',m_wi)
                break
    elif respCode != '102': # 102 means no match was found; anything else is unexpected
        logger.error("OCLC reporting odd error {}, check by hand: {}".format(respCode,nexturl))
    if lcc_value:
        return lcc_value 
    else:
        return None

def validate_json(data):
    if str(data) == "":
        logger.error("validate_json: returns False because no data in passed string: {}".format(str(data)))
        return False
    return True

def fix_isbn(isbn):
    """Return a valid canonical ISBN-10 or ISBN-13 for isbn, or None if it cannot be repaired."""
    # First try: strip hyphens and spaces, then validate directly
    lib_isbn = isbnlib.canonical(isbn)
    if len(lib_isbn) == 10 and isbnlib.is_isbn10(lib_isbn):
        return lib_isbn
    if len(lib_isbn) == 13 and isbnlib.is_isbn13(lib_isbn):
        return lib_isbn
    # Second try: extract an ISBN-like substring from messier input.
    # get_isbnlike returns a list of candidates, so use the first one
    candidates = isbnlib.get_isbnlike(isbn)
    if not candidates:
        return None
    lib_isbn = isbnlib.get_canonical_isbn(isbnlib.clean(candidates[0]))
    if not lib_isbn: # check for None before calling len()
        return None
    if len(lib_isbn) == 10 and isbnlib.is_isbn10(lib_isbn):
        return lib_isbn
    if len(lib_isbn) == 13 and isbnlib.is_isbn13(lib_isbn):
        return lib_isbn
    return None

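isbn_list = ["9781452967554","9781504026253"]
lcc_list = []

# Clean each ISBN first; look up its LCC only if it is valid (otherwise record None)
for i in isbn_list:
    parmvalue = fix_isbn(i)
    x = get_oclc_data(parmtype, parmvalue) if parmvalue else None
    lcc_list.append(x)
print(lcc_list)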
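Because the lookup depends on a live web service, it can help to test the XML-handling logic offline. The sketch below performs the same kind of `minidom` traversal as `get_oclc_data` against small hand-written responses. The sample XML and its class number are made-up placeholders that only mimic the element and attribute names the demo code reads (`response` `code`, `lcc`, `mostPopular` `nsfa`, `work` `wi` and `schemes`); they are not real OCLC output. As in the demo, codes 0 and 2 are treated as a single work, 4 as a list of works to follow up by `wi`, and 102 as no match. The `parse_classify` helper is hypothetical.

```python
import xml.dom.minidom


def parse_classify(xml_text):
    """Return ('lcc', value), ('wi', work_id), or ('none', code) for a Classify-style response."""
    doc = xml.dom.minidom.parseString(xml_text)
    code = doc.getElementsByTagName("response")[0].getAttribute("code")
    if code in ("0", "2"):
        # Single work: return the nsfa attribute of the first mostPopular LCC entry
        for lcc in doc.getElementsByTagName("lcc"):
            for popular in lcc.getElementsByTagName("mostPopular"):
                if popular.hasAttribute("nsfa"):
                    return ("lcc", popular.getAttribute("nsfa"))
    elif code == "4":
        # Multiple works: return the first work id whose schemes include LCC
        for work in doc.getElementsByTagName("work"):
            if "LCC" in work.getAttribute("schemes") and work.hasAttribute("wi"):
                return ("wi", work.getAttribute("wi"))
    return ("none", code)


# Hypothetical sample responses (placeholder values, not real OCLC data)
single = """<classify><response code="0"/><recommendations>
<lcc><mostPopular nsfa="XX123 .Y45"/></lcc></recommendations></classify>"""
multi = """<classify><response code="4"/><works>
<work wi="111" schemes="DDC"/><work wi="222" schemes="DDC LCC"/></works></classify>"""
missing = """<classify><response code="102"/></classify>"""

for sample in (single, multi, missing):
    print(parse_classify(sample))
```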

# Extensions of this script
The lcc_from_isbn script extends this approach: it lets a user import a COUNTER-formatted usage report and export a file with an added LC classification: https://github.com/mbelvadi/lcc_from_isbn
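Once each title in a usage report carries an LC classification, reporting by subject area becomes a grouping step. A minimal sketch, assuming rows that already have hypothetical `lcc` and `uses` fields (real COUNTER reports and the linked script use their own column names): it takes the leading class letters of each class number (for example `PS`, American literature, from `PS3545 .I345`) as the subject area and sums uses per area. The rows below are invented for illustration.

```python
import re
from collections import defaultdict

CLASS_LETTERS = re.compile(r"^[A-Z]{1,3}")


def uses_by_lc_class(rows):
    """Sum 'uses' per LC class letters; rows without a usable LCC go under 'Unclassified'."""
    totals = defaultdict(int)
    for row in rows:
        match = CLASS_LETTERS.match((row.get("lcc") or "").strip().upper())
        key = match.group() if match else "Unclassified"
        totals[key] += int(row.get("uses") or 0)
    return dict(totals)


# Hypothetical rows, as if read from a usage report with an added LCC column
report = [
    {"title": "Title A", "lcc": "PS3545 .I345", "uses": "12"},
    {"title": "Title B", "lcc": "PR6045.O72", "uses": "5"},
    {"title": "Title C", "lcc": "PS3511", "uses": "3"},
    {"title": "Title D", "lcc": None, "uses": "7"},
]
print(uses_by_lc_class(report))  # prints {'PS': 15, 'PR': 5, 'Unclassified': 7}
```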


# Resources to learn more
You can download all of the files in this Binder and experiment with them. They are provided under a CC-BY-NC-SA license.

* Browse on [GitHub.com](https://github.com)
* [Software Carpentry](https://software-carpentry.org/) and [Library Carpentry](https://librarycarpentry.org/)
* [Code{4}Lib Journal](https://journal.code4lib.org/)
* Python classes on [Codecademy](https://www.codecademy.com/catalog/language/python), [Udemy](https://www.udemy.com/topic/python/), [Coursera](https://www.coursera.org/courses?query=python), or [LinkedIn Learning](https://www.linkedin.com/learning/topics/python) and more
* [Intro to Jupyter and Jupyter Notebooks](https://jupyter.org/try-jupyter/retro/notebooks/?path=notebooks/Intro.ipynb) (short introduction)
* [Introduction to Jupyter and JupyterLab](https://coderefinery.github.io/jupyter/) (longer resource)
* [Tutorial on creating a Binder project](https://the-turing-way.netlify.app/communication/binder/zero-to-binder.html) 
* [OCLC Classify API documentation](http://classify.oclc.org/classify2/api_docs/index.html)


Copyright 2022 Selena Chau CC-BY-NC-SA granted