# Just Enough Python

Accessing the [OCLC New Titles API](https://developer.api.oclc.org/new-titles-api) with Python

## Introduction: What is Python?

Python is a programming language. More specifically, it is a high-level, general-purpose, interpreted programming language.

* High-level: Python is fairly abstracted from actual machine code. You won't have to handle individual bytes for the most part, and Python makes it fairly difficult to access things like memory allocation, as most of that is handled automatically. This can be both an advantage and a disadvantage, but for beginning programmers, it's mostly a relief not to have to think about it.

* General-purpose: Python can be used to run fully-featured applications and websites, but it can also be used as a scripting language for one-off tasks. You can program games in Python (although it's not great for that) or build machine learning algorithms (it *is* great for that).

* Interpreted: Many other languages are constructed such that their code must be "compiled" into an application/executable file before it is run. Python code is instead translated and executed at run time by a program called an interpreter, so newly written code can be run immediately, without a separate build step.

Python is more human-readable than most other programming languages. Also, you don't typically need to write as much Python code to do a task as you'd have to write in another programming language.

Python has a fairly comprehensive standard library, with well-defined functions.

Python is relatively slow to run (compared to C, C++, Rust, C#, Java, etc.), but fast to code in. It also has a large user base and robust documentation. This is why it's preferred by many as a scripting language for small projects.

Python was first published in February, 1991, by Guido van Rossum.

Python was initially written in C, and the reference implementation (CPython) still is, although there are now Java and .NET implementations.

Python is named after Monty Python's Flying Circus. Many older Python code examples have variable names like "spam" and "eggs", and other references to the sketch show.

### What can I use Python for?

Python scripts are a quick and *relatively* painless way to automate tasks on your computer. Do you need to catalog, rename, or move around a bunch of files on your system? Python's `os` and `pathlib` modules make that straightforward.

The `pandas` library lets you process tabular data easily in Python, and in ways that would require a lot of extra work in Excel or Google Sheets.

Python's suite of statistical, prediction/modeling, and data visualization packages is well suited for data science projects.

APIs are relatively easy to access with Python.


### What are the downsides?

Learning to program is a time investment. You may decide that it's not worth it for you to invest time in learning to code in Python, and that's fine.

Learning to program is *frustrating*. It can be a humbling experience, and it can take time to acclimate to the emotional rollercoaster of running into barriers and finding ways to break through them.

Automation is only helpful if the time saved is greater than the time spent automating... *in the long run.* If it's your first time using a programming language, automating a task may initially incur a huge expense of time. What's worse: until you've gained more experience programming, you may not necessarily know whether a particular project will be worth the time.

### With all that being said...

If you *do* want to learn Python, whether you dabble in it or jump in with both feet, there is an enormous community of programmers who are eager to share what they know and to help beginners.

### What is "Just Enough" Python?

Part 0 - enough to get set up, and to run the script and export a .xlsx file (essentially, no Python).

Part 1 - enough to understand the basics of the Python language.

Part 2 - enough to understand the script, and to start tinkering with it and customizing it.

### What's the Goal?

This workshop is a demonstration of the OCLC New Titles API, one of many APIs OCLC has available.

The code in the OCLC_API_base.ipynb notebook will produce an Excel file that contains New Titles data in the specified date range.

Let's take a look at that Excel file first, so we can see what the code does.

# Part 0 - Installation & Setup

## Setup - Anaconda

Anaconda is one of the most convenient out-of-the-box solutions for beginning Python programmers. It offers built-in support for Jupyter Notebooks, which are one of the most common formats for data science projects because they allow code to be presented elegantly alongside text and visualizations. 

### Installing Anaconda

If you go to the Anaconda homepage, it will ask for your email address, but you can skip that step by going directly to the [Anaconda download success page](https://www.anaconda.com/download/success) to start your download.

Once you have downloaded the installer, open it and follow the instructions on the Anaconda docs page for [Windows](https://docs.anaconda.com/anaconda/install/windows/), [Mac](https://docs.anaconda.com/anaconda/install/mac-os/), or [Linux](https://docs.anaconda.com/anaconda/install/linux/), depending on your operating system.

### Virtual Environments

Virtual environments are ["sandboxes"](https://en.wikipedia.org/wiki/Sandbox_(software_development)) which allow the user to create a fresh installation of Python (and associated modules and packages) for each project they start. This may seem like an unnecessary extra step when you're just starting out, but as you begin to work on multiple projects, using virtual environments can be crucial for maintaining a clean Python workspace.

When virtual environments are not used, Python packages with different dependencies can cause problems and create a lot of extra work. Fortunately, setting up virtual environments in Anaconda is straightforward (relative to other methods of setting them up, such as with venv, virtualenv, or Poetry).


### Virtual Environments - Setup

On the left side of the Anaconda Navigator window, click the "Environments" tab. Here you will find a list of the virtual environments you have available for use in Anaconda.

If this is your first time using Anaconda, you will only see one environment listed: "base (root)".  On the right side of the page there is a list of all the packages installed in that environment.

![title](AnacondaVE2.png)

### Virtual Environments - Creating

Click "Create" at the bottom left corner of the page. When prompted to choose a name for the new environment, type "OCLC". You can leave the other choices at their default settings.

![title](AnacondaVE3.png)

### Virtual Environments - Using `conda` to Install Packages and Modules

Now that we've created an environment, it's time to install a package we will need for exporting a file in Excel format.

Next to our new environment, there is a "play" button.

Click this, and select "Open Terminal" from the dropdown menu.

![title](OpenTerminal.png)


Type in `conda install xlsxwriter` and hit enter.

![title](Terminal1.png)

conda will say "Solving environment" for a moment before presenting you with a summary of what it will install, and asking you to make a decision. When prompted, type in "y" and hit enter. 

![title](Terminal2.png)

Once conda has finished installing the `xlsxwriter` module, it will notify you that it is finished. You may now close the terminal window.

![title](Terminal3.png)

### More Modules

Repeat this process for the following modules, using the `conda` commands below:

`conda install pandas`

`conda install matplotlib`

`conda install -c main requests-oauthlib`

`conda install -c conda-forge jupyterlab jupyterlab-git` 

Pandas depends on numpy, so conda will install numpy automatically along with it. The same is true of requests-oauthlib, which depends on oauthlib ("Open Authorization Library").

The JupyterLab Git extension will allow you to "clone" the GitHub repository for this notebook and its associated files, automatically downloading them to your local environment.

### Virtual Environments - Installing Jupyter

Once Anaconda finishes setting up the environment, go back to your Home page by clicking the Home icon at the top left of the page. Click "Install" on JupyterLab and Jupyter Notebook (it doesn't matter which order).

![title](AnacondaVE5.png)

![title](AnacondaVE6.png)

Next, click "Launch" on JupyterLab.

## What are Jupyter Notebooks?

"Jupyter" is an abbreviation for Julia, Python, and R, the three languages it was originally designed to support (Jupyter notebooks can now be used for code in [many languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels))

Jupyter Notebooks allow code and text to be presented elegantly in the same document.

Jupyter Notebook and JupyterLab are both platforms that support Jupyter Notebooks (.ipynb or "IPython notebook" files).

These files are collections of cells that can contain code, raw text, or markdown. Here are examples of each type of cell, labeled accordingly.

In [1]:
### code cell

x = 2
print(x)

2


### markdown cell

markdown with `code examples`

### Jupyter Notebook/JupyterLab Commands

In the Help menu, select "Show Keyboard Shortcuts". The most commonly used are:

Esc: Switch from Edit Mode to Command Mode

**Shift-Enter/Shift-Return - run current cell**

A - Make a new cell above

B - Make a new cell below

C - Copy current cell

**X - Cut current cell**

V - Paste copied/cut cell

**DD - Delete current cell (be careful!)**

**Z - Undo**

M - Change current cell to markdown

Y - Change current cell to code

R - Change current cell to raw text

\* NB: "Z" for Undo works while you are in Command Mode, but ctrl-z/command-z works when you're in Edit Mode in a cell. Each cell has a separate edit history for ctrl-z/command-z Undo functions.

### Checking our Environment

Given that we're using a virtual environment, it's good to double-check that we're in the right environment.

The sys module provides one way to verify this. It can also be done through the use of system-specific shell commands, which can be run in Jupyter by prefixing them with a "!".

On Mac and Linux systems, use `!which python` and on Windows, use `!WHERE python`

In [2]:
import sys

print(sys.executable)

C:\Users\davidm\AppData\Local\anaconda3\envs\OCLC_API\python.exe


In [3]:
#Windows:
!WHERE python

#Mac/Linux:
#!which python

C:\Users\davidm\AppData\Local\anaconda3\envs\OCLC_API\python.exe
C:\Users\davidm\AppData\Local\anaconda3\python.exe
C:\Users\davidm\AppData\Local\Microsoft\WindowsApps\python.exe
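
If you'd like one more check that doesn't depend on your operating system, here is a minimal sketch. It assumes that conda has recorded the active environment's name in the `CONDA_DEFAULT_ENV` environment variable, which it normally does when an environment is activated; if the variable isn't set, the code simply prints `None`.

```python
import os
import sys

# sys.prefix is the folder that holds the Python installation currently in use;
# inside a conda environment, it is that environment's folder
print(sys.prefix)

# Assumption: conda stores the active environment's name in this variable.
# .get() returns None (instead of raising an error) when the variable is not set.
active_env = os.environ.get('CONDA_DEFAULT_ENV')
print(active_env)
```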


## What is a pandas DataFrame?

Pandas (from "panel data") is a Python library (a collection of modules) that extends Python's basic capabilities by adding support for tabular data.

It's like a spreadsheet, but you don't click in cells and type into it; you use functions to manipulate the data it contains.

Though individual cells are more difficult to edit in a DataFrame than in an application like Excel, the platform lends itself well to making sweeping edits quickly. This is a huge advantage when you want to clean data or engineer new features in a dataset.

Pandas has excellent integration with statistical functions, so it's easy to get summary statistics for an entire dataset.

Pandas is used so often by Python programmers that they abbreviate it to `pd` so they don't have to type four extra letters each time they use it.


In [4]:
import pandas as pd

df = pd.read_excel('OCLC_New_Titles_May_2024.xlsx', index_col=0)
df.head()

Unnamed: 0,oclcNumber,title,titleInfo,creator,contributors,isPrimary,relatorCodes,date,machineReadableDate,language,...,specificFormat,publisher,publicationPlace,isbns,subjectsText,id,holdsItem,dateHoldingSet,peerReviewed,citationUrl
0,3010303,"The white angel of the world, that foretells t...","The white angel of the world, that foretells t...",Samuel W. Small,"['Samuel W. Small', 'Charles Morris']","[True, False]","['', '']",[©1891],1891,eng,...,PrintBook,Peerless Pub. Co,"Philadelphia, Pa.",,"Temperance, Tempérance",5640,True,2024-05-31T23:52:01.000+0000,N,
1,30403332,"Temperance recollections. Labors, defeats, tri...","Temperance recollections. Labors, defeats, tri...",John Marsh,['John Marsh'],[True],[''],1866,1866,eng,...,PrintBook,C. Scribner & Co.,New York,,"Temperance, Tempérance",5640,True,2024-05-31T23:40:05.000+0000,N,
2,1141833,The ribbon workers,The ribbon workers,James M. Hiatt,['James M. Hiatt'],[True],[''],1878,1878,eng,...,PrintBook,J.W. Goodspeed,Chicago,,"Temperance Biography, Tempérance Biographies,...",5640,True,2024-05-31T23:27:42.000+0000,N,
3,939385837,미국 오리건 대학교 조던 슈니처 박물관 소장 한국 문화재 = Korean art c...,미국 오리건 대학교 조던 슈니처 박물관 소장 한국 문화재 = Korean art c...,Jordan Schnitzer Museum of Art,[' '],[False],[''],2015,2015,kor,...,PrintBook,국립 문화재 연구소,Taejŏn Kwangyŏksi,"['9788929907044', '8929907040']","Jordan Schnitzer Museum of Art Catalogs, Jorda...",5640,True,2024-05-31T23:23:43.000+0000,N,
4,1296680,Autobiography of the first forty-one years of ...,Autobiography of the first forty-one years of ...,Sylvanus Cobb,['Sylvanus Cobb'],[True],[''],1867,1867,eng,...,PrintBook,Universalist Publishing House,Boston,"['0524073090', '9780524073094']","Cobb, Sylvanus, 1798-1866",5640,True,2024-05-31T23:23:04.000+0000,N,


## OCLC - New Titles in a DataFrame

This DataFrame contains data from the OCLC New Titles API for the month of May, 2024.

The other notebook in this folder, OCLC_API_base.ipynb, will query the API, then convert the data to .xlsx format and export it for the dates you select. (Select the dates by entering them in the cell near the top of the notebook labeled "EDIT THIS CELL:".)

You don't need to know how Python works to run the other notebook. All you have to do is pick the start and end date for your list of new titles. (A maximum of 1 month is recommended; the script can take 15 to 25 minutes to retrieve data for ~30,000 titles.)

# Part I - Python Essentials


## What is the most important rule to remember when writing code in Python?

**COMMENT YOUR CODE.**

`# Comments` (which start with an octothorpe) and `"""docstrings"""` (which are wrapped in triple quotation marks, either single or double) provide context for people who don't know what your code does.

*That includes future you.* Seriously. Document your code so you'll be able to pick up the trail if you have to pause or reopen a project.
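
Here is a minimal sketch of both kinds of documentation in one place. The function `format_date_range` is a hypothetical name invented for this illustration (functions are covered in more detail later); the point is only to show where a docstring and a comment go.

```python
def format_date_range(start_date, end_date):
    """Return a readable description of a date range.

    Both arguments are expected to be strings such as '2024-05-01'.
    """
    # Join the two dates with the word "to" between them
    return start_date + " to " + end_date


# The docstring is stored on the function itself, and is what help() displays
print(format_date_range.__doc__)
print(format_date_range('2024-05-01', '2024-05-31'))
```

Comments are ignored when the code runs, while docstrings travel with the function, so other people (and tools like `help()`) can read them without opening your source file.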

## Data Types

Python's built-in data types are as follows (the grayed-out types won't be relevant for this project):

Text: str (string)

Numeric: int (integer), float (floating-point number), <span style="color:#BABABA">complex (complex numbers, which have a real and an imaginary part)</span>

Sequences: list, tuple, range

Mapping: dict (dictionary)

Set: set, <span style="color:#BABABA">frozenset (immutable set)</span>

Boolean: bool (True and False)

<span style="color:#BABABA">Binary: bytes, bytearray, memoryview</span>

None: NoneType (its only value is `None`)


## Strings

For now, we'll hold off on discussing integers and floats (decimal numbers), but strings are very important to this project, as we'll be using them for our API parameters, as components of a URL, and also to access keys in dictionaries.

Strings can be combined using the "+" operator. Say we have two string variables that denote the start date and end date of a process, and we want to print them out in a coherent sentence. We can do so like this:

In [5]:
start_date = '2024-05-01'
end_date = '2024-05-31'

full_string = start_date + " to " + end_date

print(full_string)

2024-05-01 to 2024-05-31
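
Another common way to build the same sentence is an f-string (a "formatted string literal", available in Python 3.6 and later). The sketch below is an alternative, not something the workshop script requires, and `title_count` is a made-up value used only for illustration.

```python
start_date = '2024-05-01'
end_date = '2024-05-31'

# Expressions inside the curly braces are replaced with their values
full_string = f"{start_date} to {end_date}"
print(full_string)

# "+" only joins strings to other strings, so numbers must be converted first
title_count = 30000   # hypothetical value
summary = "Titles retrieved: " + str(title_count)
print(summary)
```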


Strings can contain any kind of character, but they have to be opened and closed by the same kind of quotation mark, either single or double. If a string contains one of these characters, such as an apostrophe in a contraction, the other kind must be used to open and close it, otherwise the code won't run correctly.

In [6]:
string1 = "This doesn't work unless you use double-quotes to open and close the string."
print(string1)

This doesn't work unless you use double-quotes to open and close the string.


### Coding Exercise

Try replacing the double-quotes around the string with single-quotes:

In [7]:
################################################################################
################################################################################

string2 = "This doesn't work unless you use double-quotes to open and close the string."

################################################################################
################################################################################

This should generate the following error message: "`SyntaxError: unterminated string literal (detected at line 4)`"

### Error Messages Are Your Friends!

No, really!

This may seem counterintuitive, but two of the best reasons to learn Python as opposed to other programming languages are that its error handling system is very robust, and that *many other programmers have already made the same mistakes as you.*

If you run your code and get an error message, you can copy-paste the last line of it (minus any sensitive information) into a search engine and you will likely find an example of someone else who's gotten the same (or very similar) error message, and asked about it on a forum.

Great sources to check for this sort of thing include [StackOverflow](https://stackoverflow.com/) and [GitHub](https://github.com/). Reddit works too, if you can ignore the usernames.

Remember, if you get an error message, change your code, and get a new error message, that's progress!

### ** Escape Characters

"`\`" is called an "escape character" in Python (and in markdown cells). Placing an escape character before another character in a string will cause a different behavior from the character by itself. In Python strings, "\\t" represents a tab, and "\\n" represents a newline character. 

Also, in order for a "\\" to show up correctly in markdown cells, it has to have another \\ in front of it.

Double-click in this cell to see how many backslashes there actually are in the markdown text.

Notice that the backslash inside backticks (grave accents) doesn't require a leading backslash.

In [8]:
print('line 1\n\n\tline 2 (tab-indented)')

line 1

	line 2 (tab-indented)
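
Escape characters also offer a second way around the quotation mark problem from earlier. The following sketch shows an escaped apostrophe and a "raw" string; the file path in it is hypothetical and only there to show the backslashes.

```python
# A backslash lets an apostrophe appear inside a single-quoted string
escaped = 'This doesn\'t need double-quotes.'
print(escaped)

# A raw string (prefix r) turns off escape processing,
# which is handy for Windows-style paths
raw_path = r'C:\new_folder\titles.xlsx'   # hypothetical path
print(raw_path)

# '\n' is a single newline character; r'\n' is a backslash followed by an n
print(len('\n'), len(r'\n'))
```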


## Objects

An "object" in Python is kind of like a noun (if you're thinking about it in grammatical terms, an "object" can be either a subject or an object). It's a thing that exists in the virtual space of the coding environment.

A text string is an example of an object, and so is an integer. So are lists.

The following code defines the variable t as the string of characters "text string", and sets the variable z to be an integer with a value of 50, then puts them in a list called "stuff". The variables "t", "z", and "stuff" are all objects.

Everything\* in Python is an object, which is not true of all programming languages.


*\*nearly everything... there are a couple of exceptions, but they're well outside the scope of this project.*

In [9]:
t = 'text string'
z = 50
stuff = [t, z]

In [10]:
stuff

['text string', 50]

In [11]:
z

50

## Classes

`Classes` are blueprints/templates for objects. Individual objects are "instances" of Classes. When you define a variable as a string in Python, you are making an instance of the string class.
You can tell what Class an object is by using the `type()` function around the object.

In [12]:
x = 'PBJ sandwich'
type(x)

str

In [13]:
type(df)

pandas.core.frame.DataFrame

### ** Dynamic Typing

While many other programming languages require the user to declare what type a new variable is as it's created, Python's approach is more laissez-faire. Python works out the type of each object from the value assigned to it when the code runs, and the same variable name can later be reassigned to an object of a different type.

A related idea is sometimes called "duck typing", as in "if it walks like a duck, and it quacks like a duck, then it's probably a duck": Python code usually doesn't check what class an object belongs to, only whether the object has the methods being called on it.

This is useful in that objects in Python are a lot more flexible than in some other languages; you can iterate over most of the container objects the same way, and Python will "know" what to do with them. The downside is that if a function receives input in a format it's not designed for, there can sometimes be unexpected results.

In [14]:
class Duck:

    # __init__() sets attributes of a new class instance
    def __init__(self, name):
        self.name = name
        
    # define class methods swim and fly
    def swim(self):
        print("{} the duck is swimming.".format(self.name))

    def fly(self):
        print("{} the duck is flying.".format(self.name))

class Whale:
    def __init__(self, name):
        self.name = name
        
    #define class method swim
    def swim(self):
        print("{} the whale is swimming.".format(self.name))

In [15]:
ferdinand = Duck('Ferdinand')
ferdinand.swim()

Ferdinand the duck is swimming.


In [16]:
george = Whale('George')
george.swim()

George the whale is swimming.


### ** Coding Exercise

Try replacing "`.swim()`" with "`.fly()`":

In [17]:
################################################################################
################################################################################

george.swim()

################################################################################
################################################################################

George the whale is swimming.
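
To see duck typing at work, here is a small sketch. It redefines `Duck` and `Whale` so that it runs on its own, with one change made for illustration: the methods return their text instead of printing it. The function `send_swimming` is a hypothetical helper that never checks what class it has been given.

```python
class Duck:
    def __init__(self, name):
        self.name = name

    def swim(self):
        return "{} the duck is swimming.".format(self.name)

    def fly(self):
        return "{} the duck is flying.".format(self.name)


class Whale:
    def __init__(self, name):
        self.name = name

    def swim(self):
        return "{} the whale is swimming.".format(self.name)


def send_swimming(animal):
    # No type check here: any object with a .swim() method will work
    return animal.swim()


for animal in [Duck('Ferdinand'), Whale('George')]:
    print(send_swimming(animal))

# hasattr() reports whether an object has a given method or attribute
print(hasattr(Duck('Ferdinand'), 'fly'))
print(hasattr(Whale('George'), 'fly'))
```

Because `Whale` has no `.fly()` method, calling it raises an `AttributeError`, which is the error the exercise above is designed to produce.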


## Functions

A function in Python is a kind of object, specifically an object that is a tool used for a single purpose. 

Functions in Python are blocks of code that only run when they are "called".

Functions may have inputs, and may also produce outputs. In Python, functions are defined using the `def function_name():` syntax.

When defining a function, the variables within the parentheses are referred to as "parameters". When the function is called, the values that are actually passed to the function via these parameters are referred to as "arguments".

To call a function, type its name appended with parentheses that contain whatever arguments are supposed to be passed to it, like `function_name(arg1, arg2)`

In [18]:
def hello_world():
    print('Hello world!')

def print_yelling(text):
    print(text.upper())

In [19]:
hello_world()

Hello world!


In [20]:
print_yelling('Hello world!')

HELLO WORLD!


### The `return` Keyword

Printing the output is fine for human users who just want to read it, but it's generally more useful to have the function give an output by means of a `return` statement. If you want to print the output, you can store the output as a variable, and then print it.

In [21]:
def hello_world_return():
    return 'Hello world!'

greeting = hello_world_return()

print(greeting)

Hello world!
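
Parameters, arguments, and `return` can all be seen together in the sketch below. The function `make_filename` is a hypothetical example (it is not part of the workshop script) that builds a file name from its inputs.

```python
def make_filename(prefix, month, extension=".xlsx"):
    # prefix, month, and extension are parameters;
    # extension has a default value, so passing it is optional
    return prefix + "_" + month + extension


# "OCLC_New_Titles" and "May_2024" are the arguments passed in this call
filename = make_filename("OCLC_New_Titles", "May_2024")
print(filename)   # OCLC_New_Titles_May_2024.xlsx

# Arguments can also be passed by name ("keyword arguments")
print(make_filename("OCLC_New_Titles", "May_2024", extension=".csv"))
```

Since the function returns its result rather than printing it, the value can be stored in a variable and reused elsewhere in your code.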


### ** Built-in Functions

[Python Documentation on Built-In Functions](https://docs.python.org/3/library/functions.html)

### Methods

Did you catch that `.upper()` in the `print_yelling` example above?
That's an example of a method, which is a function specific to a class. When Classes are constructed, they may have methods defined, which can later be accessed using the syntax "`object.method()`". Unlike regular functions, methods are always called on an object of their respective Class.

In this case, `.upper()` is a method for the string Class (str) that converts all the alphabetical characters in the string to uppercase while ignoring other kinds of characters.

The original string is unchanged, but calling the `.upper()` method returns an altered version of the string. `.lower()` works the same way, for lowercase. There are [many other built-in methods for strings](https://www.w3schools.com/python/python_ref_string.asp) besides these two.

In [22]:
p = '101 Dalmatians'
p.upper()

'101 DALMATIANS'

In [23]:
s = 'Lentil Soup - Cup: $5.99, Bowl: $9.99'
s.upper()

'LENTIL SOUP - CUP: $5.99, BOWL: $9.99'

In [24]:
f = 'RoSeS, vIoLeTs, ChRySaNtHeMuMs'
f.lower()

'roses, violets, chrysanthemums'
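
A few more string methods are sketched below, mainly to show that each one returns a new string (or list) and leaves the original untouched. The title is borrowed from the sample data; the extra spaces around it were added for illustration.

```python
title = '  The ribbon workers  '

# Each method returns a new object; the original string is never modified
stripped = title.strip()                       # remove leading/trailing whitespace
replaced = stripped.replace('ribbon', 'RIBBON')
words = stripped.split(' ')                    # split into a list of words

print(repr(title))
print(stripped)
print(replaced)
print(words)

# Methods can be chained, because each one returns a string
print(title.strip().upper())
```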

### Attributes

Attributes are the properties of Python objects. Like methods, they are specific to the class of an object.

They can also be accessed with a dot, but unlike methods, they do not use parentheses. (`object.attribute`)

In [25]:
x = 2
x.__class__

int
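
The difference between attributes and methods may be easier to see on a class of our own. `Book` below is a hypothetical class invented for this sketch; its `title` and `year` are attributes, and `describe()` is a method.

```python
class Book:
    def __init__(self, title, year):
        # Values stored on self become attributes of the instance
        self.title = title
        self.year = year

    def describe(self):
        return "{} ({})".format(self.title, self.year)


book = Book('The ribbon workers', 1878)

print(book.title)        # attribute: no parentheses
print(book.year)         # attribute: no parentheses
print(book.describe())   # method: parentheses
```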

## `dir`

The built-in function `dir()` returns a list of the names of an object's attributes and methods.

As illustrated below, even the lowliest integers have many attributes and methods associated with them.

In [26]:
dir(2)

['__abs__',
 '__add__',
 '__and__',
 '__bool__',
 '__ceil__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floor__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__le__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'as_integer_ratio',
 'bit_count',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'numerator',
 ...]
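
Most of the names in that list begin and end with double underscores, which marks them as special methods that Python uses behind the scenes. As a rough sketch, you can filter those out to see only the names meant for everyday use (the exact list depends on your version of Python). It uses a `for` loop and an `if` statement, both of which are explained further on.

```python
# Keep only the names that do not start with an underscore
public_names = []
for name in dir(2):
    if not name.startswith('_'):
        public_names.append(name)

print(public_names)
```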

## Container Objects

Of the built-in data types, four of them are designed to contain other objects.

List: an array that contains elements in a specific order. `[ ]`

Set: an unordered collection with no duplicate elements. `{ }`

Tuple: a sequence of elements, immutable (cannot be changed once defined.) `( )`

Dictionary: a collection of "key: value" pairs. Keys must be unique. `{ }`

All container objects by definition have a special method `.__contains__()`, which is used internally (for example, by the `in` operator) to check whether the object has another object in it.

Specific objects within some container objects may be retrieved by supplying an index or key. This is called "subscripting". Lists and tuples are subscriptable by means of providing an index number (e.g.: `some_list[0]`) whereas dictionaries are subscriptable via providing a key (`some_dictionary['some_key']`).

Sets are not subscriptable.

DataFrames are more complex, specialized container objects. We'll cover them in greater detail later.

In [27]:
('red', 8)

('red', 8)

### Coding Exercise

Try using different arguments in square brackets at the end of the following container objects:

In [28]:
sample_list = [1, 2, 3, 4]
sample_tuple = ('red', 8)
sample_dict = {'home_team':12,'away_team':9}
sample_set = {'a', 'b', 'c'}  # note: {} on its own creates an empty dict, not a set
empty_list = []

################################################################################
################################################################################

# Example:
sample_list[0]

################################################################################
################################################################################

1
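
If you'd like a starting point for the exercise, the sketch below runs through subscripting on each container type, plus the `in` operator for checking membership. The contents of `sample_set` are an assumption made for this example.

```python
sample_list = [1, 2, 3, 4]
sample_tuple = ('red', 8)
sample_dict = {'home_team': 12, 'away_team': 9}
sample_set = {'a', 'b', 'c'}   # example contents, chosen for illustration

print(sample_list[0])            # first element (index numbers start at 0)
print(sample_list[-1])           # negative indexes count from the end
print(sample_tuple[1])           # tuples use index numbers too
print(sample_dict['away_team'])  # dictionaries use keys

# "in" checks membership, and works on sets as well
print(3 in sample_list)
print('a' in sample_set)
print('home_team' in sample_dict)   # for dictionaries, "in" checks the keys

# An empty set is written set(), because {} creates an empty dictionary
print(type(set()), type({}))
```

Trying `sample_set[0]` raises a `TypeError`, since sets have no order and therefore no index numbers.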

## `for` Loops

Container objects are handy for several reasons, but probably the most important of those reasons is that they can be iterated over. This is the cornerstone of automation in Python, as it lets the user instruct Python to repeat a task for every item in a defined range (a list, a set, a tuple, the values in a dictionary, or a column in a DataFrame.)

In [29]:
p = ['a', 'b', 'c']
for item in p:
    print(item)

a
b
c


In [30]:
p = ['a', 'b', 'c']
q = []
for item in p:
    q.append(item.upper())

q

['A', 'B', 'C']

Within a `for` loop, a temporary variable is created to reference each object in the container. One by one, the objects in the container object are stored in this variable. In the previous examples, we used the word "item" for that temporary variable. 'Item' and 'element' are fairly conventional names for this temporary variable, but they can often be too generic.

You should usually consider using other words* which convey more meaning to whoever might be reading your code.

This can be especially important when you're dealing with a specific kind of data and you want to reduce the level of abstraction necessary to understand the code.

\* *NB: There are a few words in Python that you should never use for variable names because they have specific predefined meanings, but we'll cover those later.*

In [31]:
letter_list = ['a', 'b', 'c']
for letter in letter_list:
    print(letter)

a
b
c


## List Comprehensions

`for` loops are useful, but sometimes they're a bit clunky. When you have to process every value in a column of a DataFrame, it is often more concise to use what's called a list comprehension.

List comprehensions are formatted:

`[function(x) for x in container_object]`

or

`[x.method() for x in container_object]`

The output of this code is (as the external square brackets may indicate) a list object.
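
As a quick sketch of how the two forms compare, the code below builds the same list twice, once with a `for` loop and once with a list comprehension. The last example adds an optional `if` clause, which filters items; the word list is made up for illustration.

```python
letters = ['a', 'b', 'c']

# for-loop version
upper_loop = []
for letter in letters:
    upper_loop.append(letter.upper())

# list comprehension version: same result in one line
upper_comp = [letter.upper() for letter in letters]

print(upper_loop)
print(upper_comp)

# A function call works the same way, and an optional "if" filters items
lengths = [len(word) for word in ['spam', 'eggs', 'ham'] if word != 'ham']
print(lengths)
```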

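## Dictionaries in `for` Loops

Looping over a dictionary gives you its keys, not its values. The example below calls the `.keys()` method to make that explicit (looping over the dictionary itself has the same effect), then uses each key to look up the corresponding value.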

In [32]:
dictionary_object = {'key1':'value1','key2':'value2'}

for key in dictionary_object.keys():
    print(dictionary_object[key])

value1
value2
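
Besides `.keys()`, dictionaries offer a couple of other ways to loop, sketched below. Which one reads best depends on whether you need the keys, the values, or both.

```python
dictionary_object = {'key1': 'value1', 'key2': 'value2'}

# Looping over the dictionary itself yields its keys
for key in dictionary_object:
    print(key)

# .values() yields the values directly
for value in dictionary_object.values():
    print(value)

# .items() yields (key, value) pairs, unpacked here into two variables
pairs = []
for key, value in dictionary_object.items():
    pairs.append(key + ' -> ' + value)

print(pairs)
```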


## `while` Loops

`for` loops are great for looping over container objects, but what if you don't know how many items have to be processed by a task before you start it?

`while` loops are another loop construction in Python. They instruct Python to keep repeating a task for as long as a condition remains true; the loop stops as soon as the condition is checked and found to be false.

In [33]:
def squares(x):
    n = 1
    while n <= x:
        print(n**2)
        n += 1

squares(9)

1
4
9
16
25
36
49
64
81
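
The `squares` example could also have been written with a `for` loop, since we know in advance how many numbers to print. Here is a sketch of a case where the number of repetitions isn't known ahead of time; `count_halvings` is a hypothetical function written for this illustration.

```python
def count_halvings(number):
    # Count how many times a number can be halved before it drops below 1
    steps = 0
    while number >= 1:
        number = number / 2
        steps += 1
    return steps


print(count_halvings(100))
print(count_halvings(0.5))
```

One thing to watch for: if the condition never becomes false, a `while` loop will run forever, so make sure something inside the loop moves it toward its stopping point.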


## Boolean Variables and Comparison Operators

In very broad terms, Boolean algebra (named after its creator George Boole) is a branch of mathematics that deals with true and false statements, and the comparison thereof.

In Python, there are two Boolean values, `True` and `False`. These are used in conjunction with comparison operators to change what code does based on particular conditions.

The comparison operators in Python are "equals" (`==`), "does not equal" (`!=`), "greater than" (`>`), "less than" (`<`), "greater than or equal to" (`>=`) and "less than or equal to" (`<=`).

Python also has a set of Boolean operators, `and`, `or`, and `not` that can combine or alter the behavior of these comparison operators.

\* *Note that the "equals" operator must be constructed with two equals signs, as in Python, assigning a variable uses a single equals sign.*

In [34]:
True == True

True

In [35]:
False == False

True

In [36]:
True == False

False
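
Comparisons become more useful once real values are involved. The sketch below compares a year and a language code (values borrowed from the sample data) and combines the results with `and`, `or`, and `not`.

```python
year = 1891
language = 'eng'

print(year < 1900)                        # True
print(year >= 1900)                       # False
print(language == 'eng' and year < 1900)  # True: both sides are true
print(language == 'kor' or year < 1900)   # True: at least one side is true
print(not language == 'eng')              # False: "not" flips the result

# "=" assigns a value; "==" compares two values
is_old = year < 1900
print(is_old)
```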

## ** Booleans and Other Values

Objects in Python that are *not* Boolean values may still be evaluated as if they are. `1` and `0` compare equal to `True` and `False`, respectively.

`3` does not *equal* `True` like `1` does, but if you use the `bool()` built-in function, any number except `0` passed as an argument will evaluate to `True`.

Most container objects will also evaluate to `True` if passed to `bool()`, provided they are not empty.

In [37]:
1 == True

True

In [38]:
0 == False

True

In [39]:
True + True

2

### ** Coding Exercise

Try replacing True with False:

In [40]:
################################################################################
################################################################################

1 / True

################################################################################
################################################################################

1.0

In [41]:
True > False

True

In [42]:
3 == True

False

In [43]:
bool(3)

True

In [44]:
bool(0)

False

In [45]:
bool([])

False

In [46]:
bool([0])

True
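
In practice, this behavior is mostly used to check whether a container has anything in it. The sketch below is one way to do that; `describe_results` is a hypothetical function, and the `if`-`else` syntax it uses is explained in the next section.

```python
def describe_results(results):
    # An empty list counts as False, so no explicit length check is needed
    if results:
        return "{} result(s) found".format(len(results))
    else:
        return "no results"


print(describe_results(['The ribbon workers']))
print(describe_results([]))

# Empty strings and None also evaluate to False
print(bool(''), bool('text'), bool(None))
```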

## "Control Flow" Statements

There are several Python keywords that have the ability to change the way code works based on conditions. You may hear these statements (which include `if`-`elif`-`else` as well as `for`, `while`, and `try`-`except`) referred to more generally as ["control flow"](https://docs.python.org/3/tutorial/controlflow.html) statements.

## If-Elif-Else Statements

You may have learned about "if-then" statements in a class on symbolic logic, or you may have encountered them in Excel formulas. 

Python's syntax is somewhat idiosyncratic, and allows for multiple courses of action based on different conditions. No 'then' is required, just a colon and an indent.

Python if statements almost always use comparison operators, although sometimes the comparison is implicit.

Python expects an `if` for the first condition, followed by optional `elif` statements that specify additional conditions. Finally, an optional `else` covers the case where none of the conditions are met.
If you don't want your code to do anything when the condition in the `if` statement isn't met, you don't have to write an `elif` or `else` statement, but it can help make things clearer to whoever reads your code.

In [47]:
cabbage = 'pink'

if cabbage == 'purple':
    print("pH 7 (neutral solution)")
elif cabbage == 'blue':
    print("high pH (basic solution)")
elif cabbage == 'pink':
    print("low pH (acidic solution)")
else:
    print("That's not the right kind of cabbage!")

low pH (acidic solution)
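
The "implicit comparison" mentioned above can be sketched as follows. The list name is hypothetical; the point is that an `if` statement treats an empty container as `False` without any comparison operator being written out:

```python
# Hypothetical list of search results; an empty list counts as False
new_titles = []

if new_titles:
    message = "Found {} new titles".format(len(new_titles))
else:
    message = "No new titles found"

print(message)
# No new titles found
```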


Another example: say there is a task you want to automate. You want part of it to happen every day, and another part of it to happen every Tuesday, and another part to happen every Wednesday. Rather than writing three scripts and scheduling them separately, you could use an if-elif-else block to check the day of the week, so one script can do all three tasks.

In [48]:
from datetime import datetime

# "weekday()" returns the day of the week as an integer; 0 is Monday, 6 is Sunday

weekday = datetime.today().weekday()

if weekday == 1:
    print('Weekly on Tuesday')
elif weekday == 2:
    print('Weekly on Wednesday')
else:
    pass

print('Daily')

Weekly on Tuesday
Daily


## Try-Except Blocks

When Python encounters an error, it stops executing. This is usually a problem.

Along similar lines to if-elif-else statements, if you know your code will fail some proportion of the time due to *foreseen* circumstances, you can build in a try-except block so the code will acknowledge an error but not stop executing because of it.

Say you expect a certain data type as an input, but you don't want your code to break entirely if a user enters the wrong kind of input. You can use a try-except block to look for a `TypeError`. Your code won't break, and it can execute a back-up plan instead.

In [49]:
def number_plus_one(n):
    try:
        return n + 1
    except TypeError:
        print("{} is not a number! You can't add one to it!".format(str(n)))

In [50]:
number_plus_one(2)

3

In [51]:
number_plus_one(str([]))

[] is not a number! You can't add one to it!


In [52]:
number_plus_one(str('Horse'))

Horse is not a number! You can't add one to it!
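
The same pattern works for other kinds of errors. As a minimal sketch (the function name and sample entries are invented for illustration), the built-in `int()` raises a `ValueError` when given a string that doesn't look like a whole number, so a try-except block can skip bad entries and keep going:

```python
def to_whole_number(text):
    """Convert text to an integer, or return None if it can't be converted."""
    try:
        return int(text)
    except ValueError:
        print("{!r} can't be converted to a whole number; skipping it.".format(text))
        return None

# Made-up entries, as if typed in by a user
entries = ['12', 'seven', '40']
numbers = [to_whole_number(entry) for entry in entries]

print(numbers)
# [12, None, 40]
```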


## Functions (Again?!)

Now that we've touched upon a few more concepts, let's revisit functions.
The value returned from a function can be turned into a variable, but it can also be passed on directly as an argument to another function.

In this way, functions can work together in a sort of assembly line:

In [54]:
def separate_words(sample_string, delimiter=' '):
    words = sample_string.split(delimiter)
    return words

def add_elipses(sample_string):
    return sample_string + '...'

def join_words(sample_list, delimiter=' '):
    title = delimiter.join(sample_list)
    return title

input_string = 'Please speak more slowly'
join_words([add_elipses(word) for word in separate_words(input_string)])

'Please... speak... more... slowly...'

### Coding Exercise

Try using each function (`separate_words()`, `add_elipses()`, `join_words()`) by itself, using the same input string.

Did you expect that to happen?


In [55]:
################################################################################
################################################################################

separate_words(input_string)

################################################################################
################################################################################

['Please', 'speak', 'more', 'slowly']

## Reserved Keywords and Named Objects

As of Python 3.11, there are 35 reserved keywords, protected terms that cannot be used for anything other than their predefined function.

It is important to be mindful of other variables in one's namespace when defining new variables, functions, or other objects in Python, whether or not the terms in question are on this list.

Typing `help('keywords')` into a code cell or the console will show the list of reserved keywords.

In [56]:
help('keywords')


Here is a list of the Python keywords.  Enter any keyword to get more help.

False               class               from                or
None                continue            global              pass
True                def                 if                  raise
and                 del                 import              return
as                  elif                in                  try
assert              else                is                  while
async               except              lambda              with
await               finally             nonlocal            yield
break               for                 not                 



However, there are other predefined names in Python that should not be used to name new objects. For example, one might be tempted to use the word "range" as the name of a variable for a list that contains a range of things; however, `range` is already the name of a built-in object in Python, even though it isn't on the list of reserved keywords. Reusing the name hides the built-in wherever your new variable is in scope.
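
Here is a small sketch of what goes wrong when a built-in name is reused. The function name is made up, and the reuse is kept inside a function so that the built-in `range` is unaffected everywhere else:

```python
def shadowing_demo():
    # Inside this function, "range" now refers to a list, not the built-in
    range = [1, 2, 3]
    try:
        return list(range(3))
    except TypeError as error:
        return "TypeError: {}".format(error)

result = shadowing_demo()
print(result)
# TypeError: 'list' object is not callable
```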

# Part II - Advanced Functions, JSON Data, APIs

## What is an API?

Application Programming Interfaces (APIs) are the means by which different software programs share information with each other.

The current usage of the term refers primarily to Web APIs, which are APIs that deal with interactions between client devices and web servers, in the form of an HTTP (HyperText Transfer Protocol) request.

Responses to these requests are typically given in the form of structured data, often in JSON (JavaScript Object Notation) or XML (eXtensible Markup Language).

Many websites have their own APIs, usually with proprietary specifications that dictate the kinds of requests their servers will accept.

OCLC has many individual APIs, each with its own purpose and distinct syntax; the New Titles API is one such example.

## Requests and OAuthLib

Python has several ways of performing HTTP requests, including the standard library's `urllib` and the third-party `requests` and `requests-oauthlib` packages. The most commonly used of these is `requests`. This module allows Python programmers to send HTTP requests quickly, and to retrieve data from webpages simply by passing URLs as strings.

Because we are passing our institutional credentials to OCLC, we will use the `requests-oauthlib` package. This works similarly to the `requests` module, but there are a couple of extra steps involved.

In [57]:
import requests

In [58]:
source = requests.get('https://www.wikipedia.org')

In [59]:
source.content[:1510]

b'<!DOCTYPE html>\n<html lang="en" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>Wikipedia</title>\n<meta name="description" content="Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.">\n<script>\ndocument.documentElement.className = document.documentElement.className.replace( /(^|\\s)no-js(\\s|$)/, "$1js-enabled$2" );\n</script>\n<meta name="viewport" content="initial-scale=1,user-scalable=yes">\n<link rel="apple-touch-icon" href="/static/apple-touch/wikipedia.png">\n<link rel="shortcut icon" href="/static/favicon/wikipedia.ico">\n<link rel="license" href="//creativecommons.org/licenses/by-sa/4.0/">\n<style>\n.sprite{background-image:linear-gradient(transparent,transparent),url(portal/wikipedia.org/assets/img/sprite-de847d1a.svg);background-repeat:no-repeat;display:inline-block;vertical-align:middle}.svg-Commons-logo_sister{background-position:0 0;width:47px;height:47px}.svg-MediaWiki-logo_sister

In [60]:
from requests_oauthlib import OAuth2Session

oauth = OAuth2Session()

In [61]:
source2 = oauth.get('https://www.wikipedia.org')

In [62]:
source2.content[:1510]

b'<!DOCTYPE html>\n<html lang="en" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>Wikipedia</title>\n<meta name="description" content="Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.">\n<script>\ndocument.documentElement.className = document.documentElement.className.replace( /(^|\\s)no-js(\\s|$)/, "$1js-enabled$2" );\n</script>\n<meta name="viewport" content="initial-scale=1,user-scalable=yes">\n<link rel="apple-touch-icon" href="/static/apple-touch/wikipedia.png">\n<link rel="shortcut icon" href="/static/favicon/wikipedia.ico">\n<link rel="license" href="//creativecommons.org/licenses/by-sa/4.0/">\n<style>\n.sprite{background-image:linear-gradient(transparent,transparent),url(portal/wikipedia.org/assets/img/sprite-de847d1a.svg);background-repeat:no-repeat;display:inline-block;vertical-align:middle}.svg-Commons-logo_sister{background-position:0 0;width:47px;height:47px}.svg-MediaWiki-logo_sister

In [63]:
source.content == source2.content

True

## Sometimes It *DOES* Hurt to Ask

You may have heard the term "[DoS (denial-of-service) attack](https://en.wikipedia.org/wiki/Denial-of-service_attack)". This occurs when a website is flooded with so many HTTP requests that it can no longer stay operational.

Even if you don't *intend* to perform a DoS attack, you may accidentally write code that generates too many requests in a short span of time.

Most websites safeguard against DoS attacks by limiting how many requests may be made from a specific IP address within a given time period. If you send too many requests to a given site, access to it from your current IP address may be suspended or revoked. Also, most websites do not make their request limits public.

Larger-scale websites like wikipedia.org are less likely to shut down your requests, since they already expect a high volume of traffic. 

For smaller sites, one can avoid sending too many requests all at once by spacing them out with the `sleep()` function in Python's `time` module.

In [64]:
#Without pauses:
for number in sample_list:
    print(number)

1
2
3
4


In [65]:
#With pauses:
from time import sleep

for number in sample_list:
    print(number)

    # Wait a second!
    sleep(1)

1
2
3
4
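
The same idea can be wrapped in a reusable function. This is only a sketch: `fetch_politely` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real request function (such as `requests.get`) so the example runs without touching the network:

```python
from time import sleep

def fetch_politely(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between consecutive calls."""
    responses = []
    for position, url in enumerate(urls):
        if position > 0:
            # Pause before every request except the first
            sleep(delay_seconds)
        responses.append(fetch(url))
    return responses

# Stand-in for a real request function such as requests.get
def fake_fetch(url):
    return "response from " + url

pages = fetch_politely(
    ['https://example.org/a', 'https://example.org/b'],
    fake_fetch,
    delay_seconds=0.1,
)
print(pages)
```

Passing the request function in as an argument keeps the pausing logic separate from whichever library actually sends the requests.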


## OCLC's Traffic?

How many users can access OCLC's API at the same time? We don't know. Hopefully we won't find out.

OCLC is *probably* widely-used enough that a small group of people on one campus using the API concurrently won't trigger a shutdown, but it's still better if we don't all run the OCLC_New_Titles_API notebook at once.

## OCLC New Titles API

The OCLC New Titles API may be found here:

https://developer.api.oclc.org/new-titles-api

Click on "GET /new-titles". Here, the parameters for the API are specified, as well as success/error HTTP responses, which depend on whether the API query is correctly formatted and whether the user has permission to access it.

Below "GET /new-titles", under "Schemas", the types of data the query retrieves are outlined.

The information on this page will help us build our function.

## JSON Objects

The OCLC API returns the new titles data in the form of a JSON object.

Whereas DataFrames are very similar to spreadsheets (tabular data in a neat columns-for-fields, rows-for-values configuration), JSON data is not as straightforward.

On the surface level, a JSON object looks a lot like a Python dictionary (curly braces, key-value pairs).

The key differences between a Python dictionary and a JSON object (before it is read by the `json` module) are:

* Instead of `True`, `False`, and `None`, JSON uses `true`, `false`, and `null`.
* All strings are formatted with double quotes (never single quotes).
* Keys must be strings (they cannot be integers, floats, etc.).
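
A minimal sketch of these differences, using Python's built-in `json` module. The record is made up for illustration and is not actual OCLC output:

```python
import json

# A small, made-up JSON document stored as a Python string
json_text = '{"title": "Sample Title", "available": true, "holds": null, "copies": 2}'

# json.loads() converts JSON text into Python objects
record = json.loads(json_text)
print(record)
# {'title': 'Sample Title', 'available': True, 'holds': None, 'copies': 2}

# json.dumps() goes the other way; note that the integer key becomes a string
print(json.dumps({1: 'one', 'flag': False}))
# {"1": "one", "flag": false}
```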

But JSON doesn't usually stop at the surface level…


Can you make visual sense of this at first glance?

JSON objects are dictionaries that contain other dictionaries, as well as lists of dictionaries. The resulting "nested" data is a great format for web developers; JSON objects are flexible containers for data and their structure lends itself well to interoperability between a wide variety of different systems and web services.

However, they're not great for comparing, aggregating, or summarizing items in a dataset. Tabular data is much better for that, so we have to find a way to convert nested JSON data into a tabular format.


## Recursive Functions

Recursive functions are functions that either directly or indirectly call themselves as a subprocess. If a recursive function takes arguments as inputs, it must be structured such that it can pass a subset of its top-level inputs on to another instance of itself as a lower-level process. It also needs a condition under which it stops calling itself; otherwise, it would never finish.

Here's a simple example of a recursive function that counts down from a positive integer to zero:

In [66]:
def countdown(start):

    print(start)

    # "remaining" avoids reusing the name of Python's built-in next() function
    remaining = start - 1
    if remaining > 0:
        countdown(remaining)
    elif remaining == 0:
        print('Done')

countdown(3)

3
2
1
Done


This function could easily be constructed with a while loop instead of recursion, but it shows how this kind of algorithm works. For more complex problems, such as dealing with nested JSON data, recursive functions can save a lot of time.
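
To give a sense of how recursion helps with nested data, here is a rough sketch that flattens nested dictionaries into a single level. The `flatten` function and the sample record are hypothetical, and this version only handles dictionaries inside dictionaries (not lists):

```python
def flatten(nested, prefix=''):
    """Return a one-level dictionary with dotted keys for nested values."""
    flat = {}
    for key, value in nested.items():
        full_key = prefix + key
        if isinstance(value, dict):
            # Recursive call: handle the inner dictionary the same way
            flat.update(flatten(value, prefix=full_key + '.'))
        else:
            flat[full_key] = value
    return flat

# Made-up record, not actual OCLC output
record = {
    'title': 'Sample Title',
    'publisher': {'name': 'Sample Press', 'place': {'city': 'Springfield'}},
}

print(flatten(record))
# {'title': 'Sample Title', 'publisher.name': 'Sample Press', 'publisher.place.city': 'Springfield'}
```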

## Generators

Functions usually send back data in the form of a return statement. Objects sent by a return statement are sent all at once. Unless they are stored in a variable, returns do not persist between calls of the function.

However, one can use `yield` in a recursive function to make sure that the results get sent back from every sub-instance of the function as they are produced. Functions that use this `yield` keyword are called generators.

For an example of a recursive function that is also a generator, let's take an example from real life (sort of). If you have a backpack full of items, and some of those items are bags that also contain items, you have to effectively perform a recursive function to retrieve all the items in the backpack.

If we were to write this out in Python, it might look something like this:

In [67]:
small_bag = ['pocket knife', 'matchbook','compass','bandages']
bag_of_trail_mix = ['peanuts','raisins','M&Ms','dates','almonds']
medium_bag = ['notebook', small_bag, 'pen', 'chapstick']

backpack = ['water_bottle', bag_of_trail_mix, medium_bag, 'hat','sunscreen']

def open_bag(bag):
    for item in bag:
        if isinstance(item, list):
            # Recursively call open_bag()
            # ("inner_item" avoids reusing the built-in name "object")
            for inner_item in open_bag(item):
                yield inner_item
        else:
            yield item

In [68]:
open_bag(backpack)

<generator object open_bag at 0x0000022CE3D52890>

Wait, where's our result? And is that an error message?

Generators don't produce their results until you tell them to. (And `<generator object open_bag at 0x0000022CE3D52890>` isn't an error; it's how Python displays a generator object, along with its memory address, which will differ each time you run the code. We don't need to cover that right at this moment.)

Fortunately, we can get a generator to dump its contents into a list by passing it as an argument to `list()`.

In [69]:
list(open_bag(backpack))

['water_bottle',
 'peanuts',
 'raisins',
 'M&Ms',
 'dates',
 'almonds',
 'notebook',
 'pocket knife',
 'matchbook',
 'compass',
 'bandages',
 'pen',
 'chapstick',
 'hat',
 'sunscreen']

Notice the order in which the items were printed out... we paused midway through the process of counting our items in order to open the smaller bags within the larger ones, then continued getting the other items.
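
To see more directly that a generator produces its results one at a time, you can ask for them individually with the built-in `next()` function. This is a small sketch with a made-up generator, separate from `open_bag()`:

```python
def count_up_to(limit):
    """Yield the whole numbers from 1 up to and including limit."""
    number = 1
    while number <= limit:
        yield number
        number += 1

counter = count_up_to(3)

first = next(counter)      # runs the function until the first yield
second = next(counter)     # picks up where it left off
remaining = list(counter)  # collects whatever has not been yielded yet

print(first, second, remaining)
# 1 2 [3]
```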

We'll be covering recursive functions and generators in greater detail in the next notebook, so don't worry if you don't fully understand them yet.

In [70]:
backpack

['water_bottle',
 ['peanuts', 'raisins', 'M&Ms', 'dates', 'almonds'],
 ['notebook',
  ['pocket knife', 'matchbook', 'compass', 'bandages'],
  'pen',
  'chapstick'],
 'hat',
 'sunscreen']

### Coding Exercise

Try using square brackets to access different items in `backpack`. For example, you'll see that `backpack[1][0]` is 'peanuts'. When dealing with nested data, like our "backpack" list or a JSON object, it is often necessary to access specific elements by using multiple layers of indices. The difference between lists and dictionaries is how these indices are labeled (numbers vs. keys).

In [71]:
################################################################################
################################################################################

backpack[1][0]

################################################################################
################################################################################

'peanuts'
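
The same layered indexing works when dictionaries and lists are mixed, as they are in JSON data. The structure below is invented for illustration and is not what the OCLC API actually returns:

```python
# Made-up data shaped like a small JSON response (not actual OCLC output)
response = {
    'count': 2,
    'titles': [
        {'title': 'First Sample', 'subjects': ['History', 'Maps']},
        {'title': 'Second Sample', 'subjects': ['Poetry']},
    ],
}

# Dictionary layers are accessed by key, list layers by position
first_title = response['titles'][0]['title']
last_subject = response['titles'][0]['subjects'][-1]

print(first_title, '|', last_subject)
# First Sample | Maps
```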

# Part III - Too Much Python?

*"A little learning is a dang'rous thing; / Drink deep, or taste not the Pierian spring."*

\- Alexander Pope

If you go through the Just_Enough_Python_Slideshow notebook or OCLC_New_Titles_API notebook *between sessions*, please keep track of your time. This will help us get an estimate of how long it takes a self-paced learner to get through this material, which may help us with future workshops.

Please don't hesitate to reach out if you have questions regarding the notebooks.

## PEP 8

[Python Enhancement Proposals](https://peps.python.org/), or PEPs, are design documents that propose new features for the Python language or describe its processes and conventions.

PEP 8 is the official [Style Guide for Python Code](https://peps.python.org/pep-0008/).

PEP 8 covers many topics, but there are a few in particular that are useful best practices for legibility of code:

* Indentation - This is one of the features that makes Python a distinctive language to program in. The screen isn't cluttered with curly braces and semicolons; rather, the same purpose is accomplished in Python by means of nested indentation. PEP 8 recommends 4 spaces per indentation level. *Indentation is not optional.*

* Maximum line length: 79 characters - This allows the user to work on multiple projects side-by-side on the same screen. One way to check is to copy-paste a bar of 80 hashmarks into your code cells. (In practice, more of a guideline than a rule.)

* Naming conventions: variables, functions, and classes each have their own expected formatting. Classes use CapWords; functions and variables use lowercase_separated_by_underscores. This makes different kinds of objects easily distinguishable. You will also see camelCase, but this will usually be when importing data that was stored using another programming language, like JavaScript.
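
A short sketch of these naming conventions in practice. All of the names below are made up for illustration:

```python
class NewTitleRecord:
    """Class names use CapWords."""

    def __init__(self, title, call_number):
        # Variable and attribute names use lowercase_with_underscores
        self.title = title
        self.call_number = call_number


def format_call_number(record):
    """Function names also use lowercase_with_underscores."""
    return "{} ({})".format(record.title, record.call_number)


sample_record = NewTitleRecord('Sample Title', 'QA76.73')
print(format_call_number(sample_record))
# Sample Title (QA76.73)
```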

In [72]:
# 80 hashmarks:
################################################################################
#If you need to measure out your 79-character lines, this fits the bill nicely.

## ** Constellate

For a much more in-depth tutorial on the basics of Python, check out the [Python Basics and Python Intermediate tutorials on Constellate](https://constellate.org/dashboard/tutorials) (you will have to search for "python-basics" and "python-intermediate" once you are in the lab). [Constellate](https://constellate.org/docs/how-to-use-constellate) is a text and data analysis service that is part of ITHAKA.

TCCL gained access to Constellate on July 1 of this year, and it's now available as a resource for students, staff, and faculty. The learning materials on Constellate are focused primarily on textual analysis, but their introductory materials cover the fundamentals of Python in much greater detail than we do in this workshop.

## ** Punctuation and Style

The keen-eyed copy-editors among you may have noticed that while we have used periods inside quotation marks at the ends of sentences in this document, there is not a single comma inside the end of a quotation. This is atypical of American publications (but typical in British English publications). In Python, a list of string variables is separated by commas, but in order to maintain the integrity of the strings themselves, commas must be used on the outside of the quotation marks. Be mindful of accidentally inserting commas in lists of strings if you are accustomed to typing punctuation inside quotes.
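
To illustrate why this matters, here is a small sketch of what happens when the commas end up inside the quotation marks. Python does not raise an error; it quietly joins adjacent strings into one:

```python
intended = ['spam', 'eggs', 'toast']

# With the commas typed inside the quotes, Python sees three adjacent strings
# and silently joins them into one
mistyped = ['spam,' 'eggs,' 'toast']

print(len(intended), len(mistyped))
# 3 1
print(mistyped)
# ['spam,eggs,toast']
```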

Dates formatted YYYY-MM-DD are preferable in programming contexts because an alphabetical sort is equivalent to a chronological sort (largest increment first, smallest increment last). MM-DD-YYYY and DD-MM-YYYY are both less optimal for this task. In a similar vein, two-digit days and months are preferred (09 instead of 9 for September) because an alphabetical sort will put 10 before 9.
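
A quick sketch of the sorting behavior described above, using made-up dates stored as strings:

```python
# The same three dates in two formats
iso_dates = ['2023-10-01', '2023-09-15', '2022-12-31']
us_dates = ['10-01-2023', '09-15-2023', '12-31-2022']

# YYYY-MM-DD: alphabetical order matches chronological order
print(sorted(iso_dates))
# ['2022-12-31', '2023-09-15', '2023-10-01']

# MM-DD-YYYY: the earliest date (in 2022) ends up last
print(sorted(us_dates))
# ['09-15-2023', '10-01-2023', '12-31-2022']

# Without the leading zero, October sorts before September
print(sorted(['2023-9-15', '2023-10-01']))
# ['2023-10-01', '2023-9-15']
```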

## ** DRY Code

"[Don't Repeat Yourself](https://en.wikipedia.org/wiki/Don't_repeat_yourself)" is a motto that many Python programmers (attempt to) adhere to.

When possible, rather than typing out the same code more than once, it is considered best practice to turn it into a function so it may be used in multiple places. If the code does something *similar* to other code, it's often helpful to make a more generalized function that may be adapted to different situations.
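
As a rough before-and-after sketch (the titles and the `clean_title` function are invented for illustration):

```python
# WET: the same steps typed out twice
first_title = '  the hobbit '
first_clean = first_title.strip().title()
second_title = 'SILENT spring  '
second_clean = second_title.strip().title()

# DRY: the shared steps live in one function
def clean_title(raw_title):
    """Trim surrounding spaces and capitalize each word."""
    return raw_title.strip().title()

clean_titles = [clean_title(raw) for raw in [first_title, second_title]]
print(clean_titles)
# ['The Hobbit', 'Silent Spring']
```

If the cleaning rules change later, the DRY version only needs to be edited in one place.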

"WET" is the opposite of DRY; it can stand for "Write Everything Twice" or "We Enjoy Typing" depending on whom you ask.

## ** Python "Easter Eggs"

Python has several built-in Easter Eggs. Here are a couple of them:

In [73]:
#The PEP 20 (also called the Zen of Python, by Tim Peters):

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [74]:
# Randall Patrick Munroe's endorsement (?) of Python in an xkcd comic:
#(December 5, 2007, about a year prior to the release of Python 3)
import antigravity

[Explain xkcd](https://www.explainxkcd.com/wiki/index.php/353:_Python)

## ** One More Note about JSON Files...

This Jupyter Notebook is a JSON file.

![title](alljson.jpg)

Jupyter Notebooks are stored as JSON. JupyterLab and other applications read the data in the JSON file and present it in a tidier, human-readable format, using the metadata in the file to determine what kind of cells to show.

In [78]:
import json

with open('Just_Enough_Python_Slideshow.ipynb', 'rb') as file:
    contents = json.load(file)

contents['cells'][0]

{'cell_type': 'markdown',
 'id': '357f34ac-6b2a-4f31-a9aa-0f2e67926ec1',
 'metadata': {'slideshow': {'slide_type': 'slide'}, 'tags': []},
 'source': ['# Just Enough Python\n',
  '\n',
  'Accessing the [OCLC New Titles API](https://developer.api.oclc.org/new-titles-api) with Python']}
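
Once a notebook has been loaded as JSON, it can be explored like any other nested data. The sketch below counts cells by type; to keep it self-contained, it uses a tiny made-up notebook string that contains only the keys needed here (real `.ipynb` files carry more metadata):

```python
import json
from collections import Counter

# A tiny, made-up notebook with only the keys this sketch needs
notebook_text = """
{"cells": [
  {"cell_type": "markdown", "source": ["# A heading"]},
  {"cell_type": "code", "source": ["2 + 2"]},
  {"cell_type": "code", "source": ["1 + 1"]}
]}
"""

notebook = json.loads(notebook_text)

# Tally how many cells of each type the notebook contains
cell_counts = Counter(cell['cell_type'] for cell in notebook['cells'])
print(cell_counts)
# Counter({'code': 2, 'markdown': 1})
```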

# (For More, See the "OCLC_New_Titles_API.ipynb" Notebook)