<center>
  <h1>Digital Tools and Methods for the Humanities and Social Sciences</h1>
  <img src="https://raw.githubusercontent.com/sul-cidr/Workshops/master/cidr-logo.no-text.240x140.png" alt="Center for Interdisciplinary Digital Research @ Stanford"/>
</center>

# Introduction to Python

### Instructors
- Simon Wiles (CIDR), <em>simon.wiles@stanford.edu</em>
- Peter Broadwell (CIDR), <em>broadwell@stanford.edu</em>

### Sign In
Please sign in for this workshop at: https://signin.cidr.link/Intro._to_Python/ -- when you've submitted the sign-in form, please keep your browser tab open on the evaluation form as a reminder to complete it when the workshop is over.


### Goal
By the end of our workshop today, we hope you'll have an understanding of basic syntax in Python for variables, functions, and control flow, and have some familiarity with some of the fundamental data structures you'll need to work with. With these in hand, you'll know enough to write simple scripts and begin to explore other features of the language.

### Topics
- Variables and Types (strings, lists, dictionaries)
- Functions
- Control Flow
- Reading and writing text to a file 


### Evaluation survey
At the end of the workshop, we would be very grateful if you could spend a minute answering a few questions that will help us to continue our workshop series.
- https://evaluations.cidr.link/Intro._to_Python/


## Why Python?
It's multi-use: you can write simple scripts to automate tasks, write complex code for machine learning and other approaches, and even build full-scale web applications.

The biggest reason we see people learning Python right now is for data science and related approaches, regardless of disciplinary background.

## Jupyter Notebooks and Google Colaboratory

Jupyter notebooks are a way to write and run Python code in an interactive way. They're quickly becoming a standard way of putting together data, code, and written explanations or visualizations into a single document and sharing that. There are a lot of ways that you can run Jupyter notebooks, including just locally on your computer, but we've decided to use Google's Colaboratory notebook platform for this workshop.  Colaboratory is “a Google research project created to help disseminate machine learning education and research.”  If you would like to know more about Colaboratory in general, you can visit the [Welcome Notebook](https://colab.research.google.com/notebooks/welcome.ipynb).

Using the Google Colaboratory platform allows us to focus on learning and writing Python in the workshop rather than on setting up Python, which can sometimes take a bit of extra work depending on platforms, operating systems, and other installed applications. If you'd like to install a Python distribution locally, though, we have some instructions (with gifs!) on installing Python through the Anaconda distribution, which will also help you handle virtual environments: https://github.com/sul-cidr/Workshops/wiki/Installing-and-Configuring-Anaconda-and-Jupyter-Notebooks

If you run into problems, or would like to look into other ways of installing Python or handling virtual environments, feel free to send us an email (contact-cidr@stanford.edu) or visit us during our [consulting hours](https://library.stanford.edu/research/cidr/consulting).

It should be possible to follow along with this workshop using a regular Python console ("terminal", "command-prompt", etc.).  Please note, however, that we will not be able to support you with problems related to a local environment during this workshop, and we do recommend using the Colaboratory notebooks if you are at all unsure.

## Some Basic Language Features

### The `print()` function
The most basic way for a Python program to generate output is via the `print()` function, which `print`s directly to the console or terminal environment in which the Python interpreter is executed.  In our Jupyter / Google Colab notebooks, this produces output beneath the code cell:

```python
print("Hello World!")
```

In [None]:
# Your turn:


Jupyter code cells (and other Python console environments) will also automatically output the _return value_ of a cell (or other code block) -- in our context here that means the last statement in a code cell:
```python
"Hello world!"
```

In [None]:
# Your turn:


For more information about the `print()` function, we can access Python's built-in documentation and the Colab environment's autocomplete and intellisense functionality.
```python
print?
```

In [None]:
# Your turn:


### Comments
Comments are an important part of any program.  In Python, comments begin with `#`:

```python
# This is a comment
print("Hello world!")

```

and they can also be used to annotate lines of code:
```python
print("Hello world!")  # Comments are fine here too!
```

Use comments liberally, when you are learning and even when you think you know exactly what you're doing.  Your comments should allow readers of your code to understand _why_ your code does what it does.  Remember that a primary audience for your comments will be you yourself when you revisit your code in the future!

### Indentation and Blocks
White-space (specifically white-space at the beginning of lines) is meaningful in Python programs, and defines groups of statements which constitute _blocks_, which will become important when we discuss control flow and functions.  For now, note that indentation must be consistent in a Python file -- the convention is to use four spaces.  Mixing up indentation levels will result in errors:

In [None]:
# Run this cell, and notice the error
# Notice too that the whole cell fails to run, not just the line with the error
print("this is fine")
  print("this is not")

## Variables and Types

In Python, all the "bits of information" of one sort or another in our programs can be referred to as _**value**s_.  Every _**value**_ has a _**type**_ (_**string**_, _**number**_, etc.), and _**variables**_ give us a way of naming and referring to those objects.


Python has lots of native datatypes, but the most important ones are:

* **Booleans** -- `True` or `False`
* **Numbers** -- primarily integers and floats
* **Strings** -- sequences of characters
* **Lists** -- ordered sequences of values
* **Tuples** -- like **list**s but immutable
* **Dictionaries** -- collections of key-value pairs (which keep their insertion order as of Python 3.7)
* **Sets** -- unordered collections of values


_**Assignment**_ is the operation of attaching an _**identifier**_ to a _**value**_, and in Python this is done with a single equals sign (`=`).
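
As a minimal sketch of assignment, the cell below attaches an identifier to one value of each of the types listed above, and uses the built-in `type()` function to confirm a few of them. The variable names and values are made up purely for illustration.

```python
# each assignment attaches an identifier (on the left) to a value (on the right)
is_parrot_alive = False                      # Boolean
number_of_pythons = 6                        # integer
ticket_price = 2.5                           # float
film = "Life of Brian"                       # string
knights = ["Lancelot", "Galahad", "Robin"]   # list
coordinates = (51.5, -0.12)                  # tuple
stock_levels = {"Cheddar": 0}                # dictionary
ingredients = {"egg", "bacon", "spam"}       # set

# type() reports the type of the value an identifier refers to
print(type(is_parrot_alive))   # <class 'bool'>
print(type(ticket_price))      # <class 'float'>
print(type(knights))           # <class 'list'>
print(type(stock_levels))      # <class 'dict'>
```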


### Strings

Strings are sequences of characters -- what we tend to call "text".

```python
# A simple string assignment
greeting = "Welcome to Introduction to Python!"
print(greeting)
print(type(greeting))
```

In [None]:
# Your turn -- type the above code or a similar greeting of your own below:


Strings can be indicated by single (`'`) or double (`"`) quotes -- it makes no difference at all, and is merely a convenience.

In [None]:
fair_warning = 'I will be using examples from "Monty Python"...'
print(fair_warning)

"Escaping" (use of the backslash `\` before a character) can also be used, typically when we want both single and double quotes in a string:

In [None]:
fair_warning = "I'll be using examples from \"Monty Python\"..."
print(fair_warning)

In [None]:
# Strings may also be indicated by the use of triple quotes (''' or """ may be used).
fair_warning = '''I'll be using examples from "Monty Python"...'''
print(fair_warning)

In [None]:
# This is especially useful if you'd like to have multi-line strings
cheesemakers = '''
Man: I think it was, "Blessed are the cheesemakers"!
Gregory's wife: What's so special about the cheesemakers?
Gregory: Well, obviously it's not meant to be taken literally. It refers to any manufacturer of dairy products.
'''
print(cheesemakers)

#### String Methods

**String**s in Python have some special behaviors attached to them (called _method_s) that perform common tasks.

Let’s look at case-manipulation, for example:

In [None]:
# Upper- and lower-case are straightforward:
print("Uppercase:   ", "monty python's flying circus".upper())
print("Lowercase:   ", "MONTY PYTHON'S FLYING CIRCUS".lower())

# Case-manipulation is unicode-aware
#  (but don't make the mistake of thinking this solves all problems!).
print("Uppercase:   ", "norsk blå papegøye".upper())  # "Norwegian Blue parrot" in Norwegian :)

# Capitalization is mostly trivial...
print("Capitalized: ", "monty python's flying circus".capitalize())

# ...but it takes a naïve approach to title-case.
print("Title Case:  ", "monty python's flying circus".title())

Strings also provide the `.replace()` method, which behaves as expected:

In [None]:
brian = "Brian is the Messiah!"
print(brian)

# He's not the Messiah...
brian = brian.replace("the Messiah", "a very naughty boy")
print(brian)

The `.strip()` and `.split()` methods are amongst the most commonly used, and are especially useful when dealing with data from external sources.

In [None]:
# .split() breaks a string into pieces, and returns those pieces as a list
input_string = "Monty Python and the Holy Grail"
input_string.split()

In [None]:
# .split() takes an argument which specifies the sequence to split on
knights_of_the_round_table = "Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, Sir Not-Appearing-in-this-Film"
knights_of_the_round_table.split(',')

In some of the previous examples we have had some extraneous white-space in our strings.  For the `knights_of_the_round_table` we could remove this by splitting on `", "` (try it!), but in “the wild” we commonly need to clean up the text that we read into our programs, and the `.strip()` method is a good first step:

In [None]:
# .strip() can be called on string literals as well as variables
"  ...And now for something completely different!  ".strip()

In [None]:
# .strip() can also take an argument which is a sequence of
#  characters to strip from the ends of the input
"  ...And now for something completely different!  ".strip(' .!')

In [None]:
# in our excerpt from the script for the Life of Brian, we had some
#  whitespace at the beginning and the end, in the form of (probably)
#  extraneous new-line characters -- as can be seen when we .split()
cheesemakers.split('\n')

In [None]:
# first we can remove these:
print(cheesemakers.strip())

In [None]:
# and because cheesemakers.strip() returns another string, we can chain
#  these operations and do them both in one line:
cheesemakers.strip().split('\n')

#### Indexing

Indexing is one of the most important and recognizable features in Python.  It's very powerful, and Python programmers agree that it's very elegant and intuitive once you get the hang of it.  At the same time, it is often something that newcomers to Python find a little confusing.
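
The short sketch below illustrates the basics on a string: positions are counted from zero, negative positions count back from the end, and a _slice_ (`start:stop`) selects a range that includes `start` but excludes `stop`. The variable name `title` is only an illustration.

```python
title = "Spamalot"

print(title[0])     # first character: 'S'
print(title[-1])    # last character: 't'
print(title[1:])    # everything but the first character: 'pamalot'
print(title[:4])    # the first four characters: 'Spam'
print(title[4:6])   # the characters at positions 4 and 5: 'al'
print(len(title))   # the number of characters: 8
```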



Strings have many other useful methods.  The full list is available in the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), but it's often more convenient to use Google Colab’s autocompletion feature to see what’s available.  Hovering the mouse-cursor over an identifier will give some pop-up help, too, and if that's not sufficient we can access Python's built-in documentation (known as `docstrings`) by appending a question mark (`?`) and executing the cell.

In [None]:
# for the autocompletion to work, a string must already be assigned to
#  the variable identifier
my_string = "abc"

In [None]:
# now type "my_string." below
# select a method, and hover the mouse pointer over the method name for more information
# you can also try affixing a "?" and running the cell



#### More about Strings

There's lots more to learn about strings that we don't have time to cover here.  The most important thing that's not covered here is string formatting using the `.format()` method, or even better, **f-strings**.
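
As a brief taste of both approaches, the sketch below builds the same sentence twice. The variable names and values are made up for illustration, and f-strings assume Python 3.6 or later.

```python
first_name = "Michael"
sketches = 3

# an f-string is prefixed with "f"; expressions inside {} are evaluated
#  and their results are inserted into the string
with_f_string = f"{first_name} appears in {sketches} sketches."

# the .format() method fills the {} placeholders with its arguments, in order
with_format = "{} appears in {} sketches.".format(first_name, sketches)

print(with_f_string)
print(with_format)
```

Both lines print `Michael appears in 3 sketches.`; note that the number is converted to text automatically.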

### Lists

**List**s are one of the most important data-structures available in Python.  They can be created in a number of different ways.

In [None]:
# literal lists are built with square-bracket notation
pythons = ['Graham', 'John', 'Terry', 'Eric', 'Terry', 'Michael']
print(pythons)

Above we saw how **list**s are returned by the `.split()` method of a **string**.

In [None]:
# a list created by splitting a string
knights_of_the_round_table = 'Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, Sir Not-Appearing-in-this-Film'
knights_of_the_round_table.split(', ')

**List**s can be indexed just like strings:

In [None]:
print("pythons:     ", pythons)
print("pythons[0]:  ", pythons[0])
print("pythons[-1]: ", pythons[-1])
print("pythons[:3]: ", pythons[:3])
print("pythons[3:]: ", pythons[3:])

In [None]:
# Your turn:


A third way is to create an empty **list** and use the `.append()` method that exists on all lists:

In [None]:
# we can create an empty list with the following syntax
menu = []

# and then add items to the list with the .append() method
menu.append('egg')
menu.append('bacon')
menu.append('sausage')
menu.append('spam')

print(menu)

This example highlights one of the features that make **list**s so versatile and useful -- they can be added to once they have been created.  Building **list**s programmatically is one of the most common types of task you are likely to find yourself tackling in Python.

This feature of **list**s is called _**mutability**_, and it includes the ability to modify entries in the list in-place:

In [None]:
print("before: ", menu)

menu[0] = "spam"
menu[1] = "spam"
menu[2] = "spam"

print("after:  ", menu)

In [None]:
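# .join() is a string method that glues the items of a list together,
#  using the string it is called on as the separator
knights = knights_of_the_round_table.split(', ')
print(", ".join(knights))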

In [None]:
# join the menu with the word "and"
print(" and ".join(menu))

In [None]:
# joining on the newline character is often useful, too
print("\n".join(pythons))

### Dictionaries

A dictionary in Python is a collection of `key`/`value` pairs -- each pair together is known as an `item`.  Values can be any kind of value you like -- strings, numbers, lists, more dictionaries, etc. -- but keys have certain restrictions.  For the majority of purposes you should be using strings as your dictionary keys.
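
To make the restriction on keys a little more concrete: keys must be _hashable_, which in practice means immutable built-in values such as strings, numbers, and tuples are fine, while mutable values such as lists are not. The sketch below uses made-up keys and values, and a `try`/`except` block (not covered in this workshop) to catch the resulting error.

```python
# strings, numbers, and tuples are all acceptable as keys
valid_keys = {
    "spam": "a string as a key",
    42: "a number as a key",
    ("series", 1): "a tuple as a key",
}
print(valid_keys[("series", 1)])

# a list is mutable, so Python refuses to use it as a key
try:
    bad_keys = {["series", 1]: "a list as a key"}
except TypeError:
    print("A list cannot be used as a dictionary key.")
```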

In [None]:
# dictionary keys must be unique in the dictionary, so note that while
#  this is legal...
pythons_by_first_name = {
    "Graham": "Graham Chapman",
    "John": "John Cleese",
    "Terry": "Terry Gilliam",
    "Eric": "Eric Idle",
    "Terry": "Terry Jones",
    "Michael": "Michael Palin"
}
# the second appearance of the duplicate key has over-written the first
pythons_by_first_name

In [None]:
# dictionaries are accessed using square-bracket notation -- a little
#  bit like indexing for sequence types

cheeseshop = {
    "Red Leicester": "I'm afraid we're fresh out of Red Leicester sir.",
    "Tilsit": "Never at the end of the week, sir. Always get it fresh first thing on Monday.",
    "Caerphilly": "Ah well, it's been on order for two weeks, sir. I was expecting it this morning.",
    "Bel Paese": "Sorry.",
    "Red Windsor": "Normally, sir, yes, but today the van broke down.",
    "Stilton": "Sorry."
}

print("Do you have any Tilsit?", cheeseshop["Tilsit"], sep="\n")


In [None]:
# attempting to access a key that doesn't exist raises a `KeyError` exception
cheeseshop["Stinking Bishop"]
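
When we can't be sure a key exists, there are ways to avoid the `KeyError`. The sketch below uses a cut-down dictionary (the name `small_cheeseshop` is hypothetical) to show the `in` operator and the `.get()` method.

```python
small_cheeseshop = {
    "Tilsit": "Never at the end of the week, sir.",
    "Stilton": "Sorry.",
}

# the "in" operator tests whether a key is present
print("Tilsit" in small_cheeseshop)            # True
print("Stinking Bishop" in small_cheeseshop)   # False

# .get() returns None instead of raising a KeyError when the key is missing...
print(small_cheeseshop.get("Stinking Bishop"))          # None

# ...or a default value of our choosing, passed as the second argument
print(small_cheeseshop.get("Stinking Bishop", "No."))   # No.
```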

We're not going to work much more with dictionaries in this workshop, but they are one of the most important weapons in the Python programmer's arsenal and there's much more to be learned about them.

## Functions

**Function**s are blocks of reusable code.  They provide a way to break monolithic programs into parts; this aids greatly in writing code that is readable, maintainable, and extensible.  It is especially important, of course, when a block of code will or may be used multiple times in the same or similar ways, but a section of code does not need to be used multiple times to justify creating it as a function.

### Built-In Functions

We've already seen a number of **function**s in what we've done so far -- `print()`, of course, but also `len()` and `type()`.  When functions are bound to objects -- as `.format()`, `.lower()` etc. are on string objects, or `.append()` is on list objects -- we call them **method**s, but the difference is minimal and need not concern us at this point.

The functions (and methods) we've seen so far are “built-in” functions -- they're provided to us by the Python interpreter.  The important thing we need to learn in this section is how to _define_ and _call_ (or _invoke_) our own functions.

### `def`ining Your Own Functions

User-Defined Functions, as they are sometimes called, can be created at any time in Python code.  Notice that defining a function in this way creates no output.

In [None]:
# functions are introduced with the "def" keyword (note the final colon)
def knights_who_say_ni():
    # the block of indented statements beneath the definition statement
    #  constitutes the body of the function
    print("Ni!")

In [None]:
# when defined in this way, they can be called just like built-ins:
knights_who_say_ni()

In [None]:
# of course, functions are more useful when they take arguments
def ask_for_cheese(cheese):
    print("Do you have any " + cheese + "?")
    print(cheeseshop[cheese])
    print()

In [None]:
# arguments (or parameters) are passed when the function is invoked:
ask_for_cheese("Red Leicester")

In [None]:
# the value of a function like this, of course, is when it will be called many times:
ask_for_cheese("Red Leicester")
ask_for_cheese("Tilsit")
ask_for_cheese("Caerphilly")
ask_for_cheese("Bel Paese")
ask_for_cheese("Red Windsor")
ask_for_cheese("Stilton")

# ...but asking for a cheese that isn't a key in cheeseshop raises a KeyError
ask_for_cheese("Venezuelan Beaver Cheese")

In [None]:
# functions can take as many arguments as you need
def greeting(first_name, last_name):
    print("Good evening, my name is " + first_name + " " + last_name + ".")

In [None]:
# and they may be called with "keyword arguments" -- this can
#  help readability, and allows passing the arguments in any order
greeting(first_name="Michael", last_name="Palin")
greeting(last_name="Palin", first_name="Michael")

Unless otherwise specified with a `return` statement, functions do not return a value (which is to say, their return value is `None`).

In [None]:
value = greeting("Eric", "Idle")
print(value)

In [None]:
# to make a function return a value, use an explicit return statement
def greeting(first_name, last_name):
    return "Good evening, my name is " + first_name + " " + last_name + "."

value = greeting("Graham", "Chapman")
print(value)

### Activity: Write a Pig-Latinizer Function

Pig Latin is a language game where you take the first letter of a word, move it to the end of the word, then add “-ay” at the end. For example, “pig latin” would be “igpay atinlay” and “monty python” would turn into “ontymay ythonpay”.

In the cell below, write a function that takes a string, lowercases it, and returns the Pig Latin translation of the word. You'll need to use slicing and string concatenation or formatting to make this work.

In [None]:
def pig_latinize(word):
    # write your code here -- don't forget to return a result!
    pass  # placeholder so that the cell runs; replace it with your code


    
# the following should return 'pamalotsay'
pig_latinize('Spamalot')

In [None]:
#@title → Click Here to Show Hints

# 1. use word.lower() to lowercase the input
# 2. you'll need to use the indexing techniques to get the first letter
#    of the word, and everything-but-the-first-letter of the word separately
# 3. once you have the pieces you need, stick them together to make your output


In [None]:
#@title → Click Here to Reveal a Prepared Solution

def pig_latinize(word):
    word = word.lower()
    return word[1:] + word[0] + 'ay'


# the following should return 'pamalotsay'
pig_latinize('Spamalot')

## Control Flow

Using functions to split up our monolithic code into manageable blocks also introduces our first element of **_control flow_** -- the ability to have code execute (or not execute) in a non-linear order (i.e. not simply from top to bottom, line by line).

In addition to functions, there are two main control flow mechanisms that we will look at in this section: **_conditional execution_** or **_branching_** causes blocks of code to be executed or ignored based on the evaluation of a boolean expression; and **_loop_**s allow us to specify the execution of blocks of code a fixed or indeterminate number of times.

Raising and handling **_exception_**s is another way to control the flow of execution in a Python program, but we're not going to cover that here.

### Conditional Execution (`if` statements)

In Python, the compound `if` statement can be made up of three kinds of clause: `if`, `elif`, and `else`.  Each of these clauses defines a block which executes if the appropriate condition is met.

In [None]:
# if statements test a condition, and execute a block of code accordingly
if 2 + 2 == 4:
    print('Of course, two plus two equals equals four [sic].')
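
An `else` clause supplies a block to run when the `if` condition turns out to be false. A minimal sketch, using a made-up variable `cheese_in_stock`:

```python
cheese_in_stock = 0

if cheese_in_stock > 0:
    reply = "Certainly, sir."
else:
    # the else block runs only when the if condition is False
    reply = "I'm afraid we're fresh out, sir."

print(reply)
```

Exactly one of the two blocks is executed; here the condition is false, so the `else` block sets `reply`.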


In [None]:
# in addition to an else clause, if statements can have as many elif clauses as needed
def spanish_inquisition(weapons):
    if weapons == 1:
        print("surprise")
    elif weapons == 2:
        print("fear and surprise")
    elif weapons == 3:
        print("fear and surprise and ruthless efficiency")
    elif weapons == 4:
        print("fear and surprise and ruthless efficiency and an almost fanatical devotion to the Pope")
        
spanish_inquisition(3)

### Loops and Iteration (`for` loops)

There are two basic kinds of loops in Python: those introduced with the `for` statement are sometimes called _iterative loops_, and we'll look at those below.  Python also has loops that can be introduced with the `while` statement, and these are sometimes referred to as _conditional loops_.  `while` loops are less common in Python, and we won't cover them in this workshop.

`for` loops repeat the execution of a code block once for each item produced by an iterable expression. For our purposes an iterable may be understood to refer to anything that can be looped over in a `for` loop -- in particular, this includes anything that is a sequence type: a list, a tuple, or a string.

In [None]:
# the syntax for a for-loop looks like this:
#  (as with function definition, note the colon and the indented block)
for python in pythons:
    print(python)

In [None]:
# we can also iterate over the keys of a dictionary
for cheese in cheeseshop:
    print(cheese)

In [None]:
# this becomes more and more useful as we do more in the body of the loop
#  -- here we can re-use the ask_for_cheese() function from above
for cheese in cheeseshop:
    ask_for_cheese(cheese)
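
The loops above iterate over a list and a dictionary; the same syntax works for the other sequence types mentioned earlier. A short sketch, with made-up values:

```python
# looping over a string visits one character at a time
letters = []
for letter in "spam":
    letters.append(letter)
print(letters)   # ['s', 'p', 'a', 'm']

# looping over a tuple visits each of its items in order
shouted = []
for ingredient in ("egg", "bacon", "spam"):
    shouted.append(ingredient.upper())
print(shouted)   # ['EGG', 'BACON', 'SPAM']
```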

### Activity: Pig-Latinize a `list`

In the cell below, write a function that loops over a list and returns a new list where all the strings have been replaced with their Pig Latin translations.

For example, if your list is `['hello', 5, 'world']` your output should be `['ellohay', 5, 'orldway']`.

Feel free to reuse the `pig_latinize` function you wrote above. You'll also need to think about checking the type of each item in the list.

In [None]:
def pig_latinize_list(items):
    # Your code goes here
    pass  # placeholder so that the cell runs; replace it with your code

    
pig_latinize_list(['hello', 5, 'world'])

In [None]:
#@title → Click Here to Show Hints

# create an empty list and add pig-latinized words to it in turn
# for each list item, test whether it's a string or not
# if it's not a string, just append it to the new list unaltered

In [None]:
#@title → Click Here to Reveal a Prepared Solution

def pig_latinize_list(items):

    latinized_items = []
    for item in items:
        if isinstance(item, str):  # isinstance() is the idiomatic type test
            latinized_items.append(pig_latinize(item))
        else:
            latinized_items.append(item)
    return latinized_items


pig_latinize_list(['hello', 5, 'world'])

## Group Activity: Pig-Latinize a Whole File
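
One possible way to approach this activity is sketched below, under stated assumptions rather than as a prepared solution. It writes a small sample file into a temporary directory (the file names `sample.txt` and `sample_piglatin.txt` are hypothetical), reads it back line by line, translates each word with a copy of the `pig_latinize` function from above, and writes the result to a second file. Punctuation is not handled.

```python
import os
import tempfile


def pig_latinize(word):
    word = word.lower()
    return word[1:] + word[0] + "ay"


def pig_latinize_line(line):
    # split the line into words, translate each one, and re-join with spaces
    latinized_words = []
    for word in line.split():
        latinized_words.append(pig_latinize(word))
    return " ".join(latinized_words)


# hypothetical file names, created in a temporary directory
folder = tempfile.mkdtemp()
input_path = os.path.join(folder, "sample.txt")
output_path = os.path.join(folder, "sample_piglatin.txt")

# write a small sample file to work with
with open(input_path, "w", encoding="utf-8") as sample_file:
    sample_file.write("monty python\nholy grail\n")

# read the input one line at a time and write out the translated lines
with open(input_path, encoding="utf-8") as infile:
    with open(output_path, "w", encoding="utf-8") as outfile:
        for line in infile:
            outfile.write(pig_latinize_line(line) + "\n")

# read the output back in to check the result
with open(output_path, encoding="utf-8") as result_file:
    result = result_file.read()
print(result)
```

The `with` statement closes each file automatically when its block ends, which is why no explicit call to `.close()` is needed.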

# Further resources and topics

## Resources
- https://python.swaroopch.com/ (A Byte of Python is a great intro book and reference for Python)
- https://docs.python.org/3/ (Official Python documentation and tutorials)
- https://realpython.com/ (Contains a lot of different tutorials at different levels)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) (Simon's personal favourite)
- LinkedIn Learning (formerly Lynda.com): https://www.linkedin.com/learning/topics/python (LinkedIn Learning is available for free to those with Stanford accounts.)

## Topics
- Other data structures, in particular `sets` and `tuples`
- Comprehensions
- Modules, libraries, packages, and `pip`
- Writing `.py` scripts, and using IDEs
- Virtual environments
- The object-oriented paradigm in Python: classes, methods