<center>
  <h1>Digital Tools and Methods for the Humanities and Social Sciences</h1>
  <img src="https://raw.githubusercontent.com/sul-cidr/Workshops/master/cidr-logo.no-text.240x140.png" alt="Center for Interdisciplinary Digital Research @ Stanford"/>
</center>

# Introduction to Python

### Instructors
- Simon Wiles (CIDR), <em>simon.wiles@stanford.edu</em>
- Peter Broadwell (CIDR), <em>broadwell@stanford.edu</em>

### Sign In
Please sign in for this workshop at: https://signin.cidr.link/Intro._to_Python/


### Goal
By the end of our workshop today, we hope you'll understand basic syntax in Python for variables, functions, and control flow, and understand some of the basic data structures in Python. With these in hand, you'll know enough to write basic scripts and explore other features of the language. 

### Topics
- Variables and types/structures (String, Int, Float, List, Dictionary)
- Functions
- Control flow
- Reading and writing text to a file 
- A basic workflow of reading in content, doing something to it, then writing a new output


### Evaluation survey
At the end of the workshop, we would be very grateful if you could spend a minute answering a few questions; your feedback helps us to continue our workshop series.
- https://evaluations.cidr.link/Intro._to_Python/


## Why Python?
It's multi-use: you can write simple scripts to automate tasks, write complex code for machine learning and other approaches, and even build full-scale web applications.

The biggest reason we see people learning Python right now is for data science and related approaches, regardless of disciplinary background.

## Jupyter Notebooks and Google Colaboratory

Jupyter notebooks are a way to write and run Python code in an interactive way. They're quickly becoming a standard way of putting together data, code, and written explanations or visualizations into a single document and sharing that. There are a lot of ways that you can run Jupyter notebooks, including just locally on your computer, but we've decided to use Google's Colaboratory notebook platform for this workshop.  Colaboratory is “a Google research project created to help disseminate machine learning education and research.”  If you would like to know more about Colaboratory in general, you can visit the [Welcome Notebook](https://colab.research.google.com/notebooks/welcome.ipynb).

Using the Google Colaboratory platform allows us to focus on learning and writing Python in the workshop rather than on setting up Python, which can sometimes take a bit of extra work depending on platforms, operating systems, and other installed applications. If you'd like to install a Python distribution locally, though, we have some instructions (with gifs!) on installing Python through the Anaconda distribution, which will also help you handle virtual environments: https://github.com/sul-cidr/Workshops/wiki/Installing-and-Configuring-Anaconda-and-Jupyter-Notebooks

If you run into problems, or would like to look into other ways of installing Python or handling virtual environments, feel free to send us an email (contact-cidr@stanford.edu) or visit us during our [consulting hours](https://library.stanford.edu/research/cidr/consulting).

It should be possible to follow along with this workshop using a regular Python console ("terminal", "command-prompt", etc.).  Please note, however, that we will not be able to support you with problems related to a local environment during this workshop, and we recommend using the Colaboratory notebooks if you are at all unsure.

## Some Basic Language Features

### The `print()` function
The most basic way for a Python program to generate output is via the `print()` function, which prints directly to the console or terminal environment in which the Python interpreter is running.  In our Jupyter / Google Colab notebooks, this produces output beneath the code cell:

```python
print("Hello World!")
```
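
`print()` also accepts several values at once, along with the optional keyword arguments `sep` and `end`. The following is a small illustrative sketch (the values are just placeholders chosen for this example):

```python
# Several values are separated by a single space by default
print("Hello", "World!")  # Hello World!

# sep changes the separator between values, and end changes what is
# printed after the last value (by default, a newline character)
print("spam", "eggs", sep=", ", end="!\n")  # spam, eggs!
```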

In [None]:
# Your turn:


Jupyter code cells (and other interactive Python consoles) will also automatically display the _value_ of a cell (or other code block) -- in our context here that means the value of the last expression in a code cell:
```python
"Hello world!"
```

In [None]:
# Your turn:


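For more information about the `print()` function, we can access Python's built-in documentation and the Colab environment's autocomplete and IntelliSense-style hints. Appending `?` to a name is a notebook (IPython) shortcut; in a plain Python console, `help(print)` shows the same documentation.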
```python
print?
```

In [None]:
# Your turn:


### Comments
Comments are an important part of any program.  In Python, comments begin with `#`:

```python
# This is a comment
print("Hello world!")

```

and they can also be used to annotate lines of code:
```python
print("Hello world!")  # Comments are fine here too!
```

Use comments liberally, when you are learning and even when you think you know exactly what you're doing.  Your comments should allow readers of your code to understand _why_ your code does what it does.  Remember that a primary audience for your comments will be you yourself when you revisit your code in the future!

### Indentation and Blocks
White-space at the beginning of a line is meaningful in Python programs: it defines groups of statements called _blocks_, which will become important when we discuss control flow and functions.  For now, note that indentation must be consistent within a block -- the convention is to use four spaces per level.  Unexpected or inconsistent indentation will result in errors:

In [None]:
# Run this cell, and notice the error
# Notice too that the whole cell fails to run, not just the line with the error
print("this is fine")
  print("this is not")
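
For contrast, here is a minimal sketch of consistent indentation. It uses a `for` loop purely as a preview of control flow, and the variable names are made up for illustration:

```python
# Every statement in a block shares the same indentation (four spaces here)
for word in ["spam", "eggs"]:
    shouted = word.upper()
    print(shouted)

# Returning to the left margin ends the block
print("done")
```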

## Variables and Types


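Variables are names that refer to values, so that a value can be stored and used again later.
Assignment is the act of attaching a name to a value, and is written with a single equals sign: `name = value`.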


Python has lots of native datatypes, but the most important ones are:

* **Booleans** -- `True` or `False`
* **Numbers** -- primarily integers and floats
* **Strings** -- sequences of characters
* **Lists** -- ordered sequences of values
* **Tuples** -- like **list**s but immutable
* **Dictionaries** -- collections of key-value pairs (which keep their insertion order as of Python 3.7)
* **Sets** -- unordered collections of values

Below we will cover a brief introduction to _strings_, _lists_, and _dictionaries_.
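
As a quick orientation, the sketch below assigns one value of each of the types listed above and asks Python for its type. All names and values are invented for illustration:

```python
is_parrot_alive = False                  # Boolean
number_of_knights = 5                    # integer
airspeed = 24.5                          # float (a made-up value)
film = "Life of Brian"                   # string
colours = ["red", "blue", "yellow"]      # list
coordinates = (51.5, -0.1)               # tuple
swallow = {"species": "European"}        # dictionary
ingredients = {"spam", "eggs", "spam"}   # set (the duplicate is dropped)

# type() reports the type of whatever value a variable refers to
for value in [is_parrot_alive, number_of_knights, airspeed, film,
              colours, coordinates, swallow, ingredients]:
    print(type(value))
```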


### Strings

Strings are sequences of characters -- what we tend to call "text".

```python
# A simple string assignment
greeting = "Welcome to Introduction to Python"
print(greeting)
print(type(greeting))
```

In [None]:
# Your turn


Strings can be indicated by single (`'`) or double (`"`) quotes -- it makes no difference at all, and is merely a convenience.

"Escaping" (placing a backslash `\` before a character) lets us include a quote character inside a string that is delimited by the same kind of quote.


```python
greeting = 'Welcome to "Introduction to Python"'
print(greeting)

greeting = "Welcome to \"Introduction to Python\""
print(greeting)
```
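
To confirm that the choice of quote makes no difference, and to show a few other common escape sequences, here is a short sketch (the strings, including the Windows-style path, are hypothetical):

```python
# Single- and double-quoted literals produce identical strings
print('spam' == "spam")  # True

# \n is a newline and \t is a tab
print("Knights:\n\tSir Lancelot\n\tSir Galahad")

# A backslash must itself be escaped to appear in the string
path = "C:\\holy\\grail"
print(path)  # C:\holy\grail
```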


In [None]:
# Your turn


Strings may also be indicated by the use of triple quotes (`'''` or `"""` may be used).

```python
cheesemakers = '''
Man: I think it was, "Blessed are the cheesemakers"!
Gregory's wife: What's so special about the cheesemakers?
Gregory: Well, obviously it's not meant to be taken literally. It refers to any manufacturer of dairy products.
'''
print(cheesemakers)

```
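
A triple-quoted string may span several lines, and each line break becomes part of the string as a newline character (`\n`). The sketch below, using a made-up `verse`, shows that the two spellings are equivalent:

```python
verse = """Spam, spam, spam,
lovely spam!"""
same_verse = "Spam, spam, spam,\nlovely spam!"

print(verse == same_verse)  # True
print(repr(verse))          # repr() shows the \n escape instead of a line break
```

Note that `cheesemakers` above begins and ends with a line break, because the text starts on the line after the opening quotes and the closing quotes sit on a line of their own.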

In [None]:
# Your turn


#### String Methods

**String**s in Python have some special behaviors attached to them (called _methods_) that perform common tasks.

Let’s look at case-manipulation, for example:

```python
print("monty python’s flying circus".upper())
print("MONTY PYTHON’S FLYING CIRCUS".lower())
```

Case-manipulation is Unicode-aware (but don't make the mistake of thinking this solves all problems!).

```python
print("norsk blå papegøye".upper())  # "Norwegian Blue parrot" in Norwegian :)
```
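
One example of a problem that remains: some characters change length or do not survive a round trip through `.upper()` and `.lower()`. The sketch below uses the German word "straße" as an illustration, and `.casefold()` as one possible approach to case-insensitive comparison:

```python
street = "straße"  # German for "street"

shouted = street.upper()
print(shouted)                     # STRASSE
print(len(street), len(shouted))   # 6 7
print(shouted.lower() == street)   # False

# .casefold() is intended for caseless comparisons
print(street.casefold() == shouted.casefold())  # True
```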

Capitalization is mostly trivial...
```python
print("monty python’s flying circus".capitalize())
```

...but it takes a naïve approach to title-case.
```python
print("monty python’s flying circus".title())
```
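
The naivety is that `.title()` treats any non-letter, including an apostrophe, as a word boundary. One possible workaround, offered as a sketch rather than a complete solution to title-casing, is `string.capwords()` from the standard library, which splits on white-space only:

```python
import string

film_title = "monty python's flying circus"  # note the plain ASCII apostrophe

print(film_title.title())           # Monty Python'S Flying Circus
print(string.capwords(film_title))  # Monty Python's Flying Circus
```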

In [None]:
# Your turn


Strings also provide the `.replace()` method, which behaves as expected:

```python
brian = "Brian is the Messiah!"
print(brian)
brian = brian.replace("the Messiah", "a very naughty boy")
print(brian)
```
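
Two details are worth a quick sketch (the `menu` string is invented for illustration): `.replace()` returns a new string and leaves the original untouched, which is why the result is assigned to a variable, and it accepts an optional third argument that limits the number of replacements.

```python
menu = "spam, spam, spam and eggs"

# The original string is not modified; .replace() returns a new one
new_menu = menu.replace("spam", "ham")
print(menu)      # spam, spam, spam and eggs
print(new_menu)  # ham, ham, ham and eggs

# An optional third argument limits how many occurrences are replaced
print(menu.replace("spam", "ham", 1))  # ham, spam, spam and eggs
```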

In [None]:
# Your turn


The `.strip()` and `.split()` methods are amongst the most commonly used, and are especially useful when dealing with data from external sources.

```python
# .split() breaks a string into pieces, and returns those pieces as a list
input_string = 'Monty Python and the Holy Grail'
input_string.split()
```

```python
# .split() takes an argument which specifies the sequence to split on
knights_of_the_round_table = 'Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, Sir Not-Appearing-in-this-Film'
knights_of_the_round_table.split(',')
```

In [None]:
# Your turn:


In the previous examples we have some extraneous white-space in the strings in our list. For `knights_of_the_round_table` we could remove this by splitting on `', '` (try it!), but in “the wild” we commonly need to clean up the text that we read into our programs, and the `.strip()` method, which removes white-space from both ends of a string, is a good first step.

```python
print(cheesemakers.split('\n'))
# because cheesemakers.strip() returns another string, we can chain these operations:
print(cheesemakers.strip().split('\n'))
```
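
When the spacing around a separator is irregular, one common pattern is to split first and then strip each piece. The sketch below uses a made-up `raw` string and a `for` loop as a preview of control flow:

```python
raw = "  Sir Bedevere , Sir Lancelot,Sir Galahad  \n"

knights = []
for name in raw.split(","):
    # remove leading and trailing white-space from each piece
    knights.append(name.strip())
print(knights)  # ['Sir Bedevere', 'Sir Lancelot', 'Sir Galahad']

# .strip() can also be given the characters to remove from both ends
print("***Ni!***".strip("*"))  # Ni!
```

The related methods `.lstrip()` and `.rstrip()` strip only the left or the right end of a string, respectively.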

Strings have many other useful methods -- use Google Colab’s tab-autocompletion feature to see what’s available.


