<center>
  <h1>Digital Tools and Methods for the Humanities and Social Sciences</h1>
  <img src="https://raw.githubusercontent.com/sul-cidr/Workshops/master/cidr-logo.no-text.240x140.png" alt="Center for Interdisciplinary Digital Research @ Stanford"/>
</center>

# Introduction to Python

### Instructors
- Simon Wiles (CIDR), <em>simon.wiles@stanford.edu</em>
- Peter Broadwell (CIDR), <em>broadwell@stanford.edu</em>

### Sign In
Please sign in for this workshop at: https://signin.cidr.link/Introduction_to_Python/ -- when you've submitted the sign-in form, please keep your browser tab open on the evaluation form as a reminder to complete it when the workshop is over.


### Goal
By the end of our workshop today, we hope you'll have an understanding of basic syntax in Python for variables, functions, and control flow, and have some familiarity with some of the fundamental data structures you'll need to work with. With these in hand, you'll know enough to write simple scripts and begin to explore other features of the language.

### Topics
- Variables and Types (strings, lists, dictionaries)
- Functions
- Control Flow
- Reading and Writing Text To/From a File 


### Evaluation survey
At the end of the workshop, we would be very grateful if you could spend a minute answering a few questions that will help us to continue our workshop series.
- https://evaluations.cidr.link/Introduction_to_Python/


## Why Python?

* **Python is a great language for learning**
  - Python is a relatively simple and highly readable language
  - Python is a high-level interpreted language
  - This makes it fairly easy to learn, and fun to code in
  - Learning Python will make learning other languages easier
  
  
* **Python is powerful and multi-purpose**
  - Python can be used for almost anything, from writing simple scripts to automate tasks to building full-scale applications
  - Python is “Batteries-Included” -- the Standard Library includes a huge range of modules that can be depended upon
  - The broader Python community is huge and vibrant -- hundreds of thousands of third-party modules and tutorials cover almost all areas of computing
    + In particular, for our purposes, Python is at the very center of the Data Science and Machine Learning ecosystems


* **Python is cross-platform and free and open-source**
  - Python is freely available and is maintained and governed by its community of users
  - Because of this, and because of Python’s great versatility and popularity, it is available on more-or-less any computer system you are likely to come into contact with


In short, an investment in learning Python is as safe a bet as you are likely to find, no matter where you are now or where you may want to go from here.

---

### [XKCD #353: Python](https://xkcd.com/353/)

[![XKCD 353: Python](https://imgs.xkcd.com/comics/python.png)](https://xkcd.com/353/)

> <small>Alt. Text: *I wrote 20 short programs in Python yesterday.  It was wonderful.  Perl, I'm leaving you.*</small>



## Jupyter Notebooks and Google Colaboratory

Jupyter notebooks are a way to write and run Python code in an interactive way. They're quickly becoming a standard way of putting together data, code, and written explanations or visualizations into a single document and sharing that. There are a lot of ways that you can run Jupyter notebooks, including just locally on your computer, but we've decided to use Google's Colaboratory notebook platform for this workshop.  Colaboratory is “a Google research project created to help disseminate machine learning education and research.”  If you would like to know more about Colaboratory in general, you can visit the [Welcome Notebook](https://colab.research.google.com/notebooks/welcome.ipynb).

Using the Google Colaboratory platform allows us to focus on learning and writing Python in the workshop rather than on setting up Python, which can sometimes take a bit of extra work depending on platforms, operating systems, and other installed applications. If you'd like to install a Python distribution locally, though, we have some instructions (with gifs!) on installing Python through the Anaconda distribution, which will also help you handle virtual environments: https://github.com/sul-cidr/Workshops/wiki/Installing-and-Configuring-Anaconda-and-Jupyter-Notebooks

If you run into problems, or would like to look into other ways of installing Python or handling virtual environments, feel free to send us an email (contact-cidr@stanford.edu) for an online consultation.

It should be possible to follow along with this workshop using a regular Python console ("terminal", "command-prompt", etc.).  Please note however that we will not be able to support you with problems related to a local environment during this workshop, and we do recommend using the Colaboratory notebooks if you are at all unsure.

---
---

## Some Basic Language Features

### The `print()` function
The most basic way for a Python program to generate output is via the `print()` function, which `print`s directly to the console or terminal environment in which the Python interpreter is executed.  In our Jupyter / Google Colab notebooks, this produces output beneath the code cell:

```python
print("Hello World!")
```

In [None]:
# Your turn:


Jupyter code cells (and other Python console environments) will also automatically output the _return value_ of a cell (or other code block) -- in our context here that means the value of the last expression in a code cell:
```python
"Hello world!"
```

In [None]:
# Your turn:


For more information about the `print()` function, we can access Python's built-in documentation and the Colab environment's autocomplete and intellisense functionality.
```python
print?
```

In [None]:
# Your turn:


### Comments
Comments are an important part of any program.  In Python, comments begin with `#`:

```python
# This is a comment
print("Hello world!")
```

and they can also be used to annotate lines of code:
```python
print("Hello world!")  # Comments are fine here too!
```

Use comments liberally, when you are learning and even when you think you know exactly what you're doing.  Your comments should allow readers of your code to understand _why_ your code does what it does.  Remember that a primary audience for your comments will be you yourself when you revisit your code in the future!

### Indentation and Blocks
White (blank) space -- specifically space at the beginning of lines -- is meaningful in Python programs, and defines groups of statements which constitute _blocks_, which will become important when we discuss control flow and functions.  For now, note that indentation must be consistent in a Python file -- the convention is to use four spaces.  Mixing up indentation levels will result in errors:

In [None]:
# Run this cell, and notice the error
# Notice too that the whole cell fails to run, not just the line with the error
print("this is fine")
  print("this is not")

---
---

## Variables and Types

In Python, all the "bits of information" of one sort or another in our programs can be referred to as _**value**s_.  Every _**value**_ has a _**type**_ (_**string**_, _**number**_, etc.), and _**variables**_ give us a way of naming and referring to those objects.


Python has lots of native datatypes, but the most important ones are:

* **Booleans** -- `True` or `False`
* **Numbers** -- primarily integers and floats: `3`, `3.14`
* **Strings** -- sequences of characters: `"abc"`
* **Lists** -- ordered sequences of values: `[ "no", 1, "expects" ]`
* **Tuples** -- like **list**s but immutable: `( "Graham", "John", "Terry", "Michael", "Eric", "Terry" )`
* **Dictionaries** -- collections of key-value pairs (kept in insertion order since Python 3.7): `{ "members": 6, "affiliates": [ "Douglas Adams", "Neil Innes" ] }`
* **Sets** -- unordered collections of unique values: `set([ "Graham", "John", "Terry", "Michael", "Eric" ])`


_**Assignment**_ is the operation of attaching an _**identifier**_ to a _**value**_, and in Python this is done with a single equals sign (`=`).
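
As a small illustration of values, types, and assignment, the sketch below attaches identifiers to a few values and asks Python for the type of each with `type()`.  The variable names are made up for this example.

```python
# each line attaches an identifier (on the left) to a value (on the right)
is_parrot_alive = False                     # a Boolean
number_of_pythons = 6                       # an integer
pi_approx = 3.14                            # a float
title = "Flying Circus"                     # a string
first_names = ["Graham", "John", "Terry"]   # a list
unique_names = {"Terry", "Eric", "Terry"}   # a set: the duplicate is dropped

print(type(is_parrot_alive))
print(type(number_of_pythons))
print(type(pi_approx))
print(type(title))
print(type(first_names))
print(len(unique_names))   # 2

# an identifier can be re-assigned to a new value at any time
number_of_pythons = number_of_pythons + 1
print(number_of_pythons)   # 7
```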


---

### Strings

Strings are sequences of characters -- what we tend to call "text".

```python
# A simple string assignment
greeting = "Welcome to Introduction to Python!"
print(greeting)
print(type(greeting))
```

In [None]:
# Your turn -- type the above code or a similar greeting of your own below:


Strings can be indicated by single (`'`) or double (`"`) quotes -- it makes no difference at all, and is merely a convenience.

In [None]:
fair_warning = 'We will be using examples from "Monty Python"...'
print(fair_warning)

"Escaping" (use of the backslash `\` before a character) can be used also, typically if we want single and double quotes in a string:

In [None]:
fair_warning = "We'll be using examples from \"Monty Python\"..."
print(fair_warning)

In [None]:
# Strings may also be indicated by the use of triple quotes (''' or """ may be used).
fair_warning = '''We'll be using examples from "Monty Python"...'''
print(fair_warning)

In [None]:
# This is especially useful if you'd like to have multi-line strings
cheesemakers = '''
Man: I think it was, "Blessed are the cheesemakers"!
Gregory's wife: What's so special about the cheesemakers?
Gregory: Well, obviously it's not meant to be taken literally. It refers to any manufacturer of dairy products.
'''
print(cheesemakers)

In [None]:
# Strings in Python 3 are fully unicode-compliant
monty_python_zh = '蒙提·派森的飛行馬戲團'
print(monty_python_zh)

There are many ways to programmatically construct and create strings, but the only one we'll note here is joining strings together using the addition operator `+`:

In [None]:
# we can join string literals
"Monty Python's" + " " + "Flying Circus"

In [None]:
# we can also join combinations of literals and variables
string_1 = "Monty Python"
string_2 = "Holy Grail"

joined = string_1 + " and the " + string_2

print(joined)

---

#### String Methods

**String**s in Python have some special behaviors attached to them (called **method**s) that perform common tasks.

Let’s look at case-manipulation, for example:

In [None]:
# Upper- and lower-case are straightforward:
print("Uppercase:   ", "monty python's flying circus".upper())
print("Lowercase:   ", "MONTY PYTHON'S FLYING CIRCUS".lower())

# Case-manipulation is unicode-aware
#  (but don't make the mistake of thinking this solves all problems!).
print("Uppercase:   ", "norsk blå papegøye".upper())  # "Norwegian Blue parrot" in Norwegian :)

# Capitalization is mostly trivial...
print("Capitalized: ", "monty python's flying circus".capitalize())

# ...but it takes a naïve approach to title-case.
print("Title Case:  ", "monty python's flying circus".title())

Strings also provide the `.replace()` method, which behaves as expected:

In [None]:
brian = "Brian is the Messiah!"
print(brian)

# He's not the Messiah...
brian = brian.replace("the Messiah", "a very naughty boy")
print(brian)

The `.strip()` and `.split()` methods are amongst the most commonly used, and are especially useful when dealing with data from external sources.

In [None]:
# .split() breaks a string into pieces, and returns those pieces as a list
input_string = "Monty Python and the Holy Grail"
input_string.split()

In [None]:
# .split() takes an argument which specifies the sequence to split on
knights_of_the_round_table = "Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, Sir Not-Appearing-in-this-Film"
knights_of_the_round_table.split(',')

In some of the previous examples we have had some extraneous white-space in our strings.  For the `knights_of_the_round_table` we could remove this by splitting on `", "` (try it!), but in “the wild” we commonly need to clean up the text that we read into our programs, and the `.strip()` method is a good first step:

In [None]:
# .strip() can be called on string literals as well as variables
"  ...and now for something completely different!  ".strip()

In [None]:
# .strip() can also take an argument which is a sequence of
#  characters to strip from the ends of the input
"  ...and now for something completely different!  ".strip(" .!")

In [None]:
# in our excerpt from the script for the Life of Brian, we had some
#  whitespace at the beginning and the end, in the form of (probably)
#  extraneous new-line characters -- as can be seen when we .split()

cheesemakers = '''
Man: I think it was, "Blessed are the cheesemakers"!
Gregory's wife: What's so special about the cheesemakers?
Gregory: Well, obviously it's not meant to be taken literally. It refers to any manufacturer of dairy products.
'''

cheesemakers.split("\n")

In [None]:
# a reminder of what our value looks like when we don't print() it
#  (notice the \n symbols which represent the newline character)
cheesemakers

In [None]:
# first we can remove these leading and trailing new-line characters:
print(cheesemakers.strip())

In [None]:
# and because cheesemakers.strip() returns another string, we can chain
#  these operations and do them both in one line:
cheesemakers.strip().split("\n")

Strings have many other useful methods.  The full list is available in the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), but it's often more convenient to use Google Colab’s autocompletion feature to see what’s available.  Hovering the mouse cursor over an identifier will give some pop-up help, too, and if that's not sufficient we can access Python's built-in documentation (known as `docstrings`) by appending a question mark (`?`) and executing the cell.

In [None]:
# for the autocompletion to work, a string must already be assigned to
# the variable identifier
my_string = "abc"

In [None]:
# now type "my_string." below
# select a method, and hover the mouse pointer over the method name for more information
# you can also try affixing a "?" and running the cell




---

#### Indexing

Indexing is one of the most important and recognizable features in Python.  It's very powerful, and Python programmers agree that it's very elegant and intuitive once you get the hang of it.  At the same time, it is often something that newcomers to Python find a little confusing.

> Python sequences are "zero-indexed" -- meaning that the first member of a sequence is found at index 0.  It can be helpful to think of the indices as offsets from the beginning of the sequence.

In [None]:
# to access ("to index") a single character from a string, we use square-bracket notation.
norwegian_blue = "This parrot is no more! It has ceased to be!"

norwegian_blue[1]

In [None]:
# we can also index using negative numbers
norwegian_blue[-1]

Slicing is the practice of using indexing notation to return a 'slice' of a sequence.  The basic pattern is `identifier[start:end:step]`.

In [None]:
# from the 6th character (at index 5) to the 11th (at index 10)
norwegian_blue[5:11]

The `start` and `end` values can be omitted, and the beginning or end of the string are implied, respectively:

In [None]:
alphabet = 'abcdefghijklmnopqrstuvwxyz'

print(alphabet[:10])               # from the beginning to the 10th character
print(alphabet[0:10])              # equivalent

print(alphabet[10:])               # from the 11th character (index 10) to the end
print(alphabet[10:len(alphabet)])  # equivalent

print(alphabet[:-10])              # from the beginning up to 10 characters from the end
print(alphabet[0:-10])             # equivalent

In [None]:
# indexing and slicing becomes more powerful when using variable indices
slice_length = 10
print("slice:     ", alphabet[:slice_length])  # the first `slice_length` characters
print("remainder: ", alphabet[slice_length:])  # the remainder (regardless of how long that is)

> Try changing the value of `slice_length` above and running the cell again.  Try out-of-bounds values, and negative values, and see how it behaves.
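
The `step` part of the slicing pattern isn't used in the cells above.  As a brief sketch (re-defining `alphabet` so that the example stands on its own), a step of `n` selects every n-th item, and a negative step walks backwards through the sequence:

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

print(alphabet[::2])     # every second character, starting at index 0
print(alphabet[1:10:3])  # indices 1, 4, and 7: 'beh'
print(alphabet[::-1])    # a negative step reverses the string
```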

---

#### String Formatting

There's lots more to learn about strings that we don't have time to cover here.  Probably the most important thing that's not covered here is string formatting using the `.format()` method, or even better, **f-strings**.

* For a nice quick introduction try this article: [Python String Formatting Best Practices](https://realpython.com/python-string-formatting/) (the flow chart at the end is good -- you should almost always be using methods #2 or #3).
* For a rather deeper dive, try: [A Guide to the Newer Python String Format Techniques](https://realpython.com/python-formatted-output/).
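
To give a flavor of what those articles cover, here is a minimal sketch of both approaches; the variable names are invented for this example.  An f-string (available from Python 3.6) is a string literal prefixed with `f`, in which expressions inside braces are replaced by their values.

```python
name = "Brian"
weapons = 3

# .format() fills the {} placeholders in order
with_format = "{} has {} weapons".format(name, weapons)
print(with_format)

# an f-string evaluates the expressions inside the braces
with_fstring = f"{name} has {weapons} weapons"
print(with_fstring)

# any expression is allowed inside the braces, including method calls
print(f"{name.upper()} has {weapons + 1} weapons")
```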

---

### Lists

**List**s are one of the most important data-structures available in Python.  They can be created in a number of different ways.

In [None]:
# literal lists are built with square-bracket notation
pythons = ['Graham', 'John', 'Terry', 'Eric', 'Terry', 'Michael']
print(pythons)

Above we saw how **list**s are returned by the `.split()` method of a **string**.

In [None]:
# a list created by splitting a string
knights_of_the_round_table = 'Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, Sir Not-Appearing-in-this-Film'
knights_of_the_round_table.split(', ')

A third way is to create an empty **list** and use the `.append()` method that exists on all lists:

In [None]:
# we can create an empty list with the following syntax
menu = []

# and then add items to the list with the .append() method
menu.append('egg')
menu.append('bacon')
menu.append('sausage')
menu.append('spam')

print(menu)

This example highlights one of the features that make **list**s so versatile and useful -- they can be added to once they have been created.  Building **list**s programmatically is one of the most common types of task you are likely to find yourself tackling in Python.

This feature of **list**s is called _**mutability**_, and it includes the ability to modify entries in the list in-place:

In [None]:
print("before: ", menu)

menu[0] = "spam"
menu[1] = "spam"
menu[2] = "spam"

print("after:  ", menu)

If `.split()` allows us to turn **string**s into **list**s, `.join()` allows us to turn **list**s into **string**s.

In [None]:
# joining our list literal from above
pythons = ['Graham', 'John', 'Terry', 'Eric', 'Terry', 'Michael']
print(", ".join(pythons))

In [None]:
# join the menu with the word "and"
print(" and ".join(menu))

In [None]:
# joining on the newline character is often useful, too
pythons = ['Graham', 'John', 'Terry', 'Eric', 'Terry', 'Michael']
print("\n".join(pythons))

In [None]:
# splitting and re-joining is a common operation
print(knights_of_the_round_table)
print("-------")
print("\n".join(knights_of_the_round_table.split(", ")))

**List**s can be indexed just like strings:

In [None]:
print("pythons:     ", pythons)
print("pythons[0]:  ", pythons[0])
print("pythons[-1]: ", pythons[-1])
print("pythons[:3]: ", pythons[:3])
print("pythons[3:]: ", pythons[3:])

---

### Dictionaries

A dictionary in Python is a collection of `key`/`value` pairs -- each pair together is known as an `item`.  Values can be any kind of value you like -- strings, numbers, lists, more dictionaries, etc. -- but keys have certain restrictions.  For the majority of purposes you should be using strings as your dictionary keys.

In [None]:
# literal dictionaries are created using brace notation:
pythons_by_family_name = {
    "Chapman": "Graham Chapman",
    "Cleese": "John Cleese",
    "Gilliam": "Terry Gilliam",
    "Idle": "Eric Idle",
    "Jones": "Terry Jones",
    "Palin": "Michael Palin"
}
pythons_by_family_name

In [None]:
# dictionary keys must be unique in the dictionary, so note that while
#  this is legal...
pythons_by_first_name = {
    "Graham": "Graham Chapman",
    "John": "John Cleese",
    "Terry": "Terry Gilliam",
    "Eric": "Eric Idle",
    "Terry": "Terry Jones",
    "Michael": "Michael Palin"
}
# the second appearance of the duplicate key has over-written the first
pythons_by_first_name

In [None]:
# dictionary values are accessed using square-bracket notation -- a little
#  bit like indexing for sequence types

cheeseshop = {
    "Red Leicester": "I'm afraid we're fresh out of Red Leicester sir.",
    "Tilsit": "Never at the end of the week, sir. Always get it fresh first thing on Monday.",
    "Caerphilly": "Ah well, it's been on order for two weeks, sir. I was expecting it this morning.",
    "Bel Paese": "Sorry.",
    "Red Windsor": "Normally, sir, yes, but today the van broke down.",
    "Stilton": "Sorry."
}

print("Do you have any Tilsit?", cheeseshop["Tilsit"], sep="\n")


In [None]:
# attempting to access a key that doesn't exist raises a `KeyError` exception
cheeseshop["Venezuelan Beaver Cheese"]

We're not going to work much more with dictionaries in this workshop, but they are one of the most important weapons in the Python programmer's arsenal and there's much more to be learned about them.
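
As a taste of what else dictionaries can do, the sketch below uses a shortened, stand-alone dictionary (named `small_cheeseshop` here, a name chosen for this example so that it doesn't overwrite `cheeseshop`) to add an item, test for a key with `in`, and look up a key with `.get()`, which returns a default instead of raising a `KeyError`.

```python
small_cheeseshop = {
    "Bel Paese": "Sorry.",
    "Stilton": "Sorry.",
}

# assigning to a new key adds an item
#  (assigning to an existing key would replace its value)
small_cheeseshop["Cheddar"] = "Sorry."

# the `in` operator tests whether a key is present
print("Stilton" in small_cheeseshop)
print("Venezuelan Beaver Cheese" in small_cheeseshop)

# .get() returns its second argument if the key is missing
answer = small_cheeseshop.get("Venezuelan Beaver Cheese", "No.")
print(answer)

# len() counts the items in the dictionary
print(len(small_cheeseshop))
```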

---
---

## Functions

**Function**s are blocks of reusable code.  They provide a way to break monolithic programs into parts; this aids greatly in writing code that is readable, maintainable, and extensible.  It is especially important, of course, when a block of code will or may be used multiple times in the same or similar ways, but a section of code does not need to be used multiple times to justify creating it as a function.

---

### Built-In Functions

We've already seen a number of **function**s in what we've done so far -- `print()`, of course, but there's also `len()` and `type()`.  When functions are bound to objects -- as with `.format()`, `.lower()`, etc. on string objects, or `.append()` on list objects -- we call them **method**s, but the difference is minimal and need not concern us at this point.

The functions (and methods) we've seen so far are “built-in” functions -- they're provided to us by the Python interpreter.  The important thing we need to learn in this section is how to _define_ and _call_ (or _invoke_) our own functions.
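
The difference in how functions and methods are written can be seen in a short sketch (the variable name is just for illustration): a function is called with the value as an argument, while a method is called on the value using dot notation.

```python
phrase = "Nobody expects the Spanish Inquisition"

# built-in functions take the value as an argument
print(len(phrase))
print(type(phrase))

# methods are called on the value itself, using a dot
print(phrase.lower())
print(phrase.split())
```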

---

### `def`ining Your Own Functions

User-Defined Functions, as they are sometimes called, can be created at any time in Python code.  Notice that defining a function in this way creates no output.

In [None]:
# functions are introduced with the "def" keyword (note the final colon)
def knights_who_say_ni():
    # the block of indented statements beneath the definition statement
    #  constitutes the body of the function
    print("Ni!")

In [None]:
# when defined in this way, they can be called just like built-ins:
knights_who_say_ni()

In [None]:
# of course, functions are more useful when they take arguments
def ask_for_cheese(cheese):
    print("Do you have any " + cheese + "?")
    print(cheeseshop[cheese])
    print()

In [None]:
# arguments (or parameters) are passed when the function is invoked:
ask_for_cheese("Red Leicester")

In [None]:
# the benefit of a function like this, of course, is when it will be called many times:
ask_for_cheese("Red Leicester")
ask_for_cheese("Tilsit")
ask_for_cheese("Caerphilly")
ask_for_cheese("Bel Paese")
ask_for_cheese("Red Windsor")
ask_for_cheese("Stilton")

In [None]:
# ...
ask_for_cheese("Venezuelan Beaver Cheese")

In [None]:
# functions can take as many arguments as you need
def greeting(first_name, last_name):
    print("Good evening, my name is " + first_name + " " + last_name + ".")

greeting("Michael", "Palin")

In [None]:
# and they may be called with "keyword arguments" -- this can
#  help readability, and allows passing the arguments in any order
greeting(first_name="Michael", last_name="Palin")
greeting(last_name="Palin", first_name="Michael")

Unless otherwise specified with a `return` statement, functions do not return a value (which is to say, their return value is `None`).

In [None]:
value = greeting("Eric", "Idle")
print(value)

In [None]:
# to make a function return a value, use an explicit return statement
def greeting(first_name, last_name):
    return "Good evening, my name is " + first_name + " " + last_name + "."

value = greeting("Graham", "Chapman")
print(value)

---

### Activity: Write a Pig-Latinizer Function

Pig Latin is a language game where you take the first letter of a word, move it to the end of the word, then add “-ay” at the end. For example, “pig latin” would be “igpay atinlay” and “monty python” would turn into “ontymay ythonpay”.

In the cell below, write a function that takes a string, lowercases it, and returns the Pig Latin translation of the word. You'll need to use slicing and string concatenation or formatting to make this work.

In [None]:
def pig_latinize(word):
    # write your code here -- don't forget to return a result!


    
# the following should return 'pamalotsay'
pig_latinize('Spamalot')

In [None]:
#@title → Double-click Here to Show/Hide Hints

# 1. use word.lower() to lowercase the input
# 2. you'll need to use the indexing techniques to get the first letter
#    of the word, and everything-but-the-first-letter of the word separately
# 3. once you have the pieces you need, stick them together to make your output


In [None]:
#@title → Double-click Here to Show/Hide a Prepared Solution

def pig_latinize(word):
    word = word.lower()
    return word[1:] + word[0] + 'ay'


# the following should return 'pamalotsay'
pig_latinize('Spamalot')

---
---

## Control Flow

Using functions to split up our monolithic code into manageable blocks also introduces our first element of **_control flow_** -- the ability to have code execute (or not execute) in a non-linear order (i.e. not simply from top to bottom, line by line).

In addition to functions, there are two main control flow mechanisms that we will look at in this section: **_conditional execution_** or **_branching_** causes blocks of code to be executed or ignored based on the evaluation of a boolean expression; and **_loop_**s allow us to specify the execution of blocks of code a fixed or indeterminate number of times.

Raising and handling **_exception_**s is another way to control the flow of execution in a Python program, but we're not going to cover that here.

---

### Conditional Execution (`if` statements)

In Python, the compound `if` statement is made up of up to three kinds of clause: `if`, `elif`, and `else`.  Each of these clauses defines a block which executes if the appropriate condition is met.

In [None]:
# if statements test a condition, and execute a block of code accordingly
if 2 + 2 == 4:
    print('Of course, two plus two equals equals four [sic].')

In [None]:
# We can also add a block of code in an "else" clause
test_word = "shrubbery"

if len(test_word) > 8:
    print("that's a long word!")
else:
    print("that's a short word!")

In [None]:
# and there's an elif clause if you need more than two branches
test_word = 5

if type(test_word) != str:
    print("that's not a word at all!")
elif len(test_word) > 8:
    print("that's a long word!")
else:
    print("that's a short word!")
    

In [None]:
# if statements can have as many elif clauses as needed
def spanish_inquisition(weapons):
    if weapons == 1:
        print("surprise")
    elif weapons == 2:
        print("fear and surprise")
    elif weapons == 3:
        print("fear and surprise and ruthless efficiency")
    elif weapons == 4:
        print("fear and surprise and ruthless efficiency and an almost fanatical devotion to the Pope")
        
spanish_inquisition(3)

---

### Loops and Iteration (`for` loops)

There are two basic kinds of loops in Python: those introduced with the `for` statement are sometimes called _iterative loops_, and we'll look at those below.  Python also has loops that can be introduced with the `while` statement, and these are sometimes referred to as _conditional loops_.  `while` loops are less common in Python, and we won't cover them in this workshop.

`for` loops repeat the execution of a code block dependent on an iterable expression. For our purposes an iterable may be understood to refer to anything that can be looped over in a `for` loop -- in particular, this includes anything that is a sequence type: a list, a tuple, or a string.

In [None]:
# the syntax for a for-loop looks like this:
#  (as with function definition, note the colon and the indented block)
for python in pythons:
    print(python)

In [None]:
# we can also iterate over the keys of a dictionary
for cheese in cheeseshop:
    print(cheese)

In [None]:
# this becomes more and more useful as we do more in the body of the loop
#  -- here we can re-use the ask_for_cheese() function from above
for cheese in cheeseshop:
    ask_for_cheese(cheese)
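
The cells above loop over a list and a dictionary.  As a quick sketch of the other sequence types mentioned earlier (the names here are chosen for this example), strings and tuples can be looped over in exactly the same way:

```python
# a string is a sequence of characters
letters = []
for letter in "Ni!":
    letters.append(letter)
print(letters)

# a tuple is looped over just like a list
for name in ("Graham", "John"):
    print(name)
```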

### Additional Topics in Control Flow

There is lots more to learn about flow of execution in Python that we've not had time to cover here.  Among the more important topics you might want to look into are:

* comparison operators beyond `==`, `>`, `<`
* logical operators such as `and`, `or`, and `not`
* `while` loops, sometimes known as **conditional loops**
* the `break` and `continue` statements, and other strategies for exiting loops early (including recursive functions and methods)
* `try: ... except:` blocks and raising and handling `exceptions`
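
As a rough preview of how some of these look in practice, here is a small sketch; the values and names are invented for the example.

```python
items = ["egg", "bacon", "", "spam", "sausage"]

orders = []
for item in items:
    if item == "":
        continue    # skip the rest of this pass and move on to the next item
    if item == "spam":
        break       # leave the loop altogether
    # logical operators combine boolean expressions
    if len(item) >= 3 and not item.startswith("s"):
        orders.append(item)
print(orders)

# a while loop repeats for as long as its condition is true
countdown = 3
while countdown > 0:
    print(countdown)
    countdown = countdown - 1

# try/except handles an exception instead of stopping the program
try:
    number = int("five")
except ValueError:
    number = None
    print("that's not a number!")
```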

---

### Activity: Pig-Latinize a sentence

In the cell below, write a function that can pig-latinize an entire phrase or sentence.

For example, if your input is:
> `"A 5 ounce bird could not carry a 1 pound coconut"`

the output should be:
> `"aay 5 unceoay irdbay ouldcay otnay arrycay aay 1 oundpay oconutcay"`


* You should reuse the `pig_latinize` function you wrote above.
* You'll need to use a loop to operate on each word in turn.
* Use the `.isnumeric()` method on each word to test for digits.
* Use a conditional structure (`if`/`else`) to execute a different behavior depending on the result of the test.
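
For reference, `.isnumeric()` returns `True` only when the string is non-empty and every character in it is numeric, so it behaves like this (a quick sketch):

```python
print("5".isnumeric())      # True
print("ounce".isnumeric())  # False
print("5oz".isnumeric())    # False
```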



In [None]:
def pig_latinize_sentence(sentence):
    # Your code goes here

    
    
pig_latinize_sentence("A 5 ounce bird could not carry a 1 pound coconut")

In [None]:
#@title → Double-Click Here to Show Hints


# You'll need to begin by splitting the sentence into a list of words

# Create an empty list to `.append()` the pig-latinized words to

# For each word in the list, test whether it's numeric or not
#  -- if it is, just append it to the list unaltered

# Join the list back together to create the output sentence

# Don't forget to `return` the output!


In [None]:
#@title → Double-Click Here to Reveal a Prepared Solution

def pig_latinize_sentence(sentence):

    words = sentence.split()
    latinized_words = []
    for word in words:
        if word.isnumeric():
            latinized_words.append(word)
        else:
            latinized_words.append(pig_latinize(word))
    latinized_sentence = " ".join(latinized_words)

    return latinized_sentence

pig_latinize_sentence("A 5 ounce bird could not carry a 1 pound coconut")

---
---

## Reading and Writing Text To/From a File

We cover working with CSV and other kinds of data files in some of our other CIDR workshops.  If we have time in this workshop, though, we'll finish by working together as a group to apply our `pig_latinize` function to an entire file.

To do so, we'll begin by looking at how to write and read simple files.  We'll use the text of Eric Idle's "Galaxy Song".

In [None]:
galaxy_song = '''
[spoken]
Whenever life gets you down, Mrs. Brown,
And things seem hard or tough,
And people are stupid, obnoxious or daft,

[sung]
And you feel that you've had quite eno-o-o-o-o-ough,

Just remember that you're standing on a planet that's evolving
And revolving at 900 miles an hour.
It's orbiting at 19 miles a second, so it's reckoned,
The sun that is the source of all our power.
Now the sun, and you and me, and all the stars that we can see,
Are moving at a million miles a day,
In the outer spiral arm, at 40,000 miles an hour,
Of a galaxy we call the Milky Way.

Our galaxy itself contains a hundred billion stars;
It's a hundred thousand light-years side to side;
It bulges in the middle sixteen thousand light-years thick,
But out by us it's just three thousand light-years wide.
We're thirty thousand light-years from Galactic Central Point,
We go 'round every two hundred million years;
And our galaxy itself is one of millions of billions
In this amazing and expanding universe.

[waltz]

Our universe itself keeps on expanding and expanding,
In all of the directions it can whiz;
As fast as it can go, at the speed of light, you know,
Twelve million miles a minute and that's the fastest speed there is.
So remember, when you're feeling very small and insecure,
How amazingly unlikely is your birth;
And pray that there's intelligent life somewhere out in space,
'Cause there's bugger all down here on Earth!
'''

When working with files in Python, we need to:
* open the file and save a reference to it (a "file handle")
* write our content to the file
* close the file

This could look something like this:
```python
file_handle = open("galaxy_song.txt", "w")
file_handle.write(galaxy_song)
file_handle.close()
```
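
One catch with this manual version is that if `write()` raises an error, `close()` is never reached. Below is a minimal sketch of a safer manual version. It assumes a throwaway file (`demo.txt` in a temporary directory, both hypothetical) so that the workshop's own file is left alone.

```python
import os
import tempfile

# Hypothetical throwaway location, so galaxy_song.txt is not touched
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")

file_handle = open(demo_path, "w")
try:
    file_handle.write("Just remember that you're standing on a planet\n")
finally:
    # This runs whether or not write() raised an error
    file_handle.close()

print(file_handle.closed)
```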

Remembering to close the file every time is cumbersome and easy to get wrong, and so the common idiom in Python looks like this:

In [None]:
# using a "with" block means we don't have to worry about closing
#  the file properly ourselves
with open("galaxy_song.txt", "w") as file_handle:
    file_handle.write(galaxy_song)

The "Files" tab in the left-hand sidebar should now reveal the new file we've created.  You may need to click the "Refresh" button to see it.

In [None]:
# Now, let's read the file back.
with open("galaxy_song.txt", "r") as file_handle:
    file_contents = file_handle.read()
    
print(file_contents)

In [None]:
# We can also read the file line by line
with open("galaxy_song.txt", "r") as file_handle:
    for line in file_handle:
        line = " == " + line
        print(line, end="")
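
The `end=""` in the cell above is there because each line read from a file still carries its trailing newline character. Here is a small sketch of this, using a throwaway two-line file; the file name and contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical throwaway file, so galaxy_song.txt is not touched
demo_path = os.path.join(tempfile.mkdtemp(), "two_lines.txt")

with open(demo_path, "w") as file_handle:
    file_handle.write("first line\nsecond line\n")

raw_lines = []
stripped_lines = []
with open(demo_path, "r") as file_handle:
    for line in file_handle:
        # keep one copy as read, and one with the newline removed
        raw_lines.append(line)
        stripped_lines.append(line.rstrip("\n"))

print(raw_lines)
print(stripped_lines)
```

Printing a raw line without `end=""` would therefore leave a blank line after every line of text.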

### Group Activity: Pig-Latinize a File

Our goal is to read in the contents of our `galaxy_song.txt` file, Pig-Latinize it, and write the translated version out to a new file.  

Here are some hints:

* it will probably be easiest to go through the song line-by-line, and then word-by-word
* for each line, build a new list of pig-latinized words, and then save that line, before moving on to the next
* the direction lines (in square brackets) should be left as they are


In [None]:
# (Y)Our code here:
# (there's no need to write this as a function -- unless you want to)





In [None]:
#@title → Double-Click For Step-by-Step Hints

#  1. Create an empty list called "latinized_lines" to store each line after it
#     has been pig-latinized.
#  2. Open the file using a "with" block, getting a reference to a file_handle
#     object.
#  3. Inside the "with" block, loop through the file line-by-line.
#  4. For each line, begin by stripping the newline character from the end.
#  5. If the line begins with an opening square bracket, consider it a
#     stage-direction and add it to the "latinized_lines" list as it is.
#  6. Otherwise, split the line into a list of words
#  7. Create an empty list called "latinized_words" to store each word after it
#     has been pig-latinized.
#  8. For each word, pig-latinize it by calling the pig_latinize() function we
#     created above.
#  9. Add the pig-latinized version of the word to the "latinized_words" list.
# 10. Once every word has been processed, exit the for-each-word loop and join
#     the "latinized_words" list back together into a string that represents
#     the whole line.
# 11. Add this pig-latinized line to the "latinized_lines" list.
# 12. Once every line has been processed, exit the for-each-line loop and the
#     "with" block, and join the "latinized_lines" list back together to create
#     a string that comprises the entire text.
# 13. Open a new file for writing using a new "with" block.
# 14. Inside the "with" block, write the string to the file.
# 15. Profit?


In [None]:
#@title → Double-Click Here to Reveal a Prepared Solution

# there are lots of ways to achieve this task: here is a
#  simple solution using only the techniques we've covered
#  in this workshop.


# create an empty list to .append() our latinized lines to
latinized_lines = []

# open the file for "r"eading
with open("galaxy_song.txt", "r") as file_handle:
    
    # loop through the contents of the file one line at a time
    for line in file_handle:
        
        # remove the newline character at the end of every line
        line = line.strip("\n")        
        
        # lines with stage directions should just be added to our
        #  output unaltered
        if line.startswith("["):
            latinized_lines.append(line)
        
        # otherwise...
        else:           
            # split the line into individual "words"
            words = line.split()
            
            # create a new empty list to .append() latinized words to
            # (a new list just for the contents of *this current line*)
            latinized_words = []

            # loop through the words
            for word in words:
                
                # latinize the word...
                latinized_word = pig_latinize(word)
                
                # ...and then append it to the list 
                latinized_words.append(latinized_word)

            # now that we've latinized all the words (we've completed 
            #  and exited the loop), we need to join the words back
            #  together into a line
            latinized_line = " ".join(latinized_words)

            # we can capitalize the first letter of each line
            #  (note that .capitalize() also lowercases the rest of the line)
            latinized_line = latinized_line.capitalize()
            
            # and then we can add the newly created line to our list of lines
            latinized_lines.append(latinized_line)

# finally, now that we've got all the lines, we join them back together
#  into a new block of text, by joining on the newline character "\n"
latinized_text = "\n".join(latinized_lines)

# and then just write it out to our new file!
with open("galaxy_song_latinized.txt", "w") as file_handle:
    file_handle.write(latinized_text)


print("Look for galaxy_song_latinized.txt in the Files panel on the left!")

In [None]:
#@title → Double-Click Here For an Alternative Solution

# For a point of comparison, here's the exact same algorithmic operation coded 
#  in a much terser fashion.  If you spend a lot of time writing Python code this
#  is surprisingly readable -- but even then, I think you could make a strong case
#  that the longer, more spelled-out version is better in most respects.

latinized_galaxy_song = "\n".join(
    " ".join(pig_latinize(word) for word in line.split()).capitalize()
    if not line.startswith("[") else line.strip()
    for line in open("galaxy_song.txt", "r")
)

print(latinized_galaxy_song)
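
The terse version above opens the file without a `with` block, so closing the file is left to Python. Below is a variant sketch that keeps the `with` block and builds the lines with a list comprehension. To stay self-contained it uses a stand-in `simple_pig_latin()` function (hypothetical; the workshop's own `pig_latinize()` may behave differently) and a throwaway two-line file.

```python
import os
import tempfile


def simple_pig_latin(word):
    # Stand-in for the workshop's pig_latinize(); assumed behaviour:
    #  move the first letter to the end and add "ay"
    return word[1:] + word[0] + "ay"


# Hypothetical throwaway file, so galaxy_song.txt is not touched
demo_path = os.path.join(tempfile.mkdtemp(), "demo_song.txt")
with open(demo_path, "w") as file_handle:
    file_handle.write("[sung]\nhello big world\n")

# The file is closed as soon as the "with" block ends
with open(demo_path, "r") as file_handle:
    latinized_lines = [
        line.strip()
        if line.startswith("[")
        else " ".join(simple_pig_latin(word) for word in line.split()).capitalize()
        for line in file_handle
    ]

latinized_demo = "\n".join(latinized_lines)
print(latinized_demo)
```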

---

A couple of obviously desirable improvements that could be made to this include better handling of punctuation and numbers expressed in digits.  Both of these would probably be best handled by making the `pig_latinize()` function a bit more sophisticated.  This is a bit out-of-scope for our workshop today, but below is one approach for those who would like to go a little further.  Let us know if you'd like to know more about these kinds of more advanced techniques and packages in future workshops!


In [None]:
# inflect is a third-party package that happens to be installed by
#  default in Google Colaboratory environments, so we'll use it to
#  convert the digits into words, and then we'll pig-latinize those
#  words -- e.g. 19 -> "ineteennay"
import inflect 

inf = inflect.engine()

def pig_latinize_2(word):
    # This bit deals with the digits by converting them into words and
    #  then pig-latinizing the words.
    # Note that since converting digits to words might mean turning one "word" into 
    #  more than one (e.g. 900 -> "nine hundred"), the function calls itself again
    #  on each of the new words -- this is what is called "recursion".
    if any(char.isnumeric() for char in word):
        return " ".join(
            pig_latinize_2(new_word)
            for new_word in inf.number_to_words(word).split()
        )

    word = word.lower()

    # This bit handles getting the punctuation in the right place.
    # It's pretty naïve -- it only works when there's a single punctuation
    #  character in the final position (which is fine for our text).
    # I'm not sure what the canonical way to deal with apostrophized 
    #  contractions and ellipsis is in Pig Latin!
    if word.endswith(tuple(",.;!")):
        return word[1:-1] + word[0] + 'ay' + word[-1]

    return word[1:] + word[0] + 'ay'


latinized_galaxy_song = "\n".join(
    " ".join(pig_latinize_2(word) for word in line.split()).capitalize()
    if not line.startswith("[") else line.strip()
    for line in open("galaxy_song.txt", "r")
)

print(latinized_galaxy_song)
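
For those who want to push the punctuation handling a little further, here is one possible sketch using only the standard library. The function name `pig_latinize_3` is hypothetical, and the sketch assumes that punctuation at either end of a word should stay where it is while punctuation inside the word (as in `you're`) is left untouched. It does not deal with digits.

```python
import string


def pig_latinize_3(word):
    # Peel off any punctuation at the start of the word...
    without_leading = word.lstrip(string.punctuation)
    leading = word[:len(word) - len(without_leading)]

    # ...and any punctuation at the end
    core = without_leading.rstrip(string.punctuation)
    trailing = without_leading[len(core):]

    # A "word" made only of punctuation is returned unchanged
    if not core:
        return word

    core = core.lower()
    return leading + core[1:] + core[0] + "ay" + trailing


for sample in ["stars;", "'Cause", "Earth!", "you're", "..."]:
    print(sample, "->", pig_latinize_3(sample))
```

Whether an apostrophe at the start of a word such as `'Cause` should really stay at the front is a judgment call; this sketch simply keeps it in place.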

# Additional Resources and Topics

## Resources
- https://python.swaroopch.com/ (A Byte of Python is a great intro book and reference for Python)
- https://docs.python.org/3/ (Official Python documentation and tutorials)
- https://realpython.com/ (Contains a lot of different tutorials at different levels)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) (Simon's personal favourite)
- LinkedIn Learning (formerly Lynda.com): https://www.linkedin.com/learning/topics/python (LinkedIn Learning is available for free to those with Stanford accounts.)

## Further Topics
- Other data structures: in particular, `sets` and `tuples`
- Comprehensions
- Modules, libraries, packages, and `pip`
- Writing `.py` scripts, and using IDEs
- Virtual environments
- The object-oriented paradigm in Python: classes, methods

Thank you!