# Python Basics, Part 1

In our first session this morning, we want to get you familiar with how to use `JupyterLab`, along with the basics of `Python` syntax and the data types we use in `Python`: numbers, strings, lists, and dictionaries. If you have used `Python` before, some of this material will be review, but you'll probably see some new stuff too!

## -1. What is Python?

`Python` is a flexible and widely used programming language in data science and software engineering. Unlike many programming languages, one of the central tenets of the design (and writing) of `Python` is that `Python` code should be _easy to read_. Python is frequently used in a wide range of applications, from web development to machine learning.

Why do people like to use `Python`? `Python` is:
* Fast,
* Modular,
* Object-oriented,
* Extensible (i.e., there are _lots_ of libraries),
* Easy to read,
* Most importantly: gets the job done.

`Python` is the name of a programming language. But different people often interface with `Python` in a number of different ways:
1. Running interactive commands with the `python` interpreter. This is done by typing in `python` (or `python3`, if you have `Python 2` and `Python 3` installed on your computer) at the command line.

<img src=img/ss_python_interp.png width=500>
2. Python development in some kind of text editor or integrated development environment (IDE).

<img src=img/ss_python_spyder.png width=500>
3. Research-related scripting with heavy documentation and snippets of code. This is especially common in data wrangling.

<img src=img/ss_python_notebook.png width=500>
    
Today, we'll be focusing on the third approach, namely, using `Python` in `Jupyter` notebooks for research, but it's worth exploring all approaches!

## 0. Using `Jupyter` Notebooks

`Jupyter` notebooks are an excellent tool for integrating your code, notes and explanations, and results all in one place. `Jupyter` notebooks are reproducible, and allow you to capture not only _what_ you did in your analysis, but also _why_ you did the analysis. We'll be working with them through the `JupyterLab` interface.

### 0.0. Launching `JupyterLab`

Getting a `Jupyter` notebook up and running is easy. Simply open up your favorite terminal app, and type `jupyter lab`. `Jupyter` will print some information about the server (which you can ignore for now), and then a webpage should open in your browser that looks like the second picture below.

<img src="img/ss_jupyter_launch.png" width=500>

<img src="img/ss_jupyter_main_screen.png" width=500>

(Don't worry if things look slightly different locally---the appearance of your terminal and the `JupyterLab` interface will change slightly from computer to computer.)

#### 0.0.0. Exercise

Open `JupyterLab` on your computer.

### 0.1. Navigation

Navigating around `JupyterLab` is relatively easy. The screen is broken into three pieces:

1. **The Menu Bar:** Similar to a program like Microsoft Word, many important actions are performed through the menu bar located at the top of the screen. To modify, save, or close files, or navigate through directories, click, e.g., `File > Save`. To alter the appearance of the `JupyterLab` interface, click, e.g., `View > Show line numbers`.
2. **The Side Bar:** The left sidebar is home to a number of commonly used tabs, the names of which you can display by mousing over them. We will primarily work with the topmost tab, which contains the file browser. Other tabs allow you to manage kernel and terminal sessions, issue commands, or control open tabs.
3. **The Main Panel:** This is the main surface for programming and other activities. The main panel is broken up into tabs, which can be dragged, dropped, and resized.

<img src="img/ss_screen_example.png" width=500>

In the picture above, the main panel is overlaid in blue, the sidebar is overlaid in red, and the menu bar is overlaid in green.

To launch a notebook, simply navigate to the folder containing the notebook, and double-click it.

#### 0.1.0. Exercise

Open `python_basics_part_1.ipynb` to start the first lesson. You should now be able to see locally the instructions that I've been projecting onto the main screen.

### 0.2. Markdown and Code Cells

As we mentioned earlier, `Jupyter` notebooks are a valuable tool because they allow you to interweave explanatory prose for humans to read with code for your computer to run. The way this is accomplished is with "cells." Cells come in two types: "text" or "markdown" cells, where your notes and explanations live; and "code" cells, which will contain your `Python` code.

Cells can be created by clicking on the "+" icon located at the top of the current pane.

<img src=img/ss_add_cell.png width=500>

#### 0.2.0. Exercise

Try adding a cell to this notebook with the "+" icon. To remove it, click on the scissors icon next to it. (This actually "cuts" the cell, so it will still be present on your clipboard.)

#### 0.2.1. Markdown Cells

Text cells are formatted using a lightweight markup language called `markdown`. What this means is that you can easily style the text.

* For *italics*, write `*text*` or `_text_`.
* For **bold** text, write `**text**` or `__text__`.
* To add bulleted items, simply add an asterisk to the beginning of a line, e.g., `* text text text`.
* To start a numbered list, begin the line with a number and a period, e.g., `1. text text text`.
* To insert `code` directly into a text block, simply place backticks around the code, e.g., `` `code` ``.

While these basics are enough for our purposes, much, much more is possible in `markdown`: you can add headers, footnotes, and even directly insert HTML. For a more thorough introduction, click [here](https://www.markdownguide.org/getting-started).

#### 0.2.2. Code Cells

The bread and butter of `Jupyter` notebooks are code cells, which we'll be working with at length below. Code cells have two parts: an input box, where you write the `Python` code, and an output box.

<img src="img/ss_code_cell.png" width=500>

Here, the input is highlighted in red, and the output is highlighted in blue.

#### 0.2.3. Exercise

Create a cell, and then change it from a code cell to a markdown cell and back again. Use the dropdown menu located in the toolbar:

<img src="img/ss_change_cell.png" width=500>

### 0.3. Saving and switching between notebooks

Let's say you've added a lot of code and text to a notebook---what should you do to "render," or complete the notebook? Click `Run > Run all cells`. This will output nicely formatted text in all of the text boxes, and run all of the code in the code cells. You can also click the play button in the current pane.

To save the notebook output, go to `File > Save`.

### 0.4. How to get help <a id='help'></a>

Even seasoned software engineers frequently come across functions they don't know how to use or code they don't understand. One of the best places to look for help is at the official `Python` documentation. To bring it up, simply click `Help > Python Reference` in the menu bar. Usually, you can find the function (or "module," `Python`'s special word for libraries) just by searching in the search box!

#### 0.4.1. Exercise

Python also has a built-in help function: `help()`. It can be a little intimidating at first, but it's one of the best ways to get help on whatever you're working with. Practice calling the `help()` function by calling `help()` on, well, itself: `help(help)`.

In [1]:
### START
help(help)
### END

Help on _Helper in module _sitebuiltins object:

class _Helper(builtins.object)
 |  Define the builtin 'help'.
 |  
 |  This is a wrapper around pydoc.help that provides a helpful message
 |  when 'help' is typed at the Python interactive prompt.
 |  
 |  Calling help() at the Python prompt starts an interactive help session.
 |  Calling help(thing) prints help for the python object 'thing'.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, *args, **kwds)
 |      Call self as a function.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



## 1. Warmup: Approximating Pi

Let's begin with a warmup: using `Python` to approximate $\pi \approx 3.14159\ldots$.

There's a smart aleck way to do this: `Python` already stores the constant `pi` in the `math` module:

In [2]:
import math

math.pi

3.141592653589793

(Don't worry if none of that makes sense yet. It all will by the end of the day today!)

Let's instead pretend that we need a way of calculating $\pi$ on our own. From scratch.

How might we go about doing that? Well, one thing that we definitely know about $\pi$ is the classic formula for the area of a circle: $A_\circ = \pi \cdot r^2$. The area of a square is even easier: $A_\square = h^2$. Let's look at a picture:

<img src=img/ss_circ_sqr.png width=500>

Clearly the blue circle has area $\pi\ \text{units}^2$ and the red square has area $4\ \text{units}^2$. What's behind the shapes?

<img src=img/ss_dart_board.png width=500>

A dart board! What's interesting about this dartboard is the probabilities of hitting it. Let's imagine we're playing darts and our aim is ok, but not very good. We can always land in the red square, but within the red square we're as likely to hit one spot as another.

#### 1.0. Exercise
What percentage of our darts should we expect even to hit the board?

In [3]:
# Type your answer here!

So if we just had a dartboard instead of this pesky `JupyterLab` notebook, approximating $\pi$ would be a breeze! We'd throw lots of darts, count the fraction that hit the board instead of missing, and then multiply by four.

Fortunately, this strategy---frequently called _Monte Carlo simulation_---works in `Python` too!

Let's begin by generating some dart throws. A dart throw needs an $x$ and a $y$ coordinate. Our darts always land in the box, so we need each dart's coordinates to be between $-1$ and $1$; but other than that, they should be completely random.

Python has a whole bunch of functions for doing random things all ready for us. One of these, `uniform(a,b)`, will give us a randomly generated number between `a` and `b`.

#### 1.1. Exercise
How could we use this to generate the $x$ coordinate of a dart throw? Try it out in the box below:

In [4]:
from random import uniform

# START
uniform(-1,1)
# END

0.9232179826742455

Cool! But that's only one coordinate---what about the $y$ coordinate? Remember, we need the point $(x,y)$ where the dart lands. This notation is pretty suggestive of how things are done in `Python`: if we want to store a pair of numbers, just write `(NUMBER_1, NUMBER_2)`.

#### 1.2. Exercise
In the code box below, try and generate _both_ coordinates of a random dart throw, and put them in a pair.

In [5]:
# START
(uniform(-1,1), uniform(-1,1))
# END

(0.5587488990834497, -0.48133821806217125)

That's only one throw, though. To really approximate $\pi$, we need _lots_ of throws. Fortunately, `Python` makes it very easy to do something repeatedly: that's what the `for` keyword is for.

Let's suppose we want to model a busy graduate student's schedule across ten days. We could write what's called a _`for` loop_ to do it:

In [6]:
for day in range(0,10):
    print("On day", day, "I worked.")

On day 0 I worked.
On day 1 I worked.
On day 2 I worked.
On day 3 I worked.
On day 4 I worked.
On day 5 I worked.
On day 6 I worked.
On day 7 I worked.
On day 8 I worked.
On day 9 I worked.


Here `Python` simulated (very loosely) what the graduate student did on each of the ten days. If we wanted to simulate fifteen days instead, or only want to know what the graduate student did on days five through eight (inclusive), we could do that too by changing the `range` of our `for` loop:

In [7]:
for day in range(0,15):
    print("On day", day, "I worked.")

print()
    
for day in range(5,9):
    print("On day", day, "I worked.")

On day 0 I worked.
On day 1 I worked.
On day 2 I worked.
On day 3 I worked.
On day 4 I worked.
On day 5 I worked.
On day 6 I worked.
On day 7 I worked.
On day 8 I worked.
On day 9 I worked.
On day 10 I worked.
On day 11 I worked.
On day 12 I worked.
On day 13 I worked.
On day 14 I worked.

On day 5 I worked.
On day 6 I worked.
On day 7 I worked.
On day 8 I worked.


__PROTIP:__ You might notice that our `for` loops (and even our section numbering) begin with 0 instead of 1. This is because `Python` is what's called a _zero-indexed_ programming language. In general, that means that the _first_ item in any `Python` sequence isn't at position 1---it's at position 0!
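
To see zero-indexing in action, here's a minimal sketch. The list `days` is a made-up example (it isn't used anywhere else in this notebook); the square brackets look up an item by its position.

```python
# `days` is a hypothetical list, used only for illustration.
days = ["Monday", "Tuesday", "Wednesday"]

first_day = days[0]  # position 0 holds the first item
last_day = days[2]   # a list of length 3 has positions 0, 1, and 2

print(first_day)       # Monday
print(last_day)        # Wednesday
print(list(range(3)))  # [0, 1, 2]: the same positions, generated by range()
```

This is also why `range(0, 10)` counts from 0 up to 9: it produces exactly the positions of a ten-item sequence.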

Let's say that the graduate student's advisor wants to know what they were up to for the last few days. The graduate student needs to _record_ something showing what they were doing each day. Fortunately, there's an easy pattern to do this too:

In [8]:
records = []
for day in range(0,10):
    records.append("On day " + str(day) + " I worked.")

records

['On day 0 I worked.',
 'On day 1 I worked.',
 'On day 2 I worked.',
 'On day 3 I worked.',
 'On day 4 I worked.',
 'On day 5 I worked.',
 'On day 6 I worked.',
 'On day 7 I worked.',
 'On day 8 I worked.',
 'On day 9 I worked.']

#### 1.3. Exercise
Use the pattern above to store 10 random dart throws.

__PROTIP:__ You should rewrite the code yourself instead of copying and pasting! In general, copying and pasting code is something you want to avoid: when you write code out yourself, you get more familiar with how it works, and copying and pasting will often introduce errors that are annoying and time-consuming to fix. Later on we'll talk about _functions_ and other good ways to avoid repeating yourself, but for now just keep in mind that you should be writing things out yourself.

__HINT:__ If you try to do something like `records.append(a, b)`, you'll get an error saying that `append()` takes exactly one argument. `Python` thinks you're trying to `append()` two things, and it only knows how to append one at a time. If you first combine those two things into one pair, it'll work correctly: `records.append((a, b))`.
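
If you'd like to see the difference for yourself, here's a small sketch. The list `throws` and the two numbers are made up for illustration, and the `try`/`except` lines are only there so that the error is printed instead of stopping the cell.

```python
throws = []  # hypothetical list, used only for this illustration

try:
    throws.append(0.5, -0.25)  # two separate arguments: Python refuses
except TypeError as error:
    print("append() failed:", error)

throws.append((0.5, -0.25))    # one argument that happens to be a pair
print(throws)                  # [(0.5, -0.25)]
```

Note the doubled parentheses in the second call: the outer pair belongs to `append()`, and the inner pair builds the pair of coordinates.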

In [9]:
# START
records = []
for day in range(0,10):
    records.append((uniform(-1,1), uniform(-1,1)))

records
# END

[(0.8787009527744922, -0.007509817205446412),
 (0.6386659745748045, 0.8281543065918198),
 (-0.8112667931932636, 0.8202574021732982),
 (-0.5265970944601677, 0.4854971178944991),
 (0.12147072125892389, -0.38480913613873113),
 (-0.11569673138082237, -0.016405868937351853),
 (0.6476390927450675, 0.537343556287643),
 (0.5086732706644175, -0.8279976462872063),
 (-0.6653776513205159, -0.22013342402113745),
 (0.299205931826916, -0.4040750650373517)]

We're very nearly there! The only thing we still have to do is figure out if a point is in the circle or not. We've written a function for you called `on_board()`, which will check to see if a dart throw lands on the board or not. Don't worry if it looks a little complicated---there's nothing going on that you won't understand by the end of the day! 

In [10]:
from math import sqrt

def on_board(x, y):
    if (abs(x) > 1 or abs(y) > 1):
        raise Exception(
            'Your throw should lie in the square from (-1, -1) to (1, 1). The'
            ' throw you gave me was: ' + str((x, y))
        )
    if (sqrt(x ** 2 + y ** 2) < 1):
        return True
    else:
        return False
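
To get a feel for what the function checks, here's a quick sketch. `inside_circle()` is a hypothetical, stripped-down stand-in for `on_board()`: it skips the check that the throw lies in the square, and it compares the squared distance to 1, which gives the same answer as comparing the distance itself.

```python
def inside_circle(x, y):
    # Hypothetical stand-in for on_board(): a comparison already
    # evaluates to True or False, so we can return it directly.
    return x ** 2 + y ** 2 < 1

print(inside_circle(0, 0))      # True: the center of the board
print(inside_circle(0.9, 0.9))  # False: in the square, but past the circle's edge
```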

#### 1.4. Exercise

Using `on_board()` and the `for` loop pattern we just learned, simulate 1,000,000 test throws at the dart board, but instead of recording _where_ the dart landed (i.e., its coordinates), record _whether_ the dart landed on the board.

In [11]:
### START
records = []
for throw in range(0, 1000000):
    records.append((on_board(uniform(-1, 1), uniform(-1,1))))
### END

Now, all that's left to calculate $\pi$ is to find the percentage of times our dart hit the board and multiply by four!

#### 1.5. Exercise
Use your 1,000,000 simulated dart throws to approximate $\pi$.

__HINT:__ If you need to calculate the percentage of times something happens, use the `mean()` function. E.g.,
```python
    >>> mean([1,2,3])
    2
    >>> mean([True, True, True])
    1
    >>> mean([True, False, False])
    0.3333333333333333
```
Note that _before you can use `mean()`_ you have to import it from the `statistics` module. That means you need to write the following line at the top of your code cell:
```python
    from statistics import mean
```

In [12]:
from statistics import mean

### START
mean(records) * 4
### END

3.14174

Amazing! Admittedly, you probably know a better approximation of $\pi$ if you just remember that International Pi Day is March 14th, but the technique we just used---Monte Carlo simulation---is incredibly powerful, especially in situations where we don't know the answer ahead of time.

__PROTIP:__ We could actually increase the accuracy of our simulation just by throwing more darts. The difficulty is that the error of our simulation shrinks like $1/\sqrt{n}$ (do you see why?), so to get an extra digit of accuracy, we have to increase the number of throws by a factor of 100. You can put `%%timeit` at the beginning of a code cell to see how long it takes to run. Go back to Exercise 1.4 and see how long it takes to simulate 1,000,000 throws. About how many throws would we need for three digits of accuracy? Four?
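
Here's a rough sketch of that slow improvement. The function `estimate_pi()` and the seed `2019` are assumptions made for this illustration: it repeats the dart-throwing idea from the warmup, but counts hits directly instead of using `on_board()` and `mean()`.

```python
import math
import random

def estimate_pi(n_throws, rng):
    # Hypothetical helper: throw n_throws darts and count the hits.
    hits = 0
    for _ in range(n_throws):
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x ** 2 + y ** 2 < 1:
            hits += 1
    return 4 * hits / n_throws

rng = random.Random(2019)  # arbitrary seed, so that reruns give the same numbers
errors = {}
for n_throws in [100, 10000, 100000]:
    estimate = estimate_pi(n_throws, rng)
    errors[n_throws] = abs(estimate - math.pi)
    print(n_throws, "throws:", estimate, "error:", errors[n_throws])
```

Any single run is noisy, so the error won't always shrink from one row to the next; it's only the typical size of the error that falls like $1/\sqrt{n}$.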

## 2. Getting the hang of numbers and variables

That was an awful lot of stuff we just did. Let's try and unpack everything that happened.

### 2.0. Arithmetic

The good news is that you probably _already_ know how to do lots of things in `Python`! One useful feature of `Python` that I use quite often is that it can be used as a calculator. Let's try typing in a basic numerical formula.

In [13]:
1 + 1 # Try something like: `1 + 1`

2

Cool! Probably not something we really needed an expensive computer to figure out, but good to know in any case. You can make `Python` do all sorts of numerical operations for you.

#### 2.0.0. Exercise

Get the hang of using `Python` as a calculator. See if you can figure out how to do the following: addition, subtraction, multiplication, division, exponentiation, _integer_ division, (i.e., $5 / 3 = \mathbf{1}\ \text{remainder}\ 2$), and _remainder_ (i.e., $5 / 3 = 1\ \text{remainder}\ \mathbf{2}$).

In [14]:
### START
1 + 1
4.2 - 1.8
2 * 3
4 / 9    
3 ** 2
5 // 2
5 % 2
### END

1

In addition to the standard `+`, `-`, `*`, and `/`, there are a couple operations here that may seem unfamiliar:
* `**` is the exponentiation operator, so `2 ** 3` returns `8`.
* `//` is the _integer_ division operator, so `5 // 2` returns `2` (with a remainder of `1`).
* `%` is the _modular_ ("mod") division---or _remainder_---operator, so `5 % 2` returns `1`. Modular division is very useful when programming. For instance, if you want to do something "every other time," you can simply do it every time `i % 2 == 0`. (More on this later.)
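
As a small sketch of how `//` and `%` fit together, and of the "every other time" trick, consider the following (the numbers `17` and `5` are arbitrary choices for illustration):

```python
# Integer division and remainder always fit back together.
quotient = 17 // 5
remainder = 17 % 5
print(quotient, remainder)             # 3 2
print(quotient * 5 + remainder == 17)  # True

# Doing something "every other time" through a loop.
evens = []
for i in range(6):
    if i % 2 == 0:
        evens.append(i)
print(evens)                           # [0, 2, 4]
```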

### 2.1. Comparison operators

It's also possible to make `Python` tell you whether certain things are true or false, using the _less than_ (`<`), _greater than_ (`>`), and _equals_ (`==`) operators.

In [15]:
1 > 2

False

In [16]:
2 > 1

True

In [17]:
(2 ** 3) ** 3 == 2 ** 9

True

#### 2.1.0. Exercise

Try a couple of comparisons of your own. Is $e^{\pi}$ less than, greater than, or equal to $\pi^e$?

In [18]:
### START
2.71828 ** 3.14159 > 3.14159 ** 2.71828
### END

True

**A WORD OF CAUTION:** Note that the equality operator is `==`---i.e., _two_ equals signs---and not `=`---i.e., a single equals sign. The single equals sign, `=`, is for assigning to variables, as we saw in the warmup, and which we'll talk about in a second. This is a very common source of errors; even expert programmers will inadvertently type `=` when they meant to type `==`.

__PROTIP:__ In addition, you might have noticed that numbers in `Python` come in two types: _integers_, which are just whole numbers; and _floating point numbers_, which can represent decimal numbers, like `1.5`, `3.14159`, or `2019.0`. (You'll notice that when we represent a whole number using floating point in `Python`, the first decimal place is still printed.)
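
If you're ever unsure which kind of number you have, the built-in `type()` function will tell you. Here's a minimal sketch; the variable names `whole` and `decimal` are made up for illustration.

```python
whole = 2019      # an integer
decimal = 2019.0  # the same value, stored as a floating point number

print(type(whole))       # <class 'int'>
print(type(decimal))     # <class 'float'>
print(whole == decimal)  # True: same value, different type

print(4 / 2)   # 2.0: ordinary division always gives a floating point number
print(4 // 2)  # 2: integer division of two integers gives an integer
```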

The important difference between integers and floating point numbers is that operations on integers are _exact_, whereas operations on floating point numbers are only _approximate_. For instance, compare the following:

In [19]:
(((((2 ** 10) // 2 ** 5) * 2 ** 5) // 2 ** 5) * 2 ** 5) == 2 ** 10

True

In [20]:
1.2 - 1.0 == 0.2

False
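
One common way to cope with this, sketched below, is to ask whether two floating point numbers are _close_ instead of exactly equal. The `math` module provides `isclose()` for this purpose; the variable name `difference` is just for illustration.

```python
import math

difference = 1.2 - 1.0
print(difference)                     # slightly off from 0.2
print(difference == 0.2)              # False: exact comparison fails
print(math.isclose(difference, 0.2))  # True: equal up to a tiny tolerance
```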

### 2.2. Variables

Let's suppose we're running a grocery store, and we have some inventory:
* __Milk,__ which we bought for \\$0.99 per gallon, and resell for \\$1.99 per gallon.
* __Eggs,__ which we bought for \\$2.59 per dozen, and resell for \\$4.00 per dozen.
* __Bread,__ which we bought for \\$6.31 a loaf, and resell at \\$7.99.
* __Coffee,__ which we bought for \\$0.31 a cup, and resell for \\$2.99 a cup.

Alright, now let's suppose it's been a slow morning: we sold four gallons of milk, 36 eggs, two loaves of bread, and seven cups of coffee. What was our net profit?

This probably seems like an annoying problem, and it would be very annoying to solve given what we know how to do in Python so far. We could try typing something like

In [21]:
(1.99 * 4 + 2.59 * 3 + 7.99 * 2 + 2.99 * 7) - (0.99 * 4 + 4.00 * 3 + 6.31 * 2 + 0.31 * 7)

21.89

Not only was this gross to do, but it's not even right! I accidentally introduced an error into the calculation, and didn't get the profit right.

In long calculations, it can become difficult to keep track of intermediate steps. Luckily, `Python` allows us to store the intermediate results of calculations.

#### 2.2.0. Storing values in variables

A better way to solve this problem is to store intermediate results in variables. This is done using `=`, i.e., the _single_ equals sign. For instance, we might do something like

In [22]:
price_milk_buy    = 0.99
price_milk_sell   = 1.99
price_eggs_buy    = 2.59
price_eggs_sell   = 4.00
price_bread_buy   = 6.31
price_bread_sell  = 7.99
price_coffee_buy  = 0.31
price_coffee_sell = 2.99

revenue = (price_milk_sell * 4 +
           price_eggs_sell * 3 +
           price_bread_sell * 2 +
           price_coffee_sell * 7)

costs   = (price_milk_buy * 4 +
           price_eggs_buy * 3 +
           price_bread_buy * 2 +
           price_coffee_buy * 7)

revenue - costs

30.349999999999994

If this seems familiar, it should! We've been doing this since the warmup! Remember when we wrote `records = []`?
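
We can take the same idea one step further and store the quantities in variables too, so that no unexplained numbers are left in the formula. This is only a sketch: the `sold_...` names and the per-item margin formula are our own choices, and the prices are copied from the cell above.

```python
# Prices copied from the cell above.
price_milk_buy    = 0.99
price_milk_sell   = 1.99
price_eggs_buy    = 2.59
price_eggs_sell   = 4.00
price_bread_buy   = 6.31
price_bread_sell  = 7.99
price_coffee_buy  = 0.31
price_coffee_sell = 2.99

# Quantities sold this morning (36 eggs is 3 dozen).
sold_milk   = 4
sold_eggs   = 3
sold_bread  = 2
sold_coffee = 7

# Profit is the margin on each item times the number sold.
profit = ((price_milk_sell - price_milk_buy) * sold_milk +
          (price_eggs_sell - price_eggs_buy) * sold_eggs +
          (price_bread_sell - price_bread_buy) * sold_bread +
          (price_coffee_sell - price_coffee_buy) * sold_coffee)

print(round(profit, 2))  # round() tidies away the floating point noise
```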

#### 2.2.1. Exercise

Store a secret message in a variable called `message`. Then, show that message to the world by `print()`-ing it.

In [23]:
### START
message = "The crow flies at midnight."
print(message)
### END

The crow flies at midnight.




__PROTIP:__ In `Python`, variables are case sensitive. That means that `dog` and `DOG` possibly represent different things.

In [24]:
dog = "good boy"
DOG = "WOOF"
dog == DOG

False

Also, variables have to be assigned before they can be used. If we try to bring our neighbor's dog into the equation without first saying what it is, `Python` will throw an error.

In [25]:
dog == DoG

NameError: name 'DoG' is not defined

Some languages will try to infer a "default value," but `Python` will not. However, there's no issue if we store a value first and then use it.

In [26]:
DoG = "neighbor dog"
dog == DoG

False

## 3. Control flow

There are three basic ways of changing your code's behavior depending on the input: `if` statements, `for` statements, and `while` statements. At a high level, `if` statements only do something (surprise, surprise) if something is true: you only take the square root of a number if it's nonnegative or open a file if it actually exists. On the other hand, `for` statements do the same thing for everything in a group: if you want to double every number in a list or extract the phone number of everyone in your address book, use a `for` statement. While they're very important in other languages, `while` statements are very rarely necessary in `Python`. If you're using a `while` statement, chances are you could do the same thing more safely and more cleanly with a `for` statement.
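
To make the comparison between `while` and `for` concrete, here's a minimal sketch of a countdown written both ways. The variable names are made up for illustration.

```python
# With `while`, we have to do the bookkeeping ourselves.
countdown_while = []
n = 3
while n > 0:
    countdown_while.append(n)
    n = n - 1  # forget this line, and the loop never ends!

# With `for`, range() does the bookkeeping for us.
countdown_for = []
for n in range(3, 0, -1):
    countdown_for.append(n)

print(countdown_while)  # [3, 2, 1]
print(countdown_for)    # [3, 2, 1]
```

Both loops produce the same list, but the `for` version has no counter to update by hand, so there's one less thing to get wrong.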

### 3.0. `if` statements
An example should suffice:

In [27]:
x = int(input('Give me a BIG number: '))
if x < 0:
    print('You\'re joking, right?')
elif x < 1e3:
    print('Try harder ... ')
else:
    print('Nice.')

Give me a BIG number:  12123


Nice.


Some notes on the above code:
- the `input()` function (as you've now seen) prompts the user for input
- the `int()` function (tries to) convert string values to integers (`input()` will always return the user's input as a string)
- `elif` is short for "else if," and there can be zero or more `elif` clauses
- the `else` clause is optional
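
What does "tries to" mean? If the text doesn't look like a whole number, `int()` raises a `ValueError`. The sketch below shows one way to handle that. The helper `parse_whole_number()` is hypothetical (it isn't part of `Python` or of this notebook), and it uses `try`/`except`, which we haven't covered yet.

```python
def parse_whole_number(text):
    # Hypothetical helper: return an integer, or None if `text`
    # can't be read as a whole number.
    try:
        return int(text)
    except ValueError:
        return None

print(parse_whole_number("12123"))   # 12123
print(parse_whole_number("twelve"))  # None
print(parse_whole_number("1e3"))     # None: int() doesn't accept this notation in a string
```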

One more thing that's implicit but *__extremely__* important: **indents.**

- `Python`, unlike many other languages out there, doesn't use curly braces `{}` to group code
- instead, blocks of grouped code are identified by their level of indentation (this is something to get used to, if you've never seen it before)
- word of caution: never indent with literal tab characters, and never mix tabs and spaces (don't worry too much, though: `JupyterLab` changes all your <kbd>Tab</kbd>s to four spaces by default, which is the [PEP 8 spec for indentation][pep8] in `Python`)

[pep8]: https://www.python.org/dev/peps/pep-0008/#indentation
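
Here's a small sketch of how indentation decides which block a line belongs to. The list `log` is made up for illustration; it just records the order in which the lines run.

```python
log = []  # hypothetical list that records what ran, and in what order

for number in [1, 2, 3]:
    if number % 2 == 1:
        log.append("odd")  # two levels in: runs only for odd numbers
    log.append("checked")  # one level in: runs for every number
log.append("done")         # not indented: runs once, after the loop

print(log)  # ['odd', 'checked', 'checked', 'odd', 'checked', 'done']
```

Try changing the indentation of one of the `append()` lines and predicting how the output changes.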

#### 3.0.0. Exercise

Try writing some code with an `if` statement. Prompt the user for a number. If the number is odd, print "This number is odd." If the number is even, print "This number is even."

__HINT:__ Do you remember the modular arithmetic operator, `%`? What is `n % 2` if `n` is odd? Even?

In [28]:
### START
n = int(input("Enter your number: "))
if (n % 2 == 1):
    print("This number is odd.")
else:
    print("This number is even.")
### END

Enter your number:  23


This number is odd.


### 3.1. `for` statements
The `for` statement in `Python` iterates over the items of any sequence (e.g., lists and even strings!), in the order that they appear in the sequence.

In [29]:
names = ['Jamie', 'Cersei', 'Jon', 'Sansa']

for name in names:
    print(name, 'has', len(name), 'characters and starts with a', name[0])

Jamie has 5 characters and starts with a J
Cersei has 6 characters and starts with a C
Jon has 3 characters and starts with a J
Sansa has 5 characters and starts with a S


The example above introduces a few new concepts:
- the variable `name` is defined along with the declaration of the `for` statement. It doesn't need to exist beforehand.
- it's good practice to use plurals for collections (`names` for the list) and singulars for individual items (`name` for each name)

#### 3.1.0. Exercise

1. Make a collection called `vegetables` and store the names of a few vegetables in it. Then, print out the name of each vegetable.

2. Add `"tomato"`es into `vegetables`. Then, for every `vegetable` in `vegetables`, check if it's _really_ a vegetable. If it's actually a fruit, print `"FRUIT is actually a fruit"`, where `FRUIT` is the name of the fruit in question.

In [30]:
### START
# Part 1
vegetables = ["potato", "cucumber", "faroe bulb", "carrot"]
for vegetable in vegetables:
    print(vegetable, "is a vegetable!")

# Part 2
print()
vegetables.append("tomato")
for vegetable in vegetables:
    if (vegetable == "tomato"):
        print(vegetable, "is actually a fruit!")
    else:
        print(vegetable, "is a vegetable")
### END

potato is a vegetable!
cucumber is a vegetable!
faroe bulb is a vegetable!
carrot is a vegetable!

potato is a vegetable
cucumber is a vegetable
faroe bulb is a vegetable
carrot is a vegetable
tomato is actually a fruit!


You can use the built-in `range()` function to do a more 'classic' `for` loop over a sequence of numbers.

In [31]:
for i in range(10): print(i)

0
1
2
3
4
5
6
7
8
9


`range(n)` generates the legal indices (starting from 0) for a sequence of length `n`. You can also use `range(start, stop[, step])` to specify the start, the stop (which is itself excluded), and (optionally) the step to take.

(The `[, step]` notation in the function signature shows that the `step` argument is optional. It's useful to know such conventions when referring to the docs.)
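
A handy way to check what a `range()` will produce, without writing a loop, is to wrap it in `list()`. A quick sketch:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(4, 8)))       # [4, 5, 6, 7]: the stop value, 8, is excluded
print(list(range(4, 8, 2)))    # [4, 6]
print(list(range(20, 4, -3)))  # [20, 17, 14, 11, 8, 5]
print(list(range(8, 4)))       # []: counting up from 8 never reaches 4
```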

In [32]:
for i in range(4,8): print(i)

4
5
6
7


In [33]:
for i in range(4,8,2): print(i)

4
6


In [34]:
for i in range(20,4,-3): print(i)

20
17
14
11
8
5


#### 3.1.1. Exercise

Print out a range that goes from 42 to -3 in steps of size 3.

In [35]:
### START
for i in range(42, -6, -3):
    print(i)
### END

42
39
36
33
30
27
24
21
18
15
12
9
6
3
0
-3


Occasionally, you might want to loop over two or more sequences at a time. You can pair the entries with the `zip()` function.

In [36]:
title = 'Game of Thrones'
houses = ['Lannister', 'Lannister', 'Snow', 'Stark']
for char, house, name in zip(title, houses, names):
    print(char, '-', name, house)

G - Jamie Lannister
a - Cersei Lannister
m - Jon Snow
e - Sansa Stark


Note how `zip()` gracefully fits the iterator to the length of the shortest sequence, i.e., only the first four characters of the string 'Game of Thrones' were iterated.
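
The same behavior is easy to see on a smaller scale. In this sketch, `letters` and `numbers` are made-up examples of different lengths:

```python
letters = "abcde"    # five items
numbers = [1, 2, 3]  # three items

pairs = list(zip(letters, numbers))
print(pairs)         # [('a', 1), ('b', 2), ('c', 3)]
```

The leftover letters `'d'` and `'e'` are dropped without any warning, so it's worth checking the lengths yourself if every item matters.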

#### 3.1.2. Exercise

Let's go back to our vegetables. Create two variables, `vegetables` and `scientific_names`, like so:
```python
    >>> vegetables = ["carrots", "broccoli", "kale", "brussel sprouts"]
    >>> scientific_names = ["Daucus carota", "Brassica oleracea", "Brassica oleracea", "Brassica oleracea"]
```
Then, zip together `vegetables` and `scientific_names` and print the scientific name of each of the vegetables.

In [37]:
### START
vegetables = ["carrots", "broccoli", "kale", "brussel sprouts"]
scientific_names = ["Daucus carota", "Brassica oleracea", "Brassica oleracea", "Brassica oleracea"]
for vegetable, name in zip(vegetables, scientific_names):
    print("The scientific name of", vegetable, "is", name)
### END

The scientific name of carrots is Daucus carota
The scientific name of broccoli is Brassica oleracea
The scientific name of kale is Brassica oleracea
The scientific name of brussel sprouts is Brassica oleracea


__PROTIP:__ According to [Wikipedia](https://en.wikipedia.org/wiki/Broccoli#Other_cultivar_groups_of_Brassica_oleracea), a whole bunch of vegetables I thought were completely different are actually the same species. Who knew!?

### 3.2. `break` and `continue` statements
You can manage your loops in more detail using `break` and `continue` statements. 

A `break` statement, as the name implies, will break you out of the smallest enclosing loop.

In [38]:
for name, house in zip(names, houses):
    if house == 'Snow':
        break
    else:
        print(name, house)

Jamie Lannister
Cersei Lannister


A `continue` statement will simply skip over to the next item in the iterator, instead of breaking out of the loop.

In [39]:
for name, house in zip(names, houses):
    if house == 'Snow':
        continue  # compare to the previous example where we stopped the loop at Snow, now we simply skip it
    else:
        print(name, house)

Jamie Lannister
Cersei Lannister
Sansa Stark


#### 3.2.0. Exercise

Zip together `vegetables` and `scientific_names` like you did in the last exercise. This time, though, as soon as you start repeating yourself, break out of the loop and `print()` the fact that the vegetables are the same.

In [40]:
### START
for vegetable, name in zip(vegetables, scientific_names):
    print("The scientific name of", vegetable, "is", name)
    if (vegetable == "kale"):
        break
print("Didn't I just say that?")
### END

The scientific name of carrots is Daucus carota
The scientific name of broccoli is Brassica oleracea
The scientific name of kale is Brassica oleracea
Didn't I just say that?


## 4. Strings, lists, and tuples, oh my!

Now that we've got the hang of doing a little bit of programming, let's turn to basic `Python` data types: strings (`str`), lists (`list`), and tuples (`tuple`). These three things are different ways of storing and interfacing with information in `Python`, and they solve lots of problems you'll run into frequently. In fact, we've secretly been using all three of these things all along, as you'll soon see.

### 4.0. Strings

You might have noticed that some of the variables we've been using actually stored text instead of numbers. Text, or "strings," as they're usually called, are one of the things that can be a bit of a hassle in other languages, but are very easy to deal with in `Python`.

As you may have guessed at this point, strings are enclosed in either single quotes (`'...'`) or double quotes (`"..."`). The only difference between the two is which quote character you need to escape with `\`: a literal single quote inside a single-quoted string, or a literal double quote inside a double-quoted string. Examples:

In [41]:
'This is a string in single quotes, and it works fine'

'This is a string in single quotes, and it works fine'

In [42]:
'This is a string in single quotes, but it's broken!'

SyntaxError: invalid syntax (<ipython-input-42-ff88f75c76eb>, line 1)

In [43]:
'This is a string with single quotes, and now it\'s fixed!'

"This is a string with single quotes, and now it's fixed!"

In [44]:
"So, there's incentive to use double quotes"

"So, there's incentive to use double quotes"

That should make it pretty clear. Note that when you escape a single quote in a string enclosed in single quotes, the interpreter displays the result enclosed in double quotes, so that no escaping is needed. Both styles produce identical strings, so you should use whichever set of quotes you prefer to enclose your strings, as long as you're consistent.
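
As a quick check that the two quoting styles really give the same string, here is a small sketch. The variable names are made up for this example, and `repr()` is a built-in function (not covered above) that returns the text the interpreter would display for a value.

```python
# Two spellings of the same text; the names are only for illustration
single = 'It\'s fixed!'
double = "It's fixed!"

print(single == double)   # True: the stored strings are identical
print(repr(single))       # displayed with double quotes, so no escape is needed
```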

The `print()` function makes output more readable: it omits the enclosing quotes and displays characters escaped with `\` as the characters themselves.

In [45]:
'enclosing double quotes ("") and single quotes(\'\') are the same thing in python'

'enclosing double quotes ("") and single quotes(\'\') are the same thing in python'

In [46]:
print('enclosing double quotes ("") and single quotes(\'\') are the same thing in python')

enclosing double quotes ("") and single quotes('') are the same thing in python


Strings can also have more than one line. You can either use an explicit line-break character (`\n`):

In [47]:
print('This string has\ntwo lines!')

This string has
two lines!


... or use triple quotes `'''...'''` or `"""..."""`

In [48]:
print('''
This string has
two lines!
''')


This string has
two lines!



If you look carefully enough, you'll notice that the last string actually prints four lines. This is because triple quotes literally encode all whitespace, including the newlines after the first `'''` and after the last "`!`". To avoid this, you can escape those newlines with `\`.

In [49]:
print('''\
This string (really) has
two lines!\
''')

This string (really) has
two lines!
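
One way to see those extra newlines is to look at the strings without printing them. This is only a sketch: the names `padded` and `trimmed` are made up, and it borrows `repr()` (a built-in that shows a string the way the interpreter displays it) and the `.count()` method, which comes up later in this lesson.

```python
padded = '''
This string has
two lines!
'''
trimmed = '''\
This string (really) has
two lines!\
'''

print(repr(padded))          # '\nThis string has\ntwo lines!\n'
print(padded.count('\n'))    # 3 newline characters
print(trimmed.count('\n'))   # 1 newline character
```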


What if you want to write a string that actually contains the `\` character?
You can either:
* escape `\` with `\` (e.g., write **two** `\` characters for one), or
* prepend a single `r` to the quotes to indicate that you are writing a **r**aw string

In [50]:
print("A backslash (\\) is awesome!")
print(r"A backslash (\) is awesome!")

A backslash (\) is awesome!
A backslash (\) is awesome!
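
To see what the `r` prefix changes, it can help to compare lengths. The sketch below uses made-up variable names and the built-in `len()` function, which is introduced later in this lesson.

```python
escaped = '\n'    # one character: a line break
raw = r'\n'       # two characters: a backslash followed by the letter n

print(len(escaped))    # 1
print(len(raw))        # 2
print(raw == '\\n')    # True: the raw string equals the doubled-backslash version
```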


Unlike some other stingy languages, the plus operator (`+`) does exactly what you'd expect it to do with strings!

In [51]:
first_name = 'Hans'
last_name = 'Gaebler'
full_name = first_name + ' ' + last_name
print('Hello', full_name)

Hello Hans Gaebler


Even the multiplication operator (`*`) works!

In [52]:
print('Sing, ' + 'la ' * 3)

Sing, la la la 


#### 4.0.0. Exercise

Print a few different kinds of strings. Make sure you do all of the following at least once:
* Use an escape character like `\n`,
* Multiply a string with `*`,
* Add two strings with `+`,
* Make a string with single, double, and triple quotes.

In [53]:
### START
print("The snow fell\n             fell\n                 fell.")
print("ha" * 10 + "HA")
print('It\'s awfully inconvenient to escape single quotes.')
print("""
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
""")
### END

The snow fell
             fell
                 fell.
hahahahahahahahahahaHA
It's awfully inconvenient to escape single quotes.

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.



### 4.1. Lists

Strings are great and all, but they're hardly the most versatile data type in `Python`. In particular, while they're helpful for storing _textual_ data, they aren't very useful for dealing with much else. Luckily, `Python` has lists!

Lists are "array-like." What that means is that lists consist of lots of slots, each slot containing some item. The easiest way to construct lists is by placing the items you want in between square brackets like so: `[item_1, item_2, item_3]`. For instance,

In [54]:
fruits = ["apples", "bananas", "pears", "mangosteens", "strawberries"]

print(fruits)

['apples', 'bananas', 'pears', 'mangosteens', 'strawberries']


Lists can contain more than just strings:

In [55]:
# A list containing numbers
primes = [2, 3, 5, 7, 11, 13]

# A list containing other lists
junk_drawer = [fruits, primes, ["kittens", "kettles", "mittens", "packages"]]

# A list of different things
misc = [3, 4, "five", "six"]

print(primes, junk_drawer, misc)

[2, 3, 5, 7, 11, 13] [['apples', 'bananas', 'pears', 'mangosteens', 'strawberries'], [2, 3, 5, 7, 11, 13], ['kittens', 'kettles', 'mittens', 'packages']] [3, 4, 'five', 'six']


That's all well and good if we want to put things _into_ lists, but how do we take things _out of_ lists? The answer is that lists (and in fact, all built-in `Python` [sequence](https://docs.python.org/3/glossary.html#term-sequence) types) can be indexed and sliced.
* __Indexing__ refers to when we get a specific element of a list at a given position. The result is an _element_ of a list, which may or may not be a list itself.

In [56]:
print(fruits[0])      # Returns the fruit at position 0
print(junk_drawer[1]) # Returns the _list_ at position 1

apples
[2, 3, 5, 7, 11, 13]


* __Slicing__ refers to when we take a "slice" of a list between some given indices. The result is _always_ another list.

In [57]:
misc[1:3]

[4, 'five']
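
The difference between the two shows up most clearly when you ask for a single item. In the small sketch below (which rebuilds `misc` with the same values as above so that it runs on its own), indexing hands back the element itself, while slicing hands back a list, even when that list has one item or none.

```python
misc = [3, 4, "five", "six"]

print(misc[1])     # 4    (indexing: the element itself)
print(misc[1:2])   # [4]  (slicing: a one-item list)
print(misc[2:2])   # []   (slicing: an empty list)
```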

Also like strings, lists can be concatenated with the `+` operator:

In [58]:
fruits + ["tomatoes", "grapefruits"]

['apples',
 'bananas',
 'pears',
 'mangosteens',
 'strawberries',
 'tomatoes',
 'grapefruits']

#### 4.1.0. Exercise

Practice with lists by doing the following things:
1. Make a list of three strings.
2. Pull out the second string.
3. Make a list of four lists. (The bottom-level lists can contain whatever you want---strings, numbers, or even more lists!)
4. Slice out everything except the last element of the second list.

In [59]:
### START
# Part 1
string_list = ["hi", "there", "everybody"]

# Part 2
print(string_list[1])

# Part 3
list_list = [["hi", "there", "everybody"], [1,1,2,3,5,8], [(0,0), (1,1), (3.4, 2.3), (1e4, 2e3)], []]

# Part 4
print(list_list[1][:-1])
### END

there
[1, 1, 2, 3, 5]


### 4.2. Strings are (almost) like lists

Strings, like many things in `python`, can be indexed (subscripted). The first element (character) has index 0.

In [60]:
job = 'jedi'
job[0]  # character at position 0

'j'

`Python` will yell at you if you go out of range, whether you do so with a string or a list:

In [61]:
job[10]

IndexError: string index out of range

In [62]:
fruits[7]

IndexError: list index out of range

But you *can* go backwards with a negative index, -1 being the right-most character.

In [63]:
print(job[-1])    # right-most character
print(fruits[-1])  # right-most fruit

i
strawberries


Like with lists, *slicing* is another useful way to get subsets of your string. 

In [64]:
job[0:2]  # characters from position 0 (included) to 2 (excluded)

'je'

In [65]:
job[2:4]  # characters from position 2 (included) to 4 (excluded)

'di'

Omitting the first slice index defaults to zero (the first element), and omitting the second defaults to the end of the string or list.

In [66]:
print(job[1:])    # slice from position 1 (included) to the end
print(fruits[1:])

edi
['bananas', 'pears', 'mangosteens', 'strawberries']
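
Here is a short sketch showing both defaults side by side. It redefines `job` and `fruits` with the same values used above so that it runs on its own.

```python
job = 'jedi'
fruits = ["apples", "bananas", "pears", "mangosteens", "strawberries"]

print(job[:2])      # je   (start omitted: begins at position 0)
print(job[2:])      # di   (stop omitted: runs to the end)
print(fruits[:2])   # ['apples', 'bananas']
print(job[:2] + job[2:] == job)   # True: the two halves rebuild the original
```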


Use slices creatively, to make your life easier!

In [67]:
job[-2:]  # slice the last two characters

'di'

Unlike indexing, slicing is generous to ambitious ranges

In [68]:
job[0:100]

'jedi'

The big difference in `Python` between strings and lists is what's called "mutability." Strings in `Python` are _immutable,_ meaning they can't be changed. In other words, you can't assign a value to a string index.

In [69]:
job[0] = 'T'

TypeError: 'str' object does not support item assignment

Instead, you have to build a new string from the existing string.

In [70]:
job = 'T' + job[1:]
print(job)

Tedi


Lists, however, can be modified at a given index. This makes them _mutable_. For instance,

In [71]:
misc[0] = 30  # change indexed item
print(misc)

[30, 4, 'five', 'six']


### 4.3. Exercise

Let's try and put this all together to do something a bit more like a programming problem you're likely to encounter "in the wild." You should spend a few minutes on this and work with the other people at your table. You'll need some of the functions we talk about in the next section ("Utility functions for lists and strings"). We'll give you some hints, but this is also an opportunity to practice reading how code works, which is a very important skill. (You'll also need a function that we _don't_ talk about, so you'll need to practice looking at the official documentation. Feel free to look back to [Section 0.4](#help) for help with that.)

Okay, here's what we're asking you to do:
1. Declare a string variable named **`s`**, that has the value:
> "Double quotes" and 'single quotes' are equally acceptable in Python.

2. Make `Python` count how many times the letter `t` appears in the string **`s`**.
3. Replace all quotation marks in the string **`s`** with an underbar ('\_').
4. Split the string **`s`** into a list named **`words`**.
5. Count the length of the string **`s`** and the list **`words`**.
6. Join the elements of the list **`words`** into a comma separated string.

__HINT:__ You're going to need a few different functions: `.split()`, `.count()`, `len()`, `.join()`, and `.replace()`. The functions with names that begin with a `.` are called _methods_. Generally speaking, instead of giving a method the string (or list) you want it to act on, like this:
```python
    >>> .split("(414)-123-4567", "-") # Wrong
```
you put the method _after_ the string you want to change, like this:
```python
    >>> "(414)-123-4567".split("-") # Right
    ['(414)', '123', '4567']
```
Be careful with `join()`! You might find that it works a little differently from what you expect. If you run into trouble, you know where to find help!

__HINT:__  When you get to Part 3, to be precise, you're not *replacing* the quotations, but *reassigning* the variable **`s`** with a copy of the old **`s`** that has underbars replacing quotations; remember that `python` strings are **immutable** (i.e., they are NEVER changed, only reassigned). Note that `.replace()` is not covered below, but you should use `python`'s `replace()` method for strings; now would be a good time to practice reading the docs (https://docs.python.org/3/library/stdtypes.html#str.replace).

In [72]:
### START
# Part 1
s = "\"Double quotes\" and 'single quotes' are equally acceptable in Python."
print(s)

# Part 2
print(s.count("t"))

# Part 3
s = s.replace('"', "_").replace("'", "_")
print(s)

# Part 4
words = s.split(" ")
print(words)

# Part 5
print("The length of `s` is", len(s))
print("The length of `words` is", len(words))

# Part 6
",".join(words)

### END

"Double quotes" and 'single quotes' are equally acceptable in Python.
4
_Double quotes_ and _single quotes_ are equally acceptable in Python.
['_Double', 'quotes_', 'and', '_single', 'quotes_', 'are', 'equally', 'acceptable', 'in', 'Python.']
The length of `s` is 69
The length of `words` is 10


'_Double,quotes_,and,_single,quotes_,are,equally,acceptable,in,Python.'

### 4.4. Utility functions for lists and strings

Strings and lists have a lot in common. In particular, there are certain things that we do with both of them _so often_ that `Python` provides a whole bunch of functions to do the job for us. For instance, it's common to want to know the length of both lists and strings. That's where the `len()` function comes in:

In [73]:
print("The length of `fruits` is", len(fruits))
print("The length of `job` is", len(job))

The length of `fruits` is 5
The length of `job` is 4


There are a _whole bunch_ of these functions. We're only going to go through a few of them here, but you can see all of them in the docs [here](https://docs.python.org/3/library/stdtypes.html#string-methods) for strings and [here](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) for lists.

Don't worry if these don't all sink in right away. The point is just to give you a taste of what's possible. As a general rule, if it's easy to describe what you want to do, and you could do it to any list (or string), there's probably already a built-in function for doing it.

In [74]:
s = 'Hi, my name is Johann. Yes, Johann.'
s.split()  # splits the string into a list (split at spaces by default)

['Hi,', 'my', 'name', 'is', 'Johann.', 'Yes,', 'Johann.']

In [75]:
s.split(',')  # can specify which character to split at

['Hi', ' my name is Johann. Yes', ' Johann.']

In [76]:
s.count('n')  # count the number of non-overlapping occurrences of a substring

5

In [77]:
s.count('Jongbin')

0

In [78]:
s.upper()  # makes everything uppercase

'HI, MY NAME IS JOHANN. YES, JOHANN.'

In [79]:
s.lower()  # makes everything lowercase

'hi, my name is johann. yes, johann.'

In [80]:
s.lower().count('y')  # methods that return a string can be chained

2

In [81]:
':'.join(s.split())

'Hi,:my:name:is:Johann.:Yes,:Johann.'

That last one is a little tricky. `str.join(some_sequence)` takes each item of `some_sequence` and sticks them together with the value of `str` in between, making a single large string. It may seem like a crazy thing to do, but it is actually pretty useful when converting data into comma-separated values. For example,

In [82]:
','.join(['some','data','in','a','list'])

'some,data,in,a,list'
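
One thing to watch out for: `.join()` only accepts strings, so joining a list of numbers directly raises a `TypeError`. A minimal sketch of one workaround is below. It converts each number with the built-in `str()` function first; the name `prime_strings` is made up for this example, and `.append()` (covered a little further down) adds an item to the end of a list.

```python
primes = [2, 3, 5, 7, 11, 13]

# ','.join(primes) would raise a TypeError, because the items are ints.
prime_strings = []
for p in primes:
    prime_strings.append(str(p))   # convert each number to a string

csv_line = ','.join(prime_strings)
print(csv_line)   # 2,3,5,7,11,13
```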

You can also use the <kbd>Tab</kbd> character (`\t`) to create tab-separated values:

In [83]:
print('\t'.join(['some','data','in','a','list']))

some	data	in	a	list


In [84]:
fruits.append('tomatoes')  # add an item to the end of the list
print(fruits)

['apples', 'bananas', 'pears', 'mangosteens', 'strawberries', 'tomatoes']


In [85]:
fruits.remove('tomatoes')  # remove the first item in the list that matches the argument
print(fruits)

['apples', 'bananas', 'pears', 'mangosteens', 'strawberries']


In [86]:
fruits.index('bananas')  # return the index of the first item matching the argument

1

In [87]:
fruits.append('bananas') # bring some extras in case we're hungry
fruits.count('bananas')  # return the number of times 'bananas' appears in the list

2

In [88]:
eat = fruits.pop(0)  # return and remove item at position 0 from list (removes last item if no index is specified)
print(fruits)

['bananas', 'pears', 'mangosteens', 'strawberries', 'bananas']


In [89]:
print(eat)  # the item previously at position 0 ('apples') is now 'popped' into the variable eat

apples
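
For comparison, here is a small sketch of `.pop()` called without an index. It uses a separate, made-up list named `basket` so that it doesn't disturb `fruits`.

```python
basket = ["bananas", "pears", "mangosteens"]

last = basket.pop()   # no index given: removes and returns the last item
print(last)           # mangosteens
print(basket)         # ['bananas', 'pears']
```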


Not exactly a method, but the `in` keyword is useful for checking if a list contains a particular item.

In [90]:
'bananas' in fruits

True

In [91]:
'tomatoes' in fruits # Not anymore!

False
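
The `in` keyword works on strings as well, where it checks for a substring rather than a whole item. The sketch below uses made-up variable names and shows how the two kinds of check can give different answers for the same text.

```python
sentence = 'Hi, my name is Johann. Yes, Johann.'
pieces = sentence.split()

print('Johann' in sentence)           # True: substring check
print('johann' in sentence)           # False: the check is case-sensitive
print('johann' in sentence.lower())   # True
print('Johann' in pieces)             # False: the list items are 'Johann.' with a period
```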

# Exercises

Try these over the break. Work with the people sitting around you.

## Exercise 1.


## Exercise 2.
1. Create a very, very _long_ string that contains all of Charles Dickens's _A Tale of Two Cities_ using the following code. (Don't worry that this doesn't make much sense now---we'll get to what's going on here in the next lesson.) `with open("data/two_cities.txt", "r") as f: ttc = f.read()`
2. Turn `ttc` into a list by splitting on spaces (i.e., the string `" "`). How many words is _A Tale of Two Cities_?
3. How many times does the word "the" occur in _A Tale of Two Cities_? How about "king"? (__HINT:__ Use the `.count()` method. For instance, `[1, 2, 3, 1].count(1)` returns `2`.)

## Exercise 3.
1. Turn `ttc` from the previous exercise into a dictionary using the following code. (Don't worry if this code doesn't make sense. It assumes that `ttc` now holds the list of words from Exercise 2, not the original string.) `ttc = {word:ttc.count(word) for word in ttc[:1000]}`
- This dictionary is made up of the following `key: value` pairs: the keys are the distinct words among the first 1000 words in _A Tale of Two Cities_, and the values are the number of times each word appears in the whole book.
- This might take a little while, so don't be alarmed if this doesn't happen right away.
2. Check how many times the word "lords" occurs.
3. Reduce the number of times a word you don't like occurs by modifying a `key: value` pair in `ttc`.
4. Add a word that you think _should_ occur but doesn't to `ttc`.