# Python Basics, Part 1

In our first session this morning, we want to get you familiar with how to use `JupyterLab`, along with the basics of `Python` syntax and the data types we use in `Python`: numbers, strings, lists, and dictionaries. If you have used `Python` before, some of this material will be review, but you'll probably see some new stuff too!

## What is Python?

`Python` is a flexible and widely used programming language in data science and software engineering. Unlike many programming languages, one of the central tenets of the design (and writing) of `Python` is that `Python` code should be _easy to read_. Python is frequently used in a wide range of applications, from web development to machine learning.

Why do people like to use `Python`? `Python` is:
* Fast,
* Modular,
* Object-oriented,
* Extensible (i.e., there are _lots_ of libraries),
* Easy to read,
* Most importantly: gets the job done.

`Python` is the name of a programming language. But different people often interface with `Python` in a number of different ways:
1. Running interactive commands with the `python` interpreter. This is done by typing in `python` (or `python3`, if you have `Python 2` and `Python 3` installed on your computer) at the command line.
<img src=img/ss_python_interp.png width=500>
2. Python development in some kind of text editor or integrated development environment (IDE).
<img src=img/ss_python_spyder.png width=500>
3. Research-related scripting with heavy documentation and snippets of code. This is especially common in data wrangling.
<img src=img/ss_python_notebook.png width=500>
    
Today, we'll be focusing on the third approach, namely, using `Python` in `Jupyter` notebooks for research, but it's worth exploring all approaches!

## Installing Python

If you haven't already installed `Python` and `JupyterLab`, the instructions are [here][BROKENURL].

## Using `Jupyter` Notebooks

`Jupyter` notebooks are an excellent tool for integrating your code, notes and explanations, and results all in one place. `Jupyter` notebooks are reproducible, and allow you to capture not only _what_ you did in your analysis, but also _why_ you did the analysis. We'll be working with them through the `JupyterLab` interface.

### Launching `JupyterLab`

Getting a `Jupyter` notebook up and running is easy. Simply open up your favorite terminal app, and type `jupyter lab`. `Jupyter` will print some information about the server (which you can ignore for now), and then a webpage should open in your browser that looks like the second picture below.

<img src="img/ss_jupyter_launch.png" width=500>

<img src="img/ss_jupyter_main_screen.png" width=500>

(Don't worry if things look slightly different locally---the appearance of your terminal and the `JupyterLab` interface will change slightly from computer to computer.)

### Navigation

Navigating around `JupyterLab` is relatively easy. The screen is broken into three pieces:

1. **The Menu Bar:** Similar to a program like Microsoft Word, many important actions are performed through the menu bar located at the top of the screen. To modify, save, or close files, or navigate through directories, click, e.g., `File > Save`. To alter the appearance of the `JupyterLab` interface, click, e.g., `View > Show line numbers`.
2. **The Side Bar:** The left sidebar is home to a number of commonly used tabs, the names of which you can display by mousing over them. We will primarily work with the topmost tab, which contains the file browser. Other tabs allow you to manage kernel and terminal sessions, issue commands, or control open tabs.
3. **The Main Panel:** This is the main surface for programming and other activities. The main panel is broken up into tabs, which can be dragged, dropped, and resized.

<img src="img/ss_screen_example.png" width=500>

In the picture above, the main panel is overlaid in blue, the sidebar is overlaid in red, and the menu bar is overlaid in green.

To launch a notebook, simply navigate to the folder containing the notebook, and double-click it. Let's open `python_basics_part_1.ipynb` to start the first lesson. You should now be able to see locally the instructions that were displayed on the main screen.

### Markdown and Code Cells

As we mentioned earlier, `Jupyter` notebooks are a valuable tool because they allow you to interweave explanatory prose for humans to read with code for your computer to run. The way this is accomplished is with "cells." Cells come in two types: "text" or "markdown" cells, where your notes and explanations live; and "code" cells, which will contain your `Python` code.

Cells can be created by clicking on the "+" icon located at the top of the current pane.

<img src=img/ss_add_cell.png width=500>

Try adding a cell to this notebook with the "+" icon. To remove it, click on the scissors icon next to it.

#### Markdown Cells

Text cells are formatted using a lightweight markup language called `markdown`. What this means is that you can easily style the text.

* For *italics*, write `*text*` or `_text_`.
* For **bold** text, write `**text**` or `__text__`.
* To add bulleted items, simply add an asterisk to the beginning of a line, e.g., `* text text text`.
* To start a numbered list, begin the line with a number and a period, e.g., `1. text text text`.
* To insert `code` directly into a text block, simply place backticks around the code, e.g., `` `code` ``.

While these basics are enough for our purposes, much, much more is possible in `markdown`: you can add headers, footnotes, and even directly insert HTML. For a more thorough introduction, click [here](https://www.markdownguide.org/getting-started).

#### Code Cells

The bread and butter of `Jupyter` notebooks are code cells, which we'll be working with at length below. Code cells have two parts: an input box, where you write the `Python` code, and an output box.

<img src="img/ss_code_cell.png" width=500>

Here, the input is highlighted in red, and the output is highlighted in blue.

### Saving and switching between notebooks

Let's say you've added a lot of code and text to a notebook---what should you do to "render," or complete the notebook? Click `Run > Run all cells`. This will output nicely formatted text in all of the text boxes, and run all of the code in the code cells. You can also click the play button in the current pane.

To save the notebook output, go to `File > Save`.

### How to get help

Even seasoned software engineers frequently come across functions they don't know how to use or code they don't understand. One of the best places to look for help is at the official `Python` documentation. To bring it up, simply click `Help > Python Reference` in the menu bar. Usually, you can find the function (or "module," `Python`'s special word for libraries) just by searching in the search box!

## Numbers

Alright, so we know how to make `Jupyter` notebooks, how to render them, how to save them, and how to format them. How do we write the code that brings it all together?

The answer is that you probably _already_ know how to do lots of things in `Python`! One useful feature of `Python` that I use quite often is that it can be used as a calculator. Let's try typing in a basic numerical formula.

In [1]:
1 + 1  # Try something like: `1 + 1`

2



Cool! Probably not something we really needed an expensive computer to figure out, but good to know in any case. You can make Python do all sorts of numerical operations for you.

In [2]:
4 * 3

12

In [3]:
22 / 7

3.142857142857143

In [4]:
3 ** 2 + 4 ** 2

25

In [5]:
(1 + 2 + 3) / 4

1.5

In [6]:
(1 + 2 + 3) // 4

1

In [7]:
(1 + 2 + 3) % 4

2

In [8]:
2.02e3 - 1

2019.0

In addition to the standard `+`, `-`, `*`, and `/`, there are a couple of operations here that may seem unfamiliar.
* `**` is the exponentiation operator, so `2 ** 3` returns `8`.
* `//` is the _integer_ (or "floor") division operator, so `5 // 2` returns `2` (with a remainder of `1`).
* `%` is the _modular_ ("mod") division---or _remainder_---operator, so `5 % 2` returns `1`. Modular division is very useful when programming. For instance, if you want to do something "every other time," you can simply do it every time `i % 2 == 0`. (More on this later.)
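To see how `//` and `%` fit together, here is a minimal sketch. The numbers are made up for illustration; the point is that the quotient and the remainder always recombine into the number you started with.

```python
# Hypothetical example: 17 items split into groups of 5.
print(17 // 5)  # how many full groups fit
print(17 % 5)   # how many items are left over

# Quotient times divisor, plus remainder, gives back the original number.
print((17 // 5) * 5 + 17 % 5 == 17)

# "Every other time": even numbers leave no remainder when divided by 2.
print(4 % 2 == 0)
print(5 % 2 == 0)
```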

It's also possible to make `Python` tell you whether certain things are true or false, using the _less than_ (`<`), _greater than_ (`>`), and _equals_ (`==`) operators.

In [9]:
1 > 2

False

In [10]:
2 > 1

True

In [11]:
(2 ** 3) ** 3 == 2 ** 9

True

(**BEWARE:** Note that the equality operator is `==`---i.e., _two_ equals signs---and not `=`---i.e., a single equals sign. The single equals sign, `=`, is for assigning to variables, which we'll talk about in a second.)

In addition, you might have noticed that numbers in `Python` come in two types: _integers_, which are just whole numbers; and _floating point numbers_, which can represent decimal numbers, like `1.5`, `3.14159`, or `2019.0`. (You'll notice that when we represent a whole number using floating point in `Python`, the first decimal place is still printed.)

The important difference between integers and floating point numbers is that operations on integers are _exact_, whereas operations on floating point numbers are only _approximate_. For instance, compare the following:

In [12]:
(((((2 ** 10) // 2 ** 5) * 2 ** 5) // 2 ** 5) * 2 ** 5) == 2 ** 10

True

In [13]:
1.2 - 1.0 == 0.2

False
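If you do need to compare floating point numbers, one common approach is to check whether they are _close_ rather than exactly equal. The sketch below uses `math.isclose()` from the standard library's `math` module (not covered in this lesson) and the built-in `round()`; treat it as one possible approach rather than the only one.

```python
import math

# The exact comparison fails because 1.2 - 1.0 is stored as a value
# very slightly different from 0.2.
print(1.2 - 1.0 == 0.2)

# Comparing within a small tolerance succeeds.
print(math.isclose(1.2 - 1.0, 0.2))

# Rounding to a fixed number of decimal places before comparing also works here.
print(round(1.2 - 1.0, 2) == 0.2)
```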

## Variables

Let's suppose we're running a grocery store, and we have some inventory:
* __Milk,__ which we bought for \$0.99 per gallon, and resell for \$1.99 per gallon.
* __Eggs,__ which we bought for \$2.59 per dozen, and resell for \$4.00 per dozen.
* __Bread,__ which we bought for \$6.31 a loaf, and resell at \$7.99.
* __Coffee,__ which we bought for \$0.31 a cup, and resell for \$2.99 a cup.

Alright, now let's suppose it's been a slow morning: we sold four gallons of milk, 36 eggs, two loaves of bread, and seven cups of coffee. What was our net profit?

This probably seems like an annoying problem, and it would be very annoying to solve given what we know how to do in Python so far. We could try typing something like

In [14]:
(1.99 * 4 + 2.59 * 3 + 7.99 * 2 + 2.99 * 7) - (0.99 * 4 + 4.00 * 3 + 6.31 * 2 + 0.31 * 7)

21.89

Not only was this gross to do, but it's not even right. I accidentally introduced an error into the calculation, and didn't get the profit right.

In long calculations, it can become difficult to keep track of intermediate steps. Luckily, `Python` allows us to store the intermediate results of calculations.

### Storing values in variables

A better way to solve this problem is to store intermediate results in variables. This is done using `=`, i.e., the _single_ equals sign. For instance, we might do something like

In [15]:
price_milk_buy    = 0.99
price_milk_sell   = 1.99
price_eggs_buy    = 2.59
price_eggs_sell   = 4.00
price_bread_buy   = 6.31
price_bread_sell  = 7.99
price_coffee_buy  = 0.31
price_coffee_sell = 2.99

revenue = (price_milk_sell * 4 +
           price_eggs_sell * 3 +
           price_bread_sell * 2 +
           price_coffee_sell * 7)

costs   = (price_milk_buy * 4 +
           price_eggs_buy * 3 +
           price_bread_buy * 2 +
           price_coffee_buy * 7)

revenue - costs

30.349999999999994

### Displaying variables with `print()`

Let's suppose that we wanted to save our final computation---that is, our net profit---for later use. We might do something like

In [16]:
profit = revenue - costs

This is nice, but you'll notice that the previous code cell doesn't have any output. How do we figure out what our profit actually _was_? This is where the `print()` function comes in handy.

In [17]:
print(profit)

30.349999999999994


The nice thing about `print()` is that it gives us a lot of control over how numbers are displayed. For instance, if all we saw from the last cell were the output---i.e., `30.349999999999994`---it would be pretty hard to figure out what was going on. Luckily, it's fairly easy to augment this bare number:

In [18]:
print("Our profit this morning was", profit)

Our profit this morning was 30.349999999999994
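That long tail of decimals is a floating point artifact. As a hedged sketch of one way to tidy it up, the example below uses an f-string (a feature of Python 3.6 and later that we haven't covered) with the `:.2f` format to show two decimal places. The variable `message` is just a name chosen for illustration.

```python
profit = 30.349999999999994  # the value computed above

# An f-string inserts the value of `profit`; `:.2f` shows two decimal places.
message = f"Our profit this morning was ${profit:.2f}"
print(message)

# round() is an alternative when you want a number back rather than text.
print(round(profit, 2))
```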


We can do this with many arguments:

In [19]:
name = "Hans"
time = "morning"
print("Hello, my name is", name, "and I'll be going over some Python basics this", time)

Hello, my name is Hans and I'll be going over some Python basics this morning


### Things to watch out for

Here are two things to remember to avoid some variable-related headaches down the road.

#### `Python` variables are case sensitive

In `Python`, variables are case sensitive. That means that `dog` and `DOG` possibly represent different things.

In [20]:
dog = "good boy"
DOG = "WOOF"
dog == DOG

False

#### Variables have to be stored before they can be used

If we try to bring our neighbor's dog into the equation without first saying what it is, `Python` will throw an error.

In [21]:
dog == DoG

NameError: name 'DoG' is not defined

Some languages will try to infer a "default value," but `Python` will not. However, there's no issue if we store a value first and then use it.

In [22]:
Dog = "neighbor dog"
dog == Dog

False

## Strings

You might have noticed that some of the variables we used in the last section actually stored text instead of numbers. Text, or "strings," as they're usually called, are one of the things that can be a bit of a hassle in other languages, but are very easy to deal with in `Python`.

Strings are enclosed in either single quotes (`'...'`) or double quotes (`"..."`). The only difference between the two is that you need to escape literal single quotes with `\` if you're enclosing the string in single quotes, and literal double quotes if you're enclosing it in double quotes. Examples:

In [23]:
'This is a string in single quotes, and it works fine'

'This is a string in single quotes, and it works fine'

In [24]:
'This is a string in single quotes, but it's broken!'

SyntaxError: invalid syntax (<ipython-input-24-ff88f75c76eb>, line 1)

In [25]:
'This is a string with single quotes, and now it\'s fixed!'

"This is a string with single quotes, and now it's fixed!"

In [26]:
"So, there's incentive to use double quotes"

"So, there's incentive to use double quotes"

That should make it pretty clear. Note that when a string contains a single quote (and no double quotes), the interpreter displays it enclosed in double quotes, regardless of how you typed it. The two are absolutely identical, so you should use whichever set of quotes you prefer to enclose your strings, as long as you're consistent.

Use the `print` function to make output more readable by omitting the enclosing quotes and printing special characters escaped with `\`.

In [27]:
'enclosing double quotes ("") and single quotes(\'\') are the same thing in python'

'enclosing double quotes ("") and single quotes(\'\') are the same thing in python'

In [28]:
print('enclosing double quotes ("") and single quotes(\'\') are the same thing in python')

enclosing double quotes ("") and single quotes('') are the same thing in python


Strings can also span more than one line. You can either use an explicit line-break character (`\n`):

In [29]:
print('This string has\ntwo lines!')

This string has
two lines!


... or use triple quotes `'''...'''` or `"""..."""`

In [30]:
print('''
This string has
two lines!
''')


This string has
two lines!



If you look carefully enough, you'll notice that the last string actually has four lines. This is because the triple quotes literally encode all white spaces, including the new lines after the first `'''` and the last "`!`". To avoid this, you can escape the new lines with `\`.

In [31]:
print('''\
This string (really) has
two lines!\
''')

This string (really) has
two lines!


What if you want to write a string that actually contains the `\` character?
You can either:
* escape `\` with `\` (e.g., write **two** `\` characters for one), or
* prepend a single `r` to the quotes to indicate that you are writing a **r**aw string

In [32]:
print("A backslash (\\) is awesome!")
print(r"A backslash (\) is awesome!")

A backslash (\) is awesome!
A backslash (\) is awesome!


Unlike some other stingy languages, the plus operator (`+`) does exactly what you'd expect it to do with strings!

In [33]:
first_name = 'Hans'
last_name = 'Gaebler'
full_name = first_name + ' ' + last_name
print('Hello', full_name)

Hello Hans Gaebler


Even the multiplication operator (`*`) works!

In [34]:
print('Sing, ' + 'la ' * 3)

Sing, la la la 


## Lists

Strings are great and all, but they're hardly the most versatile data type in `Python`. In particular, while they're helpful for storing _textual_ data, they aren't very useful for dealing with much else. Luckily, `Python` has lists!

Lists are "array-like." What that means is that lists consist of lots of slots, each slot containing some item. The easiest way to construct lists is by placing the items you want in between square brackets like so: `[item_1, item_2, item_3]`. For instance,

In [35]:
fruits = ["apples", "bananas", "pears", "mangosteens", "strawberries"]

print(fruits)

['apples', 'bananas', 'pears', 'mangosteens', 'strawberries']


Lists can contain more than just strings:

In [36]:
# A list containing numbers
primes = [2, 3, 5, 7, 11, 13]

# A list containing other lists
junk_drawer = [fruits, primes, ["kittens", "kettles", "mittens", "packages"]]

# A list of different things
misc = [3, 4, "five", "six"]

print(primes, junk_drawer, misc)

[2, 3, 5, 7, 11, 13] [['apples', 'bananas', 'pears', 'mangosteens', 'strawberries'], [2, 3, 5, 7, 11, 13], ['kittens', 'kettles', 'mittens', 'packages']] [3, 4, 'five', 'six']


That's all well and good if we want to put things _into_ lists, but how do we take things _out of_ lists? The answer is that lists (and in fact, all built-in `python` [sequence](https://docs.python.org/2/glossary.html#term-sequence) types) can be indexed and sliced.
* __Indexing__ refers to when we get a specific element of a list at a given position. The result is an _element_ of a list, which may or may not be a list itself.

In [37]:
print(fruits[0])      # Returns the fruit at position 0
print(junk_drawer[1]) # Returns the _list_ at position 1

apples
[2, 3, 5, 7, 11, 13]


* __Slicing__ refers to when we take a "slice" of a list between some given indices. The result is _always_ another list.

In [38]:
misc[1:3]

[4, 'five']
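Because indexing returns the element itself, you can index again when that element is a list. The sketch below rebuilds the lists from above so that it runs on its own; `inner` is a name chosen only for illustration.

```python
fruits = ["apples", "bananas", "pears", "mangosteens", "strawberries"]
primes = [2, 3, 5, 7, 11, 13]
junk_drawer = [fruits, primes, ["kittens", "kettles", "mittens", "packages"]]

inner = junk_drawer[2]    # indexing returns the element: here, a list
print(inner)
print(junk_drawer[2][0])  # index a second time to reach inside the inner list

# Slicing always returns a list, even when the slice holds a single element.
print(junk_drawer[1:2])
```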

Also like strings, lists can be concatenated with the `+` operator:

In [39]:
fruits + ["tomatoes", "grapefruits"]

['apples',
 'bananas',
 'pears',
 'mangosteens',
 'strawberries',
 'tomatoes',
 'grapefruits']

## Strings are (almost) like lists

Strings, like many things in `python`, can be indexed (subscripted). The first element (character) has index 0.

In [40]:
job = 'jedi'
job[0]  # character at position 0

'j'

`Python` will yell at you if you go out of range, whether you do so with a string or a list

In [41]:
job[10]

IndexError: string index out of range

In [42]:
fruits[7]

IndexError: list index out of range

But you *can* go backwards with a negative index, -1 being the right-most character.

In [43]:
print(job[-1])    # right-most character
print(fruits[-1])  # right-most fruit

i
strawberries


Like with lists, *slicing* is another useful way to get subsets of your string. 

In [44]:
job[0:2]  # characters from position 0 (included) to 2 (excluded)

'je'

In [45]:
job[2:4]  # characters from position 2 (included) to 4 (excluded)

'di'

Omitting the first slice index defaults to zero (the first element); omitting the second defaults to the end of the sequence.

In [46]:
print(job[1:])    # slice from position 1 (included) to the end
print(fruits[1:])

edi
['bananas', 'pears', 'mangosteens', 'strawberries']
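Here is a small sketch of the slice defaults on both sides, using the same `job` string as above. A handy consequence is that splitting a sequence at any position and gluing the halves back together returns the original.

```python
job = 'jedi'

print(job[:2])  # first index omitted: starts at position 0
print(job[2:])  # second index omitted: runs to the end
print(job[:])   # both omitted: the whole string

# The two halves always recombine into the original string.
print(job[:2] + job[2:] == job)
```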


Use slices creatively, to make your life easier!

In [47]:
job[-2:]  # slice the last two characters

'di'

Unlike indexing, slicing is generous to ambitious ranges

In [48]:
job[0:100]

'jedi'

The big difference in `Python` between strings and lists is what's called "mutability." Strings in `Python` are _immutable,_ meaning they can't be changed. In other words, you can't assign a value to a string index.

In [49]:
job[0] = 'T'

TypeError: 'str' object does not support item assignment

Instead, you have to build a new string from the existing string.

In [50]:
job = 'T' + job[1:]
print(job)

Tedi


Lists, however, can be modified at a given index. This makes them _mutable_. For instance,

In [51]:
misc[0] = 30  # change indexed item
print(misc)

[30, 4, 'five', 'six']


### Convenience functions for lists and strings

Strings and lists have a lot in common. In particular, there are certain things that we do with both of them _so often_ that `Python` provides a whole bunch of functions to do the job for us. For instance, it's common to want to know the length of both lists and strings. That's where the `len()` function comes in:

In [52]:
print("The length of `fruits` is", len(fruits))
print("The length of `job` is", len(job))

The length of `fruits` is 5
The length of `job` is 4


There are a _whole bunch_ of these functions. We're only going to go through a few of them here, but you can see all of them in the docs [here](https://docs.python.org/3/library/stdtypes.html#string-methods) for strings and [here](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) for lists.

Don't worry if these don't all sink in right away. The point is just to give you a taste of what's possible. As a general rule, if it's easy to describe what you want to do, and you could do it to any list (or string), there's probably already a built-in function for doing it.

In [53]:
s = 'Hi, my name is Johann. Yes, Johann.'
s.split()  # splits the string into a list (split at spaces by default)

['Hi,', 'my', 'name', 'is', 'Johann.', 'Yes,', 'Johann.']

In [54]:
s.split(',')  # can specify which character to split at

['Hi', ' my name is Johann. Yes', ' Johann.']

In [55]:
s.count('n')  # count the number of non-overlapping occurrences of a substring

5

In [56]:
s.count('Jongbin')

0

In [57]:
s.upper()  # makes everything uppercase

'HI, MY NAME IS JOHANN. YES, JOHANN.'

In [58]:
s.lower()  # makes everything lowercase

'hi, my name is johann. yes, johann.'

In [59]:
s.lower().count('y')  # methods that return a string can be chained

2

In [60]:
':'.join(s.split())

'Hi,:my:name:is:Johann.:Yes,:Johann.'

That last one is a little tricky. `str.join(some_sequence)` takes each item of `some_sequence` and sticks them together with the value of `str` in between, making a single large string. It may seem like a crazy thing to do, but it's actually pretty useful when converting data into comma-separated values, e.g.,

In [61]:
','.join(['some','data','in','a','list'])

'some,data,in,a,list'

You can also use the <kbd>Tab</kbd> character (`\t`) to create tab-separated values:

In [62]:
print('\t'.join(['some','data','in','a','list']))

some	data	in	a	list
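`join()` and `split()` undo each other, which is what makes them handy for simple delimited data. The sketch below shows the round trip; `row` and `line` are names chosen for illustration. Note that it assumes none of the items contains a comma. Real CSV files have quoting rules that this ignores, and Python's standard `csv` module is built to handle those.

```python
row = ['some', 'data', 'in', 'a', 'list']

line = ','.join(row)    # list -> comma-separated string
print(line)

print(line.split(','))  # comma-separated string -> list

# Splitting on the same separator recovers the original list.
print(line.split(',') == row)
```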


In [63]:
fruits.append('tomatoes')  # add an item to the end of the list
print(fruits)

['apples', 'bananas', 'pears', 'mangosteens', 'strawberries', 'tomatoes']


In [64]:
fruits.remove('tomatoes')  # remove the first item in the list that matches the argument
print(fruits)

['apples', 'bananas', 'pears', 'mangosteens', 'strawberries']


In [65]:
fruits.index('bananas')  # return the index of the first item matching the argument

1

In [66]:
fruits.append('bananas') # bring some extras in case we're hungry
fruits.count('bananas')  # return the number of times x appears in the list

2

In [67]:
eat = fruits.pop(0)  # return and remove item at position 0 from list (removes last item if no index is specified)
print(fruits)

['bananas', 'pears', 'mangosteens', 'strawberries', 'bananas']


In [68]:
print(eat)  # the item previously at position 0 ('apples') is now 'popped' into the variable eat

apples


Not exactly a method, but the `in` keyword is useful for checking if a list contains a particular item.

In [69]:
'bananas' in fruits

True

In [70]:
'tomatoes' in fruits # Not anymore!

False
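The `in` keyword works on strings too, where it checks for a substring. A minimal sketch, reusing `s` and the current contents of `fruits` from above:

```python
s = 'Hi, my name is Johann. Yes, Johann.'
fruits = ['bananas', 'pears', 'mangosteens', 'strawberries', 'bananas']

print('Johann' in s)             # substring check on a string
print('johann' in s)             # case sensitive, so this is False
print('banana' in fruits)        # lists need a whole-item match, so this is False
print('tomatoes' not in fruits)  # `not in` is the opposite check
```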

## Dictionaries

The next data structure we'll talk about before taking a break is the dictionary. Dictionaries (or, more properly, `dict`s) work a lot like lists, except instead of looking up things in a dictionary by their _index_, we look them up by their _key_. It's easy to think of a `Python` dictionary as almost the same thing as a Webster's dictionary:
* __The Key:__ The "key" in a `Python` dictionary corresponds to the word you want to look up.
* __The Value:__ The "value" in a `Python` dictionary is whatever is stored at a key, which corresponds to a word's definition in a real dictionary.

(Dictionaries are known as *associative arrays* or *hash tables* in other languages.)

Whereas lists are constructed using square brackets, dictionaries are constructed using curly braces, with `key: value` pairs separated by commas, like so: `{key1: value1, key2: value2, key3: value3}`. Let's try it:

In [71]:
gre_study_guide = {
    "bucolic": "adj. pastoral, rustic, countryfied",
    "tendentious": "adj. controversial, one-sided",
    "skulk": "v. to move in a stealthy or furtive manner"
}

Now, suppose the big test is tomorrow, but we can't remember the definition of "skulk." No problem---we'll just look it up in much the same way that we'd look something up in a list:

In [72]:
gre_study_guide["skulk"]

'v. to move in a stealthy or furtive manner'

Of course, like lists, dictionaries can store more than strings. For instance, you might use a dictionary to store information for something like an address book:

In [73]:
address_book = {
    "simon": {
        "first_name": "Simon",
        "last_name": "Cowell",
        "phone": 447911123456
    },
    "paula": {
        "first_name": "Paula",
        "last_name": "Abdul",
        "phone": 5303228051
    },
    "randy": {
        "first_name": "Randy",
        "last_name": "Jackson",
        "phone": 2122002099
    }
}

Now we can find someone's contact information just by looking up their name.

In [74]:
address_book["randy"]

{'first_name': 'Randy', 'last_name': 'Jackson', 'phone': 2122002099}

What's more, we can even go directly to their phone number by accessing the `"phone"` key in the dictionary returned after looking up `"randy"`!

In [75]:
address_book["randy"]["phone"]

2122002099
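What happens when the key isn't there? Looking it up with square brackets raises a `KeyError`. As a hedged sketch of one alternative, the dictionary method `get()` returns `None` (or a fallback you supply) instead. The key `"ryan"` is a hypothetical missing entry, and the address book is trimmed to one person so the example runs on its own.

```python
address_book = {
    "randy": {"first_name": "Randy", "last_name": "Jackson", "phone": 2122002099},
}

# address_book["ryan"] would raise KeyError, because there is no such key.
print(address_book.get("ryan"))             # no error: returns None
print(address_book.get("ryan", "unknown"))  # or a fallback of our choosing

# For keys that do exist, get() behaves like square brackets.
print(address_book.get("randy")["phone"])
```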

Let's get familiar with how we can work with dictionaries.

In [76]:
me = {'name':'Hans', 'email':'jgaeb@stanford.edu'}
print(me)

{'name': 'Hans', 'email': 'jgaeb@stanford.edu'}


In [77]:
me['cell'] = '414-123-4567'
print(me)

{'name': 'Hans', 'email': 'jgaeb@stanford.edu', 'cell': '414-123-4567'}


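Assigning to a key that doesn't exist yet, as we just did with `'cell'`, adds a new `key:value` pair. We can also delete existing `key:value` pairs with the `del` statement.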

In [78]:
del me['email']
print(me)

{'name': 'Hans', 'cell': '414-123-4567'}


The `key` of a dictionary can't be a list (because lists are mutable), but the `value` sure can!

In [79]:
me['siblings'] = ['Carrie', 'Karl']
print(me)

{'name': 'Hans', 'cell': '414-123-4567', 'siblings': ['Carrie', 'Karl']}


Use the `keys()` method of dictionary objects to get a view of the keys used in the dictionary (wrap it in `list()` if you need an actual list).

In [80]:
me.keys()

dict_keys(['name', 'cell', 'siblings'])

And use the `in` keyword (which also works with lists, as we saw above) to see if a certain key exists in the dictionary.

In [81]:
'name' in me.keys()

True

In [82]:
'email' in me.keys()

False
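As a small aside, `in` can be applied to the dictionary directly, since it checks the keys by default. The sketch below also shows `values()`, the counterpart of `keys()`, which we haven't covered above.

```python
me = {'name': 'Hans', 'cell': '414-123-4567', 'siblings': ['Carrie', 'Karl']}

print('name' in me)           # same result as 'name' in me.keys()
print('Hans' in me)           # False: `in` looks at keys, not values
print('Hans' in me.values())  # check the values explicitly instead
print(list(me.keys()))        # turn the keys view into a regular list
```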

When the keys are simple strings, it is sometimes easier to specify pairs using the `dict` constructor.

In [83]:
me = dict(name='Hans', email='jgaeb@stanford.edu', siblings=['Carrie', 'Karl'])
print(me)

{'name': 'Hans', 'email': 'jgaeb@stanford.edu', 'siblings': ['Carrie', 'Karl']}


Of course, it's worth keeping in mind that the `key` in a dictionary can be _anything_, as long as it's immutable. So `key`s can be strings, numbers, or... tuples!

## Tuples

The (very last!) data type we'll need to talk about in `Python` is tuples. Fortunately, tuples are much easier than lists or dicts. Think of tuples as lists that are _immutable_: once you've put some stuff in a tuple, you can't change it.

Tuples consist of a number of values separated by commas (not necessarily, but often, enclosed in parentheses).

In [84]:
description = 'male', 'dark hair'
print(description)

('male', 'dark hair')


In [85]:
description[0]  # tuples are also sequences, and can be indexed

'male'

In [86]:
description[1:]  # or sliced

('dark hair',)

In [87]:
description[0] = 'female'  # but NOT changed, because they are immutable

TypeError: 'tuple' object does not support item assignment

While being immutable may seem like a minor difference from lists, the implications are quite big, and tuples are generally used for very different purposes compared to lists. For example, tuples can be used as the `key` for dictionaries (think sparse matrices). 

In [88]:
super_sparse_matrix = {(0, 0):1, (1000, 1000):1}  # a 1001 x 1001 matrix with only two non-zero elements
print(super_sparse_matrix)

{(0, 0): 1, (1000, 1000): 1}


In [89]:
word_matrix = {('apples', 'bananas'):1, ('apples', 'pears'):1}  # a matrix indexed by words
print(word_matrix)

{('apples', 'bananas'): 1, ('apples', 'pears'): 1}
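To read entries back out of a tuple-keyed dictionary, index it with a tuple. The sketch below is one way to treat missing entries as zero, using the dictionary method `get()` with a default value. This is an illustrative convention for the sparse matrix idea, not something built into `Python`.

```python
super_sparse_matrix = {(0, 0): 1, (1000, 1000): 1}

print(super_sparse_matrix[(0, 0)])         # look up an entry by its (row, column) tuple
print(super_sparse_matrix[1000, 1000])     # the parentheses are optional here

# Entries we never stored can be treated as zero by supplying a default.
print(super_sparse_matrix.get((3, 4), 0))

# Only the non-zero entries take up space in the dictionary.
print(len(super_sparse_matrix))
```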


There are many more data structures commonly used in `python`, but lists, dictionaries, and tuples pretty much cover the basics (not to mention that these three constitute enough to fully represent the [JSON](http://json.org/) format in `python`, something you might see some of this afternoon when you're working with APIs and scrapers.)

# Exercises

Try these over the break. Work with the people sitting around you.

## Exercise 1.
1. Declare a string variable named **`s`**, that has the value:
> "double quotes" and single 'quotes' are equally acceptable in python

1. Make `Python` count how many times the letter `t` appears in the string **`s`**.
1. Replace all quotation marks in the string **`s`** with an underbar ('\_').
 - to be precise, you're not *replacing* the quotations, but *reassigning* the variable **`s`** with a copy of the old **`s`** that has underbars replacing quotations; remember that `python` strings are **immutable** (i.e., they are NEVER changed, only reassigned).
 - this was not covered above, but you should use `python`'s `replace()` method for strings; now would be a good time to practice reading the docs (https://docs.python.org/3/library/stdtypes.html#str.replace).
1. Split the string **`s`** into a list named **`words`**.
1. Count the length of the string **`s`** and the list **`words`**.
1. Join the elements of the list **`words`** into a comma separated string.

## Exercise 2.
1. Create a very, very _long_ string that contains all of Charles Dickens's _A Tale of Two Cities_ using the following code. (Don't worry that this doesn't make much sense now---we'll get to what's going on here in the next lesson.) `with open("data/two_cities.txt", "r") as f: ttc = f.read()`
2. Turn `ttc` into a list by splitting on spaces (i.e., the string `" "`). How many words is _A Tale of Two Cities_?
3. How many times does the word "the" occur in _A Tale of Two Cities_? How about "king"? (__HINT:__ Use the `.count()` method. For instance, `[1, 2, 3, 1].count(1)` returns `2`.)

## Exercise 3.
1. Turn `ttc` from the previous exercise into a dictionary using the following code. (Don't worry if this code doesn't make sense.) `ttc = {word:ttc.count(word) for word in ttc[:1000]}`
- This dictionary is made up of the following `key: value` pairs: the keys are the first 1000 words in _A Tale of Two Cities_, and the values are the number of times the word appears in the whole book.
- This might take a little while, so don't be alarmed if this doesn't happen right away.
2. Check how many times the word "lords" occurs.
3. Reduce the number of times a word you don't like occurs by modifying a `key: value` pair in `ttc`.
4. Add a word that you think _should_ occur but doesn't to `ttc`.