<a href="https://colab.research.google.com/github/gawron/python-for-social-science/blob/master/intro/python_types_nb_lecture.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Prerequisites**

1.  You know what a type is in the context of a computer language.
2.  You know what a Python container is.
3.  You know what the Python builtin container types are and have some familiarity with their differences:
    a.  List
    b.  Tuple
    c.  String
    d.  Dictionary
    e.  Set
    
One way you might have picked up these odd bits of knowledge is to have read [Chapter 3 of the online text.](https://gawron.sdsu.edu/python_for_ss/course_core/book_draft/Python_introduction/Python_introduction.html)
To focus on the ideas of particular importance in this notebook, it will also probably help to have looked at
[the Python Types Slides I](https://gawron.sdsu.edu/python_for_ss/course_core/lectures/python_types_slides_one.pdf) and
[the Python Types Slides II](https://gawron.sdsu.edu/python_for_ss/course_core/lectures/python_types_slides_two.pdf).

In following along with this lecture, it is best to have [the notebook we are using](https://colab.research.google.com/github/gawron/python-for-social-science/blob/master/intro/programming_nb.ipynb) open in a separate tab.

This notebook/lecture is particularly "Socratic" in its teaching style. The notebook supplies you with small bits of information or data and then asks you to learn by answering questions that take you beyond what you've been given. The answers to the questions all require you to supply a snippet of code. Feel free to pause the lecture at any time, so you can switch over to the notebook tab and try out an answer of your own. If you do this, you should do your best to **verify your answers yourself** by evaluating the code cell you've written your answer in. More hints on this as we go.

In any case, the lecture video will take you through the setup for each question, then tell you when it is a good time to try out your answer. It will then provide an answer. That is why the title of the notebook at the top of the page is **python_types_nb-Copy1**.

<font size=7> <b> Python types </b></font>

## Tuples

To start using this notebook, recall the directions you were given in the `running_python` video about how to use a notebook. Execute the next cell. That is, in Jupyter, position your cursor in the next cell and hit the `[Enter]` key while holding down the shift key, which we will henceforth write as `[Shift]`-`[Enter]`. In Google Colab, just hit the small triangle (or `Play` button) to the left of the cell. This defines the variable `T` to be a tuple with four members; because the last line of the cell is just `T`, its value is also displayed below the cell.

In [1]:
#Variable assignment. T is now a tuple.
T = tuple('abcd')
T

('a', 'b', 'c', 'd')

The next cell retrieves the third element of `T` by using the positional index 2 (0 is first position):

In [2]:
T[2]

'c'

The next cell retrieves the third-from-last element of `T` by using the positional index -3 (-1 is the last position):

In [3]:
T[-3]

'b'

Consider a new tuple, defined from a string using the type name `tuple` as a creator
or factory function.

In [4]:
S = 'abcdefg'
A = tuple(S)
print('A', A)
B = (S[0:3],S[1:4],S[2:5],S[3:6],S[4:7])
print('B', B)

A ('a', 'b', 'c', 'd', 'e', 'f', 'g')
B ('abc', 'bcd', 'cde', 'def', 'efg')


Recall how **slices** work.

In [5]:
# elements from position 1 up to but not including position 4
# elements at positions 1,2, and 3
A[1:4]

('b', 'c', 'd')


#### Retrieve `('c','d')` from `A`

In [12]:
A

('a', 'b', 'c', 'd', 'e', 'f', 'g')

In [None]:
A[2:4]

('c', 'd')

#### Retrieve `'cde'` from `B`

In [13]:
B

('abc', 'bcd', 'cde', 'def', 'efg')

This is not a slice, because `'cde'` is an element of `B`, not a subsequence.

In [None]:
B[2]

'cde'


#### Retrieve `('cde', 'def')` from `B`

In [7]:
B

('abc', 'bcd', 'cde', 'def', 'efg')

In [None]:
B[2:4]

('cde', 'def')

The next cell repeats the original definition of `T` from
above and tries to assign the **string** `"x"` as the value at the second position.

This raises an **Exception**. To see what the `Exception` is, position your cursor in the cell and type `[Shift]`-`[Enter]`. Explain what the error is in the cell just below it (labeled `Enter your answer here`).

In [8]:
T = tuple('abcd')
##  Attempted assignment
T[1] = 'x'

[Enter your answer here]

In the next cell your task is to illustrate that you understand the source of the problem
with the **assignment statement** `T[1] = 'x'`.

To do that, use the cell below to define `T` differently so that the assignment is not an error.

Here are some guidelines:

1.  The new `T` must still be a sequence  (or `T[1]` won't work).
2.  Try to have the new `T` contain the same elements as the old `T`.

In [None]:
#T = tuple('abcd')   # Assignment raises an Exception
T = list(T)           # Assignment in the next line is valid
T[1] = 'x'
T

Takeaways:

1.  Tuples are sequences. Data is accessed by **positional indexing**, with the first element at index 0. This is called 0-based indexing.
2.  A slice from any sequence is always a sequence of the same type.
3.  The length of a slice `start:stop` is `stop - start`, as long as `0 <= start <= stop <= len(seq)`.
4.  Tuples can't be updated. That is, they don't support **item assignment**. You can't change the value of an element at a particular position.
5.  Lists do support item assignment. To create a sequence you can update, just turn a tuple into a list with `list()`.
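A minimal sketch of these takeaways, reusing the tuple `A` from this section. It also illustrates the caveat on item 3: Python silently clips a `stop` index that runs past the end of the sequence, so `stop - start` gives the length only when the indices are within range.

```python
A = tuple('abcdefg')

# Positional indexing is 0-based; a slice of a tuple is again a tuple
print(A[0], A[1:4])        # a ('b', 'c', 'd')
print(type(A[1:4]))        # <class 'tuple'>

# Slice length is stop - start when 0 <= start <= stop <= len(A)...
print(len(A[2:5]))         # 3
# ...but a stop past the end is clipped to len(A)
print(len(A[2:10]))        # 5, not 8

# Tuples refuse item assignment; a list made from the tuple accepts it
try:
    A[1] = 'x'
except TypeError as err:
    print("TypeError:", err)
L = list(A)
L[1] = 'x'
print(L)                   # ['a', 'x', 'c', 'd', 'e', 'f', 'g']
```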


# Lists (& Containers as compound data)

In [11]:
T = tuple('abcde')
print(T)
L = list('abxde')
print(L)
# Defines same B as above
S = 'abcdefg'
B = (S[0:3],S[1:4],S[2:5],S[3:6],S[4:7])

('a', 'b', 'c', 'd', 'e')
['a', 'b', 'x', 'd', 'e']


In [22]:
L

['a', 'b', 'x', 'd', 'e']

In [None]:
L[1]

'b'

In [None]:
L[-2]

'd'

Tuple versus list syntax, and how each one prints:

In [None]:
print(T)
print(L)

('a', 'b', 'c', 'd', 'e')
['a', 'b', 'x', 'd', 'e']


In [None]:
print(L)
'x' in L

['a', 'b', 'x', 'd', 'e']


True

In [9]:
'x' in 'abcxde'

True

In [12]:
tuple(L)

('a', 'b', 'x', 'd', 'e')

In [14]:
'x' in tuple(L)

True

What will the value be when I execute the following cell?

In [None]:
x in tuple(L)

In [15]:
print(L)

['a', 'b', 'x', 'd', 'e']


In [None]:
'c' in L

False

The objects that are `in` a container are its **elements**.

In [31]:
B

('abc', 'bcd', 'cde', 'def', 'efg')

In [32]:
print('B', B)
print('"b" in B: ', 'b' in B)
print('"bcd" in B: ', 'bcd' in B)

B ('abc', 'bcd', 'cde', 'def', 'efg')
"b" in B:  False
"bcd" in B:  True


The length of a container is the count of its elements, no matter what size the elements are.

In [33]:
print("B",B)
print("L", L)
print("len(B)", len(B))
print("len(L)", len(L))

B ('abc', 'bcd', 'cde', 'def', 'efg')
L ['a', 'b', 'x', 'd', 'e']
len(B) 5
len(L) 5


For `L`, which is a list, assignments of new values to positions are possible:

In [16]:
L

['a', 'b', 'x', 'd', 'e']

In [17]:
L[1]  = ('1','2')

In [None]:
L

Note that we can assign to slice locations just as we assign to index
locations. These slice assignments may change the length of the list.

In [27]:
L = list('abcde')
print(L)
L[1:3] = ['w','f','h']
print(L)

['a', 'b', 'c', 'd', 'e']
['a', 'w', 'f', 'h', 'd', 'e']


In the next cell,  use a slice assignment to replace the last two elements of
`L` with a single element `"zr"`.  That is, starting with

```
['a', 'w', 'f', 'h', 'd', 'e']
```

the result is:

```
['a', 'w', 'f', 'h', 'zr']
```


In [46]:
print(L)
# Your slice assignment goes here

print(L)

['a', 'w', 'f', 'h', 'd', 'e']
['a', 'w', 'f', 'h', 'd', 'e']


In [1]:
#Is this right?
#L[-1:] = "zr"

In [2]:
#How about this?
#L[-1:] = ["zr"]

In [34]:
# Or this?
#L[-2] = ["zr"]

In [None]:
print(L)
L[-2:] = ["zr"]

print(L)

['a', 'w', 'f', 'h', 'd', 'e']
['a', 'w', 'f', 'h', 'zr']


Calling `L` the assignee and `"zr"` what is assigned, the
rule in slice assignment is: the elements of what is assigned become
elements of the assignee. So for

```
L[-1:] = "zr"
```

the two elements of the string `"zr"` (the characters `'z'` and `'r'`)
become two separate elements of `L`. In order to assign the single element `"zr"`
to a slice, we need a container with `"zr"` as its single element,
for example `["zr"]`.
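A short sketch of this rule, starting from the same list as in the exercise (named `start` here so the notebook's `L` is left alone). Each variant works on a copy so the results can be compared side by side; the variable names are just for illustration.

```python
start = ['a', 'w', 'f', 'h', 'd', 'e']

# Slice assignment from a string: its characters become separate elements
L_str = start.copy()
L_str[-1:] = "zr"
print(L_str)      # ['a', 'w', 'f', 'h', 'd', 'z', 'r']

# Slice assignment from a one-element list: "zr" becomes a single element
L_list = start.copy()
L_list[-1:] = ["zr"]
print(L_list)     # ['a', 'w', 'f', 'h', 'd', 'zr']

# Replacing the last TWO elements with one element shortens the list
L_two = start.copy()
L_two[-2:] = ["zr"]
print(L_two)      # ['a', 'w', 'f', 'h', 'zr']

# Index (not slice) assignment stores the right-hand side itself, list and all
L_index = start.copy()
L_index[-2] = ["zr"]
print(L_index)    # ['a', 'w', 'f', 'h', ['zr'], 'e']
```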

We introduce a new list, which is a list of **tuples**.  We emphasize that this is a **compound container**, a container
which contains other containers inside of it.

In [35]:
Mothers = [('George Clooney', 'Rozemary Clooney'),
           ('Jimmy Carter',  'Lillian Carter'),
           ('George H. W. Bush','Dorothy Walker Bush'),
           ('George W. Bush', 'Barbara Bush'),
           ('Liza Minnelli', 'Judy Garland')]

In [36]:
# 10 strings inside Mothers but the length of Mothers is not 10
len(Mothers)

5

In [37]:
len(Mothers[0])

2

In [38]:
Mothers[0]

('George Clooney', 'Rozemary Clooney')

In [39]:
Mothers

[('George Clooney', 'Rozemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Liza Minnelli', 'Judy Garland')]

**Retrieving data from compound data structures**

In the next cell, write a single expression that retrieves the value `"Judy Garland"` from `Mothers`. Test your answer by typing `[Shift]`-`[Enter]` to see what you get. If you get a `NameError`, it's because you didn't first execute the cell above that defines the variable `Mothers`.

In [None]:
#Answer should be of the form Mothers[...

In [40]:
# This is a single expression but not the right one.
Mothers[-1]

('Liza Minnelli', 'Judy Garland')

In [41]:
# Right value retrieved but wrong because it's not a single expression
x = Mothers[-1]
x[-1]

'Judy Garland'

**Answer**

In [42]:
Mothers[-1][-1]

'Judy Garland'

In [43]:

Mothers[4][1]

'Judy Garland'

**Retrieving data from compound data structures: Going deeper**

In the next cell, type a single expression that retrieves the value `z` from `Mothers` and sets a variable to that value. This answer will be of the form `Variable = Mothers[...`.

Note: There are multiple valid answers.

In [44]:
Mothers

[('George Clooney', 'Rozemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Liza Minnelli', 'Judy Garland')]

In [45]:
Mothers[-1][0][2]

'z'

Also

In [None]:
print(Mothers[4][0][2])
print(Mothers[4][0][-11])


z
z


And so on

Takeaways

1.  Lists are sequences just like tuples.  Indexing by position works and it works just the same way.
2.  All containers `C` have lengths and elements; that is, `len(C)` and the Boolean test `x in C` always work.
3.  Containers are always compound. They are data structures composed of some other smaller, usually differently typed, data.
4.  Indexing container `C` by position retrieves an element of `C`; if that element is itself a container, you can index again.  That is, expressions of the form `C[idx1][idx2]`  will work.

###  Assignments to compound data structures

Challenge:  Update `Mothers` from

```
[('George Clooney', 'Rozemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Liza Minnelli', 'Judy Garland')]
```

to

```
[('George Clooney', 'Rozemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Hazel Moder', 'Julia Roberts')]
```


In [34]:
Mothers[-1] = ('Hazel Moder', 'Julia Roberts')
Mothers

[('George Clooney', 'Rozemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Hazel Moder', 'Julia Roberts')]

Challenge: Update `Mothers` from

```
[('George Clooney', 'Rozemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Hazel Moder', 'Julia Roberts')]
```

to

```
[('George Clooney', 'Rosemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Hazel Moder', 'Julia Roberts')]
```


In [46]:
Mothers

[('George Clooney', 'Rozemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Hazel Moder', 'Julia Roberts')]

In [51]:
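# Mothers[0][1][2] = 's' would raise a TypeError: strings can't be updated
Mothers[0][1][2]

'z'

In [52]:
# Mothers[0][1] = 'Rosemary Clooney' would raise a TypeError: tuples can't be updated
Mothers[0][1]

'Rozemary Clooney'

Since neither the tuple nor the string inside it can be updated, we replace the whole tuple, which is an element of the (mutable) list `Mothers`:

In [53]: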
Mothers[0] = ('George Clooney', 'Rosemary Clooney')

In [54]:
Mothers

[('George Clooney', 'Rosemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Hazel Moder', 'Julia Roberts')]
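To see why the cells above replace the whole pair rather than assigning inside it, here is a sketch using a shortened copy of the data under the illustrative name `pairs` (so `Mothers` itself is left alone): the list is mutable, but the tuples in it, and the strings in those tuples, are not.

```python
pairs = [('George Clooney', 'Rozemary Clooney'),
         ('Liza Minnelli', 'Judy Garland')]

# Assigning to a position inside a tuple fails: tuples are immutable
try:
    pairs[0][1] = 'Rosemary Clooney'
except TypeError as err:
    print("TypeError:", err)

# Assigning to a position inside a string fails too: strings are immutable
try:
    pairs[0][1][2] = 's'
except TypeError as err:
    print("TypeError:", err)

# Replacing a whole element of the list works, because lists are mutable
pairs[0] = ('George Clooney', 'Rosemary Clooney')
print(pairs)
```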

# Strings

The cell below defines a string. Execute that definition. (This is another way of saying "Place your cursor in the cell and type `[Shift]`-`[Enter]`.") Notice that when you evaluate this cell, Python doesn't seem to do anything. Nevertheless it has changed its state to reflect the fact that the variable `Example_str` has been defined to denote a particular string.

In [57]:
Example_str = "program"

Strings are sequence containers just as lists and tuples are. Because they are containers, `len`, `in`, and
iteration work. Because they are sequences (they are ordered), indexing by position works.

In [67]:
len(Example_str)

7

In [68]:
'r' in Example_str

True

We now demonstrate looping through a string -- or **iteration** -- processing
each element in the expected order.

In [70]:
for char in Example_str:
    print(char)

p
r
o
g
r
a
m


Only **iterables** can be looped through:

In [69]:
for x in 1:
    print(x)

TypeError: 'int' object is not iterable

In [58]:
Example_str

'program'

In the next cell, write and execute an expression that retrieves the value `"r"` from `Example_str`.  There are two *r*'s in the string, so there are at least two possible answers.

In [59]:
Example_str[1]

'r'

The string `program` has the nice property that at least one of its substrings is an English word.  In the cell below write a single Python expression that retrieves the word 'pro' from `Example_str`. You might need to review the textbook material on [string slices](http://www-rohan.sdsu.edu/~gawron/python_for_ss/course_core/book_draft/Python_introduction/strings.html).  Evaluate it with `[Shift]`-`[Enter]` to check your answer.  

Help with slice indices: think of the indices as pointing
to positions between the elements, with the first index (`0`)
pointing to the very beginning of the string. It follows that the length of the slice `[start:stop]`
is `stop - start` (for nonnegative indices that stay within the string).

So in the string `"Python"` the slice `1:4`
has length 3 and covers the span `'yth'`, as in the following picture:

```
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1   
```

In [None]:
P = 'Python'
print(P[1])
print(P[1:4])
print(P[-6])
print(P[-5:-2])

y
yth
P
yth


In the cell below, add a slice
that retrieves the English word 'ram'
from `Example_str`.

In [None]:
Example_str = "program"


# Dictionaries

We begin discussing dictionaries with an observation about our last example, `Mothers`, a list of pairs.  Suppose what we want to do is to look up the mother of one of the celebrities.

In [74]:
Mothers = [('George Clooney', 'Rosemary Clooney'), ('Jimmy Carter', 'Lillian Carter'),
           ('George H. W. Bush', 'Dorothy Walker Bush'), ('George W. Bush', 'Barbara Bush'),
           ('Liza Minnelli', 'Judy Garland')]

In [None]:
# Look up George W. Bush's mother
Mothers[3]

This is fine if you know that the information about George W. is in 4th position.  Then
to get an expression that returns George W.'s mother, you could do:

In [None]:
Mothers[3][1]

But what if you knew that `Mothers` contains information about celebrity mothers
but you didn't know that the information about George W. was in 4th position?

For example, suppose the information is being updated all the time and nothing
keeps track of how many updates have occurred.  The information you want
would still be in the list, but you would have to **search** for it.

By which I mean something like this:

In [63]:
for (c,m) in Mothers:
    if c == 'George W. Bush':
        # print the MOM we're looking for
        print(m)
        # Stop this loop.   We're done!
        break
    else:
        # print the pairs we're NOT looking for
        print("Not the pair we want",c,m)

Not the pair we want George Clooney Rosemary Clooney
Not the pair we want Jimmy Carter Lillian Carter
Not the pair we want George H. W. Bush Dorothy Walker Bush
Barbara Bush


The loop steps through the list until it finds the tuple beginning with George W.

At that point it stops the loop (`break`). The value of `m` is then the mother you want.

In [61]:
m

'Barbara Bush'

This is both inefficient and a nuisance.  As the printout shows,
we have to examine a number of irrelevant pairs before we find the
one of interest.  What we want is something where we plug in the child and it outputs the mother.

The next cell shows what does that: a **dictionary** made out of the list of pairs.

In [66]:
print(Mothers)
ddm = dict(Mothers)
ddm

[('George Clooney', 'Rosemary Clooney'), ('Jimmy Carter', 'Lillian Carter'), ('George H. W. Bush', 'Dorothy Walker Bush'), ('George W. Bush', 'Barbara Bush'), ('Liza Minnelli', 'Judy Garland')]


{'George Clooney': 'Rosemary Clooney',
 'Jimmy Carter': 'Lillian Carter',
 'George H. W. Bush': 'Dorothy Walker Bush',
 'George W. Bush': 'Barbara Bush',
 'Liza Minnelli': 'Judy Garland'}

This transforms a list of pairs into a dictionary, with the first member of each pair as a key and the second member as its value. So `ddm` is now a dictionary with individuals as keys and their mothers as values.

In [67]:
ddm

{'George Clooney': 'Rosemary Clooney',
 'Jimmy Carter': 'Lillian Carter',
 'George H. W. Bush': 'Dorothy Walker Bush',
 'George W. Bush': 'Barbara Bush',
 'Liza Minnelli': 'Judy Garland'}

With dictionaries we move from **positional indexing** (each index is an integer specifying the position
of an item) to **keyword indexing** (each index is a keyword, the **key** for a **value**).

In [None]:
ddm['George W. Bush']

'Barbara Bush'
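To compare how much work each kind of lookup does, here is a sketch of the earlier search loop packaged as a function. The helper name `lookup_in_pairs` is hypothetical (not a Python builtin); it returns the value together with how many pairs it had to examine. The dictionary lookup needs no such scan.

```python
def lookup_in_pairs(pairs, key):
    """Linear search through (key, value) pairs.

    Returns (value, number_of_pairs_examined); raises KeyError if key is absent.
    """
    examined = 0
    for k, v in pairs:
        examined += 1
        if k == key:
            return v, examined
    raise KeyError(key)

mother_pairs = [('George Clooney', 'Rosemary Clooney'), ('Jimmy Carter', 'Lillian Carter'),
                ('George H. W. Bush', 'Dorothy Walker Bush'), ('George W. Bush', 'Barbara Bush'),
                ('Liza Minnelli', 'Judy Garland')]

print(lookup_in_pairs(mother_pairs, 'George W. Bush'))   # ('Barbara Bush', 4)
print(lookup_in_pairs(mother_pairs, 'Liza Minnelli'))    # ('Judy Garland', 5)

# The dictionary version goes straight to the key
mother_dict = dict(mother_pairs)
print(mother_dict['George W. Bush'])                      # Barbara Bush
```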

**Under the hood**: the computational difference between a list of pairs and a dictionary
is that a dictionary provides (on average) **constant-time lookup**. The computing time
it takes to find the value corresponding to a particular key doesn't grow as the dictionary
grows larger. With the list, on the other hand, lookup time grows with the length of the list.

This is because information in the dictionary is **hashed** by the key: The key
string provides easily computed information about exactly where in memory
to look to find the associated value.  Without going into the computational details of
how hashing works, you can think of this as working like Webster's Dictionary.  The first
letter of "abacus" tells us to look in the "a" section of the dictionary; the second
letter tells us to look early in the "a" section, and so on.
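Python exposes the first step of hashing through the builtin `hash` function, which turns a key into an integer that the dictionary uses to decide where to store and find the value. A small sketch follows; note that string hash values change from one Python session to the next, so only compare them within a session. One consequence is that dictionary keys must be hashable: strings, numbers, and tuples of these work, but a list does not.

```python
# Equal keys always produce equal hash values within a session
h1 = hash('abacus')
h2 = hash('aba' + 'cus')
print(isinstance(h1, int), h1 == h2)      # True True

# A tuple of strings is hashable, so it can serve as a key
keyed_by_tuple = {('George W. Bush', 'mother'): 'Barbara Bush'}
print(keyed_by_tuple[('George W. Bush', 'mother')])   # Barbara Bush

# A list is mutable and unhashable, so using one as a key raises TypeError
try:
    bad = {['George W. Bush']: 'Barbara Bush'}
except TypeError as err:
    print("TypeError:", err)
```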

Often if we want **functional information** for entities we will store it in a dictionary.

A function on a set is an assignment of a unique value to each of the entities in the set.
The **mother** relation is a function on people because people have unique mothers,
so it would be quite natural to store information about people's mothers
in a dictionary. Other typically functional relations on people
are father, address, social security number, congressional district, and place of birth.

Non-functional relations are sibling, cousin, child, and friend.  A person
can often have more than one sibling, cousin, child, or friend.
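Because a dictionary holds exactly one value per key, it matches functional relations. A sketch with made-up (person, sibling) pairs, purely for illustration: `dict()` silently keeps only the last value for a repeated key, so for a non-functional relation like sibling, one common approach is to map each key to a list of values.

```python
# Hypothetical (person, sibling) pairs: 'Ann' has two siblings
sibling_pairs = [('Ann', 'Bob'), ('Ann', 'Cal'), ('Dee', 'Eve')]

# dict() keeps only the LAST value seen for a repeated key
print(dict(sibling_pairs))   # {'Ann': 'Cal', 'Dee': 'Eve'}

# For a non-functional relation, store a list of values per key instead
siblings = {}
for person, sib in sibling_pairs:
    if person in siblings:
        siblings[person].append(sib)
    else:
        siblings[person] = [sib]
print(siblings)              # {'Ann': ['Bob', 'Cal'], 'Dee': ['Eve']}
```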

Place of employment used to be a fairly natural example of a functional
relation, but it is probably less of a natural example in our current *gig* economy,
where people often work multiple jobs.

In the next cell is another Python expression.  Before evaluating the cell, examine it carefully and try to predict what the result of evaluating it will be.  Then in the cell below that, explain the result that you got.

In [None]:
Mothers['George W. Bush']

In [64]:
Mothers

[('George Clooney', 'Rosemary Clooney'),
 ('Jimmy Carter', 'Lillian Carter'),
 ('George H. W. Bush', 'Dorothy Walker Bush'),
 ('George W. Bush', 'Barbara Bush'),
 ('Liza Minnelli', 'Judy Garland')]

In [None]:
ddm['George Clooney']

In [68]:
ddm

{'George Clooney': 'Rosemary Clooney',
 'Jimmy Carter': 'Lillian Carter',
 'George H. W. Bush': 'Dorothy Walker Bush',
 'George W. Bush': 'Barbara Bush',
 'Liza Minnelli': 'Judy Garland'}

Same question about the next expression:

In [None]:
ddm[1]

So now you know how to transform a list of pairs into a dictionary, so that the first member of each pair becomes a key for the dictionary and the second member becomes its value.

There are two other ways to define dictionaries which are both worth knowing.  We start with the easiest to type.

In [75]:
dd = dict(name = 'Guido van Rossum',
          job_title = 'Benevolent Dictator for Life',
          native_language = "Dutch",
          favorite_non_native_language = 'English',
          favorite_computer_language = "Perl")

This is called keyword notation.  It uses the same syntax that can be used to call functions in
Python.  Each argument of a function has a name; to provide a value for an argument when
calling the function, you just type `name=<value>`.  

In fact in the cell above, we are just calling the `dict` function; the builtin
Python type `dict` can be called as a function; as a function, it takes any number of arguments
with any allowable keywords and uses them to create a dictionary.

### A digression on keyword notation

In [71]:
def minus(fred, barney):
    return fred - barney

minus(2,3)

-1

In [72]:
minus(barney=2, fred=3)

1

In [73]:
minus(alice=2,betty=3)

TypeError: minus() got an unexpected keyword argument 'alice'

### Back to our dictionary

In [4]:
dd

{'name': 'Guido van Rossum',
 'job_title': 'Benevolent Dictator for Life',
 'native_language': 'Dutch',
 'favorite_non_native_language': 'English',
 'favorite_computer_language': 'Perl'}

In [None]:
dd['favorite_computer_language']

'Perl'

In the next cell down, write a single expression to retrieve the value `"Benevolent Dictator for Life"` from `dd`.

In [None]:
dd['job_title']

'Benevolent Dictator for Life'

What happens when we try to look up a key that isn't in the dictionary?

In [None]:
dd['favorite_pizza_topping']

KeyError: 'favorite_pizza_topping'

To forestall `KeyError`s, you can discover which keys are **in** the dictionary by using
the `in` test (Dictionaries are containers).

In [None]:
print('favorite_computer_language' in dd)
print('favorite_pizza_topping' in dd)

True
False
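A sketch of guarding a lookup with `in`, using a shortened stand-in for `dd` called `info` (so `dd` itself is not changed). It also shows the dictionary method `get`, which performs the same check in one step and returns a default value instead of raising `KeyError`.

```python
info = dict(name='Guido van Rossum',
            favorite_computer_language='Perl')

# Check membership before indexing to avoid a KeyError
key = 'favorite_pizza_topping'
if key in info:
    print(info[key])
else:
    print(key, "is not a key of info")

# get(key, default) returns default when key is missing, instead of raising KeyError
print(info.get('favorite_pizza_topping', 'unknown'))   # unknown
print(info.get('favorite_computer_language'))          # Perl
```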


In the next cell down write an expression that *adds* information to `dd`.  Specifically add the *key* `favorite_tv_show` and set its value to be `"Monty Python's Flying Circus"`.  Note that strings can have spaces in them and still be valid strings.

You should not be satisfied that you have found the correct answer until you have done the following:

1.  Evaluated your answer in a code cell to make sure it is syntactically correct (does not raise a `SyntaxError`).
2.  Evaluated `dd` **after** the update to check by inspection that the new information you wanted
    has been added to `dd`.
3.  Tried to retrieve the information you added in the usual way, by asking the dictionary for the
    value associated with the new key:
     
    ```
    dd['favorite_tv_show']
    "Monty Python's Flying Circus"
    ```

Note: There are several possible answers to this question.  One is quite simple, and uses the syntax that is parallel to what we did for mutable sequences (for example, lists).

For list L we did something like

```
L[1] = 'abracadabra'

```

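which changed the second element of `L` to be the string `'abracadabra'`. For a dictionary `dd` we can do:

```
dd[key] = value
```

This makes `value` the value associated with `key` in `dd`: if `key` is new, the pair is added; if `key` is already there, its old value is replaced.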
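A quick sketch of both cases, on a small stand-in dictionary named `info` (so `dd` is untouched): assigning to a new key adds a pair, while assigning to an existing key replaces its value without changing the dictionary's length.

```python
info = {'name': 'Guido van Rossum', 'favorite_computer_language': 'Perl'}

# New key: a key/value pair is added
info['favorite_tv_show'] = "Monty Python's Flying Circus"
print(len(info))                             # 3

# Existing key: the old value is replaced
info['favorite_computer_language'] = 'Python'
print(len(info))                             # still 3
print(info['favorite_computer_language'])    # Python
```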

In [None]:
dd["favorite_tv_show"] = "Monty Python's Flying Circus"

A second idea is to use the dictionary's `update` method, which allows a very simple syntax:

```
dd.update(key = value)
```

In [9]:

dd.update(favorite_tv_show =  "Monty Python's Flying Circus")
dd

{'name': 'Guido van Rossum',
 'job_title': 'Benevolent Dictator for Life',
 'native_language': 'Dutch',
 'favorite_non_native_language': 'English',
 'favorite_computer_language': 'Perl',
 'favorite_tv_show': "Monty Python's Flying Circus"}

Under the hood, the keyword argument passed to
`update` is turned into a dictionary. Alternatively,
you can update directly with a dictionary:

In [81]:
new_data = {'favorite_tv_show':  "Monty Python's Flying Circus"}
new_data

{'favorite_tv_show': "Monty Python's Flying Circus"}

In [77]:
dd["favorite_tv_show"]

KeyError: 'favorite_tv_show'

In [79]:
dd.update(new_data)

In [8]:
dd

{'name': 'Guido van Rossum',
 'job_title': 'Benevolent Dictator for Life',
 'native_language': 'Dutch',
 'favorite_non_native_language': 'English',
 'favorite_computer_language': 'Perl',
 'favorite_tv_show': "Monty Python's Flying Circus"}

In [80]:
dd["favorite_tv_show"]

"Monty Python's Flying Circus"

In the next cell correct an error in our data.  Guido van Rossum's favorite computer
language is not Perl.  It is Python.  

**Write an assignment that corrects the error.**

Using `update`:

In [None]:
dd.update(favorite_computer_language =  "Python")

Inspecting the modified dictionary:

In [None]:
dd

{'name': 'Guido van Rossum',
 'job_title': 'Benevolent Dictator for Life',
 'native_language': 'Dutch',
 'favorite_non_native_language': 'English',
 'favorite_computer_language': 'Python',
 'favorite_tv_show': "Monty Python's Flying Circus"}

Checking that retrieval works as expected (the old information is gone!)

In [None]:
dd['favorite_computer_language']

'Python'

If you've done everything according to specs, here is what `dd` should look like
when you evaluate it in a code cell:

```
{'name': 'Guido van Rossum',
 'job_title': 'Benevolent Dictator for Life',
 'native_language': 'Dutch',
 'favorite_non_native_language': 'English',
 'favorite_computer_language': 'Python',
 'favorite_tv_show': "Monty Python's Flying Circus"}
```

In [None]:
dd

{'name': 'Guido van Rossum',
 'job_title': 'Benevolent Dictator for Life',
 'native_language': 'Dutch',
 'favorite_non_native_language': 'English',
 'favorite_computer_language': 'Python',
 'favorite_tv_show': "Monty Python's Flying Circus"}

In the next cell we define a new dictionary using **curly-braces** syntax.  

In [None]:
mystery_sentence_letter_counts = {' ': 9, 'e': 5, 'o': 5, 'l': 3, 'd': 2,
                                  'h': 2, 'r': 2, 'u': 2, 'w': 2, 'y': 2, '.': 1,
                                  'a': 1, 'c': 1, 'b': 1, 'g': 1, 'f': 1,
                                  'i': 1, 'k': 1, 'j': 1, 'm': 1, 'n': 1, 'q': 1,
                                  'p': 1, 't': 2, 'v': 1, 'x': 1, 'z': 1}


Note the differences with what we did above with `dict`:

```
dd = dict(name = 'Guido van Rossum',
          job_title = 'Benevolent Dictator for Life',
          native_language = "Dutch",
          favorite_non_native_language = 'English',
          favorite_computer_language = "Perl")
```

Above we are using the type name `dict` as a function to create a dictionary instance
(it is a Python convention that type/class names also serve as factory,
or creator, functions).

The keyword arguments to the function are being interpreted as key/value pairs,
so no quotes are necessary around the keywords; keys and values are separated by
equal-signs; all business as usual for function keywords.

The curly brace syntax for creating a dictionary, on the other hand,
is completely analogous to the square bracket syntax for creating lists.
No function is called.  It is just a syntactic convention for entering
a dictionary: Within curly braces, enter the key-value pairs with key separated from
value by `:` and the pairs separated from each other by `,`.
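When every key is a string that is also a legal Python name, the two notations (and the list-of-pairs route we used for `Mothers`) build equal dictionaries. A sketch with a shortened version of `dd`:

```python
by_keywords = dict(name='Guido van Rossum', native_language='Dutch')
by_braces = {'name': 'Guido van Rossum', 'native_language': 'Dutch'}
by_pairs = dict([('name', 'Guido van Rossum'), ('native_language', 'Dutch')])

print(by_keywords == by_braces)   # True
print(by_pairs == by_braces)      # True
```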

**Using the same syntax as is used in defining `mystery_sentence_letter_counts` (curly braces and colons),
write another definition of `dd` in the next cell.**  Your answer should define exactly the same dictionary,
but using a different syntax. Here's how your definition might start:

```
dd =  { ....
```

Here is the answer:

In [None]:
dd = {"name": 'Guido van Rossum',
      "job_title": 'Benevolent Dictator for Life',
      "native_language": "Dutch",
      "favorite_non_native_language": 'English',
      "favorite_computer_language": "Perl",
      'favorite_tv_show': "Monty Python's Flying Circus",
     }

We noted above that no quotes are needed around the keys in the keyword notation; in this "curly braces" notation, by contrast, string keys must be quoted. In the keyword notation you are **using** the keys as Python names, which are syntactic units of the language, like expressions and statements. The code that executes the `dict` function uses these names to make strings, and later, when retrieving values for keys, you must use those strings:

```
dd["job_title"]
```

not

```
dd[job_title]
```

This means the curly-braces notation is more powerful than the keyword notation, since there are many
strings that do not correspond to legal names. For example, a name cannot start with a digit, so trying
to use `1var` as a name is an error:

In [None]:
1var = 33

SyntaxError: invalid decimal literal (3666288981.py, line 1)

And this is the same error:

In [None]:
dict(1var = 33)

SyntaxError: invalid decimal literal (170411066.py, line 1)

But `"1var"` is a perfectly fine string, and any string can be a dictionary key, so this works:

In [None]:
dd22 = {"1var":33}
dd22

{'1var': 33}

Similarly, you can't use keyword notation with Python's reserved words, such as the operator `in`:

In [None]:
dd23 = dict(in=33)

SyntaxError: invalid syntax (344706876.py, line 1)

In [82]:
dd23 = {"in":33}

Let's return to `mystery_sentence_letter_counts`:

```
mystery_sentence_letter_counts = {' ': 9, 'e': 5, 'o': 5, 'l': 3, 'd': 2,
                                  'h': 2, 'r': 2, 'u': 2, 'w': 2, 'y': 2, '.': 1,
                                  'a': 1, 'c': 1, 'b': 1, 'g': 1, 'f': 1,
                                  'i': 1, 'k': 1, 'j': 1, 'm': 1, 'n': 1, 'q': 1,
                                  'p': 1, 't': 2, 'v': 1, 'x': 1, 'z': 1}
```
The dictionary `mystery_sentence_letter_counts` stores the number of times each of 27 characters (25 letters, the space, and the period) occurs in a particular sentence of English. We will show you this sentence shortly. In the meantime, you are welcome to guess what the mystery sentence is; it uses every letter of the alphabet except one.

In the cell below, write an expression that retrieves the value `9` from `mystery_sentence_letter_counts`.

###  How to fill a dictionary

How might we get a dictionary like `mystery_sentence_letter_counts` defined? Yes, typing it in is one option. But there are ways that might arise more naturally in the context of analyzing some data. Suppose we had a sentence and we simply wanted to count the number of times each character occurs in it (acquiring accurate English letter frequencies is sometimes important, for example if you're a cryptographer).

We'd probably write a piece of code like this (using a `for` loop, which we have used above
but haven't yet formally introduced).

In [78]:

# The mystery sentence
Sentence = 'the quick brown fox jumped over the lazy yellow dog.'
# Cook up an empty dictionary
letter_freqs = dict()
# Write a loop to count the letters in `Sentence`
for letter in Sentence:
    if letter in letter_freqs:
        letter_freqs[letter] += 1
    else:
        letter_freqs[letter] = 1

The dictionary `letter_freqs` starts out empty (`dict()`) and is filled with information by the `for` loop. The variable `Sentence` is defined to be a string, and the `for` loop steps through each character of that string: if the character is already a key, its count goes up by 1; otherwise it is added to the dictionary with a count of 1. Looking
at `letter_freqs`, you see that its keys and counts are the same as those in the dictionary `mystery_sentence_letter_counts`, defined in one of the cells above.

In [79]:
letter_freqs

{'t': 2,
 'h': 2,
 'e': 5,
 ' ': 9,
 'q': 1,
 'u': 2,
 'i': 1,
 'c': 1,
 'k': 1,
 'b': 1,
 'r': 2,
 'o': 5,
 'w': 2,
 'n': 1,
 'f': 1,
 'x': 1,
 'j': 1,
 'm': 1,
 'p': 1,
 'd': 2,
 'v': 1,
 'l': 3,
 'a': 1,
 'z': 1,
 'y': 2,
 'g': 1,
 '.': 1}
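The `if`/`else` inside the loop can be folded into a single line with the dictionary method `get`, which returns a default value (here `0`) when the key is not there yet. The sketch below re-creates `Sentence` and the expected counts so that it runs on its own, and then checks that the result matches `mystery_sentence_letter_counts`. Dictionary equality ignores the order of the keys.

```python
# Re-create the sentence and the expected counts so this cell is self-contained
Sentence = 'the quick brown fox jumped over the lazy yellow dog.'
mystery_sentence_letter_counts = {' ': 9, 'e': 5, 'o': 5, 'l': 3, 'd': 2,
                                  'h': 2, 'r': 2, 'u': 2, 'w': 2, 'y': 2, '.': 1,
                                  'a': 1, 'c': 1, 'b': 1, 'g': 1, 'f': 1,
                                  'i': 1, 'k': 1, 'j': 1, 'm': 1, 'n': 1, 'q': 1,
                                  'p': 1, 't': 2, 'v': 1, 'x': 1, 'z': 1}

letter_freqs = dict()
for letter in Sentence:
    # get(letter, 0) returns the current count, or 0 if letter is not yet a key
    letter_freqs[letter] = letter_freqs.get(letter, 0) + 1

# Same keys with the same counts; key order does not matter for ==
print(letter_freqs == mystery_sentence_letter_counts)   # True
```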

Finally, here's a much simpler way to do what we just did with the `for` loop.  `Python` has many specialized kinds of dictionaries you can `import`, adding them to what you get in builtin `Python`.   One of these is called a `Counter`, and it lives in the `collections` module.  

You use a `Counter` to count the kinds of things contained in a container.  Here's how to count the number of times each character occurs in a string.

In [80]:
import collections
mslc = collections.Counter('i have checked out on many queer lax days with flying bats jumping through hoops.')

In [81]:
mslc

Counter({'i': 4,
         ' ': 14,
         'h': 6,
         'a': 5,
         'v': 1,
         'e': 5,
         'c': 2,
         'k': 1,
         'd': 2,
         'o': 5,
         'u': 4,
         't': 4,
         'n': 4,
         'm': 2,
         'y': 3,
         'q': 1,
         'r': 2,
         'l': 2,
         'x': 1,
         's': 3,
         'w': 1,
         'f': 1,
         'g': 3,
         'b': 1,
         'j': 1,
         'p': 2,
         '.': 1})

In the cell below, write an expression that retrieves from `mslc` the number of times the letter `e` occurred in the sentence it counts.
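One possible answer is simply `mslc['e']`. The sketch below also shows one handy way a `Counter` differs from an ordinary dictionary: looking up a key that was never counted returns `0` instead of raising a `KeyError`. The letter `z`, for instance, never occurs in this sentence.

```python
import collections

mslc = collections.Counter('i have checked out on many queer lax days with flying bats jumping through hoops.')

print(mslc['e'])   # 5
print(mslc['z'])   # 0: 'z' never occurs, but a Counter does not raise KeyError

plain = dict(mslc)  # an ordinary dict with the same counts
# plain['z'] would raise KeyError, because plain has no key 'z'
```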

In the cell below, use the string `Sentence` and a `Counter` to define `letter_freqs2`, a dictionary with the same counts as the dictionary `letter_freqs` above.  The minimal answer adds just one line of code to the cell below.

Check your answer.

In [None]:
import collections
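One possible answer, assuming `Sentence` still holds the string defined earlier, is to pass the string straight to `collections.Counter`. The comparison at the end checks the answer against the loop's result; a `Counter` compares equal to a plain dictionary with the same keys and counts.

```python
import collections

Sentence = 'the quick brown fox jumped over the lazy yellow dog.'

# The one-line answer: a Counter built from a string counts its characters
letter_freqs2 = collections.Counter(Sentence)

# Rebuild letter_freqs with the loop from above, for comparison
letter_freqs = dict()
for letter in Sentence:
    if letter in letter_freqs:
        letter_freqs[letter] += 1
    else:
        letter_freqs[letter] = 1

print(letter_freqs2 == letter_freqs)   # True
```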


# Filling a Dictionary with information

Now let's look at a more practical example of defining a dictionary.

In [83]:
from IPython.display import Image
Image(url = "https://gawron.sdsu.edu/python_for_ss/course_core/book_draft/_static/example_toy_graph.png",
      width= 600)

Using the example of the Enron email graph discussed in the online text as your model, represent the above graph as a dictionary. (By the way, the graph represents victories in a sports tournament.  A link from team A to team B means A defeated B in the playoffs.)  There are 8 teams in the graph, so your dictionary should have exactly 8 keys. If `dd` is your dictionary, then `dd['Houston']` should return the teams Houston defeated in the tournament.

There's more than one correct answer.  You may use containers not yet discussed in the notebook.

In [85]:
dd = {'Golden State': {'Houston', 'New Orleans', 'Memphis'},
      'Houston': {'Dallas', 'Los Angeles'},
      'Los Angeles': {'San Antonio'},
      'Memphis': {'Portland'},
      # set() is an empty set; {} would be an empty dict
      'Dallas': set(),
      'San Antonio': set(),
      'Portland': set(),
      'New Orleans': set(),
      }
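With the graph stored this way, questions about the tournament become ordinary container operations. One caution: `{}` on its own creates an empty *dictionary*, not an empty set, so in the sketch below the teams with no victories get `set()` as their value, which keeps every value the same type.

```python
dd = {'Golden State': {'Houston', 'New Orleans', 'Memphis'},
      'Houston': {'Dallas', 'Los Angeles'},
      'Los Angeles': {'San Antonio'},
      'Memphis': {'Portland'},
      'Dallas': set(),
      'San Antonio': set(),
      'Portland': set(),
      'New Orleans': set()}

print(len(dd))                     # 8: one key per team
print('Dallas' in dd['Houston'])   # True: Houston defeated Dallas
print(len(dd['Golden State']))     # 3 victories

print(type({}))      # <class 'dict'>
print(type(set()))   # <class 'set'>
```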

## Sets

In [None]:
S = {'a','b','c', ('d','e')}
T = {'d','a','f'}

In [None]:
T

In [None]:
len(S)

In [None]:
len(Mothers)

In [None]:
len(dd)

In [None]:
Mothers

In [None]:
('George Clooney', 'Rosemary Clooney') in Mothers

In [None]:
'George Clooney' in Mothers

In [None]:
'a' in S

In [None]:
'd' in T

In [None]:
('d','e') in T

In dictionaries it's the **keys** that count as the elements **in** the dictionary.

They uniquely define an item of information in the dictionary, a key-value association.

In [None]:
print(dd)
'name' in dd
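Because `in` looks only at the keys of a dictionary, a value is not "in" the dictionary even when it is stored there. The small dictionary below is made up for illustration; to ask about values, search `.values()` explicitly.

```python
ages = {'fred': 22, 'alice': 28}   # a small made-up dictionary

print('fred' in ages)          # True: 'fred' is a key
print(22 in ages)              # False: 22 is a value, not a key
print(22 in ages.values())     # True: search the values explicitly
```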

In [83]:
S = {'a','b','c'}
T = {'b','a','c'}
U = {'a','b','c','b','b'}
STup = ('a','b','c')
TTup =  ('b','a','c')
UTup = ('a','b','c','b','b')

In [84]:
S == T

True

In [85]:
S == U

True

In [None]:
S

In [None]:
U

In [86]:
STup == TTup

False

In [87]:
STup == UTup

False

In [88]:
len(UTup)

5

#### Retrieve the element 'a' from `S`?

In [None]:
print(S)
print('a' in S)

There isn't any way to **retrieve** `'a'` from `S` because there isn't any additional piece of
information it's hooked onto (a position, or a key).  

It's just in `S` (or it's not).

What we can do is **remove** the element `'a'` from S.

In [93]:
dir(S)[-17:]

['add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']

In [88]:
S = set("ABC")
U = set('bcd')
S, U

({'A', 'B', 'C'}, {'b', 'c', 'd'})

In [90]:
S.update(U)

In [91]:
S

{'A', 'B', 'C', 'b', 'c', 'd'}

In [None]:
# S was redefined above and no longer contains 'a', so use a fresh set
R = {'a', 'b', 'c'}
print(R)
R.remove('a')
print(R)

It's a `KeyError` to try to remove something that isn't a member.

In [None]:
S.remove('w')
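If you want to remove an element only when it is present, and do nothing otherwise, sets also have a `discard` method (it appears in the `dir(S)` listing above). This sketch contrasts it with `remove`, catching the `KeyError` so the cell runs to the end.

```python
R = {'a', 'b', 'c'}

R.discard('w')        # 'w' is not a member: discard silently does nothing
R.discard('a')        # 'a' is a member: it is removed
print(R)              # {'b', 'c'} (display order may vary)

try:
    R.remove('w')     # remove insists that the element be present
except KeyError as err:
    print('KeyError:', err)
```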

#### Sets as iterables

Sets are containers, and like all the built-in containers they are **iterable**: a `for` loop can step through their elements.


Here's what that means:

In [92]:
for x in S:
    print(x)

d
A
b
c
B
C


In [93]:
W = {4, -1, 8}
x,y,z = W

In [94]:
x,y,z

(8, 4, -1)
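Strictly speaking, a set is an *iterable* (something a `for` loop or an unpacking assignment can step through) rather than an *iterator*: the built-in `iter` produces an iterator from it, and `next` works only on that iterator. Because the order in which a set yields its elements is not something to rely on, `sorted` is the usual way to get a predictable order.

```python
W = {4, -1, 8}

it = iter(W)          # an iterator over the set's elements
first = next(it)      # next works on the iterator ...
print(first in W)     # True
try:
    next(W)           # ... but not on the set itself
except TypeError as err:
    print('TypeError:', err)

print(sorted(W))      # [-1, 4, 8]: sorted returns a list in a predictable order
```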

#### Sets: a summary

1.  Sets are **not** sequences.  Order doesn't matter.
2.  Sets are not indexable.  They support neither positional indexing (like sequences) nor key indexing (like dictionaries).  The only question they answer about an object `x` is whether `x` is **in** or **out**; that is, they serve to identify collections of objects that have some property.
3.  There are no duplicates.
4.  Sets may contain other containers (hierarchical structure), although there are limitations we'll
    discuss elsewhere.
5.  Sets are like dictionaries in several ways. Neither is a sequence.   Both raise a `KeyError`
    when you try to remove a non-member (`S.remove(x)`, `del d[x]`), and a dictionary also raises one when you look up a missing key (`d[x]`).
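Several of these properties can be checked directly: duplicates collapse, `==` ignores order, and there is no indexing at all.

```python
S = {'a', 'b', 'c'}

print({'a', 'b', 'b'} == {'b', 'a'})   # True: duplicates collapse, order is ignored
print(len({'a', 'b', 'b'}))            # 2

try:
    S[0]                               # sets are not subscriptable
except TypeError as err:
    print('TypeError:', err)
```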

# Data structure key: the elements of a sequence

There are several things we want to be able to do with a sequence.  
   1.  We want to be able to store and retrieve elements in a particular position.
   2.  We want to be able to efficiently iterate through all the elements of the sequence in order and 
       perhaps do something with them.
       
The `for` loop we used in the previous section lets us efficiently step through all the elements of a sequence
and do some uniform thing to all of them.  For example, the next `for` loop just prints all the elements of `Sentence2`.
Evaluate it and make sure you understand why it works the way it does.

In [None]:
Sentence2 = 'The quick brown fox jumped over the lazy yellow dog.'
for letter in Sentence2:
    print(letter)

Now suppose we wanted to print all the **words** in `Sentence2`, each on a line all its own.



Well, we can't do that with a simple `for` loop over a string like `Sentence2`, because looping over a string always gives us one character at a time.  To get the words we would need slices, and since the words in `Sentence2` are all different lengths, they turn out to be different-sized slices. The next cell prints out the first
three words using slices. Evaluate it and make sure you understand why the slices work the way they do.

In [None]:
# First we need a slice 3 characters long starting at the beginning of the string.
Sentence2 = 'The quick brown fox jumped over the lazy yellow dog.'
print(Sentence2[0:3])    # 'The' is at positions 0-2
# Skip the space at position 3
print(Sentence2[4:9])    # 'quick' is at positions 4-8
# Skip the space at position 9
print(Sentence2[10:15])  # 'brown' is at positions 10-14

## An easier way

Fortunately there's a much easier way to print out the words.  We turn the string into a list, each of whose elements is a word in the string.  Then we use a simple `for` loop.

In [None]:
Sentence2 = 'The quick brown fox jumped over the lazy yellow dog.'
word_list = Sentence2.split()
for word in word_list:
    print(word)

The next cell shows what `split` does.  It creates a list of strings by breaking the original string up at the spaces.

In [None]:
Sentence2 = 'The quick brown fox jumped over the lazy yellow dog.'
Sentence2.split()

Notice the spaces themselves are not included. When we split "on" some element of a string, that element is regarded as a *separator* and the things of interest are the things between separators.  `split` is actually a much more general method that can split a string on any character (or any longer substring) and return the substrings separated by it.  So, for example, if we split `Sentence2` on `u`, we get three pieces back, because there are two `u`'s.

In [None]:
Sentence2 = 'The quick brown fox jumped over the lazy yellow dog.'
Sentence2.split('u')

Notice that this time the elements of the list returned by `split` did include spaces.


Try to guess what we'll get in the next example before looking at the result in the output cell. We will try to split `Sentence2` on `y`.  How many pieces will there be and what will the second piece be?

In [None]:
Sentence2 = 'The quick brown fox jumped over the lazy yellow dog.'
Sentence2.split('y')

More usefully, data is often stored in files with some special character used as a 'separator'.  For example, if you save an Excel file in `.csv` format, the comma character (",") is used as a separator, and a table of people, their ages, and their phone numbers might look like this:

```
 fred,22,7891234
 alice,28,9070077
 joe,20,8742399
```

The next cell shows what you get when you split one of these lines on a comma:

In [None]:
'joe,20,8742399'.split(",")

Notice the numbers are not Python numbers; they are strings of digits, because splitting a string always returns a list of strings.
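Since `split` always returns strings, numeric fields have to be converted explicitly, for example with `int`. Here is a minimal sketch using the three sample lines above, assuming every line has exactly three comma-separated fields. The phone number is left as a string, since nobody does arithmetic on phone numbers.

```python
lines = ['fred,22,7891234',
         'alice,28,9070077',
         'joe,20,8742399']

people = []
for line in lines:
    name, age, phone = line.split(',')       # three strings
    people.append((name, int(age), phone))   # convert only the age to an int

print(people[2])                                  # ('joe', 20, '8742399')
total_age = sum(person[1] for person in people)
print(total_age)                                  # 70
```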

By default (that is, when no separator is given), `split` doesn't split on a single character but on whitespace: spaces (' '), tabs ('\t'), and line breaks ('\n') all count, and a run of several whitespace characters counts as a single separator.  That default was chosen because turning a string into a list of words is so often useful.  This is what we did with `Sentence2` above.

Try to guess what the result of the following split will be:

In [None]:
S = """Roses are red;
       violets are blue.
       Sugar is sweet.
       And so are you.
       """
S.split()

So `split` treats line breaks just as another kind of white space.
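Splitting with no argument is not the same as splitting on a single space. With no argument, runs of whitespace count as one separator and leading or trailing whitespace is ignored; with `' '`, every single space is a separator, so two spaces in a row produce an empty string, and a line break is not a separator at all.

```python
text = 'a  b\nc'          # two spaces, then a line break

print(text.split())      # ['a', 'b', 'c']
print(text.split(' '))   # ['a', '', 'b\nc']
```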

####  Methods for strings

The moral of this section is that different kinds of sequences can have different kinds of elements.  Strings are best thought of as having characters (strings of length 1) as their elements, and a `for` loop over a string always steps through it
one character at a time.  Lists are more flexible.  An element of a list can itself be a sequence, and so we can arrange for a list to have chunks of any size as its elements. We just pick whatever chunks are useful; in strings of text, word chunks are useful, and the default version of `split` gives us (approximately) a list of word chunks.

## Counting words

Let's review how we did counting above.

In [None]:
from collections import Counter
Sentence = 'the quick brown fox jumped over the lazy yellow dog.'
word_freqs = Counter(Sentence)
word_freqs

Bearing in mind that we can initialize a `Counter` with any kind of sequence, **revise the code above so that what is counted is words, not characters.**

In [None]:
[your answer here]
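One possible answer is to split the sentence into words first, then hand the resulting list to `Counter`, so that the elements being counted are words. Note that `split` leaves punctuation attached, so the last word is counted as `'dog.'`, not `'dog'`.

```python
from collections import Counter

Sentence = 'the quick brown fox jumped over the lazy yellow dog.'
word_freqs = Counter(Sentence.split())   # count list elements (words), not characters

print(word_freqs['the'])    # 2
print(word_freqs['dog.'])   # 1: the period stays attached to the word
print(len(word_freqs))      # 9 distinct words
```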

# Notebook Takeaways

1. Different types => different constructions for retrieving information
2. All container types => support `in` and `len`
3. In any container, what the elements are matters for retrieving information, for counting, for looping through the elements
4. All containers except strings are **compound**. Containers may have containers of other types as elements.
5. Python syntax allows for hierarchical structure (`X[y][z]`)

The cells below are not discussed in the video lecture, but they are useful.

# Embedded Structure

In [95]:
Higgledypiggledy = [1, ['a','b','c'], {1.4: 'Sam'}]

We saw this example above.  Here we emphasize the fact that some container types can contain containers, either of the same type or of different types. `Higgledypiggledy` is a list that contains a list and a dictionary.

#### Exercise


Write a single Python expression that returns the value `'S'` from `Higgledypiggledy`, being
sure to respect the container structure.  Also remember, case matters.

An incorrect attempt:

In [96]:
Higgledypiggledy[2][1][1]

KeyError: 1

In [101]:
Higgledypiggledy[2][1.4][0]

'S'

In [None]:
dd = dict(name = 'Mark Gawron',
          languages = ['Python','Prolog', 'Lisp', 'Java'],
          favorite_tv_shows = ["The Tunnel","Monty Python's Flying Circus", "Justified"],
          favorite_deserts = ['Apple pie','Chocolate pudding','flan'])

The intended result of the following code is

```
'Apple'
```

Instead of returning this result, it raises an `Exception`.  Diagnose the problem and edit the code
to produce the intended result.

In [None]:
dd['favorite_deserts'].split()[0]

AttributeError: 'list' object has no attribute 'split'

The problem was that the value of `dd['favorite_deserts']` was a list and `.split(...)` is
a method on strings.  To get to the string that produces the desired split, we need
to retrieve the first element of the list and then split.

In [None]:
dd['favorite_deserts'][0].split()[0]

'Apple'

### Methods suitable for types

Each type has a set of **methods** that do things that are in some way natural 
for that type. For example:

In [112]:
wnp = "War and Peace".casefold()
wnp

'war and peace'

In [124]:
capped = wnp.capitalize()
capped

'War and peace'

In [125]:
titled = wnp.title()
titled

'War And Peace'

In [131]:
".333".isdecimal()

False

In [117]:
"the horror, the horror".partition(",")

('the horror', ',', ' the horror')

In [118]:
"the horror, the horror, the horror".partition(",")

('the horror', ',', ' the horror, the horror')

In [119]:
"the horror, the horror, the horror".split(",")

['the horror', ' the horror', ' the horror']

In [123]:
"Quaff".startswith("Qu")

True

Similar examples:  `update` for dictionaries, `union` and `intersection` for sets.
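A quick sketch of those methods, using made-up data. `update` changes the dictionary in place, while `union` and `intersection` return new sets (wrapped in `sorted` here only to get a predictable display order).

```python
# Dictionaries: update adds new key-value pairs and overwrites existing keys
ages = {'fred': 22, 'alice': 28}
ages.update({'alice': 29, 'joe': 20})
print(ages)                       # {'fred': 22, 'alice': 29, 'joe': 20}

# Sets: union and intersection return new sets
A = {'a', 'b', 'c'}
B = {'b', 'c', 'd'}
print(sorted(A.union(B)))         # ['a', 'b', 'c', 'd']
print(sorted(A.intersection(B)))  # ['b', 'c']
```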

### Further exercises

In the next exercise we try to retrieve some data.

In [103]:
dd = dict(name = 'Mark Gawron',
          languages = ['Python','Prolog', 'Lisp', 'Java'],
          favorite_tv_shows = ["The Tunnel","Monty Python's Flying Circus", "Justified"],
          favorite_deserts = ['Apple pie','Chocolate pudding','flan'])

In the next cell, write a single expression that retrieves the string `yin` from `dd`.

In [None]:
dd['favorite_tv_shows'][1].split()[2][2:5]

'yin'
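The one-line answer is easiest to read one step at a time. Each line below applies one more retrieval to the result of the previous line, naming the intermediate results (the names `shows`, `show`, `words`, and `word` are just for illustration).

```python
dd = dict(name = 'Mark Gawron',
          languages = ['Python','Prolog', 'Lisp', 'Java'],
          favorite_tv_shows = ["The Tunnel","Monty Python's Flying Circus", "Justified"],
          favorite_deserts = ['Apple pie','Chocolate pudding','flan'])

shows = dd['favorite_tv_shows']   # a list of three strings
show = shows[1]                   # "Monty Python's Flying Circus"
words = show.split()              # ['Monty', "Python's", 'Flying', 'Circus']
word = words[2]                   # 'Flying'
print(word[2:5])                  # yin (the characters at positions 2, 3, and 4)
```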

#### Exercise

In [104]:
dd

{'name': 'Mark Gawron',
 'languages': ['Python', 'Prolog', 'Lisp', 'Java'],
 'favorite_tv_shows': ['The Tunnel',
  "Monty Python's Flying Circus",
  'Justified'],
 'favorite_deserts': ['Apple pie', 'Chocolate pudding', 'flan']}

In the next cell we try something a little trickier. Write a single Python expression that returns the value `"favorite_tv_shows"` from `dd`.  Hint: you will need to think about some dictionary methods.
That will give you a container with `"favorite_tv_shows"` as an element.
But that container won't be an indexable sequence.  So you will need to convert the container to
an indexable sequence, and then index it.

Study the following answer and make sure you understand it.

In [None]:
list(dd.keys())[2]

'favorite_tv_shows'

First we get `dd`'s keys:

In [None]:
dd.keys()

dict_keys(['name', 'languages', 'favorite_tv_shows', 'favorite_deserts'])

This is a special keys object (a *view* of the dictionary's keys). It is a container:

In [None]:
'favorite_tv_shows'  in dd.keys()

True

In [None]:
len(dd.keys())

4

But it cannot be indexed (**subscripted**):

In [None]:
dd.keys()[2]

TypeError: 'dict_keys' object is not subscriptable

So we convert it to something that can be indexed:

In [None]:
list(dd.keys())

['name', 'languages', 'favorite_tv_shows', 'favorite_deserts']

Then index it:

In [None]:
list(dd.keys())[2]

'favorite_tv_shows'

Notice that the index used here corresponds to the order we used when we entered
the key-value pairs in defining the dictionary (several cells back).  Dictionaries in Python
are now **ordered** (guaranteed since Python 3.7); keys and key-value pairs are returned in the order in which they were entered into
the dictionary.
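Because converting a dictionary to a list (like looping over it) steps through its keys, `list(dd)` gives the same list as `list(dd.keys())`. A small sketch with a made-up dictionary shows that the order is the insertion order, not alphabetical order.

```python
d = {}
d['zebra'] = 1     # entered first
d['apple'] = 2     # entered second
d['mango'] = 3     # entered third

print(list(d))                     # ['zebra', 'apple', 'mango']: insertion order
print(list(d) == list(d.keys()))   # True
print(list(d)[2])                  # mango
```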

The moral of this exercise.  

1.  Pay attention to the specific properties native to the **type** of the container you are retrieving
data from.
2.  If the container type you've got doesn't have the properties you need, it is often easy enough to convert to the type of container that does have those properties.

In this case the `dict_keys` container couldn't be subscripted, but it was easy to convert it to a list
that could be.

The next cell defines a tuple `XX` that contains the word "Ringo".  Write a few lines
of code that define a Python tuple identical to `XX` except that the word "Bob" replaces
the word "Ringo".  Hint A: A type change will make this easy.   Hint B:  This
will take three lines of code.

In [None]:
XX = ('The', 'winner', 'of', 'the', 'game', 'was', 'named', 'Ringo')

In [None]:
# Convert to a list to allow assignment
LL = list(XX)
LL[-1] = "Bob"
# Convert back to tuple
tuple(LL)

('The', 'winner', 'of', 'the', 'game', 'was', 'named', 'Bob')

Note that it is hard to do this in fewer than 3 lines, given what we know.

```
list(XX)[-1] = 'Bob'
```

is legal, but now we don't have a name for the list to use for converting it back to a tuple.

```
tuple(list(XX)[-1] = 'Bob')
```

is illegal because `list(XX)[-1] = 'Bob'` is a **statement** which has no return value
and therefore can't be the argument of a function:

In [None]:
tuple(list(XX)[-1] = 'Bob')

SyntaxError: expression cannot contain assignment, perhaps you meant "=="? (298957941.py, line 1)

In addition, we want to convert the entire list into a tuple,
and `tuple(list(XX)[-1] = 'Bob')` certainly doesn't make that clear.
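For comparison, there is a different technique that avoids the list entirely: build a new tuple from pieces with slicing and tuple concatenation, which fits in a single expression. Note the trailing comma in `('Bob',)`, which is what makes it a one-element tuple rather than just a parenthesized string.

```python
XX = ('The', 'winner', 'of', 'the', 'game', 'was', 'named', 'Ringo')

# Everything except the last word, followed by a one-element tuple
YY = XX[:-1] + ('Bob',)
print(YY)   # ('The', 'winner', 'of', 'the', 'game', 'was', 'named', 'Bob')
```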

The moral of these last exercises:

1.  Pay attention to the specific properties native to the **type** of the container you are doing things to.
2.  If the type you've got doesn't have the properties you need, it is often easy enough to convert to the type of container that does have those properties, and, if necessary, to convert back.

