# Assignment and substitution

The most important principle of Python programming for data science is the *principle of substitution*: 

> If x = {some mess}, and one writes an expression of {some mess}, 
> then one can substitute x for {some mess} in the expression 
> with exactly the same results. 

Let's apply this in several circumstances. There are three independent circumstances that come up regularly: 
* substitution of data. 
* substitution of functions. 
* substitution of anonymous functions. 

In principle, we could do everything without substitution. The purpose of all of these substitution principles is *readability* and *reusability*. 
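
As a small illustration of the principle, here is a sketch in which the same computation is written with and without a name. The numbers and the names `total_cents`, `inline_result`, and `named_result` are made up for this example.

```python
# Without substitution: the whole expression is written inline.
inline_result = (3 * 25 + 2 * 10) * 2

# With substitution: give the inner "mess" a name, then use the name.
total_cents = 3 * 25 + 2 * 10
named_result = total_cents * 2

# Both versions compute the same value.
print(inline_result, named_result)
print(inline_result == named_result)
```

The named version gives the same result, but `total_cents` can be read, printed, and reused elsewhere.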

# How to interact with this notebook
This notebook is not designed to stand alone. I will be using many Python functions. You should read up on anything you don't know about, from one of the following sources: 
* Google "python x", where x is the name of the function you want to look up 
* [the Python manual](https://docs.python.org/)
* [the official Python tutorial](https://docs.python.org/3/tutorial/)

You should at some point go through the whole tutorial. 

# Substitution of data
In this exercise, we'll concentrate on data substitution. We've learned already that lists can be extremely complex in structure and that what we do with them can be complicated. Consider the code: 

In [1]:
fruits = [('apples', 2), ('oranges', 3), ('peas', 100)]
for f in fruits:
    if f[0] == 'apples':
        print("there are {} apples".format(f[1]))

there are 2 apples


Write this without use of the variable `fruits` below. It should do exactly the same thing. Use the principle of substitution. Execute it once done. 

In [None]:
# write your answer here

However, this is an **unbelievably inefficient** way to search a list: the loop examines every element, even after it has found a match. The better way is to convert the list to a dictionary and look up the key directly, to wit: 

In [None]:
fruits = [('apples', 2), ('oranges', 3), ('peas', 100)]
dictionary = dict(fruits)
print("there are {} apples".format(dictionary['apples']))
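
To see that the two approaches agree, here is a minimal sketch that runs both on the same list. The helper name `count_by_loop` and the variable `counts` are hypothetical, introduced only for this comparison.

```python
fruits = [('apples', 2), ('oranges', 3), ('peas', 100)]

def count_by_loop(pairs, name):
    """Scan the list of (name, count) pairs until the name is found."""
    for key, count in pairs:
        if key == name:
            return count
    return None  # name not present in the list

# Build the dictionary once; each later lookup goes directly to the key.
counts = dict(fruits)

print(count_by_loop(fruits, 'apples'))
print(counts['apples'])
print(count_by_loop(fruits, 'apples') == counts['apples'])
```

Building the dictionary still visits every pair once, so the gain shows up when the same data is searched repeatedly.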

# Some basic facts about substitution
1. It doesn't depend upon whether you understand the code or not. 
2. It can be used to understand code you don't understand, by giving things names and printing them! 

Consider the following fragment: 

In [None]:
fruits = [('apples', 2), ('oranges', 3), ('peas', 100)]
print("there are {} apples".format(dict(fruits)['apples']))

# Whoa there! What just happened? 
Substitution is a double-edged sword. It can be used to make code *less* readable. 

Let's make this readable by substitution. This is sometimes called "refinement". 

First, we take the expression out of the `format` call and get: 


In [None]:
fruits = [('apples', 2), ('oranges', 3), ('peas', 100)]
napples = dict(fruits)['apples']
print("there are {} apples".format(napples))

But then we might want to understand what these things actually are, so we could write: 

In [None]:
fruits = [('apples', 2), ('oranges', 3), ('peas', 100)]
data = dict(fruits)
napples = data['apples']
print("there are {} apples".format(napples))

And then we might ask how this actually works: 

In [None]:
fruits = [('apples', 2), ('oranges', 3), ('peas', 100)]
data = dict(fruits)
print(data)
napples = data['apples']
print(napples)
print("there are {} apples".format(napples))

Thus, we have unraveled a complex expression into its parts without really understanding it beforehand! 

Let's put that into practice as a learning tool for figuring out complex data flows. Here's a dataflow to explore. It's actually a common trick in Python: 

In [None]:
print(sorted(list(set(['Brian', 'Brian', "Sarah", "Joe", "Sarah", "Mark"])))[0])

We can understand this by "unwrapping the onion". Please follow my instructions.
1. Put the list on the inside in a variable `people`. 

In [None]:
# {write your answer here} 
people = ...
people

2. Explore what `set(people)` does by putting it into a variable `myset` and printing it. 

In [None]:
# {write your answer here} 
myset = ...
myset

3. Now put `sorted(myset)` into a variable `ordered` and print that. 

In [None]:
# {write your answer here} 
ordered = ...
ordered

Finally, put `ordered[0]` into a variable `chosen` and print that. 

In [None]:
# {write your answer here} 
chosen = ...
chosen

... and you have just *explained* what the complex expression means. 

Now answer some questions: 

4. What does `set(x)` do? 

___Your answer:___

5. What does `sorted(x)` do? 

___Your answer:___



# In this example, 
1. I quite *intentionally* (and with "malice aforethought") exposed you to Python you might not know. 
2. Instead of explaining each part, I left it to you to explore the Python and figure out how it works. 
3. While you can look up the meaning of each function in isolation, understanding the expression requires taking it apart and looking at each operation separately. 
4. This act is *complementary* to reading the documentation of each function. 
5. When I face undocumented Python, I do *both.* 

# What these functions actually do:
1. `set(x)` turns `x`, a list (or any other iterable), into a set, dropping duplicates. [tutorial on set()](https://docs.python.org/3/tutorial/datastructures.html#sets)
2. `dict(x)` turns a list of (key, value) tuples into a dictionary. [tutorial on dict()](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) 
3. `sorted(x)` returns a new, sorted list built from the elements of `x`. The elements need not be text; they only need to be comparable with one another. [HOWTO on sorting](https://docs.python.org/3/howto/sorting.html)

`set()`, `dict()`, `list()` are also called *constructors*. They create an instance of a data type from data you provide. 
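
A short sketch of these functions on made-up data (the variable names and values below are illustrative and are not taken from the exercise above):

```python
pairs = [('red', 1), ('green', 2), ('blue', 3)]
letters = ['b', 'a', 'c', 'a']

as_dict = dict(pairs)      # list of (key, value) pairs -> dictionary
as_set = set(letters)      # duplicates are dropped; order is not guaranteed
as_list = list(as_set)     # back to a list, in whatever order the set yields
in_order = sorted(as_set)  # sorted() returns a new list in sorted order

print(as_dict['green'])
print(len(as_set))
print(in_order)
print(sorted([3, 1, 2]))   # sorted() also works on numbers
```

Printing `as_list` may show the letters in a different order from run to run, which is why the original expression sorts before indexing.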

# The moral of this story
1. Anyone can read the manual and know what these functions do, but 
2. The skill I just demonstrated -- of taking apart complex constructions into their component parts -- is the best way to *learn* how to use them and apply them.
3. You will often be given Python scripts you don't understand. *The technique above is the best way to learn what they do!*
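
The same unwrapping technique applies to any nested expression. Here is a minimal sketch on a made-up one-liner; the word and the variable names `word`, `lowered`, `letters`, and `n_distinct` are illustrative only.

```python
# A made-up one-liner: how many distinct letters does the word contain?
print(len(set("Mississippi".lower())))

# Unwrap it from the inside out, naming and printing each layer.
word = "Mississippi"
lowered = word.lower()     # innermost call: lowercase the text
print(lowered)
letters = set(lowered)     # next layer: keep one copy of each letter
print(letters)
n_distinct = len(letters)  # outermost call: count what is left
print(n_distinct)
```

Each printed intermediate value shows what one layer of the expression contributes.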


# When you're done, submit the notebook

You can submit a notebook by saving it as a PDF. In the cluster environment, use File | Print (Save as PDF); on other versions, it may be File | Download As (PDF). Then submit the PDF to Gradescope.

To submit to Gradescope, log into the [website](https://www.gradescope.com/courses/182658), add course **9W7PW3** (if not already added) and submit. The assignment name should match the name of this notebook.