In [None]:
%reload_ext postcell
%postcell register

In [None]:
from IPython.display import HTML, display
import math

In [None]:
def show_methods(e):
    return render_list([m for m in dir(e) if '__' not in m])
    
def render_list(l):
    style = 'list-style-type:none;display:inline-block;margin:0px 1%;'
    return HTML('<ul>' + ''.join([f'<li style="{style}"><b>{m}</b></li>' for m in l]) + '</ul>') 

# All of Python - The Basics

This notebook will introduce enough of Python and programming concepts to enable you to write basic programs.

We are taking a "breadth first" approach, rather than "depth first."

## Built-In Data Types

Python provides several built-in data types to represent numbers, text, true/false (boolean) values and others. In this class, we will skip types such as bytes and complex numbers.

### Numbers
There are actually _two_ common number types: integer style numbers and decimal style numbers (aka floating point numbers). In Python, these are called `int` and `float`. Integers don't have any decimal parts while floating point numbers do.

As we will see in the computer architecture lecture, the algorithms and the actual, physical hardware used to do math on integers and floating point numbers are different.

**Warning** Floating point numbers are not exact. This becomes extremely important when we need to compare numbers. If you add 10 cents and 20 cents, do you get 30 cents? We will look at the surprising answers later.

Some languages refer to floating point numbers as doubles. Lower level programming languages have a whole hierarchy of numeric types to represent the range of values numeric types can take on. As we will see in the computer architecture lecture, numbers stored in fixed-size hardware types cannot be arbitrarily large (or small).


Use the function `type()` to find the type of an expression

In [None]:
type(100)

In [None]:
type(3.14)

Combining `int` and `float` results in a `float`

In [None]:
type( 100 + 3.14 )

**Exercise** Check the type of `100.0`?

In [None]:
%%postcell exercise_025_100_a

#type your answer here

**Warning** Watch the next calculation carefully:

**PEMDAS**
Precedence of operators: parentheses, exponentiation, multiplication, division, addition, subtraction (multiplication and division share one level, as do addition and subtraction)

How is the following expression evaluated `1 + 2 * 3`?

In [None]:
1 + 2 * 3

In [None]:
(1 + 2) * 3

**Floating point limits**

There are infinitely many points between 1 and 2. Basic math functions like addition and multiplication are implemented in CPU hardware. How can limited hardware support potentially infinite points? It can't!

In [None]:
.1 + .3

In [None]:
.1 + .4

In [None]:
.1 + .5

In [None]:
.2 + .3

In [None]:
.2 + .5

In [None]:
.1 + .2

In [None]:
.1 + .2 == .3
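
One common way to sidestep the surprise above is to compare floats within a small tolerance instead of exactly. The sketch below uses `math.isclose` from the standard library; the variable names are made up for illustration.

```python
import math

total = 0.1 + 0.2

exactly_equal = (total == 0.3)           # exact comparison of two floats
close_enough = math.isclose(total, 0.3)  # comparison within a small tolerance

print(total)
print(exactly_equal, close_enough)
```

For money, another option is to work in whole cents (integers) so that no rounding is involved.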

##### Operators available on numbers

| | |
|---|---|
|+| addition|
|-| subtraction|
|*| multiplication|
|/| division|
|//| floor (integer) division |
|**| exponentiation (not '^')|
|%| modulus|
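
As a quick illustration of the table above, here is a minimal sketch that applies each of the less familiar operators to the same two numbers. The values `7` and `2` are arbitrary, and the `^` line is included only to show why it should not be used for exponentiation.

```python
a = 7
b = 2

true_division = a / b     # always produces a float
floor_division = a // b   # rounds down to a whole number
remainder = a % b         # what is left over after floor division
power = a ** b            # exponentiation
not_a_power = a ^ b       # bitwise XOR, NOT exponentiation

print(true_division, floor_division, remainder, power, not_a_power)
```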


### Words and characters
Python has a single type to represent words and characters. Many lower level languages have one type to represent single characters (usually called `char`) and another to represent collections of characters, such as words, sentences, web pages, books, etc. This data type is usually called a `string` (sometimes abbreviated as `str` or capitalized as `String`). Imagine a `string` as a long strand, connecting characters.

![](images/best-mommy-ever-jewelry.jpg)

Python has a `str` type to represent strings (and characters).

In [None]:
type("hello!")

In [None]:
type('hello!')

In [None]:
type(hello) #<== What is going on here?

In [None]:
hello = "hi" #10

In [None]:
hello

In [None]:
type(hello)

In [None]:
type("A")

In [None]:
type('A')

**What is the type of 1 vs "1"?**

In [None]:
#Try it here

In [None]:
type("1")

What happens if we add two strings?

In [None]:
"hello" + "world"

In [None]:
"1" + "2" # What is going on here?

In [None]:
type("1" + "2")

In [None]:
"hello " * 5

**Exercise** Use `dir()` to find all operations available on an object "hello"

In [None]:
%%postcell exercise_025_100_b

#Type answer here

**Exercise** Reminder that you can quickly find out what a function does by trying it out in a cell. Execute the next few cells. What do the `strip()` and `replace()` functions do?

In [None]:
print("hello \n world")

In [None]:
"   hello  ".strip()

In [None]:
print(" hello \n")

In [None]:
" hello \n".strip()

In [None]:
"hello".replace('o','a')

##### Methods available on strings

In [None]:
show_methods("hello")

##### Operators available on strings

| | |
|---|---|
|+| concatenate (combine) two strings|
|*| repeat string a number of times|


More useful operations on strings are described in the `list` section

### Type conversion Detour

Say you read a file which contains a line containing this: 
```
Homer Simpson
```

You know that is a string type. What if the line contains 

```
100
```
When Python gets input from the outside world (using the method used so far), it has no idea if it is dealing with an integer, a float, a picture or an audio file of a song. It assumes everything is a string. Lucky for us, there are several functions which convert from one type to another.

In [None]:
type(100)

In [None]:
type("100")

In [None]:
int("100")

In [None]:
type(int("100"))

In [None]:
int("hello") #<== what happened?

Notice that the `int` function converted a string to an integer. What if we hadn't converted "100" to a numeric value?

In [None]:
"100" + "200"

In [None]:
int("100") + int("200")

In [None]:
int("100.2") #<== what happened?

In [None]:
int(100.2)

In [None]:
int(float("100.2"))

In [None]:
float("100.2")

In [None]:
float("100")

In [None]:
type(float("100")) # Do you expect the output of this cell to be 'int' or 'float'?

How about the other way, given an integer or a float, how do we convert it to a string?

In [None]:
"Homer is " + 34 + " years old"

In [None]:
"Homer is " + str(34) + " years old"

In [None]:
str(1234.567)

In [None]:
type(str(1234.567))
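
Putting the conversions above together, here is a small sketch of a typical round trip: text comes in, gets converted to numbers for arithmetic, and is converted back to text for display. The input strings are hypothetical stand-ins for lines read from a file.

```python
raw_quantity = " 42 \n"   # hypothetical line from a file, with stray whitespace
raw_price = "19.99"       # hypothetical price, also read as text

quantity = int(raw_quantity.strip())  # text -> int
price = float(raw_price)              # text -> float
total = quantity * price              # int * float gives a float

message = "Total: " + str(round(total, 2))  # number -> text for display
print(message)
```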

### Dates

Like most languages, Python provides a `datetime` type. We can certainly represent dates as strings "Nov 11, 2023" or as a number _1699109972_ (number of seconds since some date). However, a type dedicated to dates can be very useful.

In [None]:
import datetime

In [None]:
nov5 = datetime.datetime(2023, 11, 5, 6, 5, 3)# Nov 5, 6:05:03
nov5

In [None]:
type(nov5)

Get current date and time

In [None]:
datetime.datetime.now()

Once a datetime object is created, it can provide very useful information

In [None]:
nov5.year, nov5.month, nov5.day

In [None]:
nov5.hour, nov5.minute, nov5.second

Number of seconds since _epoch_ (Jan 1, 1970)

In [None]:
nov5.timestamp()

You can precisely control how dates and times are printed. See this page for a table of formatters: https://www.w3schools.com/python/python_datetime.asp

In [None]:
print( nov5.strftime("%B %d, %Y") )
print( nov5.strftime("%b %d, %Y") )
print( nov5.strftime("%d/%m/%Y") )
print( nov5.strftime("%m/%d/%Y") )

In [None]:
nov5.isoformat()

You can use the same formatting strings to _parse_ dates

In [None]:
date_string = "Nov 5, 2023"
datetime.datetime.strptime(date_string, "%b %d, %Y")

In [None]:
nov5 + datetime.timedelta(days=1)

In [None]:
show_methods(nov5)
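
One reason a dedicated date type is useful is date arithmetic. The sketch below subtracts one `datetime` from another, which produces a `timedelta`. The second date is an arbitrary example chosen for illustration.

```python
import datetime

start = datetime.datetime(2023, 11, 5, 6, 5, 3)   # same moment as nov5 above
end = datetime.datetime(2023, 12, 25, 0, 0, 0)    # arbitrary later date

gap = end - start             # subtracting two datetimes gives a timedelta
print(type(gap))
print(gap.days)               # whole days in the gap
print(gap.total_seconds())    # the entire gap expressed in seconds
```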

### Boolean (True and False)
Python is among those languages which provide a way to represent `True` and `False` directly. This data type is unexpectedly common in programming languages. For example, if you ask Python if 5 is greater than 3, the answer will be a boolean value (hint: True). Numeric values have associated operations we are used to from school: addition, subtraction, etc. String types have natural functions associated with them, such as upper case, lower case, combining strings, etc. Similarly, there is a "boolean algebra." We will introduce this later in the lecture.

In some languages, such as older versions of `C`, there is no explicit boolean type. Instead, the number `0` is used to represent false and any non-zero number (usually `1`) is used to represent true.

Although Python has `True` and `False` keywords, they behave like the integers `1` and `0`, respectively, because `bool` is a special kind of `int`!
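
A quick way to see how `True` and `False` relate to `1` and `0` is to ask Python directly. This is a small sketch; the variable names are made up for illustration.

```python
true_equals_one = (True == 1)         # True compares equal to 1
false_equals_zero = (False == 0)      # False compares equal to 0
bool_is_int = isinstance(True, int)   # a bool is also an int
true_plus_true = True + True          # so arithmetic on booleans works
true_type = type(True)                # but the type is still reported as bool

print(true_equals_one, false_equals_zero, bool_is_int)
print(true_plus_true, true_type)
```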

**Real world examples**

While numbers and strings are natural to us, the boolean type needs some context. As you will see in the example below, comparing things requires an answer that is either true or false. 

Since computers are so good at doing repetitive tasks (executing loops), we need to tell the computer when to stop executing a loop. This is done by using booleans: keep doing something, until the value of a boolean is set to false.

Although such statements haven't been introduced yet, you have probably heard of if/else statements. This is one of most common ways computers choose between options. Booleans are an integral part of such decisions. `if` a boolean value is `True`, then do this thing, `else` do something else. The operations carried out by your program depend on the value of a boolean value.

In [None]:
type(True)

In [None]:
type(False)

In [None]:
5 > 3 # Is 5 greater than 3?

In [None]:
type(5 > 3)

**Basic operations**

Python provides the following comparison operators:

In [None]:
1 < 2 # is 1 less than 2?

In [None]:
1 > 2 # is 1 greater than 2?

In [None]:
1 <= 2 # is 1 less than or equal to 2?

In [None]:
1 >= 2 # what is this?

Keep in mind that you obviously don't need to compare 1 and 2, since you already know the answer. However, such comparisons are useful when one or both terms are variables. Recall an earlier lecture where we used comparison operators to check if the record we were processing belonged to Arya Stark or other characters of interest.

**Possibly confusing**

In [None]:
1 == 2 # is one EQUAL to 2?

Notice that `=` is used for assignment of variables. Such as setting x equal to 10 `x = 10`. However, checking if one thing is equal to another is done via two equal signs `1 == 2`

In [None]:
1 != 2 #is 1 NOT EQUAL to 2? or is 1 different from 2?

Boolean statements, such as above can be combined 

In [None]:
homer_age = 34
marge_age = 32
bart_age  = 10
lisa_age  = 8
maggie_age = 1

In [None]:
# Marge's age is between Homer and Bart
marge_age < homer_age and marge_age > bart_age 

In [None]:
#Is there anyone who is younger than lisa?
anyone_younger_than_lisa = lisa_age > homer_age or lisa_age > marge_age or lisa_age > bart_age or lisa_age > maggie_age
anyone_younger_than_lisa

In [None]:
is_lisa_youngest = not anyone_younger_than_lisa
is_lisa_youngest
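
A "between" check like the one for Marge's age can also be written as a chained comparison, which Python supports directly. The sketch below repeats the ages so that it runs on its own and shows that both spellings give the same answer.

```python
homer_age = 34
marge_age = 32
bart_age = 10

# Two comparisons joined with `and`
with_and = marge_age < homer_age and marge_age > bart_age

# The same test as a single chained comparison
chained = bart_age < marge_age < homer_age

print(with_and, chained)
```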

##### Operators associated with booleans, but used for all basic data types (including strings and numbers)

| | |
|---|---|
|==| equals|
|!=| not equal|
|>| greater than|
|<| less than|
|>=| greater than or equal to |
|<=| less than or equal to|


## Variables

Variables store values which your program needs to recall. You may also think of a variable as a name assigned to a value. You can type out 3.14, again and again in your program. Wouldn't it be more informative to use `pi` instead? 

You could go through a file (which may have thousands of lines) and assign a variable `current_line` to whichever line you are processing.

In [None]:
x = 100

In [None]:
x

In [None]:
x + 2

Once a variable is assigned a value, you should be able to substitute the variable any place that value is expected

In [None]:
type(x) # Is there a difference between type(x) and type(100) or dir(x) and dir(100)?

Notice that different values can be assigned to variables (variables can _vary_ )

In [None]:
x = 200

In [None]:
x

You can create an expression where a variable is assigned a new value based on its current value. This is used when you are aggregating values in a loop.

In [None]:
x = x + 1

In [None]:
x

In [None]:
x += 1

In [None]:
x

In [None]:
x++ #why doesn't this work?

Python, unlike many languages, allows multiple assignment:

In [None]:
x, y = 1, 2

In [None]:
x

In [None]:
y
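
Multiple assignment also makes it easy to swap two values without a temporary variable. A minimal sketch:

```python
x, y = 1, 2
print(x, y)

# The right-hand side is evaluated first, then both names are reassigned
x, y = y, x
print(x, y)
```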

**Exercise** The formula for maximum safe heart rate is `220 - age`. In the cell below, set the age variable and execute the cell to find out your maximum heart rate

In [None]:
age = ???
MHR = 220 - age
MHR

## Functions

We will break up the study of functions into two subjects: how to use functions and how to create them. In this tutorial, let's see how to use functions.

Functions are so fundamental to programming that we have already used them several times in this tutorial.

When we find the absolute value of an integer `abs(-10)`, we are using the `abs` function. When we check the `type("hello")` of a string or an integer, we are using the `type` function. When we want to find out which operations are valid for a data type, using `dir(100)`, we are using the `dir` function.

Think of a function as a machine. We provide some input to the machine and it provides us with some output. We don't always concern ourselves with how this machine works. How does `abs` remove the negative sign? How does `dir` figure out which operations are valid? We can safely ignore these details, until we have an actual need to understand these functions.

In [None]:
abs(-10)

Functions are _called_ , _invoked_ or _executed_ in the following manner:

`result = function_name(argument1, argument2, argument3)`

A function has a name (we will study nameless functions later). We pass it some input, more often called _arguments_ or _parameters_.

Once the function is done executing, it _returns_ a value.

Recall from an earlier lecture that, in Jupyter notebook, executing `function_name?` gives you documentation for that function. In some cases, `function_name??` gives you the source code for the same function.
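
To make the calling pattern concrete, here is a short sketch using three built-in functions. Each call takes one or more arguments and returns a value, which we store in a variable. The variable names are made up for illustration.

```python
rounded = round(3.14159, 2)   # two arguments: the number and the digits to keep
largest = max(3, 10, 7)       # three arguments: returns the biggest one
length = len("hello")         # one argument: returns the number of characters

print(rounded, largest, length)
```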

#### Importing functions

Python has a number of built-in functions. We have already seen `type` and `dir`. 

This page describes the full list, along with their docs: https://docs.python.org/3/library/functions.html

Notice that math functions such as `abs` (absolute value) and `round` (return rounded number) are available. Surely Python has other functions, such as ceiling, floor, log, power, sin, cos and many other middle school level functions? Where are they?

In order to keep functions organized, Python puts them in relevant modules (also called libraries). For example, Python provides a module called `math`, which contains _many_ useful math functions. In order to use those functions, you have to tell your program that you would like to `import` the relevant module.

In [None]:
sqrt(100) # What happened?

In [None]:
floor(3.14)

In [None]:
import math #import the math module, all function accessible by prefixing the module name

In [None]:
math.sqrt(100)

In [None]:
math.floor(3.14)

In [None]:
dir(math)

In [None]:
from math import * # import all functions from math and make them accessible without a prefix

In [None]:
floor(3.14)

In [None]:
import math as m # import the math module, but change the prefix name to m

In [None]:
m.floor(3.14)

This link provides information about all built-in modules: https://docs.python.org/3/py-modindex.html
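
Besides the import styles shown above, you can also import only the specific names you need. This is often preferred over `from math import *` because it is clear where each function came from. A minimal sketch:

```python
from math import floor, sqrt   # import just these two functions

root = sqrt(100)          # no "math." prefix needed
rounded_down = floor(3.14)

print(root, rounded_down)
```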

#### Methods

Notice that so far in this class, we have called functions two ways: by themselves and using the _dot_ notation. For example:

In [None]:
abs(-100)

In [None]:
"hello, world".split(",") # split("hello world", ",")

Hopefully, you can tell from context that `"hello, world".split(",")` calls the `split` function, which understands that it is operating on the string `"hello, world"`. 

When a function is called via the dot notation, it is called a method, and it is operating on an object. We will introduce objects and object oriented programming a little further down the line. For now, understanding this usage from context should be enough.
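
To see that a method really is a function that receives its object as the first input, the sketch below calls `split` both ways. The second form, going through the `str` type, is rarely written in practice and is shown here only for illustration.

```python
# Usual method call: the string before the dot is the object being operated on
method_style = "hello, world".split(",")

# The same function reached through the str type, with the string passed explicitly
function_style = str.split("hello, world", ",")

print(method_style)
print(function_style)
```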

#### Third-party modules

When we discuss the relative quality of programming languages, their ecosystem of libraries is often extremely important. As powerful as programming languages are, if one had to rely only on built-in functions, thousands of programmers would find themselves re-inventing the wheel countless times.

Luckily, programmers, such as you and I, can create our own modules (aka libraries) and distribute them for others to use. Recall from an earlier lecture that getting information about Game Of Thrones characters took a bit of work. However, when we used the Pandas library, we were able to answer the same question in a single line of code! This is because there were others before us who were annoyed enough at having to write so much code that they decided to _abstract_ away the complexity for us.

People have created libraries for practically every branch of mathematics, data science, physics, game development, communication. Whatever you can think of, someone has created a library for it.

In a later lecture, we will learn more about this ecosystem.

#### Writing functions

In this section, we see how to create our own functions. Since we have not yet learned many important Python constructs, let's write a silly function which checks if a number is greater than zero.

Without functions, this is what the code looks like:

In [None]:
mynumber = 42
is_greater_than_0 = mynumber > 0

In [None]:
is_greater_than_0

Here is the same logic, wrapped in a function:

In [None]:
def is_greater_than_0(mynumber):
    is_greater_than_0 = mynumber > 0
    return is_greater_than_0

In [None]:
type(is_greater_than_0)

In [None]:
is_greater_than_0(34)

In [None]:
is_greater_than_0(54)

In [None]:
is_greater_than_0(-3)

**Example** Write a function which adds three numbers together

In [None]:
def addd(a, b, c):
    return a + b + c

In [None]:
def addd(a, b, c):
    rslt = a + b + c
    return rslt

In [None]:
addd(1, 2, 3)

In [None]:
addd(1, 1, 1)

**Exercise** Write a function which multiplies four numbers together. Call the function `mulllltiply`

In [None]:
%%postcell exercise_025_100_c

### write your function here

In [None]:
### execute your function here

## Container types (aka data structures): lists and dictionaries

We often need to keep track of several items. In an earlier lecture, we saw that we needed to keep a list of names. In another scenario, we need to keep track of numbers per person. This section describes some basic container types, also known as data structures (in computer science literature).

### Dictionaries
Recall this program from an earlier lecture:

In [None]:
#Find all killers and their kill count

killers = dict() # dictionary data type

file = open("../../datasets/deaths-in-gameofthrones/game-of-thrones-deaths-data.csv", "r", encoding='utf8')

for line in file:
  tokens = line.split(',')
  if tokens[4] in killers: kill_count = killers[tokens[4]]
  else: kill_count = 0
  kill_count = kill_count + 1
  killers[tokens[4]] = kill_count

file.close()
killers

We have a variable for jon and for arya. But what if we wanted to calculate how many people _everyone_ killed (evidence for their trials)? We need a way to _dynamically_ create variables. These variables will need to be assigned values and those values will need to be updated.

Almost all programming languages provide a way to do this via _dictionaries_. Some languages call them maps (map a key to a value), some call them associative arrays. The name of this type in Python is `dict`.
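
The counting pattern from the program above can be sketched on a small scale. The list of names here is made-up sample data standing in for the killer column of the file, and `dict.get(key, default)` is used as a compact alternative to the `if`/`else` check: it returns the stored value, or the default when the key is missing.

```python
# Hypothetical sample data, standing in for one column of the CSV file
killer_column = ["arya", "jon", "arya", "arya", "jon"]

kill_counts = {}
for name in killer_column:
    # Look up the current count (0 if the name is new), then add one
    kill_counts[name] = kill_counts.get(name, 0) + 1

print(kill_counts)
```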

You can create an empty dictionary via `my_dict = {}` or `my_dict = dict()`.

In [None]:
simpson_ages = dict()

In [None]:
simpson_ages

At this point, `simpson_ages` is a dictionary with nothing in it

| Keys | Values |
|------|--------|
|      |        |


In [None]:
simpson_ages["Homer"] = 36 #Set the value of key "Homer" to 36

In [None]:
simpson_ages

The dictionary is no longer empty

| Keys | Values |
|------|--------|
|  Homer    |  36      |

In [None]:
simpson_ages["Homer"] # Get the value of key "Homer"

We haven't modified the dictionary

| Keys | Values |
|------|--------|
|  Homer    |  36      |

In [None]:
simpson_ages["Marge"] # What happened?

In [None]:
person_name = "Marge"
person_age = 34

simpson_ages[person_name] = person_age
simpson_ages[person_name]

In [None]:
simpson_ages

The dictionary now looks like this:

| Keys | Values |
|------|--------|
|  Homer    |  36      |
|  Marge    |  34      |

Literal syntax

In [None]:
got_killers = {"arya": 1278, "jon": 112}

In [None]:
got_killers

In [None]:
got_killers["arya"]

You have seen how to create an empty dictionary, then add items.
You can also create a dictionary with items already in it:

In [None]:
my_second_dictionary = {"key1": "value1", "key2": 2, "key3": "value3", 4:"value4", "4":"value4str"}

In [None]:
my_second_dictionary["key1"]

In [None]:
my_second_dictionary[4]

In [None]:
my_second_dictionary["4"]

In [None]:
my_second_dictionary["key2"]

In [None]:
my_second_dictionary

**Exercise** Add 2 years to Homer's age in `simpson_ages` (hint: get Homer's age from the dictionary, add 2 to it, add the new value back into the dictionary)
**WARNING** Don't just set the value equal to 38, get Homer's current age, add 2, add it back to the dictionary

In [None]:
%%postcell exercise_025_100_d

# Try it here

**Exercise** Create a new dictionary, call it `last_names` and use this table to fill it:


| Keys | Values |
|------|--------|
|  Homer    |  Simpson      |
|  Marge    |  Simpson      |
|  Ned    |  Flanders      |
|  Barney    |  Gumble      |

In [None]:
%%postcell exercise_025_100_e

# Write code here

**Exercise** Say we want to create a combined database of The Simpsons and The Flintstones. What happens when you add "Barney Rubble" to this dictionary? 

(Hint: Insert "Barney" and "Rubble" into the dictionary you just created. Since there are two values associated with the first name "Barney", what does your dictionary show as Barney's last name? Gumble or Rubble?)


In [None]:
%%postcell exercise_025_100_f

# Write code here

**Exercise** What is the `type` of `last_names`?

##### Dictionary methods

In [None]:
show_methods(dict())

### Lists

Now that you understand dictionaries, what if you want a list of all the names in the dictionary `simpson_ages`? Notice that you can call simpson_ages.keys() to get this list.

In [None]:
simpson_ages

In [None]:
simpson_ages.keys()

In [None]:
simpson_ages.values()
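
The results of `keys()` and `values()` are list-like views. If you want an actual list, you can wrap them in `list()`. The sketch below also shows `items()`, which gives key/value pairs together. The dictionary is recreated here so the example runs on its own.

```python
simpson_ages = {"Homer": 36, "Marge": 34}

names = list(simpson_ages.keys())     # just the keys
ages = list(simpson_ages.values())    # just the values
pairs = list(simpson_ages.items())    # (key, value) pairs

print(names)
print(ages)
print(pairs)
```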

List is an extremely important container type, so important that some languages are built around it.

Recall that dictionaries are created either using the `dict()` function or curly braces `{}`. Similarly, lists can be created via `list()` or square brackets `[]`

In [None]:
programming_languages = list() # or programming_languages = []

In [None]:
programming_languages

At this point, we have created an empty list:

[  ]

In [None]:
programming_languages.append("python")

In [None]:
programming_languages

The list now has an element:

["python"]

In [None]:
programming_languages.append("R")
programming_languages.append("julia")

In [None]:
programming_languages

In [None]:
programming_languages.append("R")

In [None]:
programming_languages

The list now has four items:

["python", "R", "julia", "R"]

If we knew exactly what we wanted to put in this list, we could have created it like this:

In [None]:
programming_languages = ["python", "R", "julia", "Java", "F#", "Haskell", "C#"]
programming_languages

#### List indexing - one of the reasons Python is so popular

In the next section, we will learn how to `loop` through each item in a list (or dictionary). If you want to access a single item at a known location, you can `index` into the list.

In [None]:
programming_languages[1]

In the previous statement, you asked for the item at location 1. Did you get what you expected? Try the next line:

In [None]:
programming_languages[0]

**Remember** Python is a zero-indexed language! Many programming languages (and programmers) start counting at zero, instead of one! R is one of the exceptions, as it starts counting at 1 (like most humans).

If you want to get a range of items from a list (say, the first 2), you can use the following syntax

In [None]:
programming_languages

In [None]:
programming_languages[0:2]

In [None]:
programming_languages[1:] #Start at one (second item), go until the end

In [None]:
programming_languages[:2] 

Here, instead of indexing with a single number, you are using `starting_point:ending_point` syntax, where `ending_point` is not inclusive (so the value at `ending_point` will NOT be included).
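
Because the ending point is not included, a slice `[a:b]` contains exactly `b - a` items, and splitting a list at the same position with `[:n]` and `[n:]` gives two pieces that fit back together. A small sketch with a made-up list:

```python
letters = ["a", "b", "c", "d", "e"]

middle = letters[1:4]     # items at positions 1, 2 and 3 (position 4 is excluded)
first_two = letters[:2]   # everything before position 2
rest = letters[2:]        # everything from position 2 onward

print(middle)
print(first_two, rest)
```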

Finally (for this quick tutorial), you can access items from the end of a list with negative numbers.

In [None]:
programming_languages[-1] #This should get you the last item

In [None]:
programming_languages[-2] #This should get you second from the last item

In [None]:
programming_languages[-2:] #This should get you the last two items (start at -2, end at the end)

In [None]:
big_list = list(range(20))
big_list

Step size

In [None]:
big_list[5:15]

In [None]:
big_list[5:15:1]

In [None]:
big_list[5:15:2]
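
The start and end can be left out when using a step, and the step can be negative, which walks through the list backwards. A minimal sketch:

```python
big_list = list(range(20))

every_third = big_list[::3]      # whole list, taking every third item
reversed_list = big_list[::-1]   # whole list, stepping backwards

print(every_third)
print(reversed_list)
```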

**Exercise** What is the type of `programming_languages`?

In [None]:
type(programming_languages)

**Exercise** Given the list `killed` in the Game of Thrones example, assuming the file is ordered chronologically, provide code for:

In [None]:
killed = list() # list data type

file = open("../../datasets/deaths-in-gameofthrones/game-of-thrones-deaths-data.csv", "r", encoding='utf8')

for line in file:
  tokens = line.split(',')
  if tokens[4]=="Jaime Lannister":
    name_of_killed = tokens[3]
    killed.append(name_of_killed)

file.close()
print(killed)

In [None]:
%%postcell exercise_025_100_g

# First person killed by Jaime
print(killed[0])

# Last person killed by Jaime
# your code here

# First three people killed by Jaime

# Last three people killed by Jaime

# Names of the second, third and fourth victim

#### Strings are also lists (of sorts)!

Did you notice that we used a type of list at the beginning of the lecture? Strings!
While Python strings are not identical to lists, their _interface_ (the operations and functions which can be used) is very similar! Everything you learn about accessing items from a list will apply to strings (but strings can't be modified)

In [None]:
homer = "Homer Simpson"

In [None]:
homer

In [None]:
homer[0]

In [None]:
homer[0:7]

In [None]:
homer[-3:]
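
To see what "strings can't be modified" means in practice, the sketch below tries to change one character and catches the resulting error. It uses `try`/`except`, which has not been covered yet; it is here only so the cell finishes without stopping. The usual workaround is to build a new string from pieces of the old one.

```python
homer = "Homer Simpson"

try:
    homer[0] = "h"                 # attempt to change the first character
except TypeError as error:
    message = str(error)           # Python refuses: strings are immutable
    print("Error:", message)

# Build a new string instead of changing the old one
lowercase_homer = "h" + homer[1:]
print(lowercase_homer)
```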

Many functions return lists. One you have already seen is the `split` function.

In [None]:
"All models are wrong, some are useful".split(" ")

In [None]:
"All models are wrong, some are useful".split(" ")[2]

Recall this example from an earlier lecture, which recorded the names of people who were killed by Jaime. Notice where and how `list` is being used.

**Extended exercise** What operations can be done on lists, such as `programming_languages`?

In [None]:
dir([1,2])

Strings can be converted to a list using the `list` function:

In [None]:
list("Homer")

Lists can be converted to a single string using the `join` method:

In [None]:
", ".join(["homer", "marge", "bart", "lisa", "maggie"])

In [None]:
"; ".join(["homer", "marge", "bart", "lisa", "maggie"])

**Exercise** Which function finds the length of `killed`? (hint: it is a built-in function, use google to help you find this)

In [None]:
%%postcell exercise_025_100_h

#Type code here

**Exercise** Create a list and populate it with a few names. Can you add the same name multiple times?

In [None]:
%%postcell exercise_025_100_i

#Type code here

##### List methods

In [None]:
show_methods(list())

### Tuples

Tuples are very similar to lists. However, unlike lists, tuples can not be modified. Once a tuple is created, its length and members stay the same.

You will notice that tuples are defined with round parentheses `()` whereas lists are defined using square brackets `[]`

In [None]:
l = ['homer', 'marge']
l

In [None]:
t = ('homer', 'marge')
t

In [None]:
t2 = 'homer', 'marge'
t2

In [None]:
print(type(l))
print(type(t))
print(type(t2))

In [None]:
l[0]

In [None]:
t[0]

In [None]:
l[0] = 'HOMER'
l

In [None]:
t[0] = 'HOMER'
t

We will look at tuples in greater detail in a later notebook. For now, just be aware that when you see output like `(1,2)`, you are still looking at a collection of items. 

In [None]:
1,2

Tuples are sometimes used to return multiple values from a function

In [None]:
def f():
    return 1,2

In [None]:
f()

Python allows multiple assignment, which is handy for unpacking the values in a tuple

In [None]:
rslt = f()

In [None]:
rslt[0]

In [None]:
rslt[1]

In [None]:
x, y = f()

In [None]:
x

In [None]:
y

##### Methods on tuples

In [None]:
show_methods((1,2))

### Sets
Sets are very similar to Lists. Like mathematical sets, and unlike lists, an item can only exist once in a set.

Sets can be created either using curly braces with items, such as `{1, 2, 3}`, or with `set()`. An empty pair of braces `{}` creates an empty dictionary, so use `set()` for an empty set. You can also build a set from a list: `set([1,1,2,3,3,3,4,5])`

In [None]:
[1,1,2,3,3,3,4,5, 1]

In [None]:
set([1,1,2,3,3,3,4,5])

In [None]:
moes_customers = set()
moes_customers

In [None]:
moes_customers.add('Homer')
moes_customers.add('Barney')
moes_customers

In [None]:
moes_customers.add('Homer')
moes_customers

**Example** Homer Simpson can appear in multiple police records for disorderly conduct. However, when asked for the names of Moe's patrons, it makes no sense for Homer to be included more than once. 

In [None]:
#Trouble maker list, extracted from Springfield police records, in chronological order
# TODO: fix variable name
ignoble_citizens = ['Homer', 'Barney', 'Moe', 'Dr. Nick', 'Homer', 'Homer']
ignoble_citizens

In [None]:
simplified_list = set(ignoble_citizens)
simplified_list
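
Notice that converting to a set removes duplicates but does not promise to keep the original (chronological) order. If the order matters, one common trick is `dict.fromkeys`, which keeps the first appearance of each item. This is a sketch of one possible approach, not the only one.

```python
ignoble_citizens = ['Homer', 'Barney', 'Moe', 'Dr. Nick', 'Homer', 'Homer']

unique_unordered = set(ignoble_citizens)                  # duplicates gone, order not guaranteed
unique_in_order = list(dict.fromkeys(ignoble_citizens))   # duplicates gone, first-seen order kept

print(unique_unordered)
print(unique_in_order)
```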

In [None]:
moes_customers = {'Homer', 'Barney', 'Carl', 'Lenny'} # set(['Homer', 'Barney', 'Carl', 'Lenny'])
moes_customers

#### Set operations: `union`, `intersection` and friends

In [None]:
marketing_campaign_customers = set(['homer', 'apu', 'lisa', 'monty', 'maggie'])
adult_customers = set(['abe', 'seymore', 'monty', 'homer', 'apu'])
senior_customers = set(['abe', 'monty'])

![image.png](attachment:8f0ebb7e-d7c9-4109-8a18-bb052cc8f006.png)

In [None]:
marketing_campaign_customers

In [None]:
adult_customers

* https://www.classtools.net/Venn/

Which customers appear in either set (`union`)

In [None]:
marketing_campaign_customers.union(adult_customers)

In [None]:
marketing_campaign_customers | adult_customers 

Which customers show up in both sets (`intersection`)

In [None]:
marketing_campaign_customers.intersection(adult_customers)

In [None]:
marketing_campaign_customers & adult_customers

Which customers are in one set but not the other? (`difference`)

In [None]:
marketing_campaign_customers.difference(adult_customers)

In [None]:
marketing_campaign_customers - adult_customers

In [None]:
adult_customers.difference(marketing_campaign_customers)

In [None]:
adult_customers - marketing_campaign_customers

Do the sets have _no_ members in common? (`isdisjoint` returns `True` only when the two sets share nothing)

In [None]:
marketing_campaign_customers.isdisjoint(adult_customers)

Are 'senior' customers also considered 'adult' customers? (`issubset`, `issuperset`, proper sub/super sets)

In [None]:
senior_customers.issubset(adult_customers)

In [None]:
senior_customers <= adult_customers
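
The difference between a subset and a _proper_ subset is whether the two sets are allowed to be equal. The sketch below uses the operator forms `<=`, `<` and `>=`. The sets are repeated so that the example runs on its own.

```python
senior_customers = {'abe', 'monty'}
adult_customers = {'abe', 'seymore', 'monty', 'homer', 'apu'}

is_subset = senior_customers <= adult_customers          # every senior is an adult
is_proper_subset = senior_customers < adult_customers    # ...and adults has extra members
self_is_subset = adult_customers <= adult_customers      # a set is a subset of itself
self_is_proper = adult_customers < adult_customers       # but never a proper subset of itself
is_superset = adult_customers >= senior_customers        # same relationship, other direction

print(is_subset, is_proper_subset, self_is_subset, self_is_proper, is_superset)
```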

##### Methods on sets

In [None]:
show_methods(set())

### Common operations

Python (as well as most other programming languages) provides functions which work across data structures. Programmers can accomplish similar tasks using the same syntax.

In [None]:
list1 = [1,2,3,4,5]
dict1 = {1:"one", 2:"two", 3:"three", 4:"four", 5:"five"}
tup1  = (1, 2, 3, 4, 5)
set1  = set(list1)
str1  = "hello"

In [None]:
list2 = []
dict2 = {}
tup2  = ()
set2  = set()
str2  = ""

Size of data structures

In [None]:
len(list1), len(dict1), len(tup1), len(set1), len(str1)

Get value at an index or a key

In [None]:
list1[2], dict1[2], tup1[2], str1[2] #, set1[2]

Check if an item exists

In [None]:
2 in list1

In [None]:
2 in dict1

In [None]:
"four" in dict1

In [None]:
2 in tup1

In [None]:
2 in set1

In [None]:
'l' in str1
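
For dictionaries, `in` looks only at the keys, which is why a value such as `"four"` is not found. To search the values instead, check against `.values()`. A minimal sketch, with the dictionary repeated so the example runs on its own:

```python
dict1 = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}

key_found = 4 in dict1                      # 4 is a key
value_as_key = "four" in dict1              # "four" is a value, not a key
value_found = "four" in dict1.values()      # search the values explicitly

print(key_found, value_as_key, value_found)
```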

Remove an item

In [None]:
list1

In [None]:
del list1[2]

In [None]:
list1

In [None]:
dict1

In [None]:
del dict1[2]

In [None]:
dict1

Strings and tuples are _immutable_, hence, can't be changed.

In [None]:
# del str1[2]
# del tup1[2]

Set elements can't be accessed via an index location (since they have no order)

In [None]:
set1

In [None]:
# del set1[2]
set1.remove(2)

In [None]:
set1

Modify an item

In [None]:
list1

In [None]:
list1[1] = 45

In [None]:
list1

In [None]:
dict1

In [None]:
dict1[1] = 45

In [None]:
dict1

As seen earlier, strings and tuples are immutable and sets don't have index locations
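To see what immutability looks like in practice, the sketch below tries to modify a string and a tuple and catches the resulting error. The variable names are made up for this example; it then shows one common workaround, which is to build a new value instead:

```python
word = "hello"
point = (1, 2, 3)

for immutable in (word, point):
    try:
        immutable[1] = 45          # item assignment is not supported
    except TypeError as err:
        print(type(immutable).__name__, "->", err)

# To "change" an immutable value, build a new one from pieces of the old one
new_word = word[:1] + "E" + word[2:]
new_point = point[:1] + (45,) + point[2:]
print(new_word, new_point)
print(word, point)                 # the originals are untouched
```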

See the **truthiness** section below for more examples

## Control flow: if/else and loops

Programs which never make any decisions and never execute a line of code more than once are not very interesting. It can be argued that what sets a programming language apart from a calculator is its ability to jump over some lines (using if/else statements) or operate on some value a number of times (loops).

### Loops

Once you have seen lists, loops are an obvious next step. Given a list of things, you need to _go through them, one by one and operate on each item_. In Python, iterating through a list of items occurs via `for` loops.

Another use case for loops is to continue doing something until a condition is met. For example, _while_ user input is less than some threshold, continue the program. In Python, `while` loops are best used for such scenarios.

As novice data scientists, we mostly deal with `for` loops. We will see `while` loops in more detail in a future lecture.

For example, given a list of items, how would you capitalize each entry?

In [None]:
for name in ['homer', 'marge', 'bart', 'lisa', 'maggie']:
    print(name.capitalize())

print("Done with the loop")

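What you just saw is a _for loop_. Since the list above has 5 entries, the indented line inside the loop executes 5 times. The final `print` is not indented, so it executes only once, after the loop has finished.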

The syntax looks like this:

```python
for VARIABLE in LIST:
   EXECUTE_SOME_STATEMENT
   EXECUTE_SOME_OTHER_STATEMENT
   ...
```

Some important things to notice:
* The first line of the loop ends with the ':' character
* All _indented_ lines immediately below the loop are considered part of the loop.
* The value of _VARIABLE_ changes with each _iteration_ of the loop

![](images/loop_diagram.png)
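The points above can be seen in a short sketch. The prices are made-up numbers; the comments mark which lines belong to the loop and which do not:

```python
total = 0
for price in [5, 10, 20]:
    total = total + price            # indented: runs once per item
    print("running total:", total)   # indented: runs once per item
print("final total:", total)         # not indented: runs once, after the loop
```

Try moving the last line one indentation level to the right and notice how the output changes.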

**Exercise** Take a look at programs from notebook _first_programs_ and find all the loops

**Exercise** Describe the output of the code below before executing it (in a later lecture, we will see a cleaner method of combining text and variables):

In [None]:
"this sentence has a few words in it"

In [None]:
"this sentence has a few words in it".split(" ")

In [None]:
for word in "this sentence has a few words in it".split(" "):
    print("Length of the word", word,"is", len(word))

**Nested loops**

In [None]:
for i in ["Hello", "Bonjour"]:
    print(i,"Homer")

In [None]:
for i in ["Hello", "Bonjour"]:
    for j in ["Homer", "Marge"]:
        print(i, j)

**Example** Create all combinations of clothes in your closet

In [None]:
for c in ['red', 'blue', 'purple']:
    for d in ['pants', 'shirt']:
        print("You can wear", c, d)

#### `range`
`range` provides a way to iterate a specific number of times

In [None]:
for i in range(4):
    print("hello")

In [None]:
range(20)

In [None]:
range(0, 20)

In [None]:
list(range(20))

In [None]:
list(range(5,20))

In [None]:
list(range(5,20,2))

#### `enumerate`
`enumerate` provides a way to iterate over a list, _as well as its index_

In [None]:
for  value in ['homer', 'marge', 'bart', 'lisa', 'maggie']:
    print( value)

In [None]:
for index, value in enumerate(['homer', 'marge', 'bart', 'lisa', 'maggie']):
    print(index, value)
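By default the index starts at 0. If you would rather count from 1 (for a numbered list, say), `enumerate` accepts a `start` argument. A small sketch; the names `family` and `numbered` are made up for this example:

```python
family = ['homer', 'marge', 'bart', 'lisa', 'maggie']

numbered = []
for position, name in enumerate(family, start=1):
    numbered.append(f"{position}. {name.capitalize()}")

print(numbered)
```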

#### `zip`
`zip` provides a way to combine two or more lists

In [None]:
names = ['homer', 'marge', 'bart', 'lisa', 'maggie']
ages  = [34, 32, 10, 8, 1]


In [None]:
for i, n in enumerate(names):
    print(n, ages[i])

In [None]:
list(zip(names, ages))

In [None]:
for name, age in zip(names, ages):
    print(name, age)

In [None]:
dict(zip(names, ages))

In [None]:
list(zip(names, names[1:]))
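One detail worth knowing: `zip` stops as soon as the shortest input runs out. The sketch below uses a deliberately shortened list (`ages_short` is made up for this example) to show it, and explains why `zip(names, names[1:])` produces neighbouring pairs:

```python
names = ['homer', 'marge', 'bart', 'lisa', 'maggie']
ages_short = [34, 32, 10]   # deliberately shorter, for illustration

pairs = list(zip(names, ages_short))
print(pairs)                # only three pairs; 'lisa' and 'maggie' are dropped

# names[1:] is one item shorter than names, so zip(names, names[1:])
# pairs each name with the one that follows it and stops before the end
neighbours = list(zip(names, names[1:]))
print(len(names), len(neighbours))
```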

**Exercise** Given the list below, square every number. In a loop, print the number and the square.

In [None]:
%%postcell exercise_025_100_j

numbers = [23, 43, 67, 1, 7, 9, 3]

#Type loop code here

#### `break`

In the examples above, you processed each item in the list. However, there are times when you want to exit your loops early. For example, if you were confirming the existence of an element in a list, once you found it, there is no reason to continue!

**Example** Is a given person (here, `"ned"`) in the list below?

In [None]:
person_to_find = "ned"
some_list = ['homer', 'marge', 'bart', 'lisa', 'maggie']

found_her = False
for name in some_list:
    if name == person_to_find: 
        found_her = True
        break

print(person_to_find, "was found:", found_her)

Note that there are better ways of finding elements in a list, such as the `index` method.
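As a sketch of those alternatives: the `in` operator answers the yes/no question directly, and `index` returns the position of the first match. Since `index` raises a `ValueError` when the item is missing, it is usually guarded with an `in` check first. The variable `position` is made up for this example:

```python
some_list = ['homer', 'marge', 'bart', 'lisa', 'maggie']

print("lisa" in some_list)      # membership test, no explicit loop needed
print("ned" in some_list)

print(some_list.index("lisa"))  # position of the first match

# index raises ValueError when the item is missing, so guard the call
person_to_find = "ned"
if person_to_find in some_list:
    position = some_list.index(person_to_find)
else:
    position = None
print(position)
```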

If you are using multiple loops, the `break` statement only exits the innermost loop which contains the `break` statement

In [None]:
outer_list = [1,2,3,4,5]
inner_list = ['a', 'b', 'c', 'd', 'e']
for o in outer_list:
    for i in inner_list:
        print(o, i)
        break

#### `continue`

While `break` provides a way to exit out of the current loop, `continue` lets you skip the rest of the loop and continue on to the next iteration. We will see motivation for `continue` in an assignment.

In [None]:
some_list = ['homer', 'marge', 'bart', 'lisa', 'maggie']

found_her = False
for name in some_list:
    if name == 'lisa': 
        found_her = True
        continue
        print("glad we found her!")
    else:
        print(f"Didn't find her yet, found {name} instead")

found_her
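A common use of `continue` is to skip items you don't want to process while still looping over everything else. In this sketch the `readings` list is invented, and `-1` is assumed to mean "missing value":

```python
readings = [12, -1, 7, -1, 30]   # made-up values; -1 stands for "missing"

valid_total = 0
skipped = 0
for r in readings:
    if r < 0:
        skipped = skipped + 1
        continue                   # jump straight to the next reading
    valid_total = valid_total + r  # only reached for valid readings

print(valid_total, skipped)
```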

A more interesting and realistic example of `continue`

In [None]:
data_file_location = "../../datasets/deaths-in-gameofthrones/game-of-thrones-deaths-data.csv"

jon = 0 #variable containing Jon's score
arya = 0 #variable containing Arya's score

with open(data_file_location, 'r', encoding='utf8') as file:
    #Go through each line in file
    for i, line in enumerate(file):
        if i == 0: continue #<== Skip the first line to avoid reading the column names
        tokens = line.split(',') #separate line into columns
        if tokens[4] == "Arya Stark": arya = arya + 1
        if tokens[4] == "Jon Snow": 
            jon = jon + 1
print("Arya killed", arya, "people")
print("Jon killed", jon, "people")

Try this in pythontutor.com:

In [None]:
outer_list = [1,2,3,4]
inner_list = ['a', 'b', 'c', 'd']
for o in outer_list:
    for i in inner_list:
        print(o, i)
        continue
        #break
        print(o, i, "again")

Although not as common in data science as they are in software engineering, `while` loops are also useful.

A `while` loop keeps running as long as its condition is true.

**Example** Run this loop as long as the guesses are below 10

In [None]:
a = 0
while a < 10:
    a = int(input())
    print("Just tried",str(a))
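The cell above waits for keyboard input. Here is a sketch of a `while` loop that runs without any input, useful when you don't know in advance how many iterations are needed. The starting balance and the 10% yearly growth are made-up numbers:

```python
balance = 100.0
years = 0
while balance < 200:
    balance = balance * 1.10   # assumed 10% growth per year, for illustration
    years = years + 1

print("Years needed to double:", years)
```

If the condition never becomes false, the loop never ends, so make sure something inside the loop moves it toward the exit.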

### Iterables

An iterable is a Python object closely related to loops and lists. Think of iterables as objects which can be looped over. You have seen that we can loop over lists and files. Lists and files are clearly two completely different things. However, they both support the concept of `next()`. A file can be thought of as an object which returns a new line every time the `next()` function is called. A list can do the same, through the iterator that a loop obtains from it. They don't have to represent the same concept, but they are similar because we can _iterate_ over them.

There are many other items in Python which one can iterate over. A good example is the widely used `range` function. See this example:

In [None]:
import sys

In [None]:
sys.getsizeof([1])

In [None]:
sys.getsizeof([1,2,3,4,5])

In [None]:
sys.getsizeof([1,2,3,4,5,6,7,8,9,10])

In [None]:
sys.getsizeof(range(100000000000000000000000000000000000000000000000000))

It looks as if the `range` function is creating an absolutely giant list! That should blow up our computer, yet its size in memory is only 48 bytes.

That's because `range()` doesn't actually create a giant list. It only creates a small object which produces the numbers one at a time. Every time the loop asks it for the next value, it takes the current number, adds one to it and returns that number. No need to maintain a giant list!

This is also why you can loop through a file which is 10 times larger than your memory. The for loop isn't actually creating a list, then iterating through it. It is reading the data _on demand_, or in a _lazy_ fashion.

Note that the `next()` function I've mentioned a few times works by calling a method named `__next__` on the object. That method is generally not supposed to be called directly by human programmers. It is used by Python internally, for example by loops.
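You rarely need to do this by hand, but the sketch below shows roughly what a `for` loop does behind the scenes: it asks the object for an iterator with the built-in `iter()`, then calls `next()` until a `StopIteration` signals the end. The variable names are made up for this example:

```python
numbers = [10, 20, 30]
it = iter(numbers)      # ask the list for an iterator
print(next(it))         # 10
print(next(it))         # 20
print(next(it))         # 30

try:
    next(it)            # nothing left to return
except StopIteration:
    print("iterator is exhausted")

# range works the same way: the numbers are produced one at a time
r = iter(range(3))
print(next(r), next(r), next(r))
```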

### Conditional statements: if/else

Even non-techy folks seem to know about 'if/else' statements. Let's see how to use them in Python.

In [None]:
trouble_maker = 'bart'

simpsons = ['homer', 'marge', 'bart', 'lisa', 'maggie']

for s in simpsons:
    print("Hello", s.capitalize(), ", welcome to our establishment!")
    if s == trouble_maker:
        print("...please behave")

Sometimes you want to use an explicit `else` clause:

In [None]:
gpa = 3.5

if gpa > 3.0:
    print("Thank you for being a good student")
else:
    print("If you need help, please ask. I won't mind")

In some other languages, you need to nest `if`/`else` clauses for more complicated logic. The same style also works in Python:

In [None]:
gpa = 2.0

if gpa > 0 and gpa < 2:
    print("I'm afraid we need to talk")
else: 
    if gpa >= 2 and gpa < 3:  #<===== What if GPA was 2.0, but we had used '>' instead of '>='?
        print("If you need help, please ask. I won't mind")
    else: #<===== What if GPA is negative?
        print("Thank you for being a good student")

The Python way, using `elif`:

In [None]:
gpa = 2.0

if gpa > 0 and gpa < 2:
    print("I'm afraid we need to talk")
elif gpa >= 2 and gpa < 3:  #<===== What if GPA was 2.0, but we had used '>' instead of '>='?
    print("If you need help, please ask. I won't mind")
else: #<===== What if GPA is negative?
    print("Thank you for being a good student")

The general syntax of a conditional is:

```python
if BOOLEAN_VALUE:
  Execute if BOOLEAN_VALUE is true
elif BOOLEAN_VALUE2:
  Execute if BOOLEAN_VALUE2 is true
elif ...
else:
  Execute if none of the above boolean values was true
```

![](images/ifelse_diagram.png)

* Keep in mind that `elif` and `else` statements are optional
* Indentation is necessary
* You can execute as many statements after an if/elif/else clause as you like
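Because the branches are checked from top to bottom and only the first matching one runs, each `elif` can rely on the earlier tests having failed. The sketch below is a variation on the GPA example, not a replacement for it: it adds an explicit branch for negative values and tries several sample GPAs (all chosen for illustration) so that every branch is exercised:

```python
results = []
for gpa in [-1.0, 1.5, 2.0, 3.5]:   # sample values, chosen to hit each branch
    if gpa < 0:
        message = "invalid GPA"
    elif gpa < 2:                   # reached only when gpa >= 0
        message = "I'm afraid we need to talk"
    elif gpa < 3:                   # reached only when gpa >= 2
        message = "If you need help, please ask. I won't mind"
    else:                           # reached only when gpa >= 3
        message = "Thank you for being a good student"
    results.append(message)
    print(gpa, "->", message)
```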

**Exercise** Go through the list below, print every number which has a square above 50

In [None]:
%%postcell exercise_025_100_k

numbers = [23, 43, 67, 1, 7, 9, 3]

#Type loop and if/else code here

### Truthiness

In `if` conditions and loops, `True` and `False` (or functions which produce them) are not the only valid conditions. Python allows you to use other objects which make the code very convenient.

As a general rule, any 'zero' values, empty containers (including empty strings) and `None` are treated as `False` in `if` conditions and loops. Everything else is treated as `True`.

In [None]:
emptylist = []
notemptylist = [1,2,3]

In [None]:
type(emptylist)

In [None]:
if len(emptylist) > 0:
    print('this list is not empty')
else:
    print('this list is empty')

In [None]:
if emptylist: 
    print('this list is not empty')
else:
    print('this list is empty')

In [None]:
homer_bank_balance = 0

In [None]:
type(homer_bank_balance)

In [None]:
if homer_bank_balance:
    print('Homer is not yet broke')
else:
    print('Homer is broke')

In [None]:
filecontents = ''

In [None]:
if filecontents:
    print('File is not empty')
else:
    print('File is empty')
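The built-in `bool()` function shows how Python would treat any value in a condition. A quick sketch that checks a handful of values at once; the `candidates` list is made up for this example:

```python
candidates = [0, 0.0, '', [], {}, (), set(), None, False,   # all treated as False
              1, -1, 'a', ' ', [0], {'k': 0}]               # all treated as True

for value in candidates:
    print(repr(value), '->', bool(value))
```

Note that a string containing only a space and a list containing only a zero are both truthy: what matters is whether the container is empty, not what is inside it.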

**Exercise** Find all instances of a conditional statement in notebook "first_programs"

**Homework**
1. Please list all locations in the file game-of-thrones-deaths-data.csv
2. Please list all allegiances in the same file (second to last column)
3. Please show the number of killings per season.
4. Print out this cheatsheet and keep it handy: https://perso.limsi.fr/pointal/_media/python:cours:mementopython3-english.pdf
5. Bookmark this page and start to go through it (you may ignore, for now, things you don't understand): https://learnxinyminutes.com/docs/python3/


**Reference**

Necklace image retrieved from https://www.designsbyleigha.com/name-necklace-snake.html

Diagram editor: https://mermaidjs.github.io/mermaid-live-editor