<a href="https://colab.research.google.com/github/franzruch/rrf24_training_ulli/blob/main/1-foundations/1-types-and-syntax/foundations-s1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DEC Foundations to Python - Session 1
# Variable types and Python syntax



# S0.0 - Introduction

## S0.1 - The building blocks of Python

#### The Atoms of Python
In this session we will cover the basic building blocks that everything in Python is made of. **Think of these building blocks as the atoms of Python**.

There are 5 types of atoms in Python that you are likely to interact with as a data scientist.

<img src=https://upload.wikimedia.org/wikipedia/commons/6/6f/Stylised_atom_with_three_Bohr_model_orbits_and_stylised_nucleus.svg width="200">

#### The Containers of Atoms of Python
To organize and give structure to these atoms, there are 4 types of containers they can be stored in. Without these, Python would just be a soup of atoms. Think of these as the chemical bonds between atoms in Python.

#### The Molecules of Python
These atoms and containers are combined into something called objects. **We can think of these objects as molecules**.

Objects can be simple or very complex. But no matter how complex an object you encounter in Python is, it can always be traced back to a combination of the atoms and their containers.

With these atoms and molecules, we can make everything from databases and machine learning algorithms to natural language processing projects, or whatever else you end up using Python for.

<img src=https://upload.wikimedia.org/wikipedia/commons/e/e8/Sucrose_molecule_3d_model.png width="300">


## S0.2 - Do I really need to care about these building blocks?

Maybe you are thinking right now:

"**_I am not super-techy and never intend to develop my own custom data structures. I just want to use Python for some cool data science. So why is this person talking about the inner fabric of Python and CHEMISTRY!?!_**".

If you load some data into a dataset in Python, it will be stored in an object. While you do not need to understand the chemical make-up of an object to use it, the object will expect you to provide inputs as atoms and containers when you want to modify it, analyze the data in it, etc.

And after almost any operation, it returns the result as a Python atom or a container of atoms that you need to know how to identify and handle. But identifying and handling these atoms and containers does not require you to be an expert in them.

---

#### The scope of this session
This session will show how to identify the 5 atoms and the 4 containers, and the basics of how to interact with them. But we will spend most of the time on 3 of the atoms and 2 of the containers.

After this session you will know enough to be able to use them in relation to other objects in Python.

In the following sessions, you will see how we keep coming back to these basic types when we interact with Python molecules developed for us by other users.

---

## S0.3 - Google Colab

<img src=https://miro.medium.com/max/986/1*pimj8lXWwZnqLs2xVCV2Aw.png width="500">

Click this link to open the file you are currently viewing in Google Colab: https://colab.research.google.com/github/worldbank/dec-python-course/blob/main/1-foundations/1-types-and-syntax/foundations-s1.ipynb.

This will open an exact copy of this file in Google Colab. Since it is a copy, you can make edits in it without affecting anyone else's file. Throughout this course, we expect you to always open the file for each session in Colab and follow along.

### What is Colab?

* It's like Google Docs but for Python code
* Requires no installation of Python itself or of common libraries (add-ons)
* Runs on a Google server. Any files you save in Colab are saved in your Google Drive, so it is a very bad place for sensitive data
* Unfortunately, you need to be logged in to a Google account to run code on Google Colab.

**Do not use Google Colab for any non-public data**

---

### How to run code in Colab

Jupyter Notebook and Colab are organized in cells. A cell can either be code or text. The only purpose of text cells is to provide information to a human reader. This information can be a few comments on the code, or a full research paper. You can format this text using [markdown](https://commonmark.org/help).

Code cells are where you write your Python code. Next to each code cell there is a play button. You can run the code by either clicking the 'play' icon or selecting the cell and pressing `CTRL-ENTER` on your keyboard.

Try running the cell below that says `2 + 2`.

In [4]:
2 + 2

4

### What to use for non-public data?

There are alternatives to Google Colab that organize text and Python code in blocks. These are called notebooks.

**Jupyter Notebooks**. The most common tool to run Python code in notebooks on your own computer is called _Jupyter Notebooks_. You can install _Jupyter Notebooks_ on your computer and read data and other files directly from your computer, so that no files need to be shared over a server owned by Google.

On WB computers, the consensus seems to be that the easiest way to install and use Python is by requesting ITS to install Anaconda (https://www.anaconda.com/) for you or by installing Anaconda from the software center app.

**Databricks**. If you want to have a collaborative space in the cloud that is still approved at the WB for non-public data, you can use Databricks. Databricks is also a notebook-styled Python interface. An instance in Databricks can be made more computationally powerful than what you will ever need.

In the following sessions you will be given the option to open the sessions in a WB-hosted Databricks session where you can share non-public data.

# S1.0 - Variables

So atoms, containers and objects are the types of data or information we can have. But we need a way to identify each piece of data or information. For this we use variables.

All variables consist of three things:

* The name of the variable so it can be uniquely identified and accessed
* The "*type*" of the information stored (atom, container or object)
* The information/data that the variable holds

Variables are only stored in temporary memory,
so when restarting Python,
you need to recreate them by running your code again.

(**For Stata users:** In Stata,
"variable" always means a column in a dataset.
Variables in Python behave more like a `local` in Stata.)

In [5]:
# Create two variables named text_variable and number_variable
text_variable = 'Hello World'
number_variable = 42

In [6]:
# Access the variables text_variable and number_variable
# and then print the information they store
print(text_variable)
print(number_variable)

Hello World
42
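As a small sketch of the three parts of a variable, the cell below uses a hypothetical variable named `answer`: its name is what we type, `type()` reveals its type, and printing it shows the stored data. Assigning a new value to the same name replaces both the data and the type.

```python
# The name of this variable is answer (an illustrative name)
answer = 42
print(type(answer))  # the type: <class 'int'>
print(answer)        # the data: 42

# Assigning a new value to the same name replaces both the data and the type
answer = 'forty-two'
print(type(answer))  # <class 'str'>
print(answer)        # forty-two
```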


Where is a variable saved? Variables are only stored in RAM (working memory). RAM is very fast, but it is cleared each time you restart Python, so this memory is only for work-in-progress variables.

If you need to keep the data in a variable, you need to save it to a file on disk. This allows the data to be accessed by other programs, and the data will still be there the next time you start Python.

This is the same whether you use Google Colab on a Google server, Jupyter Notebooks on your computer, or Databricks on a World Bank server. We will cover how to save to disk later.

# S2.0 The basic data types

These 5 basic types are the only basic types you are ever likely to use:

| Class name | Full name      | Name used       | Usage                             |
|:---        |:---            |:---             | :---                              |
| int        | Integer        | "int"/"integer" | Number without decimal point      |
| float      | Floating point | "float"         | Number with decimal point         |
| str        | String         | "string"        | Text                              |
| bool       | Boolean        | "boolean"       | Either `True` or `False`          |
| NoneType   | None           | "none"          | An explicit way of saying nothing |

Any information in Python you will ever interact with is a combination of these types.
This is similar to how tiny, simple atoms in real life
can be combined into the most wonderfully complex life forms.
This is why, in this training, we refer to
**the basic data types as the _atoms of Python_.**

## S2.1 Numeric variables

**Define a numeric variable:**

In [7]:
# Assign the value 6 to a variable we name x
x = 6

Now somewhere in memory there is a variable with the name `x` that currently stores the value 6.

We can reference this variable until we explicitly delete it or restart our Python session.

In [8]:
# We can output the value by writing the variable name on the last line of a cell
x

6

**Ex. 1a:** (example exercise - do together)

In [12]:
# Create a variable called ex1_x and set it to the value 5

ex1_x = 5

# === Do not modify code below ===
# This checks that the variable exists and has the expected value.

assert ex1_x == 5

**Do math using a variable:**

In [13]:
# Take the value in x and output that value plus 1
x + 1

7

In [14]:
# The value of x is still 6
x

6

In [15]:
# To update the variable x we need to overwrite it with a new value
# Assign x + 1 to x and output it
x = x + 1  # NOTE: this OVERWRITES the variable x
x

7

Note that we can only output a variable this way if it is by itself on the last line of a cell. We will soon learn how to _print_ a variable, which works anywhere in a cell and gives us more options.

---
**Important error message: NameError**

Whenever you see an error where it says "not defined", as in `NameError: name 'z' is not defined`, then it means that you have tried to reference a variable `z` but that there is no variable with that name.

In [16]:
x = z + 4

NameError: name 'z' is not defined

**More math and using multiple variables:**

In [17]:
# Reset x to 6
x = 6

In [18]:
# Define a second variable - this time with a longer name
my_long_variable_name = 2

In [19]:
# Adding two variables together
x + my_long_variable_name

8

In [20]:
# Subtracting my_long_variable_name from x
x - my_long_variable_name

4

In [22]:
# Multiplying x with my_long_variable_name
x * my_long_variable_name


12


Here is a table of the most common mathematical operators:

| Symbol | Operation      | Example                    | Definition |
|:---:   |:---            |:---                        |:---        |
| `+`    | Addition       | `6 + 2 = 8`                | |
| `-`    | Subtraction    | `6 - 2 = 4`                | |
| `*`    | Multiplication | `6 * 2 = 12`               | |
| `/`    | Division       | `6 / 2 = 3.0`              | Always returns a float |
| `**`   | Power of       | `6 ** 2 = 36`              | |
| `%`    | Modulus        | `6 % 2 = 0`, `6 % 4 = 2`   | Remainder of a division |
| `//`   | Floor division | `6 // 2 = 3`, `6 // 4 = 1` | Division result rounded down to the nearest integer |

See full list of mathematical operators here: https://www.w3schools.com/python/python_operators.asp
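To try the operators from the table yourself, here is a minimal sketch; `a` and `b` are just illustrative names. The last two lines show a detail the table does not: floor division rounds *down*, so with negative numbers the result differs from simply dropping the decimals.

```python
a = 6
b = 4

print(a + b)    # 10
print(a - b)    # 2
print(a * b)    # 24
print(a / b)    # 1.5 - division always returns a float
print(a ** 2)   # 36
print(a % b)    # 2   - 6 = 1*4 + 2, so the remainder is 2
print(a // b)   # 1   - 1.5 rounded down

# Floor division rounds DOWN, which matters for negative numbers
print(-7 // 2)  # -4  - -3.5 rounded down is -4, not -3
print(-7 % 2)   # 1   - because -7 = -4*2 + 1
```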

---

If we want to save the result of a mathematical operation we need to store it in a variable. Either in a new variable or by overwriting an existing one.

Only the variable to the left of the assignment operator `=` is modified. If there is no `=`, a mathematical operation does not modify any variable.

In [23]:
# Create a new variable that is x multiplied by my_long_variable_name
y = x * my_long_variable_name

# Create a new variable that is the sum of x and my_long_variable_name
z = x + my_long_variable_name

If we want to print multiple variables in the same cell, we need to use `print()`.

In [24]:
# Print the variables one at a time
print(x)
print(my_long_variable_name)
print(y)
print(z)

6
2
12
8


In [25]:
# Only the last line is outputted
x
my_long_variable_name
y
z

8

In [26]:
# Print all variables on the same line
print(x, my_long_variable_name, y, z)

6 2 12 8


In [27]:
# You can also print the results of an operation
print(12 * 89)
print(y - 20)

1068
-8


In [28]:
# You can combine printing and output
print(y - 20)
5 ** 3

-8


125

Since incrementing a variable by a value, as in `x = x + 1`, is such a common action, there is a shorthand for it: `x += 1`.

See what other operators you can use like this here: https://www.w3schools.com/python/python_operators.asp

In [29]:
count_matches = 5
count_matches += 2 # This is identical to: "count_matches = count_matches + 2"
print(count_matches)

7
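The same shorthand works with the other arithmetic operators. A short sketch, reusing the name `count_matches` with a new starting value:

```python
count_matches = 10
count_matches -= 3   # same as: count_matches = count_matches - 3  -> 7
count_matches *= 2   # same as: count_matches = count_matches * 2  -> 14
count_matches /= 4   # same as: count_matches = count_matches / 4  -> 3.5 (a float)
print(count_matches)
```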


**Ex. 2a** (do ex 2a, 2b and 2c independently ~ 5 min)

In [30]:
# Create two variables ex2_x and ex2_y. Set ex2_x to 3 and ex2_y to 5.

### ADD YOUR CODE HERE
ex2_x = 3
ex2_y = 5

# === Do not modify code below ===
assert ex2_x == 3 and ex2_y == 5

**Ex. 2b**

In [33]:
# Multiply ex2_x with ex2_y and save the result in a new variable ex2_z

### ADD YOUR CODE HERE
ex2_z = ex2_x * ex2_y

# === Do not modify code below ===
assert ex2_z == 15

**Ex. 2c**

In [34]:
# Update the variable ex2_z by subtracting ex2_x from it
# (Hint: re-run the cells above if/when needed)

ex2_z -= ex2_x

# === Do not modify code below ===
assert ex2_z == 12

## S2.2 Two basic types of numeric variables

There are two basic numeric data types:

| Class name | Full name      | Name used       | Usage                        |
|:---        |:---            |:---             | :---                         |
| int        | Integer        | "int"/"integer" | Number without decimal point |
| float      | Floating point | "float"         | Number with decimal point    |


`int` stores whole numbers exactly (however large) but cannot store decimals.
Python will pick `int` for you
unless your variable must be a `float` to store your data without information loss.

Read more about `int` and `float` here: https://www.w3schools.com/python/python_numbers.asp
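A quick sketch of one practical difference between the two types: an `int` stores whole numbers exactly, however large they get, while a `float` stores decimals with limited precision, so small rounding errors can appear. (`big` is just an illustrative name.)

```python
# int: whole numbers are stored exactly, however large they get
big = 2 ** 100
print(big)               # 1267650600228229401496703205376
print(type(big))         # <class 'int'>

# float: decimals are stored with limited precision
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```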

---

You can test which type your numeric variable is using `type()`:

In [35]:
# Numeric variables assigned a number WITHOUT decimal point are created as an int
x = 3
type(x)

int

In [36]:
# Numeric variables assigned a number WITH a decimal point are created as a float
pi = 3.141592
print(type(pi))

<class 'float'>


In [37]:
# Python automatically assigns the appropriate type
diameter = 10
print(diameter, type(diameter))

# The result of division with / is always a float
radius = diameter / 2
print(radius, type(radius))

10 <class 'int'>
5.0 <class 'float'>


In [None]:
# The result of an arithmetic operation with a float and an int is always a float
radius = 5
print(radius, type(radius))

circumference = 2 * pi * radius  # pi (a float) was defined in an earlier cell
print(circumference, type(circumference))

In [38]:
# You can force a float to be an int with int() - it drops the decimals (truncates towards zero)
# NOTE: This leads to information loss about the decimal points
y = int(7.25)
print(y, type(y))

7 <class 'int'>


In [None]:
# This is not rounding: int() just drops the decimal part of the float.
# Rounding exists, but it is not what int() does
z = int(7.99999)
print(z, type(z))
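If you actually want rounding, Python's built-in `round()` does that, and `math.floor()` from the standard library always rounds down. A small sketch comparing them with `int()`:

```python
import math

print(int(7.99999))       # 7  - int() drops the decimals
print(round(7.99999))     # 8  - round() rounds to the nearest integer
print(int(-7.25))         # -7 - int() truncates towards zero...
print(math.floor(-7.25))  # -8 - ...while math.floor() always rounds down
```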

In [39]:
# Python changes the type if needed
salary = 17
print(salary, type(salary))

# Increase for inflation
salary = salary * 1.05
print(salary, type(salary))

17 <class 'int'>
17.85 <class 'float'>


**Ex. 3a** (do ex 3a, 3b and 3c independently ~ 5 min)

_Hint:_ Mathematical operators: https://www.w3schools.com/python/python_operators.asp

In [40]:
# Create a variable ex3_x that is 13 to the power of 12

### ADD YOUR CODE HERE

ex3_x = 13**12

# === Do not modify code below ===
assert ex3_x == 23298085122481

**Ex. 3b**

In [41]:
# Create a variable ex3_y that is
# the remainder when dividing ex3_x by 17

ex3_y = ex3_x%17

# === Do not modify code below ===
assert ex3_y == 1

**Ex. 3c**

In [43]:
# Create a variable ex3_z that is a float with the value three
# (The solution has not been mentioned explicitly)

### ADD YOUR CODE HERE

ex3_z = 3.0
print(type(ex3_z))

# === Do not modify code below ===
assert hash(ex3_z) == 3 and type(ex3_z) is float

<class 'float'>


## S2.3 Text variables - basic type: string



There is only one basic data type for text, and it is called "string".

| Class name | Full name      | Name used       | Usage                        |
|:---        |:---            |:---             | :---                         |
| str        | String         | "string"        | Text                         |

The text in a string could be anything from a single letter or word, to a full-length text like an essay.

---

**Define a string variable**

In [44]:
# Assign the text Hello world! to both variables a and b

# We can use either " or ' to mark where the text starts and ends, so Python does not confuse it with code
a = "Hello world!"
b = 'Hello world!'

print(a, type(a))
print(b, type(b))

Hello world! <class 'str'>
Hello world! <class 'str'>


We must use the same kind of quote (`""` or `''`) at the start and end of a string; we cannot mix them. It only rarely matters which one we use.

In [45]:
# We can use "" when the text includes one or more '
a = "Strings are Python's way to store text"

# We can use '' when the text includes one or more "
b = 'Python is the "bestest" programming language'

print(a, type(a))
print(b, type(b))

Strings are Python's way to store text <class 'str'>
Python is the "bestest" programming language <class 'str'>


**Simple string operations:**

Some math operators work on strings as well

In [46]:
a = 'hello'
b = 'world'

# Addition and multiplication work on strings (but not subtraction and division)
c = a + ' ' + b + '!'
d = a * 3

print(c, type(c))
print(d, type(d))

hello world! <class 'str'>
hellohellohello <class 'str'>


**Ex. 4a** (do ex 4a, 4b independently ~ 10 min)

In [50]:
# Create a variable ex4_x that is a string with the word World
# and a variable ex4_y that is a string with the word Bank

### ADD YOUR CODE HERE
ex4_x = 'World'
ex4_y = 'Bank'

# === Do not modify code below ===
assert ex4_x == 'World' and ex4_y == 'Bank'

**Ex. 4b**

In [52]:
# Create a variable ex4_z that uses ex4_x and ex4_y
# to create the string World Bank

### ADD YOUR CODE HERE
ex4_z = ex4_x + ' ' + ex4_y

print(ex4_z)

# === Do not modify code below ===
assert ex4_z == 'World Bank'

World Bank


## S2.4 Methods

Atoms, containers and objects can all hold different types of information.
But Python wouldn't be that useful if we cannot do anything to the information they hold.

Each type of atom, container and object comes with actions designed for that type.
Such **actions specific to a type are called _methods_**.

_All variables consist of three things:_

* The name of the variable so it can be uniquely identified and accessed
* The "*type*" - what type of atom, container or object
    * What data that type can store
    * What methods (if any) that this type comes with
* The information/data that the variable actually stores

A method is applied to the data in a variable like this: `x.method()`.
A method can be something simple, like making a `str` upper case,
or something as advanced as training a machine learning model.

---

**String methods:**

Strings are the Python atom with by far the most useful methods: changing to upper/lower case, replacing letters, removing extra spaces, etc.

You can read about all string methods here: https://www.w3schools.com/python/python_ref_string.asp

In [53]:
# Define a new string
a = 'Hello world!'
print(a, type(a))

Hello world! <class 'str'>


In [54]:
# Print the result of the method directly
print(a.upper())

HELLO WORLD!


In [55]:
# Store the results of upper() in new variable and then print
a_upper = a.upper()
print(a_upper,type(a_upper))

HELLO WORLD! <class 'str'>


In [57]:
# Lower case
a.lower()
print(a,type(a)) # Why is there still a capital "H"? Because a.lower() returns a new string; it does not change a

Hello world! <class 'str'>


Some methods take arguments in their parentheses:

In [61]:
# Replace letters in a string
a_all_i = a.replace('o', 'i')
a_one_i = a.replace('o', 'i', 1)

print('a_all_i:',a_all_i,type(a_all_i))
print('a_one_i:',a_one_i,type(a_one_i))

a_all_i: Helli wirld! <class 'str'>
a_one_i: Helli world! <class 'str'>
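The introduction to string methods also mentions removing extra spaces. Here is a minimal sketch with `strip()`, plus `title()` and `count()` from the same list of string methods; `messy` is just an illustrative name. Methods that return a string can be chained one after the other.

```python
messy = '   hello world   '

print(messy.strip())          # hello world  - spaces at both ends removed
print(messy.strip().title())  # Hello World  - methods can be chained
print(messy.count('o'))       # 2            - number of times 'o' appears
```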


---

Methods are different from the operators (`+`, `-`) that we used with numbers.
In addition to methods, each type can support some of these operators.

`int` and `float` have only a few methods, which you will rarely need; you mostly work with them through operators.
`str` has many methods and supports some operators.

When you are concatenating (combining) strings, you can use the `+` operator.
Numbers must first be converted to a string with `str()`, as with `str(age)` below.


In [62]:
name = "Frodo Baggins"
age = 51

In [63]:
str_concat = "His name is " + name + " and his age is " + str(age) + "."
print(str_concat)

His name is Frodo Baggins and his age is 51.


---

We can also use the `.format()` method to achieve this.

In this example we have the string `"His name is {} and his age is {}."`
and we are using the `.format()` method to populate the two `{}` placeholders.

This method converts the `int` into a `str` for you,
so you do not have to call `str()` yourself.

In [64]:
str_format = "His name is {} and his age is {}.".format(name,age)
print(str_format)

His name is Frodo Baggins and his age is 51.


---

The `.format()` method works for shorter strings, but for longer strings and
paragraphs of text where only a few words should be dynamically populated,
the better option is an `f-string`.

We won't cover f-strings in detail,
but you should be able to recognize one: a string with an `f` before the opening quote and variable names inside the `{}` placeholders.

In [65]:
str_fstr = f"His name is {name} and his age is {age}."
print(str_fstr)

His name is Frodo Baggins and his age is 51.
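As a quick check, the three approaches covered here (the `+` operator, `.format()` and an f-string) build exactly the same string:

```python
name = "Frodo Baggins"
age = 51

str_concat = "His name is " + name + " and his age is " + str(age) + "."
str_format = "His name is {} and his age is {}.".format(name, age)
str_fstr = f"His name is {name} and his age is {age}."

# All three approaches produce identical strings
print(str_concat == str_format == str_fstr)  # True
```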


**Important error message: AttributeError**

Whenever you see an error where it says "has no attribute", as in `AttributeError: 'int' object has no attribute 'upper'`, then it means that the type `int` does not have a method or attribute called `upper`.

An attribute is similar to a method, but attributes only return some metadata about a variable and cannot change the data in the variable.

If you get an `AttributeError`, check whether you have misspelled the method/attribute or whether the variable is of a different type than you expected. Below we get this error because we are using a `str` method on an `int` variable.

In [None]:
x = 4
x = x.upper()

**Ex. 5a** (do ex 5a and 5b independently ~ 10 min)

In [66]:
# Use a string method on the variable p already provided,
# to create a variable ex5_x with the string "PYTHON"

p = 'Python'

### ADD YOUR CODE HERE
ex5_x = p.upper()

# === Do not modify code below ===
assert ex5_x == 'PYTHON'

**Ex. 5b**

_Hint_: https://www.w3schools.com/python/python_ref_string.asp

In [68]:
# Use the variable ex5_x from ex 5a.
# Use a method to create the string "Python"
# (The solution has not been mentioned explicitly)

### ADD YOUR CODE HERE

ex5_y = ex5_x.lower().replace('p','P')
ex5_y2 = ex5_x.capitalize()
print(ex5_y2)

# === Do not modify code below ===
assert ex5_y == 'Python'

Python


## S2.5 True/False variables - basic type: boolean

| Class name | Full name      | Name used       | Usage                        |
|:---        |:---            |:---             | :---                         |
| bool       | Boolean        | "boolean"       | Either true or false         |

Booleans are another atom in Python. Session 2 will discuss how they are commonly used. This session only covers how to identify them, as you will see them often in Python.

### Examples of usage of booleans

* **Method responses**: So far we have only used string methods that return new strings, such as `upper()`, `lower()` and `replace()`. Many methods return booleans instead, such as `isnumeric()`, `islower()` etc.
* **If-conditions**: Booleans are excellent for controlling if-else conditions.

---



In [69]:
# Create two booleans
a = True
b = False

# Print the variables
print('Variable a:', a, type(a))
print('Variable b:', b, type(b))

Variable a: True <class 'bool'>
Variable b: False <class 'bool'>


In [70]:
# Get a boolean from a method

# Create a variable that is a string of a number
c = "42"
d = c.isnumeric()

# Print the variables
print('Variable c:', c, type(c))
print('Variable d:', d, type(d))

Variable c: 42 <class 'str'>
Variable d: True <class 'bool'>
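A few more string methods that return booleans, as a small sketch. Note that `isnumeric()` checks whether every character is a numeric character, so a string containing a decimal point returns `False`.

```python
print("hello".islower())          # True
print("Hello".islower())          # False
print("3.5".isnumeric())          # False - "." is not a numeric character
print("Python".startswith("Py"))  # True
```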


You can also generate booleans by using comparison operators:

In [71]:
3 > 1

True

In [72]:
3 < 2

False

In [73]:
x = 10
x == 10 # note the double equal sign, denoting a comparison

True

In [74]:
y = 12
y != 12 # != means "not equal to"

False
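The other comparison operators work the same way. A short sketch with `>=` and `<=`, also showing that `==` works on strings, where the comparison is case-sensitive:

```python
x = 10
print(x >= 10)         # True  - greater than or equal to
print(x <= 9)          # False - less than or equal to
print('abc' == 'abc')  # True
print('abc' == 'ABC')  # False - upper and lower case letters are different
```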

## S2.6 The None type - a variable exists but it contains nothing

| Class name | Full name      | Name used       | Usage                             |
|:---        |:---            |:---             | :---                              |
| NoneType   | None           | "none"          | An explicit way of saying nothing |

Sometimes we want a variable to exist even when there is no information to store in it.
Assigning `None` to it prevents a `NameError` caused by the variable not existing.

In [75]:
name = "Frodo Baggins"
age = 51

print(f"{name} (age {age}) is employed by {employer}")

NameError: name 'employer' is not defined

In [76]:
# Frodo is unemployed
employer = None
print("employer",employer,type(employer))
print(f"{name} (age {age}) is employed by {employer}")

# Frodo gets a job at the World Bank
employer = "World Bank"
print('\n')
print("employer",employer,type(employer))
print(f"{name} (age {age}) is employed by {employer}")

employer None <class 'NoneType'>
Frodo Baggins (age 51) is employed by None


employer World Bank <class 'str'>
Frodo Baggins (age 51) is employed by World Bank
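To check whether a variable holds `None`, the convention in Python (recommended by the PEP 8 style guide) is to use `is None` rather than `== None`. A minimal sketch:

```python
employer = None
print(employer is None)  # True  - Frodo is unemployed

employer = "World Bank"
print(employer is None)  # False - Frodo has a job
```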


## S2.7 Basic data types summary

* The building blocks that all data in Python is made of
* Stored in a variable with a name and type
* Which operations (`+`, `-`, etc..) or methods (`.upper()`) you can use depends on the type

**Important errors**

| Error name         | Likely reason for the error |
|:---                |:---                     |
| **NameError**      | You have a typo when referencing a variable or you try to reference a variable before it is created |
| **AttributeError** | You have used a method or an attribute on a type where that method or attribute does not exist |



# S3.0 Functions and summary of operators and methods

## S3.1 Functions

Python has some built-in functions. Functions are like methods in that they perform an action. But while a method is attached to a variable, like `x.method()`, a function is used directly in your code and the variable is passed inside its parentheses, like `function(x)`.

You have already seen the functions `print()` and `type()`.
They accept any type of variable, but some functions only work on certain types.

In [77]:
#Define two strings and one int
str_a = "Frodo"
str_b = "Gandalf"
int_a = 51

# Start by printing them
print('str_a:',str_a,type(str_a))
print('str_b:',str_b,type(str_b))
print('int_a:',int_a,type(int_a))

str_a: Frodo <class 'str'>
str_b: Gandalf <class 'str'>
int_a: 51 <class 'int'>


The function `len()` is used to get the length of a variable.
For a string that is the number of characters.

In [78]:
print('Number of characters in',str_a,":", len(str_a))
print('Number of characters in',str_b,":", len(str_b))

Number of characters in Frodo : 5
Number of characters in Gandalf : 7


What do you think that the length of an `int` is?

In [79]:
print('The length of the int',int_a,"is:",len(int_a))

TypeError: object of type 'int' has no len()
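A `TypeError` like this means that a function was used on a type it does not support. If what you want is the number of digits in an `int`, one option, sketched below, is to convert it to a `str` first with `str()` and take the length of that (for a negative number, the `-` sign is counted too).

```python
int_a = 51

# Convert the int to a str first; len() then counts its characters
print(len(str(int_a)))  # 2
```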

## S3.2 Summary of the types of "actions" you can take on Python data

| Action name | Examples      | Description    | Documentation |
|:---         |:---           |:---            |:---           |
| Operator    | `+`,`-` etc.  | Only used for basic actions. Most common to use with numeric types but works in some cases on other types as well. | https://www.w3schools.com/python/python_operators.asp |
| Method      | `x.method()`  | Always specific to a type (though not necessarily unique to it). Used to interact with, modify or analyze the data in the variable. | Read the documentation for each type of atom, container or object. |
| Function    | `function(x)` | Some are built in, and it is common to write your own (tomorrow's session). | Built-in functions: https://www.w3schools.com/python/python_ref_functions.asp |

## S3.3 Where to find documentation for each type or object?

For example, what methods does a type have and what do they do?


In [80]:
# Basic information about a function
print?

In [81]:
# If you know the name of a method, you can use
str.upper?

In [82]:
# A not-so-user-friendly but complete way of listing all methods that a type has
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',


In [None]:
# Returns information of a class and all its methods into a single output
help(str)

**But in reality**, most people simply search online to read the documentation, for example: https://www.google.com/search?q=google+python+str+methods

Python is such a widely used language that there is always someone who has written a great guide for what you need to know, and google helps you find it.

# S4.0 - Container types

So far we have only covered the atoms of Python.
We have not yet introduced how you
combine the atoms into more useful molecules.

The basic data types `int`, `float`, `str`, `bool` and `None` can be combined in **the basic container types**.

| Class name | Full name  | Access                      | Occurrence  | Remarks |
|:---        |:---        | :---                        | :---        | :---    |
| list       | List       | Access items by order       | Common      | Since we access items by order, the order you add items to a list is important |
| dict       | Dictionary | Access items by key         | Common      | Since we access items by key names, the order is not important |
| tuple      | Tuple      | Access items by order       | Less common | Very similar to a list, but once created it cannot be modified |
| set        | Set        | Test if item already in set | Rare        | A container that cannot hold duplicates |

Containers can hold variables of the basic data types (atoms)
as well as other containers.
You can mix data types and container types if needed.
Complex variables in Python are created by
nesting many layers of containers.

`list`s and `dict`s -
we will cover lists and dictionaries properly
as you will create and use them a lot.

`tuple`s and `set`s -
Tuples are often returned from functions and methods
so we will cover how to use them.
We will only briefly cover sets.
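As a preview before each container gets its own section, here is a minimal sketch of how each of the four is created and accessed; all names and values are illustrative. The last lines show nesting: a list whose items are dictionaries.

```python
# list: access items by order (index), starting at 0
fruits = ['apple', 'banana', 'cherry']
print(fruits[0])          # apple

# dict: access items by key
person = {'name': 'Frodo Baggins', 'age': 51}
print(person['age'])      # 51

# tuple: like a list, but cannot be modified once created
point = (3, 4)
print(point[1])           # 4

# set: duplicates are dropped; mainly used to test membership
colors = {'red', 'green', 'red'}
print(len(colors))        # 2
print('green' in colors)  # True

# Containers can be nested: a list of dicts
people = [person, {'name': 'Gandalf', 'age': None}]
print(people[1]['name'])  # Gandalf
```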


## S4.1 Container types - Lists

| Class name | Full name  | Access                      | Occurrence  | Remarks |
|:---        |:---        | :---                        | :---        | :---    |
| list       | List       | Access items by order       | Common      | Since we access items by order, the order you add items to a list is important |

We can add variables to a list
at the time of creating the list
or we can add variables later.

---

**Create a list:**

In [85]:
# Create a list of ints
list_int = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list_int)
type(list_int)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


list

In [84]:
# Create a list of strings
list_str = ['a', 'b', 'c']
print(list_str)

['a', 'b', 'c']


In [86]:
# Create a mixed list
list_mix = [42, 'Arthur', False]
print(list_mix)

[42, 'Arthur', False]


In [87]:
# test the type of a list
type(list_mix)

list

**Access an item in a list**:

We access an item in a list by its position,
for example the 3rd item or the 7th item.

However, the position is given as an index,
and in Python (as in most programming languages) indexes start at 0, not 1.
So the item with index 1 is actually the second item in the list.
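
To make the 0-based rule concrete: for a list with `n` items, the valid indexes run from `0` to `n - 1`. Python also accepts negative indexes, which count from the end, so `-1` is the last item. A small sketch with an illustrative list:

```python
letters = ['a', 'b', 'c', 'd']
n = len(letters)         # 4 items, so valid indexes are 0, 1, 2, 3

print(letters[0])        # first item: prints a
print(letters[n - 1])    # last item by position (index 3): prints d
print(letters[-1])       # last item counted from the end: prints d
print(letters[-n])       # index -4 is the same item as index 0: prints a
```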

In [88]:
# Print list
print('List list_mix:', list_mix)

List list_mix: [42, 'Arthur', False]


In [89]:
# Print first item in the list
print('First item (index 0):', list_mix[0])
type(list_mix[0])

First item (index 0): 42


int

In [90]:
# Print second item in the list
print('Second item (index 1):', list_mix[1])

Second item (index 1): Arthur


In [91]:
# Print third item in the list
print('Third item (index 2):', list_mix[2])

Third item (index 2): False


In [92]:
# Access item in list and store in variable
name = list_mix[1]
print('Variable name:', name)

Variable name: Arthur


In [93]:
# Accessing items using the index does not modify the list
print('List list_mix:', list_mix)

List list_mix: [42, 'Arthur', False]


**Access multiple items in a list:**

In [95]:
# Re-create list of ints
list_int = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [96]:
# Get all items between the item with index 0
# up until but not including the item with index 3
# 0 ≤ index < 3
print(list_int[0:3])

[0, 1, 2]


In [97]:
# Index 0 is assumed if the first number is omitted
# So 0:3 is the same as :3
print(list_int[:3])

[0, 1, 2]


In [100]:
# 5 ≤ index < 7
print(list_int[5:7])

[5, 6]


In [101]:
# 8 ≤ index < len(list_int), i.e. up to the end of the list
# All remaining items are included if the second number is omitted
print(list_int[8:])

[8, 9]


In [102]:
# (number of items - 3) ≤ index < number of items
print(list_int[-3:])

[7, 8, 9]


In [103]:
# (number of items - 7) ≤ index < (number of items - 2)
print(list_int[-7:-2])
# 3 ≤ index < 8
print(list_int[3:8])

[3, 4, 5, 6, 7]
[3, 4, 5, 6, 7]
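
A slice is a new list. Changing the slice afterwards does not change the list it came from, and `[:]` (both numbers omitted) is a common way to copy the whole list. A minimal sketch:

```python
list_int = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

first_three = list_int[:3]    # new list [0, 1, 2]
first_three[0] = 100          # modify the slice only

print(first_three)            # [100, 1, 2]
print(list_int[:3])           # still [0, 1, 2]

copy_all = list_int[:]        # copy of the whole list
print(copy_all == list_int)   # True: same items
print(copy_all is list_int)   # False: a different list object
```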


**Important error message: IndexError**
    
Whenever you see an error that says "index out of range", as in `IndexError: list index out of range`, it means that you have tried to access an item using an index that does not exist in the list. For a list with 10 items, the valid indexes are 0 to 9 (or -10 to -1 counting from the end).

In [104]:
# IndexError: list index out of range
print(list_int)
print(list_int[10])

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


IndexError: list index out of range
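
Slicing behaves differently from single-item indexing here: a slice whose bounds go past the end of the list does not raise an `IndexError`, it simply returns the items that exist (possibly none). A short sketch:

```python
list_int = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(list_int[8:20])    # [8, 9]: the slice stops at the end of the list
print(list_int[10:])     # []: an empty list, no error
print(list_int[-1])      # 9: the last valid single index
```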

**Ex. 6a** (do ex 6a and 6b independently ~ 5 min)

In [None]:
# From the list digits, in one line of code,
# create the variable ex7_x with the list [0,1,2,3,4]

digits = [0,1,2,3,4,5,6,7,8,9]

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex7_x == [0,1,2,3,4]

**Ex. 6b**

In [None]:
# From the list digits, in one line of code,
# create the variable ex7_y with the list [5,6,7,8]

digits = [0,1,2,3,4,5,6,7,8,9]

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex7_y == [5,6,7,8]

**Edit a list:**

So far, every time we have created or changed a variable we have used `=`. For example `x = x + 1` or `name = list_mix[1]`.

Lists also have some _in-place_ methods, meaning methods that modify the list itself, without any `=`.

You can find more list methods here: https://www.w3schools.com/python/python_ref_list.asp

In [105]:
# Create a list of strs
pets = ['cat','dog']
print('Variable pets:', pets)

# Add one item to the list
# Important: append() modifies the list in-place
pets.append('gold fish')
print('Variable pets:', pets)

Variable pets: ['cat', 'dog']
Variable pets: ['cat', 'dog', 'gold fish']


Note that we did not write `pets = pets.append('gold fish')`. Since `.append()` modifies the list in-place and returns `None`, that line would replace the list stored in `pets` with `None`:

In [106]:
# Add another item to the list using in-place .append() and the "=" assign operator
pets_append_return = pets.append('butterfly')
print('Variable pets:', pets, type(pets))
print('Variable pets_append_return:', pets_append_return, type(pets_append_return))

Variable pets: ['cat', 'dog', 'gold fish', 'butterfly'] <class 'list'>
Variable pets_append_return: None <class 'NoneType'>
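
Because `.append()` changes the list in-place, it helps to know that `=` does not copy a list. After `same_list = shopping`, both names refer to the same list, so an in-place change made through one name is visible through the other. The standard list method `.copy()` gives an independent list instead. The names below are just for illustration:

```python
shopping = ['milk', 'eggs']
same_list = shopping            # no copy: both names refer to one list

same_list.append('bread')       # in-place change made through the second name
print(shopping)                 # ['milk', 'eggs', 'bread']

separate = shopping.copy()      # independent copy of the list
separate.append('tea')          # only changes the copy
print(shopping)                 # still ['milk', 'eggs', 'bread']
print(separate)                 # ['milk', 'eggs', 'bread', 'tea']
```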


We can also insert new items in lists with `.insert()`:

In [None]:
# Re-create the list of pets
pets = ['cat', 'dog', 'gold fish', 'butterfly']
print(pets)

# Print item with index 3 in original list
print('Print pet at index 3:', pets[3])

In [None]:
# Add item at index 2
pets.insert(2,'parrot')

# Print item with index 3 again
print('Print pet at index 3:', pets[3])

# Print all pets
print('Variable pets:', pets)

In [107]:
# Modify item with index 1
pets[1] = 'wolf'
print('Variable pets:', pets)

Variable pets: ['cat', 'wolf', 'parrot', 'gold fish', 'butterfly']


In [108]:
# Print initial list
print('Variable pets:', pets)

# Remove the item at index 3; .pop() also returns the removed item
pet_pop = pets.pop(3)
# Remove the first item equal to "cat"; modifies the list in-place
pets.remove("cat")

# Print results
print('Variable pet_pop:', pet_pop)
print('Variable pets:', pets)

Variable pets: ['cat', 'wolf', 'parrot', 'gold fish', 'butterfly']
Variable pet_pop: gold fish
Variable pets: ['wolf', 'parrot', 'butterfly']
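
Two details about these methods that often come in handy: `.pop()` without an index removes and returns the *last* item, and `.remove()` only deletes the *first* item equal to the value you give it (if the value is not in the list at all, `.remove()` raises a `ValueError`). A small sketch with a made-up list:

```python
animals = ['cat', 'dog', 'cat', 'owl']

last = animals.pop()      # no index: removes and returns the last item
print(last)               # owl
print(animals)            # ['cat', 'dog', 'cat']

animals.remove('cat')     # removes only the first 'cat'
print(animals)            # ['dog', 'cat']
```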


**Work with lists:**

List addition:

In [109]:
# Create two lists.
odds = [1,3,5,7,9]
evens = [0,2,4,6,8]

# Combine them into a new list (not sorted yet)
all_nums = odds + evens
print('Variable all_nums:', all_nums)

# Sort the list - note that .sort() is an in-place method
all_nums.sort()
print('Variable all_nums:', all_nums)

Variable all_nums: [1, 3, 5, 7, 9, 0, 2, 4, 6, 8]
Variable all_nums: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
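
If you want a sorted version but need to keep the original order too, the built-in function `sorted()` returns a *new* sorted list instead of changing the list in-place. A sketch contrasting the two:

```python
odds = [1, 3, 5, 7, 9]
evens = [0, 2, 4, 6, 8]
all_nums = odds + evens

# sorted() returns a NEW sorted list and leaves all_nums unchanged
sorted_nums = sorted(all_nums)
print(sorted_nums)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(all_nums)        # [1, 3, 5, 7, 9, 0, 2, 4, 6, 8]

# .sort() changes all_nums in-place and returns None
result = all_nums.sort()
print(result)          # None
print(all_nums)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Both accept reverse=True for descending order
print(sorted(all_nums, reverse=True))  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```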


Nested lists:

In [110]:
# Create three lists
l1 = ['a','b','c']
l2 = ['d','e','f']
l3 = ['g','h','i']

# Create the list of lists
nested_list = [l1,l2,l3]
print('Variable nested_list:', nested_list, type(nested_list))

Variable nested_list: [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']] <class 'list'>


Chained indexing into nested lists:

In [112]:
# Access the item "f" in nested_list - multiple lines
nested_lvl1 = nested_list[1]
print('Variable nested_lvl1:', nested_lvl1)
nested_f    = nested_lvl1[2]
print('Variable nested_f:', nested_f)

# Access the item "f" in nested_list - single line
f = nested_list[1][2]
print('Variable f:', f)

Variable nested_lvl1: ['d', 'e', 'f']
Variable nested_f: f
Variable f: f


Adding new elements at the end of a list:

In [113]:
# Start with an empty list
sample_means = []

# Add items to the list
sample_means.append(23.45)
sample_means.append(45.1)
sample_means.append(28.62)

print('Variable sample_means:', sample_means)

Variable sample_means: [23.45, 45.1, 28.62]


List "multiplication":

In [115]:
# Create a list by repeating another list
list_a = ['a']
list_a5 = list_a * 5
list_abc3 = ['a','b','c'] * 3

print('Variable list_a:', list_a)
print('Variable list_a5:', list_a5)
print('Variable list_abc3:', list_abc3)

Variable list_a: ['a']
Variable list_a5: ['a', 'a', 'a', 'a', 'a']
Variable list_abc3: ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']
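
A word of caution when combining list "multiplication" with nested lists: `*` repeats references to the *same* inner list rather than making copies, so changing one row changes all of them. A sketch of the pitfall and one way around it:

```python
# Three references to ONE inner list
grid = [[0, 0]] * 3
grid[0][0] = 1
print(grid)      # [[1, 0], [1, 0], [1, 0]]  (every row changed)

# Building each inner list separately avoids the problem
row1 = [0, 0]
row2 = [0, 0]
row3 = [0, 0]
grid_ok = [row1, row2, row3]
grid_ok[0][0] = 1
print(grid_ok)   # [[1, 0], [0, 0], [0, 0]]
```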


**Get info about a list:**

We can use the same function `len()` that we used for strings to get the number of items in a list:

In [116]:
#print('Number of items in list_a:', len(list_a))
print('Number of items in list_a5:', len(list_a5))

Number of items in list_a5: 5


In [None]:
# We can store the length of a list in a variable if needed
len_list_abc3 = len(list_abc3)
print('Number of items in list_abc3:', len_list_abc3)
type(len_list_abc3)
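
Note that `len()` only counts the top-level items of a list. An inner list counts as one item, however many items it holds itself. A short sketch:

```python
mixed = [[1, 2, 3], 4]

print(len(mixed))         # 2: the inner list counts as ONE item
print(len(mixed[0]))      # 3: items inside the inner list
print(len([]))            # 0: an empty list
```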

Test if an item is or isn't in a list with `in` and `not in`:

In [117]:
list_abc = ['a','b','c']

a_in_list_abc = 'a' in list_abc
d_not_in_list_abc = 'd' not in list_abc

print('Variable list_abc:', list_abc)
print('Variable a_in_list_abc:', a_in_list_abc)
print('Variable d_not_in_list_abc:', d_not_in_list_abc)

Variable list_abc: ['a', 'b', 'c']
Variable a_in_list_abc: True
Variable d_not_in_list_abc: True
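
Similarly, `in` only checks the top-level items of a list; it does not search inside nested lists. A small sketch:

```python
nested = [['a', 'b'], ['c']]

print('a' in nested)       # False: 'a' is inside an inner list, not a top-level item
print(['c'] in nested)     # True: ['c'] is itself a top-level item
print('a' in nested[0])    # True: search the inner list directly
```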


**Ex. 7a** (do ex 7a, 7b, 7c, 7d and 7e independently ~ 8 min, 15 if including advanced)

In [None]:
# From the list digits, in one line of code,
# create the variable ex8_x with the int 6

digits = [[0,1,2],[3,4,5],6,[7,8,9]]

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_x == 6

**Ex. 7b**

In [None]:
# From the list digits, in one line of code,
# create the variable ex8_y with the int 4

digits = [[0,1,2],[3,4,5],6,[7,8,9]]

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_y == 4

**Ex. 7c**

In [None]:
# Using only the variables a, b and c and list methods
# create a list [1,2,3] and store it in the variable ex8_k.
# You may not type any ints yourself

a = 2
b = [1,4,3]
c = 1

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_k == [1,2,3]

**Ex. 7d** (advanced)

In [None]:
# Using only the variables a, b and c and list methods
# create a list [1,2,3,4] and store it in the variable ex8_z.
# You may not type any ints yourself

a = 4
b = [2,3]
c = [1]

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_z == [1,2,3,4]

**Ex. 7e** (advanced)

In [None]:
# Using only the variables a, b and c and list methods
# create a list [1,2,3] and store it in the variable ex8_i.
# You may not type any ints yourself

a = 3
b = [1,4,2]
c = 1

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_i == [1,2,3]

## S4.2 Container types - Dictionaries

| Class name | Full name  | Access                      | Occurrence  | Remarks |
|:---        |:---        | :---                        | :---        | :---
| dict       | Dictionary | Access items by key         | Common      | Since we access items by key names, the order is not important |

Each item in a dictionary consists of two things: the value itself and a key used to refer to it.

The value can be of any type (anything from atoms to advanced molecules). The key is most often a string, but it can be any immutable type, such as an int or a tuple.
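
Strings are the most common key type, but keys are not limited to them: any immutable value, such as an int or a tuple, can serve as a key, while a mutable list cannot. A sketch with illustrative values:

```python
# Keys of different immutable types in one dict
codes = {1: 'one', 'two': 2, (3, 4): 'tuple key'}

print(codes[1])         # one
print(codes['two'])     # 2
print(codes[(3, 4)])    # tuple key

# A list cannot be used as a key because lists can be modified:
# {[1, 2]: 'list key'} would raise a TypeError
```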

---

**Create dictionaries and access items:**

In [118]:
# Create a dictionary
x = {'a': 'alpha', 'b': 3, 'c': True, 'd': [1,2,3]}
print('Variable x:', x)
type(x)

Variable x: {'a': 'alpha', 'b': 3, 'c': True, 'd': [1, 2, 3]}


dict

In [None]:
# Access item in a dict using the key
print("Variable x['a']:", x['a'], type(x['a']))
print("Variable x['b']:", x['b'], type(x['b']))
print("Variable x['c']:", x['c'], type(x['c']))
print("Variable x['d']:", x['d'], type(x['d']))

In [None]:
# Let's say we are a bank keeping track of info about accounts

# Start with two empty dicts
accounta = {}
accountb = {}

# Set up account A details
accounta['owner'] = 'Jerry Ehman'
accounta['id'] = '6EQUJ5'

# Set up account B details in different order
accountb['id'] = 'GTCTAT'
accountb['owner'] = 'Rosalind Franklin'

print('Variable accounta:', accounta)
print('Variable accountb:', accountb)
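
Because items are looked up by key, two dicts with the same key-value pairs compare equal no matter in which order the keys were added. Since Python 3.7 a dict does remember insertion order, which is why printing two such dicts can show their keys in different orders. A sketch reusing the account A values:

```python
accounta = {'owner': 'Jerry Ehman', 'id': '6EQUJ5'}
same_as_a = {'id': '6EQUJ5', 'owner': 'Jerry Ehman'}   # same pairs, other order

print(accounta == same_as_a)   # True: order does not matter for equality
print(list(accounta))          # ['owner', 'id']: insertion order is kept
print(list(same_as_a))         # ['id', 'owner']
```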

Values are accessed with their keys, not by their position:

In [None]:
print('Owner account A:', accounta['owner'])
print('Owner account B:', accountb['owner'])

You can also add new key-value pairs to an existing dictionary:

In [None]:
# Deposit initial amount on account A
accounta['balance'] = 1420
print(accounta)

**Important error message: KeyError**
    
Whenever you see an error of the form
`KeyError: 'balance'`,
it means that you have tried to access an item in a dictionary using a key that does not exist in that dictionary.

In [None]:
print('Balance account B:', accountb['balance'], type(accountb['balance']))

In [None]:
# When applicable, use the .get() method to return a default value if the key does not exist
print('Balance account A:', accounta.get('balance', 0))
print('Balance account B:', accountb.get('balance', 0))

In [None]:
# Python returns None when using .get() on a key that does not exist without a default value
print('Color account B:', accountb.get('color'), type(accountb.get('color')))
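
Besides `.get()`, you can avoid a `KeyError` by first checking whether a key exists with `in`. For a dict, `in` looks at the keys only, not the values. A sketch reusing the account B values:

```python
accountb = {'id': 'GTCTAT', 'owner': 'Rosalind Franklin'}

print('balance' in accountb)            # False: no such key
print('owner' in accountb)              # True
print('GTCTAT' in accountb)             # False: values are not checked
print('GTCTAT' in accountb.values())    # True: search the values explicitly
```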

**Ex. 8a** (do ex 8a and 8b independently ~ 10 min)

In [None]:
# Using only the already defined variables,
# (you may not type any keys manually)
# modify the empty dict ex8_z into
# {'pet1':'Dog','pet2':'Cat'}
# You may not overwrite ex8_z

p1 = 'pet1'
p2 = 'pet2'
Arthur = 'Cat'
b = "Dog"
c = Arthur
ex8_z = {}

### ADD YOUR CODE HERE

# === Do not modify code below ===
assert ex8_z == {'pet1':'Dog','pet2':'Cat'}

**Ex. 8b**

In [None]:
# Using only complex_dict and
# accessing items using only keys and indexes,
# create the following variables: zero as the int 0,
# d as the string 'd', minus_three as the int -3
# and symbol_list as the list ['%','?','~']
# Try to create each of these variables in a single line of code


complex_dict = {
    'alpha': [
        'a','b','c','d'
    ],
    'numbers': [
        [1,2,3],
        0,
        [-1,-2,-3]
    ],
    'symbols' : {
        'percent' : '%',
        'question' : '?',
        'tilde' : '~'
    }
}

zero = ### ADD YOUR CODE HERE
d = ### ADD YOUR CODE HERE
minus_three = ### ADD YOUR CODE HERE
symbol_list = ### ADD YOUR CODE HERE

# === Do not modify code below ===
assert zero==0 and d=='d' and minus_three==-3 and symbol_list==['%','?','~']

**Get the keys and/or the values of a dictionary**

Containers are great at holding data in a structured way.
Often we want to access the items in a container one at a time.

We will cover loops in the next session, but we will already introduce
three methods that are important when looping over the key-value pairs of a dict:
`.keys()`, `.values()` and `.items()`.


In [None]:
# Define a dictionary with countries and capitals
capitals = {
    'China':'Beijing',
    'India':'New Delhi',
    'Ohio':'Columbus',
    'Peru':'Lima',
    'Sweden':'Stockholm',
}

# We can get all keys or all values using the methods .keys() and .values()
# Note: these return "view" objects (dict_keys and dict_values), not lists
print("Keys:", capitals.keys(), type(capitals.keys()))
print("Values:", capitals.values(), type(capitals.values()))
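
The objects returned by `.keys()` and `.values()` are not lists; for example, you cannot index them. Wrapping them in `list()` gives a real list. A view also stays linked to its dict and reflects later changes, while a list made from it is a snapshot. A sketch with a shortened version of `capitals`:

```python
capitals = {'China': 'Beijing', 'India': 'New Delhi', 'Peru': 'Lima'}

keys_view = capitals.keys()          # a dict_keys view
keys_list = list(capitals.keys())    # a real list, made now

print(keys_list[0])                  # China (a list supports indexing)

capitals['Sweden'] = 'Stockholm'     # add a new key-value pair
print(len(keys_view))                # 4: the view reflects the change
print(len(keys_list))                # 3: the list is an earlier snapshot
```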

In [None]:
# Get all keys in a dictionary
for country in capitals.keys():
    print(country)

In [None]:
# Get all values in a dictionary
for capital in capitals.values():
    print(capital)

In [None]:
# Get both keys and values in a dictionary
for country, capital in capitals.items():
    print(f"The capital of {country} is {capital}.")
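
Behind the scenes, `.items()` gives one `tuple` per key-value pair, and the `for country, capital in ...` line unpacks each tuple into two variables. The same unpacking works outside a loop, as this small sketch shows:

```python
capitals = {'China': 'Beijing', 'Peru': 'Lima'}

pairs = list(capitals.items())
print(pairs)              # [('China', 'Beijing'), ('Peru', 'Lima')]
print(type(pairs[0]))     # <class 'tuple'>

# Unpack one (key, value) tuple into two variables
country, capital = pairs[0]
print(country, capital)   # China Beijing
```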