# Basic Python

This chapter introduces the core language ideas and constructions, with a few examples. If you have basic familiarity with programming, you can [skip](less-basic-python.ipynb) the introduction.

We begin our journey with a realistic example: a Python script that loads data from a CSV file and produces a summary of its contents. This is a simplified example. In the real world you would use one of the popular Python data analysis libraries, such as pandas, numpy, or polars. However, it is a real-world use case, which is probably more interesting than some made-up problem. The goal of this chapter is to get you to a point where you can read this code and understand what it means.

In [73]:
# Import modules from the Python standard library (no need to install these separately)
from functools import reduce
from operator import add
from pathlib import Path

# Our first function!
# You read this as:
# def -> A function!
# parse_header -> the name of the function
# (...) -> function arguments/parameters
# In this case, we have only one argument which we decided to call 'csv_file'
def parse_header(csv_file):
    """
    Given a File object for our CSV file, read the first line and interpret it as
    the header, containing the column names separated by a comma ',' character.
    Return a list of cleaned-up column names, without the comma or leading/trailing spaces.
    """
    # Read a line from the file
    header_line = csv_file.readline()
    # If the line is empty, we were given an empty file, so we complain!
    if not header_line:
        raise RuntimeError("Missing header line")
    # A more complex python line:
    # 1. Split the header into pieces at each ',' character and return a list
    # 2. Go through the list, for each name in the list, we call the string function `strip()`.
    #    this will remove leading and trailing space characters.
    # 3. Collect the stripped names into a new list.
    names = [name.strip() for name in header_line.split(",")]
    # Return the list of names as the result of this function
    return names

def parse_line(line):
    """
    Similar to parse_header(), however we expect a line of text from the file as
    the 'line' argument to the function.
    """
    # This is similar to the case above where we strip() the name, instead here we
    # convert it to an integer number (if it fails it will raise an error).
    return [int(v) for v in line.split(",")]

def compute_summary(data):
    """
    For each element in the data dictionary, compute the mean. We treat the dictionary
    key as the column name, and the value associated to the key as the list of samples in the column.
    Return a new dictionary containing the mean of each column.
    """
    # A new dictionary (see Collections)
    summary = {}

    # Check every (name, column data) pair in the input data dictionary
    for name, column in data.items():
        # Compute the mean of all the samples in the column
        total = 0
        for sample in column:
            total = total + sample
        mean = total / len(column)
        summary[name] = mean
        # Note that all this could be written much more compactly as:
        # summary[name] = reduce(add, column, 0) / len(column)
        # Both are correct, but this is much more compact to write once you get used to it.
    return summary

def load_file(csv_file):
    """
    Given a valid opened file, load its content and produce a summary containing the mean
    for each column.
    """
    # A new dictionary (see Collections).
    # We will use this to store data by column.
    # Each entry in the dictionary has a label (key) and a value. We set the label to the column name
    # and the value to the list of samples for the column.
    # So, if the csv file contains:
    # A, B
    # 1, 3
    # 2, 4
    # The 'data' dictionary will contain
    # data["A"] = [1, 2]
    # data["B"] = [3, 4]
    data = {}

    # Read the first line of the file and interpret it as the column names
    columns = parse_header(csv_file)
    # Initialize each 'data' entry by creating an empty list of samples for each column name
    for col in columns:
        data[col] = []

    # Now read the file line-by-line (from the second line)
    for line in csv_file:
        # Convert the line into a list of numbers, these are the samples for a 'row' of the csv file.
        line_data = parse_line(line)
        # Double check that we have the right number of samples in the row
        if len(line_data) != len(columns):
            raise RuntimeError(f"Mismatch between number of columns in line ({len(line_data)}) and header ({len(columns)})")
        # Now go over every sample in the row; `enumerate` generates column_number from 0 to N-1 (N is the length of the row)
        for column_number, value in enumerate(line_data):
            # Fetch the column name for the given column number
            column_at_index = columns[column_number]
            # Append the new sample to the right list of samples
            data[column_at_index].append(value)

    # Call the function that computes the means of the columns, and return its result.
    return compute_summary(data)

def load(path):
    """
    Load a CSV file at the given path.

    Arguments:
    path: target file path as a Path object

    Returns:
    A dictionary containing the mean of each column.
    """
    if not path.exists() or not path.is_file():
        raise ValueError(f"Invalid path: {path}")

    # Open a file and name it 'csv_file', when the code block within the `with .. as` statement
    # is done processing the file, the file is automatically closed.
    with open(path, "r") as csv_file:
        return load_file(csv_file)


target = Path.cwd() / "sample-data" / "simple.csv"
summary = load(target)
print(summary)

{'column-A': 51.6, 'column-B': 49.7, 'column-C': 66.5, 'column-D': 54.8}


If all this seems confusing, don't worry! The remainder of this chapter will help you understand everything this script does. Whenever you learn something below, you can come back up here and check whether it makes more sense.

## help!
You can (almost) always ask for help. This is especially useful in an interactive environment. If you run `help()` on a Python function or other object, you will get some helpful insight into how it can be used.

In [None]:
help(print)

In [None]:
help(int)

## Comments #1
Plain text after a `#` is called a _comment_ and is ignored by Python. It is useful for annotating what the heck is going on in the code. I will use comments profusely, and you should too, until you are comfortable enough with your code that you only need to annotate the more intricate parts.
A good guideline is:
- While learning, annotate as much as you want
- When writing things that you expect to come back and look at in a few months, remember to annotate them, otherwise you will forget why you did something
- When writing code that will be shared: that's why there is a part #2 to this.

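For text that spans multiple lines, you can use a triple-quoted string `"""..."""`. Strictly speaking this is a string rather than a comment. Its most common use is to document a function: placed at the start of the function, it becomes the function's _docstring_.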
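A docstring is the text that `help()` displays for a function. Here is a minimal sketch with a hypothetical `greet()` function, made up for this illustration:

```python
def greet(name):
    """Return a friendly greeting for the given name."""
    # This line is a regular comment: Python ignores it completely.
    return "Hello, " + name + "!"

# The docstring is stored on the function object; help(greet) shows it too
print(greet.__doc__)     # Return a friendly greeting for the given name.
print(greet("Python"))   # Hello, Python!
```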

## Printing out stuff
Printing out things is probably the first thing you learn in every language. In Python this is done using the built-in `print()` function. You can print almost everything, from integers, to strings and more complex data types.

Functions are pieces of code that perform a task given some input information. For instance, the `print()` function's job is to write out whatever you give it as text. We say that you _call_ functions by passing arguments to them, using parentheses to delimit the list of arguments you give to the function. The arguments are the inputs that you give to the function. A function may give you a result back. You can capture that result in a variable (see next section) or do whatever else you please with it. The `print()` function does not produce a result, so it returns a special value called `None`, which represents the absence of a value.

Here we also introduce two basic data types: integers and strings. Integers are just the integral numbers you are used to, strings are pieces of text that you can create by enclosing some arbitrary text in single or double quotes: `'I am a string'`, `"I am also a string"`.
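A quick sketch showing that single and double quotes produce the same kind of string. It also shows that the quotes are what separate the text `"10"` from the integer `10`:

```python
# Single and double quotes create exactly the same kind of string
single = 'I am a string'
double = "I am a string"
print(single == double)          # True
print(type(single), type(double))

# Quotes make all the difference: 10 is an integer, "10" is text
print(type(10), type("10"))
print(10 == "10")                # False: a number is never equal to a string
```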

In [92]:
# a number
print(10)
# a string of text
print("some text")
# a python function
print(print)
# multiple things
print(10, 20, "more things")
# print() has no result to give back, so it returns None
result = print(10)
print(result)

10
some text
<built-in function print>
10 20 more things
10
None


## Naming things
In order to perform any useful task, you will probably need to store temporary values, input data and output data somewhere, so that you can perform operations on them. You do this by assigning values to variables. You can think of variables as labels that you attach to the things you work with, so that you can later find them.
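The "label" picture has a useful consequence. Assigning one variable to another attaches a second label to the same value. Moving the first label later leaves the second one where it was:

```python
# `a` is a label attached to the integer 10
a = 10
# `b` is attached to whatever `a` currently refers to
b = a
# Moving the `a` label to a new value does not affect `b`
a = 20
print(a, b)  # 20 10
```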

In [None]:
# Basic example using variables `a` and `b`
a = 10
b = 20
print(a * b)

# Names can be long and contain underscores
a_variable = 10
print(a_variable)

# Variables can be assigned any kind of data
something = 10
print("I am a number", something)
something = "hello!"
print("I am text", something)

# Variable names can include numbers, but not at the beginning
number1 = 1
number2 = 2
# 2number -> invalid

## Imports and modules
Python code is organised into modules. Each module may correspond to one or more Python source files somewhere in your system (this is a simplification). In order to use functions and other things from a module, you first need to tell Python where to find them. The tool for this job is the `import` statement.

By convention, the `import` statements go at the top of a Python file. They tell Python that we are going to use some code from another module.
There are two styles of import: the regular `import` and the `from` import. The regular `import` statement, e.g. `import functools`, tells Python to go find a module named "functools". Python loads the functions, classes and everything else the module contains, and makes them accessible to you via the variable `functools` in your program.

Conceptually, you can think of the `import` doing the following:
```
functools = magic_function_that_loads_a_module_by_name("functools")
# Do something with functools
```
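The "magic function" above is made up. The standard library does, however, provide a real function with a similar role: `importlib.import_module()`, which loads a module given its name as a string. The `import` statement does more work than this under the hood, so treat the following sketch as an approximation:

```python
import importlib
import functools

# Load a module given its name as a string
loaded = importlib.import_module("functools")

# Modules are loaded once and cached, so both names refer to the same module object
print(loaded is functools)  # True
print(loaded.reduce)
```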

In [89]:
import functools
# Some time later, we want to use the function 'reduce' from the functools module
# so we can call it by specifying that it comes from functools (which we imported)
# functools.reduce(...)

The `from <module> import <something>` statement is slightly different. It tells Python to go find a module with a given name, take the functions (or other Python constructs) that you list after the `import` keyword and make them available in your program.

In [93]:
from functools import reduce
# Some time later, we can use the function `reduce` directly, without naming the module it comes from,
# because we already told Python that the "thing" called `reduce` in our program really comes from the functools module.
# reduce(...)
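To get a feel for what `reduce` does, here is a small sketch that sums a list by repeatedly applying `operator.add`. It is the same idea as the commented-out line in the CSV example at the top of the chapter. The built-in `sum()` gives the same result for this common case:

```python
from functools import reduce
from operator import add

numbers = [1, 2, 3, 4]
# reduce applies add() cumulatively: ((((0 + 1) + 2) + 3) + 4)
total = reduce(add, numbers, 0)
print(total)         # 10
print(sum(numbers))  # 10
```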

Finally, you can rename the things you import. Consider the case where you **really** want to call one of your variables `functools` and you will not take "no" for an answer. If you also need the `functools` module (say, for its `reduce` function), the two names clash and you will run into trouble. To avoid this, you can use the `as` keyword to set a custom name for imported objects.

In [None]:
import functools

print("Before my variable assignment", functools.reduce)

functools = 10
# Recall that after the import is resolved, `functools` is just another Python variable that happens to contain a module.
# What happens here is that I'm accidentally changing the content of the variable to the number 10.
# AttributeError: 'int' object has no attribute 'reduce'
print("After my variable assignment", functools.reduce)

In [None]:
import functools as ft

# functools is not defined now, because we renamed it to `ft`
# print(functools)

functools = 10
print("After my variable assignment", ft.reduce)

# The same can be done in the from .. import
from functools import reduce as my_reduce
# As before, `reduce` is not defined now, because we renamed it
# print(reduce)
print("Renamed reduce function", my_reduce)

You may be asking yourself "where do modules come from?", and you would be right to ask. Managing third-party Python packages is a rather involved topic that I defer to another chapter. For now, I will only use modules from the [Python standard library](https://docs.python.org/3/library/index.html). The standard library comes with your Python installation, so you don't need to do anything in particular to use it: just import what you need. The standard library documentation (linked above) will help you find and use the right functions, classes and whatever else Python offers.

## Errors #1
This topic comes very early because, as a beginner, you may be scared by errors and forget to read what they say before going back to find what is wrong.
Errors are the most common thing you will get out of writing programs, so better get used to them early on.

This is an error:

In [None]:
print(1 / 0)

Some of the output will be confusing at this stage, and that is fine; you will learn all about it in due time. For now, just be aware that even with very little knowledge you can extract useful information about what is going wrong. In this case you are dividing by zero. Python tells you the type of error (`ZeroDivisionError`) and the line of code where it happens.

There are multiple _types_ of errors that convey information about what went wrong. The python language defines many different errors that have specific meanings (see [Exception](https://docs.python.org/3/library/exceptions.html) if you are curious). We will dig more into this topic when we learn to generate errors and create custom types of errors.

## Indentation #1
A very quick note about indentation. Indentation in Python matters: indented code is treated as a separate block of code. This will be clarified below. For now, just notice that nothing in the early examples is indented at all. If you indent incorrectly, you will get errors.

In [None]:
# IndentationError: you messed up the indentation
print("hello")
    print("world")

It is also worth remembering that not all blank space is equal. There are (for the most part) two types of blank character: the space `' '` and the tab `'\t'`. The _tab_ dates back to [typewriters](https://en.wikipedia.org/wiki/Tab_key) and was later adopted by text editors, which display it with a configurable width, usually a power-of-two number of spaces such as 4 or 8. In Python it is recommended to indent with spaces rather than tabs, because the code then looks the same in editors that configure the tab width differently. Most Python-aware editors can be set to insert 4 spaces when you press the tab key. Mixing spaces and tabs is a recipe for disaster!
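A small sketch of why tabs and spaces are easy to confuse. They may look alike on screen, but they are different characters. The standard `str.expandtabs()` method shows how the width of a tab depends purely on a display setting:

```python
spaces = "    "   # four space characters
tab = "\t"        # a single tab character

# Different characters, even if they can look identical in an editor
print(spaces == tab)            # False
print(len(spaces), len(tab))    # 4 1

# How wide a tab looks depends on the chosen tab width
print(repr("\tx".expandtabs(4)))  # '    x'
print(repr("\tx".expandtabs(8)))  # '        x'
```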

In [None]:
# TabError: you mixed tabs and spaces, please don't
def mixing_tab_and_space():
    print("Indented by 4 spaces")
	print("Indented by a tab. Hard to notice, right?")

## Objects and Types

In Python, at the core, everything you deal with is an _object_. The _object_ is a piece of information with some associated operations you can perform on it. The set of operations you can perform on the object and its properties are determined by its _type_ (or _class_). The _class_ is the language construct that allows you to create custom types; however, we will dig into custom types later on, this section focuses on the basic built-in types in Python.

As an aside, Python also has a more relaxed notion of data types, often referred to as _duck typing_ (this term comes from the saying “If it walks like a duck, and it quacks like a duck, then it must be a duck”). This is a concept related to dynamic typing, where the _class_ of the object is less important than the methods (operations) it defines. We will address this when talking about classes.
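As a peek ahead, here is a minimal duck-typing sketch. The `total_length()` helper is hypothetical, and it uses lists and a `for` loop, both covered later in this chapter. The helper never checks what type its items are. It only needs each item to support `len()`:

```python
def total_length(items):
    """Add up the lengths of all the items, whatever their type."""
    total = 0
    for item in items:
        # No type check on `item`: anything that supports len() works
        total = total + len(item)
    return total

# A string, a list and a tuple are different types, but they all "quack" like sized things
print(total_length(["abc", [1, 2], (3, 4, 5)]))  # 3 + 2 + 3 = 8
```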

To illustrate the idea of a type, consider the number 10. It is an object, and you can ask Python for its type using the `type()` function, which will tell you that its type is `int`. As usual, the `help()` function will tell you more about the `int` class, which is the class that describes all integer numbers.

In [None]:
type(10)

In [None]:
help(10)

Here, the type of the number 10 is `int`, which corresponds to the class of integer numbers. We say that 10 is an _instance_ of the `int` _class_.
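You can check the "instance of" relationship directly with the built-in `isinstance()` function:

```python
# isinstance() checks whether an object is an instance of a given class
print(isinstance(10, int))    # True
print(isinstance("10", int))  # False: "10" is an instance of str
print(isinstance("10", str))  # True
```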

You will want additional data types for your programs. In general, you may find yourself in need of:
- numbers
- decimal numbers
- text
- collections of things
- complex objects that represent something in your program (e.g. a person's identity, composed of text for the name and numbers for age and date of birth)

Finally, there is a special type called the `NoneType`, which is the type of the special object `None`. This represents the absence of a value and is used as a placeholder or a return value when the absence of an object is not an error.
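A short sketch of where `None` shows up in practice. A function without a `return` statement hands back `None`. The function `say_hi()` is made up for this example:

```python
def say_hi():
    # This function prints something but has no `return` statement
    print("hi")

result = say_hi()
print(result)           # None
print(type(result))     # <class 'NoneType'>
print(result is None)   # True: `is None` is the idiomatic way to check for None
```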

The notion of _objects_ has very important consequences on how the language is structured, so it is important to introduce the concept early on. We will progressively expand on the idea.
For now the core ideas to recall informally are:
- An _object_ is everything you can manipulate, generally it represents some data (e.g. a number).
- A _variable_ is a label that you assign to objects to name and reference them in other operations
- The _type_ or _class_ of an _object_ tells you _what_ your object is, and the _operations_ you can perform on it.

For example, when I type `some_value = 10`, I create an object of type `int` with the value 10 and give it the name `some_value`, so that I can refer to it later. The `int` type defines what I can do with it, for example use it in the four basic arithmetic operations.

# CSV loader - Part 1
Now that we have some basics, let's start thinking about loading our CSV file. Let's split the problem into a mental (or written) set of smaller tasks that we need to do:
1. I need to know where the file is, so I want its **path** in the filesystem.
2. I need to somehow open the file.
3. Read the file line-by-line.
4. Somehow interpret the data in each line.
5. Produce the mean for each column of data.
6. Report the result back to the user.

Let's start from point (1). Since this is an example, we already know the path to the file: it is found at `sample-data/simple.csv`, relative to the directory our Python program is running from.

In modern Python, we use the `pathlib` module to manage file paths. It provides a handy data type called `Path`, which shields us from the differences between Linux, macOS and Windows paths. `Path` also provides some useful functions, e.g. to check whether a file exists or to get the current directory. So the first thing we do is import the `Path` type.

Given a **type**, you create an **instance** of that type by "calling" the type object in the same way you call a function.

In [98]:
from pathlib import Path

# A path to a file, the file does not need to exist for us to create the path.
my_path = Path("/some/path/to/file.txt")
print(my_path)

/some/path/to/file.txt


In fact, when you do this, Python runs a special function associated with the type, called `__init__()`, using the arguments you specify. I mention this because it is sometimes useful to look at the documentation for that function to understand how to create an object of that type, e.g. `help(MyType.__init__)`. In most cases, `help(MyType)` already provides instructions on how to create new objects. This is the case for the `Path` class.

We will dig more into special functions, which are the ones starting and ending with two underscores. I'm mentioning this now so that you can better orient yourself in the Python documentation and `help()` output.

In [None]:
help(Path)

After browsing the help message for the `Path` class, we focus on the `cwd()` and `exists()` functions. Recall that you can use the `help()` function for individual functions, so you can actually limit the amount of text printed by help!

In [102]:
from pathlib import Path

help(Path.cwd)
help(Path.exists)

Help on method cwd in module pathlib:

cwd() method of builtins.type instance
    Return a new path pointing to the current working directory
    (as returned by os.getcwd()).

Help on function exists in module pathlib:

exists(self)
    Whether this path exists.



Note that `Path.cwd()` is somewhat special, because it can be called directly on the **type** `Path` instead of an **instance** `Path("/my/path/to/file.txt")`.

In [104]:
from pathlib import Path

path_object = Path("some/file.txt")
print("cwd() called on the Path type", Path.cwd())
print("cwd() called on a Path instance", path_object.cwd())

cwd() called on the Path type /home/qwattash/git/pytorial
cwd() called on a Path instance /home/qwattash/git/pytorial


Normally, you need to first create an object, then you can call the functions defined by its type (the _class_) on your object. This is because these functions **act** on your object, they modify its contents or use its contents to do their job. The `cwd()` function **creates** a new Path, so it does not need one in the first place.

If you try the same with the `exists()` function, it will fail. This is because to check whether a path exists you need some path to check, so the `exists()` function must **act** on an existing Path object.

In [105]:
print("Does the path object exist?", path_object.exists())

Does the path object exist? False


In [107]:
# I can't call exists() on the Path type, because it does not know which path to check!
Path.exists()

TypeError: Path.exists() missing 1 required positional argument: 'self'

As you can see, it complains about a mysterious `self` positional argument. We will dig more into it when talking about classes; however, the presence of that special argument tells you that the function **must** be called on an object.
If you check the help for the function `exists()` you will see that the first parameter is called `self`.
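A small sketch of what `self` means in practice. Calling a method on an instance is equivalent to calling the function on the type and passing the instance explicitly as the first (`self`) argument:

```python
from pathlib import Path

path_object = Path("some/file.txt")

# Calling the method on an instance...
via_instance = path_object.exists()
# ...is the same as calling it on the type and passing the instance as `self`
via_type = Path.exists(path_object)
print(via_instance, via_type)  # both give the same answer
```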

Now we are equipped with almost everything we need to create a path from the current directory, so let's see how it's done.

In [110]:
from pathlib import Path

target = Path.cwd() / "sample-data" / "simple.csv"
if not target.exists():
    print("Error, the file does not exist!")
else:
    print("Found our file", target)

Found our file /home/qwattash/git/pytorial/sample-data/simple.csv


Finally, this gives us our file path!

We have one final thing to check: what about that division operator `/`? I thought you can't divide text!

In [None]:
a = "hello" / "world"

Uhm, what is going on then? The `Path` class is the key here. In Python, you can _override_ the behaviour of operators for a class; in this case, the `Path` class modifies the behaviour of the `/` operator so that when you have `a_path / another_path_or_string`, the result will be the concatenation of the two path elements using the separator specific to your system ('/' for Linux, '\' for Windows).

In [112]:
my_path = Path("hello")
print(my_path / "world")

hello/world
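The output above uses `/` because it was produced on a system that uses `/` as its separator. The standard `PurePosixPath` and `PureWindowsPath` classes only manipulate text and never touch the disk. This sketch uses them to show both conventions from any operating system, along with the `joinpath()` method, which does the same job as `/`:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same operation, different separator conventions
print(PurePosixPath("hello") / "world")    # hello/world
print(PureWindowsPath("hello") / "world")  # hello\world

# The / operator is a shortcut for the joinpath() method
print(PurePosixPath("hello").joinpath("world"))  # hello/world
```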


Nice! This was quite a journey just to write one line, but we covered a lot of ground here so take a break and relax a bit.

# More Python basics

## Operators
You can perform basic arithmetic operations:

In [None]:
# Number arithmetic
print("add", 1 + 1)
print("subtract", 2 - 10)
print("multiply", 10 * 2)
print("divide", 10 / 5)
print("remainder", 10 % 3)
print("exponentiation", 10**2)
# with decimal points
print("floating point add", 1.10 + 3.5)

# Grouping can be done with parentheses
print("grouping", (1 + 1) * 4, "vs", 1 + 1 * 4)

Some of the operators are reused (_overloaded_ is the technical jargon) for other data types as well, text for example:

In [None]:
# Adding text? concatenation!
print("what" + "is" + "this" + "magic" + "you" + "speak" + "of?")
# note that this is concatenating the text but does not add any spaces, you have to do that yourself when you do need it.

In [None]:
# Subtracting text does not make sense
print("a" - "b")
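Multiplication is overloaded for text as well. Multiplying a string by an integer repeats it. As a peek at the collections covered later, `+` also concatenates lists:

```python
print("ab" * 3)       # ababab
print("-" * 10)       # ----------
print([1, 2] + [3])   # [1, 2, 3]
```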

## Python data types

Here is a quick showcase of some useful Python built-in data types.

**Integral numbers** Well, they are integral numbers.

In [None]:
# Explicitly create them using the `int` type
x = int(10)
print("explicit X", type(x))
# This looks quite redundant, we have already been using them by writing
x = 10
print("implicit X", type(x))
# the explicitness is useful if you WANT an integer, but somebody gives you something else
# e.g. a decimal number
x = 1.2
print("X is not an int", type(x))
y = int(x)
print("but Y is", type(y)) 

**Float numbers** These are numbers with decimal points. Easy as that. They behave essentially like `int`.

In [None]:
# Explicitly create one
x = float(10.5)
print("explicit X", type(x))
# Implicitly do the same
x = 10.5
print("implicit X", type(x))
# Conversion
x = 10
print("X is not a float", type(x))
y = float(x)
print("but Y is", type(y))
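Floats behave mostly like `int`, with a few differences worth knowing. The sketch below shows three of them: division with `/` always produces a float, converting a float to `int` drops the decimal part, and some decimal values can only be approximated:

```python
# Dividing two ints with / always gives a float, even when the result is whole
print(10 / 5, type(10 / 5))   # 2.0 <class 'float'>

# Converting a float to int drops the decimal part (it does not round)
print(int(2.9), int(-2.9))    # 2 -2

# Floats are stored in binary, so some decimal values are only approximated
print(0.1 + 0.2)              # 0.30000000000000004
```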

**Strings** Sequences of characters, i.e. text. Note that these support Unicode, so you can have symbols and non-Latin characters in there.

In [None]:
# Explicitly create one
x = str("I am a string!")
print("explicit string", type(x))

# Implicitly, we have been doing this for a long time now
x = "I am another string"
print("implicit string", type(x))

# Conversions are fun
string_number = str(10)
print("str(10) gives out a string", type(string_number))
number_string = int("10")
print("int('10') gives out a number", type(number_string))

In [None]:
# But beware
what_now = int("definitely not a number")

Sometimes you have to use the character **"** inside a string delimited by double quotes. To do so, you _escape_ it by prepending a backslash `\`, as in `"my fancy string with \" quote"`.
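A quick sketch of escaping, plus a common alternative. Delimit the string with single quotes, and the double quote inside needs no escaping at all:

```python
# Escaping the double quote with a backslash
escaped = "my fancy string with \" quote"
# Alternatively, switch to single quotes so the double quote needs no escaping
alternative = 'my fancy string with " quote'
print(escaped)
print(escaped == alternative)  # True: both contain the same characters

# The backslash is not part of the string: \" counts as one character
print(len("\""))  # 1
```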

**Collections** Collections are containers of other things. Python has four basic built-in collection types: tuples, lists, dictionaries and sets.

## Collections
Collections are data types that contain other objects (including other collections). Here we introduce the basic built-in Python collections; however, many other container types exist, both in the Python standard library and in third-party libraries.

It is useful to distinguish collections based on two different properties:
- Mutable or immutable? -- Can I add or remove elements from my collection?
- Ordered or unordered? -- Do the elements have a "comes before" relationship? The items in a shopping list are ordered, however beads in a bag are not.
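One way to see the difference between ordered and unordered collections is to compare two collections holding the same elements in a different order. This sketch uses a list (ordered) and a set (unordered), both introduced below:

```python
# Lists are ordered: the same elements in a different order make a different list
print([1, 2, 3] == [3, 2, 1])  # False
# Sets are unordered: only which elements are present matters
print({1, 2, 3} == {3, 2, 1})  # True
```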

### Tuple
A **tuple** is a sequence of objects that is ordered and immutable. It retains the order of the elements; however, once created, you can inspect it or copy it, but never change its contents.

In [None]:
# A tuple is created by enclosing a list of things in parentheses (,)
t = (1, 2, "hello tuple")
print("My tuple", t)
# Note that you need a comma ',' for a tuple with just one item.
# If you omit it you get the result of the operation between parentheses as a value,
# as the parentheses are used to enforce precedence between operators.
single_item = (1,)
# This is just the number '1': the parentheses work the same as for arithmetic operations.
not_a_tuple = (1)
# Or from another already existing sequence of items
t2 = tuple(t)
print("Copy of my tuple", t2)

# You can access elements by index
print("first element:", t[0])
print("second element:", t[1])

# You can concatenate tuples, this produces a NEW tuple.
a = (1, 2, 3)
b = (4, 5, 6)
c = a + b
print("concatenated tuple:", a, "+", b, "=", c)

In [None]:
# But you can't change the element
t[0] = 100
# TypeError!

### List
A **list** is a sequence of objects that is ordered and mutable. You can add and remove elements from it, and it retains the order of the elements.

In [None]:
# A list is created in the same way as a tuple, just by using square brackets [,]
l = [1, 2, "hello list"]
single_item = [1]
print(l)
# Or from another existing sequence
l2 = list(l)
print(l2)
l3 = list((1, 2, "I am created from a tuple"))
print(l3)

Indexing and concatenation work as they do for tuples; however, there are some questions for you here. I encourage you to try out the questions below, thinking about what you expect to happen before running them.

In [None]:
# What happens if you index an element that does not exist?
# x = [1,2,3]
# print(x[10])

# What happens if you assign a value to a position that does not exist?
# x[10] = 11

# Use the help(list) to find the description of the append(), insert(), pop() and remove() functions and use them to modify the list
# x = [1, 2, 3, 4, 5, 6]
# into the list
# x = ["a", 2, 3, "b", 5, "c"]

### Dictionary
A **dictionary** is an ordered (by insertion order) and mutable collection that is indexed by an (almost) arbitrary key instead of a number. You can think of it as a list that allows you to assign labels to the elements.

In [None]:
# A dictionary is created in a similar way, using curly braces {,}
# This time though, we label each item. The easiest way is to use a unique string or number
d = {"element 1": 1, "element 2": 2, "element 3": "I am item #3"}
print(d)
# or from another dictionary
d1 = dict(d)
print(d1)

In [None]:
# But not so easily from a list or tuple
d = dict(["a", "b", "c"])
# which makes sense if you think about it, which part is the key and which the value?

In [None]:
# however we can do that if we use tuples as key/value pairs
d = dict([("a", 1), ("b", 2), ("c", 3)])
print(d)

In [None]:
# Indexing is similar to lists and tuples, but you use your own keys
d = {"element 1": 1, "element 2": 2, "element 3": "I am item #3"}
print("My first element is:", d["element 1"])

# Note that it is possible to assign values to keys that do not exist yet!
d["new thing"] = "something new"
print(d)

There are various ways to add and remove elements from a dictionary. It will be more useful to learn by example than having a big list here.
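Here are a few of the most common dictionary operations. The `inventory` data is made up for the example:

```python
inventory = {"apples": 3, "pears": 5}

# Add a new entry (or overwrite an existing one) by assigning to a key
inventory["plums"] = 7
# Check whether a key is present with `in`
print("pears" in inventory)        # True
# get() returns a default instead of raising KeyError for a missing key
print(inventory.get("kiwis", 0))   # 0
# pop() removes a key and gives back its value
removed = inventory.pop("apples")
print(removed, inventory)          # 3 {'pears': 5, 'plums': 7}
```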

### Set
A **set** is an unordered mutable collection. It is possible to add and remove elements, however they don't have an index and there is no ordering relationship between them. The set has the additional property that each element may appear only once.

In [None]:
# A set is created using curly braces, without the dictionary labels notation
s = {"a", "b", "c"}
print("my set:", s)
# or from another collection, note that the number 4 will only appear once.
s2 = set([1, 2, 3, 4, 4, 4, 4])
print("my other set:", s2)

As with the dictionary, it will be more useful to learn sets by using them.
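A few common set operations to try out. Since sets are unordered, the order in which elements are printed may vary:

```python
a = {1, 2, 3}
b = {3, 4}

a.add(10)       # add an element (adding one that is already there changes nothing)
a.discard(1)    # remove an element, with no error if it is missing
print(a)        # the elements 2, 3 and 10, in some order
print(3 in a)   # True: membership test
print(a | b)    # union: elements in either set
print(a & b)    # intersection: elements in both sets
print(a - b)    # difference: elements in a but not in b
```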

### Common collection operations
There are two very common operations that can be performed on all collections: checking the number of elements it holds, and iterating through the elements.
The first is easily done using the built-in `len()` function, as in the following example.

In [None]:
my_list = [1, 2, 3, 4]
print("How many elements?", len(my_list))
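`len()` works the same way on every collection type, and on strings too:

```python
print(len((1, 2, 3)))          # 3: a tuple
print(len({"a": 1, "b": 2}))   # 2: a dictionary counts its keys
print(len({1, 1, 2}))          # 2: duplicates collapse in a set
print(len("hello"))            # 5: a string counts its characters
```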

All standard collections can be traversed, inspecting each element in turn. These containers are _iterable_ objects, which means you can get an _iterator_ from them by calling the built-in `iter()` function. The _iterator_ is a special object, associated with the collection, that hands out the elements of the collection one at a time. Each call to the built-in `next()` function on the iterator gives you the next element.

In [None]:
list_iter = iter(my_list)
print("first:", next(list_iter))
print("second:", next(list_iter))
print("third:", next(list_iter))
print("fourth:", next(list_iter))
# Calling next(list_iter) again will give a StopIteration error, which means that we have seen all
# the elements in the collection, and the iterator is consumed.
# print("??", next(list_iter))

While iterators are the basic building block for traversing collections, it is cumbersome to always call `next()` yourself and check for errors.
There are other, more convenient, ways to iterate over collections, one of which is the `for` loop.
You can take any _iterable_ object and use it in a loop to inspect every element it contains.

In [None]:
my_list = [1, 2, 3, 4]

for i in my_list:
    print("Found:", i)

This is probably the right time to mention the built-in `enumerate()` function. It accepts any iterable (a list, or even another iterator) as an argument and returns a new iterator that produces tuples of the form `(counter, value)`, with the counter starting from 0. This shows one of the useful properties of iterators: you can pass them around and derive new iterators that transform the data in some way.

In [None]:
for index_and_value in enumerate(my_list):
    print("At", index_and_value[0], "found", index_and_value[1])
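`enumerate()` also takes an optional `start` argument for the counter, which is handy when you want to number items from 1. Because it returns a lazy iterator, wrapping it in `list()` is a quick way to see every tuple at once. A small sketch with a made-up list of letters:

```python
letters = ["a", "b", "c"]

# The counter starts from 'start' instead of the default 0
for index_and_value in enumerate(letters, start=1):
    print("Item number", index_and_value[0], "is", index_and_value[1])

# enumerate() produces tuples lazily; list() collects them all
pairs = list(enumerate(letters, start=1))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```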

Finally, there is another trick that will make your life easier: it is called **tuple unpacking**. You may have noticed in the previous example that accessing the tuple elements produced by `enumerate()` is a bit verbose.

Wouldn't it be nicer if we wrote something like this?
```
for index_and_value in enumerate(my_list):
    index = index_and_value[0]
    value = index_and_value[1]
    print("At", index, "found", value)
```

Right, this helps if you have to use `index` and `value` multiple times in the loop, but we can do better thanks to tuple unpacking!
Unpacking means splitting up the tuple (the package) into its contents and assigning each element to its own variable.

In [None]:
my_tuple = (1, 2, 3)
(one, two, three) = my_tuple
# In fact we can also omit the parentheses
# one, two, three = my_tuple
print("one", one, "two", two, "three", three)

In [None]:
# Exercise: what happens if you get the number of elements wrong?
t = (1, 2, 3, 4)
# Too few variables
# a, b, c = t
# Too many variables
# a, b, c, d, e = t

# Try it out!

With this in mind, we can rewrite our loop in a more concise and readable form:

In [None]:
for index, value in enumerate(my_list):
    print("At", index, "found", value)
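Unpacking works in any loop whose elements are tuples. One common case, sketched below with a made-up dictionary, is the dictionary `items()` method, which yields `(key, value)` pairs. Unpacking also makes swapping two variables a one-liner: the right-hand side builds a tuple, which is then unpacked into the names on the left.

```python
# Each element produced by items() is a (key, value) tuple, unpacked into two names
ages = {"alice": 31, "bob": 27}
for name, age in ages.items():
    print(name, "is", age)

# Swapping: (b, a) is built first, then unpacked into a and b
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1
```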

### Weirdness about mutability
We talked about mutable and immutable collections, and how it is possible to have nested collections. So I ask you: what happens if I have a tuple of lists? Lists are mutable and tuples are immutable, so will I be able to append a number to one of my lists?

In [None]:
# Exercise - What happens if I have a tuple of lists? Can I append elements to a list within a tuple?

# A tuple with three empty lists
t = ([], [], [])

# What happens if I append an element to the first list?
# Try it out! (Hint: remember indexing and the list append() method)

I hear you say "I thought tuples were _immutable_! You lied to me!". So what is going on?
This is a perfect opportunity to understand the difference between an object and a reference to an object. This is a bit subtle in Python, so bear with me.

Until now, we have seen that you can assign a "label" to your objects by assigning them to variables. Consider a simple variable assignment:
```
x = 10
```
You can think of this as `x` containing the value 10, or `x` being a label for something that has the value 10.
While this difference is subtle for a number, consider another example with a list.

In [50]:
x = [1, 2]
y = x
# It is true that 'y' contains a list with the values 1 and 2, however it is the SAME list!
x.append(99)
print(y)

[1, 2, 99]


So, if you think in terms of labels, the above is easily explained by the fact that you are using two different _names_ for the same object (the list). We say that a label _references_ an object, because it gives you a way to reach the object.
We can now go a step further: the same is true for the elements inside the list, because a list really contains _references_ to objects.
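To check whether two names reference the same object, Python provides the `is` operator, which compares identity, while `==` compares values. The built-in `id()` function returns an integer that identifies an object for as long as it exists. A small sketch with made-up variables:

```python
x = [1, 2]
y = x       # y is a second name (reference) for the same list
z = [1, 2]  # z is a brand-new list that happens to hold equal values

print(x == y, x is y)  # True True: same object
print(x == z, x is z)  # True False: equal contents, different objects
print(id(x) == id(y))  # True: both names refer to the same object
```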

Consider the following simple list-of-lists.

In [51]:
x = [[1, 2], [3, 4]]
# Now we take the first element of the list
y = x[0]
# Note that y is the name we give to a reference to the list contained in x[0]
# so when we append to it
y.append(99)
# we also see it changing inside x
print(x)

[[1, 2, 99], [3, 4]]


This may seem counter-intuitive, but it is also quite useful: it means that you can give a collection to a function and have it modified without ever copying the collection, which may be expensive.
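Functions are covered later in this chapter, but here is a quick preview of the point above. The function `add_item` is a hypothetical example: it receives a reference to the caller's list, so appending inside the function changes the very same list the caller sees, and nothing is copied.

```python
def add_item(collection, item):
    """Append item to the given list in place; nothing is copied."""
    collection.append(item)

shopping = ["milk"]
add_item(shopping, "bread")
# The function received a reference to the same list, so the change is visible here
print(shopping)  # ['milk', 'bread']
```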

Now, back to the tuple problem. If we think in terms of _references_, the immutability of the tuple means that you cannot add or remove _references_ to the tuple, or modify one entry in the tuple to _reference_ another object. It does not say anything about the mutability of the **referenced** objects!

In [54]:
x = ([1, 2], [3, 4])
# x is immutable, I can not replace the reference in the first element
# x[0] = [99, 100]
y = x[0]
# y, however, is a list, and it is perfectly legal to modify it
y.append(99)
print(x)

([1, 2, 99], [3, 4])
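To see the commented-out line fail without stopping the notebook, the sketch below wraps it in a `try`/`except` block, which catches the `TypeError` that Python raises when you try to replace an entry of a tuple and prints its message instead.

```python
x = ([1, 2], [3, 4])
try:
    # Rebinding an entry of the tuple is forbidden...
    x[0] = [99, 100]
except TypeError as error:
    print("TypeError:", error)

# ...but mutating the list that the entry references is fine
x[0].append(99)
print(x)  # ([1, 2, 99], [3, 4])
```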


Now you may be asking "what about when I want to actually copy things?". Python makes the copy operation explicit, because it may be complicated, expensive or outright impossible in some cases.
The collections above can be copied; however, remember that a copy contains the same _references_ as the original, so if you want a complete copy of the collection and its contents, you will also have to copy the individual objects!

In [None]:
x = [1, 2, 3, [99, 100]]
# There are generally two ways of copying collections
# One is to create a new collection from an iterable object (such as another collection)
copy_of_x = list(x)
# Another is to use the copy() method, provided by lists, dictionaries and sets
another_copy = x.copy()
print("Simple copies", x, copy_of_x, another_copy)

# Now if I append to the original list
x.append("xxx")
# It will not modify the copies
print("Append to the outer list", x, copy_of_x)

# HOWEVER, if I append to the inner list, things are different
x[3].append("yyy")
print("Append to the inner list", x, copy_of_x)

It is also possible to recursively copy all the objects within a collection: this is called a _deep copy_, in contrast with the _shallow copies_ we have seen so far. The simplest way to create a _deep copy_ is the `deepcopy()` function from the standard library `copy` module.
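Here is the previous experiment repeated with `copy.deepcopy()`. The shallow copy still shares the inner list with the original, while the deep copy got its own copy of the inner list, so appending to the original's inner list does not affect it.

```python
import copy

x = [1, 2, 3, [99, 100]]
shallow = x.copy()       # new outer list, but it shares the inner list with x
deep = copy.deepcopy(x)  # new outer list AND a new copy of the inner list

x[3].append("yyy")
print("original:", x)        # [1, 2, 3, [99, 100, 'yyy']]
print("shallow: ", shallow)  # [1, 2, 3, [99, 100, 'yyy']]
print("deep:    ", deep)     # [1, 2, 3, [99, 100]]
```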

## Conditional expressions
Now the last basic building blocks you need to know are how to do things conditionally and how to repeat tasks.
We essentially want to create a block of code that is executed only when a specific condition is met, or repeated as long as a condition holds.

### Indentation #2
Earlier I anticipated that indentation makes a difference; here we will see it in action.
Indentation is, very generally speaking, used to create blocks of code that are logically separate from the rest. This is very intuitive when seen in action, so let's dig into it.

**Conditional execution** Do something if some condition is met.
This is done using the `if`, `elif` and `else` keywords.

In [None]:
# Try changing the value of x to see which branch of code is executed
x = 10
if x == 50:
    # If x is exactly 50
    print("x is exactly 50")
elif x > 50:
    # otherwise, check if x is larger than 50
    print("x is large")
else:
    # if none of the above, do this
    print("x is small")

In [None]:
# It is possible to have more complex conditions
food = "biscuit"
drink = "juice"
if food == "chocolate" or drink == "milk":
    print("I like food")
else:
    print("I do not like food")
    
if (food == "biscuit" or food == "cake") and drink == "milk":
    print("I am hungry")
else:
    print("Not really hungry")
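When a condition compares the same variable against several values, the `in` operator (which checks whether a value is an element of a collection) can make it shorter. A sketch of the second condition above rewritten this way, with the drink changed to `"milk"` so that the first branch runs:

```python
food = "biscuit"
drink = "milk"

# Equivalent to: (food == "biscuit" or food == "cake") and drink == "milk"
hungry = food in ("biscuit", "cake") and drink == "milk"
if hungry:
    print("I am hungry")
else:
    print("Not really hungry")
```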

**Loops** Loops are a way to run a piece of code while/until a condition is met.
This is done using the `for...in` and `while` keywords.

In [None]:
# Repeat something as long as a condition holds
x = 0
while x < 10:
    x = x + 1
    print("increment x to", x)

In [None]:
# Do something for each item in a collection (e.g. a list)
my_list = [1, 10, 35, 40]
for element in my_list:
    print("my_list contains", element)

The built-in `range()` function is also useful to repeat some task a number of times.

In [None]:
for i in range(10):
    print(i)
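`range()` also accepts a start value and a step, as in `range(start, stop, step)`; the stop value is never included, and the step may be negative to count down. A quick sketch, using `list()` to show all the numbers at once:

```python
first_five = list(range(5))         # [0, 1, 2, 3, 4]
from_two = list(range(2, 6))        # [2, 3, 4, 5]
every_third = list(range(0, 10, 3)) # [0, 3, 6, 9]
countdown = list(range(5, 0, -1))   # [5, 4, 3, 2, 1]
print(first_five, from_two, every_third, countdown)
```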

Sometimes you need to stop a loop early, because some condition requires it; this can be done using the `break` keyword. Similarly, sometimes you want to skip to the next loop iteration; this is what the `continue` keyword is for.

In [None]:
for i in range(10):
    if i == 2:
        continue
    print(i)
    if i == 5:
        break
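A typical use of `continue` and `break` is searching a collection: skip the elements that do not match, and stop as soon as you find the one you want. The sketch below looks for the first multiple of 7 in a made-up list, using the remainder operator `%` to test divisibility.

```python
numbers = [3, 10, 14, 21, 35]
first_multiple = None
for n in numbers:
    if n % 7 != 0:
        continue  # not a multiple of 7, go to the next number
    first_multiple = n
    break  # found it, no need to look at 21 and 35
print("First multiple of 7:", first_multiple)  # 14
```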

## Functions
Functions are the way we describe operations that can be invoked by other parts of the program. Functions may be associated with an _object_, or exist as "global" functions.
Examples of global functions we have seen are the built-in `len()` and `enumerate()` functions. Functions that are associated with an object are generally called _methods_, and we have seen examples of these when dealing with collections, for instance the list `append()` method.

You may want to create a new function to avoid repeating the same code in multiple parts of your program, or to split a complicated task into sub-tasks that are simpler to reason about. The CSV example above does the latter: I split the code into multiple functions, one per sub-task, to hopefully make it easier to follow.

A function in its basic form has 4 components:
1. A **name** (which is really a variable in disguise, referencing the actual function _object_)
2. A list of **parameters**, which are the names of the arguments that the caller passes to the function so it can do its job
3. A **return value**, which is to say the result of the function
4. The **body**, the actual code that describes what the function is supposed to do

In [None]:
# Anatomy of a function definition
# def <function_name>(<parameter-1>, <parameter-2>, ... <parameter-N>):
#     return <return value>
def my_function_name(param_1, param_2):
    # body, do something here!
    return True

# Anatomy of a function call
# returned_value = <function_name>(<argument-1>, <argument-2>, ... <argument-N>)
result = my_function_name("val1", "val2")

When a function is called, the parameter names in the definition are substituted by (or more precisely, bound to) the _objects_ passed as the function call arguments, in order.
The function then operates on them (possibly modifying them) and optionally returns a result.

In [None]:
def example_add(a, b):
    return a + b

print(example_add(10, 30))
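A function returns a single object, but that object can be a tuple, which the caller can unpack into several variables. This is how Python functions commonly "return multiple values". The name `min_and_max` below is a hypothetical example built on the built-in `min()` and `max()` functions.

```python
def min_and_max(values):
    """Return the smallest and largest element of a non-empty list, as a tuple."""
    return min(values), max(values)

# The returned tuple is unpacked directly into two variables
lowest, highest = min_and_max([7, 2, 9, 4])
print(lowest, highest)  # 2 9
```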

In [None]:
def example_return_none():
    print("I do not return any value")

# Note that this will print None, the value returned by a function without a return statement.
# None represents the absence of a return value, and has a special type, as I previously mentioned.
print(example_return_none())


In [None]:
def write_message(msg):
    print("A message:", msg)
    
write_message("Hello")
write_message("The value of msg changes")
write_message("For each call")

Remember multi-line comments with `"""..."""` (which are really multi-line strings)? When placed just below the `def` line, such a string becomes the function's _docstring_, describing what the function does. This text is displayed by the `help()` function, so it is quite useful.

In [None]:
def do_stuff():
    """
    A function that does stuff!
    I can be displayed using help()!
    """
    pass

help(do_stuff)
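`help()` reads the docstring from the function object itself, where it is stored in the special `__doc__` attribute. A short sketch showing that you can access it directly:

```python
def do_stuff():
    """A function that does stuff!"""
    pass

# The docstring is attached to the function object
print(do_stuff.__doc__)  # A function that does stuff!
```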

### Keyword arguments (named parameters)
So far, we have seen functions that use regular _positional_ arguments, which just means that all arguments are mandatory and are mapped in order to the parameters of the function.

In [74]:
def example(a, b, c):
    print(a, b, c)

# Here we set a=1, b=2, c=3
example(1, 2, 3)
# and you cannot call the function with only 2 arguments, e.g.
# example(1, 2)  # this is an error!

1 2 3


In reality, Python doesn't require you to rely on the position of the arguments: you can also pass them by name, as in the example below. These are called _keyword_ arguments. However, to avoid confusion, mandatory arguments are conventionally passed as _positional_ arguments, which keeps function calls more concise for those arguments that must always be present.

In [75]:
def example(a, b, c):
    print(a, b, c)

# Explicitly set the arguments by name
example(c=1, a=2, b=3)

2 3 1


A function can also have _optional_ parameters, which have a default value and may be omitted from the function call. Recall the dictionary `get()` method as an example.

In [79]:
def example(a, b, c="Default value for C"):
    print(a, b, c)

# This is valid, use the positional arguments for all three arguments
example(1, 2, 3)
# This is also valid, use the positional arguments for just the two mandatory parameters
example(1, 2)
# As before you can specify the names of the arguments, although it is generally discouraged
example(b=2, a=1)
# Naming arguments for default parameters is OK though
example(1, 2, c=3)

1 2 3
1 2 Default value for C
1 2 Default value for C
1 2 3


In [None]:
# However, you cannot have a parameter without a default value after one with a default value.
# Running this cell raises a SyntaxError.
def example(a, b="Default", c):
    pass

A function can have multiple _optional_ parameters, and _keyword_ arguments are the natural way to set them. Suppose you want to set parameter `c` but leave `b` at its default value: you simply cannot do that with positional arguments, so you have to use a keyword argument.

In [None]:
def example(a, b=None, c="Default C"):
    print(a, b, c)

example(1, c="Keyword argument C")

Finally, functions can accept a variable number of both positional and keyword arguments. Recall the `print()` function: you can pass it as many values as you like to print out! So how does it do that?

If you run `help(print)`, you will notice that the function definition is `print(*args, sep=' ', end='\n', file=None, flush=False)`. We now know optional parameters, but what is that `*args` thingy? As you may have guessed, the `*` tells Python to collect all the remaining positional arguments into a tuple and bind it to the parameter named `args`.

The name _args_ is only a convention for the parameter that collects a variable number of positional arguments; what matters to Python is the `*`.

In [83]:
help(print)

def example(*args):
    print("args is a", type(args), "and contains", args)

example(1, 2, 3)
example("a", "b", "c")

Help on built-in function print in module builtins:

print(*args, sep=' ', end='\n', file=None, flush=False)
    Prints the values to a stream, or to sys.stdout by default.
    
    sep
      string inserted between values, default a space.
    end
      string appended after the last value, default a newline.
    file
      a file-like object (stream); defaults to the current sys.stdout.
    flush
      whether to forcibly flush the stream.

args is a <class 'tuple'> and contains (1, 2, 3)
args is a <class 'tuple'> and contains ('a', 'b', 'c')
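Regular parameters and `*args` can be combined: the regular parameters take the first positional arguments, and whatever is left over ends up in the tuple. The function `total` below is a hypothetical example that requires at least one number and accepts any number of extra ones.

```python
def total(first, *rest):
    """Add up at least one number; any extra positional arguments end up in 'rest'."""
    result = first
    for number in rest:
        result = result + number
    return result

print(total(5))           # 5: rest is the empty tuple ()
print(total(1, 2, 3, 4))  # 10: rest is (2, 3, 4)
```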


Note that this also works in the reverse direction. Suppose you have a tuple containing some values and you want to call a function using the tuple values as the positional arguments. You can use the `*` notation in the function call.

In [None]:
def regular_function(a, b, c):
    print(a, b, c)

my_args = (1, 2, 3)
regular_function(*my_args)

Finally, you can do the same for keyword arguments, this time using the `**` notation. As before, `**kwargs` is the conventional name for the parameter that collects the keyword arguments. As you can see below, keyword arguments are collected into a dictionary, and conversely you can unpack a dictionary into keyword arguments by using `**` in a function call.

In [85]:
def example(**kwargs):
    print("kwargs is a", type(kwargs), "and contains", kwargs)

example(some=1, random=2, names=3)

kwargs is a <class 'dict'> and contains {'some': 1, 'random': 2, 'names': 3}


In [None]:
def example(a, b, c):
    print(a, b, c)

my_kwargs = {"a": 1, "b": 2, "c": 3}
example(**my_kwargs)
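Collecting and unpacking are often used together to write a _wrapper_: a function that accepts any arguments with `*args` and `**kwargs` and forwards them unchanged to another function, without needing to know its parameters. Both `make_label` and `shout_label` below are hypothetical examples.

```python
def make_label(name, greeting="Hello", punctuation="!"):
    return greeting + ", " + name + punctuation

def shout_label(*args, **kwargs):
    """Forward every argument to make_label(), then upper-case the result."""
    return make_label(*args, **kwargs).upper()

print(shout_label("Ada"))                      # HELLO, ADA!
print(shout_label("Ada", punctuation="?"))     # HELLO, ADA?
print(shout_label("Ada", greeting="Goodbye"))  # GOODBYE, ADA!
```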

## Teatime Break!!
If you reached this point, you should take a break, brew some tea and relax. We are done with the boring basics. The next section will be more interactive, I promise.