# Python basics

## Whitespace

* whitespace in the form of **indentation matters**, e.g. (note the second line's indentation):
  ```
  if condition:
      ...
  ```
    * this is semantically significant: the indented `...` is the block of code executed when the condition is true
* spaces rather than tabs are recommended ([PEP8](https://peps.python.org/pep-0008/)), with 4 spaces per indentation level being customary

In [1]:
if True:
    ...    
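Because indentation is part of the syntax, a missing indent is an error reported by the parser, not a style issue. A minimal sketch that uses the built-in `compile()` to parse a snippet without running it (the `source` string is made up for illustration):

```python
# Source code in which the body of the `if` is not indented
source = "if True:\nprint('hi')\n"

try:
    compile(source, "<example>", "exec")  # only parses, nothing is executed
    result = "compiled fine"
except IndentationError as exc:
    # IndentationError is a subclass of SyntaxError
    result = f"{type(exc).__name__}: {exc.msg}"

print(result)
```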

## Comments

* A hash sign (`#`) introduces a single-line comment
* There are no multi-line comments per se, but one can (ab)use triple-quoted strings instead (more on that later)

In [2]:
# this is a comment

## Hello world

In [3]:
print("Hello world")

Hello world


## What are variables?

Variables are merely names referring to an object. They are, so to speak, aliases for objects.

In [4]:
a = 1 # name a refers to value 1
a

1
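Since a variable is just a name, assigning one variable to another does not copy anything; both names then refer to the same object. A small sketch:

```python
x = 1
y = x          # y now refers to the same object as x
print(y is x)  # True: one object, two names

x = 2          # rebinding x only changes what the name x refers to
print(y)       # 1: y still refers to the original object
```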

## What are types?

In [5]:
type(1)

int

In [6]:
type("hello")

str
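`type()` returns the class object itself, so it can be compared with `is`. For checks that should also accept subclasses, the built-in `isinstance()` is usually the better choice, as this sketch shows:

```python
print(type(1) is int)         # True: type() returns the class object int
print(isinstance(1, int))     # True
print(isinstance(True, int))  # True: bool is a subclass of int
print(type(True) is int)      # False: the exact type of True is bool
```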

## Python is object-oriented, what does that mean?

It means that even the simplest piece of data, like the integer literal `123`, is treated as an object and comes with its very own, type-specific, magic methods and attributes.

In [7]:
dir(123)

['__abs__',
 '__add__',
 '__and__',
 '__bool__',
 '__ceil__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floor__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__le__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'as_integer_ratio',
 'bit_count',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'is_integer',
 'numerator',
 'real',
 'to_bytes']

Names of the form `__something__` are called magic methods (also known as special methods). Non-callable attributes can use the same naming convention, too; `__doc__` in the list above is one example.

> **NB:** the double underscore is everywhere in Python. Because saying "double underscore something double underscore" every time you refer to `__something__` gets tedious, the Python community came up with a short form: **"dunder something"**
>
> And now you can feel like a true Pythonista, knowing such a detail so early on.

### Concrete example

Multiplication is usually expressed using the `*` operator, but you could also call the `__mul__` method on one of the involved operands, passing the other operand as an argument.

In [8]:
123 * 234
123 .__mul__(234)  # the space matters: "123." would be read as the start of a float literal

28782
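The same mechanism applies to your own classes: the `*` operator looks up `__mul__` on the type of the left operand. A minimal sketch with a hypothetical `Money` class (made up for illustration, not part of Python):

```python
class Money:
    """Hypothetical example class, only used to illustrate __mul__."""

    def __init__(self, cents):
        self.cents = cents

    def __mul__(self, factor):
        # Called for `money * factor`
        return Money(self.cents * factor)

    def __repr__(self):
        return f"Money({self.cents})"


price = Money(250)
print(price * 3)         # Money(750)
print(price.__mul__(3))  # the same call, spelled out
```

Note that `3 * price` would raise `TypeError` here, because the reflected operation would need an `__rmul__` method, which this sketch does not define.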

## Errors (exceptions)

* `raise`d, not `throw`n in Python

In [9]:
foobar

NameError: name 'foobar' is not defined

### Exceptions are objects, too

All exceptions in Python are instances of classes that inherit from the built-in exception class `BaseException`. We'll learn more details in the explanation of the `try` compound statement.
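As a small preview of `try`, this sketch catches the `NameError` from above and inspects it like any other object:

```python
try:
    foobar  # the undefined name from above
except NameError as exc:
    error = exc  # keep a reference; `exc` itself is deleted after the except block

print(type(error))                       # <class 'NameError'>
print(isinstance(error, Exception))      # True
print(isinstance(error, BaseException))  # True
```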

## Assignments (... variables continued)

* Parameters are passed by assignment (more on that later)

In [10]:
a = 42
b = a ** 2
print(a)
b

42


1764

## Types continued: `bool`

In [11]:
x = True
if x:
    print(x)

True


In [12]:
x = False
if x:
    print(x)
else:
    print("it's false")

it's false
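An `if` condition does not have to be an actual `bool`; any object can be tested for its truth value. As a sketch of the common rules, the built-in `bool()` shows how a few values are interpreted:

```python
values = [0, 1, "", "text", [], [0]]
for value in values:
    print(f"{value!r:>8} -> {bool(value)}")  # zero and empty containers are false

print(True + True)  # 2: bool is a subclass of int and True equals 1
```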


## `None` singleton

* Singleton: there is exactly one `None` object in a running interpreter, so the idiomatic test is `x is None`
* Used to express "nothingness"; it is falsy, i.e. it evaluates to false in conditions

In [13]:
if None:
    print("what's this?")

(output is empty, because the condition evaluates to false)
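A common place where `None` shows up is the return value of a function without a `return` statement. A short sketch with a hypothetical function:

```python
def no_return():
    pass  # no return statement, so calling this returns None


result = no_return()
print(result is None)  # True: identity check against the singleton
```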

## Types continued: `int`

* `int` for integer
* not limited to 32 or 64-bit range; you can use _really_ big integers in Python without hassle (limited mainly by available memory)
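To illustrate the arbitrary precision, a quick sketch with a value far beyond the 64-bit range:

```python
big = 2 ** 100
print(big)                 # 1267650600228229401496703205376
print(big.bit_length())    # 101 bits, well beyond 64
print(big + 1 - 1 == big)  # True: exact integer arithmetic, no overflow
```

One caveat: in CPython 3.11 and later, converting an `int` with more than 4300 decimal digits to or from `str` raises `ValueError` by default; the limit can be adjusted with `sys.set_int_max_str_digits()`.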

In [14]:
123

123

In [15]:
123 * 234

28782

In [16]:
123 + 234

357

In [17]:
123 / 234
type(123 / 234)

float

In [18]:
334 // 3

111

In [19]:
123 - 34

89

In [20]:
-334 // 3

-112

In [21]:
10 % 3

1

In [22]:
10 // 3

3
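The `-334 // 3` result above may be surprising: `//` rounds towards negative infinity (floor), not towards zero. `//` and `%` are defined so that `(a // b) * b + a % b == a` always holds, and the built-in `divmod()` returns both at once. A sketch:

```python
for a, b in [(10, 3), (-334, 3), (334, -3)]:
    q, r = divmod(a, b)    # same as (a // b, a % b)
    assert q * b + r == a  # the invariant that ties // and % together
    print(f"{a} // {b} = {q}, {a} % {b} = {r}")

print(int(-334 / 3))  # -111: truncating towards zero instead of flooring
```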

## Types continued: `float`

* `float` is short for floating point number
* allows you to express fractional values at the expense of precision; unlike `int`, its range is also limited (in CPython, `float` is an IEEE 754 double-precision number)

In [23]:
123.45

123.45

In [24]:
1/3

0.3333333333333333
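The loss of precision and the limited range can be observed directly, as in this sketch:

```python
import math
import sys

print(0.1 + 0.2)                     # 0.30000000000000004
print(0.1 + 0.2 == 0.3)              # False: 0.1 and 0.2 have no exact binary representation
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead
print(sys.float_info.max)            # 1.7976931348623157e+308, the largest finite float

try:
    float(2 ** 1024)  # fine as an int, but too large for a float
except OverflowError as exc:
    print("OverflowError:", exc)
```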

## Built-in function: `len()`

* `len` is short for length
* returns the length, i.e. the number of elements, of a sequence (or of another container such as a `dict`)
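`len()` works with any object whose type implements `__len__`, not only with strings. A few examples:

```python
print(len([1, 2, 3]))         # 3 elements in a list
print(len((1, 2)))            # 2 elements in a tuple
print(len({"a": 1, "b": 2}))  # 2 keys in a dict
print(len(""))                # 0: the empty string
```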

## Types continued: `str`

* `str` is short for string
* ordered sequence of characters (Unicode code points)
    * stores text

In [25]:
"😁"

'😁'

In [26]:
if "o" in "Hello world":
    print("Found an o")

Found an o


In [27]:
len("Hello world")

11

In [28]:
len("😁")

1

In [29]:
type("hello")

str
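The `len("😁") == 1` result above holds because `len()` counts code points, not bytes. A sketch comparing both (the emoji is written as an escape here so the source stays ASCII):

```python
emoji = "\U0001F601"  # 😁, GRINNING FACE WITH SMILING EYES

print(len(emoji))                  # 1: a single code point
print(len(emoji.encode("utf-8")))  # 4: but four bytes when encoded as UTF-8
print(hex(ord(emoji)))             # 0x1f601: the code point number
```

Keep in mind that some visible characters (e.g. flag emoji) consist of several code points, so `len()` can report more than one for them.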

### ... creating strings from other objects

In [30]:
str(123)

'123'

In [31]:
type(len)

builtin_function_or_method

In [32]:
repr(len)

'<built-in function len>'

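### Beware: `str` is a built-in type that looks like a built-in function

Because `str` is a built-in _type_ (a class) that you call like a function, it _feels_ as if you are casting other types to strings. In reality `str(x)` creates a new string from `x` by invoking the `__str__` magic method of `x`'s type; for objects that don't define their own `__str__`, this falls back to `__repr__`.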
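A minimal sketch with a hypothetical `Point` class (made up for illustration) showing which magic method `str()` and `repr()` end up calling:

```python
class Point:
    """Hypothetical class used only to show __str__ and __repr__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"  # unambiguous, aimed at developers

    def __str__(self):
        return f"({self.x}, {self.y})"  # friendly, aimed at end users


p = Point(1, 2)
print(str(p))     # (1, 2)           -> Point.__str__
print(repr(p))    # Point(x=1, y=2)  -> Point.__repr__
print(type(str))  # <class 'type'>: str is a class
```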

### Slicing

* Allows us to slice and dice strings and other sequences as needed
* The general form is `[start:stop]`, where the element at `stop` is excluded and both indices are optional
  * an optional step size can be given as well, `[start:stop:step]` (e.g.: a step of 2 takes every second element)

In [33]:
hello = "Hello world"
# access last character
print(hello[-1])
# access first character
print(hello[0])
# take characters from index 3 up to, but excluding, index 6 (0-based)
print(hello[3:6])
# take everything from index 3 (0-based)
print(hello[3:])

d
H
lo 
lo world
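The step size and a few other handy slicing idioms, sketched on the same string:

```python
hello = "Hello world"
print(hello[::2])    # Hlowrd: every second character
print(hello[::-1])   # dlrow olleH: a negative step walks backwards
print(hello[-5:])    # world: the last five characters
print(hello[3:100])  # lo world: out-of-range slice bounds are clipped, no IndexError
```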


## Factorial — a small "practical" example

Factorial is a mathematical function that computes the product of all positive integers from 1 through n for a given n.

* The mathematical notation is `3!` for "three factorial"
* General form: `fac(n)` -> `1 * 2 * 3 ... * n`
* For `n == 3`: `fac(3)` -> `1 * 2 * 3` equals 6
* Zero is a special case and `0!` is defined as being `1`

In [34]:
# Recursive implementation, i.e. fac() calls itself with an ever decrementing value for n
def fac(n):
    if n == 0: # if n == 0 we simply return 1, because fac(0) is defined as 1
        return 1
    return n * fac(n - 1) # otherwise we call fac() again with n - 1 until we end up in the above condition
                          # and return the result of n times the call's result

# Iterative implementation, i.e. using a loop to achieve the same as above
def fac2(n):
    result = 1 # our "default" return value
    while n > 0: # loop as long as n is greater than 0
        result = result * n # multiply existing result by current n
        n -= 1 # decrement n by one
    return result # return result

n = 8
# printing results via f-strings
print(f"{fac(3)=}") # call with argument n == 3
print(f"{fac(n)=}") # call with variable n from above
print(f"{fac(n)=:#x}") # formatting representation as hexadecimal (x) with 0x prefix (#)

fac(3)=6
fac(n)=40320
fac(n)=0x9d80
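To check that both implementations agree, a quick sketch compares them with `math.factorial()` from the standard library (the two definitions are repeated without comments so the block runs on its own):

```python
import math


def fac(n):
    if n == 0:
        return 1
    return n * fac(n - 1)


def fac2(n):
    result = 1
    while n > 0:
        result = result * n
        n -= 1
    return result


# Cross-check the recursive and the iterative version for small n
for n in range(10):
    assert fac(n) == fac2(n) == math.factorial(n)
print("all match")
```

The recursive version is bounded by the interpreter's recursion limit (`sys.getrecursionlimit()`, 1000 by default in CPython), so for large `n` the iterative version or `math.factorial()` is the safer choice.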


This example is meant to illustrate a few elements of the Python language and we're going to come back to the individual elements when appropriate.

Right now let's focus on the last three lines that deal with strings.

## Strings continued (type: `str`)

### Modern string formatting with f-strings

So-called f-strings replace the old-style printf-like string formatting and are essentially syntactic sugar for the formatting that was available through the string `format()` method.

* The general form is `f"..."`, i.e. a string with a leading `f` before the literal.
* Inside it we use brace expressions, e.g. `{x}` to format an (ambient) variable `x`.
* We can also affect the _representation_ of the output by giving a format specifier after a `:`
  * In the above example we have `:#x` which has the following effect
    * `x` alone converts the representation to hexadecimal
    * `#` gives the hexadecimal number the conventional `0x` prefix
* There are also conversions like `!s` (the value is passed through `str()`), `!r` (the value is passed through `repr()`) and `!a` (the value is passed through `ascii()`). A conversion is introduced by a `!` and comes _before_ the format specifier, e.g. `{x!r:>10}`.
* To get a verbatim brace character, double it. So `f"{{x}}"` produces the same text as `"{x}"`.
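A short sketch of conversions, their position relative to the format specifier, and doubled braces:

```python
name = "foo"
print(f"{name!s}")      # foo     -> str(name)
print(f"{name!r}")      # 'foo'   -> repr(name)
print(f"{name!r:>7}|")  # conversion first, then the format spec: "  'foo'|"
print(f"{{name}}")      # {name}  -> doubled braces produce literal braces
```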

More information and also examples contrasting the different flavors can be found at [pyformat.info](https://pyformat.info) and [fstring.help](https://fstring.help/); with [cheat sheet here](https://fstring.help/cheat).

#### Old-style formatting (printf-like)

In [35]:
print("%d %s %x" % (123, "foo", 456))

123 foo 1c8


We can achieve something similar by converting `456` to a hexadecimal string through the `hex()` built-in function.

In [36]:
print("%d %s %s" % (123, "foo", hex(456)))

123 foo 0x1c8


#### Using string's `format()` method

Note: there's nothing wrong with this method, and it's a lot better than the old printf-like method. However, it's not quite as succinct as modern f-strings. One reason to use this method is to control which names are available in the format string.

In [37]:
print("{} {} {:#x}".format(123, "foo", 456))

123 foo 0x1c8
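To illustrate the point about controlling names: with `format()` the template can be defined separately from the data (e.g. loaded from a configuration file), and it can only refer to the arguments you pass. A sketch with a made-up template:

```python
# The template could come from elsewhere, e.g. a config file
template = "{name} owes {amount:.2f} EUR"

print(template.format(name="Tom", amount=12.5))  # Tom owes 12.50 EUR

# Only the names passed in are available; a missing one raises KeyError
try:
    template.format(name="Tom")
except KeyError as exc:
    print("missing:", exc)  # missing: 'amount'
```

A template can still reach attributes of the arguments it receives (e.g. `{name.__class__}`), so templates from untrusted sources deserve some care.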


#### **RECOMMENDED:** actual f-strings (introduced in Python 3.6)

Use this method for new code. It's more expressive and more succinct. The corresponding [PEP 498](https://peps.python.org/pep-0498/) explains the intent, reasoning and implementation details.

In [38]:
print(f"{123} {'foo'} {456:#x}")

123 foo 0x1c8


Obviously this is a contrived example. The advantages become more evident when we don't use values but variables.

In [39]:
a = 123
b = "foo"
c = 456
print(f"{a} {b} {c:#x}")

123 foo 0x1c8


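But it gets cooler: since Python 3.8 you can insert an `=` after the expression, too. This way we get to see the expression text, followed by the `=` and the value. By default the value is shown via `repr()` (hence the quotes around `'foo'`), unless a format specifier such as `:#x` is given.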

In [40]:
a = 123
b = "foo"
c = 456
print(f"{a=} {b=} {c=:#x}")

a=123 b='foo' c=0x1c8


But we can do one better. Instead of just tapping into variables you can also call functions directly in those brace expressions.

In [41]:
def fac(n):
    if n == 0: # if n == 0 we simply return 1, because fac(0) is defined as 1
        return 1
    return n * fac(n - 1)

print(f"{fac(8)=}")

fac(8)=40320


In [42]:
def fac(n):
    if n == 0: # if n == 0 we simply return 1, because fac(0) is defined as 1
        return 1
    return n * fac(n - 1)

print(f"{{{fac(8)=}}} ... :-{{") # same as above, but surrounded by braces, with a disappointed smiley
# This is how to get verbatim braces

{fac(8)=40320} ... :-{


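Brace expressions can contain arbitrary Python expressions, including ones that themselves contain double quotes. Reusing the enclosing quote character inside the braces, as in the example below, requires Python 3.12 or later ([PEP 701](https://peps.python.org/pep-0701/)); on older versions use the other quote type inside, e.g. `f"{', '.join(b)}"`.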

In [43]:
b = ["foo", "bar", "baz"]
print(f"{b}")
print(f"{", ".join(b)}")

['foo', 'bar', 'baz']
foo, bar, baz


## Interlude: strings are immutable

In [44]:
foo = "spam"
bar = "ni ni ni"
print(id(foo)) # identity (in CPython usually the address of the object in memory)
print(id(bar))
bar = bar + foo # we assign a totally new object to the name (variable) bar
print(id(bar))

140719535705696
2436250033200
2436250029744


The three distinct values show that each object we passed to the built-in `id()` function has its own identity: `bar + foo` did not modify the string that `bar` referred to, it created a new string object and rebound the name `bar` to it.

Note: as an implementation detail, CPython uses the object's memory address as its ID.
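A more direct way to see the immutability is to try changing a string in place, as in this sketch:

```python
s = "spam"
try:
    s[0] = "S"  # strings don't support item assignment
except TypeError as exc:
    print("TypeError:", exc)

t = "S" + s[1:]  # build a new string instead
print(s, t)      # spam Spam
```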

## Interlude: lists (another sequence type) are mutable

In [45]:
b = [1, 2, 3]
b.append(4) # append single value to the list
b

[1, 2, 3, 4]

In [46]:
b = [1, 2, 3]
# this won't append the individual elements to b, instead it will create a nested list
b.append([4, 5, 6])
b

[1, 2, 3, [4, 5, 6]]

In [47]:
b = [1, 2, 3]
b.extend([4, 5, 6]) # this will extend list b by the elements of the passed list
b

[1, 2, 3, 4, 5, 6]

## The consequences of mutability

Mutability affects whether an object gets manipulated or a new one gets created. This can be demonstrated:

In [48]:
b = [1, 2, 3]
b.extend([4, 5, 6]) # this will extend list b by the list
# b == [1, 2, 3, 4, 5, 6]
c = b # assign b to c, they both now refer to the same object
print(f"BEFORE: {c=}, {b=} ({id(c) == id(b)=})")
print(f"\t{id(c)=:#x}, {id(b)=:#x}")
# don't believe me? let's try: we _only_ change the value at index 3 of b (leaving c untouched)
b[3] = 654321
print(f"AFTER : {c=}, {b=} ({id(c) == id(b)=})")

BEFORE: c=[1, 2, 3, 4, 5, 6], b=[1, 2, 3, 4, 5, 6] (id(c) == id(b)=True)
	id(c)=0x2373bce1100, id(b)=0x2373bce1100
AFTER : c=[1, 2, 3, 654321, 5, 6], b=[1, 2, 3, 654321, 5, 6] (id(c) == id(b)=True)


... huh? ... so if `c` and `b` now refer to the same object, how do you dissociate them?

In [49]:
b = [1, 2, 3]
b.extend([4, 5, 6]) # this will extend list b by the list
# b == [1, 2, 3, 4, 5, 6]
c = b[:] # we use slicing without start and end index to copy the whole list
print(f"BEFORE: {c=}, {b=} ({id(c) == id(b)=})")
print(f"\t{id(c)=:#x}, {id(b)=:#x}")
b[3] = 654321 # just like before
print(f"AFTER : {c=}, {b=} ({id(c) == id(b)=})")

BEFORE: c=[1, 2, 3, 4, 5, 6], b=[1, 2, 3, 4, 5, 6] (id(c) == id(b)=False)
	id(c)=0x2373bce0f80, id(b)=0x2373b6f7a80
AFTER : c=[1, 2, 3, 4, 5, 6], b=[1, 2, 3, 654321, 5, 6] (id(c) == id(b)=False)


As you can see `b` and `c` are now two distinct objects and manipulating the element at `b[3]` does not affect `c` in any way!

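A similar result can be achieved, also for types that have no slicing support (i.e. non-sequence types), with [`copy.deepcopy`](https://docs.python.org/3.13/library/copy.html#copy.deepcopy). Note that `b[:]` creates a _shallow_ copy, whereas `deepcopy()` also copies any objects nested inside.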

In [50]:
b = [1, 2, 3]
b.extend([4, 5, 6]) # this will extend list b by the list
# b == [1, 2, 3, 4, 5, 6]
from copy import deepcopy # import the deepcopy() function from the copy module
c = deepcopy(b)
print(f"BEFORE: {c=}, {b=} ({id(c) == id(b)=})")
print(f"\t{id(c)=:#x}, {id(b)=:#x}")
b[3] = 654321 # just like before
print(f"AFTER : {c=}, {b=} ({id(c) == id(b)=})")

BEFORE: c=[1, 2, 3, 4, 5, 6], b=[1, 2, 3, 4, 5, 6] (id(c) == id(b)=False)
	id(c)=0x2373bce1100, id(b)=0x2373bce1940
AFTER : c=[1, 2, 3, 4, 5, 6], b=[1, 2, 3, 654321, 5, 6] (id(c) == id(b)=False)


As you can see, the same effect!
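The effect is only the same here because `b` holds plain integers. With nested containers the difference between a shallow copy (`b[:]`, or the generic `copy.copy()`) and a deep copy becomes visible. A sketch with made-up data:

```python
from copy import deepcopy

nested = [[1, 2], [3, 4]]
shallow = nested[:]      # new outer list, but the inner lists are shared
deep = deepcopy(nested)  # new outer list and new inner lists

nested[0].append(99)     # mutate an inner list in place

print(shallow)  # [[1, 2, 99], [3, 4]]: affected, it shares nested[0]
print(deep)     # [[1, 2], [3, 4]]: unaffected
```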

## Back to strings: raw strings

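They allow you to type backslashes without having to escape them. (The notebook displays the resulting value via `repr()`, which is why the single backslash shows up doubled in the output below.)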

In [51]:
r"bla\blubb"

'bla\\blubb'
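A quick sketch confirming that the raw string contains a single backslash, and what happens without the `r` prefix:

```python
raw = r"bla\blubb"
print(raw)                  # bla\blubb: one backslash
print(len(raw))             # 9
print(raw == "bla\\blubb")  # True: the same string, written with an escaped backslash
print(len("bla\blubb"))     # 8: without r, \b is a single backspace character
```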

If you want a `"` inside a string without escaping it, you'd normally use a single-quoted string:

In [52]:
'author Tom says "boo"'

'author Tom says "boo"'

If we were to use a double-quoted string, we'd have to escape the double quotes like so:

In [53]:
"author Tom says \"boo\""

'author Tom says "boo"'

Same effect.

## Interlude: string escaping

The purpose is to be able to express characters that cannot otherwise be typed directly, e.g. line breaks (`\n` and `\r`) or tabs (`\t`). Escaping also resolves the conundrum of wanting to use a `"` inside a `"..."` (double-quoted) string or a `'` inside a `'...'` (single-quoted) string.

The escape character is a backslash `\`. But now if you wanted to use a backslash verbatim you'd have to resort to raw strings (see above: `r"...\..."`) or escape the escape character: `"...\\..."`.

* `\n` line feed
* `\r` carriage return
* `\b` backspace
* `\t` tab
* `\\` literal backslash
* ... several more
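Each escape sequence stands for a single character, which a quick sketch makes visible:

```python
print(len("\n"), len("\\"), len(r"\n"))  # 1 1 2: only the raw string keeps both characters
print("col1\tcol2")                      # a tab character between the two words
print(repr("a\tb"))                      # 'a\tb': repr() shows the escape sequence again
print("Tom says \"boo\"")                # Tom says "boo"
```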

### "Regular expression strings"? Nope: raw strings ...

You will often see raw strings in the context of regular expressions. However, the `r` does **not** mean regular expression or some such. It stands for raw.

They occur so often in this context because regular expressions themselves use backslashes as escape characters. So to get a backslash through to the regular expression engine, you first need to sidestep Python's own string escaping.

As an example, consider a regular expression that matches one or more characters from the start of a line (`^`) that are _not_ a line feed (i.e. `[^\n]+`). Both Python string literals and regular expressions understand `\n`. So if we use:

```
"^[^\n]+"
```

Python will already have converted the `\n` to a literal line feed before the regular expression engine sees it. Here that happens to work and match. However, if you want the regular expression engine to see an actual backslash followed by an `n`, you'd have to escape the backslash:

```
"^[^\\n]+"
```

As you can see, this gets hard to read quickly (a regular expression matching a single literal backslash already needs `"\\\\"`). By using a raw string we convey the same without jumping through hoops:

```
r"^[^\n]+"
```
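A sketch using the standard `re` module to confirm that all three spellings match the same lines of a made-up text:

```python
import re

text = "first line\nsecond line\n"

patterns = [
    "^[^\n]+",   # Python turns \n into a real line feed before re sees it
    "^[^\\n]+",  # re receives backslash + n and interprets it as a line feed
    r"^[^\n]+",  # the same string as the previous one, easier to read
]
for pattern in patterns:
    # re.MULTILINE lets ^ match at the start of every line
    print(re.findall(pattern, text, re.MULTILINE))
```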

## Raw strings one last time (for now)

You can even combine raw strings and f-strings like so.

In [54]:
a = "foobar"
fr"^{a}[^\n]+?$"

'^foobar[^\\n]+?$'

It may not be immediately evident, but this is incredibly useful if you want to "build" regular expressions that are to match certain literal values (like `foobar` in this case) within a bigger string.

PS: the meaning of the regular expression is: match a literal `foobar` at the beginning of a line (or string), followed by one or more characters that are _not_ a line feed (`\n`) up until the end of the line (or string).
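One caveat when splicing values into a pattern this way: characters in the value that have a special meaning in regular expressions (such as `.` or `+`) are interpreted by the regex engine. `re.escape()` makes them literal. Also, braces that belong to the regular expression itself (e.g. the quantifier `{2}`) must be doubled inside an f-string. A sketch with a made-up value:

```python
import re

value = "1+1"                       # contains the metacharacter +
unescaped = fr"^{value}$"           # pattern ^1+1$ means "one or more 1, then 1"
escaped = fr"^{re.escape(value)}$"  # pattern ^1\+1$ means the literal text 1+1

print(bool(re.match(unescaped, "1+1")))  # False
print(bool(re.match(escaped, "1+1")))    # True

digits = fr"^{re.escape(value)}\d{{2}}$"  # {{2}} becomes the quantifier {2}
print(bool(re.match(digits, "1+142")))    # True
```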

## Interlude: triple-quoted strings

Triple-quoted strings let you use line breaks and embed single or double quotes freely inside a string.

### Most basic form

In [55]:
"""
Some
very
very
long
multi-line
text
"""

'\nSome\nvery\nvery\nlong\nmulti-line\ntext\n'

The line breaks become part of the string. But imagine you are assigning a triple-quoted string inside an indented block:

In [56]:
if True:
    a = """
    Some
      very
      very
    long
    multi-line
    text
    """
    print(repr(a))

'\n    Some\n      very\n      very\n    long\n    multi-line\n    text\n    '


Suddenly the leading spaces _also_ become part of the string. If you want to keep your code indented according to Python conventions, one remedy is [textwrap.dedent](https://docs.python.org/3/library/textwrap.html#textwrap.dedent):

In [58]:
import textwrap
if True:
    a = textwrap.dedent("""\
    Some
      very
      very
    long
    multi-line
    text
    """)
    print(repr(a))

'Some\n  very\n  very\nlong\nmulti-line\ntext\n'


Notice how I snuck in a trailing backslash (`\`) after the opening `"""`. Inside a string literal, a backslash at the end of a line escapes the line break (a line continuation), so the newline does not become part of the string; this also works with single- and double-quoted strings. The purpose is that the string starts directly with the line containing `Some` (including its four leading spaces) instead of with an empty line, i.e. a leading `\n`. `dedent()` then removes the longest common leading whitespace from all lines; lines consisting only of whitespace are ignored for that computation and normalized to empty lines.

Because only the common indentation is removed, the two additional spaces on the lines containing `very` are still there.
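A side-by-side sketch of `dedent()` with and without the trailing backslash:

```python
import textwrap

with_continuation = textwrap.dedent("""\
    Some
      very
    """)
without_continuation = textwrap.dedent("""
    Some
      very
    """)

print(repr(with_continuation))     # 'Some\n  very\n'
print(repr(without_continuation))  # '\nSome\n  very\n': the leading line break remains
```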

### Doc-strings

Very often you will see so-called doc-strings, which populate the `__doc__` magic attribute and are used, among other things, to auto-generate documentation.

That's where you will most often see triple-quoted strings. But that's not a hard rule. Single- and double-quoted strings work equally well, but may require you to be more explicit when it comes to line breaks.

In [59]:
def foo():
    """\
    Here goes the description
    """
    ...
foo.__doc__

'Here goes the description\n    '
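The missing leading spaces in the output above are likely due to Python 3.13, whose compiler strips common indentation from docstrings; on older versions `__doc__` keeps it. Regardless of version, `inspect.getdoc()` returns a cleaned-up docstring (indentation removed, leading and trailing blank lines stripped). A sketch with a hypothetical function:

```python
import inspect


def greet():
    """Return a greeting.

    The text is indented in the source, but getdoc() removes that.
    """
    return "hello"


print(repr(greet.__doc__))          # raw docstring, exact form depends on the Python version
print(repr(inspect.getdoc(greet)))  # cleaned-up docstring
```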

## That was the gist of basic types with a slight bias towards strings.