# <u>Python basics</u>

In this part of the tutorial, we will cover some of Python's fundamentals. However, we will only scratch the surface. So, if you are new to Python, you can check this tutorial by W3Schools (or any other you like):<br> https://www.w3schools.com/python/default.asp

In [1]:
import torch
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## <u>Data types</u>

Some of Python's standard data types are `bool`, `int`, `float`, `str`, and `dict` (others, such as `list`, appear later in this tutorial). To check the type of a variable, you can use the built-in function `type()`:

In [2]:
w = True
print(type(w))

x = 42
print(type(x))

y = 3.1415926
print(type(y))

<class 'bool'>
<class 'int'>
<class 'float'>


For strings, a common convention is to use single quotes for string literals, e.g. 'my-identifier', and double quotes for strings that are likely to contain single-quote characters (such as error messages, or any string containing natural language), e.g. "You've got an error!". Python treats both forms identically, so you can use them interchangeably without any warning:

In [3]:
z1 = 'grapefruit'
z2 = "I ate a grapefruit"
print(type(z1), type(z2))

<class 'str'> <class 'str'>


In [4]:
thisdict = {
  "key": "value",
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
print(f"{type(thisdict)}\n", thisdict["model"])

<class 'dict'>
 Mustang


As you can see, Python automatically infers the type of a variable when you assign a value to it, even if the variable held a different type before the assignment. There is a way to annotate the type of a variable, but in practice it's mostly for decoration/clarity: Python does not enforce annotations at runtime. Some IDEs may be able to warn you "Hey, you assigned a value with the wrong type!", but the code will run anyway, so be careful:

In [5]:
z: int = 'grapefruit'
print(type(z))

z = 3
print(type(z))

<class 'str'>
<class 'int'>

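Since annotations are not enforced at runtime, a check has to be written by hand if you really want one. Here is a minimal sketch using the built-in `isinstance`; the function name `double_int` is made up for this illustration:

```python
def double_int(value: int) -> int:
    # The annotation alone does not stop a str from being passed in,
    # so we check the type explicitly.
    if not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    return 2 * value

print(double_int(21))

try:
    double_int('grapefruit')
except TypeError as err:
    print(f"TypeError: {err}")
```

Whether such a check is worth it depends on the code; many Python programs rely on annotations plus an IDE or static checker instead.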

You may have noticed that all the types have `class` in front of them.
That's because in Python *everything* is an object, even functions:

In [6]:
def foo(x):
    return x + 1

print(type(foo))

foo = 0.1
print(type(foo))
# the name foo now refers to a float; the function is no longer reachable through it

<class 'function'>
<class 'float'>

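Because functions are objects, they can also be given a second name or passed to other functions like any other value. A small sketch (the names `add_one`, `apply_twice`, and `alias` are only for illustration):

```python
def add_one(x):
    return x + 1

def apply_twice(func, value):
    # func is just another object, so we can call it like any function
    return func(func(value))

alias = add_one  # a second name for the same function object
print(alias is add_one)
print(apply_twice(add_one, 3))
```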

## <u>Basic number operations</u>


In [7]:
x = 5
y = 3
print(f" x = {x}, y = {y}")
print(f"Addition:\n x + y = {x + y}")
print(f"Subtraction:\n x - y = {x - y}")
print(f"Multiplication:\n x * y = {x * y}")
print(f"Float division:\n x / y = {x / y}")
print(f"Integer division:\n x // y = {x // y}")
print(f"Remainder of integer division (modulo):\n x % y = {x % y}")
print(f"Exponentiation:\n x ** y = {x**y}")
x += 1
print(f"Increment:\n x += 1, x = {x}")

 x = 5, y = 3
Addition:
 x + y = 8
Subtraction:
 x - y = 2
Multiplication:
 x * y = 15
Float division:
 x / y = 1.6666666666666667
Integer division:
 x // y = 1
Remainder of integer division (modulo):
 x % y = 2
Exponentiation:
 x ** y = 125
Increment:
 x += 1, x = 6


## <u> Assignment Operators</u> (or augmented assignment operators)

In [8]:
x = 10
print(f"  x = {x}")
x += 1
print(f"Addition Assignment:\n  x += 1, x = {x}")
x -= 2
print(f"Subtraction Assignment:\n  x -= 2, x = {x}")
x *= -3
print(f"Multiplication Assignment:\n x *= -3, x = {x}")
x /= 3
print(f"Division Assignment:\n  x /= 3, x = {x}")
x %= 2
print(f"Modulus Assignment:\n  x %= 2, x = {x}")

  x = 10
Addition Assignment:
  x += 1, x = 11
Subtraction Assignment:
  x -= 2, x = 9
Multiplication Assignment:
 x *= -3, x = -27
Division Assignment:
  x /= 3, x = -9.0
Modulus Assignment:
  x %= 2, x = 1.0


## <u> Strings</u>

Something useful to know is how to format strings, and there are many ways to do so: https://docs.python.org/3/tutorial/inputoutput.html <br>
*Formatted string literals*, or *f-strings*, are the way I usually prefer:

In [9]:
x = 20/3
msg = f"When displaying your results, you can format them like this {x:.2f}, instead of this {x}."
print(msg)

When displaying your results, you can format them like this 6.67, instead of this 6.666666666666667.
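The part after the colon inside the braces is a format specification, and `.2f` is only one of several options. A few other commonly used ones, as a quick sketch (the values of `x` and `n` are arbitrary):

```python
x = 20 / 3
n = 1234567

fixed = f"{x:.3f}"       # fixed-point with 3 decimals
padded = f"{x:8.2f}"     # right-aligned in a field of width 8
grouped = f"{n:,}"       # thousands separator
percent = f"{0.256:.1%}" # value shown as a percentage

print(fixed)
print(padded)
print(grouped)
print(percent)
```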


Some basic operators are overloaded to work with strings:

In [10]:
str1 = "We can"
str2 = " add strings"
str3 = " with '+'."
msg = str1 + str2 + str3
print(msg)

We can add strings with '+'.


In [11]:
msg = "Repetition legitimizes. "
msg *= 6
print(msg)

Repetition legitimizes. Repetition legitimizes. Repetition legitimizes. Repetition legitimizes. Repetition legitimizes. Repetition legitimizes. 


There are many more cool operations you can do with strings that this tutorial doesn't cover, and Python is good for string manipulation in general. You can check out more here:
https://www.w3schools.com/python/python_strings.asp


## <u>Functions</u>

In Python there are two ways to define a function:
- Standard functions, defined with `def`
- Lambda (or anonymous) functions, defined with `lambda`

In [12]:
def foo(x, y):
    return (x + y)**2

foo2 = lambda x,y: (x + y)**2

print(f"type(foo)  = {type(foo)}\ntype(foo2) = {type(foo2)}\n")
print(f"1st function call:\n foo(2,3) = {foo(2, 3)}\n2nd function call:\n foo2(3,2) = {foo2(3, 2)}")

type(foo)  = <class 'function'>
type(foo2) = <class 'function'>

1st function call:
 foo(2,3) = 25
2nd function call:
 foo2(3,2) = 25


### <u>Arguments and keyword arguments</u>

In Python, there are two ways of passing values to the parameters (inputs) of a function: *arguments* (more precisely, *positional arguments*) and *keyword arguments*.

<b>Arguments</b> are values that you pass to the function by specifying only the value (e.g., `foo(4)`). You will often see them referred to as `args` in source code.

<b>Keyword arguments</b> are values that you pass to the function together with the name of the parameter they belong to (e.g., `foo(x=4)`). You will often see them referred to as `kwargs` in source code.

Arguments are assigned to the function's parameters in sequential order. This means that if a function was declared with `def foo(x, y, z):`, it will assign the first argument to `x`, the second to `y`, and the third to `z`.

When keyword arguments are passed, instead, the parameter to be filled is specified by the keyword, so no specific order is required. This means that `foo(z=3, x=1, y=2)` is read as `foo(1, 2, 3)`. However, when a function is called using both arguments and keyword arguments, the arguments always need to come <i>before</i> the keyword arguments. For example, `foo(x=1, 2)` raises a `SyntaxError`.

In [13]:
def foo(x, y):
    return x - y

print(f"Only arguments:\n foo(4, 3) = {foo(4, 3)}")
print(f"Only keyword arguments:\n foo(x=4, y=3) = {foo(x=4, y=3)}")
print(f"Only keyword arguments, but in reverse order:\n foo(y=3, x=4) = {foo(y=3, x=4)}")
print(f"Mix of arguments and keyword arguments:\n foo(4, y=3) = {foo(4, y=3)}")

Only arguments:
 foo(4, 3) = 1
Only keyword arguments:
 foo(x=4, y=3) = 1
Only keyword arguments, but in reverse order:
 foo(y=3, x=4) = 1
Mix of arguments and keyword arguments:
 foo(4, y=3) = 1
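The names `args` and `kwargs` mentioned above usually show up in function definitions of the form `def f(*args, **kwargs)`, which accept any number of arguments and keyword arguments. A minimal sketch of what the function receives (the function name `describe` is hypothetical):

```python
def describe(*args, **kwargs):
    # args is a tuple holding the positional arguments,
    # kwargs is a dict holding the keyword arguments.
    return args, kwargs

args, kwargs = describe(4, 3, y=2, z=1)
print(args)
print(kwargs)
```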


## <u>If-Else statements</u>

In [14]:
x = '0'

if x == 0:
    ans = "x is 0."
elif x in [42, 3.14, "grapefruit"]:
    ans = "x is either 42, 3.14, or 'grapefruit'."
else:
    ans = "Idk what x is."

# One-liner if else:
ans += " But now I do." if x == '0' else " As of now."
print(ans)

Idk what x is. But now I do.


## <u>Loops</u>

In [15]:
for elem in [0, 42, 3.14]:
    print(f"The current element is: {elem}")

The current element is: 0
The current element is: 42
The current element is: 3.14


In [16]:
x = 3
while not x < 1:
  x -= 0.1
print(f"x = {x}")

x = 0.9999999999999983


The built-in function `range` is a common one to generate an iterable:

In [17]:
for elem in range(5):
    print(f"elem = {elem}")

elem = 0
elem = 1
elem = 2
elem = 3
elem = 4


The built-in function `zip` is useful as it allows you to iterate over two or more iterables at the same time:

In [18]:
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]
words = ['one', 'two', 'three']

for letter, number, word in zip(letters, numbers, words):
    print(f"{letter} -> {number} -> {word}")

a -> 1 -> one
b -> 2 -> two
c -> 3 -> three

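One detail worth knowing: when the iterables have different lengths, `zip` stops as soon as the shortest one is exhausted, without raising any error. A quick sketch with two lists of different length:

```python
letters = ['a', 'b', 'c']
numbers = [1, 2]

# 'c' has no partner in numbers, so it is silently dropped
pairs = list(zip(letters, numbers))
print(pairs)
```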

## <u>Lists</u>

Python lists work like arrays, but their elements are allowed to be of different types.


In [19]:
mylist = [3.14, 'e', 42, "it is 1.62", foo, thisdict]
print(f" mylist = {mylist}")

# indices start from zero
print(f"1st element:\n mylist[0] = {mylist[0]}")
print(f"3rd element:\n mylist[2] = {mylist[2]}")
print(f"Last element:\n mylist[-1] = {mylist[-1]}")
print(f"1st element (negative indexing):\n mylist[-6] = {mylist[-6]}")

print(f"Type of 5th element:\n type(mylist[4]) = {type(mylist[4])}")
print(f"Type of 2nd element:\n type(mylist[1]) = {type(mylist[1])}")

 mylist = [3.14, 'e', 42, 'it is 1.62', <function foo at 0x7c8e5e99b760>, {'key': 'value', 'brand': 'Ford', 'model': 'Mustang', 'year': 1964}]
1st element:
 mylist[0] = 3.14
3rd element:
 mylist[2] = 42
Last element:
 mylist[-1] = {'key': 'value', 'brand': 'Ford', 'model': 'Mustang', 'year': 1964}
1st element (negative indexing):
 mylist[-6] = 3.14
Type of 5th element:
 type(mylist[4]) = <class 'function'>
Type of 2nd element:
 type(mylist[1]) = <class 'str'>


Some commonly used built-in functions and list methods that will prove useful:

In [20]:
mylist = [2, 3, 5, 2, [6, 5, 6]]
print(f"mylist = {mylist}")
print(f"len(mylist) = {len(mylist)}")

mylist.remove([6, 5, 6])
print(f"mylist = {mylist}")

mylist.append(-1)
print(f"sorted(mylist) = {sorted(mylist)}")

print(f"min(mylist) = {min(mylist)}\nmax(mylist) = {max(mylist)}")
print(f"sum(mylist) = {sum(mylist)}")

mylist = [2, 3, 5, 2, [6, 5, 6]]
len(mylist) = 5
mylist = [2, 3, 5, 2]
sorted(mylist) = [-1, 2, 2, 3, 5]
min(mylist) = -1
max(mylist) = 5
sum(mylist) = 11


## <u> Slicing </u>

In Python, you can get sublists (i.e., slices) of a list using the *slice notation*. In general, the slice notation has the following form and meaning:
- `mylist[start:stop:step]` = return the elements of `mylist` starting at index `start` and stopping before index `stop` (so, for a positive step, the last candidate is `stop-1`), moving `step` indices at a time.

If `start` is not specified (empty before the first colon), it defaults to the beginning of the list.

Likewise, if `stop` is not specified (empty after the first colon), it defaults to the end of the list.

If not specified, `step` defaults to 1. With a negative `step` the defaults flip, so the slice runs from the end of the list back to the beginning.

In [21]:
mylist = [3.14, 'e', 42, "it is 1.62", foo, thisdict["year"]]

bar = (2**7)*'-' + '\n'

print(f"{bar}length of list = {len(mylist)} total elements.")

print(f"{bar}First 3 elements = {mylist[0:3]} same as {mylist[:3]}")

print(f"{bar}3rd to 5th = {mylist[2:5]}")

print(f"{bar}From 4th to the last = {mylist[3:]}")

print(f"{bar}From 4th to next to last = {mylist[3:-1]}")

print(f"{bar}Reversed list = {mylist[::-1]}") # start from the end and go to the start with a step of -1.

--------------------------------------------------------------------------------------------------------------------------------
length of list = 6 total elements.
--------------------------------------------------------------------------------------------------------------------------------
First 3 elements = [3.14, 'e', 42] same as [3.14, 'e', 42]
--------------------------------------------------------------------------------------------------------------------------------
3rd to 5th = [42, 'it is 1.62', <function foo at 0x7c8e5e99b760>]
--------------------------------------------------------------------------------------------------------------------------------
From 4th to the last = ['it is 1.62', <function foo at 0x7c8e5e99b760>, 1964]
--------------------------------------------------------------------------------------------------------------------------------
From 4th to next to last = ['it is 1.62', <function foo at 0x7c8e5e99b760>]
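--------------------------------------------------------------------------------------------------------------------------------
Reversed list = [1964, <function foo at 0x7c8e5e99b760>, 'it is 1.62', 42, 'e', 3.14]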
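The `step` part of the slice notation is easier to see on a list of consecutive numbers. A short sketch (the list `digits` is just an example):

```python
digits = list(range(10))  # [0, 1, ..., 9]

every_other = digits[::2]         # start and stop omitted, step 2
odd_positions = digits[1::2]      # same, but starting at index 1
last_three = digits[-3:]          # negative start counts from the end
middle_backwards = digits[7:2:-1] # from index 7 down to index 3

print(every_other)
print(odd_positions)
print(last_three)
print(middle_backwards)
```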

### <u>List comprehension</u>

If you come from a C/C++ background, you will be tempted to spam for-loops all the time. In Python, explicit for-loops are comparatively slow, and in some cases there are better alternatives. For example, when you want to perform some operation <i>independently</i> on all the elements of a list,
a better alternative could be to use a <b>list comprehension</b>. <br>
The caveat here is the "independently": the result for each element must not depend on the results for the other elements, so that the operation could in principle be performed in parallel on all the elements of the list. <br>
Toy Example:
- Add a constant, say 1, to each element of a list.

In [22]:
# randomly select 2^10 numbers in [0, 99] to make a list
mylist = random.choices(range(100), k=2**10)
print(type(mylist))

<class 'list'>


With normal for-loops:

In [23]:
def test_function(somelist):
    newlist = []
    for elem in somelist:
        newlist.append(elem + 1)
    return newlist

%timeit -r 40 -n 1000 test_function(mylist)

The slowest run took 5.38 times longer than the fastest. This could mean that an intermediate result is being cached.
191 µs ± 90.2 µs per loop (mean ± std. dev. of 40 runs, 1000 loops each)


With list comprehension:

In [24]:
def test_function2(somelist):
    return [elem + 1 for elem in somelist]

%timeit -r 40 -n 1000 test_function2(mylist)

The slowest run took 4.64 times longer than the fastest. This could mean that an intermediate result is being cached.
66.7 µs ± 35.4 µs per loop (mean ± std. dev. of 40 runs, 1000 loops each)


In [25]:
# print to see that they're identical
print(f"First method (for-loop): {' '*10} {test_function(mylist[10:20:2])}")
print(f"Second method (list comprehension): {test_function2(mylist[10:20:2])}")

First method (for-loop):            [82, 11, 35, 49, 80]
Second method (list comprehension): [82, 11, 35, 49, 80]


In this toy example, the difference might not be *so* relevant/important, but for more complex operations it can make a difference and, do not forget, it *scales* with more data. In general, when we want to do numerical operations, we will prefer NumPy arrays (when we can) over lists, because they are significantly faster. We will cover them later.
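To make the "independently" caveat concrete, here is a sketch that puts an independent operation (add 1 to each element) next to one that is not independent (a running total, where each result depends on the previous one). The running total is written as an explicit loop; `itertools.accumulate` from the standard library covers this particular pattern. The list `values` is just an example:

```python
from itertools import accumulate

values = [1, 2, 3, 4]

# Independent: each output depends only on its own input element.
plus_one = [v + 1 for v in values]

# Not independent: each output depends on all the previous elements,
# so we keep a running total in an explicit loop.
running = []
total = 0
for v in values:
    total += v
    running.append(total)

print(plus_one)
print(running)
print(list(accumulate(values)))  # same result as the loop
```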

Now, if you're wondering what in the world that `%timeit` thing is, it is just a built-in IPython/Jupyter magic command (that's what they are really called), which can be used to time Python statements. The `-r` parameter sets how many times the measurement is repeated, and `-n` sets how many times the statement is executed in each repetition. For more details, check: https://ipython.readthedocs.io/en/stable/interactive/magics.html
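Outside of Jupyter, `%timeit` is not available, but the standard library module `timeit` does a similar job. The sketch below is one possible way to time a function with it, not an exact reproduction of what the magic command does; the names `add_one_to_all` and `data`, and the values chosen for `repeat` and `number`, are arbitrary:

```python
import timeit

def add_one_to_all(somelist):
    return [elem + 1 for elem in somelist]

data = list(range(1024))

# repeat and number play roles similar to -r and -n in %timeit:
# 5 measurements, each one calling the function 200 times.
timings = timeit.repeat(lambda: add_one_to_all(data), repeat=5, number=200)

best_per_call = min(timings) / 200
print(f"best: {best_per_call * 1e6:.1f} µs per call")
```

Each entry of `timings` is the total time in seconds for one batch of 200 calls, which is why the result is divided by 200.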

## <u>Containers</u>

Lists, like NumPy arrays or PyTorch tensors, are `Containers`.
Containers like these are <i>mutable</i>, and in practice they behave as if they were passed <i>by reference</i> to a function, unlike immutable objects such as integers and floats, which behave as if they were passed <i>by value</i>.

Strictly speaking, Python passes every argument the same way: the function receives a reference to the same object, never a copy. The difference is in what the function can do with it. An integer or a float cannot be modified in place, so an assignment like `number = number + 1` inside a function only rebinds the local name to a new object.
So the original object is not affected.

Conversely, a list can be modified in place, and all the in-place modifications that a function makes are reflected in the original object.
In this respect, they work like pointers in C/C++:

In [26]:
def increment_number(number):
    number = number + 1
    print(f"Number inside the function call:  {number}")
    return # nothing is returned

def replace_element(somelist, index, value):
    somelist[index] = value
    print(f"List inside the function call: {somelist}")
    # also in this case nothing is returned

# A number is immutable, so it behaves as if passed by value:
mynum = 4
print(f"Number initialization:{' '*12}{mynum}")
increment_number(mynum)
print(f"Number after the function call:{' '*3}{mynum}")

# A list is a mutable container, so it behaves as if passed by reference:
mylist = [3.14, 'e', 42, 1729, 1.41, 1.62]
print(f"list initialization: {' '*10}{mylist}")
replace_element(mylist, 1, 2.71)
print(f"List after the function call:  {mylist}")

Number initialization:            4
Number inside the function call:  5
Number after the function call:   4
list initialization:           [3.14, 'e', 42, 1729, 1.41, 1.62]
List inside the function call: [3.14, 2.71, 42, 1729, 1.41, 1.62]
List after the function call:  [3.14, 2.71, 42, 1729, 1.41, 1.62]
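If you do not want a function to touch your list, one common option is to hand it a copy. A minimal sketch using `list.copy()`, which makes a shallow copy (the variable names are only for illustration):

```python
def replace_element(somelist, index, value):
    somelist[index] = value

# Passing a copy: the function modifies the copy, not our list.
untouched = [3.14, 'e', 42]
replace_element(untouched.copy(), 1, 2.71)
print(untouched)

# Passing the list itself: the function modifies it in place.
modified = [3.14, 'e', 42]
replace_element(modified, 1, 2.71)
print(modified)
```

Note that a shallow copy only duplicates the outer list; nested containers inside it would still be shared.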


## <u>Generators and Iterators</u>

In Python, generators allow you to create sequences of objects lazily, one element at a time, rather than allocating them all in advance. These will be very useful for handling datasets.

For example, suppose that we want to write a script that iterates over a specific numerical sequence (e.g., Fibonacci), for an unknown number of iterations.

It would be pretty inefficient to allocate in advance a super large list containing all the numbers of the sequence, right?
Here is where generators come into play.

Generators are defined similarly to functions, but instead of returning objects, they `yield` them.
This means that the next time a value is requested from the generator, the code resumes right after the last `yield`.

Let's look at a super simple example to understand this better.

In [27]:
def create_dummy_generator():
    print("call #1")
    yield 3
    print("call #2")
    yield 1
    print("call #3")
    yield 4
    print("call #4")
print(type(create_dummy_generator))

my_generator = create_dummy_generator()
print(my_generator)

<class 'function'>
<generator object create_dummy_generator at 0x7c8e5e9c5150>


You cannot get elements from a generator via subscripting like you do for lists. If you try to run something like `my_generator[2]`, it will raise a `TypeError`. One way to get items from a generator is through an <u>iterator</u> that takes elements in order. You can obtain one with `iter(my_generator)`; a generator object is already its own iterator, so this call simply returns the generator itself. To get elements from an iterator, you can use the built-in function `next()`:

In [28]:
my_generator = create_dummy_generator()
my_iterator = iter(my_generator)

print(next(my_iterator))
print(next(my_iterator))
print(next(my_iterator))
# print(next(my_iterator)) # this will print "call #4" and raise StopIteration

call #1
3
call #2
1
call #3
4
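Two small details about generators and `next()` can be checked directly: calling `iter()` on a generator gives back the very same object, and `next()` accepts a default value that is returned instead of raising `StopIteration`. A short sketch (the generator `count_to_three` is made up for this example):

```python
def count_to_three():
    yield 1
    yield 2
    yield 3

gen = count_to_three()

same_object = iter(gen) is gen  # a generator is its own iterator
first = next(gen)               # consumes the first element
rest = list(gen)                # consumes whatever is left
leftover = next(gen, None)      # exhausted: the default is returned

print(same_object, first, rest, leftover)
```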


Alternatively, you can just loop through all the elements of a generator. An advantage is that the `for` loop catches `StopIteration` for you and ends cleanly:

In [29]:
my_generator = create_dummy_generator()

for elem in my_generator:
  print(elem)

call #1
3
call #2
1
call #3
4
call #4


Let's try to implement a simple generator that produces a range at runtime:

In [30]:
def range_generator(start, stop, step):
    curr_val = start
    while curr_val < stop:
        yield curr_val
        curr_val += step  # code is resumed here after the yield

my_range = range_generator(start=0, stop=5, step=2)

for i in my_range:
  print(i)

0
2
4


If we do not want to use `for`, then we can just loop until `StopIteration` is raised:

In [31]:
stp = random.randint(5, 20)
print(f"stop = {stp}")

my_range = range_generator(start=0, stop=stp, step=2)
range_iter = iter(my_range)

while True:
    try:
        value = next(range_iter)
        print(value)
    except StopIteration:
        # Iterator is exhausted, exit the loop:
        break

stop = 17
0
2
4
6
8
10
12
14
16


## Directory: one command to rule them all

The built-in function `dir` lists the attributes and methods of an object (remember: in Python everything is an object).

In [32]:
print(dir(mylist))

['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


All the strings contained in `dir(mylist)` are either methods or attributes of `mylist`.

### Dunder methods:

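Some notable methods are those that start and end with double underscores, like `__dir__`. Those are called "magic methods" or "dunder methods". You rarely call them directly; instead, Python calls them for you when you use the corresponding built-in function or operator. So, for many of them, these two forms are equivalent:
- `mylist.__method__()`
- `method(mylist)`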

In [33]:
print("Length of mylist:", len(mylist))
print("Also the length of mylist:", mylist.__len__())

Length of mylist: 6
Also the length of mylist: 6


### Two Notable Dunder Methods: how Python works under the hood

- `__getitem__`: This method is called when subscripting an object. When you write `mylist[idx]`, Python actually calls `mylist.__getitem__(idx)`

- `__iter__`: This method is called when you use `iter(my_iterable_obj)`. You can check if an object has this method to determine if it is iterable:

In [34]:
my_num = 42

if '__iter__' in dir(my_num):
    print("Numbers are iterable.")
else:
    print("Numbers are NOT iterable.")

my_generator = range_generator(start=16, stop=32, step=2)

if '__iter__' in dir(my_generator):
    print("Generators are iterable.")
else:
    print("Generators are NOT iterable.")

Numbers are NOT iterable.
Generators are iterable.


In [35]:
if '__getitem__' in dir(my_generator):
    print("Generators are subscriptable.")
else:
    print("Generators are NOT subscriptable.")

my_list = [elem for elem in my_generator]

if '__getitem__' in dir(my_list):
    print("Lists are subscriptable.")
else:
    print("Lists are NOT subscriptable.")

Generators are NOT subscriptable.
Lists are subscriptable.


## <u>Classes</u>

We will use classes to define our models later on, but only the basics are necessary.

In [36]:
class Exponentiation():
  """Here we can enter information about what our class does."""

  def __init__(self, exp=2, id="None"):
    """Default constructor needs argument exp and an id."""
    self.exp = exp
    self.name = f"CustomLayer(exp={exp}, id={id})"

  def forward(self, x):
    """Method that calls the operation."""
    return [elem**self.exp for elem in x]

In [37]:
sqrt = Exponentiation(exp=0.5)
print(sqrt.name)

inpt = [4, 1, 25, 0, 49]
outpt = sqrt.forward(inpt)
print(outpt)

cube = Exponentiation(exp=3, id="cube")
print(cube.name)

outpt = cube.forward(inpt)
print(outpt)

CustomLayer(exp=0.5, id=None)
[2.0, 1.0, 5.0, 0.0, 7.0]
CustomLayer(exp=3, id=cube)
[64, 1, 15625, 0, 117649]


In [38]:
class CustomOperation():
  """Write what your custom operation class does."""

  def __init__(self, num_layers: int):
    """Default constructor of the class."""
    self.num_layers = num_layers

    model = []
    for i in range(num_layers):
      model += [
          # other operations could be added here
          Exponentiation(id=i),
          # other operations could be added here
      ]
    self.model = model

  def summary(self):
    """This could be a method to print the architecture of a model."""
    out = ""
    for layer in self.model:
      out += layer.name + "\n"
    print(out)
    return

  def forward(self, x):
    """Here we would implement the forward pass in the case of a network."""
    for layer in self.model:
      x = layer.forward(x)
    return x


In [41]:
model = CustomOperation(num_layers=3)

inpt = [2, 0, 1, -1, -2]
print(inpt)

outpt = model.forward(inpt)
print(outpt)

model.summary()

[2, 0, 1, -1, -2]
[256, 0, 1, 1, 256]
CustomLayer(exp=2, id=0)
CustomLayer(exp=2, id=1)
CustomLayer(exp=2, id=2)
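Tying classes back to the dunder methods from the previous section: if a class defines `__len__` and `__getitem__`, its objects work with `len()` and with square-bracket subscripting. A minimal sketch (the class `Squares` is hypothetical and only meant as an illustration):

```python
class Squares():
  """Container-like object holding the squares 0, 1, 4, ..., (n-1)**2."""

  def __init__(self, n):
    self.n = n

  def __len__(self):
    """Called by len(obj)."""
    return self.n

  def __getitem__(self, idx):
    """Called by obj[idx]."""
    if not 0 <= idx < self.n:
      raise IndexError("index out of range")
    return idx**2

sq = Squares(5)
print(len(sq))
print(sq[3])
```

The squares are computed when they are requested, so nothing is stored in advance.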

