# Intro To Python

## Introduction

The only hard requirement for this course is knowing a bit about the Python programming language.

In this introductory chapter we quickly go over some features of Python. This should be enough for you to understand the code presented in the rest of the chapters.

These slides were made with both people who are comfortable with programming and beginners in mind. Don't worry if you do not understand some programming terms; just ignore them.

## Python

Python is an interpreted, dynamically typed programming language. Code blocks are delimited by indentation rather than braces.

Here is a simple program in Python:

In [1]:
x = 3

if x % 2 == 0:
  print(f"x={x} is even")
else:
  print(f"x={x} is odd")

x=3 is odd


## Python

Python has no `else if`; the equivalent keyword is `elif`:

In [2]:
x = 5

if x == 0:
  print(f"x={x} is equal to zero")
elif x > 0:
  print(f"x={x} is positive")
else:
  print(f"x={x} is negative")

x=5 is positive


## Python

Here is how to loop:

In [3]:
for _ in range(3):
  print("Hello from for loop!")

i = 0
while i < 3:
  print("Hello from while loop!")
  i += 1

Hello from for loop!
Hello from for loop!
Hello from for loop!
Hello from while loop!
Hello from while loop!
Hello from while loop!


## Python

A variable's type is determined by the value assigned to it.

In [4]:
x = 5
print(f"x type is {type(x)}")
y = 5.
print(f"y type is {type(y)}")

x type is <class 'int'>
y type is <class 'float'>
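
Because the type belongs to the value rather than to the name, the same name can later be bound to a value of a different type. A minimal sketch (the name `value` is only for illustration):

```python
value = 5
print(type(value))  # int

value = "five"  # Rebinding the same name to a string is allowed
print(type(value))  # str

# isinstance checks the type of the object a name currently refers to
print(isinstance(value, str))
```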


## Python

You can use either single or double quotes for string literals.

In [5]:
print(type("Double quotes"))
print(type('Single quotes'))

<class 'str'>
<class 'str'>


## Lists

If you want to store multiple values in a single variable you can use lists. Under the hood lists are implemented as dynamic arrays.

In [6]:
#| output-location: slide
positions = ["First", "Second", "Third"] # List literals are denoted by square brackets

positions.append("Fourth") # Appending to a list, this appends in place
positions = [*positions, "Fifth"] # * - spread or unpacking operator

for position in positions: # Looping over lists
  print(position)
print("-------------------")

for idx, position in enumerate(positions): # If you also need the index of the list element
  print(f"{position}:{idx+1}")

First
Second
Third
Fourth
Fifth
-------------------
First:1
Second:2
Third:3
Fourth:4
Fifth:5


## Lists

Python makes it easy to slice lists:

In [7]:
primes = [2, 3, 5, 7, 11, 13, 17, 19]
print(primes[3]) # Get element of list
print(primes[1:3]) # Get a slice of a list
print(primes[:3]) # An omitted start defaults to the beginning of the list
print(primes[-3:-1]) # Negative numbers index from the end
print(primes[::2]) # You can also specify the step size for indexing

7
[3, 5]
[2, 3, 5]
[13, 17]
[2, 5, 11, 17]


## Lists

Python also has list comprehensions:

In [8]:
numbers = [number for number in range(5)]
print([number**2 for number in numbers]) # ** - raising to a power
print([number for number in numbers if number % 2 == 0]) # You can also do ifs
print([number if number % 2 == 0 else -number for number in numbers]) # And if else, although the syntax is terrible

[0, 1, 4, 9, 16]
[0, 2, 4]
[0, -1, 2, -3, 4]
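
A comprehension is shorthand for a loop that appends to a list. The sketch below builds the same list both ways; the names `squares_of_evens` and `loop_result` are made up for this illustration.

```python
numbers = [0, 1, 2, 3, 4]

# Comprehension: transform and filter in one expression
squares_of_evens = [number**2 for number in numbers if number % 2 == 0]

# Equivalent explicit loop
loop_result = []
for number in numbers:
  if number % 2 == 0:
    loop_result.append(number**2)

print(squares_of_evens)
print(squares_of_evens == loop_result)
```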


## Functions

Functions are defined with the `def` keyword. A function can return no value, one value or multiple values. If a function does not explicitly return anything, its return value is `None`.

In [9]:
#| output-location: slide
def void():
  print("Hello!")

print(void())

def next(x):
  return x+1

print(next(1))

def next_two(x):
  return x+1, x+2

print(next_two(1))

Hello!
None
2
(2, 3)
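
Multiple return values are packed into a single tuple, which is why the last call prints `(2, 3)`. A short sketch of how the caller can unpack them (the names `result`, `a` and `b` are only for illustration):

```python
def next_two(x):
  return x+1, x+2

result = next_two(1)
print(type(result))  # The two values come back as one tuple

a, b = next_two(1)  # Tuple unpacking assigns each value to its own name
print(a, b)
```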


## Functions

All variables in Python are actually pointers to objects. Function arguments are passed by value, but since the values being passed are pointers, the behavior can be confusing at first glance. It's best to illustrate with an example:

## Functions

In [10]:
#| output-location: slide
def next(x):
  # x inside the function scope is a new pointer that references the object that the input pointer references
  x += 1 # this line creates a new object in memory equal to x + 1 and makes x reference it
         # in consequence, the changes to x will not be visible outside the function scope
  return x

x = 1
print(next(x))
print(x)
print("--------------")

def append_1_inplace(x):
  x.append(1) # this line changes x in place so the change will be visible outside the function scope
  return x

x = [0]
print(append_1_inplace(x))
print(x)
print("--------------")

def append_1(x):
  x = [*x, 1] # this line creates a new object in memory and makes x reference it
              # in consequence, the changes to x will not be visible outside the function scope
  return x

x = [0]
print(append_1(x))
print(x)
print("--------------")

2
1
--------------
[0, 1]
[0, 1]
--------------
[0, 1]
[0]
--------------


## Functions

Python has anonymous functions. They are quite restrictive: for example, the body can only be a single expression.

In [11]:
def eval(f, x):
  return f(x)

print(
  eval(
    lambda x: x**2, # Anonymous function
    2
  )
)

4
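
Anonymous functions are most often passed as short callbacks to other functions. One common case is the `key` argument of the built-in `sorted`. A minimal sketch, with a made-up list `words`:

```python
words = ["banana", "fig", "cherry"]

# key receives each element and returns the value to sort by
by_length = sorted(words, key=lambda word: len(word))
print(by_length)

# The same thing with a named function, for comparison
def word_length(word):
  return len(word)

print(sorted(words, key=word_length))
```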


## Dictionaries

Dictionaries store key-value pairs.

In [12]:
#| output-location: slide
positions = {"First": 1} # Dictionary literal
positions["Second"] = 2 # Adding a new key-value pair
positions = {**positions, "Third": 3} # You can use ** to spread dictionaries

for key in positions: # Looping over keys
  print(key)
print("-----------")

for key, value in positions.items():
  print(f"{key}:{value}")

positions["Fourth"] # Accessing a key that does not exist raises a KeyError

First
Second
Third
-----------
First:1
Second:2
Third:3


KeyError: 'Fourth'
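
If a key might be missing, there are ways to look it up without triggering a `KeyError`. A small sketch using the `in` operator, the dictionary method `get` and a `try`/`except` block:

```python
positions = {"First": 1, "Second": 2, "Third": 3}

print("Fourth" in positions)  # Membership test on the keys
print(positions.get("Fourth"))  # get returns None when the key is missing
print(positions.get("Fourth", 4))  # A default can be given as the second argument

try:
  positions["Fourth"]
except KeyError as error:
  print(f"Missing key: {error}")
```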

## Classes and Objects

Python is an object-oriented language. In fact, under the hood, everything in Python is an object.

Here is how to define classes:

In [13]:
class Vect2:
  def __init__(self, x, y): # Constructor
    self.x = x
    self.y = y

  def norm(self):
    return (self.x**2 + self.y**2)**(1/2)

vector = Vect2(3, 4)
print(vector.norm())

5.0


## Classes and Objects

The first argument of an instance method is always the object the method was called on. By convention this argument is called `self` (other languages achieve the same thing with the keyword `this`). If you forget `self` you will get cryptic errors!
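
To see what such an error looks like, here is a minimal sketch. The class `Greeter` is made up for illustration, and the exact wording of the message may differ between Python versions.

```python
class Greeter:
  def greet():  # Bug on purpose: self is missing
    return "Hello!"

greeter = Greeter()
try:
  greeter.greet()  # Python still passes greeter as the first argument
except TypeError as error:
  message = str(error)
  print(message)
```

The message complains about the number of arguments even though the call site passes none, which is what makes it confusing.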

## Classes and Objects

The method `__init__` is called the constructor. It runs when an instance of a class is created.

Methods whose names begin and end with a double underscore are built-in methods used to implement operator overloading and some other things. They are called magic or dunder methods. The full list can be found [here](https://docs.python.org/3/reference/datamodel.html#special-method-names).

## Classes and Objects

In [14]:
class Vect2:
  def __init__(self, x, y): # Constructor
    self.x = x
    self.y = y

  def __add__(self, other): # Implements the "+" operation on Vect2
    return Vect2(self.x+other.x, self.y+other.y)

  def __str__(self): # This method gets called when you do str(object), for example when trying to print the object
    return f"Vect2({self.x}, {self.y})"

  def norm(self):
    return (self.x**2 + self.y**2)**(1/2)

vector1 = Vect2(1, 2)
vector2 = Vect2(3, 4)
print(vector1 + vector2)

Vect2(4, 6)


## Classes and Objects

Python supports inheritance. It is done as follows:

In [15]:
class Animal:
  def __init__(self, no_legs):
    self.no_legs = no_legs

class Dog(Animal):
  def __init__(self):
    super().__init__(4) # super() gives access to the methods of the parent class

  def speak(self):
    return "Woof!"

dog = Dog()
print(f"Dog has {dog.no_legs} legs and says '{dog.speak()}'.")

Dog has 4 legs and says 'Woof!'.


## Classes and Objects

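In Python all class fields and methods are public. By convention, a field or method name that starts with an underscore signals that you probably shouldn't use it from outside the class. If the name starts with a double underscore, you really shouldn't use it: Python additionally mangles such names to `_ClassName__name`, which makes accidental access harder.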

In [16]:
class SecretContainer():
  def __init__(self):
    self._secret = "This is sorta private"
    self.__super_secret = "This is really private!"
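
A short sketch of what happens when these fields are accessed from outside the class. It repeats the class definition so that it runs on its own; the variable name `container` is only for illustration.

```python
class SecretContainer:
  def __init__(self):
    self._secret = "This is sorta private"
    self.__super_secret = "This is really private!"

container = SecretContainer()
print(container._secret)  # Works, the single underscore is only a convention

try:
  container.__super_secret
except AttributeError as error:
  print(f"AttributeError: {error}")

# The field was stored under a mangled name that includes the class name
print(container._SecretContainer__super_secret)
```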

## Classes and Objects

Python has a garbage collector. An object's memory is freed once there are no references to it. The `del` keyword removes a name, and with it one reference to the object; the object itself is freed only when no other references remain.

In [17]:
vector = Vect2(1, 2)
print(vector)
del vector
print(vector) # Will get an error because vector does not exist anymore

Vect2(1, 2)


NameError: name 'vector' is not defined
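
Note that `del` removes a name, not necessarily the object behind it. A minimal sketch with two names for the same list (`data` and `alias` are made-up names):

```python
data = [1, 2, 3]
alias = data  # A second name for the same list object

del data  # Removes the name data, not the list itself
print(alias)  # The list is still alive because alias references it

try:
  print(data)
except NameError as error:
  print(f"NameError: {error}")
```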

## Numpy

`Numpy` is a Python package for scientific computing. Its main feature is fixed-size, fixed-type multidimensional arrays. `Numpy` also implements a lot of operations on these arrays. Under the hood most of `numpy` is implemented in C, so performing operations on `numpy` arrays is very quick.

Here are some examples:

## Numpy

In [18]:
#| output-location: slide
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
print(array1 + array2) # Sums componentwise
print(array1 * array2) # Multiplies componentwise
print("----------------")

matrix1 = np.array([
  [1, 2],
  [3, 4]
])
matrix2 = np.array([
  [5, 6],
  [7, 8]
])
print(matrix1 * matrix2) # Multiplies componentwise
print("----------------")
print(matrix1 @ matrix2) # Matrix multiplication, as defined in a linear algebra course
print("----------------")
print(np.linalg.eigvals(matrix1)) # Computes eigenvalues of matrix

[5 7 9]
[ 4 10 18]
----------------
[[ 5 12]
 [21 32]]
----------------
[[19 22]
 [43 50]]
----------------
[-0.37228132  5.37228132]
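
The fixed element type has a visible consequence: a value written into an array is converted to the array's type. A small sketch, assuming an integer array created from Python ints (the exact integer type, such as `int64`, depends on the platform):

```python
import numpy as np

array = np.array([1, 2, 3])
print(array.dtype, array.shape)  # Element type and dimensions

array[0] = 2.7  # Stored into an integer array, so the fractional part is dropped
print(array)

floats = array.astype(float)  # Converting creates a new array with a new type
floats[0] = 2.7
print(floats)
```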


## Pandas

`Pandas` is a Python package implementing dataframes. A dataframe is a data structure for tabular data.

In [20]:
import pandas as pd

# Read from csv. Here the data is loaded directly from a URL; a path to a local file also works.
df = pd.read_csv("https://raw.githubusercontent.com/jputrius/ml_intro/refs/heads/main/data/titanic/titanic.csv")
df.head(3) # Return first three rows

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |


## Pandas

Here is how to filter by column values.

In [21]:
df[df["Name"] == "Montvila, Rev. Juozas"]

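|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0 | NaN | S |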
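
The expression inside the brackets is a boolean mask with one entry per row, and masks can be combined with `&` (and) and `|` (or). A minimal sketch on a small hand-made dataframe; the dataframe `passengers` and its contents are invented for illustration and are not the Titanic data.

```python
import pandas as pd

passengers = pd.DataFrame({
  "Name": ["Ann", "Bob", "Cid", "Dee"],
  "Sex": ["female", "male", "male", "female"],
  "Age": [31.0, 12.0, 45.0, 8.0],
})

# Each comparison needs its own parentheses when combined with & or |
mask = (passengers["Sex"] == "female") & (passengers["Age"] < 18)
print(mask)  # A boolean Series, one entry per row
print(passengers[mask])  # Only the rows where the mask is True

adults_or_male = passengers[(passengers["Age"] >= 18) | (passengers["Sex"] == "male")]
print(adults_or_male["Name"].tolist())
```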


## Pandas

Here is how to compute the average of a dataframe column. You can also do [joins](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge), [grouping](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby) and most other operations you could do using SQL.

In [22]:
df["Fare"].mean()

np.float64(32.204207968574636)
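
As a rough illustration of the SQL-like operations, here is a sketch of a grouped average and a join. The dataframes `pets` and `species_info` and all their values are made up for this example.

```python
import pandas as pd

pets = pd.DataFrame({
  "name": ["Rex", "Tom", "Polly", "Fido"],
  "species": ["dog", "cat", "parrot", "dog"],
  "weight": [30.0, 4.0, 0.5, 20.0],
})
species_info = pd.DataFrame({
  "species": ["dog", "cat", "parrot"],
  "leg_count": [4, 4, 2],
})

# Like SQL GROUP BY: average weight per species
mean_weight = pets.groupby("species")["weight"].mean()
print(mean_weight)

# Like SQL LEFT JOIN: attach leg_count to every pet via the shared species column
joined = pets.merge(species_info, on="species", how="left")
print(joined)
```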

## Pandas

Each dataframe has an index for its rows.

In [23]:
df = pd.DataFrame(
  [["orange", 4], ["brown", 4], ["green", 2]],
  index=['cat', 'dog', 'parrot'],
  columns=['color', 'leg_count']
)
df

|        | color  | leg_count |
|--------|--------|-----------|
| cat    | orange | 4         |
| dog    | brown  | 4         |
| parrot | green  | 2         |


## Pandas

In [24]:
df["color"] # Getting columns

cat       orange
dog        brown
parrot     green
Name: color, dtype: object

In [25]:
df.loc["dog"] # Getting row by index

color        brown
leg_count        4
Name: dog, dtype: object

In [26]:
df.loc["dog", "leg_count"] # Getting specific column of row by index

np.int64(4)

In [27]:
df.iloc[0:2] # Getting rows by row number (counting from 0)

|     | color  | leg_count |
|-----|--------|-----------|
| cat | orange | 4         |
| dog | brown  | 4         |

## Numpy Practice Task

The best way to learn a new programming language / package is to implement something in it.

You can try implementing Newton's method for finding a solution to the nonlinear equation
$$
  F(x) = 0,
$$
where $F: \mathbb{R}^{n} \rightarrow \mathbb{R}^n$ is a differentiable function.

## Numpy Practice Task

Using Newton's method you approximate the solution by starting from an initial guess $x_0$ and computing each next iterate $x_{n+1}$ from the formula
$$
  DF(x_n)(x_{n+1}-x_n) = -F(x_n),
$$
where $DF$ is the [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant) of $F$. Note that this is a system of linear equations in the unknown step $x_{n+1}-x_n$.
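
As a warm-up, in one dimension the Jacobian is just the derivative $f'$ and the formula reduces to $x_{n+1} = x_n - f(x_n)/f'(x_n)$. The sketch below applies it to $f(x) = x^2 - 2$ with an exact derivative. The function name `newton_1d`, the tolerance and the iteration cap are choices made for this illustration; it is not a solution to the multidimensional task.

```python
def newton_1d(f, df, x0, tol=1e-10, max_iter=50):
  x = x0
  for _ in range(max_iter):
    step = -f(x) / df(x)  # Solve df(x) * step = -f(x) for the step
    x += step
    if abs(step) < tol:  # Stop once the iterates barely move
      break
  return x

root = newton_1d(lambda x: x**2 - 2, lambda x: 2*x, x0=1.0)
print(root)  # Should be close to the square root of 2
```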

## Numpy Practice Task

Tips:

1. You can approximately compute derivatives using the formula
$$
  f'(x) \approx \frac{f(x+h) - f(x)}{h},
$$
where $h$ is a small number like $10^{-6}$.
2. When computing a new approximation you will need to solve a system of linear equations. You can do so using `numpy`. Have a look through its [documentation](https://numpy.org/doc/stable/reference/routines.html).

## Numpy Practice Task 

3. As a test problem, you can try
$$
  F(x, y) = (\cos(x)-\sin(y), \cos(x)).
$$
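
For the linear system mentioned in the tips, one routine that fits is `np.linalg.solve`. The sketch below only shows how to call it on a small made-up system; the matrix `A` and the vector `b` are arbitrary and unrelated to the test problem.

```python
import numpy as np

A = np.array([
  [2.0, 1.0],
  [1.0, 3.0]
])
b = np.array([3.0, 5.0])

step = np.linalg.solve(A, b)  # Solves A @ step = b without forming the inverse
print(step)
print(np.allclose(A @ step, b))  # Check that the solution satisfies the system
```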

## Pandas Practice Task

Load the two csv files found [here](https://github.com/jputrius/ml_intro/tree/main/data/movielens) into `pandas` dataframes. One file contains ratings of movies given by users and the other file contains information on said movies.

Answer the following questions:

1. What are the top 10 movies by rating count?
2. What are the top 10 movies by average rating?
3. What are the top 10 action movies by rating count?
4. What are the top 10 action movies by average rating?

## Pandas Practice Task

Truncate the dataset to contain only the top 1000 movies by rating count. Produce a pivot table where rows are users, columns are movies and values are ratings (NA if a user has not rated a movie). [Pivot tables](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html#pandas.pivot_table) are built into `pandas`.
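
To get a feel for the shape of the result, here is a sketch of a pivot table on a tiny made-up dataset. The column names `user`, `movie` and `rating` are assumptions for this illustration and may not match the columns in the actual csv files.

```python
import pandas as pd

ratings = pd.DataFrame({
  "user": ["ann", "ann", "bob", "cid"],
  "movie": ["Alien", "Heat", "Alien", "Heat"],
  "rating": [5.0, 3.0, 4.0, 2.0],
})

# Rows are users, columns are movies, cells hold the rating
table = ratings.pivot_table(index="user", columns="movie", values="rating")
print(table)  # User/movie pairs without a rating show up as NaN
```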