# Introduction to

<p>
    <center>
    <img src="imgs/python-logo.png" width="300">
    </center>
</p>


**Python** is an excellent programming language for data science, scientific computations, machine learning, statistics, and many other applications. During this course, we will be polishing up our Python skills specifically on **data handling and ML-related tasks.**

> The **Python standard library** is a collection of modules and functions that always come pre-installed with Python. Think of it as a toolbox with a hammer, a screwdriver, and some zip ties. You can absolutely use those tools to build something useful, but it will probably take some time and effort. The standard library is what you typically learn in basic Python courses.

> A Python **package** is a collection of useful functions and classes that let you perform many interesting tasks without writing your code from scratch. Packages are created by the Python community (which includes you and me) and have to be installed separately. You can think of a package as an electric drill. Someone has already done the hard work of building the drill, and now you can use it to make holes in the wall much faster than you could by hand. In this course, we will be using packages built for easy data manipulation, visualization, and machine learning.

There are many ML and data-related Python packages to choose from. Some of the most popular ones (NumPy, pandas, Matplotlib, seaborn, scikit-learn, PyTorch) are already installed in the conda environment we use and will be introduced during the next labs.

## Reading - *Automate the Boring Stuff with Python*

The Python textbook we will reference during the course is the famous *Automate the Boring Stuff with Python* by Al Sweigart, with a particular focus on chapters 1-12. The book is available online for free at:

https://automatetheboringstuff.com/

The website is structured in such a way that you can quickly find and study the topic you are interested in. You can use it as a reference during the course, having it open in a separate tab while you are working on your assignments. All the exercises below can be solved using the knowledge from the first few chapters of the textbook.

---

## Exercise 1: Python data structures (4 points)

1. Create a list `zeros` containing only **zeros**, with 32 elements.
2. Create a list `odd` containing all **odd numbers** between 0 and 64.
3. Create a table `zeros_matrix` of size 4x3 (4 rows, 3 columns) with **zeros** in all entries. A matrix (table) is represented as a list of lists, where each **inner list** represents a **row** of the table.
    For example, a table
    $A =\begin{bmatrix}
    1 & 2 \\\
    3 & 4
    \end{bmatrix}$

    would be represented as `A = [[1, 2], [3, 4]]`
4. Create a matrix `eye` of size 8x8 (8 rows, 8 columns) with **ones** on the diagonal and **zeros** elsewhere. Refer to the example above for how to represent a matrix as a list of lists.

In [4]:
zeros = []
for i in range(32):
    zeros.append(0)
# print function is used to display the contents of a variable
print(zeros)


[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [7]:
odd = []
for i in range(64):
    if i%2==1:
        odd.append(i)

print(odd)

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63]


In [8]:
zeros_matrix = []
for i in range(4):
    zeros_matrix.append([0,0,0])
    

print(zeros_matrix)

[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]


In [9]:
eye = []
for i in range(8):
    row = [0] * 8  # a fresh row of eight zeros
    row[i] = 1     # overwrite (not insert) so the row keeps its length of 8
    eye.append(row)
    

print(eye)

[[1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1]]
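
As a side note, the four lists from Exercise 1 can also be built more compactly. The sketch below is one possible alternative to explicit loops, not the required solution. It also shows a common pitfall with nested lists (the variable name `aliased` is only illustrative).

```python
# Compact equivalents of loop-based solutions, using repetition and list comprehensions.
zeros = [0] * 32                                   # 32 zeros
odd = [i for i in range(64) if i % 2 == 1]         # 1, 3, ..., 63
zeros_matrix = [[0] * 3 for _ in range(4)]         # 4 rows, 3 columns
eye = [[1 if i == j else 0 for j in range(8)] for i in range(8)]

# Pitfall: [[0] * 3] * 4 repeats the SAME inner list four times,
# so changing one row changes all of them.
aliased = [[0] * 3] * 4
aliased[0][0] = 1
print(aliased)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
```

This is why `zeros_matrix` is built with a comprehension: each iteration creates a new, independent row.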


---

In the next exercise you may need to use functions from the [math module](https://docs.python.org/3/library/math.html) of the standard Python library. To access a function from a module, you first need to **import** the module. For example, to use the `factorial` function from the `math` module, you would write something like this:

    import math
    result = math.factorial(5)  # This will calculate 5!

You can find a list of all functions in the `math` module in the documentation linked above. It's a good idea to familiarize yourself with the concept of studying code documentation, as it will be useful in our little journey later on.
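
Besides the online documentation, you can inspect a module directly from Python. The snippet below is a small sketch of that idea. The variable name `public_names` is only illustrative.

```python
import math

# dir() lists the names defined in a module; names starting with
# an underscore are internal, so we filter them out.
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])

# Every documented function carries its description in the __doc__ attribute.
print(math.factorial.__doc__)
```

In an interactive session, `help(math.factorial)` shows the same description in a more readable form.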

## Exercise 2: Python functions (2 points)

1. Write a function `get_median` that takes a list of numbers (floats) as input and returns **the median** of the numbers. If the sample has an odd number of numbers, the median is the middle value. If the sample has an even number of numbers, the median is the arithmetic mean of the two middle values. The result should be rounded to 3 decimal places.

2. Write a function `get_euler` that takes an integer `n` as input and returns an approximation to **the mathematical constant e** (Euler's number) calculated using the following formula:
   $$ e \approx \sum_{i=0}^{n} \frac{1}{i!} $$
The result should be rounded to 10 decimal places.
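
If you want reference values to compare your results against, the standard library already provides them. The sketch below only shows how to obtain these values. It is not a replacement for writing `get_median` and `get_euler` yourself.

```python
import math
import statistics

# Reference values for checking your own implementations.
print(statistics.median([3.0, 1.0, 2.0]))       # 2.0  (odd number of values)
print(statistics.median([4.0, 1.0, 3.0, 2.0]))  # 2.5  (even number of values)
print(round(math.e, 10))                        # 2.7182818285
```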


In [19]:
from helpers.lab1 import test_median
import math

def get_median(numbers):
    numbers = sorted(numbers)  # sort a copy; the caller's list is left unchanged
    list_len = len(numbers)
    mid = list_len // 2  # integer division gives the middle index
    if list_len % 2 == 0:
        # even length: arithmetic mean of the two middle values
        median = (numbers[mid - 1] + numbers[mid]) / 2
    else:
        # odd length: the single middle value
        median = numbers[mid]
    return round(median, 3)


# You can use the following code to test your function
test_median(get_median)

In [None]:
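from helpers.lab1 import test_euler
import math

def get_euler(n):
    # partial sum of 1/i! for i = 0..n, rounded to 10 decimal places
    return round(sum(1 / math.factorial(i) for i in range(n + 1)), 10)


# You can use the following code to test your function
test_euler(get_euler)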

### *Python type hinting

It is a good practice to use Python type hints when defining functions. Type hints are a way to specify the expected types of the function arguments and the return value. This can help you catch bugs early and make your code easier to understand. For example, the function `get_median` can be defined as follows:
```python
def get_median(numbers: list[float]) -> float:
    # This function takes a list of floats as input and returns a float
    ...
```
The Python interpreter does not enforce these type hints at runtime, so they do not change how your code behaves or how fast it runs. However, they are an elegant way to document your code and make it more readable. By just looking at the definition, you can quickly see what kind of arguments the function expects and what it returns, which makes it easier to include your functions in larger projects. You can read more about type hints in the [official Python documentation](https://docs.python.org/3/library/typing.html).
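
To see that hints are documentation rather than checks, consider the minimal sketch below. The function `double` is a made-up example and not part of the lab.

```python
def double(x: float) -> float:
    # The hints document intent; the interpreter does not check them.
    return x * 2

print(double(2.5))             # 5.0
print(double("ab"))            # abab  (no error, despite the str argument)
print(double.__annotations__)  # {'x': <class 'float'>, 'return': <class 'float'>}
```

Mismatches like the second call are typically caught by external static type checkers or by your editor, not by Python itself.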

---

## Quick introduction to the Python `random` module

The Python standard library includes a module called `random` that provides functions for generating pseudorandom numbers. To complete the exercises in this notebook, you may want to learn the following functions:

In [None]:
import random

# random.random() returns a random float in the half-open interval [0.0, 1.0)
print('The output of random.random() is:', random.random())

# random.randint(a, b) returns a random integer between a and b, both ends included
print('The output of random.randint(0, 100) is:', random.randint(0, 100))
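
Because the numbers are pseudorandom, the sequence can be made reproducible by fixing the generator's seed. This is not required for the exercises. The sketch below only illustrates the idea, and the seed value `42` is arbitrary.

```python
import random

random.seed(42)  # fix the generator state
first = [random.randint(0, 100) for _ in range(3)]

random.seed(42)  # same seed again, so the same sequence follows
second = [random.randint(0, 100) for _ in range(3)]

print(first == second)  # True
```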

In [13]:
import random

movies = [
    'Taxi Driver',
    'Cars 3',
    'How High',
    'The Seventh Seal',
    'Mean Girls',
]

random_movie = random.choices(movies, k=1) # returns a list of k random elements (drawn with replacement)
print('Tonight we will watch', random_movie[0])

In [16]:
import random

movie_preferences = {
    'Taxi Driver': 0.1,
    'Cars 3': 0.4,
    'How High': 0.2,
    'The Seventh Seal': 0,
    'Mean Girls': 0.3
}

movies = list(movie_preferences.keys()) # list of movies
probabilities = list(movie_preferences.values()) # list of probabilities

# returns a list of k random elements, drawn according to the weights given in a separate list
random_movie = random.choices(population=movies, weights=probabilities, k=1)
print('Eww, I would rather watch', random_movie[0])

## Classes in Python
**Object-oriented programming (OOP)** is a programming paradigm that uses classes and objects to organize code. An **object** is a handy piece of code that contains:

- Some data (attributes)
- Some functions (methods) that operate on that data

It often makes sense to group related data and functions together into objects. You will see many examples of this during the course.

A **class** is a blueprint for creating objects. It defines the attributes and methods that the objects of that class will have. Once you have defined a class, you can create multiple objects (instances) of that class, each with its own unique data. See the example below:

In [None]:
class Cow:
    def __init__(self, name):
        """Initializer (constructor) method for the Cow class.
        The __init__ method is called when a new object of the class is created.
        It is used to set up the initial state of the object by initializing its attributes.
        In this case, we are initializing the 'name' attribute of the Cow object."""

        self.name = name # this is an attribute

    def moo(self):
        """
        Methods are functions that belong to a class. They take 'self' as the first parameter,
        which refers to the instance of the class. A method can access and modify the attributes
        of the object. This method accesses the name of the cow and returns a string with the
        cow's moo sound.
        """
        return f'{self.name} says Moo!'

    def eat(self, food):
        """
        This method takes an additional parameter 'food' and returns a string indicating
        what the cow is eating.
        """
        return f'{self.name} is eating {food}'

In [None]:
cow_1 = Cow(name='Michelle') # creating an object (instance) of the Cow class
cow_2 = Cow(name='Angelica') # creating another object (instance) of the Cow class

print(cow_1.moo()) # calling the moo method on cow_1
print(cow_2.eat('grass')) # calling the eat method on cow_2

# accessing the attributes of the objects
print(f'{cow_1.name} and {cow_2.name} are friends')

## Exercise 3: Rolling dice  (2 points)
```
       .-------.    ______
      /   o   /|   /\     \
     /_______/o|  /o \  o  \
     | o     | | /   o\_____\
     |   o   |o/ \o   /o    /
     |     o |/   \ o/  o  /
     '-------'     \/____o/     Art by Joan G. Stark
```
Implement `UnfairDie`, a class representing an unfair six-sided die (one that does not have an equal probability of landing on each of its sides).

**The probabilities of rolling each face should be set by the user when creating a die object** by passing a parameter `probs`, a list of six non-negative floats that sum to one (as probabilities should).

You can create the `__init__` method to take the `probs` parameter and save it to an attribute `self.probs`. The `__init__` method, known as a **constructor**, is called only once when an object is created. It is a good place to set the initial state of the object.
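
The exercise does not ask for input validation, but a constructor is also a natural place to check that `probs` is a sensible distribution. The sketch below shows one way to do that. The helper name `check_probs` and the tolerance are assumptions made for illustration.

```python
import math

def check_probs(probs: list[float]) -> None:
    """Raise ValueError if probs is not a valid distribution over six faces."""
    if len(probs) != 6:
        raise ValueError("probs must contain exactly six values")
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    # Floating-point sums are rarely exactly 1.0, so compare with a tolerance.
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to one")

check_probs([0.1, 0.2, 0.3, 0.1, 0.2, 0.1])  # passes silently
```

A helper like this could be called as the first line of `__init__`, before the list is stored in `self.probs`.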

The class should implement the following methods:
- `roll(self, n) -> list[int]`: simulates rolling $n$ identical dice and returns the results as a list of integers.
- `get_average(self, n) -> float`: returns the mean result of rolling $n$ dice.

In [21]:
import random

class UnfairDie:
    def __init__(self, probs):
        self.probs = probs  # probabilities of faces 1 to 6, in that order

    def roll(self, n: int) -> list[int]:
        # the result must be returned, otherwise roll() evaluates to None
        return random.choices(population=[1,2,3,4,5,6], weights=self.probs, k=n)

    def get_average(self, n: int) -> float:
        rolls = self.roll(n)
        return sum(rolls)/n

In [22]:
# test your implementation

die = UnfairDie(probs=[0.1, 0.2, 0.3, 0.1, 0.2, 0.1])
print('Rolling 5 dice:', die.roll(5))
print('Average of rolling 100 dice:', die.get_average(100))

## *Python dunder methods

We have already encountered some special methods in Python, such as `__init__(self, args)`, which is called only once when an object is created. These methods are called dunder methods (short for *double underscore*). They are used to define how objects of a class behave when they are used in conjunction with built-in Python functions.

For example, the `__str__(self)` method is called when an object is passed to the `print` function. If you want to define how your object should be represented as a string, you can implement this method in your class. Here is an example:

In [None]:
# Example of using __str__()

class Cow:
    def __init__(self, name):
        self.name = name # Setting the name of the cow

    def __str__(self):
        cow_art = (
        rf"""
        This is a cow named {self.name}:
        ^__^
        (oo)\_______
        (__)\       )\/\
            ||----w |
            ||     ||
        """
        )
        return cow_art.strip() # The strip() method is for aesthetic purposes

cow = Cow('Angelica')

# The print function calls the __str__() method!
print(cow)

Dunder methods are a powerful tool in Python that allow you to use your custom classes in any code that was written to work with built-in Python objects. You just have to implement the right dunder methods in your class.

Other useful dunder methods:
- `__call__(self)`: makes an object callable. It is executed when the object is called as a function.
- `__add__(self, other)`: called by the `+` operator. It should return the sum of two objects (*whatever that means for your particular class*).
- `__mul__(self, other)`: called by the `*` operator. It should return the product of two objects.
- `__len__(self)`: called by the `len` function. It should return the length of the object.
- `__eq__(self, other)`: called by the `==` operator. It should return `True` if two objects are equal.
- `__getitem__(self, key)`: called to get an item from the object using square brackets. It should return the item at the given key (*useful if your class is data-related*).

There are many other dunder methods that you can implement in your classes. You can find a list of them [here](https://docs.python.org/3/reference/datamodel.html#special-method-names).
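
To make this concrete, here is a minimal sketch of a class that implements several dunder methods at once. The `Vector` class is a hypothetical example written for this illustration, and element-wise addition is just one possible meaning of `+`.

```python
class Vector:
    """A tiny, hypothetical vector class used only to illustrate dunder methods."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __getitem__(self, key):
        return self.values[key]

    def __eq__(self, other):
        return isinstance(other, Vector) and self.values == other.values

    def __add__(self, other):
        # Element-wise sum; assumes both vectors have the same length.
        return Vector(a + b for a, b in zip(self.values, other.values))

    def __str__(self):
        return f"Vector({self.values})"

v = Vector([1, 2, 3])
w = Vector([10, 20, 30])

print(len(v))                         # 3
print(v[0])                           # 1
print(v + w)                          # Vector([11, 22, 33])
print(v + w == Vector([11, 22, 33]))  # True
```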

---

## Exercise 4: Plotting averages of dice rolls (2 points)

1. Conduct an experiment where you roll $n$ = 1, 2, 10, 100 fair dice and record the average outcome, repeating this 1000 times for each $n$. Store the results for each number $n$ of dice in separate lists. Plot a histogram of the results for each value of $n$ using the `plot_hist` function.

    > The function `plot_hist` has already been implemented by me and can be used out of the box. It takes a list of numbers and plots an appropriate histogram. You will learn how to prepare plots like this with seaborn and pyplot during the next labs.

2. What shape does this distribution converge to as the number of dice $n$ increases? Conduct the same experiment for unfair dice with probabilities $[0.25, 0.1, 0.05, 0.15, 0.1, 0.35]$ and try to draw some conclusions regarding the distributions of averages.
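
One possible way to organize the fair-dice part of the experiment is sketched below, using only `random.choices` from the standard library. The names `FACES`, `average_of_rolls`, and `run_experiment` are assumptions made for this illustration. Plotting is left out, and you may prefer to reuse your own die class instead.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]

def average_of_rolls(n: int, probs: list[float]) -> float:
    """Roll n dice with the given face probabilities and return the mean."""
    rolls = random.choices(population=FACES, weights=probs, k=n)
    return sum(rolls) / n

def run_experiment(n: int, probs: list[float], repeats: int = 1000) -> list[float]:
    """Repeat the n-dice roll `repeats` times and collect the averages."""
    return [average_of_rolls(n, probs) for _ in range(repeats)]

fair = [1 / 6] * 6
averages = {n: run_experiment(n, fair) for n in (1, 2, 10, 100)}

for n, values in averages.items():
    # For fair dice, each overall mean should be close to 3.5.
    print(n, round(sum(values) / len(values), 2))
```

Passing a different `probs` list to `run_experiment` covers the unfair-dice part with the same code.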

In [None]:
averages_1 = ...
averages_2 = ...
averages_10 = ...
averages_100 = ...

In [None]:
from helpers.plotting import plot_hist

plot_hist(averages_1) # plot histogram of averages for n=1 dice