# Worksheet 00

Name: Mahdi Khemakhem
UID: U18251472

### Topics

- course overview
- python review

### Course Overview

a) Why are you taking this course?

I have a great interest in automation, and only so much can be done with just conditionals. Modeling answers using data is the next step to elevate my pipelines.

b) What are your academic and professional goals for this semester?

I am a non-traditional CS student, coming from a pre-medical background, and my main goal is to ensure my transition into a CS master's course structure is smooth. My career goals would be to hopefully find an internship related to ML/AI to supplement the self-learning journey I have been undertaking.

c) Do you have previous Data Science experience? If so, please expand.

I was a research assistant for two years at an Alzheimer's disease start-up, where I was able to apply my coding skills to my job. I created several useful reports for my manager using webhooks, Azure Functions and Power BI. I implemented a classifier to help my manager determine the nature of the tasks in our project management software, and using Azure Functions and webhooks, I created a live suggestion tooltip to help employees determine the category of the task they were undertaking. Most recently I have been learning about LLMs and have been fine-tuning Hugging Face models for the past few months.

d) Data Science is a combination of programming, math (linear algebra and calculus), and statistics. Which of these three do you struggle with the most (you may pick more than one)?

I would say statistics, mainly because I have forgotten my coursework and have not yet reached it in my self-learning curriculum. My understanding of linear algebra is strong, although I may not be the best at solving problems by hand, since I have mostly done linear algebra using NumPy. I have a strong understanding of calculus.

The rest of this worksheet is optional. If you have prior Python experience, you are welcome to skip it; however, I strongly encourage you to try the questions marked as `challenging`.

### Python review (Optional)

#### Lambda functions

Python supports the creation of anonymous functions (i.e. functions that are not bound to a name) at runtime, using a construct called `lambda`. Instead of writing a named function as such:

In [1]:
def f(x):
    return x**2
f(8)

64

One can write an anonymous function as such:

In [2]:
(lambda x: x**2)(8)

64

A `lambda` function can take multiple arguments:

In [3]:
(lambda x, y : x + y)(2, 3)

5

The arguments can be `lambda` functions themselves:

In [4]:
(lambda x : x(3))(lambda y: 2 + y)

5

a) write a `lambda` function that takes three arguments `x, y, z` and returns `True` only if `x < y < z`.

In [None]:
(lambda x, y, z: x<y<z)

b) write a `lambda` function that takes a parameter `n` and returns a lambda function that will multiply any input it receives by `n`.

In [None]:
(lambda n: lambda x: x*n)
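
As a quick check of the two answers above, the lambdas can be bound to names and called. This is only a sketch; the names `is_increasing`, `make_multiplier` and `triple` are illustrative and not part of the worksheet.

```python
# Illustrative names only; the worksheet leaves both lambdas anonymous.
is_increasing = lambda x, y, z: x < y < z    # chained comparison: x < y and y < z
make_multiplier = lambda n: lambda x: x * n  # returns a new one-argument function

triple = make_multiplier(3)

print(is_increasing(1, 2, 3))  # True
print(is_increasing(1, 3, 2))  # False
print(triple(5))               # 15
```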

#### Map

`map(func, s)`

`func` is a function and `s` is a sequence (e.g., a list). 

`map()` returns an object that will apply function `func` to each of the elements of `s`.

For example if you want to multiply every element in a list by 2 you can write the following:

In [5]:
mylist = [1, 2, 3, 4, 5]
mylist_mul_by_2 = map(lambda x : 2 * x, mylist)
print(list(mylist_mul_by_2))

[2, 4, 6, 8, 10]


`map` can also be applied to more than one list. The lists should be the same size; if they are not, `map` stops at the end of the shortest one:

In [9]:
a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]

a_plus_b = map(lambda x, y: x + y, a, b)
list(a_plus_b)

[6, 6, 6, 6, 6]
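
Two details of `map` in Python 3 are easy to miss: it returns a lazy iterator that can be consumed only once, and with inputs of different lengths it stops at the shortest one. A minimal sketch, where the list `short` and the result names are made up for illustration:

```python
a = [1, 2, 3, 4, 5]
short = [10, 20, 30]  # hypothetical shorter list

pairs = map(lambda x, y: x + y, a, short)

first_pass = list(pairs)   # consumes the iterator, stops after 3 pairs
second_pass = list(pairs)  # the iterator is already exhausted

print(first_pass)   # [11, 22, 33]
print(second_pass)  # []
```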

c) write a map that checks if elements are greater than zero

In [None]:
c = [-2, -1, 0, 1, 2]
gt_zero = map(lambda x: x > 0, c)
list(gt_zero)

d) write a map that checks if elements are multiples of 3

In [None]:
d = [1, 3, 6, 11, 2]
mul_of3 = map(lambda x: not x%3, d)
list(mul_of3)

#### Filter

`filter(function, list)` returns an iterator over the elements of `list` for which `function()` evaluates to `True`. Wrap the result in `list()` to get a new list.
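
For example, the following sketch keeps only the positive numbers of a list. The names `values`, `positives` and `same_result` are illustrative; the list comprehension is shown only as an equivalent form for comparison.

```python
values = [-2, -1, 0, 1, 2]

# In Python 3, filter returns a lazy iterator, so wrap it in list() to see the elements.
positives = list(filter(lambda x: x > 0, values))

# An equivalent list comprehension.
same_result = [x for x in values if x > 0]

print(positives)    # [1, 2]
print(same_result)  # [1, 2]
```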

e) write a filter that will only return even numbers in the list

In [None]:
e = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = filter(lambda x: not x%2, e)
list(evens)

#### Reduce

`reduce(function, sequence[, initial])` returns the result of sequentially applying the function to the sequence (starting at an initial state). You can think of reduce as consuming the sequence via the function.

For example, let's say we want to add all elements in a list. We could write the following:

In [13]:
from functools import reduce

nums = [1, 2, 3, 4, 5]
sum_nums = reduce(lambda acc, x : acc + x, nums, 0)
print(sum_nums)

15


Let's walk through the steps of `reduce` above:

1) the value of `acc` is set to 0 (our initial value)
2) Apply the lambda function on `acc` and the first element of the list: `acc` = `acc` + 1 = 1
3) `acc` = `acc` + 2 = 3
4) `acc` = `acc` + 3 = 6
5) `acc` = `acc` + 4 = 10
6) `acc` = `acc` + 5 = 15
7) return `acc`

`acc` is short for `accumulator`.
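
The steps above can be written out as an explicit loop. This is a simplified sketch of what `reduce` does when an initial value is given, not the actual `functools` implementation; `reduce_sketch` and `add` are hypothetical names.

```python
from functools import reduce

def reduce_sketch(function, sequence, initial):
    """Simplified stand-in for reduce(function, sequence, initial)."""
    acc = initial                # step 1: start from the initial value
    for x in sequence:           # steps 2 to 6: fold each element into acc
        acc = function(acc, x)
    return acc                   # step 7: return the accumulator

nums = [1, 2, 3, 4, 5]
add = lambda acc, x: acc + x

print(reduce_sketch(add, nums, 0))  # 15
print(reduce(add, nums, 0))         # 15
```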

f) `*challenging` Using `reduce` write a function that returns the factorial of a number. (recall: N! (N factorial) = N * (N - 1) * (N - 2) * ... * 2 * 1)

In [None]:
factorial = lambda x : reduce(lambda acc, y: acc * y, range(1, x+1), 1)
factorial(10)

g) `*challenging` Using `reduce` and `filter`, write a function that returns all the primes below a certain number

In [None]:
def is_prime(n):
    # 0, 1 and negative numbers are not prime
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(n ** 0.5) + 1, 2):
        if n % i == 0:
            return False
    return True

sieve = lambda x : reduce(lambda acc, y: acc + [y], filter(is_prime, range(2, x)), [])
print(sieve(100))

### What is going on?

This whole section is `*challenging`

For each of the following code snippets, explain why the output is what it is:

In [1]:
class Bank:
  def __init__(self, balance):
    self.balance = balance
  
  def is_overdrawn(self):
    return self.balance < 0

myBank = Bank(100)
if myBank.is_overdrawn :
  print("OVERDRAWN")
else:
  print("ALL GOOD")

OVERDRAWN


`myBank.is_overdrawn` without parentheses refers to the bound method object and does not call it. A method object is always truthy, so the `if` condition is true regardless of the balance and OVERDRAWN is printed. Writing `myBank.is_overdrawn()` would call the method and print ALL GOOD.
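
The difference between referencing and calling the method can be checked directly. A minimal sketch reusing the `Bank` class from the snippet; `my_bank`, `method_is_truthy` and `call_result` are illustrative names.

```python
class Bank:
    def __init__(self, balance):
        self.balance = balance

    def is_overdrawn(self):
        return self.balance < 0

my_bank = Bank(100)

# Without parentheses we get the bound method object, which is truthy.
method_is_truthy = bool(my_bank.is_overdrawn)

# With parentheses the method runs and returns the actual comparison.
call_result = my_bank.is_overdrawn()

print(method_is_truthy)  # True
print(call_result)       # False
```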

In [2]:
for i in range(4):
    print(i)
    i = 10

0
1
2
3


`i` iterates through 0, 1, 2, 3, the values produced by `range(4)`. Every time we go back to the beginning of the loop, `i` is rebound to the next element from the range iterator, so the assignment `i = 10` from the previous iteration has no effect on what is printed.
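
One way to see this is to write the `for` loop as a rough `while` equivalent that pulls values from the iterator by hand. This is a sketch of the idea rather than exactly what the interpreter does; `seen` and `iterator` are illustrative names.

```python
seen = []
iterator = iter(range(4))

while True:
    try:
        i = next(iterator)  # i is rebound here at the start of every pass
    except StopIteration:
        break
    seen.append(i)
    i = 10                  # overwritten by the next call to next()

print(seen)  # [0, 1, 2, 3]
print(i)     # 10, the last assignment survives after the loop ends
```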

In [4]:
row = [""] * 3 # row is ['', '', '']
board = [row] * 3
print(board) # [['', '', ''], ['', '', ''], ['', '', '']]
board[0][0] = "X"
print(board)

[['', '', ''], ['', '', ''], ['', '', '']]
[['X', '', ''], ['X', '', ''], ['X', '', '']]


The board list contains three references to the same row list object, so changing an element through one of them is visible in all three rows of the board.
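
A common way to avoid the shared rows is to build each row separately with a list comprehension. A minimal sketch comparing the two constructions; `shared` and `independent` are illustrative names.

```python
shared = [[""] * 3] * 3                     # three references to one row
independent = [[""] * 3 for _ in range(3)]  # a new row list on each iteration

shared[0][0] = "X"
independent[0][0] = "X"

print(shared)       # [['X', '', ''], ['X', '', ''], ['X', '', '']]
print(independent)  # [['X', '', ''], ['', '', ''], ['', '', '']]
print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False
```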

In [5]:
funcs = []
results = []
for x in range(3):
    def some_func():
        return x
    funcs.append(some_func)
    results.append(some_func())  # note the function call here

funcs_results = [func() for func in funcs]
print(results) # [0,1,2]
print(funcs_results)

[0, 1, 2]
[2, 2, 2]


When the functions are executed inside the loop to append to the results list, x is still being iterated, so the calls capture its values 0, 1 and 2 in turn. For funcs_results, the functions are called after the loop has finished, once x has settled on its final value of 2. Since the functions hold a reference to the variable x, and not its value at the time the function was created, we get [2, 2, 2].
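
If the intent were for each function to remember the value of `x` at creation time, one common idiom is to bind it as a default argument, since defaults are evaluated when the function is defined. A sketch of that variation on the snippet above:

```python
funcs = []
for x in range(3):
    def some_func(x=x):  # the default captures the current value of x
        return x
    funcs.append(some_func)

funcs_results = [func() for func in funcs]
print(funcs_results)  # [0, 1, 2]
```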

In [15]:
f = open("./data.txt", "w+")
f.write("1,2,3,4,5")
f.close()

nums = []
with open("./data.txt", "w+") as f:
  lines = f.readlines()
  for line in lines:
    nums += [int(x) for x in line.split(",")]

print(sum(nums))

0


Opening a file in `w` or `w+` mode truncates it, deleting its contents as soon as it is opened. The second `open` therefore empties the file before anything is read, so `readlines()` returns an empty list, `nums` stays empty, and the sum is 0.
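
For comparison, reopening the file in read mode (`"r"`) leaves the contents intact. The sketch below writes to a temporary directory instead of `./data.txt` so that it does not touch existing files; `tmp_dir`, `path` and `total` are illustrative names.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.txt")

    # "w" creates the file, or truncates it if it already exists.
    with open(path, "w") as f:
        f.write("1,2,3,4,5")

    nums = []
    # "r" opens for reading only and does not truncate.
    with open(path, "r") as f:
        for line in f:
            nums += [int(x) for x in line.split(",")]

total = sum(nums)
print(total)  # 15
```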