# Worksheet 00

Name:  Hao Qi  
UID:  U96305250  

### Topics

- course overview
- python review

### Course Overview


a) Why are you taking this course?

I am taking CS506 primarily to gain practical experience in data science, since the course has a long-standing reputation for developing students through its structure. I hope to learn the essential skills of a competent data analyst: not only foundational mathematics and programming, but also teamwork and the ability to apply research results in industry. 

b) What are your academic and professional goals for this semester?

Academically, I plan to find a suitable topic and an advisor for my graduate thesis and to maintain excellent academic standing. For my career development, I intend to strengthen my Python programming skills, particularly with domain-specific libraries. I also want to connect with more outstanding people, including scholars and entrepreneurs. In this course, I hope to fill the gaps in my knowledge of traditional data analysis and prepare myself for these other goals. 

c) Do you have previous Data Science experience? If so, please expand.

Yes, I have a basic understanding of data science. First, I took courses related to artificial intelligence and database systems during my undergraduate and graduate studies. Second, I taught myself Python libraries such as NumPy and pandas for information extraction and finished some small projects while seeking an internship. However, I have not yet worked in this field. That is the extent of my background. 

d) Data Science is a combination of programming, math (linear algebra and calculus), and statistics. Which of these three do you struggle with the most (you may pick more than one)?

I find math the most challenging. Programming is a skill that can be improved through hands-on exercises, and statistics often deals with practical data problems that can be intriguing, whereas math requires a deep and abstract understanding. Moreover, online resources and discussions for advanced mathematics are less extensive than for the other two disciplines, which makes self-study difficult. My motivation to learn math stems from its critical importance to research and applications, which drives me to overcome these challenges through dedicated study.

### Python review

#### Lambda functions

Python supports the creation of anonymous functions (i.e. functions that are not bound to a name) at runtime, using a construct called `lambda`. Instead of writing a named function as such: 

In [3]:
def f(x):
    return x ** 2
f(8)

64

One can write an anonymous function as such: 

In [4]:
(lambda x: x ** 2)(8)

64

A `lambda` function can take multiple arguments: 

In [5]:
(lambda x, y : x + y)(2, 3)

5

The arguments can be `lambda` functions themselves: 

In [6]:
(lambda x: x(3))(lambda y: 2 + y)

5

a) write a `lambda` function that takes three arguments `x, y, z` and returns `True` only if `x < y < z`.

In [9]:
lambda x, y, z : x < y and y < z

<function __main__.<lambda>(x, y, z)>

b) write a `lambda` function that takes a parameter `n` and returns a lambda function that will multiply any input it receives by `n`. For example, if we called this function `g`, then `g(n)(2) = 2n`.

In [10]:
lambda n: (lambda x: x * n)

<function __main__.<lambda>(n)>
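Because both answers are bare `lambda` expressions, the notebook displays only their function objects. As a quick check, the sketch below binds them to names and calls them. The name `is_increasing` is chosen here for illustration, and the chained comparison `x < y < z` is an equivalent way to write `x < y and y < z`.

```python
# `is_increasing` is a hypothetical name; `g` follows the naming in the prompt.
is_increasing = lambda x, y, z: x < y < z  # same result as x < y and y < z
g = lambda n: (lambda x: x * n)

print(is_increasing(1, 2, 3))  # True
print(is_increasing(1, 3, 2))  # False
print(g(5)(2))                 # 10
```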

#### Map

`map(func, s)`

`func` is a function and `s` is a sequence (e.g., a list). 

`map()` returns an object that will apply function `func` to each of the elements of `s`.

For example if you want to multiply every element in a list by 2, you can write as follows: 

In [11]:
mylist = [1, 2, 3, 4, 5]
mylist_mul_by_2 = map(lambda x: x * 2, mylist)
print(list(mylist_mul_by_2))

[2, 4, 6, 8, 10]


`map` can also be applied to more than one list (if the lists differ in size, `map` stops once the shortest one is exhausted): 

In [12]:
a = [1, 2, 3, 4, 5]
b = [5, 4, 3, 2, 1]

a_plus_b = map(lambda x, y : x + y, a, b)
print(list(a_plus_b))

[6, 6, 6, 6, 6]
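In Python 3, `map` does not raise an error when its inputs differ in length; it simply stops once the shortest one is exhausted. A small sketch with illustrative lists (`short_list` and `long_list` are not part of the worksheet):

```python
short_list = [1, 2, 3]
long_list = [10, 20, 30, 40, 50]

# Only three pairs are formed; the extra elements of long_list are ignored.
pair_sums = list(map(lambda x, y: x + y, short_list, long_list))
print(pair_sums)  # [11, 22, 33]
```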


c) write a map that checks if elements are greater than zero

In [13]:
c = [-2, -1, 0, 1, 2]
gt_zero = map(lambda x: x > 0, c)
list(gt_zero)

[False, False, False, True, True]

d) write a map that checks if elements are multiples of 3

In [16]:
d = [1, 3, 6, 11, 2]
mul_of3 = map(lambda x: x % 3 == 0, d)
list(mul_of3)

[False, True, True, False, False]

#### Filter

`filter(function, list)` returns an iterator over the elements of `list` for which `function()` evaluates to `True`. In Python 3 the result is lazy, so wrap it in `list()` to obtain a new list. 

e) write a filter that will only return even numbers in the list

In [17]:
e = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = filter(lambda x: x % 2 == 0, e)
list(evens)

[2, 4, 6, 8, 10]
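One consequence worth seeing once: in Python 3 a `filter` object is a one-shot iterator, so it can be consumed only a single time. The sketch below reuses the list from the exercise; the names `first_pass` and `second_pass` are illustrative.

```python
e = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = filter(lambda x: x % 2 == 0, e)

first_pass = list(evens)   # consumes the iterator
second_pass = list(evens)  # nothing is left to yield

print(first_pass)   # [2, 4, 6, 8, 10]
print(second_pass)  # []
```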

#### Reduce

`reduce(function, sequence[, initial])` returns the result of sequentially applying the function to the sequence (starting at an initial state). You can think of `reduce` as consuming the sequence via the function. 

For example, let's say we want to add all elements in a list. We could write as follows: 

In [18]:
from functools import reduce

nums = [1, 2, 3, 4, 5]
sum_nums = reduce(lambda acc, x : acc + x, nums, 0)
print(sum_nums)

15


Let's walk through the steps of `reduce` above:

1) the value of `acc` is set to 0 (our initial value)
2) apply the lambda function on `acc` and the first element of the list: `acc` = `acc` + 1 = 1
3) `acc` = `acc` + 2 = 3
4) `acc` = `acc` + 3 = 6
5) `acc` = `acc` + 4 = 10
6) `acc` = `acc` + 5 = 15
7) return `acc`

`acc` is short for `accumulator`.
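The intermediate values in the walk-through can be observed directly with `itertools.accumulate`, which yields every value of the accumulator instead of only the last one. This is a sketch for inspection only; the `initial` keyword assumes Python 3.8 or later.

```python
from itertools import accumulate

nums = [1, 2, 3, 4, 5]

# Each element of `steps` is the value of acc after one more step of the reduction.
steps = list(accumulate(nums, lambda acc, x: acc + x, initial=0))
print(steps)  # [0, 1, 3, 6, 10, 15]
```

The last element of `steps` equals the value that `reduce` returns.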

f) `*challenging` Use `reduce` to write a function that returns the factorial of a number  (recall: N! (N factorial) = N * (N - 1) * (N - 2) * ... * 2 * 1)

In [20]:
factorial = lambda x: reduce(lambda acc, t: acc * t, range(1, x + 1), 1)
factorial(10)

3628800

g) `*challenging` Use `reduce` and `filter` to write a function that returns all the primes below a certain number 

In [25]:
# Candidates run from x - 1 down to 2, so x itself is excluded ("below").
sieve = lambda x: reduce(lambda acc, t: acc + [t], filter(lambda num: 0 not in [num % d for d in range(2, num)], range(x - 1, 1, -1)), [])
print(sieve(100))

[97, 89, 83, 79, 73, 71, 67, 61, 59, 53, 47, 43, 41, 37, 31, 29, 23, 19, 17, 13, 11, 7, 5, 3, 2]
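The answer above tests each candidate by trial division against every smaller number. For comparison, here is a sketch that is closer in spirit to the Sieve of Eratosthenes: `reduce` folds over the candidate divisors, and each step uses `filter` to strike out the multiples of one divisor. The function name `primes_below` is hypothetical, and `math.isqrt` assumes Python 3.8 or later.

```python
from functools import reduce
from math import isqrt

def primes_below(n):
    """Return the primes strictly below n, in ascending order."""
    candidates = list(range(2, n))
    # A composite below n always has a divisor no larger than sqrt(n).
    divisors = range(2, isqrt(max(n, 0)) + 1)
    # For each divisor d, keep d itself and drop every other multiple of d.
    return reduce(
        lambda remaining, d: list(filter(lambda m: m == d or m % d != 0, remaining)),
        divisors,
        candidates,
    )

print(primes_below(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```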


### What is going on?

For each of the following code snippets, explain why the result may be unexpected and why the output is what it is: 

In [27]:
class Bank:
  def __init__(self, balance):
    self.balance = balance
  
  def is_overdrawn(self):
    return self.balance < 0

myBank = Bank(100)
if myBank.is_overdrawn:
  print("OVERDRAWN")
else:
  print("ALL GOOD")

OVERDRAWN


The `is_overdrawn` method is meant to check whether the account balance is negative. Here the balance is positive, yet the program prints "OVERDRAWN". The cause is a mistake in the `if` statement: instead of calling the method with `myBank.is_overdrawn()`, the code references the bound method object itself, `myBank.is_overdrawn`. A method object is always truthy in Python, so the condition is always `True` and the first branch runs regardless of the balance.

In [28]:
myBank = Bank(100)
if myBank.is_overdrawn():   # fixed
  print("OVERDRAWN")
else:
  print("ALL GOOD")

ALL GOOD


In [29]:
for i in range(4):
    print(i)
    i = 10

0
1
2
3


The code sets the loop variable `i` to 10 inside a `for` loop, yet the loop still prints 0, 1, 2, 3 as if the reassignment never happened. This is because `i` is rebound to the next value from `range(4)` at the start of each iteration, so any change made to it within the loop body has no effect on the next iteration. 
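To make the rebinding explicit, the loop can be written out roughly the way Python executes it, with an explicit iterator and `next()`. This is an illustrative sketch; the `printed` list is added here only to record what the loop would print.

```python
printed = []
it = iter(range(4))
while True:
    try:
        i = next(it)  # i is rebound from the iterator at the start of every pass
    except StopIteration:
        break
    printed.append(i)
    i = 10            # rebinds the name only; the iterator is unaffected

print(printed)  # [0, 1, 2, 3]
```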

In [30]:
row = [""] * 3  # row is ['', '', '']
board = [row] * 3
print(board)    # [['', '', ''], ['', '', ''], ['', '', '']]
board[0][0] = "X"
print(board)

[['', '', ''], ['', '', ''], ['', '', '']]
[['X', '', ''], ['X', '', ''], ['X', '', '']]


The code intends to change only the first element of the first row at line 4, but setting `board[0][0]` to "X" sets the first element of every row to "X". This is because line 2 creates a list containing three references to the same `row` list, i.e., all three elements of `board` refer to the same list in memory. As a result, a change to one row is visible in all rows. 
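One common way to avoid the shared reference is to build each row separately with a list comprehension. The sketch below contrasts the two constructions; the name `shared` is introduced here for illustration.

```python
# Each pass of the comprehension creates a new, independent row list.
board = [[""] * 3 for _ in range(3)]
board[0][0] = "X"
print(board)  # [['X', '', ''], ['', '', ''], ['', '', '']]

# For contrast: multiplying the outer list repeats one reference three times.
shared = [[""] * 3] * 3
print(shared[0] is shared[1])  # True
print(board[0] is board[1])    # False
```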

In [1]:
funcs = []
results = []
for x in range(3):
    def some_func():
        return x
    funcs.append(some_func)
    results.append(some_func()) # note the function call here

funcs_results = [func() for func in funcs]
print(results)  # [0,1,2]
print(funcs_results)

[0, 1, 2]
[2, 2, 2]


The results of calling `some_func` differ during and after the loop. The cause is how closures (functions that remember the environment in which they were created) capture variables in Python. When `some_func` is called inside the loop, it returns the current value of `x`, so `results` contains [0, 1, 2]. When the functions in `funcs` are called after the loop, each one looks up `x` at call time, and by then `x` holds its final value, 2. This is late binding: each function captures the variable rather than its value at definition time, and all three functions share the same variable. 
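A common workaround is to bind the current value as a default argument, because default values are evaluated once, when the function is defined. A minimal sketch:

```python
funcs = []
for x in range(3):
    def some_func(x=x):  # the default is evaluated now, so each function keeps its own value
        return x
    funcs.append(some_func)

funcs_results = [func() for func in funcs]
print(funcs_results)  # [0, 1, 2]
```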

In [2]:
f = open("./data.txt", "w+")
f.write("1,2,3,4,5")
f.close()

nums = []
with open("./data.txt", "w+") as f:
  lines = f.readlines()
  for line in lines:
    nums += [int(x) for x in line.split(",")]

print(sum(nums))

0


The code writes some numbers to a file, then attempts to read them back and compute their sum. However, the printed result is 0, which shows that no numbers were read. The cause is the "w+" mode used when opening the file the second time. "w+" opens the file for both writing and reading, but it truncates the file first. Since all existing data is deleted, `readlines()` returns an empty list, the loop body never runs, and `nums` stays empty, so `sum(nums)` is 0. 
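Opening the file in read mode ("r") for the second step leaves its contents intact. The sketch below writes to a temporary directory rather than `./data.txt`; that location is an assumption made here so the example does not touch the working directory.

```python
import os
import tempfile

# Temporary path used for illustration instead of ./data.txt
path = os.path.join(tempfile.mkdtemp(), "data.txt")

with open(path, "w") as f:
    f.write("1,2,3,4,5")

nums = []
with open(path, "r") as f:  # "r" reads without truncating the file
    for line in f:
        nums += [int(x) for x in line.split(",")]

print(sum(nums))  # 15
```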