***
**Author:** Peter Lu \
CS108: Data Science Ethics (UCR - Winter 2024)
***

# Functional Programming and Data Practice

This notebook contains practice problems for lambda functions, functional programming, and `numpy`. Some problems revisit familiar use cases for these topics; others introduce new ones. Good luck!

***
# Functional Programming

1) Given a list of strings, use `map` to build a new list in which each word is replaced by a string containing every other letter of that word.
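
As a warm-up on a different list, here is a minimal sketch of pairing `map` with a lambda and a stepped slice. The list `colors` and the name `every_other` are hypothetical and are not part of the exercise.

```python
# Hypothetical warm-up list (not the `words` list from the exercise)
colors = ['red', 'green', 'blue']

# s[::2] keeps the characters at indices 0, 2, 4, ...
# map returns a lazy iterator, so wrap it in list() to see the values
every_other = list(map(lambda s: s[::2], colors))
print(every_other)  # ['rd', 'gen', 'bu']
```

The step in the slice does the real work; `map` only applies it to each element in turn.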

In [None]:
words = ['yellow', 'appropriate', 'happy', 'court']

# Enter code below


2) The `ord` function returns the [ASCII value](https://www.ascii-code.com/) of a character. Use this function to return the strings from `words` whose total ASCII values do not exceed 600. \
*Hint*: Python has a built-in `sum` function that adds up the numerical values in a list (or any iterable). It may be useful. There are multiple ways to solve this problem, but kudos if you can use some combination of `filter` and/or `map`!
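
One possible shape for this kind of pipeline, sketched on a different list with a different threshold: `map(ord, s)` turns a string into its character codes, `sum` totals them, and `filter` keeps the strings that pass a test. The names `animals`, `ascii_total`, and `limit` are assumptions made for illustration.

```python
# Hypothetical data and threshold (not the values from the exercise)
animals = ['cat', 'zebra', 'ox']
limit = 320

def ascii_total(s):
    """Sum of the character codes of every character in s."""
    return sum(map(ord, s))

# filter keeps only the elements for which the lambda returns True
small_totals = list(filter(lambda s: ascii_total(s) <= limit, animals))

print(ascii_total('cat'))  # 99 + 97 + 116 = 312
print(small_totals)        # ['cat', 'ox']
```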

In [None]:
# Enter code below


***
# Numpy

In [None]:
# Import the numpy library (the cells below refer to it by the alias np)


3) Using the array `sample` defined in the next cell (5 rows, 7 columns), return a slice containing the latter half of the columns for the middle 3 rows.
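
For reference, two-dimensional slicing takes a row range and a column range separated by a comma. The sketch below uses a hypothetical 4 x 6 array named `grid`, so the index values differ from what the exercise needs.

```python
import numpy as np

# Hypothetical 4 x 6 array holding 0..23, filled row by row
grid = np.arange(24).reshape(4, 6)

# Rows 1 and 2 (the middle two rows), columns 3 through the end
block = grid[1:3, 3:]

print(block)
# [[ 9 10 11]
#  [15 16 17]]
print(block.shape)  # (2, 3)
```

Basic slices like this return a view of the original array, so modifying `block` in place would also change `grid`.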


In [None]:
sample = np.random.uniform(0, 10, (5, 7)).astype('int64')
sample

In [None]:
# Enter code below


4) Given a data set sampled from a gamma distribution, find the mean of the data set. Then, create two slices: one containing all values less than the mean and the other containing all values greater than or equal to the mean. \
*Hint:* Similar to `pandas`, `numpy` has its own mean function *and* method.
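
A small sketch of the two ingredients, using a hand-checkable array instead of the gamma sample: the mean as a function or a method, and boolean masks for selecting values. The names `values`, `avg`, `below`, and `at_or_above` are hypothetical.

```python
import numpy as np

# Hypothetical data whose mean is easy to verify by hand
values = np.array([1.0, 2.0, 3.0, 10.0])

# np.mean(values) and values.mean() compute the same number
avg = values.mean()
print(avg)  # 4.0

# A comparison produces a boolean array; indexing with it keeps the True entries
below = values[values < avg]
at_or_above = values[values >= avg]

print(below)        # [1. 2. 3.]
print(at_or_above)  # [10.]
```

Note that the two groups need not be the same size; a skewed sample can put most of its values on one side of the mean.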

In [None]:
sample = np.random.gamma(5, 1, 30).round(3)
sample

In [None]:
# Enter code below


5) Create a function `zscore` that takes a random sample as input and returns the z-score of each value. The z-score is calculated as
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is a data point, $\mu$ is the mean of the sample, and $\sigma$ is the standard deviation of the sample. Compare your results to `scipy.stats.zscore`. Then find all z-scores that lie within 2 standard deviations of the mean of the z-scores, that is, all values between $\mu_Z - 2\sigma_Z$ and $\mu_Z + 2\sigma_Z$.
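
The formula above can be sketched with vectorized arithmetic on a small array whose mean (5) and standard deviation (2) are easy to check by hand. The function name `standardize` and the array `points` are hypothetical. One assumption worth stating: `np.std` defaults to the population standard deviation (`ddof=0`), which is also the default for `scipy.stats.zscore`, so the two should agree unless `ddof` is changed.

```python
import numpy as np

def standardize(data):
    """Return (x - mean) / std for every x, using the population std (ddof=0)."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean()) / data.std()

# Hypothetical data: mean 5, population standard deviation 2
points = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = standardize(points)
print(z)  # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]

# Standardized values have mean 0 and standard deviation 1 (up to rounding)
print(z.mean(), z.std())

# Boolean mask for values within 2 standard deviations of the mean
within = z[np.abs(z - z.mean()) <= 2 * z.std()]
```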

In [None]:
import scipy.stats  # import the submodule explicitly so scipy.stats.zscore is available

sample = np.random.normal(53, 5, 100)
sample

In [None]:
# Enter code below


6a) `np.dot` returns the dot product of two vectors, the product of two matrices, or the product of a matrix and a vector. \
\
In the case of a dot product of 2 vectors $\boldsymbol{x}$ and $\boldsymbol{y}$, it performs the operation
$$\boldsymbol{x} \cdot \boldsymbol{y} = \sum_{i = 1}^n x_iy_i.$$
In other words, corresponding vector elements are multiplied then summed together. Given the various data values below, compute the dot products of the variables in the specified orders. Output your results. Do the dimensions of your outputs make sense?
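
To see how the output shape depends on the inputs, here is a sketch with small, hand-checkable values. The names `u`, `v`, and `A` are hypothetical, and `A` is deliberately 2 x 3 rather than square so that the shapes are easier to tell apart.

```python
import numpy as np

u = np.array([1, 0, 2])
v = np.array([4, 5, 6])
A = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

# Vector . vector -> a single number: 1*4 + 0*5 + 2*6
print(np.dot(u, v))    # 16

# (2, 3) . (3, 2) -> (2, 2): inner dimensions must match
print(np.dot(A, A.T))
# [[14 32]
#  [32 77]]

# (2, 3) . (3,) -> (2,): one dot product per row of A
print(np.dot(A, u))    # [ 7 16]
```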

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])

mat1 = np.random.uniform(0, 10, (3, 3)).astype('int64')
mat1

In [None]:
mat2 = mat1.T # matrix transpose
mat2

In [None]:
# x dot y
# Enter code below


In [None]:
# mat1 dot mat2
# Enter code below


In [None]:
# mat1 dot x
# Enter code below


6b) Suppose $\hat{\boldsymbol{\theta}}$ is a 3-dimensional vector of coefficients for a linear regression model. In other words, given a data vector $\boldsymbol{x}$, its predicted value is computed as
$$\hat{f}(\boldsymbol{x}) = \hat{\boldsymbol{\theta}}^{T}\boldsymbol{x} + \hat{b} = \sum_{i = 1}^3 \hat{\theta}_i x_i + \hat{b}$$
where $\hat{b}$ is the bias term.
Write a function `predict` that takes three arguments: a vector of coefficients $\hat{\boldsymbol{\theta}}$, an $n \times 3$ data matrix $X$, and an optional bias term $\hat{b}$ that defaults to $0$. The function should return an $n \times 1$ vector containing the predicted value for each of the $n$ points in the data matrix. \
*Hint*: Pay close attention to your dimensions/shapes!
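
The shape bookkeeping can be sketched as follows: multiplying an $n \times 3$ matrix by a length-3 vector yields a one-dimensional array of length $n$, which then has to be reshaped into a column. This is one possible approach rather than the only one; the function name `linear_predict` and the small test matrix `rows` are hypothetical.

```python
import numpy as np

def linear_predict(coefs, data, bias=0):
    """Return an (n, 1) column of predictions: data @ coefs + bias."""
    coefs = np.asarray(coefs)
    data = np.asarray(data)
    flat = data @ coefs + bias   # shape (n,): one prediction per row
    return flat.reshape(-1, 1)   # shape (n, 1): explicit column vector

# Hypothetical 3 x 3 data matrix chosen so the results are easy to verify
rows = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [1, 1, 1]])
weights = np.array([2, 1, 6])

preds = linear_predict(weights, rows, bias=1)
print(preds.shape)  # (3, 1)
print(preds)
# [[ 3]
#  [ 2]
#  [10]]
```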

In [None]:
X = np.random.uniform(0, 10, (100, 3)).astype('int64')
theta_hat = np.array([2, 1, 6])

In [None]:
# Enter code below


7) A useful function for simulating intervals of real numbers is `np.linspace`. As we'll see later in discussion, it is also useful for plotting lines and functions in $\mathbb{R}^2$. It generates a specified number of evenly spaced values between a start and end value as an ndarray. Here's an example of generating the interval $[-5, 5]$:

In [None]:
# 1000 evenly spaced points between -5 and 5, endpoints included
interval = np.linspace(-5, 5, 1000)
interval

Given a coefficient $\hat{\theta}$ and bias $\hat{b}$ for a linear regression model, generate an ndarray `line` that represents the predictive function
$$\hat{y} = \hat{\theta} x + \hat{b}.$$
Does the resulting array make sense? What do $\hat{\theta}$ and $\hat{b}$ represent? What conclusions can be drawn about $\hat{\boldsymbol{\theta}}$ in problem 6b? \
*Hint*: This is a line in $\mathbb{R}^2$!
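
Because arithmetic on an ndarray is applied element by element, a line can be evaluated at every point of a grid in a single expression. The sketch below uses a hypothetical 5-point grid with a hypothetical slope and intercept so that each value can be checked by hand.

```python
import numpy as np

# Hypothetical grid: 5 evenly spaced points from 0 to 1, endpoints included
grid = np.linspace(0, 1, 5)
print(grid)  # [0.   0.25 0.5  0.75 1.  ]

# Hypothetical slope and intercept (not the values from the exercise)
slope, intercept = 3, -1

# Each grid point x is mapped to slope * x + intercept
line_values = slope * grid + intercept
print(line_values)  # [-1.   -0.25  0.5   1.25  2.  ]
```

The first output value equals the intercept because the grid starts at 0, and consecutive outputs differ by the slope times the grid spacing (3 x 0.25 = 0.75).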

In [None]:
theta_hat = 5
b_hat = 2

# Enter code below
