# DX 601 Week 4 Homework

## Introduction

In this homework, you will practice linear regression and working with random variables.
The data sets in this homework are small, so your code should run in an instant, but you will be able to apply the same techniques to larger data sets.

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for sample code.

* https://github.com/bu-cds-omds/dx500-examples
* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code in your homework answers.

## Instructions

You should replace every instance of "..." below.
These are where you are expected to write code to answer each problem.

After some of the problems, there are extra code cells that will test functions that you wrote so you can quickly see how they run on an example.
If your code works on these examples, it is more likely to be correct.
However, the autograder will test different examples, so working correctly on these examples does not guarantee full credit for the problem.
You may change the example inputs to further test your functions on your own.
You may also add your own example inputs for problems where we did not provide any.

Be sure to run each code block after you edit it to make sure it runs as expected.
When you are done, we strongly recommend you run all the code from scratch (Runtime menu -> Restart and Run all) to make sure your current code works for all problems.

If your code raises an exception when run from scratch, it will interfere with the autograder process, causing you to lose some or all points for this homework.
Please ask for help in YellowDig or schedule an appointment with a learning facilitator if you get stuck.


## Problems

### Shared Imports

Do not install or use any additional modules.
Installing additional modules may result in an autograder failure resulting in zero points for some or all problems.

In [None]:
import math
import random

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

### Shared Data

Many of the problems will use this mango data set.
This data set is small and your code should run instantly with it, but you will be able to use the same code and techniques with larger data sets.

In [None]:
mango_data = pd.read_csv("mango-tiny.tsv", sep="\t")

In [None]:
mango_data

### Problem 1

Set `p1` to the number of parameters in a linear model trained on the mango data set that predicts rated flavor from the other columns.

In [None]:
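# YOUR CHANGES HERE
# used .columns from last week's hw
# lists column names, counts list, -1 to exclude rated_flavor (one coefficient per input column)
# +1 for the intercept, which is also a parameter of the linear model
p1 = (len(mango_data.columns) - 1) + 1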

In [None]:
p1
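
A linear model with an intercept has one weight per input column plus the intercept itself. The sketch below counts them for a column list assumed to match the mango data set; `count_linear_parameters` is a hypothetical helper written for illustration, not part of the assignment.

```python
def count_linear_parameters(columns, target):
    """Count parameters of a linear model with an intercept.

    One coefficient per input column (every column except the target),
    plus one intercept term.
    """
    inputs = [c for c in columns if c != target]
    return len(inputs) + 1


# Column names assumed to match the mango data set.
columns = [
    "green_rating", "yellow_rating", "softness", "wrinkles",
    "estimated_flavor", "estimated_sweetness", "rated_flavor",
]
n_params = count_linear_parameters(columns, "rated_flavor")
print(n_params)
```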

### Problem 2

Set `p2` to $\lim_{x \rightarrow 2} \frac{(x^2-4)}{x+2}$.

Hint: Try to simplify that fraction assuming $x \neq 2$.

In [None]:
# YOUR CHANGES HERE
# x is approaching 2, but never is 2 itself
# factor the numerator: x^2 - 4 = (x - 2)(x + 2)
# cancel the common factor (valid whenever x != -2):
#   ((x - 2)(x + 2)) / (x + 2) = x - 2
# so the limit is lim (x - 2) as x -> 2, which is 2 - 2 = 0
p2 = 0

Check the value of `p2`.

In [None]:
p2
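
A limit can be sanity-checked numerically by evaluating the expression at points that get closer and closer to the target. This is only a rough check under the assumption that the function behaves smoothly near the point; the step sizes are arbitrary choices.

```python
def f(x):
    # The expression from Problem 2; undefined only at x = -2.
    return (x**2 - 4) / (x + 2)


# Evaluate at points approaching 2 from both sides.
for step in [1e-1, 1e-3, 1e-6]:
    print(step, f(2 - step), f(2 + step))

approx_limit = f(2 + 1e-9)
```

The printed values shrink toward the limit from both sides as the step gets smaller.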

### Problem 3

What is the derivative of $x^2 - 3x + 9$ at $x=2$?

In [None]:
# DO NOT CHANGE

xs = np.linspace(-5, 5)
plt.plot(xs, xs * xs - 3 * xs + 9)
plt.ylim(0);

In [None]:
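# YOUR CHANGES HERE
# the derivative of x^2 - 3x + 9 is 2x - 3 (power rule; the constant drops out)
# at x = 2: 2(2) - 3 = 1
# (the function value at x = 2 is 4 - 6 + 9 = 7, which is not the slope)
p3 = 1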

Check the value of `p3`.

In [None]:
p3
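
The value of a function at a point and its derivative at that point are different quantities. A minimal sketch, assuming a central difference with a small step `h` is accurate enough here, approximates the slope and prints it next to the function value; `central_difference` is a hypothetical helper.

```python
def g(x):
    return x**2 - 3 * x + 9


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


slope_at_2 = central_difference(g, 2.0)
print(g(2.0))       # the function value at x = 2
print(slope_at_2)   # the slope at x = 2, close to 2 * 2 - 3
```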

### Problem 4

Set `p4` to the mean row of the mango data set.
That is, `p4` should be a single row of data with the same columns, where each value is the mean of the corresponding column in the mango data set.

In [None]:
# YOUR CHANGES HERE
# mango_data.mean() returns the column means as a Series:
# green_rating           3.000
# yellow_rating          3.000
# softness               2.375
# wrinkles               0.750
# estimated_flavor       2.500
# estimated_sweetness    2.250
# rated_flavor           2.000

# .to_frame().T turns that Series into a single-row table with the same columns,
# so the means are computed from the data instead of typed in by hand
p4 = mango_data.mean().to_frame().T

In [None]:
p4

### Problem 5

Set `p5` to be the median of the estimated flavor column in the mango data set.

You may find NumPy's [numpy.median](https://numpy.org/doc/stable/reference/generated/numpy.median.html) function or pandas' [pandas.DataFrame.median](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.median.html) method helpful.

In [None]:
# YOUR CHANGES HERE

p5 = np.median(mango_data['estimated_flavor'])

Check the value of `p5`.

In [None]:
p5

### Problem 6

How many local extrema does the function $x^3 - 3x$ have?

In [None]:
# @title Plot of $x^3 - 3x$

xs = np.linspace(-5,5)
plt.plot(xs, xs**3 - 3 * xs);

In [None]:
# YOUR CHANGES HERE

p6 = 2

Check the value of `p6`.

In [None]:
p6
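
Local extrema of a smooth function occur where its derivative changes sign. As a rough numerical check, the sketch below samples the derivative $3x^2 - 3$ on a grid and counts sign changes; the grid range and spacing are assumptions chosen for illustration, and the half-step offset keeps grid points from landing exactly on a root.

```python
def d_cubic(x):
    # Derivative of x**3 - 3*x by the power rule.
    return 3 * x**2 - 3


# Grid of midpoints from -4.995 to 4.995 in steps of 0.01.
grid = [-5 + 0.01 * (i + 0.5) for i in range(1000)]

sign_changes = 0
for left, right in zip(grid, grid[1:]):
    if d_cubic(left) * d_cubic(right) < 0:
        sign_changes += 1

print(sign_changes)
```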

### Problem 7

Set `p7` to the number of following functions that are convex.

In [None]:
# @title Functions to Check for Convexity

xs = np.linspace(-5, 5)

plt.figure(figsize=(10, 4))
plt.subplot(2, 5, 1)
plt.plot(xs, xs**2)
plt.title("$x^2$")

plt.subplot(2, 5, 2)
plt.plot(xs, -xs**2)
plt.title("$-x^2$")

plt.subplot(2, 5, 3)
plt.plot(xs, xs**3)
plt.title("$x^3$")

plt.subplot(2, 5, 4)
plt.plot(xs, xs**4)
plt.title("$x^4$")

plt.subplot(2, 5, 5)
plt.plot(xs, xs*xs - 9 * xs)
plt.title("$x^2 - 9x$")

plt.subplot(2, 5, 6)
plt.plot(xs, xs)
plt.title("$x$")

plt.subplot(2, 5, 7)
plt.plot(xs, -xs**4+1000 * xs**2)
plt.title("$1000x^2 - x^4$")

plt.subplot(2, 5, 8)
plt.plot([x for x in xs if x > 0], [1 / x for x in xs if x > 0], color="C0")
plt.plot([x for x in xs if x < 0], [1 / x for x in xs if x < 0], color="C0")
plt.title("$1/x$")

plt.subplot(2, 5, 9)
plt.plot(xs, -xs)
plt.title("$-x$")

plt.subplot(2, 5, 10)
plt.plot(xs, 0.001 * xs**2)
plt.title("$0.001 x^2$")

plt.subplots_adjust(hspace=0.4, wspace=0.4);

Hint: Convexity is a global property of functions.
You may want to change the range of x values to see the big picture.

In [None]:
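# YOUR CHANGES HERE
# convex: x^2, x^4, x^2 - 9x, 0.001 x^2, and the two lines x and -x (linear functions are convex)
# not convex: -x^2, x^3, 1000x^2 - x^4 (bends downward for large |x|), 1/x
p7 = 6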

Check the value of `p7`.

In [None]:
p7
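
Following the hint about looking at a wider range, convexity can be probed numerically: a convex function satisfies $f(x-h) - 2f(x) + f(x+h) \ge 0$ everywhere. The sketch below tests that inequality on a grid from about -50 to 50. Passing is only a necessary condition on the sampled points, not a proof, and `looks_convex`, the range, and the tolerance are assumptions made for this illustration.

```python
# The ten functions from the plot, keyed by their titles.
candidates = {
    "x^2": lambda x: x**2,
    "-x^2": lambda x: -(x**2),
    "x^3": lambda x: x**3,
    "x^4": lambda x: x**4,
    "x^2 - 9x": lambda x: x * x - 9 * x,
    "x": lambda x: x,
    "1000x^2 - x^4": lambda x: 1000 * x**2 - x**4,
    "1/x": lambda x: 1 / x,
    "-x": lambda x: -x,
    "0.001 x^2": lambda x: 0.001 * x**2,
}


def looks_convex(func, lo=-50, hi=50, h=0.5, tol=1e-9):
    """Check f(x-h) - 2 f(x) + f(x+h) >= 0 on a grid.

    The grid is offset by 0.25 so that no evaluation point is exactly 0,
    which keeps 1/x defined everywhere it is sampled.
    """
    for k in range(lo, hi):
        x = k + 0.25
        if func(x - h) - 2 * func(x) + func(x + h) < -tol:
            return False
    return True


convex_names = [name for name, func in candidates.items() if looks_convex(func)]
print(convex_names)
```

Straight lines pass the test with a second difference of exactly zero, which matches the definition: linear functions are both convex and concave.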

### Problem 8

Set `p8` to $\lim_{x \to 2^-} \frac{|x-2|}{x-2}$.

In [None]:
# YOUR CHANGES HERE
# x is approaching 2 from the left, so x < 2 and x - 2 is negative
# for negative x - 2, |x - 2| = -(x - 2)
# so |x - 2| / (x - 2) = -(x - 2) / (x - 2) = -1
p8 = -1

Check the value of `p8`.

In [None]:
p8
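
A one-sided limit only looks at inputs on one side of the target. This small sketch evaluates the ratio just below and just above 2 to show that the two sides disagree, which is why the direction matters here; the step sizes are arbitrary.

```python
def ratio(x):
    # |x - 2| / (x - 2); undefined at x = 2 itself.
    return abs(x - 2) / (x - 2)


steps = (1e-1, 1e-4, 1e-8)
left_values = [ratio(2 - step) for step in steps]    # x < 2
right_values = [ratio(2 + step) for step in steps]   # x > 2
print(left_values, right_values)
```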

### Problem 9

Write a function `p9` returning the derivative of $x^5$.

In [None]:
# YOUR CHANGES HERE
# power rule: d/dx x**k = k * x**(k - 1)
# derivative of x^5 is 5x^4
def p9(x):
    return 5 * (x ** (5 - 1))

Check the output of `p9`.

In [None]:
p9(0)

In [None]:
p9(1)

### Problem 10

Write a function `p10` returning the derivative of $4 x^2$.

In [None]:
# YOUR CHANGES HERE
#d/dx c*x**k = c* k * x **(k-1)
def p10(x):
    return 4 * 2 * (x ** (2 - 1))

Check the output of `p10`.

In [None]:
p10(0)

In [None]:
p10(1)

### Problem 11

Write a function `p11` returning the derivative of $3 e^x + 5 x^5$.

In [None]:
# YOUR CHANGES HERE
# power rule: d/dx c * x**k = c * k * x**(k - 1)
# e^x is its own derivative, so d/dx 3 * e^x = 3 * e^x
# https://www.w3schools.com/python/ref_math_exp.asp for math.exp (e raised to a power)

def p11(x):
    return 3 * math.exp(x) + 5 * 5 * x ** (5 - 1)

Check the output of `p11`.

In [None]:
p11(0)

In [None]:
p11(1)

### Problem 12

Write a function `p12` returning the derivative of $\log_3(x)$.

You could try to derive this using the chain rule, but feel free to look up the rule for logarithms with different bases.

In [None]:
# YOUR CHANGES HERE
# for rule for log of different bases https://www.khanacademy.org/math/algebra2/
# d/dx logb(x) = 1/(x * ln(b)) The derivative of ln(x) comes from the fact that e^x and ln(x) are inverses of each other

def p12(x):
    return 1/(x * math.log(3))

Check the output of `p12`.

In [None]:
p12(1)

In [None]:
p12(3)
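
Hand-derived derivatives like those in Problems 9 through 12 can be cross-checked against a finite-difference approximation. The sketch below compares each claimed derivative with a central difference at a few sample points. The sample points and step size are assumptions, and a small error only suggests, rather than proves, that a formula is right.

```python
import math


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


# (function, claimed derivative) pairs for Problems 9 through 12.
pairs = {
    "x^5": (lambda x: x**5, lambda x: 5 * x**4),
    "4x^2": (lambda x: 4 * x**2, lambda x: 8 * x),
    "3e^x + 5x^5": (
        lambda x: 3 * math.exp(x) + 5 * x**5,
        lambda x: 3 * math.exp(x) + 25 * x**4,
    ),
    "log_3(x)": (lambda x: math.log(x, 3), lambda x: 1 / (x * math.log(3))),
}

max_error = 0.0
for name, (func, deriv) in pairs.items():
    for x in (0.5, 1.0, 2.0):
        error = abs(central_difference(func, x) - deriv(x))
        max_error = max(max_error, error)
        print(name, x, error)
```

A misplaced parenthesis, such as writing `x ** 5 - 1` where `x ** (5 - 1)` was meant, shows up immediately as a large error in this kind of check.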

### Problem 13

Build a linear regression for the mango rated flavor column using all the other columns as inputs.
Set `p13` to be the output of this model for the mean row of the data set (similar to your answer to problem 4, but without the rated flavor column).

In [None]:
# YOUR CHANGES HERE
# drop the rated_flavor column so the inputs contain only the predictor columns
new_mango_data = mango_data.drop(columns=['rated_flavor'])

# single-row table with the column means of the inputs (no rated_flavor),
# computed from the data so it always matches the input columns
mean_row = new_mango_data.mean().to_frame().T

#build linear regression from mango_data with column dropped
model = sklearn.linear_model.LinearRegression()
model.fit(new_mango_data, mango_data['rated_flavor'])

#put the row into the model for prediction
p13 = model.predict(mean_row)[0]


Check the value of `p13`.

In [None]:
p13
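
A useful sanity check for this kind of problem: a least-squares linear model fit with an intercept always predicts the mean of the target when given the mean of the inputs. The sketch below shows this for a single input using the closed-form solution. The toy numbers are hypothetical and are not the mango data; the same property holds with several input columns.

```python
from statistics import mean

# Hypothetical toy data, not the mango data set.
xs_toy = [1.0, 2.0, 4.0, 5.0, 8.0]
ys_toy = [1.0, 3.0, 2.0, 5.0, 4.0]

x_bar, y_bar = mean(xs_toy), mean(ys_toy)

# Closed-form ordinary least squares for one input with an intercept.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs_toy, ys_toy)) / sum(
    (x - x_bar) ** 2 for x in xs_toy
)
intercept = y_bar - slope * x_bar

prediction_at_mean = slope * x_bar + intercept
print(prediction_at_mean, y_bar)
```

Applied to this problem, `p13` can be compared against the mean of the rated flavor column.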

### Problem 14

The derivative of the polynomial $5 x^{99} - 2 x^{78} + 3 x^{25} + 4 x^4 - 357$ is another polynomial.
What is the degree of that polynomial?

In [None]:
# YOUR CHANGES HERE
# differentiating lowers each exponent by one and the constant -357 drops out,
# so the leading term 5x^99 becomes 495x^98 and the degree is 98

p14 = 98

Check the value of `p14`.

In [None]:
p14
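
The degree of a polynomial is its highest exponent, and the power rule lowers every exponent by one. A minimal sketch, representing the polynomial as a dictionary from exponent to coefficient (a representation chosen here for illustration), applies the rule term by term; `differentiate` is a hypothetical helper.

```python
# Polynomial as {exponent: coefficient}, taken from the problem statement.
poly = {99: 5, 78: -2, 25: 3, 4: 4, 0: -357}


def differentiate(terms):
    """Apply the power rule term by term; constant terms vanish."""
    return {exp - 1: coef * exp for exp, coef in terms.items() if exp != 0}


derivative = differentiate(poly)
degree = max(derivative)
print(derivative)
print(degree)
```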

### Problem 15

Write a function `p15` that takes in parameters `m` and `b` and computes the average $L_2$ loss for the training data in `x15` and `y15` based on the linear prediction $mx + b$.

In [None]:
# DO NOT CHANGE

x15 = np.asarray([0, 1, 2, 3, 4])
y15 = np.asarray([0, 0, 1, 1, 1])

In [None]:
# YOUR CHANGES HERE
# L2 loss for one example: (f(x) - y) ^ 2, where f(x) = m * x + b
# residual = observed value - predicted value
# the problem asks for the average over all training examples

def p15(m, b):
    # predictions m * x + b for every training input
    linear_prediction = m * x15 + b

    # squared residuals: y15 holds the observed values, linear_prediction the predicted ones
    loss = (y15 - linear_prediction) ** 2

    # average over the training examples
    return np.average(loss)

Test `p15` with different inputs.

In [None]:
p15(0, 0)

In [None]:
p15(0, 1)

In [None]:
p15(1, 0)

In [None]:
p15(1, 1)
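
To see what the average $L_2$ loss computes, it helps to write it out without array operations. The sketch below uses plain Python lists holding the same training values as `x15` and `y15`; `average_l2_loss` is a hypothetical name for this standalone version.

```python
x_train = [0, 1, 2, 3, 4]
y_train = [0, 0, 1, 1, 1]


def average_l2_loss(m, b):
    """Mean of the squared residuals (y - (m * x + b)) ** 2 over the data."""
    squared = [(y - (m * x + b)) ** 2 for x, y in zip(x_train, y_train)]
    return sum(squared) / len(squared)


for m, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(m, b, average_l2_loss(m, b))
```

For example, with `m = 1` and `b = 0` the residuals are 0, -1, -1, -2, -3, whose squares sum to 15, giving an average of 3.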

### Problem 16

Write a function `p16` that takes in four parameters, `a`, `b`, `c`, and `x_in`, and returns the derivative of $ax^2 + bx +c$ evaluated at value $x=x_{in}$.

That is, `p16` should compute $\frac{d (ax^2 + bx +c)}{dx}(x_{in})$.

In [None]:
# YOUR CHANGES HERE
# d/dx (ax^2 + bx + c) = 2ax + b, then evaluate at x = x_in
# (c is a constant, so it does not appear in the derivative)
def p16(a, b, c, x_in):
    return 2 * a * x_in + b
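
Since no example inputs are provided for this problem, here are two that are easy to verify by hand. The standalone function below is a sketch of the formula $2ax + b$ under the hypothetical name `quadratic_derivative`. The first call reuses the quadratic from Problem 3, and the second uses `a = 0`, where the function is a straight line with constant slope `b`.

```python
def quadratic_derivative(a, b, c, x_in):
    # d/dx (a x^2 + b x + c) = 2 a x + b; the constant c drops out.
    return 2 * a * x_in + b


from_problem_3 = quadratic_derivative(1, -3, 9, 2)   # x^2 - 3x + 9 at x = 2
straight_line = quadratic_derivative(0, 5, 7, 10)    # 5x + 7 at x = 10
print(from_problem_3, straight_line)
```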

### Problem 17

Set `p17` to $\lim_{x \to 0} x^2 \mathrm{cos} \left( \frac{1}{x} \right)$.

In [None]:
# @title Plot of $x^2 cos(1/x)$

xs = np.linspace(-0.25, 0.25, 1000)
plt.plot(xs, xs**2 * np.cos(1 / xs));

Hint: This one looks tricky because $\mathrm{cos} \left( \frac{1}{x} \right)$ oscillates increasingly fast as $x$ approaches zero.
However, you should still be able to bound its behavior.

In [None]:
# YOUR CHANGES HERE
# -1 <= cos(1/x) <= 1 for every x != 0
# multiply by x^2 (which is >= 0): -x^2 <= x^2 cos(1/x) <= x^2
# both bounds go to 0 as x -> 0, so by the squeeze theorem the limit is 0
# x needs to get as close as possible to 0, but not be 0
p17 = 0

In [None]:
p17
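
The bounding argument from the hint can be checked numerically: since $-1 \le \cos(1/x) \le 1$, the product is trapped between $-x^2$ and $x^2$. The sketch below tests that bound at sample points shrinking toward zero from both sides; the sample points are an arbitrary choice.

```python
import math


def h(x):
    return x**2 * math.cos(1 / x)


# Points 10^-1, 10^-2, ..., 10^-7 on both sides of 0.
samples = [sign * 10.0 ** (-k) for k in range(1, 8) for sign in (-1, 1)]

bounded = all(-(x**2) <= h(x) <= x**2 for x in samples)
largest_near_zero = max(abs(h(x)) for x in samples[-2:])  # the two points nearest 0
print(bounded, largest_near_zero)
```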

### Problem 18

Given a model with a single parameter $c$ whose loss function in terms of $c$ is convex, and given the following samples of the loss function output, what is the best (highest) lower bound that you can put on the optimal value of $c$?

| c | Loss(c) |
|---:|---:|
| 0 | 1.0 |
| 1 | 0.5 |
| 2 | 0.25 |
| 3 | 0.125 | 
| 4 | 0.25 |
| 5 | 0.6 |
| 6 | 0.9 |

In [None]:
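# YOUR CHANGES HERE
# based on the table, c = 3 has the lowest sampled loss, but the true minimum may fall between samples
# Loss(2) > Loss(3) < Loss(4) and the loss is convex, so the optimal c lies between 2 and 4
# that makes 2 a guaranteed lower bound on the optimal c
p18 = 2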

Check the value of `p18`.

In [None]:
p18
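
For a convex loss, the samples next to the lowest sample bracket the minimizer: if the optimum were to the left of the sample just before the lowest one, convexity would force that sample to be no higher than the lowest one. The sketch below finds this bracket from the table. It illustrates the bracketing argument only and does not try to tighten the interval further.

```python
# Loss samples from the table above.
samples = {0: 1.0, 1: 0.5, 2: 0.25, 3: 0.125, 4: 0.25, 5: 0.6, 6: 0.9}

cs = sorted(samples)
best = min(cs, key=samples.get)
i = cs.index(best)

# Neighbors of the lowest sample; None if the lowest sample is at an end.
lower = cs[i - 1] if i > 0 else None
upper = cs[i + 1] if i < len(cs) - 1 else None
print(best, lower, upper)
```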

### Problem 19

Write a function `p19` that returns the strings "cat", "dog", "cow", "tiger" or "horse" with equal probability.

In [None]:
# YOUR CHANGES HERE
# 0.20 for each
def p19():
    #random syntax, https://www.w3schools.com/python/numpy/numpy_random.asp
    return np.random.choice(["cat", "dog", "cow", "tiger", "horse"])

In [None]:
[p19() for x in range(10)]
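
Ten draws are too few to judge whether the choices are equally likely. A rough check is to draw many samples and compare the observed frequencies with 0.2. This sketch uses the standard library's `random.Random` with a fixed seed as an assumption for repeatability, not the function from the answer above.

```python
import random
from collections import Counter

animals = ["cat", "dog", "cow", "tiger", "horse"]
rng = random.Random(0)  # fixed seed so the run is repeatable

draws = [rng.choice(animals) for _ in range(10_000)]
counts = Counter(draws)
for animal in animals:
    print(animal, counts[animal] / len(draws))
```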

### Problem 20

Set `p20` to the value of $x$ that minimizes $x^4 + 3x^3 + 2x + 5$.

In [None]:
def q20(x):
    return x**4 + 3 * x**3 + 2*x + 5

In [None]:
# @title Plot of $x^4 + 3x^3 + 2x + 5$

xs = np.linspace(-5, 5, 1000)
plt.plot(xs, xs**4 + 3 * xs**3 + 2*xs + 5);

Hint: Use whatever tactics you prefer to minimize this function numerically.

In [None]:
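# YOUR CHANGES HERE
# evaluate q20 on the grid xs and take the x with the smallest output
# (np.argmin returns an index into xs, so look up the x value at that index;
# the answer is accurate to the grid spacing of about 0.01)
p20 = xs[np.argmin(q20(xs))]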

Check the value of `p20`.

In [None]:
p20

In [None]:
q20(p20)
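
One numerical tactic is to find where the derivative crosses zero. The sketch below runs bisection on the derivative $4x^3 + 9x^2 + 2$, assuming the bracket from -5 to 0 contains its only sign change (it is negative at -5 and positive at 0). The bracket and iteration count are choices made for this illustration.

```python
def q(x):
    return x**4 + 3 * x**3 + 2 * x + 5


def dq(x):
    # Derivative of q by the power rule.
    return 4 * x**3 + 9 * x**2 + 2


# dq(-5) < 0 and dq(0) > 0, so a stationary point lies in between.
lo, hi = -5.0, 0.0
for _ in range(60):
    mid = (lo + hi) / 2
    if dq(mid) < 0:
        lo = mid   # still descending: the minimum is to the right
    else:
        hi = mid   # already ascending: the minimum is to the left

x_min = (lo + hi) / 2
print(x_min, q(x_min))
```

Plugging `x = 0` into the function gives 5, but the function dips well below that for negative `x`, so the minimizer is not at zero.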

### Generative AI Usage

If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the [generative AI policy](https://www.bu.edu/cds-faculty/culture-community/gaia-policy/).
If you did not use any generative AI tools, simply write NONE below.

<h3>Problems 9 - 11</h3>
I used TerrierGPT, Anthropic Claude 4.5 Sonnet to describe derivatives to me in a simpler way, including derivatives of polynomials, without code. I also asked it to give me practice examples to do by hand. I needed something simpler than the class notes to understand the basics first. I used this alongside the video from Required Resources for 4.7.<br> https://terriergpt.bu.edu/share/S944dqGeCEbn3xWCBfCHa


<h3>Problem 12</h3>
I used TerrierGPT, Anthropic Claude 4.5 Sonnet and asked how to take the derivative of a logarithm, requesting an explanation without any code; it then provided examples for me to do by hand. I didn't know how to take the derivative of a log at all, and I couldn't find an example in the class notes. I also used this alongside some Khan Academy Algebra 2 videos.<br> https://terriergpt.bu.edu/share/GNP5fJNm30OKIOzq874Wd


<h3>Problem 17</h3>
I used TerrierGPT, Anthropic Claude 4.5 Sonnet and asked it to explain sin, cos, and tan in detail. I don't remember anything from undergrad classes about them, so I needed a refresher. I didn't understand the code examples for the graphs, the interactive unit circle visualization, or why they're called "wave functions", so I skipped over them. Office hours also helped me solve the problem. <br>
https://terriergpt.bu.edu/share/SmGuY5q_vjGiyNJV668Yo
