# MACHINE LEARNING AND QUANTUM COMPUTERS
# ASSIGNMENT 1 (26/11/25)

## PROBLEM 2

<div class="alert alert-block alert-success">
<b>P2</b>. When possible, compare your results to theoretical values.
</div>

### Preliminaries

Let's start by importing all the libraries that we will need:

In [1]:
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import random

Also, let's check that all of those packages were correctly installed:

In [2]:
print(f"Numpy's version: {np.__version__}")
print(f"Matplotlib's version: {mpl.__version__}")
print(f"Scipy's version: {sp.__version__}")
print(f"Pandas's version: {pd.__version__}")

Numpy's version: 2.3.4
Matplotlib's version: 3.10.7
Scipy's version: 1.16.3
Pandas's version: 2.3.3


### Fundamentals

The theoretical quantities we will compare our data against are the mean and standard deviation of each underlying distribution. This lets us check whether the error bars obtained earlier contain them or not.

For the Gaussian distribution this is as simple as it gets: the theoretical mean and standard deviation are exactly the parameters $\mu$ and $\sigma$ chosen to generate the data.

On the other hand, for the uniform distribution on $[a,b]$ the mean and standard deviation are given by

$$\mu=(a+b)/2$$
$$\sigma=(b-a)/\sqrt{12}$$
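As a quick sanity check, these closed-form expressions can be compared against SciPy's frozen `scipy.stats.uniform` distribution, which describes the interval $[\text{loc}, \text{loc}+\text{scale}]$, so `loc` $=a$ and `scale` $=b-a$. A minimal sketch, assuming the values $a=0$, $b=2$ used later in this notebook:

```python
import numpy as np
from scipy import stats

a, b = 0.0, 2.0

# Closed-form mean and standard deviation of U(a, b)
mean_formula = (a + b) / 2
std_formula = (b - a) / np.sqrt(12)

# SciPy parametrizes the uniform distribution on [loc, loc + scale]
dist = stats.uniform(loc=a, scale=b - a)

print(f"mean: formula={mean_formula:.4f}, scipy={dist.mean():.4f}")
print(f"std:  formula={std_formula:.4f}, scipy={dist.std():.4f}")
```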

Finally, for the Beta distribution with shape parameters $\alpha$ and $\beta$, the mean and standard deviation are

$$\mu=\alpha/(\alpha+\beta)$$
$$\sigma=\sqrt{\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}}$$
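The same check can be done for the Beta distribution with `scipy.stats.beta`. For the values $\alpha=\beta=4$ used below (assumed here), the variance is $16/(64\cdot 9)=1/36$, so the standard deviation is exactly $1/6$. A short sketch:

```python
import numpy as np
from scipy import stats

alpha, beta = 4.0, 4.0

# Closed-form mean and standard deviation of Beta(alpha, beta)
mean_formula = alpha / (alpha + beta)
std_formula = np.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))

# Frozen SciPy Beta distribution with the same shape parameters
dist = stats.beta(alpha, beta)

print(f"mean: formula={mean_formula:.4f}, scipy={dist.mean():.4f}")
print(f"std:  formula={std_formula:.4f}, scipy={dist.std():.4f}")
```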

Let's define the function to generate our random datasets:

In [3]:
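def data_set(N,mu,sigma,a,b,alpha,beta):
    # Draw N samples from each of the three distributions
    x_normal = np.random.normal(mu,sigma,N)    # Gaussian with mean mu and std sigma
    x_uniform = np.random.uniform(a,b,N)       # Uniform on [a, b)
    x_beta = np.random.beta(alpha,beta,N)      # Beta(alpha, beta), supported on [0, 1]
    x = np.linspace(-10,10,N)                  # evenly spaced grid on [-10, 10] (not used in the statistics below)

    # Sample means and standard deviations (np.std uses ddof=0, i.e. divides by N)
    mN = np.mean(x_normal)
    sN = np.std(x_normal)

    mU = np.mean(x_uniform)
    sU = np.std(x_uniform)

    mB = np.mean(x_beta)
    sB = np.std(x_beta)

    return(x_normal,x_uniform,x_beta,x,mN,sN,mU,sU,mB,sB)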
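`np.random.normal` and friends draw from NumPy's legacy global random state, so each run of the notebook gives different numbers. A possible variant (a sketch, not the method used in this assignment) uses the `numpy.random.Generator` API with an explicit seed for reproducibility, and passes `ddof=1` to `np.std` to obtain the Bessel-corrected sample standard deviation instead of the default `ddof=0`. The name `data_set_rng` and its dictionary return format are hypothetical choices:

```python
import numpy as np

def data_set_rng(N, mu, sigma, a, b, alpha, beta, seed=None):
    """Hypothetical variant of data_set that uses a seeded Generator."""
    rng = np.random.default_rng(seed)

    samples = {
        "normal": rng.normal(mu, sigma, N),
        "uniform": rng.uniform(a, b, N),
        "beta": rng.beta(alpha, beta, N),
    }

    # Sample mean and Bessel-corrected (ddof=1) sample standard deviation
    summary = {name: (x.mean(), x.std(ddof=1)) for name, x in samples.items()}
    return samples, summary

samples, summary = data_set_rng(5000, mu=1, sigma=1.1, a=0, b=2, alpha=4, beta=4, seed=42)
for name, (m, s) in summary.items():
    print(f"{name:>8}: mean={m:.3f}, std={s:.3f}")
```

With a fixed `seed`, repeated calls return identical samples, which makes the comparisons below reproducible.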

We'll work with three datasets:
- One of 5000 elements
- One of 500 elements
- One of 50 elements
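These sizes differ by factors of ten, so the standard error of the mean, $\sigma/\sqrt{N}$, which sets the typical distance between a sample mean and the theoretical mean, should shrink by $\sqrt{10}\approx 3.16$ from one dataset to the next. A short sketch, assuming the same parameters used below ($\sigma=1.1$, $a=0$, $b=2$, $\alpha=\beta=4$):

```python
import numpy as np

# Theoretical standard deviations (assumed parameters: sigma=1.1, a=0, b=2, alpha=beta=4)
sigmas = {
    "Gaussian": 1.1,
    "Uniform": (2 - 0) / np.sqrt(12),
    "Beta": np.sqrt(4 * 4 / ((4 + 4) ** 2 * (4 + 4 + 1))),
}

# Standard error of the mean, sigma / sqrt(N), for each sample size and distribution
standard_error = {
    N: {name: s / np.sqrt(N) for name, s in sigmas.items()}
    for N in (5000, 500, 50)
}

for N, errs in standard_error.items():
    row = ", ".join(f"{name}: {se:.4f}" for name, se in errs.items())
    print(f"N={N:>4} -> {row}")
```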

In [4]:
x_normal1, x_uniform1, x_beta1, x1, mN1, sN1, mU1, sU1, mB1, sB1 = data_set(N = 5000, mu = 1, sigma = 1.1, a = 0, b = 2, alpha = 4, beta = 4)
x_normal2, x_uniform2, x_beta2, x2, mN2, sN2, mU2, sU2, mB2, sB2 = data_set(N = 500, mu = 1, sigma = 1.1, a = 0, b = 2, alpha = 4, beta = 4)
x_normal3, x_uniform3, x_beta3, x3, mN3, sN3, mU3, sU3, mB3, sB3 = data_set(N = 50, mu = 1, sigma = 1.1, a = 0, b = 2, alpha = 4, beta = 4)
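The three calls above differ only in `N`. A more compact alternative (a sketch; the seed and the column names are assumptions) loops over the sample sizes and collects the sample statistics next to the theoretical values in a pandas `DataFrame`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

mu, sigma = 1.0, 1.1
a, b = 0.0, 2.0
alpha, beta = 4.0, 4.0

# Theoretical (mean, std) of each distribution
theory = {
    "Gaussian": (mu, sigma),
    "Uniform": ((a + b) / 2, (b - a) / np.sqrt(12)),
    "Beta": (alpha / (alpha + beta),
             np.sqrt(alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))),
}

rows = []
for N in (5000, 500, 50):
    draws = {
        "Gaussian": rng.normal(mu, sigma, N),
        "Uniform": rng.uniform(a, b, N),
        "Beta": rng.beta(alpha, beta, N),
    }
    for name, x in draws.items():
        true_mean, true_std = theory[name]
        rows.append({
            "distribution": name,
            "N": N,
            "sample_mean": x.mean(),
            "theoretical_mean": true_mean,
            "sample_std": x.std(),  # ddof=0, as in data_set
            "theoretical_std": true_std,
        })

df = pd.DataFrame(rows)
print(df.round(3).to_string(index=False))
```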

Let's see how the theoretical means compare to the sample means we have just computed:

In [11]:
# THEORETICAL
# Gauss
mu = 1
sigma = 1.1

# Uniform
a = 0
b = 2
meanU = (a+b)/2
sigmaU = (b-a)/np.sqrt(12)

# Beta
alpha = 4
beta = 4
meanB = alpha/(alpha+beta)
sigmaB = np.sqrt((alpha * beta)/((alpha+beta)**2 * (alpha + beta + 1)))

In [13]:
print("[ THEORETICAL MEANS ]")
print(f"     · Gaussian: {mu:.2f}")
print(f"     · Uniform: {meanU:.2f}")
print(f"     · Beta: {meanB:.2f}")
print()

print("[ SAMPLE MEANS (N=5000) ]")
print(f"     · Gaussian: {mN1:.2f}")
print(f"     · Uniform: {mU1:.2f}")
print(f"     · Beta: {mB1:.2f}")
print()

print("[ SAMPLE MEANS (N=500) ]")
print(f"     · Gaussian: {mN2:.2f}")
print(f"     · Uniform: {mU2:.2f}")
print(f"     · Beta: {mB2:.2f}")
print()

print("[ SAMPLE MEANS (N=50) ]")
print(f"     · Gaussian: {mN3:.2f}")
print(f"     · Uniform: {mU3:.2f}")
print(f"     · Beta: {mB3:.2f}")

[ THEORETICAL MEANS ]
     · Gaussian: 1.00
     · Uniform: 1.00
     · Beta: 0.50

[ SAMPLE MEANS (N=5000) ]
     · Gaussian: 1.00
     · Uniform: 1.00
     · Beta: 0.50

[ SAMPLE MEANS (N=500) ]
     · Gaussian: 1.03
     · Uniform: 1.00
     · Beta: 0.49

[ SAMPLE MEANS (N=50) ]
     · Gaussian: 1.10
     · Uniform: 0.97
     · Beta: 0.47


The sample means are very close to the theoretical ones, and the agreement improves as the number of samples grows: for $N=5000$ all three coincide with the theoretical values to two decimal places, while for $N=50$ the largest deviation is $0.10$ (Gaussian). In general (though not for every confidence interval), the confidence intervals given by Chebyshev's and Hoeffding's inequalities in the previous exercise, centered on the theoretical mean, are wide enough to contain the sample mean.
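The intervals from the previous exercise are not reproduced here, so as a hedged sketch assume a confidence level $1-\delta$ with $\delta=0.05$ (an assumption). Chebyshev's inequality for the sample mean, $P(|\bar{X}-\mu|\ge\varepsilon)\le\sigma^2/(N\varepsilon^2)$, gives the half-width $\varepsilon=\sigma/\sqrt{N\delta}$; Hoeffding's inequality for variables bounded in $[a,b]$, $P(|\bar{X}-\mu|\ge\varepsilon)\le 2\exp(-2N\varepsilon^2/(b-a)^2)$, gives $\varepsilon=(b-a)\sqrt{\ln(2/\delta)/(2N)}$. Hoeffding does not apply to the Gaussian, which is unbounded. The sketch checks whether the rounded sample means printed above fall inside these intervals centered on the theoretical means:

```python
import numpy as np

delta = 0.05  # assumed failure probability (95% confidence), not taken from the previous exercise

# Theoretical mean, standard deviation and support width (None if unbounded)
theory = {
    "Gaussian": (1.0, 1.1, None),
    "Uniform": (1.0, 2 / np.sqrt(12), 2.0),
    "Beta": (0.5, 1 / 6, 1.0),  # sqrt(16 / 576) = 1/6 for alpha = beta = 4
}

# Rounded sample means printed in the output above
sample_means = {
    5000: {"Gaussian": 1.00, "Uniform": 1.00, "Beta": 0.50},
    500: {"Gaussian": 1.03, "Uniform": 1.00, "Beta": 0.49},
    50: {"Gaussian": 1.10, "Uniform": 0.97, "Beta": 0.47},
}

def chebyshev_halfwidth(sigma, N, delta):
    # Solve sigma^2 / (N eps^2) = delta for eps
    return sigma / np.sqrt(N * delta)

def hoeffding_halfwidth(width, N, delta):
    # Solve 2 exp(-2 N eps^2 / width^2) = delta for eps
    return width * np.sqrt(np.log(2 / delta) / (2 * N))

results = []
for N, means in sample_means.items():
    for name, m in means.items():
        mu, sigma, width = theory[name]
        dev = abs(m - mu)
        eps_c = chebyshev_halfwidth(sigma, N, delta)
        eps_h = None if width is None else hoeffding_halfwidth(width, N, delta)
        inside_c = dev <= eps_c
        inside_h = None if eps_h is None else dev <= eps_h
        results.append((N, name, dev, eps_c, eps_h, inside_c, inside_h))

        hoeff = "n/a" if eps_h is None else f"{eps_h:.3f} ({'in' if inside_h else 'out'})"
        print(f"N={N:>4} {name:>8}: |dev|={dev:.3f}  "
              f"Chebyshev={eps_c:.3f} ({'in' if inside_c else 'out'})  Hoeffding={hoeff}")
```

For a fixed $\delta$ both half-widths scale as $1/\sqrt{N}$, and for these bounded distributions Hoeffding's interval is the tighter of the two only when $(b-a)\sqrt{\ln(2/\delta)/2}<\sigma/\sqrt{\delta}$.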