### TP SD TSIA 211

Welcome to Paname Télécom! You have recently been hired by this young company, which would like to become a challenger in the competitive field of data centers. In order to gain quick knowledge on how to run a data center efficiently, our team of hackers has used a security breach to steal some data from a well-known data center. On e-campus, you will be able to download the files data_center_data_matrix.npy and data_center_helper.py, which contain the data and basic operations that can be done on them.
The data is composed of measurements with a roughly two-hour sampling rate, together with 4 key performance indicators (KPIs). However, we were not able to get the formula that gives the indicators as a function of the data. Your mission is thus to reverse-engineer the performance indicators. We conjecture that each of them can be written as a ratio of affine transforms of the raw data, subject to some noise.
This gives the following model for KPI number $i$ at time $t$:

$$
y_i(t)=\frac{w_{i,1}^T x(t) + w_{i,0} + \epsilon_i(t)}{w_{i,2}^T x(t) + 1}
$$

$\textbf{Question 3.1 :}$

If $(Aw)_t = b_t$, then for all $t$:

$$
\begin{aligned}
& \tilde{x}(t)^T w_1 + w_0 - y(t)\, \tilde{x}(t)^T w_2 = y(t) \\
\iff\ & \tilde{x}(t)^T w_1 + w_0 = y(t) \left(1 + \tilde{x}(t)^T w_2\right)
\end{aligned}
$$
That is, provided $\tilde{x}(t)^T w_2 + 1 \neq 0$:
$$
y(t) = \frac{\tilde{x}(t)^T w_1 + w_0 }{\tilde{x}(t)^T w_2 + 1}
$$
Since every term is a scalar, it equals its own transpose, so $\tilde{x}(t)^T w_j = w_j^T \tilde{x}(t)$, which gives the stated model (without the noise term).
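
As a quick sanity check of this equivalence, here is a minimal sketch on synthetic, noiseless data. The sizes, the random seed, and the names `X`, `w1`, `w0`, `w2` are illustrative assumptions, not the data-center data. It builds $A$ with the column layout $[\tilde{x}(t)^T,\ 1,\ -y(t)\,\tilde{x}(t)^T]$ and checks that $Aw = b$ when $y$ follows the ratio model exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 50, 4                           # hypothetical sizes
X = rng.normal(size=(T, n))            # rows play the role of x~(t)^T
w1 = rng.normal(size=n)
w0 = 0.5
w2 = 0.05 * rng.normal(size=n)         # small, so the denominator stays away from 0

# Noiseless ratio model: y(t) = (x~^T w1 + w0) / (x~^T w2 + 1)
y = (X @ w1 + w0) / (X @ w2 + 1)

# Column layout: [x~^T, 1, -y * x~^T], right-hand side b = y
A = np.hstack([X, np.ones((T, 1)), -(X.T * y).T])
b = y
w = np.concatenate([w1, [w0], w2])

max_residual = np.max(np.abs(A @ w - b))
print("max |Aw - b| =", max_residual)
```

With noise in the numerator, the residual $Aw - b$ is no longer zero, which is why the parameters are estimated by least squares.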


$\textbf{Question 3.2 :}$

In [2]:
import numpy as np
import matplotlib.pyplot as plt

# Loading data

data_matrix_train, COP_train, data_matrix_test, COP_test, names = np.load('data_center_data_matrix.npy', allow_pickle=True)

# Constructing matrices for min_w ||A w - b||_2**2

matrix_mean = np.mean(data_matrix_train, axis=0)
M = data_matrix_train - matrix_mean
matrix_std = np.std(M, axis=0)
M = M / matrix_std

A = np.hstack([M, np.ones((M.shape[0],1)), -(M.T * COP_train[:,3]).T])
b = COP_train[:,3]

# Constructing matrices for the test set

M_test = (data_matrix_test - matrix_mean) / matrix_std
A_test = np.hstack([M_test, np.ones((M_test.shape[0],1)), -(M_test.T * COP_test[:,3]).T])
b_test = COP_test[:,3]


# Loading raw data
import pandas as pd
data = pd.read_csv('Raw_Dataset_May.csv')

def name_to_subcategory_and_details(col_name):
    if np.isreal(col_name):
        col_name = names[col_name]
    indices = np.nonzero((data['NAME'] == col_name).values)[0]
    if len(indices) > 0:
        subcategory = data['SUBCATEGORY'].iloc[[indices[0]]].values[0]
        details = data['DETAILS'].iloc[[indices[0]]].values[0]
        return subcategory, details
    else:
        print('unknown name')

In [3]:
w, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)

print("solution: ", w)


solution:  [-0.00927821  0.08309371 -0.03672704 ...  0.01980595 -0.03057174
 -0.01188614]


$\textbf{Question 3.3 :}$

In [4]:

MSE_test = np.mean((A_test @ w - b_test)**2)
print("MSE test: ", MSE_test)


MSE test:  780.8984793523563


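The test error is large (MSE of about 781), which suggests that the unregularized least-squares solution generalizes poorly.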

$\textbf{Question 3.4 :}$

In [5]:
lambda_reg = 100

I = np.identity(A.shape[1])

w_ridge = np.linalg.inv(A.T @ A + lambda_reg * I) @ A.T @ b


b_pred_reg = A_test @ w_ridge


mse_reg = np.mean((b_test - b_pred_reg)**2)

print("Unregularized MSE:", MSE_test)
print("Regularized MSE:", mse_reg)

Unregularized MSE: 780.8984793523563
Regularized MSE: 301.0548280896588
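
The ridge estimate does not require forming an explicit inverse. Below is a minimal sketch, on synthetic data and with hypothetical names (`ridge_solve`, `A_demo`, `b_demo`), that solves the normal equations $(A^T A + \lambda I) w = A^T b$ with `np.linalg.solve`. This is generally cheaper and numerically more stable than `np.linalg.inv`.

```python
import numpy as np

def ridge_solve(A, b, lambda_reg):
    """Solve (A^T A + lambda_reg * I) w = A^T b without forming an inverse."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lambda_reg * np.eye(n), A.T @ b)

rng = np.random.default_rng(0)
A_demo = rng.normal(size=(30, 5))      # synthetic stand-in for A
b_demo = rng.normal(size=30)           # synthetic stand-in for b
lam = 100.0

w_solve = ridge_solve(A_demo, b_demo, lam)
w_inv = np.linalg.inv(A_demo.T @ A_demo + lam * np.eye(5)) @ A_demo.T @ b_demo

# Both formulas agree, and the gradient of the ridge objective vanishes at the solution
grad = A_demo.T @ (A_demo @ w_solve - b_demo) + lam * w_solve
print(np.allclose(w_solve, w_inv), np.linalg.norm(grad))
```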


$\textbf{Question 3.5 :}$

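The ridge objective is

$$
f_1(w) = \frac{1}{2} \|Aw-b\|_2^2 + \frac{\lambda}{2} \|w\|_2^2 = \frac{1}{2} (Aw-b)^T (Aw-b) + \frac{\lambda}{2} w^T w
$$

and its gradient is

$$
\nabla f_1(w) = A^T(Aw-b) + \lambda w
$$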
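
The gradient formula can be checked numerically. The sketch below assumes that $f_1$ is the ridge objective $\frac{1}{2}\|Aw-b\|_2^2 + \frac{\lambda}{2}\|w\|_2^2$ and compares $A^T(Aw-b) + \lambda w$ with central finite differences on synthetic data. The names `f1`, `grad_f1`, `A_demo`, `b_demo`, and `w_demo` are hypothetical.

```python
import numpy as np

def f1(w, A, b, lam):
    """Ridge objective 0.5 * ||A w - b||^2 + 0.5 * lam * ||w||^2 (assumed form)."""
    r = A @ w - b
    return 0.5 * r @ r + 0.5 * lam * w @ w

def grad_f1(w, A, b, lam):
    """Closed-form gradient A^T (A w - b) + lam * w."""
    return A.T @ (A @ w - b) + lam * w

rng = np.random.default_rng(1)
A_demo = rng.normal(size=(20, 6))
b_demo = rng.normal(size=20)
w_demo = rng.normal(size=6)
lam = 100.0

# Central finite differences along each coordinate direction e
h = 1e-6
fd = np.array([
    (f1(w_demo + h * e, A_demo, b_demo, lam) - f1(w_demo - h * e, A_demo, b_demo, lam)) / (2 * h)
    for e in np.eye(6)
])
max_err = np.max(np.abs(fd - grad_f1(w_demo, A_demo, b_demo, lam)))
print("max gradient error:", max_err)
```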




$\textbf{Question 3.6 :}$



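In [100]:

def descente(A, b, lambda_reg=lambda_reg, alpha=None, eps=1, maxIter=10000):
    # Gradient descent on f_1(w) = 0.5 * ||A w - b||^2 + 0.5 * lambda_reg * ||w||^2
    if alpha is None:
        # Step 1/L, where L = ||A||_2^2 + lambda_reg is the Lipschitz constant of grad f_1;
        # a fixed step of 1e-2 is too large here and makes the iterates overflow
        alpha = 1 / (np.linalg.norm(A, 2)**2 + lambda_reg)
    w = np.zeros(A.shape[1])
    grad = A.T @ (A @ w - b) + lambda_reg * w

    i = 0
    # Stop when the gradient norm is below eps or the iteration budget is spent
    while np.linalg.norm(grad) > eps and i < maxIter:
        w = w - alpha * grad
        grad = A.T @ (A @ w - b) + lambda_reg * w
        i += 1
    return w, i


In [101]:
# Fit on the training set (not the test set) and compare with the closed-form ridge solution
w_optimal, n_iter = descente(A, b)
print("Optimal w:", w_optimal, "after", n_iter, "iterations")
print("Distance to w_ridge:", np.linalg.norm(w_optimal - w_ridge))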
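
To illustrate the role of the step size, here is a minimal sketch on synthetic data, with hypothetical names (`ridge_gd`, `A_demo`, `b_demo`). It runs gradient descent with the constant step $1/L$, where $L = \|A\|_2^2 + \lambda$ is the largest eigenvalue of $A^T A + \lambda I$, and compares the result with the closed-form ridge solution. Gradient descent on this quadratic diverges once the step exceeds $2/L$.

```python
import numpy as np

def ridge_gd(A, b, lam, eps=1e-8, max_iter=10000):
    """Gradient descent on 0.5*||Aw-b||^2 + 0.5*lam*||w||^2 with constant step 1/L."""
    L = np.linalg.norm(A, 2) ** 2 + lam    # largest eigenvalue of A^T A + lam * I
    w = np.zeros(A.shape[1])
    for k in range(max_iter):
        grad = A.T @ (A @ w - b) + lam * w
        if np.linalg.norm(grad) <= eps:
            break
        w = w - grad / L
    return w, k

rng = np.random.default_rng(2)
A_demo = rng.normal(size=(40, 5))
b_demo = rng.normal(size=40)
lam = 100.0

w_gd, n_iter = ridge_gd(A_demo, b_demo, lam)
w_closed = np.linalg.solve(A_demo.T @ A_demo + lam * np.eye(5), A_demo.T @ b_demo)
print(n_iter, np.linalg.norm(w_gd - w_closed))
```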


$\textbf{Question 4.1 :}$

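$ f_2(w) = \frac{1}{2} \|Aw-b\|_2^2 $

$ g_2(w) = \lambda \|w\|_1 $

The gradient of the smooth part is $ \nabla f_2(w) = A^T(Aw-b) $

For a step size $\alpha > 0$, the proximal operator of $g_2$ is the componentwise soft-thresholding: $\text{prox}_{\alpha g_2}(w)_i = \text{sign}(w_i) \cdot \max(|w_i| - \alpha \lambda, 0) \quad \text{for all } i $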
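
The soft-thresholding formula can be checked against the definition of the proximal operator. The sketch below uses hypothetical names (`soft_threshold`, `tau`, `v`); `tau` plays the role of the step size times $\lambda$. It minimizes $u \mapsto \frac{1}{2}(u - v_i)^2 + \tau |u|$ by brute force on a fine grid and compares the minimizer with the closed form.

```python
import numpy as np

def soft_threshold(v, tau):
    """Componentwise soft-thresholding: sign(v) * max(|v| - tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

tau = 0.5                                   # threshold, i.e. (step size) * lambda
v = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])

# Brute-force minimizer of 0.5 * (u - v_i)^2 + tau * |u| over a grid of spacing 1e-4
grid = np.linspace(-3, 3, 60001)
brute = np.array([
    grid[np.argmin(0.5 * (grid - vi) ** 2 + tau * np.abs(grid))] for vi in v
])

print(soft_threshold(v, tau))
print(np.allclose(soft_threshold(v, tau), brute, atol=1e-3))
```

Entries with $|v_i| \le \tau$ are set exactly to zero, which is the mechanism that produces sparse solutions.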



In [105]:
def prox_g2(w, lamb):
    # Soft-thresholding: proximal operator of lamb * ||.||_1 (vectorized, no in-place update)
    return np.sign(w) * np.maximum(np.abs(w) - lamb, 0)


def proximal_gradient_descent(A, b, lambda_val, alpha, max_iterations=1000):
    m, n = A.shape
    w = np.zeros(n)
    for iteration in range(max_iterations):
        gradient = A.T @ (A @ w - b)
        w_new = prox_g2(w - alpha * gradient, alpha * lambda_val)
        # Stop when two successive iterates are close, keeping the latest one
        converged = np.linalg.norm(w_new - w) < 1e-3
        w = w_new
        if converged:
            break

    return w

lambda_val = 200
# Step 1/L, where L = ||A||_2^2 is the Lipschitz constant of grad f_2;
# a fixed step of 0.001 is too large for this A and makes the iterates overflow to nan
alpha = 1 / np.linalg.norm(A, 2)**2

w_optimal = proximal_gradient_descent(A, b, lambda_val, alpha)
print(w_optimal)
