# Question 5

## Without using libraries, implement the L1 and L2 penalties, applying normalization.

### Answer

# L1 and L2 Regularization Methods
A regression model that uses L1 regularization is called **lasso regression**, and one that uses L2 regularization is called **ridge regression**. The key difference between the two is the penalty term added to the loss function.

# L1 Regularization: Lasso regression
Lasso stands for least absolute shrinkage and selection operator. Lasso regression adds the absolute value of each coefficient, summed and scaled by $\lambda$, as a penalty term to the loss function.

$$ \sum_{i=1}^{n}{\left(Y_{i} - \sum_{j=1}^{p}{X_{ij}\beta_{j}}\right)^{2}}+\lambda\sum_{j=1}^{p}{|\beta_{j}|}$$

If $\lambda$ is zero, the penalty vanishes and we recover OLS (ordinary least squares). A very large $\lambda$ drives all the coefficients to zero, so the model underfits.
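
The lasso objective above can be evaluated directly with NumPy. The following is a minimal sketch, not the notebook's own code: the function name `lasso_objective` and the toy data are assumptions made for illustration, and the intercept is left out to match the formula.

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    """Residual sum of squares plus lam * sum(|beta_j|)."""
    residuals = y - X @ beta                 # Y_i - sum_j X_ij * beta_j
    rss = np.sum(residuals ** 2)             # first sum in the formula
    l1_penalty = lam * np.sum(np.abs(beta))  # lambda * sum_j |beta_j|
    return rss + l1_penalty

# Toy data (made up for illustration): 3 samples, 2 features.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, -0.25])

print(lasso_objective(X, y, beta, lam=0.0))  # 7.25, the plain OLS loss
print(lasso_objective(X, y, beta, lam=2.0))  # 8.75 = 7.25 + 2 * (0.5 + 0.25)
```

With `lam=0.0` the value is the ordinary least-squares loss. Any positive `lam` adds a cost proportional to the total absolute size of the coefficients.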

# L2 Regularization: Ridge Regression

Ridge regression adds the squared magnitude of each coefficient as the penalty term to the loss function. The second sum in the expression below is the L2 regularization term.

$$ \sum_{i=1}^{n}{\left(Y_{i} - \sum_{j=1}^{p}{X_{ij}\beta_{j}}\right)^{2}}+\lambda\sum_{j=1}^{p}{\beta_{j}^{2}}$$

Here too, if $\lambda$ is zero we recover OLS. If $\lambda$ is very large, the penalty dominates the loss, the coefficients shrink toward zero, and the model underfits. The choice of $\lambda$ therefore matters: with a well-chosen value, this technique works well to reduce overfitting.
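
The effect of $\lambda$ on the coefficients can be seen numerically. The sketch below is an illustration under stated assumptions, not the notebook's method: the names `ridge_objective` and `ridge_fit` and the synthetic data are hypothetical, there is no intercept, and the fit uses the standard closed-form minimizer $(X^{T}X + \lambda I)^{-1}X^{T}Y$ of the ridge objective.

```python
import numpy as np

def ridge_objective(X, y, beta, lam):
    """Residual sum of squares plus lam * sum(beta_j ** 2)."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + lam * np.sum(beta ** 2)

def ridge_fit(X, y, lam):
    """Closed-form minimizer: solve (X^T X + lam * I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data (made up for illustration): 50 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=50)

# The coefficient norm shrinks as lambda grows.
for lam in (0.0, 1.0, 100.0, 1e6):
    beta_hat = ridge_fit(X, y, lam)
    print(lam, np.round(beta_hat, 4), np.linalg.norm(beta_hat))
```

With `lam = 0.0` the result is the OLS solution. As `lam` grows, the norm of the fitted coefficients decreases toward zero, which is the underfitting regime described above.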

In [1]:
import numpy as np
import pandas as pd

# Row-wise normalization: scale each row to unit L1 or L2 norm
def normalizar(df, norm='l2'):
    # Convert the DataFrame to a NumPy array for numerical operations
    X = df.values

    if norm == 'l1':
        # L1 norm: sum of the absolute values of each row
        norms = np.sum(np.abs(X), axis=1)
    elif norm == 'l2':
        # L2 norm: square root of the sum of squares of each row
        norms = np.sqrt(np.sum(X**2, axis=1))
    else:
        raise ValueError("Norm type not supported. Use 'l1' or 'l2'.")

    # Replace zero norms with 1 to avoid division by zero (all-zero rows stay zero)
    norms[norms == 0] = 1

    # Normalize by dividing each row by its own norm
    X_normalized = X / norms[:, None]

    # Convert back to a DataFrame, keeping the same index and columns
    return pd.DataFrame(X_normalized, index=df.index, columns=df.columns)
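
A quick sanity check for this kind of row-wise normalization is that every nonzero row ends up with norm 1. The sketch below repeats the same steps on a plain array so that it runs on its own. The helper name `row_normalize` and the tiny matrix are assumptions for illustration only.

```python
import numpy as np

def row_normalize(X, norm="l2"):
    # Same steps as normalizar, on a plain NumPy array (hypothetical helper).
    if norm == "l1":
        norms = np.sum(np.abs(X), axis=1)
    elif norm == "l2":
        norms = np.sqrt(np.sum(X ** 2, axis=1))
    else:
        raise ValueError("Norm type not supported. Use 'l1' or 'l2'.")
    norms = np.where(norms == 0, 1.0, norms)  # guard against division by zero
    return X / norms[:, None]

# Made-up matrix; the last row is all zeros to exercise the zero-norm guard.
X = np.array([[3.0, 4.0], [1.0, -1.0], [0.0, 0.0]])

X_l1 = row_normalize(X, "l1")
X_l2 = row_normalize(X, "l2")

# Nonzero rows have unit norm; the all-zero row stays at zero.
print(np.sum(np.abs(X_l1), axis=1))
print(np.sqrt(np.sum(X_l2 ** 2, axis=1)))
```

Note that this normalizes each row (each sample) independently, so the magnitude of the normalized values depends on the number of columns.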

Loading the DataFrames

In [2]:
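import os

# The datasets live in the sibling folder "Pregunta 2" (one level up from the working directory)
current_dir = os.getcwd()
input_folder = os.path.join(os.path.dirname(current_dir), "Pregunta 2")

In [3]:
import pandas as pd

# One CSV per color dataset (r, g, b, k); the files have no header row
df_r = pd.read_csv(os.path.join(input_folder, 'dataset_r_02.csv'), header=None)
df_g = pd.read_csv(os.path.join(input_folder, 'dataset_g_02.csv'), header=None)
df_b = pd.read_csv(os.path.join(input_folder, 'dataset_b_02.csv'), header=None)
df_k = pd.read_csv(os.path.join(input_folder, 'dataset_k_02.csv'), header=None)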


Applying each of the two normalizations to two datasets of different colors and printing the results

L1 normalization

In [8]:
df_r_Notm_l1 = normalizar(df_r, norm='l1')
df_g_Notm_l1 = normalizar(df_g, norm='l1')

In [5]:
print(df_r_Notm_l1)

          0         1         2         3         4         5         6     \
0     0.000264  0.000269  0.000272  0.000276  0.000259  0.000264  0.000107   
1     0.000233  0.000258  0.000241  0.000270  0.000246  0.000243  0.000240   
2     0.000563  0.000600  0.000603  0.000563  0.000615  0.000630  0.000360   
3     0.000242  0.000237  0.000227  0.000242  0.000287  0.000342  0.000350   
4     0.000319  0.000316  0.000334  0.000316  0.000317  0.000314  0.000322   
...        ...       ...       ...       ...       ...       ...       ...   
4995  0.000039  0.000003  0.000008  0.000029  0.000000  0.000002  0.000007   
4996  0.000254  0.000277  0.000256  0.000288  0.000284  0.000303  0.000299   
4997  0.000351  0.000355  0.000351  0.000355  0.000362  0.000371  0.000362   
4998  0.000238  0.000258  0.000266  0.000277  0.000238  0.000257  0.000254   
4999  0.000201  0.000221  0.000218  0.000239  0.000248  0.000272  0.000303   

          7         8         9     ...      4086      4087    

In [9]:
df_r_Notm_l1.describe()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 4086 | 4087 | 4088 | 4089 | 4090 | 4091 | 4092 | 4093 | 4094 | 4095 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | ... | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 |
| mean | 0.000281 | 0.000281 | 0.000284 | 0.000285 | 0.000285 | 0.000285 | 0.000288 | 0.000288 | 0.00029 | 0.000289 | ... | 0.000227 | 0.000226 | 0.000225 | 0.000223 | 0.000223 | 0.000221 | 0.000221 | 0.00022 | 0.000219 | 0.000221 |
| std | 0.000103 | 0.000102 | 0.000103 | 0.000102 | 0.000102 | 0.000102 | 0.000102 | 0.000102 | 0.000102 | 0.000103 | ... | 9.9e-05 | 9.9e-05 | 9.8e-05 | 9.9e-05 | 9.9e-05 | 9.9e-05 | 9.8e-05 | 9.9e-05 | 9.9e-05 | 0.000101 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.000221 | 0.000222 | 0.000225 | 0.000226 | 0.000227 | 0.000228 | 0.000228 | 0.000231 | 0.000232 | 0.000232 | ... | 0.000159 | 0.000157 | 0.000154 | 0.000154 | 0.000152 | 0.000151 | 0.000153 | 0.000149 | 0.00015 | 0.00015 |
| 50% | 0.000277 | 0.000278 | 0.000282 | 0.000283 | 0.000284 | 0.000283 | 0.000287 | 0.000287 | 0.000287 | 0.000288 | ... | 0.000225 | 0.000225 | 0.000224 | 0.000222 | 0.000221 | 0.000217 | 0.000218 | 0.000217 | 0.000217 | 0.000218 |
| 75% | 0.000337 | 0.000339 | 0.000339 | 0.00034 | 0.000341 | 0.000341 | 0.000343 | 0.000343 | 0.000344 | 0.000345 | ... | 0.000288 | 0.000287 | 0.000287 | 0.000287 | 0.000285 | 0.000284 | 0.000284 | 0.000279 | 0.000281 | 0.000284 |
| max | 0.001086 | 0.000948 | 0.001048 | 0.001004 | 0.000964 | 0.001055 | 0.000924 | 0.001074 | 0.001112 | 0.001079 | ... | 0.000782 | 0.000789 | 0.000789 | 0.000785 | 0.000798 | 0.000795 | 0.000776 | 0.000753 | 0.000785 | 0.000776 |


L2 normalization

In [11]:
df_b_Notm_l2 = normalizar(df_b, norm='l2')
df_k_Notm_l2 = normalizar(df_k, norm='l2')

In [7]:
print(df_b_Notm_l2)

          0         1         2         3         4         5         6     \
0     0.014982  0.015195  0.014557  0.015195  0.014664  0.016576  0.005525   
1     0.015273  0.016633  0.015169  0.015901  0.012763  0.012135  0.012763   
2     0.034104  0.036284  0.036129  0.033481  0.036596  0.037219  0.019777   
3     0.037413  0.036533  0.034112  0.032351  0.031251  0.031911  0.029270   
4     0.023296  0.023296  0.024842  0.024069  0.024621  0.024621  0.025173   
...        ...       ...       ...       ...       ...       ...       ...   
4995  0.005588  0.003922  0.005196  0.007157  0.006177  0.006667  0.006765   
4996  0.022836  0.024333  0.023085  0.025581  0.026205  0.028077  0.028077   
4997  0.026497  0.026748  0.026999  0.026999  0.027250  0.027376  0.026120   
4998  0.015768  0.017114  0.017883  0.018652  0.016633  0.017883  0.017979   
4999  0.015400  0.017242  0.015568  0.016572  0.015400  0.016740  0.019251   

          7         8         9     ...      4086      4087    

In [12]:
df_b_Notm_l2.describe()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 4086 | 4087 | 4088 | 4089 | 4090 | 4091 | 4092 | 4093 | 4094 | 4095 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | ... | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 | 5000.0 |
| mean | 0.018977 | 0.019031 | 0.019366 | 0.019453 | 0.019537 | 0.01955 | 0.019713 | 0.019736 | 0.019866 | 0.019765 | ... | 0.010846 | 0.010818 | 0.010749 | 0.010666 | 0.010681 | 0.010521 | 0.010567 | 0.010423 | 0.010416 | 0.010578 |
| std | 0.007504 | 0.007393 | 0.007397 | 0.007286 | 0.007333 | 0.007367 | 0.007307 | 0.007218 | 0.007162 | 0.007198 | ... | 0.006287 | 0.006369 | 0.006313 | 0.006369 | 0.006335 | 0.006303 | 0.006288 | 0.006358 | 0.006345 | 0.006498 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.014867 | 0.015216 | 0.015636 | 0.015734 | 0.015916 | 0.015898 | 0.016035 | 0.016259 | 0.016263 | 0.016253 | ... | 0.006191 | 0.006088 | 0.006053 | 0.00598 | 0.005999 | 0.005795 | 0.005887 | 0.005762 | 0.005676 | 0.005849 |
| 50% | 0.018991 | 0.019056 | 0.019438 | 0.019572 | 0.019666 | 0.019762 | 0.019928 | 0.019992 | 0.02004 | 0.019975 | ... | 0.01026 | 0.010184 | 0.010045 | 0.009961 | 0.009996 | 0.009824 | 0.009772 | 0.009603 | 0.009661 | 0.009652 |
| 75% | 0.023103 | 0.0231 | 0.023416 | 0.023544 | 0.023583 | 0.023663 | 0.02382 | 0.02377 | 0.023897 | 0.023882 | ... | 0.014842 | 0.014802 | 0.014813 | 0.014661 | 0.014683 | 0.01454 | 0.014502 | 0.014362 | 0.014421 | 0.01447 |
| max | 0.067116 | 0.056598 | 0.062403 | 0.06429 | 0.061192 | 0.070478 | 0.060444 | 0.065089 | 0.066489 | 0.062809 | ... | 0.044816 | 0.044275 | 0.043733 | 0.042856 | 0.044558 | 0.04319 | 0.043552 | 0.042506 | 0.049105 | 0.053222 |
