# Scikit-learn: StandardScaler

$$z = \frac{(x - \bar{x})}{\sigma}$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n}}$$
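The two formulas can be checked by hand on a tiny array. The sketch below is illustrative only: the helper name `standardize` and the three sample values are assumptions, not part of the notebook. Note that $\sigma$ here divides by $n$ (the population standard deviation), which is also what `np.std` computes by default (`ddof=0`).

```python
import numpy as np

def standardize(values):
    """Return z-scores using the population standard deviation (divide by n)."""
    values = np.asarray(values, dtype=float)
    mean = values.sum() / values.size
    # Population standard deviation: squared deviations averaged over n
    sigma = np.sqrt(((values - mean) ** 2).sum() / values.size)
    return (values - mean) / sigma

sample = [2.0, 4.0, 6.0]  # hypothetical values chosen for easy arithmetic
z = standardize(sample)

# The hand-written sigma agrees with NumPy's default (ddof=0)
manual_sigma = np.sqrt(8.0 / 3.0)
print(z)
print(np.isclose(np.std(sample), manual_sigma))
```

With mean 4 and squared deviations 4, 0, 4, the variance is 8/3, so the z-scores are symmetric around zero.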

In [11]:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Sample data: x = area in square meters, y = price
x = np.array([45, 50, 60, 60, 70, 80, 90, 90, 100, 110])  # Independent variable (features)
y = np.array([150, 160, 200, 180, 230, 250, 300, 280, 300, 400])  # Dependent variable (target)
# Reshape x into a 2D column of shape (n_samples, 1), as scikit-learn expects
x_reshaped = x.reshape(-1, 1)

In [7]:
# Number of observations
n = len(x) 
print(f"Number of observations: {n}")

Number of observations: 10


In [20]:
# Standardize the data with StandardScaler
scaler = StandardScaler()
x_scaler = scaler.fit_transform(x_reshaped)
print(f"Standardized Data:\n {x_scaler}")

# Standardize the data manually (np.std divides by n, matching StandardScaler)
mean = np.mean(x_reshaped)
std_dev = np.std(x_reshaped)
x_scaler = (x_reshaped - mean) / std_dev
print(f"Mean: {mean}")
print(f"Standard Deviation: {std_dev}")
print(f"Standardized Data (Manual):\n {x_scaler}")

# Create the linear regression model
model = LinearRegression()

# Train the model on the standardized feature
model.fit(x_scaler, y)

# Predict the price for a new 150 m² observation, scaled with the already-fitted scaler
x_test = np.array([150]).reshape(-1, 1)
x_test_scaler = scaler.transform(x_test)
y_pred_scaler = model.predict(x_test_scaler)
print(f"y_pred_scaler:\n{y_pred_scaler}")

Standardized Data:
 [[-1.46700751]
 [-1.22651448]
 [-0.74552841]
 [-0.74552841]
 [-0.26454234]
 [ 0.21644373]
 [ 0.6974298 ]
 [ 0.6974298 ]
 [ 1.17841587]
 [ 1.65940194]]
Mean: 75.5
Standard Deviation: 20.79062288629179
Standardized Data (Manual):
 [[-1.46700751]
 [-1.22651448]
 [-0.74552841]
 [-0.74552841]
 [-0.26454234]
 [ 0.21644373]
 [ 0.6974298 ]
 [ 0.6974298 ]
 [ 1.17841587]
 [ 1.65940194]]
y_pred_scaler:
[501.37651822]
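As a cross-check on the prediction above, the scaling and regression steps can also be chained with scikit-learn's `make_pipeline`, which fits the scaler and the model together and applies the same scaling at prediction time. This is a minimal sketch, not the notebook's own approach; the names `pipeline` and `price_150` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same sample data as in the notebook: area in square meters and price
x = np.array([45, 50, 60, 60, 70, 80, 90, 90, 100, 110], dtype=float).reshape(-1, 1)
y = np.array([150, 160, 200, 180, 230, 250, 300, 280, 300, 400], dtype=float)

# The pipeline standardizes x, then fits the regression on the scaled values
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(x, y)

# New data is passed in original units; the pipeline scales it internally
price_150 = pipeline.predict(np.array([[150.0]]))
print(price_150)
```

Because the pipeline calls `transform` internally, the new observation is given in square meters, and the result should agree with the `y_pred_scaler` value printed above.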


The **fit_transform** method computes the mean and standard deviation of the data and then transforms the data so that it has zero mean and unit variance (a standard deviation of 1).
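A short sketch of that behavior, reusing the sample areas from this notebook: calling `fit_transform` gives the same result as calling `fit` followed by `transform`, and the learned statistics are stored on the scaler as `mean_` and `scale_`. The variable names `x_two_step` and `x_new_scaled` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([45, 50, 60, 60, 70, 80, 90, 90, 100, 110], dtype=float).reshape(-1, 1)

# One call: learn mean and standard deviation, then scale
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

# Two calls on a fresh scaler give the same scaled values
x_two_step = StandardScaler().fit(x).transform(x)
print(np.allclose(x_scaled, x_two_step))

# Statistics learned during fit
print(scaler.mean_, scaler.scale_)

# The scaled column has (numerically) zero mean and unit standard deviation
print(x_scaled.mean(), x_scaled.std())

# New observations should use transform only, so the training statistics are reused
x_new_scaled = scaler.transform(np.array([[150.0]]))
print(x_new_scaled)
```

Calling `fit_transform` on new data would recompute the mean and standard deviation from that data, which is why the prediction step in this notebook uses `transform` with the scaler already fitted.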