# Lab 6

What can make people happy? Consider the problem of predicting the happiness level of the world's population from information about various aspects of that population's well-being, such as the gross domestic product (GDP) of the country they live in, the degree of freedom, etc.

Using the data for 2017 (file v1_world-happiness-report-2017.csv), predict the happiness score as a function of:

- GDP only (detailed example, live demo)
- the "Family" feature only (homework)
- GDP and the degree of freedom (homework).
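
Assuming an ordinary linear model, which is what the code in this notebook fits, the three tasks correspond to the following prediction functions. The symbols $w_0, w_1, w_2$ (intercept and coefficients) and the subscripted $x$ values are notation introduced here for illustration, not taken from the assignment text.

```latex
\hat{y} = w_0 + w_1\, x_{\text{GDP}}, \qquad
\hat{y} = w_0 + w_1\, x_{\text{Family}}, \qquad
\hat{y} = w_0 + w_1\, x_{\text{GDP}} + w_2\, x_{\text{Freedom}}
```

In each case $\hat{y}$ is the predicted `Happiness.Score`, and the weights are chosen to minimise the mean squared error over the rows of the dataset.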

### With external libraries

In [12]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def run_tool_regression(file_path, feature_cols):
    # Load and clean the data
    df = pd.read_csv(file_path)

    # Drop rows with missing values (NaN) in the relevant columns
    df = df[feature_cols + ["Happiness.Score"]].dropna()

    # Separate inputs and output
    X = df[feature_cols]
    y = df["Happiness.Score"]

    # Training
    model = LinearRegression()
    model.fit(X, y)

    # Prediction + evaluation
    y_pred = model.predict(X)
    mse = mean_squared_error(y, y_pred)

    return model.intercept_, model.coef_.tolist(), mse

In [13]:
file_paths = {
    "V1": "data/v1_world-happiness-report-2017.csv",
    "V2": "data/v2_world-happiness-report-2017.csv",
    "V3": "data/v3_world-happiness-report-2017.csv"
}

scenarios = [
    ("GDP", ["Economy..GDP.per.Capita."]),
    ("Family", ["Family"]),
    ("GDP + Freedom", ["Economy..GDP.per.Capita.", "Freedom"])
]

for label, path in file_paths.items():
    print(f"\n{label}")
    for name, features in scenarios:
        intercept, coef, mse = run_tool_regression(path, features)
        print(f"→ {name} | Intercept: {intercept:.4f} | Coef: {coef} | MSE: {mse:.4f}")


V1
→ GDP | Intercept: 3.2032 | Coef: [2.1841849464150878] | MSE: 0.4322
→ Family | Intercept: 7.2485 | Coef: [-1.2545734699510898] | MSE: 1.2714
→ GDP + Freedom | Intercept: 2.5461 | Coef: [1.8735928859219293, 2.3557110577423854] | MSE: 0.3251

V2
→ GDP | Intercept: 3.2032 | Coef: [2.1841849464150878] | MSE: 0.4322
→ Family | Intercept: 1.8298 | Coef: [2.96424751567571] | MSE: 0.5510
→ GDP + Freedom | Intercept: 3.2031 | Coef: [-3068866.3347596293, 6137737.037906636] | MSE: 0.4321

V3
→ GDP | Intercept: 3.2184 | Coef: [2.1702564070923027] | MSE: 0.4375
→ Family | Intercept: 1.8298 | Coef: [2.96424751567571] | MSE: 0.5510
→ GDP + Freedom | Intercept: 2.5494 | Coef: [1.8673876865038552, 2.4115244508801914] | MSE: 0.3148
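
In the V2 run with GDP and freedom as features, the two coefficients are enormous (about -3.07e6 and 6.14e6), yet the MSE is almost the same as for GDP alone. Adding the first coefficient to half of the second gives about 2.184, the GDP-only coefficient. This pattern would be consistent with the two columns being almost perfectly correlated in that file, although the output alone does not prove it. A minimal standard-library sketch of how one might check for this; the `pearson` helper and the toy values are hypothetical and not taken from the dataset:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values (made up): the second feature is exactly half of the first
gdp = [0.5, 1.0, 1.2, 1.5]
freedom_like = [g / 2 for g in gdp]

r = pearson(gdp, freedom_like)
print(f"r = {r:.6f}")
```

A correlation very close to 1 or -1 between two input columns means the individual coefficients are not meaningful on their own, even when the predictions are still reasonable.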


### Without external libraries

In [14]:
import csv

def load_data(file_path, x_columns):
    X = []
    Y = []
    with open(file_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            try:
                x_values = [float(row[col]) for col in x_columns]
                y_value = float(row["Happiness.Score"])
                X.append(x_values)
                Y.append(y_value)
            except ValueError:
                continue
    return X, Y
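
`load_data` silently skips any row whose values cannot be converted to `float`, which can hide data problems. As a sketch only, the variant below also reports how many rows were dropped. The name `load_data_counting`, the `y_column` parameter, and the sample rows are assumptions made for illustration, and it reads from an open file object rather than a path so that it can be tried without the CSV files.

```python
import csv
import io

def load_data_counting(csv_file, x_columns, y_column="Happiness.Score"):
    """Like load_data, but reads an open file object and counts skipped rows."""
    X, Y, skipped = [], [], 0
    for row in csv.DictReader(csv_file):
        try:
            x_values = [float(row[col]) for col in x_columns]
            y_value = float(row[y_column])
        except ValueError:
            # An empty or non-numeric cell: drop the row, but remember that we did
            skipped += 1
            continue
        X.append(x_values)
        Y.append(y_value)
    return X, Y, skipped

# Made-up sample with one missing "Family" value
sample = io.StringIO(
    "Country,Family,Happiness.Score\n"
    "A,1.5,7.0\n"
    "B,,6.5\n"
    "C,1.1,5.0\n"
)
X_demo, Y_demo, skipped = load_data_counting(sample, ["Family"])
print(X_demo, Y_demo, skipped)
```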

In [22]:
file_path = "data/v3_world-happiness-report-2017.csv"

X_gdp, Y_gdp = load_data(file_path, ["Economy..GDP.per.Capita."])

X_family, Y_family = load_data(file_path, ["Family"])

X_combo, Y_combo = load_data(file_path, ["Economy..GDP.per.Capita.", "Freedom"])

In [23]:
def transpose(matrix):
    return [list(row) for row in zip(*matrix)]

def dot_product(vec1, vec2):
    return sum(a * b for a, b in zip(vec1, vec2))

def matrix_multiply(A, B):
    B_T = transpose(B)
    return [[dot_product(row_a, col_b) for col_b in B_T] for row_a in A]

In [24]:
def identity_matrix(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def gauss_jordan_inverse(matrix):
    n = len(matrix)
    A = [row[:] for row in matrix]
    I = identity_matrix(n)

    for i in range(n):
        if A[i][i] == 0:
            for j in range(i + 1, n):
                if A[j][i] != 0:
                    A[i], A[j] = A[j], A[i]
                    I[i], I[j] = I[j], I[i]
                    break
            else:
                raise ValueError("Matrix is not invertible (pivot = 0).")

        pivot = A[i][i]
        A[i] = [x / pivot for x in A[i]]
        I[i] = [x / pivot for x in I[i]]

        for j in range(n):
            if i != j:
                factor = A[j][i]
                A[j] = [a - factor * b for a, b in zip(A[j], A[i])]
                I[j] = [a - factor * b for a, b in zip(I[j], I[i])]

    return I
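
The function above only swaps rows when a pivot is exactly zero, so a very small but non-zero pivot is still used as a divisor. A common alternative is partial pivoting, which always picks the largest-magnitude entry in the column. The sketch below is one possible variant, not the notebook's method; the name `invert_with_partial_pivoting` and the tolerance `tol=1e-12` are assumptions.

```python
def invert_with_partial_pivoting(matrix, tol=1e-12):
    """Gauss-Jordan inverse that picks the largest-magnitude pivot in each column."""
    n = len(matrix)
    # Augment each row with the matching identity row: [A | I]
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]

    for col in range(n):
        # Choose the row at or below the diagonal with the largest entry in this column
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < tol:
            raise ValueError("Matrix is singular or nearly singular.")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]

        # Scale the pivot row so the pivot becomes 1
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]

        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    # The right half of the augmented matrix now holds the inverse
    return [row[n:] for row in aug]

inverse = invert_with_partial_pivoting([[4, 7], [2, 6]])
print(inverse)
```

For the 2x2 input shown, the determinant is 10, so the exact inverse is [[0.6, -0.7], [-0.2, 0.4]]; the printed values should match up to floating-point rounding.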


In [25]:
def ridge_inverse(matrix, lambda_reg=1e-8):
    """Return (A + λI)^(-1), where A is the matrix passed in (here X^T X)."""
    n = len(matrix)
    reg_matrix = [[matrix[i][j] + (lambda_reg if i == j else 0) for j in range(n)] for i in range(n)]
    return gauss_jordan_inverse(reg_matrix)

def least_squares_fit_ridge(X, Y, lambda_reg=1e-8):
    X_bias = [[1] + row for row in X]
    XT = transpose(X_bias)
    XTX = matrix_multiply(XT, X_bias)
    XTY = [[dot_product(row, Y)] for row in XT]
    XTX_inv = ridge_inverse(XTX, lambda_reg)
    W = matrix_multiply(XTX_inv, XTY)
    return [w[0] for w in W]
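
Written out, `least_squares_fit_ridge` evaluates the regularised normal equation below. The notation is introduced here for illustration: $\tilde{X}$ stands for `X_bias` (the feature matrix with a leading column of ones), $\mathbf{y}$ for `Y`, and $\lambda$ for `lambda_reg`.

```latex
\mathbf{w} = \left( \tilde{X}^{\top} \tilde{X} + \lambda I \right)^{-1} \tilde{X}^{\top} \mathbf{y},
\qquad
\tilde{X} = \begin{bmatrix} \mathbf{1} & X \end{bmatrix}
```

Because $\lambda$ is added to every diagonal entry, the intercept $w_0$ is penalised as well. With the default $\lambda = 10^{-8}$ the effect is negligible, and the term mainly keeps the matrix invertible when columns are nearly collinear.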

In [19]:
# Train the model on GDP
weights_gdp = least_squares_fit_ridge(X_gdp, Y_gdp)
print("Regression coefficients (GDP):", weights_gdp)

# Train the model on Family
weights_family = least_squares_fit_ridge(X_family, Y_family)
print("Regression coefficients (Family):", weights_family)

# Train the model on GDP + Freedom
weights_combo = least_squares_fit_ridge(X_combo, Y_combo)
print("Regression coefficients (GDP + Freedom):", weights_combo)

Regression coefficients (GDP): [3.218426796833498, 2.1702564074613093]
Regression coefficients (Family): [1.8298321531871977, 2.964247515055021]
Regression coefficients (GDP + Freedom): [2.5493967334566783, 1.867387687392247, 2.411524445971992]


### Prediction and evaluation

In [20]:
def predict(X, weights):
    return [weights[0] + sum(w * x for w, x in zip(weights[1:], row)) for row in X]

def mean_squared_error(y_true, y_pred):
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n
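
A small hand-checkable example may help confirm what the two helpers compute. The two functions are repeated so the snippet runs on its own; the `toy_*` names and values are made up for illustration.

```python
def predict(X, weights):
    # weights[0] is the intercept; the remaining weights pair up with the feature values
    return [weights[0] + sum(w * x for w, x in zip(weights[1:], row)) for row in X]

def mean_squared_error(y_true, y_pred):
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n

toy_weights = [1.0, 2.0]            # made-up model: y = 1 + 2x
toy_X = [[0.0], [1.0], [2.0]]
toy_y = [1.0, 2.0, 7.0]

toy_pred = predict(toy_X, toy_weights)          # 1 + 2*0, 1 + 2*1, 1 + 2*2
toy_mse = mean_squared_error(toy_y, toy_pred)   # squared errors 0, 1, 4 averaged over 3 rows
print(toy_pred, toy_mse)
```

The predictions are 1, 3 and 5, so the squared errors are 0, 1 and 4, and the MSE is 5/3.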

In [26]:
# Predictions
Y_pred_gdp = predict(X_gdp, weights_gdp)
Y_pred_family = predict(X_family, weights_family)
Y_pred_combo = predict(X_combo, weights_combo)

# Evaluation
mse_gdp = mean_squared_error(Y_gdp, Y_pred_gdp)
mse_family = mean_squared_error(Y_family, Y_pred_family)
mse_combo = mean_squared_error(Y_combo, Y_pred_combo)

print(f"MSE GDP: {mse_gdp:.4f}")
print(f"MSE Family: {mse_family:.4f}")
print(f"MSE GDP + Freedom: {mse_combo:.4f}")

MSE GDP: 0.4375
MSE Family: 0.5510
MSE GDP + Freedom: 0.3148
