The main goal of this notebook is to run panel regressions on our data.

The panel specification is used to absorb the fixed effects tied to the commune. The term `alpha_i` in the econometric equation for a property transaction captures every fixed effect on the transaction price that stems from the commune itself rather than from the characteristics of the house.
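The notebook does not write the equation out, so the following is only a sketch of what it could look like. Apart from `alpha_i`, every symbol is an assumption: $i$ indexes the commune, $j$ a transaction within that commune, $p_{ij}$ is the price per square metre, and $x_{ij}$ collects the characteristics of the dwelling.

```latex
\log p_{ij} = \alpha_i + x_{ij}^{\top}\beta + \varepsilon_{ij}

% Subtracting commune means removes \alpha_i (within transformation):
\log p_{ij} - \overline{\log p}_{i}
  = \left(x_{ij} - \bar{x}_{i}\right)^{\top}\beta
  + \left(\varepsilon_{ij} - \bar{\varepsilon}_{i}\right)
```

The second line is why any regressor that is constant within a commune cannot be identified once the fixed effects are absorbed.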

The idea that commune-level fixed effects exist comes from the descriptive statistics of transaction prices by commune, which show marked disparities between départements.
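As an illustration of the kind of descriptive check mentioned above, here is a minimal sketch on invented data. The column names follow the notebook, but the values and the `between_share` measure are assumptions made for the example, not results from the project.

```python
import pandas as pd

# Toy transactions; the values are invented for illustration only.
toy = pd.DataFrame({
    "nom_commune": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "prix_par_metre_carre": [2000.0, 2200.0, 2100.0, 3500.0, 3700.0,
                             1500.0, 1400.0, 1600.0],
})

# Per-commune summary statistics, most expensive commune first
stats_by_commune = (
    toy.groupby("nom_commune")["prix_par_metre_carre"]
    .agg(["count", "mean", "median", "std"])
    .sort_values("mean", ascending=False)
)

# Share of the total price variation that lies between communes
# (0 = communes are identical on average, 1 = all variation is between communes)
group_mean = toy.groupby("nom_commune")["prix_par_metre_carre"].transform("mean")
overall_mean = toy["prix_par_metre_carre"].mean()
between_ss = ((group_mean - overall_mean) ** 2).sum()
total_ss = ((toy["prix_par_metre_carre"] - overall_mean) ** 2).sum()
between_share = between_ss / total_ss

print(stats_by_commune)
print(f"between-commune share of variation: {between_share:.3f}")
```

A large between-group share is the pattern that motivates absorbing commune-level effects.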

## Part 1: Data cleaning

We create the indicator variables needed for our regressions and drop the few observations that lack a required variable (in particular the postal code).

In [61]:
# Data setup

import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
from linearmodels.panel import PanelOLS, FirstDifferenceOLS, RandomEffects

df = pd.read_csv("/home/onyxia/work/projet_statapp_inondations-11/data/DVF_corrige.csv")
df = df.set_index(['nom_commune']).sort_index()

In [62]:
df = df.dropna(subset=['code_postal_x'])
df = df[df["nombre_pieces_principales"] >= 1]


# Convert the submersion-risk flags to booleans
df['risque_submersion_faible'] = df['risque_submersion_faible'].map({"vrai": True, "faux": False})
df['risque_submersion_moyen'] = df['risque_submersion_moyen'].map({"vrai": True, "faux": False})
df['risque_submersion_fort'] = df['risque_submersion_fort'].map({"vrai": True, "faux": False})

# Several indicators are created here so that they can be compared; they are not all used in the same regression
df["indic_distance_littoral<10km"] = (df["distance_littoral"] <= 10).astype(int)
df["distance_littoral<10km"] = (df["distance_littoral"] <= 10).astype(int)*df["distance_littoral"]

df["indic_distance_littoral<5km"] = (df["distance_littoral"] <= 5).astype(int)
df["distance_littoral<5km"] = (df["distance_littoral"] <= 5).astype(int)*df["distance_littoral"]

df["indic_distance_littoral<2.5km"] = (df["distance_littoral"] <= 2.5).astype(int)
df["distance_littoral<2.5km"] = (df["distance_littoral"] <= 2.5).astype(int)*df["distance_littoral"]

df["indic_distance_littoral<100m"] = (df["distance_littoral"] <= 0.1).astype(int)
df["distance_littoral<100m"] = (df["distance_littoral"] <= 0.1).astype(int)*df["distance_littoral"]

df["indic_distance_littoral<200m"] = (df["distance_littoral"] <= 0.2).astype(int)
df["distance_littoral<200m"] = (df["distance_littoral"] <= 0.2).astype(int)*df["distance_littoral"]



df["indic_distance_fleuve<200m"] = (df["distance_fleuve"] <= 0.2).astype(int)
df["distance_fleuve<200m"] = (df["distance_fleuve"] <= 0.2).astype(int)*df["distance_fleuve"]

df["indic_distance_fleuve<100m"] = (df["distance_fleuve"] <= 0.1).astype(int)
df["distance_fleuve<100m"] = (df["distance_fleuve"] <= 0.1).astype(int)*df["distance_fleuve"]

df["indic_distance_fleuve<1km"] = (df["distance_fleuve"] <= 1).astype(int)
df["distance_fleuve<1km"] = (df["distance_fleuve"] <= 1).astype(int)*df["distance_fleuve"]

# 0 is excluded because it is the most common value (it serves as the reference category)
df["dep_1"] = (df["nombre_dependances"] == 1).astype(int)
df["dep_2"] = (df["nombre_dependances"] == 2).astype(int)
df["dep_3plus"] = (df["nombre_dependances"] >= 3).astype(int)

# 0 is excluded because it is an impossible value, and 4 is excluded because it is the most common value (reference category)
df["piece_1"] = (df["nombre_pieces_principales"] == 1).astype(int)
df["piece_2"] = (df["nombre_pieces_principales"] == 2).astype(int)
df["piece_3"] = (df["nombre_pieces_principales"] == 3).astype(int)
df["piece_5"] = (df["nombre_pieces_principales"] == 5).astype(int)
df["piece_6plus"] = (df["nombre_pieces_principales"] >= 6).astype(int)


df['log_prix_par_metre_carre'] = np.log(df['prix_par_metre_carre'])
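The indicator and interaction columns in the cell above all follow the same pattern, so they could be generated by a small helper. The sketch below is one possible way to do it; `add_distance_threshold` is a hypothetical function that is not part of the notebook, and it assumes distances are expressed in kilometres.

```python
import pandas as pd

def add_distance_threshold(df, distance_col, threshold_km, label):
    """Add `indic_<col><label>` (1 if distance <= threshold) and
    `<col><label>` (distance multiplied by that indicator).

    Hypothetical helper mirroring the column naming used in the notebook.
    """
    within = (df[distance_col] <= threshold_km).astype(int)
    df[f"indic_{distance_col}<{label}"] = within
    df[f"{distance_col}<{label}"] = within * df[distance_col]
    return df

# Toy distances in km, invented for illustration
toy = pd.DataFrame({"distance_littoral": [0.05, 0.15, 3.0, 12.0]})
for threshold_km, label in [(10, "10km"), (5, "5km"), (0.1, "100m")]:
    toy = add_distance_threshold(toy, "distance_littoral", threshold_km, label)

print(toy)
```

Keeping the thresholds in one list makes it easier to see which specifications are being compared.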

In [63]:
X = df[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X["risque_debordement_fort"] = X["risque_debordement_fort"].astype(int)
X["risque_debordement_moyen"] = X["risque_debordement_moyen"].astype(int)
X["risque_debordement_faible"] = X["risque_debordement_faible"].astype(int)
X["risque_submersion_fort"] = X["risque_submersion_fort"].astype(int)
X["risque_submersion_moyen"] = X["risque_submersion_moyen"].astype(int)
X["risque_submersion_faible"] = X["risque_submersion_faible"].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.141
Model:                                  OLS   Adj. R-squared:                  0.141
Method:                       Least Squares   F-statistic:                     78.70
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):           9.88e-86
Time:                              12:59:50   Log-Likelihood:                -76735.
No. Observations:                    101538   AIC:                         1.535e+05
Df Residuals:                        101515   BIC:                         1.537e+05
Df Model:                                22                                         
Covariance Type:                    cluster                                         
                                   coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------

## Part 2: Panel regression with commune fixed effects

In [72]:
X = df[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df["log_prix_par_metre_carre"]
group = df["code_postal_x"]

X_within = X - X.groupby(group).transform("mean")
Y_within = Y - Y.groupby(group).transform("mean")

model = sm.OLS(Y_within, X_within).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())


                                    OLS Regression Results                                   
Dep. Variable:     log_prix_par_metre_carre   R-squared (uncentered):                   0.088
Model:                                  OLS   Adj. R-squared (uncentered):              0.088
Method:                       Least Squares   F-statistic:                              79.56
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):                    3.81e-86
Time:                              13:25:35   Log-Likelihood:                         -58291.
No. Observations:                    101538   AIC:                                  1.166e+05
Df Residuals:                        101516   BIC:                                  1.168e+05
Df Model:                                22                                                  
Covariance Type:                    cluster                                                  
                                   coef    std err          
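The manual version above subtracts group means before running OLS. The sketch below, on simulated data, checks that this within estimator gives the same slope as a regression with one dummy per group. All names and numbers in it are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_groups, n_per_group = 5, 40
group = np.repeat(np.arange(n_groups), n_per_group)

# Group effect correlated with the regressor, true slope = 2.0
alpha = rng.normal(size=n_groups)[group]
x = rng.normal(size=group.size) + 0.5 * alpha
y = alpha + 2.0 * x + rng.normal(scale=0.1, size=group.size)
toy = pd.DataFrame({"group": group, "x": x, "y": y})

# Within estimator: demean x and y by group, then OLS without a constant
x_within = toy["x"] - toy.groupby("group")["x"].transform("mean")
y_within = toy["y"] - toy.groupby("group")["y"].transform("mean")
beta_within = float((x_within * y_within).sum() / (x_within ** 2).sum())

# Dummy-variable estimator: regress y on x and one dummy per group
dummies = pd.get_dummies(toy["group"], dtype=float).to_numpy()
design = np.column_stack([toy["x"].to_numpy(), dummies])
beta_lsdv = float(np.linalg.lstsq(design, toy["y"].to_numpy(), rcond=None)[0][0])

print(beta_within, beta_lsdv)
```

The point estimates coincide, but the degrees of freedom may not. The manual regression above reports 101516 residual degrees of freedom, while the `PanelOLS` summaries further down report `F(22,101316)`. The gap of 200 equals the number of entities, which suggests that the manual version does not account for the absorbed group means when computing its standard errors.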

In [88]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)


                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0868
Estimator:                         PanelOLS   R-squared (Between):             -0.0109
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):             -0.0080
Time:                              13:44:51   Log-likelihood                -5.835e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      437.58
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101316)
Min Obs:                             16.000                                           
Max Obs:                             2609.0
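The cell above sets `temps` to 0 for every row, so all transactions in a postal code share the same (entity, time) pair. Whether this affects the estimates depends on how the panel library handles repeated pairs, which is not verified here. If distinct pairs are wanted, one possible option is a running counter within each postal code. The sketch below uses invented data, and the counter it builds is arbitrary: it carries no chronological meaning unless the rows are first sorted by date.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "code_postal_x": [17000, 17000, 17000, 33120, 33120],
    "log_prix_par_metre_carre": [7.6, 7.8, 7.7, 8.2, 8.1],
})

# With a constant time value, (entity, time) pairs repeat within a postal code
constant_index = toy.assign(temps=0).set_index(["code_postal_x", "temps"]).index

# Hypothetical alternative: number transactions 0, 1, 2, ... inside each postal code
toy["temps"] = toy.groupby("code_postal_x").cumcount()
toy_panel = toy.set_index(["code_postal_x", "temps"])

print(constant_index.is_unique)   # False
print(toy_panel.index.is_unique)  # True
```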

In [89]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<5km",
    "distance_littoral<5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0853
Estimator:                         PanelOLS   R-squared (Between):             -0.0214
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):             -0.0193
Time:                              13:45:26   Log-likelihood                -5.843e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      429.63
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101316)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [90]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<2.5km",
    "distance_littoral<2.5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0850
Estimator:                         PanelOLS   R-squared (Between):             -0.0235
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):             -0.0218
Time:                              13:45:40   Log-likelihood                -5.845e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      428.06
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101316)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [91]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<200m",
    "distance_littoral<200m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0868
Estimator:                         PanelOLS   R-squared (Between):             -0.0109
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):             -0.0079
Time:                              13:45:52   Log-likelihood                -5.835e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      437.67
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101316)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [92]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0879
Estimator:                         PanelOLS   R-squared (Between):             -0.0132
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):             -0.0099
Time:                              13:46:04   Log-likelihood                -5.829e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      443.58
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101316)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [93]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<1km",
    "distance_fleuve<1km",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0885
Estimator:                         PanelOLS   R-squared (Between):             -0.0245
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):             -0.0211
Time:                              13:46:19   Log-likelihood                -5.826e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      446.95
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101316)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

## Part 3: Panel regression with commune random effects

We now consider random effects at the commune level: each alpha_i is treated as a random draw, one per commune, from a distribution assumed to be uncorrelated with the regressors.
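In the textbook random-effects model, the GLS estimator is equivalent to OLS on quasi-demeaned data, where a fraction `theta` of each group mean is subtracted. The sketch below computes that fraction for a few group sizes. The variance components are assumed values chosen for illustration, not estimates from the notebook, and `quasi_demeaning_weight` is a hypothetical helper.

```python
import math

def quasi_demeaning_weight(sigma2_eps, sigma2_alpha, n_obs):
    """Return theta = 1 - sqrt(sigma2_eps / (n_obs * sigma2_alpha + sigma2_eps)).

    theta = 0 corresponds to pooled OLS, theta close to 1 to the within estimator.
    """
    return 1.0 - math.sqrt(sigma2_eps / (n_obs * sigma2_alpha + sigma2_eps))

# Assumed variance components, for illustration only
sigma2_eps, sigma2_alpha = 0.18, 0.05

# Group sizes of roughly the smallest, average and largest entity
weights = {n_obs: quasi_demeaning_weight(sigma2_eps, sigma2_alpha, n_obs)
           for n_obs in (16, 508, 2609)}
for n_obs, theta in weights.items():
    print(n_obs, round(theta, 3))
```

With several hundred observations per entity, `theta` tends to be close to 1 unless the entity-level variance is very small, in which case random-effects and fixed-effects slope estimates would be expected to be similar.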

In [None]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3239
Estimator:                    RandomEffects   R-squared (Between):              0.1871
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1215
Time:                              13:09:32   Log-likelihood                -5.854e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2210.8
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [None]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<5km",
    "distance_littoral<5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3173
Estimator:                    RandomEffects   R-squared (Between):              0.1612
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1087
Time:                              13:41:58   Log-likelihood                -5.862e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2144.5
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [None]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<2.5km",
    "distance_littoral<2.5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3165
Estimator:                    RandomEffects   R-squared (Between):              0.1449
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1006
Time:                              13:42:12   Log-likelihood                -5.864e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2136.6
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [87]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<200m",
    "distance_littoral<200m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3210
Estimator:                    RandomEffects   R-squared (Between):              0.1697
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1157
Time:                              13:44:31   Log-likelihood                 -5.86e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2181.2
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [86]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3239
Estimator:                    RandomEffects   R-squared (Between):              0.1871
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1215
Time:                              13:44:28   Log-likelihood                -5.854e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2210.8
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [85]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<1km",
    "distance_fleuve<1km",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3226
Estimator:                    RandomEffects   R-squared (Between):              0.1966
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1278
Time:                              13:44:25   Log-likelihood                 -5.85e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2197.7
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

## Part 4: Regression with the commune mean

A third method is to include the mean price per commune as a regressor in the OLS regression.
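The group-mean regressor is built by grouping on `code_postal_x` and mapping the means back to each row. A minimal sketch on invented data shows that step next to an equivalent one-step form based on `transform`. The toy values and the `mean_via_*` column names are assumptions.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "code_postal_x": [17000, 17000, 33120, 33120, 33120],
    "prix_par_metre_carre": [2000.0, 2400.0, 3000.0, 3300.0, 3600.0],
})

# Two-step form: compute the mean per postal code, then map it back to the rows
mean_by_code = toy.groupby("code_postal_x")["prix_par_metre_carre"].mean()
toy["mean_via_map"] = toy["code_postal_x"].map(mean_by_code)

# Equivalent one-step form
toy["mean_via_transform"] = (
    toy.groupby("code_postal_x")["prix_par_metre_carre"].transform("mean")
)

print(toy)
```

Note that each row's own price enters the mean of its group, so the regressor is partly built from the dependent variable. This is worth keeping in mind when reading the jump in R-squared.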

In [76]:
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X["risque_debordement_fort"] = X["risque_debordement_fort"].astype(int)
X["risque_debordement_moyen"] = X["risque_debordement_moyen"].astype(int)
X["risque_debordement_faible"] = X["risque_debordement_faible"].astype(int)
X["risque_submersion_fort"] = X["risque_submersion_fort"].astype(int)
X["risque_submersion_moyen"] = X["risque_submersion_moyen"].astype(int)
X["risque_submersion_faible"] = X["risque_submersion_faible"].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.388
Model:                                  OLS   Adj. R-squared:                  0.388
Method:                       Least Squares   F-statistic:                     203.2
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          5.81e-125
Time:                              13:35:53   Log-Likelihood:                -59500.
No. Observations:                    101538   AIC:                         1.190e+05
Df Residuals:                        101514   BIC:                         1.193e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------
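The regressor list in cell In [76] can be parameterised by its three distance thresholds, which makes it easier to swap cut-offs without retyping the whole list. The sketch below assumes the column naming pattern seen in this notebook (`indic_distance_littoral<10km`, `distance_fleuve<100m`, and so on); the function name `build_regressors` and its argument names are hypothetical.

```python
def build_regressors(littoral_wide="10km", littoral_near="100m", fleuve="100m"):
    """Return the list of regressor column names for one specification.

    The defaults reproduce the list used in cell In [76]. The arguments are
    the threshold suffixes that appear in the notebook's column names.
    """
    return [
        "prix_par_metre_carre_moyen_commune",
        "distance_mairie_km",
        # Wide coastal band: indicator and distance within the band
        f"indic_distance_littoral<{littoral_wide}",
        f"distance_littoral<{littoral_wide}",
        # Near-coast band
        f"indic_distance_littoral<{littoral_near}",
        f"distance_littoral<{littoral_near}",
        # River band
        f"indic_distance_fleuve<{fleuve}",
        f"distance_fleuve<{fleuve}",
        "surface_reelle_bati",
        "piece_1", "piece_2", "piece_3", "piece_5", "piece_6plus",
        "dep_1", "dep_2", "dep_3plus",
        "risque_debordement_fort",
        "risque_debordement_moyen",
        "risque_debordement_faible",
        "risque_submersion_fort",
        "risque_submersion_moyen",
        "risque_submersion_faible",
    ]


baseline = build_regressors()
river_200m = build_regressors(fleuve="200m")
print(len(baseline), "regressors;", river_200m[6], "replaces", baseline[6])
```

With such a helper, `X = df[build_regressors(fleuve="200m")].copy()` would select the columns for a variant, provided the corresponding indicator columns were created in Part 1.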

In [82]:
# Same specification as In [76], with the wide coastal band set to 5 km instead of 10 km
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<5km",
    "distance_littoral<5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X["risque_debordement_fort"] = X["risque_debordement_fort"].astype(int)
X["risque_debordement_moyen"] = X["risque_debordement_moyen"].astype(int)
X["risque_debordement_faible"] = X["risque_debordement_faible"].astype(int)
X["risque_submersion_fort"] = X["risque_submersion_fort"].astype(int)
X["risque_submersion_moyen"] = X["risque_submersion_moyen"].astype(int)
X["risque_submersion_faible"] = X["risque_submersion_faible"].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.389
Model:                                  OLS   Adj. R-squared:                  0.388
Method:                       Least Squares   F-statistic:                     203.3
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          5.59e-125
Time:                              13:41:22   Log-Likelihood:                -59482.
No. Observations:                    101538   AIC:                         1.190e+05
Df Residuals:                        101514   BIC:                         1.192e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------

In [77]:
# Same specification as In [76], with the wide coastal band set to 2.5 km instead of 10 km
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<2.5km",
    "distance_littoral<2.5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X["risque_debordement_fort"] = X["risque_debordement_fort"].astype(int)
X["risque_debordement_moyen"] = X["risque_debordement_moyen"].astype(int)
X["risque_debordement_faible"] = X["risque_debordement_faible"].astype(int)
X["risque_submersion_fort"] = X["risque_submersion_fort"].astype(int)
X["risque_submersion_moyen"] = X["risque_submersion_moyen"].astype(int)
X["risque_submersion_faible"] = X["risque_submersion_faible"].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.389
Model:                                  OLS   Adj. R-squared:                  0.388
Method:                       Least Squares   F-statistic:                     203.6
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          4.84e-125
Time:                              13:39:13   Log-Likelihood:                -59481.
No. Observations:                    101538   AIC:                         1.190e+05
Df Residuals:                        101514   BIC:                         1.192e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------

In [78]:
# Same specification as In [76], with the near-coast band set to 200 m instead of 100 m
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<200m",
    "distance_littoral<200m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X["risque_debordement_fort"] = X["risque_debordement_fort"].astype(int)
X["risque_debordement_moyen"] = X["risque_debordement_moyen"].astype(int)
X["risque_debordement_faible"] = X["risque_debordement_faible"].astype(int)
X["risque_submersion_fort"] = X["risque_submersion_fort"].astype(int)
X["risque_submersion_moyen"] = X["risque_submersion_moyen"].astype(int)
X["risque_submersion_faible"] = X["risque_submersion_faible"].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.388
Model:                                  OLS   Adj. R-squared:                  0.388
Method:                       Least Squares   F-statistic:                     205.6
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          1.97e-125
Time:                              13:39:16   Log-Likelihood:                -59499.
No. Observations:                    101538   AIC:                         1.190e+05
Df Residuals:                        101514   BIC:                         1.193e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------

In [79]:
# Same specification as In [76], with the river band set to 200 m instead of 100 m
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X["risque_debordement_fort"] = X["risque_debordement_fort"].astype(int)
X["risque_debordement_moyen"] = X["risque_debordement_moyen"].astype(int)
X["risque_debordement_faible"] = X["risque_debordement_faible"].astype(int)
X["risque_submersion_fort"] = X["risque_submersion_fort"].astype(int)
X["risque_submersion_moyen"] = X["risque_submersion_moyen"].astype(int)
X["risque_submersion_faible"] = X["risque_submersion_faible"].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.390
Model:                                  OLS   Adj. R-squared:                  0.389
Method:                       Least Squares   F-statistic:                     214.5
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          3.41e-127
Time:                              13:39:37   Log-Likelihood:                -59395.
No. Observations:                    101538   AIC:                         1.188e+05
Df Residuals:                        101514   BIC:                         1.191e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------

In [81]:
# Same specification as In [76], with the river band set to 1 km instead of 100 m
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<1km",
    "distance_fleuve<1km",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X["risque_debordement_fort"] = X["risque_debordement_fort"].astype(int)
X["risque_debordement_moyen"] = X["risque_debordement_moyen"].astype(int)
X["risque_debordement_faible"] = X["risque_debordement_faible"].astype(int)
X["risque_submersion_fort"] = X["risque_submersion_fort"].astype(int)
X["risque_submersion_moyen"] = X["risque_submersion_moyen"].astype(int)
X["risque_submersion_faible"] = X["risque_submersion_faible"].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.389
Model:                                  OLS   Adj. R-squared:                  0.389
Method:                       Least Squares   F-statistic:                     206.9
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          1.08e-125
Time:                              13:40:09   Log-Likelihood:                -59402.
No. Observations:                    101538   AIC:                         1.189e+05
Df Residuals:                        101514   BIC:                         1.191e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------
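The six specifications of this part can be compared side by side. The sketch below simply collects the R-squared and log-likelihood values printed in the summaries above (as rounded in the output) and ranks them. All six models have the same number of observations (101538) and the same number of regressors (Df Model: 23), so ranking by log-likelihood is equivalent to ranking by AIC. The dictionary `fits` and the labels are introduced here for illustration only.

```python
# Fit statistics copied from the six OLS summaries printed in this part.
# Each value: (what differs from the baseline, R-squared, log-likelihood).
fits = {
    "In [76]": ("baseline: littoral<10km, littoral<100m, fleuve<100m", 0.388, -59500.0),
    "In [82]": ("wide coastal band 5 km", 0.389, -59482.0),
    "In [77]": ("wide coastal band 2.5 km", 0.389, -59481.0),
    "In [78]": ("near-coast band 200 m", 0.388, -59499.0),
    "In [79]": ("river band 200 m", 0.390, -59395.0),
    "In [81]": ("river band 1 km", 0.389, -59402.0),
}

# Sort from highest to lowest log-likelihood.
ranking = sorted(fits, key=lambda cell: fits[cell][2], reverse=True)
best_cell = ranking[0]

for cell in ranking:
    label, r2, llf = fits[cell]
    print(f"{cell:8s} R2={r2:.3f}  logL={llf:9.1f}  {label}")
```

The differences in fit are small: R-squared stays between 0.388 and 0.390 across all six variants, with the two river-band variants ahead of the others on log-likelihood.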