The main goal of this notebook is to run panel regressions on our data.

The panel structure is used to remove commune-level fixed effects. In the econometric equation for a property transaction, the term "alpha_i" captures every fixed factor that affects the transaction price and is tied to the commune rather than to the characteristics of the house.
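Written out, with i indexing communes, j the transactions in commune i, p_{ij} the price (per m² in most specifications), x_{ij} the house characteristics and β their coefficients (this notation is ours), the model and its within transformation are:

$$
\log p_{ij} = \alpha_i + x_{ij}^{\top}\beta + \varepsilon_{ij}
$$

$$
\log p_{ij} - \overline{\log p}_{i} = \left(x_{ij} - \bar{x}_{i}\right)^{\top}\beta + \left(\varepsilon_{ij} - \bar{\varepsilon}_{i}\right)
$$

Bars denote commune means. Because α_i is constant within commune i, it cancels in the second equation; this is the transformation applied by hand at the start of Part 2.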

We expect commune-level fixed effects because the descriptive statistics of transaction prices by commune show marked disparities across départements.

## Part 1: Data cleaning

We create the indicator variables needed for the regressions and drop the few observations that lack a required variable (in particular the postal code), as well as transactions recorded with zero main rooms.
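A minimal sketch of this cleaning step with pandas `dropna(subset=...)`. The `REQUIRED` column list and the `drop_incomplete` helper are hypothetical, and the toy rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical list of columns the regressions require (illustrative assumption)
REQUIRED = ["valeur_fonciere", "code_postal_x", "nombre_pieces_principales"]

def drop_incomplete(df: pd.DataFrame, required=REQUIRED) -> pd.DataFrame:
    """Drop rows missing any required column, then rows with zero main rooms."""
    out = df.dropna(subset=required)
    return out[out["nombre_pieces_principales"] >= 1]

# Toy data: one complete row, one missing postal code, one with 0 rooms
toy = pd.DataFrame({
    "valeur_fonciere": [200000.0, 150000.0, 90000.0],
    "code_postal_x": [34500.0, np.nan, 30000.0],
    "nombre_pieces_principales": [4.0, 3.0, 0.0],
})
clean = drop_incomplete(toy)
print(len(clean))
```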

In [2]:
# Data setup

import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
from linearmodels.panel import PanelOLS, FirstDifferenceOLS, RandomEffects

df = pd.read_csv("/home/onyxia/work/projet_statapp_inondations-12/data/DVF_corrige.csv")


In [26]:
df.isna().sum()

fid                                0
Unnamed: 0.1                       0
Unnamed: 0                         0
id_mutation                        0
date_mutation                      0
numero_disposition                 0
valeur_fonciere                    0
adresse_numero                   690
adresse_suffixe                94268
adresse_nom_voie                   1
adresse_code_voie                  0
code_postal_x                      1
nom_commune                        0
code_departement                   0
surface_reelle_bati                0
nombre_pieces_principales          0
surface_terrain                    0
longitude                          0
latitude                           0
prix_maison                        0
nombre_dependances                 0
prix_terrain                       0
risque_debordement_fort            0
risque_debordement_moyen           0
risque_debordement_faible          0
risque_submersion_fort             0
risque_submersion_moyen            0

In [27]:
df.mode().iloc[0]

fid                                      1
Unnamed: 0.1                             0
Unnamed: 0                               0
id_mutation                    2021-132350
date_mutation                   28/07/2021
numero_disposition                     1.0
valeur_fonciere                   200000.0
adresse_numero                         2.0
adresse_suffixe                          B
adresse_nom_voie                LE VILLAGE
adresse_code_voie                     0020
code_postal_x                      34500.0
nom_commune                          Nîmes
code_departement                      34.0
surface_reelle_bati                   90.0
nombre_pieces_principales              4.0
surface_terrain                      500.0
longitude                         3.049139
latitude                         43.013674
prix_maison                       200000.0
nombre_dependances                     0.0
prix_terrain                      200000.0
risque_debordement_fort                0.0

In [4]:
# Drop transactions with zero main rooms (an impossible value)
df = df[df["nombre_pieces_principales"] >= 1]

# Several distance indicators are created so that they can be compared;
# they are not all used in the same regression.
# For each threshold c (in km) we build 1{d <= c} and the interaction 1{d <= c} * d.
coast_thresholds = {"10km": 10, "5km": 5, "2.5km": 2.5, "100m": 0.1, "200m": 0.2}
river_thresholds = {"200m": 0.2, "100m": 0.1, "1km": 1}

for col, thresholds in [("distance_littoral", coast_thresholds),
                        ("distance_fleuve", river_thresholds)]:
    for label, c in thresholds.items():
        indic = (df[col] <= c).astype(int)
        df[f"indic_{col}<{label}"] = indic          # e.g. "indic_distance_littoral<10km"
        df[f"{col}<{label}"] = indic * df[col]      # e.g. "distance_littoral<10km"

# 0 dependencies is the reference category (omitted) because it is the most common value
df["dep_1"] = (df["nombre_dependances"] == 1).astype(int)
df["dep_2"] = (df["nombre_dependances"] == 2).astype(int)
df["dep_3plus"] = (df["nombre_dependances"] >= 3).astype(int)

# 0 rooms is excluded because it is an impossible value; 4 rooms is the reference category (omitted) because it is the most common value
df["piece_1"] = (df["nombre_pieces_principales"] == 1).astype(int)
df["piece_2"] = (df["nombre_pieces_principales"] == 2).astype(int)
df["piece_3"] = (df["nombre_pieces_principales"] == 3).astype(int)
df["piece_5"] = (df["nombre_pieces_principales"] == 5).astype(int)
df["piece_6plus"] = (df["nombre_pieces_principales"] >= 6).astype(int)


df['log_prix_par_metre_carre'] = np.log(df['prix_par_metre_carre'])
df['log_prix_maison'] = np.log(df['prix_maison'])
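The hand-built room dummies can also be obtained with `pandas.get_dummies`. The sketch below uses toy data, caps the count at 6 to form the "6plus" group, and drops 4 rooms as the reference category so that the dummies are not perfectly collinear with the constant (the dummy-variable trap).

```python
import pandas as pd

# Toy room counts (made up for illustration)
rooms = pd.Series([1, 2, 4, 4, 5, 7, 3, 4], name="nombre_pieces_principales")

# Cap at 6 so that 6 rooms and more share one category
capped = rooms.clip(upper=6)
dummies = pd.get_dummies(capped, prefix="piece", dtype=int)

# Drop the reference category (4 rooms) and rename the capped group
dummies = dummies.drop(columns="piece_4").rename(columns={"piece_6": "piece_6plus"})
print(list(dummies.columns))
```

Rows with 4 rooms get zeros in every dummy, so each coefficient reads as a difference relative to a 4-room house.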

In [29]:
df.isna().sum()

fid                         0
Unnamed: 0.1                0
Unnamed: 0                  0
id_mutation                 0
date_mutation               0
                           ..
piece_3                     0
piece_5                     0
piece_6plus                 0
log_prix_par_metre_carre    0
log_prix_maison             0
Length: 65, dtype: int64

In [5]:
X = df[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

# Cast the six flood-risk indicators to int
risk_cols = [c for c in X.columns if c.startswith("risque_")]
X[risk_cols] = X[risk_cols].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.141
Model:                                  OLS   Adj. R-squared:                  0.141
Method:                       Least Squares   F-statistic:                     78.70
Date:                      Fri, 02 May 2025   Prob (F-statistic):           5.18e-86
Time:                              06:23:38   Log-Likelihood:                -76736.
No. Observations:                    101539   AIC:                         1.535e+05
Df Residuals:                        101516   BIC:                         1.537e+05
Df Model:                                22                                         
Covariance Type:                    cluster                                         
                                   coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------
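Because the dependent variable is a log, a dummy coefficient b corresponds to an exact price change of 100 * (exp(b) - 1) percent, which departs noticeably from 100 * b once |b| is large. A small hypothetical helper, `pct_effect`, applied to illustrative coefficient values rather than to this notebook's estimates:

```python
import math

def pct_effect(coef: float) -> float:
    """Exact percentage price change implied by a dummy's coefficient in a log model."""
    return 100.0 * (math.exp(coef) - 1.0)

# Illustrative coefficient values, not estimates from this notebook
for b in (0.05, -0.10, 0.30):
    print(f"coef={b:+.2f} -> {pct_effect(b):+.1f}%")
```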

## Part 2: Panel regression with commune fixed effects

In [30]:
X = df[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

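Y = df["log_prix_par_metre_carre"]
group = df["codeInsee"]

# Within transformation: subtract the commune mean from every variable,
# which removes the commune fixed effect alpha_i (so no constant is added)
X_within = X - X.groupby(group).transform("mean")
Y_within = Y - Y.groupby(group).transform("mean")

# Standard errors clustered by commune
model = sm.OLS(Y_within, X_within).fit(cov_type='cluster', cov_kwds={'groups': group})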

print(model.summary())


                                    OLS Regression Results                                   
Dep. Variable:     log_prix_par_metre_carre   R-squared (uncentered):                   0.087
Model:                                  OLS   Adj. R-squared (uncentered):              0.087
Method:                       Least Squares   F-statistic:                              140.7
Date:                      Thu, 01 May 2025   Prob (F-statistic):                   3.25e-287
Time:                              18:43:33   Log-Likelihood:                         -56238.
No. Observations:                    101539   AIC:                                  1.125e+05
Df Residuals:                        101517   BIC:                                  1.127e+05
Df Model:                                22                                                  
Covariance Type:                    cluster                                                  
                                   coef    std err          
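By the Frisch-Waugh-Lovell theorem, OLS on commune-demeaned data gives the same slopes as OLS with one dummy per commune (LSDV). The sketch below checks this on simulated data (all values synthetic; `demean` is a hypothetical helper). Point estimates coincide, but degrees of freedom do not: the demeaned OLS above reports 101517 residual degrees of freedom (101539 - 22), while the PanelOLS outputs use F(22, 100519), i.e. 101539 - 22 - 998, which accounts for the 998 absorbed commune means.

```python
import numpy as np

# Simulated data (synthetic): 5 communes, 40 sales each, 2 regressors
rng = np.random.default_rng(0)
n_groups, per_group, k = 5, 40, 2
g = np.repeat(np.arange(n_groups), per_group)        # commune id of each sale
alpha = rng.normal(0.0, 2.0, n_groups)                # commune fixed effects
X = rng.normal(size=(g.size, k)) + alpha[g, None]     # regressors correlated with alpha
beta_true = np.array([0.5, -1.0])
y = alpha[g] + X @ beta_true + rng.normal(0.0, 0.1, g.size)

def demean(a, groups):
    """Within transformation: subtract each group's mean from its rows."""
    out = np.asarray(a, dtype=float).copy()
    for gid in np.unique(groups):
        mask = groups == gid
        out[mask] -= out[mask].mean(axis=0)
    return out

# Within estimator: OLS on demeaned data, no constant
beta_within = np.linalg.lstsq(demean(X, g), demean(y, g), rcond=None)[0]

# LSDV: OLS on raw data with one dummy per commune (the dummies replace the constant)
D = (g[:, None] == np.arange(n_groups)).astype(float)
beta_lsdv = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0][:k]

print(beta_within, beta_lsdv)
```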

In [31]:
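# Single cross-section: a constant "time" index is added only because
# linearmodels expects an (entity, time) MultiIndex
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])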


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)


                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0862
Estimator:                         PanelOLS   R-squared (Between):             -0.0176
No. Observations:                    101539   R-squared (Within):               0.0000
Date:                      Thu, May 01 2025   R-squared (Overall):             -0.0061
Time:                              18:46:07   Log-likelihood                 -5.63e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      431.23
Entities:                               998   P-value                           0.0000
Avg Obs:                             101.74   Distribution:               F(22,100519)
Min Obs:                             1.0000                                           
Max Obs:                             3064.0
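With `temps` = 0 for every row, each (commune, time) pair in the index is repeated. If a unique index is preferred, one option (our assumption, not something the outputs require) is to number the transactions within each commune with `groupby().cumcount()`. With only `entity_effects=True`, the time label should play no role in the estimates. Toy data below; the commune codes are illustrative.

```python
import pandas as pd

# Toy cross-section (illustrative codes and prices)
toy = pd.DataFrame({
    "codeInsee": ["30189", "30189", "34172", "30189", "34172"],
    "log_prix_par_metre_carre": [7.9, 8.1, 8.0, 7.7, 8.3],
})

# Constant "time", as in the notebook: (entity, time) pairs repeat
dup = toy.assign(temps=0).set_index(["codeInsee", "temps"])

# Alternative: number the sales within each commune so every pair is unique
uniq = toy.assign(temps=toy.groupby("codeInsee").cumcount()).set_index(["codeInsee", "temps"])
print(dup.index.is_unique, uniq.index.is_unique)
```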

In [32]:
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<5km",
    "distance_littoral<5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0861
Estimator:                         PanelOLS   R-squared (Between):             -0.0214
No. Observations:                    101539   R-squared (Within):               0.0000
Date:                      Thu, May 01 2025   R-squared (Overall):             -0.0170
Time:                              18:47:10   Log-likelihood                 -5.63e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      430.45
Entities:                               998   P-value                           0.0000
Avg Obs:                             101.74   Distribution:               F(22,100519)
Min Obs:                             1.0000                                           
Max Obs:                             3064.0

In [33]:
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<2.5km",
    "distance_littoral<2.5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0860
Estimator:                         PanelOLS   R-squared (Between):             -0.0211
No. Observations:                    101539   R-squared (Within):               0.0000
Date:                      Thu, May 01 2025   R-squared (Overall):             -0.0161
Time:                              18:49:16   Log-likelihood                -5.631e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      429.99
Entities:                               998   P-value                           0.0000
Avg Obs:                             101.74   Distribution:               F(22,100519)
Min Obs:                             1.0000                                           
Max Obs:                             3064.0

In [34]:
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<200m",
    "distance_littoral<200m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0862
Estimator:                         PanelOLS   R-squared (Between):             -0.0176
No. Observations:                    101539   R-squared (Within):               0.0000
Date:                      Thu, May 01 2025   R-squared (Overall):             -0.0060
Time:                              18:50:32   Log-likelihood                 -5.63e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      431.26
Entities:                               998   P-value                           0.0000
Avg Obs:                             101.74   Distribution:               F(22,100519)
Min Obs:                             1.0000                                           
Max Obs:                             3064.0
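The fixed-effects cells in this part differ only in the distance thresholds. A sketch of a hypothetical `regressors` helper that generates each 22-column list in the notebook's order, so that a single loop could replace the copied cells:

```python
# Regressors shared by every specification (same order as in the notebook)
BASE = [
    "surface_reelle_bati",
    "piece_1", "piece_2", "piece_3", "piece_5", "piece_6plus",
    "dep_1", "dep_2", "dep_3plus",
    "risque_debordement_fort", "risque_debordement_moyen", "risque_debordement_faible",
    "risque_submersion_fort", "risque_submersion_moyen", "risque_submersion_faible",
]

def regressors(coast_far="10km", coast_near="100m", river="100m"):
    """Hypothetical helper: column list for one specification."""
    dist = [
        "distance_mairie_km",
        f"indic_distance_littoral<{coast_far}", f"distance_littoral<{coast_far}",
        f"indic_distance_littoral<{coast_near}", f"distance_littoral<{coast_near}",
        f"indic_distance_fleuve<{river}", f"distance_fleuve<{river}",
    ]
    return dist + BASE

specs = [
    dict(coast_far="10km"), dict(coast_far="5km"), dict(coast_far="2.5km"),
    dict(coast_near="200m"), dict(river="200m"), dict(river="1km"),
]
for spec in specs:
    cols = regressors(**spec)
    # In the notebook, each spec would then be estimated as:
    # PanelOLS(df_panel["log_prix_par_metre_carre"], df_panel[cols], entity_effects=True).fit()
    print(spec, len(cols))
```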

In [7]:
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)



                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0873
Estimator:                         PanelOLS   R-squared (Between):             -0.0208
No. Observations:                    101539   R-squared (Within):               0.0000
Date:                      Fri, May 02 2025   R-squared (Overall):             -0.0082
Time:                              06:26:34   Log-likelihood                -5.624e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      436.98
Entities:                               998   P-value                           0.0000
Avg Obs:                             101.74   Distribution:               F(22,100519)
Min Obs:                             1.0000                                           
Max Obs:                             3064.0

In [9]:
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_maison"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                          PanelOLS Estimation Summary                           
Dep. Variable:        log_prix_maison   R-squared:                        0.4753
Estimator:                   PanelOLS   R-squared (Between):              0.1038
No. Observations:              101539   R-squared (Within):               0.0000
Date:                Fri, May 02 2025   R-squared (Overall):              0.1088
Time:                        06:27:18   Log-likelihood                -6.004e+04
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      4138.8
Entities:                         998   P-value                           0.0000
Avg Obs:                       101.74   Distribution:               F(22,100519)
Min Obs:                       1.0000                                           
Max Obs:                       3064.0   F-statistic (robust):             4138.8
                            

In [36]:
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<1km",
    "distance_fleuve<1km",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

Y = df_panel["log_prix_par_metre_carre"]


model = PanelOLS(Y, X, entity_effects=True).fit()

print(model.summary)

                             PanelOLS Estimation Summary                              
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.0880
Estimator:                         PanelOLS   R-squared (Between):             -0.0353
No. Observations:                    101539   R-squared (Within):               0.0000
Date:                      Thu, May 01 2025   R-squared (Overall):             -0.0214
Time:                              18:52:21   Log-likelihood                 -5.62e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      440.93
Entities:                               998   P-value                           0.0000
Avg Obs:                             101.74   Distribution:               F(22,100519)
Min Obs:                             1.0000                                           
Max Obs:                             3064.0

## Part 3: Panel regression with commune random effects

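We now treat the commune effects as random: the alpha_i are modeled as independent draws from a common distribution and are assumed to be uncorrelated with the regressors. Note that the first four random-effects specifications group transactions by postal code (`code_postal_x`, 200 entities), while the last two group them by commune (`codeInsee`, 998 entities).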
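Random effects are estimated by GLS, which amounts to subtracting only a fraction θ_i of each group mean: θ_i = 1 - sqrt(σ²_ε / (σ²_ε + T_i σ²_α)), where T_i is the number of observations in group i. With θ_i = 0 this is pooled OLS; as θ_i approaches 1 it approaches the within (fixed-effects) estimator. A sketch with a hypothetical `re_theta` helper and illustrative variance components (not estimated from these data); the group sizes roughly follow the minimum, average and maximum observations per entity reported in the outputs.

```python
import math

def re_theta(sigma2_eps: float, sigma2_alpha: float, t_i: int) -> float:
    """Share of the group mean subtracted by the random-effects GLS transform
    for a group with t_i observations: y_ij - theta_i * mean_i(y)."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + t_i * sigma2_alpha))

# Illustrative variance components, NOT estimated from this notebook's data
for t in (1, 16, 102, 3064):
    print(t, round(re_theta(0.25, 0.05, t), 3))
```

Large communes get θ_i close to 1 and are treated almost as in the fixed-effects model; small ones are pulled toward pooled OLS.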

In [None]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3239
Estimator:                    RandomEffects   R-squared (Between):              0.1871
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1215
Time:                              13:09:32   Log-likelihood                -5.854e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2210.8
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [None]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<5km",
    "distance_littoral<5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3173
Estimator:                    RandomEffects   R-squared (Between):              0.1612
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1087
Time:                              13:41:58   Log-likelihood                -5.862e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2144.5
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [None]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<2.5km",
    "distance_littoral<2.5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3165
Estimator:                    RandomEffects   R-squared (Between):              0.1449
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1006
Time:                              13:42:12   Log-likelihood                -5.864e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2136.6
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [87]:
df["temps"] = 0
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<200m",
    "distance_littoral<200m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3210
Estimator:                    RandomEffects   R-squared (Between):              0.1697
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1157
Time:                              13:44:31   Log-likelihood                 -5.86e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2181.2
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

In [11]:
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)



                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.6763
Estimator:                    RandomEffects   R-squared (Between):              0.1320
No. Observations:                    101539   R-squared (Within):               0.0000
Date:                      Fri, May 02 2025   R-squared (Overall):              0.0606
Time:                              06:28:40   Log-likelihood                -5.698e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      9640.6
Entities:                               998   P-value                           0.0000
Avg Obs:                             101.74   Distribution:               F(22,101516)
Min Obs:                             1.0000                                           
Max Obs:                             3064.0
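The random-effects estimator used in these cells sits between pooled OLS and the within (fixed-effects) estimator. Below is a minimal sketch of the textbook GLS transformation. It assumes the two variance components are known; linearmodels estimates them from the data, so the sketch illustrates the mechanism and does not reproduce the reported figures.

Each variable is quasi-demeaned as x_it - θ_i x̄_i, with θ_i = 1 - sqrt(σ_e² / (T_i σ_u² + σ_e²)). Here T_i is the number of transactions in entity i (a commune or a postal code). The function name `re_quasi_demean` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def re_quasi_demean(data, entity, cols, sigma2_u, sigma2_e):
    """Random-effects (GLS) quasi-demeaning for an unbalanced panel.

    sigma2_u: variance of the entity effect alpha_i (assumed known here)
    sigma2_e: variance of the idiosyncratic error (assumed known here)
    Returns the transformed columns and theta for each observation.
    """
    # T_i: number of observations of each entity, broadcast to every row
    t_i = data[entity].map(data[entity].value_counts())
    theta = 1.0 - np.sqrt(sigma2_e / (t_i * sigma2_u + sigma2_e))
    # Entity means of each column, broadcast to every row
    means = data.groupby(entity)[cols].transform("mean")
    transformed = data[cols] - means.mul(theta, axis=0)
    return transformed, theta


toy_re = pd.DataFrame({
    "commune": ["A", "A", "A", "B", "B"],
    "y": [1.0, 2.0, 3.0, 10.0, 12.0],
})

# sigma2_u = 0 (no entity effect): theta = 0, data unchanged (pooled OLS)
out_pooled, theta_pooled = re_quasi_demean(toy_re, "commune", ["y"], 0.0, 1.0)
# very large sigma2_u: theta close to 1, almost the within transformation
out_within, theta_within = re_quasi_demean(toy_re, "commune", ["y"], 1e12, 1.0)
```

If a constant column of ones is included in `cols`, it becomes 1 - θ_i. This is why the intercept stays identified under random effects, unlike under the within transformation.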

In [39]:
df["temps"] = 0
df_panel = df.set_index(["codeInsee", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_maison"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                        RandomEffects Estimation Summary                        
Dep. Variable:        log_prix_maison   R-squared:                        0.8554
Estimator:              RandomEffects   R-squared (Between):              0.4112
No. Observations:              101539   R-squared (Within):               0.0000
Date:                Thu, May 01 2025   R-squared (Overall):              0.4010
Time:                        20:02:40   Log-likelihood                -6.079e+04
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                    2.73e+04
Entities:                         998   P-value                           0.0000
Avg Obs:                       101.74   Distribution:               F(22,101516)
Min Obs:                       1.0000                                           
Max Obs:                       3064.0   F-statistic (robust):             4162.7
                            

In [85]:
df["temps"] = 0
# Entities are now postal codes (code_postal_x) rather than INSEE commune codes
df_panel = df.set_index(["code_postal_x", "temps"])


X = df_panel[[
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<1km",
    "distance_fleuve<1km",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

X = sm.add_constant(X)
Y = df_panel["log_prix_par_metre_carre"]

model_random_effets = RandomEffects(Y,X)
random_effets_resultats = model_random_effets.fit()



print(random_effets_resultats.summary)

                           RandomEffects Estimation Summary                           
Dep. Variable:     log_prix_par_metre_carre   R-squared:                        0.3226
Estimator:                    RandomEffects   R-squared (Between):              0.1966
No. Observations:                    101538   R-squared (Within):               0.0000
Date:                      Mon, Apr 28 2025   R-squared (Overall):              0.1278
Time:                              13:44:25   Log-likelihood                 -5.85e+04
Cov. Estimator:                  Unadjusted                                           
                                              F-statistic:                      2197.7
Entities:                               200   P-value                           0.0000
Avg Obs:                             507.69   Distribution:               F(22,101515)
Min Obs:                             16.000                                           
Max Obs:                             2609.0

## Part 4: Regression with the average price per commune

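A third approach is to include the average price per commune as a regressor in the OLS regression. This directly controls for the commune-level price differences (the alpha_i of the panel specification).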
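Below is a minimal sketch of this approach on synthetic data. The commune identifiers, the coefficients and the names `toy_mean` and `moyenne_commune` are hypothetical.

`groupby(...).transform("mean")` attaches the group mean to every row in one step. It gives the same column as the `groupby(...).mean()` followed by `.map(...)` used later in this section. The mean then enters the design matrix like any other regressor.

For simplicity the sketch averages the log price, whereas the notebook averages the price per m² in levels. The fit uses `numpy.linalg.lstsq` to avoid depending on statsmodels; `sm.OLS` would give the same point estimates.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 50 communes x 40 transactions, with a
# commune-level shifter alpha_i in the log price
n_communes, n_per = 50, 40
commune = np.repeat(np.arange(n_communes), n_per)
alpha = rng.normal(0.0, 0.5, n_communes)[commune]
surface = rng.uniform(50.0, 200.0, commune.size)
log_p = 7.5 + alpha - 0.002 * surface + rng.normal(0.0, 0.1, commune.size)
toy_mean = pd.DataFrame({"commune": commune, "surface": surface, "log_p": log_p})

# Commune average attached to every row in one step...
toy_mean["moyenne_commune"] = toy_mean.groupby("commune")["log_p"].transform("mean")
# ...equivalent to computing the means first and mapping them back
via_map = toy_mean["commune"].map(toy_mean.groupby("commune")["log_p"].mean())

# OLS of the log price on a constant, the commune average and the surface
X_toy = np.column_stack([
    np.ones(len(toy_mean)),
    toy_mean["moyenne_commune"].to_numpy(),
    toy_mean["surface"].to_numpy(),
])
beta_toy, *_ = np.linalg.lstsq(X_toy, toy_mean["log_p"].to_numpy(), rcond=None)
print(beta_toy)  # [constant, coefficient on the commune average, coefficient on surface]
```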

In [16]:

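X = df[[
    # Caution: this is each transaction's own price per m², not a commune
    # average, so it is mechanically linked to Y (log_prix_par_metre_carre)
    "prix_par_metre_carre",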
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()



Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["codeInsee"]})

print(model.summary())


                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.895
Model:                                  OLS   Adj. R-squared:                  0.895
Method:                       Least Squares   F-statistic:                     419.3
Date:                      Fri, 02 May 2025   Prob (F-statistic):               0.00
Time:                              06:31:36   Log-Likelihood:                 30040.
No. Observations:                    101539   AIC:                        -6.003e+04
Df Residuals:                        101515   BIC:                        -5.980e+04
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                   coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------
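These regressions use `cov_type='cluster'`. This allows the errors of transactions in the same group (`codeInsee` here, `code_postal_x` in later cells) to be correlated.

Below is a minimal NumPy sketch of the underlying sandwich formula, V = (X'X)^-1 (Σ_g X_g' e_g e_g' X_g) (X'X)^-1, using hypothetical names and simulated data. No finite-sample correction is applied here. statsmodels applies one by default, so its standard errors come out slightly larger.

```python
import numpy as np


def cluster_robust_cov(X, resid, groups):
    """Cluster-robust (sandwich) covariance of OLS coefficients, no small-sample correction."""
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    groups = np.asarray(groups)
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        in_g = groups == g
        score_g = X[in_g].T @ resid[in_g]  # sum of x_i * e_i over cluster g
        meat += np.outer(score_g, score_g)
    return bread @ meat @ bread


rng = np.random.default_rng(1)
n_groups, n_per = 30, 20
groups = np.repeat(np.arange(n_groups), n_per)
x_sim = rng.normal(size=groups.size)
shock = rng.normal(size=n_groups)[groups]  # shock shared by all members of a cluster
y_sim = 1.0 + 0.5 * x_sim + shock + rng.normal(size=groups.size)

X_sim = np.column_stack([np.ones_like(x_sim), x_sim])
beta_sim, *_ = np.linalg.lstsq(X_sim, y_sim, rcond=None)
resid_sim = y_sim - X_sim @ beta_sim

se_cluster = np.sqrt(np.diag(cluster_robust_cov(X_sim, resid_sim, groups)))
sigma2 = resid_sim @ resid_sim / (len(y_sim) - X_sim.shape[1])
se_naive = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_sim.T @ X_sim)))
print(se_naive, se_cluster)
```

In this simulation, the intercept's clustered standard error should come out well above the naive one, because the simulated shock is shared within clusters. Passing one cluster per observation reduces the formula to White's HC0 estimator.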

In [42]:

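X = df[[
    # Caution: this is each transaction's own price per m², not a commune
    # average, so it is closely tied to Y (log_prix_maison)
    "prix_par_metre_carre",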
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()



Y = df["log_prix_maison"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["codeInsee"]})

print(model.summary())

                            OLS Regression Results                            
Dep. Variable:        log_prix_maison   R-squared:                       0.904
Model:                            OLS   Adj. R-squared:                  0.904
Method:                 Least Squares   F-statistic:                     2451.
Date:                Thu, 01 May 2025   Prob (F-statistic):               0.00
Time:                        20:09:40   Log-Likelihood:                 8190.7
No. Observations:              101539   AIC:                        -1.633e+04
Df Residuals:                  101515   BIC:                        -1.610e+04
Df Model:                          23                                         
Covariance Type:              cluster                                         
                                   coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------------------------
const           

In [82]:
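# Mean price per m² by postal code (code_postal_x), used as the "commune" average;
# each transaction's own price is included in its group mean
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()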

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<5km",
    "distance_littoral<5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

# Cast the flood-risk indicators to int so that X is fully numeric for statsmodels
risque_cols = [c for c in X.columns if c.startswith("risque_")]
X[risque_cols] = X[risque_cols].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.389
Model:                                  OLS   Adj. R-squared:                  0.388
Method:                       Least Squares   F-statistic:                     203.3
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          5.59e-125
Time:                              13:41:22   Log-Likelihood:                -59482.
No. Observations:                    101538   AIC:                         1.190e+05
Df Residuals:                        101514   BIC:                         1.192e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------

In [77]:
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<2.5km",
    "distance_littoral<2.5km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

risque_cols = [c for c in X.columns if c.startswith("risque_")]
X[risque_cols] = X[risque_cols].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.389
Model:                                  OLS   Adj. R-squared:                  0.388
Method:                       Least Squares   F-statistic:                     203.6
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          4.84e-125
Time:                              13:39:13   Log-Likelihood:                -59481.
No. Observations:                    101538   AIC:                         1.190e+05
Df Residuals:                        101514   BIC:                         1.192e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------

In [78]:
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<200m",
    "distance_littoral<200m",
    "indic_distance_fleuve<100m",
    "distance_fleuve<100m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

risque_cols = [c for c in X.columns if c.startswith("risque_")]
X[risque_cols] = X[risque_cols].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.388
Model:                                  OLS   Adj. R-squared:                  0.388
Method:                       Least Squares   F-statistic:                     205.6
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          1.97e-125
Time:                              13:39:16   Log-Likelihood:                -59499.
No. Observations:                    101538   AIC:                         1.190e+05
Df Residuals:                        101514   BIC:                         1.193e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------

In [79]:
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<200m",
    "distance_fleuve<200m",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

risque_cols = [c for c in X.columns if c.startswith("risque_")]
X[risque_cols] = X[risque_cols].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.390
Model:                                  OLS   Adj. R-squared:                  0.389
Method:                       Least Squares   F-statistic:                     214.5
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          3.41e-127
Time:                              13:39:37   Log-Likelihood:                -59395.
No. Observations:                    101538   AIC:                         1.188e+05
Df Residuals:                        101514   BIC:                         1.191e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------

In [81]:
prix_par_metre_carre_moyen_commune = df.groupby("code_postal_x")["prix_par_metre_carre"].mean()

df["prix_par_metre_carre_moyen_commune"] = df["code_postal_x"].map(prix_par_metre_carre_moyen_commune)

X = df[[
    "prix_par_metre_carre_moyen_commune",
    "distance_mairie_km",
    "indic_distance_littoral<10km",
    "distance_littoral<10km",
    "indic_distance_littoral<100m",
    "distance_littoral<100m",
    "indic_distance_fleuve<1km",
    "distance_fleuve<1km",
    "surface_reelle_bati",
    "piece_1",
    "piece_2",    
    "piece_3",  
    "piece_5",  
    "piece_6plus",
    "dep_1",
    "dep_2",
    "dep_3plus",
    "risque_debordement_fort",
    "risque_debordement_moyen",
    "risque_debordement_faible",
    "risque_submersion_fort",
    "risque_submersion_moyen",
    "risque_submersion_faible"
]].copy()

risque_cols = [c for c in X.columns if c.startswith("risque_")]
X[risque_cols] = X[risque_cols].astype(int)

Y = df["log_prix_par_metre_carre"]
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit(cov_type='cluster', cov_kwds={'groups': df["code_postal_x"]})

print(model.summary())

                               OLS Regression Results                               
Dep. Variable:     log_prix_par_metre_carre   R-squared:                       0.389
Model:                                  OLS   Adj. R-squared:                  0.389
Method:                       Least Squares   F-statistic:                     206.9
Date:                      Mon, 28 Apr 2025   Prob (F-statistic):          1.08e-125
Time:                              13:40:09   Log-Likelihood:                -59402.
No. Observations:                    101538   AIC:                         1.189e+05
Df Residuals:                        101514   BIC:                         1.191e+05
Df Model:                                23                                         
Covariance Type:                    cluster                                         
                                         coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------