Commit 58c7128

change na subset (#362)
* change na subset
* deprecated
* deprecated
* change method
* Automated changes
* Automated changes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent ebca985 commit 58c7128

1 file changed: +15 -12 lines changed
  • content/course/modelisation/3_regression

content/course/modelisation/3_regression/index.qmd

Lines changed: 15 additions & 12 deletions
@@ -223,10 +223,10 @@ a specification problem?
 # 1. Linear regression of per_gop on several explanatory variables
 xvars = ['Unemployment_rate_2019', 'Median_Household_Income_2019', 'Percent of adults with less than a high school diploma, 2015-19', "Percent of adults with a bachelor's degree or higher, 2015-19"]
 
-df2 = votes[["per_gop"] + xvars]
+df2 = votes[["per_gop"] + xvars].copy()
 df2['log_income'] = np.log(df2["Median_Household_Income_2019"])
-indices_to_keep = ~df2.isin([np.nan, np.inf, -np.inf]).any(1)
-df2 = df2[indices_to_keep].astype(np.float64)
+df2 = df2.dropna().astype(np.float64)
+
 
 X_train, X_test, y_train, y_test = train_test_split(
     df2.drop(["Median_Household_Income_2019","per_gop"], axis = 1),
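The two changes in this hunk can be tried on a toy table. The sketch below is illustrative only: the `votes` frame and its `income` column are made-up stand-ins, not the course data. It assumes a recent pandas release, where the positional `.any(1)` call is no longer accepted and `axis` must be passed by keyword. It also shows a behaviour difference worth checking against the real data: `dropna()` keeps rows containing `-inf`, which the old `isin` filter removed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the course's `votes` table
votes = pd.DataFrame({
    "per_gop": [0.6, 0.4, 0.7, 0.3],
    "income": [50_000.0, np.nan, 0.0, 42_000.0],
})

# .copy() gives an independent frame, so adding a column below does not
# raise SettingWithCopyWarning
df2 = votes[["per_gop", "income"]].copy()
with np.errstate(divide="ignore"):
    df2["log_income"] = np.log(df2["income"])  # log(0) evaluates to -inf

# New approach from the commit: drops the NaN row, keeps the -inf row
after_dropna = df2.dropna()

# Converting infinities to NaN first also removes the -inf row
after_both = df2.replace([np.inf, -np.inf], np.nan).dropna()

# The old filter, written with the axis passed by keyword
old_mask = ~df2.isin([np.nan, np.inf, -np.inf]).any(axis=1)

print(len(after_dropna), len(after_both))
```

If the income column never contains zeros, the two filters select the same rows and the shorter `dropna()` call is sufficient.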
@@ -335,10 +335,11 @@ in `log`, otherwise its scale may swamp any effect.
 # 1. Linear regression of per_gop on several explanatory variables
 xvars = ['Unemployment_rate_2019', 'Median_Household_Income_2019', 'Percent of adults with less than a high school diploma, 2015-19', "Percent of adults with a bachelor's degree or higher, 2015-19"]
 
-df2 = votes[["per_gop"] + xvars]
+xvars = ['Unemployment_rate_2019', 'Median_Household_Income_2019', 'Percent of adults with less than a high school diploma, 2015-19', "Percent of adults with a bachelor's degree or higher, 2015-19"]
+
+df2 = votes[["per_gop"] + xvars].copy()
 df2['log_income'] = np.log(df2["Median_Household_Income_2019"])
-indices_to_keep = ~df2.isin([np.nan, np.inf, -np.inf]).any(1)
-df2 = df2[indices_to_keep].astype(np.float64)
+df2 = df2.dropna().astype(np.float64)
 
 X = sm.add_constant(df2.drop(["Median_Household_Income_2019","per_gop"], axis = 1))
 results = sm.OLS(df2[['per_gop']], X).fit()
@@ -487,10 +488,10 @@ a measure of model quality.
 #1. Logit model with the same variables as before
 xvars = ['Unemployment_rate_2019', 'Median_Household_Income_2019', 'Percent of adults with less than a high school diploma, 2015-19', "Percent of adults with a bachelor's degree or higher, 2015-19"]
 
-df2 = votes[["per_gop"] + xvars]
+df2 = votes[["per_gop"] + xvars].copy()
 df2['log_income'] = np.log(df2["Median_Household_Income_2019"])
-indices_to_keep = ~df2.isin([np.nan, np.inf, -np.inf]).any(1)
-df2 = df2[indices_to_keep].astype(np.float64)
+df2 = df2.dropna().astype(np.float64)
+
 
 df2['y'] = (df2['per_gop']>0.5).astype(int)
 
@@ -511,8 +512,10 @@ print(clf.intercept_, clf.coef_)
 #| include: false
 #| echo: false
 
+from sklearn.metrics import ConfusionMatrixDisplay
+
 # 2. Confusion matrix
-sklearn.metrics.plot_confusion_matrix(clf, X_test, y_test)
+ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
 
 sc_accuracy = sklearn.metrics.accuracy_score(y_pred, y_test)
 sc_f1 = sklearn.metrics.f1_score(y_pred, y_test)
@@ -564,6 +567,7 @@ Using training and estimation samples:
 of winning.
 2. Run a likelihood-ratio test on the inclusion of the (log-)income variable.
 
+
 ```{python}
 #| include: false
 #| echo: false
@@ -576,8 +580,7 @@ xvars = [
 
 df2 = votes[["per_gop"] + xvars]
 df2['log_income'] = np.log(df2["Median_Household_Income_2019"])
-indices_to_keep = ~df2.isin([np.nan, np.inf, -np.inf]).any(1)
-df2 = df2[indices_to_keep].astype(np.float64)
+df2 = df2.dropna().astype(np.float64)
 
 df2['y'] = (df2['per_gop']>0.5).astype(int)
 
