Non linear feature engineering for logistic regression #707

Closed
ogrisel opened this issue Aug 31, 2023 · 1 comment · Fixed by #731

@ogrisel
Collaborator

ogrisel commented Aug 31, 2023

As a follow-up to #701, I suggest that:

  • We replace the notebook currently named "Beyond linear separation in classification" with a new notebook named "Non-linear feature engineering for Logistic Regression"

  • In this notebook we reuse the same 2D synthetic moons and Gaussian quantiles datasets

  • We start with a logistic regression and show that it underfits

  • Then we build more and more complex pipelines with different preprocessors:

    • KBinsDiscretizer

    • SplineTransformer

  • We observe that those transformers apply axis-aligned non-linear transformations that lead to axis-aligned classification decision boundaries

  • We explore modeling multiplicative interactions between the derived features with:

    • KBinsDiscretizer with sparse output followed by PolynomialFeatures(degree=2, interaction_only=True)

    • SplineTransformer followed by Nystroem (either with kernel="rbf" and a good value of gamma or kernel="poly" and degree=2)

Then we add a new exercise with:

  • The half moons dataset only
  • SVC(kernel="linear") (this should give similar underfitting results as logistic regression from the previous notebook)
  • then ask the user to try:
    • make_pipeline(Nystroem(kernel="rbf", gamma=some_gamma, n_components=300), SVC(kernel="linear"))
    • SVC(kernel="rbf", gamma=some_gamma)
  • the results should be similar
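
A possible shape for the exercise solution, as a sketch only: some_gamma is set to an arbitrary placeholder here, and the dataset parameters are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Half moons dataset only; sample size and noise level are arbitrary choices.
X, y = make_moons(n_samples=500, noise=0.15, random_state=0)

# Placeholder value: in the exercise the user would tune this.
some_gamma = 1.0

models = {
    # Linear baseline.
    "linear SVC": SVC(kernel="linear"),
    # Approximate RBF feature map followed by a linear SVC.
    "Nystroem + linear SVC": make_pipeline(
        Nystroem(
            kernel="rbf", gamma=some_gamma, n_components=300, random_state=0
        ),
        SVC(kernel="linear"),
    ),
    # Exact RBF kernel.
    "RBF SVC": SVC(kernel="rbf", gamma=some_gamma),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Note that n_components=300 must not exceed the number of training samples in each fold, which holds here with 400 training samples per fold.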

Then we can optionally suggest trying MLPClassifier on this dataset to get somewhat similar results.
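
For the optional MLPClassifier step, something along these lines might be enough. The hidden layer size, max_iter and dataset parameters are assumptions picked for illustration, not tuned values.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Same half moons dataset as in the exercise; parameters are arbitrary choices.
X, y = make_moons(n_samples=500, noise=0.15, random_state=0)

# A small network with one hidden layer; the sizes are illustrative only.
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)

# Mean 5-fold cross-validated accuracy.
mlp_score = cross_val_score(mlp, X, y, cv=5).mean()
print(f"MLPClassifier: {mlp_score:.3f}")
```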

@glemaitre
Collaborator

working on this one
