Boston dataset no longer available in sklearn for classifier-decision-boundaries.ipynb and classifier-boundary-animations.ipynb #225

Closed
mepland opened this issue Dec 30, 2022 · 3 comments

@mepland
Collaborator

mepland commented Dec 30, 2022

ImportError: 
`load_boston` has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as
investigated in [1], the authors of this dataset engineered a
non-invertible variable "B" assuming that racial self-segregation had a
positive impact on house prices [2]. Furthermore the goal of the
research that led to the creation of this dataset was to study the
impact of air quality but it did not give adequate demonstration of the
validity of this assumption.

The scikit-learn maintainers therefore strongly discourage the use of
this dataset unless the purpose of the code is to study and educate
about ethical issues in data science and machine learning.
@parrt
Owner

parrt commented Dec 31, 2022

I'd say we either download the data and save it as a CSV, or find a similar dataset like CA housing.

@parrt parrt added the clean up label Dec 31, 2022
@mepland
Collaborator Author

mepland commented Jan 1, 2023

I think it should be switched to a new toy dataset, like CA housing.
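
For reference, a minimal sketch of what such a switch could look like, assuming scikit-learn's `fetch_california_housing` as the replacement loader. The helper names `has_load_boston` and `load_toy_housing` are hypothetical and used only for illustration; this is not necessarily how the notebooks were actually changed.

```python
import re


def has_load_boston(sklearn_version: str) -> bool:
    """Return True if this scikit-learn version still ships load_boston.

    Per the ImportError above, load_boston was removed in version 1.2.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {sklearn_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 2)


def load_toy_housing():
    """Load the California housing data as (X, y).

    Hypothetical helper: the first call downloads the data and caches it
    locally, so it needs network access once. as_frame=True requires pandas.
    """
    # Imported lazily so has_load_boston works without scikit-learn installed.
    from sklearn.datasets import fetch_california_housing

    # return_X_y=True yields (features, target) instead of a Bunch object.
    return fetch_california_housing(return_X_y=True, as_frame=True)
```

In a notebook, `X, y = load_toy_housing()` would then stand in for the old `load_boston()` call. Note that the feature names differ from the Boston columns, so any code that selects columns by name would need updating as well.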

@parrt parrt added this to the 2.1 milestone Jan 1, 2023
@parrt
Owner

parrt commented Jan 1, 2023

Fixed by f86c5e1 (will appear after we merge back into master).

@parrt parrt closed this as completed in f86c5e1 Jan 16, 2023