Rework PredMap #42

Merged
merged 4 commits into from
Apr 6, 2022


raplima
Collaborator

@raplima raplima commented Apr 6, 2022

This started as a rework of the fit method in PredMap, but ended up being bigger. Here is a summary:

  1. Rework the PCA logic. With these modifications, multi-band rasters pass through a PCA dimensionality reduction right after the program reads the files. This makes the columns much easier to deal with downstream. However, this approach does not wait for the data split (between training and test). I assume this is reasonable because PCA is an unsupervised learning technique. Moreover, GIS users could perform a similar operation using another tool and enter a single-band raster instead.
  2. Remove the correlation figure. There is no particular reason for this; I am just not sure how much people used it. We can add it back if necessary.
  3. Clearly separate train, test, and "full data" operations. This helps prevent the previously identified bugs in which sklearn's fit_transform was applied to test data.
  4. Improve the classification performance report. Scikit-learn's classification report provides more detailed output than what the program used to write.
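
For reviewers, items 1, 3, and 4 can be sketched roughly as below. This is only an illustration on synthetic data, not the actual PredMap code: the array names, the `StandardScaler` step, and the `RandomForestClassifier` are assumptions chosen to make the example runnable.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical inputs: one 3-band raster and one single-band raster,
# both already flattened to (n_pixels, n_bands).
n_pixels = 400
multi_band = rng.normal(size=(n_pixels, 3))
single_band = rng.normal(size=(n_pixels, 1))
labels = (single_band[:, 0] + multi_band[:, 0] > 0).astype(int)

# Item 1: reduce the multi-band raster to a single component right after
# reading it (before the train/test split), so each raster is one column.
reduced = PCA(n_components=1).fit_transform(multi_band)
features = np.hstack([reduced, single_band])

# Item 3: split first, then fit any supervised-stage preprocessing on the
# training data only (the scaler here is an assumed example of such a step).
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)  # transform only, never fit_transform

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train_s, y_train)

# Item 4: scikit-learn's per-class precision/recall/F1 report.
report = classification_report(y_test, clf.predict(X_test_s))
print(report)
```

Calling `fit_transform` on the test set would re-estimate the scaling statistics from the test data, which is the kind of bug the train/test separation is meant to rule out.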

I think the old fit method had a lot of unnecessary actions. This version runs the test data much faster (~1 min) than the previous one (~3 min). I also changed some RandomizedSearchCV parameters, which partially (totally?) addresses #31 by printing more messages. These are significant changes, though, so we should check that everything works as expected. Drops in reported performance are expected now that the fit_transform bug is fixed.
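
As a rough idea of how RandomizedSearchCV can report progress, here is a minimal sketch. It assumes the extra messages come from the `verbose` parameter, which the description above does not state explicitly; the estimator, the parameter ranges, and the synthetic data are placeholders.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the training table.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(10, 50),  # upper bound is exclusive
        "max_depth": randint(2, 6),
    },
    n_iter=4,        # number of sampled parameter settings
    cv=3,
    verbose=2,       # print progress messages while the search runs
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Higher `verbose` values print more detail per fit, which gives the user some sign that the program is still running during a long search.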

@raplima raplima requested a review from marcosbr April 6, 2022 18:53
@raplima raplima merged commit 4def580 into main Apr 6, 2022
Successfully merging this pull request may close these issues.

Let user know when program is still running
2 participants