[Outreachy applications] Traversal of the space of train/test splits #3
Comments
Hello, I would like to work on this issue.
I am working on the tabulated form and including graphs too. Is anything else required?
I have submitted a PR regarding this issue; kindly review.
I will work on this issue.
@Addi-11 I saw your PR; we can discuss further there. Yes, the requirement is a function that returns the tabular form.
I will work on this issue.
Hi! Yesterday, after my pull request, I realised that my solution for issue #2 also addresses this one. I have no experience with git, so I have no clue how to relate the two issues or how I should proceed so that the pull request is also connected to this issue.
* visual for eeg
* code restructured
* #3 data-split space mapped
* tabulated relation between k and evaluation metrics
* gain-lift charts of models
* interpretation added
* visual for eeg
* code restructured
* #3 data-split space mapped
* fixes issue3
* studied data splits for all classifiers
* added graph in the loop
* docstrings added
* validation sets added
* formatting
* evaluated all classifiers
* compared models
* result added
* calibration plot added
* docstrings
These committed changes fix issue #3, traversal of the space of train/test splits, using a KNN model. In #2 I used a decision tree and additionally recommended an outlier detection algorithm for classification, so in this PR I have used KNN and compared the results with the previous classification. This PR reuses the modules already defined in #2.
These committed changes fix issue #4, traversal of the space of cross-validation folds. Here the hyperparameter-tuned model obtained from the PR for #3 is used as the KNN model, and both k-fold and its variant, stratified k-fold, are used to evaluate the accuracy of the KNN classification while varying the number of folds. The mean score is used as the evaluation metric.
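A minimal sketch of the fold traversal described in the comment above, assuming scikit-learn and pandas; the wine data set and the KNN hyperparameters are illustrative placeholders, not the tuned model or data from the PR:

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data and hyperparameters, standing in for the tuned KNN model.
X, y = load_wine(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)

rows = []
for n_folds in range(2, 11):
    for scheme, cv in [
        ("k-fold", KFold(n_splits=n_folds, shuffle=True, random_state=0)),
        ("stratified k-fold", StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)),
    ]:
        # Mean cross-validated accuracy at this number of folds.
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        rows.append({"folds": n_folds, "scheme": scheme, "mean_score": scores.mean()})

print(pd.DataFrame(rows))
```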
* visual for eeg
* code restructured
* #3 data-split space mapped
* fixes issue3
* studied data splits for all classifiers
* added graph in the loop
* docstrings added
* validation sets added
* formatting
* evaluated all classifiers
* compared models
* result added
* indexed
* removed plot-recall-curve
* learning-curve
* added models
* env refresh
* final estimate added
* black formats
* conclusion added
* visual for eeg
* code restructured
* #3 data-split space mapped
* tabulated relation between k and evaluation metrics
* gain-lift charts of models
* auc-roc implemented
* fixes issue3
* studied data splits for all classifiers
* added graph in the loop
* docstrings added
* validation sets added
* formatting
* evaluated all classifiers
* compared models
* result added
* interpretation added
* docstring, interpretation added
* indexed
* removed plot-recall-curve
* shorten PR
* conflict resolve

Co-authored-by: mlopatka <mlopatka@users.noreply.github.com>
* Update .gitignore
* Preliminary Analysis
* Helper modules (Bar and Hist graph)
* Rough KNN algorithm implemented
* Delete libraries.py
* KNN classifier refactored and polished: returns only the variables of interest for use in the metrics calculations
* refactored for performance: just the required functions imported
* draft mlp classifier implemented, to be reviewed
* ...
* Threshold conversion logic implemented: since knn.predict calculates a probability, we implement logic for binary classification
* Preliminary cleaning and knn model classification implemented!
* Adjusted plot error with title placement
* ...
* Files reformatted with 'Black'
* Logistic Regression classifier
* Refactored modules to improve modularity
* Implemented Log Reg
* Deleted mlp module to focus on knn and log reg
* Refactors gitignore to my personal folder
* refactored for readability
* Implementation to add counts and relative percentages on bar graphs
* Refactored name #2, completed Preliminary Analysis and interpreted results
* Update Issue #2 - Train and test a classification model (PRESC).ipynb
* Files reformatted with 'Black'
* Display error corrected
* Interpreted choice of hyper-parameters
* Refactored and added modules used for Issue 3
* Preliminary Analysis - Traversal of the space of train_test splits
* Issue #3 complete
* Removed Issues #2 and #3 ipynb
* Issue #4 completed - Traversal of the space of cross-validation folds
* Delete defaults_data.csv: removing duplication of the existing data set, which can be loaded from the repo's root directory

Co-authored-by: mlopatka <mlopatka@users.noreply.github.com>
* fixes #8
* fixes #4, attempt 1
* updated misclassification graph and broke down functions
* first attempt to fix #3
* implemented all change requests
* formatted code for all helper files
* minor fix
* fixed code formatting issues and removed extra file
* fixed code formatting, added docstring to func
* fixed relative path
* fixed all changes requested
* fixed relative path in notebook
* fixing conflict with some file changes
* last attempt at fixing conflicts
* visual for eeg
* code restructured
* #3 data-split space mapped
* fixes issue3
* studied data splits for all classifiers
* added graph in the loop
* docstrings added
* validation sets added
* formatting
* evaluated all classifiers
* compared models
* result added
* indexed
* removed plot-recall-curve
* env refresh
* final estimate added
* #7 Visualization for misclassification
* Comparing test sample classifications between models: I compared the random forest and k-nearest neighbors classifier models and used a bar chart to visualize the classification of the test set
* added probability to misclassification visualization
* new misclassification visualization method used
* moved into misclassification_visualization folder
* moved to misclassification visualization folder
* Traversal of the space of train-test splits
* fixed file path and did better visualization
* Update #7 visualization for misclassifications.ipynb
* Update misclassification_function.py
* made changes to #7
* Delete Traversal of the space of train-test splits #3.ipynb
* Delete traversal_function.py
* Traversal of the space of train-test splits #3
* visual for eeg
* code restructured
* #3 data-split space mapped
* fixes issue3
* studied data splits for all classifiers
* added graph in the loop
* docstrings added
* validation sets added
* formatting
* evaluated all classifiers
* compared models
* result added
* indexed
* removed plot-recall-curve
* env refresh
* final estimate added
* method1
* method1-complete
* formats
Given a classification model, we want to investigate how much the performance score computed on the test set depends on the choice of train/test split proportion. For example, how would our performance estimate change if we used a 60/40 split rather than an 80/20 split?
Write a function that takes a scikit-learn estimator and a dataset, and computes an evaluation metric over a grid of train/test split proportions ranging from 0 to 100%. To assess variability, it should resplit the data and recompute the metric multiple times for each split proportion. The output should be a table of split proportions with multiple metric values per split.
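A minimal sketch of one way such a function could look, assuming scikit-learn and pandas; the function name, the grid spacing, the repeat count, and the choice of accuracy as the metric are illustrative assumptions, not requirements from the issue:

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def traverse_train_test_splits(estimator, X, y, test_sizes=None, n_repeats=10, seed=0):
    """Recompute an evaluation metric over a grid of train/test split proportions.

    Returns a table with one metric value per (test_size, repeat) pair.
    """
    if test_sizes is None:
        test_sizes = np.arange(0.1, 1.0, 0.1)  # hold out 10% .. 90% of the data
    rows = []
    for test_size in test_sizes:
        for repeat in range(n_repeats):
            # A fresh random split for every repeat at this proportion.
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=test_size, random_state=seed + repeat
            )
            model = clone(estimator).fit(X_train, y_train)
            rows.append({
                "test_size": round(float(test_size), 2),
                "repeat": repeat,
                "accuracy": accuracy_score(y_test, model.predict(X_test)),
            })
    return pd.DataFrame(rows)
```

Grouping the resulting table by split proportion then shows how sensitive the performance estimate is to the choice of split, for example:

```python
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
table = traverse_train_test_splits(KNeighborsClassifier(), X, y)
print(table.groupby("test_size")["accuracy"].agg(["mean", "std"]))
```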