[Outreachy applications] Traversal of the space of train/test splits #3

dzeber · 2020-03-04T20:50:42Z

Given a classification model, we want to investigate how much the performance score computed on the test set depends on the choice of train/test split proportion. E.g., how would our performance estimate change if we used a 60/40 split rather than 80/20?

Write a function that takes a scikit-learn estimator and a dataset, and computes an evaluation metric over a grid of train/test split proportions from 0 to 100%. To assess variability, for each split proportion it should resplit and recompute the metric multiple times. It should output a table of splits with multiple metric values per split.
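To make the request concrete, here is a minimal sketch of what such a function could look like. It is an illustration under stated assumptions, not the accepted solution: the name `traverse_splits`, its parameters, the default accuracy metric, the stratified splitting, and the long-format output are all hypothetical choices. It also assumes the grid skips the 0% and 100% endpoints, since an empty training or test set cannot be fitted or scored.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def traverse_splits(estimator, X, y, test_sizes=None, n_repeats=10,
                    metric=accuracy_score, random_state=0):
    """Score `estimator` over a grid of test-set proportions.

    Returns a long-format table with one row per (test_size, repeat).
    All names and defaults here are illustrative assumptions.
    """
    if test_sizes is None:
        # Assumption: 10% to 90% in steps of 10%; the endpoints 0 and 1
        # are excluded because one side of the split would be empty.
        test_sizes = np.round(np.linspace(0.1, 0.9, 9), 2)
    rng = np.random.RandomState(random_state)
    rows = []
    for test_size in test_sizes:
        for repeat in range(n_repeats):
            # Draw a fresh seed so every repeat uses a different split.
            seed = rng.randint(0, 2**31 - 1)
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=float(test_size),
                random_state=seed, stratify=y,
            )
            # Clone so that each fit starts from an unfitted estimator.
            model = clone(estimator).fit(X_train, y_train)
            rows.append({
                "test_size": float(test_size),
                "repeat": repeat,
                "score": metric(y_test, model.predict(X_test)),
            })
    return pd.DataFrame(rows)
```

The result can be summarised per split with something like `df.groupby("test_size")["score"].agg(["mean", "std"])`, which shows both the average score and its variability at each proportion.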

Addi-11 · 2020-03-07T13:11:24Z

Hello, I would like to work on this issue.

Addi-11 · 2020-03-07T20:08:18Z

Is this the requirement?

Addi-11 · 2020-03-07T20:09:18Z

I am working on the tabulated form and including graphs too. Is anything else required?

Addi-11 · 2020-03-07T23:44:09Z

I have submitted a PR regarding this issue; kindly review.

shashigharti · 2020-03-10T10:16:48Z

I will work on this issue.

dzeber · 2020-03-11T00:14:12Z

@Addi-11 I saw your PR; we can discuss further there. Yes, the requirement is a function that returns the tabular form.

asthad16 · 2020-03-17T06:51:25Z

I will work on this issue.

alberginia · 2020-03-17T15:13:17Z

Hi! Yesterday, after my pull request, I realised that my solution for issue #2 actually also addresses this one. I have no experience with git, so I have no clue how to relate the two issues or how I should proceed so that the pull request is also connected to this issue.


These committed changes fix issue #3 (traversal of the space of train/test splits) using a KNN model. In #2 I used a decision tree and further recommended an outlier detection algorithm for classification, so in this PR I have used KNN and compared the results with the previous classification. This PR uses modules already defined in #2.

#3 traversal of train_test_split

asthad16 · 2020-03-25T09:13:44Z

I have worked on issue #3. I request you to please review my PR #122.

These committed changes fix issue #4 (space traversal of k-fold). Here the hyperparameter-tuned model obtained in the PR for #3 is used as the KNN model, and k-fold as well as its variant, stratified k-fold, are used to evaluate the accuracy of the KNN classification while varying the number of folds. The mean_score is used as the evaluation metric.
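A rough sketch of the fold traversal described above could look like the following. It is not the code from that PR: the function name `traverse_folds`, the grid of fold counts, the shuffling, and the output columns other than `mean_score` are assumptions made for illustration. `cross_val_score` uses the estimator's default scorer, which is accuracy for scikit-learn classifiers.

```python
import pandas as pd
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score


def traverse_folds(estimator, X, y, fold_counts=(2, 3, 5, 10), random_state=0):
    """Compare plain and stratified k-fold scores over several fold counts.

    Hypothetical helper: names and defaults are illustrative only.
    """
    rows = []
    for k in fold_counts:
        splitters = {
            "kfold": KFold(n_splits=k, shuffle=True,
                           random_state=random_state),
            "stratified_kfold": StratifiedKFold(n_splits=k, shuffle=True,
                                                random_state=random_state),
        }
        for name, cv in splitters.items():
            # One score per fold; the estimator is cloned internally.
            scores = cross_val_score(estimator, X, y, cv=cv)
            rows.append({
                "n_folds": k,
                "splitter": name,
                "mean_score": scores.mean(),
                "std_score": scores.std(),
            })
    return pd.DataFrame(rows)
```

Calling it with, for example, `KNeighborsClassifier()` and a labelled dataset gives one row per combination of fold count and splitter, so the two k-fold variants can be compared side by side.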


dzeber mentioned this issue Mar 4, 2020

[Outreachy applications] Traversal of the space of cross-validation folds #4

Closed

Addi-11 added a commit to Addi-11/PRESC that referenced this issue Mar 7, 2020

mozilla#3 data-split space mapped

b65565c

tab1tha mentioned this issue Mar 10, 2020

Train test ratio #43

Merged

Sidrah-Madiha mentioned this issue Mar 11, 2020

Traversal of the space of train test splits, fixes #3 #46

Merged

This was referenced Mar 18, 2020

[Outreachy applications] Covariate Shift #78

Closed

[ Fixes: #78 ] Covariate Data #80

Closed

mlopatka pushed a commit that referenced this issue Mar 20, 2020

Lift-Gain Charts for classification models (#39)

3ba024c

* visual for eeg * code restructured * #3 data-split space mapped * tabulated relation btw k and evaluation metrics * gain-lift charts of models * interprtation added

mlopatka closed this as completed in 1218953 Mar 20, 2020

mlopatka reopened this Mar 20, 2020

namrathagopalabhatla mentioned this issue Mar 21, 2020

WIP: LR and MLP models for vehicles.csv, please review evaluation methods, unsure about required plots #47

Merged

Bolaji61 added a commit to Bolaji61/PRESC that referenced this issue Mar 22, 2020

Submitting my first solution to issue mozilla#3

e06f2c4

Bolaji61 mentioned this issue Mar 22, 2020

Fixed #3, Traversal of the space of train/test splits. #101

Merged

Bolaji61 added a commit to Bolaji61/PRESC that referenced this issue Mar 25, 2020

Fixed issue mozilla#3 completely

18e3341

asthad16 mentioned this issue Mar 25, 2020

#3 traversal of train_test_split asthad16/PRESC#2

Merged

asthad16 referenced this issue in asthad16/PRESC Mar 25, 2020

Merge pull request #2 from asthad16/asthad16-issue3-train_test_split

f9f64f4

#3 traversal of train_test_split

asthad16 mentioned this issue Mar 25, 2020

Asthad16 issue3 train test split #122

Merged

elie-wanko mentioned this issue Mar 27, 2020

Issue#3 - Traversal of the space of traintest splits #109

Closed

mlopatka closed this as completed in #46 Mar 27, 2020

mlopatka reopened this Mar 27, 2020

iamarchisha mentioned this issue Mar 29, 2020

[ Fixes: #78 ] Covariate Data #136

Merged

mlopatka closed this as completed in 73876fb Mar 30, 2020

dzeber mentioned this issue Mar 30, 2020

Comparing different train test ratios #99

Closed

mlopatka reopened this Mar 30, 2020

msmelo mentioned this issue Apr 1, 2020

Traversal of train test splits and cross validation and Visualization for misclassifications #142

Merged

dzeber pushed a commit that referenced this issue Apr 2, 2020

Merge pull request #3 from asthad16/asthad16-#4_kfold

13c8af8

Add files via upload

mhmohona mentioned this issue Apr 3, 2020

Traversal of the space of train/test splits #147

Merged

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Apr 3, 2020

mozilla#3 Traversal of the space of train test splits, eeg.csv

3fad0ac

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Apr 3, 2020

mozilla#3 Traversal of the space of train/test splits, eeg.csv

f08b52b

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Apr 3, 2020

Delete mozilla#3 Traversal of the space of train test splits, eeg.csv…

5bb0118

….ipynb

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Apr 3, 2020

mozilla#3 Traversal of the space of train test splits, eeg.csv

42569cb

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Apr 3, 2020

mozilla#3 Traversal of the space of train/test splits, eeg.csv

3adbe69

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Apr 3, 2020

mozilla#3 Traversal of the space of train/test splits, eeg.csv

4dbe3af

asthad16 mentioned this issue Apr 5, 2020

general train-test split and k-fold evaluation #158

Merged

dzeber changed the title ~~Traversal of the space of train/test splits~~ [Outreachy applications] Traversal of the space of train/test splits Jul 13, 2020

dzeber closed this as completed Jul 14, 2020

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Aug 27, 2020

Update mozilla#3 Traversal of the space of train test splits, eeg.csv…

3a7d7de

….ipynb

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Aug 27, 2020

Update mozilla#3 Traversal of the space of train test splits, eeg.csv…

ae1949c

….ipynb

arizzogithub added a commit to arizzogithub/PRESC that referenced this issue Jul 3, 2022

Update mozilla#3 Traversal of the space of train test splits, eeg.csv…

3e27100

….ipynb
