
MPCA pipeline #91

Merged: 14 commits merged into master from the MPCA-Pipeline branch on Apr 11, 2021

Conversation

@shuo-zhou (Member) commented Apr 10, 2021

PR for the MPCA pipeline card.

Description

Pipeline: MPCA -> feature selection by Fisher score -> SVM/logistic regression.
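A minimal usage sketch of the proposed pipeline (the MPCATrainer class name and constructor signature are assumed here for illustration; the module path follows kale/pipeline/mpca_trainer.py):

import numpy as np
from kale.pipeline.mpca_trainer import MPCATrainer  # assumed class name

x = np.random.random((20, 20, 25, 20))  # e.g. 20 small 3-D images
y = np.random.randint(0, 2, 20)

trainer = MPCATrainer(classifier="svc", classifier_params="auto")  # assumed signature
trainer.fit(x, y)
y_pred = trainer.predict(x)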

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • In-line docstrings updated and documentation docs updated.

@shuo-zhou added the enhancement (Improvement of existing code) and tests (Tests and coverage) labels on Apr 10, 2021
@shuo-zhou added this to In progress in v0.1.0 via automation on Apr 10, 2021
@codecov-io commented Apr 10, 2021

Codecov Report

Merging #91 (9c6385d) into master (8d30ff8) will increase coverage by 1.93%.
The diff coverage is 91.04%.


@@            Coverage Diff            @@
##           master     #91      +/-   ##
=========================================
+ Coverage    4.47%   6.41%   +1.93%     
=========================================
  Files          36      37       +1     
  Lines        2927    2994      +67     
=========================================
+ Hits          131     192      +61     
- Misses       2796    2802       +6     
Impacted Files Coverage Δ
kale/pipeline/mpca_trainer.py 91.04% <91.04%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d30ff8...9c6385d.

@haipinglu changed the title from "Mpca pipeline" to "MPCA pipeline" on Apr 10, 2021
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
Member:

Have you considered LinearSVC (liblinear)? Or, can we make it an option (with one as the default)? SVC uses libsvm while LinearSVC uses liblinear.

I noticed this when checking https://scikit-learn.org/stable/modules/feature_selection.html and found that the examples there use LinearSVC rather than SVC.

Did you check the Matlab SVM code I sent you to see which version was used?

Member Author:

The SVC object has a predict_proba method, which can give the probability of each class, while LinearSVC does not.

Member:

If LinearSVC gives better accuracy and the user does not care about the probability, then it is still a decent choice. How often have we reported the probability of each class in our papers so far?

I understand probability is a nice feature, but I do not consider it essential or a must-have.

Member Author:

Probability is a feature requested by Cameron. How about making it optional, supporting both LinearSVC and SVC(kernel="linear")?

Member:

A feature requested by one user does not mean we should enforce it for all users; otherwise, sklearn would not offer LinearSVC. It is good to have options.
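A minimal sketch of how this option could be exposed (the helper name and option keys are illustrative, not the merged API): SVC(kernel="linear", probability=True) keeps predict_proba for users who need class probabilities, while LinearSVC (liblinear) is available for those who do not.

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

def build_classifier(name="svc"):
    # "svc" keeps probability estimates; "linear_svc" trades them for liblinear speed.
    options = {
        "svc": SVC(kernel="linear", probability=True),
        "linear_svc": LinearSVC(),
        "lr": LogisticRegression(),
    }
    return options[name]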


from ..embed.mpca import MPCA

classifiers = {"svc": [SVC, {"kernel": ["linear"], "C": np.logspace(-3, 2, 6)}],
Member:

Have you checked the Matlab code options?
Does this cover the options used in our CMR Matlab code? It seems that we used the default there, which seems to be 1/n (an adaptive value) according to https://uk.mathworks.com/help/stats/fitclinear.html#d123e312449
At least this value of C should be covered.

Member Author:

There is a close value in np.logspace(-3, 2, 6), which gives the list [0.001, 0.01, 0.1, 1, 10, 100]. The optimal value of C will be determined by grid search if classifier_params="auto". I will add 1/n to this list if necessary.

Member:

@sz144 1/n is an interesting and smart choice because when n --> infinity, 1/n --> 0 and no regularisation is needed.
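As a sketch, adding the adaptive value to the grid could look like this (n_samples is an illustrative name for the number of training samples):

import numpy as np

n_samples = 100  # illustrative
c_grid = np.append(np.logspace(-3, 2, 6), 1.0 / n_samples)  # [0.001, 0.01, 0.1, 1, 10, 100, 1/n]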

from ..embed.mpca import MPCA

classifiers = {"svc": [SVC, {"kernel": ["linear"], "C": np.logspace(-3, 2, 6)}],
"lr": [LogisticRegression, {"C": np.logspace(-3, 2, 6)}]}
Member:

Again, consider defining the repeated value np.logspace(-3, 2, 6) as a (global) variable.
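For example, a sketch mirroring the existing dictionary:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

C_GRID = np.logspace(-3, 2, 6)  # defined once at module level
classifiers = {"svc": [SVC, {"kernel": ["linear"], "C": C_GRID}],
               "lr": [LogisticRegression, {"C": C_GRID}]}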

classifiers = {"svc": [SVC, {"kernel": ["linear"], "C": np.logspace(-3, 2, 6)}],
"lr": [LogisticRegression, {"C": np.logspace(-3, 2, 6)}]}

default_search_params = {'cv': 5}
Member:

We did not do CV in Matlab, see https://uk.mathworks.com/help/stats/fitclinear.html#d123e314573
Is it an option here or a must? It may not be necessary, considering how Matlab deals with it (and we get better results there). CV on a small sample may overfit as well.

Member Author:

CV is used to determine the value of C only if classifier_params is set to "auto".

Member:

OK. Some light documentation/comments in the tests will be helpful for review / reading / future changes.
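A minimal sketch of the behaviour described above, with assumed variable names (grid search with cv=5 is only built when classifier_params == "auto"; otherwise the supplied parameters are used directly):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

classifier_params = "auto"  # or a dict such as {"C": 1.0}
if classifier_params == "auto":
    # CV is used only here, to pick C by grid search.
    clf = GridSearchCV(SVC(kernel="linear"),
                       param_grid={"C": np.logspace(-3, 2, 6)}, cv=5)
else:
    clf = SVC(kernel="linear", **classifier_params)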

classifier (str, optional): Classifier for training. Options: support vector machine (svc) or
logistic regression (lr). Defaults to 'svc'.
classifier_params (dict, optional): Parameters of classifier. Defaults to 'auto'.
mpca_params (dict, optional): Parameters of Multi-linear PCA. Defaults to None.
Member:

Why Multi-linear here? Be consistent.

Member Author:

Will change

self.auto_classifier_param = True
clf_param_gird = classifiers[classifier][1]
self.grid_search = GridSearchCV(classifiers[classifier][0](),
param_grid=clf_param_gird,
Member:

typo: gird

else:
f_score, p_val = f_classif(x_proj, y)
self.feature_order = (-1 * f_score).argsort()
x_proj = x_proj[:, self.feature_order][:, :self.n_features]
Member:

Would a new name be better for the selected features from x_proj?

Member Author:

Is x_train better?

Member:

Who can tell the difference? Isn't x also x_train?
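For reference, a self-contained sketch of the Fisher-score selection step under discussion (x_selected is just an illustrative name for the reduced matrix, to distinguish it from x_proj):

import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
x_proj = rng.normal(size=(20, 10))   # stand-in for the MPCA-projected features
y = rng.integers(0, 2, size=20)
n_features = 5

# Rank features by ANOVA F-value (Fisher-style score) and keep the top n_features.
f_score, _ = f_classif(x_proj, y)
feature_order = (-f_score).argsort()
x_selected = x_proj[:, feature_order[:n_features]]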

check_is_fitted(self.clf)

x_proj = self.mpca.transform(x)
x_new = x_proj[:, self.feature_order][:, :self.n_features]
Member:

Use a consistent naming convention: here it is x_new, but x_proj was reused above.

trainer.fit(x, y)
y_pred = trainer.predict(x)
testing.assert_equal(np.unique(y), np.unique(y_pred))
assert accuracy_score(y, y_pred) >= 0.8
Member:

Expected training error (to be >0.8)?

Member Author:

Training accuracy > 0.8, i.e. training error < 0.2.

@shuo-zhou (Member Author) left a comment:

Thanks for the comments.


assert accuracy_score(y, y_pred) >= 0.8

if classifier == "linear_svc":
with pytest.raises(Exception):
Member:

Very glad to learn this from you. Thanks.
This may be another important piece of documentation for developers to refer to (besides fixtures): https://docs.pytest.org/en/stable/assert.html#assertions-about-expected-exceptions
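For reference, the idiom from the linked pytest docs in its simplest form:

import pytest

def test_expected_exception():
    # The test passes only if the block raises the named exception.
    with pytest.raises(ZeroDivisionError):
        1 / 0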

@haipinglu (Member) commented Apr 10, 2021

@sz144 In-line docstrings seem fine, but please have the documentation in docs updated before merging. Otherwise, this new API won't appear in the docs. Thanks.
This is actually the last checkbox in the description, which you left unchecked/ignored. You should complete and tick the applicable checkboxes when done; they are a checklist for you to review.

@shuo-zhou (Member Author) replied:

Thank you for pointing this out. The last checkbox is now checked. Is there anything else I need to do before merging?

@haipinglu (Member) commented Apr 11, 2021

@sz144 Come on. Have you done the docs update? I made the text "have documentation in docs updated" bold. How can you tick something that you have not done yet? Could you check the docs yourself before asking me to check?


Args:
classifier (str, optional): Classifier for training. Options: support vector machine (svc) or
logistic regression (lr). Defaults to 'svc'.
Member:

Outdated docstring. There are now three options, and you need to explain them a bit so users can learn the difference from the docs rather than having to read the code.
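A possible revision, as a sketch (exact wording is up to the author; the option names follow the test code: svc, linear_svc, lr):

classifier (str, optional): Classifier for training. Options: "svc" (sklearn.svm.SVC with a linear
    kernel; supports predict_proba), "linear_svc" (sklearn.svm.LinearSVC, liblinear-based, no
    probability estimates), or "lr" (logistic regression). Defaults to "svc".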

# Haiping Lu, h.lu@sheffield.ac.uk or hplu@ieee.org
# =============================================================================

"""Implementation of MPCA->Feature Selection->Linear SVM/LogisticRegression Pipeline
Member:

Add references to papers using this pipeline:
For Cardiac MRI: https://doi.org/10.1093/ehjci/jeaa001
For brain fMRI: https://doi.org/10.1007/978-3-319-24553-9_75
For gait videos (though KNN rather than SVM/LR was used): https://doi.org/10.1109/TNN.2007.901277

classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention
(pp. 613-620). Springer, Cham.
[3] Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2008). MPCA: Multilinear principal component analysis of
tensor objects. IEEE transactions on Neural Networks, 19(1), 18-39.
Member:

"transactions" needs to be capitalized.

@haipinglu merged commit 16cb102 into master on Apr 11, 2021
v0.1.0 automation moved this from In progress to Done on Apr 11, 2021
@haipinglu deleted the MPCA-Pipeline branch on April 11, 2021 09:56
Labels: enhancement (Improvement of existing code), tests (Tests and coverage)
Projects: v0.1.0 (Done)
Linked issues: None yet
3 participants