Add multiclass support #29

Open · wants to merge 16 commits into base: main

Conversation

MichalBrzozowski91 (Collaborator)

No description provided.

@mwachnicki mwachnicki self-assigned this Mar 6, 2024
@@ -1,5 +1,11 @@
 # Version History

+## 1.1.0 Feb 26, 2024
Collaborator:

This date should be consistent with the release date, so I mark it with a comment as a reminder :)

notebooks/requirements.txt (outdated, resolved)
tests/test_bert_truncated.py (outdated, resolved)
@mwachnicki (Collaborator) commented Mar 8, 2024

Summary of the issues we discussed today on the call:

  • Divide the classes into Classifier and Regressor
  • Use the current BertClassifier class as a base for both
  • Interface (see the sketch after this list):
    • Classification: a predict method returning classes and a predict_scores method returning probabilities
    • Regression: a predict method
  • Keep _predict_logits as a protected method that predict/predict_scores use to produce meaningful results
  • Refactor the class names
  • Make sure that the cross-entropy loss uses softmax and is the best choice in terms of numerical stability, etc.
    • Actually, the best option would be to expose the loss function as a parameter with a default value
  • Similarly, there could be an option to choose the optimizer and its parameters
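
A minimal sketch of how this split could look, assuming PyTorch (the class names follow the discussion, but the linear head and method bodies are illustrative placeholders, not the actual belt_nlp code):

    import torch
    from torch import Tensor, nn


    class BertBase:
        """Common base (stand-in for the current BertClassifier)."""

        def __init__(self, num_labels: int):
            # Placeholder head; the real model wraps a BERT encoder.
            self.head = nn.Linear(768, num_labels)

        def _predict_logits(self, x: Tensor) -> Tensor:
            # Protected helper: returns raw, unnormalized scores.
            return self.head(x)


    class BertClassifier(BertBase):
        def predict_scores(self, x: Tensor) -> Tensor:
            # Softmax only at prediction time; during training, raw logits
            # should go straight into nn.CrossEntropyLoss, which fuses
            # log_softmax and NLL for numerical stability.
            return torch.softmax(self._predict_logits(x), dim=-1)

        def predict(self, x: Tensor) -> Tensor:
            return self.predict_scores(x).argmax(dim=-1)


    class BertRegressor(BertBase):
        def __init__(self):
            super().__init__(num_labels=1)

        def predict(self, x: Tensor) -> Tensor:
            return self._predict_logits(x).squeeze(-1)

Under this shape, the loss-as-parameter idea would just be an extra constructor argument with a sensible default.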

@MichalBrzozowski91 (Collaborator, Author)

Thanks for the review :)
I added the main changes dividing the classes into Classifier and Regressor.
I did not implement the custom loss and optimizer:

  • For me, it violates the YAGNI principle: I am not sure this is needed now.
  • If necessary, I would rather add it in another PR, since that issue is separate from adding multiclass support.

@mwachnicki (Collaborator) left a comment:

Thank you, I really like how it looks now! 🙂

I agree that the loss and optimizer feature is not part of the multiclass support. I have just created the related issue: #43.

tests/test_bert_truncated.py (outdated, resolved)
belt_nlp/bert.py (resolved)
@@ -75,7 +78,7 @@ def __init__(
         self._params.update(additional_params)

         self.device = device
-        self.collate_fn = BertClassifierWithPooling.collate_fn_pooled_tokens
+        self.collate_fn = BertBaseWithPooling.collate_fn_pooled_tokens
Collaborator:

self.collate_fn_pooled_tokens would also work, and it would use a method overridden in a derived class (if one exists).

Collaborator (Author):

But this is a static method; it seems natural to me to call it via the class and not via an instance.

Collaborator:

You may be right in general, but this is a specific case, where the collate_fn_pooled_tokens method is called in an instance method of the same class.

If you create a new class deriving from BertBaseWithPooling and override the collate_fn_pooled_tokens method, you would probably expect the overridden method to be used in __init__, and that is why I think it would be safer to call it as self.collate_fn_pooled_tokens.

However, in this very specific case that scenario may be unlikely, so I just want to present my point of view; I leave the decision to you.
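
A tiny self-contained example of the difference being discussed (the method bodies here are made up; only the names mirror the PR):

    class BertBaseWithPooling:
        @staticmethod
        def collate_fn_pooled_tokens(batch):
            return batch  # base behaviour (illustrative)

        def __init__(self):
            # Via the class name, the base version is always used.
            # Via self, lookup goes through type(self), so a subclass
            # override is picked up instead.
            self.collate_fn = self.collate_fn_pooled_tokens


    class CustomPooling(BertBaseWithPooling):
        @staticmethod
        def collate_fn_pooled_tokens(batch):
            return list(reversed(batch))  # override (illustrative)


    model = CustomPooling()
    print(model.collate_fn([1, 2, 3]))  # prints [3, 2, 1]: the override is used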

belt_nlp/bert_regressor_truncated.py (resolved)
notebooks/requirements.txt (outdated, resolved)
notebooks/binary_classification/base.ipynb (resolved)
notebooks/multiclass/base.ipynb (outdated, resolved)
@mwachnicki (Collaborator)

I would also love to see the same basic tests for the Regressor classes.

from pathlib import Path
from shutil import rmtree

import numpy as np
Collaborator:

numpy is not used

Comment on lines +6 to +7
from belt_nlp.bert_regressor_truncated import BertRegressorTruncated
MODEL_PARAMS = {"batch_size": 1, "learning_rate": 5e-5, "epochs": 1, "device": "cpu"}
Collaborator:

There should be a blank line in between.
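
That is, the corrected excerpt would read:

    from belt_nlp.bert_regressor_truncated import BertRegressorTruncated

    MODEL_PARAMS = {"batch_size": 1, "learning_rate": 5e-5, "epochs": 1, "device": "cpu"}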


x_test = ["nice"] * 99 + ["bad"] * 1

model.fit(x_train, y_train)
Collaborator:

Right now, y_train expects list[bool], so the type hint should be updated.
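
A possible widened hint (illustrative only; the actual fit signature in belt_nlp may differ):

    from __future__ import annotations


    class BertBase:  # fragment, for illustration
        def fit(self, x_train: list[str], y_train: list[bool] | list[float]) -> None:
            ...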

assert scores.shape == torch.Size([2, 1])


def test_regression_order():
Collaborator:

I think that for both classifiers and regressors, prediction_order is the appropriate name.

predictions is used in the docstring anyway ;)
