load model from path/str to generate Card #205

p-mishra1 · 2022-10-28T15:41:48Z

#96 Added required changes and followed the pattern @BenjaminBossan advised.
Creating PR from another branch as the last one had too many commits and reverts.
@BenjaminBossan have a look.

BenjaminBossan

@p-mishra1 Thanks a lot for listening to the feedback, I like the refactor you did. This PR is now close to being ready. I have a few minor comments, please take a look. Also, please add an entry to the changes.rst.

BenjaminBossan · 2022-10-31T10:09:31Z

skops/card/_model_card.py

+    """Loads the mddel if provided a file path, if already a model instance return it unmodified.
+
+    Parameters
+    ----------
+    model : pathlib.path, str, or sklearn estimator
+        Path/str or the actual model instance. if a Path or str, loads the model on first call.


Could you please add line length limits to the docstring as well?

BenjaminBossan · 2022-10-31T10:10:44Z

skops/card/_model_card.py

+
+    model_path = Path(model)
+    if not model_path.exists():
+        raise ValueError("Model file does not exist")


Could you please add a test case for this potential error? Also, FileNotFoundError is a more precise error.
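
A minimal sketch of what this comment asks for, not the PR's actual code. The names load_model_sketch and raises_file_not_found are hypothetical, and the sketch uses the standard-library pickle module instead of the loaders used in skops. It raises FileNotFoundError for a missing path; the small helper stands in for the test case.

```python
import pickle
from pathlib import Path


def load_model_sketch(model):
    """Hypothetical stand-in for the loader discussed in this PR."""
    # Anything that is not a str or Path is treated as a model instance.
    if not isinstance(model, (str, Path)):
        return model

    model_path = Path(model)
    if not model_path.exists():
        # FileNotFoundError is more precise than ValueError for a missing file.
        raise FileNotFoundError(f"File is not present: {model_path}")

    with open(model_path, "rb") as f:
        return pickle.load(f)


def raises_file_not_found(path):
    """Return True if loading ``path`` raises FileNotFoundError."""
    try:
        load_model_sketch(path)
    except FileNotFoundError:
        return True
    return False
```

In a pytest suite the same check would usually be written with pytest.raises(FileNotFoundError) rather than a hand-written helper.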

BenjaminBossan · 2022-10-31T10:11:32Z

skops/card/_model_card.py

@@ -172,7 +210,7 @@ class Card:

    Parameters
    ----------
-    model: estimator object
+    model: pathlib.path, str, or sklearn estimator object
        Model that will be documented.


Could you please extend the docstring, now that we have more options here?

I changed it to "pathlib.path, str, or sklearn estimator object".
What else should we add?

How about something along the lines that if a str or Path is passed, the model will be loaded. This could be surprising behavior for some users and should be mentioned.

BenjaminBossan · 2022-10-31T10:19:05Z

skops/card/tests/test_card.py

+    assert loaded_model_str.n_jobs is model0.n_jobs
+    assert loaded_model_path.n_jobs is model0.n_jobs
+    assert loaded_model_path.n_jobs is loaded_model_str.n_jobs


Suggested change

-    assert loaded_model_str.n_jobs is model0.n_jobs
-    assert loaded_model_path.n_jobs is model0.n_jobs
-    assert loaded_model_path.n_jobs is loaded_model_str.n_jobs
+    assert loaded_model_str.n_jobs == model0.n_jobs
+    assert loaded_model_path.n_jobs == model0.n_jobs
+    assert loaded_model_path.n_jobs == loaded_model_str.n_jobs

Although "is" works in this case, let's use "==" to compare numbers and then we don't have to worry about the size of the number or the Python interpreter :)
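
A small illustration of the point, as a sketch only. The two integers are built at runtime so that they are separate computations; whether they end up being the same object depends on the interpreter's caching, which is why the identity result is deliberately not asserted here.

```python
# Build two equal integers at runtime instead of writing the same literal twice.
a = int("123456789")
b = int("123456789")

# Equality compares values and is always reliable for numbers.
values_equal = a == b

# Identity compares objects. For ints the outcome depends on the interpreter
# (CPython, for example, caches small ints), so it should not be relied upon.
same_object = a is b

print(values_equal)
```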

BenjaminBossan · 2022-10-31T10:20:34Z

skops/card/tests/test_card.py

+        assert card_from_model0.model.n_jobs == card_from_str.model.n_jobs
+
+    @pytest.mark.parametrize("suffix", [".pkl", ".pickle", ".skops"])
+    def test_model_card_path(self, suffix):


I think this test could be unified with the previous one, as there is only one line difference, by parametrizing it over whether the save_file is converted to Path or not.
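
A dependency-free sketch of the idea, with hypothetical names (load_from_file, check_roundtrip) and a plain dict in place of an estimator. In the test suite this would presumably be expressed with pytest.mark.parametrize over str and Path; the loop below only shows that one test body can serve both cases.

```python
import os
import pickle
import tempfile
from pathlib import Path


def load_from_file(model_file):
    """Hypothetical loader that accepts either a str or a Path."""
    with open(model_file, "rb") as f:
        return pickle.load(f)


def check_roundtrip(file_type):
    """One test body, run once per ``file_type`` (``str`` or ``Path``)."""
    fd, save_file = tempfile.mkstemp(suffix=".pkl", prefix="sketch-")
    try:
        # os.fdopen takes ownership of the descriptor and closes it on exit.
        with os.fdopen(fd, "wb") as f:
            pickle.dump({"n_jobs": 123}, f)
        # The only line that differs between the two original tests.
        loaded = load_from_file(file_type(save_file))
        return loaded["n_jobs"]
    finally:
        os.remove(save_file)


results = [check_roundtrip(file_type) for file_type in (str, Path)]
```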

BenjaminBossan · 2022-10-31T10:23:43Z

skops/card/tests/test_card.py

+    def test_model_is_str_pickle(self, destination_path):
+        model0 = LinearRegression(n_jobs=123)
+        f_name0 = destination_path / "lin_reg.pickle"
+        with open(f_name0, "wb") as f:


Can we use save_model_to_file in this test?

p-mishra1 · 2022-10-31T16:30:36Z

@p-mishra1 Thanks a lot for listening to the feedback, I like the refactor you did. This PR is now close to being ready. I have a few minor comments, please take a look. Also, please add an entry to the changes.rst.

Thanks to you, @BenjaminBossan, for patiently guiding me and giving me feedback every time.

BenjaminBossan

Nice, we're almost done here.

From my point of view, there are only 3 minor issues left:

Adjusting the docstring as discussed
Remove the debug print (see comment)
Add an entry to changes.rst

BenjaminBossan · 2022-11-01T12:31:02Z

skops/card/tests/test_card.py

+        model0 = LinearRegression(n_jobs=123)
+        save_file = save_model_to_file(model0, suffix)
+        save_file = file_type(save_file)
+        print(type(save_file))


Please remove :)

Oh, sorry, I left it in by mistake.
One more thing @BenjaminBossan: I'm unable to understand where I should add the entry in changes.rst. It will go under v0.3, right?

It will go under v0.3, right?

Exactly, just add an entry to the end of the list. Also, since you're not on the contributor list yet, add yourself as a contributor at the very end of the document.

p-mishra1 · 2022-11-01T15:16:19Z

@BenjaminBossan the tests were failing on Windows, so I changed the function

def save_model_to_file(model_instance, suffix):
    save_file_handle, save_file = tempfile.mkstemp(suffix=suffix, prefix="skops-test")
    if suffix in (".pkl", ".pickle"):
        with open(save_file, "wb") as f:
            pickle.dump(model_instance, f)
    elif suffix == ".skops":
        dump(model_instance, save_file)
    return save_file_handle, save_file

This way I can close the file handle before removing the file; somehow that is not a problem on Linux/Mac.
I think that is the issue here.
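
For context, tempfile.mkstemp returns both an open OS-level file descriptor and the path, and on Windows a file that still has an open handle generally cannot be removed. The following is a minimal sketch of closing the descriptor first; save_to_temp_file is a hypothetical helper, and a plain dict stands in for the estimator.

```python
import os
import pickle
import tempfile


def save_to_temp_file(obj, suffix=".pkl"):
    """Pickle ``obj`` to a temporary file and return its path."""
    # mkstemp returns an already open OS-level file descriptor and the path.
    fd, path = tempfile.mkstemp(suffix=suffix, prefix="skops-test")
    # Close the descriptor right away so that no handle stays open.
    os.close(fd)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


path = save_to_temp_file({"n_jobs": 123})
with open(path, "rb") as f:
    restored = pickle.load(f)
# Removing the file is safe here because every handle to it has been closed.
os.remove(path)
```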

p-mishra1 · 2022-11-01T16:00:13Z

@BenjaminBossan do you have an idea why it is failing on Windows? I could not figure out exactly why.

BenjaminBossan · 2022-11-01T16:20:55Z

@BenjaminBossan do you have an idea why it is failing on Windows? I could not figure out exactly why.

I'm not sure, Windows is always a big riddle. Maybe the string that is being matched contains a character that doesn't work on Windows for some reason?

BenjaminBossan · 2022-11-01T17:10:57Z

If you can't figure it out, I would be okay with skipping the test on Windows (with a comment explaining why). The error really comes from the test itself and is not something a Windows user would encounter in practice when using the model card.

p-mishra1 · 2022-11-01T17:41:48Z

I will look into it and then we can decide

p-mishra1 · 2022-11-02T16:27:59Z

@BenjaminBossan I tried a few things and also worked in a Windows environment to mitigate the issue. I have changed the test a little bit; hopefully it will work this time.
Please give it a look.

adrinjalali

thanks @p-mishra1

adrinjalali · 2022-11-03T12:22:21Z

skops/card/_model_card.py

+    if model_path.suffix in (".pkl", ".pickle"):
+        model = joblib.load(model_path)
+    elif model_path.suffix == ".skops":
+        model = load(model_path)


We shouldn't rely on the extension here. We can check if the file is a zip file (zipfile.is_zipfile(filename)) and load with skops if yes, otherwise try to load with joblib, and catch any exceptions which might be raised during each of them.

can't we just keep a key in config file to store which format the model was serialized in?

I agree that depending on the file extension is not a good way.
Let's say the loaded file was indeed a zip; would we then confirm that it is an sklearn estimator object, using base.BaseEstimator?
Really sorry if this feels too naive. :)

can't we just keep a key in config file to store which format the model was serialized in?

We could but I don't think this function is called late enough for that to work.

Let's say the loaded file was indeed a zip; would we then confirm that it is an sklearn estimator object, using base.BaseEstimator?

No, if it's a zip file, then it's a skops file, which means we can load using skops, otherwise we load with joblib, and we don't validate the output. We just catch any exception which might have been raised.
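
The content-based dispatch described above could look roughly like the sketch below. It is an illustration under assumptions, not the merged implementation: detect_format and load_by_content are hypothetical names, the zip branch takes a caller-supplied stand-in instead of the skops loader, pickle replaces joblib so the example needs only the standard library, and re-raising as RuntimeError is just one possible way to "catch" loading errors.

```python
import pickle
import tempfile
import zipfile
from pathlib import Path


def detect_format(path):
    """Return "zip" if ``path`` is a zip archive, otherwise "pickle"."""
    return "zip" if zipfile.is_zipfile(path) else "pickle"


def load_by_content(path, zip_loader):
    """Pick a loader from the file content and wrap any loading error."""
    try:
        if detect_format(path) == "zip":
            # Stand-in for the skops loader (assumption for this sketch).
            return zip_loader(path)
        with open(path, "rb") as f:
            return pickle.load(f)
    except Exception as exc:
        raise RuntimeError(f"Could not load model from {path}") from exc


with tempfile.TemporaryDirectory() as tmp:
    # A pickle file with a misleading extension is still detected by content.
    pickle_path = Path(tmp) / "model.skops"
    pickle_path.write_bytes(pickle.dumps({"n_jobs": 123}))

    # A zip archive with a misleading extension is detected as well.
    zip_path = Path(tmp) / "model.pkl"
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("payload.txt", "hello")

    formats = (detect_format(pickle_path), detect_format(zip_path))
    loaded = load_by_content(pickle_path, zip_loader=lambda p: "zip-loaded")
    loaded_zip = load_by_content(zip_path, zip_loader=lambda p: "zip-loaded")
```

Note that, as stated in the comment above, the loaded object is not validated; the sketch only decides which loader to call.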

adrinjalali · 2022-11-03T12:23:24Z

skops/card/_model_card.py

+    @model.deleter
+    def model(self):
+        del self._model


do we need this?

For consistency, the method should exist (assuming we leave the rest as is), no?

I almost never see a deleter for properties.

adrinjalali · 2022-11-03T12:24:16Z

skops/card/_model_card.py

+    @property
+    def model(self):
+        model = _load_model(self._model)
+        if model is not self._model:
+            self._model = model
+        return model


I don't think we should change model. We should always keep the model attribute as given by the user, and only load it when we need it in other methods.

agreed 👍🏻

I'm not sure I fully understand. You want the model argument passed by the user to remain unchanged, and when we access it in other methods, we check if it's a string/path and load it? So basically swap around the attribute names model and _model? Or load the model each time it's accessed (which would be wasteful, since it's used, for instance, in __repr__ so could be accessed quite frequently).

So instead of using the _model attribute, we should use _load_model(self.model) wherever the model is accessed.
Am I thinking in the right direction, @merveenoyan @adrinjalali?
Also tagging @BenjaminBossan to have a look at what we discussed earlier.

yes, exactly @p-mishra1 .

Yes @BenjaminBossan, sounds fine, I will work on this more.
Just curious: do we have a Slack/forum/Discord where we can talk about code design before committing it, or is this the preferred way?

Thanks for continuing the work.

I think ideally these discussions should come up in the PR or the corresponding issue. I thought there was consensus on the suggested approach here, I guess there was some confusion, sorry for that.

def __repr__(self) -> str:
    model = getattr(self, "model", None)
    if model:
        model = _load_model(model)  # this line
        model_str = self._strip_blank(repr(model))
        model_repr = aRepr.repr(f" model={model_str},").strip('"').strip("'")
    else:
        model_repr = None

def _extract_estimator_config(self) -> str:
    """Extracts estimator hyperparameters and renders them into a vertical table.

    Returns
    -------
    str:
        Markdown table of hyperparameters.
    """
    model = _load_model(self.model)  # This line
    hyperparameter_dict = model.get_params(deep=True)

def _generate_card(self) -> ModelCard:
    . . .
    if self.model_diagram is True:
        model = _load_model(self.model)  # this line
        model_plot_div = re.sub(r"\n\s+", "", str(estimator_html_repr(model)))

So these 3 functions will be changed like this (marked with the "this line" comments) to do lazy loading in the Card class.

Also, the tests for the card repr and str either have to be changed to take both a card instance and a path,
or the test class has to be duplicated, passing the model file as a param instead of the estimator.
Let me know, @adrinjalali @BenjaminBossan, which approach we should prefer.

I wonder if this is the right approach or if we should keep the property and just do the _load_model inside the property. The disadvantage of the suggested approach is that it's very error prone: as soon as new code is added that accesses the model, it would need to do the same thing. That's why I would prefer using property. @adrinjalali wdyt?

how about self.get_model() which would return the model if self.model is a str? acceptable middle ground?
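
A rough sketch of this middle ground, under stated assumptions: MiniCard is a hypothetical, stripped-down stand-in for the Card class, pickle replaces the real loaders, and a dict stands in for the estimator. The model attribute stays exactly as the user passed it, and every internal access goes through get_model().

```python
import pickle
import tempfile
from pathlib import Path


class MiniCard:
    """Hypothetical, stripped-down stand-in for the Card class."""

    def __init__(self, model):
        # Keep whatever the user passed (instance, str, or Path) untouched.
        self.model = model

    def get_model(self):
        """Return the model, loading it from disk if a str or Path was given."""
        if isinstance(self.model, (str, Path)):
            with open(self.model, "rb") as f:
                return pickle.load(f)  # loaded on every call, no caching
        return self.model

    def __repr__(self):
        # Internal code never touches self.model directly.
        return f"MiniCard(model={self.get_model()!r})"


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.pkl"
    model_file.write_bytes(pickle.dumps({"n_jobs": 123}))

    card = MiniCard(model_file)
    loaded = card.get_model()
    unchanged = card.model == model_file
    card_repr = repr(card)
```

The trade-off raised in the thread is visible here: the file is read again on each call, including each repr, unless some caching is added.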

merveenoyan

thanks a lot for working on this!

merveenoyan · 2022-11-03T12:40:43Z

skops/card/_model_card.py

+    if model_path.suffix in (".pkl", ".pickle"):
+        model = joblib.load(model_path)
+    elif model_path.suffix == ".skops":
+        model = load(model_path)


can't we just keep a key in config file to store which format the model was serialized in?

merveenoyan · 2022-11-03T12:41:22Z

skops/card/_model_card.py

+    @property
+    def model(self):
+        model = _load_model(self._model)
+        if model is not self._model:
+            self._model = model
+        return model


agreed 👍🏻

p-mishra1 · 2022-11-20T15:28:09Z

@BenjaminBossan @adrinjalali have a look at the changes, sorry for the long gap, was busy with some things.

adrinjalali

This LGTM now, but since the code has changed a bit since the last review from @BenjaminBossan, I'll let him check and merge.

BenjaminBossan

I only have a few minor comments left, after that we should be good 👍

BenjaminBossan · 2022-11-21T13:04:14Z

skops/card/_model_card.py

+
+    Parameters
+    ----------
+    model : pathlib.path, str, or sklearn estimator


Suggested change

-    model : pathlib.path, str, or sklearn estimator
+    model : pathlib.Path, str, or sklearn estimator

BenjaminBossan · 2022-11-21T13:06:18Z

skops/card/_model_card.py

@@ -263,6 +304,18 @@ def __init__(
        self._extra_sections: list[tuple[str, Any]] = []
        self.metadata = metadata or ModelCardData()

+    def get_model(self) -> Any:
+        """Returns sklearn estimator object if ``Path``/``str``
+            is provided.


Suggested change

is provided.

is provided.

BenjaminBossan · 2022-11-21T13:06:42Z

skops/card/_model_card.py

@@ -162,6 +165,41 @@ def metadata_from_config(config_path: Union[str, Path]) -> ModelCardData:
    return card_data


+def _load_model(model: Any) -> Any:
+    """Loads the mddel if provided a file path, if already a model instance return it
+       unmodified.


Suggested change

unmodified.

unmodified.

BenjaminBossan · 2022-11-21T13:07:42Z

skops/card/_model_card.py

+        documented. If a ``Path`` or ``str`` is provided, model will be loaded
+        on first use.


Suggested change

-        documented. If a ``Path`` or ``str`` is provided, model will be loaded
-        on first use.
+        documented. If a ``Path`` or ``str`` is provided, model will be loaded.

After the change, it will be loaded each time, so "on first use" is not correct anymore.

BenjaminBossan · 2022-11-21T13:07:58Z

skops/card/_model_card.py

+        ``Path``/``str`` of the model or the actual model instance that will be
+        documented. If a ``Path`` or ``str`` is provided, model will be loaded
+        on first use.
+


Suggested change

skops/card/tests/test_card.py

BenjaminBossan · 2022-11-21T13:10:26Z

skops/card/tests/test_card.py

+    assert loaded_model_str.n_jobs == model0.n_jobs
+    assert loaded_model_path.n_jobs == model0.n_jobs


Suggested change

-    assert loaded_model_str.n_jobs == model0.n_jobs
-    assert loaded_model_path.n_jobs == model0.n_jobs
+    assert loaded_model_str.n_jobs == 123
+    assert loaded_model_path.n_jobs == 123

This test is a tiny bit more robust this way (in case that model0 is being mutated by _load_model for some reason).

BenjaminBossan · 2022-11-21T13:10:56Z

skops/card/tests/test_card.py

+
+    assert loaded_model_str.n_jobs == model0.n_jobs
+    assert loaded_model_path.n_jobs == model0.n_jobs
+    assert loaded_model_path.n_jobs == loaded_model_str.n_jobs


This test is not needed, right? If the two asserts above succeed, this cannot possibly fail.

BenjaminBossan · 2022-11-21T13:12:37Z

skops/card/tests/test_card.py

+
+        card_from_path = self.path_to_card(file_name)
+        result_from_path = meth(card_from_path)
+        print(result_from_path)


BenjaminBossan · 2022-11-21T13:15:11Z

skops/card/tests/test_card.py

+
+    @pytest.mark.parametrize("meth", [repr, str])
+    @pytest.mark.parametrize("suffix", [".pkl", ".skops"])
+    def test_model_card_repr(self, card: Card, meth, suffix):


I'm not sure if this test is strictly necessary. Maybe it is sufficient to show that calling repr or str on a card that has a path/str for model works? The fact that it's the same output is implied by the other tests.

I changed it to only call repr on card_from_path.

BenjaminBossan

This looks good. Thanks a lot @p-mishra1 for your work and patience.

p-mishra1 · 2022-11-23T15:05:01Z

This looks good. Thanks a lot @p-mishra1 for your work and patience.

Thanks a lot to you guys @BenjaminBossan and @adrinjalali for guiding me and listening to my silly comments and doubts.
Also thanks to @merveenoyan who introduced me to the repo by posting about this issue on Twitter.

One request I have for @BenjaminBossan: as most of the guidance and reviews were provided by you, can you give me any feedback you have so that I can improve?

BenjaminBossan · 2022-11-24T13:01:15Z

Hey @p-mishra1,

First of all, it's really hard to give feedback after one PR and without knowing you personally, so there is only so much I can say. But this is my personal impression:

I liked the initiative you showed. Your responses were quick and you were receptive to our feedback - even when it was conflicting sometimes ;-). As you know, this all took some time but still, you persisted, which is very good. You were willing to do some of the hard work and came up with solutions to new challenges that occurred on the way.

If there was one thing that would make the life of us maintainers easier, it is if right from the start, you would have delivered a more "complete" package, i.e. updating the docs everywhere, providing unit tests, making sure that code style and formatting are consistent with the rest of the code base. When we see that as maintainers right from the start, we're always excited.

Hopefully, this helps you a little bit. If you have feedback for us, also feel free to share.

p-mishra1 · 2022-11-25T12:14:33Z

Thank you @BenjaminBossan, I appreciate you taking the time to give me feedback. I know it is quite tough for you to give feedback on the coding part.
I don't have any feedback from my side; I really liked the way you guys handled each PR despite the noob mistakes I made.
I would love to contribute more and will be on the lookout for opportunities, but if you feel there is an issue or feature I can contribute to, I will be happy to take it up.

p-mishra1 added 2 commits October 28, 2022 20:57

load model from path func and test

842a1e1

load model from path func and test

e644e5c

p-mishra1 mentioned this pull request Oct 28, 2022

load model in Card class #174

Closed

BenjaminBossan requested changes Oct 31, 2022

model card add. test and error msg update

0f447e3

p-mishra1 requested a review from BenjaminBossan October 31, 2022 16:09

BenjaminBossan reviewed Nov 1, 2022

p-mishra1 added 2 commits November 1, 2022 19:45

change log update, docstring update

88c81c1

Merge branch 'main' into praj-load-model

e87f00e

p-mishra1 requested a review from BenjaminBossan November 1, 2022 14:25

BenjaminBossan reviewed Nov 1, 2022

docstring, test changes to avoid fail on win

626efcc

p-mishra1 requested a review from BenjaminBossan November 1, 2022 15:22

p-mishra1 closed this Nov 1, 2022

p-mishra1 reopened this Nov 1, 2022

p-mishra1 and others added 2 commits November 2, 2022 20:53

Merge branch 'main' into praj-load-model

169c317

solve test issue for windows

4181bcc

BenjaminBossan mentioned this pull request Nov 3, 2022

Alternative Card implementation #203

Merged

adrinjalali reviewed Nov 3, 2022

merveenoyan reviewed Nov 3, 2022

p-mishra1 added 2 commits November 20, 2022 20:35

load model func. and test change

8e8ff22

merge main into praj-load-model

ca506e8

adrinjalali approved these changes Nov 21, 2022

BenjaminBossan requested changes Nov 21, 2022

test update and required refactoring

eaf1989

p-mishra1 requested a review from BenjaminBossan November 22, 2022 14:53

test update and required refactoring

5cb59a3

BenjaminBossan approved these changes Nov 23, 2022

BenjaminBossan merged commit 757b940 into skops-dev:main Nov 23, 2022

BenjaminBossan mentioned this pull request Nov 23, 2022

Take model as argument to generate the card #96

Closed

BenjaminBossan mentioned this pull request Dec 6, 2022

Cache model loading in model card #243

Open
