
integrations: sklearn #5

Closed · 5 tasks done
pared opened this issue Nov 19, 2020 · 14 comments

@pared (Contributor) commented Nov 19, 2020:

It seems we should support at least a few popular frameworks.

Considering their popularity, we should probably start with:

  • keras - we have an initial implementation
  • sklearn
  • xgboost

Worth considering:

TF and PyTorch - it seems to me that their pure form is used when users need highly custom models, and in those cases they will probably be able to wire up dvclive by hand.
@dmpetrov did I miss any popular framework?

EDIT:
crossing out FastAi as it now has its own issue

@pared pared changed the title integrations: integrate with FastAi integrations Nov 19, 2020
@pared pared self-assigned this Jan 12, 2021
@dberenbaum (Contributor) commented:
I think it's easy enough for users to add integrations as needed (or for the dvc team to add them in response to demand), so it's probably not worthwhile to spend time adding more now.

How do we plan to handle dependencies for multiple frameworks? Each supported framework is pretty heavy, and I think it's already unreasonable to expect an XGBoost user to install Tensorflow to use dvclive. Similar concerns would apply to dvcx.

Thoughts @pared @dmpetrov ?

@dberenbaum (Contributor) commented:
See #25 for more discussion of dependency management.

@pared (Contributor, Author) commented Jan 28, 2021:

@dberenbaum
I think leaving particular implementations to our users is a good idea; those are easy tasks. Writing tests might be harder, but I guess we can help users write them instead of doing all the legwork ourselves, without even knowing whether a particular integration will be desired by the user base.

As for installation, you are right: we already do this in dvc (for the different backends), and we will have to go the same way here too.
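To make the "do it like dvc does for its backends" idea concrete, here is a minimal sketch of a per-framework optional-dependency layout. The extra names and dependency lists are hypothetical, not dvclive's actual packaging; this only illustrates the extras_require pattern being discussed, where a user would run e.g. `pip install dvclive[sklearn]` instead of pulling in every framework.

```python
# Hypothetical extras_require mapping for dvclive's setup.py, mirroring
# the per-backend extras pattern dvc uses. Each integration pulls in only
# its own heavy framework dependency.
FRAMEWORKS = {
    "tf": ["tensorflow"],
    "xgb": ["xgboost"],
    "sklearn": ["scikit-learn"],
}

EXTRAS = {
    **FRAMEWORKS,
    # convenience extra that installs every supported framework
    "all": sorted({dep for deps in FRAMEWORKS.values() for dep in deps}),
}

# This dict would be passed as extras_require=EXTRAS in setup(...).
print(EXTRAS["all"])
```

With this layout, `pip install dvclive` stays lightweight and a framework is only installed when its extra is requested.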

@dberenbaum (Contributor) commented:
On second thought, is it worthwhile to add the sklearn integration ourselves? Since it is such a large framework, the integration may be complex, and if we have an opinion about how to implement it, it is probably better to add it now than to wait for contributions. Even if that means implementing only one particular model or class of models, it could serve as a worthwhile template. Thoughts?

@pared (Contributor, Author) commented Jan 29, 2021:

Makes sense. I will get to that once I am done with supporting dvclive output caching.

@dberenbaum (Contributor) commented:
sklearn is largely not focused on deep learning, which has been the primary use case for dvclive. Should other algorithms be supported? If the primary purpose is to track model training progress, it seems only useful where models are trained iteratively. I only know of a couple of classes of algorithms where this is true:

  • Gradient descent (including neural networks/deep learning)
  • Ensemble methods (such as gradient boosting)

@pared (Contributor, Author) commented Mar 11, 2021:

@dberenbaum Yes, after digging through the documentation, it seems to me that learning algorithms generally divide into those that expose only a fit method and those that expose both fit and partial_fit. I don't think we can provide an integration for fit-only models, and in the case of partial_fit models the workflow will probably look more like the torch one, which in my opinion does not require any integration, since the training loop is written by hand.

The only place where I could see some integration is methods accepting a scoring param, which can be a Callable, but it seems really hard to define how such an integration would work.
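To illustrate why partial_fit models may not need a dedicated integration: the user already owns the loop, so per-step logging can be dropped in by hand, much like in a torch training loop. This is a self-contained sketch; the `history` list stands in for calls to dvclive's logging API, which is not imported here.

```python
# Hand-written training loop over sklearn's partial_fit, the workflow
# discussed above. Appending to `history` is a stand-in for a per-step
# dvclive logging call.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable toy labels

model = SGDClassifier(random_state=0)
history = []

for epoch in range(10):
    # partial_fit requires the full list of classes on the first call
    model.partial_fit(X, y, classes=np.array([0, 1]))
    # here a user would log the metric and advance the step in dvclive
    history.append({"step": epoch, "accuracy": model.score(X, y)})

print(history[-1])
```

Since the user controls exactly where each epoch ends, they can log any metric at any step without dvclive needing to hook into sklearn itself.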

@daavoo (Contributor) commented Apr 1, 2021:

I am considering working on the integration with pytorch-lightning, but I'm not sure where to contribute the new logger (i.e. this repository or pytorch-lightning itself). See #70 (comment)

@daavoo (Contributor) commented Jun 2, 2021:

I added an integration with mmcv:

open-mmlab/mmcv#1075

@pared (Contributor, Author) commented Jun 3, 2021:

@daavoo That's great news! Can we do anything to help with that pull request?

@daavoo (Contributor) commented Jun 4, 2021:

> @daavoo That's great news! Can we do anything to help with that pull request?

It has already been approved, so I think it will be merged soon. Thanks!

@daavoo (Contributor) commented Jun 10, 2021:

I think it might be a good idea to have separate issues for each integration, in order to better track progress and keep the discussions focused (i.e. this issue got "populated" by sklearn-specific discussions).

E.g. #83

@pared (Contributor, Author) commented Jun 10, 2021:

@daavoo That's right. In the beginning we intended this to be an umbrella issue, since individual implementations seemed like easy tasks. As the sklearn example shows, we should probably track each integration separately.

For future reference:
changing the name of this issue to cover sklearn only. Other integrations should be tracked in separate issues.

@pared pared changed the title integrations sklearn integration Jun 10, 2021
@pared pared changed the title sklearn integration integrations: sklearn Jun 10, 2021
@daavoo daavoo added A: frameworks Area: ML Framework integration feature request labels Jul 14, 2021
@daavoo (Contributor) commented Oct 27, 2021:

Reviving this, as I think sklearn should be the entry point for discussing what dvclive can provide in "stepless" scenarios (no deep learning, no gradient boosting) beyond #182.

Taking a quick look at our example repositories using sklearn (https://github.com/iterative/example-get-started), it looks like low-hanging fruit to add some utility that goes from (y_true, y_pred) to PRC / ROC plots.

Given that example repo, we would remove quite a few lines for users, from:

# Given: labels, predictions, prc_file, roc_file
import json
import math

from sklearn import metrics

precision, recall, prc_thresholds = metrics.precision_recall_curve(labels, predictions)
fpr, tpr, roc_thresholds = metrics.roc_curve(labels, predictions)

# ROC has a drop_intermediate arg that reduces the number of points.
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve.
# PRC lacks this arg, so we manually reduce to 1000 points as a rough estimate.
nth_point = math.ceil(len(prc_thresholds) / 1000)
prc_points = list(zip(precision, recall, prc_thresholds))[::nth_point]
with open(prc_file, "w") as fd:
    json.dump(
        {
            "prc": [
                {"precision": p, "recall": r, "threshold": t}
                for p, r, t in prc_points
            ]
        },
        fd,
        indent=4,
    )

with open(roc_file, "w") as fd:
    json.dump(
        {
            "roc": [
                {"fpr": fp, "tpr": tp, "threshold": t}
                for fp, tp, t in zip(fpr, tpr, roc_thresholds)
            ]
        },
        fd,
        indent=4,
    )

To:

from dvclive.sklearn import log_precision_recall_curve, log_roc_curve

log_precision_recall_curve(labels, predictions)
log_roc_curve(labels, predictions)
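As a sketch of what the proposed helper could look like under the hood: the function name comes from the proposal above, but the signature, the `path`/`max_points` parameters, and the output format are assumptions lifted from the example-get-started snippet, not an actual dvclive API.

```python
# Hypothetical implementation of the proposed log_precision_recall_curve,
# wrapping the boilerplate from the example above. The JSON layout matches
# the PRC plot format used in example-get-started.
import json
import math

from sklearn import metrics


def log_precision_recall_curve(labels, predictions, path="prc.json", max_points=1000):
    precision, recall, thresholds = metrics.precision_recall_curve(labels, predictions)
    # precision/recall have one more entry than thresholds; zip() drops the extras.
    # Downsample to at most max_points, since PRC has no drop_intermediate arg.
    nth_point = math.ceil(len(thresholds) / max_points)
    points = list(zip(precision, recall, thresholds))[::nth_point]
    with open(path, "w") as fd:
        json.dump(
            {
                "prc": [
                    {"precision": float(p), "recall": float(r), "threshold": float(t)}
                    for p, r, t in points
                ]
            },
            fd,
            indent=4,
        )


# toy labels/scores from the sklearn precision_recall_curve docs
log_precision_recall_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A `log_roc_curve` helper would be analogous, using `metrics.roc_curve` and the `{"roc": [...]}` layout.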

@pared pared removed their assignment Nov 3, 2021
@daavoo daavoo closed this as completed Sep 26, 2022