
Update to yolo #232

Closed · wants to merge 6 commits

Conversation

daavoo (Contributor) commented Aug 3, 2023

Deployed:

https://github.com/daavoo/example-get-started-experiments
https://studio.iterative.ai/team/Iterative/projects/example-get-started-experiments-bhm5eg8le6


I made the decision because:

  • It was the framework I was already using on a similar project 😅
  • It feels like it is more popular these days compared to fast.ai.
    Not only in general: in our particular case, more than one customer/prospect has explicitly reported using YOLO, but (afaik) we have not heard that about fast.ai.
  • The framework logic for training felt simpler.
    See train.py vs the previous fast.ai version (a rough sketch is below).
    There is no need for a separate evaluate.py stage because YOLO loads the best model and evaluates it at the end of training.
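
For reference, this is roughly what the training step boils down to with ultralytics (the checkpoint name and hyperparameter values here are illustrative, not necessarily what this PR uses):

from ultralytics import YOLO

# Load a pretrained segmentation checkpoint and fine-tune it on our dataset.
model = YOLO("yolov8n-seg.pt")
# train() also validates the best checkpoint at the end,
# which is why no separate evaluate stage is needed.
model.train(data="datasets/yolo_dataset.yaml", epochs=20, imgsz=640)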

TODO:

@daavoo daavoo added the A: example-get-started-experiments DVC Experiment, DVCLive examples label Aug 3, 2023
@daavoo daavoo self-assigned this Aug 3, 2023
@daavoo daavoo marked this pull request as draft August 3, 2023 10:55

git rm TrainSegModel.ipynb
daavoo (Author) commented Aug 3, 2023

We have a git tag for the notebook state.
It feels more natural to drop the notebook once it has been converted to a DVC pipeline.

@daavoo daavoo marked this pull request as ready for review August 3, 2023 13:41
@@ -107,47 +103,3 @@ This tag also contains a GitHub Actions workflow that reruns the pipeline if any
changes are introduced to the pipeline-related files.
[CML](https://cml.dev/) is used in this workflow to provision a cloud-based GPU
machine as well as report model performance results in Pull Requests.

## Deploying the model
daavoo (Author):

Removing this section for now.
Will properly close #230 in a subsequent PR adding the SageMaker code.

Try to improve look of images painted by YOLO
dberenbaum (Collaborator) commented Aug 3, 2023

Thoughts on yolo:

  • I like that it's a more popular, high-level framework.
  • Am I reading it wrong, or are the predictions pretty bad?
  • Not sure if the simplicity is a plus here. It is easy to read, but it's maybe too trivial? Do I really need to modularize my training code if it's 2 lines?
  • How is dvclive even invoked here? Any idea why https://docs.ultralytics.com/reference/yolo/utils/callbacks/dvc/ is giving a 404?
  • In general, I worry there's so much magic happening in both dvc and yolo here that it's hard for the user to learn anything from it or generalize to their use case.

A couple small Studio UI issues:

  • Did you manually hide all older commits in Studio? I only see the latest commit and its experiments.
  • Similarly, it looks like lots of metrics are hidden. If we want to use a more realistic example in yolo, I think it would be better to show more metrics.

fire
dberenbaum (Collaborator):

Minor, but I vote to drop fire and use argparse. The scripts will still be plenty short and I think they will be more transparent.
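
For illustration, a minimal argparse version could look something like this (the flag names and defaults here are hypothetical, not the PR's exact interface):

import argparse

from ultralytics import YOLO


def main():
    # Explicit CLI instead of fire's auto-generated one.
    parser = argparse.ArgumentParser(description="Train the YOLO segmentation model")
    parser.add_argument("--data", default="datasets/yolo_dataset.yaml")
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--imgsz", type=int, default=640)
    args = parser.parse_args()

    model = YOLO("yolov8n-seg.pt")
    model.train(data=args.data, epochs=args.epochs, imgsz=args.imgsz)


if __name__ == "__main__":
    main()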

daavoo (Author):

will do

desc="This is a Computer Vision (CV) model that's segmenting out swimming pools from satellite images.",
labels=["cv", "segmentation", "satellite-images", params.train.arch],
)
yolo.train(data="datasets/yolo_dataset.yaml", epochs=epochs, imgsz=imgsz, **kwargs)
dberenbaum (Collaborator):

Why not track datasets/yolo_dataset.yaml in dvc.yaml?

daavoo (Author):

No reason. Can totally do it.
It is just a description of where the paths are; YOLO wants it in a file, so it felt not really relevant.

yolo settings datasets_dir="/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/datasets/"
yolo settings runs_dir="/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/runs/"
yolo settings weights_dir="/workspaces/example-repos-dev/example-get-started-experiments/build/example-get-started-experiments/weights/"
jupyter nbconvert --execute 'TrainSegModel.ipynb' --inplace
git add .
tick
git commit -m "Run notebook and apply best experiment"
dberenbaum (Collaborator):

Is it still applying the best experiment?

daavoo (Author) commented Aug 4, 2023

No, do you think we should keep that logic? I am not even sure people realize what is going on without checking this script.

dberenbaum (Collaborator):

No, I just meant we should change the commit message

Comment on lines +109 to +110
-d src/train.py -d datasets/yolo_dataset/train -d datasets/yolo_dataset/val \
"python src/train.py \${train}"
dberenbaum (Collaborator):

It feels odd to me not to have the model as an output of the stage here.

Also, why do you have the train and val datasets as separate deps/outs instead of tracking the whole dataset as one?

daavoo (Author):

It feels odd to me not to have the model as an output of the stage here.

I can make the changes for that. I know it's odd, but I couldn't find a reason why it matters (maybe I should add an additional stage).

Also, why do you have the train and val datasets as separate deps/outs instead of tracking the whole dataset as one?

So that in Studio you can use Data columns to show the number of files separately.

dberenbaum (Collaborator):

I can make the changes for that. I know it's odd, but I couldn't find a reason why it matters (maybe I should add an additional stage).

Does iterative/dvclive#631 break anything here since it skips caching if _inside_dvc_exp=True?

daavoo (Author):

Does iterative/dvclive#631 break anything here since it skips caching if _inside_dvc_exp=True?

Because the .dvc file is created at the notebook stage, I believe there is no difference: DVC will (dvc) commit the changes at the end.

dberenbaum (Collaborator):

Overall, I think it's great and shows off how powerful and simple the tools can be, but it feels more like a showcase of functionality than a way to teach about dvc experiments. I would like to at least take some steps on iterative/dvclive#603 if we make this our default example project. Although it's hard to maintain them all as auto-generated repos, I think we need many examples of different frameworks, so I would urge that we try to keep both projects in some form instead of replacing one with the other.

daavoo (Author) commented Aug 4, 2023

  • Am I reading it wrong, or are the predictions pretty bad?

Not sure how/what you are reading. I could talk for a while about metrics for these tasks 😅

The currently selected metric is an instance segmentation one (mAP 0.5-0.95) and it doesn't hold the same meaning as accuracy or similar (where I would expect ~0.9 to consider it good).
It can't even be compared across datasets, but here is the SOTA on COCO (still no one has passed 0.6):
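
(For reference, the COCO-style metric averages AP over ten IoU thresholds, which is one reason the numbers sit well below what "accuracy" intuition would suggest:

\mathrm{mAP}_{0.5:0.95} = \frac{1}{10} \sum_{t \in \{0.50,\,0.55,\,\dots,\,0.95\}} \mathrm{AP}_{t}

)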

[image: COCO instance segmentation SOTA]

  • Not sure if the simplicity is a plus here. It is easy to read, but it's maybe too trivial? Do I really need to modularize my training code if it's 2 lines?

The point is not (only?) to modularize but to use exp run and its associated features, right?

The dvclive callback is automatically added if dvclive is installed:

https://github.com/ultralytics/ultralytics/blob/main/ultralytics/utils/callbacks/dvc.py

This is the way YOLO currently handles the integrations.
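
Roughly, the pattern is install-gated callback registration. A simplified, illustrative sketch (not the actual ultralytics source; the trainer object and callback name here are assumptions):

try:
    from dvclive import Live
except ImportError:
    Live = None

live = Live() if Live is not None else None


def on_fit_epoch_end(trainer):
    # `trainer.metrics` is assumed to be a dict of the current epoch's metrics.
    for name, value in trainer.metrics.items():
        live.log_metric(name, value)
    live.next_step()


# The callback is only wired up when dvclive could be imported.
callbacks = {"on_fit_epoch_end": [on_fit_epoch_end]} if live is not None else {}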

  • In general, I worry there's so much magic happening in both dvc and yolo here that it's hard for the user to learn anything from it or generalize to their use case.

So, I am assuming you did not have that worry with the previous fast.ai code because the callback was passed explicitly?
I remember the feedback was that it was too complicated because it used the context manager, etc.

A couple small Studio UI issues:

  • Did you manually hide all older commits in Studio? I only see the latest commit and its experiments.

Yes, feel free to add/show whatever.

  • Similarly, it looks like lots of metrics are hidden. If we want to use a more realistic example in yolo, I think it would be better to show more metrics.

Same as above; it's just the view I added and it doesn't have to be what we show. It is what I prefer, but I don't know if it is the best way to present it as an example.
I do worry about the fallacy of making things look "complex" for the sake of appearance.

daavoo (Author) commented Aug 4, 2023

but feels more like a showcase of functionality than a way to teach about dvc experiments

Sorry, maybe it was already implicit in the previous comments, but could you summarize what makes that the case for this example vs the current one?

What I got as main differences:

  • no evaluate stage
  • train code too simple

Is there something else?

dberenbaum (Collaborator):

The currently selected metric is an instance segmentation one (mAP 0.5-0.95) and it doesn't hold the same meaning as accuracy or similar (where I would expect ~0.9 to consider it good).

Sorry, I was reading it wrong 😄 . I have used mAP but here I was looking at the image masks. I think it is just harder to see whether the red bounding boxes overlay an actual pool. Anyway, disregard that comment.

Sorry, maybe it was already implicit in the previous comments, but could you summarize what makes that the case for this example vs the current one?

I don't have that strong an opinion on which one should be the "default." I think we need to keep both (and in that case, I'm fine with keeping the deployment in this one) because they showcase different features and frameworks and we need more examples.

Reasons why I wouldn't push to make this one the "default":

  1. Docs need to be reworked and I don't see that it would be worth the time.
  2. If I'm not using yolo, it doesn't show me anything about how to use dvclive. By default, I would rather have an example that invokes Live() or an explicit DVCLiveCallback().
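
A minimal sketch of that explicit pattern (dummy values, just to show the API shape):

from dvclive import Live

# Explicitly create the logger instead of relying on a framework auto-callback.
with Live() as live:
    live.log_param("epochs", 3)
    for epoch in range(3):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        live.log_metric("loss", loss)
        live.next_step()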

daavoo (Author) commented Aug 4, 2023

Sorry, I was reading it wrong 😄 . I have used mAP but here I was looking at the image masks. I think it is just harder to see whether the red bounding boxes overlay an actual pool. Anyway, disregard that comment.

Yes, the default built-in visualization is ... not great.

I don't have that strong an opinion on which one should be the "default." I think we need to keep both (and in that case, I'm fine with keeping the deployment in this one) because they showcase different features and frameworks and we need more examples.

Reasons why I wouldn't push to make this one the "default":

  1. Docs need to be reworked and I don't see that it would be worth the time.
  2. If I'm not using yolo, it doesn't show me anything about how to use dvclive. By default, I would rather have an example that invokes Live() or an explicit DVCLiveCallback().

Makes sense.
For the record, about 2: HuggingFace also appears to default to automatically including the callback if the library is installed, so assuming we start moving the callbacks to their repos, more examples might start to look like this.

@daavoo daavoo marked this pull request as draft August 4, 2023 15:28
daavoo (Author) commented Aug 4, 2023

I will make this another repo; converting to draft for now.

@daavoo daavoo closed this Aug 7, 2023