Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Best commit in replicate ls/ps using primary metric #44

Merged
merged 1 commit into from Aug 3, 2020

Conversation

andreasjansson
Copy link
Member

Looks something like

experiment  started             status   host      user     param-1  latest   step  label-1  best     step  label-1
1eeeeee     10 seconds ago      running  10.1.1.1  andreas  100      3cccccc  20    0.02     2cccccc  20    0.01
2eeeeee     about a second ago  stopped  10.1.1.2  andreas  200      4cccccc  5              N/A

In order to support --storage-url and config metrics we need to fetch config from storage. This means that when you update metrics in your local replicate.yaml it won't affect replicate ls. I added a TODO to re-visit this.

@andreasjansson
Copy link
Member Author

@bfirsh I assume from your comment in Slack that this is okay to merge? Happy to help with merge conflict resolution when your branch is ready to land.

@andreasjansson andreasjansson merged commit e3ecea0 into master Aug 3, 2020
@andreasjansson andreasjansson deleted the andreas/best-commit-in-ls branch August 3, 2020 09:15
@bfirsh
Copy link
Member

bfirsh commented Aug 3, 2020

Haha, no, I commented in Slack specifically so it wasn't confused as a code review. I'll take a look now.

Copy link
Member

@bfirsh bfirsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good generally! Comments inline.

missing_keys = metric_keys - label_keys
if missing_keys:
sys.stderr.write(
"Warning: Missing metric{} in commit: {}".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice


// pull out the saved config from the commits list
// TODO(andreas): this is a temporary hack, refactor once
// we've migrated to the new data format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind elaborating on this? How do you imagine this working with the new data model?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know :) it depends on the data model. But I imagine that there will be some sort of Experiment object you can retrieve by experiment ID. That Experiment has a Config inside it, since config is saved by replicate.init().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List now seems to read the config from the latest commit. What is the reasoning behind this?

Surely when you update the primary metric in replicate.yaml, you want replicate list to change immediately? In the current implementation, it seems it won't change until you run another experiment.

@@ -66,6 +68,7 @@ def get_metadata(self) -> Dict[str, Any]:
"params": self.params,
"user": self.get_user(),
"host": self.get_host(),
"config": self.config,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended as temporary, as hinted at in your TODO in the CLI, or do we want to keep this?

I think I like the idea of the point-in-time config being in the experiment (following our principle of just gathering as much information as possible) but it raises an interesting semantic issue: the storage URL is now in the experiment metadata. You can imagine moving storages around, and it's a bit weird that the old storage URL is then baked inside the experiment. A actual problem? Perhaps not. A bit messy? Yes.

It's a bit like a Git commit embedding the URL of the default remote. It just feels a bit conceptually weird.

Does the storage URL need to be in here? Maybe we can store everything except the storage URL. Or, maybe this is an early hint that the storage URL should be stored/configured elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting question I've been going back and forth on. You can imagine scenarios in either direction.

Let's say you're working on a search model, and you use precision as the primary metric. Then you realize that the business cares more about recall, so you start tracking recall, and make that the primary metric. You didn't track recall before, but you still want to save your old experiments. In this scenario it's good that the config is saved along the experiment metadata, so that "best" means best precision for the old experiments and best recall for the new experiments.

The other scenario is the one we've discussed, where you start training a model without defining metrics in replicate.yaml, and tack on metrics later. In that scenario you want to use the local replicate.yaml.

I can't think of a way we can support both these, so we probably have to choose one. I'm leaning towards the first option, to save config along the experiment, so that it's frozen at the point of replicate run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it makes sense to store the config alongside the experiment, as per our principle of gathering as much information as possible. You could argue this duplicated though: the replicate.yaml file is in the commit data. It being in the metadata could be thought of as an optimization.

I think the interesting problem is the storage URL being in there. There is a semantic mismatch of some kind going on.

@andreasjansson
Copy link
Member Author

Haha, no, I commented in Slack specifically so it wasn't confused as a code review. I'll take a look now.

Sorry about that, I misunderstood. I'll hold off on merging until I get an Approved.

@bfirsh
Copy link
Member

bfirsh commented Aug 3, 2020

No biggy, don't worry! At this stage it doesn't matter if we review before or after it gets into master, just want to make sure I get my eyes on it somehow. :)

@bfirsh
Copy link
Member

bfirsh commented Aug 3, 2020

More discussion, for future readers of this issue. https://replicatehq.slack.com/archives/CPRGK33J5/p1596491607005100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants