Review of bulk run creation functionality #22
base: main
Conversation
Need to add detail to the README and raise a PR in the docs repo. Also, can we add Parquet support?
@@ -87,3 +87,6 @@ lint = [

[tool.mypy]
ignore_missing_imports = true

[tool.uv.sources]
simvue = { git = "https://github.com/simvue-io/python-api", branch = "dev" }
Need to remember to change this back once the new PyPI version of the Python API is released.
from .config import get_url_and_headers
from .push import PushDelimited
Why are PushJSON and PushDelimited imported differently? These should be consistent.
)


def push_json_metadata(
Surely we shouldn't have code repetition here, with push_delim_metadata being almost identical except for the class used. Can we use a factory, or just an input_type parameter and if/elif statements to select the appropriate class?
No, as this would add a lot of overhead and reduce readability; use of a factory here would be superfluous.
return _push_class.load_from_metadata(input_file, folder=folder)


def push_json_runs(
Again, this is very repetitive compared to the function above; can some of this be pulled out into a common setup function which all of these functions use?
Sometimes repetition is the more readable solution; this function is not that long, and it clearly shows that a different reader is used.
But it makes it less maintainable: if in the future we have a lot of file formats and want to add a new param to the CLI interface, we will need to update all of the functions.
We already have the if/elif logic in the CLI, based on the suffix of the input file. I don't see why we couldn't have this be one function, where the relevant class is passed in? It would just simplify things here.
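As an illustration of this suggestion (not code from the PR), here is a minimal sketch of a single dispatch function, assuming PushJSON and PushDelimited both expose the load_from_metadata(input_file, folder=...) interface shown in the diff; the mapping, function name, and import location are hypothetical.

```python
# Hypothetical sketch only: one entry point that picks the reader class from
# the file suffix, assuming both classes share load_from_metadata().
import pathlib

from .push import PushDelimited, PushJSON  # assumed module layout

_PUSH_CLASSES = {
    ".json": PushJSON,
    ".csv": PushDelimited,
    ".tsv": PushDelimited,
}


def push_metadata(input_file: pathlib.Path, *, folder: str):
    """Create runs from a metadata file, dispatching on the file suffix."""
    try:
        _push_class = _PUSH_CLASSES[input_file.suffix.lower()]
    except KeyError as error:
        raise ValueError(
            f"Unsupported input format '{input_file.suffix}'"
        ) from error
    # Constructor arguments (if any) are an assumption here
    return _push_class().load_from_metadata(input_file, folder=folder)
```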
@@ -2036,5 +2038,111 @@ def get_artifact_json(ctx, artifact_id: str) -> None:
click.echo(error_msg, fg="red", bold=True)


@simvue.group("push")
I'm not sure simvue push is an informative name for this feature. I also think it should be under the run group which you already have, since it is creating a set of runs. Maybe simvue run create-batch, or something along those lines?
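For illustration only, a rough click sketch of the suggested layout, registering the command under a run group rather than a new top-level push group; the group, command, and option names here are guesses rather than the PR's actual code.

```python
# Hypothetical sketch: attach bulk run creation to an existing "run" group.
import click


@click.group("run")
def run() -> None:
    """Commands operating on Simvue runs."""


@run.command("create-batch")
@click.argument("input_file", type=click.Path(exists=True, dir_okay=False))
@click.option("--folder", default="/", help="Server folder to create the runs in.")
def create_batch(input_file: str, folder: str) -> None:
    """Create a batch of runs from a metadata file."""
    click.echo(f"Would create runs from {input_file} in {folder}")
```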
_folder.commit()

if not isinstance(_data, list):
    raise ValueError("Expected JSON content to be a list.")
Should we give the option between a list of dicts, or a dict of dicts? I.e. I can see that some people may have:
{
    "run_1": {
        "a": 10,
        "b": 20
    },
    "run_2": {
        "a": 15,
        "b": 25
    },
    ...
}
We could support the key as the run name, and the values as the metadata.
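For illustration (not part of the PR), one way the loader could accept either shape, normalising the dict-of-dicts form by treating each key as the run name; the "name" and "metadata" field names are assumptions.

```python
# Hypothetical sketch: accept either a list of metadata dicts or a dict keyed
# by run name, and normalise everything to the list form the loader expects.
import json
import pathlib
import typing


def load_run_metadata(input_file: pathlib.Path) -> list[dict[str, typing.Any]]:
    _data = json.loads(input_file.read_text())

    if isinstance(_data, list):
        return _data

    if isinstance(_data, dict):
        # Key becomes the run name, value becomes that run's metadata
        return [
            {"name": _name, "metadata": _metadata}
            for _name, _metadata in _data.items()
        ]

    raise ValueError("Expected JSON content to be a list or a dict of dicts.")
```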
This is the JSON format which I would naively expect the data to be in, rather than a list of dicts.
Given that we ourselves can decide this format, and that it is easier if there is just a list of "packets" to process, I would argue that enforcing the one form is best here.
Tentatively agree, but the point of these loading functions is that they should be as flexible as possible, to allow someone with a file of results not to have to bother fitting it into our format before upload (which I know will never be completely possible).
Exactly; no one is going to happen to have an output that aligns with this anyway, so they will have to restructure it regardless (hence the connectors helping).
class PushJSON(PushAPI):
    @pydantic.validate_call
    def load_from_metadata(
        self, input_file: pydantic.FilePath, *, folder: str
All the same comments as with the CSV parser above.
)
):
    if _metrics := self._run_metrics.get(i):
        sv_obj.Metrics.new(run=_id, metrics=_metrics).commit()
This will currently not support multi-dimensional metrics, right? Do we want to support that?
Not at this stage
assert result.exit_code == 0, result.stdout


def test_push_runs() -> None:
Should probably set a TTL on these runs, otherwise repeatedly running the tests will very quickly fill up your simvue account!
Or: set a folder, and delete the folder and its runs once the test completes.
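A rough sketch of that clean-up, assuming each test writes its runs under its own folder; the delete_folder call and its arguments are assumptions to be checked against the simvue client docs.

```python
# Hypothetical sketch: give each test its own folder and tidy it up afterwards
# so repeated test runs do not accumulate on the server.
import uuid

import pytest
import simvue


@pytest.fixture
def push_test_folder():
    _folder = f"/tests/push_runs/{uuid.uuid4().hex[:8]}"
    yield _folder
    _client = simvue.Client()
    # Assumed helper; adjust to whatever folder/run deletion the client exposes
    _client.delete_folder(_folder, remove_runs=True)
```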
    ],
    catch_exceptions=False
)
assert result.exit_code == 0, result.stdout
These tests should use the client to check that the appropriate number of runs have been created, and that the correct info is present in at least one of them.
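Something along these lines, for example; the get_runs filter syntax is an assumption to verify against the simvue client documentation.

```python
# Hypothetical sketch: after the CLI invocation, query the server and check
# that the expected number of runs landed in the test folder.
import simvue


def assert_runs_created(folder: str, expected: int) -> None:
    _client = simvue.Client()
    # Filter string format is assumed; adjust to the client's filter syntax
    _runs = list(_client.get_runs(filters=[f"folder.path == {folder}"]))
    assert len(_runs) == expected, f"expected {expected} runs, found {len(_runs)}"
```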