@wk9874 commented Sep 17, 2025

No description provided.

Author

@wk9874 left a comment

Need to add detail to the README and raise a PR in the docs repo.

Also, can we add Parquet support?

@@ -87,3 +87,6 @@ lint = [

[tool.mypy]
ignore_missing_imports = true

[tool.uv.sources]
simvue = { git = "https://github.com/simvue-io/python-api", branch = "dev" }
Author

Need to remember to change this back once a new PyPI version of the Python API is released.

from .config import get_url_and_headers
from .push import PushDelimited
Author

Why are PushJSON and PushDelimited imported differently? They should be consistent.

)


def push_json_metadata(
Author

Surely we shouldn't have code repetition here, with push_delim_metadata being almost identical except for the class used. Can we use a factory, or just an input_type parameter and if/elif statements, to select the appropriate class?
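A minimal sketch of the input_type suggestion, for comparison. It assumes each push class can be built with no arguments and exposes `load_from_metadata(input_file, folder=...)`, as the `_push_class` call in this diff suggests; the stand-in class bodies, the `_PUSH_CLASSES` registry and `push_metadata` are hypothetical, not the PR's code.

```python
import pathlib


# Hypothetical stand-ins for the PR's push classes: only the method name and
# the keyword-only ``folder`` argument mirror the diff; the bodies are placeholders.
class PushJSON:
    def load_from_metadata(self, input_file: pathlib.Path, *, folder: str) -> str:
        return f"json:{input_file.name}->{folder}"


class PushDelimited:
    def load_from_metadata(self, input_file: pathlib.Path, *, folder: str) -> str:
        return f"delimited:{input_file.name}->{folder}"


# Registry mapping an input type to the class that handles it.
_PUSH_CLASSES: dict = {
    "json": PushJSON,
    "csv": PushDelimited,
    "tsv": PushDelimited,
}


def push_metadata(input_file: pathlib.Path, *, folder: str, input_type: str) -> str:
    """Select the push class for ``input_type`` and load run metadata with it."""
    try:
        _push_class = _PUSH_CLASSES[input_type]()
    except KeyError:
        raise ValueError(f"Unsupported input type: {input_type!r}") from None
    return _push_class.load_from_metadata(input_file, folder=folder)
```

With this layout, supporting another format means one new class plus one registry entry, rather than another near-duplicate wrapper function.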

Contributor

No, as this would add a lot of overhead and reduce readability; a factory here would be superfluous.

return _push_class.load_from_metadata(input_file, folder=folder)


def push_json_runs(
Author

Again, this is very repetitive of the function above. Can some of this be pulled out into a common setup function that all of these functions use?

Contributor

Sometimes repetition is the more readable solution; this function is not that long and clearly shows that a different reader is used.

Author

But it makes the code less maintainable: in the future, if we have a lot of file formats and want to add a new parameter to the CLI, we will need to update every one of these functions.

We already have the if/elif logic in the CLI, based on the suffix of the input file. I don't see why this couldn't be one function that the relevant class is passed into; it would simplify things here.
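A sketch of the single-function idea, assuming the CLI keeps its existing suffix-based if/elif and simply passes the chosen class on. `PushAPI` here is a stand-in, and `select_push_class`, `push_runs`, the `load_runs` method and the `ttl` option are all hypothetical, used only to show that a new CLI parameter would be threaded through one function instead of one per format.

```python
from __future__ import annotations

import pathlib


class PushAPI:
    """Hypothetical stand-in for the PR's base class."""

    def __init__(self, *, ttl: int | None = None) -> None:
        # Options shared by every format are stored once, on the base class.
        self.ttl = ttl

    def load_runs(self, input_file: pathlib.Path, *, folder: str) -> dict:
        # Placeholder: report what would be pushed instead of talking to a server.
        return {"reader": type(self).__name__, "file": input_file.name,
                "folder": folder, "ttl": self.ttl}


class PushJSON(PushAPI):
    pass


class PushDelimited(PushAPI):
    pass


def select_push_class(input_file: pathlib.Path) -> type[PushAPI]:
    # Mirrors the suffix-based if/elif the CLI already performs.
    suffix = input_file.suffix.lower()
    if suffix == ".json":
        return PushJSON
    elif suffix in {".csv", ".tsv"}:
        return PushDelimited
    raise ValueError(f"Unsupported file type: {suffix!r}")


def push_runs(push_class: type[PushAPI], input_file: pathlib.Path, *,
              folder: str, ttl: int | None = None) -> dict:
    # One entry point for every format: a new CLI option is added here only.
    _push_class = push_class(ttl=ttl)
    return _push_class.load_runs(input_file, folder=folder)
```

Usage: `push_runs(select_push_class(path), path, folder="/demo")`. Readers for different formats remain separate classes, so the "different reader is used" point is still visible at the call site.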

@@ -2036,5 +2038,111 @@ def get_artifact_json(ctx, artifact_id: str) -> None:
click.secho(error_msg, fg="red", bold=True)


@simvue.group("push")
Author

I'm not sure simvue push is an informative name for this feature.

I also think it should sit under the run group you already have, since it creates a set of runs.

Maybe simvue run create-batch or something similar? Not sure.
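A sketch of how the suggested name could be wired up with click's nested groups. The top-level `simvue` group and the `run` group below are stand-ins for the existing ones, and the `create-batch` command, its argument and its `--folder` option are hypothetical; the body only echoes what it would do.

```python
import click


@click.group()
def simvue() -> None:
    """Stand-in for the existing top-level CLI group."""


@simvue.group("run")
def simvue_run() -> None:
    """Stand-in for the existing run group."""


@simvue_run.command("create-batch")
@click.argument("input_file", type=click.Path(exists=True, dir_okay=False))
@click.option("--folder", default="/", show_default=True,
              help="Folder to create the runs in.")
def create_batch(input_file: str, folder: str) -> None:
    """Create one run per entry in INPUT_FILE (hypothetical command)."""
    # A real implementation would select a push class and upload the runs here.
    click.echo(f"Would create runs from {input_file} in {folder}")
```

This would be invoked as `simvue run create-batch runs.json --folder /demo`, keeping run creation alongside the other run subcommands.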

_folder.commit()

if not isinstance(_data, list):
raise ValueError("Expected JSON content to be a list.")
Author

Should we give the option of either a list of dicts or a dict of dicts? I.e., I can see some people having:

{
    "run_1": {
        "a": 10,
        "b": 20
    },
    "run_2": {
        "a": 15,
        "b": 25
    },
...
}

We could use each key as the run name and its value as the metadata.
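A minimal sketch of accepting both layouts by normalising them into (run name, metadata) pairs before any runs are created. `normalise_runs` is a hypothetical helper; leaving the name as `None` for list entries is an assumption about how unnamed runs would be handled.

```python
from __future__ import annotations

import json
from typing import Any


def normalise_runs(data: Any) -> list[tuple[str | None, dict[str, Any]]]:
    """Return (run_name, metadata) pairs from either accepted JSON layout."""
    if isinstance(data, dict):
        # Dict of dicts: each key becomes the run name.
        pairs = list(data.items())
    elif isinstance(data, list):
        # List of dicts: no name is supplied, so it is left unset.
        pairs = [(None, entry) for entry in data]
    else:
        raise ValueError("Expected JSON content to be a list or a dict of dicts.")
    for name, metadata in pairs:
        if not isinstance(metadata, dict):
            raise ValueError(f"Metadata for run {name!r} must be a JSON object.")
    return pairs


# The dict-of-dicts example from the comment above.
raw = '{"run_1": {"a": 10, "b": 20}, "run_2": {"a": 15, "b": 25}}'
runs = normalise_runs(json.loads(raw))
```

The downstream code then only ever sees one list of "packets", so the extra flexibility costs a single conversion step.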

Author

This is the JSON format I would naively expect data to be in, rather than a list of dicts.

Contributor

Given that we ourselves can decide this format, and that it is easier if there is just a list of "packets" to process, I would argue that enforcing the one form is best here.

Author

Tentatively agree, but the point of these loading functions is that they should be as flexible as possible, so that someone with a file of results does not have to bother fitting it into our format before upload (which I know will never be completely possible).

Contributor

Exactly: no one is going to happen to have an output that aligns with this anyway; they will have to restructure it regardless (hence the connectors helping).

class PushJSON(PushAPI):
@pydantic.validate_call
def load_from_metadata(
self, input_file: pydantic.FilePath, *, folder: str
Author

All the same comments as for the CSV parser above.

)
):
if _metrics := self._run_metrics.get(i):
sv_obj.Metrics.new(run=_id, metrics=_metrics).commit()
Author

This currently won't support multidimensional metrics, right? Do we want to support that?

Contributor

Not at this stage

assert result.exit_code == 0, result.stdout


def test_push_runs() -> None:
Author

We should probably set a TTL on these runs; otherwise, repeatedly running the tests will very quickly fill up your Simvue account!

Author

Or set a folder, and delete the folder and its runs once the test completes.
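A sketch of the folder-per-test approach. `temporary_test_folder` and the `/simvue_cli_tests` prefix are hypothetical, and the deletion step is injected as a callable so the sketch does not assume any particular Simvue client method; in a real test it would remove the folder together with its runs.

```python
import contextlib
import uuid
from typing import Callable, Iterator


@contextlib.contextmanager
def temporary_test_folder(
    delete_folder: Callable[[str], None], prefix: str = "/simvue_cli_tests"
) -> Iterator[str]:
    """Yield a unique folder path and always clean it up afterwards."""
    # A random suffix keeps concurrent or repeated test runs from colliding.
    folder = f"{prefix}/{uuid.uuid4().hex[:8]}"
    try:
        yield folder
    finally:
        # Runs even if the test body raises, so nothing is left behind.
        delete_folder(folder)
```

The test would pass the yielded folder to the CLI (e.g. via a folder option) and the cleanup happens on exit. A pytest fixture written with `yield` follows the same pattern.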

],
catch_exceptions=False
)
assert result.exit_code == 0, result.stdout
Author

These tests should use the client to check that the expected number of runs has been created and that the correct information is present in at least one of them.
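A sketch of the post-push checks, assuming the test has already fetched the runs in its folder through the Simvue client and turned them into dicts with a "metadata" key. That record shape and the `check_pushed_runs` helper are hypothetical; only the two checks themselves come from the comment above.

```python
from typing import Any, Mapping, Sequence


def check_pushed_runs(
    runs: Sequence[Mapping[str, Any]],
    expected_count: int,
    expected_metadata: Mapping[str, Any],
) -> None:
    """Assert the run count and that at least one run has the expected metadata."""
    assert len(runs) == expected_count, (
        f"expected {expected_count} runs, got {len(runs)}"
    )
    # Only the expected keys must match; runs may carry additional metadata.
    assert any(
        all(run.get("metadata", {}).get(key) == value
            for key, value in expected_metadata.items())
        for run in runs
    ), f"no run has metadata {dict(expected_metadata)}"
```

Checking a subset of keys keeps the test stable if the push code later adds extra metadata of its own.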

2 participants