Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipestat interface suggestions #158

Closed
nsheff opened this issue Feb 21, 2024 · 5 comments
Closed

Pipestat interface suggestions #158

nsheff opened this issue Feb 21, 2024 · 5 comments
Milestone

Comments

@nsheff
Copy link
Contributor

nsheff commented Feb 21, 2024

clutter in return value

It's distracting that every time I retrieve a value, I get all this other stuff (created time, modified time, record identifier):

psm["sample2"] = {"value": "hijklmn"}
>>> psm["sample2"]
{'value': 'hijklmn', 'pipestat_created_time': '2024-02-20 21:44:13', 'pipestat_modified_time': '2024-02-20 21:44:13', 'record_identifier': 'sample2'}

I would expect it to just say:

psm["sample2"] = {"value": "hijklmn"}
>>> psm["sample2"]
{'value': 'hijklmn''}

why can't retrieve use the class record identifier?

I can do this:

psm = pipestat.PipestatManager(
    record_identifier="sample1",
    results_file_path="analysis/temp.yaml",
    schema_path="analysis/seqcol_pipestat_schema.yaml",
)
psm.report({"value": "abcdefg"})

It knows to put it on the record id I started with. But then why doesn't this work?

psm.retrieve_one("value")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/pipestat.py", line 101, in inner
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/pipestat.py", line 713, in retrieve_one
    raise RecordNotFoundError(f"Record '{record_identifier}' not found")
pipestat.exceptions.RecordNotFoundError: Record 'value' not found

That's confusing to me

docs are wrong

following the tutorial, docs say this:
image

But I don't get None -- I get an exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/pipestat.py", line 101, in inner
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/pipestat.py", line 713, in retrieve_one
    raise RecordNotFoundError(f"Record '{record_identifier}' not found")
pipestat.exceptions.RecordNotFoundError: Record 'sample1' not found

There are many other problems with the tutorial. Someone needs to actually go through it and run it. https://pep.databio.org/pipestat/code/python-tutorial/

Reporting result is returning a list

psm.report({"value": "arg"}, record_identifier="sample2")
["Reported records for 'sample2' in 'refget' :\n - value: arg", "Reported records for 'sample2' in 'refget' :\n - pipestat_created_time: 2024-02-20 21:53:22", "Reported records for 'sample2' in 'refget' :\n - pipestat_modified_time: 2024-02-20 21:53:22"]
@nsheff nsheff added this to the v0.9.0 milestone Mar 12, 2024
@donaldcampbelljr
Copy link
Contributor

Regarding clutter in return value:
The above example (>>> psm["sample2"]) is a simple wrapper around retrieve_one:

def __getitem__(self, key):
# This is a wrapper for the retrieve function:
result = self.retrieve_one(record_identifier=key)
return result

retrieve_one currently returns everything for that record unless you specify result identifiers.

@donaldcampbelljr
Copy link
Contributor

Regarding

why can't retrieve use the class record identifier?

With the above example, psm.retrieve_one("value") , pipestat thinks that value is the record_identifier.

image

However, if I attempt to specify the arguments, result = psm.retrieve_one(result_identifier="value"), I see that there is a problem and pipestat is not pulling the class record_identifier appropriately.

psm1 = PipestatManager(record_identifier="RECORD1", schema_path="sample_output_schema.yaml", results_file_path="results.yaml", pipeline_type="sample")
psm1.report(values={"number_of_things": 100})
result = psm1.retrieve_one(result_identifier="number_of_things")

Error

  File "/home/drc/PythonProjects/pipestat/opt_dependencies/venv6/lib/python3.10/site-packages/pipestat/pipestat.py", line 708, in retrieve_one
    raise RecordNotFoundError(
pipestat.exceptions.RecordNotFoundError: Results '['number_of_things']' for 'None' not found

@donaldcampbelljr
Copy link
Contributor

Regarding

Reporting result is returning a list

It appears that this was intentional: acf3a54

We wanted to return formatted results. For example, I know that PyPiper takes this returned formatted value and logs it. Is there an issue with it being a list? Should we just concatenate the formatted results into one big string?

@nsheff
Copy link
Contributor Author

nsheff commented Apr 4, 2024

We wanted to return formatted results.

I think the return value of the report function should not be a list of strings. It should be some recognition about what was reported that is machine-understandable. There could be a separate function to take that programmatic output and produce a formatted string if that's what someone wants.

generally you want a function to return something that can be used more universally, unless the whole point of the function is just to format a string.

@donaldcampbelljr
Copy link
Contributor

I've made associated issues for initial comment, so I will mark this specific issue as solved (close it) and work on the child issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants