pipestat should offer the option to include a history of results #177

Closed
donaldcampbelljr opened this issue Apr 4, 2024 · 4 comments · Fixed by #179

Comments

@donaldcampbelljr
Contributor

I flipped force_overwrite to default to True and will do the same in PyPiper. However, pipestat still needs to offer a history of results:

In the longer term, pipestat should offer the option to include a history of results, and these should be stored somehow in the file (and database). This may not actually be too hard to implement; just add a 'history' function, and when something is overwritten, just move the old values into the history in a way that is an array, rather than a single value. Then, pipestat could offer a clear history function to remove old stuff, if desired, but otherwise, repeated reports of the same result will simply add to the history.
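The proposal above can be sketched as follows. This is a minimal in-memory illustration, not pipestat's implementation: the names `results`, `history`, `report`, and `clear_history` are hypothetical and chosen only to mirror the wording of the proposal.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stores; none of these names are pipestat's API.
results: Dict[str, Any] = {}
history: Dict[str, List[Any]] = {}


def report(key: str, value: Any) -> None:
    """Report a result; an overwritten value is moved into the history array."""
    if key in results:
        history.setdefault(key, []).append(results[key])
    results[key] = value


def clear_history(key: str) -> None:
    """Drop the stored history for one result, keeping the current value."""
    history.pop(key, None)


# Repeated reports of the same result simply add to the history.
report("number_of_things", 100)
report("number_of_things", 200)
report("number_of_things", 300)
```

After the three reports, `results` holds only the latest value and `history` holds the two overwritten ones, until `clear_history` is called.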

Originally posted by @donaldcampbelljr in #161 (comment)

@donaldcampbelljr
Contributor Author

I am experimenting with different options in PR #178. A proof of concept is working for the file backend, but the results file grows quickly. I am considering moving the history to a separate .history.yaml file that sits alongside results.yaml so that it is less messy.

test_pipe:
  project: {}
  sample:
    pypiperRecordIdentifier1:
      number_of_things: 300
      pipestat_created_time: '2024-04-04 14:16:54'
      pipestat_modified_time: '2024-04-04 14:16:54'
    RECORD1:
      number_of_things: 50000
      pipestat_created_time: '2024-04-04 17:23:56'
      pipestat_modified_time: '2024-04-04 18:28:59'
      name_of_something: Another_Name
      history:
        number_of_things:
          '2024-04-04 18:28:43':
            reported_result: 100
          '2024-04-04 18:28:58':
            reported_result: 50000
        pipestat_modified_time:
          '2024-04-04 18:28:43':
            reported_result: '2024-04-04 18:28:43'
          '2024-04-04 18:28:58':
            reported_result: '2024-04-04 18:28:58'
          '2024-04-04 18:28:59':
            reported_result: '2024-04-04 18:28:59'
        name_of_something:
          '2024-04-04 18:28:43':
            reported_result: Test_Name
          '2024-04-04 18:28:59':
            reported_result: Another_Name
    RECORD2:
      number_of_things: 300
      pipestat_created_time: '2024-04-04 17:23:56'
      pipestat_modified_time: '2024-04-04 18:28:56'
      name_of_something: Test_Name_Changed...Again
      history:
        number_of_things:
          '2024-04-04 18:28:45':
            reported_result: 100
          '2024-04-04 18:28:50':
            reported_result: 200
          '2024-04-04 18:28:54':
            reported_result: 300
        pipestat_modified_time:
          '2024-04-04 18:28:45':
            reported_result: '2024-04-04 18:28:45'
          '2024-04-04 18:28:48':
            reported_result: '2024-04-04 18:28:48'
          '2024-04-04 18:28:50':
            reported_result: '2024-04-04 18:28:50'
          '2024-04-04 18:28:52':
            reported_result: '2024-04-04 18:28:52'
          '2024-04-04 18:28:54':
            reported_result: '2024-04-04 18:28:54'
          '2024-04-04 18:28:56':
            reported_result: '2024-04-04 18:28:56'
        name_of_something:
          '2024-04-04 18:28:48':
            reported_result: Test_Name
          '2024-04-04 18:28:52':
            reported_result: Test_Name_Changed
          '2024-04-04 18:28:56':
            reported_result: Test_Name_Changed...Again
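The layout in the example above, where each result has history entries keyed by timestamp with a `reported_result` field, could be produced by something like the sketch below. The function name `report_with_history` and its signature are assumptions for illustration; only the key names `history` and `reported_result` come from the example.

```python
from typing import Any, Dict

HISTORY_KEY = "history"  # key name taken from the example above


def report_with_history(
    record: Dict[str, Any], result: str, value: Any, timestamp: str
) -> None:
    """Set the current value and log it under history[result][timestamp]."""
    record[result] = value
    per_result = record.setdefault(HISTORY_KEY, {}).setdefault(result, {})
    per_result[timestamp] = {"reported_result": value}


# Rebuild the number_of_things history of RECORD2 from the example.
record: Dict[str, Any] = {}
report_with_history(record, "number_of_things", 100, "2024-04-04 18:28:45")
report_with_history(record, "number_of_things", 200, "2024-04-04 18:28:50")
report_with_history(record, "number_of_things", 300, "2024-04-04 18:28:54")
```

Because every report adds one entry per result (and, in the example, one for pipestat_modified_time as well), the growth of the results file is easy to see from this structure.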

@donaldcampbelljr
Contributor Author

For now, I'm continuing with the above approach for the file backend and have added a retrieve_history function, which uses retrieve_one.
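As a rough idea of how a history lookup can be layered on a single-record lookup, here is a sketch over a plain dictionary. The signatures below are assumptions and may not match the functions in the PR; `DATA` is toy data in the layout of the earlier example.

```python
from typing import Any, Dict, Optional

# Toy data in the layout from the example earlier in the thread.
DATA: Dict[str, Dict[str, Any]] = {
    "RECORD1": {
        "name_of_something": "Another_Name",
        "history": {
            "name_of_something": {
                "2024-04-04 18:28:43": {"reported_result": "Test_Name"},
                "2024-04-04 18:28:59": {"reported_result": "Another_Name"},
            }
        },
    }
}


def retrieve_one(record_identifier: str) -> Dict[str, Any]:
    """Stand-in for a single-record lookup; raises KeyError if absent."""
    return DATA[record_identifier]


def retrieve_history(
    record_identifier: str, result: Optional[str] = None
) -> Dict[str, Any]:
    """Return a record's whole history, or the history of one result."""
    full = retrieve_one(record_identifier).get("history", {})
    return full if result is None else full.get(result, {})
```

Calling `retrieve_history("RECORD1", "name_of_something")` returns the two timestamped entries, and a result with no history yields an empty dictionary.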

@donaldcampbelljr
Contributor Author

Currently, deleting a result is recorded in its history like this:

        name_of_something:
          '2024-04-04 18:28:43':
            reported_result: Test_Name
          '2024-04-04 18:28:59':
            reported_result: Another_Name
          '2024-04-04 18:59:29':
            reported_result: Another_Name
          '2024-04-04 20:02:40':
            reported_result: Another_Name
          '2024-04-04 20:03:28':
            reported_result: Another_Name
          '2024-04-04 20:05:54':
            deletion: ''

However, if the record itself is removed (which happens when only the history, pipestat_created_time, and pipestat_modified_time are left), its history is removed along with it.
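The deletion behaviour described here might look like the following sketch. `remove_result` and `BOOKKEEPING` are hypothetical names; the `deletion: ''` marker and the key names follow the examples in this thread.

```python
from typing import Any, Dict

# Keys that do not count as reported results (names follow the YAML example).
BOOKKEEPING = {"history", "pipestat_created_time", "pipestat_modified_time"}


def remove_result(
    data: Dict[str, Dict[str, Any]], record_id: str, result: str, timestamp: str
) -> None:
    """Delete one result, log a deletion marker, drop the record if empty."""
    record = data[record_id]
    record.pop(result, None)
    per_result = record.setdefault("history", {}).setdefault(result, {})
    per_result[timestamp] = {"deletion": ""}
    # If only bookkeeping keys remain, the record and its history both go.
    if set(record) <= BOOKKEEPING:
        del data[record_id]


data: Dict[str, Dict[str, Any]] = {
    "RECORD1": {
        "number_of_things": 50000,
        "name_of_something": "Another_Name",
        "pipestat_created_time": "2024-04-04 17:23:56",
        "pipestat_modified_time": "2024-04-04 18:28:59",
    }
}

remove_result(data, "RECORD1", "name_of_something", "2024-04-04 20:05:54")
marker = data["RECORD1"]["history"]["name_of_something"]["2024-04-04 20:05:54"]

# Removing the last real result removes the record, history included.
remove_result(data, "RECORD1", "number_of_things", "2024-04-04 20:06:10")
record_still_present = "RECORD1" in data
```

The second timestamp is made up for the example. After the first call the record survives with a deletion marker; after the second nothing but bookkeeping keys would remain, so the whole record is dropped.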

@donaldcampbelljr
Contributor Author

I am currently working on the db_backend. It appears that we will also need to delete a record's history when the primary record is removed (similar to the file backend) because of a foreign key constraint:

Could not remove the result from the database. Exception: (psycopg.errors.ForeignKeyViolation) update or delete on table "default_pipeline_name__sample" violates foreign key constraint "default_pipeline_name__sample_history_source_record_id_fkey" on table "default_pipeline_name__sample_history"

However, I'm operating under the assumption that this is desirable anyway.
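One common way to get this behaviour from the database itself is a cascading foreign key. The sketch below uses SQLite from the Python standard library rather than PostgreSQL, purely so it runs without a server; the table names are shortened and the `source_record_id` column is inferred from the constraint name in the error message, so treat the schema as an assumption rather than pipestat's actual one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute(
    "CREATE TABLE sample (id INTEGER PRIMARY KEY, number_of_things INTEGER)"
)
# ON DELETE CASCADE removes history rows when their source record is deleted.
conn.execute(
    "CREATE TABLE sample_history ("
    " id INTEGER PRIMARY KEY,"
    " source_record_id INTEGER NOT NULL"
    "   REFERENCES sample(id) ON DELETE CASCADE,"
    " number_of_things INTEGER)"
)

conn.execute("INSERT INTO sample (id, number_of_things) VALUES (1, 300)")
conn.executemany(
    "INSERT INTO sample_history (source_record_id, number_of_things)"
    " VALUES (?, ?)",
    [(1, 100), (1, 200)],
)

# Deleting the primary record no longer violates the constraint.
conn.execute("DELETE FROM sample WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM sample_history").fetchone()[0]
```

Without the cascade clause, the same delete would fail with a foreign key error much like the one quoted above. Deleting the history rows explicitly before the primary record is an alternative with the same end result.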
