Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.4.0 Pipestat Release #61

Merged
merged 81 commits into from
Jun 29, 2023
Merged

v0.4.0 Pipestat Release #61

merged 81 commits into from
Jun 29, 2023

Conversation

donaldcampbelljr
Copy link
Contributor

@donaldcampbelljr donaldcampbelljr commented Jun 23, 2023

This PR is for the release of Pipestat v0.4.0.

Milestone Changes and Tracked Issues can be found here:
https://github.com/pepkit/pipestat/milestone/1

Highlights include:

  1. Migrating from SQL Alchemy to SQLModel as the interface to the DB backend.
  2. Splitting backends into separate classes (File vs DB).

Outstanding To-dos:

  • Documentation must still be updated.

nsheff and others added 30 commits December 22, 2021 22:05
Remove attmap dependency
longer line length for formatter CI; add options
Adding clarification to tutorial docs for obtaining output yaml files.
Close #28
* improve error message when namespace can't be found; apply formatter; close #3

* only call helper when it's needed

* tweak main object docstring for better CLI help message

* restructure parser creation logic

* better isolation of branches of conditional expression

* cleaner setup of PSM config

* be more explicit that a test expectation is simply to not crash

* better test names and description

* better validation sequence of PSM ctor args; #3

* add exception subtype and test to solve #3

* export PipestatError at package namespace level

* simplify / condense sytnax

* collect more errors, avoid repetition

* tweak message

* add log info for reading the schema

* collect all reserved word violations

* first-pass at handling looper-like structure for schema; #20

* simplify syntax by using default default arg

* update test case for more specific error type; #3

* use name directly

* tweak implementation and fix syntax

* introduce sqlmodel

* simplify implementation

* simplify implementation

* temp, amid refactor to use SQLModel

* update schemas in test/sample files

* start testing ParsedSchema

* wave of parsed schema tests

* fix import path

* add more parsed schema testing

* more parsed schema testing

* finish happy path ParsedSchema tests

* repairing old tests

* using pydantic model to avoid SQLModel collection / list type restriction

It appears that SQLModel doesn't yet support collection types. More generally, there appears to be deliberate, or at least known, lack of support for types for which there's no matching SQLAlchemy type.
See 3rd feature from 0.0.7 release: https://sqlmodel.tiangolo.com/release-notes/#features
See related PR: tiangolo/sqlmodel#18

* pausing on sample-level handling; splitting off files from the scalars

* collecting ORMs

* add record identifier

* WORKING EXAMPLE with totally broken logic, but functional machinery

* include record identifier, and extend table models

use __table_args__={'extend_existing': True} to extend table models, such that subsequent calls to create_model aren't problematic.

* update expected test failure reason syntax, and constructor signature

* better test names

* remove ABC tag from PipestatError since we instantiate it in fact

* fix missing schema test

* update results file / namespace test

* remove connection tests

* fix results file 1:1 logic with namespace

* fix test expectation

* more test updates

* more namespace removal

* remove debugging; temp

* better parametrization scheme

* update what to ignore based on what can be left by interrupted tests

* update schema format based on merge of status and results

* start updating access to status table

* apply formatter

* relax constraint for data type for status subschema

* add primary key for status table

* add record identifier to status table

* remove unnecessary test nesting

* remove now-outdated reference to 'self' in what were member functions

* better organization and syntax in status test suite

* restructuring model builds and tests

* require project or sample data in schema

* add formatter to test reqs

* nulls rather than empty mappings; update schema structures

* reorganizing and cleaning up constants

* test for requirement of pipeline identifier

* add test for extra schema keys

* condense validation of schema, add tests stubs

* finishing overhaul of constants and minimization, finishing parsed schema tests and stubs for integration with PSM

* ironing out more read_yaml_data bugs, testing

* amid updating signatures to account for project / samples disambiguation

* Remove namespace and status_schema from PipestatManager creation to align with Class changes.

* Import constants from argparser for use in cli.py.

* Change psm.schema -> psm.schema.results_data due to changes in parsing schemas.

* Apply formatting.

* Add const import for ENV_VARS

* Add ability to determine project_level flag from config.yaml.Allow project_level to default to False and modified logic during table name acquisition to get either table_name__sample or table_name__project based on project_level flag. Apply formatting.

* Simplified 2 tests within pytest. Apply formatting.

* Resolve pytest failure: TestPipestatManagerInstantiation.test_missing_cfg_data

* Add force_overwrite=True for reporting test.

* Add "array" type to CLASSES_BY_TYPE to prevent failure during complex model creation. #40

* Apply _recursively_replace_custom_types to sample data to parse more complex objects. #40

* Add pytest for complex objects. #40

* Clean up pytest for complex/nested objects. Convert nested objects (dicts) to strings before pushing to db columns.  #40

* Capture 'namespace' from config file and add it as a separate column in the sample table.  #41

* Add database context manager such that the db tables are cleared between tests. #38

* Add comment to context manager class. #38

* Fix removal test. Check if db mode is false AND if results_file.yaml is supplied. #39

* Formatting. #39

* Allow for storing dict as JSONB. Implemented filtering of data within JSONB column using sqlmodel. #40

* Fix bug with context manager in pytests for DB connection. #38

* Added ability to parse collection of images as array[dict] and store as JSONB. #40

* Import contextmanager class into test_status.py to prevent pytest failures when testing db backend. #38

* Fix failures in TestNoRecordID. #38

* Fix runtime warning due to __table_args__. #49

* Apply formatting. #49

* Fixed issue with setting self.project_level and added pytest to report both sample and project level data. #37

* Refactor pytest constants.

* Refactor test_report_samples_and_project.

* Create deepcopy of values before reporting. #50

* Only import deepcopy function. #50

* Re-implement record_count. #44

* Re-implement highlighted_results and associated pytest. #45

* Move helper function into helpers file

* lint

* move select_value to helpers

* lint

* move constants to const See #46

* refactor and simplify PSM constructor

* Implement project and sample level disambiguation and fix associated PyTests #51

* move select_value method to yacman

* Reconcile refactoring changes with project_level disambiguation. #51

* Refactor project_level check into function. #51

* Apply formatting.

* updates on pipeline type

* Fix results_file.yaml case issue.

* increase black line length to 99

* black formatting

* Re-implement test_str_representation. #38

* simplify constructor further

* Decouple tests and ensure results_file.yaml is not persistent between tests. #38

* Implement xFail tests from test_init.py. #38

* Implement test_custom_status_schema #38

* Split backend classes (#55)

* Incomplete work toward splitting backend into classes

* go back...

* revert more

* revert more

* redo changes

* backend work

* file reporting works

* file backend work

* Fix typing import for python 3.10 #52

* Partial implementation of remove for file backend. #52

* Add check_which_results_exist and check_results_exist to file backend. #52

* Move check_results_exist to abstract class. #52

* Implement check_record_exists in file backend. #52

* Implement get_status and set_status in file backend. #52

* Implement clear_status in file backend. #52

* partial DBBackend report implementation.

* continue DBBackend report implementation.

* Implement retrieve for DBBackend. #52

* Implement removal for DBBackend. #52

* refactor check_record_exists and add to interface

* refactor check_which_results_exist to list_existing_results

* finish implementing remove for DB backend #52

* Add option to return all results for list_existing_results DB Backend

* Add option to return all results for list_existing_results File Backend

* consolidate report and report_db in DB Backend

* create remove_record functions for file and DB backends

* Add basic implementation for Pytesting new split classes.

* remove unused decorator

* simplify list results func

* clarify docstring

* rename to list_results

* rename some variables

* fix priority vars for check-result_exists

* lint

* simplify list_results for DB Backend by using list comprehension

* extend basic testing for split_backend, fix remove_record to update local file

* clean up report in pipestat manager, fix related pytests

* clean up remove and retrieve

* add pytest condition for DB backend

* Implement select function #52

* Implement select_txt function #52

* ReImplement record_count function

* Implement set_status

* Implement get_status

* Remove redundant code from pipestat.py

* move select_distinct to DB backend.

* remove redundant functions, clean up docstrings

* add raising exception if no backend specified

* Correct doc strings.

* remove ".new" usage.

* add PROJECT_NAME as constant

* Clean up pipestat context managers.

* lint

* add some type hints.

* implement require backend decorator

* no need for intermediate variable

* Change self[DATA_KEY] to self.data and align both parent and child classes to use attribute instead of attribute or item.

* fix pytest assertion

* replace status" with constant

* Change backend data attribute to _data

* Fix pipeline_type/table_name priority

* Remove select functions from pipestatmanager parent class

* Move initializing and loading results file to FileBackend.

* Refactor namespace and project name to pipeline_name.

* Refactor pipeline_id to pipeline_name, remove namespace property.

* Organize functions

* unify some properties retrievals

* standardize retrieve function. Fix #58.

* simplify field definitions, refactor namespace to project_name

* use const in field definitions

* Remove commented, unused code from backend

* Add and edit docstrings

* add some tests

* move orm and engine creation to DB backend.

* Split backend files for readability.

* Clean up redundant importing, re-organize functions and properties

---------

Co-authored-by: nsheff <nsheff@users.noreply.github.com>

---------

Co-authored-by: Vince Reuter <vince.reuter@gmail.com>
Co-authored-by: Donald C <125581724+donaldcampbelljr@users.noreply.github.com>
@donaldcampbelljr
Copy link
Contributor Author

Documentation has been updated.

@donaldcampbelljr donaldcampbelljr marked this pull request as ready for review June 27, 2023 19:30
@donaldcampbelljr
Copy link
Contributor Author

I set codecov to have a threshold of 80% and, for now, be informational only.

Copy link
Member

@khoroshevskyi khoroshevskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just small comments, but overall LGTM. I like new code structure, and it seems very readable:)

pipestat/__init__.py Outdated Show resolved Hide resolved
pipestat/argparser.py Show resolved Hide resolved
pipestat/backends/abstract.py Show resolved Hide resolved
pipestat/backends/dbbackend.py Show resolved Hide resolved
pipestat/backends/dbbackend.py Show resolved Hide resolved
pipestat/const.py Show resolved Hide resolved
pipestat/const.py Show resolved Hide resolved
pipestat/helpers.py Show resolved Hide resolved
pipestat/helpers.py Outdated Show resolved Hide resolved
pipestat/helpers.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment