Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retrieving just one result /attribute #72

Closed
nsheff opened this issue Sep 6, 2023 · 10 comments
Closed

retrieving just one result /attribute #72

nsheff opened this issue Sep 6, 2023 · 10 comments
Assignees
Labels
Milestone

Comments

@nsheff
Copy link
Contributor

nsheff commented Sep 6, 2023

In my hands, when I use retrieve with a database backend, giving it a result_identifier has no effect.

It always gives me back the entire sample, not just the particular attribute I am requesting.

@nsheff nsheff changed the title retrieving just one retrieving just one result /attribute Sep 6, 2023
@nsheff nsheff added bug Something isn't working priority-high labels Sep 8, 2023
@donaldcampbelljr
Copy link
Contributor

donaldcampbelljr commented Sep 8, 2023

I am unable to replicate this issue on my machine using the pipestat_table branch. I tried sample and project level reporting and retrieving for a database backend.

output:

Results when using result_identifier = "project_integer"

100

Results when not using result_identifier

{'project_string': 'misc string to report', 'project_integer': 100, 'project_name': 'random_project_id_2', 'id': 2}

Commands:

    results8 = psm2.retrieve(project_name="random_project_id_2", result_identifier="project_integer",
                             pipeline_type='project')
    print(results8)

    results9 = psm2.retrieve(project_name="random_project_id_2",
                             pipeline_type='project')
    print(results9)

@nsheff
Copy link
Contributor Author

nsheff commented Sep 10, 2023

I'm using that branch, for a sample pipeline,and psm.retrieve("mysample", "result_id") returns the entire thing; the same as psm.retrieve("mysample").

So I'm doing: psm.retrieve("mysample")["result_id"] and that works,but I should be able to do psm.retrieve("mysample", "result_id").

I guess I will either create a MWE or just fix it

@nsheff
Copy link
Contributor Author

nsheff commented Sep 10, 2023

I'm using database backend

@nsheff
Copy link
Contributor Author

nsheff commented Sep 10, 2023

fascinating... psm.retrieve("mysample", result_identifier="result_id") works.... so it only works if I name that second arg. but it should be working without it being named....

@nsheff
Copy link
Contributor Author

nsheff commented Sep 10, 2023

Here's a MWE:

bbc = BedBaseConf("bedbase.org/config/bedbase2.yaml")

bbc.bed.retrieve("78c0e4753d04b238fc07e4ebe5a02984")  # gives the entire sample

bbc.bed.retrieve("78c0e4753d04b238fc07e4ebe5a02984", "bedfile")  # doesn't work, gives the entire sample

bbc.bed.retrieve("78c0e4753d04b238fc07e4ebe5a02984", result_identifier="bedfile")  # works

@nsheff
Copy link
Contributor Author

nsheff commented Sep 10, 2023

Ok I figured it. it was very confusing, but the retrieve function on the main PipestatManager class has a different signature from the one on the backend. I didn't realize that, and I was expecting it to work like that.

We will need to change that signature. I suggest:

  • we PipestatManager.retrieve expect record_identifer instead of sample_name and project_name, so it can match the backend signature.
  • we eliminate the _get_record_identifier concept
  • we could handle doing both types using a separate function, if necessary (like, retrieve_project to access project-level attributes, instead of overloading retrieve)

@donaldcampbelljr
Copy link
Contributor

The input arguments order doesn't match across the pipestat.py retrieve function and the dbbackend version. I assume that's what's causing issues?

@donaldcampbelljr
Copy link
Contributor

I'll check the other functions signatures on Monday to ensure consistency.

@nsheff
Copy link
Contributor Author

nsheff commented Sep 10, 2023

The input arguments order doesn't match across the pipestat.py retrieve function and the dbbackend version. I assume that's what's causing issues?

sort of -- it's not so much the order, it's that the signature is not the same (the arguments differ).

this will require some rethinking about how to manage project vs sample records, though.

donaldcampbelljr added a commit that referenced this issue Sep 20, 2023
@donaldcampbelljr
Copy link
Contributor

I did another pass on the signatures along with some other polishing. This should now be working as expected.

@donaldcampbelljr donaldcampbelljr added this to the v0.6.0 milestone Oct 5, 2023
donaldcampbelljr added a commit that referenced this issue Dec 22, 2023
* Add get_records to pipestat_manager and add related test #75

* clean up signatures #72

* clean up docstrings

* remove superfluous get_table_name function

* Reduce get_orm and get_model into one function. #71

* Fix CLI to use record-identifier instead of sample-name

* Fix CLI to use pipeline_type #37

* modify get_records to return new structure #75

* fix html report generation using get_records

* add simple CLI test for reporting and retrieving #81

* Pipestat reader (#80)

* update version in prep for pipestat table

* add table function

* add initial stats function from looper, remove counter

* fixing _create_stats_summary and get_file_for_project

* fix sample_level stats reporting

* add object reporting

* remove redundancies

* lint

* add assertion for table generation to pre-existing test

* use pipeline_type when retrieving samples

* Fix stats table generation output to look better

* fix return types

* fix list vs List

* fix typo for stats table

* fix doc strings

* update func names for disambiguation

* fix key error issue in html_reports

* clean up

* adjust docstrings

* update docstrings

Co-authored-by: Nathan Sheffield <nsheff@users.noreply.github.com>

* adjust docstrings, rename html_reports_pipestat.py to reports.py

* adjust LOGGER.info string

* simplify objs and stats generation

* remove old todos

* remove sample_name, add pipeline_name to project field definitions

* work towards using project_name instead of sample_name from project-level pipelines

* fix check record bug, add better error message for pipeline type.

* add checks for pipeline types, lint

* fix get_samples with pipeline type

* fix fetch_pipeline_results

* add basic tests for project_level, fix associated bugs

* allow table creation to use project_name if applicable

* add passing project name to file_backend

* refactor to use r_id as input

* update doc strings

* more refactoring of records vs samples to be more general

* move table generation to reports.py

* allow status file dir path to be associated with config path OR the file path to ensure looper compatibility.

* add output_dir as a parameter for placing reports and stat files. Otherwise defaults to results_file dir or config_file dir

* add obtaining schema_path from pipestat config for Looper compatibility

* added select function to main pipestat class

* added select_txt and select_distinct function to main pipestat class

* added get_one_record function to main pipestat class #70

* Removing CLI req that one must supply config and schema because the user can supply schema within the config. For Looper compatibility.

* Revert "added get_one_record function to main pipestat class #70"

This reverts commit b477060.

* Revert "added select_txt and select_distinct function to main pipestat class"

This reverts commit 4588f4f.

* Revert "added select function to main pipestat class"

This reverts commit ee98145.

* Add basic glossary page in the form of a table. Addresses: pepkit/looper#290

* remove redundant tag

* add RecordNotFoundError across both backends #74

* Add PipelineTypeNotSuppliedError to hide dbbackend information #74

* Initial work towards a fastapi implementation of pipestat reader
#22

* Initial work to separate classes #78

* working implementation of child classes for reporting using DBBACKEND. Manual testing confirmed. pytest Tests broken. #78

* implement report for filebackend

* fix inheritance issue with pipeline_type

* typo

* major refactoring, many tests still broken #78

* Fix signatures for report and explicitly state input arguments.

* remove pipeline type check

* add projectpipestamanager and refactor project_name to record_identifier

* fix get_status

* fix tests

* fix all tests #78

* add todo

* code cleanup and remove getting_table_name

* clean up

* partial removal of pipeline_type logic from DB backend

* remove pipeline_type input arguments from File Backend

* clean up

* add simple class to wrap SamplePipestatManager and ProjectPipestatManager

* change pipestatmanager to mutable mapping #78

* lowercase attributes #78

* polish PipestatBoss #78

* env_vars for pipeline_type

* update docs

* working proof of concept to retrieve results #22

* update readme and add getting output schema

* add else catch for outptu schema and fix typo

* add ability to run with uvicorn if calling reader.py directly.

* add __init__.py and main func

* add more endpoints #22

* update endpoints, add Query

* allow for more complex filtering using post and pydantic models #22

* Add image and file endpoints #22

* lint

* Add basic arg parser to pass absolute path to config file

* Add endpoint for get_records()

* clean up

* modify get_records to return new structure #75

* fix html report generation using get_records

* attempt global creation of psm, does not work

* fix creation of global psm

* change fetching by filetype

* add cli option for pipestat serve

* simplify and fix pipestat serve #22

* add host and port arguments #22

* add config error

* update readme

---------

Co-authored-by: nsheff <nsheff@users.noreply.github.com>

* list to List

* fixed typing in python3.8

* update changelog

* fix table_name_bug

* "0.5.4a1" for alpha release

* "0.6.0a1" for alpha release

* Dev json schema (#87)

* first pass to allow for output_schema as a true JSON schema #85

* remove unused import

* simplify logic, extend _safe_pop_one_mapping to handles multiple keys

* disambiguation key vs keys in parsed schema

* make samples and project objects instead of arrays, add more test assertions, clean up docstrings.

* add status data to string representation of ParsedSchema

* add retrieve_distinct function for #70 and databio/bedhost#75

* add `pipestat link` functionality to create symlinks (#90)

* begin work on file inking via pipestat

* continue implementing link for an output_dir OR the results_file.yaml
#89

* implemented symlinks for filebackend given output_dir
#89

* implemented symlinks for filebackend using results.yaml, polish tests
#89

* fix typo, clean up

* more lcean up

* begin rework to be backend agnostic

* refactor to place in folders specific to a result_identifier

* add more complex types, add temp directory for better testing

* fix recursive finding of paths

* clean up, move to abstract class, confirm works for both backends

* remove unused test files

* add jinja2 to requirements-all.txt #91

* complex example with collision

* add warning

* add cli option

* remove unused functions

* update changelog

* clean up docstrings, remove unused variables, refactor link_dir

* clean up more docstrings

* remove unnecessary print statement to reduce verbosity

* Begin work on adding created and modified datetime, DateTime objects are not parsing correctly, tests broken #93

* fix datetime objects #93

* fix pipeline_name field beginning with underscore #94

* timestamp continuation, list_recent_results does NOT work

* fix doc string for schema

* add filtering by created and modified time #93

* update version for alpha release v0.6.0a2

* fix list reversing #93

* polish tests #93

* add filebackend list_recent_results, remove records broken #93

* fix remove records bug #93

* update doc strings #93

* update change log for list_recent_results

* begin work on retrieve_multiple #96

* add db backend, test broken #96

* fix db backend logic #96

* implement file backend #96

* fix docstrings

* some minor polishing

* update readme

* move pipestat reader per #22

* fix pipestat reader setup and manifest per #22

* Add optional dependencies via setup #22

* pipestat can now run without DB specific dependencies #22

* Some refactoring #64

* add other requirements to test

* refactor to have ParsedSchemaDB inherit from ParsedSchema for code reduction #64

* Clean up

* return total record count #75

* make initialization of file and DB backend into separate functions

* add decorator for checking dependencies

* polishing decorator for checking dependencies

* consolidate decorator and ancillary function into one

* first pass #99, some tests broken

* fix all tests #99

* refactor store to cfg #99

* fix bug with table path

* setitem is now a wrapper for report

* fix getitem implementation, add test #99

* Begin work on select_records, retrieve_one, retrieve_many #103

* Implement select_records, and cursor pagination #103

* clarify init message

* Update dbbackend.py

* add loading JSON and column.contains for filtering #103

* testing differences

* add or_ and and_ sqlmodel operators #103

* some comments

* steps toward smarter pytesting

* #103 -- added multi key filter

* typo

* polish changes to select_records and add tests. #103

* fix typo

* remove Union for python 3.8 compat

* move get_nested_column and define_sqlalchemy_type outside of select_records

* update doc strings

* Modify to be true JSONs chema with using $ref and $def for custom objects (image, file) #85

* Update pipestat/backends/abstract.py

Co-authored-by: Oleksandr <sasha99250@gmail.com>

* Update pipestat/backends/db_backend/dbbackend.py

Co-authored-by: Oleksandr <sasha99250@gmail.com>

* Update pipestat/backends/db_backend/dbbackend.py

Co-authored-by: Oleksandr <sasha99250@gmail.com>

* Update pipestat/pipestat.py

Co-authored-by: Oleksandr <sasha99250@gmail.com>

* Update pipestat/pipestat.py

Co-authored-by: Oleksandr <sasha99250@gmail.com>

* Update tests/data/sample_output_schema.yaml

Co-authored-by: Oleksandr <sasha99250@gmail.com>

* Polish PR, fix docstrings, add wrapper for remove

* remove unnecessary mutablemapping inheritance

* lint

* cleaning

* fixed failing test

* remove old, unused validate_schema function

* change ALL output_schemas to reflect JSON schema

* lint

* per request, remove select function as well as dynamic_filter

* add to do

* revert PipestatManager to a MutableMapping

* remove select_txt and select_distinct

* revert select_distinct for now

* remove unused backend funcs, tests broken

* implement list_recent_results wrapper (DBBackend only)

* implement list_recent_results wrapper (DBBackend only)

* some polish, begin implementing select_records

* more work towards select_records

* change operator func, add list comprehension for sets

* implement AND OR Logic and fix tests, add sorting

* fix time stamp filtering if NOT nested

* lint and fix db test

* attempt to pare down results based on columns, does not work.

* add column filtering for filebackend, add tests, add no filter condition retrieving all records on filebackend

* fix complex object filtering

* implement select_distinct for filebackend

* select_dsitinct now works on record identifiers for filebackend, polish tests

* fix retrieve_one and retrieve_many for filebackend

* fix timestamp test for filebackend

* use isinstance for type checking

* refactored for efficiency

* fix recursive schema after merge

* remove get_records, fix return type for select_records if given columns

* begin remove of get_one_record

* deleted get_one_record function

* fix get_status on dbbackend using select_records

* remove retrieve from dbbackend and reports

* replace retrieve with retrieve_one and begin replacing retrieve with select_records

* lint

* fix select_records filebackend bug, remove retrieve from tests

* finish removing retrieve function

* attempt re-implementing record not found error for retrieve_one and retrieve_many

* fix bugs with select_records and RecordNotFound, implement ColumnNotFound exception

* lint

* update changelog and version for 0.6.0a3 release

* refactoring based on ruff

* update cli and usage and api docs, lint

* update documentation focused on schemas

* add result_identifier as parameter to retrieve_one

* add result_identifier as parameter to retrieve_many

* __version__ = "0.6.0a4"

* Finish adding pytest.mark.skipif

* lint

* move general tests to test_pipestat, remove some redundancy

* implement clear_status pytest for filebackend #110

* polish test set_item, report_overwrite

* add pytest fixture val_dict

* add pytest fixture values_sample, values_project

* add pytest fixture values_complex_linking, move fixtures to conftest.py

* deleted unnecessary print

* add test_select_no_filter_limit

* black formatting

* add pytest fixture for ranges

* lint

* fix record_identifier being returned twice if it is an input column

* fix typo

* change retrieve_one output to no longer we wrapped by select_records output

* fix schema propert to return psm.cfg["_schema"].to_dict()

* fix schema propert to return psm.cfg["_schema"]

* add psm.schema.original_schema to parsed schema

* add r_id = record_identifier or self.record_identifier to retrieve_one

* fix majority of tests due to retrieve_one

* lint

* fix table creation bug with retrieve_one

* remove RecordNotFoundError because it breaks pypiper

* fix parsed_schema json schema parsing to be correct hierarchy

* update jupyter api docs

* update cli docs

* fix property access #113

* add retrieving multiple results using retrieve_one

* list to List

* add quickstart guide

* add psm.schema.resolved_schema

* remove unused function

* version bump to v0.6.0a5

* fix config_path property

* re-implement RecordNotFoundError

* be more selective about which exceptions to pass

* potential solution for #117

* version bump 0.6.0a6

* remove redundant pydantic

* fix pipestat reader dependency checking

* fix cli bug regarding env variables

* retrieve_one record_identifier defaults to None to allow for use with env variables, clean README.md

* further documentation polishing

* fix obtaining record_identifier from cfg file

* correct config.md

* correct pipestat_specification.md

* 0.6.0a7 prerelease

* Clarify return on select_records

* ensure requirements list yacman>= 0.9.2

* potential fix for #119

* change warning to debug for creating status table

* fix pipestat creating subdirectories before creating results file.

* First proof of concept for #120

* simply do replacement at beginning

* Fixed #106

* now functional, create aggregate_results.yaml which replaces self._data before reporting #120

* replace spaces in names with %20 during html link generation

* fix reported_objs bug

* expandpath for link_dir during pipestat link

* add tests for new aggregate_results functionality

* refactor db-backend to dbbackend

* fix stats and objects buttons for summary page #130

* fix stats table on summary page #130

* fix broken status if multi_results_files #130

* fix log and profile files #130

* fix tsv columns #130

* Add check for pipeline type during report generation #130

* refactor Sample to Record #130

* fix refactor for sample page

* fix stats.tsv and objects.yaml generation to incorporate pipeline types

* add project_objects page #130

* minor polishing

* remove temp test

* lint

* add passing looper's samples to pipestat summarize to populate table

* remove unused code

* version bump for pre-release v0.6.0a9

* use "object_type" for image and file types or fall back on using "type", add try except

* ensure project level objects are accurate on summary/index page

* remove project name from stats and object file names

* update changelog

* 0.6.0a10 bump for pre-release

* allow project-level figures to display as columns instead of rows

* 0.6.0a11 pre-release

* v0.6.0 release prep

* lint

---------

Co-authored-by: nsheff <nsheff@users.noreply.github.com>
Co-authored-by: Khoroshevskyi <sasha99250@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants