Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: peppy.Project.config.to_yaml() doesn't accurately reproduce original config file #399

Closed
nleroy917 opened this issue Aug 18, 2022 · 3 comments · Fixed by #489

Comments

@nleroy917
Copy link
Member

Overview

Over at PEPhub, I am implementing a feature to download a PEP as a zipped folder. To do this, I am recapitulating the PEP using the internal state of the peppy.Project object returned from pepagent and the database. However, the peppy.Project.config object saves file path configurations as relative to where the peppy.Project object was instantiated, rather than the actual value in the file. This becomes problematic when downloading PEPs since the file path in the project_config.yaml file and the actual file location are now out of sync. Here is an example:

Example

The config file:

cat examples/demo/basic/project_config.yaml
pep_version: "2.0.0"
sample_table: sample_table.csv

At the root of pephub (/):

>>> import peppy
>>> p = peppy.Project("examples/demo/basic/project_config.yaml")
>>> p.config
pep_version: 2.0.0
sample_table: examples/demo/basic/sample_table.csv

Inside the PEP folder (/examples/demo/basic/project_config.yaml):

>>> import peppy
>>> p = peppy.Project("project_config.yaml")
>>> p.config
pep_version: 2.0.0
sample_table: sample_table.csv

Note how the sample_table attribute is changing depending on where the PEP was instantiated

I suspect peppy uses this state internally to initialize things and it was never intended that the original config file would need to be reproduced. This is less of a bug, I guess, and more of a feature request. I could manually update the internal state of peppy.Project and just strip the relative paths using os.path.basename, but it would be nice to have a method that can recreate the PEP assuming all files are next to one another.

Supplemental Material :D

Here is the zipping function where I build back up the PEP:

def zip_pep(project: peppy.Project) -> Response:
    """Zip a project up to download"""
    content_to_zip = {}

    if project.config:
        cfg_filename = basename(project.config_file)
        content_to_zip[cfg_filename] = project.config.to_yaml()
    if project.sample_table is not None:
        sample_table_filename = basename(project.to_dict().get('sample_table', "sample_table.csv"))
        content_to_zip[sample_table_filename] = project.sample_table.to_csv()
    if project.subsample_table is not None:
        # sometimes the subsample table is a list. So change behavior
        # based on this
        if not isinstance(project.subsample_table, list):
            subsample_table_filename = basename(project.to_dict().get('subsample_table', "subsample_table.csv"))
            content_to_zip[subsample_table_filename] = project.subsample_table.to_csv()
        else:
            subsample_table_filenames = project.to_dict().get('subsample_table', "subsample_table.csv")
            for sstable, sstable_filename in zip(project.subsample_table, subsample_table_filenames):
                subsample_table_filename = basename(sstable_filename)
                content_to_zip[subsample_table_filename] = **sstable.to_csv()**
@nsheff
Copy link
Contributor

nsheff commented Aug 26, 2022

Hmm.

I'm a little surprised the peppy adjusts the content in the config file, to be honest.

I think instead, peppy should use an internal value where it keeps track of where it actually finds the sample tables, rather than updating the config. @stolarczyk can you comment on this?

@khoroshevskyi with your experience with all the to_dict stuff, can you comment on this?

@khoroshevskyi
Copy link
Member

khoroshevskyi commented Aug 26, 2022

Hmm.

I'm a little surprised the peppy adjusts the content in the config file, to be honest.

I think instead, peppy should use an internal value where it keeps track of where it actually finds the sample tables, rather than updating the config. @stolarczyk can you comment on this?

@khoroshevskyi with your experience with all the to_dict stuff, can you comment on this?

to_dict function will take all the content of _config and _sample_df variables, it won't change anything. But I agree with @nleroy917 , that this variables are changing during peppy processing.
It won't influence on methods that are initializing peppy from dict, but it will initial paths, that can cause errors in the future (e.g. in this Nathans issue).

@stolarczyk
Copy link
Member

I think instead, peppy should use an internal value where it keeps track of where it actually finds the sample tables, rather than updating the config. @stolarczyk can you comment on this?

Yeah, this seems like a sensible approach. I'm not sure what was my motivation for implementing it the other way. I would have done what you suggest if this issue had manifested earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants