0.3.0 roadmap #48

Closed
35 tasks done
golobor opened this issue Aug 3, 2020 · 6 comments

Comments

@golobor
Member

golobor commented Aug 3, 2020

TODO:

API

  • decide if we want to expose more from bioframe.core.*

Cosmetic:

  • master branch name → main
  • old region.py split into core.stringops, core.construction
  • delete old util.py
  • delete dask.py
  • rename genomeops.py to utils.py
  • docstrings should all have a one-sentence summary, then a blank line, then a longer description
  • use `arg` (single backticks) for arguments in docstrings, to agree with https://numpydoc.readthedocs.io/en/latest/format.html#parameters; internal links can be added with :func:`myfunc`
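For reference, a minimal docstring in that layout. The helper `interval_lengths` is hypothetical (not part of bioframe); it only shows the one-sentence summary, the blank line, the longer description, and a numpydoc Parameters section that refers to arguments as `arg`.

```python
import pandas as pd


def interval_lengths(df, cols=("start", "end")):
    """Return the length of each interval in a dataframe.

    Lengths are computed as ``end - start``, which assumes half-open
    intervals. The start and end column names are read from `cols`, so
    `df` can use non-default column names.

    Parameters
    ----------
    df : pandas.DataFrame
        Intervals, one per row.
    cols : tuple of str, optional
        Names of the start and end columns.

    Returns
    -------
    pandas.Series
        Length of each interval, indexed like `df`.
    """
    sk, ek = cols
    return df[ek] - df[sk]
```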

Changes in the existing code:

  • update trim() to rename limits → regions
  • update complement() to rename chromsizes → view, and update behavior to accept a dataframe input instead of just dict of chromsizes.
  • update suffixes= to default ("", "_") across ops
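To illustrate what the proposed suffixes=("", "_") default does to column names, here is a toy overlap written directly in pandas. It is a sketch, not bioframe's implementation, and it needs pandas >= 1.2 for how="cross". Columns from the first dataframe keep their names; columns from the second get a trailing underscore.

```python
import pandas as pd


def toy_overlap(df1, df2, suffixes=("", "_")):
    """Return all pairs of overlapping intervals from df1 and df2."""
    # cross join: every column name appears on both sides, so pandas applies the suffixes
    pairs = df1.merge(df2, how="cross", suffixes=suffixes)
    c1, s1, e1 = (f"{c}{suffixes[0]}" for c in ("chrom", "start", "end"))
    c2, s2, e2 = (f"{c}{suffixes[1]}" for c in ("chrom", "start", "end"))
    # half-open intervals overlap when each one starts before the other ends
    hit = (pairs[c1] == pairs[c2]) & (pairs[s1] < pairs[e2]) & (pairs[s2] < pairs[e1])
    return pairs[hit].reset_index(drop=True)


df1 = pd.DataFrame({"chrom": ["chr1", "chr2"], "start": [10, 0], "end": [20, 5]})
df2 = pd.DataFrame({"chrom": ["chr1"], "start": [15], "end": [30]})
print(list(toy_overlap(df1, df2).columns))
# → ['chrom', 'start', 'end', 'chrom_', 'start_', 'end_']
```

With ("", "_") the left-hand columns can be used downstream without renaming, which is the main argument for this default.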

New code:

  • add function bioframe.to_ucsc_region_string() (maybe in io?); see #50, "can we have to_ucsc of some sort somewhere?"
  • create a new module that defines standards on bioframes, verifies existing dataframes and converts different inputs into bioframes: core.py? bioframe.py? standards.py? utils.py?
  • add functions to perform various checks on bioframes: is_sorted, is_overlapping, etc. (#19). Which module should they go to? definitions.py? The constraint to keep in mind is that some of these checks may require ops (e.g. tests for overlapping intervals in the set).
  • add a universal constructor to make regions dataframe from: dict {str:int} or dict {str:(int,int)} or pd.Series(ints, chroms), etc. This can then be used for limits. (see https://gist.github.com/gfudenberg/9898023bf9c9f3fc0791d086e6875179#file-test_verifiers-ipynb)
  • synchronize handling of pd.NA in crucial columns (chrom, start, end, on). This is currently handled on a function-by-function basis to avoid casting to float.
  • solution: write a function that NaNs out part of a table AND casts NumPy numeric types into pandas types: bioframe.core.construction.sanitize_bioframe
  • delete split(), and update make_chromarms to use subtract.
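The proposed to_ucsc_region_string() could be as small as the sketch below. The name and signature are hypothetical, and writing coordinates unchanged (no 0-based to 1-based shift) is an assumption, not a decision from #50.

```python
import pandas as pd


def to_ucsc_region_string(df, cols=("chrom", "start", "end")):
    """Format each interval as a UCSC-style 'chrom:start-end' string."""
    ck, sk, ek = cols
    # coordinates are written as-is; no 0-based -> 1-based conversion is applied
    return df[ck].astype(str) + ":" + df[sk].astype(str) + "-" + df[ek].astype(str)


regions = pd.DataFrame({"chrom": ["chr1", "chrX"], "start": [100, 0], "end": [200, 50]})
print(to_ucsc_region_string(regions).tolist())  # → ['chr1:100-200', 'chrX:0-50']
```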
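A pandas-only sketch of the two checks named above, assuming half-open intervals, chrom/start/end column names, and that "sorted" means ordered by chrom (lexicographically), then start, then end. On the ops constraint: is_overlapping here avoids calling any op by sorting each chromosome by start and comparing every start against the largest end seen so far on that chromosome.

```python
import pandas as pd


def is_sorted(df, cols=("chrom", "start", "end")):
    """True if rows are ordered by chrom, then start, then end."""
    cols = list(cols)
    values = df[cols].reset_index(drop=True)
    expected = values.sort_values(cols).reset_index(drop=True)
    return values.equals(expected)


def is_overlapping(df, cols=("chrom", "start", "end")):
    """True if any two intervals on the same chrom overlap."""
    ck, sk, ek = cols
    d = df.sort_values([ck, sk])
    # largest end among the earlier intervals on the same chromosome (NaN for the first)
    prev_max_end = d.groupby(ck)[ek].transform(lambda s: s.cummax().shift())
    # a start strictly before that end means two half-open intervals overlap
    return bool((d[sk] < prev_max_end).any())
```

Using the running maximum (rather than only the previous row's end) also catches intervals nested inside a long earlier interval.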
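A sketch of the universal regions constructor for the input types listed above. make_regions is a hypothetical name and this is not the code from the linked gist; it returns only chrom/start/end columns and treats a bare integer as a region starting at 0.

```python
import pandas as pd


def make_regions(obj):
    """Build a chrom/start/end regions dataframe from several input types.

    Accepts a dict {chrom: length}, a dict {chrom: (start, end)},
    a pd.Series of lengths indexed by chrom, or a dataframe that
    already has chrom/start/end columns.
    """
    if isinstance(obj, pd.DataFrame):
        return obj[["chrom", "start", "end"]].reset_index(drop=True)
    if isinstance(obj, pd.Series):
        obj = obj.to_dict()  # chrom -> length
    if isinstance(obj, dict):
        rows = []
        for chrom, value in obj.items():
            if isinstance(value, (tuple, list)):
                start, end = value  # explicit (start, end) pair
            else:
                start, end = 0, value  # a chromsize: the whole chromosome
            rows.append((chrom, int(start), int(end)))
        return pd.DataFrame(rows, columns=["chrom", "start", "end"])
    raise TypeError(f"cannot build regions from {type(obj).__name__}")
```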
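One way the proposed sanitizer could avoid the float casting, sketched with pandas' nullable integer dtype: integer columns are cast to Int64 first, so setting entries to pd.NA keeps them integers. sanitize_sketch and its na_rows argument are hypothetical; this is not bioframe.core.construction.sanitize_bioframe itself.

```python
import pandas as pd


def sanitize_sketch(df, na_rows=None, cols=("chrom", "start", "end")):
    """Cast integer columns to Int64, then set the selected rows to pd.NA."""
    out = df.copy()
    for col in cols:
        # NumPy int64 -> pandas nullable Int64, so NA can be stored without casting to float
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = out[col].astype("Int64")
    if na_rows is not None:
        for col in cols:
            out.loc[na_rows, col] = pd.NA
    return out


df = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [0, 5], "end": [10, 15]})
clean = sanitize_sketch(df, na_rows=df["start"] > 0)
print(clean.dtypes["start"])  # → Int64
```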

Test:

  • tests for split
  • tests for make_chromarms
  • ensure that arrayops works with NaNs (double-check arrayops behavior for floats)

Docs:

  • standards and definitions (https://docs.google.com/document/d/10rnnz3TGcaR591Y33k5vPurJimq7vY_P_CHxXl5s0dE/edit)
  • an ipynb with performance evaluation and comparisons
  • fix links in docs to point to correct branch for ipynbs
  • need to use a different word for ‘name’ of an interval, vs. the ‘parent_region_name’ of an interval. E.g. a set of CTCF peaks could have names CTCFpeak1,…, CTCFpeakN, but they could all be on chr1p.
    →→ defaults are 'view_region' and 'name'
  • API docs for the rest of the library:
    • io
    • genomeops
    • core (specs, construction, stringops, checks)
  • add _verify_columns and _verify_column_dtypes to specs documentation (https://stackoverflow.com/a/7740295)
  • add discussion of construction/bedframes/concepts to the guide/interval_tutorial.ipynb.
  • add text to the guide/interval_tutorial.ipynb
  • update links in readme
  • add remaining ops to the guide
  • "how do I" aka cookbook/recipes (ideally, an ipynb)
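To illustrate the naming convention ('name' for an interval's own name, 'view_region' for the view region that contains it), a sketch that assigns CTCF peaks to chromosome arms by containment. assign_view_sketch and the arm coordinates are made up for illustration, and view regions are assumed not to overlap each other.

```python
import pandas as pd


def assign_view_sketch(df, view_df):
    """Add a 'view_region' column naming the view region that fully contains each interval."""
    view = view_df.rename(columns={"start": "view_start", "end": "view_end", "name": "view_region"})
    # pair each interval with every view region on the same chromosome
    pairs = df.reset_index().merge(view, on="chrom")
    inside = (pairs["start"] >= pairs["view_start"]) & (pairs["end"] <= pairs["view_end"])
    # assumes non-overlapping view regions, so each interval has at most one hit
    hits = pairs.loc[inside].set_index("index")["view_region"]
    out = df.copy()
    # intervals not contained in a single view region get NaN
    out["view_region"] = hits.reindex(df.index)
    return out


arms = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [0, 100],
                     "end": [100, 250], "name": ["chr1p", "chr1q"]})
peaks = pd.DataFrame({"chrom": ["chr1"] * 3, "start": [10, 150, 95], "end": [20, 160, 105],
                      "name": ["CTCFpeak1", "CTCFpeak2", "CTCFpeak3"]})
print(assign_view_sketch(peaks, arms))
```

Here CTCFpeak3 straddles the arm boundary, so it gets no view_region while keeping its own name.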

nvictus changed the title 1.0.0 roadmap → 0.1.0 roadmap on Sep 23, 2020
@golobor
Member Author

golobor commented Nov 24, 2020

tests for split are now available: 31e11f5 (@gfudenberg, does it look good?)

@gfudenberg
Member

gfudenberg commented Nov 24, 2020 via email

nvictus changed the title 0.1.0 roadmap → 0.3.0 roadmap on Apr 13, 2021
@gfudenberg
Member

gfudenberg commented May 4, 2021

as of now, complement(df, chromsizes) raises an error if any intervals exceed the chromsizes.

Will this be the desired behavior after updating to view?

(i.e. should this function require that the user first trim intervals, then complement? Or should overhangs be allowed, with only the view-internal complement returned?)

→ update: complement now allows for intervals from df to cover multiple regions in view_df.
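A sketch of the view-internal behaviour discussed above: intervals are clipped to each view region before gaps are collected, so overhangs are allowed and one interval may cover several view regions. complement_in_view and its 'view_region' output column are illustrative names, not bioframe's complement().

```python
import pandas as pd


def complement_in_view(df, view_df):
    """Return the gaps between intervals in df, restricted to each view region."""
    out = []
    for chrom, vstart, vend, vname in zip(
        view_df["chrom"], view_df["start"], view_df["end"], view_df["name"]
    ):
        # intervals on this chromosome that touch the region, clipped to its bounds
        hits = df[(df["chrom"] == chrom) & (df["end"] > vstart) & (df["start"] < vend)]
        clipped = sorted(zip(hits["start"].clip(lower=vstart), hits["end"].clip(upper=vend)))
        cursor = int(vstart)
        for start, end in clipped:
            if start > cursor:
                out.append((chrom, cursor, int(start), vname))  # gap before this interval
            cursor = max(cursor, int(end))
        if cursor < vend:
            out.append((chrom, cursor, int(vend), vname))  # gap at the end of the region
    return pd.DataFrame(out, columns=["chrom", "start", "end", "view_region"])


view_df = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [0, 100],
                        "end": [100, 250], "name": ["chr1p", "chr1q"]})
# the second interval spans both arms; the third overhangs the end of the view
df = pd.DataFrame({"chrom": ["chr1"] * 3, "start": [20, 90, 200], "end": [40, 120, 300]})
print(complement_in_view(df, view_df))
```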

@gfudenberg
Member

gfudenberg commented Jun 1, 2021

I'm no longer sure what the following todo items were supposed to indicate. Do you, @golobor?

  • "refactor get_default_columnames → get_default_columnames(df,cols), put a _verify_columns(cols) in that function"
    → see the decorator code from Nezar below and the post-release bullet (consider replacing the repeated get_default_colnames and _verify_columns() in each op with either a decorator or a function that does both).
  • "sync split with Anton's subtract."
    → see the todo item about deleting split and changing make_chromarms to use subtract.

@nvictus
Member

nvictus commented Jun 15, 2021

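from functools import wraps

# private bioframe helpers, defined in bioframe.core.specs
from bioframe.core.specs import _get_default_colnames, _verify_columns

def checks_colnames(func):
    """Check that the bioframe passed as the first argument has its chrom/start/end columns."""
    @wraps(func)
    def wrapped(*args, **kwargs):
        df = args[0]  # assumes the dataframe is the first positional argument
        cols = kwargs.get('cols')
        # fall back to the default column names when cols is not given
        ck, sk, ek = _get_default_colnames() if cols is None else cols
        _verify_columns(df, [ck, sk, ek])
        # call the op and pass its result through to the caller
        return func(*args, **kwargs)
    return wrapped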
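To try the pattern without bioframe installed, a self-contained variant with simplified, hypothetical stand-ins for _get_default_colnames and _verify_columns. Unlike the version above, this decorator also passes the resolved cols down to the op, so the op body never has to look up the defaults itself; total_length is a toy op for illustration.

```python
from functools import wraps

import pandas as pd

DEFAULT_COLS = ("chrom", "start", "end")  # stand-in for _get_default_colnames()


def verify_columns(df, colnames):
    """Simplified stand-in for _verify_columns: raise if any column is missing."""
    missing = [c for c in colnames if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")


def checks_colnames(func):
    @wraps(func)
    def wrapped(df, *args, cols=None, **kwargs):
        ck, sk, ek = DEFAULT_COLS if cols is None else cols
        verify_columns(df, [ck, sk, ek])
        # hand the resolved column names to the op
        return func(df, *args, cols=(ck, sk, ek), **kwargs)
    return wrapped


@checks_colnames
def total_length(df, cols=None):
    """Toy op: sum of interval lengths."""
    ck, sk, ek = cols
    return int((df[ek] - df[sk]).sum())


intervals = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [0, 5], "end": [10, 10]})
print(total_length(intervals))  # → 15
```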

@gfudenberg
Member

released!
