0.3.0 roadmap #48

golobor · 2020-08-03T19:52:24Z

TODO:

API

decide if we want to expose more from bioframe.core.*

Cosmetic:

master branch name → main
old region.py split into core.stringops, core.construction
delete old util.py
delete dask.py
rename genomeops.py to utils.py
docstrings should all have a one-sentence summary, then a blank line, then a longer description
use arg for arguments in docstrings, to agree with https://numpydoc.readthedocs.io/en/latest/format.html#parameters. internal links can be added with func:myfunc()

Changes in the existing code:

update trim() to rename limits → regions
update complement() to rename chromsizes → view, and update behavior to accept a dataframe input instead of just dict of chromsizes.
update suffixes= to default ("", "_") across ops
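
For the complement() change, one way to accept both the old chromsizes dict and a view is to normalize the input up front. The following is a minimal sketch, not bioframe's implementation: `as_view` is a hypothetical helper, and plain (chrom, start, end) tuples stand in for the rows of a view dataframe so the example has no dependencies.

```python
def as_view(regions):
    """Normalize chromsizes-like input into a list of (chrom, start, end).

    Hypothetical helper for illustration only; tuples stand in for the
    rows of a view dataframe.
    """
    if isinstance(regions, dict):
        view = []
        for chrom, value in regions.items():
            if isinstance(value, int):
                # {chrom: length} means the whole chromosome, starting at 0.
                view.append((chrom, 0, value))
            else:
                # {chrom: (start, end)} gives explicit bounds.
                start, end = value
                view.append((chrom, start, end))
        return view
    # Anything else is assumed to already be an iterable of triples.
    return [(chrom, start, end) for chrom, start, end in regions]


print(as_view({"chr1": 100, "chr2": (10, 50)}))
```

With this kind of normalization, a function that used to take chromsizes can keep accepting the dict form while working internally on explicit regions.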

New code:

add function bioframe.to_ucsc_region_string() (maybe in io?). See "can we have to_ucsc of some sort somewhere?" #50
create a new module that defines standards on bioframes, verifies existing dataframes and converts different inputs into bioframes: core.py? bioframe.py? standards.py? utils.py?
add functions to perform various checks on bioframes: is_sorted, is_overlapping, etc. (#19). Which module should they go to? definitions.py? The constraint to keep in mind is that some of these checks may require ops (e.g. tests for overlapping intervals in the set).
add a universal constructor to make regions dataframe from: dict {str:int} or dict {str:(int,int)} or pd.Series(ints, chroms), etc. This can then be used for limits. (see https://gist.github.com/gfudenberg/9898023bf9c9f3fc0791d086e6875179#file-test_verifiers-ipynb)
synchronize handling of pd.NA in crucial columns (chrom, start, end, on). This is currently handled on a function-by-function basis to avoid casting to float.
~~solution: write a function that nans a part of a table AND casts numpy numeric types into pandas types~~ bioframe.core.construction.sanitize_bioframe
delete split(), and update make_chromarms to use subtract.
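
A few of the items above can be sketched without committing to a module layout. This is an illustration under stated assumptions, not the planned API: the function names follow the list, but the signatures are hypothetical, intervals are half-open (chrom, start, end) tuples rather than dataframe rows, and chromosomes are ordered lexicographically (so "chr10" sorts before "chr2"). The overlap check sorts its input first, which echoes the point that some checks depend on ops-like logic.

```python
def to_ucsc_region_string(chrom, start, end):
    """Format one interval as a UCSC-style 'chrom:start-end' string."""
    return f"{chrom}:{start}-{end}"


def is_sorted(intervals):
    """True if intervals are ordered by (chrom, start, end)."""
    intervals = list(intervals)
    return intervals == sorted(intervals)


def is_overlapping(intervals):
    """True if any two half-open intervals on the same chrom overlap."""
    prev_chrom, prev_end = None, None
    for chrom, start, end in sorted(intervals):
        if chrom == prev_chrom and start < prev_end:
            return True
        if chrom != prev_chrom:
            # First interval seen on a new chromosome.
            prev_chrom, prev_end = chrom, end
        else:
            # Track the furthest end seen so far on this chromosome.
            prev_end = max(prev_end, end)
    return False


print(to_ucsc_region_string("chr1", 1000, 2000))
print(is_overlapping([("chr1", 0, 10), ("chr1", 5, 20)]))
```

Bookended intervals such as (0, 10) and (10, 20) are treated as non-overlapping here because the coordinates are half-open.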

Test:

tests for split
tests for make_chromarms
ensure that arrayops works with NaNs. (double check arrayops behavior for floats)

Docs:


golobor · 2020-11-24T22:00:28Z

tests for split are now available: 31e11f5 (@gfudenberg, does it look good?)

gfudenberg · 2020-11-24T22:08:24Z

looks good!

-- it would help to add a few more of the explanations (# Test the case when a chromosome is missing from points is perfect).
-- looks like the case where points is a dict could use a test (if we intend to keep that option for split()).
-- looks like add_names could use a test.

ops.split() also appears to need a docstring update for cols_points & add_names.

gfudenberg · 2021-05-04T17:02:08Z

~~as of now, complement(df, chromsizes) raises an error if any intervals exceed the chromsizes.~~

~~Will this be the desired behavior after updating to view?~~

~~(i.e. should this function require the user first trims intervals, then complements? or should overhangs be allowed, but only the view-internal complement is returned?)~~

--> update: complement now allows for intervals from df to cover multiple regions in view_df.
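
To make the updated behavior concrete, here is a minimal sketch of a view-aware complement. It is not bioframe's implementation: `complement_in_view` is a hypothetical name, and half-open (chrom, start, end) tuples stand in for rows of df and view_df. Each interval is clipped to every view region it touches, so one interval may span several regions, and the gaps are reported per region.

```python
def complement_in_view(intervals, view):
    """Return the parts of each view region not covered by any interval."""
    out = []
    for v_chrom, v_start, v_end in view:
        # Clip every interval that touches this region to the region bounds.
        clipped = sorted(
            (max(s, v_start), min(e, v_end))
            for c, s, e in intervals
            if c == v_chrom and s < v_end and e > v_start
        )
        cursor = v_start
        for s, e in clipped:
            if s > cursor:
                # Uncovered gap between the previous coverage and this interval.
                out.append((v_chrom, cursor, s))
            cursor = max(cursor, e)
        if cursor < v_end:
            # Uncovered tail of the region.
            out.append((v_chrom, cursor, v_end))
    return out


view = [("chr1", 0, 100), ("chr1", 100, 200)]
print(complement_in_view([("chr1", 50, 150)], view))
```

In this example the single interval covers parts of both view regions, and the result contains one gap per region.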

gfudenberg · 2021-06-01T07:13:37Z

~~I'm no longer sure what the following todo items were supposed to indicate:~~
~~"refactor get_default_columnames → get_default_columnames(df,cols), put a _verify_columns(cols) in that function"~~

--> see the decorator code from Nezar below and the post-release bullet (consider replacing the repeated get_default_colnames() and _verify_columns() calls in each op with either a decorator or a function that does both).

"sync split with Anton's subtract."
Do you know, @golobor?

--> see the todo item about deleting split() and changing make_chromarms to use subtract.

nvictus · 2021-06-15T20:08:03Z

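from functools import wraps

def checks_colnames(func):
    # Validate the interval columns of the first positional argument before calling func.
    @wraps(func)
    def wrapped(*args, **kwargs):
        df = args[0]
        cols = kwargs.get('cols')
        ck, sk, ek = _get_default_colnames() if cols is None else cols
        _verify_columns(df, [ck, sk, ek])
        # Return the wrapped function's result; otherwise every decorated op would return None.
        return func(*args, **kwargs)
    return wrapped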
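
A self-contained way to try the decorator idea is sketched below. The two helpers are simplified stand-ins for bioframe's internal `_get_default_colnames` and `_verify_columns`, written here only as assumptions so the example runs; `count_rows` is a hypothetical op, and a dict of columns stands in for a dataframe. Note that the wrapper returns the result of the wrapped function so that decorated ops keep their return values.

```python
from functools import wraps


def _get_default_colnames():
    # Stand-in: assumed default names of the interval columns.
    return ("chrom", "start", "end")


def _verify_columns(df, colnames):
    # Stand-in: raise if any required column is missing from the table.
    missing = [c for c in colnames if c not in df]
    if missing:
        raise ValueError(f"missing columns: {missing}")


def checks_colnames(func):
    @wraps(func)
    def wrapped(*args, **kwargs):
        df = args[0]
        cols = kwargs.get("cols")
        ck, sk, ek = _get_default_colnames() if cols is None else cols
        _verify_columns(df, [ck, sk, ek])
        return func(*args, **kwargs)
    return wrapped


@checks_colnames
def count_rows(df, cols=None):
    # Hypothetical op: count rows using the chrom column.
    ck, _, _ = _get_default_colnames() if cols is None else cols
    return len(df[ck])


table = {"chrom": ["chr1", "chr2"], "start": [0, 5], "end": [10, 20]}
print(count_rows(table))
```

As written, the decorator only sees cols when it is passed as a keyword argument, which is a limitation of looking it up through kwargs.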

gfudenberg · 2021-08-31T22:04:02Z

released!

nvictus changed the title ~~1.0.0 roadmap~~ 0.1.0 roadmap Sep 23, 2020

nvictus changed the title ~~0.1.0 roadmap~~ 0.3.0 roadmap Apr 13, 2021

gfudenberg mentioned this issue Apr 15, 2021

add-verify-genomic-intervals-to-core #71

Merged

sergpolly mentioned this issue May 14, 2021

standardize functions to read various "bioframes" from file #73

Closed

gfudenberg closed this as completed Aug 31, 2021
