load_genomedata: responds poorly to invalid syntax [only a track path, instead of (name, path) tuple] #33

Open
EricR86 opened this issue Apr 6, 2017 · 6 comments
Labels
enhancement New feature or request minor

Comments

EricR86 commented Apr 6, 2017

Original report (archived issue) by Coby Viner (Bitbucket: cviner2, GitHub: cviner).


load_genomedata neither fails fast nor returns a clear error message when a track path is provided directly (tracks=['./5xC-sorted.bedGraph.gz']) instead of a track name and (file or directory) path as a tuple (tracks=('5xC', './5xC-sorted.bedGraph.gz')).

An invocation of load_genomedata that results in this issue is provided below.

load_genomedata.load_genomedata('./testArchive', tracks=['./5xC-sorted.bedGraph.gz'],
seqfilenames=['/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chrY.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr21.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr5.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr3.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr2.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr6.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr16.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr20.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr15.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr12.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chrM.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr1.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr4.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr9.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr18.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr10.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr22.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr14.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chrX.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr11.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr13.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr19.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr8.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr17.fa',
 '/mnt/work1/data/genomes/human/hg19/iGenomes/Sequence/Chromosomes/chr7.fa'])
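
One way to get the fail-fast behaviour requested here is to validate the argument before any archive is created. The following is only a sketch: validate_tracks is a hypothetical helper, not part of genomedata, and it assumes that tracks is meant to be a list of (name, path) pairs such as [('5xC', './5xC-sorted.bedGraph.gz')]. If a single bare pair is also accepted, the check would need adjusting.

```python
def validate_tracks(tracks):
    """Hypothetical helper: return tracks as a list of (name, path) pairs.

    Assumes tracks should be an iterable of 2-item tuples or lists.
    Raises an error naming the offending entry instead of failing later.
    """
    if tracks is None:
        return []
    if isinstance(tracks, str):
        # A lone string would otherwise be iterated character by character.
        raise TypeError(
            "tracks must be a list of (name, path) pairs, got the string "
            "{!r}".format(tracks))

    validated = []
    for index, entry in enumerate(tracks):
        is_pair = isinstance(entry, (tuple, list)) and len(entry) == 2
        if not is_pair:
            raise ValueError(
                "tracks[{}] must be a (name, path) pair, got {!r}".format(
                    index, entry))
        name, path = entry
        validated.append((str(name), str(path)))
    return validated
```

With this check, the invocation above would raise a ValueError mentioning tracks[0] and './5xC-sorted.bedGraph.gz' before any sequence is loaded.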

EricR86 commented Apr 6, 2017

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Notably, this example manages to hang indefinitely.

@EricR86
Copy link
Member Author

EricR86 commented Apr 6, 2017

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Upon some investigation, there could be a hidden underlying problem. The fact that the tracks option is not a tuple does not by itself explain the entire problem.
Notably, the following load_seq code runs before the tracks are parsed. From genomedata/_load_seq.py:243:

    warnings.simplefilter("ignore")
    with Genome(gdpath, mode="w", filters=FILTERS_GZIP) as genome:
        if seqfile_type == "sizes":
            for name, size in sizes.items():
                chromosome = create_chromosome(genome, name, mode)
                size_chromosome(chromosome, size)
        else:
            assert seqfile_type in frozenset(["agp", "fasta"])
            for filename in filenames:
                if verbose:
                    print(filename, file=sys.stderr)

                with maybe_gzip_open(filename) as infile:
                    if seqfile_type == "agp":
                        name = path(filename).name.rpartition(".agp")[0]
                        chromosome = create_chromosome(genome, name, mode)
                        read_assembly(chromosome, infile)
                    else:
                        for defline, seq in LightIterator(infile):
                            chromosome = create_chromosome(genome, defline, mode)
                            read_seq(chromosome, seq)
    # XXX: this should be enforced even when there is an exception
    # is there a context manager available?
    warnings.resetwarnings()
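
Regarding the XXX comment in the excerpt: the standard library provides warnings.catch_warnings(), a context manager that saves the warning filters on entry and restores them on exit, including when an exception propagates. It also avoids a side effect of warnings.resetwarnings(), which discards every filter, not only the one installed by simplefilter("ignore"). A minimal sketch follows; call_with_warnings_suppressed is a hypothetical name used for illustration and does not appear in genomedata.

```python
import warnings


def call_with_warnings_suppressed(func, *args, **kwargs):
    """Hypothetical helper: call func with all warnings ignored.

    The previous filter state is restored when the with block exits,
    whether func returns normally or raises.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)
```

In the excerpt above, the same effect could be had by placing the simplefilter("ignore") call and the Genome block inside a with warnings.catch_warnings(): block and dropping the trailing resetwarnings() call.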


EricR86 commented Apr 6, 2017

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


@cviner, is 5xC-sorted.bedGraph.gz available, or is there a smaller bedGraph that produces similar results?


EricR86 commented Apr 6, 2017

Original comment by Coby Viner (Bitbucket: cviner2, GitHub: cviner).


It is not public data and is ~ 4 MiB. It is difficult for me to see how the problem could depend on that particular bedGraph (does any simple bedGraph not reproduce this?). I can give you a copy for local testing if necessary, though.

I don't know if this occurs for others, as I only made this mistake the one time, in an interactive session.


EricR86 commented Apr 7, 2017

Original comment by Coby Viner (Bitbucket: cviner2, GitHub: cviner).


  • Edited issue description
    Reformatted a portion of the description (as inline code).


EricR86 commented Jul 6, 2019

Original comment by Coby Viner (Bitbucket: cviner2, GitHub: cviner).


  • changed state from "new" to "open"

@EricR86 EricR86 added minor enhancement New feature or request labels Apr 16, 2020