on large import of a genome w/o any annotation, divide up only chunk … #43

Merged
merged 4 commits into from
Aug 12, 2020
5 changes: 4 additions & 1 deletion example/split_gff.py
@@ -50,7 +50,10 @@
 if len(tokens) < 1:
     continue
 seqid = tokens[0][1:]
-assert seqid in gffs
+seqid = seqid.strip()
+if seqid not in gffs:
+    print("%s not in GFF file" % seqid)
+    continue

 if cur_seqid is None or seqid != cur_seqid:
     if cur_seqid:
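The hunk above swaps a hard `assert` for a tolerant lookup: the sequence ID taken from the FASTA header is stripped of surrounding whitespace, and IDs that are absent from the GFF are reported and skipped. A minimal sketch of that pattern follows. It assumes `gffs` is a dict keyed by sequence ID and that header lines start with `>`; the function name `match_seqids` and the sample data are hypothetical and do not come from `split_gff.py`.

```python
def match_seqids(fasta_lines, gffs):
    """Split FASTA header IDs into those present in gffs and those missing."""
    kept, missing = [], []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # only header lines carry a sequence ID
        tokens = line.split(" ")
        # Drop the leading ">" and any trailing newline or carriage return.
        seqid = tokens[0][1:].strip()
        if seqid not in gffs:
            print("%s not in GFF file" % seqid)
            missing.append(seqid)
            continue
        kept.append(seqid)
    return kept, missing


# Hypothetical input: one header with a description, one with a Windows
# line ending, and one whose ID has no GFF entry.
lines = [">chr1 description\n", "ACGT\n", ">chr2\r\n", ">chrM\n"]
kept, missing = match_seqids(lines, {"chr1": [], "chr2": []})
```

Without the `strip()` call, `chr2\r\n` would not match the `chr2` key, which is the kind of mismatch the original `assert` would have turned into a crash.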
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 2.6.0
+current_version = 2.8.0
 commit = True
 tag = True

2 changes: 1 addition & 1 deletion setup.py
@@ -78,7 +78,7 @@ def run(self):

 setup(
     name='edge-genome',
-    version='2.6.0',
+    version='2.8.0',

     author='Ginkgo Bioworks',
     author_email='devs@ginkgobioworks.com',
2 changes: 1 addition & 1 deletion src/edge/__init__.py
@@ -1,6 +1,6 @@
 from django.db.backends.signals import connection_created

-__version__ = '2.6.0'
+__version__ = '2.8.0'

Member

What's the reason for bumping the version here in the PR? I think the flow is to follow the versioning instructions at https://github.com/ginkgobioworks/edge#versioning on the master branch, which automatically bumps the version and pushes a tag. Were you trying to release this branch on PyPI?

Contributor Author

I think I just didn't know what I was doing.



def import_gff(name, fn):
20 changes: 11 additions & 9 deletions src/edge/importer.py
@@ -94,23 +94,26 @@ def build_fragment(self):
[f[0] for f in self.__features]
+ [f[1] + 1 for f in self.__features]))
break_points = sorted(break_points)

cur_len = 0
chunk_sizes = []
seqlen = len(self.__sequence)
for i, bp in enumerate(break_points):
if i == 0:
if bp > 1:
chunk_sizes.append(break_points[i] - 1)
cur_len += chunk_sizes[-1]
else:
chunk_sizes.append(break_points[i] - break_points[i - 1])
print('%d chunks' % (len(chunk_sizes),))
cur_len += chunk_sizes[-1]

if cur_len < seqlen:
chunk_sizes.append(seqlen-cur_len)

new_fragment = Fragment(name=self.__rec.id, circular=False, parent=None, start_chunk=None)
new_fragment.save()
new_fragment = new_fragment.indexed_fragment()

prev = None
flen = 0
seqlen = len(self.__sequence)

# divide chunks bigger than a certain threshold to smaller chunks, to
# allow insertion of sequence into database. e.g. MySQL has a packet
# size that prevents chunks that are too large from being inserted.
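The first part of this hunk turns feature boundaries into chunk sizes and then, if the break points do not reach the end of the sequence, appends one trailing chunk for the remainder. With no features at all, the whole sequence becomes a single chunk. A minimal sketch of that calculation, assuming features are `(start, end)` tuples with 1-indexed inclusive coordinates; the function name `chunk_sizes_for` is hypothetical and the loop is a simplified equivalent, not the code in `importer.py`:

```python
def chunk_sizes_for(features, seqlen):
    """Return chunk sizes that together cover bases 1..seqlen."""
    # A new chunk starts at each feature start and just past each feature end.
    break_points = sorted(set(
        [f[0] for f in features] + [f[1] + 1 for f in features]))

    sizes = []
    cur_len = 0  # number of bases covered so far
    prev = 1     # 1-indexed start of the chunk being measured
    for bp in break_points:
        if bp > prev:
            sizes.append(bp - prev)
            cur_len += sizes[-1]
        prev = bp

    # Anything after the last break point becomes one final chunk.
    if cur_len < seqlen:
        sizes.append(seqlen - cur_len)
    return sizes


annotated = chunk_sizes_for([(3, 5), (8, 10)], 12)
unannotated = chunk_sizes_for([], 12)
```

In the unannotated case the single chunk is as long as the genome, which is why the size-limit step described in the comment above matters for large imports.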
@@ -126,17 +129,16 @@ def build_fragment(self):
original_chunk_size -= chunk_size_limit
new_chunk_sizes.extend(divided_chunks)
chunk_sizes = new_chunk_sizes
print('%d chunks' % (len(chunk_sizes),))

prev = None
flen = 0
for chunk_size in chunk_sizes:
t0 = time.time()
prev = new_fragment._append_to_fragment(prev, flen, self.__sequence[flen:flen + chunk_size])
flen += chunk_size
print('add chunk to fragment: %.4f\r' % (time.time() - t0,), end="")

print("\nfinished adding chunks")
if flen < seqlen:
new_fragment._append_to_fragment(prev, flen, self.__sequence[flen:seqlen])

return new_fragment

def annotate(self, fragment):
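The second hunk divides oversized chunks and then walks the sequence, appending one slice per chunk. The sketch below illustrates both steps on a plain string. It is only an approximation under stated assumptions: the limit value, the helper names `divide_chunks` and `slice_sequence`, and the returned list of slices are hypothetical, standing in for the elided division code and for the database writes done by `_append_to_fragment`.

```python
def divide_chunks(chunk_sizes, chunk_size_limit):
    """Split every chunk larger than chunk_size_limit into smaller pieces."""
    divided = []
    for size in chunk_sizes:
        while size > chunk_size_limit:
            divided.append(chunk_size_limit)
            size -= chunk_size_limit
        if size > 0:
            divided.append(size)
    return divided


def slice_sequence(sequence, chunk_sizes):
    """Cut sequence into consecutive slices of the given sizes."""
    pieces = []
    flen = 0  # number of bases consumed so far
    for chunk_size in chunk_sizes:
        pieces.append(sequence[flen:flen + chunk_size])
        flen += chunk_size
    # Safety net: keep any bases the chunk sizes did not account for.
    if flen < len(sequence):
        pieces.append(sequence[flen:])
    return pieces


sequence = "ACGT" * 10  # 40 bases, hypothetical input
sizes = divide_chunks([25, 4, 11], chunk_size_limit=10)
pieces = slice_sequence(sequence, sizes)
```

Joining `pieces` back together reproduces `sequence`, which is a quick way to check that no bases are lost or duplicated when the chunking changes.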