-
Notifications
You must be signed in to change notification settings - Fork 110
Define GET /reference/{id}/bases in favor of GAReference.sequence. #120
Conversation
+1 |
/** | ||
The start position (0-based) of this query. Defaults to 0. | ||
*/ | ||
union { null, long } start = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be { long , null }
else it will throw a runtime exception IIRC (default value doesn't match default type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's an unfortunate quirk. On further thought, it doesn't make sense to allow this field to be null so I've removed the union type. I've the 'end' coordinate nullable, let me know if you feel this is too asymmetric.
Note: SearchReads may suffer from the original issue you described.
Hi Cassie, That's an interesting approach, but I think we want to think much bigger :) We're trying at a minimum to work on the petabyte-scale with sub-second terabyte access time. These sequences and reads will be many and thus storage and I/O will be huge - not to mention all the online processing such as QC, statistical analysis, pathway analysis, disease categorization, and many, many more. Think of it as Google Earth for sequence analysis :) For that a different set of parallel data-structures and algorithms will be required. Paul |
+1, after nit about default value is changed. We take a similar fragmentation strategy in the ADAM reference implementation. |
@pgrosu I think we need to solve the small case first, but I do believe that our "scalable" solution looks like this as well, at least for a linear reference. At the least, this is the naïve data parallel data structure. |
@fnothaft, I understand - no worries, I'm a team player :) |
/** | ||
The start position (0-based) of this query. Defaults to 0. | ||
*/ | ||
long start = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, both that and keeping the end nullable seem reasonable to me.
This is now mergeable. |
+1 looks good to me. If there are no further comments, propose to merge later today. |
Define GET /reference/{id}/bases in favor of GAReference.sequence.
While in that file, I also made the urlRoot instance var private for consistency with the others.
This approach is intended to allow for more efficient access to the bases needed by the client. Returning an upwards of 300MB (uncompressed) string as a single JSON value would likely have performance issues; this is solved here via pagination if the endpoint chooses to implement the method with multiple pages.