Skip to content
This repository has been archived by the owner on Oct 28, 2022. It is now read-only.

Define GET /reference/{id}/bases in favor of GAReference.sequence. #120

Merged
merged 1 commit into from
Aug 27, 2014

Conversation

calbach
Copy link
Contributor

@calbach calbach commented Aug 15, 2014

This approach is intended to allow for more efficient access to the bases needed by the client. Returning an upwards of 300MB (uncompressed) string as a single JSON value would likely have performance issues; this is solved here via pagination if the endpoint chooses to implement the method with multiple pages.

@cassiedoll
Copy link
Member

+1

@cassiedoll
Copy link
Member

This issue needs a little love - @fnothaft @delagoya @lh3 @AnyBody - any thoughts? :)

/**
The start position (0-based) of this query. Defaults to 0.
*/
union { null, long } start = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be { long , null } else it will throw a runtime exception IIRC (default value doesn't match default type).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's an unfortunate quirk. On further thought, it doesn't make sense to allow this field to be null so I've removed the union type. I've the 'end' coordinate nullable, let me know if you feel this is too asymmetric.

Note: SearchReads may suffer from the original issue you described.

@pgrosu
Copy link
Contributor

pgrosu commented Aug 21, 2014

Hi Cassie,

That's an interesting approach, but I think we want to think much bigger :) We're trying at a minimum to work on the petabyte-scale with sub-second terabyte access time. These sequences and reads will be many and thus storage and I/O will be huge - not to mention all the online processing such as QC, statistical analysis, pathway analysis, disease categorization, and many, many more. Think of it as Google Earth for sequence analysis :) For that a different set of parallel data-structures and algorithms will be required.

Paul

@fnothaft
Copy link
Contributor

+1, after nit about default value is changed. We take a similar fragmentation strategy in the ADAM reference implementation.

@fnothaft
Copy link
Contributor

@pgrosu I think we need to solve the small case first, but I do believe that our "scalable" solution looks like this as well, at least for a linear reference. At the least, this is the naïve data parallel data structure.

@pgrosu
Copy link
Contributor

pgrosu commented Aug 21, 2014

@fnothaft, I understand - no worries, I'm a team player :)

/**
The start position (0-based) of this query. Defaults to 0.
*/
long start = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, both that and keeping the end nullable seem reasonable to me.

@cassiedoll
Copy link
Member

This is now mergeable.

@delagoya
Copy link
Contributor

+1 looks good to me. If there are no further comments, propose to merge later today.

skeenan added a commit that referenced this pull request Aug 27, 2014
Define GET /reference/{id}/bases in favor of GAReference.sequence.
@skeenan skeenan merged commit 08e4a0d into ga4gh:master Aug 27, 2014
dcolligan pushed a commit to dcolligan/ga4gh-schemas that referenced this pull request Jul 20, 2016
While in that file, I also made the urlRoot instance var private for consistency with the others.
dcolligan pushed a commit to dcolligan/ga4gh-schemas that referenced this pull request Jul 20, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants