From 5c256691d1a0178ad29449ca99c19282a1ef22b3 Mon Sep 17 00:00:00 2001 From: John Marshall Date: Fri, 4 Nov 2016 11:47:48 +0000 Subject: [PATCH 1/5] Add GA4GH Retrieval API specification Plain text as exported from the previous Google Docs document: https://docs.google.com/document/d/1OSPfxdJ3uPoCfUVzMaekCOPF5sNEwqkJEUj-SjlECy0 as of 8 December 2016 ("Last edit was made on 8 November"), but sans BOM and with native line terminators. --- ga4gh-retrieval.md | 264 +++++++++++++++++++++++++++++++++++++++++++ pub/ga4gh-ticket.png | Bin 0 -> 13695 bytes 2 files changed, 264 insertions(+) create mode 100644 ga4gh-retrieval.md create mode 100644 pub/ga4gh-ticket.png diff --git a/ga4gh-retrieval.md b/ga4gh-retrieval.md new file mode 100644 index 000000000..10a0467ea --- /dev/null +++ b/ga4gh-retrieval.md @@ -0,0 +1,264 @@ +Retrieval API spec v0.1 + + +Design principles +Protocol essentials +Authentication +Errors +CORS +Method: get reads by ID +URL parameters +Query parameters +Field filtering +Response JSON fields +Response data blocks +Diagram of core mechanic +HTTPS data block URLs +Inline data block URIs +Reliability & performance considerations +Security considerations +Method-specific error interpretations +Possible future enhancements + + +Document history: +18-mar-2016: copied from https://github.com/dnanexus-rnd/htsnexus/wiki +15-apr-2016: copied from working doc +15-aug-2016: final version for interop testing + + +Design principles +This data retrieval API bridges from existing genomics bulk data transfers to a client/server model with the following features: +* Incumbent data formats (BAM, CRAM) are preferred initially, with a future path to others. +* Multiple server implementations are supported, including those that do format transcoding on the fly, and those that return essentially unaltered filesystem data. +* Multiple use cases are supported, including access to small subsets of genomic data (e.g. for browsing a given region) and to full genomes (e.g. 
for calling variants). +* Clients can provide hints of the information to be retrieved; servers can respond with more information than requested but not less. +* We use the following pan-GA4GH standards: + * 0 start, half open coordinates + * The structuring of POST inputs, redirects and other non-reads data will be protobuf3 compatible JSON + + +Explicitly this API does NOT: +* Provide a way to discover the identifiers for valid ReadGroupSets -- clients obtain these via some out of band mechanism +Protocol essentials +All API invocations are made to a configurable HTTP(S) endpoint, receive URL-encoded query string parameters, and return JSON output. Successful requests result with HTTP status code 200 and have UTF8-encoded JSON in the response body, with the content-type application/json. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate HTTP/2 upgrade using the standard mechanism. + + +Any timestamps that appear in the response from an API method are given as ISO 8601 date/time format. + + +HTTP responses may be compressed using RFC2616 transfer-coding, not content-coding. +Authentication +Requests to the retrieval API endpoint may be authenticated by means of an OAuth2 bearer token included in the request headers, as detailed in RFC 6750. Briefly, the client supplies the header Authorization: Bearer xxxx with each HTTPS request, where xxxx is a private token. The mechanisms by which clients originally obtain their authentication tokens, and by which servers verify them, are currently beyond the scope of this specification. Servers may honor non-authenticated requests at their discretion. 
+Errors +Non-successful invocations of the API return an HTTP error code, and the response body contains a JSON object (content-type application/json) with the following structure: +{ + "error": { + "type": "NotFound", + "message": "No such accession" + } +} +The following error types are defined: +type + HTTP status code + Description + InvalidAuthentication + 401 + Authorization provided is invalid + PermissionDenied + 403 + Authorization is required to access the resource + NotFound + 404 + The resource requested was not found + Unable + 406 + The server is unable to fulfill the request + UnsupportedFormat + 409 + The requested file format is not supported by the server + InvalidInput + 422 + The request parameters do not adhere to the specification + InternalError + 500 + Server error, clients should try later + ServiceUnavailable + 503 + Service is temporarily unavailable + + +CORS +All API resources should have the following support for cross-origin resource sharing (CORS) to support browser-based clients: + + +If a request to the URL of an API method includes the Origin header, its contents will be propagated into the Access-Control-Allow-Origin header of the response. Preflight requests (OPTIONS requests to the URL of an API method, with appropriate extra headers as defined in the CORS specification) will be accepted if the value of the Access-Control-Request-Method header is GET. The values of Origin and Access-Control-Request-Headers (if any) of the request will be propagated to Access-Control-Allow-Origin and Access-Control-Allow-Headers respectively in the preflight response. The Access-Control-Max-Age of the preflight response is set to the equivalent of 30 days. +Method: get reads by ID +GET /reads/ + + +The core mechanic for accessing specified reads data. The JSON response is a "ticket" allowing the caller to obtain the desired data in the specified format, which may involve follow-on requests to other endpoints, as detailed below. 
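The ticket mechanic described here could be sketched on the client side roughly as follows. This is an illustrative sketch, not part of the specification: the names `decode_data_uri`, `assemble_ticket` and the injected `fetch_https` callable are hypothetical, and the sketch assumes the ticket JSON carries `urls` (each entry with `url` and optional `headers`) and an optional top-level `md5`, as listed under "Response JSON fields".

```python
import base64
import hashlib
from typing import Callable, Dict


def decode_data_uri(uri: str) -> bytes:
    """Decode a base64 `data:` URI; the media type, if any, is ignored."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data: URI")
    return base64.b64decode(payload)


def assemble_ticket(
    ticket: dict,
    fetch_https: Callable[[str, Dict[str, str]], bytes],
) -> bytes:
    """Fetch every data block named in the ticket and concatenate them in order.

    `fetch_https` is a hypothetical caller-supplied function that performs the
    GET request (with retries, redirect handling and truncation checks) and
    returns the response body. It receives the URL and the headers the ticket
    says must accompany the request.
    """
    blocks = []
    for entry in ticket["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            # Inline block: no follow-up request is needed.
            blocks.append(decode_data_uri(url))
        else:
            blocks.append(fetch_https(url, entry.get("headers", {})))

    blob = b"".join(blocks)

    # Assumption: `md5`, when present, is a hex digest of the concatenated blob.
    expected = ticket.get("md5")
    if expected is not None and hashlib.md5(blob).hexdigest() != expected.lower():
        raise ValueError("MD5 mismatch between ticket and assembled data")
    return blob
```

The blocks are fetched sequentially here for simplicity. A client may fetch them in parallel, provided the final concatenation follows the order of the `urls` array.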
+ + +The client can request only reads overlapping a given genomic range. The response may however contain a superset of the desired results, including all records overlapping the range, and potentially other records not overlapping the range; the client should filter out such extraneous records if necessary. Successful requests with empty result sets still produce a valid response in the requested format (e.g. including header and EOF marker). +URL parameters +field + description + id +required + A string specifying which reads to return. + + +The format of the string is left to the discretion of the API provider, including allowing embedded “/” characters. Strings could be ReadGroupSetIds as defined by the GA4GH API, or any other format the API provider chooses (e.g. “/data/platinum/NA12878”, “/byRun/ERR148333”). + Query parameters +field + description + format +optional string + Request read data in this format. Default: BAM. Allowed values: BAM,CRAM. +Server replies with HTTP status 409 if the requested format is not supported. +[a] +referenceName +optional + The reference sequence name, for example “chr1”, “1”, or “chrX”. If unspecified, all reads (mapped and unmapped) are returned.[b] + referenceMD5 +optional + The MD5 checksum uniquely representing the reference sequence as a lower-case hexadecimal string, calculated as the MD5 of the upper-case sequence excluding all whitespace characters (this is equivalent to SQ:M5 in SAM). + + +Server replies with HTTP status 422 if referenceName and referenceMD5 are both specified and are incompatible. + start +optional 32-bit unsigned integer + The start position of the range on the reference, 0-based, inclusive. If specified, referenceName or referenceMD5 must also be specified.[c] + end +optional 32-bit unsigned integer + The end position of the range on the reference, 0-based exclusive. If specified, referenceName or referenceMD5 must also be specified. 
+ fields +optional + A list of fields to include, see below +Default: all + tags +optional + A comma separated list of tags to include, default: all. If the empty string is specified (tags=) no tags are included. It is illegal for the values of tags and notags to intersect; the server may return HTTP status 400 in this case. + notags +optional + A comma separated list of tags to exclude, default: none. It is illegal for the values of tags and notags to intersect; the server may return HTTP status 400 in this case. + Field filtering +The list of fields is based on BAM fields: + + +Field + Description + QNAME + Read names + FLAG + Read bit flags + RNAME + Reference sequence name + POS + Alignment position + MAPQ + Mapping quality score + CIGAR + CIGAR string + RNEXT + Reference sequence name of the next fragment template + PNEXT + Alignment position of the next fragment in the template + TLEN + Inferred template size + SEQ + Read bases + QUAL + Base quality scores + + +Example: fields=QNAME,FLAG,POS +Response JSON fields +field + description + format +string + Read data format. Default: BAM. Allowed values: BAM,CRAM. + urls +array of objects + an array providing URLs from which raw data can be retrieved. The client must retrieve binary data blocks from each of these URLs and concatenate them to obtain the complete response in the requested format. + + +Each element of the array is a JSON object with the following fields: + + +url +string + one URL. + + +May be either a https: URL or an inline data: URI. HTTPS URLs require the client to make a follow-up request (possibly to a different endpoint) to retrieve a data block. Data URIs provide a data block inline, without necessitating a separate request. + + +Further details below. + headers +optional object + for HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. 
For example, if headers is {"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}, then the client must supply the headers Range: bytes=0-1023 and Authorization: Bearer xxxx with the HTTPS request to the URL. + + + md5 +optional hex string + MD5 digest of the blob resulting from concatenating all of the ’payload’ data’ -- the url data blocks. + Response data blocks +Diagram of core mechanic + + +1. Client sends a request with id, genomic range, and filter. +2. Server replies with a ticket describing data block locations (URLs and headers). +3. Client fetches the data blocks using the URLs and headers. +4. Client concatenates data blocks to produce local blob. + + +While the blocks must be finally concatenated in the given order, the client may fetch them in parallel. +HTTPS data block URLs +1. must have percent-encoded path and query (e.g. javascript encodeURIComponent; python urllib.urlencode) +2. must accept GET requests +3. should provide CORS +4. should allow multiple request retries, within reason +5. should use HTTPS rather than plain HTTP except for testing or internal-only purposes (for security + in-flight corruption detection) +6. Server must send the response with either the Content-Length header, or chunked transfer encoding, or both. Clients must detect premature response truncation. +7. Client and URL endpoint may mutually negotiate HTTP/2 upgrade using the standard mechanism. +8. Client must follow 3xx redirects from the URL, subject to typical fail-safe mechanisms (e.g. maximum number of redirects), always supplying the headers, if any. +If a byte range HTTP header accompanies the URL, then the client MAY decompose this byte range into several sub-ranges and open multiple parallel, retryable requests to fetch them. (The URL and headers must be sufficient to authorize such behavior by the client, within reason.) +Inline data block URIs +e.g. 
data:application/vnd.ga4gh.bam;base64,SGVsbG8sIFdvcmxkIQ== [RFC 2397, WP] +The client obtains the data block by decoding the embedded base64 payload. + + +1. must use base64 payload encoding (simplifies client decoding logic) +2. client should ignore the media type (if any), treating the payload as a partial blob. + + +Note: the base64 text should not be additionally percent encoded. +Reliability & performance considerations +To provide robustness to sporadic transfer failures, servers should divide large payloads into multiple data blocks in the urls array. Then if the transfer of any one block fails, the client can retry that block and carry on, instead of starting all over. Clients may also fetch blocks in parallel, which can improve throughput. + + +Initial guidelines, which we expect to revise in light of future experience: +* Data blocks should not exceed ~1GB +* Inline data URIs should not exceed a few megabytes +Security considerations +The URL and headers might contain embedded authentication tokens; therefore, production clients and servers should not unnecessarily print them to console, write them to logs, embed them in error messages, etc. +Method-specific error interpretations +* 406 Unable: may be returned if a genomic range is requested, but the server is unable to provide genomic range slicing for the particular dataset (e.g. if no index is available). +Possible future enhancements +1. add a mechanism to request reads from more than one ID at a time (e.g. for a trio) +2. allow clients to provide a suggested data block size to the server +3. consider adding other data types (e.g. variants) +4. add POST support (if and when request sizes get large) +5. [mlin] add a way to request all unmapped reads (e.g. by passing * for referenceName) +6. [dglazer] add a way to request reads in GA4GH binary format[d] (e.g. fmt=proto) + + + + +[a]This should probably be specified as a (comma separated?) list in preference order. 
If the client can accept both BAM and CRAM it is useful for it to indicate this and let the server pick whichever format it is most comfortable with.
+[b]Define error code (404?, 416?) for queries to a reference that is not present in the header. (Note this is not the same as present but having no data aligned to it - that should just be an empty reply.)
+[c]Define error response codes - suggest 416 (range not satisfiable). Perhaps this is appropriate for all chr, start and end failures.
+[d]How will compression work in this case - can we benefit from columnar compression as does Parquet?
diff --git a/pub/ga4gh-ticket.png b/pub/ga4gh-ticket.png
new file mode 100644
index 0000000000000000000000000000000000000000..db8fbe4ed971b22c7f1ec5dfe17228845e459c04
GIT binary patch
literal 13695
[base85-encoded PNG data omitted]

literal 0
HcmV?d00001

From 86eac644ff1fe27f5a568130ee46016522f85046 Mon Sep
17 00:00:00 2001 From: John Marshall Date: Tue, 8 Nov 2016 13:24:09 +0000 Subject: [PATCH 2/5] Trivial whitespace and heading formatting Tidy up whitepace and add ## markup to headings. Add basic Jekyll front matter; linebreak mode setting for Vim users. --- ga4gh-retrieval.md | 202 +++++++++++++++++++++++++-------------------- 1 file changed, 113 insertions(+), 89 deletions(-) diff --git a/ga4gh-retrieval.md b/ga4gh-retrieval.md index 10a0467ea..f3f6715f4 100644 --- a/ga4gh-retrieval.md +++ b/ga4gh-retrieval.md @@ -1,63 +1,76 @@ -Retrieval API spec v0.1 +--- +layout: default +title: Retrieval API spec v0.1 +--- +# Retrieval API spec v0.1 Design principles Protocol essentials -Authentication -Errors -CORS + Authentication + Errors + CORS Method: get reads by ID -URL parameters -Query parameters -Field filtering -Response JSON fields -Response data blocks -Diagram of core mechanic -HTTPS data block URLs -Inline data block URIs -Reliability & performance considerations -Security considerations -Method-specific error interpretations + URL parameters + Query parameters + Field filtering + Response JSON fields + Response data blocks + Diagram of core mechanic + HTTPS data block URLs + Inline data block URIs + Reliability & performance considerations + Security considerations + Method-specific error interpretations Possible future enhancements -Document history: +## Document history: + 18-mar-2016: copied from https://github.com/dnanexus-rnd/htsnexus/wiki 15-apr-2016: copied from working doc 15-aug-2016: final version for interop testing -Design principles +# Design principles + This data retrieval API bridges from existing genomics bulk data transfers to a client/server model with the following features: + * Incumbent data formats (BAM, CRAM) are preferred initially, with a future path to others. * Multiple server implementations are supported, including those that do format transcoding on the fly, and those that return essentially unaltered filesystem data. 
* Multiple use cases are supported, including access to small subsets of genomic data (e.g. for browsing a given region) and to full genomes (e.g. for calling variants). -* Clients can provide hints of the information to be retrieved; servers can respond with more information than requested but not less. +* Clients can provide hints of the information to be retrieved; servers can respond with more information than requested but not less. * We use the following pan-GA4GH standards: * 0 start, half open coordinates * The structuring of POST inputs, redirects and other non-reads data will be protobuf3 compatible JSON - Explicitly this API does NOT: * Provide a way to discover the identifiers for valid ReadGroupSets -- clients obtain these via some out of band mechanism -Protocol essentials -All API invocations are made to a configurable HTTP(S) endpoint, receive URL-encoded query string parameters, and return JSON output. Successful requests result with HTTP status code 200 and have UTF8-encoded JSON in the response body, with the content-type application/json. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate HTTP/2 upgrade using the standard mechanism. -Any timestamps that appear in the response from an API method are given as ISO 8601 date/time format. +# Protocol essentials + +All API invocations are made to a configurable HTTP(S) endpoint, receive URL-encoded query string parameters, and return JSON output. Successful requests result with HTTP status code 200 and have UTF8-encoded JSON in the response body, with the content-type application/json. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate HTTP/2 upgrade using the standard mechanism. +Any timestamps that appear in the response from an API method are given as ISO 8601 date/time format. HTTP responses may be compressed using RFC2616 transfer-coding, not content-coding. 
-Authentication + +## Authentication + Requests to the retrieval API endpoint may be authenticated by means of an OAuth2 bearer token included in the request headers, as detailed in RFC 6750. Briefly, the client supplies the header Authorization: Bearer xxxx with each HTTPS request, where xxxx is a private token. The mechanisms by which clients originally obtain their authentication tokens, and by which servers verify them, are currently beyond the scope of this specification. Servers may honor non-authenticated requests at their discretion. -Errors + +## Errors + Non-successful invocations of the API return an HTTP error code, and the response body contains a JSON object (content-type application/json) with the following structure: + { "error": { "type": "NotFound", "message": "No such accession" } } + The following error types are defined: type HTTP status code @@ -86,66 +99,71 @@ type ServiceUnavailable 503 Service is temporarily unavailable - -CORS +## CORS + All API resources should have the following support for cross-origin resource sharing (CORS) to support browser-based clients: +If a request to the URL of an API method includes the Origin header, its contents will be propagated into the Access-Control-Allow-Origin header of the response. Preflight requests (OPTIONS requests to the URL of an API method, with appropriate extra headers as defined in the CORS specification) will be accepted if the value of the Access-Control-Request-Method header is GET. +The values of Origin and Access-Control-Request-Headers (if any) of the request will be propagated to Access-Control-Allow-Origin and Access-Control-Allow-Headers respectively in the preflight response. +The Access-Control-Max-Age of the preflight response is set to the equivalent of 30 days. -If a request to the URL of an API method includes the Origin header, its contents will be propagated into the Access-Control-Allow-Origin header of the response. 
Preflight requests (OPTIONS requests to the URL of an API method, with appropriate extra headers as defined in the CORS specification) will be accepted if the value of the Access-Control-Request-Method header is GET. The values of Origin and Access-Control-Request-Headers (if any) of the request will be propagated to Access-Control-Allow-Origin and Access-Control-Allow-Headers respectively in the preflight response. The Access-Control-Max-Age of the preflight response is set to the equivalent of 30 days. -Method: get reads by ID -GET /reads/ +# Method: get reads by ID -The core mechanic for accessing specified reads data. The JSON response is a "ticket" allowing the caller to obtain the desired data in the specified format, which may involve follow-on requests to other endpoints, as detailed below. + GET /reads/ +The core mechanic for accessing specified reads data. The JSON response is a "ticket" allowing the caller to obtain the desired data in the specified format, which may involve follow-on requests to other endpoints, as detailed below. The client can request only reads overlapping a given genomic range. The response may however contain a superset of the desired results, including all records overlapping the range, and potentially other records not overlapping the range; the client should filter out such extraneous records if necessary. Successful requests with empty result sets still produce a valid response in the requested format (e.g. including header and EOF marker). -URL parameters + +## URL parameters + field description - id +id required - A string specifying which reads to return. +A string specifying which reads to return. +The format of the string is left to the discretion of the API provider, including allowing embedded “/” characters. Strings could be ReadGroupSetIds as defined by the GA4GH API, or any other format the API provider chooses (e.g. “/data/platinum/NA12878”, “/byRun/ERR148333”). 
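As an illustration of how a client might turn an `id` and the query parameters into a request URL, here is a minimal sketch. The function name `build_reads_url`, the `base` endpoint argument and the stripping of a leading "/" from the id are assumptions made for this example. The parameter names and the two validity rules (a range needs `referenceName` or `referenceMD5`; `tags` and `notags` must not intersect) are taken from the specification text.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode


def build_reads_url(
    base: str,
    read_id: str,
    *,
    fmt: Optional[str] = None,
    reference_name: Optional[str] = None,
    reference_md5: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    fields: Optional[Sequence[str]] = None,
    tags: Optional[Sequence[str]] = None,
    notags: Optional[Sequence[str]] = None,
) -> str:
    """Build a GET /reads/<id> URL, rejecting combinations the spec forbids."""
    if (start is not None or end is not None) and not (reference_name or reference_md5):
        raise ValueError("start/end require referenceName or referenceMD5")
    if tags and notags and set(tags) & set(notags):
        raise ValueError("tags and notags must not intersect")

    params = []
    if fmt is not None:
        params.append(("format", fmt))
    if reference_name is not None:
        params.append(("referenceName", reference_name))
    if reference_md5 is not None:
        params.append(("referenceMD5", reference_md5))
    if start is not None:  # 0 is a valid start, so compare against None
        params.append(("start", str(start)))
    if end is not None:
        params.append(("end", str(end)))
    if fields is not None:
        params.append(("fields", ",".join(fields)))
    if tags is not None:  # an empty sequence yields "tags=", meaning no tags
        params.append(("tags", ",".join(tags)))
    if notags is not None:
        params.append(("notags", ",".join(notags)))

    # Ids may contain "/", so keep it unescaped in the path. Dropping a
    # leading "/" (to avoid "//") is a choice made for this sketch only.
    path = quote(read_id.lstrip("/"), safe="/")
    url = f"{base.rstrip('/')}/reads/{path}"
    query = urlencode(params, safe=",")
    return f"{url}?{query}" if query else url
```

For example, `build_reads_url("https://example.org/api", "/data/platinum/NA12878", reference_name="chr1", start=100, end=200)` yields `https://example.org/api/reads/data/platinum/NA12878?referenceName=chr1&start=100&end=200`, where `example.org` is a placeholder host.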
+ +## Query parameters -The format of the string is left to the discretion of the API provider, including allowing embedded “/” characters. Strings could be ReadGroupSetIds as defined by the GA4GH API, or any other format the API provider chooses (e.g. “/data/platinum/NA12878”, “/byRun/ERR148333”). - Query parameters field description - format +format optional string - Request read data in this format. Default: BAM. Allowed values: BAM,CRAM. +Request read data in this format. Default: BAM. Allowed values: BAM,CRAM. Server replies with HTTP status 409 if the requested format is not supported. -[a] +[a] referenceName optional - The reference sequence name, for example “chr1”, “1”, or “chrX”. If unspecified, all reads (mapped and unmapped) are returned.[b] - referenceMD5 +The reference sequence name, for example “chr1”, “1”, or “chrX”. If unspecified, all reads (mapped and unmapped) are returned.[b] +referenceMD5 optional - The MD5 checksum uniquely representing the reference sequence as a lower-case hexadecimal string, calculated as the MD5 of the upper-case sequence excluding all whitespace characters (this is equivalent to SQ:M5 in SAM). - +The MD5 checksum uniquely representing the reference sequence as a lower-case hexadecimal string, calculated as the MD5 of the upper-case sequence excluding all whitespace characters (this is equivalent to SQ:M5 in SAM). Server replies with HTTP status 422 if referenceName and referenceMD5 are both specified and are incompatible. - start +start optional 32-bit unsigned integer - The start position of the range on the reference, 0-based, inclusive. If specified, referenceName or referenceMD5 must also be specified.[c] - end +The start position of the range on the reference, 0-based, inclusive. If specified, referenceName or referenceMD5 must also be specified.[c] +end optional 32-bit unsigned integer - The end position of the range on the reference, 0-based exclusive. 
If specified, referenceName or referenceMD5 must also be specified. - fields +The end position of the range on the reference, 0-based exclusive. If specified, referenceName or referenceMD5 must also be specified. +fields optional - A list of fields to include, see below +A list of fields to include, see below Default: all - tags +tags +optional +A comma separated list of tags to include, default: all. If the empty string is specified (tags=) no tags are included. It is illegal for the values of tags and notags to intersect; the server may return HTTP status 400 in this case. +notags optional - A comma separated list of tags to include, default: all. If the empty string is specified (tags=) no tags are included. It is illegal for the values of tags and notags to intersect; the server may return HTTP status 400 in this case. - notags -optional - A comma separated list of tags to exclude, default: none. It is illegal for the values of tags and notags to intersect; the server may return HTTP status 400 in this case. - Field filtering -The list of fields is based on BAM fields: +A comma separated list of tags to exclude, default: none. It is illegal for the values of tags and notags to intersect; the server may return HTTP status 400 in this case. +### Field filtering + +The list of fields is based on BAM fields: Field Description @@ -171,83 +189,89 @@ Field Read bases QUAL Base quality scores - Example: fields=QNAME,FLAG,POS -Response JSON fields + +## Response JSON fields + field description - format +format string - Read data format. Default: BAM. Allowed values: BAM,CRAM. - urls +Read data format. Default: BAM. Allowed values: BAM,CRAM. +urls array of objects - an array providing URLs from which raw data can be retrieved. The client must retrieve binary data blocks from each of these URLs and concatenate them to obtain the complete response in the requested format. - +an array providing URLs from which raw data can be retrieved. 
The client must retrieve binary data blocks from each of these URLs and concatenate them to obtain the complete response in the requested format. Each element of the array is a JSON object with the following fields: - -url +url string - one URL. - +one URL. May be either a https: URL or an inline data: URI. HTTPS URLs require the client to make a follow-up request (possibly to a different endpoint) to retrieve a data block. Data URIs provide a data block inline, without necessitating a separate request. - Further details below. - headers +headers optional object - for HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if headers is {"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}, then the client must supply the headers Range: bytes=0-1023 and Authorization: Bearer xxxx with the HTTPS request to the URL. - +for HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if headers is {"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}, then the client must supply the headers Range: bytes=0-1023 and Authorization: Bearer xxxx with the HTTPS request to the URL. - md5 +md5 optional hex string - MD5 digest of the blob resulting from concatenating all of the ’payload’ data’ -- the url data blocks. - Response data blocks -Diagram of core mechanic - +MD5 digest of the blob resulting from concatenating all of the ’payload’ data’ -- the url data blocks. -1. Client sends a request with id, genomic range, and filter. +## Response data blocks + +### Diagram of core mechanic + +1. Client sends a request with id, genomic range, and filter. 2. Server replies with a ticket describing data block locations (URLs and headers). 3. Client fetches the data blocks using the URLs and headers. 4. 
Client concatenates data blocks to produce local blob. - While the blocks must be finally concatenated in the given order, the client may fetch them in parallel. -HTTPS data block URLs + +### HTTPS data block URLs + 1. must have percent-encoded path and query (e.g. javascript encodeURIComponent; python urllib.urlencode) 2. must accept GET requests 3. should provide CORS 4. should allow multiple request retries, within reason 5. should use HTTPS rather than plain HTTP except for testing or internal-only purposes (for security + in-flight corruption detection) 6. Server must send the response with either the Content-Length header, or chunked transfer encoding, or both. Clients must detect premature response truncation. -7. Client and URL endpoint may mutually negotiate HTTP/2 upgrade using the standard mechanism. +7. Client and URL endpoint may mutually negotiate HTTP/2 upgrade using the standard mechanism. 8. Client must follow 3xx redirects from the URL, subject to typical fail-safe mechanisms (e.g. maximum number of redirects), always supplying the headers, if any. If a byte range HTTP header accompanies the URL, then the client MAY decompose this byte range into several sub-ranges and open multiple parallel, retryable requests to fetch them. (The URL and headers must be sufficient to authorize such behavior by the client, within reason.) -Inline data block URIs + +### Inline data block URIs + e.g. data:application/vnd.ga4gh.bam;base64,SGVsbG8sIFdvcmxkIQ== [RFC 2397, WP] The client obtains the data block by decoding the embedded base64 payload. - 1. must use base64 payload encoding (simplifies client decoding logic) 2. client should ignore the media type (if any), treating the payload as a partial blob. - Note: the base64 text should not be additionally percent encoded. -Reliability & performance considerations -To provide robustness to sporadic transfer failures, servers should divide large payloads into multiple data blocks in the urls array. 
Then if the transfer of any one block fails, the client can retry that block and carry on, instead of starting all over. Clients may also fetch blocks in parallel, which can improve throughput. +### Reliability & performance considerations + +To provide robustness to sporadic transfer failures, servers should divide large payloads into multiple data blocks in the urls array. Then if the transfer of any one block fails, the client can retry that block and carry on, instead of starting all over. Clients may also fetch blocks in parallel, which can improve throughput. Initial guidelines, which we expect to revise in light of future experience: * Data blocks should not exceed ~1GB * Inline data URIs should not exceed a few megabytes -Security considerations + +### Security considerations + The URL and headers might contain embedded authentication tokens; therefore, production clients and servers should not unnecessarily print them to console, write them to logs, embed them in error messages, etc. -Method-specific error interpretations + +## Method-specific error interpretations + * 406 Unable: may be returned if a genomic range is requested, but the server is unable to provide genomic range slicing for the particular dataset (e.g. if no index is available). -Possible future enhancements + + +# Possible future enhancements + 1. add a mechanism to request reads from more than one ID at a time (e.g. for a trio) 2. allow clients to provide a suggested data block size to the server 3. consider adding other data types (e.g. variants) @@ -256,9 +280,9 @@ Possible future enhancements 6. [dglazer] add a way to request reads in GA4GH binary format[d] (e.g. fmt=proto) - - [a]This should probably be specified as a (comma separated?) list in preference order. If the client can accept both BAM and CRAM it is useful for it to indicate this and let the server pick whichever format it is most comfortable with. [b]Define error code (404?, 416?) 
for queries to a reference that is not present in the header. (Note this is not the same as present but having no data aligned to it - that should just be an empty reply.) [c]Define error response codes - suggest 416 (range not satisfiable). Perhaps this is appropriate for all chr, start and end failures. [d]How will compression work in this case - can we benefit from columnar compression as does Parquet? + + From 6c41a52ec8b3705e8c5fe2fbe3f29227562f13d5 Mon Sep 17 00:00:00 2001 From: John Marshall Date: Wed, 16 Nov 2016 10:31:19 +0000 Subject: [PATCH 3/5] Format tables, `code` words, and links Use kramdown formatting for simple tables (simple editing, nicely displayed on gh-pages, badly displayed in GitHub repository source) and native HTML for the complex nested table (more difficult editing, nicely displayed in both gh-pages and the repository source). This markdown document should now be equivalently formatted to the previous Google Docs document. --- ga4gh-retrieval.md | 261 +++++++++++++++++++++++++-------------------- pub/main.css | 10 ++ 2 files changed, 158 insertions(+), 113 deletions(-) diff --git a/ga4gh-retrieval.md b/ga4gh-retrieval.md index f3f6715f4..febcd9109 100644 --- a/ga4gh-retrieval.md +++ b/ga4gh-retrieval.md @@ -50,63 +50,51 @@ Explicitly this API does NOT: # Protocol essentials -All API invocations are made to a configurable HTTP(S) endpoint, receive URL-encoded query string parameters, and return JSON output. Successful requests result with HTTP status code 200 and have UTF8-encoded JSON in the response body, with the content-type application/json. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate HTTP/2 upgrade using the standard mechanism. +All API invocations are made to a configurable HTTP(S) endpoint, receive URL-encoded query string parameters, and return JSON output. 
Successful requests result with HTTP status code 200 and have UTF8-encoded JSON in the response body, with the content-type `application/json`. The server may provide responses with chunked transfer encoding. The client and server may mutually negotiate HTTP/2 upgrade using the standard mechanism. -Any timestamps that appear in the response from an API method are given as ISO 8601 date/time format. +Any timestamps that appear in the response from an API method are given as [ISO 8601] date/time format. -HTTP responses may be compressed using RFC2616 transfer-coding, not content-coding. +HTTP responses may be compressed using [RFC 2616] `transfer-coding`, not `content-coding`. ## Authentication -Requests to the retrieval API endpoint may be authenticated by means of an OAuth2 bearer token included in the request headers, as detailed in RFC 6750. Briefly, the client supplies the header Authorization: Bearer xxxx with each HTTPS request, where xxxx is a private token. The mechanisms by which clients originally obtain their authentication tokens, and by which servers verify them, are currently beyond the scope of this specification. Servers may honor non-authenticated requests at their discretion. +Requests to the retrieval API endpoint may be authenticated by means of an OAuth2 bearer token included in the request headers, as detailed in [RFC 6750]. Briefly, the client supplies the header `Authorization: Bearer xxxx` with each HTTPS request, where `xxxx` is a private token. The mechanisms by which clients originally obtain their authentication tokens, and by which servers verify them, are currently beyond the scope of this specification. Servers may honor non-authenticated requests at their discretion. 
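
A minimal sketch of the bearer-token header described above, assuming a Python client built on the standard library. The helper names `auth_headers` and `build_request` and the `example.org` URL are illustrative, not part of the specification; how the token is obtained remains out of scope.

```python
import urllib.request


def auth_headers(token=None):
    """Return headers carrying an RFC 6750 bearer token, or none at all.

    Servers may honor non-authenticated requests, so a missing token is
    not treated as an error here.
    """
    if token is None:
        return {}
    return {"Authorization": f"Bearer {token}"}


def build_request(url, token=None):
    """Prepare (but do not send) a GET request with the optional token."""
    return urllib.request.Request(url, headers=auth_headers(token), method="GET")


# Hypothetical endpoint and token, for illustration only.
request = build_request("https://example.org/reads/NA12878", token="xxxx")
print(request.get_header("Authorization"))
```

Because the token is private, a client along these lines should avoid logging the prepared headers.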
## Errors Non-successful invocations of the API return an HTTP error code, and the response body contains a JSON object (content-type application/json) with the following structure: +```json { "error": { "type": "NotFound", "message": "No such accession" } } +``` The following error types are defined: -type - HTTP status code - Description - InvalidAuthentication - 401 - Authorization provided is invalid - PermissionDenied - 403 - Authorization is required to access the resource - NotFound - 404 - The resource requested was not found - Unable - 406 - The server is unable to fulfill the request - UnsupportedFormat - 409 - The requested file format is not supported by the server - InvalidInput - 422 - The request parameters do not adhere to the specification - InternalError - 500 - Server error, clients should try later - ServiceUnavailable - 503 - Service is temporarily unavailable + +|- +type | HTTP status code | Description +|- +InvalidAuthentication | 401 | Authorization provided is invalid +PermissionDenied | 403 | Authorization is required to access the resource +NotFound | 404 | The resource requested was not found +Unable | 406 | The server is unable to fulfill the request +UnsupportedFormat | 409 | The requested file format is not supported by the server +InvalidInput | 422 | The request parameters do not adhere to the specification +InternalError | 500 | Server error, clients should try later +ServiceUnavailable | 503 | Service is temporarily unavailable +|- ## CORS -All API resources should have the following support for cross-origin resource sharing (CORS) to support browser-based clients: +All API resources should have the following support for cross-origin resource sharing ([CORS]) to support browser-based clients: -If a request to the URL of an API method includes the Origin header, its contents will be propagated into the Access-Control-Allow-Origin header of the response. 
Preflight requests (OPTIONS requests to the URL of an API method, with appropriate extra headers as defined in the CORS specification) will be accepted if the value of the Access-Control-Request-Method header is GET. -The values of Origin and Access-Control-Request-Headers (if any) of the request will be propagated to Access-Control-Allow-Origin and Access-Control-Allow-Headers respectively in the preflight response. -The Access-Control-Max-Age of the preflight response is set to the equivalent of 30 days. +If a request to the URL of an API method includes the `Origin` header, its contents will be propagated into the `Access-Control-Allow-Origin` header of the response. Preflight requests (`OPTIONS` requests to the URL of an API method, with appropriate extra headers as defined in the CORS specification) will be accepted if the value of the `Access-Control-Request-Method` header is `GET`. +The values of `Origin` and `Access-Control-Request-Headers` (if any) of the request will be propagated to `Access-Control-Allow-Origin` and `Access-Control-Allow-Headers` respectively in the preflight response. +The `Access-Control-Max-Age` of the preflight response is set to the equivalent of 30 days. # Method: get reads by ID @@ -119,111 +107,149 @@ The client can request only reads overlapping a given genomic range. The respons ## URL parameters -field - description -id -required + + +
+`id` +_required_ + A string specifying which reads to return. The format of the string is left to the discretion of the API provider, including allowing embedded “/” characters. Strings could be ReadGroupSetIds as defined by the GA4GH API, or any other format the API provider chooses (e.g. “/data/platinum/NA12878”, “/byRun/ERR148333”). +
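
As a rough illustration of the `id` URL parameter, the sketch below appends an identifier to the `/reads/` path. It assumes (the specification does not say) that a leading "/" in the identifier is dropped before joining and that embedded "/" characters are left unescaped; `reads_url` and the endpoint are hypothetical names.

```python
from urllib.parse import quote


def reads_url(endpoint, read_id):
    """Join an endpoint and a reads id into a request URL.

    Assumption: "/" stays literal because ids may embed it; every other
    reserved character is percent-encoded.
    """
    path = quote(read_id.lstrip("/"), safe="/")
    return endpoint.rstrip("/") + "/reads/" + path


# Hypothetical endpoint; the id comes from the examples in the text.
print(reads_url("https://example.org/api", "/byRun/ERR148333"))
```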
## Query parameters -field - description -format -optional string + + + + + + + + + +
+`format` +_optional string_ + Request read data in this format. Default: BAM. Allowed values: BAM,CRAM. + Server replies with HTTP status 409 if the requested format is not supported. -[a] -referenceName -optional -The reference sequence name, for example “chr1”, “1”, or “chrX”. If unspecified, all reads (mapped and unmapped) are returned.[b] -referenceMD5 -optional +[^a] +
+`referenceName` +_optional_ + +The reference sequence name, for example “chr1”, “1”, or “chrX”. If unspecified, all reads (mapped and unmapped) are returned. [^b] +
+`referenceMD5` +_optional_ + The MD5 checksum uniquely representing the reference sequence as a lower-case hexadecimal string, calculated as the MD5 of the upper-case sequence excluding all whitespace characters (this is equivalent to SQ:M5 in SAM). -Server replies with HTTP status 422 if referenceName and referenceMD5 are both specified and are incompatible. -start -optional 32-bit unsigned integer -The start position of the range on the reference, 0-based, inclusive. If specified, referenceName or referenceMD5 must also be specified.[c] -end -optional 32-bit unsigned integer -The end position of the range on the reference, 0-based exclusive. If specified, referenceName or referenceMD5 must also be specified. -fields -optional +Server replies with HTTP status 422 if `referenceName` and `referenceMD5` are both specified and are incompatible. +
+`start` +_optional 32-bit unsigned integer_ + +The start position of the range on the reference, 0-based, inclusive. If specified, `referenceName` or `referenceMD5` must also be specified. [^c] +
+`end` +_optional 32-bit unsigned integer_ + +The end position of the range on the reference, 0-based exclusive. If specified, `referenceName` or `referenceMD5` must also be specified. +
+`fields` +_optional_ + A comma-separated list of fields to include (see below). Default: all. +
+`tags` +_optional_ + +A comma-separated list of tags to include. Default: all. If the empty string is specified (`tags=`), no tags are included. The values of `tags` and `notags` must not intersect; the server may return HTTP status 400 in this case. +
+`notags` +_optional_ + +A comma-separated list of tags to exclude. Default: none. The values of `tags` and `notags` must not intersect; the server may return HTTP status 400 in this case. +
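
The query parameters in the table above could be assembled on the client roughly as follows. This is a sketch under stated assumptions: `build_query` is a hypothetical helper, the client-side checks mirror the constraints in the table (a range needs a reference, positions fit in 32 bits, `tags` and `notags` are disjoint), and commas are left unescaped for readability.

```python
from urllib.parse import urlencode


def build_query(fmt=None, reference_name=None, reference_md5=None,
                start=None, end=None, fields=None, tags=None, notags=None):
    """Build the query string for a reads request, rejecting invalid input early."""
    if fmt is not None and fmt not in ("BAM", "CRAM"):
        raise ValueError("format must be BAM or CRAM")
    has_range = start is not None or end is not None
    if has_range and reference_name is None and reference_md5 is None:
        raise ValueError("start/end require referenceName or referenceMD5")
    for pos in (start, end):
        if pos is not None and not 0 <= pos < 2 ** 32:
            raise ValueError("positions are 32-bit unsigned integers")
    if tags and notags and set(tags) & set(notags):
        raise ValueError("tags and notags must not intersect")

    params = []
    for key, value in (("format", fmt), ("referenceName", reference_name),
                       ("referenceMD5", reference_md5),
                       ("start", start), ("end", end)):
        if value is not None:
            params.append((key, value))
    # An empty list is meaningful for tags: it becomes "tags=" (no tags).
    for key, items in (("fields", fields), ("tags", tags), ("notags", notags)):
        if items is not None:
            params.append((key, ",".join(items)))
    return urlencode(params, safe=",")


print(build_query(reference_name="chr1", start=100, end=200,
                  fields=["QNAME", "FLAG", "POS"]))
```

Validating before sending is a design choice rather than a requirement; the server remains the authority and may still answer with 400 or 422.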
### Field filtering The list of fields is based on BAM fields: -Field - Description - QNAME - Read names - FLAG - Read bit flags - RNAME - Reference sequence name - POS - Alignment position - MAPQ - Mapping quality score - CIGAR - CIGAR string - RNEXT - Reference sequence name of the next fragment template - PNEXT - Alignment position of the next fragment in the template - TLEN - Inferred template size - SEQ - Read bases - QUAL - Base quality scores - -Example: fields=QNAME,FLAG,POS +|- +Field | Description +|- +QNAME | Read names +FLAG | Read bit flags +RNAME | Reference sequence name +POS | Alignment position +MAPQ | Mapping quality score +CIGAR | CIGAR string +RNEXT | Reference sequence name of the next fragment template +PNEXT | Alignment position of the next fragment in the template +TLEN | Inferred template size +SEQ | Read bases +QUAL | Base quality scores +|- + +Example: `fields=QNAME,FLAG,POS`. ## Response JSON fields -field - description -format -string + + + + +
+`format` +_string_ + Read data format. Default: BAM. Allowed values: BAM,CRAM. -urls -array of objects +
+`urls` +_array of objects_ + an array providing URLs from which raw data can be retrieved. The client must retrieve binary data blocks from each of these URLs and concatenate them to obtain the complete response in the requested format. Each element of the array is a JSON object with the following fields: -url -string + + + +
+`url` +_string_ + one URL. -May be either a https: URL or an inline data: URI. HTTPS URLs require the client to make a follow-up request (possibly to a different endpoint) to retrieve a data block. Data URIs provide a data block inline, without necessitating a separate request. +May be either a `https:` URL or an inline `data:` URI. HTTPS URLs require the client to make a follow-up request (possibly to a different endpoint) to retrieve a data block. Data URIs provide a data block inline, without necessitating a separate request. Further details below. -headers -optional object -for HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if headers is {"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}, then the client must supply the headers Range: bytes=0-1023 and Authorization: Bearer xxxx with the HTTPS request to the URL. - -md5 -optional hex string +
+`headers` +_optional object_ + +for HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if headers is `{"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}`, then the client must supply the headers `Range: bytes=0-1023` and `Authorization: Bearer xxxx` with the HTTPS request to the URL. +
+ +
+`md5` +_optional hex string_ + MD5 digest of the blob resulting from concatenating all of the ’payload’ data’ -- the url data blocks. +
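
To make the response fields concrete, here is a sketch of how a client might turn a ticket into the final blob. It assumes the ticket is already parsed into a dict with `urls` and an optional top-level `md5`, as the table suggests; `assemble`, `decode_data_uri`, and `fetch_https` are hypothetical names, and retries, redirects limits, and parallel fetching are left out.

```python
import base64
import hashlib
import urllib.request


def fetch_https(url, headers):
    """Fetch one data block, sending the headers the ticket supplied."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read()


def decode_data_uri(uri):
    """Decode an inline base64 data: URI, ignoring its media type."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data: URI")
    return base64.b64decode(payload)


def assemble(ticket, fetch=fetch_https):
    """Concatenate all data blocks in ticket order and check the md5 if given."""
    blocks = []
    for entry in ticket["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            blocks.append(decode_data_uri(url))
        else:
            blocks.append(fetch(url, entry.get("headers", {})))
    blob = b"".join(blocks)
    expected = ticket.get("md5")
    if expected is not None and hashlib.md5(blob).hexdigest() != expected.lower():
        raise ValueError("md5 mismatch: payload is corrupt or incomplete")
    return blob


# Inline-only ticket using the data URI from the specification's example.
ticket = {
    "format": "BAM",
    "urls": [{"url": "data:application/vnd.ga4gh.bam;base64,SGVsbG8sIFdvcmxkIQ=="}],
}
print(assemble(ticket))
```

Passing `fetch` as a parameter keeps the ordering and checksum logic testable without network access.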
## Response data blocks ### Diagram of core mechanic +![Diagram showing ticket flow](pub/ga4gh-ticket.png) + 1. Client sends a request with id, genomic range, and filter. 2. Server replies with a ticket describing data block locations (URLs and headers). 3. Client fetches the data blocks using the URLs and headers. @@ -245,7 +271,7 @@ If a byte range HTTP header accompanies the URL, then the client MAY decompose t ### Inline data block URIs -e.g. data:application/vnd.ga4gh.bam;base64,SGVsbG8sIFdvcmxkIQ== [RFC 2397, WP] +e.g. `data:application/vnd.ga4gh.bam;base64,SGVsbG8sIFdvcmxkIQ==` ([RFC 2397], [Data URI]). The client obtains the data block by decoding the embedded base64 payload. 1. must use base64 payload encoding (simplifies client decoding logic) @@ -255,7 +281,7 @@ Note: the base64 text should not be additionally percent encoded. ### Reliability & performance considerations -To provide robustness to sporadic transfer failures, servers should divide large payloads into multiple data blocks in the urls array. Then if the transfer of any one block fails, the client can retry that block and carry on, instead of starting all over. Clients may also fetch blocks in parallel, which can improve throughput. +To provide robustness to sporadic transfer failures, servers should divide large payloads into multiple data blocks in the `urls` array. Then if the transfer of any one block fails, the client can retry that block and carry on, instead of starting all over. Clients may also fetch blocks in parallel, which can improve throughput. Initial guidelines, which we expect to revise in light of future experience: * Data blocks should not exceed ~1GB @@ -276,13 +302,22 @@ The URL and headers might contain embedded authentication tokens; therefore, pro 2. allow clients to provide a suggested data block size to the server 3. consider adding other data types (e.g. variants) 4. add POST support (if and when request sizes get large) -5. 
[mlin] add a way to request all unmapped reads (e.g. by passing * for referenceName) -6. [dglazer] add a way to request reads in GA4GH binary format[d] (e.g. fmt=proto) +5. [mlin] add a way to request all unmapped reads (e.g. by passing `*` for `referenceName`) +6. [dglazer] add a way to request reads in GA4GH binary format [^d] (e.g. fmt=proto) + +## Existing clarification suggestions + +[^a]: This should probably be specified as a (comma separated?) list in preference order. If the client can accept both BAM and CRAM it is useful for it to indicate this and let the server pick whichever format it is most comfortable with. +[^b]: Define error code (404?, 416?) for queries to a reference that is not present in the header. (Note this is not the same as present but having no data aligned to it - that should just be an empty reply.) +[^c]: Define error response codes - suggest 416 (range not satisfiable). Perhaps this is appropriate for all chr, start and end failures. +[^d]: How will compression work in this case - can we benefit from columnar compression as does Parquet? -[a]This should probably be specified as a (comma separated?) list in preference order. If the client can accept both BAM and CRAM it is useful for it to indicate this and let the server pick whichever format it is most comfortable with. -[b]Define error code (404?, 416?) for queries to a reference that is not present in the header. (Note this is not the same as present but having no data aligned to it - that should just be an empty reply.) -[c]Define error response codes - suggest 416 (range not satisfiable). Perhaps this is appropriate for all chr, start and end failures. -[d]How will compression work in this case - can we benefit from columnar compression as does Parquet? 
+[CORS]: http://www.w3.org/TR/cors/ +[Data URI]: https://en.wikipedia.org/wiki/Data_URI_scheme +[ISO 8601]: http://www.iso.org/iso/iso8601 +[RFC 2397]: https://www.ietf.org/rfc/rfc2397.txt +[RFC 2616]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html +[RFC 6750]: https://tools.ietf.org/html/rfc6750 diff --git a/pub/main.css b/pub/main.css index dc84b7f4a..2bd33fa0c 100644 --- a/pub/main.css +++ b/pub/main.css @@ -14,6 +14,16 @@ div.clear { clear: both; } div.sidebar li { margin: 0.5ex 0; } +table { + border-collapse: collapse; +} + +th, td { + border: 1px solid black; + padding: 1ex 1em; + vertical-align: top; +} + .site-footer { margin-top: 4ex; border-top: 1px solid #e8e8e8; From ff9df337dedf1eb4237ec704a4ab091749fc192d Mon Sep 17 00:00:00 2001 From: John Marshall Date: Mon, 12 Dec 2016 16:23:43 +0000 Subject: [PATCH 4/5] Add ga4gh-retrieval.md to README and index Remove ga4gh-retrieval's contents, history, and generic hts-specs footer. Minor edits to capitalisation. Use ASCII em dashes and quotation marks: kramdown will convert these to Unicode dashes and smart quotes. --- README.md | 6 ++++++ _layouts/default.html | 2 +- ga4gh-retrieval.md | 45 ++++++++++--------------------------------- index.md | 3 ++- 4 files changed, 19 insertions(+), 37 deletions(-) diff --git a/README.md b/README.md index 02b256953..72e8f8563 100644 --- a/README.md +++ b/README.md @@ -26,6 +26,11 @@ These formats are discussed on the [vcftools-spec mailing list][vcfspec-ml]. **[BCFv2_qref.tex]** is a quick reference describing just the layout of data within BCF2 files. +Transfer protocols +------------------ + +The **[ga4gh-retrieval.md]** protocol enables parallel streaming access to data sharded across multiple URLs or files. 
+ [SAMv1.tex]: http://samtools.github.io/hts-specs/SAMv1.pdf [SAMtags.tex]: http://samtools.github.io/hts-specs/SAMtags.pdf [CRAMv2.1.tex]: http://samtools.github.io/hts-specs/CRAMv2.1.pdf @@ -37,6 +42,7 @@ These formats are discussed on the [vcftools-spec mailing list][vcfspec-ml]. [VCFv4.3.tex]: http://samtools.github.io/hts-specs/VCFv4.3.pdf [BCFv1_qref.tex]: http://samtools.github.io/hts-specs/BCFv1_qref.pdf [BCFv2_qref.tex]: http://samtools.github.io/hts-specs/BCFv2_qref.pdf +[ga4gh-retrieval.md]: http://samtools.github.io/hts-specs/ga4gh-retrieval.html [ena-cram]: http://www.ebi.ac.uk/ena/about/cram_toolkit [htslib]: https://github.com/samtools/htslib diff --git a/_layouts/default.html b/_layouts/default.html index de3961e84..c58a0488c 100644 --- a/_layouts/default.html +++ b/_layouts/default.html @@ -10,7 +10,7 @@ - {% include footer.html %} + {%unless page.suppress_footer %}{% include footer.html %}{% endunless %} diff --git a/ga4gh-retrieval.md b/ga4gh-retrieval.md index febcd9109..5ee44c936 100644 --- a/ga4gh-retrieval.md +++ b/ga4gh-retrieval.md @@ -1,37 +1,11 @@ --- layout: default title: Retrieval API spec v0.1 +suppress_footer: true --- # Retrieval API spec v0.1 -Design principles -Protocol essentials - Authentication - Errors - CORS -Method: get reads by ID - URL parameters - Query parameters - Field filtering - Response JSON fields - Response data blocks - Diagram of core mechanic - HTTPS data block URLs - Inline data block URIs - Reliability & performance considerations - Security considerations - Method-specific error interpretations -Possible future enhancements - - -## Document history: - -18-mar-2016: copied from https://github.com/dnanexus-rnd/htsnexus/wiki -15-apr-2016: copied from working doc -15-aug-2016: final version for interop testing - - # Design principles This data retrieval API bridges from existing genomics bulk data transfers to a client/server model with the following features: @@ -45,7 +19,8 @@ This data retrieval API bridges 
from existing genomics bulk data transfers to a * The structuring of POST inputs, redirects and other non-reads data will be protobuf3 compatible JSON Explicitly this API does NOT: -* Provide a way to discover the identifiers for valid ReadGroupSets -- clients obtain these via some out of band mechanism + +* Provide a way to discover the identifiers for valid ReadGroupSets --- clients obtain these via some out of band mechanism # Protocol essentials @@ -76,7 +51,7 @@ Non-successful invocations of the API return an HTTP error code, and the respons The following error types are defined: |- -type | HTTP status code | Description +Type | HTTP status code | Description |- InvalidAuthentication | 401 | Authorization provided is invalid PermissionDenied | 403 | Authorization is required to access the resource @@ -114,7 +89,7 @@ _required_ A string specifying which reads to return. -The format of the string is left to the discretion of the API provider, including allowing embedded “/” characters. Strings could be ReadGroupSetIds as defined by the GA4GH API, or any other format the API provider chooses (e.g. “/data/platinum/NA12878”, “/byRun/ERR148333”). +The format of the string is left to the discretion of the API provider, including allowing embedded "/" characters. Strings could be ReadGroupSetIds as defined by the GA4GH API, or any other format the API provider chooses (e.g. "/data/platinum/NA12878", "/byRun/ERR148333"). @@ -134,7 +109,7 @@ Server replies with HTTP status 409 if the requested format is not supported. `referenceName` _optional_ -The reference sequence name, for example “chr1”, “1”, or “chrX”. If unspecified, all reads (mapped and unmapped) are returned. [^b] +The reference sequence name, for example "chr1", "1", or "chrX". If unspecified, all reads (mapped and unmapped) are returned. [^b] `referenceMD5` @@ -212,7 +187,7 @@ Read data format. Default: BAM. Allowed values: BAM,CRAM. 
`urls` _array of objects_ -an array providing URLs from which raw data can be retrieved. The client must retrieve binary data blocks from each of these URLs and concatenate them to obtain the complete response in the requested format. +An array providing URLs from which raw data can be retrieved. The client must retrieve binary data blocks from each of these URLs and concatenate them to obtain the complete response in the requested format. Each element of the array is a JSON object with the following fields: @@ -221,7 +196,7 @@ Each element of the array is a JSON object with the following fields: `url` _string_ -one URL. +One URL. May be either a `https:` URL or an inline `data:` URI. HTTPS URLs require the client to make a follow-up request (possibly to a different endpoint) to retrieve a data block. Data URIs provide a data block inline, without necessitating a separate request. @@ -231,7 +206,7 @@ Further details below. `headers` _optional object_ -for HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if headers is `{"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}`, then the client must supply the headers `Range: bytes=0-1023` and `Authorization: Bearer xxxx` with the HTTPS request to the URL. +For HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if headers is `{"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}`, then the client must supply the headers `Range: bytes=0-1023` and `Authorization: Bearer xxxx` with the HTTPS request to the URL. @@ -240,7 +215,7 @@ for HTTPS URLs, the server may supply a JSON object containing one or more strin `md5` _optional hex string_ -MD5 digest of the blob resulting from concatenating all of the ’payload’ data’ -- the url data blocks. 
+MD5 digest of the blob resulting from concatenating all of the "payload" data --- the url data blocks. diff --git a/index.md b/index.md index c4a7e9c67..25c3aaa3f 100644 --- a/index.md +++ b/index.md @@ -11,7 +11,7 @@ title: HTS format specifications {{line}}{% endfor %}
{% for line in readme_lines offset: 5 %} From 969268138c09cf1956cd99414f0bee6f38e6978b Mon Sep 17 00:00:00 2001 From: John Marshall Date: Tue, 31 Jan 2017 16:45:38 +0000 Subject: [PATCH 5/5] Rename to "htsget", the newly-chosen name for the protocol --- README.md | 4 ++-- ga4gh-retrieval.md => htsget.md | 6 +++--- index.md | 2 +- pub/{ga4gh-ticket.png => htsget-ticket.png} | Bin 4 files changed, 6 insertions(+), 6 deletions(-) rename ga4gh-retrieval.md => htsget.md (99%) rename pub/{ga4gh-ticket.png => htsget-ticket.png} (100%) diff --git a/README.md b/README.md index 72e8f8563..5659abda4 100644 --- a/README.md +++ b/README.md @@ -29,7 +29,7 @@ These formats are discussed on the [vcftools-spec mailing list][vcfspec-ml]. Transfer protocols ------------------ -The **[ga4gh-retrieval.md]** protocol enables parallel streaming access to data sharded across multiple URLs or files. +**[Htsget.md]** describes the _hts-get_ retrieval protocol, which enables parallel streaming access to data sharded across multiple URLs or files. 
[SAMv1.tex]: http://samtools.github.io/hts-specs/SAMv1.pdf [SAMtags.tex]: http://samtools.github.io/hts-specs/SAMtags.pdf @@ -42,7 +42,7 @@ The **[ga4gh-retrieval.md]** protocol enables parallel streaming access to data [VCFv4.3.tex]: http://samtools.github.io/hts-specs/VCFv4.3.pdf [BCFv1_qref.tex]: http://samtools.github.io/hts-specs/BCFv1_qref.pdf [BCFv2_qref.tex]: http://samtools.github.io/hts-specs/BCFv2_qref.pdf -[ga4gh-retrieval.md]: http://samtools.github.io/hts-specs/ga4gh-retrieval.html +[Htsget.md]: http://samtools.github.io/hts-specs/htsget.html [ena-cram]: http://www.ebi.ac.uk/ena/about/cram_toolkit [htslib]: https://github.com/samtools/htslib diff --git a/ga4gh-retrieval.md b/htsget.md similarity index 99% rename from ga4gh-retrieval.md rename to htsget.md index 5ee44c936..d6af0914f 100644 --- a/ga4gh-retrieval.md +++ b/htsget.md @@ -1,10 +1,10 @@ --- layout: default -title: Retrieval API spec v0.1 +title: htsget protocol suppress_footer: true --- -# Retrieval API spec v0.1 +# Htsget retrieval API spec v0.1 # Design principles @@ -223,7 +223,7 @@ MD5 digest of the blob resulting from concatenating all of the "payload" data -- ### Diagram of core mechanic -![Diagram showing ticket flow](pub/ga4gh-ticket.png) +![Diagram showing ticket flow](pub/htsget-ticket.png) 1. Client sends a request with id, genomic range, and filter. 2. Server replies with a ticket describing data block locations (URLs and headers). diff --git a/index.md b/index.md index 25c3aaa3f..d61ffe5d4 100644 --- a/index.md +++ b/index.md @@ -25,7 +25,7 @@ Specifications: - [VCF v4.1](VCFv4.1.pdf) - [VCF v4.2](VCFv4.2.pdf) - [VCF v4.3](VCFv4.3.pdf) -- [GA4GH retrieval](ga4gh-retrieval.html) +- [Htsget](htsget.html)
{% for line in readme_lines offset: 5 %} diff --git a/pub/ga4gh-ticket.png b/pub/htsget-ticket.png similarity index 100% rename from pub/ga4gh-ticket.png rename to pub/htsget-ticket.png
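
Note on the ticket mechanism described in the htsget.md hunks above: the client retrieves a data block from each entry of `urls`, supplies any `headers` given for HTTPS URLs, concatenates the blocks in order, and may check the result against the optional `md5`. The following is a minimal client-side sketch of that step, not part of the patch and not a reference implementation. It assumes the ticket is already parsed into a Python dict with `urls` and `md5` at the top level; the function names `decode_data_uri`, `fetch_https`, and `assemble` are hypothetical, and percent-encoding inside base64 payloads is not handled.

```python
import base64
import hashlib
from urllib.parse import unquote_to_bytes
from urllib.request import Request, urlopen


def decode_data_uri(uri):
    """Return the payload bytes carried inline by a data: URI."""
    header, _, payload = uri.partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    # Non-base64 data URIs carry percent-encoded bytes.
    return unquote_to_bytes(payload)


def fetch_https(url, headers):
    """Fetch one data block, sending the headers the ticket requires."""
    with urlopen(Request(url, headers=headers)) as resp:
        return resp.read()


def assemble(ticket, fetch=fetch_https):
    """Concatenate all data blocks of a ticket, in order.

    `ticket` is assumed (hypothetically) to look like
    {"urls": [{"url": ..., "headers": {...}}, ...], "md5": "..."}.
    """
    blocks = []
    for entry in ticket["urls"]:
        url = entry["url"]
        if url.startswith("data:"):
            # Inline block: no follow-up request is needed.
            blocks.append(decode_data_uri(url))
        else:
            # HTTPS block: the ticket's headers must accompany the request.
            blocks.append(fetch(url, entry.get("headers", {})))
    payload = b"".join(blocks)
    expected = ticket.get("md5")
    if expected is not None and hashlib.md5(payload).hexdigest() != expected.lower():
        raise ValueError("MD5 of concatenated data blocks does not match ticket")
    return payload
```

The `fetch` parameter exists only so the HTTPS step can be swapped out, for example for a stub in tests or for a client that downloads blocks in parallel before joining them in ticket order.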