HTTP
STANDARDS ==> #RFCs mostly (see below)
GOAL ==> #Application-layer adding CRUD actions on [linked] generic whole resources:
# - whole: as opposed to parts of resources
# - generic: rich generic semantics with no assumption of type|format|purpose
# - linked:
# - good support for links between resources
# - hence good for documents
# - generic semantics reduce probability of broken links
#Oriented:
# - more towards consumption: client-server, proxies-friendly
# - than interaction: concurrency|realtime
#Properties:
# - good efficiency
# - poor security, rely on TLS instead
#On TCP/IP
#Wide support:
# - so it is also used for other purposes like tunneling
SUMMARY ==> #Protocol/format:
# protocols/versions application layer, TCP/IP, versioning, switching protocols,
# tunneling, extensions
# syntax HTTP/1.* (text-based), HTTP/2 (binary-based (frames) with
# streams), syntax error, exit, charset, case, headers syntax
# body formatting body length hint, unknown (streaming), encoding
#Errors:
# general error codes, types, response body
# specific limits, validation, financial, legal
#Content:
# semantics status codes, methods (allowed, safety),
# CRUD (GET, HEAD, POST, PUT, PATCH, DELETE)
# content processing transformations, preferences, expectations
# content negotiation types (MIME, charset, language, compression, delta encoding,
# features, objectionable, datetime), proactive vs reactive
# links URI scheme, redirects (client-side, server-side, alternative
# services), wrong location, body location, semantic links, source maps
#State/context
# session 1 multiplexed TCP socket, PING, cookies
# tracking referer, negotiation (DNT [C])
# logging timestamp, response time
# software identification
#Communication:
# browser HTTP requests, rendering
# client-server, proxies proxy chain info, transformations, caching, authentication,
# errors
# concurrency conflicts, preconditions, method idempotency, realtime
#Security:
# security TLS, authentication (cookies, basic|digest, headers|body|query),
# integrity
#Efficiency:
# caching client vs server, history, unconditional caching,
# conditional caching, no caching, delta encoding
# prefetching server push, resource hints
# body size compression, big body (streaming, byte serving),
# partial|empty body
# tasks async, stream prioritization
# performance errors load, timeout
#Development:
# documentation OPTIONS
# debugging TRACE
PROTOCOL/FORMAT
/=+===============================+=\
/ : : \
)==: PROTOCOLS/VERSIONS :==(
\ :_______________________________: /
\=+===============================+=/
LAYERS ==> # - application layer
# - usually on top of TCP/IP
# - TCP ports: 80 (HTTP), 443 (HTTPS)
VERSIONING ==> #Current:
# - 2
# - previous: 1.1, 1.0
#To guess support, can use:
# - ALPN, TLS extension (see SSL/TLS doc)
# - Upgrade: h2[c] [C|S] (see switching protocols)
# - h2 is HTTP/2 over TLS (HTTPS), h2c is HTTP/2 over cleartext TCP (HTTP)
# - must include SETTINGS frame as HTTP2-Settings [C] (base64-encoded)
# - not supported by most browsers
# - can Upgrade to but not from HTTP/2
# - prior knowledge: client already knows server supports HTTP/2
#Wrong version:
# - 505 (HTTP Version Not Supported)
# - (HTTP/2) HTTP_1_1_REQUIRED (0xd) error code: asks to downgrade to HTTP/1.1
SWITCHING PROTOCOLS ==> #Starting a session in HTTP then continuing in another protocol:
#How:
# - client:
# - Upgrade: PROTOCOL[/VERSION],... [C]: requested ones
# - Connection: Upgrade [C]
# - server (success):
# - 101 (Switching Protocols)
# - Upgrade: PROTOCOL[/VERSION],... [S]: picked ones
# - Connection: Upgrade [S]
# - no response body
# - server (missing Upgrade [C])
# - 426 (Upgrade Required)
#Upgrade [C] allowed with HTTP/1.1 -> HTTP/2, but not with HTTP/2 -> *
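#Sketch (Python) of the handshake above. build_upgrade_request() and picked_protocols() are
#hypothetical helpers written for this doc, not library functions. They only build|parse the
#message head as text and do not open any socket:

```python
CRLF = "\r\n"


def build_upgrade_request(host, path, protocols):
    """Build an HTTP/1.1 request head asking to switch to one of `protocols`."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: Upgrade",
        "Upgrade: " + ", ".join(protocols),
    ]
    return CRLF.join(lines) + CRLF + CRLF


def picked_protocols(response_head):
    """Return the protocols picked by the server, or [] if it did not switch."""
    status_line, *header_lines = response_head.split(CRLF)
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        if line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    # Success requires 101 plus Connection: Upgrade
    if status != 101 or "upgrade" not in headers.get("connection", "").lower():
        return []
    return [p.strip() for p in headers.get("upgrade", "").split(",") if p.strip()]
```

#A response other than 101 (e.g. 200 or 426) yields an empty list, i.e. session stays in HTTP.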
TUNNELING ==> #Goal:
# - wrapping another protocol's packets in HTTP
# - either:
# - proxy and endpoints wrap|unwrap request|response
# - initial setup is done with HTTP, then proxy forwards the other protocol
#Advantages:
# - compatibility with existing software, including firewalls
# - with HTTPS, can completely hide the other protocol packet
# (including which protocol it is)
# - can use HTTP features, e.g. caching
#Problems:
# - adds HTTP features that might not be needed -> complexity
# - slower
#Should register|use different:
# - URI scheme or HTTP methods: if very different from HTTP usual behavior
# - port: if separate from HTTP server (different goal|traffic|data)
# - status codes: never, should instead reuse existing ones and make sure it
# matches behavior:
# - because proxies will behave according to status code (e.g. for caching)
# - if nothing matches, use general ones like 200 or 500
#How:
# - CONNECT:
# - how:
# - client:
# - HTTP method CONNECT (unsafe non-idempotent)
# - URI: ORIGIN only
# - no request body
# - server:
# - 2**
# - (HTTP/2) can use CONNECT_ERROR (0xa) error code
# - then server will forward the TCP connection used by HTTP request
# - it will only forward without interpreting, i.e. can be any TCP-based protocol,
# including HTTP
# - intended for forward proxies
# - use cases:
# - using HTTPS end-to-end, i.e. proxy forwards SSL packets without needing to
# decrypt/re-encrypt them
# - any other HTTP tunneling case
# - allows client to do any TCP request, which can be a security concern:
# - should restrict to specific use case by whitelisting ORIGIN and|or PORT
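#Sketch (Python) of a CONNECT request and of the whitelisting advised above. ALLOWED_TARGETS,
#build_connect_request() and is_allowed() are hypothetical names. A real proxy would also need
#to open the TCP connection and relay bytes, which is left out here:

```python
# Hypothetical whitelist of (host, port) pairs the proxy accepts to tunnel to
ALLOWED_TARGETS = {("example.com", 443)}


def build_connect_request(host, port):
    """CONNECT uses ORIGIN only (host:port) as request target, and has no body."""
    authority = f"{host}:{port}"
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n"


def is_allowed(request_head, allowed=ALLOWED_TARGETS):
    """Return True only for a CONNECT request targeting a whitelisted ORIGIN."""
    method, target, _version = request_head.split("\r\n", 1)[0].split(" ")
    if method != "CONNECT":
        return False
    host, _, port = target.rpartition(":")
    return port.isdigit() and (host.lower(), int(port)) in allowed
```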
EXTENSIONS ==> # - new headers
# - HTTP/2:
# - new frame type
# - new SETTINGS_*
# - new *_ERROR
# - should prefix with X-*
# - if unavailable:
# - HTTP extensions: 510 (Not Extended)
# - feature: 501 (Not Implemented)
/=+===============================+=\
/ : : \
)==: SYNTAX :==(
\ :_______________________________: /
\=+===============================+=/
HTTP/1.* ==> #Newline is CRLF
#Request:
# METHOD URI HTTP/1.*
# HEADER: VAL (Host [C] mandatory)
# ...
#
# [BODY]
#Response:
# HTTP/1.* STATUS_NUM STATUS_STR
# [HEADER: VAL
# ...]
#
# [BODY]
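#Sketch (Python) of the HTTP/1.* text syntax above. parse_request() and build_response() are
#hypothetical helpers. They assume the whole message is already in memory and do not handle
#chunked bodies or repeated headers:

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.* request into (method, uri, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # empty line separates head and body
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case insensitive
    if version == "HTTP/1.1" and "host" not in headers:
        raise ValueError("Host header is mandatory")
    return method, uri, version, headers, body


def build_response(status_num, status_str, headers, body=b""):
    """Serialize a response: status line, headers, empty line, then optional body."""
    head = [f"HTTP/1.1 {status_num} {status_str}"]
    head += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body
```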
HTTP/2 ==> #Stream: one request+response, or server push:
# - several ordered frames connected by stream ID
# - is what makes multiplexing possible
# - stream ID:
# - 0: hypothetical main stream
# - odd is started by client, even by server
# - incremental ID
#Frames:
# - headers: length [0-2], type [3], flags [4], reserved [5.0], stream ID [5.1-8],
# payload [*]
# - payload:
# - DATA (0x0): [padding length [0], ]data [*][, padding [*]]
# - HEADERS (0x1): [padding length [0], ][exclusive [1.0], stream parent ID [1.1-4],
# weight [5]], headers [*][, padding [*]]
# - PRIORITY (0x2): exclusive [0.0], stream parent ID [0.1-3], weight [4]
# - RST_STREAM (0x3): error code [0-3]
# - SETTINGS (0x4): several times: VAR [0-1], VAL [2-5]
# - PUSH_PROMISE (0x5): [padding length [0], ]reserved [1.0], promise stream ID [1.1-4],
# headers [*][, padding [*]]
# - PING (0x6): data [0-7]
# - GOAWAY (0x7): reserved [0.0], last stream ID [0.1-3], error code [4-7],
# debug data [*]
# - WINDOW_UPDATE (0x8): reserved [0.0], increment [0.1-3]
# - CONTINUATION (0x9): headers [*]
# - flags:
# - END_STREAM (0x1) (DATA, HEADERS)
# - ACK (0x1) (SETTINGS, PING)
# - END_HEADERS (0x4) (HEADERS, PUSH_PROMISE, CONTINUATION)
# - PADDED (0x8) (DATA, HEADERS, PUSH_PROMISE)
# - PRIORITY (0x20) (HEADERS)
#Init (for each endpoint):
# - start (connection preface, sent by client only):
# PRI * HTTP/2.0
#
# SM
#
# - SETTINGS frame:
# - options, noted SETTINGS_*
# - receiver must respond with empty SETTINGS frame with ACK flag
# - can be sent again later to change settings
#Normal (request|response) stream:
# - request:
# - 1 HEADERS, 0-n DATA (body), 0-1 trailing HEADERS
# - HEADERS:
# - must contain pseudo-headers:
# :method METHOD [C]
# :authority URI [C]
# :path PATH [C] : can be *
# :scheme PROTOCOL [C]
# - if 1 frame not enough, send extra CONTINUATION frames with extra headers
# - last frame should have flag END_HEADERS
# - response:
# - same but HEADERS must contain instead pseudo-header :status UINT [S]
#End:
# - END_STREAM flag:
# - normal stream end
# - on last DATA frame (or last HEADERS if none) for each endpoint
# - might be followed by CONTINUATION frames
# - RST_STREAM frame:
# - cancels current stream
# - GOAWAY frame:
# - end of session, i.e. stops any new stream
# - keep processing existing streams
# - contains last stream ID that will be processed
# - can contain arbitrary debug info
# - should respond with another GOAWAY
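#Sketch (Python) of the 9-byte frame header and of a SETTINGS payload, following the byte offsets
#listed above. pack_frame(), unpack_frame() and pack_settings() are hypothetical helpers. No HPACK,
#padding or validation is done:

```python
import struct

FLAG_ACK = 0x1
TYPE_SETTINGS = 0x4


def pack_frame(frame_type, flags, stream_id, payload=b""):
    """Header: length [0-2], type [3], flags [4], reserved bit + stream ID [5-8]."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,   # 24-bit payload length
        frame_type, flags,
        stream_id & 0x7FFFFFFF,          # reserved bit kept at 0
    )
    return header + payload


def unpack_frame(data):
    """Return (type, flags, stream ID, payload) of the first frame in `data`."""
    high, low, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    length = (high << 16) | low
    return frame_type, flags, stream_id & 0x7FFFFFFF, data[9:9 + length]


def pack_settings(settings):
    """SETTINGS payload: VAR [0-1], VAL [2-5], repeated for each setting."""
    return b"".join(struct.pack(">HI", var, val) for var, val in settings.items())
```

#E.g. pack_frame(TYPE_SETTINGS, FLAG_ACK, 0) is the empty SETTINGS frame with ACK flag that a
#receiver sends back on stream 0.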
SYNTAX ERROR ==> #Request:
# - HTTP method: 405 (see HTTP methods)
# - any: 400 (Bad Request)
#Response:
# - 500 (Internal Server Error)
#Request|response (HTTP/2 error codes):
# - PROTOCOL_ERROR (0x1): e.g. wrong frame order or content
# - COMPRESSION_ERROR (0x9): HPACK error
# - INTERNAL_ERROR (0x2): protocol-related generic error
EXIT ==> #Cancel stream (HTTP/2 error codes):
# - REFUSED_STREAM (0x7): stream refused
# - CANCEL (0x8): stream no longer needed
#Normal exit (HTTP/2 error codes):
# - STREAM_CLOSED (0x5)
# - NO_ERROR (0x0)
CHARSET ==> # - ASCII
# - headers:
# - unescaped:
# - [:alnum:] ! # $ % & ' * + - . ^ _ ` | ~
# - escaped with "..." or (...):
# - [:print:] TAB
# - in values only
# - \" or \( or \) to escape delimiters inside
# - escaped with URI encoding (UTF-8):
# - only for VAL in ;VAR=VAL
# - must be written VAR*=UTF-8'[LANG]'VAL
# - VAR must escape % ' *
# - UTF-8|LANG are case insensitive
# - can use both VAR*= and VAR= (as fallback)
# - for VAL that needs to be human-readable, should be avoided otherwise
# - browser support is only for specific headers now:
# - Content-Disposition [S] filename
# - Link [S] title (partial support)
# - escaping of body depends on MIME type
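#Sketch (Python) of the VAR*=UTF-8'[LANG]'VAL escaping above, using urllib.parse from the standard
#library. encode_ext_param() and decode_ext_value() are hypothetical helpers, and ATTR_CHARS is
#the set of punctuation assumed to be left unescaped (note it excludes % ' *):

```python
from urllib.parse import quote, unquote

# Punctuation left as is in the encoded value (besides letters and digits)
ATTR_CHARS = "!#$&+-.^_`|~"


def encode_ext_param(name, value, lang=""):
    """Return e.g. filename*=UTF-8''na%C3%AFve.txt"""
    encoded = quote(value, safe=ATTR_CHARS, encoding="utf-8")
    return f"{name}*=UTF-8'{lang}'{encoded}"


def decode_ext_value(ext_value):
    """Decode the CHARSET'[LANG]'VAL part back to a string."""
    charset, _lang, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)
```

#A response could then send both forms, the plain VAR= one being the fallback, e.g.
#Content-Disposition: attachment; filename="naive.txt"; filename*=UTF-8''na%C3%AFve.txt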
CASE ==> #HTTP methods: uppercase
#Headers:
# - name: case insensitive. Capitalized often used
# - value: depends on header, but many define it as case insensitive when value is token
HEADERS ==> # - order is not significant
# - no duplicates, except Set-Cookie [S]. Should use commas for multiple header values
# - should use (...) for comments inside values
# - written [C|S] for client|server in my doc
# - binary compressed (HTTP/2)
# - algo is 'HPACK'
# - common HEADER or HEADER: VAL have predefined numbers
# - each endpoint caches HEADER: VAL in a dynamic table
# - :HEADER are called pseudo-headers:
# - same as headers but with extra restrictions on when can be used (HTTP/2)
# - <meta http-equiv="HEADER" content="VAL">:
# - add HTTP header client-side
# - only ones that are crossbrowser: refresh, Content-Security-Policy, set-cookie
# - <meta name="HEADER" content="VAL">:
# - add HTTP header client-side
# - only ones that are crossbrowser: referrer (Referrer-Policy [S])
/=+===============================+=\
/ : : \
)==: BODY FORMATTING :==(
\ :_______________________________: /
\=+===============================+=/
BODY LENGTH HINT ==> #Content-Length: NUM [C|S]:
# - request|response body length ('entity', not 'instance')
# - presence:
# - mandatory if body present
# - forbidden with chunked transfer encoding
# - can use 411 (Length Required)
# - only hint to help allocating resources or download progress:
# - not substitute for reading actual body length
BODY UNKNOWN LENGTH ==> #(HTTP/2)
# - not sending END_STREAM flag
# - use trailing HEADERS frame for headers that are based on body, when body is not known
# in advance
# - TE: trailers [C] can be used
#(HTTP/1.1)
#Chunked transfer encoding:
# - send request|response body in several chunks
# - how:
# - TE: WORD[;q=NUM],... [C]:
# - optional
# - indicates preference for WORD:
# - trailers: Trailer [S]
# - other WORD: Transfer-Encoding: WORD [S]
# - q=NUM: like content negotiation
# - Transfer-Encoding: chunked [C|S]:
# - sent with each chunk:
# - first line is SIZE_HEX_NUM[;VAR=VAL;...]
# - rest is data
# - empty chunk indicates end
# - Trailer: HEADER [S]
# - optional
# - only if TE: trailers [C], unless HEADER is not important
# (client can use body without it)
# - HEADER is limited, e.g. cannot be related to:
# - message processing: Transfer-Encoding, Content-Length
# - content processing: Content-Encoding, Content-Type, Content-Range, Trailer
# - routing: Host, :authority
# - authentication
# - response control: Cache-Control, Expires, Date, Location, Retry-After, Vary,
# Warning
# - request modifiers: Range, Max-Forwards
# - applied after any transformation (compression, range, delta encoding)
# - should use compression
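#Sketch (Python) of the chunk format above. encode_chunked() and decode_chunked() are hypothetical
#helpers. Chunk extensions (;VAR=VAL) are skipped when decoding, and trailers are not handled:

```python
def encode_chunked(chunks):
    """Serialize an iterable of byte strings as a chunked body."""
    out = b""
    for chunk in chunks:
        if chunk:  # an empty chunk would mark the end of the body too early
            out += f"{len(chunk):x}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # empty chunk indicates end


def decode_chunked(data):
    """Reassemble the body from a complete chunked payload."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # First line is SIZE_HEX_NUM[;VAR=VAL;...]
        size_field = data[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return body
        body += data[pos:pos + size]
        pos += size + 2  # skip chunk data and its trailing CRLF
```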
BODY ENCODING ==> #Often used:
text/plain #STR. For OBJ:
# - VAR=VAL, newline-separated
application/x-www-form-urlencoded #STR. For OBJ:
# - VAR=VAL:
# - &-separated
# - URI-encoded anything but [:alnum:] * , - _
# - space converted to + instead of %20
# - for nesting, can use VAR[VAR2]=VAL, but must be supported by server
# (e.g. Express BODY-PARSER)
#Legacy (used by <form>). Should prefer others like application/json
multipart/form-data; boundary=STR #Binary|big STR, e.g. file upload:
# --DELIM
# Content-Disposition: form-data; name="VAR"[; filename="FILE"]
# [Content-Type: TYPE]
# [Other headers]
#
# content as is
# --DELIM
# ... (another file)
# --DELIM--
#DELIM:
# - should be long enough and random
# - can include a graphical line with "------" for humans reading the request.
#Must use CRLF newlines
multipart/related; boundary=STR #Like multipart/form-data except each part is not individual, i.e. needs to be put together
; type="MIME" #with other parts to make sense.
[; start="CONTENT_ID"] #Differences:
[; start-info="STR|CONTENT_ID2"] # - each part:
# - must use Content-Id: "CONTENT_ID" (part identifier)
# - root part:
# - must be Content-Id: "CONTENT_ID" (def: first part)
# - must be Content-Type: "MIME" (from ;type)
# - can have metadata from ;start-info
# - Content-Disposition is optional
application/json #JSON
application/xml #XML
application/octet-stream #Binary
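#Sketch (Python) of two of the encodings above: application/x-www-form-urlencoded (through
#urllib.parse.urlencode) and multipart/form-data. build_urlencoded() and build_multipart() are
#hypothetical helpers. Names and filenames are assumed to be plain ASCII without quotes:

```python
import uuid
from urllib.parse import urlencode


def build_urlencoded(fields):
    """VAR=VAL pairs, &-separated, space converted to +."""
    return urlencode(fields).encode("ascii")


def build_multipart(fields, files, boundary=None):
    """Return (Content-Type value, body). `files` maps VAR to (filename, MIME, bytes)."""
    boundary = boundary or uuid.uuid4().hex  # long and random by default
    delim = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields.items():
        disposition = f'Content-Disposition: form-data; name="{name}"'
        parts += [delim, disposition.encode("ascii"), b"", value.encode("utf-8")]
    for name, (filename, mime, content) in files.items():
        disposition = f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'
        parts += [delim, disposition.encode("ascii"),
                  f"Content-Type: {mime}".encode("ascii"), b"", content]
    parts += [delim + b"--", b""]  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(parts)
```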
ERRORS
/=+===============================+=\
/ : : \
)==: GENERAL :==(
\ :_______________________________: /
\=+===============================+=/
ERROR CODES ==> #Contained in GOAWAY|RST_STREAM frame (HTTP/2)
#Types:
# - NO_ERROR (0x0)
# - PROTOCOL_ERROR (0x1): e.g. wrong frame order or content
# - INTERNAL_ERROR (0x2)
# - FLOW_CONTROL_ERROR (0x3): too much buffered DATA because of flow control
# - SETTINGS_TIMEOUT (0x4): SETTINGS timeout
# - STREAM_CLOSED (0x5)
# - FRAME_SIZE_ERROR (0x6): frame too small|big
# - REFUSED_STREAM (0x7):
# - sender notifies receiver that request has not been processed, allowing safe retry.
# - e.g. too many concurrent streams or refused server push
# - CANCEL (0x8): stream no longer needed or server push refused
# - COMPRESSION_ERROR (0x9)
# - CONNECT_ERROR (0xa): when using CONNECT method
# - ENHANCE_YOUR_CALM (0xb): throttling, e.g. too many server pushes
# - INADEQUATE_SECURITY (0xc): does not use TLS or wrong TLS setup
# - HTTP_1_1_REQUIRED (0xd): should downgrade to HTTP/1.1
TYPES ==> # syntax                    405, 400, 500,                          See syntax errors
          #                           PROTOCOL|COMPRESSION|INTERNAL_ERROR
          # size                      414, 431, 413                           See limits
          # not implemented (server)  501, 510, 505                           See extensions, versioning
          # validation                415, 406                                See validation
          # location                  404, 410                                See wrong location
          # proxies                   502, 504, Warning [S]                   See proxies
          # concurrency               409                                     See concurrency
          # authentication            401|407, 403, 419, 511                  See authentication
          # financial                 402                                     See financial
          # legal                     451                                     See legal
ERROR RESPONSE BODY ==> #5** should include response body with error information
#Can be a "problem detail" (see REST doc)
/=+===============================+=\
/ : : \
)==: LIMITS :==(
\ :_______________________________: /
\=+===============================+=/
LIMITS ==> #Streams:
# (HTTP/2)
# - SETTINGS_MAX_CONCURRENT_STREAMS (0x3) (def: unlim)
#Frames:
# (HTTP/2)
# - SETTINGS_MAX_FRAME_SIZE (0x5) (def/min: 16KB, max: 16MB)
# - can use FRAME_SIZE_ERROR (0x6) error code
#URI:
# - 414 (URI Too Long)
#Headers:
# - 431 (Request Header Fields Too Large)
# - implementation has additional limits, e.g. Apache limits 8MB/header and 1000 headers,
# Node.js 80KB/header
# (HTTP/2)
# - SETTINGS_MAX_HEADER_LIST_SIZE (0x6) (def: unlim):
# - max HEADERS frame size, uncompressed, with extra 32 bytes per header.
# - not mandatory to respect it.
# - SETTINGS_HEADER_TABLE_SIZE (0x1) (def: 4KB): headers dynamic table size
#Body:
# - 413 (Payload Too Large)
# (HTTP/2)
# - max receiver buffer (flow control) for DATA frames:
# - counter SETTINGS_INITIAL_WINDOW_SIZE (0x4) bytes (def: 64KB, max: 2GB):
# - decreased when sender sends
# - increased when receiver sends WINDOW_UPDATE frame:
# - includes how much to increase
# - e.g. when receiver has consumed data (i.e. received it and no longer buffers it)
# - if exhausted, each endpoint should terminate stream with FLOW_CONTROL_ERROR (0x3)
# error code
# - should amount to how much receiver can buffer
# - can be for a stream, or whole connection (if stream ID is 0)
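#Sketch (Python) of the flow control counter above, as tracked for one stream (or for the whole
#connection). Window and FlowControlError are hypothetical names, FlowControlError standing for
#the FLOW_CONTROL_ERROR (0x3) error code:

```python
DEFAULT_WINDOW = 65_535   # SETTINGS_INITIAL_WINDOW_SIZE default (64KB - 1)
MAX_WINDOW = 2**31 - 1    # max window size (2GB - 1)


class FlowControlError(Exception):
    """Stands for terminating with FLOW_CONTROL_ERROR (0x3)."""


class Window:
    def __init__(self, initial=DEFAULT_WINDOW):
        self.available = initial

    def on_data(self, length):
        """Decrease the counter when a DATA frame of `length` bytes is sent."""
        if length > self.available:
            raise FlowControlError("DATA exceeds the advertised window")
        self.available -= length

    def on_window_update(self, increment):
        """Increase the counter when receiver sends WINDOW_UPDATE."""
        if not 1 <= increment <= MAX_WINDOW:
            raise ValueError("increment out of range")
        if self.available + increment > MAX_WINDOW:
            raise FlowControlError("window would exceed the max size")
        self.available += increment
```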
/=+===============================+=\
/ : : \
)==: VALIDATION :==(
\ :_______________________________: /
\=+===============================+=/
VALIDATION ==> #Missing headers:
# - 411 (Length Required): Content-Length [C]
# - 426 (Upgrade Required): Upgrade [C]
# - 428 (Precondition Required): If-* [C]
#Body:
# - 415 (Unsupported Media Type): wrong request body MIME_TYPE
# - 406 (Not Acceptable): wrong (requested) response body media type or encoding
# (See content negotiation, delta encoding, compression)
# - Prefer: handling=strict|lenient [C]: should validate [not] strictly request body
/=+===============================+=\
/ : : \
)==: CONTEXT :==(
\ :_______________________________: /
\=+===============================+=/
FINANCIAL ==> #402 (Payment Required)
LEGAL ==> #451 (Unavailable For Legal Reasons):
# - censorship, copyright, privacy.
# - can include Link [S] with rel="blocked-by"
CONTENT
/=+===============================+=\
/ : : \
)==: SEMANTICS :==(
\ :_______________________________: /
\=+===============================+=/
STATUS CODES ==> #Indicates response main semantics
#WebDAV-only: 102, 207, 208, 422, 423, 424, 425, 507, 508
#1**: informational, i.e. establish communication (no response body)
# 100 (Continue) client sent only headers and can proceed to send body See Expect: 100-continue [C]
# 101 (Switching Protocols) switch protocol success, e.g. from HTTP to HTTPS See Upgrade [C]
# 103 (Early Hints) final response's headers hints See prefetching
#2**: success
# 200 (OK) simple
# 201 (Created) new resource See POST
# 202 (Accepted) request accepted, but processing is ongoing and will take some time See async
# 203 (Non-Authoritative Information) proxy transformed server's 200 See proxies
# 204 (No Content) OK, but nothing is to be returned and client should not refresh See empty body
# 205 (Reset Content) like 204 but client should reset See empty body
# 206 (Partial Content) response is only a subset of the full resource See Range [C]
# 226 (IM Used) returns diff not full resource See Delta encoding
#3**: client redirect
# 300 (Multiple Choices) redirect need client input See content negotiation
# 301 (Moved Permanently) permanent, GET (even if different method, but should ask first) See redirects
# 302 (Found) temp, GET (even if different method, but should ask first) See redirects
# 303 (See Other) different resource (not only different URI) than requested See redirects
# 304 (Not Modified) resource did not change See conditional caching
# 305 (Use Proxy) redirect to a proxy (deprecated) See redirects
# 306 (Unused) Reserved
# 307 (Temporary Redirect) temp, same HTTP method See redirects
# 308 (Permanent Redirect) permanent, same HTTP method See redirects
#4**: client-side failure
# 400 (Bad Request) syntax error See syntax errors
# 401 (Unauthorized) authentication problem See authentication
# 402 (Payment Required) e.g. should pay to increase request rate See financial
# 403 (Forbidden) authorization problem (and location not secret) See authentication
# 404 (Not Found) wrong URI, but correct domain (or auth problem with secret location) See wrong location
# 405 (Method Not Allowed) HTTP method not implemented or not allowed See HTTP methods
# 406 (Not Acceptable) wrong (requested) response body media type or encoding See validation
# 407 (Proxy Authentication Required) like 401 but for application proxy See authentication
# 408 (Request Timeout) as opposed to response timeout See timeout
# 409 (Conflict) multi-client conflict See concurrency
# 410 (Gone) like 404, but indicates URI was present before See wrong location
# 411 (Length Required) missing request body length See body length hint
# 412 (Precondition Failed) resource changed even though client assumed it did not See conditional caching
# 413 (Payload Too Large) See limits
# 414 (URI Too Long) See limits
# 415 (Unsupported Media Type) unsupported request body media type (inverse of 406) See validation
# 416 (Range Not Satisfiable) wrong Range [C] See Range [C]
# 417 (Expectation Failed) Expect [C] failed See request processing
# 419 (Authentication Timeout) like 401 but for authentication timeout See authentication
# 421 (Misredirected Request) request picked wrong server See redirects
# 426 (Upgrade Required) client must switch protocol (missing Upgrade [C]) See switching protocols
# 428 (Precondition Required) must use If-* [C] See conditional caching
# 429 (Too Many Requests) throttle. May include Retry-After [S] See load
# 431 (Request Header Fields Too Large) See limits
# 451 (Unavailable For Legal Reasons) See legal
#5**: server-side failure
# 500 (Internal Server Error) bug in server, e.g. syntax error See syntax errors
# 501 (Not Implemented) feature not available yet See extensions
# 502 (Bad Gateway) sent by proxy when it received error from server See proxies
# 503 (Service Unavailable) server is down (cannot connect) See load
# 504 (Gateway Timeout) sent by proxy on server timeout See proxies
# 505 (HTTP Version Not Supported) See versioning
# 506 (Variant Also Negotiates) negotiation circular loop See content negotiation
# 510 (Not Extended) HTTP extension not implemented See extensions
# 511 (Network Authentication Required) like 401 but for network proxy, containing link where to authenticate See authentication
INFORMATIONAL STATUS CODES ==> #A server can send one or several 1** responses before the final response.
METHODS ==> #Request main semantics
#Whitelisting:
# - Allow: HTTP_METHOD,... [S]:
# - in response to OPTIONS request
# - errors:
# - not allowed: 405 (Method Not Allowed). Must also include Allow [S]
# - not understood: 501 (Not Implemented)
#X-HTTP-Method-Override: METHOD [C]:
# - request should be interpreted as if HTTP METHOD had been used
# - goal is to overcome proxies HTTP methods restrictions
# - should use POST:
# - does not break semantics (potentially non-safe and non-idempotent)
# - e.g. GET is assumed safe, and breaking that raises security issues, because browsing users
# might trigger actions unknowingly
#Safety:
# - read vs write (on the resource)
# - list:
# - safe: GET, HEAD, OPTIONS, TRACE
# - unsafe: PUT, DELETE, [UN]LINK, POST, PATCH, CONNECT
# - Safe: yes|no [S]:
# - declares that an unsafe method is actually safe
# - goal is to imply idempotency, i.e. client can retry without prompting user, e.g.:
# - safe POST when need to submit x-www-form-urlencoded safe request without using XHR
# - idempotent PATCH
# - not implemented by clients
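#Sketch (Python) of the method checks above. KNOWN_METHODS, effective_method() and check_method()
#are hypothetical names, and `headers` is assumed to be a dict with lowercased header names:

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
UNSAFE_METHODS = {"PUT", "DELETE", "LINK", "UNLINK", "POST", "PATCH", "CONNECT"}
KNOWN_METHODS = SAFE_METHODS | UNSAFE_METHODS


def effective_method(method, headers):
    """Apply X-HTTP-Method-Override [C], but only on top of POST."""
    override = headers.get("x-http-method-override")
    if override and method == "POST":
        return override
    return method


def check_method(method, allowed):
    """Return None if OK, else (status code, extra response headers)."""
    if method not in KNOWN_METHODS:
        return 501, {}                                        # not understood
    if method not in allowed:
        return 405, {"Allow": ", ".join(sorted(allowed))}     # must include Allow [S]
    return None
```

#Ignoring the override on GET keeps GET safe, which is the reason given above for using POST.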
CRUD ==> #GET:
# - retrieve resource
# - safe method
# - body:
# - request: none
# - response: resource representation
#HEAD:
# - like GET but no response body
# - content-related headers may also be omitted, e.g. content encoding or transfer encoding
#POST:
# - generic modification|processing
# - including creating resource, without known resource URI (e.g. no specific ID)
# - use 201 (Created) with Location: URL [S]
# - unsafe method
# - usually non-idempotent
# - body:
# - request: modification, e.g. new resource representation
# - response: status of modification
# - e.g. new resource representation, in which case should add Content-Location [S]
#PUT:
# - replace|create (completely) resource, with known resource URI (e.g. specific ID)
# - if created, use 201 (Created) with Location: URL [S]
# - unsafe idempotent method
# - body:
# - request: new resource representation
# - response: either:
# - none, with 204|205
# - replace|creation status, with 200 (OK), e.g. new resource representation
#PATCH:
# - modify (partially) resource, with known resource URI (e.g. specific ID)
# - if not existing, can create, or not (preferred)
# - unsafe method
# - non-idempotent, unless it just appends or sets parts (as opposed to merge)
# - body:
# - request: resource diff|changes
# - response: diff status, e.g. new resource representation
# - can e.g. use JSON merge patch (see its doc) or JSON patch (see its doc)
# - Accept-Patch: MIME[;charset=CHARSET],... [S]
# - in response to OPTIONS request, meaning MIME types accepted by PATCH
#DELETE:
# - erase resource
# - unsafe idempotent method
# - body:
# - request: none
# - response: either:
# - none, with 204|205
# - deletion status, with 200 (OK)
/=+===============================+=\
/ : : \
)==: CONTENT PROCESSING :==(
\ :_______________________________: /
\=+===============================+=/
CONTENT TRANSFORMATION ==> #Possible transformations:
# - variants:
# - specific to the entity, changes final representation
# - e.g. language, charset, Content-Type, features
# - content-encoding:
# - specific to the entity, does not change final representation
# - e.g. compression, ranges, delta encoding
# - order is unspecified, except by A-IM [C] and IM [S]
# - transfer-encoding:
# - specific to the request|response
# - e.g. Transfer-Encoding [S]
# - forbidden in HTTP/2
#Content can be:
# - resource: highest-level
# - resource state: resource at a given time
# - variant: resource state variations
# - instance: a variant inside a request|response body
# - entity[-header|body]/representation: an instance after content-encoding, then
# transfer-encoding
#Resource can be dynamic (server modifies content) or static
REQUEST PROCESSING ==> #Prefer [C]:
# - client:
# - Prefer: VAR[=VAL][;ATTR=VAL...] ,... [C]:
# - ask server to handle request body in a specific way
# - should not be used for content negotiation
# - can be present several times per request (with different VARs)
# - def VAL: ""
# - VAR:
# - respond-async, wait=NUM, return=minimal, return=representation,
# handling=strict|lenient, safe (see this doc)
# - x-VAR
# - server:
# - Preference-Applied: VAR[="VAL"] ,... [S]: which Prefer [C] were honored
#Expect: VAL,... [C]
# - generic expectation on how server should handle request
# - 417 (Expectation Failed) if expectation not met
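#Sketch (Python) of parsing Prefer [C] and building Preference-Applied [S]. parse_prefer() and
#preference_applied() are hypothetical helpers. Simplifying assumptions: quoted values contain
#no , or ; and when a VAR is repeated, the first occurrence is kept:

```python
def _split_pair(text):
    """Split VAR[=VAL] into (lowercased VAR, unquoted VAL). Def VAL: ""."""
    name, _, value = text.partition("=")
    return name.strip().lower(), value.strip().strip('"')


def parse_prefer(header_values):
    """`header_values`: list of Prefer [C] values (header can be present several times)."""
    prefs = {}
    for header in header_values:
        for item in header.split(","):
            parts = [p for p in (s.strip() for s in item.split(";")) if p]
            if not parts:
                continue
            name, value = _split_pair(parts[0])
            attrs = dict(_split_pair(p) for p in parts[1:])
            prefs.setdefault(name, (value, attrs))
    return prefs


def preference_applied(prefs, honored):
    """Build the Preference-Applied [S] value for the honored VARs."""
    items = []
    for name in honored:
        value, _attrs = prefs[name]
        items.append(f"{name}={value}" if value else name)
    return ", ".join(items)
```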
CONTENT NEGOTIATION ==> #How:
# - client:
# - only GET|HEAD
# - Accept* [C]:
# - accepted variants:
# - Accept: MIME_TYPE[;q=NUM],... [C]
# - can use * for TYPE and|or SUBTYPE
# - Accept-Charset: CHARSET[;q=NUM],... [C]
# - can use *
# - Accept-Language: LANGUAGE[;q=NUM],... [C]
# - Accept-Encoding: COMPRESSION_ALGO[;q=NUM],... [C] (see compression)
# - A-IM: ALGO[;VAR=VAL][;q=NUM],... [C] (see delta encoding)
# - Accept-Datetime: DATE [C] (see memento)
# - has priority over other dimensions
# - works a bit differently than other dimensions
# - Accept-Features: FEATURE[;q=NUM],... [C]
# - FEATURE: FEAT, FEAT=|!=VAL, FEAT=[NUM-NUM2], !FEAT, *
# - anything not covered by MIME_TYPE, CHARSET or LANGUAGE: MEDIAQUERYLIST,
# web API support, speed vs graphics, etc.
# - should be x-FEATURE if not standard
# - NUM is preference (see below)
# - if omitted: should not use 'choice'
# - Negotiate: VAL [C]:
# - VAL:
# - trans: supports list|choice
# - VERSION,...: supports list|choice with those variant selection algorithm versions. Can be '*'
# - guess-small: supports and asks for list|choice providing list is small enough
# - vlist: supports and asks for list|choice
# - (default): ask for ad-hoc
# - TCN: list, re-choose [S]:
# - server forced TCN: list, although not asked by Negotiate [C]
# - server:
# - list|choice is called 'transparent content negotiation',
# ad-hoc 'server-side negotiation'
# - TCN: VAL [S]:
# - 2xx or 3xx, except 304
# - ad-hoc[, keep]:
# - picks one variant, provides no choice to client
# - should use:
# - Content-Location: URI [S]
# - Content-Type: MIME_TYPE[; charset=CHARSET] [S]
# - def: either application/octet-stream or content sniffing
# - Content-Language: LANGUAGE,... [S]
# - Content-Encoding: COMPRESSION_ALGO,... [S] (see compression)
# - IM: ALGO[;VAR=VAL],... [S] (see delta encoding)
# - Memento-Datetime: DATE [S] (see memento)
# - 406 (Not Acceptable): wrong Accept-* [C]
# - list[, re-choose]:
# - provides client only with possible variants:
# - should be kept short, e.g. 2 to 10
# - variant URI:
# - should not be negotiated itself (i.e. have its own variants).
# Otherwise should return 506 (Variant Also Negotiates)
# - are best as relative "FILENAME"
# - requires extra HTTP request
# - 300 status code
# - Alternates: CHOICE,... [S] (variants):
# - {"URI" NUM {ATTR VAL} ...}
# - ATTR:
# - type|charset|language VAL
# - features FEATURE ...:
# - can also use FEATURESET;[+NUM][-NUM2],
# where FEATURESET is FEATURE or [FEATURE ...]
# - NUM is max quality improvement (def: 1),
# NUM2 max degradation (def: 0)
# - length NUM
# - description "STR". Use %xXX for Unicode encoding.
# - any other x-ATTR
# - NUM (0-1):
# - quality value ("qvalue")
# - represents both the preference, and the min resource quality
# (in percentage)
# - def: 1
# - 0 is "not acceptable"
# - max 3 decimals
# - calculated by remote variant selection algorithm:
# - it also selects whether to use list or choice
# - e.g. RVSA 1.0:
# - NUM = product of all matching Accept*;q preferences
# - should use list if no NUM > 0, or if only wildcards matched
# (e.g. Accept-Features: * or Accept: */*)
# - <URI>: fallback
# - proxy-rvsa="VERSION,...": remote variant selection algorithm versions
# - choice:
# - ad-hoc + list
# - "URI" must be relative "FILENAME" for security reasons
# - proxy can choose to only respond with ad-hoc answer by:
# - Variant-Vary [S] -> Vary [S]
# - remove Content-Location [S], Alternates [S]
# - remove ';LIST_ETAG'
# - caching:
# - Vary: Negotiate, Accept*... [S]: list itself (list|choice)
# - Variant-Vary [S]: ad-hoc answer (choice)
# - must append ";LIST_ETAG" to each variant ETAG
# - client:
# - accepts chosen variant (ad-hoc, choice) or asks for a specific one (choice, list)
# - when using TCN: ad-hoc, keep: forced to accept chosen variant
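#Example (non-normative sketch of the qvalue product described above, in Python):
# - the function names and the variant dict layout (uri, quality, type, charset, language) are assumptions made for illustration
# - simplified compared to RVSA 1.0: no Accept-Features [C], no language-range prefix matching

```python
def parse_accept(header):
    """Parse an Accept*-style header value into {value: q}."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # def: 1
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        prefs[parts[0].lower()] = q
    return prefs


def dimension_q(prefs, value):
    """q for one dimension: exact match first, then TYPE/*, then */* or *."""
    value = value.lower()
    if value in prefs:
        return prefs[value]
    if "/" in value:
        type_wildcard = value.split("/")[0] + "/*"
        if type_wildcard in prefs:
            return prefs[type_wildcard]
        if "*/*" in prefs:
            return prefs["*/*"]
    if "*" in prefs:
        return prefs["*"]
    return 0.0  # 0 is "not acceptable"


def variant_quality(variant, request_headers):
    """Variant's own quality times the matching Accept*;q of each dimension."""
    q = variant.get("quality", 1.0)
    dimensions = (
        ("Accept", "type"),
        ("Accept-Charset", "charset"),
        ("Accept-Language", "language"),
    )
    for header, attr in dimensions:
        # A dimension only counts if both the request and the variant specify it
        if header in request_headers and attr in variant:
            prefs = parse_accept(request_headers[header])
            q *= dimension_q(prefs, variant[attr])
    return round(q, 3)  # max 3 decimals


def rank_variants(variants, request_headers):
    """Acceptable variants (NUM > 0), best first. Empty result: fall back to list."""
    scored = [(variant_quality(v, request_headers), v["uri"]) for v in variants]
    return sorted((s for s in scored if s[0] > 0), reverse=True)
```

# - usage: rank_variants(VARIANT_ARR, {"Accept": "...", "Accept-Language": "..."}) returns (NUM, "URI") pairs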
#Proactive (ad-hoc) vs reactive (list):
# - proactive:
# - puts work on the server
# - Accept* [C] allows fingerprinting user agent (privacy concern)
# - requires Vary [S] (see Vary [S] for problems associated with it)
# - reactive:
# - puts work on the client
# - might yield better guess
# - one more request
#Prefer: Safe [C]:
# - should not respond with "objectionable" content, e.g. for parental control
# - Firefox, IE10 use OS/browser UI to set it
/=+===============================+=\
/ : : \
)==: MEMENTO :==(
\ :_______________________________: /
\=+===============================+=/
MAIN RESOURCES ==> #URI-R: original resource
# - resource original URI
#URI-G: timegate
# - content negotiation server
#URI-M: memento
# - resource replicating original resource state
# - must keep same status code and headers
# - can be slightly different:
# - URI rewriting:
# - so links (including redirections) point to other mementos
# - optional, so client should figure out other mementos itself if no URI rewriting
# - adding content:
# - e.g. branding or archival status
# - should use Link: <http://mementoweb.org/terms/donotnegotiate>; rel="type" [S]
# - compression
#URI-T: timemap
# - list of mementos for an original resource
RETRIEVAL METHODS ==> # - content negotiation: best for a single resource
# - timemaps: best for several resources
# - mementos list: for a single resource, if client wants to manually control content negotiation
CONTENT NEGOTIATION ==> #To URI-R:
# - client:
# - Accept-Datetime: DATE [C]
# - server:
# - Link: URI-G;rel=timegate [S] ...
# - for any 2**|3**|4** status code, including 404|410
#To URI-G:
# - client:
# - Accept-Datetime: DATE [C]
# - how to pick memento DATE according to requested DATE
# (nearest, lower round, upper round) is implementation-dependent
# - def: most recent
# - has priority over other content negotiation dimensions
# - server:
# - 302 (Found)
# - Location: URI-M [S]
# - Link: URI-R;rel=original [S]
# - Vary: Accept-Datetime [S]
#To URI-M:
# - client:
# - Accept-Datetime: DATE [C]
# - server:
# - Link: URI-R;rel=original [S]
# - Link: URI-G;rel=timegate [S] ...
# - Memento-Datetime: DATE [S]
# - memento's mtime (whereas Last-Modified [S] is actual response mtime)
# - Vary: Memento-Datetime [S]
#... means there can be several Link [S] for same REL
#Special cases:
# - URI-R === URI-G:
# - when origin server and timegate server are the same
# - URI-R response includes URI-G response fields
# - URI-G === URI-M:
# - when timegate provides memento itself
# - URI-G response includes URI-M response fields
# - 200 (OK) + memento in response body
# - Content-Location: URI-M [S]:
# - if different than current one
# - no Location [S]
# - chain of URI-G:
# - each URI-G but the last one redirects to another timegate:
# - same response except:
# - redirect to URI-G
# - no Vary [S]
# - no URI-G:
# - when set of mementos will not change anymore:
# - snapshot of a discontinued origin server
# - resources that never change after creation
# - URI-R response includes URI-M response fields
# - Link: URI-M;rel=memento [S] ...
# - no Link: URI-G;rel=timegate [S]
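#Example (non-normative sketch of a timegate's URI-G response, in Python):
# - pick_memento(), timegate_response() and the {"URI-M": DATE} input are assumptions made for illustration
# - uses the "nearest" strategy, which is only one of the implementation-dependent choices mentioned above

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def pick_memento(mementos, accept_datetime=None):
    """Return (URI-M, datetime) for a requested Accept-Datetime value.

    mementos: dict mapping URI-M to a timezone-aware datetime.
    accept_datetime: HTTP date string such as "Tue, 20 Mar 2001 20:35:00 GMT".
    """
    if accept_datetime is None:
        # def: most recent
        return max(mementos.items(), key=lambda item: item[1])
    wanted = parsedate_to_datetime(accept_datetime)
    # "nearest" strategy: smallest absolute distance to the requested date
    return min(mementos.items(), key=lambda item: abs(item[1] - wanted))


def timegate_response(uri_r, mementos, accept_datetime=None):
    """Status code and headers of a 302 timegate answer."""
    uri_m, _ = pick_memento(mementos, accept_datetime)
    headers = {
        "Location": uri_m,
        "Link": '<%s>; rel="original"' % uri_r,
        "Vary": "Accept-Datetime",
    }
    return 302, headers
```

# - usage: timegate_response("URI-R", {"URI-M": DATE, ...}, "DATE") returns (302, HEADERS_DICT)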
TIMEMAPS ==> #To URI-R|G|M|T:
# - Link: URI-T;rel=timemap;type="MIME"[;from|until="DATE"] [S] ...
# - from|until: first|last memento
#To URI-T:
# - list of possible URI-M (including datetime) for a given URI-R
# - in response body
# - can also include URI-T and URI-G
# - can specify several URI-T:
# - all of them should be retrieved
# - goal is to divide response into several timemaps for:
# - pagination (should use from|until)
# - indexing|redirections
# - possible formats:
# - Content-Type: application/link-format [S]
# - newline-separated list of Link: URI-T|G|R|M;... [S] values
# - "self" instead of "timemap" for current one
# - Content-Type: application/json [S]
# - might have less support than link-format
# - members:
# - original_uri "URI-R"
# - timegate_uri "URI-G"
# - timemap_uri.json|link_format "URI-T"
# - mementos:
# - list MEMENTO_ARR: datetime "DATE", uri "URI-M"
# - first|last|prev|next|closest MEMENTO
# - [pages:]
# - prev|next TIMEMAP:
# - from|until "DATE"
# - uri "URI-T"
# - [memento_compliant "yes|no"]
# - [archive_id STR]
# - [timemap_index: TIMEMAP_ARR]
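#Example (non-normative sketch of reading an application/link-format timemap, in Python):
# - parse_timemap(), list_mementos(), SAMPLE and the example.org|example.net URIs are assumptions made for illustration
# - regex-based, so it assumes quoted values contain no escaped double quotes

```python
import re

# One link: <URI> followed by any number of ;NAME=VALUE parameters.
# Quoted values may contain commas (e.g. datetime), so links cannot simply be split on ",".
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')

SAMPLE = """<http://a.example.org/>;rel="original",
<http://arxiv.example.net/timemap/http://a.example.org/>;rel="self";type="application/link-format",
<http://arxiv.example.net/timegate/http://a.example.org/>;rel="timegate",
<http://arxiv.example.net/web/20000620180259/http://a.example.org/>;rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20010911203610/http://a.example.org/>;rel="last memento";datetime="Tue, 11 Sep 2001 20:36:10 GMT"
"""


def parse_timemap(text):
    """Return one dict per link: {"uri": ..., "rel": ..., "datetime": ..., ...}."""
    links = []
    for uri, params in LINK_RE.findall(text):
        entry = {"uri": uri}
        for name, quoted, bare in PARAM_RE.findall(params):
            entry[name] = quoted or bare
        links.append(entry)
    return links


def list_mementos(links):
    """(URI-M, DATE) pairs: links whose rel contains the 'memento' token."""
    return [
        (link["uri"], link.get("datetime"))
        for link in links
        if "memento" in link.get("rel", "").split()
    ]
```

# - usage: list_mementos(parse_timemap(SAMPLE)) keeps only the two URI-M entries with their DATE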
MEMENTOS LIST ==> #To URI-R|G|M:
# - Link: URI-M;rel=memento;datetime="DATE"[;license="URI"] [S] ...
# - all mementos, including current one
# - optional
# - can optionally add REL first|last|prev|next, predecessor|successor|latest-version
# or working-copy[-of]
# - Link: <http://mementoweb.org/terms/donotnegotiate>; rel="type" [S]
# - specifies there are no mementos
# - client should not use Accept-Datetime [C]
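#Example (non-normative sketch of building the mementos list Link [S] value, in Python):
# - memento_link_header() and its {"URI-M": DATE} input are assumptions made for illustration
# - only adds REL first|last; prev|next and license are left out

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_link_header(mementos):
    """Build a Link header value listing all mementos, oldest first.

    mementos: dict mapping URI-M to a datetime whose tzinfo is timezone.utc
    (format_datetime(..., usegmt=True) requires exactly that).
    """
    ordered = sorted(mementos.items(), key=lambda item: item[1])
    values = []
    for index, (uri_m, moment) in enumerate(ordered):
        rels = []
        if index == 0:
            rels.append("first")
        if index == len(ordered) - 1:
            rels.append("last")
        rels.append("memento")
        values.append(
            '<%s>; rel="%s"; datetime="%s"'
            % (uri_m, " ".join(rels), format_datetime(moment, usegmt=True))
        )
    return ", ".join(values)
```

# - usage: the returned string is meant to be sent as the value of a single Link [S] header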
SUPPORT ==> #Client:
# - no browsers natively
# - Chrome extensions: memento-time-travel, mink
# - Python: py-memento-client
#Server:
# - openwayback (Java)
# - pywb (Python)
# - mementoweb timegate (Python)
# - MediaWiki memento extension
# - mementoweb.org/tools/archive/ (Apache, use purl.org)
#Websites implementing it: many web archives, Wikipedia, some CMS, GitHub
#(using .git history)
#Validators: mementoweb.org/tools/validator/
MEMENTO AGGREGATOR ==> #Tools that retrieve timegates|mementos|timemaps when given just a URI
# - either use Memento
# - or if unavailable, check among popular archives
#Examples:
# - Memento timetravel REST API
/=+===============================+=\
/ : : \
)==: LINKS :==(
\ :_______________________________: /
\=+===============================+=/
URI SCHEME ==> #http[s]:
# - relative URI can only be absolute path reference (/PATH/...)
# - HOSTNAME[:PORT]:
# - required
# - if relative URI, uses:
# - (HTTP/2) :authority HOSTNAME[:PORT] [C]
# - (HTTP/1.1) Host: HOSTNAME[:PORT] [C]
# - IP address or DNS name
# - case sensitivity depends on server
# - PATH: def is "/"
#http: