HTTP server should support HEAD method against Python-generated content #256

Closed
danbri opened this Issue Jan 22, 2015 · 10 comments

2 participants

@danbri
Contributor
danbri commented Jan 22, 2015

The command-line curl -I flag fails ("HTTP/1.1 405 Method Not Allowed") on all our Python-generated URLs.

http://curl.haxx.se/docs/manpage.html

-i, --include

(HTTP) Include the HTTP-header in the output. The HTTP-header includes things like server-name, date of the document, HTTP-version and more...

-I, --head

(HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature the command HEAD which this uses to get nothing but the header of a document. When used on an FTP or FILE file, curl displays the file size and last modification time only.

App Engine automatically responds to HTTP HEAD for static URLs (e.g. most of docs/*), but for Python-scripted content we need to implement a head function; see https://cloud.google.com/appengine/docs/python/tools/webapp/requesthandlerclass#RequestHandler_head for details.

See the corresponding 'get' function for details, https://github.com/schemaorg/schemaorg/blob/master/api.py#L865

It is not a huge task but needs some care to deal with the content negotiation (HTML vs JSON-LD) of the homepage.
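A minimal sketch of the negotiation step, so that a future head function and the existing get function could share one decision. This is not the code in api.py: the function names, the media-type constants, and the rule "serve JSON-LD only when the client rates it strictly above HTML" are all assumptions made for illustration.

```python
HTML = "text/html"
JSONLD = "application/ld+json"


def parse_accept(header):
    """Return a list of (media_type, q) pairs from an Accept header value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs.append((media_type, q))
    return prefs


def choose_content_type(accept_header):
    """Assumed rule: JSON-LD only if rated strictly above HTML, else HTML."""
    html_q = jsonld_q = 0.0
    for media_type, q in parse_accept(accept_header or ""):
        if media_type == JSONLD:
            jsonld_q = max(jsonld_q, q)
        elif media_type in (HTML, "text/*", "*/*"):
            html_q = max(html_q, q)
    return JSONLD if jsonld_q > html_q else HTML


def negotiated_headers(accept_header):
    """Headers that GET and HEAD would both send for the homepage."""
    return {"Content-Type": choose_content_type(accept_header), "Vary": "Accept"}
```

Because both methods would call negotiated_headers with the same Accept value, a HEAD response cannot drift from what GET reports.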

@danbri danbri self-assigned this Jan 22, 2015
@danbri
Contributor
danbri commented Jan 22, 2015

There are two main cases here: the homepage (HTML vs JSON-LD) and the per-term documentation. Getting the Content-Length needed for HEAD would require some reorganizing and tidying, so that content generation is more cleanly separated from sending it via GET. Other cases, such as the favicon and docs/full.html, need handling too.
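One way the separation could look, as a framework-free sketch. render_page is a hypothetical stand-in for the real page generator, and build_response and handle are names invented here, not anything in the current codebase.

```python
def render_page(path):
    """Hypothetical stand-in for the real page generator; returns bytes."""
    return ("<html><body><h1>%s</h1></body></html>" % path).encode("utf-8")


def build_response(path):
    """Generate the page once and derive the headers from the actual body."""
    body = render_page(path)
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 200, headers, body


def handle(method, path):
    """Return (status, headers, body); HEAD keeps the headers, drops the body."""
    if method not in ("GET", "HEAD"):
        return 405, {"Allow": "GET, HEAD"}, b""
    status, headers, body = build_response(path)
    if method == "HEAD":
        body = b""
    return status, headers, body
```

The cost is that HEAD still pays for full page generation; caching the generated text would reduce that.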

@danbri
Contributor
danbri commented May 22, 2015

From @mfhepp in a duplicate report:

Also note that the current implementation does not support HTTP HEAD requests, which it should, IMO.

Evidence:

$ curl -i http://www.schema.org/
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Date: Thu, 21 May 2015 06:18:00 GMT
Server: Google Frontend
Content-Length: 4630
Alternate-Protocol: 80:quic,p=0

$ curl -i http://www.schema.org/Thing
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Date: Thu, 21 May 2015 06:21:07 GMT
Server: Google Frontend
Content-Length: 20554
Alternate-Protocol: 80:quic,p=0

$ curl -I http://schema.org
HTTP/1.1 405 Method Not Allowed
Allow: GET
Content-Type: text/html; charset=UTF-8
Content-Length: 188
Date: Thu, 21 May 2015 06:23:06 GMT
Server: Google Frontend
Alternate-Protocol: 80:quic,p=0

@danbri
Contributor
danbri commented Aug 25, 2015

And @timbl nudges us on this again. /cc @RichardWallis

The tricky part is the Content-Length header, as it requires "def get(self, node):" to execute (e.g. on /Person) yet not emit any actual content over the connection. I've had a look around to see if there's a best practice for this, but didn't find much.

@danbri
Contributor
danbri commented Aug 25, 2015

Options:

@danbri
Contributor
danbri commented Aug 25, 2015

In sdoapp.py
def emitExactTermPage(self, node, layers="core"):
    # [...]
    self.response.write(self.AddCachedText(node, self.outputStrings, layers))

We could handle a lot here, e.g. add a muted=False parameter, or restructure to distinguish generating the content from emitting it.
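A rough sketch of the muted=False idea. FakeResponse stands in for the framework's response object, and generate_term_page and emit_term_page are illustrative names rather than the real sdoapp.py methods.

```python
class FakeResponse(object):
    """Hypothetical stand-in for the framework's response object."""

    def __init__(self):
        self.headers = {}
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


class TermPageHandler(object):
    def __init__(self):
        self.response = FakeResponse()

    def generate_term_page(self, node, layers="core"):
        # Placeholder for the real (cached) page generation.
        return "<html><body>%s (%s)</body></html>" % (node, layers)

    def emit_term_page(self, node, layers="core", muted=False):
        text = self.generate_term_page(node, layers)
        # Content-Length counts encoded bytes, not characters.
        self.response.headers["Content-Length"] = str(len(text.encode("utf-8")))
        if not muted:
            self.response.write(text)

    def get(self, node):
        self.emit_term_page(node)

    def head(self, node):
        # Same code path as GET, but nothing is written to the connection.
        self.emit_term_page(node, muted=True)
```

The flag keeps the change small, at the price of threading muted through every emit method.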

@danbri danbri removed the type:enhancement label Mar 7, 2016
@RichardWallis RichardWallis added a commit that referenced this issue Apr 6, 2016
@RichardWallis RichardWallis Introduced handling of HTTP HEAD Requests
Also implemented 304 return codes for ETAG & modified since requests.
Addresses issues (#256) and (#1024)
8a473b3
@danbri
Contributor
danbri commented Apr 6, 2016

I've merged @RichardWallis 's implementation, which is now live on our testing / staging site: http://webschemas.org/ ... please take a look if you care about this issue!

@danbri
Contributor
danbri commented Apr 7, 2016

At this point the change is live on our staging / testing server webschemas.org but not yet at schema.org. For example:

$ curl --head http://schema.org/Person
HTTP/1.1 405 Method Not Allowed
Allow: GET
Content-Type: text/html; charset=UTF-8
Content-Length: 188
Date: Thu, 07 Apr 2016 13:22:08 GMT
Server: Google Frontend

$ curl --head http://webschemas.org/Person
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Access-Control-Allow-Origin: *
Cache-Control: public, max-age=43200
Vary: Accept, Accept-Encoding
Last-Modified: Wed, 06 Apr 2016 21:23:00 UTC
ETag: 24751160406212300a-667451055
Content-Length: 101593
Date: Thu, 07 Apr 2016 13:22:15 GMT
Server: Google Frontend

@RichardWallis
Contributor

As noted in the Pull Request (#1080), this implementation also introduces the functionality to respond with an HTTP '304 - Not Modified' when receiving either If-None-Match or If-Modified-Since request headers containing the appropriate values.
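For readers unfamiliar with conditional requests, here is a sketch of the usual 304 decision. It is not the code from the pull request: is_not_modified is a name invented here, and the sketch assumes the standard pairing of If-None-Match with ETag and If-Modified-Since with Last-Modified, with If-None-Match taking precedence when both are sent.

```python
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    """Drop a leading W/ so weak and strong forms of a tag compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True when a 304 response would be appropriate."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [_strip_weak(c) for c in if_none_match.split(",")]
        return "*" in candidates or _strip_weak(etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            modified = parsedate_to_datetime(last_modified)
            return modified <= since
        except (TypeError, ValueError):
            # Unparsable or incomparable dates: fall back to a full response.
            return False
    return False
```

A handler would call this before generating the page and, on True, send status 304 with no body.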

@danbri
Contributor
danbri commented Apr 7, 2016

A detail: should etags be suppressed on 404s? curl --head http://webschemas.org/QWERTY etc.

http://python.6.x6.nabble.com/etag-and-404-td102123.html debates that but seems to settle on avoiding 404 etags.
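If we do go with suppressing them, one possible shape is a last-step filter on the outgoing headers. The function name and the choice to also drop Last-Modified on non-2xx responses are assumptions for illustration, not what the deployed code does.

```python
VALIDATOR_HEADERS = frozenset(["etag", "last-modified"])


def finalize_headers(status, headers):
    """Return a copy of headers, keeping validators only on 2xx responses."""
    if 200 <= status < 300:
        return dict(headers)
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in VALIDATOR_HEADERS
    }
```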

@danbri
Contributor
danbri commented Aug 10, 2016
@danbri danbri closed this Aug 10, 2016