HTTP server should support HEAD method against Python-generated content #256

danbri opened this issue Jan 22, 2015 · 10 comments

danbri commented Jan 22, 2015

The command-line curl -I flag fails ("HTTP/1.1 405 Method Not Allowed") on all our Python-generated URLs.

http://curl.haxx.se/docs/manpage.html

-i, --include

(HTTP) Include the HTTP-header in the output. The HTTP-header includes things like server-name, date of the document, HTTP-version and more...

-I, --head

(HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature the command HEAD which this uses to get nothing but the header of a document. When used on an FTP or FILE file, curl displays the file size and last modification time only.

App Engine automatically responds to HTTP HEAD for static URLs (e.g. most of docs/*), but for Python-scripted content we need to implement a head function; see https://cloud.google.com/appengine/docs/python/tools/webapp/requesthandlerclass#RequestHandler_head for details.

For comparison, the corresponding 'get' function is at https://github.com/schemaorg/schemaorg/blob/master/api.py#L865

It is not a huge task, but it needs some care to deal with the content negotiation (HTML vs JSON-LD) on the homepage.
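To illustrate the general shape of a HEAD handler, here is a minimal sketch using Python's standard-library http.server rather than the App Engine webapp framework the site actually runs on. The names render_page, TermHandler and fetch are hypothetical and exist only for this example. The idea is that GET and HEAD share one code path for headers, and only GET writes the body.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_page(path):
    """Hypothetical stand-in for the site's page generation."""
    return ("<html><body>Term page for %s</body></html>" % path).encode("utf-8")


class TermHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        # Build the body first so Content-Length is known, then send headers only.
        body = render_page(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        return body

    def do_GET(self):
        self.wfile.write(self._send_headers())

    def do_HEAD(self):
        # Same headers as GET, but no body goes over the connection.
        self._send_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch(method, path, port):
    """Issue one request and return (status, Content-Length header, body)."""
    conn = HTTPConnection("127.0.0.1", port)
    conn.request(method, path)
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), TermHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(fetch("GET", "/Person", port))
    print(fetch("HEAD", "/Person", port))
    server.shutdown()
    server.server_close()
```

In this sketch the page is still rendered in full for HEAD, which is the simplest way to guarantee that the reported Content-Length matches what GET would send.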

@danbri danbri added enhancement site tools + python code Infrastructural issues around schema.org site. Most can ignore this! labels Jan 22, 2015
@danbri danbri self-assigned this Jan 22, 2015

danbri commented Jan 22, 2015

There are two main cases here: the homepage (HTML vs JSON-LD) and the per-term documentation. Getting the Content-Length needed for HEAD would require some reorganisation, so that content generation is cleanly separated from sending it via GET. Other cases, such as the favicon and docs/full.html, would need handling too.
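For the homepage case, GET and HEAD would need to agree on which representation they describe. A rough sketch of that choice, assuming the two offered types are text/html and application/ld+json (pick_media_type is a hypothetical helper, not something in the codebase, and it ignores the specificity rules of full Accept parsing):

```python
def pick_media_type(accept_header, offered=("text/html", "application/ld+json")):
    """Return the offered media type with the highest q-value in Accept.

    Falls back to the first offered type when the header is absent or
    nothing matches. Ties go to the type listed first in `offered`.
    """
    if not accept_header:
        return offered[0]
    best, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        for candidate in offered:
            matches = media_range in (candidate, "*/*") or (
                media_range.endswith("/*")
                and candidate.startswith(media_range[:-1])
            )
            if matches and q > best_q:
                best, best_q = candidate, q
    return best or offered[0]


if __name__ == "__main__":
    print(pick_media_type("application/ld+json"))
    print(pick_media_type("text/html,application/xhtml+xml,*/*;q=0.8"))
```

If both handlers call the same helper before setting Content-Type and Content-Length, a HEAD response should describe exactly what the matching GET would return.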


danbri commented May 22, 2015

From @mfhepp in a duplicate report:

Also note that the current implementation does not support HTTP HEAD requests, which it should, IMO.

Evidence:

$ curl -i http://www.schema.org/
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Date: Thu, 21 May 2015 06:18:00 GMT
Server: Google Frontend
Content-Length: 4630
Alternate-Protocol: 80:quic,p=0

$ curl -i http://www.schema.org/Thing
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Date: Thu, 21 May 2015 06:21:07 GMT
Server: Google Frontend
Content-Length: 20554
Alternate-Protocol: 80:quic,p=0

$ curl -I http://schema.org
HTTP/1.1 405 Method Not Allowed
Allow: GET
Content-Type: text/html; charset=UTF-8
Content-Length: 188
Date: Thu, 21 May 2015 06:23:06 GMT
Server: Google Frontend
Alternate-Protocol: 80:quic,p=0


danbri commented Aug 25, 2015

And @timbl nudges us on this again. /cc @RichardWallis

The tricky part is the Content-Length: header, as it requires "def get(self, node):" to execute (e.g. on /Person) without emitting any actual content over the connection. I've had a look around to see if there's a best practice for this, but didn't find much.


danbri commented Aug 25, 2015

Options:


danbri commented Aug 25, 2015

In sdoapp.py:

    def emitExactTermPage(self, node, layers="core"):
        # [...]
        self.response.write(self.AddCachedText(node, self.outputStrings, layers))

we could handle a lot of things here, e.g. add a muted=False parameter, or restructure to distinguish generating the content from emitting it.
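A minimal sketch of the muted idea, using stand-in classes rather than the real sdoapp.py code. FakeResponse, TermPageHandler, build_term_page and emit_exact_term_page are all hypothetical names; the real handler and its AddCachedText call would differ.

```python
class FakeResponse:
    """Minimal stand-in for a webapp-style response object (hypothetical)."""

    def __init__(self):
        self.headers = {}
        self.body = b""

    def write(self, data):
        self.body += data


class TermPageHandler:
    def __init__(self):
        self.response = FakeResponse()

    def build_term_page(self, node, layers="core"):
        # Placeholder for the real page generation; returns encoded bytes.
        return ("<h1>%s</h1><p>layers=%s</p>" % (node, layers)).encode("utf-8")

    def emit_exact_term_page(self, node, layers="core", muted=False):
        # Generate the content unconditionally so the headers are accurate.
        content = self.build_term_page(node, layers)
        self.response.headers["Content-Type"] = "text/html; charset=utf-8"
        self.response.headers["Content-Length"] = str(len(content))
        if not muted:
            # Only GET actually sends the body.
            self.response.write(content)

    def get(self, node):
        self.emit_exact_term_page(node)

    def head(self, node):
        self.emit_exact_term_page(node, muted=True)


if __name__ == "__main__":
    handler = TermPageHandler()
    handler.head("Person")
    print(handler.response.headers, len(handler.response.body))
```

The flag is the smaller change; splitting generation into its own method (as build_term_page does here) is the restructuring alternative, and the two can coexist.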

RichardWallis pushed a commit that referenced this issue Apr 6, 2016
Also implemented 304 return codes for ETAG & modified since requests.
Addresses issues (#256) and (#1024)

danbri commented Apr 6, 2016

I've merged @RichardWallis's implementation, which is now live on our testing / staging site: http://webschemas.org/ ... please take a look if you care about this issue!


danbri commented Apr 7, 2016

At this point the change is live on our staging / testing server webschemas.org but not yet at schema.org. For example:

$ curl --head http://schema.org/Person
HTTP/1.1 405 Method Not Allowed
Allow: GET
Content-Type: text/html; charset=UTF-8
Content-Length: 188
Date: Thu, 07 Apr 2016 13:22:08 GMT
Server: Google Frontend

$ curl --head http://webschemas.org/Person
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Access-Control-Allow-Origin: *
Cache-Control: public, max-age=43200
Vary: Accept, Accept-Encoding
Last-Modified: Wed, 06 Apr 2016 21:23:00 UTC
ETag: 24751160406212300a-667451055
Content-Length: 101593
Date: Thu, 07 Apr 2016 13:22:15 GMT
Server: Google Frontend

@RichardWallis
Contributor

As noted in the Pull Request (#1080), this implementation also introduces the ability to respond with an HTTP '304 - Not Modified' when a request carries an If-None-Match or If-Modified-Since header whose value matches the current state of the resource.
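The conditional check can be sketched roughly as follows. This is an illustration of the standard HTTP rules, not the code from the pull request, and it assumes the time-based validator is If-Modified-Since; is_not_modified is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    # Weak comparison: a leading W/ prefix is ignored.
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True when a 304 Not Modified response is appropriate.

    request_headers: dict keyed by canonical header name
    etag: the resource's current entity tag, quotes included
    last_modified: timezone-aware datetime of the last change
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When present, If-None-Match takes precedence over If-Modified-Since.
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates:
            return True
        return _strip_weak(etag) in [_strip_weak(c) for c in candidates]

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            return False  # unparseable date: ignore the header
        if since_dt.tzinfo is None:
            since_dt = since_dt.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        return last_modified.replace(microsecond=0) <= since_dt
    return False


if __name__ == "__main__":
    changed = datetime(2016, 4, 6, 21, 23, 0, tzinfo=timezone.utc)
    headers = {"If-Modified-Since": "Wed, 06 Apr 2016 21:23:00 GMT"}
    print(is_not_modified(headers, '"abc"', changed))
```

A handler would call this before generating the page and, on True, return 304 with the validators but no body.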


danbri commented Apr 7, 2016

A detail: should ETags be suppressed on 404s? See, for example, curl --head http://webschemas.org/QWERTY.

http://python.6.x6.nabble.com/etag-and-404-td102123.html debates that, but seems to settle on avoiding ETags on 404s.
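If we do go that way, the change could be as small as making the ETag conditional on the status code. A sketch, where response_headers is a hypothetical helper and the hash-based tag is just one possible way of deriving an ETag (not necessarily how the site computes its own):

```python
import hashlib


def response_headers(status, body):
    """Build response headers, attaching an ETag only to successful responses."""
    headers = {"Content-Length": str(len(body))}
    if status == 200:
        # Assumption: a content hash is an acceptable validator here.
        headers["ETag"] = '"%s"' % hashlib.sha256(body).hexdigest()
    return headers


if __name__ == "__main__":
    print(response_headers(200, b"<html>Person</html>"))
    print(response_headers(404, b"<html>Not found</html>"))
```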


danbri commented Aug 10, 2016

Published - see http://schema.org/docs/releases.html

@danbri danbri closed this as completed Aug 10, 2016