HTTP server should support HEAD method against Python-generated content #256

Closed
danbri opened this Issue Jan 22, 2015 · 10 comments

danbri commented Jan 22, 2015

Running curl with the -I flag from the command line fails ("HTTP/1.1 405 Method Not Allowed") on all our Python-generated URLs.

http://curl.haxx.se/docs/manpage.html

-i, --include

(HTTP) Include the HTTP-header in the output. The HTTP-header includes things like server-name, date of the document, HTTP-version and more...

-I, --head

(HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature the command HEAD which this uses to get nothing but the header of a document. When used on an FTP or FILE file, curl displays the file size and last modification time only.

App Engine automatically responds to HTTP HEAD for static URLs (e.g. most of docs/*), but for Python-scripted content we need to implement a head function; see https://cloud.google.com/appengine/docs/python/tools/webapp/requesthandlerclass#RequestHandler_head for details.

The corresponding 'get' function is at https://github.com/schemaorg/schemaorg/blob/master/api.py#L865

It is not a huge task but needs some care to deal with the content negotiation (HTML vs JSON-LD) of the homepage.
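One way to keep the homepage's content negotiation consistent between GET and HEAD is to choose the representation in a single helper that both methods call. A minimal sketch, assuming the choice is driven by the Accept header; pick_media_type and the list of supported types are hypothetical and not taken from api.py:

```python
# Hypothetical helper: the name and the supported types are illustrative only.
SUPPORTED = ("text/html", "application/ld+json")


def pick_media_type(accept_header, supported=SUPPORTED):
    """Return the supported media type with the highest q-value in Accept.

    Falls back to the first supported type when the header is missing or
    nothing matches. Wildcards other than */* are ignored in this sketch.
    """
    if not accept_header:
        return supported[0]
    best, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # malformed q-value: treat as not acceptable
        if media == "*/*":
            candidate = supported[0]
        elif media in supported:
            candidate = media
        else:
            continue
        if q > best_q:
            best, best_q = candidate, q
    return best or supported[0]
```

If both get and head call the same helper, the Content-Type reported by HEAD cannot drift from what GET actually sends.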

danbri commented Jan 22, 2015

There are two main cases here: the homepage (HTML vs JSON-LD) and the per-term documentation. Getting the Content-Length needed for HEAD would require some reorganisation so that content generation is cleanly separated from sending it via GET. The same applies to other cases such as the favicon and docs/full.html.

danbri commented May 22, 2015

From @mfhepp in a duplicate report:

Also note that the current implementation does not support HTTP HEAD requests, which it should, IMO.

Evidence:

$ curl -i http://www.schema.org/
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Date: Thu, 21 May 2015 06:18:00 GMT
Server: Google Frontend
Content-Length: 4630
Alternate-Protocol: 80:quic,p=0

$ curl -i http://www.schema.org/Thing
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Date: Thu, 21 May 2015 06:21:07 GMT
Server: Google Frontend
Content-Length: 20554
Alternate-Protocol: 80:quic,p=0

$ curl -I http://schema.org
HTTP/1.1 405 Method Not Allowed
Allow: GET
Content-Type: text/html; charset=UTF-8
Content-Length: 188
Date: Thu, 21 May 2015 06:23:06 GMT
Server: Google Frontend
Alternate-Protocol: 80:quic,p=0

danbri commented Aug 25, 2015

And @timbl nudges us on this again. /cc @RichardWallis

The tricky part is the Content-Length header: computing it requires "def get(self, node):" to execute (e.g. on /Person) without emitting any actual content over the connection. I've had a look around to see if there's a best practice for this, but didn't find much.
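One possible approach to the Content-Length problem, sketched here without any App Engine dependencies: run the existing GET logic against a throwaway response object that buffers the output, then copy only the status and headers to the real response. BufferedResponse, TermHandler, and the page body are hypothetical stand-ins, not the project's classes.

```python
class BufferedResponse:
    """Stand-in for a framework response: collects headers and body in memory."""

    def __init__(self):
        self.status = 200
        self.headers = {}
        self.chunks = []

    def write(self, text):
        self.chunks.append(text.encode("utf-8"))

    def body(self):
        return b"".join(self.chunks)


class TermHandler:
    """Hypothetical handler with the same get/head shape as a webapp handler."""

    def __init__(self, response):
        self.response = response

    def get(self, node):
        self.response.headers["Content-Type"] = "text/html; charset=utf-8"
        self.response.write("<h1>%s</h1>" % node)

    def head(self, node):
        real = self.response
        self.response = BufferedResponse()  # capture instead of sending
        try:
            self.get(node)
            scratch = self.response
        finally:
            self.response = real  # always restore the real response
        real.status = scratch.status
        real.headers.update(scratch.headers)
        real.headers["Content-Length"] = str(len(scratch.body()))
        # Nothing is written to `real`, so no body goes over the connection.


resp = BufferedResponse()
TermHandler(resp).head("Person")
print(resp.headers, resp.body())
```

The cost is that HEAD does all the work of GET. Whether the hosting framework overrides a manually set Content-Length is not covered by this sketch.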

danbri commented Aug 25, 2015

Options:

danbri commented Aug 25, 2015

In sdoapp.py:

    def emitExactTermPage(self, node, layers="core"):
        # [...]
        self.response.write(self.AddCachedText(node, self.outputStrings, layers))

We could handle a lot of things here, e.g. add a muted=False parameter, or restructure to separate generating the content from emitting it.
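A minimal sketch of the muted idea, assuming page generation can be pulled into its own method. TermPageHandler, generate_term_page, and FakeResponse are hypothetical, and the real AddCachedText caching is not reproduced here:

```python
class FakeResponse:
    """Hypothetical stand-in for self.response; records what would be sent."""

    def __init__(self):
        self.headers = {}
        self.out = []

    def write(self, text):
        self.out.append(text)


class TermPageHandler:
    """Hypothetical handler; only the generate/emit split is the point."""

    def __init__(self):
        self.response = FakeResponse()

    def generate_term_page(self, node, layers="core"):
        # Stand-in for the real page construction.
        return "<html><body>%s (%s)</body></html>" % (node, layers)

    def emitExactTermPage(self, node, layers="core", muted=False):
        page = self.generate_term_page(node, layers)
        payload = page.encode("utf-8")
        self.response.headers["Content-Length"] = str(len(payload))
        if not muted:  # GET sends the body; HEAD (muted=True) sends headers only
            self.response.write(page)
        return len(payload)
```

With this shape, get would call emitExactTermPage(node) and head would call emitExactTermPage(node, muted=True), so both report the same length.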

@danbri danbri removed the type:enhancement label Mar 7, 2016

RichardWallis added a commit that referenced this issue Apr 6, 2016

Introduced handling of HTTP HEAD Requests
Also implemented 304 return codes for ETag & modified since requests.
Addresses issues (#256) and (#1024)

danbri commented Apr 6, 2016

I've merged @RichardWallis's implementation, which is now live on our testing / staging site: http://webschemas.org/ ... please take a look if you care about this issue!

danbri commented Apr 7, 2016

At this point the change is live on our staging / testing server webschemas.org but not yet at schema.org. For example:

$ curl --head http://schema.org/Person
HTTP/1.1 405 Method Not Allowed
Allow: GET
Content-Type: text/html; charset=UTF-8
Content-Length: 188
Date: Thu, 07 Apr 2016 13:22:08 GMT
Server: Google Frontend

$ curl --head http://webschemas.org/Person
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Access-Control-Allow-Origin: *
Cache-Control: public, max-age=43200
Vary: Accept, Accept-Encoding
Last-Modified: Wed, 06 Apr 2016 21:23:00 UTC
ETag: 24751160406212300a-667451055
Content-Length: 101593
Date: Thu, 07 Apr 2016 13:22:15 GMT
Server: Google Frontend

RichardWallis commented Apr 7, 2016

As noted in the Pull Request (#1080), this implementation also adds the ability to respond with an HTTP '304 - Not Modified' when a request carries an If-None-Match or If-Modified-Since header whose value shows that the client's copy is still current.
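For readers unfamiliar with conditional requests: in standard HTTP, a 304 answers a request carrying If-None-Match or If-Modified-Since. The decision can be sketched roughly as below. This illustrates the general behaviour and is not the code from the pull request; is_not_modified and its parameters are hypothetical names.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    """Drop a leading W/ so weak and strong tags compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True when a 304 response is appropriate.

    request_headers: dict of header name to value, as sent by the client.
    etag: the current entity tag, including its quotes.
    last_modified: timezone-aware datetime of the last change.
    If-None-Match takes precedence over If-Modified-Since when both are sent.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [_strip_weak(t.strip()) for t in if_none_match.split(",")]
        return "*" in candidates or _strip_weak(etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: ignore the header
        if since.tzinfo is None:
            return False  # no usable timezone: ignore the header
        return last_modified.replace(microsecond=0) <= since
    return False


# Usage with made-up values:
changed = datetime(2016, 4, 6, 21, 23, 0, tzinfo=timezone.utc)
print(is_not_modified({"If-None-Match": '"abc"'}, '"abc"', changed))
```

The same check applies to HEAD as to GET, since a 304 carries no body either way.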

danbri commented Apr 7, 2016

A detail: should ETags be suppressed on 404s, e.g. for curl --head http://webschemas.org/QWERTY?

http://python.6.x6.nabble.com/etag-and-404-td102123.html debates that but seems to settle on avoiding ETags on 404s.
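If the conclusion is to avoid ETags on 404s, the header construction could simply branch on the status code. A minimal sketch; validator_headers is a hypothetical helper, and the hash-based ETag is an assumption for illustration rather than the format the site uses:

```python
import hashlib


def validator_headers(status, body):
    """Return cache validator headers, omitting them for non-200 responses.

    body must be bytes. The ETag here is a truncated SHA-256 of the body,
    chosen only to keep the example self-contained.
    """
    if status != 200:
        return {}
    return {"ETag": '"%s"' % hashlib.sha256(body).hexdigest()[:16]}
```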

danbri commented Aug 10, 2016

@danbri danbri closed this Aug 10, 2016
