Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Last-Modified Header equal to the lastcommitdate of a file. #18354

Closed
tuxpowered opened this issue Jan 21, 2022 · 3 comments · Fixed by #18356
Closed

Add Last-Modified Header equal to the lastcommitdate of a file. #18354

tuxpowered opened this issue Jan 21, 2022 · 3 comments · Fixed by #18356
Labels
type/feature Completely new functionality. Can only be merged if feature freeze is not active.

Comments

@tuxpowered
Copy link

Feature Description

In use cases where you may need a simple script (bash, python, etc) to be the current version, it would be helpful for the Gitea server to provide the "Last-Modified" header set to the lastcommitdate.

This would allow a script to 'self check' the headers on any direct file link, and perform a local check to see if the gitea version is of a newer version than the local copy.

This is especially useful where you may not be able (or allowed) to install git to run on a system.

Screenshots

No response

@luwol03
Copy link

luwol03 commented Jan 22, 2022

This would also be useful if you're using something like docsify (a static documentation tool) which pulls its content through the raw files. The last modified header from github is already used to display a last modified date on every page you open from the documentation.

@zeripath
Copy link
Contributor

The last modified is a very weak marker of change for a tree entry and is in fact fairly poorly defined.

Trees in git do not store last modifieds for their entries or themselves. There is I think a date associated with the creation of the tree but it's very hard to get and it's not something you're meant to use. (Similarly for blobs.)

So a last modified header only makes sense in terms of a tree entry of a commit or branch (which is simply a pointer to a commit.)

Now commits actually have two dates: author and commit. Neither is actually enforced really however it's most common that the commit date is the one that really means the thing you think it means.

Then there's no rule that a commit date actually has to be after the commit date of the parents. (Although this is really rare.)

Now obviously we shouldn't just use the commit date for the tree entry as the entry likely wasn't changed at that commit.

We can use a git log command to find the last time we think a commit modified that path - this is not as cheap as you might think - but actually we cache this so we can use that.

However, a branch is not immutable and is simply a pointer to a commit. It's possible to change the branch to a completely different hierarchy which has different last modifieds but same blobs and vice versa.

In fact the only strong marker of cacheability and change is that of the blob sha which is provided as the etag.

zeripath added a commit to zeripath/gitea that referenced this issue Jan 22, 2022
Although the use of LastModified dates for caching of git objects should be
discouraged (as it is not native to git - and there are a LOT of ways this
could be incorrect) - LastModified dates can be a helpful somewhat more human
way of caching for simple cases.

This PR adds this header and handles the If-Modified-Since header to the /raw/
routes.

Fix go-gitea#18354

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor

So the take-home message is that the last modified date is simply not a native part of git and is in fact a really poor indicator of change of an object.

The correct way to detect if your object is out of date is to keep a copy of the hash (or regenerate it) and then compare that.

The git hash for an object can be generated (with some caveats - lfs and filters come to mind) using git hash-object (see https://git-scm.com/docs/git-hash-object). You can get the hash for a checked out object using git ls-files -c too. We should be able to get our API to return the blob SHA too on upload.

(If you do not have git installed - then generating a hash (again caveats) - can still be done using sha1sum (some very serious caveats) and prepending the blob input with blob ${size}\000 see https://git-scm.com/book/en/v2/Git-Internals-Git-Objects#_object_storage. However, please use git hash-object or just remember it.)

Almost every use for last-modified date especially for caching would be better handled using the sha/etag and there are certainly cases where using a last modified date over the sha will cause unexpected problems. The cases where a last modified date would be better than a sha would be better served using a different branch.

That being said I know humans like to see a modified date and because I do not know every use case, nor can I predict every use case, I can see how a last modified date could be useful so hence the above PR.

However I would strongly advise that you use the etag/sha instead.

@zeripath zeripath added the type/feature Completely new functionality. Can only be merged if feature freeze is not active. label Jan 30, 2022
6543 added a commit that referenced this issue May 9, 2022
Although the use of LastModified dates for caching of git objects should be
discouraged (as it is not native to git - and there are a LOT of ways this
could be incorrect) - LastModified dates can be a helpful somewhat more human
way of caching for simple cases.

This PR adds this header and handles the If-Modified-Since header to the /raw/
routes.

Fix #18354

Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: 6543 <6543@obermui.de>
AbdulrhmnGhanem pushed a commit to kitspace/gitea that referenced this issue Aug 24, 2022
Although the use of LastModified dates for caching of git objects should be
discouraged (as it is not native to git - and there are a LOT of ways this
could be incorrect) - LastModified dates can be a helpful somewhat more human
way of caching for simple cases.

This PR adds this header and handles the If-Modified-Since header to the /raw/
routes.

Fix go-gitea#18354

Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: 6543 <6543@obermui.de>
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type/feature Completely new functionality. Can only be merged if feature freeze is not active.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants