Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serve LFS/attachment with http.ServeContent to Support Range-Request #18448

Closed
wants to merge 13 commits into from
Closed

Serve LFS/attachment with http.ServeContent to Support Range-Request #18448

wants to merge 13 commits into from

Conversation

marxangels
Copy link

The ServeData function now can't handle Range Request.
Due to the complexity of this part of HTTP protocol, it is difficult to fully implement it and requires a lot of testing.
The official module http.ServeContent has this implementation, and we can replace with it directly.
It can handle If-Match, If-Unmodified-Since, If-None-Match, If-Modified-Since, If-Range and more, perfectly!

Thanks for @wxiaoguang 's tips.

@codecov-commenter

This comment has been minimized.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jan 29, 2022
@wxiaoguang
Copy link
Contributor

I haven't read through all related code carefully, but I have a feeling that if we call http.ServeContent directly, it would break the current behavior.

Currectly, Gitea's common.ServeData calls typesniffer.DetectContentType(buf) to do deep mime-type detection.

We had better keep the behavior. Since we could get the ReadSeekCloser, it's easy to read the a few bytes from beginning to do the DetectContentType. And it's better to work on common.ServeData directly to introduce http.ServeContent, indeed common.ServeData does do some specialized work for the response.

@marxangels
Copy link
Author

According to my experience, the existing mime detection is problematic, some mp3/mp4 files can not be recognized accurately.
And the http.ServeContent found no errors so far.

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Jan 30, 2022

Yes, there are related issues like:

Although I also think the typesniffer.DetectContentType might not be necessary and buggy for HTTP file serving, the feature was already there.

If this PR need to change the existing behavior, it need more people @go-gitea/maintainers to take a look at it.

My opinion: I prefer to use extension name to get mime types instead of typesniffer.DetectContentType (with some special case handling, eg: detecting text files without extension like README). The reason is: most files are using correct extension names, while the detection is buggy sometimes and no real benefit for files with extensions.

@lunny
Copy link
Member

lunny commented Jan 30, 2022

I think the security headers should be kept. And another problem is size is missed. But in fact we can get it.

@lunny
Copy link
Member

lunny commented Jan 30, 2022

size should be dropped to support range.

@lunny
Copy link
Member

lunny commented Jan 30, 2022

The function served attachment/git blob/git lfs file, so I think the name ServeLargeFile is not very suitable.

@lunny lunny added this to the 1.17.0 milestone Jan 30, 2022
@lunny lunny added the type/enhancement An improvement of existing functionality label Jan 30, 2022
@marxangels
Copy link
Author

size should be dropped to support range.

http.ServeContent uses a seek to the end of the content to determine its size.

@marxangels
Copy link
Author

marxangels commented Jan 30, 2022

The function served attachment/git blob/git lfs file, so I think the name ServeLargeFile is not very suitable.

git blob is served by the existing function common.ServeData now.
git blob doesn't implement a Seeker interface, so we should separate it into two functions.
Or we must implement a Seek function for the git blob Struct.

@zeripath
Copy link
Contributor

It appears that there's also changes to the mime type detection?

routers/common/repo.go Outdated Show resolved Hide resolved
routers/common/repo.go Outdated Show resolved Hide resolved
@KN4CK3R

This comment was marked as off-topic.

// ServeBlob download a git.Blob
func setCommonHeaders(ctx *context.Context, name string, data interface{}) error {
// Google Chrome dislike commas in filenames, so let's change it to a space
name = strings.ReplaceAll(name, ",", " ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto from the other PR:

I don't think this is needed actually, from what I remember about this weird bug, if you quote the filename(as being done in this function) chromium won't error about it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's strange. Is it dependent on different chromium versions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a historical problem, I believe modern browsers (and modern frameworks) can handle it correctly, and we do not need hacky methods.

https://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http

@@ -73,19 +74,22 @@ func TestDownloadRawTextFileWithoutMimeTypeMapping(t *testing.T) {
req := NewRequest(t, "GET", "/user2/repo2/raw/branch/master/test.xml")
resp := session.MakeRequest(t, req, http.StatusOK)

assert.Equal(t, "text/plain; charset=utf-8", resp.HeaderMap.Get("Content-Type"))
assert.Equal(t, "text/xml; charset=utf-8", resp.HeaderMap.Get("Content-Type"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> RawTextFileWithoutMimeTypeMapping this test is intended to not not find a specific mime type

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but the type-detection is now taken care by mime.TypeByExtension. And the default mime of XML file is text/xml.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type-detection of text files has known logic confusion now. we may fix it after this PR.
There are too many modifications in this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test a specific mime type not found, we can use any unknown extension, such as .foo .xyz

@marxangels marxangels changed the title handle LFS attachment/download with http.ServeContent Serve LFS/attachment with http.ServeContent to Support Range-Request Feb 5, 2022
routers/common/repo.go Outdated Show resolved Hide resolved
@stale
Copy link

stale bot commented Apr 16, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.

@stale stale bot added the issue/stale label Apr 16, 2022
@lunny lunny added the issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented label Apr 16, 2022
@stale stale bot removed the issue/stale label Apr 16, 2022
var cs string
var err error
if reader, ok := data.(io.ReadSeeker); ok {
cs, err = charset.DetectEncodingFromReader(reader)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a seeker reader explicit for http.ServeContent - else change the detection to read in a buffer and use multi reader would also be a good option

@wxiaoguang wxiaoguang linked an issue May 1, 2022 that may be closed by this pull request
@@ -78,15 +85,35 @@ func DetectContentType(data []byte) SniffedType {
data = data[:sniffLen]
}

if (strings.Contains(ct, "text/plain") || strings.Contains(ct, "text/html")) && svgTagRegex.Match(data) ||
strings.Contains(ct, "text/xml") && svgTagInXMLRegex.Match(data) {
if (strings.HasPrefix(ct, "text/plain") || strings.HasPrefix(ct, "text/html")) && svgTagRegex.Match(data) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep Contains?

@lunny lunny modified the milestones: 1.17.0, 1.18.0 Jun 5, 2022
@LJFan
Copy link

LJFan commented Jul 18, 2022

I need this feature badly. Can anyone merge it into the main branch?

integrations/download_test.go Outdated Show resolved Hide resolved
integrations/download_test.go Outdated Show resolved Hide resolved
@@ -21,6 +26,10 @@ func newMimeTypeMap() {
m := make(map[string]string, len(keys))
for _, key := range keys {
m[strings.ToLower(key.Name())] = key.Value()
err := mime.AddExtensionType(key.Name(), key.Value())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please dont add unrelated changes in here

@6543
Copy link
Member

6543 commented Jul 25, 2022

I need this feature badly. Can anyone merge it into the main branch?

as soon as it's ready ...

@marxangels marxangels closed this Jul 25, 2022
@6543
Copy link
Member

6543 commented Jul 25, 2022

@marxangels whats going on?

@silverwind
Copy link
Member

Maybe he lost interest, I think we should re-raise the PR in a cleaned up form. It has some valuable stuff in it.

@6543
Copy link
Member

6543 commented Jul 25, 2022

-> #20480

I put it on my list of WIPs ...

@6543
Copy link
Member

6543 commented Jul 25, 2022

... will work on it if i got a timeslot for it ... but feel free to also push to it ...

@lunny lunny removed this from the 1.18.0 milestone Dec 20, 2022
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LFS] video player cannot be controlled