This repository was archived by the owner on Sep 30, 2024. It is now read-only.

mrnugget
Contributor

@mrnugget mrnugget commented Nov 10, 2022

(@jhchabran editing)

This PR adds a new endpoint, /.api/blame/:repo+rev/stream/:path which streams back the result from running git blame with the --incremental flag added.

If the "enable-streaming-git-blame" feature flag is enabled, the client will use that route to fetch back the hunks instead of going through GraphQL.

To reviewers: the backend code is ready for review. For the client part, @philipp-spiess will do another quick pass on Monday morning. cc @mrnugget if you want to take a look, as I can't add you as a reviewer on your own PR.

Test plan

Tested on the scaletesting instance. Will be enabled on s2 as soon as it gets merged.

See the results: https://github.com/sourcegraph/sourcegraph/pull/44199#issuecomment-1319035227

@cla-bot cla-bot bot added the cla-signed label Nov 10, 2022
@jhchabran jhchabran self-assigned this Nov 10, 2022
@sg-e2e-regression-test-bob

sg-e2e-regression-test-bob commented Nov 10, 2022

Bundle size report 📦

Initial size | Total size | Async size | Modules
0.00% (0.00 kb) | 0.01% (+1.27 kb) | 0.01% (+1.27 kb) | 0.00% (0)

Look at the Statoscope report for a full comparison between the commits 55825b2 and c514c53 or learn more.

Open explanation
  • Initial size is the size of the initial bundle (the one that is loaded when you open the page)
  • Total size is the size of the initial bundle + all the async loaded chunks
  • Async size is the size of all the async loaded chunks
  • Modules is the number of modules in the initial bundle

@mrnugget mrnugget force-pushed the mrn+jh/streaming-git-blame branch from 71a5a3f to 3ede673 Compare November 11, 2022 11:00
@jhchabran
Contributor

@mrnugget I've tested our new code against the renderer by just reading things on the server instead of streaming to the client, and it works as intended. At this point, I'm inclined to reach out to @philipp-spiess to see if we can get started on the frontend side.

@jhchabran
Contributor

jhchabran commented Nov 14, 2022

@philipp-spiess in order to update the frontend to use that new feature:

  • I have put the route behind a feature flag enable-streaming-git-blame (bool), it will return a 404 otherwise.
    • Can we shield the POC on your side behind the same feature flag on the client side?
  • At this stage, our primary goal is to prove the concept and test it on the scale-testing instance. A crude hack is therefore fine: as long as it tells us whether blaming large, old files works better than before, we're good.
  • The API now has a new route: `:repo/glob/-/stream-blame/:path`.
    • It returns JSON Lines rather than a single JSON document, to accommodate streaming the responses.
    • It looks like this: [{hunk}, {hunk}]\n[{hunk, ...}]
      • The number of hunks in each batch is not fixed.
      • There are a few differences from the GraphQL structure, which I pasted below for your convenience. We can work out whether we want to clean things up once we have validated the POC.
      • Yes, the (Start|End)Byte fields are 0, which is expected: it is a consequence of using --incremental with git blame. This should not be a problem, as we don't need that data to display things in the UI.
[{"StartLine":110,"EndLine":111,"StartByte":0,"EndByte":0,"CommitID":"599b6b513a72fe233b15d2f3388c349545d7fa49","Author":{"Name":"Stephen Gutekanst","Email":"slimsag@users.noreply.github.com","Date":"2018-10-01T22:10:43Z"},"Message":"docs: add CONTRIBUTING + open-source FAQ (#178)","Filename":"README.md"}]
[{"StartLine":95,"EndLine":96,"StartByte":0,"EndByte":0,"CommitID":"9be8ad00d0795756fc330b70749dccea80505a75","Author":{"Name":"Beyang Liu","Email":"beyang@sourcegraph.com","Date":"2018-10-01T19:04:55Z"},"Message":"update README","Filename":"README.md"}]
[{"StartLine":100,"EndLine":102,"StartByte":0,"EndByte":0,"CommitID":"9be8ad00d0795756fc330b70749dccea80505a75","Author":{"Name":"Beyang Liu","Email":"beyang@sourcegraph.com","Date":"2018-10-01T19:04:55Z"},"Message":"update README","Filename":"README.md"}]
[{"StartLine":103,"EndLine":104,"StartByte":0,"EndByte":0,"CommitID":"02a03f54c15c3f86fcec2aed2f2b00adae5cd237","Author":{"Name":"Sourcegraph","Email":"hi@sourcegraph.com","Date":"2018-10-01T06:08:12Z"},"Message":"Publish Sourcegraph as open source 🚀","Filename":"README.md"}]

You can test this for yourself by running the following curl command:

curl 'https://sourcegraph.test:3443/github.com/sourcegraph/sourcegraph/-/stream-blame/README.md' -H 'authority: sourcegraph.test:3443' -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' -H 'cookie: PASTE_FROM_AN_EXISTING_SESSION' -H 'origin: https://sourcegraph.test:3443' -H 'referer: https://sourcegraph.test:3443/github.com/stretchr/testify/-/blob/_codegen/main.go?subtree=true' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36' -v

As a reference, here is what we're sending back through the normal graphql call:

{
  "data": {
    "repository": {
      "commit": {
        "blob": {
          "blame": [
            {
              "startLine": 16,
              "endLine": 17,
              "author": {
                "person": {
                  "email": "frenchben@docker.com",
                  "displayName": "French Ben",
                  "user": null
                },
                "date": "2018-01-31T14:01:02Z"
              },
              "message": "Update error output to be clean",
              "rev": "bbca6551549492486ca1b0f8dee45553dd6aa6d7",
              "commit": {
                "url": "/github.com/hashicorp/go-multierror/-/commit/bbca6551549492486ca1b0f8dee45553dd6aa6d7"
              }
            },
           // ...
          ]
        }
      }
    }
  }
}

🤝 @philipp-spiess if you need anything from me regarding this, just reach out. I'll make myself available so we can progress on this asap 🙏.

if _, err = w.Write(encoded); err != nil {
	return err
}
flusher.Flush()
Member

I expect that flushing here is not desirable since flushing is a blocking operation that will likely be slower than just marshaling the next chunk. The OS flushing is almost definitely good enough.

Contributor Author

This is copied from the streaming-search server-sent events write: https://github.com/sourcegraph/sourcegraph/blob/3033610b103e82904dfa7eabd33e1494c8520d3a/internal/search/streaming/http/writer.go#L100 They flush after every event/data.

Member

Yes, but we wrap that writer in a buffer, which collects events until we get to 32KB payloads

Contributor Author

Ah, interesting. But that's not the OS flushing then, right?

Member

Yes, that's right. We don't use OS flushing because we buffer JSON entries, then build the JSON array on flush, so it's not actually raw bytes we're buffering.

@philipp-spiess
Contributor

@jhchabran @mrnugget

https://github.com/sourcegraph/sourcegraph/pull/44476 😎

Most of the stuff here is pretty straightforward. I decided to add an early flush once we have received the first 50 hunks and emit those so we can have a fast initial render, and then wait for the remainder of the hunks to come through before doing the final flush.

This is something we can tweak later on though!

@mrnugget
Contributor Author

It looks like the two of you already talked about this, but just so it's recorded here, here's what I sent to JH yesterday:

@philipp-spiess philipp-spiess requested a review from a team November 22, 2022 12:08
Co-authored-by: Taras Yemets <yemets.taras@gmail.com>
Contributor

@taras-yemets taras-yemets left a comment

Frontend changes look good to me.
Checked out the branch and tested on the CodeMirror blob view - it works 🚀
Great work 👍🏻

Comment on lines +709 to +711
if err := checkSpecArgSafety(string(opt.NewestCommit)); err != nil {
	return nil, err
}
Contributor

@evict evict Nov 22, 2022

It's usually OK to check this arg for safety, but not strictly necessary because we filter potentially malicious arguments in IsAllowedGitCmd.

@jhchabran
Contributor

@philipp-spiess could you deal with the conflicts on the tsx files so we can merge this 🙏 ?

base.Path("/lsif/upload").Methods("POST").Name(LSIFUpload)
base.Path("/search/stream").Methods("GET").Name(SearchStream)
base.Path("/compute/stream").Methods("GET", "POST").Name(ComputeStream)

Contributor

nit: Why the extra spaces?

Contributor

nit: forgot about it and brain didn't register when looking at my own code

w.WriteHeader(http.StatusInternalServerError)
return
}
if gitdomain.IsRepoNotExist(err) {
Contributor

That error implements NotFound so will be caught by errcode.IsNotFound(err) below.

Comment on lines 93 to 95
if strings.HasPrefix(requestedPath, "/") {
	requestedPath = strings.TrimLeft(requestedPath, "/")
}
Contributor

Suggested change
- if strings.HasPrefix(requestedPath, "/") {
- 	requestedPath = strings.TrimLeft(requestedPath, "/")
- }
+ requestedPath = strings.TrimPrefix(requestedPath, "/")

"golang.org/x/sync/errgroup"
"golang.org/x/sync/semaphore"

"github.com/sourcegraph/sourcegraph/internal/actor"
Contributor

nit: Should be grouped at the bottom with other /sourcegraph imports

func streamBlameFileCmd(ctx context.Context, checker authz.SubRepoPermissionChecker, repo api.RepoName, path string, opt *BlameOptions, command gitCommandFunc) (HunkReader, error) {
	a := actor.FromContext(ctx)
	if hasAccess, err := authz.FilterActorPath(ctx, checker, a, repo, path); err != nil || !hasAccess {
		return nil, err
Contributor

When hasAccess is false we may return nil, nil here which may break things higher up.

I'd suggest returning a specific Not Authorized error

Contributor

Fixed, thank you for catching this 🙏

//
// Because we do not control when p.Read is called, we have to account for
// the context being cancelled, to avoid leaking the goroutine running p.parse.
func (p hunkParser) parse(ctx context.Context) {
Contributor

I think this is safe in this case, but still strange to see a non pointer receiver on a relatively complex type. There's a chance that this leads to bugs later if new fields are added that need to be mutated.
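The risk described here is easy to demonstrate. A method with a value receiver operates on a copy of the struct, so any field it mutates is silently lost once the method returns. The `parser` type and its `seen` counter below are hypothetical and are not fields of the `hunkParser` in this PR.

```go
package main

import "fmt"

// parser is a hypothetical type with a field that a later change might want to
// mutate from a method.
type parser struct {
	seen int
}

// countValue has a value receiver: it increments a copy, so the caller's
// parser is unchanged.
func (p parser) countValue() { p.seen++ }

// countPointer has a pointer receiver: it increments the caller's parser.
func (p *parser) countPointer() { p.seen++ }

func main() {
	p := parser{}
	p.countValue()
	fmt.Println(p.seen) // prints 0
	p.countPointer()
	fmt.Println(p.seen) // prints 1
}
```

A value receiver stays safe only while the struct holds nothing but reference-like fields (channels, pointers, readers) that are never reassigned, which may be why it works in the current code.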

@philipp-spiess
Contributor

@philipp-spiess could you deal with the conflicts on the tsx files so we can merge this 🙏 ?

@jhchabran done!

@jhchabran jhchabran requested a review from ryanslade November 24, 2022 10:52
Contributor

@sashaostrikov sashaostrikov left a comment

LGTM from gitserver perspective!
(but build fails due to error in ts file)

@coury-clark coury-clark merged commit 4a6378e into main Nov 24, 2022
@coury-clark coury-clark deleted the mrn+jh/streaming-git-blame branch November 24, 2022 15:10
philipp-spiess added a commit that referenced this pull request Nov 24, 2022
Co-authored-by: Jean-Hadrien Chabran <jh@chabran.fr>
Co-authored-by: Philipp Spiess <hello@philippspiess.com>
Co-authored-by: Taras Yemets <yemets.taras@gmail.com>
Labels
cla-signed team/code-exploration Issues owned by the Code Exploration team