Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let enry decide the filetype unconditionally for code highlighting #56559

Merged
merged 5 commits into from Sep 19, 2023

Conversation

SuperAuguste
Copy link
Contributor

@SuperAuguste SuperAuguste commented Sep 12, 2023

Closes #56372

Test plan

Add tests.

@cla-bot cla-bot bot added the cla-signed label Sep 12, 2023
@SuperAuguste SuperAuguste marked this pull request as ready for review September 18, 2023 14:18
@sourcegraph-bot
Copy link
Contributor

sourcegraph-bot commented Sep 18, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff 82df680...e6b4430.

Notify File(s)
@efritz lib/codeintel/languages/BUILD.bazel
lib/codeintel/languages/languages.go

lang = enry.GetLanguage(path, []byte(contents))

c := contents
// classifier is faster on small files without losing much accuracy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate comment suggestion: Set a soft upper limit on the time spent to determine the language, as this code path is triggered for every syntax highlighting request.

(Optional) Additionally, the number 2048 seems arbitrary; could you roughly benchmark the time spent for some large C++ files? It would be nice to know much time this takes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, add a comment that this number shouldn't be reduced below 755 as the Apache License text has that many characters.

(Otherwise, for .h header files with Apache License headers, we may get confused between C and C++)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I just copied this code from zoekt so I'm not sure what logic they used, but I can look into it if you like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it particularly matters what exact reasoning Zoekt used; this code snippet is small enough that we can independently determine what a good limit should be.

@varungandhi-src
Copy link
Contributor

This looks directionally correct, I've left some minor comments to make things more idiomatic and clearer in terms of why certain decisions were made and what some potential footguns are.

Also, you forgot to add a Test Plan in the PR description, it can be as simple as "Added new unit tests"

@SuperAuguste
Copy link
Contributor Author

Thanks @varungandhi-src, this helps a lot! :)

@SuperAuguste SuperAuguste force-pushed the auguste/enry-highlighting-lang-detection branch 2 times, most recently from 967ddca to 489939e Compare September 18, 2023 17:30
@SuperAuguste SuperAuguste force-pushed the auguste/enry-highlighting-lang-detection branch from 489939e to 7c8d961 Compare September 18, 2023 21:08
@SuperAuguste SuperAuguste force-pushed the auguste/enry-highlighting-lang-detection branch from 7c8d961 to 78fc751 Compare September 19, 2023 14:07
@SuperAuguste SuperAuguste force-pushed the auguste/enry-highlighting-lang-detection branch from 78fc751 to 1ef0eb2 Compare September 19, 2023 15:26
@SuperAuguste SuperAuguste enabled auto-merge (squash) September 19, 2023 15:43
@SuperAuguste SuperAuguste merged commit 59ed6a1 into main Sep 19, 2023
9 of 10 checks passed
@SuperAuguste SuperAuguste deleted the auguste/enry-highlighting-lang-detection branch September 19, 2023 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consistently highlight Matlab files using the Matlab language grammar
3 participants