client: Avoid complex tokenization in ref panel code #58954

varungandhi-src · 2023-12-13T09:55:47Z

Previously, we relied on detecting the language from file paths,
then using various regexes associated with that language to identify
token boundaries. However, the CodeMirror blob view always provides
a full token range, which can be used directly instead of
recomputing the token boundaries.

For older URLs, we fall back to simple identifiers, which should
work for the vast majority of languages and identifiers.
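A minimal sketch of the two paths described above, assuming hypothetical `Position`/`Range` shapes (0-based, exclusive end) and a hypothetical `tokenAt` helper; these are not the actual Sourcegraph types or functions. When the blob view supplies a token range, the token is sliced directly; otherwise it is expanded around the position using a simple ASCII identifier pattern.

```typescript
// Hypothetical shapes for illustration; not the actual Sourcegraph types.
interface Position {
    line: number // 0-based line index
    character: number // 0-based UTF-16 offset within the line
}

interface Range {
    start: Position
    end: Position // exclusive
}

// "Simple identifier" characters: ASCII letters, digits, underscore.
const IDENTIFIER_CHAR = /[A-Za-z0-9_]/

/**
 * Returns the token at a position. If a single-line token range is given
 * (as the CodeMirror blob view provides), slice it directly; otherwise
 * expand left and right from the position over identifier characters.
 */
function tokenAt(lines: string[], position: Position, range?: Range): string | undefined {
    if (range && range.start.line === range.end.line) {
        const text = lines[range.start.line]
        return text?.slice(range.start.character, range.end.character)
    }
    const text = lines[position.line]
    if (text === undefined || !IDENTIFIER_CHAR.test(text.charAt(position.character))) {
        return undefined
    }
    let start = position.character
    let end = position.character + 1
    while (start > 0 && IDENTIFIER_CHAR.test(text.charAt(start - 1))) {
        start--
    }
    while (end < text.length && IDENTIFIER_CHAR.test(text.charAt(end))) {
        end++
    }
    return text.slice(start, end)
}
```

The fallback trades precision for simplicity: identifiers containing `$`, `-`, or non-ASCII letters would be split. That is acceptable only because it is used for older URLs that lack a range.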

We cannot yet remove the language detection here because the file
extensions associated with the language are later used for search-based
code navigation.

This patch also makes the language spec optional for search-based
code intel, since we do not have a solution to #56376 that would
guarantee a language is always available. If no language is
available, search-based code intel falls back to searching
other files with the same extension as a best-effort guess.
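A rough sketch of that fallback, assuming a hypothetical, simplified `LanguageSpec` with a `fileExts` list and a regex-style `file:` filter in the search query; the real spec in `client/shared/src/languages.ts` and the real query construction may differ.

```typescript
// Hypothetical, simplified language spec for illustration.
interface LanguageSpec {
    languageID: string
    fileExts: string[]
}

// Returns the extension of a path without the dot, or undefined
// (dotfiles such as ".bashrc" are treated as having no extension).
function extensionOf(path: string): string | undefined {
    const base = path.slice(path.lastIndexOf('/') + 1)
    const dot = base.lastIndexOf('.')
    return dot > 0 ? base.slice(dot + 1) : undefined
}

/**
 * With a language spec, search all of its extensions; without one,
 * fall back to the current file's own extension as a best-effort guess.
 */
function extensionsToSearch(path: string, spec?: LanguageSpec): string[] {
    if (spec) {
        return spec.fileExts
    }
    const ext = extensionOf(path)
    return ext ? [ext] : []
}

// Builds a regex-style file filter such as `file:\.(c|h)$` (assumed syntax).
function fileFilter(exts: string[]): string {
    if (exts.length === 0) {
        return ''
    }
    const escaped = exts.map(ext => ext.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    return `file:\\.(${escaped.join('|')})$`
}
```

One consequence of matching on extension alone is that languages sharing an extension, such as MATLAB and Objective-C with `.m`, cannot be told apart.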

Fixes https://github.com/sourcegraph/sourcegraph/issues/58548

Test plan

Tested locally with MATLAB code. The ref panel now shows up correctly,
instead of the error reported in #58548.


varungandhi-src · 2023-12-14T04:23:23Z

There is an off-by-one error for the end of the span.

Also, weirdly enough, we're not picking up results in Mathematica files but only in MATLAB and Objective-C files. 🤔
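The thread does not show the exact cause of the off-by-one, so the sketch below only illustrates the common form of this bug. It assumes a hypothetical `Span` whose `end` is inclusive, while `String.prototype.slice(start, end)` excludes the character at `end`.

```typescript
// Illustrative only; `Span` is a hypothetical type with an inclusive end.
interface Span {
    start: number // 0-based offset of the first character
    end: number // 0-based offset of the last character (inclusive)
}

// Buggy: passes the inclusive end straight to slice, dropping the last character.
function sliceBuggy(text: string, span: Span): string {
    return text.slice(span.start, span.end)
}

// Fixed: convert the inclusive end to slice's exclusive end.
function sliceFixed(text: string, span: Span): string {
    return text.slice(span.start, span.end + 1)
}
```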

client/shared/src/languages.ts

fkling · 2023-12-14T11:07:47Z

client/web/src/codeintel/ReferencesPanel.tsx

+interface OneBasedPosition {
+    line: number
+    character: number
+}


I've also run into quite a few issues with 0-based vs 1-based positions. I think it's useful to have separate types to express intent, but note that the type checker won't prevent you from passing a ZeroBasedPosition as a OneBasedPosition, because TS uses structural typing.
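A small sketch of the problem and one common workaround, a type-level "brand". `ZeroBasedPosition` and `OneBasedPosition` mirror the names in the thread; `Branded`, `oneBased`, `toZeroBased`, and the display helpers are hypothetical.

```typescript
// Identically shaped interfaces are interchangeable under structural typing.
interface ZeroBasedPosition {
    line: number
    character: number
}
interface OneBasedPosition {
    line: number
    character: number
}

function toDisplayString(pos: OneBasedPosition): string {
    return `${pos.line}:${pos.character}`
}

const zero: ZeroBasedPosition = { line: 0, character: 4 }
toDisplayString(zero) // compiles, even though the intent is violated

// Workaround: a brand property that exists only at the type level.
type Branded<T, B extends string> = T & { readonly __brand: B }
type OneBased = Branded<{ line: number; character: number }, 'OneBased'>
type ZeroBased = Branded<{ line: number; character: number }, 'ZeroBased'>

function oneBased(line: number, character: number): OneBased {
    return { line, character } as OneBased
}

function toZeroBased(pos: OneBased): ZeroBased {
    return { line: pos.line - 1, character: pos.character - 1 } as ZeroBased
}

function displayBranded(pos: OneBased): string {
    return `${pos.line}:${pos.character}`
}

// displayBranded(toZeroBased(oneBased(1, 5))) // type error: brands differ
```

Brands cost nothing at runtime, but every construction site needs a cast, so they work best behind a few constructor functions.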

Do you think I should use classes here? I'd be happy to introduce new vocabulary types for Positions and Ranges in a central place that can be reused elsewhere.
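A sketch of what class-based vocabulary types could look like, with hypothetical names and only positions shown (ranges would follow the same pattern). Plain classes are still compared structurally, but a class with a `private` member is only compatible with instances whose private member comes from the same declaration, so the two classes below cannot be mixed up.

```typescript
class OneBasedPosition {
    // A private member makes this class incompatible with look-alike types.
    private readonly base = 1
    readonly line: number
    readonly character: number

    constructor(line: number, character: number) {
        if (line < 1 || character < 1) {
            throw new RangeError(`not a 1-based position: ${line}:${character}`)
        }
        this.line = line
        this.character = character
    }

    toZeroBased(): ZeroBasedPosition {
        return new ZeroBasedPosition(this.line - 1, this.character - 1)
    }

    toString(): string {
        return `${this.line}:${this.character} (${this.base}-based)`
    }
}

class ZeroBasedPosition {
    private readonly base = 0
    readonly line: number
    readonly character: number

    constructor(line: number, character: number) {
        if (line < 0 || character < 0) {
            throw new RangeError(`not a 0-based position: ${line}:${character}`)
        }
        this.line = line
        this.character = character
    }

    toOneBased(): OneBasedPosition {
        return new OneBasedPosition(this.line + 1, this.character + 1)
    }

    toString(): string {
        return `${this.line}:${this.character} (${this.base}-based)`
    }
}
```

Compared with brands, classes also give a natural home for validation and conversion methods, at the cost of allocating objects instead of passing plain literals.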

OK, let me attempt to do that in a follow-up PR. Thanks for flagging this; I forgot that interfaces are structurally typed.

varungandhi-src · 2023-12-14T13:46:27Z

Fixed the off-by-one issue


…58954) (#59636)

* client: Minor cleanup for search-based code intel (#58331)

  The separation of the logic into different functions makes it clearer what the order of searches is. It also makes it clearer that, for some reason, we're only using the locals information from the SCIP Document for 'Find references', and not for 'Go to definition'. Using the SCIP Document for 'Go to definition' too could avoid a network request. (cherry-picked from e955cddec490d0cc2b5eba36be2ec4958ba06bf8)

* client: Avoid complex tokenization in ref panel code (#58954)

  Previously, we relied on detecting the language from file paths, then using various regexes associated with that language to identify token boundaries. However, the CodeMirror blob view always provides a full token range, which can be used directly instead of recomputing the token boundaries. For older URLs, we fall back to simple identifiers, which should work for the vast majority of languages and identifiers. We cannot yet remove the language detection here because the file extensions associated with the language are later used for search-based code navigation. This patch also makes the language spec optional for search-based code intel, since we do not have a solution to #56376 that would guarantee a language is always available. If no language is available, search-based code intel falls back to searching other files with the same extension as a best-effort guess. Tested locally with MATLAB code. The ref panel now shows up correctly, instead of the earlier error. (cherry-picked from c42cad2)

* Fix lint error due to short variable name
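The first commit notes that 'Go to definition' does not use the SCIP Document's locals. A minimal sketch of that idea, using simplified, hypothetical stand-ins for SCIP's `Document` and `Occurrence` messages (not the real bindings): in SCIP, document-local symbols are written as `local <id>` and the definition role is the `0x1` bit of `symbolRoles`, so a local definition can be found without a network request.

```typescript
// Simplified, hypothetical stand-ins for SCIP's Occurrence/Document messages.
interface Occurrence {
    range: number[] // [startLine, startChar, endChar] or [startLine, startChar, endLine, endChar]
    symbol: string
    symbolRoles: number // bitset; 0x1 marks a definition
}

interface ScipDocument {
    occurrences: Occurrence[]
}

const DEFINITION_ROLE = 0x1

// Document-local SCIP symbols have the form "local <id>".
function isLocalSymbol(symbol: string): boolean {
    return symbol.startsWith('local ')
}

/**
 * Finds a local symbol's definition within the same document. Returns
 * undefined for non-local symbols so the caller can fall back to search.
 */
function findLocalDefinition(doc: ScipDocument, symbol: string): Occurrence | undefined {
    if (!isLocalSymbol(symbol)) {
        return undefined
    }
    return doc.occurrences.find(
        occ => occ.symbol === symbol && (occ.symbolRoles & DEFINITION_ROLE) !== 0
    )
}

// Hypothetical ordering: consult the document first, then a (network) search.
async function goToDefinition(
    doc: ScipDocument,
    symbol: string,
    search: (symbol: string) => Promise<Occurrence[]>
): Promise<Occurrence[]> {
    const local = findLocalDefinition(doc, symbol)
    return local ? [local] : search(symbol)
}
```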

cla-bot added the cla-signed label Dec 13, 2023

varungandhi-src requested review from fkling and mrnugget December 13, 2023 09:55

varungandhi-src force-pushed the vg/fix-token-issue branch from 59cfc99 to 5186cc6 December 13, 2023 12:24

fkling approved these changes Dec 14, 2023


varungandhi-src added 2 commits December 14, 2023 17:32

cleanup: Deprecate bad APIs

b7ba6be

fix: Fix off-by-one error in slicing

604f9cf

varungandhi-src enabled auto-merge (squash) December 14, 2023 13:48

varungandhi-src merged commit c42cad2 into main Dec 14, 2023

varungandhi-src deleted the vg/fix-token-issue branch December 14, 2023 13:53

varungandhi-src mentioned this pull request Jan 16, 2024

vg/matlab fix #59635

Closed
