Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Conversation

varungandhi-src
Copy link
Contributor

@varungandhi-src varungandhi-src commented Dec 13, 2023

Previously, we relied on detecting the language from file paths,
then using various regexes associated with the language to identify
token boundaries. However, the code mirror blob view always provides
a full token range, which can be used directly, instead of attempting
to recompute the token boundaries.

For older URLs, we fallback to simple identifiers, which should
work for the vast majority of languages and identifiers.

We cannot yet remove the language detection here because the file
extensions associated with the language are later used for search-based
code navigation.

This patch also makes the language spec optional for search-based
code intel, as we do not have a solution to #56376 which would
guarantee that we always have a language available. If a language
is not available, search-based code intel falls back to searching
other files with the same extension as a best effort guess.

Fixes https://github.com/sourcegraph/sourcegraph/issues/58548

Test plan

Locally tested for MATLAB code. The ref panel shows up correctly,
unlike the error earlier in #58548

CleanShot 2023-12-13 at 15 25 17@2x

Previously, we relied on detecting the language from file paths,
then using various regexes associated with the language to identify
token boundaries. However, the code mirror blob view always provides
a full token range, which can be used directly, instead of attempting
to recompute the token boundaries.

For older URLs, we fallback to simple identifiers, which should
work for the vast majority of languages and identifiers.

We cannot yet remove the language detection here because the file
extensions associated with the language are later used for search-based
code navigation.
@varungandhi-src
Copy link
Contributor Author

There is an off-by-one error for the end of the span.

Also, weirdly enough, we're not picking up results in Mathematica files but only in MATLAB and Objective-C files. 🤔

Comment on lines +102 to +105
interface OneBasedPosition {
line: number
character: number
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also run into quite a few issues with 0-based vs 1-based positions. I think it's useful to have separate types to express intent, but just note that the type checker won't prevent you from passing a ZeroBasedPostition as a OneBasedPosition, because TS uses structural equivalence (or whatever it is called).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think I should use classes here? I'd be happy to introduce new vocabulary types for Positions and Ranges in a central place that can be reused elsewhere.

Copy link
Contributor Author

@varungandhi-src varungandhi-src Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let me attempt to do that in a follow-up PR. Thanks for flagging this, I forgot that interfaces have structural subtyping.

@varungandhi-src
Copy link
Contributor Author

Fixed the off-by-one issue

CleanShot 2023-12-14 at 19 16 09@2x

@varungandhi-src varungandhi-src enabled auto-merge (squash) December 14, 2023 13:48
@varungandhi-src varungandhi-src merged commit c42cad2 into main Dec 14, 2023
@varungandhi-src varungandhi-src deleted the vg/fix-token-issue branch December 14, 2023 13:53
varungandhi-src added a commit that referenced this pull request Jan 16, 2024
Previously, we relied on detecting the language from file paths,
then using various regexes associated with the language to identify
token boundaries. However, the code mirror blob view always provides
a full token range, which can be used directly, instead of attempting
to recompute the token boundaries.

For older URLs, we fallback to simple identifiers, which should
work for the vast majority of languages and identifiers.

We cannot yet remove the language detection here because the file
extensions associated with the language are later used for search-based
code navigation.

This patch also makes the language spec optional for search-based
code intel, as we do not have a solution to #56376 which would
guarantee that we always have a language available. If a language
is not available, search-based code intel falls back to searching
other files with the same extension as a best effort guess.

Locally tested for MATLAB code. The ref panel shows up correctly,
unlike the error earlier.

(cherry-picked from c42cad2)
@varungandhi-src varungandhi-src mentioned this pull request Jan 16, 2024
varungandhi-src added a commit that referenced this pull request Jan 17, 2024
Previously, we relied on detecting the language from file paths,
then using various regexes associated with the language to identify
token boundaries. However, the code mirror blob view always provides
a full token range, which can be used directly, instead of attempting
to recompute the token boundaries.

For older URLs, we fallback to simple identifiers, which should
work for the vast majority of languages and identifiers.

We cannot yet remove the language detection here because the file
extensions associated with the language are later used for search-based
code navigation.

This patch also makes the language spec optional for search-based
code intel, as we do not have a solution to #56376 which would
guarantee that we always have a language available. If a language
is not available, search-based code intel falls back to searching
other files with the same extension as a best effort guess.

Locally tested for MATLAB code. The ref panel shows up correctly,
unlike the error earlier.

(cherry-picked from c42cad2)
varungandhi-src added a commit that referenced this pull request Jan 23, 2024
Previously, we relied on detecting the language from file paths,
then using various regexes associated with the language to identify
token boundaries. However, the code mirror blob view always provides
a full token range, which can be used directly, instead of attempting
to recompute the token boundaries.

For older URLs, we fallback to simple identifiers, which should
work for the vast majority of languages and identifiers.

We cannot yet remove the language detection here because the file
extensions associated with the language are later used for search-based
code navigation.

This patch also makes the language spec optional for search-based
code intel, as we do not have a solution to #56376 which would
guarantee that we always have a language available. If a language
is not available, search-based code intel falls back to searching
other files with the same extension as a best effort guess.

Locally tested for MATLAB code. The ref panel shows up correctly,
unlike the error earlier.

(cherry-picked from c42cad2)
BolajiOlajide pushed a commit that referenced this pull request Jan 24, 2024
…58954) (#59636)

* client: Minor cleanup for search-based code intel (#58331)

The separation of the logic into different functions makes it clearer
what the order of searches is. It also makes it clearer that for some
reason, we're only using the locals information from the SCIP
Document for 'Find references', and not for 'Go to definition'.
Using the SCIP Document for for 'Go to definition' too could avoid
a network request.

(cherry-picked from e955cddec490d0cc2b5eba36be2ec4958ba06bf8)

* client: Avoid complex tokenization in ref panel code (#58954)

Previously, we relied on detecting the language from file paths,
then using various regexes associated with the language to identify
token boundaries. However, the code mirror blob view always provides
a full token range, which can be used directly, instead of attempting
to recompute the token boundaries.

For older URLs, we fallback to simple identifiers, which should
work for the vast majority of languages and identifiers.

We cannot yet remove the language detection here because the file
extensions associated with the language are later used for search-based
code navigation.

This patch also makes the language spec optional for search-based
code intel, as we do not have a solution to #56376 which would
guarantee that we always have a language available. If a language
is not available, search-based code intel falls back to searching
other files with the same extension as a best effort guess.

Locally tested for MATLAB code. The ref panel shows up correctly,
unlike the error earlier.

(cherry-picked from c42cad2)

* Fix lint error due to short variable name
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error 'could not find token' when doing Find references
2 participants