Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(prisma-fmt): use UTF-16 offset in the response for the schema that contains multi-byte characters #4815

Merged
merged 20 commits into from
Jun 18, 2024

Conversation

key-moon
Copy link
Contributor

@key-moon key-moon commented Apr 6, 2024

The current Prisma's LSP cannot handle schemas containing multibyte characters correctly.

image
image

This is because the offsets returned by prisma-fmt are calculated based on the number of bytes in UTF-8 encoding. In the LSP protocol, text offsets should be represented by the length in UTF-16 unless otherwise specified.

This pull request includes changes to the offset_to_position and position_to_offset functions, as well as the implementation of the offset_to_lsp_offset function and its usage within lint::run.

This fixes the above issue.
image

fixes prisma/language-tools#1308

@key-moon key-moon requested a review from a team as a code owner April 6, 2024 12:07
@key-moon key-moon requested review from jkomyno and removed request for a team April 6, 2024 12:07
@CLAassistant
Copy link

CLAassistant commented Apr 6, 2024

CLA assistant check
All committers have signed the CLA.

@key-moon key-moon changed the title fix(prisma-fmt): use UTF-16 offset in the response for the schema that contains multi-byte charactors fix(prisma-fmt): use UTF-16 offset in the response for the schema that contains multi-byte characters Apr 6, 2024
Copy link

codspeed-hq bot commented Apr 8, 2024

CodSpeed Performance Report

Merging #4815 will not alter performance

Comparing key-moon:fix-multibyte (3932eaa) with main (27c0eb3)

Summary

✅ 11 untouched benchmarks

@key-moon
Copy link
Contributor Author

key-moon commented Apr 8, 2024

There are warnings about dead_code for many functions in offset.rs. This is due to the fact that most functions in offset.rs are only used in lib.rs. I have allowed dead_code in the whole file for a now, but if you have a better solution, I would appreciate suggestions.

@aqrln
Copy link
Member

aqrln commented Apr 8, 2024

@key-moon this happens because the module is included twice in both the library crate and the binary crate. I see that some existing modules in this package follow this pattern too but it's not really the best way to do it. Aside from leading to problems like the one you encountered, it will also lead to compiling the same code twice and potentially bloating the binary (for us maybe there's no binary size impact because -Os collapses identical functions, but it will affect the compilation speed regardless).

The way it's normally done in Rust is you only include such modules in the library crate (i.e. in lib.rs) and export what you need to use in the binary crate. In other words, if you turn mod offsets; in lib.rs into pub mod offsets; and remove mod offsets; in main.rs, you can use the prisma_fmt::offsets module anywhere in the binary crate (note that we refer to prisma_fmt as an external crate here even though both crates are in the same package with the same cargo manifest and name) and you won't have any dead or duplicate code.

@key-moon
Copy link
Contributor Author

key-moon commented Apr 8, 2024

Understood. I am new to Rust, so your help with the basics was very helpful.

While applying the suggested fix, I ran into a problem caused by the fact that lint uses offsets. Since lint is included by main and lib, two different ways of referencing the function, crate::offsets::offset_to_lsp_offset; and prisma_fmt::offsets::offset_to_lsp_offset; conflict.

This could be solved by not including lint in main and publishing the lint module as well. However, as this would be a relatively large change, I'm hesitate to make such a change.

Edit: I just pushed the fix that was described above. I'm not sure whether this fix was the right way to do it.

@key-moon
Copy link
Contributor Author

key-moon commented Apr 10, 2024

The error caused by the conflict has been resolved. It took some time to resolve, but now this PR is back to a problem-free state.
One thing I discovered in the process of fixing the error. The MiniError in lint::run does not seem to return information about which file the warning was emitted from. I believe this is an unintended behavior.

@key-moon
Copy link
Contributor Author

Is there anything you are having trouble with for the merge? If there is, I will work towards resolving it.

@key-moon
Copy link
Contributor Author

Any updates?

@aqrln aqrln requested a review from Druue May 27, 2024 17:23
@key-moon
Copy link
Contributor Author

I would like to highlight the importance, as it seems the significance of this pull request might not be fully understood. While this pull request may appear to address an odd bug involving strange prisma files with emojis for you, it is actually a highly troublesome bug for those of us using languages represented by multibyte characters. This bug prevents us from using basic features like red underlines and quick fixes when we use our native language in comments.

Could you please consider reviewing this pull request? Simply merging this PR will greatly improve the development experience for thousands of developers in regions where multibyte characters are used.

Copy link
Member

@aqrln aqrln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @key-moon, thanks for the PR and sorry it wasn't reviewed yet. I'm not too familiar with this part of the codebase but to me the changes look good and make sense.

I left a few comments: first of all, there are new changes present on main that are lost here because of moving the code and need to be integrated to offsets.rs — this is more critical, and secondly, there's a suggestion for better performance / algorithmic complexity — take a look if it makes sense but it's not critical.

prisma-fmt/src/offsets.rs Outdated Show resolved Hide resolved
prisma-fmt/src/offsets.rs Show resolved Hide resolved
Comment on lines 30 to 31
start: offset_to_lsp_offset(err.span().start, db.source(err.span().file_id)),
end: offset_to_lsp_offset(err.span().end, db.source(err.span().file_id)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a function that takes a span and returns a pair of LSP offsets to avoid traversing the document in O(n) twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented the span_to_range function. Additionally, I replaced a function with the same name and role in offset.rs(see https://github.com/prisma/prisma-engines/pull/4815/files#diff-557db087b8d611a0049284a4b38c817f1868a1dde965a9fa904c2dcaad576eb0R198). Since the order of arguments was different, I modified the usage accordingly. For consistency, I also adjusted the order of arguments for the range_after_span function. Since this is an internally used function, I believe there should be no issues.

@aqrln
Copy link
Member

aqrln commented Jun 16, 2024

@Druue this is the fix for prisma/language-tools#1308

@key-moon
Copy link
Contributor Author

I apologize for any urgency I may have caused. I really appreciate your prompt review. Additionally, I have replied to the comment that was made. Thank you very much.

@Druue Druue self-assigned this Jun 17, 2024
@Druue Druue added the PR: Bug A PR That Fixes a bug label Jun 17, 2024
@Druue Druue added this to the 5.16.0 milestone Jun 17, 2024
@Druue
Copy link
Contributor

Druue commented Jun 17, 2024

Hey! Thank you so much for the PR and sorry for the delay. I've pulled this on and will see about finding some time to review this :)

Some initial thoughts are if you can revert the function arg orderings to what they were to minimise the number of changes to read through

@key-moon
Copy link
Contributor Author

Thank you for reviewing!

Since the function signatures in offset.rs follows a similar pattern like (offset, document), it makes sense to implement the span_to_range function in offset.rs as span_to_range(span, document). The span_to_range function in code_actions.rs performs the exact same operation as the newly created span_to_range, except for the order of the arguments. Maintaining the original order of arguments would mean creating a wrapper span_to_range function to wrap the span_to_range function. To avoid this kind of code, I decided to change the order of the arguments.

There are only three places where these functions are used, so I don't think the review will be a significant effort.

@key-moon
Copy link
Contributor Author

I realized that just implementing span_to_range does not solve the problem with offset_to_lsp_offset mentioned above, so I implemented span_to_lsp_offsets. Since these two functions use almost the same logic for their calculations, I created a common function, offset_to_position_and_next_offset, for the part where the calculation resumes.

Additionally, with the implementation of span_to_lsp_offsets, offset_to_lsp_offset is no longer used in the code. Although it is currently retained with #[allow(dead_code)], it can be removed if it is no longer needed.

@Druue
Copy link
Contributor

Druue commented Jun 18, 2024

Since the function signatures in offset.rs follows a similar pattern like (offset, document), it makes sense to implement the span_to_range function in offset.rs as span_to_range(span, document).

We generally prefer to include contextual args first (e.g. the schema / document) and spans after / at the end.

The span_to_range function in code_actions.rs performs the exact same operation as the newly created span_to_range, except for the order of the arguments. Maintaining the original order of arguments would mean creating a wrapper span_to_range function to wrap the span_to_range function.

I genuinely don't know what you're talking about here. There is only one fn span_to_range, I don't see why a wrapper would be needed

I will say that this looks good though, thank you. I'm not going to block this on the above.

Screen.Recording.2024-06-18.at.13.36.09.mov

@Druue Druue merged commit 05b7e05 into prisma:main Jun 18, 2024
204 checks passed
@key-moon
Copy link
Contributor Author

Thank you for your review 🙇

I created the span_to_range function in offset.rs and removed it from code_actions.rs. My argument was that wrapper is needed if we want to retain both functions.

Once again, thank you for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: Bug A PR That Fixes a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spans and positions get shifted out of sync when schema includes multibyte characters
4 participants