Add String.ToLower() and String.ToUpper() #143
Thank you for your contribution!
My main concern is that the C and C++ implementations are not Unicode-aware. For C, there are GLib functions. C++ probably needs some external dependency.
Also, please run `make` before committing (`libfut.js` is missing) and document in `doc/reference.md`.
obj.Accept(this, FuPriority.Argument);
Write("; ");
Write("std::transform(data.begin(), data.end(), data.begin(), ");
Write("[](unsigned char c) { return std::tolower(c); }); ");
This is not Unicode-aware.
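To illustrate the concern, here is a minimal sketch of the byte-wise approach from the diff, wrapped in a hypothetical helper named `ascii_to_lower` (the name is an assumption, not part of the PR). It assumes the default "C" locale and UTF-8 encoded input. Under those assumptions only `A`-`Z` are mapped, so the two bytes of `É` (U+00C9, `C3 89`) pass through unchanged instead of becoming `é` (U+00E9, `C3 A9`).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper mirroring the std::transform call emitted by the diff.
// It lowers each byte independently, so it only handles ASCII letters
// when the program runs in the default "C" locale.
std::string ascii_to_lower(std::string data)
{
    std::transform(data.begin(), data.end(), data.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return data;
}

// "ÉCOLE" in UTF-8. The literal is split so that the hex escape
// does not swallow the following letter 'C' as a hex digit.
const std::string upper_utf8 = std::string("\xC3\x89") + "COLE";

// What the byte-wise helper produces: the leading "É" is left untouched.
const std::string bytewise_result = std::string("\xC3\x89") + "cole";

// What a Unicode-aware lowercase would produce: "école".
const std::string unicode_result = std::string("\xC3\xA9") + "cole";
```

Calling `ascii_to_lower(upper_utf8)` should yield `bytewise_result`, not `unicode_result`, which is the gap the review comment points at.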
I have incorporated your suggestions and rebased onto 8edec9b. The only thing I couldn't do is fix Unicode support for C++. I would suggest that we include https://github.com/sheredom/utf8.h in the hpp output (similar to how I suggested outputting subprocess.h and json.h) and use
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #143    +/-   ##
=======================================
  Coverage   96.66%  96.67%
=======================================
  Files           2       2
  Lines       17256   17306   +50
=======================================
+ Hits        16680   16730   +50
  Misses        494     494
  Partials       82      82
LGTM, modulo C++ which I'll handle myself. Thanks!
I'm no expert in Unicode, but https://github.com/sheredom/utf8.h/blob/2aa5709fe39c66d2868c0d52d42788899b90dc92/utf8.h#L1338 looks too simple to handle https://www.unicode.org/Public/15.1.0/ucd/CaseFolding.txt
I tried utf8.h yesterday and agree; I ran into some issues with it. ICU looks like a very big and complex dependency to include for a library, I feel. You have to link in its static libs as well as bring in its header files. A lot of the appeal of Fusion for me comes from the fact that the output is very standalone and can just be dropped into another project as a source file. Also, at the moment, afaik, vcpkg only supports source-only libraries, so making a library using Fusion and distributing it via vcpkg will only be possible without ICU. Headers like If we included the boost.locale header, by default we could use the
I have no experience with Boost, ICU or vcpkg. I made a decision based on https://stackoverflow.com/a/24063783/2032514 and got ICU on Windows with:
I suppose it's not harder on Linux or macOS. Doesn't
Speaking of proper Unicode support: I looked at that issue you linked, which also mentioned that std::string->substr is unaware of UTF-8 multi-byte sequences, so it could end up cutting a character in half. This is not consistent with C#, for example, which uses UTF-16, so Substring always acts on characters, not bytes.
In Fusion it's not "characters", but code units. Just as in UTF-8 one code point can be stored in several bytes, in UTF-16 one code point can be stored as two values (surrogate pair).
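The code unit versus code point distinction can be sketched in C++ as follows. This is only an illustration under stated assumptions: `count_code_points` is a hypothetical helper that assumes valid UTF-8 input, and the sample strings are made up for the example. It shows that `substr` on a UTF-8 `std::string` can end in the middle of a sequence, and that a UTF-16 string has the same problem with surrogate pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points by skipping UTF-8
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
std::size_t count_code_points(const std::string &s)
{
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80)
            n++;
    }
    return n;
}

// "aéb" in UTF-8: 'a', then the two-byte sequence C3 A9, then 'b'.
// Four code units (bytes), three code points.
const std::string utf8_sample = std::string("a") + "\xC3\xA9" + "b";

// Taking the first two code units keeps the lead byte C3
// but drops its continuation byte, leaving invalid UTF-8.
const std::string utf8_cut = utf8_sample.substr(0, 2);

// U+1F600 in UTF-16: one code point stored as two code units
// (high surrogate D83D, low surrogate DE00).
const std::u16string utf16_sample = u"\U0001F600";

// Taking the first code unit leaves a lone high surrogate.
const std::u16string utf16_cut = utf16_sample.substr(0, 1);
```

In both encodings the substring operation counts code units, so the only difference is how often a code point spans more than one unit.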
I think there might be something wonky with the C implementation, so you might want to check that before merging. I can't get the tests to run properly on my machine as of yet.