Add String.ToLower() and String.ToUpper() #143

caesay · 2024-02-16T21:14:29Z

I think there might be something wonky with the C implementation, so you might want to check that before merging. I can't get the tests to run properly on my machine as of yet.

pfusik

Thank you for your contribution!
My main concern is that the C and C++ implementations are not Unicode-aware. For C, there are GLib functions. C++ probably needs some external dependency.
Also, please run make before committing (libfut.js is missing) and document in doc/reference.md.

GenC.fu

pfusik · 2024-02-17T08:29:48Z

GenCpp.fu

+			obj.Accept(this, FuPriority.Argument);
+			Write("; ");
+			Write("std::transform(data.begin(), data.end(), data.begin(), ");
+			Write("[](unsigned char c) { return std::tolower(c); }); ");


This is not Unicode-aware.
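
To illustrate the concern, here is a minimal sketch of the same byte-wise approach applied to UTF-8 input. `byteWiseToLower` is a hypothetical helper written for this example, not code from the PR, and it assumes the program runs in the default "C" locale, where `std::tolower` only maps `A`-`Z`.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper mirroring the byte-wise std::transform approach.
// Assumption: the default "C" locale is active, so only ASCII letters change.
std::string byteWiseToLower(std::string data)
{
	std::transform(data.begin(), data.end(), data.begin(),
		[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
	return data;
}
```

With this helper, `byteWiseToLower("ABC")` yields `"abc"`, but the UTF-8 bytes `C3 84` of "Ä" pass through unchanged instead of becoming `C3 A4` ("ä"), because each byte is treated as a separate character.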

test/StringLower.fu

test/StringUpper.fu

caesay · 2024-02-17T13:21:55Z

I have incorporated your suggestions, and rebased to 8edec9b. The only thing I couldn't do is fix the unicode C++ support. I would suggest that we include https://github.com/sheredom/utf8.h in the hpp output (similar to how I suggested outputting subprocess.h and json.h) and use utf8lwr etc, but there's no precedent for how to do this in fut at the moment. I guess we need to store it as a string somewhere and include it if necessary.

codecov · 2024-02-17T13:25:30Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8edec9b) 96.66% compared to head (f28c9ee) 96.67%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #143   +/-   ##
=======================================
  Coverage   96.66%   96.67%           
=======================================
  Files           2        2           
  Lines       17256    17306   +50     
=======================================
+ Hits        16680    16730   +50     
  Misses        494      494           
  Partials       82       82


pfusik

LGTM, modulo C++ which I'll handle myself. Thanks!

pfusik · 2024-02-19T10:04:33Z

> I would suggest that we include https://github.com/sheredom/utf8.h

I'm no expert in Unicode, but https://github.com/sheredom/utf8.h/blob/2aa5709fe39c66d2868c0d52d42788899b90dc92/utf8.h#L1338 looks too simple to handle https://www.unicode.org/Public/15.1.0/ucd/CaseFolding.txt
I have more trust in ICU.

caesay · 2024-02-19T10:35:13Z

I tried utf8.h yesterday and agree; I ran into some issues with it. ICU looks like a very big and complex dependency to include for a library, I feel. You have to link in its static libs as well as bring in its header files. A lot of the appeal of Fusion for me comes from the fact that the output is very standalone and can just be dropped into another project as a source file. Also, afaik vcpkg currently only supports source-only libraries, so making a library with Fusion and distributing it via vcpkg will only be possible without ICU.

Headers like json.h or subprocess.h are great because they can be prepended to the header that Fusion produces. I saw in a Stack Overflow post that Boost's Unicode string library is header-only, but it is just an abstraction and relies on a configurable backend.

If we included the Boost.Locale header, we could by default use the WinAPI backend on Windows and the POSIX backend elsewhere. Since Boost also supports ICU as a backend, we could let the consumer of the library optionally rely on ICU instead. This keeps our Fusion output fairly portable but also allows for "proper" Unicode support at the consumer's discretion.

caesay · 2024-02-19T11:06:08Z

Also see https://www.boost.org/doc/libs/1_54_0/libs/locale/doc/html/using_localization_backends.html

pfusik · 2024-02-19T12:24:26Z

I have no experience with Boost, ICU or vcpkg. I made a decision based on https://stackoverflow.com/a/24063783/2032514
Unicode case manipulation is no trivial topic and I don't want to reinvent the wheel by maintaining my own implementation.

I got ICU on Windows with:

pacman -S mingw-w64-x86_64-icu

I suppose it's not harder on Linux or macOS. Doesn't vcpkg provide the ICU package?

caesay · 2024-02-19T17:03:58Z

Speaking of proper Unicode support: I looked at the link you posted, which also mentions that std::string::substr is unaware of UTF-8 multi-byte sequences, so it could end up cutting a character in half. This is not consistent with the C# behavior: C# uses UTF-16, so Substring always acts on characters, not bytes.
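
A small sketch of the substr concern, assuming well-formed UTF-8 input and an in-range `offset + length`. Both function names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuationByte(unsigned char b)
{
	return (b & 0xC0) == 0x80;
}

// Hypothetical check: would s.substr(offset, length) begin or end
// in the middle of a multi-byte UTF-8 sequence?
bool substrSplitsSequence(const std::string &s, std::size_t offset, std::size_t length)
{
	std::size_t end = offset + length;
	bool startsInside = offset < s.size() && isContinuationByte(s[offset]);
	bool endsInside = end < s.size() && isContinuationByte(s[end]);
	return startsInside || endsInside;
}
```

For the three-byte string "éx" (`C3 A9 78`), `substr(0, 1)` returns the lone lead byte `C3`, and `substrSplitsSequence(s, 0, 1)` reports the split, while `substr(0, 2)` keeps "é" intact.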

pfusik · 2024-02-19T17:31:15Z

In Fusion it's not "characters", but code units. Just as in UTF-8 one code point can be stored in several bytes, in UTF-16 one code point can be stored as two values (surrogate pair).
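
The code unit versus code point distinction can be sketched as follows. The two counting functions are hypothetical helpers for this example and assume well-formed input; they are not part of Fusion or its generated code.

```cpp
#include <cstddef>
#include <string>

// Counts code points in well-formed UTF-8 by skipping continuation bytes.
std::size_t countCodePointsUtf8(const std::string &s)
{
	std::size_t n = 0;
	for (unsigned char b : s)
		if ((b & 0xC0) != 0x80)
			n++;
	return n;
}

// Counts code points in well-formed UTF-16 by skipping low surrogates.
std::size_t countCodePointsUtf16(const std::u16string &s)
{
	std::size_t n = 0;
	for (char16_t u : s)
		if (u < 0xDC00 || u > 0xDFFF)
			n++;
	return n;
}
```

For U+1F600, the UTF-8 string `"\xF0\x9F\x98\x80"` has four code units and the UTF-16 string `u"\U0001F600"` has two, yet both helpers count a single code point. Length and substring operations that work on code units can therefore split a code point in either encoding.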

pfusik requested changes Feb 17, 2024

pfusik added the enhancement New feature or request label Feb 17, 2024

pfusik reviewed Feb 17, 2024

test/StringUpper.fu (outdated, resolved)

caesay force-pushed the cs/str-upper-lower branch from 0516754 to 91de3e8 Compare February 17, 2024 13:15

Add String.ToLower() and String.ToUpper()

f28c9ee

caesay force-pushed the cs/str-upper-lower branch from 91de3e8 to f28c9ee Compare February 17, 2024 13:19

pfusik approved these changes Feb 19, 2024

pfusik merged commit 0c75701 into fusionlanguage:master Feb 19, 2024
2 of 3 checks passed

pfusik added a commit that referenced this pull request Feb 19, 2024

[string] string.ToLower, string.ToUpper in C++ using ICU.

82c353d

pfusik added a commit that referenced this pull request Feb 19, 2024

[doc] string.ToLower, string.ToUpper.

0551a1e

pfusik added a commit that referenced this pull request Feb 22, 2024

[test] ICU.

9107a5c
