Spellout numbering #41
Comments
Thanks for the inquiry. This is a bit of a can of worms because .NET doesn't provide much support for extending formatters and parsers. In the ideal world, we would extend .NET to do this.
I had some recent experience with how the Java and .NET approaches differ when porting over the parsers from .NET to add the ability to parse Java-specific formats in J2N.Numerics. A few things of note:
Given the limitations in .NET formatters, it seems like it would be better to aim for extension methods to expose the APIs publicly on number types and provide an
It would generally be simpler to maintain the first approach as a line-by-line port from Java. But it comes with a pretty high performance cost. That being said, it is also a pretty big project to make a rules-based parser at the optimized level that the .NET runtime uses.
RuleBasedNumberFormat
What are your requirements? The
The Unfortunately, it is based on Java's number formatting syntax, which makes it a bit of an oddball in .NET. I also haven't worked out how to unpack the
Existing Options
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="IKVM.Maven.Sdk" Version="1.1.1" />
  </ItemGroup>
  <ItemGroup>
    <MavenReference Include="com.ibm.icu:icu4j" Version="60.1" />
  </ItemGroup>
</Project>

using System;
namespace ICU4JExperimentation
{
    internal class Program
    {
        static void Main(string[] args)
        {
            double num = 2718.28;
            var locale = java.util.Locale.ENGLISH;
            var nf = new com.ibm.icu.text.RuleBasedNumberFormat(locale, com.ibm.icu.text.RuleBasedNumberFormat.SPELLOUT);
            string formatted = nf.format(num);
            Console.WriteLine(formatted);
            var parsed = nf.parse(formatted);
            Console.WriteLine(parsed.doubleValue());
        }
    }
}

two thousand seven hundred eighteen point two eight
2718.28

As great as this seems, there are some drawbacks:
The Plan
Due to the limitations of IKVM, our plan is not to utilize it for Lucene.NET except for the Lucene.Net.Analysis.OpenNLP module, where there are currently no other good options in .NET. Although, we will probably use
As for the formatters, we have many compile warnings in work we did to support
Contributing
In light of the above, if you still wish to help to port
Funding
Yes, please. Given the number of useful tools I contribute to, I am surprised that there are not more people willing to kick a few dollars my way every month. Unfortunately, I am not great at self-promotion so the millions of package downloads are not translating into cash. We have had a bit of support from Microsoft and iText Software, but at present we have no major funding and it is really tough to work on this enough to get it done when I have to seek other work to pay the bills.
Many thanks for the detailed response -- I thought it was likely to be complicated, but not that complicated!
Let's start with requirements: our requirement is primarily to support format-integer(xx, "w", lang) in XPath 3.1. For example format-integer(12, "w", "en") returns "twelve". We do need it to work for a wide variety of languages (it's easy to implement English ourselves). Ideally we would support arbitrary big integers (we use Singulinks.Numeric for this) but frankly, no-one is actually going to use it for numbers in the trillions, so it would be fine to impose a limit. We don't need support for non-integer values, and we don't need the reverse function.
Integration with existing APIs in .NET isn't a concern for us at all. We'd be fine with a completely freestanding library that gives us a single method corresponding to the above call.
SaxonCS is a commercial product (we will probably have an open source version at some stage, but we may well keep this functionality as one of the bonuses you get in the paid-for version) so we're happy to talk about funding the development of this as a component which you release as open source. There's certainly value in making the component open source as this will tend to stimulate support for more languages. Contact me off-list at saxonica.com to talk about commercial matters.
Oh, and I should add, this is about .NET Core. In the past we delivered Saxon on .NET using IKVM, but that didn't work on Core, so we developed SaxonCS by creating our own source-level Java-to-C# transpiler.
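To make the requested shape concrete, here is a minimal sketch of a freestanding, single-method API of the kind described above. Everything in it is an assumption for illustration: the class name SpellOutSketch, the method FormatInteger, the MaxValue limit, and the English-only word tables are hypothetical and are not part of ICU4N, ICU4J, or SaxonCS. It covers only non-negative integers in English, which is the case described above as easy to implement; a real implementation would presumably be driven by per-language rule data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical freestanding API. Names and limits are illustrative assumptions only.
public static class SpellOutSketch
{
    private static readonly string[] Ones =
    {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"
    };

    private static readonly string[] Tens =
    {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
    };

    // Scale words, largest first, so the value is consumed from the top down.
    private static readonly (long Value, string Name)[] Scales =
    {
        (1_000_000_000_000L, "trillion"),
        (1_000_000_000L, "billion"),
        (1_000_000L, "million"),
        (1_000L, "thousand"),
    };

    // Assumed upper limit: an imposed range rather than arbitrary big integers.
    public const long MaxValue = 999_999_999_999_999L;

    // Rough analogue of format-integer(value, "w", lang), for lang == "en" only.
    public static string FormatInteger(long value, string lang)
    {
        if (lang != "en")
            throw new NotSupportedException("Only \"en\" is covered by this sketch.");
        if (value < 0 || value > MaxValue)
            throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0)
            return Ones[0];

        var parts = new List<string>();
        foreach (var (scaleValue, name) in Scales)
        {
            if (value >= scaleValue)
            {
                parts.Add(BelowThousand((int)(value / scaleValue)) + " " + name);
                value %= scaleValue;
            }
        }
        if (value > 0)
            parts.Add(BelowThousand((int)value));
        return string.Join(" ", parts);
    }

    // Spells out n for 1 <= n <= 999.
    private static string BelowThousand(int n)
    {
        var parts = new List<string>();
        if (n >= 100)
        {
            parts.Add(Ones[n / 100] + " hundred");
            n %= 100;
        }
        if (n >= 20)
        {
            string tens = Tens[n / 10];
            parts.Add(n % 10 == 0 ? tens : tens + "-" + Ones[n % 10]);
        }
        else if (n > 0)
        {
            parts.Add(Ones[n]);
        }
        return string.Join(" ", parts);
    }
}
```

A call such as SpellOutSketch.FormatInteger(12, "en") corresponds to the format-integer(12, "w", "en") example. Keeping the public surface to one static method leaves the rule engine behind it free to change.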
Thanks also for following up. I have been analyzing this a bit more and have some more details.
This is good news. Limiting the scope like this allows us to commit to a long-term stable API for the spell-out functionality while keeping the rest of the implementation internal until we decide how best to present it (which can even be done after we have a production release). Eliminating
Oddly, Of course, this means that to process
FYI - In .NET, the decimal format symbols are loaded internally in the
Yea, my gut reaction told me that putting it in a separate library made more sense, also. That is, until I started analyzing how
It turns out I was completely wrong about having to deal with any Java style formatting (more good news). I guess I had in mind the
What this means is that the raw format in the
As for the design of
At this point, it is looking very much like a line-by-line port of
Will do.
As far as I can tell, ICU4N doesn't include the spellout numbering capabilities of ICU4J.
I'm interested in assessing whether it's feasible to port this code and contribute it to the project. Having no familiarity with ICU-J internals, I wouldn't know where to start, but if you can provide any initial thoughts (perhaps you've looked at it and decided it's too hard...) then I'd appreciate any pointers.
Alternatively, rather than doing it ourselves we could sponsor the development.
Note, we are currently using ICU4N in the SaxonCS project for localised collation support.