Formatting decimal numbers with dots & commas based on region #29

Closed
petebankhead opened this issue Dec 3, 2016 · 1 comment

Comments

@petebankhead
Member

An unanticipated issue that arose from QuPath going open source is how it can - and should - behave when being run in different regions, where numbers are expected to be formatted differently.

So, for example, 1,234,567.89 in the UK or US might be written as 1.234.567,89 in Germany or 1 234 567,89 in Russia.

This is described in more detail in the Decimal mark Wikipedia article.

This scenario isn't great for software that is intended to be used worldwide for scientific applications, where the format in which numbers are entered and exported really matters.

Locales & formatting numbers

The good news is that Java can support different Locales. This makes it possible to write code that takes the region into consideration.

The very bad news is that handling this predictably is far from straightforward. This arises partly because there are many ways to format numbers within Java, some more convenient than others, and some more problematic than others. For example, consider the following Groovy script, which tests out different methods:

import java.text.*;

def count = 1;
def sb = new StringBuffer("\n");

def s = NumberFormat.getInstance().format(1.234); // Depends on default Locale
sb.append(count++).append(": ").append(s).append("\n");

s = NumberFormat.getInstance(Locale.GERMANY).format(1.234); // 1,234
sb.append(count++).append(": ").append(s).append("\n");

s = new DecimalFormat("#.##").format(1.234); // Depends on default Locale, 2 decimal places
sb.append(count++).append(": ").append(s).append("\n");

s = String.format("My number is %.3f", 1.234); // Depends on default Locale
sb.append(count++).append(": ").append(s).append("\n");

s = "My number is " + 1.234; // 1.234 - always uses the dot
sb.append(count++).append(": ").append(s).append("\n");

print(sb.toString())

The output when I run it with my default English UK setting is:

1: 1.234
2: 1,234
3: 1.23
4: My number is 1.234
5: My number is 1.234

Alternatively, if I switch to using a German Locale I see:

1: 1,234
2: 1,234
3: 1,23
4: My number is 1,234
5: My number is 1.234

In most of these scenarios the Locale is respected (either the default, or one that is explicitly set)... but not with simple string concatenation using +.

This is a bit scary, since "My number is " + 1.234; is very tempting syntax for a programmer to use. It is highly likely to exist somewhere within QuPath's code.
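One way to make formatting predictable is to pass the Locale explicitly on every call rather than relying on the default. The following is only a minimal sketch in plain Java (the same calls work from Groovy); the class and method names are hypothetical and are not part of QuPath.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper class, for illustration only (not QuPath code)
final class FixedLocaleFormatting {

    // Format with a fixed number of decimal places, always using Locale.US symbols
    static String formatFixed(double value, int decimalPlaces) {
        return String.format(Locale.US, "%." + decimalPlaces + "f", value);
    }

    // Apply a DecimalFormat pattern, but take the decimal mark from Locale.US
    // instead of from the default Locale
    static String formatPattern(double value, String pattern) {
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return df.format(value);
    }

    public static void main(String[] args) {
        Locale previous = Locale.getDefault();
        try {
            // Simulate running on a machine with a German default Locale
            Locale.setDefault(Locale.GERMANY);
            System.out.println(formatFixed(1.234, 3));        // 1.234
            System.out.println(formatPattern(1.234, "#.##")); // 1.23
            System.out.println(String.format("%.3f", 1.234)); // 1,234 (default Locale)
        } finally {
            Locale.setDefault(previous);
        }
    }
}
```

The first two calls give the same text whatever the default Locale is, while the last one still follows the default.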

Parsing numbers

Similar issues arise when parsing numbers using one of Java's myriad ways.

import java.text.*;

def count = 1;
def sb = new StringBuffer("\n");

for (def locale in [Locale.US, Locale.GERMANY]) {

    sb.append("Locale set to ").append(locale).append("\n");
    Locale.setDefault(locale);
    
    def s = NumberFormat.getInstance().parse("1.234"); // Result depends on Locale
    sb.append(count++).append(": ").append(s).append("\n");

    s = NumberFormat.getInstance().parse("1,234"); // Result depends on Locale
    sb.append(count++).append(": ").append(s).append("\n");
    
    s = Double.parseDouble("1.234"); // Always requires a dot
    sb.append(count++).append(": ").append(s).append("\n");    

    try {
        s = Double.parseDouble("1,234"); // Does not work!
    } catch (Exception e) {
        s = "I cannot parse \"1,234\"!"
    }
    sb.append(count++).append(": ").append(s).append("\n");
    
    try {
        s = Double.valueOf("1,234"); // Does not work!
    } catch (Exception e) {
        s = "I cannot get the value of \"1,234\"!"
    }
    sb.append(count++).append(": ").append(s).append("\n");
}

print(sb.toString())

The output of the script above is:

Locale set to en_US
1: 1.234
2: 1234
3: 1.234
4: I cannot parse "1,234"!
5: I cannot get the value of "1,234"!
Locale set to de_DE
6: 1234
7: 1.234
8: 1.234
9: I cannot parse "1,234"!
10: I cannot get the value of "1,234"!

Again, this is scary, because Double.valueOf(String s) and Double.parseDouble(String s) are quite natural choices for a programmer - yet they don't always work, depending upon how the number is written.

But, much worse than that, if the default Locale is used then NumberFormat.getInstance().parse(String s) gives different results for the same input. You can see this in the first two entries being reversed when the Locale is changed.

This means that scripts (or data) written with QuPath with one Locale could give different or unexpected results in another Locale. What's more, it's quite possible for a user to have two computers (perhaps one Windows and one Mac) that are set up to have different Locales, but not to have noticed.
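A possible way to avoid silently wrong values when parsing is to name the Locale explicitly and to reject any input that is not consumed completely. The sketch below is plain Java with a hypothetical helper name. It also switches off grouping separators, which is an assumption on my part about what is desirable, so that "1,234" is rejected for Locale.US rather than read as 1234.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper class, for illustration only (not QuPath code)
final class ExplicitLocaleParsing {

    // Parse text as a number for the given Locale, failing loudly if any
    // part of the text cannot be interpreted
    static double parseStrict(String text, Locale locale) {
        NumberFormat format = NumberFormat.getInstance(locale);
        // Assumption: thousands separators are not wanted in the input
        format.setGroupingUsed(false);
        ParsePosition position = new ParsePosition(0);
        Number number = format.parse(text, position);
        // parse(String, ParsePosition) returns null on failure and may stop early,
        // so check that the whole string was consumed
        if (number == null || position.getIndex() != text.length()) {
            throw new NumberFormatException("Cannot parse \"" + text + "\" for locale " + locale);
        }
        return number.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(parseStrict("1.234", Locale.US));      // 1.234
        System.out.println(parseStrict("1,234", Locale.GERMANY)); // 1.234
        try {
            parseStrict("1,234", Locale.US);
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this approach the result no longer depends on the default Locale, and a mismatch between the text and the chosen Locale raises an exception instead of producing a number that is wrong by a factor of 1000.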

Importing/exporting

Finally, it's important to consider what happens after running QuPath's analysis. Commonly, it's necessary to put the results into another application - such as an Excel Spreadsheet.

Excel isn't immune from these issues, and will also parse numbers according to some system setting. Therefore the spreadsheet application is not guaranteed to interpret the numbers written by QuPath in the way that is intended - it's absolutely essential to check.

How does QuPath handle this?

What QuPath does now

QuPath gives some consideration to Locales in two ways, although neither is a complete solution.

Firstly, the Locale information used when saving a .qpdata file is saved with the file. This way, it can be temporarily applied when reloading the file. This at least helps reduce the possibility that a later change in Locale means that a previously-written data file cannot be read again - or is read wrongly.
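To illustrate the general idea of storing a Locale alongside data and temporarily applying it later, here is a rough sketch in plain Java. It is not QuPath's actual implementation: the class, the method names and the use of a language tag as the stored form are all assumptions made for the example.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical helper class, for illustration only (not QuPath's implementation)
final class TemporaryLocale {

    // Text form of a Locale that could be written next to the data, e.g. "de-DE"
    static String toTag(Locale locale) {
        return locale.toLanguageTag();
    }

    // Run a task with the stored Locale applied to formatting and parsing,
    // then restore whatever was set before.
    // Note: the default Locale is global to the JVM, so other threads also see the change.
    static <T> T callWithLocale(String languageTag, Supplier<T> task) {
        Locale previous = Locale.getDefault(Locale.Category.FORMAT);
        Locale.setDefault(Locale.Category.FORMAT, Locale.forLanguageTag(languageTag));
        try {
            return task.get();
        } finally {
            Locale.setDefault(Locale.Category.FORMAT, previous);
        }
    }

    public static void main(String[] args) {
        String tag = toTag(Locale.GERMANY); // "de-DE"
        String s = callWithLocale(tag, () -> String.format("%.3f", 1.234));
        System.out.println(s); // 1,234
    }
}
```

The try/finally ensures the previous setting comes back even if reading fails part-way through.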

Secondly, QuPath gives the user a choice of Locale on first startup (or later under Help → Show setup options), along with a warning:
[Screenshot: qupath_setup]

This doesn't force any particular choice... although it at least raises the issue.

What should QuPath do?

This remains an open question - with feedback and ideas welcome.

My current suspicion is that QuPath should enforce the use of one Locale consistently throughout the application. If so, this would likely have to be Locale.US - because this is guaranteed to exist. This will enforce an internal consistency, which is less likely to be troubled by whether or not the programmer of some component or extension parses their numbers in a different way.
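In Java terms, enforcing a single Locale could be as simple as setting the default once, early in startup. The sketch below is only an illustration with a hypothetical class name; where such a call would belong in QuPath is not something this issue specifies.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical startup hook, for illustration only (not QuPath code)
final class EnforceUsLocale {

    // Make Locale.US the JVM-wide default. Locale.setDefault(Locale) also updates
    // the DISPLAY and FORMAT categories, so later calls that rely on the default
    // (String.format, NumberFormat.getInstance(), ...) all use US conventions.
    static void apply() {
        Locale.setDefault(Locale.US);
    }

    public static void main(String[] args) {
        apply();
        System.out.println(String.format("%.3f", 1.234));                // 1.234
        System.out.println(NumberFormat.getInstance().format(1234567.89)); // 1,234,567.89
    }
}
```

Code that passes a Locale explicitly is unaffected by this call, so an optional Locale-specific export would still be possible.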

It may still be helpful to optionally export data for a specific Locale, for ease of importing results into other software - but this would need to be explicitly selected (every time?).

However, providing this option would require some more thought and planning for at least two reasons:

  • Some exported data should also be reimported into QuPath, e.g. exported TMA data might be imported to use the TMA data viewer - in this case the correct Locale needs to be used for importing as well.
  • There are different ways to export, whether saving to a file or copying values to the clipboard. Any 'smart' behavior in one place risks lulling a user into a false sense of security that the Locale they unthinkingly expect will always be used.
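The first point, that export and reimport must agree on the Locale, can be sketched as a small round trip. This is a hypothetical example in plain Java, assuming tab-delimited rows and at most six decimal places. It does not reflect QuPath's actual export format.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical export/import pair, for illustration only (not QuPath code)
final class LocaleRoundTrip {

    private static NumberFormat createFormat(Locale locale) {
        NumberFormat format = NumberFormat.getInstance(locale);
        format.setGroupingUsed(false);        // no thousands separators
        format.setMaximumFractionDigits(6);   // assumption: 6 decimal places is enough
        return format;
    }

    // Write values as one tab-delimited row using the given Locale
    static String exportRow(double[] values, Locale locale) {
        NumberFormat format = createFormat(locale);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0)
                sb.append('\t');
            sb.append(format.format(values[i]));
        }
        return sb.toString();
    }

    // Read a row back; the caller must supply the same Locale used for export
    static double[] importRow(String row, Locale locale) {
        NumberFormat format = createFormat(locale);
        String[] cells = row.split("\t");
        double[] values = new double[cells.length];
        for (int i = 0; i < cells.length; i++) {
            ParsePosition position = new ParsePosition(0);
            Number number = format.parse(cells[i], position);
            if (number == null || position.getIndex() != cells[i].length())
                throw new NumberFormatException("Cannot parse \"" + cells[i] + "\" for locale " + locale);
            values[i] = number.doubleValue();
        }
        return values;
    }

    public static void main(String[] args) {
        String row = exportRow(new double[] {1.234, 1234567.89}, Locale.GERMANY);
        System.out.println(row);                                // 1,234<TAB>1234567,89
        System.out.println(importRow(row, Locale.GERMANY)[0]);  // 1.234
    }
}
```

Reading the same row back with Locale.US throws an exception here rather than returning different numbers, which is one way to avoid the false sense of security described in the second point.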

In short, it's a thorny issue. For now, the best approach is to use a Locale that uses dots rather than commas as the decimal mark... and then pay close attention whenever the exported results are imported elsewhere.

Immediate plans

I am considering making the first half of the change suggested above in the next update, i.e. to force the use of the US Locale. There is too much that is unclear or untested whenever different Locales may be used.

However, comments are welcome on the wisdom of this.

@petebankhead
Member Author

Closing this issue due to lack of activity... and lack of complaints.
