Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No grammatical cases in translations #175

Closed
hertg opened this issue Sep 27, 2019 · 13 comments
Closed

No grammatical cases in translations #175

hertg opened this issue Sep 27, 2019 · 13 comments

Comments

@hertg
Copy link
Contributor

hertg commented Sep 27, 2019

When using the formatDuration(Date) method in Locale.GERMAN the translations are not accurate.

Problem

  • Days: 5 Tagen should be 5 Tage
  • Months: 5 Monaten should be 5 Monate
  • Years: 5 Jahren should be 5 Jahre
  • Decade: 5 Jahrzenten should be 5 Jahrzente
  • Century: 5 Jahrhunderten should be 5 Jahrhunderte
  • Millenium: 5 Jahrtausenden should be 5 Jahrtausende

However
These words are correct for the format(Date) method.
Example:

  • vor 5 Tagen
  • vor 5 Monaten
  • etc.

Explanation

Nouns do have a case (Kasus) when used in a sentence in german (and some other languages?).
(see https://en.wikipedia.org/wiki/Grammatical_case)

This means that words would need to be translated for the different grammatical cases.

Instead of just this translation:

{ "SecondPluralName", "Sekunden" }

You'd need to have these:

{ "SecondPluralNameNominative", "Tage" }
{ "SecondPluralNameDative", "Tagen" }

Which would be very annoying.

Possible Solution

You should translate whole sentences and parameterize them instead of trying to translate single words and then later sticking them together, because that will result in errors like these in the long term.

Highly relevant video: https://www.youtube.com/watch?v=GAgp7nXdkLU

@azhalm94
Copy link

azhalm94 commented Dec 3, 2019

The same problem for Arabic language , words would need to be translated for the different grammatical cases.

For Example :-
4.0.2 return
منذ 18 دقائق

Correct is
منذ 18 دقيقة

@lincolnthree
Copy link
Member

Hey Fellows, this can be done using a custom resource class, example: https://github.com/ocpsoft/prettytime/blob/master/core/src/main/java/org/ocpsoft/prettytime/i18n/Resources_cs.java

Happy to accept a PR that fixes this using these overrides!

@hertg
Copy link
Contributor Author

hertg commented Sep 30, 2020

I'll have a look if I can add a custom German resource class.

@hertg
Copy link
Contributor Author

hertg commented Oct 1, 2020

@lincolnthree
It seems like the problem I described can't be solved by just creating a custom TimeFormat for German.

The following code snippets show the logic inside the format() and formatDuration() method in PrettyTime.java when formatting "5 days" / "in 5 days".

format()

...
TimeFormat format = getFormat(duration.getUnit());
String time = format.format(duration); // returns "5 Tagen" 
return format.decorate(duration, time); // returns "in 5 Tagen" -> correct

formatDuration()

...
TimeFormat timeFormat = getFormat(duration.getUnit());
return timeFormat.format(duration); // returns "5 Tagen" -> wrong: should be "5 Tage"

As you can see, both methods do the same, with the difference that format() subsequently calls the decorate() method.
My problem now is that, for the german translation, the format() method needs to return different texts whether the method was called from format() or from formatDuration(), as they both require the unit to be in a different grammatical case.

I could probably solve the issue if I can get that context information inside the format() method (Via an additional parameter for example).

Should i try to fix that and send a PR? As statet above, there would be code changes outside of the custom TimeFormat for German translations. If you'd like to solve that issue yourself in your own way i won't waste my time on that. :)

I would appreciate a response in the next few days, thanks!

@lincolnthree
Copy link
Member

lincolnthree commented Oct 5, 2020

Hmm, interesting.

I think I see the issue you're describing. Because formatDuration() is not prefixed with "in", (and is subsequently the nominative case, I believe?) The noun should not be suffixed as it would in the dative case.

So... Yes, I can see the need to know when the method is being called via format() vs formatDuration().

I think adding a context parameter for TimeFormat.format() method signature could solve this problem. However... we need to preserve backwards compatability, so we'll need to add a default method in the interface that provides a passthrough to the existing function, without the context.

I think the context should probably be called TimeFormatContext or just FormatContext. Something to that effect:

public interface FormatContext {
   /** If the formatting has been requested as part of a relative time differential */
  Boolean isDuration();

   /** I could also see making additional information and the Date object itself available as well. Basically any contextual information we have available across all format*() methods */

   Date getReference(); // Date.now() by default.
   Date subject; // The date being formatted
   Locale locale; // The current locale
}

Something like that... what do you think? And yes, it would be awesome if you want to take a stab at this.

**EDIT: On second thought... why can't this be accomplished via format() vs decorate() --- That's the whole point of having two separate methods:

format()/formatUnrounded() should provide the raw number + date. Then decorate() is responsible for adding prefix/suffix.

   /**
    * Given a populated {@link Duration} object. Apply formatting (with rounding) and output the result.
    * 
    * @param The original {@link Duration} instance from which the time string should be decorated.
    */
   public abstract String format(final Duration duration);

   /**
    * Given a populated {@link Duration} object. Apply formatting (without rounding) and output the result.
    * 
    * @param The original {@link Duration} instance from which the time string should be decorated.
    */
   public String formatUnrounded(Duration duration);

   /**
    * Decorate with past or future prefix/suffix (with rounding)
    * 
    * @param duration The original {@link Duration} instance from which the time string should be decorated.
    * @param time The formatted time string.
    */
   public String decorate(Duration duration, String time);

   /**
    * Decorate with past or future prefix/suffix (without rounding)
    * 
    * @param duration The original {@link Duration} instance from which the time string should be decorated.
    * @param time The formatted time string.
    */
   public String decorateUnrounded(Duration duration, String time);

Thoughts?

@lincolnthree
Copy link
Member

lincolnthree commented Oct 5, 2020

Yeah, unless I'm mistaken, actually it's right there in your example:

format()

...
TimeFormat format = getFormat(duration.getUnit());
String time = format.format(duration); // returns "5 Tagen"  <--- incorrect, this should return "5 Tage"
return format.decorate(duration, time); // returns "in 5 Tagen" <--- and this should return "in 5 Tagen"

formatDuration()

...
TimeFormat timeFormat = getFormat(duration.getUnit());
return timeFormat.format(duration); // returns "5 Tagen" -> wrong: should be "5 Tage" <--- agreed, this should be "5 Tage"
                                                                                        just like in the method above.

So I think this is totally doable using the current API, you'll just need to reduce the functionality of "format", and use decorate to do the rest of the work.

@lincolnthree
Copy link
Member

lincolnthree commented Oct 5, 2020

Here's a better example of doing this in a Resource bundle, using the TimeFormatProvider API:

https://github.com/ocpsoft/prettytime/blob/master/core/src/main/java/org/ocpsoft/prettytime/i18n/Resources_uk.java

So, I could be wrong, but I still think this is possible to do without modifying the existing codebase. But... if you still think it would be cleaner to do this by changing the internal code, I'm happy to listen.

@hertg
Copy link
Contributor Author

hertg commented Oct 5, 2020

I think adding a context parameter for TimeFormat.format() method signature could solve this problem. However... we need to preserve backwards compatability, so we'll need to add a default method in the interface that provides a passthrough to the existing function, without the context.

Yeah, that could solve the problem. But it would put in question if format() and decorate() should even be seperate methods then, wouldn't it? If the code in format() already is aware about that context, couldn't it just execute the code of the decorate() method subsequently depending on the context? Unless the decorate() method is used in other places I'm not aware of, i don't have the code at hand right now. But maybe I'm just thinking too far ahead here.

I'm aware about backwards compatibility, so I will definitely keep that in mind when opening a PR :)

Yeah, unless I'm mistaken, actually it's right there in your example:

format()

...
TimeFormat format = getFormat(duration.getUnit());
String time = format.format(duration); // returns "5 Tagen"  <--- incorrect, this should return "5 Tage"
return format.decorate(duration, time); // returns "in 5 Tagen" <--- and this should return "in 5 Tagen"

formatDuration()

...
TimeFormat timeFormat = getFormat(duration.getUnit());
return timeFormat.format(duration); // returns "5 Tagen" -> wrong: should be "5 Tage" <--- agreed, this should be "5 Tage"
                                                                                        just like in the method above.

So I think this is totally doable using the current API, you'll just need to reduce the functionality of "format", and use decorate to do the rest of the work.

I'm not sure if I understand. Do you mean that the format(duration) method should return 5 Tage and the decorate(duration, time) should then just append an n at the end? Resulting in "in 5 Tagen".

While this might work for the issues I've mentioned in German, this seems like a dirty workaround. It's just lucky for us that all the problems I've found for German can be solved that way, because their plural all end with an e. If the words didn't end with e, the decorate method would need to append an en in most cases, but I'm sure that there are exceptions to the rule, as there always are in grammar 😅

I also don't expect the issue in the Arabic language, which was mentioned above, could be solved that easily. But I don't know anything about Arabic, so I might be wrong.

Or did I understand your proposal incorrectly?

@lincolnthree
Copy link
Member

I'm not sure if I understand. Do you mean that the format(duration) method should return 5 Tage and the
decorate(duration, time) should then just append an n at the end? Resulting in "in 5 Tagen".

Yes, that's right. That's the original intent of the separation between format & decorate methods.

This (format vs decorate with TimeFormatProvider) seems like a proper use of the APIs as they were intended. Many languages have specific requirements that can only be satisfied in some cases by full customization of the TimeFormat instance. SimpleTimeFormat is exactly that, "simple" -- It only works in simple cases for languages that don't require lots of conjugation tensing, or casing. That's the reason the TimeFormatProvider API/SPI exists, to provide additional customization at the locale-specific level.

I think it makes sense to me to provide a custom TimeFormat instance in the German locale resource bundle.

I'm still open to a PR that exposes extra contextual information if you think it will make for a cleaner solution, but I worry about API explosion, where we're now adding another 4 methods on the TimeFormat API just to provide information for one language resource bundle, when a solution already exists in the current API.

I also worry about performance overhead of creating new context objects on every call to format. It's not much, but every object counts in a lower level library like this that gets embedded / used in larger things. I think creating one single/reusable TimeFormat instance in the Resources_de.java file is probably more efficient in the grand scheme of things.

Thoughts?

@hertg
Copy link
Contributor Author

hertg commented Oct 5, 2020

Yeah, I understand your concerns.

I opened a PR fixing the issues I initially reported.
The problem still persists when using accurate durations like 4 months 2 days but I've mentioned that in the PR.

I must say that it felt quite a bit bad to solve it in this "dirty" way, but I agree with you that restructuring the core classes just because of one language isn't a great idea.

But I've seen that there are quite a few unresolved issues. If some of them are unresolved because they can't be resolved very easily with the current API, it might be worth thinking about some restructuring. But I know that the API structure probably isn't what is holding those issues back 😄

Btw. If you got some chores to be done (writing missing tests, adding missing translations, etc), maybe you could create issues for those and tag them with a hacktoberfest label. Or maybe add the hacktoberfest topic to the repository. That might attract some people participating in Hacktoberfest to do some work on that.

I use prettytime in a lot of projects at our company, seeing some people actively working on it would be pretty cool :)

@lincolnthree
Copy link
Member

Thanks so much! I'll review it ASAP.

Like I said, I'm still willing to entertain a PR that addresses the concerns you have with the API, but I just want to make sure the discussion about it was comprehensive and included potential issues that might occur.

@lincolnthree
Copy link
Member

Sorry for the delay @hertg . Releasing version 5 now, including this update.

@lincolnthree
Copy link
Member

Note: Java 8 will now be required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants