Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't fail validation because of datetime-formatting messages. LOC-93 #35

Merged
merged 1 commit into from Dec 17, 2015
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
9 changes: 8 additions & 1 deletion i18n/dummy.py
Expand Up @@ -22,7 +22,9 @@
generates output conf/locale/$DUMMY_LOCALE/LC_MESSAGES,
where $DUMMY_LOCALE is the dummy_locale value set in the i18n config
"""

from __future__ import print_function

import re

import polib
Expand All @@ -33,6 +35,11 @@
from i18n.generate import clean_pofile


def is_format_message(msg):
"""Is this message a _FORMAT string? These are often treated specially."""
return re.match(r"^[A-Z_]+_FORMAT$", msg.msgid)


class BaseDummyConverter(Converter):
"""Base class for dummy converters.

Expand Down Expand Up @@ -182,7 +189,7 @@ def make_dummy(filename, locale, converter):
for msg in pofile:
# Some strings are actually formatting strings, don't dummy-ify them,
# or dates will look like "DÀTÉ_TÌMÉ_FÖRMÀT Ⱡ'σ# EST"
if re.match(r"^[A-Z_]+_FORMAT$", msg.msgid):
if is_format_message(msg):
continue
converter.convert_msg(msg)

Expand Down
5 changes: 5 additions & 0 deletions i18n/validate.py
Expand Up @@ -10,6 +10,7 @@

import polib

from i18n.dummy import is_format_message
from i18n.execute import call
from i18n.converter import Converter
from i18n import Runner, config
Expand Down Expand Up @@ -100,6 +101,10 @@ def check_messages(filename, report_empty=False):
if astral(msg.msgstr):
problems.append(("Non-BMP char", msg.msgid, msg.msgstr))

if is_format_message(msg):
# LONG_DATE_FORMAT, etc, have %s etc in them, and that's ok.
continue

if msg.msgid_plural:
# Plurals: two strings in, N strings out.
source = msg.msgid + " | " + msg.msgid_plural
Expand Down
14 changes: 14 additions & 0 deletions tests/data/validation_problems.po
Expand Up @@ -49,3 +49,17 @@ msgstr[1] "2. Estas {num} objectos"
# msgid_plural "3. There are {num} things"
# msgstr[0] "3. Estas num objectos"
# msgstr[1] "3. Estas {num} objectos"

# These should not raise a validation error, because LONG_DATE_FORMAT and
# friends are special.
msgid "LONG_DATE_FORMAT"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do these names come from? How do we know they'll always end in _FORMAT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a low-tech way to identify these names. I guess I could just make a list instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'd rather keep it like this, in case there are other FORMATs later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I'm asking is, this PR is whitelisting specific strings for
edx-platform, which seems wrong. Wouldn't it be better to allow a variable
whitelist somewhere in your configuration, so that you can quickly extend
the whitelist for specific things in whatever repo the tool is running on?
As we move to IDAs this toolset is being used more generically across more
repos.

On Thu, Dec 17, 2015 at 11:33 AM, Ned Batchelder notifications@github.com
wrote:

In tests/data/validation_problems.po
#35 (comment):

@@ -49,3 +49,17 @@ msgstr[1] "2. Estas {num} objectos"

msgid_plural "3. There are {num} things"

msgstr[0] "3. Estas num objectos"

msgstr[1] "3. Estas {num} objectos"

+# These should not raise a validation error, because LONG_DATE_FORMAT and
+# friends are special.
+msgid "LONG_DATE_FORMAT"

This is a low-tech way to identify these names. I guess I could just make
a list instead?


Reply to this email directly or view it on GitHub
https://github.com/edx/i18n-tools/pull/35/files#r47927091.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, there was a time there was a string in edx-platform that was
causing issues because it was an instructor facing string explaining how to
use json properly. So, it would get erroneously validated all the time. I
would have loved a generic way to quickly whitelist that specific string.

On Thu, Dec 17, 2015 at 11:38 AM, Sarina Canelake sarina@edx.org wrote:

I guess what I'm asking is, this PR is whitelisting specific strings for
edx-platform, which seems wrong. Wouldn't it be better to allow a variable
whitelist somewhere in your configuration, so that you can quickly extend
the whitelist for specific things in whatever repo the tool is running on?
As we move to IDAs this toolset is being used more generically across more
repos.

On Thu, Dec 17, 2015 at 11:33 AM, Ned Batchelder <notifications@github.com

wrote:

In tests/data/validation_problems.po
#35 (comment):

@@ -49,3 +49,17 @@ msgstr[1] "2. Estas {num} objectos"

msgid_plural "3. There are {num} things"

msgstr[0] "3. Estas num objectos"

msgstr[1] "3. Estas {num} objectos"

+# These should not raise a validation error, because LONG_DATE_FORMAT and
+# friends are special.
+msgid "LONG_DATE_FORMAT"

This is a low-tech way to identify these names. I guess I could just make
a list instead?


Reply to this email directly or view it on GitHub
https://github.com/edx/i18n-tools/pull/35/files#r47927091.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. BTW: the json string is still there, and i'm still thinking about how best to handle that one.
LONG_DATE_FORMAT is a Django string, but I take your point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhhhh OK that's definitely what I was looking for. Since LONG_DATE_FORMAT is a Django string, I'm 👍 on this.

msgstr "%A %d %B %Y"

msgid "DATE_TIME_FORMAT"
msgstr "%d %b %Y, à %H:%M"

msgid "SHORT_DATE_FORMAT"
msgstr "%d %b %Y"

msgid "TIME_FORMAT"
msgstr "%H:%M:%S"