Implement MessageContext.format #67
spookylukey
added some commits
May 11, 2018
spookylukey
May 15, 2018
I based the API and implementation on the JS version. Here are some implementation notes:

- The API of JS MessageContext seemed a bit unwieldy:

      var errs = [];
      const msg = ctx.getMessage('a-message-id');
      const val = ctx.format(msg, { arg: 'an argument' }, errs);

  The message is retrieved with getMessage and then passed straight back to another method on MessageContext, so it is essentially an opaque object. It also requires manually creating the errors argument, and it takes three lines to do the job. So in Python I changed the API to:

      val, errs = ctx.format('a-message-id', {'arg': 'an argument'})

  Any expected errors that occur while generating the translation are collected in the errors list, as in the JavaScript API. If the message itself does not exist at all, however, you will get a LookupError. This seems more in keeping with what Python users would expect.

- For messages with attributes, the JavaScript version is:

      const msg = ctx.getMessage('a-message-id').attrs.subattribute;

  This means you have to know something about the type of object returned by getMessage (i.e. the attrs attribute), but then again you just pass the result back to format to do anything useful. So I changed that to:

      val, errs = ctx.format('a-message-id.subattribute', {'arg': 'an argument'})

- getMessage and friends return objects that are essentially an implementation detail: AST objects returned by the parser. There is nothing a normal user of MessageContext should be doing with these, and with a different implementation strategy (e.g. compilation instead of interpreting) they might not make any sense. So I've removed these methods. There is still has_message() to check that a message exists, and message_ids(), which returns a list of identifiers only, not the contents.

- In the JS version, the implementation uses toString quite a bit, and relies on the fact that you can call "a string".toString(context) and the extra parameter context gets ignored, which is the desired behaviour. The equivalent does not hold for Python, so I had to structure the code a bit differently. In addition, I wanted this implementation to be as loosely tied to the classes in the parser AST modules as possible, so I've done everything with 'external' polymorphism rather than modifying those AST classes and adding methods to them. (I've used functools.singledispatch, which is really a fancier but nicer isinstance.)

- JS only has Number; in Python we have int and float. This complicates a few things relative to the JS version.

- I've added a few dependencies. The biggest is Babel (python-babel, not the JS thing), a collection of i18n utilities with a very nice implementation of plural rules. It includes CLDR data, which made matching plural categories really easy to implement, and it also provides number formatting.
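To make the notes above concrete, here is a minimal, self-contained sketch of the Python-side API: format() takes a message ID (with a dotted suffix for attributes), returns (value, errors), and raises LookupError for unknown messages, while a functools.singledispatch resolver keeps all behaviour outside the AST classes and has separate int and float handlers. Every name here (SketchMessageContext, resolve, and the tiny AST stand-ins Message, Pattern, TextElement, VariableReference) is hypothetical; python-fluent's real AST classes and resolver differ, and the real patch formats numbers with Babel rather than the plain str() used below. Rendering a missing variable as its bare name is also just an assumption of this sketch.

```python
from functools import singledispatch


# Hypothetical stand-ins for parser AST classes. The resolver below never
# adds methods to them ("external" polymorphism).
class TextElement:
    def __init__(self, value):
        self.value = value


class VariableReference:
    def __init__(self, name):
        self.name = name


class Pattern:
    def __init__(self, elements):
        self.elements = elements


class Message:
    def __init__(self, value, attributes=None):
        self.value = value                  # a Pattern, or None
        self.attributes = attributes or {}  # {attribute name: Pattern}


@singledispatch
def resolve(node, args, errors):
    raise TypeError("Cannot resolve {}".format(type(node).__name__))


@resolve.register(Pattern)
def _(node, args, errors):
    return "".join(resolve(el, args, errors) for el in node.elements)


@resolve.register(TextElement)
def _(node, args, errors):
    return node.value


@resolve.register(VariableReference)
def _(node, args, errors):
    if node.name not in args:
        # Expected errors are collected, not raised
        errors.append(LookupError("Unknown external: " + node.name))
        return node.name
    return resolve(args[node.name], args, errors)


@resolve.register(str)
def _(value, args, errors):
    return value


@resolve.register(int)
def _(value, args, errors):
    return str(value)


@resolve.register(float)
def _(value, args, errors):
    # JS String(2.0) gives "2" but Python str(2.0) gives "2.0",
    # so whole-number floats are rendered without the ".0" here.
    return str(int(value)) if value.is_integer() else repr(value)


class SketchMessageContext:
    def __init__(self, messages):
        self._messages = messages  # {message id: Message}

    def has_message(self, message_id):
        return message_id in self._messages

    def message_ids(self):
        return list(self._messages)

    def format(self, message_id, args=None):
        # "a-message-id.subattribute" selects an attribute
        msg_id, _, attr_name = message_id.partition(".")
        if msg_id not in self._messages:
            raise LookupError(message_id)
        message = self._messages[msg_id]
        pattern = message.attributes.get(attr_name) if attr_name else message.value
        if pattern is None:
            raise LookupError(message_id)
        errors = []
        return resolve(pattern, args or {}, errors), errors
```

Callers get a single line, e.g. `val, errs = ctx.format("hello.title", {"name": "Ana"})`, and never touch AST objects directly.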
spookylukey
added some commits
May 15, 2018
spookylukey
May 26, 2018
It turns out the MessageContext.format API I've suggested here (accepting an ID rather than a Message object) is also being suggested for the JS version - projectfluent/fluent.js#208
spookylukey
Jun 12, 2018
The biggest remaining TODO on this PR is supporting DATETIME, which means implementing Intl.DateTimeFormat in Python. Unfortunately this looks like a major obstacle. The spec for it is highly involved, and it is all written assuming a JavaScript context: http://www.ecma-international.org/ecma-402/1.0/#sec-12 . Working out the core of the algorithm as opposed to the JavaScript-specific details, and which parts I would actually need to implement, is not at all easy.
The best option is likely to start with Intl.js, but it still looks like it will be more work than everything else in this patch put together.
zbraniecki
Jun 12, 2018
Contributor
I don't think it should block. DATETIME is just one of the formatters, and we should aim to use a Python API for CLDR datetime formatting for it. Until then, we can either do a dummy date formatting, or skip it altogether.
It's important to get it in eventually, but it shouldn't block us from introducing MessageContext API.
zbraniecki
Jun 13, 2018
Contributor
@spookylukey what prevents you from using http://babel.pocoo.org/en/latest/dates.html ? It seems like this should work quite well, especially the format part, which resembles the proposed ECMA-402 style option: https://github.com/tc39/proposal-ecma402-datetime-style
I would recommend not using the pattern/skeleton options; I think we could start by implementing the DATETIME function without params, or with only the format/style param.
Later, we could try to add the option bag that finds the matching skeleton/pattern.
wdyt?
stasm
Jun 13, 2018
Member
I'd say it's totally fine to not support DATETIME for now. This is already a big PR and a great addition to python-fluent! If you agree to leave it out, please file a new issue about adding the support later on.
spookylukey
Jun 14, 2018
@zbraniecki - after a bit of googling the other day I had come across that proposal and had the same idea. I'll implement DATETIME with only the style/dateStyle/timeStyle option(s) for now.
The other options should all be doable given time, and I think python-babel already exports the data needed to implement them and get the outputs to match Intl.DateTimeFormat. It will just take a while to dig through the spec and work out what is actually needed, especially as I'm not expert enough with the JavaScript object model to know the significance of the different ways of constructing objects (e.g. the spec talks about using Object.create and Record etc.)
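The "style options only" plan could start from an option-bag check like the one below. This is a hypothetical helper (datetime_style_options is not part of python-fluent): it rejects anything other than dateStyle/timeStyle and returns the two styles using the names "full", "long", "medium" and "short", which are also the named formats Babel's date and time functions accept. The default when no style is given is an assumption.

```python
DATETIME_STYLES = ("full", "long", "medium", "short")
SUPPORTED_OPTIONS = {"dateStyle", "timeStyle"}


def datetime_style_options(options):
    """Hypothetical helper: validate a DATETIME option bag restricted to
    dateStyle/timeStyle and return (date_format, time_format)."""
    unsupported = set(options) - SUPPORTED_OPTIONS
    if unsupported:
        raise ValueError("Unsupported DATETIME options: " + ", ".join(sorted(unsupported)))
    date_style = options.get("dateStyle")
    time_style = options.get("timeStyle")
    for style in (date_style, time_style):
        if style is not None and style not in DATETIME_STYLES:
            raise ValueError("Unknown style: {!r}".format(style))
    if date_style is None and time_style is None:
        # Assumption: fall back to a medium-length date when no style is given
        date_style = "medium"
    return date_style, time_style
```

With Babel installed, the results could then be passed on, e.g. `babel.dates.format_date(value, format=date_format, locale=locale)` and `babel.dates.format_time(value, format=time_format, locale=locale)`; how to combine the two parts when both are set is left open here.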
spookylukey
Jun 14, 2018
Regarding docs - we're probably at the point where we need proper docs, rather than just a README. My instinct is to use Sphinx, which I've used lots before and has great support for Python, plus it only takes a few minutes to set up readthedocs.org and get docs built automatically etc. Any objections to me documenting using Sphinx?
spookylukey
added some commits
Jun 13, 2018
spookylukey
Jun 17, 2018
OK, I've implemented DATETIME roughly as discussed. This is ready for review now. I understand that might take some time :-), and I'm also likely to be offline for a while.
spookylukey
added some commits
Jul 6, 2018
spookylukey
Jul 10, 2018
@stasm Another note about this patch: I haven't focussed on performance that much. However, I have another set of patches which add a performance focussed MessageContext implementation which compiles FTL to Python, as opposed to the interpreter approach here. That work is nearing completion. It doesn't supersede this PR however - for various reasons that I'll explain later it will be useful to have both implementations.
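For readers unfamiliar with the distinction: an interpreter walks the AST on every format call, while a compiler turns each message into Python code once. The general idea can be sketched as below; this is a generic illustration, not the compiler from those patches (which isn't shown in this thread). compile_pattern and its ("text", ...)/("var", ...) input format are hypothetical.

```python
def compile_pattern(elements):
    """Hypothetical: compile a flat pattern, given as ("text", str) and
    ("var", name) pairs, into a Python function by generating source once."""
    parts = []
    for kind, value in elements:
        if kind == "text":
            parts.append(repr(value))                     # literal text becomes a string constant
        elif kind == "var":
            parts.append("str(args[{!r}])".format(value))  # argument looked up at call time
        else:
            raise ValueError("Unknown element kind: {!r}".format(kind))
    body = " + ".join(parts) if parts else "''"
    source = "def message(args):\n    return " + body + "\n"
    namespace = {}
    exec(source, namespace)  # pay the compilation cost once, up front
    return namespace["message"]
```

Each later call is then a single function call with no tree walking. Note that this sketch does no error collection: a missing argument simply raises KeyError.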
spookylukey
added some commits
Jul 24, 2018
stasm
Jul 25, 2018
Member
Hey @spookylukey, thanks for your continued work in this PR! I was busy with the updates to the Syntax Spec; now that 0.6 is out, I hope to be able to start reviewing this soon. On a related note, I'll be happy to help update this PR to the 0.6 AST if you run into any troubles.
Any objections to me documenting using Sphinx?
No objections. We can also add it in a follow-up if you prefer.
spookylukey
Jul 25, 2018
@stasm - no problem. I've managed to update to 0.6 spec already, as per the last commit above.
I'm now getting failures on Python 2.7 only. It turns out there has been a change: Identifier.name is now a bytestring (str) object on Python 2.7. Before, it was a unicode string, and this change is causing the failure (the resolver tries to use unicode strings only).
Before:
>>> FluentParser().parse('foo = { bar }').body[0].value.elements[0].expression.id.name
u'bar'
After 0.6 changes merged:
>>> FluentParser().parse('foo = { bar }').body[0].value.elements[0].expression.id.name
'bar'
On Python 3, it was and still is a unicode string. I could work around this on Python 2.7, but this seems like an unintentional change, and doesn't match other things e.g. MessageReference.name etc.
stasm
Jul 27, 2018
Member
Yay, Python 2 and Unicode :)
As far as I can tell, this happens because the parser has an internal iterator over the string it parses. The iterator yields char by char. If the string passed into FluentParser.parse is a bytestring, then the chars yielded are bytestrings as well. For instance, this works on Python 2:
>>> FluentParser().parse(u'foo = { bar }').body[0].value.elements[0].expression.id.name
u'bar'
I see that your tests import unicode_literals, so I'm not quite sure why you're running into this.
spookylukey
added some commits
Jul 27, 2018
spookylukey
Jul 27, 2018
@stasm - thanks for the info, I've found it - the dedent_ftl module wasn't using unicode literals, so was converting unicode strings to byte strings.
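On Python 2 the fix presumably comes down to making sure the helper's string literals are unicode (e.g. `from __future__ import unicode_literals` at the top of the module; it is a no-op on Python 3, so it is omitted below). A sketch of what such a test helper might look like, assuming it is a thin wrapper around textwrap.dedent (the real dedent_ftl isn't shown in this thread), with a defensive decode so a bytestring can never reach the parser:

```python
import textwrap


def dedent_ftl(text):
    """Hypothetical test helper: strip the leading newline and common
    indentation from an FTL snippet written inside a test."""
    if isinstance(text, bytes):
        # Defensive: never hand a bytestring to the parser
        text = text.decode("utf-8")
    return textwrap.dedent(text.lstrip("\n"))
```

Because the output is always text, identifiers produced by the parser (such as Identifier.name) stay unicode regardless of how the test wrote its snippet.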
spookylukey commented May 15, 2018
Implementation of #65
This is incomplete, but at a reviewable level - the remaining work to be done shouldn't influence the overall design that much.
I'm unlikely to get back to this soon, but leaving this here so that others are aware that a start has been made, and if anyone wants to give feedback then they can.