Implement MessageContext.format #67

Open
wants to merge 67 commits into
base: master
from

3 participants
@spookylukey

spookylukey commented May 15, 2018

Implementation of #65

This is incomplete, but at a reviewable level - the remaining work to be done shouldn't influence the overall design that much.

I'm unlikely to get back to this soon, but leaving this here so that others are aware that a start has been made, and if anyone wants to give feedback then they can.

spookylukey added some commits May 11, 2018

spookylukey May 15, 2018

I based the API and implementation on the JS version. Here are some implementation notes:

  1. The API of JS MessageContext seemed a bit unwieldy:

    var errs = [];
    const msg = ctx.getMessage('a-message-id');
    const val = ctx.format(msg, { arg: 'an argument' }, errs);
    

    The message is retrieved with getMessage, and then just passed straight back to another method on MessageContext - it is essentially an opaque object. Plus it requires manually creating the errors argument, and requires three lines to do the job.

    So in Python I changed the API to:

    val, errs = ctx.format('a-message-id', {'arg': 'an argument'})
    

    Any expected errors that occur while generating the translation are collected in the errors list, as per the JavaScript API. If the message itself does not exist at all, however, you will get a LookupError, which seems more in keeping with what Python users would expect.

  2. For messages with attributes, the JavaScript version is:

    const msg = ctx.getMessage('a-message-id').attrs.subattribute;
    

    This means you have to know something about the type of object returned by getMessage (i.e. the attrs attribute), but then again you just pass the result back to 'format' to do anything useful.

    So I changed that to:

    val, errs = ctx.format('a-message-id.subattribute', {'arg': 'an argument'})
    
  3. getMessage and friends are returning objects that are essentially an implementation detail - AST objects returned by the parser. There is nothing a normal user of MessageContext should be doing with these, and with a different implementation strategy (e.g. compilation instead of interpreting) this might not make any sense.

    So, I've removed these methods. There is still has_message() to check a message exists, and message_ids() that returns a list of identifiers only, not the contents.

  4. In the JS version, the implementation uses 'toString' quite a bit, and relies on the fact that you can call "a string".toString(context) and the extra parameter context gets ignored, which is the desired behaviour. The equivalent does not hold for Python, so I had to structure the code a bit differently.

    In addition, I wanted this implementation to be as loosely tied to the classes in the parser AST modules as possible, so I've done everything with 'external' polymorphism rather than modify those AST classes and add methods to them. (I've used 'functools.singledispatch', which is essentially a nicer way of writing a chain of 'isinstance' checks.)

  5. JS only has 'Number', whereas Python has 'int' and 'float'. This complicates a few things relative to the JS version.

  6. I've added a few dependencies. The biggest is Babel (python-babel, not the JS thing), a collection of i18n utilities which has a very nice implementation of plural rules, and includes CLDR data, which made implementation of matching plural categories really easy, and also provides number formatting.
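
To make the 'external' polymorphism of note 4 and the int/float split of note 5 concrete, here is a minimal sketch. The node classes `Text`, `Variable` and `Pattern` and the `resolve` function are hypothetical stand-ins chosen for illustration, not the real parser AST classes or the code in this PR; only `functools.singledispatch` is a real API.

```python
from functools import singledispatch


# Toy AST nodes: hypothetical stand-ins for the parser's classes. They carry
# data only; no formatting methods are added to them.
class Text:
    def __init__(self, value):
        self.value = value


class Variable:
    def __init__(self, name):
        self.name = name


class Pattern:
    def __init__(self, elements):
        self.elements = elements


@singledispatch
def resolve(node, args, errors):
    # Fallback for types that have no registered handler.
    raise TypeError("cannot resolve {!r}".format(type(node)))


@resolve.register(Text)
def _resolve_text(node, args, errors):
    return node.value


@resolve.register(Variable)
def _resolve_variable(node, args, errors):
    try:
        value = args[node.name]
    except KeyError:
        # Expected errors are collected rather than raised.
        errors.append(LookupError("unknown argument: " + node.name))
        return "???"
    return resolve(value, args, errors)


@resolve.register(Pattern)
def _resolve_pattern(node, args, errors):
    return "".join(resolve(element, args, errors) for element in node.elements)


@resolve.register(str)
def _resolve_str(value, args, errors):
    return value


# Unlike JS, Python has two numeric types, so both are registered. A real
# implementation would apply locale-aware number formatting here.
@resolve.register(int)
@resolve.register(float)
def _resolve_number(value, args, errors):
    return str(value)
```

Resolving `Pattern([Text("You have "), Variable("count"), Text(" items")])` with `{"count": 3}` dispatches on the type of each element in turn, and the node classes themselves never need to know about formatting.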

spookylukey May 26, 2018

It turns out the MessageContext.format API I've suggested here (accepting an ID rather than a Message object) is also being suggested for the JS version - projectfluent/fluent.js#208
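
For readers skimming the thread, the ID-based API shape can be sketched roughly as follows. This is a toy under stated assumptions: `SketchContext` and `_RecordingArgs` are hypothetical names, and messages are plain `str.format` templates held in a dict rather than parsed FTL. It only illustrates the `(value, errors)` return, the `id.attribute` lookup, and the LookupError for a missing message.

```python
class _RecordingArgs(dict):
    """Mapping that records missing arguments instead of raising."""

    def __init__(self, args, errors):
        super().__init__(args or {})
        self._errors = errors

    def __missing__(self, key):
        self._errors.append(LookupError("unknown argument: " + key))
        return "???"


class SketchContext:
    """Toy context: messages are str.format templates, not parsed FTL."""

    def __init__(self, messages):
        # Expected shape (an assumption of this sketch):
        # {message_id: {"value": template, "attrs": {name: template}}}
        self._messages = messages

    def has_message(self, message_id):
        return message_id in self._messages

    def message_ids(self):
        # Identifiers only, never the message contents.
        return list(self._messages)

    def format(self, message_id, args=None):
        # 'a-message-id.subattribute' addresses an attribute of a message.
        base_id, _, attr_name = message_id.partition(".")
        try:
            message = self._messages[base_id]
            if attr_name:
                template = message["attrs"][attr_name]
            else:
                template = message["value"]
        except KeyError:
            # A missing message or attribute is raised, not collected.
            raise LookupError(message_id) from None
        errors = []
        value = template.format_map(_RecordingArgs(args, errors))
        return value, errors
```

With a message `welcome` whose value is `"Hello, {name}!"`, calling `ctx.format('welcome', {'name': 'Ada'})` returns the formatted string and an empty error list, while omitting the argument returns a placeholder string and one collected error.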

spookylukey Jun 12, 2018

The biggest remaining TODO on this PR is supporting DATETIME, which means implementing Intl.DateTimeFormat in Python. Unfortunately this looks like a major obstacle. The spec for it is highly involved, and all written assuming a JavaScript context - http://www.ecma-international.org/ecma-402/1.0/#sec-12 . Separating the core of the algorithm from the JavaScript-specific details, and working out which bits I would actually need to implement, is not at all easy.

The best option is likely to start with Intl.js, but it still looks like being more work than everything else in this patch put together.

zbraniecki Jun 12, 2018

Contributor

I don't think it should block. DATETIME is just one of the formatters, and we should aim to get Python's API for CLDR datetime formatting for it. Until then, we can either do a dummy date formatting, or skip it altogether.

It's important to get it in eventually, but it shouldn't block us from introducing the MessageContext API.

zbraniecki Jun 13, 2018

Contributor

@spookylukey what prevents you from using http://babel.pocoo.org/en/latest/dates.html ? It seems like this should work quite well, especially the format argument, which resembles the style option proposed for ECMA-402: https://github.com/tc39/proposal-ecma402-datetime-style

I would recommend not using the pattern/skeleton options, but I think we could start with just implementing the DATETIME function without params, or with the format/style param only.

Later, we could try to add the option bag that finds the matching skeleton/pattern.

wdyt?

stasm Jun 13, 2018

Member

I'd say it's totally fine to not support DATETIME for now. This is already a big PR and a great addition to python-fluent! If you agree to leave it out, please file a new issue about adding the support later on.

spookylukey Jun 14, 2018

@zbraniecki - after a bit of googling the other day I had come across that proposal and had the same idea. I'll implement DATETIME with only the style/dateStyle/timeStyle option(s) for now.

The other options should all be doable with time, and I think python-babel already exports the data needed to implement them and get the outputs to match Intl.DateTimeFormat. It will just take a while to dig through the spec and work out what is actually needed, especially as I'm not expert enough with the JavaScript object model to know the significance of the different ways of constructing objects (e.g. the spec talks about using Object.create and Record etc.)
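
As a rough illustration of the style-only approach, something like the sketch below would do. The helper name `format_datetime_arg`, its signature and the `STYLES` tuple are assumptions made for this example and are not the code in this PR; `babel.dates.format_datetime` and its `format`/`locale` arguments are the real Babel API.

```python
# The four style names are accepted by Babel's `format` argument and also
# appear as dateStyle/timeStyle values in the ECMA-402 proposal.
STYLES = ("full", "long", "medium", "short")


def format_datetime_arg(value, locale, style="medium"):
    """Hypothetical DATETIME helper that supports a single style option."""
    if style not in STYLES:
        raise ValueError("unsupported style: {!r}".format(style))
    # Babel is a third-party package (pip install Babel). It is imported
    # here so that the option check above does not depend on it.
    from babel.dates import format_datetime
    return format_datetime(value, format=style, locale=locale)
```

A call such as `format_datetime_arg(datetime(2018, 6, 14, 9, 30), "en_US", style="short")` would return a locale-formatted string; the exact text depends on the CLDR data shipped with the installed Babel version. Anything outside the four styles is rejected up front, which leaves room to add the full option bag later.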

spookylukey Jun 14, 2018

Regarding docs - we're probably at the point where we need proper docs, rather than just a README. My instinct is to use Sphinx, which I've used lots before and has great support for Python, plus it only takes a few minutes to set up readthedocs.org and get docs built automatically etc. Any objections to me documenting using Sphinx?

spookylukey Jun 17, 2018

OK, I've implemented DATETIME roughly as discussed. This is ready for review now. I understand that might take some time :-) , I'm also likely to be offline for a while.

spookylukey Jul 10, 2018

@stasm Another note about this patch: I haven't focussed on performance that much. However, I have another set of patches which add a performance-focussed MessageContext implementation that compiles FTL to Python, as opposed to the interpreter approach here. That work is nearing completion. It doesn't supersede this PR, however - for various reasons that I'll explain later it will be useful to have both implementations.

stasm Jul 25, 2018

Member

Hey @spookylukey, thanks for your continued work in this PR! I was busy with the updates to the Syntax Spec; now that 0.6 is out, I hope to be able to start reviewing this soon. On a related note, I'll be happy to help update this PR to the 0.6 AST if you run into any troubles.

> Any objections to me documenting using Sphinx?

No objections. We can also add it in a follow-up if you prefer.

spookylukey Jul 25, 2018

@stasm - no problem. I've managed to update to 0.6 spec already, as per the last commit above.

I'm now getting failures on Python 2.7 only. It turns out that Identifier.name is now a bytestring (str) object on Python 2.7. Before, it was a unicode string, and the change is causing the failure (the resolver tries to use unicode strings only).

Before:

>>> FluentParser().parse('foo = { bar }').body[0].value.elements[0].expression.id.name
u'bar'

After 0.6 changes merged:

>>> FluentParser().parse('foo = { bar }').body[0].value.elements[0].expression.id.name
'bar'

On Python 3, it was and still is a unicode string. I could work around this on Python 2.7, but this seems like an unintentional change, and doesn't match other things e.g. MessageReference.name etc.

stasm Jul 27, 2018

Member

Yay, Python 2 and Unicode :)

As far as I can tell, this happens because the parser has an internal iterator over the string it parses. The iterator yields char by char. If the string passed into FluentParser.parse is a bytestring, then the chars yielded are bytestrings as well. For instance, this works on Python 2:

>>> FluentParser().parse(u'foo = { bar }').body[0].value.elements[0].expression.id.name
u'bar'

I see that your tests import unicode_literals, so I'm not quite sure why you're running into this.

spookylukey added some commits Jul 27, 2018

Fixed failing tests on Python 2.7
dedent_ftl was converting unicode strings to bytestrings

spookylukey Jul 27, 2018

@stasm - thanks for the info, I've found it - the dedent_ftl module wasn't using unicode literals, so was converting unicode strings to byte strings.
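
One way this kind of conversion can happen on Python 2 is shown in the sketch below. The real `dedent_ftl` helper isn't shown in this thread, so the function body here is a guess for illustration only. The underlying behaviour is real, though: on Python 2, calling `.format()` on a bytestring literal returns a bytestring even when the argument is a unicode string.

```python
import textwrap


def dedent_ftl(text):
    # Hypothetical body. The u"" prefix keeps the template a text string on
    # Python 2 as well. With a plain "" literal and no
    # `from __future__ import unicode_literals`, Python 2's str.format would
    # return a bytestring even for a unicode argument.
    return textwrap.dedent(u"{}\n".format(text.rstrip()))
```

On Python 3 the prefix is a no-op, so the same source behaves identically on both versions.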
