-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve german formats #22
Comments
or?
|
Hi, thanks for contributing! I like the first variant better. It looks like it's a more structured way to support multiple languages. E.g. it's easy to put a comment to clarify that it's the same quotations splitter pattern but e.g. in German, etc. About special characters. I believe you should be able to put them as it is e.g. the pattern will look like
And I'd also put the rest of the pattern characters in German. Some characters look the same as English ones but once I entered a password in German and later wasn't able to login because the layout changed :) Indeed, it seems like there are some issues with running the tests in MacOS. I'll try to figure it out. We plan to migrate to scikit in the future which should be easier to integrate with. |
Yes me too, ok I added a comment to this one (see #23), but what do you think about |
Hi @obukhov-sergey shall I improve some of this stuff or what are we going to do? I'm just asking because our project (depending on this mailgun feature) is keep going and I should know if there is any hope that this german support feature is going to hit mailgun's production "soon" or if I have to include a dev version of talon into our projects first. Thank you very much and as I already wrote, I'm willing to help! |
I have another proposal for SPLITTER_PATTERNS in #29: re.compile(u'[\s]*[-]+[ ]*({})[ ]*[-]+'.format(
u'|'.join((
# English
'Original Message', 'Reply Message',
# German
u'Ursprüngliche Nachricht', 'Antwort Nachricht',
# Danish
'Oprindelig meddelelse',
))), re.I) |
hi @jeremyschlatter nobody answered me so i gave up, made my own copy of talon without the signature stuff and my own testcases... works fine for me. Though I'm hoping they let us improve the language stuff, but in my opinion this hole locale thing should be done otherwise. Maybe I we'll do another pull request for this, but the changes would be so big I don't think they let me rewrite the hole library. If the owner puts me in the right direction, I'll come back and contribute my stuff like they say. Until then, I have to keep working on my own copy, because this stuff has to work in my project now. |
sry, didn't recognized you're an owner @jeremyschlatter! as I said in the pull request, one test is still broken and needs further investigation and maybe we should rewrite this hole locale stuff, so that we have one file per language or something. |
I agree that there is probably be a better way to do locale stuff, but I'm not sure it needs to be a big change to the library. Grammar differences might be harder, as in the case of |
Hi guys, I just got the link to this repo from your support. It's awesome that this is open source and I would like to help improve the german formats.
I just need some help to get started. E.g. I would like to respect this german Windows Outlook split pattern:
-----Ursprüngliche Nachricht-----
As you can see this includes a special character
ü
which is represented as=C3=BC
(in Base64, I guess). So what do i have to do, something like this inquotations.py
?Is this correct? I'll try to add more cases which do not work atm (I tested a few, which led me to your support :) I'll open a pull request then. Ah and how can I run the tests and should I create tests for each new case? Thanks for your advise.
The text was updated successfully, but these errors were encountered: