Improvements to the language and writing direction detection #250
This code changes the way `:lang` attributes are handled, allowing more flexibility, including a possible script specification, as specified in BCP 47. The direction specification (e.g. `dir="rtl"`) now uses the language code as a default, but allows the script specification to override this when needed. A side effect of this change is that the additional config file is no longer needed.
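The resolution order described above (language code as the default, script subtag as the override) can be sketched roughly as follows. This is a minimal illustration in Python, not the plugin's actual code: the function name `text_direction` and both lookup tables are hypothetical, the tables are deliberately incomplete, and the sketch assumes the script subtag, if present, directly follows the language subtag.

```python
import re

# Hypothetical, deliberately incomplete tables for illustration only.
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}
SCRIPT_DIRECTION = {
    "Arab": "rtl",
    "Hebr": "rtl",
    "Latn": "ltr",
    "Cyrl": "ltr",
}

# A script subtag is exactly four letters.
SCRIPT_RE = re.compile(r"[A-Za-z]{4}")


def text_direction(lang_tag: str) -> str:
    """Return "rtl" or "ltr" for a language tag such as "ar" or "tr-Arab"."""
    subtags = lang_tag.strip().split("-")
    language = subtags[0].lower()

    # Step 1: the language code alone provides the default direction.
    direction = "rtl" if language in RTL_LANGUAGES else "ltr"

    # Step 2: a known script subtag overrides that default.
    if len(subtags) > 1 and SCRIPT_RE.fullmatch(subtags[1]):
        script = subtags[1].title()  # "arab" -> "Arab"
        direction = SCRIPT_DIRECTION.get(script, direction)

    return direction
```

With these assumed tables, `text_direction("tr")` gives `"ltr"` while `text_direction("tr-Arab")` gives `"rtl"`, and a transliterated Arabic text tagged `ar-Latn` comes out as `"ltr"`.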
Hello, and thanks for this extremely useful plugin! I have spent a bit of time trying to see how this works, as I was trying to improve the compatibility of my template with this plugin. In the process, I noticed a couple of issues where I think I could contribute a bit ... So expect a couple more pull requests from me in the near future :-)

First off: I noticed that the code used to determine the

Firstly, there are languages which use more than one writing system (the technical term is "digraphia"). For example, Serbian can be written in either the Latin or the Cyrillic alphabet. Turkish switched its writing system from Arabic to Latin in the 20th century, but there are still many old texts that are written in Arabic script. Kurdish can be written in Arabic, Latin or Cyrillic, etc.

But there is more: if you read a transliteration of a non-Latin text, this is still the same language, but written in a different writing system. To illustrate this, the following are actually two examples that I found in my own wiki:
This also shows the correct way to specify the language in this situation: the script is added as a four-letter code after the ISO 639 language code (and any potential other code, like the country, etc.). At the moment, the language detection does not pass such codes through to the output, so that had to be changed as well. This means that the change makes it possible to specify even very obscure languages (Wikipedia has this beautiful example of "

This means that this change now also makes it possible to specify the region. Remember that just as

I hope you find this change useful, and I will already start looking at some improvements on the semantic markup and CSS ... coming soon ;-)

Best greetings
/sascha
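To make the tag format discussed in this comment concrete, here is a small Python sketch that validates a language tag and passes it through in normalized form. It is an illustration under stated assumptions, not the plugin's implementation: the name `normalize_lang_tag` is hypothetical, and the pattern covers only the common `language[-Script][-REGION]` subset of BCP 47 (no extended language subtags, variants, extensions or private-use subtags). The casing follows the BCP 47 convention: lowercase language, title-case script, uppercase region.

```python
import re
from typing import Optional

# Simplified subset of BCP 47: language, optional script, optional region.
TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def normalize_lang_tag(tag: str) -> Optional[str]:
    """Return the tag in conventional casing, or None if it does not match."""
    match = TAG_RE.fullmatch(tag.strip())
    if match is None:
        return None

    parts = [match.group("language").lower()]
    if match.group("script"):
        parts.append(match.group("script").title())  # "cyrl" -> "Cyrl"
    if match.group("region"):
        parts.append(match.group("region").upper())  # "iq" -> "IQ"
    return "-".join(parts)
```

A caller could use the returned string directly as the value of the `lang` attribute and fall back to omitting the attribute when the function returns `None`.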
Tested, looks good!
Thanks, @5shekel, this is indeed a good use case, as your site mixes RTL and LTR. May I interest you in trying whether my own Ad-Hoc Tags plugin would be an alternative for you? It gives you more flexibility as it also supports the
Could it be that people use the config file to configure other combinations, so that merging would cause backward-compatibility issues? If you don't expect that, I will merge.
Hm, in principle that would be possible, but I think the overhead would not be worth the benefits. I reckon that the built-in lists of languages and scripts now cover most cases, and with the option to override the script this should be pretty much complete. I should add that I have moved on and made my own plugin, which implements (and extends) the attribute handling and other aspects. If there is interest, I am happy to backport some more features here. :-)