1. Summary
It would be nice if it were possible to declare natural languages in HTML automatically.
I couldn’t find a tool in any programming language that does this.
2. Example of desired behavior
For a Russian article.
2.1. Input
2.2. Output
3. Argumentation
3.1. W3C
From the official World Wide Web Consortium site:
"Always use a language attribute on the html tag to declare the default language of the text in the page. When the page contains content in another language, add a language attribute to an element surrounding that content."
3.2. Automation
It would be nice to have automation wherever possible. The W3C validator flags an html tag without a lang attribute, so we have to set the attribute manually each time, which wastes our time.
All the more so, manually setting the lang attribute in multilingual texts takes a lot of time.
4. Dependencies
To solve this problem we need a dependency for natural language detection. For example, the Franc Node.js library is intended for this and has a simple syntax.
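A minimal sketch of how such a dependency could be wired in. Franc reports ISO 639-3 codes (for example rus), while the lang attribute expects BCP 47 tags (for example ru), so a small mapping layer would be needed. The names `ISO3_TO_BCP47`, `scriptDetector` and `toLangTag` are hypothetical, and the stand-in detector only looks at the script of the letters so that the example runs without any dependency; it is not how Franc works.

```javascript
// Hypothetical mapping from ISO 639-3 codes (the kind Franc returns) to BCP 47 tags.
const ISO3_TO_BCP47 = new Map([
  ["eng", "en"],
  ["rus", "ru"],
  ["ukr", "uk"],
]);

// Stand-in detector (an assumption for this sketch): counts Cyrillic vs Latin letters.
// A real plugin would call a detection library here instead, e.g. franc(text).
function scriptDetector(text) {
  const cyrillic = (text.match(/[\u0400-\u04FF]/g) || []).length;
  const latin = (text.match(/[A-Za-z]/g) || []).length;
  if (cyrillic === 0 && latin === 0) return "und"; // "und" = undetermined
  return cyrillic > latin ? "rus" : "eng";
}

// Returns a BCP 47 tag for the text, or null when the code is undetermined or unmapped.
function toLangTag(text, detect = scriptDetector) {
  const tag = ISO3_TO_BCP47.get(detect(text));
  return tag === undefined ? null : tag;
}

console.log(toLangTag("Привет, мир"));  // Cyrillic letters dominate
console.log(toLangTag("Hello, world")); // Latin letters dominate
```

Keeping the detector as a parameter means the same mapping could be reused with Franc, cld2 bindings, or any other library.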
5. Possible problems
5.1. Element content and attribute values in different languages
See an example from the official W3C site:
✖ Bad code. Don’t copy!
Valid code:
posthtml-declaring-language should detect bad code like the one in the example.
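One way such a check might look, as a hedged sketch only. A lang attribute applies to the element’s attribute values as well as to its text, so the plugin could compare the language of text-bearing attributes such as title with the language of the element’s content. The node shape `{ tag, attrs, content }` follows the PostHTML tree format; `TEXT_ATTRS`, `guessLang`, `textOf` and `mismatchedAttrs` are hypothetical names, and the detector is a script-based stand-in.

```javascript
// Attributes whose values are human-readable text and so share the element's lang.
const TEXT_ATTRS = ["title", "alt", "placeholder", "aria-label"];

// Stand-in detector (assumption): guesses by script only; a real plugin would call a library.
function guessLang(text) {
  if (/[\u0400-\u04FF]/.test(text)) return "ru";
  if (/[A-Za-z]/.test(text)) return "en";
  return null;
}

// Collects the plain text of a PostHTML-style node: a string or { tag, attrs, content }.
function textOf(node) {
  if (typeof node === "string") return node;
  if (!node || !Array.isArray(node.content)) return "";
  return node.content.map(textOf).join("");
}

// Returns the names of text attributes whose language differs from the element's text.
function mismatchedAttrs(node, detect = guessLang) {
  const contentLang = detect(textOf(node));
  if (!contentLang || !node.attrs) return [];
  return TEXT_ATTRS.filter((name) => {
    const value = node.attrs[name];
    if (typeof value !== "string") return false;
    const attrLang = detect(value);
    return attrLang !== null && attrLang !== contentLang;
  });
}

// Hypothetical element: Russian link text with an English title under one lang="ru".
const link = { tag: "a", attrs: { lang: "ru", title: "Russian", href: "page.ru" }, content: ["Русский"] };
console.log(mismatchedAttrs(link)); // lists "title" as the conflicting attribute
```

A plugin could report such elements as warnings rather than rewrite them, since the fix usually means restructuring the markup.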
5.2. Incorrect language detection
5.2.1. Detection quality
I haven’t tested Franc or any other Node.js tools, but I use the Python library cld2-cffi for natural language detection in real books and I’m getting good results. For example, see my issue and reply in another repository: cld2-cffi identifies the natural language well for physics and chemistry books.
5.2.2. Limiting the number of languages
In my case, in the vast majority of situations I need lang="en" or lang="ru". Tools such as Franc and cld2-cffi may have difficulty determining whether a short text between tags is Russian or Ukrainian. But they shouldn’t have a problem distinguishing Russian from English.
It would be nice to have a languages option in the posthtml-declaring-language plugin. If the value of this option is en, ru, the plugin will automatically add lang="en" and lang="ru" to tags whose text it has identified, with a high degree of probability, as English or Russian. If we need lang="uk", we have to add it to our HTML markup manually in this case.
Thanks.
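To make the proposed languages option more concrete, here is a rough sketch under stated assumptions. `pickLang` and `declareLanguages` are hypothetical names, `minScore` is an invented threshold option, and `detectAll` stands for any function that returns ranked `[tag, score]` pairs with scores between 0 and 1, similar in spirit to the ranked list Franc can produce. The tree is a PostHTML-style array of strings and `{ tag, attrs, content }` nodes. Manually written lang attributes, such as lang="uk", are left untouched.

```javascript
// Picks the best candidate that is both allowed and confident enough, else null.
// `candidates` is assumed to be sorted best first.
function pickLang(candidates, { languages, minScore = 0.9 }) {
  const allowed = new Set(languages);
  for (const [tag, score] of candidates) {
    if (score < minScore) break; // everything after this is even less likely
    if (allowed.has(tag)) return tag;
  }
  return null;
}

// Adds lang to elements whose own text is detected as an allowed language.
// `inherited` tracks the language already declared by an ancestor, so the
// attribute is only written where the language actually changes.
function declareLanguages(tree, detectAll, options, inherited = null) {
  for (const node of tree) {
    if (typeof node === "string" || !node) continue;
    const children = Array.isArray(node.content) ? node.content : [];
    let current = inherited;
    if (node.attrs && typeof node.attrs.lang === "string") {
      current = node.attrs.lang; // a manual declaration always wins
    } else {
      const text = children.filter((c) => typeof c === "string").join(" ");
      const lang = text.trim() ? pickLang(detectAll(text), options) : null;
      if (lang && lang !== inherited) {
        node.attrs = { ...node.attrs, lang };
        current = lang;
      }
    }
    declareLanguages(children, detectAll, options, current);
  }
  return tree;
}
```

With languages set to en and ru, a short Ukrainian fragment could still be tagged as Russian if the detector ranks Russian highly, which is why the sketch never overwrites a lang attribute that is already present. Franc also has its own option for restricting the candidate languages, which may be a simpler place to apply the same list.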