Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix persian characters on chrome and safari #3707

Merged
merged 10 commits into from Jan 4, 2017
Merged

fix persian characters on chrome and safari #3707

merged 10 commits into from Jan 4, 2017

Conversation

miladkdz
Copy link
Contributor

@miladkdz miladkdz commented Jan 1, 2017

To achieve this purpose, we created a mapping for characters in fix-string.js.

In order to display the names correctly, a temporary tag is created named real_name which is terminated while putting changesets to the server.

@pnorman
Copy link
Contributor

pnorman commented Jan 1, 2017

There's no comments, text in this PR, or issue links which describe what the current problem is that this is supposed to fix.

@bhousel
Copy link
Member

bhousel commented Jan 2, 2017

This is for #3491

@miladkdz, Thank you!! I took a quick look and have a few questions...

  • Is there a reason that the code only looks for highway name tags?
  • I'm not opposed to including the replacement characters in fix-string directly into iD, but is there any existing JavaScript library that we could import that does this for us?
  • You won't want to create a tag called real_name for fear of collisions. Really it's best to assume that OSM can use "any tags they like". We can work around this by adding a transient graph property for the "display tags" to the osmEntity class (like how we compute and cache object extents), I think it probably makes sense to just transliterate (is that the right word?) all the tags, since that's a small dataset and they all get displayed in the sidebar. update: what @mapmeld suggested is better
  • should use utilDetect() instead of inspecting navigator.UserAgent directly (minor nitpick)
  • It probably belongs in the util module and should have a better name than fix-string (minor nitpick)

@mapmeld can you take a quick pass at this and let me know if it seems like a good approach? I'm totally happy to finish up the changes to the code listed above.

@mapmeld
Copy link
Contributor

mapmeld commented Jan 2, 2017

  1. We should apply this to any line because on points and (maybe?) polygons, the right-to-left text is already displaying correctly. The issue is when Webkit-based browsers are placing text along an SVG textPath.
  2. Renaming fix-string and fixTextForSvg would go a long way to making it clear
  3. There should be a regular expression to run this function only when it contains characters in Arabic Unicode block... otherwise we are using it on every character in every label
  4. There should be a test for how this works with bi-directional text or numbers (equivalent of: '38th Street')

@miladkdz
Copy link
Contributor Author

miladkdz commented Jan 2, 2017

Hi @bhousel,

  • The reason I checked only for highway tags is that the SVG rendering bug happens only on elements. Other elements were fine. If there are any other tags that render as , we should look for them too.
  • I didn't find any libraries that would help us here. Of course there are many more characters that we should look for but I settled for Persian standard characters only as my Arabic knowledge is very limited and I have no Idea where to look. But I believe that there are not more than 20 characters that should be added to the file. Maybe an Arab speaking fellow can help us here?
  • I will look more into this. I didn't like the approach either but it was fast.
  • Sure. Will replace.
  • I will move it and discuss the naming further with you and @mapmeld.

Many thanks for your complete review.

…le name.

User regex to detect arabic characters.

Fix editing bugs.
@miladkdz
Copy link
Contributor Author

miladkdz commented Jan 2, 2017

@bhousel
I pushed another commit fixing most of the things you listed. Though the approach for replacing tags(real_name) has not changed yet. I will do it soon. In the meanwhile, I would appreciate if you guide me a little bit more. Maybe a document or another issue that used the approach?

@mapmeld
Copy link
Contributor

mapmeld commented Jan 3, 2017

Is it possible to change the flip to happen in the code where the textPath is being created, rather than changing the name of the object?

Here are some issues which I saw in Tunisia (looking for Arabic language and bidi compatibility)

  • (RN 3 طو) at coordinates 18.49/36.53602/9.99751 -- Arabic letters remain LTR
  • (Avenue Habib Bourguiba شارع الحبيب بورقيبة) at 18.42/36.79970/10.18295 -- Arabic and Latin letters are flipped, including the 'name' shown in the editor is "ةﺒيﻗﺭﻮﺑ ﺐيﺒﺤﻟا ﻉﺭﺎﺷ abiugruoB bibaH eunevA" -- "iin" shows both "ii" and "n" in word-ending forms
  • I am not able to edit the name property
  • These two should be written 'شارع الحرية' (iiyah at left end) and 'نهج الباكستان' (disconnected k in middle)

screen shot 2017-01-03 at 5 29 36 pm

@miladkdz
Copy link
Contributor Author

miladkdz commented Jan 3, 2017

Fixed kaf.

@miladkdz
Copy link
Contributor Author

miladkdz commented Jan 3, 2017

The only problem now is when the name parameter has both persian/arabic and other latin characters.

@mapmeld
Copy link
Contributor

mapmeld commented Jan 3, 2017

I think I have a PR for @miladkdz which fixes latin + Arabic character issues.

Main concerns for me at this point would be:

  • verifying that we have all Arabic letters and their forms right (especially since the LTR/RTL system will have trouble with missing letters)
  • including short vowel signs which go above or below the letter (a i and u)

delete changes.deleted[i].tags.real_name;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this change? I don't think so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. I will remove them.

@mapmeld
Copy link
Contributor

mapmeld commented Jan 3, 2017

@pnorman @bhousel here's my write-up (mostly for non-OSM / non-i18n people) on what this problem is and what this solution is

https://gist.github.com/mapmeld/556b09ddec07a2044c76e1ef45f01c60

@jfirebaugh
Copy link
Member

@mapmeld Awesome writeup, thank you! @miladkdz Thanks for the PR!

I've been lurking here but I want to jump in and note the overlap in this problem to one we've been working on over in Mapbox GL, and cc @ChrisLoer, who has been working on it over there. In particular, the current proposal to improve Arabic text rendering in Mapbox GL JS uses the same approach here, of replacing normal Arabic code points with the appropriate presentation forms. It does so using ICU, which has a robust algorithm for this, compiled to JavaScript via emscripten.

The drawback to this approach is that the compiling the ICU C and C++ code to JavaScript has substantial size overhead. Maybe we could combine forces here and port the relevant ICU code to native and idiomatic JavaScript in a standalone library. That would address the issue of comprehensiveness that @mapmeld brings up, and the size issue that we're facing in GL JS.

digest combined LTR and RTL words
@miladkdz
Copy link
Contributor Author

miladkdz commented Jan 3, 2017

@mapmeld Thanks a lot for your quick fix. 👍

@bhousel
Copy link
Member

bhousel commented Jan 3, 2017

Thanks @mapmeld for the writeup and @miladkdz for the fixes too!

In order to move this forward I'll try to merge the code to master later today. Then people try it out in their browser to see if there are any follow-on tasks per #3707 (comment). Sound good?

@miladkdz
Copy link
Contributor Author

miladkdz commented Jan 3, 2017

@bhousel Perfect.
@mapmeld Thanks for your efforts on this :-)

@ChrisLoer
Copy link

@mapmeld @miladkdz This is awesome! I don't have a lot to add but here are some of the things ICU gives us that may or may not be necessary for iD:

  • Full support for "Tashkeel", aka Arabic Diacritics. This adds a lot of extra characters and is probably something you can get away without... but they do show up. "Yemen" for instance is (often?) written "اليَمَن"
  • ICU BiDi supports explicit order markers which are present in some OSM labels
  • ICU BiDi supports line breaking bidirectional text, which is pretty complicated and probably not necessary for this specific use case

@bhousel bhousel merged commit 3ef1103 into openstreetmap:master Jan 4, 2017
@bhousel
Copy link
Member

bhousel commented Jan 4, 2017

Thanks again, I just merged this! I've tested on Chrome and I do see the difference. Haven't tested with Safari / Edge / IE11 (will leave it up to those fluent in Arabic to test).

We have a mirror of the current master code here that can be used to test the Arabic linestring labels: https://openstreetmap.us/iD/master
Please open new follow-on issues if anything looks off..

I did refactor this into 2 functions: utilDisplayName and utilDisplayNameForPath - because utilDisplayName gets called many places, and for most of them we don't want the reversal code.

This also allowed me to remove the "highway" and "railway" conditionals, because I think the purpose of these was to perform the reversals only on the linestrings. We also label waterways and aeroways with this code, and some highway and railway features may be point or vertex labeled, and we might eventually label other linestrings in the future (like boundaries #2414) so it's better to not hardcode these tags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants