Reversed Arabic Numbers/Words #5426

Open
apollolm opened this issue May 8, 2017 · 31 comments

@apollolm

apollolm commented May 8, 2017

Hi there. We recently upgraded from MapServer 6.2.1 to 7.0.4.

One issue we're seeing is that Arabic road labels that contain a number and a word, like طريق 30, are being displayed with the 30 at the end of the phrase. (So Road 30 instead of 30 Road.)

Correct display in 6.2.1:
[Screenshot: 2017-05-04, 9:32:05 AM]

After upgrade to 7.0.4:
[Screenshot: 2017-05-04, 9:31:58 AM]

The '30' should appear before the word.

We are using fribidi_0.19.6 and harfbuzz_1.4.1 (both in our new and previous versions of MapServer). This is happening on both a Windows and Ubuntu 16.04 environment, and seems to only affect the ordering of labels when a number is present in an Arabic label.

One thing that changed in our installation had to do with a freetype/harfbuzz circular dependency:
I had to compile and install FreeType without HarfBuzz, then install HarfBuzz, and then compile and install FreeType with HarfBuzz on top of the existing FreeType. I am pretty sure that the second FreeType install overwrote the libraries from the first one.

Here are the options we used when compiling:
-DWITH_GDAL=1 -DWITH_HARFBUZZ=1 -DWITH_THREAD_SAFETY=1 -DWITH_JAVA=1 -DWITH_CAIRO=0 -DWITH_GEOS=0 -DWITH_POSTGIS=0 -DWITH_RSVG=0 -DWITH_CLIENT_WMS=0 -DWITH_CLIENT_WFS=0 -DWITH_WFS=0 -DWITH_LIBXML2=0 -DWITH_KML=0 -DWITH_GIF=0 -DWITH_EXEMPI=0 -DWITH_FCGI=0

The input data for both versions is identical - the only thing I can tell that is different is the version of MapServer.

Is this expected or seen by anyone else?

Any thoughts?

@apollolm
Author

apollolm commented May 8, 2017

...oh - and the shapefile/input data in this case is already in the correct order when it gets to MapServer: طريق 30
My hunch is that the libraries are doing what they're supposed to be doing, but I'd like an explanation of why this was being displayed correctly (or at least the way we wanted it) before, and why it's changed now.

@geographika
Member

In version 7.0 there was a big overhaul of labelling. RFC98 mentions Arabic - see http://mapserver.org/development/rfc/ms-rfc-98.html#text-rendering-pipeline

Related code changes are at #4673

@apollolm
Author

Thanks for the link. I looked at that before, but it's still unclear to me whether or not what I'm seeing here is expected behavior. If I open the .shp in QGIS or ArcMap and turn on labels, I get the desired/expected behavior:
[Screenshot: 2017-05-15, 11:58:10 AM]

However, in MapServer, I'm still getting the reversed ordering of text/numbers.

I'm attaching the subset of roads (a zipped .shp) from the screenshot above that contains the طريق 30 road name.
kuwait_major_roads.zip

Thanks again.

@jmckenna
Member

Can you include a full working sample in your zip (ttf font file, mapfile + layer including ENCODING parameter)?

@jmckenna
Member

By the way, if I look at your DBF contents in any text editor, the record appears as '[arabic text], 30'. The same goes for QGIS and its labels.

@jmckenna
Member

Maybe it is just your configuration in the mapfile layer. In any case, this discussion would get many more eyes if you posted it to the mapserver-users mailing list first (then, if they decide it is a bug, post here). However, please update your zip here to a full working sample, and I'll take a look.

@jmckenna
Member

Wait, writing this made me realize what I bet is wrong in your mapfile layer: in version 7 the ENCODING parameter was moved from the LABEL object to the LAYER object. See examples at http://mapserver.org/mapfile/encoding.html

Again, this discussion should be happening on the mapserver-users list, not here. But, anyway, please make that change locally.

@tchaddad

tchaddad commented May 16, 2017

It does sound like a bug.

The Mapserver 7 example is showing that the characters within a word are rendered right to left, but the words themselves are rendered left to right. The shapefile looks correct in QGIS, and Mapserver is flipping the word order.

I wonder if this is because the column contains both Arabic and English labels in a single field? Perhaps Mapserver is applying the word ordering for an entire field based on the first record it encounters?

Or perhaps Mapserver 7 is being too smart - instead of rendering the data as it appears in the file, it is detecting the arabic, and reversing the word order on the assumption that it should be right to left, when it already is...

@tchaddad

Also, just to make a small correction to the original description:

The '30' should appear before the word.

According to the screen shot, this is what Mapserver 7 is doing: it is putting the '30' before the word 'Road', as read in Arabic (right to left). However that is incorrect because the desired rendering is 'Road 30', which is the correct Arabic rendering shown in the Mapserver 6.x screenshot.

@jmckenna
Member

Thanks for (not) providing a full working sample 🥇 ha. I spent too much effort downloading Arabic fonts from the web, mostly bad ones, until I finally found one that worked. Please see the following for a full working sample, including font, layer, data, etc.:

ticket-5426-arabic-labels.zip

Here is the result with MapServer 7.0.4:

[Image: map]

If you open up the file 'kuwait_major_roads.cpg' you will notice that the encoding of the shapefile is in UTF-8. As @geographika pointed out correctly earlier, with MapServer 7 if the label text is not UTF-8 then the fribidi library converts to UTF-8. In this case, I believe nothing happens because the data is already in UTF-8 encoding.
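The `.cpg` file is just a short text file naming the DBF's code page, so it is easy to check what a reader should decode the strings with. A hypothetical Python sketch (`read_cpg` and its ISO-8859-1 fallback are assumptions for illustration, not how MapServer or GDAL actually resolve encodings):

```python
import codecs
import os
import tempfile

def read_cpg(shp_path: str, default: str = "iso-8859-1") -> str:
    """Return the normalised codec name from the .cpg next to a .shp.

    Falls back to `default` (an assumption for this sketch) when there is no .cpg.
    """
    cpg_path = os.path.splitext(shp_path)[0] + ".cpg"
    try:
        with open(cpg_path, encoding="ascii") as f:
            name = f.read().strip()
    except FileNotFoundError:
        name = default
    return codecs.lookup(name).name  # normalises spellings, e.g. "UTF-8" -> "utf-8"

# Demo in a throwaway directory standing in for the sample data.
with tempfile.TemporaryDirectory() as tmp:
    shp = os.path.join(tmp, "kuwait_major_roads.shp")
    with open(os.path.join(tmp, "kuwait_major_roads.cpg"), "w", encoding="ascii") as f:
        f.write("UTF-8")
    encoding = read_cpg(shp)
    missing = read_cpg(os.path.join(tmp, "no_such_layer.shp"))  # no .cpg -> fallback

raw_field = b"\xd8\xb7\xd8\xb1\xd9\x8a\xd9\x82 30"  # UTF-8 bytes of the label
label = raw_field.decode(encoding)  # already UTF-8, so no conversion is needed
print(encoding)  # utf-8
```

A `.cpg` containing a name Python does not recognise would make `codecs.lookup` raise `LookupError`; a real reader would need its own mapping for such values.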

Download the working sample locally, run the mapfile through shp2img, and provide your feedback here :) thanks all. -jeff

@jmckenna
Member

OK, some good news: I installed an old MS4W containing MapServer 6.2.1. I modified that same mapfile layer from my sample zip, but moved the ENCODING from the LAYER object to inside the LABEL object:

LAYER
  NAME "kuwait-roads"
  TYPE LINE
  STATUS ON
  DATA "kuwait_major_roads.shp"
  #ENCODING "UTF-8"    ###from here
  LABELITEM "NAME"
  CLASS
    NAME "Roads"
    STYLE
      COLOR 200 0 0
      WIDTH 2
    END
    LABEL
      ENCODING "UTF-8"    ###to here
      COLOR 0 0 0
      FONT arabic
      TYPE truetype
      SIZE 8
      POSITION AUTO
      PARTIALS FALSE
      OUTLINECOLOR 255 255 255
      MINFEATURESIZE 2
      MINDISTANCE 10
      BUFFER 5
      #FORCE TRUE
    END
  END
END # Layer

[Image: ttt-utf8]

OK, now I've done too much testing ha. Will let others test locally and give feedback. I believe, as I said previously, that the changes in MapServer 7 mean that, since the source strings (DBF) are already in UTF-8, fribidi is never called to convert them. Thanks all, -jeff

@tchaddad

I get the same results, Jeff. All I can add is that yesterday, when I commented, I was on a Mac and the data rendered as expected in QGIS. Today, on a Windows machine, QGIS (from OSGeo4W) is not able to render the labels even after I installed the font Jeff provided. That is a different problem, though.

MapServer 7 via MS4W has no problem rendering the labels, but the word order is flipped regardless of whether or where ENCODING is specified in the mapfile. The incorrect output looks like Jeff's first example above.

@jmckenna
Member

I get exactly the same results as above on Ubuntu with today's master compiled from source, with fribidi and HarfBuzz support (the Arabic labels appear correct, but as '[arabic text], 30', using that test font and mapfile).

@apollolm
Author

apollolm commented May 18, 2017

Thanks for all of the digging into this issue.

I downloaded your sample (thanks!) and ran it in a clean MapServer 7.0.4 MS4W environment, and got the same output as you show above - the 30 appears to the right of the word, whereas I'm expecting the 30 to appear on the left side of the word.

I've attached a new .zip that includes a watered down version of my .map and font files. shp2img was run using this sample in 7.0.4. The output I get is consistent with my previous images:

[Image: test]
...the 30 is to the right of the word rather than the left as in 6.2.1.

I tried this both with ENCODING "UTF-8" in the LABEL block and without. Same result.

Let me know if I can provide any more information.

normal.zip

@apollolm
Author

Do you think it is still worth posting this issue to the mapserver-users mailing list?

@jmckenna
Member

If you still have this issue with MapServer 7.4.1, please do bring the issue to the attention of the mapserver-users mailing list. At least now there is a sample package with mapfile & fonts for everyone to test locally and provide feedback.

@jmckenna
Member

Reopening. The problem still exists today on Ubuntu with MapServer 7.4.1, fribidi-1.0.5, harfbuzz-2.4.0 (all compiled from source).

@jmckenna jmckenna reopened this Jul 19, 2019
@jmckenna
Member

Bringing this issue to the MapServer-users mailing list now, to hopefully get more eyes on it....

@jmckenna
Member

Brought to the mapserver-users list (https://lists.osgeo.org/pipermail/mapserver-users/2019-July/081273.html).

@jmckenna
Member

I should also mention that I am using freetype-2.9.1

@jmckenna
Member

Interestingly, QGIS 3.8.0 now displays the labels correctly.
[Image: QGIS 3.8.0]

@tchaddad

The glyphs render in the correct RTL order within words, and the words render in the correct RTL direction within a label. Somehow only the numerals are moved.

I tested this by editing the shapefile to include the full multi-word arabic name of the 4th Ring Road, and I added a fake number at the end (as wanted by the OP):

[Image]

kuwait_major_roads_edited123.zip

QGIS renders the multi-word label with number as it appears in the attribute table of the shapefile (number at the end of the words RTL).

Mapserver renders the individual words correctly, and in the correct RTL order, but moves the number from one end of the label to the other end.

++++++++++++++++

Result in QGIS 2.x:
[Image]

Result in Mapserver (MS4W 4):
(Note: added ANGLE AUTO to the Label to make it easier to see)
[Image]

@jmckenna
Member

Great testing @tchaddad

I have also opened the DBF in LibreOffice Calc (don't "try this at home" ha) and confirm that the stored text is [number]-[text] as you found; but for some reason MapServer >=7 reverses this order and places the numbers after.

I've now tried this from source on several Ubuntu machines and several Windows machines. Every way I compile MapServer 7, I get this problem.

@jmckenna
Member

Also confirmed Tanya's tests, on Ubuntu
[Image: test8]

@jmckenna
Member

(the direct translation of "طريق 30" is "Route 30")

From what I read, it is the Bidi algorithm that handles numbers with text properly.

"BIDI is needed for numbers, while arabic text flows from right to left numbers flow from left to right like in latin languages, so BIDI is required even in unilingual texts."
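The quoted point is visible directly in the Unicode character data: Arabic letters are strong right-to-left characters, while digits are a "weak" type whose placement the bidi algorithm has to resolve. A small Python sketch using only the standard `unicodedata` module (`bidi_classes` is a hypothetical helper name):

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# "طريق 30" in logical (storage) order, written with escapes for clarity.
label = "\u0637\u0631\u064a\u0642 30"
for ch, cls in bidi_classes(label):
    print(f"U+{ord(ch):04X} {cls}")
# Arabic letters are AL (strong right-to-left), the space is WS (neutral)
# and the digits are EN (European Number, a weak type with no fixed side).
```

Because the digits carry no strong direction of their own, which side of the Arabic word they end up on is decided by the bidi rules for weak types, not by their position in the string.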

Somewhere, MapServer 7 is not handling it properly. Hmm...

@rouault
Contributor

rouault commented Jul 19, 2019

the stored text is [number]-[text]

That's not actually true. If you look at the DBF content with a hexadecimal editor, the stored content is {arabic_in_utf8_probably_with_right_most_glyph_first} 30
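The logical order can be checked without any bidi-aware viewer by encoding the label to UTF-8 and dumping the bytes. A small Python sketch, assuming the DBF field holds exactly the string quoted in this thread:

```python
label = "\u0637\u0631\u064a\u0642 30"  # "طريق 30": TAH, REH, YEH, QAF, space, 3, 0
raw = label.encode("utf-8")
print(raw.hex(" "))  # d8 b7 d8 b1 d9 8a d9 82 20 33 30
# d8 b7 is TAH, the first letter in logical order (drawn right-most by a
# bidi-aware renderer); the digits 0x33 0x30 come last in storage.
```

This is the same order a hex editor shows for the DBF record: Arabic letters first, digits last.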

On Linux, I see the same order with ogrinfo in a terminal that is probably not bidi-aware and probably displays the Arabic in logical order (thus, incorrectly, left-to-right), as found in the DBF.
Interestingly, copying that from the terminal to here in Firefox gives the following (30 just after the = sign, then the Arabic in right-to-left order):

  NAME (String) = طريق 30

So the behaviour of MapServer is consistent with RFC 98 ( http://mapserver.org/development/rfc/ms-rfc-98.html#text-rendering-pipeline ), that is the arabic glyphs are rendered right-to-left, and at the right of them 30 is displayed in left-to-right.

But... interestingly, if instead of 30 I put ASCII letters like ab, it is displayed the same way in my non-bidi-aware terminal and here in Firefox (Arabic, then ab):

ط ab

And even more interestingly, if my non-bidi-aware terminal shows {arabic}30ab, pasted here it becomes 30{arabic}ab:

ط 30ab

But if my non-bidi-aware terminal shows {arabic}ab30, pasted here it stays {arabic}ab30:

ط ab30

So there seems to be a difference in how the Firefox renderer handles digit and non-digit ASCII characters that immediately follow Arabic glyphs.

Actually, the "immediately" is a bit more subtle than that... If my non-bidi-aware terminal shows {arabic}{space}{comma}30.12ab, pasted here it becomes 30.12{comma}{space}{arabic}ab:

ط  ,30.12ab

I suspect this might be an exception case where numbers should be displayed in left-to-right order, but placed visually to the left of the Arabic glyphs when, in the binary encoding, they come after them. (Aren't the digits we use in Western languages Arabic numerals, after all...?)
Take the above with a grain of salt: I'm not an Arabic practitioner, especially not in a computing context...
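These Firefox observations line up with the Unicode Bidirectional Algorithm (UAX #9): rule W2 turns a European number (EN) whose nearest preceding strong character is an Arabic letter (AL) into an Arabic number (AN), and the AN is then laid out as part of the surrounding right-to-left run (its own digits still reading left to right), so a "30" stored after the Arabic word is drawn on its left. Below is a heavily simplified Python sketch for illustration only. It covers rules W1-W7, N1/N2, I1/I2 and L2, but no explicit embeddings, isolates, bracket pairs or line-level whitespace handling. It is not fribidi's or MapServer's code, and `resolve_levels` and `reorder` are hypothetical names:

```python
import unicodedata

def resolve_levels(text: str, base_level: int) -> list[int]:
    """Resolve embedding levels for one paragraph (simplified UAX #9).

    Implements W1-W7, N1/N2 and I1/I2 only: no explicit embeddings,
    overrides or isolates, no bracket pairs (N0), no L1 whitespace reset.
    """
    sos = "R" if base_level % 2 else "L"  # also used as eos: no embeddings
    types = [unicodedata.bidirectional(ch) or "L" for ch in text]
    n = len(types)

    for i in range(n):  # W1: NSM takes the type of the preceding character
        if types[i] == "NSM":
            types[i] = types[i - 1] if i else sos

    last = sos  # W2: EN whose last preceding strong type is AL becomes AN
    for i in range(n):
        if types[i] in ("L", "R", "AL"):
            last = types[i]
        elif types[i] == "EN" and last == "AL":
            types[i] = "AN"

    types = ["R" if t == "AL" else t for t in types]  # W3: AL -> R

    for i in range(1, n - 1):  # W4: single separator between two numbers
        prev, cur, nxt = types[i - 1], types[i], types[i + 1]
        if cur in ("ES", "CS") and prev == nxt == "EN":
            types[i] = "EN"
        elif cur == "CS" and prev == nxt == "AN":
            types[i] = "AN"

    i = 0
    while i < n:  # W5: a sequence of ETs adjacent to an EN becomes EN
        j = i
        while j < n and types[j] == "ET":
            j += 1
        if j > i and ((i > 0 and types[i - 1] == "EN") or (j < n and types[j] == "EN")):
            types[i:j] = ["EN"] * (j - i)
        i = max(j, i + 1)

    types = ["ON" if t in ("ES", "ET", "CS") else t for t in types]  # W6

    last = sos  # W7: EN whose last preceding strong type is L becomes L
    for i in range(n):
        if types[i] in ("L", "R"):
            last = types[i]
        elif types[i] == "EN" and last == "L":
            types[i] = "L"

    strong = ("L", "R", "EN", "AN")
    i = 0
    while i < n:  # N1/N2: runs of neutrals take the surrounding direction
        if types[i] in strong:
            i += 1
            continue
        j = i
        while j < n and types[j] not in strong:
            j += 1
        # EN and AN count as R here; sos/eos stand in at the paragraph edges.
        before = sos if i == 0 else ("L" if types[i - 1] == "L" else "R")
        after = sos if j == n else ("L" if types[j] == "L" else "R")
        types[i:j] = [before if before == after else sos] * (j - i)  # N2 fallback
        i = j

    if base_level % 2 == 0:  # I1: even base level
        bump = {"R": 1, "AN": 2, "EN": 2}
    else:  # I2: odd base level
        bump = {"L": 1, "EN": 1, "AN": 1}
    return [base_level + bump.get(t, 0) for t in types]


def reorder(text: str, base_level: int = 1) -> str:
    """Return `text` in visual order (left to right on screen), rule L2."""
    if not text:
        return text
    chars = list(text)
    levels = resolve_levels(text, base_level)
    lowest_odd = min(levels) | 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]  # every position in [i, j) is >= level
            i = j
    return "".join(chars)


visual = reorder("\u0637\u0631\u064a\u0642 30")  # "طريق 30", RTL paragraph
print(ascii(visual))  # '30 \u0642\u064a\u0631\u0637': the digits end up on the left
```

Fed the strings from this comment (with an LTR base level, as in a web page), the sketch reproduces the Firefox results: `reorder("ط 30ab", 0)` puts 30 first, `reorder("ط ab30", 0)` leaves the order unchanged because the last strong character before the digits is the Latin b (so W7 applies instead of W2), and `reorder("ط ,30.12ab", 0)` gives "30.12, طab".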

I've not looked at the code to check if it is a Fribidi issue or a MapServer implementation issue: I'd suspect a MapServer one since RFC98 mentions shortcuts for Latin glyphs.

@jmckenna
Member

Thanks @rouault, that explains why my early tests above, using an old simple 'dbf editor', differ from my recent LibreOffice tests, which must be bidi-aware.

The magic seems to be happening in textlayout.c. Hmmm...

@tbonfort
Member

Thank you @rouault for bringing back a bit of rationality to this thread.
MapServer expects text fields to be stored in logical order, which can only really be guaranteed if the text was originally entered with a bidi-enabled editor. So in your examples, if the text is stored as "arabic 30" it should be rendered as "30 cibara", i.e. with the correct RTL rendering applied.
What's happening internally is that text is split into bidi runs, where some characters specifically switch to LTR, some to RTL, and others don't modify the current direction (namely digits, punctuation, etc.).
To return to the original issue, if the text is stored as "30 arabic" it is and should be rendered as "cibara 30".
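To make "runs" concrete, a rough Python sketch that only groups consecutive characters by whether their Unicode bidi class is strong LTR, strong RTL, or neither. It illustrates the idea only and is not the run-splitting code in MapServer's textlayout.c or in fribidi; `direction_kind` and `split_runs` are hypothetical names:

```python
import unicodedata
from itertools import groupby

def direction_kind(ch: str) -> str:
    """Coarse direction of one character, from its Unicode bidi class."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "LTR"
    if cls in ("R", "AL"):
        return "RTL"
    return "other"  # digits (EN/AN), spaces (WS), punctuation (CS/ON), ...

def split_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share the same coarse direction."""
    return [(kind, "".join(group)) for kind, group in groupby(text, key=direction_kind)]

for kind, run in split_runs("\u0637\u0631\u064a\u0642 30"):  # "طريق 30"
    print(kind, ascii(run))
# RTL '\u0637\u0631\u064a\u0642'
# other ' 30'
```

The trailing " 30" run has no direction of its own, so which side of the Arabic run it lands on depends entirely on how the renderer resolves those weak and neutral characters.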

Mixing specifically LTR and RTL languages leads to an ambiguous situation, e.g. when rendering text stored as "arabic latin", an LTR-centric renderer will choose to render "cibara latin" while an RTL-centric renderer will choose to render "latin cibara". RFC98 has made some assumptions in that case, but that is not the topic of this issue.
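Which of those two renderings you get depends on the base (paragraph) direction. By default, the Unicode Bidirectional Algorithm (UAX #9) takes it from the first strong character, although a renderer may override it. A minimal Python sketch of that default rule, ignoring isolates; `paragraph_level` is a hypothetical name, and this does not describe the assumptions RFC98 actually makes:

```python
import unicodedata

def paragraph_level(text: str, default: int = 0) -> int:
    """Base level per UAX #9 rules P2/P3, ignoring isolates.

    1 (RTL) if the first strong character is R or AL, 0 (LTR) if it is L,
    otherwise `default` (e.g. a label made only of digits and spaces).
    """
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return 0
        if cls in ("R", "AL"):
            return 1
    return default

print(paragraph_level("\u0637\u0631\u064a\u0642 30"))  # 1: starts with an Arabic letter
print(paragraph_level("Road 30"))                      # 0: starts with a Latin letter
print(paragraph_level("30", default=1))                # 1: no strong character at all
```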

@jmckenna
Member

Thanks for the explanation.

I did notice the runs that Fribidi is doing inside the file "textlayout.c"

The case here is that MapServer <7 handled the raw text "arabic 30" by displaying it as "30 arabic", which is of course very important for street-level maps. This is also how other software such as QGIS displays the label, as "30 arabic".

Earlier this morning I tried fribidi at the command line (fribidi 1.0.5) on Linux, and it seems to produce the same result as MapServer 7 ("arabic 30").

I am honestly having a difficult time finding a bash shell or tool that does not apply bidi to the results or when examining the raw input (ogrinfo on the DBF displays the text correctly as "30 arabic").

@jmckenna
Member

I think maybe my issue at the fribidi commandline is that I am not using the actual raw text as input. (which I am having a difficult time getting access to)

@jmckenna
Member

Here are the ogrinfo results, which display the text in the desired form:
[Image: ogrinfo]
