Unicode issue in tooltips on Jupyter notebook #1320

Closed
galewis2 opened this issue May 5, 2020 · 11 comments · Fixed by python-visualization/branca#76
Labels
bug An issue describing unexpected or malicious behaviour jupyter This issue/PR has to do about Jupyter

Comments

@galewis2

galewis2 commented May 5, 2020

Minimal reproducible example:

import folium
map_osm = folium.Map()
folium.GeoJson('{ "type": "Feature", "properties": { "name": "5/7, Линейная улица, Berdsk, Berdsk municipality, Novosibirsk Oblast, Siberian Federal District, 633011, Russia" }, "geometry": { "type": "Point", "coordinates": [ -75.849253579389796, 47.6434349837781 ] }}', name="5/7, Линейная улица, Berdsk, Berdsk municipality, Novosibirsk Oblast, Siberian Federal District, 633011, Russia", tooltip="5/7, Линейная улица, Berdsk, Berdsk municipality, Novosibirsk Oblast, Siberian Federal District, 633011, Russia").add_to(map_osm)
display(map_osm)

I'm running in Jupyter Lab with Python3.7 and latest Folium version (0.10.1+28.ga8ec61d which is with my PR)

Is there a workaround for this?

@Conengmo Conengmo added bug An issue describing unexpected or malicious behaviour jupyter This issue/PR has to do about Jupyter labels May 6, 2020
@Conengmo
Member

Conengmo commented May 6, 2020

I can confirm this is indeed an issue, on Jupyter notebooks only. The encoding is set correctly to utf-8 in both the notebook frame and the map iframe. Since the characters display correctly in the layer control, it's unlikely an issue with the file encoding. This issue seems specific to the tooltip. It also happens for popups.

The characters appear garbled in the html:

<div id="html_46e5bc2ac281404b8b359d6c5707703d" style="width: 100.0%; height: 100.0%;">5/7, �инейна� �ли�а, Berdsk</div>

It's already broken in the JS code that generates that html:

var html_46e5bc2ac281404b8b359d6c5707703d = $(`<div id="html_46e5bc2ac281404b8b359d6c5707703d" style="width: 100.0%; height: 100.0%;">5/7, �инейна� �ли�а, Berdsk</div>`)[0];

In the JS code for the layer control the same string is properly encoded though:

overlays :  {
    "5/7, \u041b\u0438\u043d\u0435\u0439\u043d\u0430\u044f \u0443\u043b\u0438\u0446
},
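A possible reason the layer control is unaffected, sketched in Python: if the layer name is serialized JSON-style (an assumption, not confirmed in this thread), every non-ASCII character becomes a `\uXXXX` escape, so the text is pure ASCII and survives any byte-level decoding step.

```python
import json

name = "5/7, Линейная улица, Berdsk"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# producing the same kind of \uXXXX sequences seen in the overlays snippet.
js_literal = json.dumps(name)
print(js_literal)

# The escaped form is ASCII-only and round-trips back to the original text.
round_tripped = json.loads(js_literal)
```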

@Conengmo Conengmo changed the title Unicode issue in tooltips Unicode issue in tooltips on Jupyter notebook May 6, 2020
@galewis2
Author

galewis2 commented May 7, 2020

Any idea where the problem lies? I tried to have a look at it and couldn't see anything obvious. Did some poking around as well but couldn't make any headway

@Conengmo
Member

Conengmo commented May 8, 2020

I think I found the issue. In branca we encode the HTML for display in the notebook. This uses encode('utf-8'). A unicode string like "5/7, Линейная улица, Berdsk" is turned into bytes b'5/7, \xd0\x9b\xd0\xb8\xd0\xbd\xd0\xb5\xd0\xb9\xd0\xbd\xd0\xb0\xd1\x8f \xd1\x83\xd0\xbb\xd0\xb8\xd1\x86\xd0\xb0, Berdsk'. This is then base64 encoded.

When the notebook rehydrates this code it uses atob to do base64 decoding. This function does not convert those characters to the right representations: Ð\u009bинейнаÑ\u008f Ñ\u0083лиÑ\u0086а. I'm no expert on JS but I think it uses a default charset of utf-16.
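The garbling can be reproduced in Python as a rough model of the browser side. `atob` returns a "binary string" with one character per decoded byte, which is what decoding the bytes as Latin-1 does here. The variable names are illustrative only, and this models the behaviour rather than showing branca's actual code.

```python
import base64

text = "5/7, Линейная улица, Berdsk"

# Python side: UTF-8 encode, then base64 encode.
payload = base64.b64encode(text.encode("utf-8")).decode("ascii")

# Browser side (modelled): atob maps every decoded byte to one character,
# which is equivalent to decoding the bytes as Latin-1.
as_atob = base64.b64decode(payload).decode("latin-1")
print(repr(as_atob))

# Reinterpreting those characters as UTF-8 bytes recovers the original.
recovered = as_atob.encode("latin-1").decode("utf-8")
```

Each two-byte UTF-8 sequence turns into two separate characters, which matches the `Ð\u009b...` output quoted above.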

The solution is to encode the html not as utf-8, but using raw_unicode_escape. This converts "5/7, Линейная улица, Berdsk" into b'5/7, \\u041b\\u0438\\u043d\\u0435\\u0439\\u043d\\u0430\\u044f \\u0443\\u043b\\u0438\\u0446\\u0430, Berdsk' which results in properly rehydrated html in the browser.
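A small sketch of what `raw_unicode_escape` produces, using only the standard library. Code points above U+00FF become ASCII `\uXXXX` escapes, while code points up to U+00FF are written as a single Latin-1 byte, so in both cases a byte-per-character decode on the JavaScript side no longer splits a character in two. The second sample string is taken from a later comment in this thread.

```python
cyrillic = "5/7, Линейная улица, Berdsk"
latin = "Montréal"

# Characters above U+00FF are replaced by ASCII \uXXXX escape sequences.
escaped_cyrillic = cyrillic.encode("raw_unicode_escape")
print(escaped_cyrillic)

# Characters up to U+00FF are emitted as one Latin-1 byte each.
escaped_latin = latin.encode("raw_unicode_escape")
print(escaped_latin)
```

Whether the `\uXXXX` sequences are then turned back into characters depends on where the string ends up; in the snippet quoted earlier it sits inside a JavaScript template literal, where such escapes are interpreted.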

I'll open a PR in branca with this fix. You could really help by testing that fix!

@Conengmo Conengmo added the work in progress Work is in progress on a PR, check the PR to see its status label May 8, 2020
@Conengmo
Member

@galewis2 did you get a chance to test the PR? I'd merge it with more confidence if you could confirm it indeed solves your issue.

pip install git+https://github.com/conengmo/branca.git@fix-notebook-special-chars

@michelmetran

I have the same problem. In Brazil, we use accents and cedillas...
When I save the map to an HTML file, the tooltips appear normally, but when the map is rendered in the Jupyter notebook, the tooltips come out wrong.

So I used the suggested .encode('raw_unicode_escape') and it improved things,
but now there is a "b'" at the beginning of my tooltip:
[screenshot: raw]

And when I don't use .encode('raw_unicode_escape'):
[screenshot: without]

PS: I noticed that if I try to do the same in popup = '<strong>' + row['Name'] + '</strong>', it doesn't work inside the HTML.

My code:

# Add the different companies with colors by neighborhoods
for index, row in gdf_cap.iterrows():
    if row['Source'] in colors.keys():
        folium.Marker(
            name='Fundraising',
            location=[row['geometry'].y, row['geometry'].x],
            popup='<strong>' + row['Name'] + '</strong>',
            tooltip=row['Name'].encode('raw_unicode_escape'),
            icon=folium.Icon(color=colors[row['Source']], icon='gift')
        ).add_to(m)
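The stray `b'` most likely comes from passing a `bytes` object where text is expected. The sketch below assumes the tooltip argument is converted with `str()` somewhere along the way (not verified against folium's source); `str()` on bytes returns the repr, prefix and quotes included.

```python
name = "São Paulo"  # illustrative value, not from the original data

# .encode() returns bytes, not str.
as_bytes = name.encode("raw_unicode_escape")

# Converting bytes with str() yields the repr, including the b'...' wrapper.
shown = str(as_bytes)
print(shown)
```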

@Conengmo Conengmo removed the work in progress Work is in progress on a PR, check the PR to see its status label Jun 18, 2020
@Conengmo
Member

Conengmo commented Jun 18, 2020

A fix has been merged in the branca library. It will be available in the next release, release date yet unknown. If you want it earlier you can install branca from the git main branch:

pip install git+https://github.com/python-visualization/branca.git@master

@jiekebo

jiekebo commented Nov 27, 2020

@michelmetran I had the same problem with Swedish åäö not rendering properly, and .encode('raw_unicode_escape') helped to render the strings with non-garbled characters. I did get the b'' in the printed string as you got, but managed to solve it by converting back to string and removing the first two and the last characters like this:

str(string.encode('raw_unicode_escape'))[2:-1]
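A variation on the slice trick that does not depend on the repr of a bytes object, as a sketch only. The helper name `ascii_escape` is made up for illustration; it uses the standard `backslashreplace` error handler to produce an ASCII-only `str`. As with the slice, whether the browser turns the escapes back into characters depends on how the text is embedded in the generated page.

```python
def ascii_escape(text: str) -> str:
    """Return text with every non-ASCII character replaced by a backslash escape."""
    # backslashreplace emits \xNN up to U+00FF and \uNNNN for the rest of the BMP.
    return text.encode("ascii", "backslashreplace").decode("ascii")


swedish = ascii_escape("åäö")
cyrillic = ascii_escape("Л")

# For Latin-1 text this matches the slice-based workaround.
same_as_slice = swedish == str("åäö".encode("raw_unicode_escape"))[2:-1]

# For code points above U+00FF the repr-based slice doubles the backslash,
# because repr() of bytes escapes the backslash itself.
slice_cyrillic = str("Л".encode("raw_unicode_escape"))[2:-1]
```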

@rngadam

rngadam commented Dec 18, 2020

This still seems to be a problem on Folium 0.11.0 (as present on Kaggle). To display a string such as:

avenue Decelles (Montréal, Côte-des-Neiges-Notre-Dame-de-Grâce)

I have to use this workaround:

import html

def escape(x):
    # Escape non-ASCII characters, strip the b'...' wrapper, then escape HTML
    raw = str(x.encode('raw_unicode_escape'))[2:-1]
    return html.escape(raw)

@lmorillas

I think it's an issue with JavaScript's atob. The built-in functions btoa and atob do not support Unicode strings. I use the atou function I found on MDN:

function atou(b64) {
  return decodeURIComponent(escape(atob(b64)));
}
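For readers more at home in Python, here is a rough model of what `atou` does. The names `atob_like` and `atou_like` are invented for this sketch. `atob` gives one character per byte, and the `escape` plus `decodeURIComponent` pair effectively reinterprets those characters as UTF-8 bytes.

```python
import base64


def atob_like(b64: str) -> str:
    # Models atob: each decoded byte becomes one character (Latin-1 mapping).
    return base64.b64decode(b64).decode("latin-1")


def atou_like(b64: str) -> str:
    # Models decodeURIComponent(escape(...)): turn the characters back into
    # bytes, then decode those bytes as UTF-8.
    return atob_like(b64).encode("latin-1").decode("utf-8")


original = "åäö Линейная"
payload = base64.b64encode(original.encode("utf-8")).decode("ascii")

garbled = atob_like(payload)
fixed = atou_like(payload)
```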

@Conengmo
Member

This was fixed in python-visualization/branca#76 and will be in branca version 0.4.2, which hasn't been released yet but will be soon.

@YiorgosEm

It also works for the greek alphabet, thank you a lot!

7 participants