This repository has been archived by the owner on Mar 12, 2024. It is now read-only.

[Windows] UnicodeDecodeError #13

Closed
yemregundogmus opened this issue Feb 1, 2020 · 13 comments · Fixed by #220

Comments

@yemregundogmus

Hello, I want to try the demo but I get the error below. I tried changing render.py to pass an encoding, but that did not work.

I use:

  • Python 3.7.4
  • conda 4.7.12

import hiplot as hip

data = [{'dropout': 0.1, 'lr': 0.001, 'loss': 10.0, 'optimizer': 'SGD'},
        {'dropout': 0.15, 'lr': 0.01, 'loss': 3.5, 'optimizer': 'Adam'},
        {'dropout': 0.3, 'lr': 0.1, 'loss': 4.5, 'optimizer': 'Adam'}]
hip.Experiment.from_iterable(data).display()

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9e in position 122350: character maps to <undefined>

@danthe3rd
Contributor

Hi @yemregundogmus

I'm sorry you are having issues using HiPlot :/

I have several questions to try to narrow down the issue:

  • Do you have a stack trace for the error you posted?
  • I assume you are using Jupyter Notebook - is that correct?
  • What OS/version do you have?

Thank you!

@yemregundogmus
Author

yemregundogmus commented Feb 1, 2020

Hello @danthe3rd,


UnicodeDecodeError Traceback (most recent call last)
in
2 {'dropout':0.15, 'lr': 0.01, 'loss': 3.5, 'optimizer': 'Adam'},
3 {'dropout':0.3, 'lr': 0.1, 'loss': 4.5, 'optimizer': 'Adam'}]
----> 4 hip.Experiment.from_iterable(data).display()

D:\Anaconda\lib\site-packages\hiplot\experiment.py in display(self, force_full_width)
185
186 self.validate()
--> 187 return display_exp(self, force_full_width=force_full_width)
188
189 def to_html(self, file: Optional[Union[Path, str, IO[str]]] = None) -> str:

D:\Anaconda\lib\site-packages\hiplot\ipython.py in display_exp(xp, force_full_width)
173 displayed_xp = IPythonExperimentDisplayed(xp, comm_id)
174 index_html = make_experiment_standalone_page(options={
--> 175 'experiment': xp._asdict(),
176 })
177 jupyter_render_iframe(

D:\Anaconda\lib\site-packages\hiplot\render.py in make_experiment_standalone_page(options)
78 hiplot_options.update(options)
79
---> 80 index_html = html_inlinize(get_index_html_template())
81 index_html = index_html.replace(
82 "/ON_LOAD_SCRIPT_INJECT/",

D:\Anaconda\lib\site-packages\hiplot\render.py in html_inlinize(html, replace_local)
63 file = Path(static_root, src)
64 new_tag = soup.new_tag("script")
---> 65 new_tag.string = file.read_text()
66 i.replace_with(new_tag)
67 return str(soup)

D:\Anaconda\lib\pathlib.py in read_text(self, encoding, errors)
1205 """
1206 with self.open(mode='r', encoding="utf-8", errors=errors) as f:
-> 1207 return f.read()
1208
1209 def write_bytes(self, data):

D:\Anaconda\lib\encodings\cp1254.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9e in position 122350: character maps to <undefined>

This is my full traceback.

  • Yes, I use Jupyter Notebook
  • I have Windows 10

@danthe3rd
Contributor

Thank you for the details!

It looks like it's trying to read the hiplot.bundle.js file with the cp1254 charmap (instead of utf-8). I think this should be a minimal snippet to reproduce the issue:

from pathlib import Path
import hiplot as hip
bundle_js = hip.__file__.split('__init__')[0] + 'static/built/hiplot.bundle.js'
text = Path(bundle_js).read_text()  # This should trigger the same UnicodeDecodeError

If you replace the last line with text = Path(bundle_js).read_text(encoding="utf-8"), I assume it should fix the error. Can you try that? Otherwise I'll try to get access to a Windows machine to reproduce the bug myself :)

Looking at your trace, it looks like you modified pathlib.py to force a utf-8 encoding, yet it is somehow still using cp1254. Did you restart the notebook kernel after you made the change?

Thank you!
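For anyone who wants to see the mechanism without HiPlot installed, here is a minimal sketch of the same failure. It assumes nothing about HiPlot itself: the file name sample.js and the helper read_back are hypothetical stand-ins, and the single character U+015E is chosen only because its UTF-8 encoding contains the byte 0x9e mentioned in the error message.

```python
import tempfile
from pathlib import Path


def read_back(encoding: str, payload: str = "\u015e") -> str:
    """Write `payload` as UTF-8, then read it back using `encoding`."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a UTF-8 encoded JavaScript bundle.
        sample = Path(tmp) / "sample.js"
        # U+015E is stored as the two bytes C5 9E in UTF-8.
        sample.write_text(payload, encoding="utf-8")
        return sample.read_text(encoding=encoding)


# Reading with the encoding the file was written in round-trips cleanly.
print(read_back("utf-8") == "\u015e")

# Reading with cp1254 fails because byte 0x9e has no mapping in that code page.
try:
    read_back("cp1254")
except UnicodeDecodeError as exc:
    print("cp1254 failed:", exc.reason)
```

Calling read_text() with no encoding argument uses the platform's locale encoding, which is why the same code can work on Linux and fail on a Windows machine whose code page is cp1254.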

@danthe3rd changed the title from "UnicodeDecodeError" to "[Windows] UnicodeDecodeError" on Feb 1, 2020
@danthe3rd
Contributor

I managed to reproduce the issue on Windows in the CI. Adding encoding="utf-8" does indeed fix it :)
I'll push version 0.1.2 with this fix today.

@danthe3rd
Contributor

Version 0.1.2 has just been pushed, which should solve this issue for Windows users. Updating (pip install -U hiplot) and restarting your notebook (kernel) should fix the problem.

Let me know if it works for you :)

@yemregundogmus
Author

Hello, it works! Thank you so much :)

Have a nice day :)

@pandrich

pandrich commented Nov 2, 2021

Hello, I believe that I am having a similar problem with the use of hiplot-render that I cannot resolve.
Hiplot attempts to use the cp1252 charmap even though the default encoding is utf-8 (I explicitly write the csv file with that encoding as well).
Weirdly, the test proposed above by @danthe3rd passes without problems.

Here is the traceback of the error I am getting:

C:\Users\andri\Documents\HAL\Learning\tech_talks\sklego_datasette> hiplot-render .\data\IOW_roads.csv > .\data\IOW_roads.html

Traceback (most recent call last):
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\andri\anaconda3\envs\sklego_datasette\Scripts\hiplot-render.exe\__main__.py", line 7, in <module>
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\site-packages\hiplot\render.py", line 106, in hiplot_render_main
    exp.to_html(sys.stdout)
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\site-packages\hiplot\experiment.py", line 368, in to_html
    file.write(html)
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 852958-852959: character maps to <undefined>

I think the issue is in experiment.py:

if file is not None:
    if isinstance(file, (Path, str)):
        Path(file).write_text(html, encoding="utf-8")
    else:
        file.write(html)

I checked and the file path I am using returns True for the if statement above, so I am not sure why the file.write(html) line is used instead.

I'm on Windows 10.
Thanks for your help!

@danthe3rd
Contributor

Hi @pandrich
Here, file is sys.stdout - and the if condition will be False, so it's correct to call file.write.
Unfortunately, as I don't have a Windows machine and I don't have your input csv file, I can't test it myself.
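A small diagnostic sketch may help make this concrete. It is not part of HiPlot; the helper name stream_report is hypothetical. It shows which branch an object would take in the isinstance check quoted above, and it puts the interpreter's default string encoding next to the encoding of the stream actually being written to, since those two can differ.

```python
import io
import locale
import sys
from pathlib import Path


def stream_report(file=None) -> dict:
    """Describe how `file` would be treated and which encodings are in play."""
    file = sys.stdout if file is None else file
    return {
        # True only for a path-like target; a stream such as sys.stdout gives False.
        "is_path_or_str": isinstance(file, (Path, str)),
        # Encoding of the stream itself (None for objects without one).
        "stream_encoding": getattr(file, "encoding", None),
        # Always "utf-8" on Python 3; it does not control stream encodings.
        "default_encoding": sys.getdefaultencoding(),
        # Locale encoding, which text streams commonly fall back to.
        "locale_encoding": locale.getpreferredencoding(False),
    }


print(stream_report())
```

With a shell redirect such as `> out.html`, the program never sees the file name: it still writes to sys.stdout, so the stream's own encoding is what matters, not the encoding passed to write_text.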

Can you try to modify the file c:\users\andri\anaconda3\envs\sklego_datasette\lib\site-packages\hiplot\render.py and replace the line

elif args.format == 'html':
    exp.to_html(sys.stdout)

with

elif args.format == 'html':
    import codecs
    exp.to_html(codecs.getwriter("utf-8")(sys.stdout))

Let me know if it works - if so I can update HiPlot with this fix.

@pandrich

pandrich commented Nov 2, 2021

Hey @danthe3rd,

Thanks a lot for this very quick response. And yes sorry, I do realize that it is challenging to address this without the original file and while working on Linux. Just as a note, I should mention that all works well if I work in WSL.

I tried using your suggestion but that results in a different error:

Traceback (most recent call last):
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\andri\anaconda3\envs\sklego_datasette\Scripts\hiplot-render.exe\__main__.py", line 7, in <module>
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\site-packages\hiplot\render.py", line 107, in hiplot_render_main
    exp.to_html((codecs.getwriter("utf-8")(sys.stdout)))
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\site-packages\hiplot\experiment.py", line 369, in to_html
    file.write(html)
  File "c:\users\andri\anaconda3\envs\sklego_datasette\lib\codecs.py", line 378, in write
    self.stream.write(data)
TypeError: write() argument must be str, not bytes

Thanks again!
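The TypeError above can be reproduced in isolation. The sketch below uses io.StringIO and io.BytesIO as stand-ins for a text stream and a binary stream (an assumption for illustration; the real sys.stdout is a different object), and the helper name write_through_utf8_writer is hypothetical. A codecs writer encodes str to bytes before calling write() on the wrapped stream, so the wrapped stream has to accept bytes.

```python
import codecs
import io


def write_through_utf8_writer(stream, text: str = "\u015e") -> None:
    """Encode `text` to UTF-8 and pass the resulting bytes to `stream`."""
    writer = codecs.getwriter("utf-8")(stream)
    writer.write(text)


text_stream = io.StringIO()  # text layer: write() accepts str only
byte_stream = io.BytesIO()   # binary layer: write() accepts bytes

# Wrapping a text stream fails, because the writer hands it bytes.
try:
    write_through_utf8_writer(text_stream)
except TypeError as exc:
    print("text stream rejected bytes:", exc)

# Wrapping a binary stream works.
write_through_utf8_writer(byte_stream)
print(byte_stream.getvalue())  # b'\xc5\x9e'
```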

@danthe3rd
Contributor

Oh interesting. Can you try this now?
Replace the following:

    if args.format == 'csv':
        exp.to_csv(sys.stdout)
    elif args.format == 'html':
        exp.to_html(sys.stdout)
    else:
        assert False, args.format

with:

    import codecs
    stdout_writer = codecs.getwriter("utf-8")(sys.stdout.buffer)
    if args.format == 'csv':
        exp.to_csv(stdout_writer)
    elif args.format == 'html':
        exp.to_html(stdout_writer)
    else:
        assert False, args.format

@pandrich

pandrich commented Nov 2, 2021

This seems to work perfectly!
I don't completely understand what is happening :) I didn't know that sys.stdout had a buffer attribute (if I run just codecs.getwriter("utf-8")(sys.stdout.buffer) I get an error about this)?
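On what is happening: a regular sys.stdout is a text wrapper around a binary stream, and .buffer is that underlying binary stream. Here is a hedged sketch that imitates this with an io.TextIOWrapper over an io.BytesIO using cp1252; the helper names are hypothetical and the fake stream is only a stand-in for a redirected Windows stdout.

```python
import codecs
import io


def text_layer_write_ok(text: str, encoding: str = "cp1252") -> bool:
    """Write through the text layer, which encodes with the stream's own encoding."""
    fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        fake_stdout.write(text)
        fake_stdout.flush()
        return True
    except UnicodeEncodeError:
        return False


def binary_layer_write(text: str, encoding: str = "cp1252") -> bytes:
    """Bypass the text layer: encode as UTF-8 and write to the binary buffer."""
    raw = io.BytesIO()
    fake_stdout = io.TextIOWrapper(raw, encoding=encoding)
    writer = codecs.getwriter("utf-8")(fake_stdout.buffer)
    writer.write(text)
    return raw.getvalue()


sample = "\u015e"  # a character that cp1252 cannot represent
print(text_layer_write_ok(sample))  # False
print(binary_layer_write(sample))   # b'\xc5\x9e'
```

The Python documentation notes that the standard streams may be replaced with file-like objects that do not support the buffer attribute, which may be why running the line on its own in some environments raises an error. On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is another way to change the encoding of a regular text stdout; whether it would suit HiPlot is not something this thread establishes.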

@danthe3rd
Contributor

This is now merged and will be in the next release - thanks for the report and help :)

@pandrich

pandrich commented Nov 2, 2021

Great! Thank you @danthe3rd for sorting this out so quickly!
