[MRG] Encoding problems #311

Titan-C · 2017-10-31T16:03:10Z

The filesystem's encoding has caused quite a few problems: #257, #259, #260, #269.
I started this branch long ago and I'm bringing it back to life now. This starts testing on Travis-CI with the ASCII encoding.
It does not work yet.

lesteve · 2017-11-10T11:30:17Z

It'd be nice to fix this indeed. Your last commit status is green, does that mean that this is ready for review?

Titan-C · 2017-11-10T12:46:34Z

> It'd be nice to fix this indeed. Your last commit status is green, does that mean that this is ready for review?

Not really. I just arrived at an easy solution: enforcing that all files are read as byte strings. But I actually want to refactor and add some more tests. This solution is easy and won't come back to bite us as long as no one declares function names with characters outside ASCII before we end support for Python 2.7. But I'm still a bit worried about comments as normal strings. I also need a new test that verifies code blocks are correctly parsed for embedding documentation links, to avoid #259 happening again.
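A minimal sketch of the bytes-first approach described above, not the code from this branch. The helper name read_source is hypothetical; it assumes the files are UTF-8 encoded and relies on ast.parse accepting a byte string, so decoding never depends on the locale.

```python
import ast
import os
import tempfile


def read_source(filename):
    """Return (ast node or None, decoded text) for a Python source file.

    Hypothetical helper: the name and return shape are assumptions.
    """
    # Open in binary mode so the locale's default encoding is never used.
    with open(filename, 'rb') as fid:
        content = fid.read()
    try:
        # ast.parse accepts bytes and honours the coding declaration itself.
        node = ast.parse(content)
    except SyntaxError:
        node = None
    # Assumption: the source files are UTF-8 encoded.
    return node, content.decode('utf-8')


# Usage: write a small UTF-8 encoded file and read it back.
sample = b"# -*- coding: utf-8 -*-\na = 'hei\xc3\x9f'\n"
with tempfile.NamedTemporaryFile('wb', suffix='.py', delete=False) as f:
    f.write(sample)
node, text = read_source(f.name)
os.remove(f.name)
```

Because the file is opened with 'rb', the result should be the same under an ASCII locale, which is the configuration this PR adds to the Travis-CI matrix.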

lesteve · 2017-11-10T13:16:51Z

> I also need a new test that verifies code blocks are correctly parsed for embedding documentation links, to avoid #259 happening again.

I think there is a test for that already:

sphinx-gallery/sphinx_gallery/tests/test_full.py

Lines 48 to 64 in 587198a

    
    def test_embed_links(sphinx_app):
        """Test that links are embedded properly in doc."""
        out_dir = sphinx_app.outdir
        examples_dir = op.join(out_dir, 'auto_examples')
        assert op.isdir(examples_dir)
        example_files = os.listdir(examples_dir)
        assert 'plot_numpy_scipy.html' in example_files
        example_file = op.join(examples_dir, 'plot_numpy_scipy.html')
        with codecs.open(example_file, 'r', 'utf-8') as fid:
            lines = fid.read()
        # ensure we've linked properly
        assert '#module-scipy.signal' in lines
        assert 'scipy.signal.firwin.html' in lines
        assert '#module-numpy' in lines
        assert 'numpy.arange.html' in lines
        # assert '#module-matplotlib.pyplot' in lines
        # assert 'pyplot.html' in lines

Although for some reason the matplotlib check was not passing on Travis, which is why it is commented out. I could not reproduce it, see #308 (comment) for more details.

larsoner · 2017-11-10T13:52:41Z

> Although for some reason the matplotlib check was not passing on Travis

I suspect this was due to old sphinx being used

Titan-C · 2017-11-19T11:29:14Z

This is finally working. I even found some extra problems that can be caused by different encodings, the characters in use and the source file configuration. I have added comments explaining those cases. I have also generated a new example using Unicode emoji just to stress-test this setup.
https://184-26191147-gh.circle-artifacts.com/0/home/ubuntu/sphinx-gallery/alabaster_html/auto_examples/plot_unicode_everywhere.html

Titan-C · 2017-12-05T13:26:50Z

Are there any comments on this? Can I merge?
@pllim @NelleV does this solve your issues #253 & #269 ?

pllim · 2017-12-05T13:50:24Z

@Titan-C , I don't have time to test this right now but I can give it a try in the future when a new release with this fix is in. Thanks!

lesteve · 2017-12-05T14:33:36Z

@Titan-C I am looking at this, please don't pull the trigger yet ;-) !

lesteve

A few comments

lesteve · 2017-12-05T14:16:27Z

sphinx_gallery/py_source_parser.py

    except SyntaxError:
-        return SYNTAX_ERROR_DOCSTRING, content.decode('utf-8'), 1
+        return SyntaxError, content.decode('utf-8')


It feels a bit kludgy to return an exception class rather than a node. Why don't you let the SyntaxError through and catch it in get_docstring_and_rest, with an except clause at the parse_file_source call?

I agree it is not the nicest way. I ended up using this because I need this function in a few more places, and I rely on those places always having access to the decoded content and knowing whether the parsing was successful or not.

I saw that, but I would rather let the SyntaxError go through and catch it where you call get_docstring_and_rest. Arguably you could have a SyntaxError somewhere else in your code, but I don't think this is that bad. If we care about that, we could catch the SyntaxError in parse_source_file and raise a custom exception, e.g. ParsingError (or a better name), then catch ParsingError where parse_source_file is called.

This is on the boundaries of my knowledge. But if I raise the exception how do I get access to the returned value of content?

Just in case I am missing something and the exception approach does not work, can you not return None instead?

Of course. I can return None
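For reference, a sketch of the custom-exception alternative raised in this thread, which also addresses the question of how the caller gets at the content. This is not what was merged (the thread settled on returning None). ParsingError, parse_source and describe are hypothetical names, and UTF-8 input is assumed.

```python
import ast


class ParsingError(Exception):
    """Hypothetical exception that carries the decoded file content."""

    def __init__(self, message, content):
        super(ParsingError, self).__init__(message)
        self.content = content


def parse_source(content_bytes):
    """Parse raw bytes; on failure raise ParsingError with the text attached."""
    text = content_bytes.decode('utf-8')  # assumption: UTF-8 encoded source
    try:
        return ast.parse(content_bytes), text
    except SyntaxError as exc:
        raise ParsingError(str(exc), text)


def describe(content_bytes):
    """Caller side: the content stays reachable through the exception."""
    try:
        node, text = parse_source(content_bytes)
        return 'parsed', text
    except ParsingError as exc:
        return 'syntax error', exc.content
```

Attaching the text as an attribute of the exception lets the caller distinguish a parse failure from success without losing the decoded content.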

lesteve · 2017-12-05T14:17:56Z

sphinx_gallery/tests/test_backreferences.py

+                'e.HelloWorld': {'name': 'HelloWorld', 'module': 'd', 'module_short': 'd'}}
+
+    import locale
+    print(locale.getlocale())


Looks like debugging print statements that you forgot to remove.

lesteve · 2017-12-05T14:19:45Z

sphinx_gallery/tests/test_backreferences.py

+    import locale
+    print(locale.getlocale())
+
+    with tempfile.NamedTemporaryFile('wb', delete=False) as f:


Using the pytest tmpdir fixture is a lot neater for this kind of thing.

I just switched to the fixture, but all test jobs on Travis (except one) failed.

Any info on this? It's the recent commit f2f2d04.

lesteve · 2017-12-05T14:24:33Z

sphinx_gallery/py_source_parser.py

@@ -19,30 +19,57 @@
 """


-def get_docstring_and_rest(filename):
-    """Separate `filename` content between docstring and the rest
+def parse_file_source(filename):


Maybe parse_python_file or parse_source_file?

lesteve · 2017-12-05T14:25:51Z

sphinx_gallery/py_source_parser.py

+
+
+def get_docstring_and_rest(filename):
+    """Separate `filename` content between docstring and the rest


I think you need to use double backticks if you want this to be fixed-width in the HTML reference documentation. Same thing for a few single backticks that you use in a few different places.

lesteve · 2017-12-05T14:26:20Z

sphinx_gallery/tests/test_py_source_parser.py

+def test_get_docstring_and_rest():
+
+    docstring, rest, lineno = sg.get_docstring_and_rest(
+        'sphinx_gallery/tests/unicode.sample')


Can you write the content into a temporary file instead using tmpdir? This way the test is a bit easier to understand.

Not really. If you check the previous file, there is test_identify_names2, where I do write to a file. And for everything to work correctly in py2 & py3 I write it as bytes. But in this case, and in the function test_identify_names, I really want to read a file from disk; the file is identical. These are the two places where I need to be sure the file read from disk is correctly parsed.

It seems like this does work:

    content = b'\n'.join(
        [b'',
         b'# -*- coding: utf-8 -*-',
         b"'''",
         b'\xc3\x9anicode in header',
         b'=================',
         b'',
         b'U\xc3\xb1icode in description',
         b"'''",
         b'',
         b'# Code source: \xc3\x93scar N\xc3\xa1jera',
         b'# License: BSD 3 clause',
         b'',
         b'import os',
         b"path = os.path.join('a','b')",
         b'',
         b"a = 'hei\xc3\x9f' # Unicode string",
         b'',
         b'import sphinx_gallery.back_references as br',
         b'br.identify_names',
         b'',
         b'from sphinx_gallery.back_references import identify_names',
         b'identify_names ',
         b''])
    # use tmpdir to get a file here instead
    open('/tmp/unicode.sample', 'wb').write(content)

Note you can simplify the file because I don't think you are using identify_names in this test.

Are you sure this is simpler than having a Unicode-heavy Python file on disk? Also, I use this file twice: once to test that despite the Unicode I get_docstring_and_rest, and once to test that despite the Unicode I identify_names of the used functions.

> Are you sure this is simpler than having a unicode heavy python file on disk?

Yes, this is a lot clearer because you see exactly what you are writing to the file. Otherwise, how do we know how you generated your file? Also, test files like this don't get included in the installed package, which means that you cannot run the tests on an installed package to make sure that they pass. For example, this is a problem we already have with sphinx_gallery/tests/reference_parse.txt.

If you use the file content in two tests you can put that in a helper function.

OK then. I implemented a pytest fixture to generate this sample file
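A sketch of what such a helper could look like, assuming a shortened version of the sample content above. The names SAMPLE and write_unicode_sample are hypothetical. In a pytest suite the helper would typically be called from a fixture that receives tmpdir; the str() call covers path objects that open() does not accept on Python < 3.6.

```python
import os
import shutil
import tempfile

# Hypothetical sample content, written as UTF-8 encoded bytes so the
# helper behaves the same on Python 2 and Python 3.
SAMPLE = b'\n'.join([
    b'# -*- coding: utf-8 -*-',
    b"'''",
    b'\xc3\x9anicode in header',
    b'=================',
    b"'''",
    b'',
    b"a = 'hei\xc3\x9f'  # Unicode string",
    b'',
])


def write_unicode_sample(directory):
    """Write the sample into `directory` and return its path as a str."""
    # str() so that path objects (e.g. pytest's tmpdir) are accepted.
    filename = os.path.join(str(directory), 'unicode_sample.py')
    with open(filename, 'wb') as fid:
        fid.write(SAMPLE)
    return filename


# Usage outside pytest: a throwaway directory stands in for tmpdir.
tmp = tempfile.mkdtemp()
try:
    path = write_unicode_sample(tmp)
    with open(path, 'rb') as fid:
        round_trip = fid.read()
finally:
    shutil.rmtree(tmp)
```

Keeping the bytes in the test module makes the file content visible next to the assertions and avoids shipping a separate sample file with the package.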

Titan-C · 2017-12-14T22:26:17Z

This is again ready for review.
Travis on Ubuntu fails because the system version of six is too old and pytest coverage refuses to load. But this is a particular situation because:
python setup.py test passes, as it installs a recent version of six for the test;
pytest sphinx_gallery fails because it uses the old system version.
Is everyone fine if I pip install a more recent version of six in this test environment? Other ideas?

lesteve · 2017-12-15T13:13:17Z

I saw that a few days ago (basically we have a failure on master at the moment because of this). I just opened #325 to fix this.

I guess (I did not check in detail) that this is one of the reasons that @choldgraf had trouble in #244. Installing sphinx 1.5 probably pulls in a more recent version of six ...

lesteve · 2017-12-15T14:15:29Z

I added an entry in CHANGES.rst, @Titan-C you probably understand the problem more than me, so feel free to improve/fix it!

Other than this, I think this is ready to be merged, great work on a nasty long-standing issue!

lesteve · 2017-12-18T08:53:26Z

Merging this one, thanks a lot @Titan-C!

ImportanceOfBeingErnest · 2018-04-17T15:34:20Z

When will a version containing this fix be released?
(Building the matplotlib docs on windows currently blocks on this.)

choldgraf · 2018-04-17T15:36:45Z

Relatively soon, I believe. We are spot-checking the remaining issues before a release. @lesteve just opened a PR for one and I'll plan to hit the others later this week or next. As always, PRs are welcome ;-)

jni · 2018-04-28T01:29:59Z

I just want to note that this PR also fixed an unclosed file warning when building docs:

/home/travis/venv/lib/python3.4/site-packages/sphinx_gallery/backreferences.py:144:
ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/travis/build/scikit-image/scikit-image/doc/source/auto_examples/color_exposure/plot_ihc_color_separation.py' mode='r' encoding='UTF-8'>

See e.g. https://travis-ci.org/scikit-image/scikit-image/jobs/372237529#L6134

I searched issues and PRs for ResourceWarning and couldn't find anything, so I presumed it was unfixed and almost raised a new issue. =)

Thanks for the great work!

jni · 2018-04-28T01:32:21Z

This was the offending line, now properly context-managed elsewhere:

example_code_obj = identify_names(open(example_file).read())
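For illustration, a minimal sketch of the context-managed pattern, not the actual replacement code in sphinx-gallery. read_source_bytes is a hypothetical helper, and identify_names is deliberately not called because its signature after this PR is not shown in the thread.

```python
import os
import tempfile


def read_source_bytes(path):
    """Read a file as bytes; the handle is closed when the block exits."""
    with open(path, 'rb') as fid:
        return fid.read()


# Usage: create a small file, read it back, then clean up.
fd, example_file = tempfile.mkstemp(suffix='.py')
os.close(fd)
with open(example_file, 'wb') as fid:
    fid.write(b"import os\n")
data = read_source_bytes(example_file)
os.remove(example_file)
```

With the with statement the file is closed deterministically, so no ResourceWarning is emitted when the object is garbage collected.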

Titan-C changed the title ~~[WIP] Encoding problems~~ [MRG] Encoding problems Nov 19, 2017

larsoner approved these changes Dec 5, 2017

lesteve reviewed Dec 5, 2017

Titan-C mentioned this pull request Dec 14, 2017

[WIP/Experimental] Capturing objects. Bokeh test case #324

Closed

Titan-C and others added 16 commits December 15, 2017 15:07

Test py2.7 and py3.6 with ascii encoding

8cdd86c

Read files in utf-8 encoding

32f3efb

Test a unicode file read

939d71d

Reproducing bug of not finding function calls in py2.7

32ed0c3

unnecessary edits

167cac0

sample string test

ef99453

cleaning test

9e50820

reproducing bug

c86a179

fix to correctly reproduce bug

b8d77bc

Test to fix doing ast parsing on byte strings

f2af0d8

consistent with the use of bytes reading

6c2e91a

use common file read, brach before here to revert

1b0b96d

Common read can return exception

982d44e

first test encoding fix py3

9b07a4d

use byte string representation for unicode chars

e3fe6ca

write bytes

b6ae44b

Titan-C and others added 13 commits December 15, 2017 15:07

unicode everywhere example

db8f617

Don't enforce font, circle CI

cb52e1a

Fix Python 2.7 encoding

068bd01

Debug fonts

3d5136a

pick font

2cf0e6b

Rename function parse_source_file

9ce3189

docs style

10cd32a

use pytest fixture for tempfile

5341bec

parse source file returns None on SyntaxError

a86ac7e

write binary

b2f754e

unicode sample fixture

824efe5

Use string representation of temporary paths

e9a0c87

Needed to open path objects on Python<3.6

organize files

5348a56

lesteve force-pushed the encoding branch from 6fd0594 to 5348a56 Compare December 15, 2017 14:08

Add CHANGES.rst entry

d4aad3d

lesteve added 2 commits December 18, 2017 08:58

Merge branch 'master' into encoding

56a7879

Fix rst linnk

836cddf

lesteve merged commit 14470ed into sphinx-gallery:master Dec 18, 2017

lesteve mentioned this pull request Dec 18, 2017

Problem compiling from py to rst with unicode when local is not defined on mac #269

Closed

Titan-C deleted the encoding branch March 12, 2018 14:55

ImportanceOfBeingErnest mentioned this pull request Apr 24, 2018

Documentation, Spinx matplotlib/matplotlib#11117

Closed


