New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Encoding problems #311

Merged
merged 32 commits into from Dec 18, 2017

Conversation

Projects
None yet
7 participants
@Titan-C
Member

Titan-C commented Oct 31, 2017

The filesystem's encoding has generated quite some problems #257, #259, #260, #269
I started this branch long ago, I'll bring it back to live now. This starts the test on Travis-CI for the ASCII encoding.
It does not work yet.

@lesteve

This comment has been minimized.

Contributor

lesteve commented Nov 10, 2017

It'd be nice to fix this indeed. Your last commit status is green, does that mean that this is ready for review?

@Titan-C

This comment has been minimized.

Member

Titan-C commented Nov 10, 2017

@lesteve

This comment has been minimized.

Contributor

lesteve commented Nov 10, 2017

I also need a new test that verifies code blocks are correctly parsed for embedding documentation links, to avoid #259 happening again.

I think there is a test for that already:

def test_embed_links(sphinx_app):
"""Test that links are embedded properly in doc."""
out_dir = sphinx_app.outdir
examples_dir = op.join(out_dir, 'auto_examples')
assert op.isdir(examples_dir)
example_files = os.listdir(examples_dir)
assert 'plot_numpy_scipy.html' in example_files
example_file = op.join(examples_dir, 'plot_numpy_scipy.html')
with codecs.open(example_file, 'r', 'utf-8') as fid:
lines = fid.read()
# ensure we've linked properly
assert '#module-scipy.signal' in lines
assert 'scipy.signal.firwin.html' in lines
assert '#module-numpy' in lines
assert 'numpy.arange.html' in lines
# assert '#module-matplotlib.pyplot' in lines
# assert 'pyplot.html' in lines

Although for some reason the matplotlib check was not passing on Travis which is why it is commented out. I could not reproduce, see #308 (comment) for more details.

@larsoner

This comment has been minimized.

Contributor

larsoner commented Nov 10, 2017

Although for some reason the matplotlib check was not passing on Travis

I suspect this was due to old sphinx being used

@Titan-C Titan-C changed the title from [WIP] Encoding problems to [MRG] Encoding problems Nov 19, 2017

@Titan-C

This comment has been minimized.

Member

Titan-C commented Nov 19, 2017

This is finally working. I even found some extra problems that could be originated by different encodings, characters in use and the source file config. I have added comments explaining such events. I also have generated a new example using unicode emoji just to stress test this setup.
https://184-26191147-gh.circle-artifacts.com/0/home/ubuntu/sphinx-gallery/alabaster_html/auto_examples/plot_unicode_everywhere.html

@Titan-C

This comment has been minimized.

Member

Titan-C commented Dec 5, 2017

Are there any comments on this? Can I merge?
@pllim @NelleV does this solve your issues #253 & #269 ?

@pllim

This comment has been minimized.

Contributor

pllim commented Dec 5, 2017

@Titan-C , I don't have time to test this right now but I can give it a try in the future when a new release with this fix is in. Thanks!

@lesteve

This comment has been minimized.

Contributor

lesteve commented Dec 5, 2017

@Titan-C I am looking at this, please don't pull the trigger yet ;-) !

@lesteve

A few comments

except SyntaxError:
return SYNTAX_ERROR_DOCSTRING, content.decode('utf-8'), 1
return SyntaxError, content.decode('utf-8')

This comment has been minimized.

@lesteve

lesteve Dec 5, 2017

Contributor

This feels a bit kludgy to return an exception class rather than a node. Why don't you SyntaxError through and catch it in get_docstring_and_rest and have an except clause at the parse_file_source call?

This comment has been minimized.

@Titan-C

Titan-C Dec 8, 2017

Member

I agree is not the nicest way. I ended up using this, because I need this function in few more places and I rely those places always having access to the decoded content and to know if the parsing was successful or not.

This comment has been minimized.

@lesteve

lesteve Dec 8, 2017

Contributor

I saw that but I would rather let the SyntaxError go through and catch it where you call get_docstring_and_rest. Arguably you could have a SyntaxError somewhere else in your code but I don't think this is that bad. If we care about that we could catch the SyntaxError in parse_source_file and raise a custom exception e.g. ParsingError (or a better name) then catch ParsingError where parse_source_file` is called.

This comment has been minimized.

@Titan-C

Titan-C Dec 8, 2017

Member

This is on the boundaries of my knowledge. But if I raise the exception how do I get access to the returned value of content?

This comment has been minimized.

@lesteve

lesteve Dec 8, 2017

Contributor

Just in case I am missing and the exception approach does not work, can you not return None instead?

This comment has been minimized.

@Titan-C

Titan-C Dec 8, 2017

Member

Of course. I can return None

'e.HelloWorld': {'name': 'HelloWorld', 'module': 'd', 'module_short': 'd'}}
import locale
print(locale.getlocale())

This comment has been minimized.

@lesteve

lesteve Dec 5, 2017

Contributor

Looks like debugging print statements that you forgot to remove.

import locale
print(locale.getlocale())
with tempfile.NamedTemporaryFile('wb', delete=False) as f:

This comment has been minimized.

@lesteve

lesteve Dec 5, 2017

Contributor

Using pytest tmpdir fixture is a lot more neat for this kind of things.

This comment has been minimized.

@Titan-C

Titan-C Dec 8, 2017

Member

I just did this of using the fixture, but all tests jobs in travis (except 1) failed

This comment has been minimized.

@Titan-C

Titan-C Dec 8, 2017

Member

Any info on this. It's recent commit f2f2d04

@@ -19,30 +19,57 @@
"""
def get_docstring_and_rest(filename):
"""Separate `filename` content between docstring and the rest
def parse_file_source(filename):

This comment has been minimized.

@lesteve

lesteve Dec 5, 2017

Contributor

Maybe parse_python_file or parse_source_file?

def get_docstring_and_rest(filename):
"""Separate `filename` content between docstring and the rest

This comment has been minimized.

@lesteve

lesteve Dec 5, 2017

Contributor

I think you need to use double backticks if you want this to be fixed-width in the HTML reference documentation. Same thing for a few single backticks that you use in a few different places.

def test_get_docstring_and_rest():
docstring, rest, lineno = sg.get_docstring_and_rest(
'sphinx_gallery/tests/unicode.sample')

This comment has been minimized.

@lesteve

lesteve Dec 5, 2017

Contributor

Can you write the content into a temporary file instead using tmpdir? This way the test is a bit easier to understand.

This comment has been minimized.

@Titan-C

Titan-C Dec 8, 2017

Member

Not really. If you check on the previous file there is test_identify_names2 there I indeed write to file. And for everything to work correctly in py2 & py3 I write it as bytes. But in this case and the function test_identify_names I really want to read a file from disk, the file is identical. But is the 2 places I need to be sure the file read from disk is correctly parsed.

This comment has been minimized.

@lesteve

lesteve Dec 8, 2017

Contributor

It seems like this does work:

content = b'\n'.join(
    [b'',
     b'# -*- coding: utf-8 -*-',
     b"'''",
     b'\xc3\x9anicode in header',
     b'=================',
     b'',
     b'U\xc3\xb1icode in description',
     b"'''",
     b'',
     b'# Code source: \xc3\x93scar N\xc3\xa1jera',
     b'# License: BSD 3 clause',
     b'',
     b'import os',
     b"path = os.path.join('a','b')",
     b'',
     b"a = 'hei\xc3\x9f'  # Unicode string",
     b'',
     b'import sphinx_gallery.back_references as br',
     b'br.identify_names',
     b'',
     b'from sphinx_gallery.back_references import identify_names',
     b'identify_names ',
     b''])

# use tmpdir to get a file here instead
open('/tmp/unicode.sample', 'wb').write(content)

Note you can simplify the file because I don't think you are using identify_names in this test.

This comment has been minimized.

@Titan-C

Titan-C Dec 8, 2017

Member

Are you sure this is simpler than having a unicode heavy python file on disk? Also, I use this file twice, one to test that despite the unicode I get_docstring_and_rest and one to test that despite the unicode I identify_names of the used functions.

This comment has been minimized.

@lesteve

lesteve Dec 8, 2017

Contributor

Are you sure this is simpler than having a unicode heavy python file on disk?

Yes, this is a lot more clear because you see exactly what you are writing to the file. Otherwise how do we know how you generated your file. Also test files like this don't get included in the installed package which mean that you can not run the tests on an installed package to make sure that they run. For example this is a problem we already have for sphinx_gallery/tests/reference_parse.txt.

If you use the file content in two tests you can put that in a helper function.

This comment has been minimized.

@Titan-C

Titan-C Dec 14, 2017

Member

OK then. I implemented a pytest fixture to generate this sample file

@Titan-C

This comment has been minimized.

Member

Titan-C commented Dec 14, 2017

This is again ready for review.
Travis on ubuntu fails because the system version of six is too old and pytest coverage refuses to load. But this is a particular situation because
python setup.py test passes as it installs for the test a resent version of six
pytest sphinx_gallery fails because it uses the old system version.
Is everyone fine if I pip install a more resent version of six in this test environment? Other ideas?

@lesteve

This comment has been minimized.

Contributor

lesteve commented Dec 15, 2017

I saw that a few days ago (basically we have a failure on master at the moment because of this). I just opened #325 to fix this.

I guess (I did not check in details) that this is one of the reason that @choldgraf had troubles in #244. Installing sphinx 1.5 probably pulls in a more recent version of six ...

@lesteve lesteve force-pushed the Titan-C:encoding branch from 6fd0594 to 5348a56 Dec 15, 2017

@lesteve

This comment has been minimized.

Contributor

lesteve commented Dec 15, 2017

I added an entry in CHANGES.rst, @Titan-C you probably understand the problem more than me, so feel free to improve/fix it!

Other than this, I think this is ready to be merged, great work on a nasty long-standing issue!

lesteve added some commits Dec 18, 2017

@lesteve

This comment has been minimized.

Contributor

lesteve commented Dec 18, 2017

Merging this one, thanks a lot @Titan-C!

@lesteve lesteve merged commit 14470ed into sphinx-gallery:master Dec 18, 2017

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Titan-C Titan-C deleted the Titan-C:encoding branch Mar 12, 2018

@ImportanceOfBeingErnest

This comment has been minimized.

ImportanceOfBeingErnest commented Apr 17, 2018

When will a version containing this fix be released?
(Building the matplotlib docs on windows currently blocks on this.)

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented Apr 17, 2018

Relatively soon, I believe. We are spot-checking the remaining issues before a release. @lesteve just opened a PR for one and I'll plan to hit the others later this week or next. As always, PRs are welcome ;-)

@jni

This comment has been minimized.

jni commented Apr 28, 2018

I just want to note that this PR also fixed an unclosed file warning when building docs:

/home/travis/venv/lib/python3.4/site-packages/sphinx_gallery/backreferences.py:144:
ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/travis/build/scikit-image/scikit-image/doc/source/auto_examples/color_exposure/plot_ihc_color_separation.py' mode='r' encoding='UTF-8'>

See e.g. https://travis-ci.org/scikit-image/scikit-image/jobs/372237529#L6134

I searched issues and PRs for ResourceWarning and couldn't find anything, so I presumed it was unfixed and almost raised a new issue. =)

Thanks for the great work!

@jni

This comment has been minimized.

jni commented Apr 28, 2018

This was the offending line, now properly context-managed elsewhere:

example_code_obj = identify_names(open(example_file).read())
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment