Skip to content

Conversation

@gpoulin
Copy link
Contributor

@gpoulin gpoulin commented Feb 13, 2015

First step to at python3 support. Majority of tests still fail but Luigi can be imported in python3. The code still run just fine in python2, the only limitation is an extra dependency on six.

@Tarrasch
Copy link
Contributor

Wow, this looks like quite an endeavour! Will it still be compatible with python 2.6? Because that is what Spotify (the maintainers) is currently using ...

@erikbern
Copy link
Contributor

Travis runs tests on 2.6 so assuming the tests work I say let's merge

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.04% when pulling 27f49b3 on gpoulin:py3k into a7f8f2b on spotify:master.

1 similar comment
@landscape-bot
Copy link

Code Health
Repository health decreased by 0.04% when pulling 27f49b3 on gpoulin:py3k into a7f8f2b on spotify:master.

@gpoulin
Copy link
Contributor Author

gpoulin commented Feb 13, 2015

@Tarrasch seems not :( . I will need to find a way to install 2.6 to be able to run the test locally (not in gentoo tree anymore). It passes under 2.7.

@Tarrasch
Copy link
Contributor

Yep. It would also be worthwhile to add a 3.x-something target to the Travis build matrix.

Ah, well, I would also happily merge this if you can fix the 2.6 bug Travis reports. But it seems to be easy, all stacktraces seem to be about the same error no? https://travis-ci.org/spotify/luigi/jobs/50669429

@erikbern
Copy link
Contributor

Yeah assuming we can get this working let's add 3.1 and 3.2 to Travis builds

@gpoulin
Copy link
Contributor Author

gpoulin commented Feb 14, 2015

I set up a python 2.6 environment. I will try to fix the bug with the atomic_file.

For Travis, isn't it a little early? As I said in my first comment, the majority of the test are still failing.

For the python3 version, I would rather target 3.3 and 3.4. It's easier to support those version and I doubt there is still people using 3.1 and 3.2. People stick with 2.x or upgraded to 3.3 - 3.4 [citation needed].

@erikbern
Copy link
Contributor

Did you look at the Travis output? You probably just have to figure out a workaround for detach

Yeah I didn't mean setting up Travis yet. But we might as well merge the tests as soon as they work with 2.6 & 2.7

Sure let's do 3.3 and 3.4. I'm not super familiar with Python 3 versions

@Tarrasch
Copy link
Contributor

This all sounds sane. :)

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.02% when pulling a6204c8 on gpoulin:py3k into a7f8f2b on spotify:master.

@landscape-bot
Copy link

Code Health
Repository health increased by 0.07% when pulling e30c46b on gpoulin:py3k into a7f8f2b on spotify:master.

@landscape-bot
Copy link

Code Health
Repository health increased by 0.11% when pulling aa79670 on gpoulin:py3k into a7f8f2b on spotify:master.

1 similar comment
@landscape-bot
Copy link

Code Health
Repository health increased by 0.11% when pulling aa79670 on gpoulin:py3k into a7f8f2b on spotify:master.

@landscape-bot
Copy link

Code Health
Repository health increased by 0.14% when pulling 1959b2f on gpoulin:py3k into a7f8f2b on spotify:master.

@landscape-bot
Copy link

Code Health
Repository health increased by 0.09% when pulling 1ed2fb1 on gpoulin:py3k into a7f8f2b on spotify:master.

@gpoulin gpoulin mentioned this pull request Feb 14, 2015
@landscape-bot
Copy link

Code Health
Repository health increased by 0.09% when pulling b88de66 on gpoulin:py3k into a7f8f2b on spotify:master.

@gpoulin
Copy link
Contributor Author

gpoulin commented Feb 14, 2015

I fixed the atomic_file, Travis is happy now. I also continue the effort to port to python3. There is still many tests that fail, but the Top10Artists example seems to work. (examples/top_artists.py Top10Artists --local-scheduler --date-interval 2012-07)

@Tarrasch
Copy link
Contributor

Nice that this passes! Do you want us to merge this so you don't get conflicts or do you want to make it work for 3.x first? (and then please also add the version to the build matrix)

@gpoulin
Copy link
Contributor Author

gpoulin commented Feb 15, 2015

I added python 3.3 and 3.4 to the build matrix, but allowed failure. Feel free to merge this branch. It will indeed make conflict resolution easier.

I'll create another branch py3k_test in which I'll modify the tests to be compatible with python3, I won't modify the tests in the branch py3k. That way we can be more confident that the code is still working and not the test that started to not test correctly.

@landscape-bot
Copy link

Code Health
Repository health increased by 0.10% when pulling 3d72f6d on gpoulin:py3k into f6cb70f on spotify:master.

Tarrasch added a commit that referenced this pull request Feb 15, 2015
Py3k

Note. Python 3 does *not* work yet. Tests are failing and we've just white listed that in Travis.
@Tarrasch Tarrasch merged commit 61f9667 into spotify:master Feb 15, 2015
@Tarrasch
Copy link
Contributor

Thanks! Looking forward for you test-fixing patches! :)

@gpoulin
Copy link
Contributor Author

gpoulin commented Feb 15, 2015

@Tarrasch @erikbern I realized that I might have overlooked the issues with the Target/IO (ie.: atomic_file). Python 3 force you to make the distinction between unicode and bytes (bytes=str for python2 and unicode=str for python3), however Luigi currently doesn't distinguish between this two concepts . There is many ways to solve this issue, however the choice of the solution will have an impact on Luigi API so I just wanted to be sure you are comfortable with the solution I suggest.

Python offer two file modes, text and binary. In python2 the text mode only convert the character \n to the platform specific newline character. However in python3 the text mode also convert the stream into str object (equivalent of unicode in python2). Currenlty Luigi support only the text mode. On python2 on unix-like (\n for newline) system (probably the main target of Luigi) is not an issue because those two modes behave the same, however it's probably an issue for job dealing with binary file on non unix platform. However on python3 the two modes are obviously different.

My suggestion to overcome this issue would be to support both mode having an interface similar to python3. The target that support IO interface could be open in 4 different modes (wb, wt, rb, rt). In binary mode, the target would expect bytes input and return bytes output. In text mode, the target would expect unicode as input and would output unicode object. On python2, w and r would be synonym of wb and rb and on python3, w and r would be synonym of wt and rt. If a unicode object is written to a binary file the the object is encoded to the platform specific encoding and a warning would be raised. On platform with \n as newline, this interface would be compatible with the current one.

advantage:
  • possible to write platform independent code for binary and text file
  • cleaner solution (after all they didn't change the IO interface in py3k just to break stuff)
  • better support for unicode object
  • do what you would expect on python3 and python2 (platform with \n as newline)
inconvenient:
  • Might give unexpected result on python2 with platform that doesn't have \n as newline
  • Targets need to support both mode to fully implement the IO interface

There's probably some work around that is possible to apply to overcome the first inconvenient if ever it's an issue.

The other solution would be to continue to support only the text mode for python2 and support only the binary mode for python3.

I'm ready to make the extra effort to support the both modes in the current targets that support the IO interface and to add the relevant test.

Any thought?

@Tarrasch
Copy link
Contributor

Thanks for the summary! I'm for your solution. You should awaits @erikbern s opinion though.

Hey, @themalkolm , you're our most active Windows user I suppose, would you be sad if we assumed \n as as newline?

@themalkolm
Copy link
Contributor

TL;DR; Yep, it makes me sad.

@Tarrasch
Copy link
Contributor

@themalkolm You can upgrade to python 3.x and you won't have any problems afaict. :)

@themalkolm
Copy link
Contributor

@Tarrasch sure, or we can face the fact that files are different in python2 and python3 and do not try to hide it in any way with extra logic or abstraction.

@gpoulin
Copy link
Contributor Author

gpoulin commented Feb 15, 2015

Just to be clear. It would still be possible to deal with text on system that doesn't have \n as newline. If the file is only use by Luigi then there shouldn't be any issue (the newline would be write as \n and \n will be read as a newline). If the file need to be use somewhere else, opening the file in text mode and using unicode object would do the trick.

Note that it might break some code using text file on platform that doesn't use \n as newline, but it should allow using binary file on those platform. My understanding is that currently \n would be converted to \n\r even if I'm trying to write or read a picture.

@gpoulin
Copy link
Contributor Author

gpoulin commented Feb 15, 2015

@themalkolm In fact, it would require more logic and abstraction to recreate the python2 file interface for python2 and the python3 file interface for python3.

On the other hand, the solution that I suggest would only require to explicitly specify if the file is a text file or a binary file to make sure that the code is compatible on all platform and on all python version.

It would also be possible to detect if the system doesn't use \n as newline and change the behavior in consequence, but it would be more a hack to be able to be 100% retro compatible (I would be ready to implement it if we conclude that it's a must).

@gpoulin
Copy link
Contributor Author

gpoulin commented Feb 15, 2015

@themalkolm Can you run the following snippet and say me if you get the expected newline for windows? If it works, it shouldn't be too difficult to deal with newline character on python2 with platform that doesn't have \n has newline character.

import io
f = io.BufferedWriter(io.FileIO('test123', 'w'))
f.write('this\nis\na\ntest\n')
f.close()

@gpoulin gpoulin mentioned this pull request Feb 15, 2015
@themalkolm
Copy link
Contributor

Nope, only \n is there.

On Sun, Feb 15, 2015 at 3:23 PM, Guillaume Poulin notifications@github.com
wrote:

@themalkolm https://github.com/themalkolm Can you run the following
snippet and say me if you get the expected newline for windows? If it
works, it shouldn't be too difficult to deal with newline character on
python2 with platform that doesn't have \n has newline character.

import io
f = io.BufferedWriter(io.FileIO('test123', 'w'))
f.write('this\nis\na\ntest\n')
f.close()


Reply to this email directly or view it on GitHub
#745 (comment).

Regards,
Alexander

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants