Python 3.2 compatibility #50

Agmagor · 2014-07-29T13:02:11Z

Since Python 3, all strings are unicode so the u'...' notation do not make sense anymore.

One can't install your package on a Python 3 installation because of the return statement of this function

def fetch_file(urls):
    """ Decode the first successfully fetched URL, from UTF-8 encoding to
    Python unicode.
    """
    s = ''

    for url in urls:
        try:
            conn = urlopen(url)
            s = conn.read()
        except Exception as e:
            LOG.error('Exception reading Public Suffix List url ' + url + ' - ' + str(e) + '.')
        else:
            return _decode_utf8(s)

    LOG.error('No Public Suffix List found. Consider using a mirror or constructing your TLDExtract with `fetch=False`.')
    return u''

The text was updated successfully, but these errors were encountered:

floer32 · 2014-07-31T04:05:20Z

I personally haven't made the switch to 3 yet (even for a pet project, I know, it's dumb). Would it suffice to replace return u'' with return unicode('')?

Agmagor · 2014-07-31T07:17:15Z

Oh, I didn't get that you weren't trying to do the switch yet :-)
I'm gonna do a fork, try to solve problems and check you out when I'm done

Agmagor · 2014-07-31T07:49:00Z

I just wrote these 2 commits that fixed my problems
Agmagor@5d23350
Agmagor@c63733e

This one is just about the way i prefer to get extract results
Agmagor@58b1e4d

I also tried it on a python 2.7 installation and nothing seems broken

Python 3:

>>> a = tldextract.extract("http://xn--h1alffa9f.xn--p1ai")
>>> list(a)
['', 'россия', 'рф']

Python 2:

>>> a = tldextract.extract("http://xn--h1alffa9f.xn--p1ai")
>>> list(a)
[u'', u'\u0440\u043e\u0441\u0441\u0438\u044f', u'\u0440\u0444']

john-kurkowski · 2014-07-31T15:45:30Z

@hangtwenty's approach sounds reasonable.

However, I'm not seeing any install errors for Python 3. Output below:

➜  ~  pip3 --version
pip 1.5.6 from /usr/local/lib/python3.4/site-packages (python 3.4)
➜  ~  pip3 install tldextract
Downloading/unpacking tldextract
  Downloading tldextract-1.4.tar.gz (57kB): 57kB downloaded
  Running setup.py (path:/private/var/folders/39/lk1fdfxd5kb_w_6wbb6960l40000gn/T/pip_build_John/tldextract/setup.py) egg_info for package tldextract

Installing collected packages: tldextract
  Running setup.py install for tldextract

    Installing tldextract script to /usr/local/bin
Successfully installed tldextract
Cleaning up...
➜  ~  python3
Python 3.4.0 (default, Apr  9 2014, 11:51:10)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tldextract
>>> tldextract.extract('http://xn--h1alffa9f.xn--p1ai')
ExtractResult(subdomain='', domain='россия', suffix='рф')

Agmagor · 2014-07-31T21:11:49Z

Indeed, it happens with python3.2 and not with python3.4.
According to python3.4 documentation, the u'str' notation was reintroduced for legacy purposes:

New in version 3.3: Support for the unicode legacy literal (u'value') was reintroduced to simplify the maintenance of dual Python 2.x and 3.x codebases. See PEP 414 for more information.

You won't find a unicode() function on python3+

floer32 · 2014-08-04T15:57:11Z

def _unicode(string):
    try:
        return unicode(string)
    except NameError:
        return string

but more importantly... is this a job for 2to3?

Agmagor · 2014-08-04T16:47:52Z

@hangtwenty The piece of code you wrote will ensure compatibility with both Python versions, it's a good point.

2to3 usage here is completely overkill: if you take a look at my commits, you will see that a very little change fixes everything.

IHMO, switching to 3 for tldextract can be done easily, without useless suffering, but I have no idea how other libraries manage python versions.

Now, a little warning: give another name to your "unicode" function or you will enter a recursion loop

floer32 · 2014-08-04T17:17:53Z

@Agmagor good call on that naming issue, fixed

john-kurkowski · 2014-08-04T18:33:46Z

This thread is starting to remind me why I only listed 3.3 as compatible. ;)

I'd be curious what 2to3 does just for the troublesome lines for Python 3.2. However the try/except NameError works for me! We use that branching elsewhere. It's better for 2.x, as I'd rather not have the function return unicode sometimes and str other times; we cannot simply drop the u from the literal.

Agmagor · 2014-08-04T20:32:34Z

Notice that unicode(str) seems to be a kind of alternative to u'str' :

Python 2.7.3 (default, Apr 10 2013, 06:20:15) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str = unicode("\u0440\u043e\u0441\u0441\u0438\u044f")
>>> str
u'\\u0440\\u043e\\u0441\\u0441\\u0438\\u044f'

Here is the diff output I got using 2to3 tool : https://gist.github.com/Agmagor/dbe542ce4df63b49222a

Thank you for your interest on this issue :-)

john-kurkowski · 2014-08-07T02:20:04Z

I see 2to3 is more forward-only. Oh well. The try/except NameError in _unicode will do.

Partly addresses #50. Uniformly export `unicode` helper.

john-kurkowski · 2014-10-05T19:46:12Z

I just noticed this line already exports a bridge similar to the _unicode proposed in this thread. So I got rid of the only u'' literal in tldextract.py with that. Now if someone removes the literals from the tests, we can claim 2.6-3.4 compatibility!

mauricioabreu · 2015-11-11T22:55:47Z

It looks like this issue could be closed, no? Tests are running against a lot of Python versions. What is your opinion?

floer32 · 2015-11-11T23:26:42Z

I think @john-kurkowski 's comment above explained pretty well why 3.2 is annoying ... exactly to your point @mauricioabreu .

But apparently we are almost there. See comment right above yours, @mauricioabreu : #50 (comment)

mauricioabreu · 2015-11-11T23:55:23Z

@hangtwenty ok, after reading it I think we lack of tests/coverage, right?

floer32 · 2015-11-12T00:42:28Z

@mauricioabreu actually just fixing tests seemingly. Read #50 (comment)

mauricioabreu · 2015-11-12T11:18:49Z

@hangtwenty ok, I got it. Thanks!

john-kurkowski · 2015-11-12T17:39:29Z

Another option is to close this. No requestor activity in over a year. I'm not dead set on supporting old Python 3.x. 3.3-5 have far more users.

floer32 · 2015-11-12T17:50:01Z

I tried to use pyenv to install 3.2 and run tests, see where they fail, and maybe do the fix quickly. No dice:

$ nosetests
~/.pyenv/versions/3.2/lib/python3.2/site-packages/pkg_resources/__init__.py:103: UserWarning: Support for Python 3.0-3.2 has been dropped. Future versions will fail here.
  warnings.warn(msg)

----------------------------------------------------------------------
Ran 0 tests in 0.013s

OK

So yeah @john-kurkowski given lack of requestor activity let's close this.

john-kurkowski changed the title ~~Python 3 compatibility~~ Python 3.2 compatibility Aug 9, 2014

john-kurkowski added a commit that referenced this issue Oct 5, 2014

Avoid u'' literal for Python 3.0 - 3.2 compatibility.

88b6743

Partly addresses #50. Uniformly export `unicode` helper.

floer32 closed this as completed Nov 12, 2015

shirish93 mentioned this issue Jan 21, 2016

Fixed setup.py so that it doesn't throw encoding error. tqdm/tqdm#112

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python 3.2 compatibility #50

Python 3.2 compatibility #50

Agmagor commented Jul 29, 2014

floer32 commented Jul 31, 2014

Agmagor commented Jul 31, 2014

Agmagor commented Jul 31, 2014

john-kurkowski commented Jul 31, 2014

Agmagor commented Jul 31, 2014

floer32 commented Aug 4, 2014

Agmagor commented Aug 4, 2014

floer32 commented Aug 4, 2014

john-kurkowski commented Aug 4, 2014

Agmagor commented Aug 4, 2014

john-kurkowski commented Aug 7, 2014

john-kurkowski commented Oct 5, 2014

mauricioabreu commented Nov 11, 2015

floer32 commented Nov 11, 2015

mauricioabreu commented Nov 11, 2015

floer32 commented Nov 12, 2015

mauricioabreu commented Nov 12, 2015

john-kurkowski commented Nov 12, 2015

floer32 commented Nov 12, 2015

Python 3.2 compatibility #50

Python 3.2 compatibility #50

Comments

Agmagor commented Jul 29, 2014

floer32 commented Jul 31, 2014

Agmagor commented Jul 31, 2014

Agmagor commented Jul 31, 2014

john-kurkowski commented Jul 31, 2014

Agmagor commented Jul 31, 2014

floer32 commented Aug 4, 2014

Agmagor commented Aug 4, 2014

floer32 commented Aug 4, 2014

john-kurkowski commented Aug 4, 2014

Agmagor commented Aug 4, 2014

john-kurkowski commented Aug 7, 2014

john-kurkowski commented Oct 5, 2014

mauricioabreu commented Nov 11, 2015

floer32 commented Nov 11, 2015

mauricioabreu commented Nov 11, 2015

floer32 commented Nov 12, 2015

mauricioabreu commented Nov 12, 2015

john-kurkowski commented Nov 12, 2015

floer32 commented Nov 12, 2015