Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.2 compatibility #50

Closed
Agmagor opened this issue Jul 29, 2014 · 19 comments
Closed

Python 3.2 compatibility #50

Agmagor opened this issue Jul 29, 2014 · 19 comments

Comments

@Agmagor
Copy link

Agmagor commented Jul 29, 2014

Since Python 3, all strings are unicode so the u'...' notation do not make sense anymore.

One can't install your package on a Python 3 installation because of the return statement of this function

def fetch_file(urls):
    """ Decode the first successfully fetched URL, from UTF-8 encoding to
    Python unicode.
    """
    s = ''

    for url in urls:
        try:
            conn = urlopen(url)
            s = conn.read()
        except Exception as e:
            LOG.error('Exception reading Public Suffix List url ' + url + ' - ' + str(e) + '.')
        else:
            return _decode_utf8(s)

    LOG.error('No Public Suffix List found. Consider using a mirror or constructing your TLDExtract with `fetch=False`.')
    return u''
@floer32
Copy link
Collaborator

floer32 commented Jul 31, 2014

I personally haven't made the switch to 3 yet (even for a pet project, I know, it's dumb). Would it suffice to replace return u'' with return unicode('')?

@Agmagor
Copy link
Author

Agmagor commented Jul 31, 2014

Oh, I didn't get that you weren't trying to do the switch yet :-)
I'm gonna do a fork, try to solve problems and check you out when I'm done

@Agmagor
Copy link
Author

Agmagor commented Jul 31, 2014

I just wrote these 2 commits that fixed my problems
Agmagor@5d23350
Agmagor@c63733e

This one is just about the way i prefer to get extract results
Agmagor@58b1e4d

I also tried it on a python 2.7 installation and nothing seems broken

Python 3:

>>> a = tldextract.extract("http://xn--h1alffa9f.xn--p1ai")
>>> list(a)
['', 'россия', 'рф']

Python 2:

>>> a = tldextract.extract("http://xn--h1alffa9f.xn--p1ai")
>>> list(a)
[u'', u'\u0440\u043e\u0441\u0441\u0438\u044f', u'\u0440\u0444']

@john-kurkowski
Copy link
Owner

@hangtwenty's approach sounds reasonable.

However, I'm not seeing any install errors for Python 3. Output below:

➜  ~  pip3 --version
pip 1.5.6 from /usr/local/lib/python3.4/site-packages (python 3.4)
➜  ~  pip3 install tldextract
Downloading/unpacking tldextract
  Downloading tldextract-1.4.tar.gz (57kB): 57kB downloaded
  Running setup.py (path:/private/var/folders/39/lk1fdfxd5kb_w_6wbb6960l40000gn/T/pip_build_John/tldextract/setup.py) egg_info for package tldextract

Installing collected packages: tldextract
  Running setup.py install for tldextract

    Installing tldextract script to /usr/local/bin
Successfully installed tldextract
Cleaning up...
➜  ~  python3
Python 3.4.0 (default, Apr  9 2014, 11:51:10)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tldextract
>>> tldextract.extract('http://xn--h1alffa9f.xn--p1ai')
ExtractResult(subdomain='', domain='россия', suffix='рф')

@Agmagor
Copy link
Author

Agmagor commented Jul 31, 2014

Indeed, it happens with python3.2 and not with python3.4.
According to python3.4 documentation, the u'str' notation was reintroduced for legacy purposes:

New in version 3.3: Support for the unicode legacy literal (u'value') was reintroduced to simplify the maintenance of dual Python 2.x and 3.x codebases. See PEP 414 for more information.

You won't find a unicode() function on python3+

@floer32
Copy link
Collaborator

floer32 commented Aug 4, 2014

def _unicode(string):
    try:
        return unicode(string)
    except NameError:
        return string

but more importantly... is this a job for 2to3?

@Agmagor
Copy link
Author

Agmagor commented Aug 4, 2014

@hangtwenty The piece of code you wrote will ensure compatibility with both Python versions, it's a good point.

2to3 usage here is completely overkill: if you take a look at my commits, you will see that a very little change fixes everything.

IHMO, switching to 3 for tldextract can be done easily, without useless suffering, but I have no idea how other libraries manage python versions.

Now, a little warning: give another name to your "unicode" function or you will enter a recursion loop

@floer32
Copy link
Collaborator

floer32 commented Aug 4, 2014

@Agmagor good call on that naming issue, fixed

@john-kurkowski
Copy link
Owner

This thread is starting to remind me why I only listed 3.3 as compatible. ;)

I'd be curious what 2to3 does just for the troublesome lines for Python 3.2. However the try/except NameError works for me! We use that branching elsewhere. It's better for 2.x, as I'd rather not have the function return unicode sometimes and str other times; we cannot simply drop the u from the literal.

@Agmagor
Copy link
Author

Agmagor commented Aug 4, 2014

Notice that unicode(str) seems to be a kind of alternative to u'str' :

Python 2.7.3 (default, Apr 10 2013, 06:20:15) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str = unicode("\u0440\u043e\u0441\u0441\u0438\u044f")
>>> str
u'\\u0440\\u043e\\u0441\\u0441\\u0438\\u044f'

Here is the diff output I got using 2to3 tool : https://gist.github.com/Agmagor/dbe542ce4df63b49222a

Thank you for your interest on this issue :-)

@john-kurkowski
Copy link
Owner

I see 2to3 is more forward-only. Oh well. The try/except NameError in _unicode will do.

@john-kurkowski john-kurkowski changed the title Python 3 compatibility Python 3.2 compatibility Aug 9, 2014
john-kurkowski added a commit that referenced this issue Oct 5, 2014
Partly addresses #50. Uniformly export `unicode` helper.
@john-kurkowski
Copy link
Owner

I just noticed this line already exports a bridge similar to the _unicode proposed in this thread. So I got rid of the only u'' literal in tldextract.py with that. Now if someone removes the literals from the tests, we can claim 2.6-3.4 compatibility!

@mauricioabreu
Copy link
Contributor

It looks like this issue could be closed, no? Tests are running against a lot of Python versions. What is your opinion?

@floer32
Copy link
Collaborator

floer32 commented Nov 11, 2015

I think @john-kurkowski 's comment above explained pretty well why 3.2 is annoying ... exactly to your point @mauricioabreu .

But apparently we are almost there. See comment right above yours, @mauricioabreu : #50 (comment)

@mauricioabreu
Copy link
Contributor

@hangtwenty ok, after reading it I think we lack of tests/coverage, right?

@floer32
Copy link
Collaborator

floer32 commented Nov 12, 2015

@mauricioabreu actually just fixing tests seemingly. Read #50 (comment)

@mauricioabreu
Copy link
Contributor

@hangtwenty ok, I got it. Thanks!

@john-kurkowski
Copy link
Owner

Another option is to close this. No requestor activity in over a year. I'm not dead set on supporting old Python 3.x. 3.3-5 have far more users.

@floer32
Copy link
Collaborator

floer32 commented Nov 12, 2015

I tried to use pyenv to install 3.2 and run tests, see where they fail, and maybe do the fix quickly. No dice:

$ nosetests
~/.pyenv/versions/3.2/lib/python3.2/site-packages/pkg_resources/__init__.py:103: UserWarning: Support for Python 3.0-3.2 has been dropped. Future versions will fail here.
  warnings.warn(msg)

----------------------------------------------------------------------
Ran 0 tests in 0.013s

OK

So yeah @john-kurkowski given lack of requestor activity let's close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants