Fork Chardet #951

Closed
kennethreitz opened this Issue Nov 23, 2012 · 29 comments

Projects

None yet

4 participants

@kennethreitz
Owner

Maintain a simple adaptation of chardet/chardet2 that works in codebases.

Shouldn't be difficult.

@piotr-dobrogost

Could you elaborate?

@kennethreitz
Owner

Chardet is for Python 2. Chardet 2 is for Python 3.

Our chardet should work for both.

@ghost
ghost commented Nov 24, 2012

@Lukasa @kennethreitz #939 solves these problems without any major disadvantage I can think of. Furthermore it has the slight advantage of letting the package maintainers easily add separate versions of any other package requests may depend on in the future without having to follow this same forking approach everytime. But this is just my humble opinion, of course :-)

@kennethreitz
Owner

Oh, so "chardet2" already supports python 2, even though it says it doesn't. Easy enough.

@kennethreitz
Owner

No longer needed.

@sigmavirus24
Collaborator

Har

Kenneth Reitz notifications@github.com wrote:

Oh, so "chardet2" already supports python 2, even though it says it
doesn't. Easy enough.


Reply to this email directly or view it on GitHub:
#951 (comment)

@kennethreitz
Owner

I take that back.

@kennethreitz kennethreitz reopened this Nov 27, 2012
@sigmavirus24
Collaborator

Where are you forking chardet to?

@kennethreitz
Owner

@sigmavirus24 do what to do it for me? :)

@sigmavirus24
Collaborator

I was under the impression it was going to be forked to the requests organization. If you want me to do it give me the permissions too. :P

@sigmavirus24
Collaborator

See sigmavirus24/charade

@sigmavirus24
Collaborator

So as of sv24-archive/charade@3ac2bce I feel confident in saying that the library should work in both python 2 and 3. I just need tests to try this on. I figured I'd move discussion from #907 to here since this is the correct issue and it's open.

If anyone has some test cases I could use to verify that it works, I will ❤️ you.

@sigmavirus24
Collaborator

I just scoured the forks of dcramer/chardet and found one with Mark's tests & test runner. Running the tests now on python2 first.

@sigmavirus24
Collaborator

Well it's definitely not ready yet.

Like I said before ~1 week should suffice.

@kennethreitz
Owner

Ready enough for me to continue with development :)

@sigmavirus24
Collaborator

From the tests running in python3, I'm guessing not.

@kennethreitz
Owner

Yeah, nothing's working at all. But... i'm excited :)

@kennethreitz
Owner

yeah, dcramer/chardet is a terrible fork.

@sigmavirus24
Collaborator

Yeah, I'm starting to get that. I'm going to look at some other forks to work
with.

@puzzlet
puzzlet commented Nov 30, 2012

The original chardet used to support Python 3. You may want to look at https://github.com/puzzlet/chardet/tree/MarkPilgrim/src-python3

@sigmavirus24
Collaborator

@puzzlet that seems extraordinarily useful. Thank you! We're trying to make one library for both python 2 and 3, I've add you and @kennethreitz as contributors. Any and all further help you could provide would be greatly appreciated.

On a separate topic, @kennethreitz, the library shouldn't give you problems anymore, it just might not (yet) give you the correct encoding.

@kennethreitz
Owner

omg it works!

@kennethreitz
Owner

I love you so much.

@sigmavirus24
Collaborator

The tests just imply that it doesn't detect the correct encoding like it
should. So far diffing the files in puzzlet's links and mine are showing only
stylistic (pep8) changes. I haven't run the tests on Mark's original python3
code but, I'm looking.

I think part of the problem is coming from the universaldetector file. I used
Mark's logic for the comparison's, i.e., str <= '\202' becomes ord(str) <= 202. My concern is that when you do ord('\202') that doesn't become
202, that becomes 130.

Technically everything that was true for that statment will work BUT the
problem is now that extra characters are included there. Also, doesn't help
that I have never actually studied Mozilla's algorithm for doing all of this.

:shrug: I'll get it back to being actually reliable.

@kennethreitz
Owner

Looking forward to it! I'm happy that my basic tests are at least working now :)

@sigmavirus24
Collaborator

I was thinking of the wrong file. And that bit that I mentioned is unrelated to detection of anything other than JP languages (I think). (The file I was thinking of was jpcntx.py)

@sigmavirus24
Collaborator

I may have found the issue ;)

@kennethreitz
Owner

\o/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment