Use JSON instead of pickle for tld data #81

andresriancho · 2015-12-02T21:30:27Z

.tld_set_snapshot and .tld_set use pickle to store the TLD information. While this is perfectly fine in most cases, it brings the following issues:

- Storing binary data in a git repository is bad practice.
- If this library gets packaged for Ubuntu/Debian, the maintainer will complain. I just started using tldextract in the w3af project, which is part of Ubuntu/Debian. When the package maintainer packages the next version, he'll most likely dislike the binary blob.
- Pickles are "executables". A specially crafted pickle can trigger arbitrary remote command execution when unpickled. While I did review the library for bugs and backdoors before including it in w3af, I did not read the whole pickle, which is bad for w3af users' security.
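To make the "pickles are executables" point concrete, here is a deliberately harmless sketch. The `Payload` class is hypothetical and has nothing to do with tldextract; it only shows that `pickle.loads` calls whatever callable the data names, while `json.loads` can only ever return plain data.

```python
import json
import pickle


class Payload:
    """Hypothetical object whose pickle runs code when it is loaded."""

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling eval("6 * 7").
        # A real attack would name something like os.system instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Unpickling calls eval(); the result is 42, not a Payload instance.
unpickled = pickle.loads(blob)

# JSON has no way to name a callable, so the same text stays inert data.
parsed = json.loads('{"eval": "6 * 7"}')

print(unpickled)
print(parsed)
```

Nothing in the pickle looks like code to a casual reader, which is why auditing a pickled data file is much harder than auditing a JSON one.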

floer32 · 2015-12-02T23:06:22Z

This is a good idea.

floer32 · 2015-12-02T23:07:26Z

I wasn't sure whether performance would degrade; it looks like it would actually improve. Haha.

Pickle vs JSON — Which is Faster?
If you’re here for the short answer — JSON is 25 times faster in reading (loads) and 15 times faster in writing (dumps).

source
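The 25x and 15x figures above come from the linked article and will depend on the Python version, the pickle protocol and the shape of the data. A rough way to check on data shaped like a suffix list is a `timeit` sketch such as this one; the synthetic `suffixes` list is a hypothetical stand-in, not tldextract's real data.

```python
import json
import pickle
import timeit

# Hypothetical stand-in for the suffix data: a few thousand short strings.
suffixes = sorted(f"suffix{i}.example" for i in range(5000))

json_text = json.dumps(suffixes)
pickle_blob = pickle.dumps(suffixes, protocol=pickle.HIGHEST_PROTOCOL)

# Both formats must round-trip the same data before timing means anything.
assert json.loads(json_text) == suffixes
assert pickle.loads(pickle_blob) == suffixes

runs = 200
json_load_s = timeit.timeit(lambda: json.loads(json_text), number=runs)
pickle_load_s = timeit.timeit(lambda: pickle.loads(pickle_blob), number=runs)
json_dump_s = timeit.timeit(lambda: json.dumps(suffixes), number=runs)
pickle_dump_s = timeit.timeit(
    lambda: pickle.dumps(suffixes, protocol=pickle.HIGHEST_PROTOCOL), number=runs
)

print(f"loads: json {json_load_s:.4f}s, pickle {pickle_load_s:.4f}s")
print(f"dumps: json {json_dump_s:.4f}s, pickle {pickle_dump_s:.4f}s")
```

Treat the printed numbers as indicative only; results on a given machine can differ substantially from the article's.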

john-kurkowski · 2015-12-03T00:41:11Z

👍

This could be the nudge to finally cut a 2.0 release. If this gets done, we can abandon pickle entirely, with no need for backwards compatibility. (Then clean up other 1.x deprecations.)

andresriancho · 2015-12-03T01:11:33Z

Awesome, happy to see you guys liked the idea.

mauricioabreu · 2016-01-18T19:02:44Z

Good idea. :-)
I have contributed to the tldextract project before. Maybe this is a chance to contribute again.

john-kurkowski · 2016-01-19T00:37:27Z

@mauricioabreu by all means!

I started a 2.0 branch. Target any work there. Then you can run free with this issue, with less worry about backwards compatibility.

mauricioabreu · 2016-03-07T11:29:51Z

Couldn't it be a plain text file?

mauricioabreu · 2016-03-07T12:30:25Z

I mean, JSON is good for structured data, right?
We don't have any structure here, just a file with one entry per line.

Python supports this by simply reading the file line by line. What do you think, @andresriancho?
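A minimal sketch of the plain-text approach, assuming one suffix per line. The function name `load_suffixes_from_text` and the sample contents are hypothetical; a real file would be opened with `open(path, encoding="utf-8")` instead of `io.StringIO`.

```python
import io


def load_suffixes_from_text(fp):
    """Read one suffix per line into a set, skipping blank lines."""
    return {line.strip() for line in fp if line.strip()}


# Hypothetical file contents, including a stray blank line.
sample = io.StringIO("com\nco.uk\n\norg\n")
suffixes = load_suffixes_from_text(sample)
print(sorted(suffixes))  # ['co.uk', 'com', 'org']
```

This is about as simple as loading gets, but each line can only carry a single value.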

andresriancho · 2016-03-07T13:21:12Z

I usually use JSON even for very simple data because it allows me to add more "structure" to it later (if required by the next versions of the software) without rewriting the code, but a plain text file sounds good too.
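The forward-compatibility argument can be sketched like this, assuming a hypothetical layout where the file holds a JSON object rather than a bare list. A reader written for the first layout keeps working when a later version adds keys; the `"generated"` key is purely illustrative.

```python
import json

# Hypothetical version 1 file: just the suffixes.
v1_text = '{"suffixes": ["com", "co.uk"]}'

# Hypothetical later version: a new key is added alongside the old one.
v2_text = '{"suffixes": ["com", "co.uk"], "generated": "2016-03-07"}'


def read_suffixes(text):
    """An old reader: it only looks up the key it knows about."""
    data = json.loads(text)
    return set(data["suffixes"])


# The extra key in v2 is simply ignored, so old code keeps working.
print(read_suffixes(v1_text) == read_suffixes(v2_text))  # True
```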

mauricioabreu · 2016-03-07T18:40:29Z

Good! Thanks for your opinion, @andresriancho. I will start with plain text. :)

john-kurkowski · 2016-03-07T18:58:06Z

I would start with JSON to be more forward-thinking, as mentioned.

As a concrete use case, metadata about each suffix will be required for #66, i.e. whether it's a private suffix or not. Teasing this from JSON is easy. Teasing it from plain text is less fun.
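For the #66 use case, per-suffix metadata fits naturally into JSON. The layout and the `is_private` key below are hypothetical illustrations, not necessarily the format tldextract adopted; `blogspot.com` is used only as an example of a private suffix.

```python
import json

# Hypothetical layout: one object per suffix, with an "is_private" flag.
text = """
[
    {"suffix": "com", "is_private": false},
    {"suffix": "co.uk", "is_private": false},
    {"suffix": "blogspot.com", "is_private": true}
]
"""

entries = json.loads(text)
public = {e["suffix"] for e in entries if not e["is_private"]}
private = {e["suffix"] for e in entries if e["is_private"]}

print(sorted(public))   # ['co.uk', 'com']
print(sorted(private))  # ['blogspot.com']
```

With plain text, the same flag would need an ad hoc convention (a line prefix, a separator, or a second file) that every reader has to know about; JSON parses the flag as a real boolean for free.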

mauricioabreu · 2016-03-07T18:58:59Z

@john-kurkowski okay! I am happy with these two alternatives. :-)
Thanks! Going to use JSON then.
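One wrinkle when switching the cache from pickle to JSON: the old cache stored a Python `set`, which `json` cannot serialize directly. A minimal sketch under assumed names (the `tld_set.json` filename and the temporary directory are hypothetical, not the files tldextract actually writes):

```python
import json
import os
import tempfile

# Hypothetical suffix set, like the one previously pickled to .tld_set.
tld_set = {"com", "co.uk", "org"}

cache_path = os.path.join(tempfile.mkdtemp(), "tld_set.json")

# json.dump raises TypeError for a set, so write a sorted list instead;
# sorting also keeps the file stable, which keeps git diffs readable.
with open(cache_path, "w", encoding="utf-8") as fp:
    json.dump(sorted(tld_set), fp, indent=0)

# Rebuild the set when loading; nothing in the file can run code.
with open(cache_path, encoding="utf-8") as fp:
    loaded = set(json.load(fp))

print(loaded == tld_set)  # True
```

`indent=0` puts one entry per line, so the committed snapshot stays a readable text file rather than a binary blob.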

john-kurkowski · 2016-03-09T06:17:46Z

This weekend, I'll use #91 to work toward a 2.0 release, to close this.

john-kurkowski · 2016-04-04T02:20:07Z

Closed via #92.

floer32 added the ❗ security ❗ label Dec 2, 2015

john-kurkowski mentioned this issue Feb 7, 2016 in: use requests instead of urllib #89 (Merged)

mauricioabreu mentioned this issue Mar 9, 2016 in: Add JSON support to cached files #91 (Merged)

john-kurkowski added a commit that referenced this issue Mar 13, 2016: Update CHANGELOG.md for #81 (f3cd585)

john-kurkowski closed this as completed Apr 4, 2016
