Improved download and extract functions plus download scripts #569

cpuhrsch · 2019-07-26T18:34:45Z

No description provided.

zhangguanheng66 · 2019-07-26T22:02:18Z

torchtext/data/utils.py

+        line = pattern_re.sub(replaced_str, line)
+    return line.split()
+
+
 def get_tokenizer(tokenizer, language='en'):
    # default tokenizer is string.split(), added as a module function for serialization


We need docs for this part (you can leave it to me) since this function will be called by users.
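A minimal sketch of what such docs could illustrate, assuming the normalizer applies a list of regex substitutions in order and then splits on whitespace, as the diff hunk above suggests. The pattern list, the lowercasing step, and the name `basic_english_normalize_sketch` are illustrative assumptions, not the code from this PR.

```python
import re

# Hypothetical (pattern, replacement) pairs; the PR's actual list is not shown in this thread.
_PATTERNS = [
    (re.compile(r"\'"), " ' "),   # isolate apostrophes
    (re.compile(r"\."), " . "),   # isolate periods
    (re.compile(r","), " , "),    # isolate commas
    (re.compile(r"\s+"), " "),    # collapse runs of whitespace
]


def basic_english_normalize_sketch(line):
    """Lowercase a line, apply each substitution in order, and split on whitespace."""
    line = line.lower()  # assumption: normalization includes lowercasing
    for pattern_re, replaced_str in _PATTERNS:
        line = pattern_re.sub(replaced_str, line)
    return line.split()


tokens = basic_english_normalize_sketch("Hello, world.")
```

Because it is a plain module-level function, a tokenizer like this can be pickled, which matches the serialization concern in the comment on `get_tokenizer`.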

zhangguanheng66 · 2019-07-26T22:07:59Z

torchtext/utils.py

@@ -73,7 +91,7 @@ def process_response(r):
        url = url + "&confirm=" + confirm_token
        response = session.get(url, stream=True)

-    process_response(response)
+    return process_response(response, root, filename)


 def unicode_csv_reader(unicode_csv_data, **kwargs):


We need an example/docs here. Happy to add.

zhangguanheng66 · 2019-07-26T22:10:13Z

examples/text_classification/download.py

@@ -0,0 +1,23 @@
+import logging


Is this an example to show the download function? I feel it is not relevant to text_classification. Better to move it to the torch.utils docs, or you could put it as a separate example under the examples directory.

zhangguanheng66 · 2019-07-26T22:16:19Z

examples/text_classification/download.py

@@ -0,0 +1,23 @@
+import logging


Since the download/extract_archive functions are more orthogonal now, we could add this file under the examples directory. They are not so relevant to text_classification.

The only reason it's specific to text classification is that it pulls the URL from the text_classification package. But we can make the URL an argument.

I think using a URL argument makes more sense. It's just a download API and we want people to be able to use it separately.
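One way the script could take the URL as an argument, sketched with `argparse`. The function name `build_parser`, the `--root` option, and its `.data` default are assumptions for illustration. The actual download and extract calls are left out because their signatures are not shown in this thread.

```python
import argparse


def build_parser():
    """Build a command-line parser for a generic download script."""
    parser = argparse.ArgumentParser(description="Download and extract an archive.")
    # The URL is a required positional argument rather than a value pulled
    # from the text_classification package.
    parser.add_argument("url", help="URL of the archive to download")
    # Hypothetical option: directory in which to store the downloaded file.
    parser.add_argument("--root", default=".data", help="download directory")
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["https://example.com/data.tar.gz"])
```

With the URL supplied on the command line, the script no longer depends on any one dataset package and can live directly under the examples directory.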

zhangguanheng66 · 2019-07-26T22:17:07Z

examples/text_classification/vocab.py

@@ -0,0 +1,39 @@
+import logging


Same as above. This example shows how to build a vocab, so it should be under the examples directory.

zhangguanheng66 · 2019-07-26T22:18:31Z

torchtext/data/utils.py

+                   ' ', ' ']
+
+
+def _basic_english_normalize(line):


We need to update the unit test.

zhangguanheng66 · 2019-07-26T22:23:23Z

torchtext/utils.py

    """

-    def process_response(r):
+    def process_response(r, root, filename):


So this changes the API. Make sure it doesn't break any existing applications.

Ah, but it's a local function, so callers outside the enclosing function cannot reach it. We should prefix its name with "_".
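A small sketch of the pattern under discussion: a nested helper, named with a leading underscore, that writes streamed chunks to disk and returns the resulting path. The names `save_stream` and `_process_chunks` are hypothetical, and an in-memory list of byte chunks stands in for a streamed HTTP response.

```python
import os


def save_stream(chunks, root, filename):
    """Write an iterable of byte chunks to root/filename and return the path."""

    def _process_chunks(chunks, root, filename):
        # Nested helper: not importable from outside, so changing its
        # signature cannot break external callers.
        os.makedirs(root, exist_ok=True)
        path = os.path.join(root, filename)
        with open(path, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    f.write(chunk)
        return path

    # Returning the helper's result lets the caller learn where the file landed.
    return _process_chunks(chunks, root, filename)
```

Returning the path from the outer function mirrors the change in the diff, where the result of the helper is now returned instead of discarded.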

Christian Puhrsch added 7 commits July 26, 2019 11:14

Download and vocab example scripts

b3ce82b

Add progress line

944904d

flake8

1fd7482

Merge remote-tracking branch 'upstream/master' into more

7d86079

flake8

93562c5

simplify extraction

bf00b15

update download.py

9ce7c68

zhangguanheng66 reviewed Jul 26, 2019


Christian Puhrsch added 3 commits July 26, 2019 15:39

perf improvements

d2d8752

update test

6960766

flake8

47b4d85

zhangguanheng66 approved these changes Jul 26, 2019


Christian Puhrsch added 5 commits July 26, 2019 15:56

Merge branch 'more' into perfdict

6c01b3d

ngrams

162b90a

small changes to train

2aa33b7

vocab

b71d625

Small fixes

6a90285

cpuhrsch merged commit 038515c into pytorch:master Jul 27, 2019
