Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved download and extract functions plus download scripts #569

Merged
merged 15 commits into from
Jul 27, 2019

Conversation

cpuhrsch
Copy link
Contributor

No description provided.

line = pattern_re.sub(replaced_str, line)
return line.split()


def get_tokenizer(tokenizer, language='en'):
# default tokenizer is string.split(), added as a module function for serialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need docs for this part (you can leave it to me) since this function will be called by users.

@@ -73,7 +91,7 @@ def process_response(r):
url = url + "&confirm=" + confirm_token
response = session.get(url, stream=True)

process_response(response)
return process_response(response, root, filename)


def unicode_csv_reader(unicode_csv_data, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need an example/docs here. Happy to add.

@@ -0,0 +1,23 @@
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an example to show download function? I feel it is not relevant to text_classification things. Better to move to torch.utils docs. Or you could even put it as a separate example under examples directory

@@ -0,0 +1,23 @@
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since download/extract_archive functions are more orthogonal now, we could add this file under examples directory. They are not so relevant to text_classification things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason it's specific to text classification is because it pulls the URL from the text_classification package. But we can make the URL an argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think use an URL argument makes more sense. It's just a download API and we want people to use it separately.

@@ -0,0 +1,39 @@
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above. this example is to show how to build vocab so it should be under examples directory.

' ', ' ']


def _basic_english_normalize(line):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to update the unit test

"""

def process_response(r):
def process_response(r, root, filename):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this changes the API. Make sure it doesn't break any existing applications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but it's a local function. We should prepend it with "_"

@cpuhrsch cpuhrsch merged commit 038515c into pytorch:master Jul 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants