-
Notifications
You must be signed in to change notification settings - Fork 811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved download and extract functions plus download scripts #569
Conversation
line = pattern_re.sub(replaced_str, line) | ||
return line.split() | ||
|
||
|
||
def get_tokenizer(tokenizer, language='en'): | ||
# default tokenizer is string.split(), added as a module function for serialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need docs for this part (you can leave it to me) since this function will be called by users.
@@ -73,7 +91,7 @@ def process_response(r): | |||
url = url + "&confirm=" + confirm_token | |||
response = session.get(url, stream=True) | |||
|
|||
process_response(response) | |||
return process_response(response, root, filename) | |||
|
|||
|
|||
def unicode_csv_reader(unicode_csv_data, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need an example/docs here. Happy to add.
@@ -0,0 +1,23 @@ | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an example to show download function? I feel it is not relevant to text_classification things. Better to move to torch.utils docs. Or you could even put it as a separate example under examples directory
@@ -0,0 +1,23 @@ | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since download/extract_archive functions are more orthogonal now, we could add this file under examples directory. They are not so relevant to text_classification things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason it's specific to text classification is because it pulls the URL from the text_classification package. But we can make the URL an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think use an URL argument makes more sense. It's just a download API and we want people to use it separately.
@@ -0,0 +1,39 @@ | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above. this example is to show how to build vocab so it should be under examples directory.
' ', ' '] | ||
|
||
|
||
def _basic_english_normalize(line): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to update the unit test
torchtext/utils.py
Outdated
""" | ||
|
||
def process_response(r): | ||
def process_response(r, root, filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this changes the API. Make sure it doesn't break any existing applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but it's a local function. We should prepend it with "_"
No description provided.