New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
20200501 fbsync #750
20200501 fbsync #750
Conversation
fbshipit-source-id: bb7d2eb52240c7223b57c3c9624e61d116e77e39
Summary: [20:45:34: cpuhrsch@devvm3140 pytorch]$ ./fb_build/import_text.sh Reviewed By: pbelevich Differential Revision: D21320577 fbshipit-source-id: ac2148b9f0d58e5538443c879845bfb4f6ca7202
Summary: ./fb_build/import_text.sh Reviewed By: zhangguanheng66 Differential Revision: D21343124 fbshipit-source-id: c08ecad2cc6f439fa40130aeaf91383be9403fe8
Summary: See title Reviewed By: pbelevich Differential Revision: D21344211 fbshipit-source-id: a8bcf7f3ab9bb2c2853e27f612e82caa341d3651
@@ -31,7 +31,7 @@ class Vocab(object): | |||
# TODO (@mttk): Populate classs with default values of special symbols | |||
UNK = '<unk>' | |||
|
|||
def __init__(self, counter, max_size=None, min_freq=1, specials=['<unk>', '<pad>'], | |||
def __init__(self, counter, max_size=None, min_freq=1, specials=('<unk>', '<pad>'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""
A new list is created once when the function is defined, and the same list is used in each successive call.
Python’s default arguments are evaluated once when the function is defined, not each time the function is called (like it is in say, Ruby). This means that if you use a mutable default argument and mutate it, you will and have mutated that object for all future calls to the function as well.
"""
We could use None as a default and construct a list if it is None, but I don't see the need for specials
to be a list based on the rest of the code. A tuple is not mutable, so we can use it instead.
@@ -135,7 +135,7 @@ def test_vocab_download_fasttext_vectors(self): | |||
def test_vocab_extend(self): | |||
c = Counter({'hello': 4, 'world': 3, 'ᑌᑎIᑕOᗪᕮ_Tᕮ᙭T': 5, 'freq_too_low': 2}) | |||
# Build a vocab and get vectors twice to test caching. | |||
for i in range(2): | |||
for _ in range(2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming a loop variable and not using it, is a common lint issue. Using _
prevents this.
@@ -6,7 +6,7 @@ | |||
from functools import partial | |||
|
|||
|
|||
def _split_tokenizer(x): | |||
def _split_tokenizer(x): # noqa: F821 | |||
# type: (str) -> List[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we only support python3 now, we can use python3 type annotation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lint is still being raised internally due to some issues in flake8. This way we suppress the error.
@@ -55,15 +55,9 @@ def __iter__(self): | |||
self.setup_iter() | |||
|
|||
for i, item in enumerate(self._iterator): | |||
if i == self.start: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those changes to resolve the errors for sync? Or I think we should do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lint complained that item
wasn't used even though it is defined within the context of this loop. The code further along in lines 61+ relies on the idea that item will be populated by the loop iteration, which I'm not even sure will happen if self._iterator were to already be depleted, which would then raise an error on line 63.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Ready to merge after the CI tests pass.
This PR pulls in changes from fbsync to get both this repo and the internal repo in sync.