Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

20200501 fbsync #750

Merged
merged 15 commits into from May 4, 2020
Merged

20200501 fbsync #750

merged 15 commits into from May 4, 2020

Conversation

cpuhrsch
Copy link
Contributor

@cpuhrsch cpuhrsch commented May 1, 2020

This PR pulls in changes from fbsync to get both this repo and the internal repo in sync.

Guanheng Zhang and others added 10 commits March 25, 2020 08:01
fbshipit-source-id: bb7d2eb52240c7223b57c3c9624e61d116e77e39
Summary: [20:45:34: cpuhrsch@devvm3140 pytorch]$ ./fb_build/import_text.sh

Reviewed By: pbelevich

Differential Revision: D21320577

fbshipit-source-id: ac2148b9f0d58e5538443c879845bfb4f6ca7202
Summary: ./fb_build/import_text.sh

Reviewed By: zhangguanheng66

Differential Revision: D21343124

fbshipit-source-id: c08ecad2cc6f439fa40130aeaf91383be9403fe8
Summary: See title

Reviewed By: pbelevich

Differential Revision: D21344211

fbshipit-source-id: a8bcf7f3ab9bb2c2853e27f612e82caa341d3651
@@ -31,7 +31,7 @@ class Vocab(object):
# TODO (@mttk): Populate classs with default values of special symbols
UNK = '<unk>'

def __init__(self, counter, max_size=None, min_freq=1, specials=['<unk>', '<pad>'],
def __init__(self, counter, max_size=None, min_freq=1, specials=('<unk>', '<pad>'),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutable default arguments:

"""
A new list is created once when the function is defined, and the same list is used in each successive call.

Python’s default arguments are evaluated once when the function is defined, not each time the function is called (like it is in say, Ruby). This means that if you use a mutable default argument and mutate it, you will and have mutated that object for all future calls to the function as well.
"""

We could use None as a default and construct a list if it is None, but I don't see the need for specials to be a list based on the rest of the code. A tuple is not mutable, so we can use it instead.

@@ -135,7 +135,7 @@ def test_vocab_download_fasttext_vectors(self):
def test_vocab_extend(self):
c = Counter({'hello': 4, 'world': 3, 'ᑌᑎIᑕOᗪᕮ_Tᕮ᙭T': 5, 'freq_too_low': 2})
# Build a vocab and get vectors twice to test caching.
for i in range(2):
for _ in range(2):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming a loop variable and not using it, is a common lint issue. Using _ prevents this.

@@ -6,7 +6,7 @@
from functools import partial


def _split_tokenizer(x):
def _split_tokenizer(x): # noqa: F821
# type: (str) -> List[str]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only support python3 now, we can use python3 type annotation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lint is still being raised internally due to some issues in flake8. This way we suppress the error.

@@ -55,15 +55,9 @@ def __iter__(self):
self.setup_iter()

for i, item in enumerate(self._iterator):
if i == self.start:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those changes to resolve the errors for sync? Or I think we should do it in a separate PR.

Copy link
Contributor Author

@cpuhrsch cpuhrsch May 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lint complained that item wasn't used even though it is defined within the context of this loop. The code further along in lines 61+ relies on the idea that item will be populated by the loop iteration, which I'm not even sure will happen if self._iterator were to already be depleted, which would then raise an error on line 63.

Copy link
Contributor

@zhangguanheng66 zhangguanheng66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Ready to merge after the CI tests pass.

@cpuhrsch cpuhrsch merged commit fd764ad into pytorch:master May 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants