Conversation
parlai/core/dict.py
Outdated
@@ -694,21 +697,6 @@ def sort(self, trim=True):
        assert len(self.freq) == len(self.ind2tok) == len(self.tok2ind)
        return sorted_pairs

    def parse(self, txt_or_vec, vec_type=list):
This is an unrelated change, but just killing a bad method.
Nit: Probably best to put this in a separate change for rubber stamping?
Force-pushed from 7c4cc42 to 9883495.
Should be finished now.
parlai/utils/bpe.py
Outdated
        return parser

    def set_training_mode(self, mode: bool):
Nit: Make the function name more descriptive and reference BPE explicitly? (The likelihood of set_training_mode being a name collision seems decently high.)
That was intentional. BPE dropout is just one training time decision you might make.
If this can take multiple values beyond just setting BPE to true, then either it should be changed from a bool to an enum, or mode should be renamed to something like use_bpe.
(All kind of nits, but from a "someone else looking at the code quickly figuring out what's supposed to happen here" perspective...)
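One way to address the nit (a hypothetical sketch, not what the PR actually does; the class and method names here are illustrative stand-ins, not ParlAI's real API): replacing the bare bool with an enum makes call sites self-documenting.

```python
from enum import Enum


class TokenizationMode(Enum):
    """Hypothetical tokenization modes; names are illustrative."""
    TRAIN = 'train'  # e.g. BPE dropout enabled
    EVAL = 'eval'    # deterministic tokenization


class BPEHelperSketch:
    """Stand-in for a BPE helper; not the real ParlAI class."""

    def __init__(self):
        self.mode = TokenizationMode.EVAL

    def set_tokenization_mode(self, mode: TokenizationMode):
        # Call sites now read unambiguously, e.g.:
        #   helper.set_tokenization_mode(TokenizationMode.TRAIN)
        self.mode = mode

    def tokenize(self, text: str):
        use_dropout = self.mode is TokenizationMode.TRAIN
        # Placeholder: a real implementation would apply BPE
        # (with dropout when use_dropout is True) here.
        return text.split()
```

Compared with set_training_mode(True), an unknown-mode call is now a type error rather than a silently misread flag.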
parlai/core/torch_agent.py
Outdated
        # possibly change tokenization methodology based on if this is a
        # training example
        if hasattr(self.dict, 'set_training_mode'):
I'm lightly confused by this piece of code. set_training_mode is a function in DictAgent, but this if seems to be looking for it as a member variable? (Noticing that this is stacked on top of another diff, so maybe it's defined there, but that makes it a tad tricky to review if so.)
Functions are also attributes.
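As a standalone illustration of that point (generic Python, not ParlAI code): methods are just callable attributes looked up on the class, so hasattr finds them the same way it finds data members.

```python
class DictLikeSketch:
    """Minimal stand-in for a dictionary agent; illustrative only."""

    def set_training_mode(self, mode: bool):
        self.training = mode


agent = DictLikeSketch()

# hasattr works identically for methods and data attributes:
print(hasattr(agent, 'set_training_mode'))            # True
print(callable(getattr(agent, 'set_training_mode')))  # True

# An attribute that was never defined is absent either way.
print(hasattr(agent, 'nonexistent'))                  # False
```

So the hasattr check in the diff is ordinary duck typing, not a lookup of a data field.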
Are there cases where self.dict isn't DictionaryAgent? Because otherwise, this check will always return True, no?
Nits, but otherwise seems fine
Force-pushed from dcfcde4 to b4710cb.
Patch description
Only works for BART and SlowByteLevelBPE. Still need to add support for HuggingFace Tokenizers.
Testing steps
Internal experiments.