This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Change model input tokens to optional (#1099)
Summary:
Pull Request resolved: #1099

The byte LSTM does not need the token input it inherits from the LSTM language model. Making the input optional and passing it as None lets the model skip the vocab-building step, which would otherwise report confusing OOV problems.

Reviewed By: kmalik22

Differential Revision: D18253210

fbshipit-source-id: 4de4a726c713b8f1d9b3f7991cef5118bf8d13c6
Fan Wang authored and facebook-github-bot committed Nov 5, 2019
1 parent 9d64d10 commit fd31ea1
Showing 1 changed file with 10 additions and 1 deletion: pytext/models/language_models/lmlstm.py
@@ -52,7 +52,7 @@ class LMLSTM(BaseModel):

     class Config(BaseModel.Config):
         class ModelInput(Model.Config.ModelInput):
-            tokens: TokenTensorizer.Config = TokenTensorizer.Config(
+            tokens: Optional[TokenTensorizer.Config] = TokenTensorizer.Config(
                 add_bos_token=True, add_eos_token=True
             )

@@ -67,8 +67,17 @@ class ModelInput(Model.Config.ModelInput):
         stateful: bool = False
         caffe2_format: ExporterType = ExporterType.PREDICTOR

+    @classmethod
+    def checkTokenConfig(cls, tokens: Optional[TokenTensorizer.Config]):
+        if tokens is None:
+            raise ValueError(
+                "Tokens cannot be None. Please set it to TokenTensorizer in "
+                "config file."
+            )
+
     @classmethod
     def from_config(cls, config: Config, tensorizers: Dict[str, Tensorizer]):
+        cls.checkTokenConfig(tensorizers["tokens"])
         embedding = create_module(config.embedding, tensorizer=tensorizers["tokens"])
         representation = create_module(
             config.representation, embed_dim=embedding.embedding_dim
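A minimal, standalone sketch of the pattern this commit applies: a config field becomes Optional so subclasses (such as a byte-level LSTM) can pass None, while the base model validates it up front and fails fast with a clear message instead of a confusing OOV report later. All names here (`TokenizerConfig`, `ModelConfig`, `from_config`) are illustrative stand-ins, not PyText's actual API.

```python
# Illustrative sketch only; names do not match PyText's real classes.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenizerConfig:
    # Stand-in for TokenTensorizer.Config.
    add_bos_token: bool = True
    add_eos_token: bool = True


@dataclass
class ModelConfig:
    # Now Optional: callers that do not need token input can pass None.
    # default_factory is required because dataclass instances are mutable.
    tokens: Optional[TokenizerConfig] = field(default_factory=TokenizerConfig)


def check_token_config(tokens: Optional[TokenizerConfig]) -> None:
    # Fail fast with an explicit error, mirroring checkTokenConfig above.
    if tokens is None:
        raise ValueError(
            "Tokens cannot be None. Please set it to TokenTensorizer in "
            "config file."
        )


def from_config(config: ModelConfig) -> str:
    # Validate before building anything, as the patched from_config does.
    check_token_config(config.tokens)
    return f"model(bos={config.tokens.add_bos_token})"
```

A model that truly needs tokens keeps the default and builds normally; one that sets `tokens=None` now gets an immediate, explicit ValueError rather than a misleading OOV failure downstream.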
