I'm not sure if this is the right place for this, but at a high level: I'm using rust-bert and trying to instantiate an `NERModel` with what seem to be default XLM-RoBERTa settings, like:
```rust
let config = TokenClassificationConfig::new(
    ModelType::XLMRoberta,
    RemoteResource::from_pretrained(RobertaModelResources::XLM_ROBERTA_NER_EN),
    RemoteResource::from_pretrained(RobertaVocabResources::XLM_ROBERTA_NER_EN),
    RemoteResource::from_pretrained(RobertaConfigResources::XLM_ROBERTA_NER_EN),
    None,  // merges resource only relevant with ModelType::Roberta
    false, // lowercase
    None,  // strip_accents
    None,  // add_prefix_space
    LabelAggregationOption::Mode,
);
let ner_model = NERModel::new(config).unwrap();
```
This fails with:

TokenizerError("Error when loading vocabulary file, the file may be corrupted or does not match the expected format: incorrect tag")
That resource seems to point to the `sentencepiece.bpe.model` file for this model, and that file appears to be parsed by `ModelProto` in this repo.
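A SentencePiece model file is a serialized protobuf `ModelProto`, so the loader reads it tag by tag. The sketch below decodes a single-byte protobuf tag (`(field_number << 3) | wire_type`) to illustrate why a file of the wrong kind, such as a JSON config, tends to surface as a tag error rather than a descriptive message. It assumes a well-formed model starts with field 1 (`pieces`), and the helper names are hypothetical; we have not traced exactly where the parser raises `incorrect tag`, so treat this as illustrative only.

```rust
/// Split a single-byte protobuf tag into (field_number, wire_type).
/// Tags are varints; field numbers 1..=15 fit in one byte, so this
/// hypothetical helper only handles tags whose high bit is clear.
fn decode_single_byte_tag(byte: u8) -> Option<(u8, u8)> {
    // A set high bit means the varint continues into the next byte.
    if byte & 0x80 != 0 {
        return None;
    }
    Some((byte >> 3, byte & 0x07))
}

/// Human-readable name for a protobuf wire type.
fn wire_type_name(wire_type: u8) -> &'static str {
    match wire_type {
        0 => "varint",
        1 => "64-bit",
        2 => "length-delimited",
        3 => "start group (deprecated)",
        4 => "end group (deprecated)",
        5 => "32-bit",
        _ => "invalid",
    }
}

fn main() {
    // Assumption: a SentencePiece model begins with field 1 (`pieces`), length-delimited.
    let spm_first = 0x0Au8;
    // A JSON config file begins with '{'.
    let json_first = b'{';

    for (label, byte) in [("sentencepiece .model", spm_first), ("JSON config", json_first)] {
        match decode_single_byte_tag(byte) {
            Some((field, wt)) => println!(
                "{}: field {}, wire type {} ({})",
                label,
                field,
                wt,
                wire_type_name(wt)
            ),
            None => println!("{}: multi-byte tag", label),
        }
    }
}
```

A model file decodes to field 1 with a length-delimited payload, while `{` decodes to field 15 with the deprecated start-group wire type, which is not what a `ModelProto` reader expects to see first.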
I'm not sure whether:
1. I'm doing the wrong thing with rust-bert,
2. the file hosted in the xlm-roberta dataset is out of date, or
3. this repo is out of date with respect to a newer version of the xlm-roberta dataset file.
This issue is really only relevant to the last item (though I wouldn't know where to report the second). Apologies if it's the first, but I'm hoping you can offer some insight here!
Note
I'm using the version of rust-bert on the master git branch, not
the latest published crate (because I need 0.7 tch support).
Thank you for raising this. After the `ModelType`, `TokenClassificationConfig::new` expects the resources in the following order:
- model resource
- config resource
- vocab resource
- merges resource
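Could you please try swapping arguments 3 and 4 (the vocab and config resources), i.e.:

```rust
use rust_bert::pipelines::common::ModelType;
use rust_bert::pipelines::ner::NERModel;
use rust_bert::pipelines::token_classification::{LabelAggregationOption, TokenClassificationConfig};
use rust_bert::resources::RemoteResource;
use rust_bert::roberta::{RobertaConfigResources, RobertaModelResources, RobertaVocabResources};

fn main() {
    let config = TokenClassificationConfig::new(
        ModelType::XLMRoberta,
        RemoteResource::from_pretrained(RobertaModelResources::XLM_ROBERTA_NER_EN),
        RemoteResource::from_pretrained(RobertaConfigResources::XLM_ROBERTA_NER_EN), // config before vocab
        RemoteResource::from_pretrained(RobertaVocabResources::XLM_ROBERTA_NER_EN),
        None,  // merges resource only relevant with ModelType::Roberta
        false, // lowercase
        None,  // strip_accents
        None,  // add_prefix_space
        LabelAggregationOption::Mode,
    );
    let _ner_model = NERModel::new(config).unwrap();
}
```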
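Because the model, config and vocab arguments are all `RemoteResource` values here, passing them in the wrong order compiles without complaint and only fails at load time. One general Rust remedy is the newtype pattern: give each role its own type so a swap becomes a compile error. The sketch below uses hypothetical `ModelFile`, `ConfigFile` and `VocabFile` wrappers and a hypothetical `TokenClassificationFiles` struct; none of these are part of rust-bert, and the file names are placeholders.

```rust
use std::path::PathBuf;

// Hypothetical wrapper types (not rust-bert API): one type per resource role.
#[derive(Debug, Clone, PartialEq)]
struct ModelFile(PathBuf);
#[derive(Debug, Clone, PartialEq)]
struct ConfigFile(PathBuf);
#[derive(Debug, Clone, PartialEq)]
struct VocabFile(PathBuf);

// Hypothetical bundle of resources, in the same order as the reply: model, config, vocab.
#[derive(Debug)]
struct TokenClassificationFiles {
    model: ModelFile,
    config: ConfigFile,
    vocab: VocabFile,
}

impl TokenClassificationFiles {
    fn new(model: ModelFile, config: ConfigFile, vocab: VocabFile) -> Self {
        Self { model, config, vocab }
    }
}

fn main() {
    let files = TokenClassificationFiles::new(
        ModelFile(PathBuf::from("rust_model.ot")),
        ConfigFile(PathBuf::from("config.json")),
        VocabFile(PathBuf::from("sentencepiece.bpe.model")),
    );
    // Swapping the last two arguments, e.g.
    // TokenClassificationFiles::new(model, VocabFile(..), ConfigFile(..)),
    // is rejected by the compiler with a "mismatched types" error.
    println!("model:  {}", files.model.0.display());
    println!("config: {}", files.config.0.display());
    println!("vocab:  {}", files.vocab.0.display());
}
```

The trade-off is a little extra wrapping at the call site in exchange for catching this class of mistake before anything is downloaded or parsed.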