Can't load en_core_web_md.vectors when wrapping spaCy model in an MLFlow Model #8736

@markFriel

Description

I'm wrapping a spaCy model in an MLFlow model for deployment. However, when I try to run inference on new data, I get the following error:

Traceback (most recent call last):
  File "/Users/n1534239/Documents/Pycharm-Projects/innovasion_nlp/Working Body Model/script.py", line 50, in <module>
    result = model.predict(data)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 596, in predict
    return self._model_impl.predict(data)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/mlflow/pyfunc/model.py", line 257, in predict
    return self.python_model.predict(self.context, model_input)
  File "/Users/n1534239/Documents/Pycharm-Projects/innovasion_nlp/Working Body Model/nlp_toolkit/email_entity_extraction/models/email_body_mlflow_wrapper.py", line 97, in predict
    extracted_entities, _ = predict_entities(self.ner_model, email_text, expanded_tags)
  File "/Users/n1534239/Documents/Pycharm-Projects/innovasion_nlp/Working Body Model/nlp_toolkit/email_entity_extraction/evaluation/model_evaluation_helper.py", line 98, in predict_entities
    doc = model(raw_text)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/spacy/language.py", line 446, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))
  File "pipes.pyx", line 397, in spacy.pipeline.pipes.Tagger.__call__
  File "pipes.pyx", line 416, in spacy.pipeline.pipes.Tagger.predict
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 167, in __call__
    return self.predict(x)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/feed_forward.py", line 40, in predict
    X = layer(X)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 167, in __call__
    return self.predict(x)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 310, in predict
    X = layer(layer.ops.flatten(seqs_in, pad=pad))
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 167, in __call__
    return self.predict(x)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/feed_forward.py", line 40, in predict
    X = layer(X)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 167, in __call__
    return self.predict(x)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/model.py", line 131, in predict
    y, _ = self.begin_update(X, drop=None)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 379, in uniqued_fwd
    Y_uniq, bp_Y_uniq = layer.begin_update(X_uniq, drop=drop)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/feed_forward.py", line 46, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 163, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 163, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 256, in wrap
    output = func(*args, **kwargs)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 163, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 163, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 256, in wrap
    output = func(*args, **kwargs)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 163, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 163, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 256, in wrap
    output = func(*args, **kwargs)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 163, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 163, in <listcomp>
    values = [fwd(X, *a, **k) for fwd in forward]
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/api.py", line 256, in wrap
    output = func(*args, **kwargs)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/static_vectors.py", line 60, in begin_update
    vector_table = self.get_vectors()
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/neural/_classes/static_vectors.py", line 55, in get_vectors
    return get_vectors(self.ops, self.lang)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/extra/load_nlp.py", line 26, in get_vectors
    nlp = get_spacy(lang)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/thinc/extra/load_nlp.py", line 14, in get_spacy
    SPACY_MODELS[lang] = spacy.load(lang, **kwargs)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/Users/n1534239/.local/share/virtualenvs/innovasion_nlp-FljM6oUq/lib/python3.8/site-packages/spacy/util.py", line 175, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_md.vectors'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I believe this is the same problem described in the following issues:
#4349
#3690
#3552

None of the workarounds outlined in those threads has solved the problem for me. Any help would be much appreciated.
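
To make the failure mode concrete, here is a toy sketch of what the last few traceback frames appear to be doing: `get_vectors` looks up a vector table by name and, if it is not already cached, falls back to loading that name as if it were a model (`spacy.load(lang)`), which raises E050 because `en_core_web_md.vectors` is a vectors name, not an installed package. This is only my reading of the traceback, not the actual thinc code. The sketch imports neither spaCy nor thinc, and `fake_spacy_load`, `INSTALLED`, `lookup_fails`, and the `(device, name)` cache key are hypothetical stand-ins. That the cache is empty once MLFlow has restored the wrapper is also an assumption.

```python
# Toy model of the lookup seen in the final traceback frames.
# Nothing here is real spaCy/thinc code; all names are illustrative.

VECTORS = {}                     # assumed cache of vector tables, keyed by (device, name)
INSTALLED = {"en_core_web_md"}   # package names a loader could resolve


def fake_spacy_load(name):
    """Hypothetical stand-in for spacy.load: resolves installed packages only."""
    if name not in INSTALLED:
        raise OSError(f"[E050] Can't find model '{name}'.")
    return {"vectors": f"<vector table from {name}>"}


def get_vectors(device, name):
    """Return the cached table, else fall back to loading `name` as a model."""
    key = (device, name)
    if key not in VECTORS:
        VECTORS[key] = fake_spacy_load(name)["vectors"]
    return VECTORS[key]


def lookup_fails(device, name):
    """Report whether the lookup ends in the E050-style OSError."""
    try:
        get_vectors(device, name)
    except OSError:
        return True
    return False


# Empty cache (assumed state after the model is restored): the fallback
# tries to load the vectors name as a model and fails.
failed_before = lookup_fails("cpu", "en_core_web_md.vectors")

# With the table registered under that key, the fallback is never reached.
VECTORS[("cpu", "en_core_web_md.vectors")] = "<vector table>"
failed_after = lookup_fails("cpu", "en_core_web_md.vectors")
```

If this reading is right, any fix would need to make sure the vector table is registered under the `en_core_web_md.vectors` name before the first call to the pipeline inside `predict`.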

How to reproduce the behaviour

Your Environment

  • Operating System: macOS Catalina
  • Python Version Used: 3.8.5
  • spaCy Version Used: 2.3.0
  • thinc Version Used: 7.4.1

Labels: third-party (Third-party packages and services), v2 (spaCy v2.x)
