Server/client wrapper around spaCy, so that spaCy is loaded only once
Latest commit cccc26d Jan 17, 2018


spacy_api

Helps with loading models in a separate, dedicated process.

Caching is keyed on each unique combination of arguments.
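As an illustration of caching on unique arguments, here is a minimal sketch of the general technique. It does not reproduce spacy_api's own code: `load_model`, its parameters, and `load_calls` are hypothetical stand-ins, and `functools.lru_cache` is used only to show the idea.

```python
from functools import lru_cache

# Records every time the (pretend) expensive load actually runs.
load_calls = []


@lru_cache(maxsize=None)
def load_model(model="en", embeddings_path=None):
    """Hypothetical stand-in for an expensive model load."""
    load_calls.append((model, embeddings_path))
    return {"model": model, "embeddings_path": embeddings_path}


first = load_model("en")
second = load_model("en")               # same arguments: served from the cache
other = load_model("en", "en_google")   # new combination: loaded again
```

Repeating a call with identical arguments returns the cached object, while a new combination of arguments triggers a fresh load.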

Features

  • ✓ Serve models separately
  • ✓ Client- and Server-side caching
  • ✓ Command-line interface

Install

Should work with Python 2 and Python 3.

Assumes spaCy is already installed.

Install with pip:

pip install spacy_api[all]

Example

Run the server:

spacy serve

Then open a Python process and run the code in the next section.
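Before creating a client, it can be useful to confirm that the server process is actually listening. The helper below is a sketch using only the Python 3 standard library. `server_is_reachable` is a hypothetical name, not part of spacy_api, and the host and port are left as parameters because the defaults are not stated here.

```python
import socket


def server_is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call it with the host and port the server was started on, and create the client only if it returns True.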

Single document

from spacy_api import Client

spacy_client = Client()  # uses the default host and port

doc = spacy_client.single("How are you")
doc
# [[How, are, you]]

# iterate over sentences
for sentence in doc.sents:
    for token in sentence:
        print(token.text, token.pos_, token.lemma_)

# iterate over a whole document
for token in doc:
    print(token)

Switch to running spacy within the process

Instead of

from spacy_api import Client

use

from spacy_api import LocalClient
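If the choice between server mode and in-process mode should be configurable, a small factory is one option. This is a sketch, not part of spacy_api: `make_client` and `in_process` are hypothetical names, and it assumes that `Client` and `LocalClient` accept the same keyword arguments.

```python
def make_client(in_process=False, **kwargs):
    """Return a LocalClient when in_process is True, otherwise a Client."""
    # Imported lazily so the helper can be defined before spacy_api is importable.
    import spacy_api

    client_cls = spacy_api.LocalClient if in_process else spacy_api.Client
    return client_cls(**kwargs)
```

Because only the class changes, the rest of the calling code can stay the same in both modes.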

Arguments

LocalClient/Client:

# language/model
spacy_client = Client(model="en")

# Using Google pretrained vectors
spacy_client = Client(embeddings_path="en_google")

To make a call:

# Tell spacy which attributes to give back, comma separated
spacy_client.single("How are you", attributes="text,lemma_,pos,vector")

Naturally, you can use any combination of these.
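To show how a comma-separated attribute string can be interpreted, here is a self-contained sketch. It is not spacy_api's implementation: `FakeToken` and `select_attributes` are hypothetical stand-ins for a real token and for whatever the library does internally.

```python
from collections import namedtuple

# Hypothetical stand-in for a token; real spaCy tokens carry many more attributes.
FakeToken = namedtuple("FakeToken", ["text", "lemma_", "pos_"])


def select_attributes(token, attributes):
    """Return a dict with only the attributes named in a comma-separated string."""
    names = [name.strip() for name in attributes.split(",") if name.strip()]
    return {name: getattr(token, name) for name in names}


token = FakeToken(text="are", lemma_="be", pos_="VERB")
selected = select_attributes(token, "text,lemma_")
```

Requesting only the attributes that are needed keeps the returned data small, which matters most for large values such as vectors.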

Multiple documents (bulk)

docs = spacy_client.bulk(["How are you"]*100)
for doc in docs:
    for sentence in doc.sents:
        for token in sentence:
            print(token.text, token.pos_, token.lemma_)
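For very large collections it may help to send the texts in smaller batches rather than in one call. The helper below is a sketch under that assumption. `chunked` is a hypothetical name, not part of spacy_api, and the batch size of 100 is arbitrary.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = ["How are you"] * 250
batches = list(chunked(texts, 100))

# With a running client, each batch could then be sent separately:
# for batch in batches:
#     docs = spacy_client.bulk(batch)
```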