In [22]:
# Imports
from steamship import Steamship

In [23]:
# Create the client
client = Steamship(
  apiBase="http://127.0.0.1:8080/api/v1",
  appBase="http://127.0.0.1:8081",
  profile="test"
)

# Plugins

The Steamship engine treats every unit of processing as a plugin. You can add plugins to convert data, parse it, classify it, and so on.

The client exposes plugins under `client.models`. Let's take a look at the public ones configured on your current instance:

In [26]:
models = client.models.listPublic().data.models

In [27]:
for model in models:
    print("[{}] - {}".format(model.modelType, model.handle))

[importer] - builtin-importer-valueOrData-v1
[exporter] - test-exporter-v1
[converter] - test-converter-v1
[parser] - test-parser-v1
[embedder] - test-embedder-v1
[importer] - builtin-importer-url-v1
[converter] - builtin-converter-blockJson-v1
[parser] - sp_en_core_web_trf
[embedder] - st_msmarco_distilbert_base_v3
[importer] - test-importer-valueOrData-v1
[converter] - markdown-converter-default-v1
[embedder] - st_paraphrase_mpnet_base_v2
[converter] - ocr_ms_vision_default
[converter] - html-converter-default-v1
[converter] - acr_assembly_default
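
The flat listing is easier to scan when grouped by plugin type. The following is a minimal sketch, not part of the Steamship client: the `(modelType, handle)` pairs are copied by hand from the output above, and `group_by_type` is a hypothetical helper. In the notebook you could build the pairs from `models` instead, assuming each entry exposes `modelType` and `handle` as the loop above does.

```python
from collections import defaultdict

# (modelType, handle) pairs copied from the listing above.
# Assumption: in the notebook these would be derived from `models`.
listed = [
    ("importer", "builtin-importer-valueOrData-v1"),
    ("exporter", "test-exporter-v1"),
    ("converter", "test-converter-v1"),
    ("parser", "test-parser-v1"),
    ("embedder", "test-embedder-v1"),
    ("importer", "builtin-importer-url-v1"),
    ("converter", "builtin-converter-blockJson-v1"),
    ("parser", "sp_en_core_web_trf"),
    ("embedder", "st_msmarco_distilbert_base_v3"),
    ("importer", "test-importer-valueOrData-v1"),
    ("converter", "markdown-converter-default-v1"),
    ("embedder", "st_paraphrase_mpnet_base_v2"),
    ("converter", "ocr_ms_vision_default"),
    ("converter", "html-converter-default-v1"),
    ("converter", "acr_assembly_default"),
]


def group_by_type(pairs):
    """Group handles by plugin type, preserving listing order (hypothetical helper)."""
    grouped = defaultdict(list)
    for model_type, handle in pairs:
        grouped[model_type].append(handle)
    return dict(grouped)


by_type = group_by_type(listed)
for model_type in sorted(by_type):
    print("{}: {}".format(model_type, ", ".join(by_type[model_type])))
```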


Let's register two models to use. Both are hosted on SageMaker:
* A parser (SpaCy EN)
* An embedder (Bert-based)

In [28]:
parser = client.models.create(
   name='SpaCy',
   handle='parser',
   description='Demo of loading spacy',
   isPublic=False,
   modelType='parser',
   adapterType='jsonOverHttp',
   url='https://sp-en-cr-web-trf.model.plugin.steamship.com/parse',
   upsert=True
).data

In [29]:
marco = client.models.create(
   name='Marco',
   handle='marco',
   description='Distilbert Embeddings',
   isPublic=False,
   modelType='embedder',
   adapterType='jsonOverHttp',
   url='https://msmarco-dbert-base-3.model.plugin.steamship.com/embed',
   upsert=True
).data

We can see our private models here:

In [30]:
for model in client.models.listPrivate().data.models:
    print("[{}] - {}".format(model.modelType, model.handle))

[parser] - parser
[embedder] - marco


# Invoking the Plugins We Just Added

In [33]:
client.embed(docs=["What is the meaning of this sentence"], model=marco.handle).data.embeddings[0][:10]

[-0.027399681508541107,
 -0.35275447368621826,
 0.5389035940170288,
 -0.27483126521110535,
 0.28179123997688293,
 0.5697582364082336,
 0.9327279329299927,
 0.1832217574119568,
 -0.10979442298412323,
 0.13224680721759796]

In [34]:
resp = client.parse(docs=["Hi"], model=parser.handle)

In [35]:
resp.wait()

In [36]:
resp.task.taskStatus

'succeeded'

In [37]:
resp.data

ParseResponse(blocks=[Block(client=<steamship.client.client.Steamship object at 0x107879f10>, id=None, type='doc', text=None, children=[Block(client=<steamship.client.client.Steamship object at 0x107879f10>, id=None, type='sentence', text='Hi', children=[], tokens=[Token(client=<steamship.client.client.Steamship object at 0x107879f10>, id=None, blockId=None, text='Hi', textWithWs='Hi', whitespace=None, head=None, headI=None, leftEdge=None, rightEdge=None, entType=None, entIob=None, lemma='hi', normalized=None, shape=None, prefix=None, suffix=None, isAlpha=None, isAscii=None, isDigit=None, isTitle=None, isPunct=False, isLeftPunct=None, isRightPunct=None, isSpace=None, isBracket=None, isQuote=None, isCurrency=None, likeUrl=None, likeNum=None, likeEmail=None, isOov=None, isStop=False, pos='INTJ', tag=None, dep='ROOT', lang=None, prob=None, charIndex=0, tokenIndex=0)], spans=None)], tokens=[], spans=None)])
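
The response is a tree: a `doc` block contains `sentence` blocks, and each sentence carries its tokens. The sketch below shows one way to flatten such a tree into token rows. The `Block` and `Token` dataclasses here are hypothetical stand-ins that copy only a few field names visible in the output above (`type`, `text`, `children`, `tokens`, `lemma`, `pos`); they are not the Steamship classes, and `iter_tokens` is an illustrative helper.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Token:
    # Stand-in with a subset of the fields shown in the output above.
    text: str
    lemma: Optional[str] = None
    pos: Optional[str] = None


@dataclass
class Block:
    # Stand-in with a subset of the fields shown in the output above.
    type: str
    text: Optional[str] = None
    children: List["Block"] = field(default_factory=list)
    tokens: List[Token] = field(default_factory=list)


def iter_tokens(block: Block) -> Iterator[Tuple[str, Token]]:
    """Depth-first walk yielding (owning block type, token) pairs."""
    for token in block.tokens:
        yield block.type, token
    for child in block.children:
        yield from iter_tokens(child)


# Hand-built copy of the structure returned for "Hi".
doc_block = Block(
    type="doc",
    children=[
        Block(type="sentence", text="Hi", tokens=[Token("Hi", "hi", "INTJ")]),
    ],
)

rows = [(owner, t.text, t.lemma, t.pos) for owner, t in iter_tokens(doc_block)]
for row in rows:
    print(row)
```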

# Using the Plugins on a File

## 1. Import the File

In [38]:
URL = "https://washington.org/DC-information/washington-dc-history"

In [39]:
doc = client.scrape(URL).data

## 2. Convert the File

In [40]:
task = doc.convert(model='html-converter-default-v1')

In [41]:
task.wait()

## 3. Parse the File

In [42]:
task = doc.parse(model=parser.handle)

Steamship runs work as distributed tasks. The engine schedules each task for eventual execution in a plugin, and the client can block until the task finishes by calling `wait()` on it.
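
To make the waiting behavior concrete, here is a minimal sketch of a polling loop with a timeout. It is an illustration of the general pattern, not the Steamship implementation: `FakeTask`, its `refresh` method, `wait_for`, and the `"running"` and `"failed"` status strings are all assumptions. Only the `taskStatus` attribute name and the `'succeeded'` value appear in this notebook.

```python
import time


class FakeTask:
    """Hypothetical stand-in for an engine task; not the Steamship class."""

    def __init__(self, polls_until_done):
        self._polls_left = polls_until_done
        self.taskStatus = "waiting"

    def refresh(self):
        # A real client would ask the engine for the current status here.
        self._polls_left -= 1
        self.taskStatus = "succeeded" if self._polls_left <= 0 else "running"


def wait_for(task, timeout_s=5.0, interval_s=0.01):
    """Poll until the task reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    # "failed" is an assumed terminal status; only "succeeded" is shown above.
    while task.taskStatus not in ("succeeded", "failed"):
        if time.monotonic() >= deadline:
            raise TimeoutError("task still {!r} after {}s".format(task.taskStatus, timeout_s))
        time.sleep(interval_s)
        task.refresh()
    return task.taskStatus


print(wait_for(FakeTask(polls_until_done=3)))
```

A timeout keeps a notebook cell from hanging indefinitely if a plugin never reports back.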

In [43]:
task.wait()

## Query the File

In [49]:
for block in doc.query(blockType='sentence').data.blocks:
  print("[{}] {}".format(block.type, block.text))

[sentence] Founded on July 16, 1790, Washington, DC is unique among American cities because it was established by the Constitution of the United States to serve as the nation's capital.
[sentence] You can read the actual line at the National Archives.
[sentence] From its beginning, it has been embroiled in political maneuvering, sectional conflicts and issues of race, national identity, compromise and, of course, power.
[sentence] President George Washington chose the exact site along the Potomac and Anacostia Rivers, and the city was officially founded in 1790 after both Maryland and Virginia ceded land to this new "district," to be distinct and distinguished from the rest of the states.
[sentence] To design the city, he appointed Pierre Charles L'Enfant, who presented a vision for a bold, modern city featuring grand boulevards (now the streets named for states) and ceremonial spaces reminiscent of another great world capital, L'Enfant's native Paris.
[sentence] He planned a grid syst

## 4. Embed the File

In [20]:
index = doc.index(model=marco.handle, blockType="sentence")

In [21]:
for hit in index.search("Who was chosen to design Washington DC?").data.hits:
  print("[{}] {}".format(hit.score, hit.value))

[0.4443884357826034] To design the city, he appointed Pierre Charles L'Enfant, who presented a vision for a bold, modern city featuring grand boulevards (now the streets named for states) and ceremonial spaces reminiscent of another great world capital, L'Enfant's native Paris.
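
Conceptually, an embedding search turns the query into a vector and ranks the indexed sentences by how close their vectors are. The sketch below uses cosine similarity on tiny hand-made vectors. This is an assumption for illustration: the notebook does not show which scoring function the engine uses, and `cosine`, `search`, and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, indexed, k=1):
    """Return the k best (score, text) pairs, highest score first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
indexed = [
    ("sentence about the city's designer", [0.9, 0.1, 0.2]),
    ("sentence about the founding date", [0.1, 0.9, 0.3]),
]
query_vec = [1.0, 0.0, 0.0]

for score, text in search(query_vec, indexed):
    print("[{}] {}".format(score, text))
```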
