Make API definitions consistent #54

Closed
davidmezzetti opened this issue Jan 8, 2021 · 0 comments

davidmezzetti commented Jan 8, 2021

With the additional functionality added to txtai over the last few releases, the API definitions have become somewhat inconsistent. This issue addresses that by making many of the return types consistent across modules. Many of the changes are breaking and will require bumping the major version of txtai to v2.

The current Python API definitions for v1 are:

Current Python API v1

  • embeddings.search("query text")
    return [(id, score)] sort score desc

  • embeddings.similarity("query text", documents)
    return [score]

  • embeddings.add(documents)
    embeddings.index()

  • embeddings.transform("text")
    return [float]

  • extractor(sections, queue)
    return [(name, answer)]

  • labels("text", ["label1"])
    return [(label, score)] sort score desc

The new method templates and return types are below.

New Python API v2

  • embeddings.search("query text")
    return [(id, score)] sort score desc

  • embeddings.batchsearch(["query text1", "query text2"])
    return [[(id, score)] sort score desc]

  • embeddings.add(documents)
    embeddings.index()

  • embeddings.similarity("query text", texts)
    return [(id, score)] sort score desc

  • embeddings.batchsimilarity(["query text1", "query text2"], texts)
    return [[(id, score)] sort score desc]

  • embeddings.transform("text")
    return [float]

  • embeddings.batchtransform(["text1", "text2"])
    return [[float]]

  • extractor(queue, texts)
    return [(name, answer)]

  • labels("text", ["label1"])
    return [(id, score)] sort score desc

  • labels(["text1", "text2"], ["label1"])
    return [[(id, score)] sort score desc]

  • similarity("query text", texts)
    return [(id, score)] sort score desc

  • batchsimilarity(["query text1", "query text2"], texts)
    return [[(id, score)] sort score desc]
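The return-type convention above can be illustrated with a minimal sketch. This is not txtai code: the functions below are toy stand-ins that score by word overlap, and only the shape of the return values is meant to match the list above. A single query returns [(id, score)] sorted by score descending, and the batch form returns one such list per query.

```python
def similarity(query, texts):
    """Toy stand-in (assumption, not the txtai implementation).

    Scores each text by Jaccard word overlap with the query and returns
    [(id, score)] sorted by score descending, where id is the index of
    the text in the input list.
    """
    qwords = set(query.lower().split())
    scores = []
    for uid, text in enumerate(texts):
        twords = set(text.lower().split())
        union = qwords | twords
        score = len(qwords & twords) / len(union) if union else 0.0
        scores.append((uid, score))

    return sorted(scores, key=lambda x: x[1], reverse=True)


def batchsimilarity(queries, texts):
    """Batch form: one [(id, score)] list per query, in query order."""
    return [similarity(query, texts) for query in queries]


texts = ["the cat sat", "dogs bark loudly"]
single = similarity("cat sat", texts)
batch = batchsimilarity(["cat sat", "dogs bark"], texts)
```

Here `single` is a flat list of tuples, while `batch` is a list of two such lists, one per query.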

External v2 API Calls

The API methods also need to have corresponding changes.

Given that JSON doesn't support tuples and some languages can't easily map arrays/tuples to objects, the return types are mapped from tuples to JSON objects. For example, instead of (id, score) the API will return {"id": value, "score": value}.
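A minimal sketch of that mapping follows. The helper names `to_objects` and `batch_to_objects` are hypothetical and only illustrate the tuple-to-object conversion; they are not part of txtai.

```python
import json


def to_objects(results):
    """Map [(id, score)] tuples to [{"id": ..., "score": ...}] objects."""
    return [{"id": uid, "score": score} for uid, score in results]


def batch_to_objects(batch):
    """Apply the same mapping to each result list of a batch call."""
    return [to_objects(results) for results in batch]


# Example result in the native Python shape
results = [(0, 0.9), (2, 0.5)]

payload = json.dumps(to_objects(results))
# payload == '[{"id": 0, "score": 0.9}, {"id": 2, "score": 0.5}]'
```

Note that serializing the tuples directly with `json.dumps` would produce nested arrays such as [[0, 0.9], [2, 0.5]], which is the form the object mapping is meant to avoid.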

The API also differs from the native Python API in the following ways.

  • extract uses the Extractor pipeline which is a callable object in Python.
  • label/batchlabel uses the Labels pipeline which is a callable object in Python that supports both string and list input.
  • similarity/batchsimilarity uses the Similarity pipeline which is a callable object in Python that supports both string and list input.
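As a rough illustration of a callable object that supports both string and list input, consider the sketch below. `ToyLabels` is a hypothetical class with a toy scoring rule (substring counts); it is not the txtai Labels pipeline and only shows how a single `__call__` can serve both the single and batch forms, which the API splits into separate label/batchlabel methods.

```python
class ToyLabels:
    """Hypothetical callable pipeline (assumption, not the txtai class)."""

    def __call__(self, text, labels):
        # String input: return a single [(id, score)] list
        if isinstance(text, str):
            return self.score(text, labels)

        # List input: return one [(id, score)] list per text
        return [self.score(t, labels) for t in text]

    def score(self, text, labels):
        # Toy scoring: count occurrences of each label in the text.
        # id is the index of the label in the labels list.
        scores = [
            (uid, float(text.lower().count(label.lower())))
            for uid, label in enumerate(labels)
        ]
        return sorted(scores, key=lambda x: x[1], reverse=True)


toylabels = ToyLabels()
one = toylabels("good good bad", ["bad", "good"])
many = toylabels(["good good bad", "bad"], ["bad", "good"])
```

An external API has no equivalent of calling an object, so each input type gets its own named method there.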

The following list shows how the API methods will look through language binding libraries.

  • embeddings.search("query text")
    embeddings.batchsearch(["query text1", "query text2"])

  • embeddings.add(documents)
    embeddings.index()

  • embeddings.similarity("query text", texts)
    embeddings.batchsimilarity(["query text1", "query text2"], texts)

  • embeddings.transform("text")
    embeddings.batchtransform(["text1", "text2"])

  • extractor.extract(questions, texts)

  • labels.label("text", ["label1"])
    labels.batchlabel(["text1", "text2"], ["label1"])

  • similarity.similarity("query text", texts)
    similarity.batchsimilarity(["query text1", "query text2"], texts)

@davidmezzetti davidmezzetti self-assigned this Jan 8, 2021
@davidmezzetti davidmezzetti changed the title from "Simplify and clean API definitions" to "Simplify and make API definitions consistent" Jan 8, 2021
@davidmezzetti davidmezzetti changed the title from "Simplify and make API definitions consistent" to "Make API definitions consistent" Jan 12, 2021
davidmezzetti added a commit that referenced this issue Jan 12, 2021
@davidmezzetti davidmezzetti added this to the v2.0.0 milestone May 13, 2021