
API Improvements #962

Open
knoopx opened this issue Nov 1, 2023 · 3 comments
Labels
feature request New feature or request

Comments

@knoopx

knoopx commented Nov 1, 2023

I'm currently writing a web UI for Ollama, but I find the API quite limited and cumbersome.
What is your vision/plan regarding it? Is it in a frozen state, or are you planning to improve it?

Here's some criticism:

  • Mixed model/generation endpoints; some namespacing would be nice.

  • Mixed model/name params that refer to the same thing.

  • /api/tags: why is this named tags?

  • GET /api/tags to get all available local models but POST /api/show to get one?

  • Some endpoints throw errors, while others return the status as a JSON property.

  • No way to query the available public models repository

  • POST /api/create: doesn't allow specifying the Modelfile as raw text, so there's no way to create models without file system access (client-side). There's also no way to specify the Modelfile as a plain object. For this to work, FROM would also need to handle remote resources. (See the sketch after this list.)

  • POST /api/show: returns a string, which forces the client to parse it to get the actual data. It would be nice if it also returned a JSON object.

  • POST /api/embeddings: without batching support it is mostly useless.

  • template in Modelfile:
    To properly support chat agents, it would be nice to have a chat-specific generation endpoint and to be able to iterate over the messages in the template.
    Otherwise the feature itself is quite limited and requires the client to mostly override and re-implement all the logic (and it also needs to know all the underlying model parameters).
    (This is how Hugging Face does it: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json#L34)

    Example:

    Define the Modelfile template as:

    {{range .Messages}}
    <|{{ .Role }}|>
    {{ .Content }}
    </s>
    {{end}}
    

    instead of:

    {{- if .System }}
    <|system|>
    {{ .System }}
    </s>
    {{- end }}
    <|user|>
    {{ .Prompt }}
    </s>
    <|assistant|>
    

    and then passing the messages as a JSON array to

    POST /api/chat/generate

    {
        ...,
        "messages": [
            {"role": "system", "content": "you are an assistant"},
            {"role": "user", "content": "hello"},
            {"role": "assistant", "content": "hi there"}
        ]
    }
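
Related to the /api/create point above, here is a sketch of what creating a model without file system access could look like, assuming a hypothetical field that accepts the Modelfile contents as raw text (the field name and payload shape are illustrative, not the current API):

    POST /api/create

    {
        "name": "zephyr-agent",
        "modelfile": "FROM zephyr\nTEMPLATE \"\"\"{{range .Messages}}\n<|{{ .Role }}|>\n{{ .Content }}\n</s>\n{{end}}\"\"\""
    }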
    

Here's my app if you want to have a peek: https://github.com/knoopx/llm-workbench

@BruceMacD BruceMacD added the feature request New feature or request label Nov 1, 2023
@mysticfall

+1 for the chat agent support and potential template format change.

Even though LangChain supports Ollama out of the box, its model implementation is wrong because it uses its own prompt format (i.e. Alpaca-like) to preprocess the input, which is again wrapped with a model-specific prompt template once the request is sent to the server. (See https://github.com/langchain-ai/langchainjs/blob/main/langchain/src/chat_models/ollama.ts#L256)

It's a problem that LangChain should fix, but the real issue is that there's no way to correctly implement the model with how Ollama currently handles the prompt template.

To be specific, LangChain presupposes a chat model can process a list of messages in a single prompt, which can be from the system, user, or AI.

But even if we changed ChatOllama to query the model-specific template and send a pre-formatted prompt using the raw parameter, there's no way to parse that template to extract the proper format for each message type.
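
In practice, the only workaround right now is for the client to render the model-specific prompt itself and bypass Ollama's templating via the raw flag, roughly along these lines (the prompt string assumes a zephyr-style template; the exact format differs per model):

    POST /api/generate

    {
        "model": "zephyr",
        "raw": true,
        "prompt": "<|system|>\nyou are an assistant\n</s>\n<|user|>\nhello\n</s>\n<|assistant|>\n"
    }

And that per-model prompt format is exactly the knowledge LangChain cannot extract from the API today.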

@qsdhj

qsdhj commented May 14, 2024

Hi, is someone working on the feature to enable batch processing with embeddings? Without it, the feature is not usable beyond basic testing with small corpora of text.

@IvanoBilenchi

Batch embeddings really are a must for the whole embeddings feature to be usable. It looks like some work was done in #3642, though it's been in draft state for a while.
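
For reference, a batched call would presumably accept a list of inputs and return one embedding per entry, roughly like the sketch below (the array-valued field is illustrative, not the current /api/embeddings API, which takes a single prompt):

    POST /api/embeddings

    {
        "model": "nomic-embed-text",
        "input": [
            "first document",
            "second document"
        ]
    }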
