# MLflow Deployments

https://mlflow.org/docs/latest/llms/deployments/index.html#deployments-rest-api

The MLflow Deployments Server streamlines the use and management of large language model (LLM) providers, such as OpenAI and Anthropic, within an organization. It offers a high-level interface that simplifies interaction with these services by providing a unified endpoint for each kind of LLM-related request.

A major advantage of the MLflow Deployments Server is its centralized management of API keys. Storing the keys in one secure location reduces their exposure across the system: keys no longer need to be embedded in code, and end users no longer have to manage them safely themselves.

The deployments server is designed to be flexible: endpoints are defined and managed by updating a configuration file. New LLM providers, or new LLM types from an existing provider, can therefore be added without changing the applications that talk to the deployments server. This makes the MLflow Deployments Server well suited to environments that need to respond quickly to change.

## Endpoint Configuration File

```yaml
endpoints:
  - name: completions
    endpoint_type: llm/v1/completions
    model:
      provider: openai
      name: gpt-3.5-turbo
      config:
        openai_api_key: $OPENAI_API_KEY
    limit:
      renewal_period: minute
      calls: 10

  - name: chat
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-3.5-turbo
      config:
        openai_api_key: $OPENAI_API_KEY

  - name: chat-gpt4
    endpoint_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4
      config:
        openai_api_key: $OPENAI_API_KEY

  - name: embeddings
    endpoint_type: llm/v1/embeddings
    model:
      provider: openai
      name: text-embedding-ada-002
      config:
        openai_api_key: $OPENAI_API_KEY

  - name: antrophic-chat
    endpoint_type: llm/v1/chat
    model:
      provider: anthropic
      name: claude-2.1
      config:
        anthropic_api_key: $ANTHROPIC_API_KEY
```
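
The `limit` block on the `completions` endpoint caps it at 10 calls per minute. As a rough illustration of what such a limit means for callers, here is a minimal fixed-window counter. It is a sketch only: `FixedWindowLimiter` is a hypothetical name, and it does not reproduce how the deployments server enforces limits internally.

```python
import time


class FixedWindowLimiter:
    """Hypothetical helper: allow at most `calls` requests per window."""

    # Window lengths in seconds; only "minute" appears in the config above.
    PERIODS = {"second": 1, "minute": 60, "hour": 3600}

    def __init__(self, calls, renewal_period, clock=time.monotonic):
        self.calls = calls
        self.window = self.PERIODS[renewal_period]
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def allow(self):
        """Return True if another call fits in the current window."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # The window has elapsed, so the counter renews.
            self.window_start = now
            self.used = 0
        if self.used < self.calls:
            self.used += 1
            return True
        return False


# Mirrors `calls: 10` and `renewal_period: minute` from the config.
limiter = FixedWindowLimiter(calls=10, renewal_period="minute")
results = [limiter.allow() for _ in range(12)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

With twelve back-to-back calls, the first ten fit in the window and the last two are rejected until the minute renews.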

## Install

Install MLflow with the `genai` extra:

```shell
pip install 'mlflow[genai]'
```

Export the following environment variables:

- `OPENAI_API_KEY`
- `MLFLOW_DEPLOYMENTS_CONFIG`
- `MLFLOW_DEPLOYMENTS_TARGET`
- `ANTHROPIC_API_KEY`

Then start the server in the background:

```shell
mlflow deployments start-server --config-path /mnt/d/repos/mlserve/config.yaml --port 5888 --host 0.0.0.0 --workers 2 &
```
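
The configuration file refers to the provider keys as `$OPENAI_API_KEY` and `$ANTHROPIC_API_KEY`, so those variables are expected to be available in the environment the server runs in. A small pre-flight check can catch a missing export before the server starts. This is a sketch: `missing_env_vars` is a hypothetical helper, not part of MLflow.

```python
import os

# Variables exported in the steps above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "MLFLOW_DEPLOYMENTS_CONFIG",
    "MLFLOW_DEPLOYMENTS_TARGET",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```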




Create a client for the running server:

```python
import mlflow
from mlflow.deployments import get_deploy_client

# Point MLflow at the tracking server.
mlflow.set_tracking_uri("http://localhost:5000")

# Connect to the deployments server started above.
client = get_deploy_client("http://127.0.0.1:5888")
```

List the configured endpoints:

```python
end_points = client.list_endpoints()

for e in end_points:
    print(e)
    print("-" * 50)
```

```text
name='completions' endpoint_type='llm/v1/completions' model=RouteModelInfo(name='gpt-3.5-turbo', provider='openai') endpoint_url='http://127.0.0.1:5888/gateway/completions/invocations' limit=Limit(calls=10, key=None, renewal_period='minute')
--------------------------------------------------
name='chat' endpoint_type='llm/v1/chat' model=RouteModelInfo(name='gpt-3.5-turbo', provider='openai') endpoint_url='http://127.0.0.1:5888/gateway/chat/invocations' limit=None
--------------------------------------------------
name='chat-gpt4' endpoint_type='llm/v1/chat' model=RouteModelInfo(name='gpt-4', provider='openai') endpoint_url='http://127.0.0.1:5888/gateway/chat-gpt4/invocations' limit=None
--------------------------------------------------
name='embeddings' endpoint_type='llm/v1/embeddings' model=RouteModelInfo(name='text-embedding-ada-002', provider='openai') endpoint_url='http://127.0.0.1:5888/gateway/embeddings/invocations' limit=None
--------------------------------------------------
```


Query the `chat` endpoint (OpenAI `gpt-3.5-turbo`):

```python
response = client.predict(
    endpoint="chat",
    inputs={"messages": [{"role": "user", "content": "Tell me a joke about taxidrivers"}]},
)
print(response)
```

```text
{'id': 'chatcmpl-9XQp2hj66LcCmK704g0dF3GLIIof8', 'object': 'chat.completion', 'created': 1717754956, 'model': 'gpt-3.5-turbo-0125', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'Why did the taxidriver break up with his girlfriend? Because she kept asking for a fare relationship!'}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 16, 'completion_tokens': 21, 'total_tokens': 37}}
```
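
The chat response is a plain dictionary in the shape printed above, so the reply text sits under `choices[0]["message"]["content"]`. A minimal sketch of pulling it out, using a trimmed copy of that response as sample data; `first_reply` is a hypothetical helper name.

```python
def first_reply(response):
    """Return the assistant text from the first choice of a chat response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Trimmed copy of the response printed above.
sample = {
    "object": "chat.completion",
    "model": "gpt-3.5-turbo-0125",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": (
                    "Why did the taxidriver break up with his girlfriend? "
                    "Because she kept asking for a fare relationship!"
                ),
            },
            "finish_reason": "stop",
        }
    ],
}

print(first_reply(sample))
```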


Query the `antrophic-chat` endpoint (Anthropic `claude-2.1`). Note the additional `max_tokens` input:

```python
response = client.predict(
    endpoint="antrophic-chat",
    inputs={"max_tokens": 2000, "messages": [{"role": "user", "content": "Tell me a joke about taxidrivers"}]},
)
print(response)
```

```text
{'id': 'msg_015MK9n6Gmu76DJKMg5ZKje5', 'object': 'chat.completion', 'created': 1717755001, 'model': 'claude-2.1', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "Why don't taxidrivers like taking mathematicians as passengers? Because they always ask too many questions about the shortest route!"}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 18, 'completion_tokens': 30, 'total_tokens': 48}}
```


Query the `chat-gpt4` endpoint (OpenAI `gpt-4`):

```python
response = client.predict(
    endpoint="chat-gpt4",
    inputs={"messages": [{"role": "user", "content": "Tell me a joke about taxidrivers"}]},
)
print(response)
```

```text
{'id': 'chatcmpl-9XQpzPLFws905opQKqFnuYxY0vP5j', 'object': 'chat.completion', 'created': 1717755015, 'model': 'gpt-4-0613', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "Why don't taxidrivers ever get lost?\n\nBecause they always follow their fare instincts!"}, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 16, 'completion_tokens': 19, 'total_tokens': 35}}
```
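
The three chat responses above share one shape regardless of provider, including a `usage` block, which makes it easy to compare token consumption across endpoints. The sketch below tallies the `usage` values copied from those responses; `total_usage` is a hypothetical helper.

```python
# `usage` blocks copied from the three responses above, keyed by endpoint name.
usage_by_endpoint = {
    "chat": {"prompt_tokens": 16, "completion_tokens": 21, "total_tokens": 37},
    "antrophic-chat": {"prompt_tokens": 18, "completion_tokens": 30, "total_tokens": 48},
    "chat-gpt4": {"prompt_tokens": 16, "completion_tokens": 19, "total_tokens": 35},
}


def total_usage(usages):
    """Sum each token counter across an iterable of usage dicts."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in usages:
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


totals = total_usage(usage_by_endpoint.values())
print(totals)
```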


Query the `completions` endpoint with additional generation parameters:

```python
data = {
    "prompt": (
        "What would happen if an asteroid the size of "
        "a basketball encountered the Earth traveling at 0.5c? "
        "Please provide your answer in .rst format for the purposes of documentation."
    ),
    "temperature": 0.5,
    "max_tokens": 1000,
    "n": 1,
    "frequency_penalty": 0.2,
    "presence_penalty": 0.2,
}

r = client.predict(endpoint="completions", inputs=data)
r
```

```text
{'id': 'chatcmpl-9XQq5lbL5HAYVrs8F1ozzOwY52YY1',
 'object': 'text_completion',
 'created': 1717755021,
 'model': 'gpt-3.5-turbo-0125',
 'choices': [{'index': 0,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 43, 'completion_tokens': 223, 'total_tokens': 266}}
```

```python
print(r['choices'][0]['text'])
```

```text
# Asteroid Impact Scenario

## Description

An asteroid the size of a basketball is traveling towards Earth at a velocity of 0.5 times the speed of light (0.5c).

## Potential Consequences

1. Upon impact, the asteroid would release a massive amount of energy due to its high velocity.
2. The impact would result in a significant explosion upon contact with Earth's surface.
3. The explosion would generate a shockwave that could cause widespread destruction in the surrounding area.
4. The impact crater created by the asteroid would be substantial, potentially causing further damage to the local environment.
5. The release of debris and dust into the atmosphere could lead to long-term environmental consequences, such as climate change and decreased sunlight reaching the surface.

## Recommendations

1. Monitor the trajectory of the asteroid and assess potential impact zones.
2. Implement evacuation procedures for areas at risk of being affected by the impact.
3. Coordinate with internation
```

The deployments server can also back a LangChain LLM. Import the LangChain classes and select an experiment:

```python
import mlflow
from langchain import LLMChain, PromptTemplate
from langchain.llms import Mlflow

mlflow.set_experiment("ml_server")
```

```text
<Experiment: artifact_location='file:///home/olonok/mlflow/mlruns/21', creation_time=1717748561923, experiment_id='21', last_update_time=1717748561923, lifecycle_stage='active', name='ml_server', tags={}>
```

Build a chain on top of the `completions` endpoint, log it as an MLflow model, and load it back for prediction:

```python
# LLM backed by the "completions" endpoint of the deployments server.
llm = Mlflow(target_uri="http://127.0.0.1:5888", endpoint="completions")
llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["adjective"],
        template="Tell me a {adjective} joke",
    ),
)
result = llm_chain.run(adjective="funny")
print(result)

# Log the chain as an MLflow model.
with mlflow.start_run():
    model_info = mlflow.langchain.log_model(llm_chain, "model")

# Reload it as a generic pyfunc model and run a prediction.
model = mlflow.pyfunc.load_model(model_info.model_uri)
print(model.predict([{"adjective": "very sad"}]))
```

```text
Why couldn't the bicycle stand up by itself?

Because it was two tired!

["Why couldn't the bicycle stand up by itself?\n\nBecause it was two-tired."]
```


Set and read back the default deployments target:

```python
mlflow.deployments.set_deployments_target("http://127.0.0.1:5888")
mlflow.deployments.get_deployments_target()
```

```text
'http://127.0.0.1:5888'
```

The endpoints are also exposed through the REST API:

```shell
curl -X GET "http://0.0.0.0:5888/api/2.0/endpoints/"
```

```json
{"endpoints":[{"name":"completions","endpoint_type":"llm/v1/completions","model":{"name":"gpt-3.5-turbo","provider":"openai"},"endpoint_url":"/gateway/completions/invocations","limit":{"calls":10,"key":null,"renewal_period":"minute"}},{"name":"chat","endpoint_type":"llm/v1/chat","model":{"name":"gpt-3.5-turbo","provider":"openai"},"endpoint_url":"/gateway/chat/invocations","limit":null},{"name":"chat-gpt4","endpoint_type":"llm/v1/chat","model":{"name":"gpt-4","provider":"openai"},"endpoint_url":"/gateway/chat-gpt4/invocations","limit":null},{"name":"embeddings","endpoint_type":"llm/v1/embeddings","model":{"name":"text-embedding-ada-002","provider":"openai"},"endpoint_url":"/gateway/embeddings/invocations","limit":null},{"name":"antrophic-chat","endpoint_type":"llm/v1/chat","model":{"name":"claude-2.1","provider":"anthropic"},"endpoint_url":"/gateway/antrophic-chat/invocations","limit":null}],"next_page_token":null}
```
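
The same listing can be fetched from Python with the standard library, and the relative `endpoint_url` values in the response can be joined with the server address to build an invocation request. This is a sketch under assumptions: the helper names are hypothetical, the base URL is the local server used throughout, and the POST body simply reuses the `messages` payload passed to `client.predict` earlier. Nothing is sent over the network until `urlopen` is called.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:5888"  # assumption: the local server from this guide


def list_endpoints(base_url=BASE_URL):
    """GET /api/2.0/endpoints/ and return the parsed list of endpoints."""
    with urlopen(urljoin(base_url, "/api/2.0/endpoints/")) as resp:
        return json.load(resp)["endpoints"]


def build_invocation(endpoint_url, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for one endpoint."""
    return Request(
        urljoin(base_url, endpoint_url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation(
    "/gateway/chat/invocations",  # endpoint_url taken from the listing above
    {"messages": [{"role": "user", "content": "Tell me a joke about taxidrivers"}]},
)
print(request.get_method(), request.full_url)

# To actually send it (requires the server to be running):
# with urlopen(request) as resp:
#     print(json.load(resp))
```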