# Welcome to FlexAI!


We found fine-tuning and inference needlessly complicated: initializing a GPU server, installing the app's environment, handling OOMs, prompt templates, multi-GPU setups, and more. So we built a platform that lets you fine-tune and run inference for more than 60 open-source LLMs through a single API, so your teams can save up to 70% of the time they would otherwise spend on non-essential work. Everything is serverless (like Unsloth/Axolotl, but as a SaaS).

Some of the features of the FlexAI platform:

- Serverless fine-tuning and inference
- Live time and cost estimations
- Checkpoint management
- LoRA and multi-LoRA
- Target inference validations
- OpenAI-compatible endpoints API in a single click
- Playground

Check us out - https://getflex.ai/

# Setup
(We suggest using a CPU runtime for this Colab, since the compute is remote.)

To install FlexAI, run the following command:


In [None]:
!pip install flex_ai openai

## Create a FlexAI client and choose your model
You need to sign up first at https://app.getflex.ai/sign-in to get an API key (you get $10 of free credit for training and inference).

To get the API key, go to "Settings" -> "API Keys", then paste your API key into the form in the next cell.

Then choose the model that you want to fine-tune from the dropdown. If you need any other Hugging Face model, please contact us.



In [None]:
from flex_ai import FlexAI

# you can get your api key from https://app.getflex.ai/settings/api-keys

# @title
API_KEY = "" # @param {"type":"string"}
MODEL = "meta-llama/Llama-3.2-3B-Instruct" # @param ['TinyLlama/TinyLlama_v1.1', 'meta-llama/Meta-Llama-3-70B-Instruct', 'microsoft/Phi-3-medium-128k-instruct', 'Qwen/Qwen2-1.5B', 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'Qwen/Qwen2-57B-A14B', 'microsoft/Phi-3-small-8k-instruct', 'meta-llama/Meta-Llama-3-8B-Instruct', 'google/gemma-2-27b-it', 'Qwen/Qwen2-1.5B-Instruct', '01-ai/Yi-1.5-9B', '01-ai/Yi-1.5-9B-Chat', 'google/gemma-2-2b-it', 'google/gemma-2-9b-it', 'ai21labs/Jamba-v0.1', '01-ai/Yi-1.5-6B', '01-ai/Yi-1.5-34B-Chat', 'Qwen/Qwen2-72B-Instruct', 'Qwen/Qwen2-72B', 'google/gemma-2-27b', 'Qwen/Qwen2-7B', 'Qwen/Qwen2-7B-Instruct', 'microsoft/Phi-3-mini-128k-instruct', 'microsoft/Phi-3-medium-4k-instruct', 'microsoft/Phi-3-mini-4k-instruct', 'Qwen/Qwen2-0.5B-Instruct', 'microsoft/Phi-3-small-128k-instruct', 'meta-llama/Meta-Llama-3-8B', 'meta-llama/Meta-Llama-3-70B', 'mistralai/Mistral-7B-Instruct-v0.3', 'deepseek-ai/DeepSeek-Coder-V2-Lite-Base', 'CohereForAI/c4ai-command-r-v01', 'Qwen/Qwen2-57B-A14B-Instruct', 'mistralai/Mistral-Nemo-Instruct-2407', 'google/gemma-2-9b', '01-ai/Yi-1.5-34B-Chat-16K', '01-ai/Yi-1.5-34B-32K', 'mistralai/Mistral-Nemo-Base-2407', 'Qwen/Qwen2-0.5B', '01-ai/Yi-1.5-9B-Chat-16K', '01-ai/Yi-1.5-34B', 'internlm/internlm2_5-7b', 'Qwen/Qwen2.5-Coder-7B-Instruct', '01-ai/Yi-1.5-9B-32K', '01-ai/Yi-1.5-6B-Chat', 'internlm/internlm2_5-7b-chat', 'meta-llama/Meta-Llama-3.1-8B', 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'meta-llama/Meta-Llama-3.1-70B', 'meta-llama/Meta-Llama-3.1-70B-Instruct', 'meta-llama/Llama-3.2-3B-Instruct', 'meta-llama/Llama-3.2-1B-Instruct']

client=FlexAI(api_key=API_KEY)
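
If you would rather not paste the key into the notebook, one option is to fall back to an environment variable. This is a minimal sketch: `resolve_api_key` is a hypothetical helper and `FLEXAI_API_KEY` is an assumed variable name, neither is part of the `flex_ai` package.

```python
import os


def resolve_api_key(form_value: str, env_var: str = "FLEXAI_API_KEY") -> str:
    """Return the form value if it is set, otherwise the environment variable.

    Both the helper and the default variable name are assumptions made for
    this example; flex_ai itself does not define them.
    """
    key = form_value.strip() or os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"No API key found: fill in the form or set {env_var}.")
    return key
```

With this in place, the client could be created as `FlexAI(api_key=resolve_api_key(API_KEY))`.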

## Create your first dataset
If you don't have any data, uncomment the code below to download sample train and eval datasets.

In [None]:
# If you don't have any files to upload, you can use this sample data
# Download a sample instruction dataset
# train_url = "https://secure.getflex.ai/storage/v1/object/public/files/sample_data/train.jsonl"
# eval_url = "https://secure.getflex.ai/storage/v1/object/public/files/sample_data/eval.jsonl"

# !wget {train_url} -O train.jsonl
# !wget {eval_url} -O eval.jsonl

## Upload your own Dataset

You can check the supported dataset formats here: https://docs.getflex.ai/quickstart#upload-your-first-dataset

In [None]:
from google.colab import files
import ipywidgets as widgets
from IPython.display import display, HTML
import json
import time

def upload_and_process_file(file_type):
    progress = widgets.FloatProgress(value=0, min=0, max=100, description=f'Uploading {file_type}:')
    display(progress)

    uploaded = files.upload()

    for filename, content in uploaded.items():
        # Simulate processing
        for i in range(100):
            time.sleep(0.05)  # Simulate some work
            progress.value = i + 1

        # Process the file (in this case, just count the non-empty lines,
        # so a trailing newline does not inflate the count)
        lines = [line for line in content.decode('utf-8').splitlines() if line.strip()]
        num_lines = len(lines)

        print(f"{file_type.capitalize()} file '{filename}' uploaded and processed. It contains {num_lines} lines.")

        # You can add more processing here as needed

    return uploaded

# Create buttons
train_button = widgets.Button(description="Upload Train File")
eval_button = widgets.Button(description="Upload Eval File")

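# Define button click handlers.
# files.upload() also saves each file to the current working directory,
# so the returned dict does not need to be kept here.
def on_train_button_clicked(b):
    upload_and_process_file('train')

def on_eval_button_clicked(b):
    upload_and_process_file('eval')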

# Attach click handlers to buttons
train_button.on_click(on_train_button_clicked)
eval_button.on_click(on_eval_button_clicked)

# Display the interface
display(HTML("<h2>Upload Your Custom .jsonl Files</h2>"))
display(HTML("<p>Click the buttons below to upload your train and eval files.</p>"))
display(train_button, eval_button)
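
Before creating the dataset, it can help to confirm that each file is valid JSON Lines. The sketch below assumes only that every non-empty line must parse as a JSON object; it does not check the field names FlexAI expects (see the dataset format docs linked above). `check_jsonl` is a hypothetical helper, not part of `flex_ai`.

```python
import json


def check_jsonl(text):
    """Return (number of valid records, list of error messages) for JSONL text."""
    valid, errors = 0, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # Skip blank lines, e.g. one left by a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: {exc.msg}")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        valid += 1
    return valid, errors


# Small in-memory sample: two valid records, one broken line, one non-object
sample = '{"a": 1}\n\nnot json\n[1, 2]\n{"b": 2}\n'
count, problems = check_jsonl(sample)
print(count, problems)
```

For an uploaded file, the same check could be run on `open("train.jsonl", encoding="utf-8").read()`.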

In [None]:
# Eval is optional, but recommended.
# This expects train.jsonl and eval.jsonl in the current working directory.
dataset = client.create_dataset("My First Flex Dataset", "train.jsonl", "eval.jsonl")
print("Check the Datasets: https://app.getflex.ai/datasets")

## Create a Fine-Tuning Task

In [None]:
# Start a fine-tuning task
task = client.create_finetune(
    name="My Task New",
    dataset_id=dataset["id"],
    model=MODEL,
    n_epochs=10,
    train_with_lora=True,
    lora_config={
        "lora_r": 64,
        "lora_alpha": 8,
        "lora_dropout": 0.1
    },
    n_checkpoints_and_evaluations_per_epoch=1,
    batch_size=4,
    learning_rate=0.0001,
    save_only_best_checkpoint=False
)

# You can now monitor the task status and results through the FlexAI dashboard
print(f"Fine-tuning task created with ID: {task['id']}")
print("Check the Tasks Page for status updates: https://app.getflex.ai/tasks")
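
To get a feel for the `lora_config` values above, the sketch below computes two quantities from the standard LoRA formulation: the scaling factor `lora_alpha / lora_r` applied to the low-rank update, and the number of trainable parameters a rank-`r` adapter adds to one weight matrix. Whether FlexAI applies exactly this scaling is an assumption, and the 3072 x 3072 layer shape is purely illustrative.

```python
def lora_scaling(lora_alpha, lora_r):
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / lora_r


def lora_param_count(d_out, d_in, r):
    """Trainable parameters for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Values taken from the fine-tuning call above
scale = lora_scaling(lora_alpha=8, lora_r=64)

# Illustrative layer shape (an assumption, not read from any model config)
d_out, d_in = 3072, 3072
added = lora_param_count(d_out, d_in, r=64)
full = d_out * d_in

print(f"scaling factor: {scale}")
print(f"adapter params: {added} ({added / full:.2%} of the full matrix)")
```

With `lora_alpha=8` and `lora_r=64`, the update is scaled by 0.125 under this formulation; a larger `lora_alpha` at the same rank would strengthen the adapter's contribution.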

## Show Live Logs and wait for the task to be completed

In [None]:
client.wait_for_task_completion(task_id=task["id"])

## Wait for last checkpoint to be uploaded

In [None]:
import time

while True:
    checkpoints = client.get_task_checkpoints(task_id=task["id"])
    if checkpoints and checkpoints[-1]["stage"] == "FINISHED":
        last_checkpoint = checkpoints[-1]
        checkpoint_list = [{
            "id": last_checkpoint["id"],
            "name": "step_" + str(last_checkpoint["step"])
        }]
        break
    time.sleep(10)  # Wait 10 seconds before checking again
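
The loop above polls indefinitely. A minimal sketch of the same pattern with a timeout follows; it assumes nothing about the FlexAI API. `wait_until` is a hypothetical generic helper that repeatedly calls a function you pass in until it returns a truthy value or the time budget runs out.

```python
import time


def wait_until(check, timeout_s=600.0, interval_s=10.0):
    """Call check() every interval_s seconds until it returns a truthy value.

    Raises TimeoutError if no truthy value arrives within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if result:
            return result
        # Stop early if the next sleep would run past the deadline
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        time.sleep(interval_s)


# Demo with a stand-in for the real status call: ready on the third poll
fake_responses = iter([None, None, {"stage": "FINISHED"}])
print(wait_until(lambda: next(fake_responses), timeout_s=1.0, interval_s=0.01))
```

In this notebook, `check` could be a small function that calls `client.get_task_checkpoints(task_id=task["id"])` and returns the last checkpoint once its `stage` is `"FINISHED"`.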

## Create an OpenAI-compatible endpoint for your Fine-Tuned Model

In [None]:
# checkpoint_list was built in the previous cell from the last finished checkpoint
endpoint_id = client.create_multi_lora_endpoint(name="My Endpoint New", lora_checkpoints=checkpoint_list, compute="A100-40GB")
endpoint = client.wait_for_endpoint_ready(endpoint_id=endpoint_id)

## Use your Fine-Tuned Model

In [None]:
from openai import OpenAI

openai_endpoint_url = endpoint["url"]
# Use a separate name so the FlexAI `client` created earlier is not overwritten
openai_client = OpenAI(
    api_key=API_KEY,
    base_url=f"{openai_endpoint_url}/v1"
)
completion = openai_client.completions.create(
    model=MODEL,
    prompt="Translate the following English text to French",
    max_tokens=60
)
print(completion.choices[0].text)