## Quick Start: Financial Sentiment Analysis
This tutorial demonstrates how to fine-tune a small language model (Qwen 2.5 0.5B) for financial sentiment analysis using the Factory SDK. We'll use the Financial PhraseBank dataset to train a model that can classify financial statements as positive, negative, or neutral.


### Setup and Installation
First, we need to install the required packages:

In [None]:
!pip install --upgrade factory-sdk
!pip uninstall -y pynvml  # Ensure that Colab has no conflicting NVIDIA bindings installed
!pip install flash-attn --no-build-isolation

Restart the runtime after installation, either manually or by executing the cell below:

In [None]:
import os

# Kill the current Python process so that Colab starts a fresh runtime
print("Restarting Colab runtime...")
os.kill(os.getpid(), 9)

### Initializing the Factory Client
The Factory SDK manages the entire ML pipeline through a single client object, which we initialize once an account is set up.

### Account Setup

Before you begin, you'll need a Factory account to access the platform's features:

1. **Create your account** at [factory.manufactai.com](https://factory.manufactai.com)
2. **Locate your credentials** after logging in:
   - Your tenant name
   - Your project name
   - Your API key

> **Note**: The Factory platform provides a free tier for getting started. For detailed instructions on account creation and finding your credentials, visit our [account setup documentation](/docs/getting-started/create-account).

In [None]:
from factory_sdk import FactoryClient, TrainArgs, AdapterArgs, InitArgs, EvalArgs, DeploymentArgs, ModelChatInput, Role, Message

# Replace the values below with your own tenant, project, and API key
factory = FactoryClient(
    tenant="ole",
    project="project3",
    token="d5541389d29782dcd408a92ef8627845",
)


This connects your local computing resource to the Factory Hub. It enables seamless integration of your local development environment with the central Factory services for versioning, tracking, and deployment.
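
Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the three credentials from environment variables. The helper `load_factory_credentials` and the variable names `FACTORY_TENANT`, `FACTORY_PROJECT`, and `FACTORY_TOKEN` are assumptions made for this example; they are not part of the Factory SDK.

```python
import os


def load_factory_credentials(env=os.environ):
    """Collect keyword arguments for FactoryClient from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    names = {
        "tenant": "FACTORY_TENANT",
        "project": "FACTORY_PROJECT",
        "token": "FACTORY_TOKEN",
    }
    # Report every missing variable at once instead of failing on the first one
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

With the variables set, the client could then be created as `FactoryClient(**load_factory_credentials())`.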

### Loading the Base Model
Next, we load a small instruction-tuned foundation model, Qwen 2.5 0.5B Instruct:

In [None]:
model = factory.base_model\
    .with_name("qwen_small")\
    .from_open_weights("Qwen/Qwen2.5-0.5B-Instruct")\
    .save_or_fetch()


After this step, a revision of the model is stored in Factory for further use and deployment. Storing revisions supports reproducibility and lets you track all model versions in the Factory Hub. With only 0.5B parameters, the model runs efficiently on modest hardware while still performing well on our task.

### Preparing the Dataset
We'll use the Financial PhraseBank dataset, which contains financial statements labeled with sentiment:

In [None]:
from datasets import load_dataset

# Use the subset on which all annotators agreed, then hold out 10% for testing
data = load_dataset("takala/financial_phrasebank", "sentences_allagree")
data = data["train"].train_test_split(test_size=0.1, seed=42)

dataset = factory.dataset.with_name("financial-phrases").from_local(data).save_or_fetch()

After this step, the dataset can be viewed in the DataView in Factory.

Each sentence in the dataset carries one of three integer sentiment labels:

* 0: negative
* 1: neutral
* 2: positive
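
To make the label mapping above concrete, here is a minimal sketch that turns integer labels into names and reports how the classes are distributed. The helper `label_distribution` and the example labels are made up for illustration; in the notebook you could pass `data["train"]["label"]` instead.

```python
from collections import Counter

# Integer labels as listed above for Financial PhraseBank
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def label_distribution(labels):
    """Return the share of each sentiment class in a sequence of integer labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in ID2LABEL.values()}
    return {ID2LABEL[i]: counts.get(i, 0) / total for i in ID2LABEL}


# Made-up labels for illustration only
example_labels = [1, 1, 2, 0, 1, 2, 1, 1]
print(label_distribution(example_labels))
```

A strongly skewed distribution is worth knowing about before training, because it affects how per-class precision and recall should be read later on.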

### Creating a Recipe
We need to convert our data into a chat format that the model can understand:

In [None]:
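def processor(x):
    # Map one dataset row to a two-turn chat: the sentence as the user turn,
    # the integer label as the assistant turn
    return ModelChatInput(
        messages=[
            Message(content=x["sentence"], role=Role.USER),
            Message(content="The answer is: " + str(x["label"]), role=Role.ASSISTANT),
        ]
    )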

recipe=factory.recipe\
    .with_name("financial-phrases")\
    .using_dataset(dataset)\
    .with_preprocessor(processor)\
    .save_or_fetch()


During this process, the data is analyzed for IID (Independent and Identically Distributed) characteristics, and potential data shifts between the training and test sets are detected automatically. A preview of the recipe and the results of the statistical data shift tests can be found in the Factory Hub. This analysis helps identify data distribution issues that could impact model performance.

The recipe creates a chat format in which:

* The financial statement is the user's message

* The sentiment label is the assistant's response
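
To show what a single formatted example looks like, the sketch below mirrors the processor with plain dictionaries instead of the SDK's `ModelChatInput` and `Message` classes. The function `to_chat_messages` and the example row are made up for illustration and are not part of the Factory SDK or the dataset.

```python
def to_chat_messages(row):
    """Plain-dict mirror of the processor: one user turn, one assistant turn."""
    return [
        {"role": "user", "content": row["sentence"]},
        {"role": "assistant", "content": "The answer is: " + str(row["label"])},
    ]


# Made-up example row (not taken from the dataset)
row = {"sentence": "Quarterly operating profit rose compared with last year.", "label": 2}
for message in to_chat_messages(row):
    print(message["role"], "->", message["content"])
```

Note that the assistant turn contains the integer label rather than the label name, so the fine-tuned model is expected to answer with `0`, `1`, or `2`.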

### Fine-tuning the Model
Now we can fine-tune our model using parameter-efficient techniques:

In [None]:
adapter=factory.adapter\
    .with_name("financial-phrases")\
    .based_on_recipe(recipe)\
    .using_model(model)\
    .with_hyperparameters(
        TrainArgs(
            train_batch_size=8,
            eval_batch_size=8,
            gradient_accumulation_steps=2,
            num_train_epochs=2,
            eval_every_n_minutes=2,
            max_eval_samples=100,
        ),
        AdapterArgs(layer_selection_percentage=0.5),
        InitArgs(n_test_samples=200),
    )\
    .run()

Factory automatically measures how the layers of the network respond to the training data and selects the most suitable layers for fine-tuning. It also derives the LoRA parameters for fine-tuning from the data. Training begins once these optimizations are complete. All metrics and measurement results are stored in the Factory Hub for monitoring and analysis.

We're using several techniques to make training efficient:

* Small batch sizes with gradient accumulation (effective batch size of 16)
* Only tuning 50% of the model layers (parameter-efficient fine-tuning)
* Regular evaluation during training to monitor progress
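
The effective batch size quoted above follows directly from the hyperparameters: 8 samples per step times 2 accumulation steps. The sketch below spells out that arithmetic and estimates the number of optimizer updates. The helper names, the single-device default, and the training-set size of 2000 are assumptions for illustration, not values taken from the SDK or the dataset.

```python
import math


def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples that contribute to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_devices


def optimizer_steps(num_train_samples, num_train_epochs, batch):
    """Approximate number of optimizer updates, assuming the last partial batch is kept."""
    return math.ceil(num_train_samples / batch) * num_train_epochs


batch = effective_batch_size(train_batch_size=8, gradient_accumulation_steps=2)
print(batch)  # 16

# 2000 is a placeholder training-set size, not the real size of the split
print(optimizer_steps(num_train_samples=2000, num_train_epochs=2, batch=batch))  # 250
```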

### Evaluating the Model
After training, we evaluate our model's performance:

In [None]:
from factory_sdk.metrics import ExactMatch, LevenshteinDistance, PrecisionOneVsRest, Recall, RecallOneVsRest

evaluation=factory.evaluation\
    .with_name("eval1")\
    .for_adapter(adapter)\
    .using_metric(ExactMatch)\
    .using_metric(LevenshteinDistance,lower_is_better=True)\
    .using_metric(PrecisionOneVsRest)\
    .using_metric(RecallOneVsRest)\
    .on_recipe(recipe)\
    .with_config(EvalArgs(
        max_samples=500, batch_size=8
    ))\
    .run()


Factory allows you to use pre-built metrics or pass in your own metric implementations. The evaluation runs automatically with a high degree of parallelization. During this process, the energy consumption is measured and the resulting CO2 emissions are estimated. The evaluation results are stored in the Factory Hub, where they can be retrieved and compared.

We're using multiple metrics to get a comprehensive view of performance:

* Exact Match: Measures if predictions match the expected labels exactly
* Levenshtein Distance: Measures character-level differences between predictions and expected labels
* Precision and Recall: Classification metrics for each class (positive, neutral, negative)
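
For readers who want to see what these metrics compute, here is a plain-Python sketch of each one. It illustrates the standard definitions and is not the implementation behind `ExactMatch`, `LevenshteinDistance`, `PrecisionOneVsRest`, or `RecallOneVsRest` in the Factory SDK; all function names below are made up for this example.

```python
def exact_match(prediction, reference):
    """1.0 if the two strings are identical, otherwise 0.0."""
    return 1.0 if prediction == reference else 0.0


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def precision_recall_one_vs_rest(predictions, references, positive_class):
    """Treat one class as positive and all other classes as negative."""
    pairs = list(zip(predictions, references))
    tp = sum(p == positive_class and r == positive_class for p, r in pairs)
    fp = sum(p == positive_class and r != positive_class for p, r in pairs)
    fn = sum(p != positive_class and r == positive_class for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up predictions and references for illustration
predictions = ["2", "1", "1", "0"]
references = ["2", "1", "0", "0"]
print(exact_match("The answer is: 2", "The answer is: 2"))
print(levenshtein("The answer is: 2", "The answer is: 1"))
print(precision_recall_one_vs_rest(predictions, references, positive_class="1"))
```

Because the expected answers differ only in the final digit, a wrong label typically shows up as a Levenshtein distance of 1, while a larger distance suggests that the model drifted from the trained answer format.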

### Deploying the Model
Now we can deploy our fine-tuned model as an API:

In [None]:
deployment=factory.deployment\
      .with_name("deployment1")\
      .for_adapter(adapter)\
      .with_config(DeploymentArgs(
          dtype="fp16",
          port=9777,
          max_memory_utilization=.8,
          swap_space=0
      ))\
      .run(daemon=True)

The deployment automatically connects to the Factory Hub and transmits data and metrics. At the same time, incoming queries are embedded and compared with the training data through statistical tests in the latent space. As a result, the Factory Hub holds an up-to-date assessment of potential data shifts and provides continuous monitoring of model performance.

This deployment:

* Uses FP16 precision for efficiency
* Runs on port 9777
* Uses 80% of available GPU memory
* Uses no CPU swap space (due to Colab memory limits)

**Please wait until the deployment is ready. Since it runs in a separate process, you can view the current logs with the cell below.**

In [None]:
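# Print the current deployment log; the context manager closes the file afterwards
with open(deployment.log_file, "r") as log_file:
    print(log_file.read())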
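
Instead of re-reading the logs by hand, you could also poll the deployment port until it accepts connections. The helper `wait_for_port` below is a sketch written for this tutorial using only the standard library; it is not part of the Factory SDK. An open port only shows that a server process is listening, so the logs remain the authoritative source if requests still fail.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the deployment above, `wait_for_port("localhost", 9777)` would block for up to two minutes and return `True` as soon as the port is reachable.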

### Installing OpenAI Client
To communicate with our deployed model, we'll use the OpenAI client which works with our API:

In [None]:
!pip install openai

In [None]:
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:9777/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

A Factory deployment exposes the same interface as the OpenAI API. You can therefore use the OpenAI package directly, and existing applications can be switched over simply by pointing them at the deployment's API endpoint.

### Stress Testing the Model
Finally, we'll stress test our API with multiple concurrent requests:

In [None]:
import concurrent.futures
import time
from tqdm import tqdm

def fire_requests_in_threads(client, model_name, test_data, num_requests=1000, max_workers=32, temperature=0.1):
    """
    Fire multiple requests to an OpenAI-compatible API using threading.

    Args:
        client: The OpenAI client instance
        model_name: Name of the model to query
        test_data: List of test sentences/prompts to use
        num_requests: Total number of requests to make
        max_workers: Maximum number of parallel threads
        temperature: Temperature parameter for the model

    Returns:
        dict: Successful results under "successful_requests" and failed ones under "failed_requests"
    """
    results = []
    errors = []

    # Function to make a single request
    def make_request(idx):
        try:
            # Use test data in a round-robin fashion if there are fewer test items than requests
            data_idx = idx % len(test_data)
            prompt = test_data[data_idx]["sentence"]

            start_time = time.time()
            completion = client.chat.completions.create(
                model=model_name,
                messages=[{
                    "content": prompt,
                    "role": "user"
                }],
                temperature=temperature
            )
            end_time = time.time()

            return {
                "request_id": idx,
                "prompt": prompt,
                "response": completion.choices[0].message.content,
                "response_time": end_time - start_time,
                "status": "success"
            }
        except Exception as e:
            print(f"Error in request {idx}: {e}")
            return {
                "request_id": idx,
                "prompt": prompt if 'prompt' in locals() else None,
                "error": str(e),
                "status": "error"
            }

    # Use ThreadPoolExecutor to limit concurrent requests
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all tasks
        future_to_idx = {executor.submit(make_request, i): i for i in range(num_requests)}

        # Process results as they complete with a progress bar
        with tqdm(total=num_requests, desc="Processing requests") as pbar:
            for future in concurrent.futures.as_completed(future_to_idx):
                result = future.result()
                if result["status"] == "success":
                    results.append(result)
                else:
                    errors.append(result)
                pbar.update(1)

    print(f"Completed {len(results)} successful requests with {len(errors)} errors")

    # Calculate statistics
    if results:
        response_times = [r["response_time"] for r in results]
        avg_response_time = sum(response_times) / len(response_times)
        max_response_time = max(response_times)
        min_response_time = min(response_times)

        print(f"Average response time: {avg_response_time:.4f}s")
        print(f"Minimum response time: {min_response_time:.4f}s")
        print(f"Maximum response time: {max_response_time:.4f}s")

    return {
        "successful_requests": results,
        "failed_requests": errors
    }

# Example usage


In [None]:
results = fire_requests_in_threads(
    client=client,
    model_name="financial-phrases",  # Replace with your model name
    test_data=data["test"],
    num_requests=500,
    max_workers=32,
    temperature=0
)

The comparison between these test requests and the training data is now stored in the Factory Hub, where you can check whether any data shift has occurred. This provides continuous monitoring of your model's performance in production.

This test:

* Sends 500 requests to our API
* Uses up to 32 concurrent workers
* Measures response times and success rates
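
Average, minimum, and maximum response times can hide tail latency under concurrent load. As a possible extension, the sketch below computes percentiles from the `successful_requests` list returned by `fire_requests_in_threads`. The helper `summarize_latencies` is made up for this example, and the latencies in the usage lines are synthetic.

```python
import statistics


def summarize_latencies(successful_requests):
    """Return median, 95th, and 99th percentile response times in seconds."""
    times = sorted(r["response_time"] for r in successful_requests)
    if len(times) < 2:
        raise ValueError("Need at least two successful requests to compute percentiles")
    # n=100 yields 99 cut points; index 94 is the 95th percentile, index 98 the 99th
    cuts = statistics.quantiles(times, n=100, method="inclusive")
    return {"p50": statistics.median(times), "p95": cuts[94], "p99": cuts[98]}


# Synthetic latencies for illustration; pass results["successful_requests"] in the notebook
fake_requests = [{"response_time": 0.1 * k} for k in range(1, 21)]
print(summarize_latencies(fake_requests))
```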

### Print the results

In [None]:
from rich.pretty import pprint

pprint(results)
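
Since the adapter was trained on assistant turns of the form `The answer is: <label>`, the raw responses can be mapped back to sentiment names. The sketch below assumes the model keeps to that format; `parse_sentiment` is a helper made up for this example and returns `None` for anything that does not match.

```python
import re

ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}
# Matches the trained answer format followed by a single label digit
ANSWER_PATTERN = re.compile(r"The answer is:\s*([0-2])\b")


def parse_sentiment(response_text):
    """Map a response such as 'The answer is: 2' to a label name, or None if it does not match."""
    match = ANSWER_PATTERN.search(response_text)
    if match is None:
        return None
    return ID2LABEL[int(match.group(1))]


# Made-up responses for illustration
for text in ["The answer is: 2", "The answer is: 0", "I am not sure"]:
    print(repr(text), "->", parse_sentiment(text))
```

In the notebook, this could be applied to the `response` field of each entry in `results["successful_requests"]`.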