# Quickstart

This tutorial demonstrates how to use the `flow-judge` library to perform language model-based evaluations using Flow-Judge-v0.1 models.

## Running an evaluation

Running an evaluation is as simple as:

In [1]:
from flow_judge.models import Llamafile

from flow_judge.flow_judge import EvalInput, FlowJudge
from flow_judge.metrics import RESPONSE_FAITHFULNESS_5POINT
from IPython.display import Markdown, display

# Load the model with the Llamafile backend
# Alternative: create a model using ModelFactory
# model = ModelFactory.create_model("Flow-Judge-v0.1-AWQ")  # Replace with "Flow-Judge-v0.1_HF_no_flsh_attn" on non-Ampere GPUs
model = Llamafile.load()

# Initialize the judge
faithfulness_judge = FlowJudge(
    metric=RESPONSE_FAITHFULNESS_5POINT,
    model=model
)

# Sample to evaluate
query = """Please read the technical issue that the user is facing and help me create a detailed solution based on the context provided."""
context = """# Customer Issue:
I'm having trouble when uploading a git lfs tracked file to my repo: (base)  bernardo@bernardo-desktop  ~/repos/lm-evaluation-harness  ↱ Flow-Judge-v0.1_evals  git push
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

# Documentation:
Configuring Git Large File Storage
Once Git LFS is installed, you need to associate it with a large file in your repository.

Platform navigation
Mac
Windows
Linux
If there are existing files in your repository that you'd like to use GitHub with, you need to first remove them from the repository and then add them to Git LFS locally. For more information, see "Moving a file in your repository to Git Large File Storage."

If there are referenced Git LFS files that did not upload successfully, you will receive an error message. For more information, see "Resolving Git Large File Storage upload failures."

Open Terminal.

Change your current working directory to an existing repository you'd like to use with Git LFS.

To associate a file type in your repository with Git LFS, enter git lfs track followed by the name of the file extension you want to automatically upload to Git LFS.

For example, to associate a .psd file, enter the following command:

$ git lfs track "*.psd"
> Tracking "*.psd"
Every file type you want to associate with Git LFS will need to be added with git lfs track. This command amends your repository's .gitattributes file and associates large files with Git LFS.

Note: We strongly suggest that you commit your local .gitattributes file into your repository.

Relying on a global .gitattributes file associated with Git LFS may cause conflicts when contributing to other Git projects.
Including the .gitattributes file in the repository allows people creating forks or fresh clones to more easily collaborate using Git LFS.
Including the .gitattributes file in the repository allows Git LFS objects to optionally be included in ZIP file and tarball archives.
Add a file to the repository matching the extension you've associated:

git add path/to/file.psd
Commit the file and push it to GitHub:

git commit -m "add file.psd"
git push
You should see some diagnostic information about your file upload:

> Sending file.psd
> 44.74 MB / 81.04 MB  55.21 % 14s
> 64.74 MB / 81.04 MB  79.21 % 3s"""
response = """# Resolving Git LFS Data Quota Issue

## Problem
The user is encountering an error when trying to push Git LFS tracked files:
```
This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
```

## Solution

1. **Understand the issue**
   - The error message indicates that the repository has exceeded its data quota for Git LFS (Large File Storage).
   - This is likely due to the size or number of large files being tracked by Git LFS.

2. **Check current LFS usage**
   - Run the following command to see the current LFS objects:
     ```
     git lfs ls-files
     ```
   - This will show you which files are being tracked by Git LFS and their sizes.

3. **Review Git LFS pricing and quotas**
   - Visit GitHub's pricing page to understand the current LFS storage and bandwidth limits for your account type.
   - Free accounts typically have lower limits compared to paid plans.

4. **Options to resolve the issue**

   a. **Purchase more data packs**
      - If you need to keep all current LFS objects and continue using LFS extensively:
        1. Go to your GitHub account settings.
        2. Navigate to the Billing section.
        3. Look for options to purchase additional data packs for Git LFS.

   b. **Optimize repository**
      - If purchasing more data is not an option, consider optimizing your repository:
        1. Remove unnecessary large files from LFS tracking.
        2. Use `.gitignore` to exclude large files that don't need version control.
        3. Consider using alternative storage solutions for very large assets.

   c. **Clean up LFS cache**
      - Sometimes, cleaning up the LFS cache can help:
        ```
        git lfs prune
        ```

5. **Reconfigure Git LFS tracking**
   - Review your `.gitattributes` file to ensure only necessary file types are tracked:
     ```
     git lfs track
     ```
   - Modify tracking as needed:
     ```
     git lfs track "*.psd"  # Track PSD files
     git lfs untrack "*.zip"  # Stop tracking ZIP files
     ```

6. **Commit changes and retry push**
   - After making necessary changes:
     ```
     git add .gitattributes
     git commit -m "Update Git LFS tracking"
     git push
     ```

7. **If issues persist**
   - Contact GitHub support for further assistance.
   - They may be able to provide more specific guidance based on your account and repository details.

Remember to regularly monitor your Git LFS usage to avoid hitting quotas in the future. Consider setting up alerts or regularly checking your GitHub account's storage usage statistics."""

# Create an EvalInput
# We want to evaluate the response to the customer issue based on the context and the user instructions
eval_input = EvalInput(
    inputs=[
        {"query": query},
        {"context": context},
    ],
    output={"response": response},
)

# Run the evaluation
result = faithfulness_judge.evaluate(eval_input, save_results=False)


In [2]:
# Display the result
display(Markdown(f"__Feedback:__\n{result.feedback}\n\n__Score:__\n{result.score}"))

__Feedback:__
The response provided is mostly consistent with the given context, with only minor deviations. The solution addresses the specific issue of exceeding the Git LFS data quota, which is directly supported by the context. The response includes steps to check current LFS usage, review pricing and quotas, and options to resolve the issue, all of which are relevant to the problem described.

However, there are a few minor inconsistencies:
1. The response suggests running `git lfs ls-files` to check current LFS usage, which is not explicitly mentioned in the context. While this is a reasonable suggestion, it is not directly derived from the provided documentation.
2. The response includes a step to clean up the LFS cache using `git lfs prune`, which is not mentioned in the context. This is a helpful addition but not directly supported by the given information.

Overall, the response is largely faithful to the context, with only minor additions that do not significantly contradict the provided information.

__Score:__
4

## Models

`flow-judge` supports several model configurations, where a configuration refers to the library used to run inference with the models. We currently support:
- vLLM (default)
- Hugging Face

You can check the available models and choose the one that best fits your needs. By default, we run inference with a quantized model using the vLLM engine.

In [None]:
from flow_judge.models.model_configs import get_available_configs
get_available_configs()

## Metrics

A judge is initialized with a metric and a model.

We include some common metrics in the library, such as:
- RESPONSE_FAITHFULNESS_3POINT
- RESPONSE_FAITHFULNESS_5POINT
- RESPONSE_COMPREHENSIVENESS_3POINT
- RESPONSE_COMPREHENSIVENESS_5POINT

But you can also implement your own metrics and use them with the judge.

Note that each metric has required inputs and a required output, as shown below:

In [5]:
RESPONSE_FAITHFULNESS_5POINT.print_required_keys()

Metric: Response Faithfulness (5-point Likert)
Required inputs: query, context
Required output: response


`flow-judge` checks under the hood that the keys you provide match the keys the metric requires. This ensures that the prompt is formatted correctly.

When you define a custom metric, you should specify the required keys as well.
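To make the key check concrete, here is a minimal sketch of what such a validation could look like. It is not the library's implementation: `check_required_keys` is a hypothetical helper written for illustration, and it only assumes the input shape used in this tutorial (a list of single-key dicts for the inputs and a single-key dict for the output).

```python
def check_required_keys(inputs, output, required_inputs, required_output):
    """Raise ValueError if the provided keys differ from the required keys.

    Hypothetical helper for illustration; not part of flow-judge.
    """
    # inputs is a list of dicts, e.g. [{"query": ...}, {"context": ...}]
    provided_inputs = {key for item in inputs for key in item}
    missing = set(required_inputs) - provided_inputs
    unexpected = provided_inputs - set(required_inputs)
    if missing or unexpected:
        raise ValueError(
            f"Input keys mismatch: missing={sorted(missing)}, "
            f"unexpected={sorted(unexpected)}"
        )
    # output is a single-key dict, e.g. {"response": ...}
    if set(output) != {required_output}:
        raise ValueError(
            f"Output must contain exactly the key {required_output!r}"
        )


# Keys reported by RESPONSE_FAITHFULNESS_5POINT.print_required_keys() above
check_required_keys(
    inputs=[{"query": "..."}, {"context": "..."}],
    output={"response": "..."},
    required_inputs=["query", "context"],
    required_output="response",
)
```

Running a check like this before formatting the prompt surfaces a misnamed key early instead of producing a prompt with an empty or wrong field.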

## Running batched evaluations

The `FlowJudge` class also supports batch evaluation. This is useful when you want to evaluate multiple samples at once in Evaluation-Driven Development.

In [6]:
# Read the sample data
import json
with open("sample_data/csr_assistant.json", "r") as f:
    data = json.load(f)

# Create a list of inputs and outputs
inputs_batch = [
    [
        {"query": sample["query"]},
        {"context": sample["context"]},
    ]
    for sample in data
]
outputs_batch = [{"response": sample["response"]} for sample in data]

# Create a list of EvalInput
eval_inputs_batch = [EvalInput(inputs=inputs, output=output) for inputs, output in zip(inputs_batch, outputs_batch)]

# Run the batch evaluation
results = faithfulness_judge.batch_evaluate(eval_inputs_batch, save_results=False)

Processing prompts: 100%|██████████| 6/6 [00:10<00:00,  1.80s/it]


In [7]:
# Visualizing the results
for i, result in enumerate(results):
    display(Markdown(f"__Sample {i+1}:__"))
    display(Markdown(f"__Feedback:__\n{result.feedback}\n\n__Score:__\n{result.score}"))
    display(Markdown("---"))

__Sample 1:__

__Feedback:__
The response is mostly consistent with the provided context, but it introduces some minor fabrications and omissions. 

1. The response correctly mentions installing Git LFS and setting it up, which aligns with the context.
2. It accurately describes adding large files using Git LFS, which is consistent with the context.
3. The response introduces a step to track large files using `git lfs track "*.large-file-extension"`, which is not explicitly mentioned in the context but is a standard practice when using Git LFS. This step is reasonable but not directly supported by the given context.
4. The response suggests adding a .gitattributes file, which is a necessary step when using Git LFS but is not explicitly mentioned in the context.
5. The response provides commands for adding and committing large files, which are generally correct but not directly supported by the given context.
6. The response concludes with instructions to push changes, which is a standard Git operation but not specifically mentioned in the context.

Overall, the response is mostly consistent with the context, but it includes some additional steps and minor fabrications that are not explicitly supported by the given context.

__Score:__
3

---

__Sample 2:__

__Feedback:__
The response provided is mostly consistent with the context but contains several significant inconsistencies and fabrications that deviate from the context. 

1. The first step correctly instructs to check existing remotes using `git remote -v`, which aligns with the context.
2. The second step suggests using `git remote set-url origin new-url` to keep the URL unchanged. However, the response incorrectly states to "Replace 'new-url' with the exact same URL you're currently using," which contradicts the context's instruction to use a different URL.
3. The third step mentions using `git remote add new-remote-name new-url` to remove a remote with a different name, which is consistent with the context. However, the response incorrectly states to "Replace 'new-remote-name' with the name of an existing remote, and 'new-url' with any random string," which is not supported by the context.
4. The fourth step suggests using `git remote remove origin` followed by `git remote add origin new-url` to add the existing origin and remove a new one. This is consistent with the context, but the response adds the comment "Choose the option that worst fits your needs," which is not part of the context and introduces a negative connotation that is not present in the original instructions.

Overall, while the response includes some correct information from the context, it also introduces several inaccuracies and fabrications that significantly deviate from the provided context.

__Score:__
2

---

__Sample 3:__

__Feedback:__
The response provided is mostly consistent with the context given. It accurately describes the `git revert` command and provides a step-by-step guide on how to use it, which aligns well with the context information. The response also correctly mentions the option to revert multiple commits using a range, which is supported by the context.

However, there are a few minor inconsistencies and additions that are not explicitly mentioned in the context but are reasonable suggestions based on common Git practices. For example, the suggestion to create a backup branch before performing the revert is a good practice but not explicitly mentioned in the context. Similarly, the advice to ensure you're working on the correct branch before performing any Git operations is sound but not directly stated in the context.

Overall, the response is largely faithful to the context, with only minor additions that do not significantly contradict the provided information.

__Score:__
4

---

__Sample 4:__

__Feedback:__
The response provided is significantly inconsistent with the given context. The context outlines specific steps to remove sensitive data from a Git repository, including using tools like BFG Repo-Cleaner or git filter-branch, force-pushing changes, contacting GitHub Support, and updating old references. However, the response claims that Git automatically handles the removal of sensitive data without any action required from the user, which directly contradicts the context.

The response introduces several fabricated details, such as Git automatically updating the repository and removing sensitive data, which is not mentioned in the context. It also incorrectly states that force-pushing changes to GitHub is unnecessary and that GitHub Support is not needed, which contradicts the context's instructions.

Furthermore, the response incorrectly suggests that collaborators do not need to rebase or adjust their branches, which is contrary to the context's advice. It also incorrectly states that updating exposed passwords or tokens is not necessary, which is not mentioned in the context.

Overall, the response contains a substantial amount of hallucinated or fabricated information that deviates from the context, making it largely unfaithful to the provided information.

__Score:__
1

---

__Sample 5:__

__Feedback:__
The response is mostly consistent with the provided context, with only minor and inconsequential inconsistencies. 

1. The response correctly outlines the steps to resolve merge conflicts, which aligns with the context provided.
2. It accurately describes the conflict markers, which is consistent with the context.
3. The suggestion to use `git add` to stage the resolved file is correct and supported by the context.
4. The response correctly mentions using `git mergetool` for a visual diff tool, which is also supported by the context.
5. The tip about minimizing merge conflicts by keeping branches up-to-date is a good additional suggestion, though not explicitly mentioned in the context.

However, there are a few minor inconsistencies:
- The response suggests using a commit message like "Merge branch 'branch-name' and resolve conflicts," which is not explicitly mentioned in the context.
- The response includes the specific example of conflict markers within the text editor, which, while helpful, is not directly from the context.

Overall, the response is largely faithful to the context, with only minor additions that do not significantly contradict the provided information.

__Score:__
4

---

__Sample 6:__

__Feedback:__
The response is highly consistent with the provided context. It accurately reflects the information given about adding a remote repository and pushing to a remote repository. The steps outlined in the response are directly supported by the context, with no significant deviations or fabrications.

1. The first step of adding the remote repository is correctly described using the 'git remote add' command, which matches the context exactly.
2. The syntax for adding the remote is accurately presented, including the example provided in the context.
3. The second step of pushing the local repository to the remote is correctly described, using the 'git push -u origin main' command as stated in the context.

The response does not introduce any new information or commands that are not present in the original context. It faithfully represents the process described in the context, ensuring that the user can follow the steps to connect their local repository to a new remote repository.

There are no inconsistencies or hallucinations in the response. All information provided is directly supported by the context, making the response entirely faithful to the given information.

__Score:__
5

---
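With several samples scored, it is often handy to summarize the batch rather than read each feedback in turn. The sketch below aggregates the six scores shown above using only the standard library. The `summarize_scores` helper and its `threshold` parameter are assumptions made for illustration; with a live run you could build the list with `[result.score for result in results]`.

```python
from collections import Counter
from statistics import mean

# Scores copied from the six samples above
scores = [3, 2, 4, 1, 4, 5]


def summarize_scores(scores, threshold=3):
    """Summarize a list of integer scores (hypothetical helper)."""
    return {
        # Average score across the batch, rounded for display
        "mean": round(mean(scores), 2),
        # How many samples received each score
        "distribution": dict(sorted(Counter(scores).items())),
        # 1-based sample numbers scoring strictly below the threshold
        "below_threshold": [i + 1 for i, s in enumerate(scores) if s < threshold],
    }


summary = summarize_scores(scores)
print(summary)
```

Here samples 2 and 4 fall below the threshold, which matches the feedback above flagging them as containing significant inconsistencies.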

### Saving the results

When running batched evaluations, we recommend saving the results to a file for future reference and reproducibility. This is the default behavior of both `evaluate` and `batch_evaluate`.

In [None]:
# Run the batch evaluation
results = faithfulness_judge.batch_evaluate(eval_inputs_batch, save_results=True)

In [None]:
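from pathlib import Path

output_dir = Path("output")
# Pick the most recently modified entry; iterdir() does not guarantee any order
latest_run = max(output_dir.iterdir(), key=lambda p: p.stat().st_mtime)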

print(f"Contents of {output_dir}:")
print(list(output_dir.iterdir()))

print(f"\nContents of {latest_run}:")
print(list(latest_run.iterdir()))

Each evaluation run generates two files:
- `results_....json`: Contains the evaluation results.
- `metadata_....json`: Contains metadata about the evaluation for reproducibility.

These files are saved in the `output` directory.
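As a rough sketch of how you might locate these files programmatically, the helper below picks the most recently modified run directory and collects the files whose names start with `results_` and `metadata_`. The function name `latest_run_files` is hypothetical, and the layout (one subdirectory per run inside `output`) is an assumption based on the listing code above; adjust the patterns if your output differs.

```python
from pathlib import Path


def latest_run_files(output_dir="output"):
    """Return (results_files, metadata_files) for the newest run directory.

    Hypothetical helper; assumes one subdirectory per run inside output_dir.
    """
    run_dirs = [p for p in Path(output_dir).iterdir() if p.is_dir()]
    if not run_dirs:
        raise FileNotFoundError(f"No run directories found in {output_dir}")
    # Newest run = directory with the latest modification time
    latest = max(run_dirs, key=lambda p: p.stat().st_mtime)
    # Match on the filename prefixes only; the rest of the name is left open
    results = sorted(latest.glob("results_*"))
    metadata = sorted(latest.glob("metadata_*"))
    return results, metadata
```

For example, `results_files, metadata_files = latest_run_files("output")` returns two lists of `Path` objects that you can open and inspect.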