### SAGED Benchmarking Pipeline Tutorial

This tutorial demonstrates how to set up and execute a benchmarking pipeline using the `SAGED` library and a custom Ollama model. Follow the steps below to get started.

#### Step 1: Install Dependencies
First, install the required libraries.

```bash
pip install sagedbias ollama
```

#### Step 2: Define the Ollama Model
We use a custom `OllamaModel` class to interface with the generative model.

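```python
import ollama


class OllamaModel:
    """Thin wrapper that registers a derived Ollama model and queries it."""

    def __init__(self, base_model='llama3', system_prompt='You are a helpful assistant',
                 model_name='llama3o', **kwargs):
        self.base_model = base_model
        self.model_name = model_name
        # Register the derived model with the local Ollama server.
        self.model_create(model_name, system_prompt, base_model, **kwargs)

    def model_create(self, model_name, system_prompt, base_model, **kwargs):
        # Assemble a Modelfile: base model, system prompt, then one PARAMETER line per kwarg.
        modelfile = f'FROM {base_model}\nSYSTEM {system_prompt}\n'
        for key, value in kwargs.items():
            modelfile += f'PARAMETER {key.lower()} {value}\n'
        # Depending on the installed version of the `ollama` client, `create` may expect
        # `from_`, `system`, and `parameters` arguments instead of `modelfile`.
        ollama.create(model=model_name, modelfile=modelfile)

    def invoke(self, prompt):
        # Send a single prompt and return only the generated text.
        answer = ollama.generate(model=self.model_name, prompt=prompt)
        return answer['response']
```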
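
To see what `model_create` sends to Ollama without needing a running server, the string assembly can be pulled out on its own. This is a minimal sketch: `build_modelfile` is a hypothetical helper (not part of `SAGED` or `ollama`) that mirrors the formatting used above, and the `temperature` and `num_ctx` values are illustrative assumptions.

```python
def build_modelfile(base_model, system_prompt, **params):
    """Return Modelfile text in the same format OllamaModel.model_create assembles.

    Hypothetical helper for illustration only; it does not call Ollama.
    """
    lines = [f"FROM {base_model}", f"SYSTEM {system_prompt}"]
    for key, value in params.items():
        # Parameter names are lower-cased, as in the tutorial's class.
        lines.append(f"PARAMETER {key.lower()} {value}")
    return "\n".join(lines) + "\n"


# Illustrative parameter values (assumptions, not taken from the tutorial).
modelfile_text = build_modelfile(
    "llama3", "You are a helpful assistant", temperature=0.7, num_ctx=2048
)
print(modelfile_text)
```

Printing the text before creating the model is a cheap way to catch a misspelled parameter name or an unexpected value.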

#### Step 3: Set Up the Benchmarking Pipeline
Import the `Pipeline` class from the `SAGED` library and define the benchmark configuration.

```python
from saged import Pipeline

# Initialize the Ollama model
model = OllamaModel()

# Define the generation function: any callable that maps a prompt string to a response string
your_generation_function = model.invoke

# Define the domain and concepts
domain = 'nationalities'
concept_list = ['Chinese']
# Map each concept to the keyword used for it
concept_keyword_mapping = {'Chinese': 'Xin'}
keywords_references = list(concept_keyword_mapping.keys())

# Configuration shared by every concept in the domain
concept_configuration = {
    'keyword_finder': {
        'require': False,
    },
    'source_finder': {
        'require': False,
        'method': 'local_files'
    },
    'scraper': {
        'method': 'local_files'
    },
    'prompt_maker': {
        'method': 'questions',
        'generation_function': your_generation_function,
        'max_benchmark_length': 2,
    },
}

# Per-concept overrides: supply each concept's keyword manually
concept_specified_config = {
    x: {'keyword_finder': {'manual_keywords': [concept_keyword_mapping[x]]}} for x in concept_list
}
```
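
The dictionary comprehension for the per-concept settings is compact, so it may help to see what it expands to. The sketch below reuses only the tutorial's own values and needs neither `SAGED` nor Ollama.

```python
concept_list = ['Chinese']
concept_keyword_mapping = {'Chinese': 'Xin'}

# One entry per concept, each carrying that concept's manually chosen keyword.
concept_specified_config = {
    concept: {'keyword_finder': {'manual_keywords': [concept_keyword_mapping[concept]]}}
    for concept in concept_list
}

print(concept_specified_config)
# {'Chinese': {'keyword_finder': {'manual_keywords': ['Xin']}}}
```

Adding a concept therefore means adding it to both `concept_list` and `concept_keyword_mapping`; a concept missing from the mapping raises a `KeyError` when the comprehension runs.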

#### Step 4: Add Replacement Logic
Create a dictionary for keyword replacements to analyze variations.

```python
def create_replacement_dict(keywords_references, replacer):
    """For each reference keyword, map every substitute to a {keyword: substitute} rule."""
    replacement = {}
    for keyword in keywords_references:
        replacement[keyword] = {}
        for item in replacer:
            replacement[keyword][item] = {keyword: item}
    return replacement

# Substitutes that will stand in for each reference keyword
replacer = ['Xin', 'Zekun', 'Ze', 'Shi', 'Huang']
replacement = create_replacement_dict(keywords_references, replacer)
```
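
The nested shape of the replacement dictionary is easier to follow from a concrete result. The sketch below repeats the tutorial's function with a shortened substitute list (two names instead of five, purely to keep the output small) and prints the structure.

```python
import json


def create_replacement_dict(keywords_references, replacer):
    """Same logic as the tutorial's function, repeated so this block runs on its own."""
    replacement = {}
    for keyword in keywords_references:
        replacement[keyword] = {}
        for item in replacer:
            replacement[keyword][item] = {keyword: item}
    return replacement


# Shortened substitute list for readability.
replacement = create_replacement_dict(['Chinese'], ['Xin', 'Zekun'])
print(json.dumps(replacement, indent=2))
```

Each reference keyword maps to one entry per substitute, and each entry is a single rule of the form `{keyword: substitute}`.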

#### Step 5: Configure the Domain
Set up the domain-specific configurations for your benchmark.

```python
domain_configuration = {
    'categories': concept_list,
    # Enable branching and describe how keywords are replaced
    'branching': True,
    'branching_config': {
        'generation_function': your_generation_function,
        'keyword_reference': keywords_references,
        'replacement_descriptor_require': False,
        'replacement_description': replacement,
        'branching_pairs': 'not all',
        'direction': 'not both',
    },
    # Settings applied to every concept, then per-concept overrides
    'shared_config': concept_configuration,
    'category_specified_config': concept_specified_config
}
```
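
Because the configuration is a plain nested dictionary, a quick consistency check before running the pipeline can catch mismatched keys. The sketch below assumes that every entry in `keyword_reference` should have a matching entry in `replacement_description`; the tutorial's construction implies this, but it is not stated as a `SAGED` requirement. `missing_replacements` is a hypothetical helper, not a library function.

```python
def missing_replacements(domain_configuration):
    """Return the reference keywords that have no replacement entry.

    Hypothetical helper; the pairing rule it checks is an assumption.
    """
    branching = domain_configuration.get('branching_config', {})
    keywords = branching.get('keyword_reference', [])
    described = branching.get('replacement_description', {})
    return [keyword for keyword in keywords if keyword not in described]


# Trimmed-down configuration containing only the keys the check reads.
example_configuration = {
    'categories': ['Chinese'],
    'branching': True,
    'branching_config': {
        'keyword_reference': ['Chinese'],
        'replacement_description': {'Chinese': {'Xin': {'Chinese': 'Xin'}}},
    },
}

print(missing_replacements(example_configuration))
```

An empty list means every reference keyword has at least one replacement rule defined.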

#### Step 6: Run the Benchmark
Build and execute the benchmark using the `Pipeline` class.

```python
benchmark = Pipeline.domain_benchmark_building(domain, domain_configuration).data
```

#### Step 7: Analyze the Benchmark Results
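The `benchmark` variable holds the `.data` attribute of the object returned by `Pipeline.domain_benchmark_building`, that is, the generated benchmark itself. Inspect it first (for example, print it or check its type), then use `pandas` or a visualization library to analyze and display the contents.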

### Notes:
- **Customization**: Modify configurations (e.g., `concept_configuration`, `domain_configuration`) to suit your specific needs.
- **Model Integration**: Replace the Ollama model with any other generative model by implementing a similar interface.
- **Output**: Use the benchmark results to evaluate the performance of your model on the specified domain.
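
As a rough illustration of the "similar interface" mentioned under Model Integration: the configuration above only ever passes `model.invoke` as the generation function, so any object with an `invoke(prompt)` method that returns a string should fit the same slot. The class below is a hypothetical stand-in (`CannedModel` is not part of `SAGED` or `ollama`) that returns a fixed reply, which can be handy for dry runs without a model server.

```python
class CannedModel:
    """Hypothetical stand-in exposing the same invoke(prompt) -> str interface."""

    def __init__(self, reply='This is a placeholder answer.'):
        self.reply = reply
        self.prompts = []  # record every prompt received, for later inspection

    def invoke(self, prompt):
        self.prompts.append(prompt)
        return self.reply


model = CannedModel()
your_generation_function = model.invoke

print(your_generation_function('Write one question about Xin.'))
print(model.prompts)
```

Swapping in a real backend then comes down to replacing the body of `invoke` with a call to that backend's client.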

This concludes the tutorial for setting up and running a benchmark with the `SAGED` library.