# Create a Slice and Launch Ollama with an LLM Model

This notebook provisions a slice on a single site with one node that has a GPU and a NIC_Basic network card connected to the FABNetv4 service.

On this node, we install and configure Ollama to serve the *deepseek-r1:7b* model and set up Open WebUI on the VM.

Through SSH tunnels, you can use Open WebUI to submit queries in a web browser, or interact with the LLM through its API.

Nodes in other FABRIC slices that are connected to FABNetv4 can also send queries to this LLM through the API over the FABNetv4 network.

While this example uses FABNetv4, it can be adapted to work with the FABNetv6 service as well.

## Import the FABlib Library


In [None]:
from ipaddress import ip_address, IPv4Address, IPv6Address, IPv4Network, IPv6Network
import ipaddress
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

fablib.show_config();

## Create the Experiment Slice

This section identifies a FABRIC site with an available GPU and sufficient CPU, RAM, and disk resources. Once a suitable site is found, a node is added with a GPU and a basic NIC, and it is connected to the FABNetv4 network to enable communication with other slices.

In [None]:
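ollama_slice_name = 'Ollama-slice'
ollama_node_name = 'ollama_node'

# FABNetv4 network and the NIC that attaches the node to it
network_name = 'net1'
nic_name = 'nic1'
model_name = 'NIC_Basic'   # NIC component model (not the LLM model)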

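### Select a Site

Filter the available hosts by free CPU cores, RAM, disk, and GPUs, then randomly choose a host and a GPU model (RTX6000, Tesla T4, A30, or A40) that has enough free units.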
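
The host selection in the next cells can be illustrated without FABRIC access. The sketch below is a simplified, standard-library-only version of the same filter-then-pick idea: strict `>` comparisons, an `Active` state check, and a random choice among (host, GPU model) pairs with more than `min_gpu_for_pick` free units. The host rows are made-up samples shaped like the `list_hosts` fields used here, and `host_ok` and `pick_host_gpu` are hypothetical helpers, not fablib functions.

```python
import random

# Host fields used in this notebook for the free GPUs of each model
GPU_FIELDS = ["rtx6000_available", "tesla_t4_available", "a30_available", "a40_available"]


def host_ok(row: dict, min_cores: int = 4, min_ram_gb: int = 16, min_disk_gb: int = 200) -> bool:
    """Active host with strictly more than the minimum resources and at least one free GPU."""
    res_ok = (
        row.get("cores_available", 0) > min_cores
        and row.get("ram_available", 0) > min_ram_gb
        and row.get("disk_available", 0) > min_disk_gb
        and row.get("state") == "Active"
    )
    return res_ok and any(row.get(f, 0) > 0 for f in GPU_FIELDS)


def pick_host_gpu(rows: list[dict], min_gpu_for_pick: int = 1, rng=random) -> tuple[str, str, int]:
    """Choose uniformly among (host, GPU field) pairs with more than min_gpu_for_pick free units."""
    candidates = [
        (r["name"], f, r.get(f, 0))
        for r in rows
        if host_ok(r)
        for f in GPU_FIELDS
        if r.get(f, 0) > min_gpu_for_pick
    ]
    if not candidates:
        raise RuntimeError(f"No host has any GPU model with count > {min_gpu_for_pick}.")
    return rng.choice(candidates)


# Made-up sample rows: only "aaa-w1" is Active, large enough, and has 2+ free GPUs of one model
sample_hosts = [
    {"name": "aaa-w1", "state": "Active", "cores_available": 64, "ram_available": 256,
     "disk_available": 2000, "a30_available": 2},
    {"name": "bbb-w2", "state": "Maint", "cores_available": 64, "ram_available": 256,
     "disk_available": 2000, "a40_available": 4},
    {"name": "ccc-w3", "state": "Active", "cores_available": 2, "ram_available": 8,
     "disk_available": 50, "rtx6000_available": 3},
]

name, gpu_field, count = pick_host_gpu(sample_hosts, rng=random.Random(0))
# The site name is the host-name prefix before the first "-"
print(name, gpu_field, count, "site:", name.split("-", 1)[0].upper())
```

Passing a seeded `random.Random` makes the pick reproducible; the notebook's own `DataFrame.sample` call picks differently on each run unless it is given a `random_state`.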

In [None]:
# Minimum resources: a host must have strictly more than these amounts available,
# and the node is created with exactly these amounts.
min_cores = 4
min_ram_gb = 16
min_disk_gb = 200
min_gpu_any = 0       # a host needs more than this many free GPUs of some model (i.e., at least one)
min_gpu_for_pick = 1  # a GPU model needs more than this many free units to be picked (i.e., at least two)

In [None]:
import random
import pandas as pd

fields = ['name', 'state', 'cores_available', 'ram_available', 'disk_available']
gpu_models = ["GPU_RTX6000", "GPU_Tesla_T4", "GPU_A30", "GPU_A40"]
# Host field for each GPU model, e.g. "GPU_Tesla_T4" -> "tesla_t4_available"
gpu_fields = [f"{m.split('_', 1)[1].lower()}_available" for m in gpu_models]
fields += [f for f in gpu_fields if f not in fields]

# Name filters; an empty list disables the filter
sites_like: list[str] = []                         # keep only hosts whose name contains one of these, e.g., ['BRIST', 'TOKY']
avoid_like: list[str] = ["TACC", "GATECH", "GPN"]  # sites to avoid

def filter_function(row: dict) -> bool:
    # Name filter: only apply if sites_like is non-empty
    if sites_like:
        name = (row.get('name') or '')
        name_ok = any(tok.lower() in name.lower() for tok in sites_like)
    else:
        name_ok = True

    res_ok = (
        row.get('cores_available', 0) > min_cores and
        row.get('ram_available', 0) > min_ram_gb and
        row.get('disk_available', 0) > min_disk_gb and
        row.get('state') == 'Active'
    )
    any_gpu_ok = any(row.get(gf, 0) > min_gpu_any for gf in gpu_fields)

    return name_ok and res_ok and any_gpu_ok

styled_or_df = fablib.list_hosts(fields=fields, pretty_names=False, avoid=avoid_like, filter_function=filter_function)

# Normalize Styler / DataFrame / list-of-dicts -> DataFrame
from pandas.io.formats.style import Styler

if isinstance(styled_or_df, Styler):
    df = styled_or_df.data
elif isinstance(styled_or_df, pd.DataFrame):
    df = styled_or_df
else:
    df = pd.DataFrame(styled_or_df or [])

if df.empty:
    raise RuntimeError("No hosts matched the filter criteria.")

# Randomly pick one (host, GPU model) pair whose free count exceeds min_gpu_for_pick
model_map = dict(zip(gpu_fields, gpu_models))
long = (
    df.reset_index()[["index"] + gpu_fields]
      .melt(id_vars="index", var_name="gpu_field", value_name="count")
)
eligible = long[long["count"] > min_gpu_for_pick]
if eligible.empty:
    raise RuntimeError(f"No host has any GPU model with count > {min_gpu_for_pick}.")

pick = eligible.sample(1).iloc[0]
host_row = df.loc[pick["index"]]
picked_gpu_model = model_map[pick["gpu_field"]]

print(
    f"Chosen Host: {host_row.get('name', '<unknown>')} | "
    f"GPU: {picked_gpu_model} | Available: {int(pick['count'])}"
)

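# The component model is named "GPU_TeslaT4" (no underscore), unlike the "tesla_t4" host field
if picked_gpu_model == "GPU_Tesla_T4":
    picked_gpu_model = "GPU_TeslaT4"

picked_host = host_row.get('name')
# The site name is the host-name prefix before the first "-"
picked_site = picked_host.split('-', 1)[0].upper()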

### Set Up the Slice  

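To use a different model, change `default_llm_model` in the next cell. Ollama model names take the form `name:tag`, for example `llama2:7b`, `mistral:7b`, `gemma:7b`, `deepseek-r1:70b`, or `phi` (Phi-2). Larger models need more GPU memory and disk space.

For more available models and their exact tags, visit [Ollama Model Search](https://ollama.com/search).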
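
Ollama model names have the form `name:tag`, and a name without a tag refers to the `latest` tag. The hypothetical helper below (not part of Ollama or fablib) splits a reference into its two parts, which can be handy when comparing `default_llm_model` with the names reported by `ollama list`.

```python
def split_model_ref(ref: str) -> tuple[str, str]:
    """Split an Ollama model reference such as "deepseek-r1:7b" into (name, tag).

    A reference without a tag (e.g. "phi") resolves to Ollama's "latest" tag.
    """
    name, sep, tag = ref.rpartition(":")
    # No colon, or the colon belongs to a registry host:port rather than a tag
    if not sep or "/" in tag:
        return ref, "latest"
    return name, tag


for ref in ["deepseek-r1:7b", "llama2:7b", "phi"]:
    print(ref, "->", split_model_ref(ref))
```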

In [None]:
default_llm_model = "deepseek-r1:7b"

In [None]:
# Create the slice and an L3 FABNetv4 network
ollama_slice = fablib.new_slice(name=ollama_slice_name)

net1 = ollama_slice.add_l3network(name=network_name)

# Place the node on the host selected above
ollama_node = ollama_slice.add_node(name=ollama_node_name, cores=min_cores, ram=min_ram_gb, host=picked_host,
                                    disk=min_disk_gb, site=picked_site, image='default_ubuntu_22')

# Attach the GPU selected above
ollama_node.add_component(model=picked_gpu_model, name='gpu1')

# Add a NIC_Basic and let FABRIC assign its FABNetv4 address automatically
iface1 = ollama_node.add_component(model=model_name, name=nic_name).get_interfaces()[0]
iface1.set_mode('auto')
net1.add_interface(iface1)

# After boot: upload the tool directories, run the Docker and dependency setup scripts,
# set MODEL_NAME in .env, and start the containers defined in ollama_tools
ollama_node.add_post_boot_upload_directory('ollama_tools','.')
ollama_node.add_post_boot_upload_directory('node_tools','.')
ollama_node.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }}')
ollama_node.add_post_boot_execute('node_tools/dependencies.sh {{ _self_.image }}')
ollama_node.add_post_boot_execute(f'cd ollama_tools && cp env.template .env && sed -i "s/^MODEL_NAME=.*/MODEL_NAME={default_llm_model}/" .env && docker compose up -d')

ollama_slice.submit();
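
The last post-boot command copies `env.template` to `.env` and uses `sed` to rewrite the `MODEL_NAME=` line with `default_llm_model` before running `docker compose up -d`. For illustration, the sketch below reproduces that substitution in Python; `set_env_value` is a hypothetical helper and the `.env` contents are invented, since the real `env.template` is not shown here.

```python
import re


def set_env_value(env_text: str, key: str, value: str) -> str:
    """Python equivalent of: sed -i "s/^KEY=.*/KEY=value/" .env

    Every line starting with KEY= is replaced. As with the sed command,
    text without such a line is returned unchanged.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    # A function replacement keeps backslashes in value from being treated as escapes
    return pattern.sub(lambda _match: f"{key}={value}", env_text)


# Hypothetical .env contents; the real env.template in ollama_tools may differ
template = "EXAMPLE_SETTING=1\nMODEL_NAME=placeholder\n"
print(set_env_value(template, "MODEL_NAME", "deepseek-r1:7b"))
```

One difference worth noting: the `sed` expression uses `/` as its delimiter, so a model name containing `/` (for example a namespaced `user/model`) would break that command, while the regex version handles it.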

## Query LLM via API  

This section shows how to query the LLM with a Python script. The `query.py` script was uploaded to `ollama_node` in the `ollama_tools` directory during post-boot configuration; here we run it on the node to send queries to the model.
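
The contents of `query.py` are not shown in this notebook, so the sketch below is not that script. It is an independent, standard-library-only example of a non-streaming call to Ollama's REST endpoint `POST /api/generate` on port 11434, with `stream` set to false so the reply is a single JSON object whose `response` field holds the generated text. The function names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(host: str, port: int, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host: str, port: int, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the generated text from the reply's "response" field."""
    request = build_generate_request(host, port, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]
```

For example, running `generate("localhost", 11434, "deepseek-r1:7b", "Hello World")` on `ollama_node` should return the model's answer as one string. The timeout is generous because the first request may need to load the model into GPU memory.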

In [None]:
# Reload the slice and node (useful after restarting the kernel)
ollama_slice = fablib.get_slice(ollama_slice_name)
ollama_node = ollama_slice.get_node(ollama_node_name)

### Confirm Container Status  

The containers may take a few minutes to start. Please verify that they are running before sending any queries.
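
Rather than reading the `docker ps -a` table by eye, you could parse a compact listing. The sketch below is a hypothetical helper that expects the output of `docker ps -a --format '{{.Names}}|{{.Status}}'` and reports whether the containers named `ollama` and `open-webui` (the names used by the `docker logs` commands below) have a status beginning with `Up`.

```python
def containers_up(ps_output: str, required: tuple[str, ...] = ("ollama", "open-webui")) -> dict[str, bool]:
    """Report whether each required container is running.

    Expects output from: docker ps -a --format '{{.Names}}|{{.Status}}'
    Running containers have a status beginning with "Up" (e.g. "Up 3 minutes").
    """
    running = {}
    for line in ps_output.strip().splitlines():
        name, _, status = line.partition("|")
        running[name.strip()] = status.strip().startswith("Up")
    # Containers missing from the listing are reported as not running
    return {name: running.get(name, False) for name in required}


sample = "ollama|Up 3 minutes\nopen-webui|Up 2 minutes (healthy)\n"
print(containers_up(sample))
```

On the node, this could be used as `stdout, stderr = ollama_node.execute("docker ps -a --format '{{.Names}}|{{.Status}}'")` followed by `containers_up(stdout)`; rerun it until both entries are `True`.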

In [None]:
stdout, stderr = ollama_node.execute("docker ps -a")

In [None]:
stdout, stderr = ollama_node.execute("docker logs ollama")

In [None]:
# Uncomment to check the Open WebUI container logs
#stdout, stderr = ollama_node.execute("docker logs open-webui")

### Send Queries

In [None]:
stdout, stderr = ollama_node.execute(f'python3 ollama_tools/query.py --model {default_llm_model} --prompt "Hello World"')

## Enable Access to Ollama Node Across FABRIC  

Add a route so that `ollama_node` can reach, and be reached from, VMs in other FABRIC slices on FABNetv4. The route sends traffic for the FABNetv4 subnet through the gateway of this slice's FABNetv4 network.
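
The sketch below shows which destinations such a route covers, using Python's `ipaddress` module. It assumes `fablib.FABNETV4_SUBNET` is `10.128.0.0/10`; print that attribute to confirm the value in your environment. `uses_fabnet_route` is a hypothetical helper.

```python
import ipaddress

# Assumption: this mirrors fablib.FABNETV4_SUBNET (10.128.0.0/10);
# print(fablib.FABNETV4_SUBNET) to confirm in your environment.
FABNETV4_SUBNET = ipaddress.ip_network("10.128.0.0/10")


def uses_fabnet_route(dest: str, subnet: ipaddress.IPv4Network = FABNETV4_SUBNET) -> bool:
    """True if traffic to dest matches the FABNetv4 route (and so goes via the slice's gateway)."""
    return ipaddress.ip_address(dest) in subnet


for dest in ["10.133.5.2", "10.192.0.1", "8.8.8.8"]:
    print(dest, uses_fabnet_route(dest))
```

Destinations outside that range, such as `8.8.8.8`, are unaffected by this route and keep using the node's existing routes.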

In [None]:
ollama_fabnet_network = ollama_slice.get_network(network_name)

ollama_node.add_route(subnet=fablib.FABNETV4_SUBNET, 
                      next_hop=ollama_fabnet_network.get_gateway())

ollama_node.config_routes()

stdout, stderr = ollama_node.execute("sudo ip route list")

### Retrieve the FABNetv4 IP Address

Display the FABNetv4 IP address of the Ollama node so you can share it with other slices.

In [None]:
ollama_fabnet_ip_addr = ollama_node.get_interface(network_name=network_name).get_ip_addr()

print(f"Ollama is accessible from other slices at: {ollama_fabnet_ip_addr}")
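
Clients in other slices reach Ollama at `http://<FABNet IP>:11434`, where 11434 is Ollama's default port and the port passed to `query.py` in the REST examples. The hypothetical helper below builds that URL and wraps IPv6 literals in brackets, which URLs require and which matters if you adapt the notebook to FABNetv6. Whether port 11434 is actually reachable on the FABNet interface depends on the Docker Compose configuration in `ollama_tools`, which is not shown here.

```python
import ipaddress


def ollama_base_url(ip, port: int = 11434) -> str:
    """Build the Ollama base URL from an address string or ipaddress object.

    IPv6 literals (e.g. on FABNetv6) must be wrapped in brackets inside a URL.
    """
    addr = ipaddress.ip_address(str(ip))
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"http://{host}:{port}"


print(ollama_base_url("10.133.5.2"))
print(ollama_base_url("2001:db8::1"))
```

In this notebook, `ollama_base_url(ollama_fabnet_ip_addr)` would give the URL to share with other slices.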

## Querying Ollama

Users can interact with the LLM through the REST API, the command-line interface, or the Open WebUI.

### REST Examples

The `query.py` script demonstrates how to query the LLM over the REST interface. Although Ollama can run on a remote host, the examples below target the local instance by passing `--host localhost`. You can specify a different `--host` and `--port` as needed.
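
By default, `/api/generate` streams its reply: each line of the HTTP body is a JSON object carrying a text fragment in `response`, and the last object has `done` set to true. The sketch below (a hypothetical helper, independent of `query.py`) reassembles the full answer from such lines; the sample lines are invented for illustration.

```python
import json


def join_stream(lines) -> str:
    """Concatenate the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the answer
    return "".join(parts)


sample_stream = [
    '{"response": "Hel", "done": false}',
    '{"response": "lo", "done": false}',
    '{"response": "", "done": true}',
]
print(join_stream(sample_stream))
```

Iterating over the response object returned by `urllib.request.urlopen` yields `bytes` lines, which `json.loads` also accepts, so the same function works on a live streamed reply.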


In [None]:
stdout, stderr = ollama_node.execute(f"python3 ollama_tools/query.py --model {default_llm_model} --prompt 'Tell me about National Science Foundation' --host localhost --port 11434")

In [None]:
stdout, stderr = ollama_node.execute(f"python3 ollama_tools/query.py --model {default_llm_model} --prompt 'Tell me about NVIDIA BlueField DPUs' --host localhost --port 11434")

### CLI Examples

SSH into `ollama_node`; you can print its SSH command with `ollama_node.get_ssh_command()`.
To view available models, run:

```bash
docker exec -it ollama ollama list
```

To start a model and interact with it:

```bash
docker exec -it ollama ollama run deepseek-r1:7b
```

This will open an interactive prompt where you can type questions directly.
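
`ollama list` shows the models stored locally; Ollama's REST API exposes the same information at `GET /api/tags`, which returns JSON with a `models` array whose entries have a `name` field. The sketch below shows one way to read it; `parse_tags` and `list_models` are hypothetical helpers.

```python
import json
import urllib.request


def parse_tags(payload: str) -> list[str]:
    """Extract the sorted model names from a GET /api/tags JSON reply."""
    return sorted(m["name"] for m in json.loads(payload).get("models", []))


def list_models(host: str = "localhost", port: int = 11434, timeout: float = 10.0) -> list[str]:
    """Fetch /api/tags from a running Ollama server and return its model names."""
    with urllib.request.urlopen(f"http://{host}:{port}/api/tags", timeout=timeout) as reply:
        return parse_tags(reply.read().decode("utf-8"))


print(parse_tags('{"models": [{"name": "phi:latest"}, {"name": "deepseek-r1:7b"}]}'))
```

Run on `ollama_node`, `list_models()` should include `deepseek-r1:7b` once the model has been pulled.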

### Open WebUI

To access Open WebUI from your laptop, you need to create an SSH tunnel.
Follow the steps below to complete the setup.

#### Start the SSH Tunnel

- Run the next cell to create the SSH tunnel configuration, `fabric_ssh_tunnel_tools.zip`.
- Download your custom `fabric_ssh_tunnel_tools.zip` archive from the `fabric_config` folder.
- Unzip the archive and put the resulting folder (`fabric_ssh_tunnel_tools`) somewhere you can access it from the command line.
- Open a terminal window. (Windows: use PowerShell.)
- Use `cd` to navigate to the `fabric_ssh_tunnel_tools` folder.
- Run the SSH tunnel command printed in the next section, and leave the terminal window open.

In [None]:
fablib.create_ssh_tunnel_config(overwrite=True)

#### Launch Open WebUI

To access Open WebUI running on `ollama_node`, create an SSH tunnel from your local machine. The command has this form:

```bash
ssh -L 127.0.0.1:8080:127.0.0.1:8080 -i <private_key> -F ssh_config <username>@<management-ip>
```

Replace `<private_key>` with your slice private key file, and `<username>` and `<management-ip>` with the username and management IP address of `ollama_node`. The next cell prints the command with these values filled in.

Then, open your browser and navigate to:

http://localhost:8080
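
If the page at http://localhost:8080 does not load, first check that the tunnel is listening on your laptop. The sketch below attempts a TCP connection with Python's `socket` module; `port_open` is a hypothetical helper, and it must run on your local machine (not in the FABRIC JupyterHub), because that is where the tunnel's local end listens.

```python
import socket


def port_open(host: str = "127.0.0.1", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (e.g. the SSH tunnel is up)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `port_open("127.0.0.1", 8080)` should return `True` while the SSH tunnel command is running, and typically `False` after it exits.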


In [None]:
import os
# Port on your local machine that the Open WebUI tunnel listens on
local_port = '8080'
# Local interface to bind the tunnel to (can be `localhost`)
local_host = '127.0.0.1'

# Port on the node where Open WebUI is served
target_port = '8080'

# Username@management-IP of the node on FABRIC
target_host = f'{ollama_node.get_username()}@{ollama_node.get_management_ip()}'

# Private key file name: the slice public key file name without its ".pub" suffix
private_key = os.path.basename(fablib.get_default_slice_public_key_file())[:-4]

print("Use `cd` to navigate into the `fabric_ssh_tunnel_tools` folder.")
print("In your terminal, run the SSH tunnel command:")
print()
print(f'ssh -L {local_host}:{local_port}:127.0.0.1:{target_port} -i {private_key} -F ssh_config {target_host}')
print()
print(f"After running the SSH command, open Open WebUI at http://localhost:{local_port}. If prompted, create an account and start asking questions.")

## Delete the Slice

Please delete your slice when you are done with your experiment.

In [None]:
# Uncomment to delete the slice and release its resources
#ollama_slice = fablib.get_slice(ollama_slice_name)
#ollama_slice.delete()