# Create a Slice and Launch Ollama with an LLM Model

This notebook provisions a slice on a single site with one node. The node has a GPU and a NIC_Basic connected to the FABNetv4 service.

On this node, we install two services that run as Docker containers: Ollama, configured to serve the *deepseek-r1:7b* model, and Open-WebUI.

By establishing SSH tunnels, you can access Open-WebUI to submit queries through the web interface or interact with the LLM via the API. 
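The sketch below builds an `ssh -L` command that forwards a local port to a port on the VM through a bastion host. It is a minimal example under stated assumptions. The bastion hostname, usernames, key paths, and the helper `build_ssh_tunnel_cmd` are hypothetical placeholders, not values from this notebook. The Open-WebUI port depends on the compose file in `ollama_tools`, which is not shown here. Ollama's API listens on port 11434 by default.

```python
import shlex


def build_ssh_tunnel_cmd(local_port: int, remote_port: int, vm_ip: str,
                         vm_user: str, bastion: str, bastion_user: str,
                         vm_key: str, bastion_key: str) -> str:
    """Return an ssh command that forwards localhost:local_port to
    127.0.0.1:remote_port on the VM, hopping through a bastion host.

    All hosts, users, and key paths are caller-supplied placeholders.
    Use absolute key paths: shlex quoting prevents shell expansion of '~'.
    """
    # ProxyCommand connects to the bastion; -W pipes stdio to the VM's SSH port.
    proxy = f"ssh -i {bastion_key} -W %h:%p {bastion_user}@{bastion}"
    args = [
        "ssh",
        "-i", vm_key,
        "-o", f"ProxyCommand={proxy}",
        "-N",  # run no remote command; just hold the tunnel open
        "-L", f"{local_port}:127.0.0.1:{remote_port}",
        f"{vm_user}@{vm_ip}",
    ]
    return shlex.join(args)


# Example with placeholder values: forward local port 11434 to Ollama on the VM.
print(build_ssh_tunnel_cmd(11434, 11434, "10.0.0.1", "ubuntu",
                           "bastion.example.org", "me_0000",
                           "/keys/vm", "/keys/bastion"))
```

Run the printed command in a local terminal and leave it open while you use the forwarded port.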

Nodes in other FABRIC slices that are connected to FABNetv4 can also send queries to this LLM through the API over the FABNetv4 network.

This example uses FABNetv4, but it can be adapted to work with the FABNetv6 service.

## Import the FABlib Library


In [None]:
from ipaddress import ip_address, IPv4Address, IPv6Address, IPv4Network, IPv6Network
import ipaddress
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

fablib.show_config();

In [None]:
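# Column names in the FABRIC site resource table, used to filter candidate sites
cores_column_name = 'cores_available'
ram_column_name = 'ram_available'
disk_column_name = 'disk_available'

# Resources requested for the Ollama VM: CPU cores, RAM (GB), and disk (GB)
core=16
ram=32
disk=100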

## Create the Experiment Slice

This section identifies a FABRIC site with an available GPU and sufficient CPU, RAM, and disk resources. Once a suitable site is found, a node is added with a GPU and a basic NIC, and it is connected to the FABNetv4 network to enable communication with other slices.

In [None]:
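# Names for the slice, the VM, its FABNetv4 network, and its NIC
ollama_slice_name = 'Ollama-deep-seek'

ollama_node_name = 'ollama_node'

network_name = 'net1'
nic_name = 'nic1'
model_name = 'NIC_Basic'  # NIC component model (not the LLM model)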

### Select a Site  
Choose a GPU model and search for a site that offers the specified GPU.

In [None]:
# choices include
# GPU_RTX6000
# GPU_TeslaT4
# GPU_A30
# GPU_A40
GPU_CHOICE = 'GPU_A30' 

# don't edit - convert from GPU type to a resource column name
# to use in filter lambda function below
choice_to_column = {
    "GPU_RTX6000": "rtx6000_available",
    "GPU_TeslaT4": "tesla_t4_available",
    "GPU_A30": "a30_available",
    "GPU_A40": "a40_available"
}

column_name = choice_to_column.get(GPU_CHOICE, "Unknown")
print(f'{column_name=}')
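The cell above maps an unrecognized `GPU_CHOICE` to `"Unknown"`. That mistake would probably surface only later, as a lookup error inside the site filter. The sketch below uses the same mapping but fails immediately and lists the valid choices. The helper `gpu_column_for` is hypothetical.

```python
GPU_COLUMNS = {
    "GPU_RTX6000": "rtx6000_available",
    "GPU_TeslaT4": "tesla_t4_available",
    "GPU_A30": "a30_available",
    "GPU_A40": "a40_available",
}


def gpu_column_for(choice: str) -> str:
    """Return the resource column for a GPU model, or raise listing valid options."""
    try:
        return GPU_COLUMNS[choice]
    except KeyError:
        valid = ", ".join(sorted(GPU_COLUMNS))
        raise ValueError(f"unsupported GPU_CHOICE {choice!r}; expected one of: {valid}") from None


print(gpu_column_for("GPU_A30"))  # a30_available
```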

In [None]:
# Find a site with at least one available GPU of the selected type and more
# free cores, RAM, and disk than requested; set site_override to choose a site manually.
# The *_column_name variables are defined in the earlier resource cell.
site_override = None

if site_override:
    site1 = site_override
else:
    site1 = fablib.get_random_site(filter_function=lambda x: x[column_name] > 0 and 
                                   x[cores_column_name] > core and 
                                   x[ram_column_name] > ram and  
                                   x[disk_column_name] > disk,
                                  avoid = ["GATECH", "GPN"])
    
print(f'Preparing to create slice "{ollama_slice_name}" with node {ollama_node_name} in site {site1}')
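The sketch below is a rough offline illustration of the selection above. It applies the same conditions as the filter lambda to hand-made resource rows, skips the avoided sites, and picks randomly among the sites that pass. It assumes each site is represented as a dict keyed by the resource column names. The rows, site names, and the helpers `meets_requirements` and `pick_site` are hypothetical. This is not fablib's own implementation of `get_random_site()`.

```python
import random


def meets_requirements(row: dict, gpu_column: str, cores: int, ram: int, disk: int) -> bool:
    # Same conditions as the notebook's lambda: at least one free GPU of the
    # chosen type and strictly more free cores, RAM, and disk than requested.
    return (row.get(gpu_column, 0) > 0
            and row["cores_available"] > cores
            and row["ram_available"] > ram
            and row["disk_available"] > disk)


def pick_site(rows, gpu_column, cores, ram, disk, avoid=(), rng=random):
    """Pick a random qualifying site name, excluding any listed in `avoid`."""
    candidates = [r["name"] for r in rows
                  if r["name"] not in avoid
                  and meets_requirements(r, gpu_column, cores, ram, disk)]
    if not candidates:
        raise RuntimeError(f"no site has a free {gpu_column} plus the requested resources")
    return rng.choice(candidates)


# Hypothetical resource rows; real values come from FABRIC's resource listing.
rows = [
    {"name": "SITE_A", "a30_available": 2, "cores_available": 64,
     "ram_available": 256, "disk_available": 2000},
    {"name": "SITE_B", "a30_available": 0, "cores_available": 128,
     "ram_available": 512, "disk_available": 4000},
    {"name": "GPN", "a30_available": 1, "cores_available": 64,
     "ram_available": 256, "disk_available": 2000},
]
print(pick_site(rows, "a30_available", cores=16, ram=32, disk=100, avoid=["GPN"]))  # SITE_A
```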

### Set Up the Slice  

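To use a different model, change `default_llm_model` below to another Ollama model reference, for example:

`llama2:7b`, `mistral:7b`, `gemma:7b`, `deepseek-r1:67b`, `phi-2`, `gpt-neo-2.7b`

Larger models need more GPU memory, so make sure the chosen model fits on the selected GPU. For exact model names and tags, see [Ollama Model Search](https://ollama.com/search).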
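Ollama model references take the form `name:tag`, and Ollama uses the `latest` tag when none is given. The sketch below splits a reference into its parts so you can sanity-check it before it is written into the node's `.env` file. The helper `split_model_tag` is hypothetical. It checks only the format, not whether the model exists in the Ollama library.

```python
def split_model_tag(model: str) -> tuple[str, str]:
    """Split an Ollama model reference such as 'deepseek-r1:7b' into (name, tag).

    A missing tag defaults to 'latest', following Ollama's convention.
    """
    model = model.strip()
    if not model:
        raise ValueError("model name is empty")
    # Only a colon after the last '/' separates the tag (so a registry
    # host:port prefix is not mistaken for a tag).
    slash = model.rfind("/")
    colon = model.rfind(":")
    if colon > slash:
        name, tag = model[:colon], model[colon + 1:]
    else:
        name, tag = model, "latest"
    if not name or not tag:
        raise ValueError(f"malformed model reference: {model!r}")
    return name, tag


print(split_model_tag("deepseek-r1:7b"))  # ('deepseek-r1', '7b')
```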

In [None]:
default_llm_model = "deepseek-r1:7b"

In [None]:
# Create the slice
ollama_slice = fablib.new_slice(name=ollama_slice_name)

# Layer-3 FABNetv4 network for reaching nodes in other slices
net1 = ollama_slice.add_l3network(name=network_name)

# VM with the requested cores, RAM (GB), and disk (GB) on the selected site
ollama_node = ollama_slice.add_node(name=ollama_node_name, cores=core, ram=ram, 
                                    disk=disk, site=site1, image='default_ubuntu_22')

# Attach the selected GPU
ollama_node.add_component(model=GPU_CHOICE, name='gpu1')

# Add a basic NIC, let FABRIC assign its IP address automatically, and attach it to FABNetv4
iface1 = ollama_node.add_component(model=model_name, name=nic_name).get_interfaces()[0]
iface1.set_mode('auto')
net1.add_interface(iface1)

# After boot: upload the helper directories, run the node_tools setup scripts
# (Docker and dependencies), set MODEL_NAME in .env, and start the containers
ollama_node.add_post_boot_upload_directory('ollama_tools','.')
ollama_node.add_post_boot_upload_directory('node_tools','.')
ollama_node.add_post_boot_execute('node_tools/enable_docker.sh {{ _self_.image }} ')
ollama_node.add_post_boot_execute('node_tools/dependencies.sh {{ _self_.image }} ')
ollama_node.add_post_boot_execute(f'cd ollama_tools && cp env.template .env && sed -i "s/^MODEL_NAME=.*/MODEL_NAME={default_llm_model}/" .env && docker compose up -d')

ollama_slice.submit();
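The last post-boot command copies `env.template` to `.env` and uses `sed` to rewrite the `MODEL_NAME=` line. The sketch below performs the same substitution in Python, which is useful for checking the result locally. The contents of `env.template` are not shown in this notebook, so the sample text here is hypothetical.

```python
import re


def set_model_name(env_text: str, model: str) -> str:
    """Replace every line starting with MODEL_NAME= by MODEL_NAME=<model>.

    Mirrors: sed -i "s/^MODEL_NAME=.*/MODEL_NAME=<model>/" .env
    Like sed, it does not append the line if no MODEL_NAME= line exists.
    """
    # A function replacement avoids treating backslashes in `model` as escapes.
    return re.sub(r"^MODEL_NAME=.*$", lambda m: f"MODEL_NAME={model}",
                  env_text, flags=re.MULTILINE)


template = "SOME_OTHER_SETTING=1\nMODEL_NAME=placeholder\n"  # hypothetical template
print(set_model_name(template, "deepseek-r1:7b"))
```

The sed expression uses `/` as its delimiter, so a model reference that contains `/` (for example, a namespaced model) would break the post-boot command. The Python version does not have that limitation.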

## Query LLM via API  

This section shows how to query the LLM from Python. The `query.py` script was uploaded to `ollama_node`, in the `ollama_tools` directory, during post-boot setup. Here we run it on the node to send a prompt to the model.
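The contents of `query.py` are not shown here. The sketch below illustrates the kind of request such a client can send. It calls Ollama's REST endpoint `POST /api/generate` on the default port 11434 with `stream` set to `false`, so the reply arrives as one JSON object whose `response` field holds the generated text. The helper names are hypothetical. Reaching the API from another slice also assumes that the container publishes port 11434 on the VM.

```python
import json
import urllib.request


def build_generate_request(host: str, model: str, prompt: str,
                           port: int = 11434) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint.

    For an IPv6 host, pass it in brackets, e.g. '[2001:db8::1]'.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host: str, model: str, prompt: str, port: int = 11434,
             timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (the 'response' field)."""
    req = build_generate_request(host, model, prompt, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]


# Usage on the VM itself, or through an SSH tunnel to port 11434:
# print(generate("127.0.0.1", "deepseek-r1:7b", "Hello World"))
```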

In [None]:
ollama_slice=fablib.get_slice(ollama_slice_name)
ollama_node = ollama_slice.get_node(ollama_node_name)

### Confirm Container Status  

The containers may take a few minutes to start. Please verify that they are running before sending any queries.
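You can also check the containers programmatically instead of reading the table by eye. One option is to ask Docker for only names and statuses with `docker ps -a --format '{{.Names}}\t{{.Status}}'` and parse the result. The sketch below assumes the container names `ollama` and `open-webui` used in the log commands in this section. The helpers and the sample output are illustrative only.

```python
def parse_docker_status(output: str) -> dict[str, str]:
    """Map container name -> status from tab-separated 'name<TAB>status' lines."""
    statuses = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        statuses[name.strip()] = status.strip()
    return statuses


def containers_up(output: str, names=("ollama", "open-webui")) -> bool:
    """True when every named container reports a status beginning with 'Up'."""
    statuses = parse_docker_status(output)
    return all(statuses.get(n, "").startswith("Up") for n in names)


sample = "ollama\tUp 3 minutes\nopen-webui\tUp 2 minutes (health: starting)\n"
print(containers_up(sample))  # True

# On the slice (execute returns (stdout, stderr), as in the cells above):
# stdout, stderr = ollama_node.execute("docker ps -a --format '{{.Names}}\t{{.Status}}'")
# print(containers_up(stdout))
```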

In [None]:
stdout, stderr = ollama_node.execute("docker ps -a")

In [None]:
stdout, stderr = ollama_node.execute("docker logs ollama")

In [None]:
#stdout, stderr = ollama_node.execute("docker logs open-webui")

### Send Queries

In [None]:
stdout, stderr = ollama_node.execute(f'python3 ollama_tools/query.py --model {default_llm_model} --prompt "Hello World"')

## Enable Access to Ollama Node Across FABRIC  

Make `ollama_node` reachable from VMs in other FABRIC slices on FABNetv4 by adding a route to the FABNetv4 subnet through this network's gateway.
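The route covers the whole FABNetv4 address range (`fablib.FABNETV4_SUBNET`), so traffic to any peer address inside that range goes through the slice's FABNetv4 gateway. The sketch below checks whether a peer address falls inside that range, using Python's `ipaddress` module. The value `10.128.0.0/10` is an assumption about what `FABNETV4_SUBNET` holds. Print the constant in your environment to confirm it. The helper `uses_fabnet_route` is hypothetical.

```python
from ipaddress import IPv4Network, ip_address

# Assumed value of fablib.FABNETV4_SUBNET; print the constant to confirm it.
FABNETV4_SUBNET = IPv4Network("10.128.0.0/10")


def uses_fabnet_route(peer_ip: str, subnet: IPv4Network = FABNETV4_SUBNET) -> bool:
    """True if traffic to peer_ip matches the FABNetv4 route added in this section."""
    addr = ip_address(peer_ip)
    # An IPv6 address is never inside an IPv4 network; `in` simply returns False.
    return addr in subnet


print(uses_fabnet_route("10.130.1.5"))  # True
```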

In [None]:
ollama_fabnet_network = ollama_slice.get_network(network_name)

ollama_node.add_route(subnet=fablib.FABNETV4_SUBNET, 
                      next_hop=ollama_fabnet_network.get_gateway())

ollama_node.config_routes()

stdout, stderr = ollama_node.execute("sudo ip route list")

### Retrieve the FABNetv4 IP Address  
Display the Ollama node's FABNetv4 IP address so it can be shared with other slices.

In [None]:
ollama_fabnet_ip_addr = ollama_node.get_interface(network_name=network_name).get_ip_addr()

print(f"Ollama is accessible from other slices at: {ollama_fabnet_ip_addr}")
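Clients in other slices need a URL built from this address. The sketch below shows that step, assuming Ollama's default port 11434 is published on the VM. An IPv6 literal must be wrapped in brackets in a URL, which matters if this example is adapted to FABNetv6. The helper `ollama_base_url` is hypothetical. It accepts either a string or an `ipaddress` object.

```python
from ipaddress import ip_address


def ollama_base_url(ip, port: int = 11434) -> str:
    """Return http://<ip>:<port>, bracketing IPv6 literals as URLs require."""
    addr = ip_address(str(ip))
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"http://{host}:{port}"


print(ollama_base_url("10.130.1.5"))  # http://10.130.1.5:11434
# In the notebook: print(ollama_base_url(ollama_fabnet_ip_addr))
```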

## Delete the Slice

Please delete your slice when you are done with your experiment.

In [None]:
ollama_slice = fablib.get_slice(ollama_slice_name)
ollama_slice.delete()