# AI Generation of the MOT Model YAML Files 
*With IBM Granite*

## In this notebook
This notebook shows how to generate MOT model YAML files with a custom Granite flow served through Ollama.
It is heavily inspired by the official IBM Granite workshop; for the hardware setup, refer to the [workshop](https://ibm.github.io/granite-workshop/).

## Setting up the environment

Ensure you are running Python 3.10, 3.11, or 3.12 in a freshly created virtual environment.

In [1]:
import sys
assert sys.version_info >= (3, 10) and sys.version_info < (3, 13), "Use Python 3.10, 3.11, or 3.12 to run this notebook."

### Install dependencies

Granite utils provides some helpful functions for recipes.

In [2]:
! pip install git+https://github.com/ibm-granite-community/utils \
    transformers \
    langchain_community \
    langchain_huggingface \
    langchain_ollama \
    langchain_milvus \
    replicate \
    wget \
    requests 

Collecting git+https://github.com/ibm-granite-community/utils
  Cloning https://github.com/ibm-granite-community/utils to /tmp/pip-req-build-kn3_156t
  Running command git clone --filter=blob:none --quiet https://github.com/ibm-granite-community/utils /tmp/pip-req-build-kn3_156t
  Resolved https://github.com/ibm-granite-community/utils to commit da3c800822615230c65b4d4cdee3bc7e48cbfa60
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: ibm-granite-community-utils
  Building wheel for ibm-granite-community-utils (pyproject.toml) ... done
  Created wheel for ibm-granite-community-utils: filename=ibm_granite_community_utils-0.1.dev81-py3-none-any.whl size=12904 sha256=97cebadc8a0e4028d76e009719ba55609949fa18adbb03d16a4f6afe0a710103
  Stored in directory: /tmp/pip-ephem-wheel-cache-l1ohflyg/wheels/e2/74/0e/e7dc80cad1c61a0c57be

### Serving the Granite AI model


This notebook requires IBM Granite models to be served by an AI model runtime so that the models can be invoked or called. This notebook can use a locally accessible [Ollama](https://github.com/ollama/ollama) server to serve the models, or the [Replicate](https://replicate.com) cloud service.

During the pre-work, you may have either started a local Ollama server on your computer or set up Replicate access and obtained an [API token](https://replicate.com/account/api-tokens).
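
Before continuing, it can help to confirm that an Ollama server is actually reachable. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, which lists the locally available models. The helper name `ollama_is_reachable` is hypothetical and not part of the workshop code.

```python
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if a server answers at base_url/api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, DNS failures, timeouts, and HTTP errors
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False


print("Ollama reachable:", ollama_is_reachable())
```

If this prints `False`, start the Ollama server (or switch to Replicate) before running the model cells.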

## Selecting System Components

### Choose your Embeddings Model

Specify the model to use for generating embedding vectors from text.

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings
from transformers import AutoTokenizer

embeddings_model_path = "ibm-granite/granite-embedding-30m-english"
embeddings_model = HuggingFaceEmbeddings(
    model_name=embeddings_model_path,
)
embeddings_tokenizer = AutoTokenizer.from_pretrained(embeddings_model_path)

### Choose your Vector Database

Specify the database to use for storing and retrieving embedding vectors.

In [5]:
from langchain_milvus import Milvus
import tempfile

db_file = tempfile.NamedTemporaryFile(prefix="milvus_", suffix=".db", delete=False).name
print(f"The vector database will be saved to {db_file}")

vector_db = Milvus(
    embedding_function=embeddings_model,
    connection_args={"uri": db_file},
    auto_id=True,
    index_params={"index_type": "AUTOINDEX"},
)

The vector database will be saved to /tmp/milvus_jo5azdsq.db


  from pkg_resources import DistributionNotFound, get_distribution
2025-07-27 16:31:29,212 [DEBUG][_create_connection]: Created new connection using: 50ee2b6865a24602bc6bf11a4512e6b3 (async_milvus_client.py:599)


## Select your model

Select a Granite model to use. Here we use a LangChain client to connect to the model. This notebook connects to a locally accessible Ollama server through an Ollama client.

To use Replicate instead, refer to the [workshop](https://ibm.github.io/granite-workshop/).

In [6]:
import os
from langchain_ollama.llms import OllamaLLM

model_path = "ibm-granite/granite-3.3-8b-instruct"
model = OllamaLLM(
    model="granite3.3:8b",
    num_ctx=65536, # 64K context window
)
model = model.bind(raw=True) # Client side controls prompt

tokenizer = AutoTokenizer.from_pretrained(model_path)

## Building the Vector Database

Now we enter the model name; its metadata is then fetched from the Hugging Face Hub API.

### Download the document

For testing, you can use Mistral-7B: `mistralai/Mistral-7B-Instruct-v0.2`.

In [8]:
import requests
import json

model_name = input("model: ").strip()
url = f"https://huggingface.co/api/models/{model_name}"

response = requests.get(url, timeout=30)  # fail instead of hanging on a stalled connection
print("Status code:", response.status_code)

if response.status_code == 200:
    data = response.json()
    
    # Convert to pretty JSON string
    metadata_text = json.dumps(data, indent=2)
    
    # Save to a text file
    with open("model_metadata.txt", "w", encoding="utf-8") as f:
        f.write(metadata_text)
    
    print("Metadata saved to model_metadata.txt")
else:
    print("Error fetching model info")

model:  mistralai/Mistral-7B-Instruct-v0.2


Status code: 200
Metadata saved to model_metadata.txt


### Split the document into chunks

Split the document into text segments that fit within the embedding model's context window.

In [9]:
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

loader = TextLoader("model_metadata.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer=embeddings_tokenizer,
    chunk_size=embeddings_tokenizer.max_len_single_sentence,
    chunk_overlap=0,
)
texts = text_splitter.split_documents(documents)
# Number the chunks starting from 1
for doc_id, text in enumerate(texts, start=1):
    text.metadata["doc_id"] = doc_id
print(f"{len(texts)} text document chunks created")

Token indices sequence length is longer than the specified maximum sequence length for this model (3649 > 512). Running this sequence through the model will result in indexing errors


1 text document chunks created
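
The warning above suggests that the single chunk (3649 tokens) exceeds the embedding model's 512-token limit, most likely because `CharacterTextSplitter` splits only on its default blank-line separator, which pretty-printed JSON does not contain. A minimal sketch of line-based chunking under a token budget follows. It uses a whitespace word count as a stand-in for the real tokenizer, and `chunk_lines` and `count_tokens` are hypothetical names. In the notebook itself, LangChain's `RecursiveCharacterTextSplitter` may be a better fit.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def chunk_lines(text: str, max_tokens: int, counter: Callable[[str], int] = count_tokens) -> list[str]:
    """Greedily pack whole lines into chunks of at most max_tokens tokens.

    A single line longer than max_tokens still becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for line in text.splitlines():
        line_tokens = counter(line)
        # Start a new chunk when adding this line would exceed the budget
        if current and current_tokens + line_tokens > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += line_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks


# Ten JSON-like lines of three "tokens" each, packed under a budget of nine
sample = "\n".join(f'"key_{i}": "value {i}",' for i in range(10))
chunks = chunk_lines(sample, max_tokens=9)
print(f"{len(chunks)} chunks created")
```

Splitting on line boundaries keeps each JSON key and value together, which tends to matter more for retrieval than hitting the budget exactly.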


### Populate the vector database

NOTE: Population of the vector database may take over a minute depending on your embedding model and service.

In [10]:
ids = vector_db.add_documents(texts)
print(f"{len(ids)} documents added to the vector database")

1 documents added to the vector database


## Querying and Return

### Search and Return

Search the database for similar documents by proximity of the embedded vector in vector space and return the output YAML.
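
To make "proximity in vector space" concrete, here is a toy ranking by cosine similarity. The three-dimensional vectors and document names are made up for illustration. Real embeddings from the Granite model have many more dimensions, and the metric Milvus actually applies depends on how the index is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for two stored documents
toy_index = {
    "model_metadata": [0.9, 0.1, 0.0],
    "unrelated_note": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

# Rank stored documents from most to least similar to the query
ranked = sorted(
    toy_index,
    key=lambda name: cosine_similarity(query_vector, toy_index[name]),
    reverse=True,
)
print(ranked)
```

With only one chunk in the database, as in this run, the search returns that chunk regardless of the query. Ranking only becomes meaningful once several chunks are stored.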

In [29]:
query = """You are a model metadata analyzer and YAML generator.

Your task is to fill in the `{{ }}` sections in the YAML template below based only on the provided model documentation.

### Instructions:

1. If a component is clearly mentioned in the documentation (e.g. model card, training code, datasets), include that component in the `components` section. If it is not mentioned, omit it entirely.
2. For each component:
   - If a license is found, include its name in `license`.
   - If a license path is found, add it in `license_path`. If not, leave it blank.
   - If a component file path or URL is available, put it in `component_path`. 
   - If no specific path is found, default to the Hugging Face model root: `https://huggingface.co/{{ model_name }}`
3. Do **not** fabricate any information. Only use what is explicitly stated in the provided documentation or metadata.
4. Keep the descriptions exactly as written in the template.
5. Use consistent YAML syntax and indentation.

### Template to fill:

framework:
  name: "Model Openness Framework"
  version: "1.0"
  date: "2024-12-15"

release:
  name: "{{ model_name }}"
  version: "{{ version }}"
  date: "{{ release_date }}"
  license:
    distribution:
      name: "{{ distribution_license }}"
      path: "{{ distribution_license_path }}"
    code:
      name: "{{ code_license }}"
      path: "{{ code_license_path }}"
    data:
      name: "{{ data_license }}"
      path: "{{ data_license_path }}"
    document:
      name: "{{ documentation_license }}"
      path: "{{ documentation_license_path }}"
  type: "{{ release_type }}"
  architecture: "{{ architecture }}"
  origin: "{{ origin }}"
  producer: "{{ producer }}"
  contact: "{{ contact_url }}"
  components:
    - name: "Model architecture"
      description: "Well commented code for the model's architecture"
      license: unlicensed
      component_path: "{{ model_code_path }}"

    - name: "Training code"
      description: "Code used for training the model"
      license: "{{ training_license }}"
      license_path: "{{ training_license_path }}"
      component_path: "{{ training_code_path }}"

    - name: "Inference code"
      description: "Code used for running the model to make predictions"
      license: "{{ inference_license }}"
      license_path: "{{ inference_license_path }}"
      component_path: "{{ inference_code_path }}"

    - name: "Datasets"
      description: "Training, validation and testing datasets used for the model"
      license: "{{ dataset_license }}"
      license_path: "{{ dataset_license_path }}"
      component_path: "{{ dataset_path }}"

    - name: "Model card"
      description: "Model details including performance metrics, intended use, and limitations"
      license: "{{ model_card_license }}"
      license_path: "{{ model_card_license_path }}"
      component_path: "{{ model_card_path }}"

    # Include additional components (e.g., Evaluation code, Data card, Research paper, etc.) only if they are mentioned

"""
docs = vector_db.similarity_search(query)
print(f"{len(docs)} documents returned")
for doc in docs:
    print(doc)
    print("=" * 80)  # Separator for clarity

1 documents returned
page_content='{
  "_id": "65770c3426ef61bbf101d4da",
  "id": "mistralai/Mistral-7B-Instruct-v0.2",
  "private": false,
  "pipeline_tag": "text-generation",
  "library_name": "transformers",
  "tags": [
    "transformers",
    "pytorch",
    "safetensors",
    "mistral",
    "text-generation",
    "finetuned",
    "mistral-common",
    "conversational",
    "arxiv:2310.06825",
    "license:apache-2.0",
    "autotrain_compatible",
    "text-generation-inference",
    "region:us"
  ],
  "downloads": 1681429,
  "likes": 2884,
  "modelId": "mistralai/Mistral-7B-Instruct-v0.2",
  "author": "mistralai",
  "sha": "63a8b081895390a26e140280378bc85ec8bce07a",
  "lastModified": "2025-07-24T16:57:21.000Z",
  "gated": "auto",
  "disabled": false,
  "widgetData": [
    {
      "messages": [
        {
          "role": "user",
          "content": "What is your favorite condiment?"
        }
      ]
    }
  ],
  "model-index": null,
  "config": {
    "architectures": [
      "Mist

In [30]:
from ibm_granite_community.notebook_utils import escape_f_string
from langchain.prompts import PromptTemplate
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# Create a Granite prompt for question-answering with the retrieved context
prompt = tokenizer.apply_chat_template(
    conversation=[{
        "role": "user",
        "content": "{input}",
    }],
    documents=[{
        "doc_id": "0",
        "text": "{context}",
    }],
    add_generation_prompt=True,
    tokenize=False,
)
# The Granite prompt can contain JSON strings, so we must escape them
prompt_template = PromptTemplate.from_template(template=escape_f_string(prompt, "input", "context"))

# Create a Granite document prompt template to wrap each retrieved document
document_prompt_template = PromptTemplate.from_template(template="""\
<|end_of_text|>
<|start_of_role|>document {{"document_id": "{doc_id}"}}<|end_of_role|>
{page_content}""")
document_separator = ""

# Assemble the retrieval-augmented generation chain
combine_docs_chain = create_stuff_documents_chain(
    llm=model,
    prompt=prompt_template,
    document_prompt=document_prompt_template,
    document_separator=document_separator,
)
rag_chain = create_retrieval_chain(
    retriever=vector_db.as_retriever(),
    combine_docs_chain=combine_docs_chain,
)
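
The escaping step matters because `PromptTemplate.from_template` treats single braces as placeholders by default, so literal JSON braces in the Granite prompt would be misread as variables. The sketch below illustrates the idea with plain `str.format`. `escape_braces_except` is a hypothetical helper and not the actual implementation of `escape_f_string`.

```python
def escape_braces_except(template: str, *keep: str) -> str:
    """Double every brace so str.format treats it literally,
    then restore {name} for each placeholder listed in keep."""
    escaped = template.replace("{", "{{").replace("}", "}}")
    for name in keep:
        escaped = escaped.replace("{{" + name + "}}", "{" + name + "}")
    return escaped


# A prompt that mixes literal JSON braces with two real placeholders
raw_prompt = 'Documents: {"doc_id": "0", "text": "{context}"}\nQuestion: {input}'
safe_prompt = escape_braces_except(raw_prompt, "input", "context")

print(safe_prompt.format(input="Which license applies?", context="license:apache-2.0"))
```

Without the escaping, formatting `raw_prompt` directly would fail because `"doc_id"` would be parsed as a placeholder name.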

In [31]:
output = rag_chain.invoke({"input": query})

print(output['answer'])

framework:
  name: 'Model Openness Framework'
  version: '1.0'
  date: '2024-12-15'
release:
  name: ''
  version: ''
  date: ''
  license:
    distribution:
      name: 'Apache 2.0'
      path: 'license:apache-2.0'
    code:
      name: ''
      path: ''
    data:
      name: ''
      path: ''
    document:
      name: ''
      path: ''
  type: ''
  architecture: 'MistralForCausalLM'
  origin: 'mistralai'
  producer: 'Hugging Face'
  contact: ''
components:
  - name: 'Model architecture'
    description: "Well commented code for the model's architecture"
    license: unlicensed
    component_path: ''

  - name: 'Data preprocessing code'
    description: 'Code for data cleansing, normalization, and augmentation'
    license: ''
    license_path: ''
    component_path: ''

  - name: 'Training code'
    description: 'Code used for training the model'
    license: ''
    license_path: ''
    component_path: ''

  - name: 'Inference code'
    description: 'Code used for running the model t