# AI Generation of the MOT Model YAML Files 
*With IBM Granite*

## In this notebook
This notebook walks through generating MOT model YAML files with a custom Granite flow served by Ollama.
It is heavily based on the official [IBM Granite workshop](https://ibm.github.io/granite-workshop/); refer to the workshop for the detailed hardware setup.

## Setting up the environment

Ensure you are running Python 3.10, 3.11, or 3.12 in a freshly created virtual environment.

In [32]:
import sys
assert sys.version_info >= (3, 10) and sys.version_info < (3, 13), "Use Python 3.10, 3.11, or 3.12 to run this notebook."

### Install dependencies

Granite utils provides some helpful functions for recipes.

In [None]:
! pip install git+https://github.com/ibm-granite-community/utils \
    transformers \
    langchain_community \
    langchain_huggingface \
    langchain_ollama \
    langchain_milvus \
    replicate \
    gitpython \
    requests \
    pypdf

### Serving the Granite AI model


This notebook requires IBM Granite models to be served by an AI model runtime so that they can be invoked. It can use a locally accessible [Ollama](https://github.com/ollama/ollama) server to serve the models, or the [Replicate](https://replicate.com) cloud service.

During the pre-work, you may have either started a local Ollama server on your computer, or set up Replicate access and obtained an [API token](https://replicate.com/account/api-tokens).
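
Before continuing, it can help to confirm that a local Ollama server is actually reachable. The sketch below is not part of the workshop; it assumes Ollama's default address `http://localhost:11434` and uses its `/api/tags` endpoint, which lists the models the server has pulled. The helper name `ollama_models` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the model names a local Ollama server reports, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


models = ollama_models()
if models is None:
    print("No Ollama server found at the default address.")
elif not any(name.startswith("granite3.3") for name in models):
    print("Ollama is running, but no granite3.3 model has been pulled yet.")
else:
    print("Ollama is running and a granite3.3 model is available.")
```

If the server runs on another host or port, pass that address as `base_url`.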

## Selecting System Components

### Choose your Embeddings Model

Specify the model to use for generating embedding vectors from text.

In [34]:
from langchain_huggingface import HuggingFaceEmbeddings
from transformers import AutoTokenizer

embeddings_model_path = "ibm-granite/granite-embedding-30m-english"
embeddings_model = HuggingFaceEmbeddings(
    model_name=embeddings_model_path,
)
embeddings_tokenizer = AutoTokenizer.from_pretrained(embeddings_model_path)

### Choose your Vector Database

Specify the database to use for storing and retrieving embedding vectors.

In [None]:
from langchain_milvus import Milvus
import tempfile

db_file = tempfile.NamedTemporaryFile(prefix="milvus_", suffix=".db", delete=False).name
print(f"The vector database will be saved to {db_file}")

vector_db = Milvus(
    embedding_function=embeddings_model,
    connection_args={"uri": db_file},
    auto_id=True,
    index_params={"index_type": "AUTOINDEX"},
)

## Select your model

Select a Granite model to use. Here we use a LangChain client to connect to the model through a locally accessible Ollama server.

To use Replicate instead, please refer to the [workshop](https://ibm.github.io/granite-workshop/).

In [36]:
import os
from langchain_ollama.llms import OllamaLLM

model_path = "ibm-granite/granite-3.3-8b-instruct"
model = OllamaLLM(
    model="granite3.3:8b",
    num_ctx=65536, # 64K context window
)
model = model.bind(raw=True) # Client side controls prompt

tokenizer = AutoTokenizer.from_pretrained(model_path)

## Building the Vector Database

Next, we provide the GitHub repository address of the model we want to describe.

### Download the document

Any model's GitHub repository can be used as the source here. (A planned improvement is to look up the model by name through a web search API instead of requiring a GitHub URL.)

In [None]:
import os
import re
import requests
import tempfile
from git import Repo
from pypdf import PdfReader

def clone_repo(repo_url, dest_dir=os.getcwd()):
    print(f"Cloning {repo_url} into {dest_dir}")
    Repo.clone_from(repo_url, dest_dir)

def search_files_for_info(dest_dir=os.getcwd(), extensions=(".md", ".txt", ".pdf")):
    info = []
    for subdir, _, files in os.walk(dest_dir):
        for file in files:
            if file.endswith(extensions):
                filepath = os.path.join(subdir, file)
                if(file.endswith('.pdf')):
                    try:
                        reader = PdfReader(filepath)  # open by full path, not the bare file name
                        page_texts = (page.extract_text() for page in reader.pages)
                        content = "\n".join(text for text in page_texts if text)
                        info.append((filepath, content))
                    except Exception as e:
                        print(f"Failed to read {filepath}: {e}")
                elif(file.endswith('.md') or file.endswith('.txt')):
                    try:
                        with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
                            content = f.read()
                            info.append((filepath, content))
                    except Exception as e:
                        print(f"Failed to read {filepath}: {e}")
    return info
    
# Experimental: follow GitHub links found in the text to support multi-repository search.
# def extract_github_links(text):
#     return re.findall(r'https://github\.com/[^\s)]+', text)
#
# def fetch_and_store_links(links, base_dir):
#     for i, link in enumerate(set(links)):
#         name = f"linked_repo_{i}"
#         dest = os.path.join(base_dir, name)
#         print(f"Cloning linked repo: {link}")
#         try:
#             Repo.clone_from(link, dest)
#         except Exception as e:
#             print(f"Failed to clone {link}: {e}")

def search_github_repo(repo_url):
    with tempfile.TemporaryDirectory() as tmpdir:
        clone_repo(repo_url, tmpdir)
        all_info = search_files_for_info(tmpdir)
        all_text = "\n\n".join(f"File: {filepath.replace(tmpdir,repo_url)}\n\n{content}" for filepath, content in all_info)

        
        # external_links = extract_github_links(all_text)
        # fetch_and_store_links(external_links, tmpdir)

        # Return full aggregated text content for Granite/Ollama input
        return all_text

# === Example usage ===
if __name__ == "__main__":
    github_repo = input("Enter GitHub repo URL: ").strip()
    all_text = search_github_repo(github_repo)

    with open("repo_summary_input.txt", "w", encoding="utf-8") as f:
        f.write(all_text)

    print("\nAll repo content written to 'repo_summary_input.txt'")
    print("Feed it into Granite/Ollama to generate your YAML.")


Cloning https://github.com/dpfried/incoder into /var/folders/r5/n4xqkxwd157fww2y2qptk5xr0000gn/T/tmpbpqzb0a8

All repo content written to 'repo_summary_input.txt'
Feed it into Granite/Ollama to generate your YAML.
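
To make the layout of `repo_summary_input.txt` concrete, here is a minimal sketch that applies the same aggregation idea to a throwaway directory instead of a cloned repository. The helper name `aggregate_text_files` and the URL `https://example.com/toy-repo` are illustrative assumptions, and PDF handling is left out for brevity.

```python
import os
import tempfile


def aggregate_text_files(root, label, extensions=(".md", ".txt")):
    """Join matching files under root into one string, each with a 'File:' header."""
    sections = []
    for subdir, _, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(extensions):
                path = os.path.join(subdir, name)
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    content = f.read()
                # Swap the local root for a stable label, as is done with the repo URL
                sections.append(f"File: {path.replace(root, label)}\n\n{content}")
    return "\n\n".join(sections)


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "README.md"), "w", encoding="utf-8") as f:
        f.write("# Toy model")
    with open(os.path.join(root, "weights.bin"), "w", encoding="utf-8") as f:
        f.write("not a text file, so it is skipped")
    demo_text = aggregate_text_files(root, "https://example.com/toy-repo")

print(demo_text)
```

Each file contributes a `File: <location>` header followed by its content, which lets the model cite where a license or component was found.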


### Split the document into chunks

Split the document into text segments that can fit into the model's context window.

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

# The summary file was written as UTF-8, so read it back the same way
loader = TextLoader("repo_summary_input.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer=embeddings_tokenizer,
    chunk_size=embeddings_tokenizer.max_len_single_sentence,
    chunk_overlap=0,
)
texts = text_splitter.split_documents(documents)
# Number the chunks from 1 so each retrieved document can be identified in the prompt
for doc_id, text in enumerate(texts, start=1):
    text.metadata["doc_id"] = doc_id
print(f"{len(texts)} text document chunks created")
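
As a rough illustration of what token-budgeted splitting does, the sketch below packs paragraph-sized pieces into chunks until a token limit would be exceeded. It is not the LangChain implementation: it assumes a whitespace word count as a stand-in for the embeddings tokenizer, and the name `split_by_token_budget` is hypothetical.

```python
def split_by_token_budget(text, max_tokens, count_tokens=None, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most max_tokens."""
    if count_tokens is None:
        # Stand-in tokenizer: one token per whitespace-separated word
        count_tokens = lambda s: len(s.split())
    chunks, current = [], []
    for piece in text.split(separator):
        if not piece.strip():
            continue
        candidate = separator.join(current + [piece])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this piece would overflow the budget, so close the current chunk
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
chunks = split_by_token_budget(sample, max_tokens=4)
print(chunks)
```

Note that a single piece larger than the budget is still emitted as one oversized chunk in this sketch, so documents with very long paragraphs may need a finer separator.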

### Populate the vector database

NOTE: Populating the vector database may take over a minute, depending on your embedding model and service.

In [None]:
ids = vector_db.add_documents(texts)
print(f"{len(ids)} documents added to the vector database")

## Querying the Vector Database

### Conduct a similarity search

Search the database for the document chunks whose embedding vectors lie closest to the embedding of the query.

In [None]:
query = """
Given the following information about a model, fill in the YAML template with proper values.
If a license or path is not available, leave it blank or omit the path.
Keep the description of each component exactly as it appears in the template.

For each component, fill in the license and paths if the component exists; if the component does not exist, omit its section.
If a license is not present, write unlicensed; if a path is not present, leave it blank.

Replace only the '' placeholders with model information; do not change the descriptions.
Use only the provided documents and do not invent information; this is important.

Use the YAML format as:
framework:
  name: 'Model Openness Framework'
  version: '1.0'
  date: '2024-12-15'
release:
  name: ''
  version: ''
  date: ''
  license:
    distribution:
      name: ''
      path: ''
    code:
      name: ''
      path: ''
    data:
      name: ''
      path: ''
    document:
      name: ''
      path: ''
  type: ''
  architecture: ''
  origin: ''
  producer: ''
  contact: ''
  components:
    - name: 'Model architecture'
      description: "Well commented code for the model's architecture"
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Data preprocessing code'
      description: 'Code for data cleansing, normalization, and augmentation'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Training code'
      description: 'Code used for training the model'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Inference code'
      description: 'Code used for running the model to make predictions'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Evaluation code'
      description: 'Code used for evaluating the model'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Supporting libraries and tools'
      description: "Libraries and tools used in the model's development"
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Model parameters (Final)'
      description: 'Trained model parameters, weights and biases'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Model parameters (Intermediate)'
      description: 'Trained model parameters, weights and biases'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Datasets'
      description: 'Training, validation and testing datasets used for the model'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Evaluation data'
      description: 'Data used for evaluating the model'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Model metadata'
      description: 'Any model metadata including training configuration and optimizer states'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Sample model outputs'
      description: 'Examples of outputs generated by the model'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Model card'
      description: 'Model details including performance metrics, intended use, and limitations'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Data card'
      description: 'Documentation for datasets including source, characteristics, and preprocessing details'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Technical report'
      description: 'Technical report detailing capabilities and usage instructions for the model'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Research paper'
      description: 'Research paper detailing the development and capabilities of the model'
      license: ''
      license_path: ''
      component_path: ''

    - name: 'Evaluation results'
      description: 'The results from evaluating the model'
      license: ''
      license_path: ''
      component_path: ''
"""
docs = vector_db.similarity_search(query)
print(f"{len(docs)} documents returned")
for doc in docs:
    print(doc)
    print("=" * 80)  # Separator for clarity
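
The idea behind the search can be shown with a few hand-made vectors. This is a toy sketch that assumes cosine similarity as the distance measure; the metric Milvus actually uses depends on the index configuration, and the three-dimensional vectors and file names below are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query vector."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Invented embeddings: real ones come from the embeddings model and have hundreds of dimensions
toy_vectors = {
    "LICENSE": [0.9, 0.1, 0.0],
    "README.md": [0.6, 0.6, 0.2],
    "train.md": [0.0, 0.2, 0.9],
}
ranked = top_k([1.0, 0.2, 0.0], toy_vectors, k=2)
print(ranked)
```

The query vector points in nearly the same direction as the `LICENSE` vector, so that entry ranks first, followed by `README.md`.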

## Answering Questions

### Automate the RAG pipeline

Build a RAG chain with the model and the document retriever.

In [44]:
from ibm_granite_community.notebook_utils import escape_f_string
from langchain.prompts import PromptTemplate
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# Create a Granite prompt for question-answering with the retrieved context
prompt = tokenizer.apply_chat_template(
    conversation=[{
        "role": "user",
        "content": "{input}",
    }],
    documents=[{
        "doc_id": "0",
        "text": "{context}",
    }],
    add_generation_prompt=True,
    tokenize=False,
)
# The Granite prompt can contain JSON strings, so we must escape them
prompt_template = PromptTemplate.from_template(template=escape_f_string(prompt, "input", "context"))

# Create a Granite document prompt template to wrap each retrieved document
document_prompt_template = PromptTemplate.from_template(template="""\
<|end_of_text|>
<|start_of_role|>document {{"document_id": "{doc_id}"}}<|end_of_role|>
{page_content}""")
document_separator=""

# Assemble the retrieval-augmented generation chain
combine_docs_chain = create_stuff_documents_chain(
    llm=model,
    prompt=prompt_template,
    document_prompt=document_prompt_template,
    document_separator=document_separator,
)
rag_chain = create_retrieval_chain(
    retriever=vector_db.as_retriever(),
    combine_docs_chain=combine_docs_chain,
)
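
To see what the "stuff documents" step contributes, the sketch below renders each retrieved chunk with the same document template text and concatenates the results into the context string. It is a plain-Python approximation, not the LangChain internals; `ToyDocument` and `stuff_documents` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Minimal stand-in for a retrieved document chunk."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Same text as the document prompt template; doubled braces yield literal JSON braces
DOCUMENT_TEMPLATE = (
    "<|end_of_text|>\n"
    '<|start_of_role|>document {{"document_id": "{doc_id}"}}<|end_of_role|>\n'
    "{page_content}"
)


def stuff_documents(docs, separator=""):
    """Render each document with the template and join the results into one context string."""
    return separator.join(
        DOCUMENT_TEMPLATE.format(doc_id=d.metadata["doc_id"], page_content=d.page_content)
        for d in docs
    )


context = stuff_documents([
    ToyDocument("Apache-2.0 license text", {"doc_id": 1}),
    ToyDocument("Training instructions", {"doc_id": 2}),
])
print(context)
```

Each chunk becomes its own `document` role block, which is why every chunk was given a `doc_id` when the text was split.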

### Generate a retrieval-augmented response to a question

Use the RAG chain to process a question. The document chunks relevant to that question are retrieved and used as context.

In [None]:
output = rag_chain.invoke({"input": query})

print(output['answer'])
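
Model output is not guaranteed to be well-formed, so a quick sanity check before saving the result can catch obvious problems. The sketch below uses only the standard library: it removes a Markdown code fence if the model added one and checks that the template's top-level keys are present. The helper names are hypothetical, and a full validation would parse the text with a YAML parser such as PyYAML instead.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
REQUIRED_TOP_LEVEL_KEYS = ("framework", "release")  # top-level keys of the template in the query


def strip_code_fences(answer):
    """Return the body of the first fenced block, or the whole answer if none is found."""
    pattern = re.compile(FENCE + r"(?:yaml|yml)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(answer)
    return (match.group(1) if match else answer).strip()


def missing_top_level_keys(yaml_text, required=REQUIRED_TOP_LEVEL_KEYS):
    """Return the required keys that never appear unindented at the start of a line."""
    present = set(re.findall(r"^([A-Za-z_][\w-]*):", yaml_text, flags=re.MULTILINE))
    return [key for key in required if key not in present]


# Invented answer that mimics a model wrapping its YAML in a fence
sample_answer = (
    "Here is the YAML:\n"
    + FENCE + "yaml\n"
    + "framework:\n  name: 'Model Openness Framework'\n"
    + "release:\n  name: ''\n"
    + FENCE
)
cleaned = strip_code_fences(sample_answer)
print(missing_top_level_keys(cleaned))
```

In the notebook, the same two helpers could be applied to `output['answer']` before writing it to a `.yml` file.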