Notebook for experimenting with the following end-to-end process: from a dataset and task described in natural language (NL), to a typology-based diagram, to a design recommendation.

In [27]:
import glob
from pprint import pprint

from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import NotionDirectoryLoader, PyPDFLoader
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableMap
from langchain.text_splitter import MarkdownHeaderTextSplitter, TokenTextSplitter
from langchain.vectorstores import Chroma
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.llms import Ollama
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate

In [65]:
model = Ollama(model="llama3:8b", temperature=0)

## Determine the end goal/decision

In [66]:
documents = []
documents.extend(PyPDFLoader("docs/dm.pdf").load())

In [67]:
text_splitter = TokenTextSplitter(chunk_size=100, chunk_overlap=25)
docs = text_splitter.split_documents(documents)
len(docs)

260
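The splitter settings above (`chunk_size=100`, `chunk_overlap=25`) mean each chunk starts 75 tokens after the previous one. The sketch below illustrates that sliding-window idea in plain Python. It is not the `TokenTextSplitter` implementation: the helper name `sliding_window_chunks` is hypothetical, and the tokens are placeholder strings rather than the output of a real tokenizer.

```python
def sliding_window_chunks(tokens, chunk_size=100, chunk_overlap=25):
    """Split a token list into fixed-size windows that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # 75 with the notebook's settings
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
        start += stride
    return chunks


# Placeholder tokens standing in for real tokenizer output (assumption).
tokens = [f"tok{i}" for i in range(250)]
chunks = sliding_window_chunks(tokens)
print(len(chunks), [len(c) for c in chunks])
# The last 25 tokens of one chunk are the first 25 of the next.
print(chunks[0][-25:] == chunks[1][:25])
```

Smaller chunks give the retriever more precise passages, while the overlap reduces the chance that a definition is cut in half at a chunk boundary.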

In [68]:
model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}
hf = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

persist_directory = 'docs/chroma/'
!rm -rf docs/chroma  # remove old database files if any

# Chroma.from_documents raised an error previously; downgrading with
# `pip install chromadb==0.4.3` resolved it.
# See https://github.com/zylon-ai/private-gpt/issues/1012
vectordb = Chroma.from_documents(
    documents=docs,
    embedding=hf,
    persist_directory=persist_directory,
)
retriever = vectordb.as_retriever()

print(vectordb._collection.count())

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


260
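Because `normalize_embeddings` is set to `True`, every stored vector has unit length, so the dot product of two embeddings equals their cosine similarity. The sketch below shows that relationship and a top-k lookup on toy 2-D vectors. The helper names and the vectors are made up for illustration; this is not how Chroma is implemented internally.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, doc_vectors, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    q = normalize(query)
    scored = [(dot(q, normalize(d)), i) for i, d in enumerate(doc_vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy embeddings (assumption): real bge-small-en vectors have far more dimensions.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.1]
print(top_k(query, doc_vectors))
```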


In [69]:
template = """Imagine you are a visualization designer who wants to understand what are the decisions an expert in embryology and in vitro fertilization would make when designing a visualization.
You are tasked with taking the dataset and the task the expert is trying to accomplish and translating the task into one of the decision making tasks that appear in Typology of
Decision-Making Tasks for Visualization paper.

The three possible decision tasks are: CHOOSE, ACTIVATE and CREATE. Give a brief explanation of the decision making task you chose and why you think it is the most appropriate for the task at hand.
When providing reasons, give explanations that relate to the definitions of the three tasks as described in the Typology of Decision-Making Tasks for Visualization paper.
The dataset description, task description, and typology of decision making tasks paper are given below. 

Relevant context from the Typology of Decision-Making Tasks for Visualization paper: {context}
Data Description: {data_description}
Task Description: {task_description}
"""
# prompt = ChatPromptTemplate.from_template(template)
prompt_template = PromptTemplate.from_template(template)

In [70]:
dm_task_definitions_question = "What are the three decision making tasks in my typology?"
retriever.get_relevant_documents(dm_task_definitions_question)

[Document(page_content=' a typology for decision-making tasks in visualiza-\ntion, addressing the limitations of existing taxonomies. Built upon prior\nresearch and informed by design goals derived from a thorough liter-\nature review, the typology comprises three decision tasks: CHOOSE,\nACTIV ATE, and CREATE. These tasks allow for the representation\nof complex decision-making structures, as they can be composed or\ndecomposed into other tasks. The typology demonstrates completeness,', metadata={'page': 8, 'source': 'docs/dm.pdf'}),
 Document(page_content=' real-world visualization\nsystems.\n4.1 Decision-Making Tasks\nOur typology consists of three tasks derived from the scientific\nliterature [27, 28] : CHOOSE, ACTIV ATE, and CREATE. Each task is\na function that represents a specific and distinct decision problem. The\ntype of the inputs to these functions does not change the core process\nof the decision task. Some of the key differences between the tasks are\nthe unique transfor

In [71]:
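# Build the prompt inputs: the context comes from a fixed retrieval query about
# the typology, while the data and task descriptions are passed through from the
# caller. The filled prompt is then sent to the model.
nl_to_typology_goal_chain = RunnableMap({
    "context": lambda x: retriever.get_relevant_documents(dm_task_definitions_question),
    "data_description": lambda x: x["data"],
    "task_description": lambda x: x["task"],
}) | prompt_template | model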
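The chain maps one input dictionary to the three prompt variables and then fills the template. The sketch below mimics that data flow in plain Python, without LangChain, to make the shape of each step explicit. `StubDocument`, `format_docs`, and `run_map` are hypothetical stand-ins, and joining the retrieved `page_content` strings into one text block is one possible way to shape the context, not what the chain above does (it passes the list of documents as-is).

```python
from dataclasses import dataclass


@dataclass
class StubDocument:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str


def format_docs(docs):
    """Join retrieved passages into one block of text for the prompt."""
    return "\n\n".join(d.page_content.strip() for d in docs)


def run_map(steps, x):
    """Apply every function in `steps` to the same input, like a parallel map."""
    return {key: fn(x) for key, fn in steps.items()}


# Made-up passages standing in for the retriever output (assumption).
stub_docs = [
    StubDocument(" CHOOSE, ACTIVATE, and CREATE. "),
    StubDocument("Each task is a function."),
]
steps = {
    "context": lambda x: format_docs(stub_docs),
    "data_description": lambda x: x["data"],
    "task_description": lambda x: x["task"],
}
template = "Context: {context}\nData: {data_description}\nTask: {task_description}"
filled = template.format(**run_map(steps, {"data": "patients table", "task": "recommend a dose"}))
print(filled)
```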

In [73]:
nl_to_typology_goal_chain_output = nl_to_typology_goal_chain.invoke({
    "data": "tabular data where each row is a patient, and the associated levels of age, bmi amh and afc at the time of the Egg Retrieval Procedure.",
    "task": "understand how the medication dose varies with the following patient parameters: age, bmi amh and afc then recommend a dosage for the current patient.",
})
print(nl_to_typology_goal_chain_output)

After analyzing the dataset and task description, I believe that the most appropriate decision-making task is ACTIVATE.

The ACTIVATE task represents a decision where options are evaluated, and only those that meet or exceed a threshold are returned. In this case, the expert in embryology and in vitro fertilization needs to evaluate the medication dose based on various patient parameters (age, BMI, AMH, and AFC) and recommend a dosage for the current patient.

The task requires evaluating options (different medication doses) against specific criteria (patient parameters), and only those that meet or exceed a certain threshold (optimal dosage) are returned. This process involves filtering out suboptimal options based on the evaluation of the patient's characteristics, which aligns with the ACTIVATE task definition.

In contrast, the CHOOSE task would require selecting one option from a set of available options, which might not accurately capture the complexity of evaluating multiple pat
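For later steps it may help to pull the chosen task label out of the free-text response. The sketch below is one simple way to do that with a regular expression. The helper `extract_task` is hypothetical, and it assumes the first label mentioned is the chosen one, which holds for the response above but not necessarily for every model output.

```python
import re

# The three task labels named in the prompt.
TASK_PATTERN = re.compile(r"\b(CHOOSE|ACTIVATE|CREATE)\b")


def extract_task(response):
    """Return the first task label found in the response, or None if absent."""
    match = TASK_PATTERN.search(response)
    return match.group(1) if match else None


sample = (
    "After analyzing the dataset and task description, I believe that the "
    "most appropriate decision-making task is ACTIVATE."
)
print(extract_task(sample))  # prints ACTIVATE
```

A stricter alternative would be to ask the model for a fixed output format and parse that instead of scanning prose.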

## Expand the goal of the 