## Demo #3: Using LLMs to assist with submission grading tasks

This demo shows how you can use LLMs to help perform a grading task with a rubric. The example uses an established SIB data management rubric to assess SIB submissions involving data management.

In this demo, we'll set up an army of agents tasked with assessing data management standards for research protocols. We will set up individual agents to evaluate each criterion and a final grader agent to consolidate the results from each agent and make a recommendation on the submission. The criteria come from SIB's "Research Protocol Data Management Review Guidelines."

As with all AI-assisted tasks, a human being should still have the final say. However, this system could be run before the human review step to ensure that higher-quality submissions reach the reviewer, potentially reducing the number of review rounds and speeding up the process.

Set up your notebook as before...

In [4]:
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from crewai import Agent, Task, Crew, Process
from crewai_tools import DOCXSearchTool
from dotenv import load_dotenv
import os

In [11]:
# Load variables from a local .env file (if present) into the process environment
load_dotenv()

# Copy the key from AZURE_OPENAI_KEY to the AZURE_OPENAI_API_KEY name used below
os.environ["AZURE_OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_KEY"]
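
The setup cells assume that every Azure variable is present in the environment. A small pre-flight check, sketched below, lists all missing names at once instead of failing on the first one. The helper name `missing_env_vars` is hypothetical; the variable names are the ones this notebook reads.

```python
import os

# Names read by this notebook's setup cells.
REQUIRED_VARS = (
    "AZURE_OPENAI_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
)

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
example_env = {"AZURE_OPENAI_VERSION": "2023-05-15", "AZURE_OPENAI_KEY": ""}
print(missing_env_vars(env=example_env))
```

Calling `missing_env_vars()` with no arguments checks the real environment, so it can be run right after `load_dotenv()`.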

In [3]:
azure_llm = AzureChatOpenAI(
    api_version=os.environ["AZURE_OPENAI_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"]
)

Customize the tool's config to use Azure models, then initialize the DOCXSearchTool...\
\
<i>DOCXSearchTool is a RAG tool designed for semantic searching within DOCX documents. It enables users to effectively search and extract relevant information from DOCX files using query-based searches. This tool is invaluable for data analysis, information management, and research tasks, streamlining the process of finding specific information within large document collections (https://docs.crewai.com/tools/DOCXSearchTool/).</i>

In [20]:
submission_tool = DOCXSearchTool(
    config=dict(
        llm=dict(
            provider="azure_openai", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="gpt-4-32k",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="azure_openai", # or openai, ollama, ...
            config=dict(
                model="text-embedding-ada-002",
                #task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    ),
    docx='resources/Nigeria HIV CBS Protocol_v3.4_27062022_PHIS3_clean.docx'
)

#submission = submission_tool(docx='resources/Nigeria HIV CBS Protocol_v3.4_27062022_PHIS3_clean.docx')

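Note that the `llm` entry must be a plain dict, as in the cell above. Passing the `AzureChatOpenAI` object directly as `llm` fails the tool's config validation with the following error:

SchemaError: Key 'llm' error: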
AzureChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7f66a7b0ea40>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7f66a7b40c10>, openai_api_key=SecretStr('**********'), openai_proxy='', azure_endpoint='https://edav-openai-pilot-east2.openai.azure.com/', deployment_name='edav-chatapp-share-gpt4-32k-tpm25kplus-v0613-dfilter', openai_api_version='2023-05-15', openai_api_type='azure') should be instance of 'dict'

Now we'll set up the agents...

In [5]:
policy_and_responsibilities_agent = Agent(
  role='Policies and responsibilities grader',
  goal='Extract page number and text supporting a policy.',
  backstory="""You are knowledgeable about the development of protocols.
  You will check if the protocol has written policies, processes, and procedures for security breaches and training for all staff members with access to confidential data.
  You will look for data security policies that define the roles and access levels of all persons with authorized access to confidential public health data and the procedures for accessing data securely.
  You will check that the protocol has a data sharing plan that includes intent and scope, methods that will be used to share data, steps that will be taken to ensure confidentiality and security of shared data, confidentiality agreements, and how shared data will be used and published. In addition, you will check that the protocol ensures that public health information, if released, is only for purposes related to public health, except where required by law, and that it describes standards for sharing data using appropriate secure methods.
  You will check that the protocol describes who owns the data collected and how data collected will be retained or destroyed.
  """,
  llm=azure_llm,
  verbose=True,
  allow_delegation=False,
  tools=[submission_tool]
)

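This cell depends on `submission_tool`; if the DOCXSearchTool cell above has not run successfully, it fails with:

NameError: name 'submission_tool' is not defined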

And add their tasks...

In [None]:
task1 = Task(
  description="""Check if there are written policies and procedures to protect the privacy and security of personally identifiable data.""",
  expected_output="Page number and quotation(s). If none found, return N/A for both page number and quotation",
  agent=policy_and_responsibilities_agent
)

task2 = Task(
  description="""Check if policies and procedures are reviewed by all staff members with authorized access to confidential individual-level data.""",
  expected_output="Page number and quotation(s). If none found, return N/A for both page number and quotation",
  agent=policy_and_responsibilities_agent
)
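
Because every task shares the same expected output, the task definitions can also be generated from a list of rubric questions rather than written out one by one. The sketch below builds plain keyword-argument dicts so it runs without CrewAI installed; `CRITERIA`, `EXPECTED_OUTPUT`, and `build_task_specs` are hypothetical names introduced for illustration.

```python
# Shared output format, copied from the tasks above.
EXPECTED_OUTPUT = (
    "Page number and quotation(s). "
    "If none found, return N/A for both page number and quotation"
)

# One question per rubric criterion.
CRITERIA = [
    "Check if there are written policies and procedures to protect the "
    "privacy and security of personally identifiable data.",
    "Check if policies and procedures are reviewed by all staff members "
    "with authorized access to confidential individual-level data.",
]

def build_task_specs(criteria, expected_output=EXPECTED_OUTPUT):
    """Return one keyword-argument dict per criterion."""
    return [
        {"description": question, "expected_output": expected_output}
        for question in criteria
    ]

task_specs = build_task_specs(CRITERIA)
for spec in task_specs:
    print(spec["description"])
```

Each dict can then be expanded into a task, for example `Task(agent=policy_and_responsibilities_agent, **spec)`, which keeps the rubric questions in one place as more criteria are added.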

Next, we'll instantiate the crew of agents to evaluate the submission...

In [None]:
crew = Crew(
  agents=[policy_and_responsibilities_agent],  # one agent handles both tasks
  tasks=[task1, task2],
  verbose=2,  # set to 1 or 2 for different logging levels
)

Finally, you can put your crew to work!

In [None]:
result = crew.kickoff()

print("######################")
print(result)
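
The introduction describes a final grader that consolidates the per-criterion findings, a step the cells above do not implement. As a rough, deterministic stand-in for that LLM-based grader, the sketch below flags any criterion whose finding is `N/A`, the value the tasks are asked to return when no evidence is found. The `Finding` class, the `recommend` function, and the two recommendation labels are hypothetical, and the sample findings are placeholders rather than real output.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    criterion: str
    page: str        # page number as reported by the agent, or "N/A"
    quotation: str   # supporting quotation, or "N/A"

def has_evidence(finding):
    """A finding counts as supported unless its quotation is empty or N/A."""
    return finding.quotation.strip().upper() not in ("", "N/A")

def recommend(findings):
    """Return a (label, unmet criteria) pair for a human reviewer to confirm."""
    unmet = [f.criterion for f in findings if not has_evidence(f)]
    label = "ready for human review" if not unmet else "return to submitter"
    return label, unmet

# Placeholder findings for illustration only.
findings = [
    Finding("Written privacy and security policies", "12", "<quoted text>"),
    Finding("Policies reviewed by all authorized staff", "N/A", "N/A"),
]
label, unmet = recommend(findings)
print(label, unmet)
```

A rule like this only checks that evidence was found, not that the evidence is adequate, so the human reviewer still makes the final call.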