In [1]:
%pip install langchain langchain_openai langchain_community pypdf docarray python-dotenv

Note: you may need to restart the kernel to use updated packages.


## Importing Modules and Selecting the Model


These lines import necessary modules and load environment variables from a .env file into the script. This is useful for securely managing sensitive information like API keys.
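To give a rough picture of what `load_dotenv()` does, the toy parser below reads `KEY=VALUE` lines and copies them into an environment mapping without overwriting variables that are already set, which matches python-dotenv's default (`override=False`). It is a simplified sketch: the helper name `toy_load_dotenv` is hypothetical, and it ignores features the real library supports, such as variable expansion and multi-line values.

```python
import os


def toy_load_dotenv(text: str, environ=os.environ) -> dict:
    """Parse KEY=VALUE lines and add them to `environ` (hypothetical helper).

    Keys that already exist are left untouched, mirroring load_dotenv(override=False).
    Returns the pairs that were parsed from `text`.
    """
    parsed = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key] = value
        # Do not overwrite variables that are already set
        environ.setdefault(key, value)
    return parsed


example = """
# secrets for the notebook
OPENAI_API_KEY="sk-example"
MODEL=llama2
"""
env = {"MODEL": "gpt-3.5-turbo"}  # pretend MODEL was already exported in the shell
print(toy_load_dotenv(example, env))
print(env)  # MODEL keeps its pre-existing value
```

Keeping secrets in `.env` (and out of version control) is the point of this pattern; the notebook only ever sees the key through the environment.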

In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-3.5-turbo"
#MODEL = "mixtral"
#MODEL = "llama2"



The script reads the OpenAI API key from the environment variables and sets the model to use. The default is "gpt-3.5-turbo"; to run a local model instead, comment that line out and uncomment "mixtral" or "llama2". GPT-3.5 Turbo serves as a reference for comparing against the llama2 model's output.
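Because `os.getenv` silently returns `None` when a variable is missing, a misconfigured OpenAI key would only surface later, at the first API call. A small check like the one below fails fast instead. The helper `check_model_config` is hypothetical (not part of LangChain) and assumes, as the notebook does, that model names starting with "gpt" go to OpenAI while everything else runs locally through Ollama.

```python
import os
from typing import Mapping, Optional


def check_model_config(model: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend name for `model`, failing fast on a missing API key.

    Hypothetical helper: "gpt*" models are routed to OpenAI (which needs
    OPENAI_API_KEY); any other name is assumed to be served by local Ollama.
    """
    env = os.environ if env is None else env
    if model.startswith("gpt"):
        if not env.get("OPENAI_API_KEY"):
            raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
        return "openai"
    return "ollama"


print(check_model_config("llama2", env={}))
print(check_model_config("gpt-3.5-turbo", env={"OPENAI_API_KEY": "sk-example"}))
```

In the notebook this could be called as `check_model_config(MODEL)` right after `load_dotenv()`.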

In [9]:
#Invoking GPT model to test as an example
from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI(api_key=OPENAI_API_KEY, model=MODEL)
model.invoke("Tell a joke?")

AIMessage(content="Why couldn't the bicycle find its way home?\n\nBecause it lost its bearings!", response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 11, 'total_tokens': 27}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-8dad809d-63f5-4e42-a92b-0a9f941c9080-0', usage_metadata={'input_tokens': 11, 'output_tokens': 16, 'total_tokens': 27})

In [11]:
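# Pick the chat model and the matching embedding model based on MODEL.
# ChatOpenAI returns an AIMessage; the Ollama LLM wrapper returns a plain string.
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings

if MODEL.startswith("gpt"):
    model = ChatOpenAI(api_key=OPENAI_API_KEY, model=MODEL)
    embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY)
else:
    # Requires a running local Ollama server with MODEL already pulled
    model = Ollama(model=MODEL)
    embeddings = OllamaEmbeddings(model=MODEL)

model.invoke("Give a joke")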


AIMessage(content='Why did the scarecrow win an award? Because he was outstanding in his field!', response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 10, 'total_tokens': 27}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-76d16efe-3c9d-4c63-abf5-4ac25c96aa37-0', usage_metadata={'input_tokens': 10, 'output_tokens': 17, 'total_tokens': 27})

## Output via Chain and Parser

* Next, add a `StrOutputParser` to the LangChain chain to convert the model's `AIMessage` into a plain string; the `|` operator pipes each component's output into the next component.

This code imports StrOutputParser, which is used to convert the model's output into a string. It then pipes the output of the model through the parser and invokes the chain with the prompt "Tell me a joke?".

In [28]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser

chain.invoke("Tell me a joke?")

"Why don't scientists trust atoms?\n\nBecause they make up everything!"
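The `|` operator composes components so that the output of the left one becomes the input of the right one. The plain-Python sketch below imitates that idea with a hypothetical `ToyRunnable` class; it is not LangChain's implementation. The toy parser mirrors what `StrOutputParser` does for this notebook: it passes strings through unchanged (as the `Ollama` wrapper returns) and extracts `.content` from message objects (as `ChatOpenAI` returns).

```python
from typing import Any, Callable


class ToyRunnable:
    """Minimal stand-in for a LangChain Runnable (illustration only)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        # a | b builds a new runnable that feeds a's output into b
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


class ToyMessage:
    """Stand-in for AIMessage: the text lives in `.content`."""

    def __init__(self, content: str):
        self.content = content


# Fake "chat model" that always answers with a message object
fake_model = ToyRunnable(lambda prompt: ToyMessage(f"Echo: {prompt}"))

# Toy parser: pass strings through, otherwise pull out `.content`
to_str = ToyRunnable(lambda out: out if isinstance(out, str) else out.content)

toy_chain = fake_model | to_str
print(toy_chain.invoke("Tell me a joke?"))  # Echo: Tell me a joke?
```

Putting the parser at the end means the rest of the notebook gets a string back whichever backend `MODEL` selects.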

## Splitting PDF into smaller Chunks

This part uses `PyPDFLoader` to load a PDF file (here "AI-Risks.pdf"; any PDF path can be used) and split it into `Document` objects, which are stored in the `pages` variable. Each `Document` holds the text in `page_content` and the source file and page number in `metadata`.

In [30]:
#loads a PDF file and splits it into pages using LangChain

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("AI-Risks.pdf")
pages = loader.load_and_split()

pages

[Document(metadata={'source': 'AI-Risks.pdf', 'page': 0}, page_content='RISKS FROM AI\nAn Overview of\nCatastrophic AI\nRisks\nArtificial intelligence (AI) has recently seen rapid advancements, raising\nconcerns among experts, policymakers, and world leaders about its potential\nrisks. As with all powerful technologies, advanced AI must be handled with\ngreat responsibility to manage the risks and harness its potential.\nNARRATED RENDITION:\nThe narration covers the full paper, offering more depth than the overview.\n0:000:00\n/ 3:17:51/ 3:17:51\nCareers\nDonateAbout\nusAI\xa0Risk Contact Our work\ue603 Resources\ue6037/17/24, 12:51 PM AI Risks that Could Lead to Catastrophe | CAIS\nhttps://www.safe.ai/ai-risk 1/41'),
 Document(metadata={'source': 'AI-Risks.pdf', 'page': 1}, page_content='Catastrophic AI risks\ncan be grouped under\nfour key categories\nwhich are summarized\nbelow.\nMalicious use\x00 People could intentionally\nharness powerful AIs to cause\nwidespread harm. AI could b
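`load()` yields one `Document` per PDF page, and `load_and_split()` then runs a text splitter over those pages (LangChain's default is a `RecursiveCharacterTextSplitter`), so a long page can become several chunks. The sketch below shows only the basic idea of fixed-size chunks that overlap; `chunk_text`, `chunk_size` and `overlap` are hypothetical names, and the real splitter additionally prefers to break on paragraph, line, and word boundaries.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` chars that share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which helps retrieval later.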

## Simple Prompt Template 

Create a prompt template that lets us ask the model questions about the PDF while supplying specific context. The template takes two parameters, `{context}` and `{question}`. Here `{context}` will be filled with the relevant documents from the document loader, which are stored in a local vector store and retrieved when the chain runs; `{question}` is the user's question.


In [31]:
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))



Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: Here is some context

Question: Here is a question
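With its default f-string format, `PromptTemplate` fills `{context}` and `{question}` much like Python's own `str.format`. The plain-Python sketch below reproduces the formatted prompt without LangChain and lists the placeholder names, which correspond to the input variables the template expects. It illustrates only the substitution step, not the template class's validation logic; `toy_template` is a hypothetical copy of the template above.

```python
import string

toy_template = """
Answer the question based on the context below. If you can't
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

# Placeholder names found in the template (None marks trailing literal text)
placeholders = [name for _, name, _, _ in string.Formatter().parse(toy_template) if name]
print(placeholders)  # ['context', 'question']

filled = toy_template.format(context="Here is some context", question="Here is a question")
print(filled)
```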



In [15]:
chain.input_schema.schema()

{'title': 'ChatOpenAIInput',
 'anyOf': [{'type': 'string'},
  {'$ref': '#/definitions/StringPromptValue'},
  {'$ref': '#/definitions/ChatPromptValueConcrete'},
  {'type': 'array',
   'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'},
     {'$ref': '#/definitions/HumanMessage'},
     {'$ref': '#/definitions/ChatMessage'},
     {'$ref': '#/definitions/SystemMessage'},
     {'$ref': '#/definitions/FunctionMessage'},
     {'$ref': '#/definitions/ToolMessage'}]}}],
 'definitions': {'StringPromptValue': {'title': 'StringPromptValue',
   'description': 'String prompt value.',
   'type': 'object',
   'properties': {'text': {'title': 'Text', 'type': 'string'},
    'type': {'title': 'Type',
     'default': 'StringPromptValue',
     'enum': ['StringPromptValue'],
     'type': 'string'}},
   'required': ['text']},
  'ToolCall': {'title': 'ToolCall',
   'type': 'object',
   'properties': {'name': {'title': 'Name', 'type': 'string'},
    'args': {'title': 'Args', 'type': 'object'},
    'id': {

## Storing & Embedding Pages (Document Chunks) in Vector Store

`DocArrayInMemorySearch` is a local, in-memory vector store, as opposed to Pinecone, which stores vectors in the cloud. It takes the pages produced by the document loader and embeds each one using the `embeddings` instance created earlier for the selected model.

In [34]:

from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(
    pages, 
    embedding=embeddings
)
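Conceptually, an in-memory vector store keeps each text next to its embedding vector and, for a query, returns the texts whose vectors are most similar to the query's vector. The sketch below makes that concrete with a hypothetical `toy_embed` function (word counts standing in for a real embedding model) and cosine similarity; it is an illustration, not how `DocArrayInMemorySearch` is implemented. The `k=4` default mirrors the number of documents the retriever returns by default.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Keeps (text, vector) pairs in a Python list, like an in-memory store."""

    def __init__(self, texts: list[str]):
        self.items = [(t, toy_embed(t)) for t in texts]

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "rogue ai systems may pursue unintended goals",
    "an ai race between nations could escalate conflicts",
    "strict access controls reduce malicious use",
])
print(store.similarity_search("malicious use of ai", k=1))
```

Real embedding models capture meaning rather than exact word overlap, but the store-then-rank flow is the same.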

## Add Retriever to Chain

This retriever fetches the top 4 most relevant documents (the default `k`) by comparing the query's embedding with the embeddings stored in `DocArrayInMemorySearch`.

In [35]:
retriever = vectorstore.as_retriever()

retriever.invoke("Machine Learning")

[Document(metadata={'source': 'AI-Risks.pdf', 'page': 22}, page_content="Accidents Are Hard to Avoid\nAccidents in complex systems may be\ninevitable, but we must ensure that\naccidents don't cascade into catastrophes.\nThis is especially difficult for deep learning\nsystems, which are highly challenging to\ninterpret.\nTechnology can advance much faster than\npredicted: in 1901, the Wright brothers\nclaimed that powered flight was fifty years\naway, just two years before they achieved it.\nUnpredictable leaps in AI capabilities, such\nas AlphaGo's triumph over the world’s best\nGo player, and GPT\x004's emergent\ncapabilities, make it difficult to anticipate\nfuture AI risks, let alone control them.\nIdentifying risks tied to new technologies\noften takes years. Chlorofluorocarbons\n\x00CFCs), initially considered safe and used in\naerosol sprays and refrigerants, were later\nfound to deplete the ozone layer. This\nhighlights the need for cautious technology\nrollouts and extended tes

In [37]:
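from operator import itemgetter

# The dict step receives {"question": ...}; every entry gets that same input.
#   "context":  extract the question and send it to the retriever
#   "question": pass the question through unchanged
chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)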

chain.invoke({"question": "What is the name of instructor?"})

"I don't know."

In [21]:
# itemgetter("question") returns a function that looks up the "question" key

itemgetter("question")({"question": "What is machine learning?"})

'What is machine learning?'
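When a dict appears in a chain, LangChain turns it into a `RunnableParallel`: every value receives the same input, and the results are collected under the same keys, producing the `{"context": ..., "question": ...}` dict the prompt needs. The sketch below imitates that step with plain functions; `run_parallel`, `compose` and `fake_retriever` are hypothetical helpers, and unlike LangChain the toy runs the branches one after another rather than concurrently.

```python
from operator import itemgetter


def run_parallel(steps: dict, value):
    """Toy version of a dict step in a chain: every entry gets the same input."""
    return {key: step(value) for key, step in steps.items()}


def compose(first, second):
    """Toy equivalent of `first | second` for plain functions."""
    return lambda value: second(first(value))


def fake_retriever(query: str) -> list[str]:
    # Hypothetical stand-in for retriever.invoke
    return [f"doc about {query}"]


steps = {
    "context": compose(itemgetter("question"), fake_retriever),
    "question": itemgetter("question"),
}
print(run_parallel(steps, {"question": "What is machine learning?"}))
```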

In [52]:
questions = [
    "What are the four key categories of catastrophic AI risks mentioned in the summary?",
    "How could AI be used maliciously to engineer new pandemics?",
    "What measures are suggested to mitigate the risks of malicious use of AI?",
    "What are the potential dangers of an AI race between nations and corporations?",
    "How could AI-driven autonomous weapons escalate conflicts and impact global security?",
    "What organizational risks are associated with the development of advanced AI?",
    "Why is fostering a safety-oriented organizational culture important for AI development?",
    "What are some proposed measures to prevent and manage organizational risks in AI labs?",
    "What are the risks associated with rogue AIs, and how can they deviate from their original goals?",
    "What are the suggestions provided to mitigate the risks posed by rogue AIs?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()

Question: What are the four key categories of catastrophic AI risks mentioned in the summary?
Answer: The four key categories of catastrophic AI risks mentioned in the summary are malicious use, AI races, organizational risks, and rogue AIs.

Question: How could AI be used maliciously to engineer new pandemics?
Answer: AI could be used maliciously to engineer new pandemics by providing non-experts with access to directions and designs for synthesizing deadly pathogens while evading safeguards. Additionally, AI could assist in protein synthesis and predicting protein structures, allowing for the development of biological agents that are more deadly, transmissible, and resistant to treatments than natural pandemics.

Question: What measures are suggested to mitigate the risks of malicious use of AI?
Answer: To mitigate the risks from malicious use of AI, the following measures are suggested:
1. Implement strict access controls for AIs with capabilities in biological research to prevent r

## Output Stream & Batch Handling

In [48]:
# Run the chain on every question at once; batch() returns the answers in input order
chain.batch([{"question": q} for q in questions])

['The four key categories of catastrophic AI risks mentioned in the summary are malicious use, AI races, organizational risks, and rogue AIs.',
 'AI could be used maliciously to engineer new pandemics by providing non-experts with access to directions and designs for synthesizing deadly pathogens, evading safeguards, and increasing the number of people who can develop biological agents, thus multiplying the risks of an engineered pandemic. Additionally, AI could help discover and unleash novel chemical and biological weapons by providing step-by-step instructions for producing biological and chemical weapons and facilitating malicious use.',
 'The measures suggested to mitigate the risks of malicious use of AI include strict access controls for AIs with capabilities in biological research, removal of biological capabilities from AIs intended for general use, exploring ways to use AI for biosecurity, investing in general biosecurity interventions such as early detection of pathogens thr
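By default, `batch()` runs the individual `invoke` calls concurrently on a thread pool and returns the results in the same order as the inputs (the `max_concurrency` config option caps the number of parallel calls). The sketch below reproduces that behaviour with the standard library; `fake_invoke` and `toy_batch` are hypothetical stand-ins for `chain.invoke` and `chain.batch`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_invoke(inputs: dict) -> str:
    """Hypothetical stand-in for chain.invoke: slow, then echoes the question."""
    time.sleep(0.05)
    return f"Answer to: {inputs['question']}"


def toy_batch(func, inputs: list, max_workers: int = 4) -> list:
    # executor.map returns results in the same order as `inputs`
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


sample_questions = ["Q1", "Q2", "Q3"]
print(toy_batch(fake_invoke, [{"question": q} for q in sample_questions]))
```

Because each call mostly waits on the network, threads let the requests overlap, which is why `batch()` is usually faster than the sequential loop above.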

In [51]:
for s in chain.stream({"question": "What are the four key categories of catastrophic AI risks mentioned in the summary?"}):
    print(s, end="", flush=True)

The four key categories of catastrophic AI risks mentioned in the summary are malicious use, AI races, organizational risks, and rogue AIs.
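`stream()` yields the answer in chunks as the model produces them, and the `StrOutputParser` at the end of the chain passes each chunk through as a string, so the loop can print pieces immediately with `end=""` and `flush=True`. The generator below is a hypothetical stand-in (`fake_stream`) that shows the same consumption pattern without calling a model.

```python
from typing import Iterator


def fake_stream(answer: str, chunk_size: int = 5) -> Iterator[str]:
    """Hypothetical stand-in for chain.stream: yields the answer piece by piece."""
    for start in range(0, len(answer), chunk_size):
        yield answer[start:start + chunk_size]


pieces = []
for s in fake_stream("malicious use, AI races"):
    print(s, end="", flush=True)  # same printing pattern as the cell above
    pieces.append(s)
print()
```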