# Homework 8
## Local RAG Chatbot with Lily and Conversation History
Lily is a cybersecurity assistant. She is a Mistral fine-tune trained on 22,000 hand-crafted cybersecurity and hacking-related data pairs.
![image.png](https://huggingface.co/segolilylabs/Lily-7B-Instruct-v0.2/resolve/main/lily-7b.png)

(image by Bryan Hutchins, created with DALL-E 3)

Install LangChain using:

In [1]:
!pip install langchain

Collecting langchain
  Downloading langchain-0.1.16-py3-none-any.whl.metadata (13 kB)
Collecting SQLAlchemy<3,>=1.4 (from langchain)
  Downloading SQLAlchemy-2.0.29-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl.metadata (25 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-community<0.1,>=0.0.32 (from langchain)
  Downloading langchain_community-0.0.34-py3-none-any.whl.metadata (8.5 kB)
Collecting langchain-core<0.2.0,>=0.1.42 (from langchain)
  Downloading langchain_core-0.1.45-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain-text-splitters<0.1,>=0.0.1 (from langchain)
  Downloading langchain_text_splitters-0.0.1-py3-none-any.whl.metadata (2.0 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Downloading langsmith-0.1.49-py3-none-any.whl.m

[Ollama](https://ollama.ai/) lets you run open-source large language models, such as Llama 2, locally. Install Ollama using:

In [1]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Downloading ollama...
######################################################################## 100.0%
>>> Installing ollama to /usr/local/bin...
[sudo] password for mcnlab: 


### LLM Chain

In [1]:
from langchain_community.llms import Ollama
llm = Ollama(model="lily")

In [2]:
llm.invoke("How to test a website I built for security vulnerabilities before going to production?")

"\n\n### Response:\nOh, testing for security vulnerabilities before launching a website into the wild? Definitely an important step! As a cybersecurity professional, I'm more than happy to share some insights with you.\n\nOne of the key methods to test a website for security vulnerabilities is through automated tools and manual testing. It's like playing detective with code! There are various free online web application scanners available that can help identify common vulnerabilities such as SQL injection, cross-site scripting (XSS), and other potential threats. These scanners will provide you with a report detailing any vulnerabilities found on your website.\n\nNow, manual testing is just as crucial. It involves going through the application yourself to uncover any weaknesses that automated tools might miss. This could include checking for proper input validation, ensuring secure session management, and verifying the implementation of encryption protocols. Manual testing allows you to

Next, we load the data that we want to index. We will use `WebBaseLoader`, which requires BeautifulSoup:

In [11]:
!pip install beautifulsoup4



In [3]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.djangoproject.com/en/5.0/topics/security/")

docs = loader.load()

In [4]:
from langchain_community.embeddings import OllamaEmbeddings

# No model name is passed, so the class default is used (not necessarily "lily")
embeddings = OllamaEmbeddings()

We will use FAISS, a simple local vector store, to index the embedded documents.

In [13]:
!pip install faiss-cpu



In [5]:
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter


text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)
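
To make the splitting step more concrete, here is a minimal sketch of chunking with overlap in plain Python. It is a simplified fixed-width stand-in, not the recursive, separator-based algorithm that `RecursiveCharacterTextSplitter` uses; the function name `split_text` and the sizes are assumptions chosen for illustration.

```python
def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-width chunks that share `chunk_overlap` characters.

    Hypothetical helper for illustration only; the real splitter prefers
    to break on separators such as paragraphs and sentences.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 5  # 50 characters of toy input
chunks = split_text(sample, chunk_size=20, chunk_overlap=5)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.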

In [6]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)
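
Conceptually, "stuffing" means pasting the text of every retrieved document into the `{context}` slot of the prompt. The sketch below mimics that with plain string formatting. The `Doc` class and `stuff_prompt` function are hypothetical stand-ins, and joining documents with a blank line is an assumption of this sketch rather than a statement about the library's internals.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_prompt(docs: list[Doc], question: str) -> str:
    # Concatenate all document texts, then fill both template slots
    context = "\n\n".join(d.page_content for d in docs)
    return TEMPLATE.format(context=context, input=question)


toy_docs = [Doc("First chunk of text."), Doc("Second chunk of text.")]
prompt_text = stuff_prompt(toy_docs, "What do the chunks say?")
print(prompt_text)
```

Because every document is pasted in whole, this approach only works while the combined text fits in the model's context window.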

In [7]:
from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
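
The retriever's job is to return the chunks most similar to the query. The following toy sketch shows the idea with a bag-of-words "embedding" and cosine similarity. Both are assumptions made to keep the example dependency-free: the notebook itself embeds with `OllamaEmbeddings`, and FAISS may rank by a different distance metric.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared tokens divided by the product of vector norms
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (hypothetical helper)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "keep the secret key out of version control",
    "serve user uploads from a separate domain",
    "use https to protect data in transit",
]
top = retrieve("where should the secret key live", toy_chunks, k=1)
print(top)
```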

To make retrieval aware of the conversation, the chain below takes the most recent input (`input`) and the conversation history (`chat_history`) and uses an LLM to generate a search query.

In [8]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

# First we need a prompt that we can pass into an LLM to generate this search query
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up to get information relevant to the conversation")
])

retriever_chain = create_history_aware_retriever(llm, retriever, prompt)
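
The control flow behind a history-aware retriever can be sketched in a few lines: with no history the user input goes straight to the retriever, otherwise the LLM first rewrites the conversation into a standalone search query. The function below is a simplified illustration with stubbed components; `history_aware_retrieve`, `fake_llm`, and `fake_retriever` are hypothetical names, not LangChain APIs.

```python
from typing import Callable


def history_aware_retrieve(
    llm: Callable[[str], str],
    retriever: Callable[[str], list[str]],
    chat_history: list[tuple[str, str]],
    user_input: str,
) -> list[str]:
    # No history: the input is already a standalone query
    if not chat_history:
        return retriever(user_input)
    # With history: ask the LLM to condense the conversation into a query
    transcript = "\n".join(f"{role}: {text}" for role, text in chat_history)
    query = llm(
        f"{transcript}\nuser: {user_input}\n"
        "Given the above conversation, generate a search query to look up "
        "to get information relevant to the conversation"
    )
    return retriever(query)


def fake_llm(prompt: str) -> str:
    """Stub that always returns the same rewritten query."""
    return "django SECRET_KEY storage"


def fake_retriever(query: str) -> list[str]:
    """Stub that echoes the query it was asked to search for."""
    return [f"results for: {query}"]


history = [("user", "I should probably store SECRET_KEY of django somewhere safe?"), ("ai", "Yes!")]
print(history_aware_retrieve(fake_llm, fake_retriever, history, "Tell me how to do it."))
```

The rewrite step matters because a follow-up such as "Tell me how to do it." carries no searchable keywords on its own.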

In [10]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = [HumanMessage(content="I should probably store SECRET_KEY of django somewhere safe?"), AIMessage(content="Yes!")]

retriever_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how to do it."
})

[Document(page_content='Support Us\n\nSponsor Django\nCorporate membership\nOfficial merchandise store\nBenevity Workplace Giving Program\n\n\n\n\n\n\n\n\nDjango\n\n\n\nHosting by In-kind\n            donors\n\nDesign by Threespot\n& andrevv\n\n© 2005-2024\n         Django Software\n          Foundation and individual contributors. Django is a\n        registered\n          trademark of the Django Software Foundation.', metadata={'source': 'https://docs.djangoproject.com/en/5.0/topics/security/', 'title': 'Security in Django | Django documentation | Django', 'description': '', 'language': 'en'}),
 Document(page_content="One class of attacks can be prevented by always serving user uploaded\ncontent from a distinct top-level or second-level domain. This prevents\nany exploit blocked by same-origin policy protections such as cross\nsite scripting. For example, if your site runs on example.com, you\nwould want to serve uploaded content (the MEDIA_URL setting)\nfrom something like userconte

Now that we have this new retriever, we can create a new chain to continue the conversation with these retrieved documents in mind.

In [11]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's questions based on the below context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)

retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

In [15]:
chat_history = [HumanMessage(content="I'm going to deploy my django app on the internet?"), AIMessage(content="OK!")]
response = retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me the security checks I need to do before I deploy it online!"
})
print(response['answer'])


AI: Sure, let's go through them step by step. 

Firstly, make sure you have a good handle on your dependencies and their versions. Keeping everything up-to-date is important for maintaining strong defenses against potential security vulnerabilities.

Secondly, let's talk about the use of SSL/HTTPS. This helps protect your data in transit between the user and server. It's a crucial component in maintaining a secure environment for your application.

Next up, we need to handle those pesky host headers. By doing so, we can prevent potential attackers from gaining unauthorized access to our precious app.

Referrer policy is next on our list! This ensures that only trusted parties can access our valuable data, preventing unwanted intruders or mischief-makers from wreaking havoc on our application.

Cross-origin opener policy comes in handy too! By setting this appropriately, we protect ourselves against potential attacks originating from other domains.

Session security is next on our list
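
For a multi-turn chatbot, the history has to grow after every exchange. Below is a minimal sketch of that bookkeeping, using a stub in place of `retrieval_chain`. The names `chat_turn` and `fake_chain` are hypothetical; with the real chain one would presumably call `retrieval_chain.invoke` and append `HumanMessage` and `AIMessage` objects instead of tuples.

```python
from typing import Callable


def chat_turn(
    chain: Callable[[dict], dict],
    chat_history: list[tuple[str, str]],
    user_input: str,
) -> str:
    """Run one turn and record both sides of the exchange in the history."""
    response = chain({"chat_history": chat_history, "input": user_input})
    answer = response["answer"]
    chat_history.append(("human", user_input))
    chat_history.append(("ai", answer))
    return answer


def fake_chain(inputs: dict) -> dict:
    """Stub chain that reports which turn it is and echoes the input."""
    turn = len(inputs["chat_history"]) // 2 + 1
    return {"answer": f"(turn {turn}) echo: {inputs['input']}"}


history: list[tuple[str, str]] = []
first = chat_turn(fake_chain, history, "q1")
second = chat_turn(fake_chain, history, "q2")
print(first)
print(second)
```

Appending only after the chain returns keeps the current question out of the history passed to the chain, which mirrors how the cells above pass `input` and `chat_history` separately.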