# Building a RAG model for retrieving information about the latest cyber attacks

This project uses NLP and machine learning to provide semantic search and conversational question answering over articles about cyber attacks. Cohere's Embed API converts text into embeddings that capture its semantic meaning, the FAISS library searches those embeddings efficiently for the passages most relevant to a query, and Cohere's Chat API lets users ask questions in natural language. LangChain ties these components together into a single retrieval-augmented generation (RAG) pipeline, so users can query the collected material conversationally instead of through keyword search.
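The retrieve-then-generate flow can be sketched in plain Python to show how the pieces fit together. This is a toy illustration under stated assumptions, not the notebook's implementation: word overlap stands in for Cohere embeddings plus FAISS search, the three corpus strings are invented, and `retrieve` and `build_prompt` are hypothetical helpers. No LLM is called; the sketch stops at the augmented prompt a chat model would receive.

```python
import re

# Toy corpus standing in for the chunked web pages (invented text).
CHUNKS = [
    "Phishing uses fraudulent emails to trick users into revealing credentials.",
    "A denial-of-service attack floods a server with traffic until it fails.",
    "Supply chain attacks compromise a trusted vendor to reach its customers.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query (a stand-in for embedding similarity)."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user's question with the retrieved chunks before generation."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


question = "How does a phishing email steal credentials?"
print(build_prompt(question, retrieve(question, CHUNKS)))
```

In the real pipeline, `retrieve` is replaced by embedding similarity, which can also match paraphrases (for example "email" and "emails") that a word-overlap score misses.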

### Saving the API key

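Add your Cohere API key as an environment variable; the LangChain Cohere integrations read it from `COHERE_API_KEY`.

```python
import os
from getpass import getpass

# Prompt for the key instead of hard-coding it in the notebook.
os.environ["COHERE_API_KEY"] = getpass("Cohere API key: ")
```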
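If the key is left blank, the error only surfaces in the later cells that use Cohere, far from where the key was set. A minimal sketch of a fail-fast check follows; `require_api_key` is a hypothetical helper, not part of LangChain or the Cohere SDK.

```python
import os


def require_api_key(name: str = "COHERE_API_KEY") -> str:
    """Return the named environment variable, raising if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add your key before continuing.")
    return value
```

Calling `require_api_key()` right after setting the key stops the notebook at the cell where the problem actually is.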

### Install dependencies

```python
%pip install --quiet langchain langchain-community langchain-cohere faiss-cpu tiktoken beautifulsoup4
```
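After installation, a quick check that the core packages are present can save debugging later. This sketch uses only the standard library's `importlib.metadata`; `report_versions` is a hypothetical helper, and the package list mirrors the core packages from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Core distributions installed by the pip command above.
REQUIRED = ["langchain-community", "faiss-cpu", "tiktoken", "langchain-cohere"]


def report_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in report_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```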


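Load the two reference pages and split them into chunks of up to 512 tokens (counted with tiktoken) so that each chunk can be embedded separately.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_cohere import CohereEmbeddings
from langchain_community.document_loaders import WebBaseLoader

# Embedding model used for both the document chunks and the queries.
embd = CohereEmbeddings(model="embed-english-v3.0")

# Source pages describing common types of cyber attacks.
urls = [
    "https://www.fortinet.com/resources/cyberglossary/types-of-cyber-attacks",
    "https://www.crowdstrike.com/cybersecurity-101/cyberattacks/most-common-types-of-cyberattacks/"
]

# Each loader returns a list of Documents; flatten them into a single list.
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

# Split into chunks of at most 512 tokens (measured with tiktoken), without overlap.
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)
```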
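`RecursiveCharacterTextSplitter` tries a list of separators in order (by default paragraph breaks, line breaks, spaces, then single characters) and only falls back to a finer separator when a piece is still too long. The sketch below re-implements that idea in simplified form; it is not LangChain's code. It assumes whitespace-separated words as the length measure, standing in for the tiktoken token count that `from_tiktoken_encoder` uses, and like the notebook's `chunk_overlap=0` it produces no overlap between chunks. `split_recursive` and `word_count` are hypothetical names.

```python
def word_count(text: str) -> int:
    """Length function: whitespace-separated words (a stand-in for tiktoken tokens)."""
    return len(text.split())


def split_recursive(
    text: str,
    max_words: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split text into chunks of at most max_words words, preferring coarser separators."""
    if word_count(text) <= max_words:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: cut the word list into fixed-size pieces.
        words = text.split()
        return [" ".join(words[i : i + max_words]) for i in range(0, len(words), max_words)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in (p for p in text.split(sep) if p.strip()):
        if word_count(piece) > max_words:
            # Piece is too big on its own: flush what we have and split it more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_recursive(piece, max_words, finer))
            continue
        candidate = f"{current}{sep}{piece}" if current else piece
        if word_count(candidate) <= max_words:
            current = candidate  # Merge small pieces while they still fit.
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta gamma.\n\ndelta epsilon.\n\nzeta eta theta iota kappa lambda"
print(split_recursive(sample, max_words=4))
```

Paragraph boundaries are kept whenever a paragraph fits; only the six-word paragraph is broken further, at spaces.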



### Storing the chunks in a vector store

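```python
from langchain_community.vectorstores import FAISS

# Embed every chunk with Cohere and index the resulting vectors in FAISS.
vectorstore = FAISS.from_documents(
    documents=doc_splits,
    embedding=embd,
)

# Expose the index as a retriever that returns the chunks most similar to a query.
vectorstore_retriever = vectorstore.as_retriever()
```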
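Conceptually, the vector store embeds every chunk once and, at query time, returns the chunks whose vectors lie closest to the query vector. LangChain's FAISS wrapper uses an exact flat index with Euclidean distance by default, and `as_retriever()` returns the top 4 matches unless `search_kwargs` says otherwise. The sketch below shows that exact nearest-neighbour search in plain Python; `FlatL2Index` is a hypothetical class, and the 3-dimensional vectors are invented for illustration, whereas real embeddings have hundreds or thousands of dimensions.

```python
import math


class FlatL2Index:
    """Exact (brute-force) nearest-neighbour search ranked by Euclidean distance."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.payloads: list[str] = []

    def add(self, vector: list[float], payload: str) -> None:
        """Store a vector together with the text chunk it was computed from."""
        self.vectors.append(vector)
        self.payloads.append(payload)

    def search(self, query: list[float], k: int = 4) -> list[tuple[str, float]]:
        """Return up to k stored payloads closest to the query, nearest first."""
        scored = [
            (payload, math.dist(query, vec))
            for payload, vec in zip(self.payloads, self.vectors)
        ]
        scored.sort(key=lambda item: item[1])
        return scored[:k]


index = FlatL2Index()
index.add([0.9, 0.1, 0.0], "chunk about phishing")
index.add([0.1, 0.9, 0.0], "chunk about DoS")
index.add([0.0, 0.2, 0.9], "chunk about supply chain attacks")
print(index.search([0.8, 0.2, 0.1], k=2))
```

FAISS's flat L2 index reports squared distances rather than plain distances, but the ranking of results is the same. A flat index compares the query against every vector, which is fine for a few dozen chunks like these.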

Create a retriever tool that the agent can call to retrieve information from the vector database.

```python
from langchain.tools.retriever import create_retriever_tool

# The agent reads the tool's name and description to decide when to search the vector store.
vectorstore_search = create_retriever_tool(
    retriever=vectorstore_retriever,
    name="vectorstore_search",
    description="Retrieve relevant info from a vectorstore that contains documents related to Cyber Attacks",
)
```
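A retriever tool is essentially a named function with a description: the agent reads the name and description to decide when to call it, passes a query string, and receives the retrieved chunks as one block of text. The class below is a hypothetical, dependency-free stand-in (`SimpleRetrieverTool` and `fake_retrieve` are not LangChain names) that mirrors that shape; joining chunks with a blank line is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleRetrieverTool:
    """Hypothetical minimal tool: a name and description for the agent, plus a retrieval function."""

    name: str
    description: str
    retrieve: Callable[[str], list[str]]

    def run(self, query: str) -> str:
        """Retrieve chunks for the query and join them into one string for the LLM."""
        return "\n\n".join(self.retrieve(query))


def fake_retrieve(query: str) -> list[str]:
    """Stand-in retriever that returns canned chunks mentioning the query."""
    return [f"Chunk 1 about {query}", f"Chunk 2 about {query}"]


tool = SimpleRetrieverTool(
    name="vectorstore_search",
    description="Retrieve relevant info from a vectorstore that contains documents related to Cyber Attacks",
    retrieve=fake_retrieve,
)
print(tool.run("phishing"))
```

Because the description is the agent's only guide to what the tool covers, it is worth keeping it specific to the indexed content.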

## Chat Model

```python
from langchain.agents import AgentExecutor
from langchain_cohere.react_multi_hop.agent import create_cohere_react_agent
from langchain_core.prompts import ChatPromptTemplate
```

```python
# LLM
from langchain_cohere.chat_models import ChatCohere

# Cohere's Command R+ model; a low temperature makes the output less random.
chat = ChatCohere(model="command-r-plus", temperature=0.3)

# Preamble (system instructions), passed to the agent when it is invoked
preamble = """
You are an expert who answers the user's question using the most relevant data source.
You are equipped with a special vectorstore containing information about the types of cyber attacks and the most common ones.
If the query is about types of cyber attacks or common cyber attacks, use the vectorstore search.
"""

# Prompt: the user's question is passed through unchanged
prompt = ChatPromptTemplate.from_template("{input}")

# Create the ReAct agent and give it access to the retriever tool
agent = create_cohere_react_agent(
    llm=chat,
    tools=[vectorstore_search],
    prompt=prompt,
)
```

```python
# Wrap the agent in an executor that runs its tool calls (set verbose=True to print each step).
agent_executor = AgentExecutor(
    agent=agent, tools=[vectorstore_search], verbose=False
)
```
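`AgentExecutor` drives the agent in a loop: at each step the model either requests a tool call or returns a final answer, and the executor runs the requested tool and feeds its output back as an observation. The sketch below reproduces that control flow with a scripted stand-in for the LLM so it runs offline. `run_agent`, `scripted_model`, the `Action`/`Finish` types, and the `max_steps` cap are hypothetical names for illustration, not LangChain's API.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Action:
    """The model asks to call a tool with some input."""
    tool: str
    tool_input: str


@dataclass
class Finish:
    """The model returns its final answer."""
    output: str


Step = Union[Action, Finish]


def run_agent(
    model: Callable[[str, list[tuple[Action, str]]], Step],
    tools: dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate between model decisions and tool calls until the model finishes."""
    history: list[tuple[Action, str]] = []
    for _ in range(max_steps):
        step = model(question, history)
        if isinstance(step, Finish):
            return step.output
        observation = tools[step.tool](step.tool_input)
        history.append((step, observation))
    raise RuntimeError("Agent did not finish within max_steps")


def scripted_model(question: str, history: list[tuple[Action, str]]) -> Step:
    """Stand-in LLM: search once, then answer from the latest observation."""
    if not history:
        return Action(tool="vectorstore_search", tool_input=question)
    return Finish(output=f"Based on the documents: {history[-1][1]}")


tools = {"vectorstore_search": lambda q: "Phishing and malware are common."}
print(run_agent(scripted_model, tools, "Tell me the most common cyber attacks"))
```

A multi-hop agent simply takes more turns through the same loop, issuing several searches before it finishes; the step cap guards against a model that never does.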

### Chat with the model

```python
response = agent_executor.invoke(
    {
        "input": "Tell me the most common cyber attacks",
        "preamble": preamble,
    }
)
print(response["output"])
```

Output:

The most common types of cyberattacks include:
- Malware
- Denial-of-Service (DoS) Attacks
- Phishing
- Spoofing
- Identity-Based Attacks
- Code Injection Attacks
- Supply Chain Attacks
- Social Engineering Attacks
- Insider Threats
- DNS Tunneling
- IoT-Based Attacks
- AI-Powered Attacks


```python
response = agent_executor.invoke(
    {
        "input": "What are AI-powered attacks?",
        "preamble": preamble,
    }
)
print(response["output"])
```

Output:

AI-powered attacks are those that leverage AI and ML technology to access a network or steal sensitive information. Examples include:
- Adversarial AI/ML: Seeks to disrupt the operations of AI and ML systems by manipulating or misleading them, often by introducing inaccuracies in training data
- Dark AI: Specifically engineered to exploit vulnerabilities, often going unnoticed until the damage is done
- Deepfakes: AI-generated forgeries that appear very real and have the potential to reshape public opinion, damage reputations, and sway political landscapes
- AI-generated social engineering: Attackers create fake chatbots or virtual assistants capable of having human-like interactions and engaging in conversations with users to get them to provide sensitive information
