# The OpenSource TestPlan Playbook
_[Based on gkamradt's Ask A Book Questions](https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Ask%20A%20Book%20Questions.ipynb)_

This Jupyter notebook is an example of how to load data into a vector database so that it can provide context to a large language model (LLM).
LLMs are hard to avoid these days, and they can be really useful tools for getting answers when you are discovering a new library.
The idea here was to create a POC and play around with OpenAI on the open-source TestPlan documentation, so we can chat with it. LLMs can be used to summarize long texts, but there is an upper limit on how long the input text can be.

**How does it work?**

Fortunately, we can "expand" the model's knowledge without training an LLM on our specific content. We use OpenAI Embeddings to turn our data into vectors that can be searched. Put simply, we convert both our data and the query into vectors with 1536 dimensions. Vector databases provide a way to run a similarity search over these vectors. We retrieve the chunks most similar to the query and pass them to the GPT API as context, which it should use to answer our question.
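
To make the similarity search concrete, here is a minimal sketch that uses hand-made three-dimensional vectors instead of real 1536-dimensional embeddings. The `cosine_similarity` and `top_k` helpers and the toy data are illustrative assumptions, not part of the OpenAI or Pinecone APIs; a vector database does this kind of ranking for you at scale.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Rank stored (text, vector) pairs by similarity to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": hypothetical values chosen by hand for illustration.
stored_chunks = [
    ("How to write a driver", [0.9, 0.1, 0.0]),
    ("Supported Python versions", [0.0, 0.2, 0.9]),
    ("Starting a custom application", [0.7, 0.3, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of a question about drivers

best_matches = top_k(query_vector, stored_chunks, k=2)
print(best_matches)
```

The texts returned by `top_k` play the role of the context that is later passed to the GPT API.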

**Want to try it out?**

- You will need an OpenAI API key. Make sure to set a monthly limit on your usage, so you can still pay your rent. Jokes aside, embedding the whole documentation cost less than a dollar, BUT be sure to set that usage limit: text completion costs significantly more.
- You will need a Pinecone account. There, create an index called 'testplan' with a dimension of 1536. This is the vector size that the OpenAI embeddings use.


_Make sure not to share your API keys with anyone._

In [29]:
from langchain.document_loaders import UnstructuredURLLoader
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse, urldefrag
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings


##### Scraping the URLs from the OpenSource TestPlan Documentation

In [None]:
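def get_all_links(url):
    """Yield every link found on the page as an absolute URL."""
    # A timeout keeps the crawl from hanging on an unresponsive page.
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    for link in soup.find_all("a"):
        href = link.get("href")
        if href:
            # Resolve relative links against the page they were found on.
            yield urljoin(url, href)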

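def is_download_link(url, base_url):
    """Return True for downloadable files, which we do not want to index."""
    # base_url is unused but kept so existing callers keep working.
    path = urlparse(url).path
    # The docs live under a prefix such as /en/latest/, so look for the
    # _downloads segment anywhere in the path, not only at the start.
    return "/_downloads/" in path or path.endswith(".py")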

def main(base_url):
    visited_urls = set()
    urls_to_visit = {base_url}
    all_urls = set()

    while urls_to_visit:
        current_url = urls_to_visit.pop()
        if current_url in visited_urls:
            continue

        print(f"Visiting: {current_url}")
        visited_urls.add(current_url)

        for link in get_all_links(current_url):
            link_no_fragment, _ = urldefrag(link)
            if (
                link_no_fragment.startswith(base_url)
                and not is_download_link(link_no_fragment, base_url)
                and link_no_fragment.endswith('.html')
            ):
                all_urls.add(link_no_fragment)
                urls_to_visit.add(link_no_fragment)

    print("\nAll URLs:")
    for url in all_urls:
        print(url)
    
    return all_urls

base_url = "https://testplan.readthedocs.io/en/latest/"
all_urls = main(base_url)
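
The filtering rule in the crawler can be tried out without any network access. The sketch below uses a hypothetical helper, `should_crawl`, that approximates the condition applied inside `main`; the candidate links are made up to exercise each branch.

```python
from urllib.parse import urldefrag, urlparse


def should_crawl(link, base_url):
    """Return the fragment-free URL if the crawler would keep it, else None."""
    url, _fragment = urldefrag(link)
    path = urlparse(url).path
    is_download = "/_downloads/" in path or path.endswith(".py")
    if url.startswith(base_url) and not is_download and url.endswith(".html"):
        return url
    return None


base_url = "https://testplan.readthedocs.io/en/latest/"
# Hypothetical links, invented for this illustration.
candidates = [
    base_url + "getting_started.html#installation",  # kept, fragment dropped
    base_url + "_downloads/example.py",              # rejected: download link
    "https://example.com/index.html",                # rejected: other site
    base_url + "index.html",                         # kept
]
kept = [url for url in (should_crawl(c, base_url) for c in candidates) if url]
print(kept)
```

Dropping the fragment matters because `page.html#a` and `page.html#b` point at the same document, and we only want to load each page once.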

##### Load the pages from the URLs into raw documents

In [4]:
# The loader expects a list; sorting also makes the load order reproducible.
loader = UnstructuredURLLoader(urls=sorted(all_urls))
raw_documents = loader.load()

##### Split the data into chunks

In [8]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(raw_documents)
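
To illustrate what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-width splitter. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which tries to split at natural separators such as paragraph and line breaks; `split_fixed` is a hypothetical helper written only for this sketch.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of chunk_size characters that share chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"  # ten characters of stand-in text
no_overlap = split_fixed(sample, chunk_size=4)
with_overlap = split_fixed(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With an overlap of 0, as used in this notebook, each character lands in exactly one chunk; a non-zero overlap repeats text across neighbouring chunks so that a sentence cut at a boundary still appears whole in one of them.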

In [35]:
OPENAI_API_KEY = '...'
PINECONE_API_KEY = '...'
PINECONE_API_ENV = '...'
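
Since the keys should never be shared, one option is to keep them out of the notebook entirely and read them from environment variables. This is a minimal sketch: `require_env` is a hypothetical helper, and the placeholder value is set only so the example runs end to end.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return value


# For demonstration only: a placeholder so the sketch runs without a real key.
os.environ.setdefault("OPENAI_API_KEY", "placeholder-not-a-real-key")

OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```

The same pattern would apply to `PINECONE_API_KEY` and `PINECONE_API_ENV`.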

##### Create the OpenAIEmbeddings object used to turn each text chunk into a vector

In [36]:
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [15]:
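import pinecone
from langchain.vectorstores import Pinecone

pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = "testplan"  # must match the index created in the Pinecone console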

##### Embed the chunks and push them into the Pinecone database

In [16]:
docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

##### Ask the question

In [17]:
query = 'How to create a new driver for a custom application?'
docs = docsearch.similarity_search(query, include_metadata=True)

In [18]:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

In [19]:
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type='stuff')
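
The `'stuff'` chain type puts all retrieved documents into a single prompt. The sketch below is a rough illustration of that idea, not LangChain's actual prompt template; `build_stuff_prompt`, its wording, and the placeholder chunks are assumptions made for this example.

```python
def build_stuff_prompt(documents, question):
    """Join every retrieved chunk into one context block ahead of the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for the page_content of the retrieved docs.
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]
prompt = build_stuff_prompt(retrieved, "How do I create a new driver?")
print(prompt)
```

Because everything goes into one prompt, the retrieved chunks plus the question have to fit within the model's input limit, which is why the documentation is split into chunks first.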

##### Run the chain

In [28]:
chain.run(input_documents=docs, question=query)

' You can run tests interactively by running your testplan with the "-i" flag. This will start up a web UI that you can access at http://localhost:39514/interactive/. This will allow you to run individual testcases and testsuites on-demand and control test environments on-demand.'


In [31]:
query = 'What python versions are supported by TestPlan?'
docs = docsearch.similarity_search(query, include_metadata=True)
chain.run(input_documents=docs, question=query)

' Testplan is tested to work with Python 3.7 and 3.8, 3.10 and 3.11 so we recommend choosing one of those.'

In [32]:
query = 'How can you make parameterized tests?'
docs = docsearch.similarity_search(query, include_metadata=True)
chain.run(input_documents=docs, question=query)

' You can make parameterized tests by passing a dictionary of lists/tuples or list of dictionaries/tuples as parameters value to the testcase method declaration. See the downloadable example for more information.'
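
The last cells repeat the same two steps for every question, so they could be wrapped in a small helper. This is a sketch under the assumption that `docsearch` and `chain` behave as in the cells above; `ask` is a hypothetical name, and the two fake classes exist only so the example runs without API keys.

```python
def ask(docsearch, chain, question):
    """Retrieve the most similar chunks and run the QA chain on them."""
    docs = docsearch.similarity_search(question, include_metadata=True)
    return chain.run(input_documents=docs, question=question)


# Stand-ins so the sketch runs offline; in the notebook you would pass the
# real `docsearch` and `chain` objects instead.
class FakeSearch:
    def similarity_search(self, query, **kwargs):
        return [f"chunk about: {query}"]


class FakeChain:
    def run(self, input_documents, question):
        return f"{len(input_documents)} chunk(s) used for: {question}"


answer = ask(FakeSearch(), FakeChain(), "How can you make parameterized tests?")
print(answer)
```

In the notebook itself the call would look like `ask(docsearch, chain, 'What python versions are supported by TestPlan?')`.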