### ASSISTIVE SEARCH IMPLEMENTATION ON https://apidocs.document360.com/apidoc

- The content scraped from the web pages is stored in a CSV file and loaded with LangChain's `CSVLoader`.

- The OpenAI embedding model (`OpenAIEmbeddings`) converts each loaded document (one per CSV row) into an embedding vector. The embeddings are stored in a FAISS vector store, accessed through the LangChain framework, which supports fast similarity search. The index is also saved to disk so it can be reused later.
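Conceptually, the similarity search embeds the query and returns the stored documents whose vectors are closest to it. Assuming LangChain's default settings, `FAISS.from_documents` builds a flat index that compares the query against every stored vector using Euclidean (L2) distance. The pure-Python sketch below illustrates that exact search on a few invented 3-dimensional vectors; it shows the idea only, not FAISS's implementation, and real OpenAI embeddings have far more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=2):
    """Exact (brute-force) search: rank every stored vector by distance to the query."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance means more similar
    return scored[:k]


# Invented 3-dimensional vectors standing in for document embeddings.
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.8, 0.2, 0.1],
]
query_vec = [1.0, 0.0, 0.0]
print(top_k(query_vec, doc_vecs, k=2))
```

FAISS performs the same kind of ranking in optimized native code, which is what keeps it fast as the number of stored vectors grows.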



In [1]:
import pandas as pd
import os
import re
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA



In [2]:
doc360_api_docs = pd.read_csv("./data/document360_api_docs.csv")
doc360_api_docs.head()

| Unnamed: 0.1 | Unnamed: 0 | url | Title | http_Request | API_endpoint | Code |
|---|---|---|---|---|---|---|
| 0 | 0 | https://apidocs.document360.com/apidocs/get-ar... | Gets an article | GET | /v2/Articles/{articleId}/{langCode} | `curl --request GET \\n --url 'https://apihub....` |
| 1 | 1 | https://apidocs.document360.com/apidocs/gets-a... | Gets all version languages in the project | GET | /v2/Language/{projectVersionId} | `curl --request GET \\n --url https://apihub.d...` |
| 2 | 2 | https://apidocs.document360.com/apidocs/get-st... | Get the status of import | GET | /v2/Project/Import/{importId} | `curl --request GET \\n --url https://apihub.d...` |
| 3 | 3 | https://apidocs.document360.com/apidocs/import... | Import documentation | POST | /v2/Project/Import | `{\n "source_documentation_url": "string",\n "p...` |
| 4 | 4 | https://apidocs.document360.com/apidocs/export... | Start a new export | POST | /v2/Project/Export | `\n{\n "entity": "string",\n "version_id": [\n ...` |
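The two `Unnamed` columns are most likely row indexes written out by earlier `to_csv` calls (pandas writes the index by default and reads a blank header back as `Unnamed: 0`); that is an inference from the column names. They carry no information, and a blank-named index column appears to surface as the `: 0` line in the loaded documents, so it also gets embedded. A minimal sketch of dropping such columns before re-saving, on a small hypothetical frame with placeholder URLs:

```python
import io

import pandas as pd

# Hypothetical frame mimicking the head() output above, including the two
# "Unnamed" columns that look like leftover index columns.
df = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "url": ["https://example.com/a", "https://example.com/b"],
    "Title": ["Gets an article", "Start a new export"],
})

# Keep only the columns whose names do not start with "Unnamed".
cleaned = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# index=False stops to_csv from writing the row index as yet another unnamed column.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
print(buffer.getvalue())
```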


#### SIMILARITY SEARCH USING FAISS IN LANGCHAIN

In [3]:
# Load the file containing the scraped content (one Document per CSV row)
loader = CSVLoader(file_path="./data/document360_api_docs.csv")
data = loader.load()
print(data[0])

page_content=": 0\nurl: https://apidocs.document360.com/apidocs/get-article\nTitle: Gets an article\nhttp_Request: GET\nAPI_endpoint: /v2/Articles/{articleId}/{langCode}\nCode: curl --request GET \\\n  --url 'https://apihub.document360.io/v2/Articles/%7BarticleId%7D/en?isForDisplay=False&isPublished=False&appendSASToken=True' \\\n  --header 'accept: application/json' \\\n  --header 'api_token: REPLACE_KEY_VALUE'" metadata={'source': './data/document360_api_docs.csv', 'row': 0}
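The printed document shows how each CSV row is turned into text: every column becomes a `column: value` line in `page_content`, while the file path and row number go into `metadata`. The sketch below reproduces that text format with the standard `csv` module; `row_to_page_content` is a hypothetical helper written to match the output above, not LangChain's own code.

```python
import csv
import io


def row_to_page_content(row):
    """Hypothetical helper approximating the page_content format shown above:
    one 'column: value' line per column, joined with newlines."""
    return "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())


# A one-row stand-in for the scraped CSV (a subset of the real columns).
csv_text = (
    "url,Title,http_Request,API_endpoint\n"
    "https://apidocs.document360.com/apidocs/get-article,Gets an article,GET,"
    "/v2/Articles/{articleId}/{langCode}\n"
)
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(row_to_page_content(rows[0]))
```

Since the column names become part of the text that is embedded, renaming or dropping columns in the CSV changes what the similarity search compares against.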


In [4]:
# Embedding model (reads the API key from the OPENAI_API_KEY environment variable)
embeddings = OpenAIEmbeddings()

# Embed every document and index the vectors in FAISS
db = FAISS.from_documents(data, embeddings)

# Save the index to the local "faiss_index" folder so it can be reloaded later
db.save_local("faiss_index")

In [5]:
# Functions for validating the user input (API endpoint, HTTP request type)

def get_valid_api_endpoint():
    # Generic endpoint format: /v2/ followed by path segments and {placeholders}
    pattern = r'^/v2/[\w/{}]+/?$'
    while True:
        user_input = input("Enter the API endpoint (e.g., /v2/Articles/{articleId}/{langCode}): ").strip()
        # Accept the input only if it matches the pattern
        if re.match(pattern, user_input):
            return user_input
        print("Invalid API endpoint format. Please provide a valid endpoint.")

def get_http_request():
    while True:
        user_input = input("Enter type of http request (GET/POST/PUT/DELETE):").strip()
        # Accept the input only if it is one of the supported request types (case-insensitive)
        if user_input.lower() in ['get', 'post', 'put', 'delete']:
            return user_input
        print('Invalid request. Please provide a valid request')
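Because both functions mix `input()` with the checks, the validation logic cannot be tested without typing at a prompt. One option, sketched here with hypothetical function names, is to move the checks into pure functions that the input loops call. This version also strips stray whitespace and normalizes the method to upper case so it matches the `http_Request` column values (`GET`, `POST`, ...).

```python
import re

# Same generic endpoint format as get_valid_api_endpoint(), compiled once.
ENDPOINT_PATTERN = re.compile(r"/v2/[\w/{}]+/?")
HTTP_METHODS = {"GET", "POST", "PUT", "DELETE"}


def is_valid_endpoint(text):
    """Return True if the whole (stripped) text matches the /v2/... endpoint format."""
    return ENDPOINT_PATTERN.fullmatch(text.strip()) is not None


def normalize_http_method(text):
    """Return the upper-cased method, or None if it is not a supported method."""
    method = text.strip().upper()
    return method if method in HTTP_METHODS else None


print(is_valid_endpoint("/v2/Articles/{articleId}/{langCode}"))
print(normalize_http_method(" get "))
```

The interactive loops would then only repeat `input()` until `is_valid_endpoint(...)` is true or `normalize_http_method(...)` returns a value.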

        


In [6]:
# Prompt for the endpoint and request type, re-asking until each is valid
api_endpoint = get_valid_api_endpoint()
http_request = get_http_request()

# Build the search query
query = f"Provide code samples for {http_request} {api_endpoint}"

# Retrieve the documents most similar to the query (k=4 by default)
results = db.similarity_search(query)


Enter the API endpoint (e.g., /v2/Articles/{articleId}/{langCode}):  from langchain.chains import RetrievalQA`


Invalid API endpoint format. Please provide a valid endpoint.


Enter the API endpoint (e.g., /v2/Articles/{articleId}/{langCode}):  /v2/Articles/{articleId}/{langCode}
Enter type of http request (GET/POST/PUT/DELETE): get


#### CREATING THE RETRIEVAL QA CHAIN

In [8]:
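# Initialize the QA chain: the retriever fetches similar documents from FAISS, and the "stuff" chain passes them to the LLM in a single prompt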
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=db.as_retriever())
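With `chain_type="stuff"`, the documents returned by the retriever are all inserted ("stuffed") into one prompt, and the LLM is called once with it. The function below illustrates that idea with a simplified template; `stuff_prompt` and its wording are assumptions for illustration, not LangChain's exact prompt.

```python
def stuff_prompt(doc_texts, question):
    """Hypothetical illustration of the "stuff" strategy: every retrieved
    document is concatenated into a single context block in one prompt."""
    context = "\n\n".join(doc_texts)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


docs = [
    "Title: Gets an article\nhttp_Request: GET\nAPI_endpoint: /v2/Articles/{articleId}/{langCode}",
    "Title: Start a new export\nhttp_Request: POST\nAPI_endpoint: /v2/Project/Export",
]
print(stuff_prompt(docs, "Provide code samples for GET /v2/Articles/{articleId}/{langCode}"))
```

The trade-off of this strategy is that all retrieved documents must fit in the model's context window together. Here that is manageable because `db.as_retriever()` returns only a few short documents per query (four by default, as far as we know).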

In [9]:
# running the query to fetch the response
qa.run(query)


" The code sample for get /v2/Articles/{articleId}/{langCode} is:\n\ncurl --request GET \\\n  --url 'https://apihub.document360.io/v2/Articles/%7BarticleId%7D/en?isForDisplay=False&isPublished=False&appendSASToken=True' \\\n  --header 'accept: application/json' \\\n  --header 'api_token: REPLACE_KEY_VALUE'"