<a href="https://colab.research.google.com/github/jatinmayekar/ai_project_1/blob/main/TextQuery2PDFs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

README:
This Google Colab notebook:
- Installs the required libraries.
- Loads the OpenAI API key from a file stored in Google Drive.
- Converts a PDF file to text and cleans it.
- Splits the cleaned text into chunks, embeds them, and retrieves the chunks most similar to a user's question.
- Refines the answer with an OpenAI GPT model (gpt-3.5-turbo).

References:
- https://github.com/openai/tiktoken
- https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
- https://pypi.org/search/?q=tiktoken


In [1]:
# --- INSTALLATION ---
!pip install langchain unstructured[all-docs] openai chromadb tiktoken pypdf2

Collecting langchain
  Downloading langchain-0.0.309-py3-none-any.whl (1.8 MB)
Collecting unstructured[all-docs]
  Downloading unstructured-0.10.19-py3-none-any.whl (1.6 MB)
Collecting openai
  Downloading openai-0.28.1-py3-none-any.whl (76 kB)
Collecting chromadb
  Downloading chromadb-0.4.13-py3-none-any.whl (437 kB)
Collecting tiktoken
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)

In [2]:
# --- IMPORTS ---
import os
import PyPDF2
import re
from google.colab import drive, files
from unstructured.partition.auto import partition
from unstructured.documents.elements import NarrativeText
from langchain.text_splitter import NLTKTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
import openai

In [3]:
# --- SETUP ---

# Mount Google Drive to access the OpenAI API key
drive.mount('/content/drive')

# Load the OpenAI API key from Google Drive
file_path = '/content/drive/My Drive/ai_portfolio/openai_key.txt'
with open(file_path, 'r') as file:
    api_key = file.readline().strip()
os.environ["OPENAI_API_KEY"] = api_key
print("API Key loaded successfully!")

Mounted at /content/drive
API Key loaded successfully!
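
Reading the key with `readline()` succeeds even when the file is empty or starts with a blank line, so a missing key would only surface at the first API call. A minimal sketch of a stricter loader follows. `load_api_key` is a hypothetical helper, not part of the notebook, and it differs from the cell above in that it returns the first non-empty line rather than the first line.

```python
from pathlib import Path

def load_api_key(path):
    """Return the first non-empty line of the key file, or raise if there is none."""
    for line in Path(path).read_text().splitlines():
        key = line.strip()
        if key:
            return key
    raise ValueError(f"No API key found in {path}")

# Possible use in the setup cell (same path as above):
# os.environ["OPENAI_API_KEY"] = load_api_key('/content/drive/My Drive/ai_portfolio/openai_key.txt')
```

Failing early keeps the "loaded successfully" message from being printed for an empty key.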


In [4]:
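# Convert PDF to text
def pdf_to_text(pdf_path):
    """Extract the text of every page and join the pages with newlines."""
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        # Guard against pages with no extractable text; the newline separator
        # keeps the last line of one page from fusing with the next page
        text = '\n'.join(page.extract_text() or '' for page in reader.pages)
    return text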

# Clean the extracted text
def clean_text(text):
    # Remove every occurrence of the string "XXX"
    text = text.replace("XXX", "")

    # Strip whitespace and filter lines
    cleaned_lines = []
    for line in text.splitlines():
        line = line.strip()  # Remove leading and trailing whitespace

        # Skip lines containing any character outside the allowed set
        if re.search(r"[^a-zA-Z0-9.,!?;:'\" \-()]", line):
            continue

        # Skip lines that are blank or consist only of periods
        if not line.strip('.'):
            continue

        # If a line ends with a hyphen, remove that hyphen
        if line.endswith('-'):
            line = line[:-1]

        cleaned_lines.append(line)

    return '\n'.join(cleaned_lines)
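
The character filter in `clean_text` drops a whole line as soon as it contains a single character outside the whitelist. For a robot specification this may discard lines with units, slashes, or symbols. The sketch below reproduces only that filter on a few made-up sample lines (not taken from the PDF) to make the effect visible. `is_kept` is a hypothetical helper.

```python
import re

# Same whitelist pattern as in clean_text: any character outside it disqualifies the line
NON_STANDARD = re.compile(r"[^a-zA-Z0-9.,!?;:'\" \-()]")

def is_kept(line):
    """Return True if the character whitelist alone would keep this non-empty line."""
    line = line.strip()
    return bool(line) and not NON_STANDARD.search(line)

# Illustrative sample lines (made up, not taken from the PDF)
samples = [
    "Fine calibration is made by moving the axes.",
    "Payload: 0.5 kg/arm",      # '/' is outside the whitelist
    "Accuracy +/- 0.02 mm",     # '+' and '/' are outside the whitelist
    "Temperature 5°C to 40°C",  # '°' is outside the whitelist
]
for s in samples:
    print("keep" if is_kept(s) else "drop", "|", s)
```

If too much is lost this way, one option is to widen the whitelist, or to delete the offending characters instead of the whole line.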

In [5]:
pdf_path = "/content/3HAC052982 PS IRB 14000-en.pdf"
text = pdf_to_text(pdf_path)
cleaned_text = clean_text(text)
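
`clean_text` removes a trailing hyphen from a line but still emits the two halves of the word on separate lines, so a word broken across a line end stays split. A possible alternative is to rejoin such words on the raw text before it is split into lines. The sketch assumes that a line-final hyphen followed by a lowercase letter marks a soft break. `join_hyphenated` is a hypothetical helper.

```python
import re

def join_hyphenated(text):
    """Merge a word that was hyphenated across a line break back into one word."""
    # A hyphen directly before a line break, followed by a lowercase letter,
    # is treated as a soft line-break hyphen. This is an assumption: a genuine
    # compound that happens to break at its hyphen would lose the hyphen too.
    return re.sub(r"(\w)-\n([a-z])", r"\1\2", text)

sample = "the robot struc-\nture is open"
print(join_hyphenated(sample))  # the robot structure is open
```

It would be applied as `clean_text(join_hyphenated(text))`, leaving the rest of the cleaning unchanged.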

In [6]:
# Save the cleaned text
with open('/content/cleaned_text.txt', 'w') as file:
    file.write(cleaned_text)


In [7]:
# Download the cleaned text (optional)
files.download('/content/cleaned_text.txt')


In [8]:
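# Partition the cleaned text file and keep only narrative-text elements
link = "/content/cleaned_text.txt"
elements = partition(filename=link, content_type="text/plain")
input_text = ''.join(str(element) for element in elements if isinstance(element, NarrativeText))

# Split into sentence-based chunks of roughly 250 characters
nltksplit = NLTKTextSplitter(chunk_size=250)
nsplit = nltksplit.split_text(input_text)

# Embed each chunk with OpenAI embeddings and index the vectors in Chroma
embed = OpenAIEmbeddings()
db = Chroma.from_texts(nsplit, embed)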

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
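
With its default settings, `NLTKTextSplitter(chunk_size=250)` tokenizes the text into sentences and groups whole sentences into chunks of roughly 250 characters. The sketch below imitates that idea with the standard library only. It is a simplified approximation under stated assumptions (a naive regex in place of NLTK's sentence tokenizer, a single space as the joiner), not LangChain's implementation, and `split_into_chunks` is a hypothetical name.

```python
import re

def split_into_chunks(text, chunk_size=250):
    """Greedily pack whole sentences into chunks, aiming for at most chunk_size characters.

    A single sentence longer than chunk_size becomes its own oversized chunk.
    """
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

demo = "Standard calibration is sufficient. Fine calibration aligns the marks. Run the routine."
for chunk in split_into_chunks(demo, chunk_size=60):
    print(len(chunk), chunk)
```

Because sentences are never cut, a chunk can exceed the limit whenever the tokenizer fails to find a sentence boundary, for example when there is no space after a period.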


In [18]:
query = "what are the calibration angles for the yumi arms?"
outputtext = db.similarity_search(query)
print(outputtext[0].page_content)

This is further detailed in the ApplicationIf a service operation is done to a robot with the option Absolute Accuracy, a newabsolute accuracy calibration is required in order to establish full performance.For most cases after replacements that do not include taking apart the robotstructure, standard calibration is sufficient.Fine calibration is made by moving the axes so that the synchronization mark oneach joint is aligned, and running the CalHall routine.For detailed information on calibration of the robot see Productmanual-IRB14000.Absolute Accuracy is a calibration concept that improves TCP accuracy.
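
The retrieved chunk contains fused words such as "newabsolute" and "performance.For". One plausible cause, an assumption rather than something the notebook confirms, is that the text elements are concatenated with an empty separator before splitting. A minimal sketch of the difference, using made-up strings in place of real elements:

```python
# Stand-ins for the NarrativeText elements returned by partition (made-up strings)
elements = [
    "a new",
    "absolute accuracy calibration is required.",
    "For most cases, standard calibration is sufficient.",
]

fused = "".join(elements)    # what an empty separator produces
spaced = " ".join(elements)  # one space between consecutive elements

print(fused)
print(spaced)
```

Joining with a space (or a newline) would also give the sentence tokenizer the whitespace it relies on to detect sentence boundaries.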


In [12]:
print(outputtext[1].page_content)

The gripper has one basic servo module and two optional functionalmodules, vacuum and vision.

The three modules can be combined to provide fiveA pair of getting-started fingers are provided together with the gripper for demoand test purposes.


In [13]:
print(outputtext)

[Document(page_content='The robot has an open structurethat is especially adapted for flexible use, and can communicate extensively with1.1.1 Introduction to structure1.1.1.1 Robot descriptionThe difference between IRB 14000 and IRB 14000 Type A is that the Type A hasAs a result of this, the following parts differ between types:Those robots in original design are simply named IRB 14000 (no-type-specified).How to know which type the robot is?The following characteristics can be used to figure out the robot type.The robot type must be correctly selected when setting the arm configuration duringmay occur.Type A is available for selection as below only in RobotStudio 2019.5.3 or laterParticle emission from the robot ( IRB 14000 YuMi including gripper and suctioncup) fulfill Clean room class 5 standard according to DIN EN ISO 14644-1, -14 .According to IPA test result, the robot IRB 14000 YuMi is suitable for use in cleanClassification of airborne molecular contamination, see below:(Accordi
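
`similarity_search` embeds the query and returns the stored chunks whose embedding vectors lie nearest to it (LangChain returns four by default). The toy sketch below shows that ranking step in isolation. It assumes hand-made 3-dimensional vectors in place of real embeddings and Euclidean distance as the metric; the metric Chroma actually applies depends on its configuration, so this illustrates the idea rather than its implementation. `rank_by_distance` is a hypothetical helper.

```python
import math

def rank_by_distance(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks whose vectors lie closest to query_vec."""
    scored = sorted(
        (math.dist(query_vec, vec), name) for name, vec in chunk_vecs.items()
    )
    return [name for _, name in scored[:k]]

# Hand-made 3-dimensional vectors standing in for real embeddings
toy_chunks = {
    "calibration": [0.9, 0.1, 0.0],
    "gripper": [0.1, 0.8, 0.2],
    "clean room": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]
print(rank_by_distance(toy_query, toy_chunks))  # ['calibration', 'gripper']
```

The nearest chunk is only the most similar one available, not necessarily one that answers the question, which is consistent with the retrieved text above discussing calibration in general rather than calibration angles.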

In [20]:
openai.api_key = api_key

# Combine the top three retrieved chunks into one context string
context = "\n\n".join(doc.page_content for doc in outputtext[:3])

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": (
            "Show the user's initial query and craft a refined answer based on "
            "the query and the relevant chunks of text from the source PDF.\n\n"
            f"Query: {query}\n\nRelevant chunks:\n{context}"
        )},
    ],
)

# Extract the assistant's reply from the first choice
refined_output = response.choices[0].message["content"].strip()
print(refined_output)


Based on the initial query and the relevant chunk of text from the source PDF, it seems that the user is looking for information about the calibration angles for the Yumi arms. However, the provided text does not directly mention the calibration angles for the Yumi arms specifically. It talks about calibration in general and mentions different types of calibration methods and processes. It also mentions the concept of Absolute Accuracy and standard calibration.

Based on this information, it appears that to calibrate the Yumi arms, a standard calibration process is typically sufficient. The fine calibration involves aligning the synchronization mark on each joint and running the CalHall routine.

For detailed information on calibration methods specific to the Yumi robot, it is recommended to refer to the product manual provided by ABB. The product manual should provide more detailed information and instructions on the calibration process for the Yumi arms.

Please note that specific ca