<a href="https://colab.research.google.com/github/envererguven/vectordb/blob/main/embeddingPDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%pip install -U langchain-community streamlit PyPDF2 faiss-cpu sentence-transformers

In [None]:
import os

import streamlit as st
from PyPDF2 import PdfReader
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import FAISS

EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


# Extract the text of every page in a PDF
def extract_text_from_pdf(pdf_path):
    reader = PdfReader(pdf_path)
    # extract_text() can return None for pages without a text layer
    return "".join(page.extract_text() or "" for page in reader.pages)


# Split the text into chunks, embed them, and save a FAISS index to disk
def create_faiss_vector_store(text, path="faiss_index"):
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_text(text)
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    vector_store = FAISS.from_texts(chunks, embedding=embeddings)
    vector_store.save_local(path)


# Load a FAISS index previously saved by create_faiss_vector_store
def load_faiss_vector_store(path="faiss_index"):
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    # load_local unpickles the stored docstore; only enable this for
    # index files you created yourself
    return FAISS.load_local(path, embeddings, allow_dangerous_deserialization=True)


# Build a retrieval QA chain backed by the saved index and a local Ollama model
def build_qa_chain(vector_store_path="faiss_index"):
    vector_store = load_faiss_vector_store(vector_store_path)
    retriever = vector_store.as_retriever()
    llm = Ollama(model="llama3.2")
    # "stuff" places all retrieved chunks into a single prompt
    return RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)


# Streamlit app
st.title("RAG Chatbot with FAISS and LLaMA")
st.write("Upload a PDF and ask questions based on its content.")

uploaded_file = st.file_uploader("Upload your PDF file", type="pdf")

if uploaded_file is not None:
    os.makedirs("uploaded", exist_ok=True)
    # basename() drops any directory components from the uploaded file name
    pdf_path = os.path.join("uploaded", os.path.basename(uploaded_file.name))
    with open(pdf_path, "wb") as f:
        f.write(uploaded_file.getbuffer())

    text = extract_text_from_pdf(pdf_path)
    if not text.strip():
        st.error("No extractable text was found in this PDF.")
        st.stop()

    st.info("Creating FAISS vector store...")
    create_faiss_vector_store(text)

    st.info("Initializing chatbot...")
    qa_chain = build_qa_chain()
    st.success("Chatbot is ready!")

    question = st.text_input("Ask a question about the uploaded PDF:")
    if question:
        st.info("Querying the document...")
        # RetrievalQA takes its input under "query" and returns the answer under "result"
        answer = qa_chain.invoke({"query": question})["result"]
        st.success(f"Answer: {answer}")
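
The `chunk_size=1000, chunk_overlap=200` settings used in `create_faiss_vector_store` control how much text each embedding covers and how much text neighbouring chunks share. The sketch below illustrates the idea with a plain character window. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and sentence boundaries, and `sliding_chunks` is a hypothetical helper, not a LangChain function.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-size windows where each window repeats the
    last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Small numbers make the overlap visible
print(sliding_chunks("0123456789", chunk_size=4, chunk_overlap=2))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.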

In [None]:
!streamlit run /usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py
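
`streamlit run` expects the path of the script that contains the app, so one option in a notebook is to save the app code to a file and launch that file. A minimal sketch, assuming the code was saved as `app.py` (a hypothetical file name, for example written by a `%%writefile app.py` cell) and that port 8501 is used, as in the rest of this notebook. `streamlit_command` and `launch_streamlit` are illustrative helper names.

```python
import subprocess
import sys


def streamlit_command(script_path, port=8501):
    """Build the argument list for running a Streamlit script headlessly."""
    return [
        sys.executable, "-m", "streamlit", "run", script_path,
        "--server.port", str(port),
        "--server.headless", "true",  # do not try to open a browser
    ]


def launch_streamlit(script_path, port=8501):
    """Start Streamlit without blocking the notebook cell."""
    return subprocess.Popen(streamlit_command(script_path, port))


# Show the command that would be run; app.py is an assumed file name
print(" ".join(streamlit_command("app.py")))
```

Calling `launch_streamlit("app.py")` returns a `Popen` handle, so the server can later be stopped with `.terminate()`.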

# Task
The user wants to access a Streamlit application running on a specific IP address and port ("http://34.168.128.182:8501"). They are unable to access it directly in their browser. The task is to help the user access the application, likely by using `ngrok` to create a public URL that tunnels to the application.

## Install ngrok

### Subtask:
Install the `ngrok` package.


**Reasoning**:
The subtask is to install the `ngrok` package. I will use `pip` to install the package.



In [None]:
pip install ngrok

## Authenticate ngrok

### Subtask:
Add your `ngrok` authentication token to allow `ngrok` to connect to your account.


**Reasoning**:
Add the ngrok authentication token to allow ngrok to connect to the account.



In [None]:
get_ipython().system('ngrok authtoken c038f154-86e4-4625-a4dc-1614b5dbcf49')

**Reasoning**:
The previous attempt to set the ngrok authtoken failed because the ngrok command was not found. This is likely because the installation was done using `pip install ngrok`, which installs the Python library, not the executable. To fix this, I need to install the ngrok executable using a shell command.
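
A quick way to confirm this diagnosis is to check whether an `ngrok` executable is on `PATH`, separately from whether a Python package of the same name is importable. A minimal sketch using only the standard library; `describe_install` is a hypothetical helper name.

```python
import importlib.util
import shutil


def describe_install(name):
    """Report whether `name` exists as a command-line executable
    and whether it exists as an importable Python package."""
    return {
        "executable": shutil.which(name),  # full path, or None if not on PATH
        "python_package": importlib.util.find_spec(name) is not None,
    }


print(describe_install("ngrok"))
```

If `executable` is `None` while `python_package` is `True`, the shell command `ngrok` will not be found even though `pip install ngrok` succeeded.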



In [None]:
get_ipython().system('curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null')
get_ipython().system('echo "deb http://packages.ngrok.com stable main" | sudo tee /etc/apt/sources.list.d/ngrok.list')
get_ipython().system('sudo apt update')
get_ipython().system('sudo apt install ngrok')

**Reasoning**:
The ngrok installation failed because it could not resolve the hostname `packages.ngrok.com`. This indicates a potential network issue preventing access to the ngrok package repository. Since installing the ngrok executable via `apt` is not working, I will try downloading the ngrok executable directly using `wget` and then make it executable. After that, I will attempt to set the authtoken again.



In [None]:
get_ipython().system('wget https://bin.equinox.io/c/bNyj1zyRfK4/ngrok-v3-stable-linux-amd64.tgz')
get_ipython().system('tar xvzf ngrok-v3-stable-linux-amd64.tgz')
get_ipython().system('mv ngrok /usr/local/bin')
get_ipython().system('chmod +x /usr/local/bin/ngrok')
get_ipython().system('ngrok authtoken c038f154-86e4-4625-a4dc-1614b5dbcf49')

**Reasoning**:
The direct download of the ngrok executable failed with a 404 error, indicating the specified URL is incorrect or no longer valid. I need to find the correct download URL for the latest ngrok Linux amd64 stable release. I will try a different URL based on the ngrok documentation format and then repeat the download, extraction, move, and authtoken steps.



In [None]:
get_ipython().system('wget https://update.equinox.io/ngrok/stable/linux-amd64/ngrok.tgz')
get_ipython().system('tar xvzf ngrok.tgz')
get_ipython().system('mv ngrok /usr/local/bin')
get_ipython().system('chmod +x /usr/local/bin/ngrok')
get_ipython().system('ngrok authtoken c038f154-86e4-4625-a4dc-1614b5dbcf49')

**Reasoning**:
The previous attempt to download ngrok using a different URL also resulted in a 404 error. It seems the direct download links for ngrok might be changing. I will try a more general approach by downloading the official ngrok zip file for Linux from their website, extracting it, and then moving the executable. After confirming the executable is in the correct path, I will set the authtoken.



In [None]:
get_ipython().system('wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip')
get_ipython().system('unzip ngrok-stable-linux-amd64.zip')
get_ipython().system('mv ngrok /usr/local/bin/')
get_ipython().system('chmod +x /usr/local/bin/ngrok')
get_ipython().system('ngrok authtoken c038f154-86e4-4625-a4dc-1614b5dbcf49')

## Run ngrok

### Subtask:
Start `ngrok` and tunnel to the port where Streamlit is running.


**Reasoning**:
Run ngrok to create an HTTP tunnel to port 8501 and run it in the background.



In [None]:
get_ipython().system('ngrok http 8501 &')
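
Output from a process backgrounded with `&` is easy to lose in a notebook. One alternative, sketched here with the standard library, is to start the process with `subprocess.Popen` and send its output to a log file that can be read later. `start_in_background` is a hypothetical helper, and the demo runs a stand-in Python command so the sketch works without ngrok installed. For the real tunnel the command would be something like `["ngrok", "http", "8501", "--log", "stdout"]`, where the `--log` flag is an assumption about the installed agent version.

```python
import os
import subprocess
import sys
import tempfile


def start_in_background(cmd, log_path):
    """Start cmd without blocking and write its stdout/stderr to log_path."""
    with open(log_path, "w") as log_file:
        # The child process keeps its own copy of the file handle,
        # so closing ours when the with-block exits is safe.
        return subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)


# Demo with a stand-in command instead of ngrok
log_path = os.path.join(tempfile.mkdtemp(), "background.log")
proc = start_in_background([sys.executable, "-c", "print('stand-in for ngrok')"], log_path)
proc.wait()
with open(log_path) as fh:
    print(fh.read().strip())
```

Keeping the `Popen` handle also makes it possible to check `proc.poll()` to see whether the process exited early, for example because of an authentication error.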

**Reasoning**:
The previous command failed because the authtoken was not recognized. I need to re-authenticate ngrok with the provided token.



In [None]:
get_ipython().system('ngrok authtoken c038f154-86e4-4625-a4dc-1614b5dbcf49')
get_ipython().system('ngrok http 8501 &')

## Run ngrok retry 1

### Subtask:
Start `ngrok` and tunnel to the port where Streamlit is running.


**Reasoning**:
Run the ngrok command to tunnel to port 8501 in the background, proceeding despite the previous authtoken error.



In [None]:
get_ipython().system('ngrok http 8501 &')

## Access streamlit app

### Subtask:
Provide the public URL generated by `ngrok` to access the Streamlit application.


**Reasoning**:
I need to extract the public URL from the output of the `ngrok` execution. The `ngrok` process was started in the background using `!ngrok http 8501 &`. I should check the output for the "Forwarding" line.



In [None]:
# ngrok was started in the background, so its "Forwarding" line never reached the
# notebook output. The URL printed by the `!streamlit run ...` cell
# (External URL: http://34.168.128.182:8501) is Streamlit's own address, not the tunnel.
# A running ngrok agent exposes a local API, usually at http://localhost:4040,
# so query that API for the public URL instead.

import requests

try:
    response = requests.get('http://localhost:4040/api/tunnels', timeout=5)
    response.raise_for_status()  # Raise an exception for bad status codes
    tunnels_data = response.json()

    # Find the public URL in the tunnels data, preferring the https tunnel
    ngrok_public_url = None
    for tunnel in tunnels_data['tunnels']:
        if tunnel['proto'] == 'https':
            ngrok_public_url = tunnel['public_url']
            break  # Found the https URL, no need to check others

    if ngrok_public_url:
        print(f"Ngrok Public URL: {ngrok_public_url}")
    else:
        print("Could not find the ngrok public URL.")

except requests.exceptions.RequestException as e:
    print(f"Could not connect to ngrok API: {e}")
    print("Please ensure ngrok is running and the API is accessible.")
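
ngrok needs a moment to start before its local API answers, so a single request made right after launch can fail with a connection error even when the tunnel comes up shortly afterwards. A hedged sketch of a retry loop follows. `wait_for_public_url`, `fetch_tunnels`, and the `fetch` parameter are hypothetical names, and the JSON shape (`tunnels[].public_url`) is the one the cell above already assumes.

```python
import json
import time
import urllib.request


def fetch_tunnels(url="http://localhost:4040/api/tunnels"):
    """Return the parsed JSON document from the local ngrok API."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_public_url(fetch=fetch_tunnels, attempts=10, delay=1.0):
    """Poll until an https tunnel is listed; return its URL or None."""
    for attempt in range(attempts):
        try:
            tunnels = fetch().get("tunnels", [])
        except (OSError, ValueError):
            # Connection refused or invalid JSON: the agent is not ready yet
            tunnels = []
        for tunnel in tunnels:
            public_url = tunnel.get("public_url", "")
            if public_url.startswith("https://"):
                return public_url
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

Calling `wait_for_public_url()` with the defaults polls for roughly ten seconds; a return value of `None` after that suggests the agent exited, which is worth checking in its log output.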


**Reasoning**:
I could not connect to the ngrok API to get the public URL, indicating that the ngrok process might not be running or the API is not accessible. I will try running ngrok again, ensuring it prints the URL to stdout.



In [None]:
# Stop any potentially running background ngrok processes
get_ipython().system('pkill ngrok')

# Run ngrok again, this time without the '&' to keep it in the foreground
# and ensure its output is visible. We will stop it manually once we get the URL.
get_ipython().system('ngrok http 8501')

## Access streamlit app retry 1

### Subtask:
Provide the public URL generated by `ngrok` to access the Streamlit application.


**Reasoning**:
Stop any existing ngrok processes and then run ngrok in the foreground for port 8501 to get the public URL from its output.



In [None]:
get_ipython().system('pkill ngrok')
get_ipython().system('ngrok http 8501')

## Summary:

### Data Analysis Key Findings

*   Attempting to install `ngrok` using `apt` failed due to hostname resolution issues.
*   Direct downloads of `ngrok` executables via `wget` from various URLs resulted in 404 errors.
*   The `ngrok` authentication token provided (`c038f154-86e4-4625-a4dc-1614b5dbcf49`) was consistently rejected by `ngrok` with an `ERR_NGROK_105` error, indicating it is invalid.
*   Due to the invalid authtoken, `ngrok` failed to start and establish a tunnel to port 8501.
*   Consequently, querying the local `ngrok` API at `http://localhost:4040` failed with a "Connection refused" error, as `ngrok` was not running.

### Insights or Next Steps

*   The primary issue preventing the task completion is the invalid `ngrok` authentication token. The user needs to provide a valid token from their `ngrok` account.
*   Once a valid `ngrok` authtoken is provided and configured, the `ngrok http 8501` command should successfully create a public URL for the Streamlit application.
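
Because the token is the failure point, it helps to keep it out of the notebook and to check that one is present before calling the agent. A minimal sketch, assuming the token is stored in an environment variable named `NGROK_AUTHTOKEN` (the name is an assumption for this example) and that the installed agent is v3, where the subcommand is `ngrok config add-authtoken`. `authtoken_command` is a hypothetical helper.

```python
import os


def authtoken_command(env_var="NGROK_AUTHTOKEN"):
    """Build the ngrok command that stores the token read from env_var."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a token "
            "from your ngrok dashboard before starting the tunnel."
        )
    # Assumes a v3 agent; older agents used `ngrok authtoken <token>`
    return ["ngrok", "config", "add-authtoken", token]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which raises if the agent rejects the command.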


## Authenticate ngrok

### Subtask:
Add your `ngrok` authentication token to allow `ngrok` to connect to your account.

**Reasoning**:
Add the ngrok authentication token to allow ngrok to connect to the account.

In [None]:
get_ipython().system('ngrok authtoken YOUR_NGROK_AUTHTOKEN')

## Run ngrok

### Subtask:
Start `ngrok` and tunnel to the port where Streamlit is running.

**Reasoning**:
Run ngrok to create an HTTP tunnel to port 8501 and run it in the background.

In [None]:
get_ipython().system('ngrok http 8501 &')

## Update ngrok Agent

### Subtask:
Update the installed ngrok agent to the latest version.

**Reasoning**:
The previous tunnel attempt failed with `ERR_NGROK_121`, which indicates a problem with the agent version. Attempt to resolve it by updating the agent with the `ngrok update` command.

In [None]:
get_ipython().system('ngrok update')

## Re-install ngrok Executable

### Subtask:
Download and install the latest ngrok executable manually.

**Reasoning**:
Since `ngrok update` failed to resolve the version issue, I will manually download, extract, and install the latest ngrok executable to ensure a recent version is being used.

In [None]:
get_ipython().system('wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip')
get_ipython().system('unzip ngrok-stable-linux-amd64.zip')
get_ipython().system('mv ngrok /usr/local/bin/')
get_ipython().system('chmod +x /usr/local/bin/ngrok')
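
Since the error is attributed to the agent version, it is worth confirming which version actually ended up on `PATH` after a reinstall; an archive name without a `v3` marker may correspond to an older agent line, which is an assumption to verify rather than a certainty. The sketch below parses the text printed by `ngrok version`, assuming output of the form `ngrok version X.Y.Z`. `parse_version` and `is_at_least` are hypothetical helpers, and the `(3, 0, 0)` minimum is an illustrative threshold.

```python
import re


def parse_version(output):
    """Extract the first X.Y.Z version number from a string, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_at_least(output, minimum=(3, 0, 0)):
    """True if the version found in output is >= minimum."""
    version = parse_version(output)
    return version is not None and version >= minimum


# Example strings in the assumed output format
for sample in ("ngrok version 2.3.41", "ngrok version 3.5.0"):
    print(sample, "->", is_at_least(sample))
```

In the notebook, the input string could come from `subprocess.run(["ngrok", "version"], capture_output=True, text=True).stdout`.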

## Run ngrok (Attempt after Re-install)

### Subtask:
Start `ngrok` and tunnel to the port where Streamlit is running, and get the public URL.

**Reasoning**:
Stop any existing ngrok processes and then run ngrok in the foreground for port 8501 to get the public URL from its output.

In [None]:
get_ipython().system('pkill ngrok')
get_ipython().system('ngrok http 8501')