<a href="https://colab.research.google.com/github/merdandt/MagicBook/blob/main/MagicBook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Package installation

### Check if you have an NVIDIA GPU


In [None]:
# Note: If this returns "command not found", then GPU-based algorithms via cuGraph are unavailable

!nvidia-smi
!nvcc --version

/bin/bash: line 1: nvidia-smi: command not found
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
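
The same check can be made from Python so that later cells can branch on the result. This is a minimal sketch: `gpu_tools_available` is a hypothetical helper name, not part of the notebook. Finding an executable on `PATH` only suggests that the toolchain is installed. As the output above shows, `nvcc` can be present while `nvidia-smi` is not.

```python
import shutil

def gpu_tools_available():
    """Report which NVIDIA command-line tools are found on PATH."""
    # shutil.which returns the executable's path, or None when it is not found
    return {tool: shutil.which(tool) is not None for tool in ("nvidia-smi", "nvcc")}

print(gpu_tools_available())
```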


In [2]:
!pip install nx_arangodb arango-datasets langchain_community langchain-openai langchain-google-genai gradio PyPDF2

Collecting nx_arangodb
  Downloading nx_arangodb-1.3.0-py3-none-any.whl.metadata (9.3 kB)
Collecting arango-datasets
  Downloading arango_datasets-1.2.3-py3-none-any.whl.metadata (4.5 kB)
Collecting langchain_community
  Downloading langchain_community-0.3.19-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-openai
  Downloading langchain_openai-0.3.7-py3-none-any.whl.metadata (2.3 kB)
Collecting langchain-google-genai
  Downloading langchain_google_genai-2.0.11-py3-none-any.whl.metadata (3.6 kB)
Collecting gradio
  Downloading gradio-5.20.0-py3-none-any.whl.metadata (16 kB)
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Collecting networkx<=3.4,>=3.0 (from nx_arangodb)
  Downloading networkx-3.4-py3-none-any.whl.metadata (6.3 kB)
Collecting phenolrs~=0.5 (from nx_arangodb)
  Downloading phenolrs-0.5.9-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)
Collecting python-arango~=8.1 (from nx_arangodb)
  Downloading python_

### Imports

In [39]:
from langchain_community.graphs.arangodb_graph import ArangoGraph
from langchain_community.chains.graph_qa.arangodb import ArangoGraphQAChain
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate
from langchain.agents import AgentExecutor, Tool
from langchain.tools import BaseTool, tool
from langchain.chains import LLMChain
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage


import networkx as nx

import re
from typing import List, Dict, Any
import nx_arangodb as nxadb
from arango import ArangoClient
import os
import logging
import PyPDF2


import pandas as pd
import gradio as gr

# Configure logging to show debug output in the console
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

### Set environment variables for the API keys and the ArangoDB connection

In [20]:
from google.colab import userdata

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['GEMINI_API_KEY'] = userdata.get('GEMINI_API_KEY')

os.environ['DATABASE_HOST'] = 'https://0e04-75-148-99-49.ngrok-free.app'
os.environ['DATABASE_USERNAME'] = 'root'
os.environ['DATABASE_PASSWORD'] = 'openSesame'       # Replace with your actual database password
os.environ['DATABASE_NAME'] = '_system'
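
Later cells depend on these variables, and a misspelled name or an empty value tends to surface only later as a confusing connection or authentication error. As a minimal sketch (the helper name `require_env` is hypothetical and not part of the notebook), a small check can fail early with a clear message:

```python
import os

def require_env(names):
    """Return {name: value} for the given variables; raise if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```

In this notebook it could be called as `require_env(["DATABASE_HOST", "DATABASE_USERNAME", "DATABASE_PASSWORD", "DATABASE_NAME"])` right after the assignments above.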


### Initialize the db

In [21]:
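# Reuse the connection settings stored in the environment variables above
client = ArangoClient(hosts=os.environ['DATABASE_HOST'])
db = client.db(
    os.environ['DATABASE_NAME'],
    username=os.environ['DATABASE_USERNAME'],
    password=os.environ['DATABASE_PASSWORD'],
    verify=False,  # no test request is sent to verify the connection
)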

print(db)

<StandardDatabase _system>


### Initialize LLM

In [42]:
book_llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    api_key=os.environ.get("GEMINI_API_KEY"),
    temperature=0
  )

chat_llm = ChatOpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    model="gpt-4o",
    temperature=0
  )

In [38]:
def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF file with no length constraints."""
    logging.debug("Starting PDF text extraction from file: %s", pdf_path)
    text = ""
    try:
        pdf_reader = PyPDF2.PdfReader(pdf_path)
        logging.debug("Opened PDF file successfully. Total pages: %d", len(pdf_reader.pages))
        for page_num, page in enumerate(pdf_reader.pages):
            page_text = page.extract_text()
            # Log the first 100 characters from the page's text for debugging
            logging.debug("Page %d extracted text (first 100 chars): %s", page_num + 1, (page_text or "")[:100])
            if page_text:
                text += page_text + "\n"
            else:
                logging.debug("No text found on page %d", page_num + 1)
    except Exception as e:
        error_msg = f"Error extracting text from PDF: {str(e)}"
        logging.error(error_msg)
        return error_msg

    logging.debug("Completed PDF text extraction. Total extracted text length: %d characters", len(text))
    return text

def ask_book(question_input):
    """Ask a question about the book."""

def analyze_book_with_gemini(pdf_file):
    """Extract text from PDF and send it to Gemini for analysis."""
    logging.debug("Starting analysis of uploaded PDF file.")
    if pdf_file is None:
        logging.warning("No file uploaded.")
        return "No file uploaded. Please upload a PDF file."

    try:
        # Extract text from the PDF file
        logging.debug("Extracting text from PDF file: %s", pdf_file.name)
        text = extract_text_from_pdf(pdf_file.name)

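        # Match the exact prefix produced by extract_text_from_pdf so that a
        # book whose text merely begins with "Error" is not treated as a failure
        if text.startswith("Error extracting text from PDF:"):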
            logging.error("Error during PDF text extraction: %s", text)
            return text

        # Use the Gemini model (book_llm) that was initialized in an earlier cell
        logging.debug("Using Gemini model: gemini-2.0-flash")


        # Create prompt with the extracted text
        prompt = f"""
        The following is text extracted from a book. Please analyze this and tell me:
        1. What is the title of this book?
        2. Who is the author?
        3. What is the main subject or genre?
        4. Provide a brief summary of what the book is about.

        Here's the text:
        {text}
        """
        # Log a snippet of the prompt for debugging
        logging.debug("Constructed prompt (first 200 characters): %s", prompt[:200])

        # Get response from Gemini by invoking with the prompt string
        logging.debug("Sending prompt to Gemini model.")
        response = book_llm.invoke(prompt)
        logging.debug("Received response from Gemini model: %s", response.content)

        return response.content

    except Exception as e:
        error_msg = f"Error analyzing book with Gemini: {str(e)}"
        logging.error(error_msg)
        return error_msg




Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://31ae420657acb54919.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
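
`analyze_book_with_gemini` places the full extracted text in a single prompt, which may exceed a model's input limit for long books. The sketch below shows one way to bound the prompt size: it splits the text into chunks under a character budget, preferring line boundaries. The name `split_into_chunks` and the `max_chars` default are illustrative assumptions, not values taken from the notebook or from the Gemini documentation.

```python
def split_into_chunks(text, max_chars=8000):
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is longer than the budget
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the budget
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks reproduces the original text, so nothing is dropped. Each chunk could then be sent to the model separately; how the partial answers are combined is left open here.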


