# ThesisChat - Basic Usage Example

This notebook walks through basic usage of the ThesisChat module: processing a LaTeX document, indexing it, and querying it through a conversational interface.

## Prerequisites

Before running this notebook, make sure you have:
1. The thesis_chat package installed
2. API keys for Pinecone and OpenAI
3. A LaTeX thesis document to process

## 1. Setup and Configuration

In [None]:
import os
from thesis_chat import ThesisChat, Config

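# Read API keys from environment variables. The fallback strings below are
# placeholders only and will not authenticate; export the variables or replace them.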
PINECONE_API_KEY = os.getenv('PINECONE_API_KEY', 'your-pinecone-api-key')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY', 'your-openai-api-key')

# Path to your LaTeX thesis file
LATEX_FILE_PATH = "path/to/your/thesis.tex"

print("API keys and file path configured!")
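
Because the cell above falls back to placeholder strings, a missing key only surfaces later as an authentication error. A minimal sketch of a stricter check follows; `require_env` and `PLACEHOLDER_PREFIX` are hypothetical names introduced here for illustration and are not part of thesis_chat.

```python
import os

# Assumption: placeholder values look like 'your-...-api-key', as in the cell above.
PLACEHOLDER_PREFIX = "your-"


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset, empty, or still a placeholder.
    """
    value = os.environ.get(name, "").strip()
    if not value or value.startswith(PLACEHOLDER_PREFIX):
        raise RuntimeError(f"Environment variable {name} is not set to a real value")
    return value
```

With this helper, `PINECONE_API_KEY = require_env('PINECONE_API_KEY')` fails immediately with a clear message instead of passing a placeholder to the client.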

## 2. Initialize ThesisChat

Create a ThesisChat instance with your configuration:

In [None]:
# Optional: Create custom configuration
config = Config(
    chunk_size=300,        # Words per chunk
    overlap=60,            # Word overlap between chunks
    keep_captions=True,    # Include figure/table captions
    max_context_chunks=6   # Max chunks to use for LLM context
)

# Initialize ThesisChat
chat = ThesisChat(
    pinecone_api_key=PINECONE_API_KEY,
    openai_api_key=OPENAI_API_KEY,
    index_name="my-thesis-chat",
    namespace="v1",
    config=config
)

print(f"ThesisChat initialized: {chat}")
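
To make `chunk_size` and `overlap` concrete, here is a rough sketch of word-based chunking with overlap. It assumes a plain sliding window over whitespace-separated words; `chunk_words` is a hypothetical helper, and the chunker inside thesis_chat may well differ (for example, by respecting section boundaries or captions).

```python
def chunk_words(text, chunk_size=300, overlap=60):
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words, so each new chunk starts
    `chunk_size - overlap` words after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Under these assumptions, a 600-word text with the settings above yields three chunks starting at words 0, 240, and 480, with the last chunk holding the remaining 120 words.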

## 3. Setup the System

This creates the Pinecone index and prepares the query engine:

In [None]:
# Setup the thesis chat system
chat.setup(force_recreate_index=False)

# Check current index stats
stats = chat.get_index_stats()
print("Index statistics:")
print(f"  Total vectors: {stats['total_vector_count']}")
print(f"  Dimension: {stats['dimension']}")
print(f"  Metric: {stats['metric']}")

## 4. Process and Index Your LaTeX Document

This is the main processing step that converts your thesis into searchable chunks:

In [None]:
# Process the LaTeX file and create embeddings
try:
    chat.process_and_index_latex(LATEX_FILE_PATH)
    print("✅ LaTeX document processed and indexed successfully!")
except Exception as e:
    print(f"❌ Error processing document: {str(e)}")
    # You might want to handle specific errors here
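
One specific error worth catching early is a wrong file path. The sketch below validates the path before any processing starts; `check_latex_path` is a hypothetical helper written for this example, not a thesis_chat function.

```python
from pathlib import Path


def check_latex_path(path_str):
    """Return a Path for an existing .tex file, or raise a descriptive error."""
    path = Path(path_str)
    if path.suffix.lower() != ".tex":
        raise ValueError(f"Expected a .tex file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"LaTeX file not found: {path}")
    return path
```

Calling `check_latex_path(LATEX_FILE_PATH)` before the processing step separates path problems from errors raised during parsing or indexing.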

In [None]:
# Get summary of processed chunks
summary = chat.get_chunk_summary()
print("\n📊 Document Processing Summary:")
print(f"  Total chunks: {summary['total_chunks']}")
print(f"  Unique chapters: {summary['unique_chapters']}")
print(f"  Has embeddings: {summary['has_embeddings']}")
print(f"  Is indexed: {summary['is_indexed']}")

print("\n📑 Chunk types:")
for chunk_type, count in summary['chunk_types'].items():
    print(f"  {chunk_type}: {count}")

print("\n📚 Thesis parts:")
for part, count in summary['thesis_parts'].items():
    print(f"  {part}: {count}")

## 5. Query Your Document

Now you can ask questions about your thesis!

In [None]:
# Ask a question about your thesis
question = "What are the main contributions of this thesis?"

response = chat.query(
    question=question,
    language="auto",
    include_sources=True,
    temperature=0.7
)

print(f"🤔 Question: {response['query']}")
print(f"\n🤖 Answer:\n{response['answer']}")
print(f"\n📈 Retrieved {response['chunks_retrieved']} chunks, used {response['chunks_used']} for context")

In [None]:
# Show sources used in the answer
print("\n📚 Sources used:")
for source in response['sources']:
    print(f"  [{source['index']}] {source['reference']}")
    print(f"      Similarity: {source['similarity_score']:.3f}, Rerank: {source['rerank_score']:.3f}")
    print(f"      Preview: {source['text_preview']}")
    print()

## 6. Try More Questions

Ask different types of questions to explore your thesis:

In [None]:
# Try different questions
questions = [
    "What methodology was used in this research?",
    "What were the main findings?",
    "What are the limitations of this study?",
    "What future work is suggested?"
]

for q in questions:
    print(f"\n{'='*60}")
    print(f"🤔 {q}")
    print('='*60)
    
    try:
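        response = chat.query(q, temperature=0.7)
        answer = response['answer']
        # Truncate long answers to 500 characters so every answer prints with the same prefix
        preview = f"{answer[:500]}..." if len(answer) > 500 else answer
        print(f"🤖 {preview}")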
        print(f"\n📊 Used {response['chunks_used']} chunks")
    except Exception as e:
        print(f"❌ Error: {str(e)}")

## 7. Search Without LLM Response

Sometimes you just want to find relevant chunks without generating a response:

In [None]:
# Search for relevant chunks
search_query = "machine learning methods"
search_results = chat.search(search_query, top_k=5)

print(f"🔍 Search results for: '{search_query}'")
print(f"Found {len(search_results)} relevant chunks:\n")

for i, result in enumerate(search_results, 1):
    metadata = result['metadata']
    print(f"[{i}] Chapter: {metadata.get('chapter', 'N/A')}")
    print(f"    Section: {metadata.get('section', 'N/A')}")
    print(f"    Similarity: {result.get('score', 0):.3f}")
    print(f"    Rerank: {result.get('rerank_score', 0):.3f}")
    print(f"    Preview: {metadata.get('text', '')[:200]}...")
    print()
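
For intuition about the similarity score printed above, the following sketch ranks a few toy vectors by cosine similarity in plain Python. It is illustrative only: in the notebook the search is delegated to the Pinecone index, and the metric actually in use is whatever the index statistics report. `cosine_similarity` and `top_k` are names made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, items, k=5):
    """Score each (id, vector) pair against query_vec and return the k best.

    Returns a list of (id, score) tuples sorted by score, highest first.
    """
    scored = [(item_id, cosine_similarity(query_vec, vec)) for item_id, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A vector database performs the same kind of ranking over stored embeddings, typically with approximate indexes rather than a full scan.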

## 8. Save and Load Processed Data

Save your processed chunks for later use:

In [None]:
# Save chunks with embeddings
output_file = "my_thesis_chunks.jsonl"
chat.save_chunks(output_file, include_embeddings=True)
print(f"💾 Chunks saved to: {output_file}")

# You can later load these chunks in another session:
# chat.load_chunks(output_file)
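
The `.jsonl` extension suggests the JSON Lines format, with one JSON object per line. The exact fields thesis_chat writes are not shown here, so the sketch below only illustrates the format itself with hypothetical helpers `write_jsonl` and `read_jsonl`; it can be handy for inspecting a saved file outside the package.

```python
import json


def write_jsonl(path, records):
    """Write each record (a JSON-serializable dict) as one line of JSON."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Because each line is an independent JSON document, large chunk files can be read line by line without loading everything into memory at once.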

## 9. Multilingual Support

Queries can also be asked in other languages. Pass a language code such as "es" via the `language` parameter instead of "auto":

In [None]:
# Ask questions in different languages
spanish_question = "¿Cuáles son las principales conclusiones de esta tesis?"

response = chat.query(
    question=spanish_question,
    language="es",  # Spanish
    include_sources=True
)

print(f"🤔 Pregunta: {response['query']}")
print(f"\n🤖 Respuesta:\n{response['answer']}")

## 10. Cleanup (Optional)

If you want to clear the index:

In [None]:
# Uncomment to clear the index
# chat.clear_index()
# print("🗑️  Index cleared!")

print("✅ Basic usage example completed!")