# Basic Egnyte-LangChain Integration

This notebook demonstrates the basic integration between Egnyte and LangChain for document retrieval and AI-powered analysis.

## Prerequisites

1. **Egnyte Account**: Access to an Egnyte domain
2. **API Credentials**: User token or OAuth setup
3. **Python Environment**: Python 3.8+ with required packages

## Installation

In [None]:
# Install required packages with the %pip magic so they land in this notebook's kernel
%pip install egnyte-langchain-connector
# AI model integration (ChatOpenAI)
%pip install langchain-openai
# Loads variables from the .env file
%pip install python-dotenv

## Environment Setup

Create a `.env` file that defines `EGNYTE_DOMAIN`, `EGNYTE_USER_TOKEN`, and `OPENAI_API_KEY`, then load and verify the values:

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Verify credentials are available
EGNYTE_DOMAIN = os.getenv("EGNYTE_DOMAIN")
EGNYTE_USER_TOKEN = os.getenv("EGNYTE_USER_TOKEN")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

print(f"Egnyte Domain: {EGNYTE_DOMAIN}")
print(f"Token Available: {'Yes' if EGNYTE_USER_TOKEN else 'No'}")
print(f"OpenAI Key Available: {'Yes' if OPENAI_API_KEY else 'No'}")
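
The check above only prints whether each value is present. As a minimal sketch, the same check can be made reusable and testable by reporting which names are unset or blank. The helper name `find_missing_settings` is hypothetical and not part of the connector; the variable names are the ones this notebook reads.

```python
import os
from typing import List, Mapping, Sequence

# Names this notebook reads from the environment.
REQUIRED_SETTINGS = ("EGNYTE_DOMAIN", "EGNYTE_USER_TOKEN", "OPENAI_API_KEY")


def find_missing_settings(
    names: Sequence[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or blank (hypothetical helper)."""
    return [name for name in names if not env.get(name, "").strip()]


missing = find_missing_settings(REQUIRED_SETTINGS)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
else:
    print("All required settings are present")
```

Reporting every missing name at once avoids fixing the `.env` file one variable at a time.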

## Basic Document Retrieval

Let's start with basic document retrieval from Egnyte:

In [None]:
from langchain_egnyte import EgnyteRetriever, EgnyteSearchOptions

# Create the retriever
retriever = EgnyteRetriever(
    domain=EGNYTE_DOMAIN,
    user_token=EGNYTE_USER_TOKEN
)

print(f"✅ Retriever created for domain: {EGNYTE_DOMAIN}")

In [None]:
# Perform a basic search
query = "project proposal"
documents = retriever.invoke(query)

print(f"Found {len(documents)} documents for query: '{query}'")
print("\n📄 Document Results:")

for i, doc in enumerate(documents[:3], 1):  # Show first 3 results
    print(f"\n{i}. {doc.metadata.get('name', 'Unknown')}")
    print(f"   Path: {doc.metadata.get('path', 'Unknown')}")
    print(f"   Size: {doc.metadata.get('size', 'Unknown')} bytes")
    print(f"   Modified: {doc.metadata.get('last_modified', 'Unknown')}")
    print(f"   Content Preview: {doc.page_content[:100]}...")
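
The loop above slices the content at 100 characters and always appends `...`, even when nothing was cut. A small sketch of an alternative uses the standard library's `textwrap.shorten`, which adds the placeholder only when it truncates. The helper name `describe_result` and the sample values are illustrative assumptions; the metadata keys mirror the ones used above.

```python
import textwrap
from typing import Mapping


def describe_result(
    metadata: Mapping[str, object], content: str, width: int = 100
) -> str:
    """Build a one-line summary from a result's metadata and content."""
    name = metadata.get("name", "Unknown")
    path = metadata.get("path", "Unknown")
    # shorten() collapses whitespace and appends the placeholder only if it truncates.
    preview = textwrap.shorten(content, width=width, placeholder="...")
    return f"{name} ({path}): {preview}"


# Made-up sample values, used only to show the output shape.
sample_metadata = {"name": "proposal.docx", "path": "/Shared/Projects/proposal.docx"}
print(describe_result(sample_metadata, "Project proposal for the platform migration."))
```

With real results, the call would be `describe_result(doc.metadata, doc.page_content)`.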

## Advanced Search Options

Pass search options to the retriever to restrict results to a folder or a creation-date range:

In [None]:
from datetime import datetime, timedelta
from langchain_egnyte import create_folder_search_options, create_date_range_search_options

# Search in specific folder
folder_options = create_folder_search_options(
    folder_path="/Shared/Projects",
    limit=10
)

retriever_with_folder = EgnyteRetriever(
    domain=EGNYTE_DOMAIN,
    user_token=EGNYTE_USER_TOKEN,
    search_options=folder_options
)

folder_documents = retriever_with_folder.invoke("budget analysis")
print(f"Found {len(folder_documents)} documents in /Shared/Projects folder")

In [None]:
# Search with date range (last 30 days)
thirty_days_ago = datetime.now() - timedelta(days=30)

date_options = create_date_range_search_options(
    created_after=thirty_days_ago,
    limit=5
)

retriever_with_date = EgnyteRetriever(
    domain=EGNYTE_DOMAIN,
    user_token=EGNYTE_USER_TOKEN,
    search_options=date_options
)

recent_documents = retriever_with_date.invoke("meeting notes")
print(f"Found {len(recent_documents)} recent documents (last 30 days)")

## AI-Powered Document Analysis

Now let's integrate with OpenAI for intelligent document analysis:

In [None]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Initialize OpenAI model
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.1,
    openai_api_key=OPENAI_API_KEY
)

# Create a custom prompt for document analysis
prompt_template = """
You are an expert document analyst. Based on the following documents from Egnyte, 
provide a comprehensive answer to the question.

Context from Egnyte documents:
{context}

Question: {question}

Please provide a detailed answer based on the document content, including:
1. Key findings from the documents
2. Relevant details and data points
3. Source document references

Answer:
"""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

# Create the QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": PROMPT},
    return_source_documents=True
)

print("AI-powered QA chain created")

In [None]:
# Ask questions about your documents
question = "What are the key project milestones mentioned in the documents?"

result = qa_chain.invoke({"query": question})

print(f"AI Analysis for: '{question}'")
print("\nAnswer:")
print(result["result"])

print("\nSource Documents:")
for i, doc in enumerate(result["source_documents"], 1):
    print(f"{i}. {doc.metadata.get('name', 'Unknown')} - {doc.metadata.get('path', 'Unknown')}")
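
The `"stuff"` chain type places every retrieved document into a single prompt. The sketch below imitates that step with plain Python so the mechanics are visible without an API key. `SimpleDocument`, `PROMPT_TEMPLATE`, and `build_stuffed_prompt` are illustrative stand-ins, not LangChain classes, and the sample documents are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleDocument:
    """Stand-in for a LangChain Document (illustrative only)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


PROMPT_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"


def build_stuffed_prompt(docs: List[SimpleDocument], question: str) -> str:
    """Join all documents into one context string and fill the template."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)


docs = [
    SimpleDocument("Kickoff is planned for March.", {"name": "plan.docx"}),
    SimpleDocument("Beta release is planned for June.", {"name": "roadmap.docx"}),
]
print(build_stuffed_prompt(docs, "What are the key milestones?"))
```

Because all of the text lands in one prompt, the number and size of retrieved documents has to fit within the model's context window; limiting results through the search options is one way to keep it bounded.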

## Async Operations for Better Performance

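To run several searches concurrently instead of one after another, call the retriever's async method `ainvoke` and collect the results with `asyncio.gather`: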

In [None]:
import asyncio

async def async_document_search():
    """Demonstrate async document retrieval."""
    
    # Multiple concurrent searches
    queries = [
        "financial report",
        "project timeline",
        "team meeting"
    ]
    
    # Execute searches concurrently
    tasks = [retriever.ainvoke(query) for query in queries]
    results = await asyncio.gather(*tasks)
    
    print("Async Search Results:")
    for query, docs in zip(queries, results):
        print(f"  '{query}': {len(docs)} documents found")
    
    return results

# Run async search (top-level await works in Jupyter; in a script, use asyncio.run())
async_results = await async_document_search()
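
`asyncio.gather` starts every search at once, which can run into rate limits when the query list is long. A minimal sketch of capping concurrency with `asyncio.Semaphore` follows. `search_with_limit` and `fake_search` are hypothetical names; `fake_search` stands in for `retriever.ainvoke` so the example runs without credentials.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def search_with_limit(
    search: Callable[[str], Awaitable[list]],
    queries: Sequence[str],
    max_concurrent: int = 2,
) -> List[list]:
    """Run `search` for every query, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(query: str) -> list:
        async with semaphore:
            return await search(query)

    # gather() returns results in the same order as the input queries.
    return await asyncio.gather(*(run_one(query) for query in queries))


async def fake_search(query: str) -> list:
    """Stand-in for retriever.ainvoke (illustrative only)."""
    await asyncio.sleep(0.01)
    return [f"result for {query}"]
```

In a notebook this would be called as `await search_with_limit(retriever.ainvoke, queries)`; in a script, wrap the call in `asyncio.run(...)`.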

## Error Handling and Best Practices

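For production use, catch the connector's exceptions separately so that transient failures (rate limits, connection errors) are retried and permanent ones (authentication, validation) stop immediately: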

In [None]:
import time

# Note: importing ConnectionError here shadows Python's built-in ConnectionError
from langchain_egnyte import (
    AuthenticationError,
    ValidationError,
    RateLimitError,
    ConnectionError
)

def safe_document_search(query: str, max_retries: int = 3):
    """Perform document search with comprehensive error handling."""
    
    for attempt in range(max_retries):
        try:
            documents = retriever.invoke(query)
            print(f"Search successful: {len(documents)} documents found")
            return documents
            
        except AuthenticationError as e:
            # Permanent failure: retrying with the same credentials will not help
            print(f"Authentication failed: {e}")
            print("Please check your Egnyte credentials")
            break
            
        except ValidationError as e:
            # Permanent failure: the query itself is invalid
            print(f"Invalid query: {e}")
            break
            
        except RateLimitError as e:
            print(f"Rate limit exceeded (attempt {attempt + 1}/{max_retries}): {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
                
        except ConnectionError as e:
            print(f"Connection error (attempt {attempt + 1}/{max_retries}): {e}")
            if attempt < max_retries - 1:
                time.sleep(1)  # Fixed short pause before retrying
                
        except Exception as e:
            print(f"Unexpected error: {e}")
            break
    
    # Reached after a permanent failure or once all retries are exhausted
    return []

# Test error handling
safe_results = safe_document_search("test query")
print(f"Safe search returned {len(safe_results)} documents")
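
The retry logic above can be factored into a reusable helper. The sketch below is one possible shape, assuming capped exponential backoff with random jitter so that several clients do not retry in lockstep. `backoff_delay` and `retry_with_backoff` are hypothetical names, not part of the connector.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay for a zero-based attempt: base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[Exception], ...],
    max_retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on `retry_on` with jittered backoff."""
    for attempt in range(max_retries):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # Full jitter: sleep a random amount up to the computed delay.
            sleep(random.uniform(0, backoff_delay(attempt, base)))
    raise ValueError("max_retries must be at least 1")
```

With the retriever from this notebook, a call might look like `retry_with_backoff(lambda: retriever.invoke("test query"), (RateLimitError, ConnectionError))`. Unlike `safe_document_search`, this helper re-raises the final error rather than returning an empty list, which keeps "no results" distinct from "search failed".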

## Performance Monitoring

Time each query to spot slow searches and to measure overall throughput:

In [None]:
import time
from typing import List

def benchmark_search(queries: List[str]) -> dict:
    """Benchmark search performance across multiple queries."""
    
    results = {
        "total_queries": len(queries),
        "total_documents": 0,
        "total_time": 0.0,
        "average_time_per_query": 0.0,
        "queries_per_second": 0.0
    }
    
    # perf_counter() is a monotonic, high-resolution clock intended for timing
    start_time = time.perf_counter()
    
    for query in queries:
        query_start = time.perf_counter()
        documents = retriever.invoke(query)
        query_time = time.perf_counter() - query_start
        
        results["total_documents"] += len(documents)
        print(f"Query: '{query}' - {len(documents)} docs in {query_time:.2f}s")
    
    results["total_time"] = time.perf_counter() - start_time
    
    # Guard against division by zero when the query list is empty
    if queries and results["total_time"] > 0:
        results["average_time_per_query"] = results["total_time"] / len(queries)
        results["queries_per_second"] = len(queries) / results["total_time"]
    
    return results

# Benchmark performance
test_queries = ["report", "meeting", "project", "budget", "analysis"]
performance = benchmark_search(test_queries)

print("\nPerformance Results:")
print(f"Total Queries: {performance['total_queries']}")
print(f"Total Documents: {performance['total_documents']}")
print(f"Total Time: {performance['total_time']:.2f}s")
print(f"Average Time per Query: {performance['average_time_per_query']:.2f}s")
print(f"Queries per Second: {performance['queries_per_second']:.2f}")
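
An average can hide a single slow query. As a rough sketch, per-query durations can be collected and summarized with the standard library's `statistics` module. `time_calls` and `summarize_latencies` are hypothetical helper names, and any callable that takes a query string can be timed.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def time_calls(func: Callable[[str], object], queries: Sequence[str]) -> List[float]:
    """Return the wall-clock duration of func(query) for each query, in seconds."""
    durations = []
    for query in queries:
        start = time.perf_counter()
        func(query)
        durations.append(time.perf_counter() - start)
    return durations


def summarize_latencies(durations: Sequence[float]) -> Dict[str, float]:
    """Summarize durations with mean, median, and maximum."""
    if not durations:
        raise ValueError("durations must not be empty")
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

With the retriever from this notebook, `summarize_latencies(time_calls(retriever.invoke, test_queries))` would report the three figures; a maximum far above the median points to an outlier query worth inspecting.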

## Next Steps

This notebook covered the basics of Egnyte-LangChain integration. For more advanced use cases, check out:

1. **[Advanced RAG Patterns](02-advanced-rag-patterns.ipynb)** - Complex retrieval-augmented generation
2. **[Enterprise Workflows](03-enterprise-workflows.ipynb)** - Production deployment patterns
3. **[Multi-Modal Analysis](04-multimodal-analysis.ipynb)** - Working with different document types

## Resources

- **Documentation**: [Egnyte-LangChain Connector Docs](https://github.com/your-repo/docs)
- **API Reference**: [Egnyte Public API](https://developers.egnyte.com)
- **LangChain Docs**: [LangChain Documentation](https://docs.langchain.com)
- **Support**: [GitHub Issues](https://github.com/your-repo/issues)