# EgnyteRetriever

This guide will help you get started with the Egnyte [retriever](https://github.com/langchain-ai/langchain/tree/aa63de936659ea68ffcd788f3bd97215985b0e2c/docs/docs/integrations/retrievers).

## Overview

The `EgnyteRetriever` class helps you get your unstructured content from Egnyte in LangChain's `Document` format. You can search for files using Egnyte's advanced hybrid search API, which combines keyword and semantic search capabilities to find the most relevant documents.

:::info
Egnyte hybrid search requires an Egnyte Business or Enterprise plan with AI features enabled.
:::

Files without text content will be skipped during retrieval.

### Integration details

| Retriever | Self-host | Cloud offering | Package |
| :--- | :--- | :---: | :---: |
| `EgnyteRetriever` | ❌ | ✅ | langchain-egnyte |

## Setup

To use the Egnyte package, you will need:

* An Egnyte account on a Business or Enterprise plan
* [An Egnyte API application](https://developers.egnyte.com/docs/read/Getting_Started), registered in the Egnyte Developer Portal
* API access enabled by your Egnyte administrator
* A user authentication token with appropriate permissions

### Credentials

For these examples, we will use OAuth token authentication. You can obtain tokens through Egnyte's OAuth flow or use service account tokens for server-to-server applications. If you want to learn more about Egnyte authentication, visit the [Egnyte Developer Documentation](https://developers.egnyte.com/docs).

In [2]:
import getpass
import os

egnyte_domain = input("Enter your Egnyte domain (e.g., 'company.egnyte.com'): ")
egnyte_user_token = getpass.getpass("Enter your Egnyte User Token: ")
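
For non-interactive runs (scripts, scheduled jobs), prompting for input is impractical. The following is a minimal sketch that reads the same two values from environment variables instead. The variable names `EGNYTE_DOMAIN` and `EGNYTE_USER_TOKEN` and the helper `load_egnyte_credentials` are hypothetical; they are not defined by the `langchain-egnyte` package.

```python
import os
from typing import Mapping, Tuple


def load_egnyte_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the Egnyte domain and user token from environment variables.

    The variable names are illustrative assumptions, not part of langchain-egnyte.
    """
    domain = env.get("EGNYTE_DOMAIN", "").strip()
    token = env.get("EGNYTE_USER_TOKEN", "").strip()
    # Collect the names of any variables that are unset or empty
    missing = [
        name
        for name, value in (("EGNYTE_DOMAIN", domain), ("EGNYTE_USER_TOKEN", token))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return domain, token


# Example with an explicit mapping instead of the real environment
domain, token = load_egnyte_credentials(
    {"EGNYTE_DOMAIN": "company.egnyte.com", "EGNYTE_USER_TOKEN": "example-token"}
)
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the first search request.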

If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:

In [3]:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

### Installation

This retriever lives in the `langchain-egnyte` package:

In [4]:
%pip install -qU langchain-egnyte

Note: you may need to restart the kernel to use updated packages.


## Instantiation

### Basic Search

Now we can instantiate our retriever:

In [5]:
from langchain_egnyte import EgnyteRetriever

retriever = EgnyteRetriever(domain=egnyte_domain)

For more granular search, pass an `EgnyteSearchOptions` object (`langchain_egnyte.utilities.EgnyteSearchOptions`) to the retriever. It lets you filter results by creation date, folder path, file creator, and more.

In [6]:
from langchain_egnyte import EgnyteSearchOptions
import datetime

# Create search options for more targeted results
search_options = EgnyteSearchOptions(
    limit=50,
    folderPath="/Shared/Finance",
    createdAfter=int(datetime.datetime(2024, 1, 1).timestamp() * 1000),  # Unix timestamp in milliseconds
    createdBefore=int(datetime.datetime(2024, 12, 31).timestamp() * 1000)
)

retriever = EgnyteRetriever(
    domain=egnyte_domain,
    search_options=search_options
)

# Search with user token provided per request
retriever.invoke("financial report Q4", egnyte_user_token=egnyte_user_token)

[Document(metadata={'source': 'https://company.egnyte.com/pubapi/v1/search/content', 'title': 'Q4_Financial_Report.pdf', 'path': '/Shared/Finance/Q4_Financial_Report.pdf', 'created_date': '2024-01-15T10:30:00Z', 'created_by': 'john.doe'}, page_content='Q4 Financial Summary\n\nRevenue: $2.5M\nExpenses: $1.8M\nNet Income: $700K\n\nKey highlights:\n- 15% growth in recurring revenue\n- Successful product launch in Q4\n- Expansion into new markets')]
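
One caveat about the timestamps in the cell above: calling `.timestamp()` on a naive `datetime` interprets it in the machine's local time zone, so the resulting bounds shift depending on where the notebook runs. A minimal sketch of a helper that pins the conversion to UTC follows; `to_epoch_ms` is a hypothetical name and not part of `langchain-egnyte`.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Return milliseconds since the Unix epoch; naive datetimes are treated as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding
    return (dt - _EPOCH) // timedelta(milliseconds=1)


year_start = to_epoch_ms(datetime(2024, 1, 1))
# Use the last millisecond of the year so that December 31 itself is covered
year_end = to_epoch_ms(datetime(2024, 12, 31, 23, 59, 59, 999000))
```

The two values can then be passed as `createdAfter` and `createdBefore` in place of the inline expressions.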

### Advanced Search Options

The Egnyte retriever can also filter by file creator, preferred and excluded folders, and collection:

In [7]:
from langchain_egnyte import EgnyteRetriever, EgnyteSearchOptions

# Advanced search configuration
advanced_search_options = EgnyteSearchOptions(
    limit=100,
    folderPath="/Shared/Documents",
    createdBy="john.doe",
    createdAfter=int(datetime.datetime(2024, 1, 1).timestamp() * 1000),
    preferredFolderPath="/Shared/Documents/Important",
    excludeFolderPaths=["/Shared/Documents/Archive", "/Shared/Documents/Temp"],
    collectionId="marketing-materials"
)

retriever = EgnyteRetriever(
    domain=egnyte_domain,
    search_options=advanced_search_options
)

## Usage

In [8]:
query = "marketing strategy and brand awareness"

retriever.invoke(query, egnyte_user_token=egnyte_user_token)

[Document(metadata={'source': 'https://company.egnyte.com/pubapi/v1/search/content', 'title': 'Marketing_Strategy_2024.docx', 'path': '/Shared/Documents/Important/Marketing_Strategy_2024.docx', 'created_date': '2024-02-01T14:20:00Z', 'created_by': 'john.doe'}, page_content='Marketing Strategy 2024\n\nObjectives:\n1. Increase brand awareness by 30%\n2. Generate 500 qualified leads per month\n3. Launch new product line\n\nKey Initiatives:\n- Digital marketing campaign\n- Content marketing strategy\n- Partnership development')]
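
The raw `Document` list is verbose to read. As a rough sketch, the helper below prints one line per result using the `title` and `path` metadata keys that appear in the output above. `summarize_results` is a hypothetical helper, and `SimpleNamespace` objects stand in for real `Document` instances so the example runs without an Egnyte account.

```python
from types import SimpleNamespace


def summarize_results(docs, preview_chars=60):
    """Return one summary line per document: title, path, and a short text preview."""
    lines = []
    for doc in docs:
        title = doc.metadata.get("title", "Unknown")
        path = doc.metadata.get("path", "")
        # Collapse newlines and repeated spaces before truncating the preview
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{title} ({path}): {preview}")
    return lines


# Stand-in for the list returned by retriever.invoke(...)
sample_docs = [
    SimpleNamespace(
        metadata={
            "title": "Marketing_Strategy_2024.docx",
            "path": "/Shared/Documents/Important/Marketing_Strategy_2024.docx",
        },
        page_content="Marketing Strategy 2024\n\nObjectives:\n1. Increase brand awareness by 30%",
    )
]

summary = summarize_results(sample_docs)
for line in summary:
    print(line)
```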

## Utility Functions

The Egnyte package provides convenient utility functions for common search patterns:

In [9]:
from langchain_egnyte import create_folder_search_options, create_date_range_search_options

# Create folder-specific search options
folder_options = create_folder_search_options(
    folder_path="/Shared/Projects",
    limit=25
)

retriever = EgnyteRetriever(
    domain=egnyte_domain,
    search_options=folder_options
)

retriever.invoke("project plan", egnyte_user_token=egnyte_user_token)

[Document(metadata={'source': 'https://company.egnyte.com/pubapi/v1/search/content', 'title': 'Project_Plan.pdf', 'path': '/Shared/Projects/Project_Plan.pdf', 'created_date': '2024-03-15T09:45:00Z', 'created_by': 'jane.smith'}, page_content='Project Plan\n\nProject: Website Redesign\nTimeline: Q2 2024\nBudget: $50,000\n\nMilestones:\n1. Design mockups - Week 2\n2. Development - Weeks 3-8\n3. Testing - Weeks 9-10\n4. Launch - Week 12')]

### Date Range Search

In [10]:
# Create date range search options
date_range_options = create_date_range_search_options(
    start_date=int(datetime.datetime(2024, 3, 1).timestamp() * 1000),
    end_date=int(datetime.datetime(2024, 3, 31).timestamp() * 1000),
    limit=50
)

retriever = EgnyteRetriever(
    domain=egnyte_domain,
    search_options=date_range_options
)

retriever.invoke("meeting notes", egnyte_user_token=egnyte_user_token)

[Document(metadata={'source': 'https://company.egnyte.com/pubapi/v1/search/content', 'title': 'Meeting_Notes_March.docx', 'path': '/Shared/Meetings/Meeting_Notes_March.docx', 'created_date': '2024-03-20T16:30:00Z', 'created_by': 'alice.johnson'}, page_content='Meeting Notes - March 20, 2024\n\nAttendees: Alice, Bob, Carol\nAgenda: Q1 Review\n\nKey Points:\n- Q1 targets exceeded by 12%\n- New client onboarding process\n- Team expansion plans for Q2')]
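
Note that `datetime.datetime(2024, 3, 31)` is midnight at the start of March 31, so files created later that day may fall outside the range. The sketch below computes bounds that cover a whole calendar month in UTC. `month_bounds_ms` is a hypothetical helper, and whether the API treats the end bound as inclusive is an assumption to verify against Egnyte's documentation.

```python
import calendar
from datetime import datetime, timedelta, timezone


def month_bounds_ms(year: int, month: int):
    """Return (start_ms, end_ms) covering a whole calendar month in UTC."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    one_ms = timedelta(milliseconds=1)
    days_in_month = calendar.monthrange(year, month)[1]
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # Last millisecond of the month's final day
    end = start + timedelta(days=days_in_month) - one_ms
    return (start - epoch) // one_ms, (end - epoch) // one_ms


march_start, march_end = month_bounds_ms(2024, 3)
```

These values could be passed as `start_date` and `end_date` to `create_date_range_search_options`.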

## Use within a chain

Like other retrievers, `EgnyteRetriever` can be incorporated into LLM applications via `chains`.

We will need an LLM or chat model:

import ChatModelTabs from "@theme/ChatModelTabs";

<ChatModelTabs customVarName="llm" />


In [11]:
openai_key = getpass.getpass("Enter your OpenAI key: ")

Enter your OpenAI key:  ········


In [12]:
# | output: false
# | echo: false

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, openai_api_key=openai_key)

In [13]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Create search options for financial documents
financial_search_options = EgnyteSearchOptions(
    limit=100,
    folderPath="/Shared/Finance",
    createdAfter=int(datetime.datetime(2024, 1, 1).timestamp() * 1000)
)

retriever = EgnyteRetriever(
    domain=egnyte_domain,
    search_options=financial_search_options
)

question = "What were the key financial highlights for Q4?"

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

    Context: {context}

    Question: {question}"""
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


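# Wrap the retriever call so the user token is passed on every request
def retrieve_with_token(query):
    return retriever.invoke(query, egnyte_user_token=egnyte_user_token)


# Two plain functions cannot be combined with |, so wrap the first in a RunnableLambda
from langchain_core.runnables import RunnableLambda

chain = (
    {
        "context": RunnableLambda(retrieve_with_token) | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)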

In [14]:
chain.invoke(question)

'Based on the Q4 Financial Report, the key financial highlights were:\n\n- **Revenue**: $2.5M\n- **Net Income**: $700K\n- **Growth**: 15% increase in recurring revenue\n- **Product Success**: Successful product launch in Q4\n- **Expansion**: Successfully expanded into new markets\n\nThe company showed strong performance with healthy revenue growth and successful market expansion initiatives.'
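
With a `limit` of 100, joining every document's full text can produce a very long prompt. As a sketch of one alternative to `format_docs`, the function below labels each document with its `path` metadata and stops once a character budget is reached. `format_docs_with_sources` and the `max_chars` default are hypothetical choices, and `SimpleNamespace` objects stand in for real `Document` instances.

```python
from types import SimpleNamespace


def format_docs_with_sources(docs, max_chars=4000):
    """Join documents into one context string, labeling each with its path.

    The first document is always kept; later ones are dropped once the
    character budget would be exceeded.
    """
    parts = []
    used = 0
    for doc in docs:
        block = f"[{doc.metadata.get('path', 'unknown path')}]\n{doc.page_content}"
        if parts and used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)


# Stand-ins for retrieved documents
sample_docs = [
    SimpleNamespace(metadata={"path": "/a.txt"}, page_content="alpha"),
    SimpleNamespace(metadata={"path": "/b.txt"}, page_content="beta"),
]

full_context = format_docs_with_sources(sample_docs)
short_context = format_docs_with_sources(sample_docs, max_chars=15)
```

Including the path in the context also lets the prompt ask the model to cite which file an answer came from.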

## Use as an agent tool

Like other retrievers, `EgnyteRetriever` can also be added to an agent as a tool. The example below uses LangChain's `AgentExecutor`.

In [15]:
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.tools import Tool

In [16]:
# Create search options for the agent
agent_search_options = EgnyteSearchOptions(
    limit=50,
    folderPath="/Shared"
)

retriever = EgnyteRetriever(
    domain=egnyte_domain,
    search_options=agent_search_options
)

# Wrap the retriever so the user token is passed on every call
def egnyte_search_with_token(query: str) -> str:
    """Search Egnyte documents and return relevant content."""
    docs = retriever.invoke(query, egnyte_user_token=egnyte_user_token)
    return "\n\n".join([f"Title: {doc.metadata.get('title', 'Unknown')}\nContent: {doc.page_content}" for doc in docs])

# Build the tool from the wrapper so that each agent call includes the token
egnyte_search_tool = Tool.from_function(
    func=egnyte_search_with_token,
    name="egnyte_search_tool",
    description="This tool searches Egnyte documents and retrieves relevant content based on the query. Use this to find documents, reports, and other files stored in Egnyte.",
)

tools = [egnyte_search_tool]

In [17]:
prompt = hub.pull("hwchase17/openai-tools-agent")

llm = ChatOpenAI(temperature=0, openai_api_key=openai_key)

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

In [18]:
result = agent_executor.invoke(
    {
        "input": "Find all documents related to marketing strategy and summarize the key objectives"
    }
)

In [19]:
print(result['output'])

Based on the marketing strategy document found, here are the key objectives for 2024:

**Main Objectives:**
1. **Increase brand awareness by 30%** - Focus on expanding market recognition
2. **Generate 500 qualified leads per month** - Improve lead generation capabilities
3. **Launch new product line** - Introduce new offerings to the market

**Key Initiatives to achieve these objectives:**
- Digital marketing campaign
- Content marketing strategy
- Partnership development

The strategy appears to be comprehensive, focusing on both brand building and lead generation while supporting business growth through new product launches and strategic partnerships.


## Error Handling

The Egnyte package exposes specific exception types for common failure scenarios, so you can handle each case separately:

In [20]:
from langchain_egnyte import (
    AuthenticationError,
    AuthorizationError,
    NotFoundError,
    RateLimitError,
    ServerError
)

try:
    retriever = EgnyteRetriever(domain=egnyte_domain)
    documents = retriever.invoke("test query", egnyte_user_token="invalid_token")
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except AuthorizationError as e:
    print(f"Access denied: {e}")
except NotFoundError as e:
    print(f"Resource not found: {e}")
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except ServerError as e:
    print(f"Server error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
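
Rate-limit and server errors are often transient, so retrying after a pause may succeed where printing the error would give up. The following is a generic sketch of exponential backoff, not a feature of `langchain-egnyte`. `call_with_backoff` and `TransientError` are hypothetical names, and the attempt count and delays are illustrative defaults.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable error such as a rate-limit or server error."""


def call_with_backoff(fn, retry_on=(TransientError,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


# Demo: a function that fails twice, then succeeds
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("try again")
    return "ok"


# Record the delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
```

With the retriever, `fn` could be `lambda: retriever.invoke(query, egnyte_user_token=egnyte_user_token)` and `retry_on` could be `(RateLimitError, ServerError)`. Authentication and authorization failures are left out because retrying them with the same token is unlikely to help.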

## API reference

For detailed documentation of all `EgnyteRetriever` features and configurations, head to the [API reference](https://python.langchain.com/api_reference).


## Help

If you have questions, you can check out our [developer documentation](https://developers.egnyte.com) or reach out to us through [Egnyte Support](https://helpdesk.egnyte.com).