# Enterprise Workflows with Egnyte-LangChain

This notebook demonstrates production-oriented enterprise workflows built on the Egnyte-LangChain integration, using real-world business scenarios.

## Enterprise Use Cases Covered

1. **Document Intelligence Pipeline**: Automated document analysis and insights, including per-document risk levels
2. **Compliance Monitoring**: Regulatory document tracking and analysis
3. **Executive Reporting**: Automated executive summary generation
4. **Workflow Orchestration**: Running the workflows above together on a schedule

In [None]:
# Enterprise-grade imports
import os
import json
import logging
from datetime import datetime, timedelta
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
from enum import Enum

# LangChain enterprise components
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from langchain.callbacks import get_openai_callback

# Egnyte integration
from langchain_egnyte import (
    EgnyteRetriever, 
    EgnyteSearchOptions,
    create_date_range_search_options,
    create_folder_search_options
)

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print("🏢 Enterprise Workflows Setup Complete")

## 1. Document Intelligence Pipeline

Automated pipeline for document analysis and insight extraction:

In [None]:
class DocumentType(Enum):
    FINANCIAL = "financial"
    LEGAL = "legal"
    TECHNICAL = "technical"
    MARKETING = "marketing"
    HR = "hr"
    GENERAL = "general"

@dataclass
class DocumentInsight:
    document_name: str
    document_path: str
    document_type: DocumentType
    key_topics: List[str]
    sentiment: str
    risk_level: str
    action_items: List[str]
    summary: str
    confidence_score: float
    processing_timestamp: datetime

class DocumentIntelligencePipeline:
    """Enterprise document intelligence pipeline."""
    
    def __init__(self, egnyte_domain: str, egnyte_token: str, openai_key: str):
        self.retriever = EgnyteRetriever(domain=egnyte_domain, user_token=egnyte_token)
        self.llm = ChatOpenAI(model="gpt-4", temperature=0.1, openai_api_key=openai_key)
        
        # Document classification prompt
        self.classification_prompt = PromptTemplate(
            template="""
            Analyze this document and provide structured insights:
            
            Document: {document_name}
            Content: {content}
            
            Provide analysis in this JSON format:
            {{
                "document_type": "financial|legal|technical|marketing|hr|general",
                "key_topics": ["topic1", "topic2", "topic3"],
                "sentiment": "positive|neutral|negative",
                "risk_level": "low|medium|high",
                "action_items": ["action1", "action2"],
                "summary": "Brief summary of key points",
                "confidence_score": 0.95
            }}
            
            JSON Response:
            """,
            input_variables=["document_name", "content"]
        )
        
        self.analysis_chain = LLMChain(llm=self.llm, prompt=self.classification_prompt)
    
    def process_documents(self, query: str, max_docs: int = 10) -> List[DocumentInsight]:
        """Process documents through intelligence pipeline."""
        
        logger.info(f"Starting document intelligence pipeline for query: {query}")
        
        # Retrieve documents
        search_options = EgnyteSearchOptions(limit=max_docs)
        self.retriever.search_options = search_options
        documents = self.retriever.invoke(query)
        
        logger.info(f"Retrieved {len(documents)} documents for analysis")
        
        insights = []
        
        for doc in documents:
            try:
                # Analyze document
                analysis_result = self.analysis_chain.invoke({
                    "document_name": doc.metadata.get('name', 'Unknown'),
                    "content": doc.page_content[:4000]  # Limit content for API
                })
                
                # Parse JSON response
                analysis_data = json.loads(analysis_result['text'])
                
                # Create insight object
                insight = DocumentInsight(
                    document_name=doc.metadata.get('name', 'Unknown'),
                    document_path=doc.metadata.get('path', 'Unknown'),
                    document_type=DocumentType(analysis_data['document_type']),
                    key_topics=analysis_data['key_topics'],
                    sentiment=analysis_data['sentiment'],
                    risk_level=analysis_data['risk_level'],
                    action_items=analysis_data['action_items'],
                    summary=analysis_data['summary'],
                    confidence_score=analysis_data['confidence_score'],
                    processing_timestamp=datetime.now()
                )
                
                insights.append(insight)
                logger.info(f"Processed: {insight.document_name} - Type: {insight.document_type.value}")
                
            except Exception as e:
                logger.error(f"Error processing document {doc.metadata.get('name', 'Unknown')}: {e}")
                continue
        
        logger.info(f"Pipeline completed. Generated {len(insights)} insights")
        return insights
    
    def generate_intelligence_report(self, insights: List[DocumentInsight]) -> Dict[str, Any]:
        """Generate comprehensive intelligence report."""
        
        # Aggregate insights
        doc_types = {}
        risk_distribution = {"low": 0, "medium": 0, "high": 0}
        sentiment_distribution = {"positive": 0, "neutral": 0, "negative": 0}
        all_topics = []
        all_actions = []
        
        for insight in insights:
            # Document types
            doc_type = insight.document_type.value
            doc_types[doc_type] = doc_types.get(doc_type, 0) + 1
            
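            # Risk distribution (tolerate labels outside the expected set instead of raising KeyError)
            risk_distribution[insight.risk_level] = risk_distribution.get(insight.risk_level, 0) + 1
            
            # Sentiment distribution (same tolerance for unexpected labels)
            sentiment_distribution[insight.sentiment] = sentiment_distribution.get(insight.sentiment, 0) + 1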
            
            # Collect topics and actions
            all_topics.extend(insight.key_topics)
            all_actions.extend(insight.action_items)
        
        # Calculate averages
        avg_confidence = sum(i.confidence_score for i in insights) / len(insights) if insights else 0
        
        return {
            "report_timestamp": datetime.now().isoformat(),
            "total_documents": len(insights),
            "document_types": doc_types,
            "risk_distribution": risk_distribution,
            "sentiment_distribution": sentiment_distribution,
            "average_confidence": avg_confidence,
            # Most frequent topics first; ties are broken alphabetically for stable output
            "top_topics": sorted(set(all_topics), key=lambda t: (-all_topics.count(t), t))[:10],
            # De-duplicate while keeping first-seen order
            "action_items": list(dict.fromkeys(all_actions)),
            "high_risk_documents": [
                {"name": i.document_name, "path": i.document_path, "summary": i.summary}
                for i in insights if i.risk_level == "high"
            ]
        }

print("Document Intelligence Pipeline created")
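
`process_documents` passes the model output straight to `json.loads`, which fails when the model wraps the JSON in a Markdown fence or adds surrounding prose. The sketch below shows one way to make that parsing step more tolerant. `extract_json` is a hypothetical helper and is not part of LangChain or `langchain_egnyte`; it would be called in place of `json.loads(analysis_result['text'])`.

```python
import json
import re
from typing import Any, Dict

# Hypothetical helper for illustration only.
# Matches a leading fence (optionally tagged "json") or a trailing fence.
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)


def extract_json(raw: str) -> Dict[str, Any]:
    """Parse a JSON object from model output, tolerating fences and stray prose."""
    text = _FENCE_RE.sub("", raw.strip())
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span when prose surrounds the object
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(text[start:end + 1])
```

Documents whose output still cannot be parsed raise `json.JSONDecodeError`, so the existing `except` block in the pipeline continues to log and skip them.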

In [None]:
# Initialize and run document intelligence pipeline
from dotenv import load_dotenv
load_dotenv()

pipeline = DocumentIntelligencePipeline(
    egnyte_domain=os.getenv("EGNYTE_DOMAIN"),
    egnyte_token=os.getenv("EGNYTE_USER_TOKEN"),
    openai_key=os.getenv("OPENAI_API_KEY")
)

# Process documents
query = "quarterly report financial analysis"
insights = pipeline.process_documents(query, max_docs=5)

# Generate intelligence report
intelligence_report = pipeline.generate_intelligence_report(insights)

print("DOCUMENT INTELLIGENCE REPORT")
print("=" * 50)
print(f"Total Documents Analyzed: {intelligence_report['total_documents']}")
print(f"Average Confidence Score: {intelligence_report['average_confidence']:.2f}")
print(f"\nDocument Types: {intelligence_report['document_types']}")
print(f"Risk Distribution: {intelligence_report['risk_distribution']}")
print(f"Sentiment Distribution: {intelligence_report['sentiment_distribution']}")
print(f"\nTop Topics: {intelligence_report['top_topics'][:5]}")
print(f"\nAction Items ({len(intelligence_report['action_items'])}):")
for action in intelligence_report['action_items'][:3]:
    print(f"  • {action}")

## 2. Compliance Monitoring Workflow

Automated compliance monitoring and regulatory document tracking:

In [None]:
class ComplianceRule:
    def __init__(self, name: str, description: str, keywords: List[str], severity: str):
        self.name = name
        self.description = description
        self.keywords = keywords
        self.severity = severity  # low, medium, high, critical

@dataclass
class ComplianceViolation:
    rule_name: str
    document_name: str
    document_path: str
    violation_text: str
    severity: str
    confidence: float
    detected_at: datetime
    remediation_suggestion: str

class ComplianceMonitor:
    """Enterprise compliance monitoring system."""
    
    def __init__(self, egnyte_domain: str, egnyte_token: str, openai_key: str):
        self.retriever = EgnyteRetriever(domain=egnyte_domain, user_token=egnyte_token)
        self.llm = ChatOpenAI(model="gpt-4", temperature=0, openai_api_key=openai_key)
        
        # Define compliance rules
        self.compliance_rules = [
            ComplianceRule(
                name="PII_EXPOSURE",
                description="Personally Identifiable Information exposure",
                keywords=["SSN", "social security", "credit card", "passport", "driver license"],
                severity="critical"
            ),
            ComplianceRule(
                name="FINANCIAL_DISCLOSURE",
                description="Improper financial information disclosure",
                keywords=["insider trading", "material information", "earnings", "confidential financial"],
                severity="high"
            ),
            ComplianceRule(
                name="DATA_RETENTION",
                description="Data retention policy violations",
                keywords=["delete after", "retention period", "archive", "permanent storage"],
                severity="medium"
            )
        ]
        
        # Compliance analysis prompt
        self.compliance_prompt = PromptTemplate(
            template="""
            Analyze this document for compliance violations based on the given rule:
            
            Rule: {rule_name}
            Description: {rule_description}
            Keywords: {keywords}
            
            Document: {document_name}
            Content: {content}
            
            Provide analysis in JSON format:
            {{
                "violation_detected": true/false,
                "violation_text": "specific text that violates the rule",
                "confidence": 0.95,
                "remediation_suggestion": "specific steps to address the violation"
            }}
            
            JSON Response:
            """,
            input_variables=["rule_name", "rule_description", "keywords", "document_name", "content"]
        )
    
    def scan_documents(self, folder_path: Optional[str] = None, days_back: int = 30) -> List[ComplianceViolation]:
        """Scan documents for compliance violations."""
        
        logger.info(f"Starting compliance scan for folder: {folder_path}, days back: {days_back}")
        
        # Create search options
        search_options = create_date_range_search_options(
            created_after=datetime.now() - timedelta(days=days_back),
            limit=20
        )
        
        if folder_path:
            search_options.folder_path = folder_path
        
        self.retriever.search_options = search_options
        
        # Get recent documents
        documents = self.retriever.invoke("*")  # Get all recent documents
        logger.info(f"Scanning {len(documents)} documents for compliance")
        
        violations = []
        
        for doc in documents:
            for rule in self.compliance_rules:
                try:
                    # Check if document contains rule keywords
                    content_lower = doc.page_content.lower()
                    if any(keyword.lower() in content_lower for keyword in rule.keywords):
                        
                        # Detailed analysis with LLM
                        analysis = self.llm.invoke(
                            self.compliance_prompt.format(
                                rule_name=rule.name,
                                rule_description=rule.description,
                                keywords=", ".join(rule.keywords),
                                document_name=doc.metadata.get('name', 'Unknown'),
                                content=doc.page_content[:3000]
                            )
                        )
                        
                        # Parse analysis
                        analysis_data = json.loads(analysis.content)
                        
                        if analysis_data['violation_detected']:
                            violation = ComplianceViolation(
                                rule_name=rule.name,
                                document_name=doc.metadata.get('name', 'Unknown'),
                                document_path=doc.metadata.get('path', 'Unknown'),
                                violation_text=analysis_data['violation_text'],
                                severity=rule.severity,
                                confidence=analysis_data['confidence'],
                                detected_at=datetime.now(),
                                remediation_suggestion=analysis_data['remediation_suggestion']
                            )
                            violations.append(violation)
                            logger.warning(f"Compliance violation detected: {rule.name} in {doc.metadata.get('name')}")
                
                except Exception as e:
                    logger.error(f"Error analyzing document {doc.metadata.get('name')} for rule {rule.name}: {e}")
                    continue
        
        logger.info(f"Compliance scan completed. Found {len(violations)} violations")
        return violations
    
    def generate_compliance_report(self, violations: List[ComplianceViolation]) -> Dict[str, Any]:
        """Generate compliance report."""
        
        # Group violations by severity
        severity_groups = {"critical": [], "high": [], "medium": [], "low": []}
        rule_counts = {}
        
        for violation in violations:
            severity_groups[violation.severity].append(violation)
            rule_counts[violation.rule_name] = rule_counts.get(violation.rule_name, 0) + 1
        
        return {
            "report_timestamp": datetime.now().isoformat(),
            "total_violations": len(violations),
            "severity_breakdown": {
                severity: len(viols) for severity, viols in severity_groups.items()
            },
            "rule_breakdown": rule_counts,
            "critical_violations": [
                {
                    "rule": v.rule_name,
                    "document": v.document_name,
                    "path": v.document_path,
                    "confidence": v.confidence,
                    "remediation": v.remediation_suggestion
                }
                for v in severity_groups["critical"]
            ],
            "recommendations": self._generate_recommendations(violations)
        }
    
    def _generate_recommendations(self, violations: List[ComplianceViolation]) -> List[str]:
        """Generate compliance recommendations."""
        recommendations = []
        
        critical_count = sum(1 for v in violations if v.severity == "critical")
        if critical_count > 0:
            recommendations.append(f"URGENT: Address {critical_count} critical compliance violations immediately")
        
        pii_violations = sum(1 for v in violations if v.rule_name == "PII_EXPOSURE")
        if pii_violations > 0:
            recommendations.append(f"Implement PII scanning and redaction for {pii_violations} documents")
        
        recommendations.append("Schedule regular compliance training for document handling")
        recommendations.append("Implement automated compliance monitoring for new documents")
        
        return recommendations

print("Compliance Monitor created")
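
The keyword pre-filter in `scan_documents` uses plain substring matching, so a short keyword such as "SSN" also matches inside unrelated words (for example "lessness"), and each false hit costs an LLM call. The following is a minimal sketch of a whole-word alternative. `find_keyword_hits` is a hypothetical helper, not part of the original class.

```python
import re
from typing import Iterable, List


def find_keyword_hits(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords that occur in text as whole words or phrases (case-insensitive)."""
    hits = []
    for keyword in keywords:
        # The lookarounds reject matches embedded in a longer word
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(keyword)
    return hits
```

The trade-off is that inflected forms no longer match: "archive" will not trigger on "archived", so rule keywords would need to list the variants that matter.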

In [None]:
# Run compliance monitoring
compliance_monitor = ComplianceMonitor(
    egnyte_domain=os.getenv("EGNYTE_DOMAIN"),
    egnyte_token=os.getenv("EGNYTE_USER_TOKEN"),
    openai_key=os.getenv("OPENAI_API_KEY")
)

# Scan for violations
violations = compliance_monitor.scan_documents(days_back=7)

# Generate compliance report
compliance_report = compliance_monitor.generate_compliance_report(violations)

print("COMPLIANCE MONITORING REPORT")
print("=" * 50)
print(f"Total Violations Found: {compliance_report['total_violations']}")
print("\nSeverity Breakdown:")
for severity, count in compliance_report['severity_breakdown'].items():
    if count > 0:
        print(f"  {severity.upper()}: {count}")

print("\nRule Breakdown:")
for rule, count in compliance_report['rule_breakdown'].items():
    print(f"  {rule}: {count}")

if compliance_report['critical_violations']:
    print("\nCRITICAL VIOLATIONS:")
    for violation in compliance_report['critical_violations']:
        print(f"  • {violation['rule']} in {violation['document']}")
        print(f"    Confidence: {violation['confidence']:.2f}")
        print(f"    Action: {violation['remediation']}")

print("\nRECOMMENDATIONS:")
for rec in compliance_report['recommendations']:
    print(f"  • {rec}")
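
The report above lists violations in detection order and at any confidence level. One possible way to triage them is to drop low-confidence findings and sort the rest by severity, as sketched here. `Finding` is a stand-in carrying only the `ComplianceViolation` fields used, and both `prioritize` and the 0.7 threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Lower number means higher priority
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    """Stand-in with the ComplianceViolation fields this sketch needs."""
    rule_name: str
    severity: str
    confidence: float


def prioritize(findings: Iterable[Finding], min_confidence: float = 0.7) -> List[Finding]:
    """Keep findings at or above min_confidence, most severe and most confident first."""
    kept = [f for f in findings if f.confidence >= min_confidence]
    return sorted(
        kept,
        # Unknown severities sort after all known ones
        key=lambda f: (SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)), -f.confidence),
    )
```

Because the function only reads `severity` and `confidence`, the `violations` list returned by `scan_documents` could be passed in directly.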

## 3. Executive Reporting Workflow

Automated executive summary generation from enterprise documents:

In [None]:
class ExecutiveReportGenerator:
    """Generate executive reports from enterprise documents."""
    
    def __init__(self, egnyte_domain: str, egnyte_token: str, openai_key: str):
        self.retriever = EgnyteRetriever(domain=egnyte_domain, user_token=egnyte_token)
        self.llm = ChatOpenAI(model="gpt-4", temperature=0.2, openai_api_key=openai_key)
        
        # Executive summary prompt
        self.executive_prompt = PromptTemplate(
            template="""
            Create an executive summary from the following business documents.
            Focus on key insights, strategic implications, and actionable recommendations.
            
            Documents:
            {documents}
            
            Generate a comprehensive executive summary with:
            
            ## EXECUTIVE SUMMARY
            
            ### Key Highlights
            - [3-5 most important findings]
            
            ### Strategic Insights
            - [Strategic implications and trends]
            
            ### Financial Impact
            - [Financial implications and metrics]
            
            ### Risk Assessment
            - [Key risks and mitigation strategies]
            
            ### Recommendations
            - [Specific, actionable recommendations]
            
            ### Next Steps
            - [Immediate actions required]
            
            Keep the summary concise but comprehensive, suitable for C-level executives.
            """,
            input_variables=["documents"]
        )
    
    def generate_executive_report(
        self, 
        topic: str, 
        folder_paths: Optional[List[str]] = None,
        days_back: int = 30
    ) -> Dict[str, Any]:
        """Generate executive report for a specific topic."""
        
        logger.info(f"Generating executive report for topic: {topic}")
        
        all_documents = []
        
        # Search in specified folders or globally
        if folder_paths:
            for folder in folder_paths:
                search_options = create_folder_search_options(
                    folder_path=folder,
                    limit=10
                )
                search_options.created_after = datetime.now() - timedelta(days=days_back)
                
                self.retriever.search_options = search_options
                docs = self.retriever.invoke(topic)
                all_documents.extend(docs)
        else:
            search_options = create_date_range_search_options(
                created_after=datetime.now() - timedelta(days=days_back),
                limit=15
            )
            self.retriever.search_options = search_options
            all_documents = self.retriever.invoke(topic)
        
        logger.info(f"Retrieved {len(all_documents)} documents for executive report")
        
        # Prepare document content for analysis
        document_summaries = []
        for doc in all_documents[:10]:  # Limit to top 10 documents
            summary = f"""
            Document: {doc.metadata.get('name', 'Unknown')}
            Path: {doc.metadata.get('path', 'Unknown')}
            Content: {doc.page_content[:1500]}...
            """
            document_summaries.append(summary)
        
        # Generate executive summary
        with get_openai_callback() as cb:
            executive_summary = self.llm.invoke(
                self.executive_prompt.format(
                    documents="\n\n".join(document_summaries)
                )
            )
        
        # Compile report
        report = {
            "report_title": f"Executive Report: {topic.title()}",
            "generated_at": datetime.now().isoformat(),
            "topic": topic,
            # Only the first 10 retrieved documents are sent to the LLM above
            "documents_analyzed": min(len(all_documents), 10),
            "time_period": f"Last {days_back} days",
            "executive_summary": executive_summary.content,
            "source_documents": [
                {
                    "name": doc.metadata.get('name', 'Unknown'),
                    "path": doc.metadata.get('path', 'Unknown'),
                    "size": doc.metadata.get('size', 'Unknown'),
                    "modified": doc.metadata.get('last_modified', 'Unknown')
                }
                for doc in all_documents[:10]
            ],
            "api_usage": {
                "total_tokens": cb.total_tokens,
                "prompt_tokens": cb.prompt_tokens,
                "completion_tokens": cb.completion_tokens,
                "total_cost": cb.total_cost
            }
        }
        
        logger.info(f"Executive report generated. API cost: ${cb.total_cost:.4f}")
        return report
    
    def save_report(self, report: Dict[str, Any], filename: Optional[str] = None) -> str:
        """Save executive report to file."""
        
        if not filename:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            topic_clean = report['topic'].replace(' ', '_').lower()
            filename = f"executive_report_{topic_clean}_{timestamp}.json"
        
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(report, f, indent=2, default=str)
        
        logger.info(f"Executive report saved to {filename}")
        return filename

print("Executive Report Generator created")
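
`save_report` builds the default filename by replacing spaces in the topic, so a topic containing `/`, `:` or other punctuation could yield an invalid or unintended path. A minimal sketch of a stricter clean-up step follows. `slugify_topic` is a hypothetical helper, and the 60-character cap and the `"report"` fallback are arbitrary assumptions.

```python
import re


def slugify_topic(topic: str, max_length: int = 60) -> str:
    """Reduce a free-text topic to lowercase letters, digits, and underscores."""
    # Collapse every run of other characters into a single underscore
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    # Fall back to a fixed name if nothing usable remains
    return slug[:max_length].rstrip("_") or "report"
```

It could replace the `topic_clean = ...` line inside `save_report` without changing the rest of the method.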

In [None]:
# Generate executive report
exec_generator = ExecutiveReportGenerator(
    egnyte_domain=os.getenv("EGNYTE_DOMAIN"),
    egnyte_token=os.getenv("EGNYTE_USER_TOKEN"),
    openai_key=os.getenv("OPENAI_API_KEY")
)

# Generate report for quarterly performance
exec_report = exec_generator.generate_executive_report(
    topic="quarterly performance metrics",
    folder_paths=["/Shared/Reports", "/Shared/Finance"],
    days_back=90
)

print("EXECUTIVE REPORT GENERATED")
print("=" * 60)
print(f"Title: {exec_report['report_title']}")
print(f"Documents Analyzed: {exec_report['documents_analyzed']}")
print(f"Time Period: {exec_report['time_period']}")
print(f"API Cost: ${exec_report['api_usage']['total_cost']:.4f}")
print("\n" + "=" * 60)
print(exec_report['executive_summary'])
print("\n" + "=" * 60)
print(f"\nSource Documents ({len(exec_report['source_documents'])}):")
for i, doc in enumerate(exec_report['source_documents'][:5], 1):
    print(f"{i}. {doc['name']} - {doc['path']}")

# Save report
report_file = exec_generator.save_report(exec_report)
print(f"\nReport saved to: {report_file}")

## 4. Enterprise Workflow Orchestration

Orchestrate multiple workflows for comprehensive enterprise automation:

In [None]:
class EnterpriseWorkflowOrchestrator:
    """Orchestrate multiple enterprise workflows."""
    
    def __init__(self, egnyte_domain: str, egnyte_token: str, openai_key: str):
        self.domain = egnyte_domain
        self.token = egnyte_token
        self.openai_key = openai_key
        
        # Initialize all workflow components
        self.doc_intelligence = DocumentIntelligencePipeline(egnyte_domain, egnyte_token, openai_key)
        self.compliance_monitor = ComplianceMonitor(egnyte_domain, egnyte_token, openai_key)
        self.exec_generator = ExecutiveReportGenerator(egnyte_domain, egnyte_token, openai_key)
    
    def run_daily_workflow(self) -> Dict[str, Any]:
        """Run comprehensive daily enterprise workflow."""
        
        logger.info("Starting daily enterprise workflow")
        workflow_start = datetime.now()
        
        results = {
            "workflow_start": workflow_start.isoformat(),
            "status": "running",
            "components": {}
        }
        
        try:
            # 1. Document Intelligence Analysis
            logger.info("Running document intelligence analysis...")
            doc_insights = self.doc_intelligence.process_documents("daily reports", max_docs=15)
            intelligence_report = self.doc_intelligence.generate_intelligence_report(doc_insights)
            
            results["components"]["document_intelligence"] = {
                "status": "completed",
                "insights_generated": len(doc_insights),
                "high_risk_docs": len(intelligence_report['high_risk_documents']),
                "avg_confidence": intelligence_report['average_confidence']
            }
            
            # 2. Compliance Monitoring
            logger.info("Running compliance monitoring...")
            violations = self.compliance_monitor.scan_documents(days_back=1)  # Daily scan
            compliance_report = self.compliance_monitor.generate_compliance_report(violations)
            
            results["components"]["compliance_monitoring"] = {
                "status": "completed",
                "violations_found": len(violations),
                "critical_violations": compliance_report['severity_breakdown']['critical'],
                "recommendations": len(compliance_report['recommendations'])
            }
            
            # 3. Executive Reporting (weekly)
            if workflow_start.weekday() == 0:  # Monday
                logger.info("Generating weekly executive report...")
                exec_report = self.exec_generator.generate_executive_report(
                    topic="weekly business summary",
                    days_back=7
                )
                
                results["components"]["executive_reporting"] = {
                    "status": "completed",
                    "documents_analyzed": exec_report['documents_analyzed'],
                    "api_cost": exec_report['api_usage']['total_cost']
                }
            else:
                results["components"]["executive_reporting"] = {
                    "status": "skipped",
                    "reason": "Weekly report only generated on Mondays"
                }
            
            # 4. Generate Workflow Summary
            workflow_summary = self._generate_workflow_summary(results)
            results["workflow_summary"] = workflow_summary
            
            results["status"] = "completed"
            results["workflow_end"] = datetime.now().isoformat()
            results["duration_minutes"] = (datetime.now() - workflow_start).total_seconds() / 60
            
            logger.info(f"Daily workflow completed in {results['duration_minutes']:.2f} minutes")
            
        except Exception as e:
            logger.error(f"Workflow failed: {e}")
            results["status"] = "failed"
            results["error"] = str(e)
            results["workflow_end"] = datetime.now().isoformat()
        
        return results
    
    def _generate_workflow_summary(self, results: Dict[str, Any]) -> str:
        """Generate human-readable workflow summary."""
        
        summary_parts = []
        
        # Document Intelligence Summary
        doc_intel = results["components"].get("document_intelligence", {})
        if doc_intel.get("status") == "completed":
            summary_parts.append(
                f"📊 Document Intelligence: Analyzed {doc_intel['insights_generated']} documents, "
                f"found {doc_intel['high_risk_docs']} high-risk items "
                f"(avg confidence: {doc_intel['avg_confidence']:.2f})"
            )
        
        # Compliance Summary
        compliance = results["components"].get("compliance_monitoring", {})
        if compliance.get("status") == "completed":
            summary_parts.append(
                f"🛡️ Compliance: Found {compliance['violations_found']} violations, "
                f"{compliance['critical_violations']} critical issues requiring immediate attention"
            )
        
        # Executive Reporting Summary
        exec_report = results["components"].get("executive_reporting", {})
        if exec_report.get("status") == "completed":
            summary_parts.append(
                f"Executive Report: Generated from {exec_report['documents_analyzed']} documents "
                f"(API cost: ${exec_report['api_cost']:.4f})"
            )
        elif exec_report.get("status") == "skipped":
            summary_parts.append("Executive Report: Skipped (weekly schedule)")
        
        return "\n".join(summary_parts)
    
    def schedule_workflow(self, schedule_type: str = "daily") -> str:
        """Generate cron schedule for workflow automation."""
        
        schedules = {
            "daily": "0 6 * * *",  # 6 AM daily
            "hourly": "0 * * * *",  # Every hour, on the hour
            "weekly": "0 6 * * 1",  # 6 AM Monday
            "business_hours": "0 9-17 * * 1-5"  # Hourly from 9 AM through 5 PM, weekdays
        }
        
        return schedules.get(schedule_type, schedules["daily"])

print("Enterprise Workflow Orchestrator created")

In [None]:
# Run enterprise workflow
orchestrator = EnterpriseWorkflowOrchestrator(
    egnyte_domain=os.getenv("EGNYTE_DOMAIN"),
    egnyte_token=os.getenv("EGNYTE_USER_TOKEN"),
    openai_key=os.getenv("OPENAI_API_KEY")
)

# Execute daily workflow
workflow_results = orchestrator.run_daily_workflow()

print("ENTERPRISE WORKFLOW RESULTS")
print("=" * 60)
print(f"Status: {workflow_results['status'].upper()}")
print(f"Duration: {workflow_results.get('duration_minutes', 0):.2f} minutes")
print("\nWorkflow Summary:")
print(workflow_results.get('workflow_summary', 'No summary available'))

print(f"\n🔧 Component Status:")
for component, details in workflow_results['components'].items():
    status_emoji = "✅" if details['status'] == 'completed' else "⏭️" if details['status'] == 'skipped' else "❌"
    print(f"  {status_emoji} {component.replace('_', ' ').title()}: {details['status']}")

# Generate cron schedule
cron_schedule = orchestrator.schedule_workflow("daily")
print(f"\nCron Schedule (daily): {cron_schedule}")
print("   Add this to your crontab for automated execution")
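
The strings returned by `schedule_workflow` are standard five-field cron expressions: minute, hour, day of month, month, day of week. To make them easier to read and sanity-check, the sketch below splits an expression into named fields. `split_cron` is a hypothetical helper and performs no validation beyond counting fields.

```python
from typing import Dict

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> Dict[str, str]:
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"Expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))
```

For example, `split_cron("0 9-17 * * 1-5")` shows minute `0`, hour range `9-17`, and weekdays `1-5`, which means one run at the top of each hour from 09:00 through 17:00, Monday to Friday.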

## Production Deployment Considerations

### Infrastructure Requirements:

1. **Compute Resources**: 4+ CPU cores, 8GB+ RAM for concurrent processing
2. **Storage**: Persistent storage for reports and logs
3. **Network**: Reliable internet for Egnyte and OpenAI API calls
4. **Security**: Secure credential management (AWS Secrets Manager, Azure Key Vault)
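
Whichever secret store is used, the workflows in this notebook read their credentials from environment variables through `os.getenv`, which silently returns `None` when a variable is missing. A small start-up check, sketched below, fails fast instead. `load_required_env` is a hypothetical helper; the variable names are the ones used in the cells above.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

REQUIRED_VARS = ("EGNYTE_DOMAIN", "EGNYTE_USER_TOKEN", "OPENAI_API_KEY")


def load_required_env(
    names: Iterable[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the named variables, raising if any is unset or empty."""
    env = os.environ if environ is None else environ
    names = tuple(names)
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

The optional `environ` argument exists only so the check can be exercised without touching the real process environment.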

### Monitoring & Alerting:

1. **Health Checks**: Monitor API availability and response times
2. **Error Tracking**: Comprehensive error logging and alerting
3. **Cost Monitoring**: Track OpenAI API usage and costs
4. **Compliance Alerts**: Immediate notifications for critical violations
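
Transient API failures are a common source of the errors these checks would surface. As a rough illustration, the sketch below retries a call with exponential backoff before giving up. `call_with_retry` is a hypothetical helper, and the attempt count and delays are assumptions, not values recommended by Egnyte or OpenAI.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with delays of base_delay * 2**n."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the final attempt has failed
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

A retriever call could be wrapped as `call_with_retry(lambda: retriever.invoke(query))`. In practice the `except` clause would usually be narrowed to the error types that are actually worth retrying.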

### Scalability:

1. **Horizontal Scaling**: Multiple worker instances for large document volumes
2. **Queue Management**: Redis/RabbitMQ for job queuing
3. **Caching**: Redis for API response caching
4. **Load Balancing**: Distribute requests across instances
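
Before moving to separate workers and a message queue, the per-document LLM calls can often be parallelised inside a single process, since they are I/O-bound. The sketch below uses a thread pool from the standard library. `process_concurrently` is a hypothetical helper, and the default of four workers is an arbitrary assumption that should respect API rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_concurrently(
    items: Iterable[T],
    worker: Callable[[T], R],
    max_workers: int = 4,
) -> List[R]:
    """Apply worker to each item on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order; a worker exception propagates here
        return list(pool.map(worker, items))
```

An exception raised by any worker aborts the whole batch, so a worker that should tolerate individual failures needs its own `try`/`except`, as the sequential loops above already have.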

### Security Best Practices:

1. **Credential Rotation**: Regular API key rotation
2. **Network Security**: VPC/firewall configuration
3. **Data Encryption**: Encrypt sensitive data at rest and in transit
4. **Audit Logging**: Comprehensive audit trail for compliance
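
One concrete risk in these workflows is that `ComplianceViolation.violation_text` holds the offending passage verbatim, so reports and logs can end up copying the very PII they flag. The sketch below masks two common patterns before text is stored or logged. `redact_pii` is a hypothetical helper, and the patterns (US SSN format and 16-digit card numbers in groups of four) are illustrative only, not an exhaustive PII detector.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage
_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
_CARD_RE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace SSN-like and 16-digit card-like numbers with placeholders."""
    text = _SSN_RE.sub("[REDACTED-SSN]", text)
    return _CARD_RE.sub("[REDACTED-CARD]", text)
```

Applying it to `violation_text` and to log messages keeps the audit trail useful without duplicating the sensitive values.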

## Next Steps

- **[Multi-Modal Analysis](04-multimodal-analysis.ipynb)**: Working with different document types
- **[Security & Compliance](05-security-compliance.ipynb)**: Advanced security patterns
- **[Performance Optimization](06-performance-optimization.ipynb)**: Scaling for enterprise workloads