HR Document Reader

AI-powered document processing system for HR documents using AWS Bedrock, Strands Agents, and serverless architecture.

🎯 Overview

This project provides an automated solution for extracting structured data from HR documents (CVs, payrolls, contracts, lab results) using:

  • AWS Bedrock with Nova Lite model for AI processing
  • Strands Agents SDK for agentic workflows
  • API Gateway for secure document uploads
  • App Runner for containerized application hosting
  • S3 + DynamoDB for storage
  • Langfuse for observability (optional)

πŸ—οΈ Architecture

Client β†’ API Gateway (API Key validation)
         ↓
    Lambda (generates presigned URL)
         ↓
    Client uploads to S3
         ↓
    S3 Event β†’ Lambda β†’ App Runner (Strands Agent)
         ↓
    Results saved to DynamoDB
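The first hop in this flow, the Lambda that issues a presigned upload URL, can be sketched roughly as below. This is a minimal illustration, not the project's actual lambda/presigned_url code: the `UPLOAD_BUCKET` env var, the key layout, and the helper names are assumptions, while the 300-second expiry matches the 5-minute limit noted under Security.

```python
import json
import os
import uuid


def build_object_key(document_type: str, filename: str) -> str:
    """Build a unique S3 key like 'lab/<uuid>-test.pdf' (layout is an assumption)."""
    return f"{document_type}/{uuid.uuid4().hex}-{filename}"


def handler(event, context):
    """Hypothetical presigned-URL Lambda: parse the request, return an upload URL."""
    import boto3  # imported lazily so the key-building logic is testable without AWS

    body = json.loads(event["body"])
    key = build_object_key(body["document_type"], body["filename"])
    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "put_object",
        Params={
            "Bucket": os.environ["UPLOAD_BUCKET"],  # assumed env var name
            "Key": key,
            "ContentType": body["content_type"],
        },
        ExpiresIn=300,  # 5 minutes, as noted under Security
    )
    return {"statusCode": 200,
            "body": json.dumps({"upload_url": url, "document_key": key})}
```

The client then PUTs the file directly to S3 with that URL, and the S3 event triggers processing without the document ever passing through the API.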

πŸ“‹ Prerequisites

  • AWS Account with Bedrock access
  • AWS CLI configured
  • Node.js 18+ (for CDK)
  • Python 3.11+
  • Docker
  • jq (for client script)

πŸš€ Quick Start

1. Clone and Setup

git clone <repository-url>
cd hr-reader-simple
cp .env.example .env
# Edit .env with your AWS profile and Langfuse keys (optional)

2. Deploy Infrastructure

# Install dependencies
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Deploy with Langfuse (optional)
./scripts/deploy/deploy_with_langfuse.sh

# Or deploy without Langfuse
./scripts/deploy/deploy.sh

3. Create API Key

./scripts/service/manage_api_keys.sh create your-user-id
# Save the generated API key

4. Test the System

export API_URL="<your-api-gateway-url>"
export API_KEY="<your-api-key>"

# Test with lab results
./scripts/test/client_example.sh lab tests/sample_lab_result.txt

# Test with physical evaluation
./scripts/test/client_example.sh evaluacion_fisica tests/sample_evaluacion_fisica.txt

πŸ“ Project Structure

hr-reader-simple/
β”œβ”€β”€ app/                    # Application code
β”‚   β”œβ”€β”€ agent.py           # Strands agent configuration
β”‚   β”œβ”€β”€ api.py             # FastAPI endpoints
β”‚   β”œβ”€β”€ config.py          # Settings management
β”‚   β”œβ”€β”€ observability.py   # Langfuse integration
β”‚   └── tools.py           # Agent tools (S3, DynamoDB, Textract)
β”œβ”€β”€ cdk/                    # Infrastructure as Code
β”‚   β”œβ”€β”€ app.py             # CDK app entry point
β”‚   └── stacks/            # CloudFormation stacks
β”œβ”€β”€ lambda/                 # Lambda functions
β”‚   β”œβ”€β”€ presigned_url/     # Generate upload URLs
β”‚   β”œβ”€β”€ process_document/  # Trigger processing
β”‚   └── s3_notification_config/  # S3 event configuration
└── scripts/                # Utility scripts
    β”œβ”€β”€ deploy/            # Deployment scripts
    β”‚   β”œβ”€β”€ deploy.sh
    β”‚   └── deploy_with_langfuse.sh
    β”œβ”€β”€ service/           # Service management
    β”‚   β”œβ”€β”€ manage_api_keys.sh
    β”‚   β”œβ”€β”€ pause.sh
    β”‚   └── resume.sh
    └── test/              # Testing scripts
        β”œβ”€β”€ client_example.sh
        β”œβ”€β”€ client_test.sh
        └── demo_video.sh

πŸ”§ Configuration

Environment Variables (.env)

# AWS
AWS_REGION=us-east-1
AWS_PROFILE=your-profile

# Bedrock
MODEL_ID=us.amazon.nova-lite-v1:0

# Langfuse (Optional)
LANGFUSE_ENABLED=true
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
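These variables could be loaded into a typed settings object along the lines below. This is a sketch only; the real app/config.py may use a different mechanism (e.g. pydantic), and the field names simply mirror the variables above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    aws_region: str
    model_id: str
    langfuse_enabled: bool
    langfuse_host: str


def load_settings() -> Settings:
    """Read configuration from the environment, defaulting to the values shown in .env above."""
    return Settings(
        aws_region=os.environ.get("AWS_REGION", "us-east-1"),
        model_id=os.environ.get("MODEL_ID", "us.amazon.nova-lite-v1:0"),
        langfuse_enabled=os.environ.get("LANGFUSE_ENABLED", "false").lower() == "true",
        langfuse_host=os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com"),
    )
```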

Supported Document Types

  • lab - Laboratory results (blood tests, biochemistry, etc.)
    • Example: tests/sample_lab_result.txt
  • evaluacion_fisica - Physical evaluation (anthropometry, vital signs, body composition)
    • Example: tests/sample_evaluacion_fisica.txt
  • cv - Curriculum Vitae
  • nomina - Payroll documents
  • contrato - Employment contracts

πŸ“Š Observability

The system integrates with Langfuse for end-to-end observability of the agent pipeline:

  • LLM call tracing with token usage and costs
  • Tool execution monitoring
  • Agent reasoning cycles
  • Performance metrics

Enable it by setting LANGFUSE_ENABLED=true in .env and deploying with ./scripts/deploy/deploy_with_langfuse.sh.

πŸ’° Cost Management

Pause Service (saves compute costs)

./scripts/service/pause.sh

Resume Service

./scripts/service/resume.sh

Destroy All Resources

cd cdk
cdk destroy --all

πŸ” Security

  • API Gateway with API key authentication
  • Presigned URLs with 5-minute expiration
  • App Runner not publicly accessible
  • S3 bucket encryption enabled
  • DynamoDB encryption at rest

πŸ“ API Usage

1. Request Upload URL

curl -X POST $API_URL/upload \
  -H "x-api-key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "lab",
    "filename": "test.pdf",
    "content_type": "application/pdf"
  }'
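The same request can be made from Python using only the standard library. The endpoint path and payload fields come from the curl call above; the helper name is illustrative.

```python
import json
import urllib.request


def build_upload_request(api_url: str, api_key: str, document_type: str,
                         filename: str, content_type: str) -> urllib.request.Request:
    """Build the POST /upload request shown above (constructs it, does not send it)."""
    payload = {
        "document_type": document_type,
        "filename": filename,
        "content_type": content_type,
    }
    return urllib.request.Request(
        f"{api_url}/upload",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending it (requires a deployed stack):
# with urllib.request.urlopen(build_upload_request(API_URL, API_KEY,
#                             "lab", "test.pdf", "application/pdf")) as resp:
#     upload_info = json.load(resp)
```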

2. Upload Document

curl -X PUT "<presigned_url>" \
  -H "Content-Type: application/pdf" \
  --data-binary "@document.pdf"

3. Check Results

# Get specific document results
curl $APPRUNNER_URL/results/<document_key>

# List all results not yet retrieved
curl $APPRUNNER_URL/results
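Because processing happens asynchronously after the S3 upload, a client typically polls the results endpoint until the document has been processed. A small retry loop, written with an injectable fetch function so it can run without a deployed stack (all names here are illustrative):

```python
import time
from typing import Callable, Optional


def wait_for_result(fetch: Callable[[str], Optional[dict]], document_key: str,
                    attempts: int = 10, delay_seconds: float = 3.0) -> Optional[dict]:
    """Poll fetch(document_key) until it returns a result or attempts run out.

    `fetch` would wrap a GET to $APPRUNNER_URL/results/<document_key>,
    returning None while processing is still in flight.
    """
    for _ in range(attempts):
        result = fetch(document_key)
        if result is not None:
            return result
        time.sleep(delay_seconds)
    return None
```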

πŸ› οΈ Development

Local Testing

# Run locally
source venv/bin/activate
uvicorn app.api:app --reload

Update Infrastructure

./scripts/deploy/deploy_with_langfuse.sh

πŸ“š Additional Documentation

🀝 Contributing

This project was created for a hackathon. Feel free to fork it and adapt it to your needs.

πŸ“„ License

MIT License - See LICENSE file for details

πŸ™ Acknowledgments

  • AWS for Bedrock and infrastructure services
  • Strands team for the Agents SDK
  • Langfuse for observability platform
