HR Document Reader

AI-powered document processing system for HR documents using AWS Bedrock, Strands Agents, and serverless architecture.

🎯 Overview

This project provides an automated solution for extracting structured data from HR documents (CVs, payrolls, contracts, lab results) using:

  • AWS Bedrock with Nova Lite model for AI processing
  • Strands Agents SDK for agentic workflows
  • API Gateway for secure document uploads
  • App Runner for containerized application hosting
  • S3 + DynamoDB for storage
  • Langfuse for observability (optional)

πŸ—οΈ Architecture

Client β†’ API Gateway (API Key validation)
         ↓
    Lambda (generates presigned URL)
         ↓
    Client uploads to S3
         ↓
    S3 Event β†’ Lambda β†’ App Runner (Strands Agent)
         ↓
    Results saved to DynamoDB
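The first hop in this flow, the Lambda that issues a presigned upload URL, can be sketched roughly as below. This is a minimal illustration, not the project's actual lambda/presigned_url code: the `UPLOAD_BUCKET` env var, the key layout, and the helper names are assumptions, while the 300-second expiry matches the 5-minute limit noted under Security.

```python
import json
import os
import uuid


def build_object_key(document_type: str, filename: str) -> str:
    """Build a unique S3 key like 'lab/<uuid>-test.pdf' (layout is an assumption)."""
    return f"{document_type}/{uuid.uuid4().hex}-{filename}"


def handler(event, context):
    """Hypothetical presigned-URL Lambda: parse the request, return an upload URL."""
    import boto3  # imported lazily so the key-building logic is testable without AWS

    body = json.loads(event["body"])
    key = build_object_key(body["document_type"], body["filename"])
    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "put_object",
        Params={
            "Bucket": os.environ["UPLOAD_BUCKET"],  # assumed env var name
            "Key": key,
            "ContentType": body["content_type"],
        },
        ExpiresIn=300,  # 5 minutes, as noted under Security
    )
    return {"statusCode": 200,
            "body": json.dumps({"upload_url": url, "document_key": key})}
```

The client then PUTs the file directly to S3 with that URL, and the S3 event triggers processing without the document ever passing through the API.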

πŸ“‹ Prerequisites

  • AWS Account with Bedrock access
  • AWS CLI configured
  • Node.js 18+ (for CDK)
  • Python 3.11+
  • Docker
  • jq (for client script)

πŸš€ Quick Start

1. Clone and Setup

git clone <repository-url>
cd hr-reader-simple
cp .env.example .env
# Edit .env with your AWS profile and Langfuse keys (optional)

2. Deploy Infrastructure

# Install dependencies
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Deploy with Langfuse (optional)
./scripts/deploy/deploy_with_langfuse.sh

# Or deploy without Langfuse
./scripts/deploy/deploy.sh

3. Create API Key

./scripts/service/manage_api_keys.sh create your-user-id
# Save the generated API key

4. Test the System

export API_URL="<your-api-gateway-url>"
export API_KEY="<your-api-key>"

# Test with lab results
./scripts/test/client_example.sh lab tests/sample_lab_result.txt

# Test with physical evaluation
./scripts/test/client_example.sh evaluacion_fisica tests/sample_evaluacion_fisica.txt

πŸ“ Project Structure

hr-reader-simple/
β”œβ”€β”€ app/                    # Application code
β”‚   β”œβ”€β”€ agent.py           # Strands agent configuration
β”‚   β”œβ”€β”€ api.py             # FastAPI endpoints
β”‚   β”œβ”€β”€ config.py          # Settings management
β”‚   β”œβ”€β”€ observability.py   # Langfuse integration
β”‚   └── tools.py           # Agent tools (S3, DynamoDB, Textract)
β”œβ”€β”€ cdk/                    # Infrastructure as Code
β”‚   β”œβ”€β”€ app.py             # CDK app entry point
β”‚   └── stacks/            # CloudFormation stacks
β”œβ”€β”€ lambda/                 # Lambda functions
β”‚   β”œβ”€β”€ presigned_url/     # Generate upload URLs
β”‚   β”œβ”€β”€ process_document/  # Trigger processing
β”‚   └── s3_notification_config/  # S3 event configuration
└── scripts/                # Utility scripts
    β”œβ”€β”€ deploy/            # Deployment scripts
    β”‚   β”œβ”€β”€ deploy.sh
    β”‚   └── deploy_with_langfuse.sh
    β”œβ”€β”€ service/           # Service management
    β”‚   β”œβ”€β”€ manage_api_keys.sh
    β”‚   β”œβ”€β”€ pause.sh
    β”‚   └── resume.sh
    └── test/              # Testing scripts
        β”œβ”€β”€ client_example.sh
        β”œβ”€β”€ client_test.sh
        └── demo_video.sh

πŸ”§ Configuration

Environment Variables (.env)

# AWS
AWS_REGION=us-east-1
AWS_PROFILE=your-profile

# Bedrock
MODEL_ID=us.amazon.nova-lite-v1:0

# Langfuse (Optional)
LANGFUSE_ENABLED=true
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
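These variables could be loaded into a typed settings object along the lines below. This is a sketch only; the real app/config.py may use a different mechanism (e.g. pydantic), and the field names simply mirror the variables above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    aws_region: str
    model_id: str
    langfuse_enabled: bool
    langfuse_host: str


def load_settings() -> Settings:
    """Read configuration from the environment, defaulting to the values shown in .env above."""
    return Settings(
        aws_region=os.environ.get("AWS_REGION", "us-east-1"),
        model_id=os.environ.get("MODEL_ID", "us.amazon.nova-lite-v1:0"),
        langfuse_enabled=os.environ.get("LANGFUSE_ENABLED", "false").lower() == "true",
        langfuse_host=os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com"),
    )
```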

Supported Document Types

  • lab - Laboratory results (blood tests, biochemistry, etc.)
    • Example: tests/sample_lab_result.txt
  • evaluacion_fisica - Physical evaluation (anthropometry, vital signs, body composition)
    • Example: tests/sample_evaluacion_fisica.txt
  • cv - Curriculum Vitae
  • nomina - Payroll documents
  • contrato - Employment contracts

πŸ“Š Observability

The system integrates with Langfuse for end-to-end observability of the agent pipeline:

  • LLM call tracing with token usage and costs
  • Tool execution monitoring
  • Agent reasoning cycles
  • Performance metrics

Enable it by setting LANGFUSE_ENABLED=true in .env and deploying with ./scripts/deploy/deploy_with_langfuse.sh.

πŸ’° Cost Management

Pause Service (saves compute costs)

./scripts/service/pause.sh

Resume Service

./scripts/service/resume.sh

Destroy All Resources

cd cdk
cdk destroy --all

πŸ” Security

  • API Gateway with API key authentication
  • Presigned URLs with 5-minute expiration
  • App Runner not publicly accessible
  • S3 bucket encryption enabled
  • DynamoDB encryption at rest

πŸ“ API Usage

1. Request Upload URL

curl -X POST $API_URL/upload \
  -H "x-api-key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "lab",
    "filename": "test.pdf",
    "content_type": "application/pdf"
  }'
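The same request can be made from Python using only the standard library. The endpoint path and payload fields come from the curl call above; the helper name is illustrative.

```python
import json
import urllib.request


def build_upload_request(api_url: str, api_key: str, document_type: str,
                         filename: str, content_type: str) -> urllib.request.Request:
    """Build the POST /upload request shown above (constructs it, does not send it)."""
    payload = {
        "document_type": document_type,
        "filename": filename,
        "content_type": content_type,
    }
    return urllib.request.Request(
        f"{api_url}/upload",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending it (requires a deployed stack):
# with urllib.request.urlopen(build_upload_request(API_URL, API_KEY,
#                             "lab", "test.pdf", "application/pdf")) as resp:
#     upload_info = json.load(resp)
```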

2. Upload Document

curl -X PUT "<presigned_url>" \
  -H "Content-Type: application/pdf" \
  --data-binary "@document.pdf"

3. Check Results

# Get specific document results
curl $APPRUNNER_URL/results/<document_key>

# List all results not yet retrieved
curl $APPRUNNER_URL/results
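Because processing happens asynchronously after the S3 upload, a client typically polls the results endpoint until the document has been processed. A small retry loop, written with an injectable fetch function so it can run without a deployed stack (all names here are illustrative):

```python
import time
from typing import Callable, Optional


def wait_for_result(fetch: Callable[[str], Optional[dict]], document_key: str,
                    attempts: int = 10, delay_seconds: float = 3.0) -> Optional[dict]:
    """Poll fetch(document_key) until it returns a result or attempts run out.

    `fetch` would wrap a GET to $APPRUNNER_URL/results/<document_key>,
    returning None while processing is still in flight.
    """
    for _ in range(attempts):
        result = fetch(document_key)
        if result is not None:
            return result
        time.sleep(delay_seconds)
    return None
```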

πŸ› οΈ Development

Local Testing

# Run locally
source venv/bin/activate
uvicorn app.api:app --reload

Update Infrastructure

./scripts/deploy/deploy_with_langfuse.sh

πŸ“š Additional Documentation

🀝 Contributing

This project was created for a hackathon. Feel free to fork it and adapt it to your needs.

πŸ“„ License

MIT License - See LICENSE file for details

πŸ™ Acknowledgments

  • AWS for Bedrock and infrastructure services
  • Strands team for the Agents SDK
  • Langfuse for observability platform
