
TimeBite

Previously called CYRA (Creating Your Reality Agent), TimeBite is an Apple Vision Claw-style productivity agent with a measurable eval/benchmark loop for computer-use tasks.

TimeBite Logo

TL;DR

TimeBite captures user intent (voice/text), plans safe computer actions, runs guarded execution loops, and reports time-reclaimed outcomes through dashboard metrics and benchmark runs.

Product Goals

  • Build an iOS-first agentic productivity experience.
  • Enforce safe execution with policy checks and human approvals.
  • Benchmark reliability under perturbations (popups, layout shifts, out-of-stock states).
  • Quantify value through minutes_reclaimed and scenario success metrics.

AgentBeats Integration (From CYRA Repo)

Evaluation Status

  • CYRA is already registered as a Green Agent with a baseline Purple Agent.
  • Leaderboard/eval pipeline is configured in the CYRA repo.
  • TimeBite extends the same architecture for productivity and time-reclaimed benchmarking.

Running the Green Agent Container

```shell
docker pull ghcr.io/erinjerri/cyra-green-agent:latest
docker run ghcr.io/erinjerri/cyra-green-agent:latest
```

After Running Agents (Recommended Flow)

  1. Validate run logs and output artifacts (success/failure + telemetry).
  2. Update scenario.toml participant IDs/env in the leaderboard repo.
  3. Trigger assessment workflow from a branch and review generated results.
  4. Merge results PR to publish leaderboard updates.
  5. Sync key metrics into TimeBite dashboard docs (minutes_reclaimed, success rate, unsafe action rate).
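
Step 2 of the flow above edits scenario.toml in the leaderboard repo. The real schema lives there, but a hedged sketch of what the participant/env update might touch could look like this (every key name below is illustrative, not the actual leaderboard schema):

```toml
# Hypothetical scenario.toml fragment -- field names are assumptions.
[scenario]
name = "timebite-productivity-v1"

[participants]
green_agent_id = "cyra-green-agent"        # update if you re-register
purple_agent_id = "cyra-purple-baseline"

[env]
image = "ghcr.io/erinjerri/cyra-green-agent:latest"
max_steps = 40
timeout_seconds = 300
```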

Reuse vs New Docker Image / Agent Registration

  • Reuse existing CYRA Green Agent if evaluator contract and benchmark scope are unchanged.
  • Publish a new image tag if logic changed (policy, scoring, tool behavior, schema), then update the registered agent image reference.
  • Register a new Green Agent only if you are creating a distinct benchmark identity (new domain/leaderboard), not for routine iteration.

Overall App Architecture

```mermaid
flowchart LR
    U["User (iOS)"] --> APP["TimeBite App"]
    APP --> API["API Gateway (/process, /runs, /metrics)"]
    API --> LPROC["Lambda: Process Orchestrator"]
    LPROC --> LSAFE["Lambda: Policy + Approval Guard"]
    LPROC --> LVIS["On-device CoreML Vision Context"]
    LPROC --> LACT["Lambda: Action Runtime + Retry"]
    LACT --> TELEM["Step Telemetry Stream"]
    TELEM --> LSTORE["Lambda Storage DB (S3 bucket layout + run index)"]
    LSTORE --> INS["Insights Aggregator"]
    INS --> DASH["Dashboard (Ring + Buckets + Matrix)"]
    LSTORE --> BENCH["Benchmark/Eval Harness"]
```

Information Architecture

```mermaid
flowchart TD
    A["User Intent (Voice/Text)"] --> B["Intent Normalizer"]
    B --> C["Task/Session Orchestrator"]
    C --> D["Policy + Guardrails"]
    D --> E["Agent Runtime (Plan -> Act -> Observe)"]
    E --> F["Computer Use Layer (Screenshot/CoreML Context/Action)"]
    E --> G["Approval Gate (cart/checkout/irreversible)"]
    F --> H["Telemetry Stream"]
    G --> H
    H --> I["Lambda Storage DB (Runs/Events/Insights)"]
    I --> J["Insights Engine"]
    J --> K["Dashboard (Ring, Buckets, Matrix)"]
    I --> L["Benchmark Harness"]
    L --> M["Reliability Report"]
```

System Design

```mermaid
flowchart LR
    subgraph Client["Client Layer"]
        C1["iOS App"]
        C2["Dashboard UI"]
    end

    subgraph API["Backend/API Layer"]
        A1["/process"]
        A2["/runs"]
        A3["/metrics"]
        A4["Policy Service"]
        A5["Approval Service"]
    end

    subgraph Runtime["Agent Runtime Layer"]
        R1["Planner"]
        R2["CoreML Vision Context"]
        R3["Action Executor"]
        R4["Retry + Timeout Controller"]
    end

    subgraph Data["Data Layer (Lambda Storage DB)"]
        D1["Task Store"]
        D2["Session/Run Store"]
        D3["Event Logs"]
        D4["Insight Aggregates"]
    end

    C1 --> A1
    C1 --> A2
    C2 --> A3
    A1 --> A4
    A1 --> R1
    R1 --> R2
    R2 --> R3
    R3 --> R4
    R4 --> A5
    R4 --> D3
    A2 --> D2
    A3 --> D4
    A1 --> D1
    D3 --> D4
```

Agent Loop (Execution + Safety)

```mermaid
sequenceDiagram
    participant U as User
    participant App as iOS App
    participant API as Process API
    participant Policy as Guardrails
    participant Agent as Runtime Loop
    participant Vision as CoreML Vision
    participant Exec as Action Executor
    participant Approve as Human Approval
    participant Log as Telemetry

    U->>App: Submit goal (voice/text)
    App->>API: POST /process
    API->>Policy: Validate allowlist, max-step, timeout
    Policy-->>API: Policy decision
    API->>Agent: Start run(session_id)
    loop Until done or timeout
        Agent->>Vision: Capture screenshot + context
        Vision-->>Agent: Parsed state
        Agent->>Exec: Next safe action
        Exec-->>Agent: Outcome
        Agent->>Log: step(action, latency, outcome)
        alt Irreversible action detected
            Agent->>Approve: Request confirmation
            Approve-->>Agent: Approve/Reject
        end
    end
    Agent-->>API: Final run result + summary
    API-->>App: Response + insights
```
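
The sequence above can be sketched as a guarded plan-act-observe loop. Everything here is illustrative: `capture_state`, `next_action`, `execute`, and `request_approval` are hypothetical stand-ins for the real runtime, and the irreversible-action check is simplified to a set of action names.

```python
import time

# Placeholder set of actions treated as irreversible (real detection
# would be richer than name matching).
IRREVERSIBLE = {"checkout", "submit_order", "delete"}

def run_agent(goal, capture_state, next_action, execute, request_approval,
              max_steps=40, timeout_s=300):
    """Guarded plan -> act -> observe loop with a human approval gate."""
    telemetry = []
    start = time.monotonic()
    for step in range(max_steps):
        if time.monotonic() - start > timeout_s:
            return {"status": "timeout", "steps": telemetry}
        state = capture_state()             # screenshot + parsed context
        action = next_action(goal, state)   # planner picks the next action
        if action is None:                  # planner signals completion
            return {"status": "done", "steps": telemetry}
        if action in IRREVERSIBLE and not request_approval(action):
            telemetry.append((step, action, "rejected"))
            return {"status": "rejected", "steps": telemetry}
        outcome = execute(action)
        telemetry.append((step, action, outcome))
    return {"status": "max_steps", "steps": telemetry}
```

With fake callables plugged in, the loop terminates on the planner's `None`, an approval rejection, or the step/time budget, matching the `loop Until done or timeout` block in the diagram.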

Planned API Surface

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/process` | POST | Start/continue an agent run from structured intent. |
| `/runs` | GET | Fetch run history, statuses, and scenario results. |
| `/metrics` | GET | Return dashboard metrics (time reclaimed, reliability, safety). |
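
The table fixes the endpoints but not the payloads. A hedged sketch of a `POST /process` request body builder, where every field name is an assumption rather than a committed API contract:

```python
import json
from typing import Optional

def build_process_request(intent: str, constraints: Optional[dict] = None,
                          session_id: Optional[str] = None) -> str:
    """Build a hypothetical /process request body.

    Field names ('intent', 'constraints', 'session_id') are illustrative;
    the actual schema will be defined with the backend stub.
    """
    body = {
        "intent": intent,
        "constraints": constraints or {},
    }
    if session_id is not None:  # present only when continuing a run
        body["session_id"] = session_id
    return json.dumps(body)
```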

Core Data Model (Planned)

| Entity | Key Fields |
| --- | --- |
| Task | `task_id`, `intent`, `constraints`, `priority`, `created_at` |
| Session | `session_id`, `user_id`, `start_time`, `end_time`, `status` |
| Run | `run_id`, `session_id`, `scenario`, `score`, `duration_ms`, `result` |
| Insight | `insight_id`, `session_id`, `minutes_reclaimed`, `bucket`, `created_at` |
| StepEvent | `run_id`, `step_index`, `action`, `latency_ms`, `outcome`, `safety_flag` |
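
The planned entities map naturally onto dataclasses. A sketch for two of them, with types guessed from the field names (the real store may differ):

```python
from dataclasses import dataclass, asdict

@dataclass
class StepEvent:
    """One executed step inside a run; mirrors the StepEvent row above."""
    run_id: str
    step_index: int
    action: str
    latency_ms: int
    outcome: str
    safety_flag: bool = False  # set when a guardrail fired on this step

@dataclass
class Run:
    """One scored run; mirrors the Run row above."""
    run_id: str
    session_id: str
    scenario: str
    score: float
    duration_ms: int
    result: str
```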

Safety and Compliance Defaults

  • Allowlist-based action policy.
  • Max steps per run and strict wall-clock timeout.
  • Mandatory human confirmation for checkout/cart/final submit.
  • Benchmark-time perturbation tests before release freeze.
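
The first defaults above reduce to a pure check run before every step. A minimal sketch, where the allowlist contents and budget values are placeholders, not the shipped policy:

```python
# Placeholder policy constants -- illustrative values only.
ALLOWED_ACTIONS = {"tap", "type", "scroll", "screenshot"}
MAX_STEPS = 40
MAX_WALL_CLOCK_S = 300

def policy_check(action: str, step_index: int, elapsed_s: float):
    """Return (allowed, reason); deny anything off-allowlist or over budget."""
    if action not in ALLOWED_ACTIONS:
        return False, f"action '{action}' not on allowlist"
    if step_index >= MAX_STEPS:
        return False, "max steps exceeded"
    if elapsed_s > MAX_WALL_CLOCK_S:
        return False, "wall-clock timeout exceeded"
    return True, "ok"
```

Keeping the check pure (no I/O, no state) makes it trivial to run identically in production and in the benchmark harness's perturbation tests.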

Metrics That Matter

  • minutes_reclaimed per run/day/week.
  • Success rate across scored benchmark sessions.
  • Unsafe action rate under perturbation.
  • Median latency per action and end-to-end run time.
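
Given per-run records shaped like the Run/Insight entities, these metrics reduce to simple aggregates. A sketch, assuming each record carries `minutes_reclaimed`, `success`, `unsafe_actions`, and a per-action `latencies_ms` list (field names are assumptions):

```python
from statistics import median

def summarize(runs: list) -> dict:
    """Aggregate dashboard metrics from a list of run records (dicts)."""
    if not runs:
        return {"minutes_reclaimed": 0.0, "success_rate": 0.0,
                "unsafe_action_rate": 0.0, "median_latency_ms": 0.0}
    all_latencies = [ms for r in runs for ms in r["latencies_ms"]]
    total_actions = len(all_latencies)
    return {
        "minutes_reclaimed": sum(r["minutes_reclaimed"] for r in runs),
        "success_rate": sum(r["success"] for r in runs) / len(runs),
        "unsafe_action_rate": (sum(r["unsafe_actions"] for r in runs)
                               / total_actions if total_actions else 0.0),
        "median_latency_ms": median(all_latencies) if all_latencies else 0.0,
    }
```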

Quick Start (Boilerplate)

```shell
# 1) Clone
git clone https://github.com/erinjerri/TimeBite.git
cd TimeBite

# 2) Create env file (example)
cp .env.example .env

# 3) Install deps
# npm install

# 4) Run app/api
# npm run dev
```

Note: this repository is currently documentation-first. Replace placeholder setup commands with your actual stack commands as implementation lands.

Backend API (Current Stub)

```shell
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
```

Routes:

  • POST /process
  • GET /runs
  • GET /metrics
  • GET /health
