A minimum viable data agent. You ask a question in plain English about your spreadsheets; the agent writes a small query, runs it, and answers in plain English.
This is the companion code for the guide at ourcommunity.tech/build-your-own-data-agent.
Built by Our Community Tech — a nonprofit that helps other nonprofits with technology.
- A free Replit account (no local install needed)
- An OpenAI API key (free to create; pay-as-you-go after that — typically $2–$5/month at low volume)
- About 30 minutes
That's it. You do not need to know Python.
- Fork this template (click "Use template" on the Replit page).
- Add your API key as a secret. In the left sidebar click Secrets, add a secret named `OPENAI_API_KEY`, and paste your key.
- Click the big green "Run" button. The sample data loads and you get an interactive prompt.
- Ask a question. Try one from the list below.
- Which donors gave less this year than last year, and by how much?
- Who lapsed this year (gave last year, nothing this year)?
- What was our average grant size by program?
- Which programs are over budget, and by how much?
- Which funders gave us more than one grant?
- What's our total grant revenue by status?
- Which cities do our top 10 lifetime donors live in?
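For a sense of what the agent's "small query" looks like, here is a hand-written sketch covering the first two questions above. The column names (`name`, `gift_last_year`, `gift_this_year`) and the three rows are made up for illustration; the sample CSVs may use different names, and the snippet the model actually writes will vary.

```python
import pandas as pd

# Hypothetical donor table; the real data/donors.csv may use other column names.
donors = pd.DataFrame(
    {
        "name": ["Avery", "Blake", "Casey"],
        "gift_last_year": [500.0, 250.0, 0.0],
        "gift_this_year": [400.0, 0.0, 100.0],
    }
)

# Lapsed = gave something last year and nothing this year.
lapsed = donors[(donors["gift_last_year"] > 0) & (donors["gift_this_year"] == 0)]

# Gave less this year than last year, and by how much.
gave_less = donors[donors["gift_this_year"] < donors["gift_last_year"]].copy()
gave_less["decrease"] = gave_less["gift_last_year"] - gave_less["gift_this_year"]

print(lapsed["name"].tolist())
print(gave_less[["name", "decrease"]])
```

You never have to write this yourself; it is only here to show that the queries are short and readable if you want to check one.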
- Export your data to CSV. Each table should be one CSV file. Spreadsheets → File → Download → CSV works fine.
- Drop your CSVs into the `/data` folder in this Replit.
- Open `agent.py` and edit the `DATA_FILES` dict near the top: `DATA_FILES = {"donors": "data/my_donors.csv", "programs": "data/my_programs.csv"}`. The keys are the names the agent will use internally (keep them short and clean: `donors`, `grants`, `budget`). The values are the paths.
- Click Run.
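If Run fails after you swap in your own files, a mistyped path is a likely cause. The following is a minimal pre-flight sketch, separate from `agent.py`, that uses only the Python standard library. The helper name `check_data_files` is hypothetical, and the paths are the example ones from the step above.

```python
import csv
from pathlib import Path

# Same shape as the DATA_FILES dict in agent.py; these paths are examples.
DATA_FILES = {
    "donors": "data/my_donors.csv",
    "programs": "data/my_programs.csv",
}


def check_data_files(data_files):
    """Return {table name: header row, or None if the file is missing}."""
    report = {}
    for name, path in data_files.items():
        file_path = Path(path)
        if not file_path.is_file():
            report[name] = None
            continue
        # utf-8-sig drops the byte-order mark that some spreadsheet exports add.
        with file_path.open(newline="", encoding="utf-8-sig") as handle:
            report[name] = next(csv.reader(handle), [])
    return report


if __name__ == "__main__":
    for name, header in check_data_files(DATA_FILES).items():
        print(name, "->", header if header is not None else "file not found")
```

A header row that looks wrong (one long string, or a title line instead of column names) usually means the export needs tidying before the agent can use it.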
Important: before you point this at your real data, read `SAFETY.md`. There are kinds of data this tool is not appropriate for.
| File | What it does |
|---|---|
| `agent.py` | The whole agent, about 250 lines, heavily commented. Start here. |
| `generate_sample_data.py` | Recreates the sample CSVs. You do not need to run this. |
| `data/donors.csv` | 60 synthetic donors with giving history. |
| `data/grants.csv` | 24 synthetic grants across programs and funders. |
| `data/program_budget.csv` | 7 synthetic program budgets with YTD actuals. |
| `SAFETY.md` | What you should and should not put through this. Please read. |
| `requirements.txt` | Python packages (Replit installs these automatically). |
The agent loads your CSVs into pandas, sends the model a summary of your columns (not your actual data), asks it for a small pandas snippet that answers your question, runs that snippet in a narrow sandbox, and sends the result back for a plain-English write-up. Two model calls per question. At gpt-4o-mini prices, most questions cost a fraction of a cent.
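The flow described above can be sketched roughly as follows. This is an illustration, not the code in `agent.py`: the function names (`summarize_schema`, `run_snippet`, `answer`), the prompt wording, and the `result` variable convention are all assumptions, and the model call is replaced by a stand-in (`fake_model`) so the sketch runs offline. The sandbox shown is only one simple way to narrow what a snippet can reach; the one in `agent.py` may work differently.

```python
import pandas as pd


def summarize_schema(tables):
    """Describe table names, columns, dtypes and row counts, with no cell values."""
    lines = []
    for name, df in tables.items():
        columns = ", ".join(f"{col} ({dtype})" for col, dtype in df.dtypes.items())
        lines.append(f"{name}: {len(df)} rows; columns: {columns}")
    return "\n".join(lines)


def run_snippet(code, tables):
    """Run generated code with only pandas and the tables in scope.

    Emptying __builtins__ blocks casual use of open() or import inside the
    snippet, but it is NOT a real security boundary.
    """
    scope = {"__builtins__": {}, "pd": pd, **tables}
    exec(code, scope)
    return scope["result"]


def answer(question, tables, ask_model):
    """Two model calls: one to get a snippet, one to explain its result."""
    schema = summarize_schema(tables)
    code = ask_model(
        f"Tables:\n{schema}\n\nWrite pandas code that sets `result` for: {question}"
    )
    result = run_snippet(code, tables)
    return ask_model(f"Question: {question}\nResult: {result}\nAnswer in plain English.")


def fake_model(prompt):
    # Stand-in for the real model call so this sketch needs no API key.
    if prompt.startswith("Tables:"):
        return "result = grants.groupby('status')['amount'].sum().to_dict()"
    return "Summary of: " + prompt.splitlines()[1]


# Made-up grants table for the demonstration.
grants = pd.DataFrame(
    {"status": ["awarded", "pending", "awarded"], "amount": [1000, 500, 250]}
)
print(summarize_schema({"grants": grants}))
print(answer("What's our total grant revenue by status?", {"grants": grants}, fake_model))
```

Note that the first prompt carries only the schema summary, while the second carries the query result, so whatever the snippet returns does leave your machine.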
- It is not real-time. It only knows what's in your CSVs at load time.
- It is not for huge data. This minimal version comfortably handles a few hundred thousand rows. Beyond that, move to a real BI tool.
- It is not a replacement for a data analyst. It helps when you don't have one.
- It will occasionally be wrong. Check the numbers on important questions.
Toggle `/code` at the prompt to see the query it ran.
Once you have this running on your own data, the obvious next steps are:
- Connect a donor database (Salesforce, Bloomerang, Little Green Light) so you do not need to export CSVs each month.
- Combine multiple sources (grants + accounting + program outcomes) in one agent session.
- Schedule it. Run a fixed set of questions every Monday morning and email the results to your team.
We are writing follow-up guides on each of those. If you want to be the first to see them, or if you need hands-on help, drop us a note at ourcommunity.tech.
License: MIT. Use it, fork it, remix it, ship it for your mission.