A command-line tool that processes raw event leads into scored, segmented, and email-ready deliverables. It accepts CSV and Excel exports from multiple sources (Google Forms, badge scanners, HubSpot card scans), deduplicates and normalizes the data, applies LLM-based scoring and segmentation, and produces three output files ready for sales follow-up.
Chinese version: README-ZH.md
Japanese version: README-JA.md
```bash
git clone https://github.com/saltism/event-lead-cli.git
cd event-lead-cli
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Put event CSV/XLSX files into data/

# Generate config (event or meetup)
python -m event_leads init-config --type event --name "Your Event Name" --date "2026-06-15" --location "Singapore"
# python -m event_leads init-config --type meetup --name "Your Meetup Name" --date "2026-06-20" --location "Tokyo"

# Set OpenAI key
export OPENAI_API_KEY='sk-...'

# Optional preflight check
./scripts/smoke-test.sh configs/your-event-name.yaml

# Run pipeline
./run_enrich.sh configs/your-event-name.yaml

# Put card images into data/cards/
python -m event_leads cards-ocr-and-run configs/your-event-name.yaml --input-dir data/cards --output-csv data/business-card.csv

# Output files are in configs/output/
# - *-leads.csv
# - *-report.md
# - *-email-drafts.md

# If interrupted
./run_enrich.sh configs/your-event-name.yaml --resume
```

Each run produces three files in `configs/output/`:
| File | Recipient | Description |
|---|---|---|
| `{prefix}-leads.csv` | Sales / BD | Cleaned and enriched lead list with a `_segment` column; ready for HubSpot import |
| `{prefix}-report.md` | Sales / BD | Segment definitions, follow-up recommendations, and compact score tables; language auto-selected from lead data |
| `{prefix}-email-drafts.md` | Sales / BD | Follow-up email drafts per segment, in the detected language of each recipient (EN / JA / zh-TW) |
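To spot-check `{prefix}-leads.csv` before importing it anywhere, you can group rows by segment with the standard library. This is an illustrative snippet, not part of the CLI; the sample rows only mimic the shape of the output file.

```python
import csv
import io
from collections import Counter

# Illustrative rows in the shape of {prefix}-leads.csv (_segment is added by the pipeline)
sample = """name,email,_segment,_score_overall
Ada,ada@example.com,A,8.4
Grace,grace@example.com,B,6.1
Alan,alan@example.com,A,7.9
"""

rows = list(csv.DictReader(io.StringIO(sample)))
counts = Counter(row["_segment"] for row in rows)
print(dict(counts))  # {'A': 2, 'B': 1}
```

For a real run, replace the inline sample with `open("configs/output/your-event-name-leads.csv", encoding="utf-8-sig")`.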
- Python 3.9 or later
- An OpenAI API key
- macOS or Linux
The tool uses GPT-4o-mini by default — chosen for its low cost per token, which matters when processing hundreds of leads across multiple LLM calls per run.
Using a different OpenAI-compatible API (e.g. Azure OpenAI, Groq, Together AI, or a local Ollama instance with an OpenAI-compatible endpoint) requires no code changes. Set two environment variables before running:
export OPENAI_API_KEY="your-key-for-that-provider"
export OPENAI_BASE_URL="https://your-provider-endpoint/v1"Using Anthropic or Gemini requires a small change in event_leads/enrich.py. Replace the two client functions near line 213:
```python
# Before (OpenAI)
def _get_client():
    return instructor.from_openai(OpenAI())

def _get_async_client():
    return instructor.from_openai(AsyncOpenAI())

# After (Anthropic example)
from anthropic import Anthropic, AsyncAnthropic

def _get_client():
    return instructor.from_anthropic(Anthropic())

def _get_async_client():
    return instructor.from_anthropic(AsyncAnthropic())
```

You will also need to install the provider's SDK (`pip install anthropic`) and update `requirements.txt` accordingly. The rest of the pipeline does not need to change; `instructor` normalizes the interface across providers.
The tool produces a HubSpot-ready CSV. Below is the recommended one-time setup and per-event import process.
1. In HubSpot, go to Settings → Properties → Contact properties
2. Create a new property:
   - Label: Lead Segment
   - Internal name: `lead_segment`
   - Field type: Single-line text (or Dropdown if you want predefined values)
3. Optionally create a second property `Lead Score` (Number type) to import `_score_overall`
1. Open `{prefix}-leads.csv` in Excel or Google Sheets
2. Rename the column `_segment` to `Lead Segment` (to match the HubSpot property label), and `_score_overall` to `Lead Score` if you created that property
3. In HubSpot, go to Contacts → Import → Import a file
4. Select the CSV; map columns to properties during the import wizard
5. After import, use Lists or Filters to group contacts by Lead Segment
1. Open `{prefix}-email-drafts.md`
2. Each section is labeled by segment and language
3. In HubSpot, filter contacts by segment, open a contact record, and paste the corresponding draft into the email composer
4. Personalize the `[First Name]` and `[Company]` fields before sending
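If you prefer scripting the header rename over editing in Excel, a minimal sketch (a hypothetical helper, not part of the CLI) could look like this:

```python
import csv
import io

# Map the pipeline's internal column names to the HubSpot property labels
# created in the one-time setup above.
RENAMES = {"_segment": "Lead Segment", "_score_overall": "Lead Score"}

def rename_headers(csv_text: str) -> str:
    """Return CSV text with header columns renamed per RENAMES."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [RENAMES.get(col, col) for col in rows[0]]
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()

sample = "name,email,_segment,_score_overall\nAda,ada@example.com,A,8.2\n"
print(rename_headers(sample).splitlines()[0])
# name,email,Lead Segment,Lead Score
```

Columns not listed in `RENAMES` pass through unchanged, so the rest of the CSV maps normally in the import wizard.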
These are the field names the pipeline understands. Map your source column headers to them in the `mapping:` and `survey_fields:` sections of the config.
| Field | Description |
|---|---|
| `name` | Full name |
| `email` | Email address; used as the primary deduplication key |
| `company_title` | Company and title combined in a single column |
| `company` | Company name when it is in its own column |
| `title` | Job title when it is in its own column |
| `phone` | Phone number |
| `interest_scenario` | Stated AI use case interest |
| `demo_interest` | Demo request signal |
| `project_timeline` | Estimated project or evaluation timeline |
| `additional_topics` | Free-text survey responses |
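Since `email` is the primary deduplication key, it helps to know what kind of normalization to expect before comparison. A hypothetical sketch (not the pipeline's actual code) assuming emails are trimmed and lowercased:

```python
def dedupe_by_email(leads: list[dict]) -> list[dict]:
    """Keep the first lead seen for each normalized email address."""
    seen = {}
    for lead in leads:
        # Normalize: trim whitespace and lowercase, so "Ada@X.com " == "ada@x.com"
        key = lead.get("email", "").strip().lower()
        if key and key not in seen:
            seen[key] = lead
    return list(seen.values())

leads = [
    {"name": "Ada", "email": "Ada@Example.com"},
    {"name": "Ada L.", "email": "ada@example.com "},  # duplicate after normalization
    {"name": "Grace", "email": "grace@example.com"},
]
print(len(dedupe_by_email(leads)))  # 2
```

Rows with an empty email are skipped in this sketch; the real pipeline may handle them differently.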
The LLM scores each lead on four dimensions (0–10). For every dimension, it also provides a one-sentence justification, so each score is transparent and auditable by sales.
The overall score is a weighted average. Weights must sum to 1.0. Descriptions and weights can be adjusted per event in the `scoring:` section of the config.
| Dimension | Default weight | What it measures | Example justification |
|---|---|---|---|
| `company_fit` | 0.25 | Company size and industry match to target customer profile | "Public financial company; profile strongly matches the target ICP" |
| `seniority_match` | 0.25 | Budget authority or procurement influence | "VP-level contact with clear purchasing influence" |
| `engagement_signal` | 0.35 | On-site interest level and follow-up request | "Explicitly requested a demo with a near-term timeline" |
| `interest_alignment` | 0.15 | Alignment between stated interests and core capabilities | "Needs align with workflow automation and knowledge base use cases" |
The report stays compact for large lead volumes. If sales needs detailed score justifications for a specific lead, check the corresponding `_score_*_reason` columns in `{prefix}-leads.csv`.
| Field | Description |
|---|---|
| `file` | Filename relative to `data_dir` |
| `type` | `csv` or `xlsx` |
| `encoding` | Character encoding; use `utf-8-sig` for Windows Excel exports, `big5` for Traditional Chinese exports |
| `mapping` | Maps source column headers to standard field names |
| `survey_fields` | Survey-specific columns; listed separately for LLM context |
| `attendance_status` | `attended` or `registered` |
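Putting those fields together, a source entry might look like the following. This is a hypothetical example: the column headers are made up, and the `sources:` top-level key is an assumption (check the generated template for the exact key names).

```yaml
sources:
  - file: "badge-scan-export.csv"
    type: csv
    encoding: utf-8-sig
    attendance_status: attended
    mapping:
      "Full Name": name
      "Email Address": email
      "Company / Title": company_title
    survey_fields:
      "Which AI use case interests you?": interest_scenario
      "Would you like a demo?": demo_interest
```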
**OPENAI_API_KEY not set**

Set the key in your current session and resume:

```bash
export OPENAI_API_KEY="sk-..."
./run_enrich.sh configs/your-event-name.yaml --resume
```

**UnicodeEncodeError: 'ascii' codec can't encode characters ...**
Your OPENAI_API_KEY or OPENAI_BASE_URL likely contains non-ASCII characters (for example smart quotes, full-width symbols, or hidden spaces from copy/paste). Re-export them as plain ASCII:
```bash
unset OPENAI_API_KEY OPENAI_BASE_URL
export OPENAI_API_KEY='sk-...'
# Optional:
# export OPENAI_BASE_URL='https://your-provider-endpoint/v1'
```

**CONFIG path does not exist**
Generate the config first, then run:
```bash
python -m event_leads init-config --type event --name "Your Event Name" --date "2026-06-15" --location "Your City"
./run_enrich.sh configs/your-event-name.yaml
```

**zsh: command not found: #**
You likely pasted comment lines (starting with `#`) into the terminal. Run only the executable commands.

**Error: Got unexpected extra argument (...)**
Use only one config path. Correct format:

```bash
./run_enrich.sh configs/your-event-name.yaml
```

Do not append a second config argument.

**All scores are 0 after `--resume`**
This usually means a bad old checkpoint was reused after a failed API-key run. Delete stale checkpoints and run again:
```bash
rm -f configs/output/checkpoints/*.pkl
./run_enrich.sh configs/your-event-name.yaml
```

**Segments are not exactly A/B/C**
This usually comes from old outputs generated by a previous version. Pull latest code, clear old report artifacts, and rerun:
```bash
git pull
rm -f configs/output/*-report.md configs/output/*-leads.csv
./run_enrich.sh configs/your-event-name.yaml
```

**First LLM call feels slow**
The first request can be slower due to model warm-up or network latency. This does not indicate a segmentation bug. For long runs, use `--resume` to avoid recomputing finished stages.
**Garbled text or encoding errors**
Set `encoding: utf-8-sig` in the source config. For Traditional Chinese exports from Taiwan-based systems, try `big5`.
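To see why `utf-8-sig` matters for Windows Excel exports: Excel prepends a UTF-8 byte-order mark, which plain `utf-8` leaves embedded in the first column header, breaking header mapping. A standard-library illustration:

```python
# Windows Excel CSV exports start with a UTF-8 BOM (b"\xef\xbb\xbf").
raw = b"\xef\xbb\xbfname,email\nAda,ada@example.com\n"

plain = raw.decode("utf-8")    # BOM survives as "\ufeff" in the first header
sig = raw.decode("utf-8-sig")  # BOM is stripped

print(plain.split(",")[0] == "name")  # False: header is actually "\ufeffname"
print(sig.split(",")[0] == "name")    # True
```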
**Incorrect email language**
Language is inferred from the lead's name characters and email domain: Chinese characters → `zh_tw`, a `.jp` domain or Japanese characters → `ja`, everything else → `en`. To override, manually edit the `_lang` column in the output CSV.
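A sketch of that heuristic, so you can predict which language a lead will get (illustrative only; the CLI's actual rules may differ in detail):

```python
def detect_lang(name: str, email: str) -> str:
    """Guess follow-up language from name characters and email domain."""
    if email.lower().endswith(".jp"):
        return "ja"
    for ch in name:
        cp = ord(ch)
        # Hiragana (U+3040-309F) or Katakana (U+30A0-30FF): Japanese
        if 0x3040 <= cp <= 0x30FF:
            return "ja"
        # CJK Unified Ideographs: treated as Traditional Chinese here
        if 0x4E00 <= cp <= 0x9FFF:
            return "zh_tw"
    return "en"

print(detect_lang("山田太郎", "taro@example.jp"))      # ja
print(detect_lang("王小明", "ming@example.com"))       # zh_tw
print(detect_lang("Ada Lovelace", "ada@example.com"))  # en
```

Note the limitation this implies: a kanji-only Japanese name with a non-`.jp` email will be classified as `zh_tw`, which is exactly the case where editing `_lang` by hand is useful.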
**Report language is not what you want**
By default, report language is selected from the majority lead language (en, ja, zh_tw). To force one language, set:
```yaml
output:
  filename_prefix: "your-event-name"
  report_language: "en"  # en / ja / zh_tw / auto
```

```
event-lead-cli/
├── README.md
├── README-ZH.md
├── README-JA.md
├── run_enrich.sh             ← entry point for each event run
├── requirements.txt
├── data/                     ← source CSV/Excel files
├── configs/
│   ├── event-template.yaml   ← trade show / conference template
│   ├── meetup-template.yaml  ← meetup / community template
│   └── output/
│       ├── checkpoints/      ← intermediate state for --resume (safe to delete after a run)
│       ├── *-leads.csv
│       ├── *-report.md
│       └── *-email-drafts.md
└── event_leads/
    ├── __main__.py
    ├── pipeline.py
    ├── enrich.py
    └── parsers.py
```