webmonitor is an AWS CDK application that ingests domain intelligence feeds, builds searchable datasets, reconciles results into DynamoDB, and sends alert emails for selected events.
The project is designed around scheduled Lambda workflows and organization-scoped data access.
- Downloads domain-monitor feed files (CSV + full ZIP) into S3.
- Converts daily CSV files into SQLite snapshots.
- Loads targeted domains from a source table and executes searches.
- Reconciles matches into dedicated DynamoDB tables (insert/delete delta model).
- Sends SES email alerts on inserts for selected tables.
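
The insert/delete delta model mentioned above can be pictured as a set comparison. The following is a minimal sketch, not the project's actual code: the function name `reconcile` and the sample domains are hypothetical, and it assumes matches can be compared as plain strings.

```python
def reconcile(current: set, stored: set) -> tuple:
    """Return (to_insert, to_delete) so the stored set ends up equal to current.

    `current` is what today's search found; `stored` is what the table
    already holds. Both are assumed to be sets of plain strings.
    """
    to_insert = current - stored   # new matches that are not in the table yet
    to_delete = stored - current   # stale matches that no longer appear
    return to_insert, to_delete


matches_today = {"example-login.com", "examp1e.net"}
already_in_table = {"examp1e.net", "old-example.org"}

to_insert, to_delete = reconcile(matches_today, already_in_table)
print(sorted(to_insert))  # ['example-login.com']
print(sorted(to_delete))  # ['old-example.org']
```

Only the two deltas would need to be written, so unchanged matches produce no stream events and therefore no repeat alerts.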
The CDK app defines these stacks:
- `WebmonitorStorage`
  - Creates S3 bucket `temporarywebmonitor`.
  - Enforces SSL, blocks public access, and applies 1-day lifecycle expiration.
  - Adds org-scoped resource policies (via `/organization/id`) for `ListBucket`/`GetObject`.
- `WebmonitorDownload`
  - Creates `download` Lambda (Python 3.13, ARM64, 15 min timeout, 4 GiB ephemeral storage).
  - Stores API token in Secrets Manager secret `webmonitor`.
  - Downloads domain-monitor list types and full ZIP to S3 bucket `temporarywebmonitor`.
  - Runs daily at `01:00` (cron in `us-east-2`).
- `WebmonitorSqlite`
  - Creates `list` + `make` Lambdas.
  - `list` finds current-day CSV files and invokes `make` asynchronously.
  - `make` turns each CSV into a SQLite DB and uploads it back to S3.
  - Also copies `dns.sqlite3` from `caretakerstaged` into dated `*-osint.sqlite3` in `temporarywebmonitor`.
  - Runs daily at `01:15`.
- `WebmonitorZiplist`
  - Creates `ziplist` Lambda.
  - Scans the dated full ZIP for item matches and reconciles into DynamoDB table `full`.
  - If the requested dated object is missing, it attempts fallback to previous day by copying in S3.
- `WebmonitorSearch`
  - Creates `search` + `searchlist` Lambdas.
  - `searchlist` reads terms from a source DynamoDB table (`lunker`) and tracks daily status in `state`.
  - Invokes `ziplist` for full ZIP search and `search` for each SQLite file.
  - `search` queries SQLite (`domains` or `dns` table depending on dataset) and reconciles results into the target DynamoDB table derived from key name.
  - Runs daily at `11:15`.
- `WebmonitorDynamoDB`
  - Creates DynamoDB tables:
    - `dailyremove`, `dailyupdate`, `weeklyremove`, `weeklyupdate`
    - `monthlyremove`, `monthlyupdate`, `quarterlyremove`, `quarterlyupdate`
    - `full`, `malware`, `osint`, `state`
  - All tables use on-demand billing, TTL (`ttl`), PITR enabled, streams enabled, and deletion protection.
  - Replicates each table to `us-east-1` and `us-west-2`.
  - Creates `action` Lambda and subscribes it to streams from `dailyremove`, `dailyupdate`, `malware`, and `osint`.
  - `action` looks up recipients from the `lunker` table (GSI on `pk` + `tk`) and sends raw SES mail alerts.
- `WebmonitorGithub`
  - Creates GitHub OIDC provider + IAM role for `repo:jblukach/webmonitor:*`.
  - Grants permissions needed for CDK deployments and asset publishing.
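
To make the CSV to SQLite step in `WebmonitorSqlite` and the lookup in `WebmonitorSearch` more concrete, here is a small sketch using only the Python standard library. It is an illustration under assumptions, not the code in `sqlite/make.py` or `search/search.py`: the single-column schema, the `domain` column name, the absence of a CSV header row, and the substring match are all hypothetical. Only the `domains` table name comes from the description above.

```python
import csv
import io
import sqlite3


def csv_to_sqlite(csv_text: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Load the first CSV column into a `domains` table (assumed schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS domains (domain TEXT PRIMARY KEY)")
    reader = csv.reader(io.StringIO(csv_text))
    # Normalize to lowercase and skip blank rows; assumes no header row.
    rows = ((row[0].strip().lower(),) for row in reader if row and row[0].strip())
    # INSERT OR IGNORE drops duplicate domains instead of raising an error.
    conn.executemany("INSERT OR IGNORE INTO domains (domain) VALUES (?)", rows)
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> list:
    """Return domains containing `term` (hypothetical substring match).

    `%` and `_` inside `term` are treated as LIKE wildcards in this sketch.
    """
    cursor = conn.execute(
        "SELECT domain FROM domains WHERE domain LIKE ? ORDER BY domain",
        (f"%{term}%",),
    )
    return [row[0] for row in cursor]


sample = "Example-Login.com\nother.net\n\nexample-login.com\n"
conn = csv_to_sqlite(sample)
print(search(conn, "example"))  # ['example-login.com']
```

In a Lambda, the database would presumably be written to a file under `/tmp` and uploaded to S3 afterwards; the in-memory default here just keeps the sketch self-contained.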
app.py # CDK app entrypoint
webmonitor/ # CDK stack definitions
download/download.py # Feed downloader Lambda
sqlite/list.py # SQLite orchestrator Lambda
sqlite/make.py # CSV -> SQLite builder Lambda
search/list.py # Search orchestrator Lambda
search/search.py # SQLite matcher + DynamoDB reconciler
ziplist/ziplist.py # ZIP matcher + DynamoDB reconciler
action/action.py # DynamoDB stream -> SES notifications
- Python 3.13 (or a compatible local version for CDK synth/deploy)
- AWS CLI configured for your target account
- AWS CDK v2
- Bootstrap completed for qualifier `lukach` in target regions (`us-east-1`, `us-east-2`, `us-west-2`)
- Existing S3 bucket for Lambda layer artifact: `packages-use2-lukach-io` with `requests.zip`
- Existing SSM parameters:
  - `/account/lunker` (account ID that owns the `lunker` table)
  - `/organization/id` (AWS Organization ID)
- Existing DynamoDB table `lunker` with a GSI using:
  - hash key: `pk`
  - range key: `tk`
- Verified SES identity for `hello@lukach.io` and/or `lukach.io`
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

If this is a new account/region bootstrap:

cdk bootstrap --qualifier lukach aws://<ACCOUNT_ID>/us-east-1
cdk bootstrap --qualifier lukach aws://<ACCOUNT_ID>/us-east-2
cdk bootstrap --qualifier lukach aws://<ACCOUNT_ID>/us-west-2

Deploy everything:

cdk deploy --all --profile <aws-profile>

Useful CDK commands:

cdk synth
cdk diff
cdk ls
cdk destroy --all --profile <aws-profile>

Update the generated Secrets Manager secret `webmonitor` with your real domain-monitor API token:

{
  "token": "<YOUR_TOKEN>"
}

Without this value, the downloader cannot fetch upstream feed data.
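
As an illustration of how a consumer might validate that secret before calling the upstream API, here is a hedged sketch. The function name `parse_token` and the placeholder check are hypothetical. In a Lambda, the input string would typically come from the `SecretString` field returned by boto3's Secrets Manager `get_secret_value` call, which is left out here so the example runs without AWS access.

```python
import json


def parse_token(secret_string: str) -> str:
    """Extract the API token from the secret's JSON body.

    Raises ValueError if the token is missing, empty, or still the
    `<YOUR_TOKEN>` placeholder from the setup instructions.
    """
    payload = json.loads(secret_string)
    token = payload.get("token", "") if isinstance(payload, dict) else ""
    if not token or token == "<YOUR_TOKEN>":
        raise ValueError("secret 'webmonitor' does not contain a usable token")
    return token


print(parse_token('{"token": "abc123"}'))  # abc123
```

Failing early with a clear error makes a forgotten secret update easier to spot than an authentication failure from the feed provider.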
Invoke downloader:

aws lambda invoke \
  --function-name download \
  --payload '{}' \
  --cli-binary-format raw-in-base64-out \
  /tmp/download.json

Run search list in scheduled mode:

aws lambda invoke \
  --function-name searchlist \
  --payload '{}' \
  --cli-binary-format raw-in-base64-out \
  /tmp/searchlist.json

Run search list for a single status/item:

aws lambda invoke \
  --function-name searchlist \
  --payload '{"Status":"example"}' \
  --cli-binary-format raw-in-base64-out \
  /tmp/searchlist-single.json

- `download` writes dated files like:
  - `YYYY-MM-DD-dailyupdate.csv`
  - `YYYY-MM-DD-full.zip`
- `sqlite/list` triggers `sqlite/make` to create `YYYY-MM-DD-*.sqlite3` files.
- `search/list` selects search terms and invokes:
  - `ziplist` against `YYYY-MM-DD-full.zip` -> table `full`
  - `search` against each SQLite DB -> table inferred from key suffix
- Reconciliation logic inserts new matches and deletes stale matches in DynamoDB.
- DynamoDB stream inserts on selected tables trigger `action`, which sends SES alerts.
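
The last step of the flow, reacting only to stream inserts, can be sketched as a filter over a DynamoDB Streams event. This is an assumption-laden illustration rather than the contents of `action/action.py`: the function name `inserted_items`, the `_table` key, and the sample attribute names are hypothetical. The `Records`, `eventName`, `eventSourceARN`, and `dynamodb.NewImage` fields follow the standard stream record shape.

```python
def inserted_items(event: dict) -> list:
    """Return flattened new images for INSERT records only."""
    items = []
    for record in event.get("Records", []):
        # MODIFY and REMOVE records are skipped: alerts fire on inserts only.
        if record.get("eventName") != "INSERT":
            continue
        image = record.get("dynamodb", {}).get("NewImage", {})
        # Keep string attributes, unwrapping the {"S": "..."} type wrapper.
        item = {name: value["S"] for name, value in image.items() if "S" in value}
        # The ARN looks like arn:aws:dynamodb:REGION:ACCOUNT:table/NAME/stream/TS
        arn = record.get("eventSourceARN", "")
        item["_table"] = arn.split("/")[1] if "/" in arn else ""
        items.append(item)
    return items


sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "eventSourceARN": "arn:aws:dynamodb:us-east-2:123456789012:table/malware/stream/2024-01-01T00:00:00.000",
            "dynamodb": {"NewImage": {"pk": {"S": "example"}, "ttl": {"N": "1700000000"}}},
        },
        {"eventName": "REMOVE", "dynamodb": {}},
    ]
}
print(inserted_items(sample_event))  # [{'pk': 'example', '_table': 'malware'}]
```

Each returned item could then drive a recipient lookup and an SES message; those calls are omitted because they depend on table contents not described here.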
- Lambdas are configured for Python 3.13 and ARM64.
- Several IAM policies in this project currently use broad resource scopes (`*`); tighten them if required by your security posture.
- Storage is intentionally short-lived in `temporarywebmonitor` (1-day expiration).
- Reconciliation code includes previous-day fallback for missing dated S3 objects in `search` and `ziplist` workflows.
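
The dated key convention and the previous-day fallback can be sketched as follows. This is a simplified illustration, not the project's implementation: the helper names and the `exists` callback are hypothetical, and where the project copies the previous day's object in S3, this sketch only resolves which key to read. The `YYYY-MM-DD-<name>` key pattern is taken from the data flow described earlier.

```python
from datetime import date, timedelta
from typing import Callable, Optional


def dated_key(day: date, suffix: str) -> str:
    """Build a key such as 2024-05-01-full.zip."""
    return f"{day.isoformat()}-{suffix}"


def table_from_key(key: str) -> str:
    """Infer the table name from a key like 2024-05-01-dailyupdate.sqlite3."""
    stem = key.rsplit(".", 1)[0]      # drop the file extension
    return stem[len("YYYY-MM-DD-"):]  # drop the 11-character date prefix


def resolve_key(day: date, suffix: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Return the requested key, or the previous day's key if it is missing."""
    for candidate in (day, day - timedelta(days=1)):
        key = dated_key(candidate, suffix)
        if exists(key):
            return key
    return None  # neither day is available


available = {"2024-04-30-full.zip"}  # stand-in for an S3 existence check
print(resolve_key(date(2024, 5, 1), "full.zip", available.__contains__))
# 2024-04-30-full.zip
print(table_from_key("2024-05-01-dailyupdate.sqlite3"))  # dailyupdate
```

Against S3, `exists` could wrap a `head_object` call. With the bucket's 1-day expiration, looking back further than one day would rarely find anything.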
This repository is licensed under the terms in LICENSE.