GeoForge is a reproducible GeoIP database compiler that builds a local MaxMind-compatible MMDB from multiple independent geolocation sources.
The pipeline combines DB-IP Lite, MaxMind GeoLite2, IP2Location LITE, Sypex Geo, operator geofeeds, RIR delegated statistics, and GeoNames reference datasets into a normalized consensus-based geolocation layer.
Primary outputs:
- release/geo.mmdb
- release/geo.csv
- release/geo-quality-report.txt
Free GeoIP datasets typically optimize for broad coverage, not cross-source validation.
GeoForge approaches geolocation as a consensus problem:
prefix seed
-> multi-source candidate collection
-> confidence scoring
-> normalization
-> conflict resolution
-> reproducible MMDB output
The builder merges independent signals, downranks inconsistent records, applies conservative normalization rules, and produces an auditable local database suitable for gateways, analytics, enrichment services, fraud systems, and infrastructure tooling.
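As a rough illustration of the consensus idea, the sketch below picks the country backed by the most source weight and reports that share as a 0-100 confidence. The `Candidate` type, the `Consensus` function, the source labels, and the equal weights are all hypothetical; GeoForge's actual scoring in `internal/consensus` may work quite differently.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Candidate is one source's opinion about a prefix (hypothetical type).
type Candidate struct {
	Source  string
	Country string
	Weight  float64 // illustrative trust weight, not a GeoForge constant
}

// Consensus returns the country with the highest total weight and a 0-100
// confidence equal to that country's share of all counted weight.
func Consensus(cands []Candidate) (string, int) {
	totals := map[string]float64{}
	var sum float64
	for _, c := range cands {
		if c.Country == "" || c.Weight <= 0 {
			continue // ignore empty or untrusted signals
		}
		totals[c.Country] += c.Weight
		sum += c.Weight
	}
	if sum == 0 {
		return "", 0
	}
	keys := make([]string, 0, len(totals))
	for k := range totals {
		keys = append(keys, k)
	}
	sort.Strings(keys) // sorted iteration keeps tie-breaking deterministic
	best := keys[0]
	for _, k := range keys[1:] {
		if totals[k] > totals[best] {
			best = k
		}
	}
	return best, int(math.Round(totals[best] / sum * 100))
}

func main() {
	country, confidence := Consensus([]Candidate{
		{Source: "dbip", Country: "KR", Weight: 1},
		{Source: "geolite2", Country: "KR", Weight: 1},
		{Source: "ip2location", Country: "JP", Weight: 1},
		{Source: "sypex", Country: "KR", Weight: 1},
	})
	fmt.Println(country, confidence)
}
```

Sorting the keys before choosing a winner is one simple way to keep rebuilds deterministic when two countries tie, since Go map iteration order is randomized.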
The project is designed for:
- offline local lookups
- deterministic rebuilds
- source transparency
- quality regression tracking
- operational GeoIP enrichment
                     Source Databases
                             │
          ┌──────────────────┼──────────────────┐
          │                  │                  │
          ▼                  ▼                  ▼
      GeoLite2          IP2Location         Sypex Geo
          │                  │                  │
          └──────────────────┼──────────────────┘
                             ▼
                     Consensus Engine
                scoring / merge / weighting
                             ▼
                    GeoNames Enrichment
                postal / city normalization
                             ▼
                      Output Cleanup
                 timezone / precision / QA
                             ▼
                        MMDB Export
cmd/
├── builder/ Main database compiler
└── qualitycheck/ Post-build validation
internal/
├── consensus/ Merge and scoring logic
├── geofeed/ RFC 8805 parser/index
├── geozip/ GeoNames enrichment
├── output/ Final normalization
├── refdata/ Country/currency metadata
├── rirstats/ RIR delegated statistics
└── strnorm/ String normalization
data/
release/
scripts/
Downloaded source datasets and generated outputs are intentionally gitignored.
| Source | Role |
|---|---|
| DB-IP Lite | Required prefix seed |
| MaxMind GeoLite2 City | Consensus baseline |
| IP2Location LITE DB5 | Independent geo signal |
| Sypex Geo City | CIS-focused enrichment |
| RFC 8805 geofeeds | Operator-published corrections |
| RIR delegated stats | Registry country attribution |
| GeoNames postal dump | Postal enrichment |
| GeoNames cities1000 | City geoname resolution |
GeoForge separates source collection from output generation, allowing partial builds when some optional datasets are unavailable. The DB-IP Lite prefix seed is always required.
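To make one of the source roles concrete, here is a minimal sketch of reading a single record from an RIR delegated statistics file, which uses the pipe-separated layout `registry|cc|type|start|value|date|status`. For `ipv4` records the value is an address count, and for `ipv6` it is a prefix length. The `Delegation` type and `ParseDelegation` function are hypothetical, the sample line is illustrative rather than copied from a real file, and the sketch only accepts IPv4 counts that form a single CIDR block.

```go
package main

import (
	"fmt"
	"math/bits"
	"net/netip"
	"strconv"
	"strings"
)

// Delegation is a simplified view of one delegated-stats record (hypothetical type).
type Delegation struct {
	Registry string
	Country  string
	Prefix   netip.Prefix
}

// ParseDelegation parses one ipv4 or ipv6 record. Header, summary, and asn
// lines are rejected with an error so the caller can skip them.
func ParseDelegation(line string) (Delegation, error) {
	f := strings.Split(strings.TrimSpace(line), "|")
	if len(f) < 7 {
		return Delegation{}, fmt.Errorf("want at least 7 fields, got %d", len(f))
	}
	if f[2] != "ipv4" && f[2] != "ipv6" {
		return Delegation{}, fmt.Errorf("unsupported record type %q", f[2])
	}
	addr, err := netip.ParseAddr(f[3])
	if err != nil {
		return Delegation{}, fmt.Errorf("bad start address: %w", err)
	}
	n, err := strconv.ParseUint(f[4], 10, 64)
	if err != nil {
		return Delegation{}, fmt.Errorf("bad value field: %w", err)
	}
	plen := int(n) // ipv6: the value field is already a prefix length
	if f[2] == "ipv4" {
		// ipv4: the value is an address count; only powers of two map to one CIDR.
		if n == 0 || n&(n-1) != 0 || n > 1<<32 {
			return Delegation{}, fmt.Errorf("count %d is not one CIDR block", n)
		}
		plen = 32 - bits.TrailingZeros64(n)
	}
	p, err := addr.Prefix(plen)
	if err != nil {
		return Delegation{}, err
	}
	if p.Addr() != addr {
		return Delegation{}, fmt.Errorf("start %s is not aligned to /%d", addr, plen)
	}
	return Delegation{Registry: f[0], Country: f[1], Prefix: p}, nil
}

func main() {
	// Illustrative record, not taken from a real registry file.
	d, err := ParseDelegation("apnic|KR|ipv4|1.208.0.0|1048576|20100511|allocated")
	if err != nil {
		panic(err)
	}
	fmt.Println(d.Registry, d.Country, d.Prefix)
}
```

A production parser would also need to split non-power-of-two IPv4 counts into several CIDR blocks, which this sketch deliberately leaves out.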
Create a local environment file:
cp admin.env.example admin.env
Run the pipeline:
./geo.sh
The build workflow:
- Acquires a release lock
- Downloads updated datasets
- Detects source changes
- Runs Go tests
- Compiles builders
- Generates MMDB + CSV outputs
- Runs quality validation
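The first step of the workflow above, acquiring a release lock, could be approximated with an exclusive file create. This is only a sketch: the `AcquireLock` helper, the `ErrLocked` error, and the lock file name are assumptions, and `geo.sh` may implement locking in another way entirely.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// ErrLocked is returned when another build already holds the lock (hypothetical).
var ErrLocked = errors.New("release lock already held")

// AcquireLock creates path exclusively and returns a function that releases
// the lock by removing the file again.
func AcquireLock(path string) (func() error, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
	if err != nil {
		if errors.Is(err, fs.ErrExist) {
			return nil, ErrLocked
		}
		return nil, err
	}
	// Record the owning PID to help diagnose stale locks.
	_, werr := fmt.Fprintf(f, "%d\n", os.Getpid())
	cerr := f.Close()
	if werr != nil || cerr != nil {
		os.Remove(path)
		return nil, errors.Join(werr, cerr)
	}
	return func() error { return os.Remove(path) }, nil
}

func main() {
	dir, err := os.MkdirTemp("", "geoforge-lock")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	lock := filepath.Join(dir, "release.lock") // hypothetical lock file name
	release, err := AcquireLock(lock)
	if err != nil {
		panic(err)
	}
	_, err = AcquireLock(lock)
	fmt.Println("second attempt blocked:", errors.Is(err, ErrLocked))
	if err := release(); err != nil {
		panic(err)
	}
}
```

An exclusive create is simple and portable, but a crashed build leaves the file behind, so a real pipeline would need a stale-lock policy.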
Force rebuild:
FORCE_BUILD=1 ./geo.sh
Disable downloads:
AUTO_DOWNLOAD=0 ./geo.sh

| File | Description |
|---|---|
| `release/geo.mmdb` | MaxMind-compatible GeoIP database |
| `release/geo.csv` | CSV audit/export copy |
| `release/geo-quality-report.txt` | Post-build quality analysis |
| `release/geo.previous.csv` | Previous snapshot for diffing |

Each MMDB entry contains top-level metadata plus a nested location object.

| Field | Description |
|---|---|
| `matched_prefix` | CIDR written into MMDB |
| `confidence` | Consensus confidence score |
| `source_updated_at` | UTC build timestamp |
| `country_metadata` | Country/currency/calling metadata |
| `location` | Final geolocation object |

Fields of the nested `location` object:

| Field | Description |
|---|---|
| `continent_code` | Continent code |
| `country_code` | ISO country |
| `registry_country_code` | RIR-derived registry country |
| `subdivision_name` | Normalized admin region |
| `city_geoname_id` | GeoNames city identifier |
| `city_name` | Normalized city |
| `postal_code` | Consensus postal code |
| `latitude` | Rounded latitude |
| `longitude` | Rounded longitude |
| `time_zone` | Derived timezone |
| `accuracy_radius_km` | Conservative accuracy estimate |

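The documented fields could be modeled in Go roughly as follows. This is a sketch under assumptions: the struct names and the Go types chosen for each field are guesses, and `country_metadata` is kept as raw JSON because its inner shape is not described here. The JSON tags mirror the field names above and are used only to decode a JSON rendering of a record, not the MMDB binary format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Location mirrors the nested location object (field types are assumptions).
type Location struct {
	ContinentCode       string  `json:"continent_code"`
	CountryCode         string  `json:"country_code"`
	RegistryCountryCode string  `json:"registry_country_code"`
	SubdivisionName     string  `json:"subdivision_name"`
	CityGeonameID       int64   `json:"city_geoname_id"`
	CityName            string  `json:"city_name"`
	PostalCode          string  `json:"postal_code"`
	Latitude            float64 `json:"latitude"`
	Longitude           float64 `json:"longitude"`
	TimeZone            string  `json:"time_zone"`
	AccuracyRadiusKm    int     `json:"accuracy_radius_km"`
}

// Record mirrors the top-level entry (hypothetical type).
type Record struct {
	MatchedPrefix   string          `json:"matched_prefix"`
	Confidence      int             `json:"confidence"`
	SourceUpdatedAt string          `json:"source_updated_at"`
	CountryMetadata json.RawMessage `json:"country_metadata,omitempty"`
	Location        Location        `json:"location"`
}

// DecodeRecord parses a JSON rendering of one record.
func DecodeRecord(data []byte) (Record, error) {
	var r Record
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	sample := []byte(`{
		"matched_prefix": "1.208.0.0/12",
		"confidence": 85,
		"location": {"country_code": "KR", "city_name": "Seoul", "accuracy_radius_km": 20}
	}`)
	r, err := DecodeRecord(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.MatchedPrefix, r.Confidence, r.Location.CountryCode, r.Location.CityName)
}
```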
{
  "ip": "1.208.10.20",
  "matched_prefix": "1.208.0.0/12",
  "confidence": 85,
  "source_updated_at": "2026-05-20T00:00:00Z",
  "location": {
    "country_code": "KR",
    "country_name": "South Korea",
    "city_name": "Seoul",
    "postal_code": "04524",
    "latitude": 37.56631,
    "longitude": 126.9772,
    "time_zone": "Asia/Seoul",
    "accuracy_radius_km": 20
  }
}

GeoForge is designed to improve operational quality through source consensus rather than raw source replacement.
Expected improvements over single-source lite datasets:
- better country stability
- improved city consistency
- stronger CIS coverage
- cleaner normalization
- more conservative precision signaling
- reduced malformed text artifacts
The builder intentionally favors stable consensus over aggressive precision claims.
After each build, the pipeline runs a post-build validation stage and generates:
release/geo-quality-report.txt
Validation includes:
- coverage statistics
- confidence distribution
- added/removed prefixes
- country/city regressions
- mojibake detection
- MMDB smoke lookups
Enable strict mode:
QUALITY_STRICT=1 ./geo.sh
Strict mode fails the build on large regressions or suspicious output anomalies.
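The added/removed prefix check and a strict-mode gate might look like the sketch below. `DiffPrefixes`, `RemovalTooLarge`, and the 10% threshold are hypothetical; the actual `qualitycheck` logic and its limits are not documented here. The prefixes come from the reserved documentation ranges.

```go
package main

import (
	"fmt"
	"sort"
)

// DiffPrefixes returns prefixes present only in cur (added) and only in
// prev (removed), each sorted for stable report output.
func DiffPrefixes(prev, cur []string) (added, removed []string) {
	prevSet := make(map[string]struct{}, len(prev))
	for _, p := range prev {
		prevSet[p] = struct{}{}
	}
	curSet := make(map[string]struct{}, len(cur))
	for _, p := range cur {
		curSet[p] = struct{}{}
	}
	for p := range curSet {
		if _, ok := prevSet[p]; !ok {
			added = append(added, p)
		}
	}
	for p := range prevSet {
		if _, ok := curSet[p]; !ok {
			removed = append(removed, p)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

// RemovalTooLarge reports whether the removed share of the previous snapshot
// exceeds maxRatio. The threshold is an assumption, not a GeoForge default.
func RemovalTooLarge(prevCount, removedCount int, maxRatio float64) bool {
	if prevCount == 0 {
		return false
	}
	return float64(removedCount)/float64(prevCount) > maxRatio
}

func main() {
	prev := []string{"192.0.2.0/24", "198.51.100.0/24"}
	cur := []string{"192.0.2.0/24", "203.0.113.0/24"}
	added, removed := DiffPrefixes(prev, cur)
	fmt.Println("added:", added, "removed:", removed)
	fmt.Println("fail strict build:", RemovalTooLarge(len(prev), len(removed), 0.10))
}
```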
Final records are normalized immediately before export.
Normalization includes:
- coordinate rounding
- mojibake repair
- subdivision cleanup
- duplicate collapse
- timezone derivation
- conservative multilingual cleanup
Normalization intentionally avoids broad transliteration or aggressive geopolitical rewriting.
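Two of the listed steps, coordinate rounding and mojibake repair, are easy to illustrate. The sketch below is an assumption about how they could work, not the code in `internal/output` or `internal/strnorm`: `RoundCoord` and `RepairMojibake` are hypothetical names, the number of decimals is a parameter because the default is not stated, and the repair only handles the common case of UTF-8 bytes that were misread as Latin-1.

```go
package main

import (
	"fmt"
	"math"
	"unicode/utf8"
)

// RoundCoord rounds a coordinate to the given number of decimal places.
func RoundCoord(v float64, decimals int) float64 {
	scale := math.Pow(10, float64(decimals))
	return math.Round(v*scale) / scale
}

// RepairMojibake undoes one round of "UTF-8 decoded as Latin-1" damage.
// It is conservative: the input is returned unchanged unless every rune fits
// in a single byte and those bytes form valid UTF-8.
func RepairMojibake(s string) string {
	buf := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return s // contains characters outside Latin-1: leave untouched
		}
		buf = append(buf, byte(r))
	}
	if !utf8.Valid(buf) {
		return s // already correct text such as "São" fails this check
	}
	return string(buf)
}

func main() {
	fmt.Println(RoundCoord(37.566312, 4))
	fmt.Println(RepairMojibake("S\u00c3\u00a3o Paulo")) // damaged input
	fmt.Println(RepairMojibake("S\u00e3o Paulo"))       // already clean input
}
```

Text damaged through Windows-1252 rather than Latin-1 contains runes above U+00FF and is left untouched by this sketch, which matches the conservative intent but would need a byte mapping table to repair.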
Allowlisted RFC 8805 feeds are configured in:
data/geofeeds/allowlist.tsv
Supported formats:
prefix,country,region,city
prefix,country,region,city,postal
Default maximum IPv4 prefix length:
GEOFEED_MAX_IPV4_BITS=24
This prevents excessive host-level fragmentation from narrow geofeed entries.
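A minimal sketch of parsing one geofeed line in the supported formats and checking it against the IPv4 limit is shown below. `GeofeedEntry`, `ParseGeofeedLine`, and `WithinIPv4Limit` are hypothetical names, and whether GeoForge drops or aggregates entries beyond the limit is not stated here, so the sketch only reports the condition. It splits on commas and does not handle quoted CSV fields or bare addresses without a prefix length.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// GeofeedEntry is one parsed geofeed record (hypothetical type).
type GeofeedEntry struct {
	Prefix  netip.Prefix
	Country string
	Region  string
	City    string
	Postal  string
}

// ParseGeofeedLine parses "prefix,country,region,city[,postal]".
func ParseGeofeedLine(line string) (GeofeedEntry, error) {
	fields := strings.Split(strings.TrimSpace(line), ",")
	if len(fields) < 4 {
		return GeofeedEntry{}, fmt.Errorf("want at least 4 fields, got %d", len(fields))
	}
	for i := range fields {
		fields[i] = strings.TrimSpace(fields[i])
	}
	p, err := netip.ParsePrefix(fields[0])
	if err != nil {
		return GeofeedEntry{}, fmt.Errorf("bad prefix %q: %w", fields[0], err)
	}
	e := GeofeedEntry{
		Prefix:  p.Masked(), // zero any host bits so equal networks compare equal
		Country: strings.ToUpper(fields[1]),
		Region:  fields[2],
		City:    fields[3],
	}
	if len(fields) >= 5 {
		e.Postal = fields[4]
	}
	return e, nil
}

// WithinIPv4Limit reports whether p is no narrower than maxBits.
// IPv6 prefixes always pass because the limit applies to IPv4 only.
func WithinIPv4Limit(p netip.Prefix, maxBits int) bool {
	return !p.Addr().Is4() || p.Bits() <= maxBits
}

func main() {
	const maxIPv4Bits = 24 // mirrors the GEOFEED_MAX_IPV4_BITS default
	for _, line := range []string{
		"192.0.2.0/24,US,US-CA,Mountain View,",
		"192.0.2.5/32,US,US-CA,Mountain View",
	} {
		e, err := ParseGeofeedLine(line)
		if err != nil {
			panic(err)
		}
		fmt.Println(e.Prefix, e.Country, "within limit:", WithinIPv4Limit(e.Prefix, maxIPv4Bits))
	}
}
```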
The downloader uses content hashing and atomic replacement semantics.
Tracked state files:
data/download-state.tsv
data/download-changed.txt
If no source changed, the builder preserves the existing MMDB unless forced.
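Content hashing with atomic replacement can be sketched as follows. The hash algorithm is not specified here, so SHA-256 is an assumption, as are the `ContentHash` and `ReplaceIfChanged` helpers and the file names. The new content is written to a temporary file in the destination directory and then renamed into place, so readers never observe a half-written file.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// ContentHash returns the hex SHA-256 digest of data (algorithm is an assumption).
func ContentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// ReplaceIfChanged atomically writes data to path when its hash differs from
// prevHash. It returns the new hash and whether the file was replaced.
func ReplaceIfChanged(path string, data []byte, prevHash string) (string, bool, error) {
	h := ContentHash(data)
	if h == prevHash {
		return h, false, nil // unchanged: keep the existing file
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".download-*")
	if err != nil {
		return "", false, err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return "", false, err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return "", false, err
	}
	if err := tmp.Close(); err != nil {
		return "", false, err
	}
	// Rename within one directory replaces the target in a single step.
	if err := os.Rename(tmp.Name(), path); err != nil {
		return "", false, err
	}
	return h, true, nil
}

func main() {
	dir, err := os.MkdirTemp("", "geoforge-dl")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "source.bin") // hypothetical dataset file
	h, changed, err := ReplaceIfChanged(target, []byte("v1"), "")
	if err != nil {
		panic(err)
	}
	fmt.Println("first download changed:", changed)
	_, changed, err = ReplaceIfChanged(target, []byte("v1"), h)
	if err != nil {
		panic(err)
	}
	fmt.Println("identical download changed:", changed)
}
```

The returned hash is the kind of value a state file such as `data/download-state.tsv` could store per source, though its actual columns are not described here.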
| Domain | Example |
|---|---|
| Fraud Detection | Geo consistency checks |
| SIEM Enrichment | Country/city attribution |
| Analytics | Geographic aggregation |
| Gateways | Local GeoIP lookups |
| Data Pipelines | IP enrichment |
| Infrastructure | Region-aware routing |
- City-level geolocation remains probabilistic
- Mobile and VPN accuracy may vary substantially
- Prefix coverage depends on DB-IP Lite seed availability
- Postal enrichment should be treated as opportunistic
- Geofeeds improve operator-owned allocations but are uneven globally
The repository is designed so code can be published independently from downloaded datasets.
Do not commit:
- admin.env
- downloaded provider databases
- generated release artifacts
- API credentials or license tokens
Review THIRD_PARTY_DATA.md before redistributing derived outputs.
Planned additions:
- ASN-aware geo heuristics
- confidence-weighted source tuning
- regional regression dashboards
- IPv6 quality scoring
- compressed bulk exports
- build reproducibility attestations
See LICENSE.
Additional redistribution guidance:
THIRD_PARTY_DATA.md
GeoForge aggregates third-party geolocation datasets into derived operational outputs. IP geolocation should be treated as probabilistic infrastructure metadata, not physical-user attribution.
