Log Parsing EN
- Initial scan: parse recent window after startup.
- Incremental scan: periodic scan by `system.taskInterval`.
- Backfill: fill older logs in background.
- IP geo backfill: resolve IP locations asynchronously.
- State file: `var/nginxpulse_data/nginx_scan_state.json`
- If current size < last size, the file is treated as rotated and re-parsed.
- Site ID is derived from `websites[].name`. Renaming creates a new site.
- `system.parseBatchSize` controls batch size (default 100).
- Can be overridden by `LOG_PARSE_BATCH_SIZE`.
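
The rotation rule and batch size described above can be illustrated with a small sketch. This is not nginxpulse's actual code (the project is written in Go); `resume_offset` and `read_batches` are hypothetical helper names, and the only facts taken from this page are the "current size < last size" rule and the default batch size of 100.

```python
from typing import Iterator, List


def resume_offset(last_size: int, current_size: int) -> int:
    """Return the byte offset to resume from; 0 means re-parse the whole file."""
    # Rule from the text: a file smaller than last time is treated as rotated.
    return 0 if current_size < last_size else last_size


def read_batches(path: str, offset: int, batch_size: int = 100) -> Iterator[List[str]]:
    """Yield lines found after `offset`, grouped into lists of at most `batch_size`."""
    batch: List[str] = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            batch.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    # Emit whatever is left over as a final, shorter batch.
    if batch:
        yield batch
```

A caller would store the file size after each scan and pass it as `last_size` on the next one, which is roughly the role the state file plays.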
Endpoint: GET /api/status
- `log_parsing_progress`
- `log_parsing_estimated_remaining_seconds`
- `ip_geo_progress`
- `ip_geo_estimated_remaining_seconds`

Poll this endpoint to update progress in the UI.
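
A minimal polling sketch, assuming the four fields are top-level keys of the JSON response (this page does not show the full response shape or value ranges). `extract_progress` and `fetch_progress` are hypothetical names, and any access-key header your deployment requires is not shown.

```python
import json
import urllib.request

PROGRESS_FIELDS = (
    "log_parsing_progress",
    "log_parsing_estimated_remaining_seconds",
    "ip_geo_progress",
    "ip_geo_estimated_remaining_seconds",
)


def extract_progress(status: dict) -> dict:
    """Pick the four progress fields; missing ones become None."""
    # Assumption: the fields sit at the top level of the status document.
    return {name: status.get(name) for name in PROGRESS_FIELDS}


def fetch_progress(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/api/status and return only the progress fields."""
    with urllib.request.urlopen(f"{base_url}/api/status", timeout=timeout) as resp:
        return extract_progress(json.load(resp))
```

A UI or script could call `fetch_progress("http://<nginxpulse-server>:8089")` every few seconds and stop once the values stop changing.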
- Parsing writes core fields first; IP geo is queued.
- IP geo is resolved in batches after parsing.
- For speed: increase `parseBatchSize`, use a faster disk, or split logs by day.
- `system.logRetentionDays` controls cleanup.
- Cleanup runs at 02:00 (system timezone).
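
To reason about when cleanup will next run and what it might remove, the schedule can be sketched as below. The 02:00 run time comes from this page; treating retention as "older than now minus N days" is an assumption, and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_cleanup(now: datetime) -> datetime:
    """Return the next 02:00 strictly after `now`, in the same (system) timezone."""
    run = now.replace(hour=2, minute=0, second=0, microsecond=0)
    return run if now < run else run + timedelta(days=1)


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Assumption: entries older than this instant are eligible for cleanup."""
    return now - timedelta(days=retention_days)
```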
`WEBSITES` is a JSON array; each item describes one site. `logPath` must be a path that is accessible inside the container.
Example:
environment:
  WEBSITES: '[{"name":"Site 1","logPath":"/share/log/nginx/access-site1.log","domains":["www.kaisir.cn","kaisir.cn"]}, {"name":"Site 2","logPath":"/share/log/nginx/access-site2.log","domains":["home.kaisir.cn"]}]'
volumes:
  - ./nginx_data/logs/site1/access.log:/share/log/nginx/access-site1.log:ro
  - ./nginx_data/logs/site2/access.log:/share/log/nginx/access-site2.log:ro

If you have many sites, consider mounting the entire log directory and specifying exact files in WEBSITES:
environment:
  WEBSITES: '[{"name":"Site 1","logPath":"/share/log/nginx/access-site1.log","domains":["www.kaisir.cn","kaisir.cn"]}, {"name":"Site 2","logPath":"/share/log/nginx/access-site2.log","domains":["home.kaisir.cn"]}]'
volumes:
  - ./nginx_data/logs:/share/log/nginx/

Tip: If logs are rotated daily, use `*` to replace the date, e.g. `{"logPath":"/share/log/nginx/site1.top-*.log"}`.
.gz logs are supported. logPath can point to a single .gz file or a glob:
{"logPath": "/share/log/nginx/access-*.log.gz"}

There is a gzip sample in `var/log/gz-log-read-test/`.
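
Before committing a glob to `logPath`, it can help to preview which files it would select and to confirm a `.gz` file reads cleanly. The sketch below uses Python's standard library as a stand-in; nginxpulse's own glob semantics (Go) may differ in edge cases, and `match_logs` and `read_lines` are hypothetical names.

```python
import fnmatch
import gzip
from typing import Iterable, Iterator, List


def match_logs(pattern: str, paths: Iterable[str]) -> List[str]:
    """Return the paths matching a shell-style pattern, sorted by name."""
    # Note: in fnmatch, `*` also matches "/", which is looser than a shell glob.
    return sorted(p for p in paths if fnmatch.fnmatchcase(p, pattern))


def read_lines(path: str) -> Iterator[str]:
    """Yield text lines, decompressing when the file name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")
```

Running `match_logs` against a directory listing is a quick way to catch a pattern that matches more files than intended.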
When mounting logs locally is inconvenient, use `sources` instead of `logPath`. Once `sources` is set, `logPath` is ignored.
sources is a JSON array. Each item defines a log source. This design allows:
- Multiple sources per site (multiple machines/directories/buckets).
- Different parsing/auth/polling strategies per source.
- Easy extension for rotation/archival without changing old sources.
Common fields:
- `id`: unique source ID (recommend globally unique).
- `type`: `local` / `sftp` / `http` / `s3` / `agent`.
- `mode`:
  - `poll`: periodic pulling (default).
  - `stream`: streaming input only (currently Push Agent only).
  - `hybrid`: stream + polling fallback (only Push Agent streams; others still use `poll`).
- `pollInterval`: polling interval (e.g. `5s`).
- `pattern`: rotation glob (SFTP/Local/S3 use glob; HTTP uses index JSON).
- `compression`: `auto` / `gz` / `none`.
- `parse`: override parsing (see “Parsing Override”).

`stream` mode is mainly for Push Agent; other sources still run as `poll`.
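
The common fields above lend themselves to a quick pre-flight check before a config is deployed. The following is a sketch, not nginxpulse's validator: `check_source` and `duplicate_ids` are hypothetical helpers, treating `id` as required is an assumption, and the interval regex only covers simple forms such as `5s`.

```python
import re
from collections import Counter

VALID_TYPES = {"local", "sftp", "http", "s3", "agent"}
VALID_MODES = {"poll", "stream", "hybrid"}
VALID_COMPRESSION = {"auto", "gz", "none"}
# Assumption: intervals look like "5s" or "10m"; other forms are not covered.
INTERVAL_RE = re.compile(r"^\d+(ms|s|m|h)$")


def check_source(src: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not src.get("id"):
        problems.append("id is missing")
    if src.get("type") not in VALID_TYPES:
        problems.append(f"unknown type: {src.get('type')!r}")
    mode = src.get("mode", "poll")  # poll is the documented default
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    elif mode != "poll" and src.get("type") != "agent":
        problems.append(f"mode {mode!r} only streams for agent sources; this one still polls")
    interval = src.get("pollInterval")
    if interval is not None and not INTERVAL_RE.match(str(interval)):
        problems.append(f"unrecognized pollInterval: {interval!r}")
    compression = src.get("compression")
    if compression is not None and compression not in VALID_COMPRESSION:
        problems.append(f"unknown compression: {compression!r}")
    return problems


def duplicate_ids(sources: list) -> list:
    """Return source IDs that appear more than once across all sites."""
    counts = Counter(s.get("id") for s in sources)
    return sorted(i for i, n in counts.items() if i and n > 1)
```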
Best when you can provide HTTP access to log files (internal network or with auth).
Method A: Expose files via Nginx/Apache (lock it down to avoid leakage)
location /logs/ {
    alias /var/log/nginx/;
    autoindex on;
    # Add basic auth / IP allowlist
}

Then configure sources:
{
  "id": "http-main",
  "type": "http",
  "mode": "poll",
  "url": "https://logs.example.com/logs/access.log",
  "rangePolicy": "auto",
  "pollInterval": "10s"
}

rangePolicy:
- `auto`: prefer Range; fallback to full download (skips already-read bytes).
- `range`: force Range; error if not supported.
- `full`: always download full file.
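
The three `rangePolicy` values can be read as a small decision table. The sketch below models that table with two hypothetical functions; it is an interpretation of the descriptions above, not nginxpulse's implementation. In particular, skipping already-read bytes under `full` is an assumption, since the text only states it for the `auto` fallback.

```python
def request_headers(policy: str, offset: int) -> dict:
    """Build request headers for resuming at `offset` under the given policy."""
    if policy not in ("auto", "range", "full"):
        raise ValueError(f"unknown rangePolicy: {policy!r}")
    if policy == "full" or offset <= 0:
        return {}  # plain GET of the whole file
    return {"Range": f"bytes={offset}-"}


def new_bytes(policy: str, offset: int, status: int, body: bytes) -> bytes:
    """Return only the bytes not seen before, given the server's response."""
    if status == 206:
        # Partial Content: the server honored Range, so the body is already the tail.
        return body
    if status == 200:
        if policy == "range" and offset > 0:
            raise RuntimeError("server ignored the Range header")
        # Full body received: drop the prefix that was read on earlier polls.
        # Assumption: the same skip is applied under `full`.
        return body[offset:]
    raise RuntimeError(f"unexpected HTTP status {status}")
```

For example, with `auto` and 3 bytes already read, a server that answers `200` with `b"abcdef"` yields `b"def"`, while the same response under `range` raises an error.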
Method B: JSON index API
Good for rotated logs (daily/hourly) or .gz archives:
{
  "index": {
    "url": "https://logs.example.com/index.json",
    "jsonMap": {
      "items": "items",
      "path": "path",
      "size": "size",
      "mtime": "mtime",
      "etag": "etag",
      "compressed": "compressed"
    }
  }
}

Recommended index contract:
- Return a JSON document containing an array of log objects.
- Each item must include `path` (a fetchable URL).
- Provide `size` / `mtime` / `etag` to detect changes and avoid duplicates.
- `mtime` supports RFC3339 / RFC3339Nano / `2006-01-02 15:04:05` / Unix seconds.
Example response:
{
  "items": [
    {
      "path": "https://logs.example.com/access-2024-11-03.log.gz",
      "size": 123456,
      "mtime": "2024-11-03T13:00:00Z",
      "etag": "abc123",
      "compressed": true
    },
    {
      "path": "https://logs.example.com/access.log",
      "size": 98765,
      "mtime": 1730638800,
      "etag": "def456",
      "compressed": false
    }
  ]
}

If your fields differ, map them in jsonMap:
{
  "index": {
    "url": "https://logs.example.com/index.json",
    "jsonMap": {
      "items": "data",
      "path": "url",
      "size": "length",
      "mtime": "updated_at",
      "etag": "hash",
      "compressed": "gz"
    }
  }
}

Notes:
- `path` must be a directly accessible log URL.
- For `.gz`, provide stable `etag` / `size` / `mtime` to avoid duplicate parsing.
- If HTTP Range is not supported, set `rangePolicy` to `auto` or `full`.
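
When building an index endpoint, it can be useful to check that your response and `jsonMap` line up. The sketch below normalizes an index document using a `jsonMap`-style mapping and parses the `mtime` formats listed above. `normalize_index` and `parse_mtime` are hypothetical names, zone-less timestamps are assumed to be UTC, and nginxpulse's real handling may differ.

```python
import re
from datetime import datetime, timezone

DEFAULT_MAP = {"items": "items", "path": "path", "size": "size",
               "mtime": "mtime", "etag": "etag", "compressed": "compressed"}


def parse_mtime(value):
    """Accept Unix seconds, 'YYYY-MM-DD HH:MM:SS', or RFC3339 (optionally with nanoseconds)."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    try:
        # Assumption: a timestamp without a zone is treated as UTC.
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    except ValueError:
        pass
    text = value.replace("Z", "+00:00")
    # Python keeps microseconds at most, so trim extra fractional digits.
    text = re.sub(r"(\.\d{6})\d+", r"\1", text)
    return datetime.fromisoformat(text)


def normalize_index(doc: dict, json_map: dict = None) -> list:
    """Translate an index response into items keyed by the canonical field names."""
    m = {**DEFAULT_MAP, **(json_map or {})}
    items = []
    for raw in doc.get(m["items"], []):
        mtime = raw.get(m["mtime"])
        items.append({
            "path": raw[m["path"]],  # required by the contract
            "size": raw.get(m["size"]),
            "mtime": parse_mtime(mtime) if mtime is not None else None,
            "etag": raw.get(m["etag"]),
            "compressed": bool(raw.get(m["compressed"], False)),
        })
    return items
```

The two `mtime` values in the example response, `"2024-11-03T13:00:00Z"` and `1730638800`, denote the same instant, so both parse to equal datetimes.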
Ideal when SSH/SFTP access is available; no extra HTTP service is needed.
{
  "id": "sftp-main",
  "type": "sftp",
  "mode": "poll",
  "host": "1.2.3.4",
  "port": 22,
  "user": "nginx",
  "auth": { "keyFile": "/secrets/id_rsa" },
  "path": "/var/log/nginx/access.log",
  "pattern": "/var/log/nginx/access-*.log.gz",
  "pollInterval": "5s"
}

`auth` supports `keyFile` and `password`.
Best when logs are archived to OSS/S3 (Aliyun/Tencent/AWS compatible endpoints).
{
  "id": "s3-main",
  "type": "s3",
  "mode": "poll",
  "endpoint": "https://oss-cn-hangzhou.aliyuncs.com",
  "bucket": "nginx-logs",
  "prefix": "prod/access/",
  "pollInterval": "30s"
}

If formats differ across sources, override parsing per source:
{
  "parse": {
    "logType": "nginx",
    "logRegex": "^(?P<ip>\\S+) - (?P<user>\\S+) \\[(?P<time>[^\\]]+)\\] \"(?P<request>[^\"]+)\" (?P<status>\\d+) (?P<bytes>\\d+) \"(?P<referer>[^\"]*)\" \"(?P<ua>[^\"]*)\"$",
    "timeLayout": "02/Jan/2006:15:04:05 -0700"
  }
}

Designed for internal networks or edge nodes. Logs are pushed in real time.
You need to set up two machines: the parsing server (nginxpulse) and the log server (agent).
- Start nginxpulse (ensure backend `:8089` is reachable).
- Recommend enabling access keys: `ACCESS_KEYS` (or `system.accessKeys`).
- Get `websiteID`: call `GET /api/websites`.
- If you need a custom format for the agent, add a `type=agent` source for parse override:
{
  "name": "Main Site",
  "sources": [
    {
      "id": "agent-main",
      "type": "agent",
      "parse": {
        "logFormat": "$remote_addr - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\""
      }
    }
  ]
}

- Prepare the agent (build or use prebuilt).

Build:
go build -o bin/nginxpulse-agent ./cmd/nginxpulse-agent

Prebuilt binaries:
- prebuilt/nginxpulse-agent-darwin-arm64
- prebuilt/nginxpulse-agent-linux-amd64
- Create agent config on the log server (fill in parsing server and `websiteID`).
  - Fetch `websiteID` from the parsing server: `curl http://<nginxpulse-server>:8089/api/websites`. The `id` field is the `websiteID`.
{
  "server": "http://<nginxpulse-server>:8089",
  "accessKey": "your-key",
  "websiteID": "abcd",
  "sourceID": "agent-main",
  "paths": ["/var/log/nginx/access.log"],
  "pollInterval": "1s",
  "batchSize": 200,
  "flushInterval": "2s"
}

- Run the agent:
./bin/nginxpulse-agent -config configs/nginxpulse_agent.json

Notes:
- The log server must reach `http://<nginxpulse-server>:8089/api/ingest/logs`.
- To override parsing, set a `type=agent` source with `id=sourceID` and fill `parse`.
- The agent skips `.gz` files; if a log file shrinks (rotation), it restarts from the beginning.
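
The agent config above sets `batchSize` and `flushInterval`, but this page does not spell out their semantics. One plausible reading, offered purely as an assumption, is that `batchSize` caps the lines sent per request and `flushInterval` bounds how long a partial batch waits. The hypothetical `LineBatcher` below sketches that behavior; it is not the agent's code.

```python
import time
from typing import Callable, List, Optional


class LineBatcher:
    """Buffer lines and hand them to `send` by size or by age (assumed semantics)."""

    def __init__(self, send: Callable[[List[str]], None], batch_size: int = 200,
                 flush_interval: float = 2.0, clock: Callable[[], float] = time.monotonic):
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._clock = clock
        self._lines: List[str] = []
        self._first_at: Optional[float] = None

    def add(self, line: str) -> None:
        """Queue one line; flush immediately once the batch is full."""
        if not self._lines:
            self._first_at = self._clock()
        self._lines.append(line)
        if len(self._lines) >= self._batch_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch that has waited long enough."""
        if self._lines and self._clock() - self._first_at >= self._flush_interval:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered and start a fresh batch."""
        if self._lines:
            self._send(self._lines)
            self._lines = []
            self._first_at = None
```

In a real sender, `send` would POST the batch to the ingest endpoint; here it is left as a callback so the buffering logic can be tested on its own.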
- If reparse happens on restart, make sure no stale process is running.
- Globs may match more files than expected.
- Gzip logs are parsed as full files based on metadata.