Releases: pganalyze/collector
v0.49.2
- Bugfix: Ensure all relation information will be sent out even with a lock
- This fixes a bug where we were not sending out relation information for
relations that encountered locks. Processing a snapshot missing such
information was failing
- Allow pg_stat_statements_reset() to fail with a soft error
- This was a hard error previously, which failed the snapshot. Because the
snapshot state then did not get persisted, this indirectly led to a memory leak
- Add integrity checks before uploading snapshots
- Validate some structural assumptions that cannot be enforced by protobuf
before sending a snapshot
- Bugfix: Increase timeout to prevent data loss when monitoring many servers
- This mitigates an issue introduced in 0.49.0
v0.49.1
- Relation queries: Correctly handle later queries encountering a lock
- This fixes edge cases where relation metadata (e.g. which indexes exist)
can appear and disappear from one snapshot to the next, due to locks
held for parts of the snapshot collection
- Relation statistics: Avoid bogus data due to diffs against locked objects
- This fixes a bug where table or index statistics could be skipped due to
locks held on the relation, which then caused a bad data point to be
collected on a subsequent snapshot, since the prior snapshot would be
missing an entry for that relation. Fixed by consistently skipping
statistics for that table/index in such situations.
- Amazon RDS / Aurora: Support new long-lived CA authorities
- Introduces the new "rds-ca-global" option for db_sslrootcert, which is the
recommended configuration for RDS and Aurora going forward, and which encompasses
both "rds-ca-2019-root" and all newer RDS CAs such as "rds-ca-rsa2048-g1".
- For compatibility reasons we still support naming the "rds-ca-2019-root" CA
explicitly, but it's now just an alias for the global set.
- Citus: Add option to turn off collection of Citus schema statistics
- For certain Citus deployments, running the relation or index size functions
can fail or time out due to a very high number of distributed tables.
- Adds the new option "disable_citus_schema_stats" / "DISABLE_CITUS_SCHEMA_STATS"
to turn off the collection of these statistics. When using this option it's
recommended to instead monitor the workers directly for table and index sizes.
- Add troubleshooting HINT when creating pg_stat_statements extension fails
- This commonly fails due to creating pg_stat_statements on the wrong database,
see https://pganalyze.com/docs/install/troubleshooting/pg_stat_statements
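The "Relation statistics" fix in v0.49.1 can be illustrated with a minimal Go sketch. This is not the collector's actual code: the RelationStats type, the diffStats function, and the single SeqScan counter are hypothetical names chosen for illustration. It assumes statistics are cumulative counters that are diffed between two snapshots, and shows why a relation missing from the prior snapshot should be skipped rather than diffed against zero.

```go
package main

import "fmt"

// RelationStats is a hypothetical, simplified set of cumulative counters
// for one relation (not the collector's real data structure).
type RelationStats struct {
	SeqScan int64
}

// diffStats computes per-interval deltas between two snapshots. A relation
// that is missing from prev (for example because it was locked while that
// snapshot was collected) is skipped, since diffing against an implicit
// zero would report the full cumulative count as one interval's activity.
func diffStats(prev, curr map[string]RelationStats) map[string]RelationStats {
	out := make(map[string]RelationStats)
	for name, c := range curr {
		p, ok := prev[name]
		if !ok {
			continue // no baseline: skip instead of emitting a bogus data point
		}
		out[name] = RelationStats{SeqScan: c.SeqScan - p.SeqScan}
	}
	return out
}

func main() {
	// "orders" was locked during the previous snapshot, so it has no entry there.
	prev := map[string]RelationStats{"users": {SeqScan: 100}}
	curr := map[string]RelationStats{"users": {SeqScan: 130}, "orders": {SeqScan: 5000}}
	for name, d := range diffStats(prev, curr) {
		fmt.Printf("%s: %d\n", name, d.SeqScan)
	}
}
```

Without the skip, "orders" would show 5000 sequential scans for a single interval, which is the kind of bad data point the release note describes.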
v0.49.0
- Update pg_query_go to v4 / Postgres 15 parser
- Besides supporting newer syntax like the MERGE statement, this parser
update also drops support for "?" replacement characters found in
pg_stat_statements output before Postgres 10
- Postgres 10 is now the minimum required version for running the collector
- We have dropped support for 9.6 and earlier due to the parser update,
and due to Postgres 9.6 having been End-of-Life (EOL) for over 1 year
- Enforce maximum time for each snapshot collection using deadlines
- Sometimes individual database servers can take longer than the allocated
interval (e.g. 10 minutes for a full snapshot), which previously led to
missing data for other servers monitored by the same collector process
- The new deadline-based logic ensures that collector functions return with
a "context deadline exceeded" error when the allocated interval is exceeded,
causing a clear error for that server, and allowing other servers to
continue reporting their data as planned
- As a side effect of this change, Ctrl+C (SIGINT) now works to stop a
collector test right away, instead of waiting for the snapshot to complete
- Log Insights
- Only consider first 1000 characters for log_line_prefix to speed up parsing
- Clearly report errors with closing/removing temporary files
- Improve --analyze-logfile mode for debugging log parsing
- Amazon RDS/Aurora: Improve handling of excessively large log file portions
- Azure DB for Postgres: Fix log line parsing for DETAIL lines
- Collect xmin horizon metrics
- Bugfixes
- Relation info: Correctly filter out foreign tables for constraints query
- Return zero as FullFrozenXID for replicas
- Update Go modules flagged by dependency scanners (issues are not actually applicable)
v0.48.0
- Update to Go 1.19
- Bugfix: Ensure relfrozenxid = 0 is tracked as full frozenxid = 0 (instead of
adding epoch prefix)
- Amazon RDS and Amazon Aurora: Support IAM token authentication
- This adds a new configuration setting, db_use_iam_auth / DB_USE_IAM_AUTH.
If enabled, the collector fetches a short-lived token for logging into the
database instance from the AWS API, instead of using a hardcoded password
in the collector configuration file
- In order to use this setting, IAM authentication needs to be enabled on the
database instance / cluster, and the pganalyze IAM policy needs to be
extended to cover the "rds-db:connect" privilege for the pganalyze user:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAMDBAuth.IAMPolicy.html
- Amazon RDS: Avoid DescribeDBInstances calls for log downloads for RDS instances
- This should reduce issues with rate limiting on this API call in some cases
- Amazon Aurora: Cache failures in DescribeDBClusters calls for up to 10 minutes
- This reduces repeated calls to the AWS API when the cluster identifier is
incorrect
- Log parsing: Add support for timezones specified by number, such as "-03"
v0.47.0
- Fix RDS log processing for large log file sections
- This fixes an issue with RDS log file chunks larger than 10MB that caused
the collector to calculate log text source offsets incorrectly and could
lead to mangled log text in the pganalyze UI and incorrect log filtering
- Warn if some log lines will be ignored
- Some verbose logging settings can lead to log lines being ignored by the
collector for performance reasons: warn about this on startup
- Improve Aiven Service ID and Project ID detection from hostname
- Fix error handling when fetching stats
- The missing checks could previously lead to incomplete snapshots, possibly
resulting in tables or indexes temporarily disappearing in pganalyze
- Fix error handling regarding reading SSL-related config values on startup
- Ignore non-Postgres URIs in environment on Heroku (@Preovaleo)
- Send additional Postgres table stats
- Send relpages, reltuples, relallvisible
- Send additional Postgres transaction metadata
- server level (new stats): current TXID and next MXID
- database level: age of datfrozenxid and datminmxid, also xact_commit and
xact_rollback
- table level: the age of relfrozenxid and relminmxid
- Send Citus distributed index sizes
- Add always_collect_system_data config option
- Also configurable with the PGA_ALWAYS_COLLECT_SYSTEM_DATA environment
variable
- This is useful for package-based setups which monitor the local server by a
non-local IP
- Update pg_query_go version to 2.2.0
- Install script: Detect aarch64 for Ubuntu/Debian package install
v0.46.1
- Fix Postgres 15 compatibility due to version check bug
- This fixes an issue with Postgres 15 only that caused the collector to reject
the newer pg_stat_statements version (1.10) by accident
- Add packages for Ubuntu 22.04, RHEL9-based distributions and Fedora 36
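The release notes for v0.46.1 do not say what the version check bug was. As a general illustration only, the sketch below shows one common way a check can reject a version like "1.10": treating "major.minor" as a decimal number, so that 1.10 compares as 1.1. The versionAtLeast helper and the minimum version of 1.9 used here are hypothetical and not taken from the collector.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionAtLeast reports whether a "major.minor" version string is at least
// minMajor.minMinor, comparing each component as an integer.
func versionAtLeast(version string, minMajor, minMinor int) (bool, error) {
	parts := strings.SplitN(version, ".", 2)
	if len(parts) != 2 {
		return false, fmt.Errorf("unexpected version format %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	if major != minMajor {
		return major > minMajor, nil
	}
	return minor >= minMinor, nil
}

func main() {
	// Naive approach: "1.10" parses as the float 1.1, which is below 1.9.
	f, _ := strconv.ParseFloat("1.10", 64)
	fmt.Println("float comparison accepts 1.10:", f >= 1.9)

	// Component-wise comparison treats minor version 10 as greater than 9.
	ok, _ := versionAtLeast("1.10", 1, 9)
	fmt.Println("component comparison accepts 1.10:", ok)
}
```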
v0.46.0
- Relation stats: Skip statistics collection on child tables when parent is locked
- Add new wait events from Postgres 13 and 14
- Log streaming: Discard logs after consistent failures to upload
- Collect blocking PIDs for lock monitoring
- Collect blocking PIDs for backends that are waiting for locks
- Disable this option by passing the "--no-postgres-locks" option to the collector binary
- Add "--benchmark" flag for running collector in benchmark mode (does not send data to pganalyze service)
v0.45.2
- Amazon RDS/Aurora
- Log download: Fix edge case that caused errors on hourly log boundaries
- Resolves errors like "Error reading 65817 bytes from tempfile: unexpected EOF"
- Collect tags assigned to instance as system metadata
- Docker: Allow setting CONFIG_CONTENTS to pass ini-style configuration
- This allows easier configuration of multiple servers to be monitored by
the same Docker container. Previously this required use of a volume
mount, which can be harder to make work successfully.
- CONFIG_CONTENTS needs to match the regular configuration file format that
uses separate sections for each server.
- This can be combined with environment-variable style configuration for
settings that apply to all servers (e.g. PGA_API_KEY) but all
server-specific configuration should only be passed in through the
CONFIG_CONTENTS variable.
v0.45.1
- Amazon Aurora and Amazon RDS
- Auto-detect Aurora writer instance, as well as reader on two-node clusters
- Previously it was required to specify the individual instance to support
log downloads and system metrics, but this now happens automatically
- The cluster name is auto-detected from the hostname, but to override it the
new "aws_db_cluster_id" and "aws_db_cluster_readonly" settings can be used
- This requires giving the IAM policy for the collector the
"DescribeDBClusters" permission
- In case more than one reader instance is used, each reader instance must
be specified individually instead of using the readonly cluster hostname
- Show RDS instance role hint when running collector test
- Ensure permission errors during log download are shown
- Add "-q" / "--quiet" flag for hiding everything except errors in the logs
v0.45.0
- Log Insights: Filter out log_statement=all and log_duration=on log lines
- This replaces the previous behaviour that prevented all log collection for
servers that had either log_statement=all or log_duration=on enabled.
- With the new logic, we continue ignoring these high-frequency events
(which would cause downstream problems), but accept all other log events,
including threshold-based auto_explain events.
- Track extensions that are installed on each database
- This is helpful to ensure that the necessary schema definitions are
loaded by pganalyze, e.g. for use by the Index Advisor.
- Ignore objects that are provided by extensions, as determined by pg_depend
(e.g. function definitions)
- Add support for Google AlloyDB for PostgreSQL
- This adds new options to specify the AlloyDB cluster ID and instance ID
- Special cases the log parsing to support AlloyDB's [filename:line] prefix
- Supports AlloyDB's modified autovacuum log output
- Add explicit support for Aiven Postgres databases
- Support was previously available via the self-managed instructions, but
this adds explicit support and improved setup instructions
- Existing Aiven servers that were detected as self-managed will be
automatically updated to be recognized as Aiven servers
- Self-managed servers
- Support disk statistics for software RAID devices
- These statistics are summarized across all component disk devices and
then tracked for the parent software RAID device as one. Note that this
is only done in case these statistics are not yet set (which is the case
for the typical Linux software RAID setup).
- Allow using pg_read_file to read log files (instead of log tail / syslog)
- This relies on the built-in Postgres function pg_read_file to read log
files and return the log data over the Postgres connection.
- This requires superuser (either directly or through a helper) and thus
does not work on managed database providers, with the exception of
Crunchy Bridge, for which this is already the mechanism to fetch logs.
- Additionally, this carries higher overhead than directly tailing log
files, or using syslog, and thus should only be used when necessary.
- Set db_log_pg_read_file = 1 / LOG_PG_READ_FILE=1 to enable the logic
- Crunchy Bridge
- Fix collection of system metrics
- Heroku Postgres
- Fix blank log line parsing
- Add --test-section parameter to set a specific config section to test
- Fully qualify constraint definitions, to support non-standard schemas
- Add support for log_line_prefix "%m [%p] %q%u@%d" and "%t [%p] %q%u@%d %h"