Releases: pganalyze/collector
v0.36.0
- Config parsing improvements:
- Fail fast when pganalyze section is missing in config file
- Ignore duplicates in db_name config setting
- Previously this could cause malformed snapshots that would be submitted
correctly but could not be processed
- Validate db_url parsing to avoid collector crash with invalid URLs
- Include pganalyze-collector-setup program (see 0.35 release notes) in supported packages
- Rename <unidentified queryid> query text placeholder to <query text unavailable>
- This makes it clearer what the underlying issue is
- Revert to using <truncated query> instead of <unparsable query> in some situations
- When a query is cut off due to the pg_stat_activity limit being reached,
show <truncated query>, to make it clear that increasing track_activity_query_size
would solve the issue
- Ignore I/O stats for AWS Aurora utility statements
- AWS Aurora appears to report incorrect blk_read_time and blk_write_time values
for utility statements (i.e., non-SELECT/INSERT/UPDATE/DELETE); we zero these out for now
- Fix log-based EXPLAIN bug where query samples could be dropped if EXPLAIN failed
- Add U140 log event (inconsistent range bounds)
- e.g.: ERROR: range lower bound must be less than or equal to range upper bound
- Fix issue where incomplete schema information in snapshots was not marked correctly
- This could lead to schema objects disappearing and being re-created
- Fix trailing newline handling for GCP and self-hosted log streams
- This could lead to queries being poorly formatted in the UI, or some queries
with single-line comments being ignored
- Include additional collector configuration settings in snapshot metadata for diagnostics
- Ignore "insufficient privilege" queries w/o queryid
- Previously, these could all be aggregated together yielding misleading stats
v0.35.0
- Add new "pganalyze-collector-setup" program that streamlines collector installation
- This is initially targeted at self-managed servers to make it easier to set up
the collector and required configuration settings for a locally running Postgres
server
- To start, this supports the following environments:
- Postgres 10 and newer, running on the same server as the collector
- Ubuntu 14.04 and newer
- Debian 10 and newer
- Collector test: Show server URLs to make it easier to access the servers in
pganalyze after the test
- Collector test + reload: In case of errors, return exit code 1
- Ignore manual vacuums if the collector can't access pg_stat_progress_vacuum
- Don't run log test for Heroku, instead provide info message
- Also fixes "Unsupported log_line_prefix setting: ' sql_error_code = %e '"
error on Heroku Postgres
- Add pganalyze system user to adm group in Debian/Ubuntu packages
- This gives the collector permission to read Postgres log files in a default
install, simplifying Log Insights setup
- Handle NULL parameters for query samples correctly
- Add a skip_if_replica / SKIP_IF_REPLICA option (#117)
- You can use this to configure the collector in a no-op mode on
replicas (we only check whether the monitored database is a replica), and
automatically switch to active monitoring when the database is no
longer a replica.
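As a sketch of how this might look in the collector's INI-style config file (the server section name and connection settings are illustrative):

```ini
[pganalyze]
api_key = your_api_key

[replica1]
db_host = 10.0.0.5
db_name = mydb
db_username = pganalyze
# No-op while this server is a replica; monitoring resumes after promotion:
skip_if_replica = true
```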
- Stop building packages for CentOS 6 and Ubuntu 14.04 (Trusty)
- Both of these systems are now end of life, and the remaining survivor
of the CentOS 6 line (Amazon Linux 1) will be EOL on December 31st 2020.
v0.34.0
- Check and report problematic log collection settings
- Some Postgres settings almost always cause a drastic increase in log
volume for little actual benefit. They tend to cause operational problems
for the collector (due to the load of additional log parsing) and the
pganalyze service itself (or indeed, likely for any service that would
process collector snapshots), and do not add any meaningful insights.
Furthermore, we found that these settings are often turned on
accidentally.
- To avoid these issues, add some client-side checks in the collector to
disable log processing if any of the problematic settings are on.
- The settings in question are:
- log_min_duration_statement less than 10ms
- log_statement set to 'all'
- log_duration set to 'on'
- log_error_verbosity set to 'verbose'
- If any of these are set to these unsupported values, all log collection will be
disabled for that server. The settings are re-checked every full snapshot, and can be
explicitly re-checked with a collector reload.
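For reference, a postgresql.conf fragment that stays within these limits might look like this (the specific values are illustrative):

```ini
log_min_duration_statement = 1000  # 1s; values below 10ms disable log collection
log_statement = 'none'             # 'all' disables log collection
log_duration = off                 # 'on' disables log collection
log_error_verbosity = default      # 'verbose' disables log collection
```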
- Log Insights improvements
- Self-managed server: Process logs every 3 seconds, instead of on-demand
- Self-managed server: Improve handling of multi-line log events
- Google Cloud SQL: Always acknowledge Pub/Sub messages, even if the collector doesn't handle them
- Optimize stitching logic for reduced CPU consumption
- Explicitly close temporary files to avoid running out of file descriptors
- Multiple changes to improve debugging in support situations
- Report collector config in full snapshot
- This reports certain collector config settings (except for passwords/keys/credentials)
to the pganalyze servers to help with debugging.
- Print collector version at beginning of test for better support handling
- Print collection status and Postgres version before submitting snapshots
- Change panic stack trace logging from Verbose to Warning
- Report collector config in full snapshot
- Add support for running the collector on ARM systems
- Note that we don't provide packages yet, but with this the collector
can be built on ARM systems without any additional patches.
- Introduce API system scope fallback
- This fallback is intended to allow changing the API scope, either based
on user configuration (e.g. moving the collector between different
cloud provider accounts), or because of changes in the collector identity
system logic.
- The new "api_system_scope_fallback" / PGA_API_SYSTEM_SCOPE_FALLBACK config
variable is intended to be set to the old value of the scope. When the
pganalyze backend receives a snapshot with a fallback scope set, and there
is no server created with the regular scope, it will first search the
servers with the fallback scope. If found, that server's scope will be
updated to the (new) regular scope. If not found, a new server will be
created with the regular scope. The main goal of the fallback scope is to
avoid creating a duplicate server when changing the scope value.
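As an illustration, a migration between cloud provider accounts might be configured like this (the scope values are hypothetical, and this assumes the regular scope is set via the api_system_scope setting):

```ini
[pganalyze]
api_key = your_api_key
# New scope going forward:
api_system_scope = aws-account-b
# Old scope, so the backend can find the existing server and update its scope:
api_system_scope_fallback = aws-account-a
```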
- Use new fallback scope mechanism to change scope for RDS databases
- Previously we identified RDS databases by their ID and region only, but
the ID does not have to be unique within a region; it only has to be
unique within the same AWS account in that region. Thus, adjust the
scope to include both the region and AWS Account ID (if configured or
auto-detected), and use the fallback scope mechanism to migrate existing
servers.
- Add support for GKE workload identity (Yash Bhutwala, #91)
- Add support for assuming AWS instance roles
- Set the role to be assumed using the new aws_assume_role / AWS_ASSUME_ROLE
configuration setting. This is useful when the collector runs in a different
AWS account than your database.
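A sketch of a cross-account setup (the role ARN, server section, and connection details are illustrative):

```ini
[mydb]
db_host = mydb.abc123.us-east-1.rds.amazonaws.com
db_name = mydb
db_username = pganalyze
# Role in the AWS account that owns the RDS instance:
aws_assume_role = arn:aws:iam::123456789012:role/pganalyze-collector
```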
v0.33.1
- Ignore internal admin databases for GCP and Azure
- This avoids collecting data from these internal databases, which produces
unnecessary errors when using the all databases setting.
- Add log_line_prefix check to GCP self-test
- Schema stats handling: Avoid crash due to nil pointer dereference
- Add support for "%m [%p]: [%l-1] db=%d,user=%u " log_line_prefix
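In postgresql.conf, that newly supported prefix corresponds to:

```ini
log_line_prefix = '%m [%p]: [%l-1] db=%d,user=%u '
```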
v0.33.0
- Add helper for log-based EXPLAIN access and use if available
- This lets us avoid granting the pganalyze user any access to the data,
following the principle of least privilege
- See https://github.com/pganalyze/collector#setting-up-log-explain-helper
- Avoid corrupted snapshots when OIDs get reused across databases
- This would have shown as data not being visible in pganalyze,
particularly for servers with many databases where tables were
dropped and recreated often
- Locked relations: Ignore table statistics, handle other exclusive locks
- Tables being rewritten would cause the relation statistics query to
fail due to statement timeout (caused by the lock being held)
- Non-relation locks held in AccessExclusiveLock mode would cause all
relation information to disappear, except for the top-level relation
information. This is due to the behaviour of NOT IN when the list
contains NULLs (never being true, even if an item doesn't match the
list). The top-level relation information was using a LEFT JOIN that
doesn't suffer from this problem. This likely caused problems reported
as missing index information, or indices showing as being recently
created even though they've existed for a while.
- Improvements to table partitioning reporting
- Enable additional settings to work correctly when used in Heroku/Docker
- DB_NAME
- DB_SSLROOTCERT_CONTENTS
- DB_SSLCERT_CONTENTS
- DB_SSLKEY_CONTENTS
v0.32.0
- Add ignore_schema_regexp / IGNORE_SCHEMA_REGEXP configuration option
- This is like ignore_table_pattern, but pushed down into the actual
stats-gathering queries to improve performance. This should work much
better on very large schemas
- We use a regular expression instead of the current glob-like matching
since the former is natively supported in Postgres
- We now warn on usage of the deprecated ignore_table_pattern field
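A hypothetical example, excluding per-tenant partition schemas in the collector config (the pattern and server section are illustrative):

```ini
[mydb]
db_host = localhost
db_name = mydb
db_username = pganalyze
# Skip schemas matching this Postgres regular expression during stats collection:
ignore_schema_regexp = ^(partitions_|tenant_)
```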
- Add warning for too many tables being collected (and recommend ignore_schema_regexp)
- Allow keeping of unparsable query texts by setting filter_query_text: none
- By default we replace everything with <unparsable query> (renamed
from the previous <truncated query> for clarity), to avoid leaking
sensitive data that may be contained in query texts that couldn't be
parsed and that Postgres itself doesn't mask correctly (e.g. utility
statements)
- However, in some situations it may be desirable to have the original
query texts instead, e.g. when the collector parser is outdated
(right now the parser is Postgres version 10, and some newer Postgres 12
query syntax fails to parse)
- To support this use case, a new "filter_query_text" / FILTER_QUERY_TEXT
option is introduced which can be set to "none" to keep all query texts.
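Keeping the original query texts could then be configured like this (the server section is illustrative):

```ini
[mydb]
db_host = localhost
db_name = mydb
db_username = pganalyze
# Keep query texts even when the collector fails to parse them
# (be aware this may expose sensitive data from unparsable statements):
filter_query_text = none
```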
- EXPLAIN plans / Query samples: Support log line prefix without %d and %u
- Whilst not recommended, in some scenarios changing the log_line_prefix
is difficult, and we want to make it easy to get EXPLAIN data even in
those scenarios
- In case the log_line_prefix is missing the database (%d) and/or the user
(%u), we simply use the user and database of the collector connection
- Log EXPLAIN: Run on all monitored databases, not just the primary database
- Add support for stored procedures (new with Postgres 11)
- Handle Postgres error checks using Go 1.13 error helpers
- This is more correct going forward, and adds a required type check for
the error type, since the database methods can also return net.OpError
- Fixes "panic: interface conversion: error is *net.OpError, not *pq.Error"
- Collect information on table partitions
- Relation parents as well as partition boundary (if any)
- Partitioning strategy in use
- List of partitioning fields and/or expression
- Log Insights: Track TLS protocol version as a log line detail
- This allows verification of which TLS versions were used to connect to the
database over time
- Log Insights: Track host as detail for connection received event
- This allows more detailed analysis of which IPs/hostnames have connected
to the database over time
- Example collector config: Use collect all databases option in the example
- This improves the chance that this is set up correctly from the
beginning, without requiring a backwards incompatible change in the
collector
v0.31.0
- Add Log Insights support for Azure Database for PostgreSQL
- Log Insights: Avoid unnecessary "Timeout" error when there are other failures
- Log EXPLAIN: Don't run EXPLAIN logic when there are no query samples
- Improve non-fatal error messages to clarify the collector still works
- Log grant failure: Explain root cause better (plan doesn't support it / fair use limit reached)
v0.30.0
- Track local replication lag in bytes
- RDS: Handle end of log files correctly
- High-frequency query collection: Avoid race condition, run in parallel
- This also resolves a memory leak in the collector that was causing
increased memory usage over time for systems that have a lot of
pg_stat_statements query texts (causing the full snapshot to take
more than a minute, which triggered the race condition)
v0.29.0
- Package builds: Use Golang 1.14.3 patch release
- This fixes golang/go#37436 which was causing
"mlock of signal stack failed: 12" on Ubuntu systems
- Switch to simpler tail library to fix edge case bugs for self-managed systems
- The hpcloud library has been unmaintained for a while, and whilst
the new choice doesn't have much activity, in tests it has shown
to work better, as well as having significantly fewer lines of code
- This also should make "--test" work reliably for self-managed systems
(before this, it returned "Timeout" most of the time)
- Index statistics: Don't run relation_size on exclusively locked indices
- Previously the collector was effectively hanging when it encountered an
index that has an ExclusiveLock held (e.g. due to a REINDEX)
- Add another custom log line prefix: "%m %r %u %a [%c] [%p] "
- RDS fixes
- Fix handling of auto-detection of AWS regions outside of us-east-1
- Remember log marker from previous runs, to avoid duplicate log lines
- Add support for Postgres 13
- This adds support for running against Postgres 13, which otherwise breaks
due to backwards-incompatible changes in pg_stat_statements
- Note that there are many other new statistics views and metrics that
will be added separately
v0.28.0
- Add "db_sslkey" and "db_sslcert" options to use SSL client certificates
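A sketch of client certificate authentication in the config file (paths, host, and server section are illustrative; db_sslmode and db_sslrootcert are pre-existing settings):

```ini
[mydb]
db_host = db.internal.example.com
db_name = mydb
db_username = pganalyze
db_sslmode = verify-full
db_sslrootcert = /etc/pganalyze/ca.pem
# New in this release: client certificate and key for SSL authentication
db_sslcert = /etc/pganalyze/client.crt
db_sslkey = /etc/pganalyze/client.key
```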
- Add Ubuntu 20.04 packages
- Update to Go 1.14, latest libpq
- Ensure that we set system type correctly for Heroku full snapshots
- Detect cloud providers based on hostnames from DB_URL / db_url as well
- Previously this was only detected for the DB_HOST / db_host setting, and that is unnecessarily restrictive
- Note that this means your instance may show up under a new ID in pganalyze after upgrading to this version
- Log Explain
- Ignore pg_start_backup queries
- Support EXPLAIN for queries with parameters
- Log Insights improvements
- Experimental: Google Cloud SQL log download
- Remove unnecessary increment of log line byte end position
- Make stream-based log processing more robust
- Add direct "http_proxy" & similar collector settings for Proxy config
- This avoids problems in some environments where it's not clear whether
the environment variables are set. The environment variables HTTP_PROXY,
http_proxy, HTTPS_PROXY, https_proxy, NO_PROXY and no_proxy continue to
function as expected.
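For example (the proxy URL is illustrative):

```ini
[pganalyze]
api_key = your_api_key
# Route outbound requests to the pganalyze API through this proxy:
http_proxy = http://proxy.internal.example.com:3128
```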
- Fix bug in handling of state mutex in activity snapshots
- This may have been the cause of "unlock of unlocked mutex" errors
when having multiple servers configured.