
Releases: pganalyze/collector

v0.36.0

22 Jan 06:22
  • Config parsing improvements (see the example config after this list):
    • Fail fast when the pganalyze section is missing from the config file
    • Ignore duplicates in db_name config setting
      • Previously, duplicates could cause malformed snapshots that were submitted
        successfully but could not be processed
    • Validate db_url parsing to avoid a collector crash with invalid URLs
  • Include pganalyze-collector-setup program (see 0.35 release notes) in supported packages
  • Rename <unidentified queryid> query text placeholder to <query text unavailable>
    • This makes it clearer what the underlying issue is
  • Revert to using <truncated query> instead of <unparsable query> in some situations
    • When a query is cut off due to pg_stat_activity limit being reached,
      show <truncated query>, to make it clear that increasing track_activity_query_size
      would solve the issue
  • Ignore I/O stats for AWS Aurora utility statements
    • AWS Aurora appears to report incorrect blk_read_time and blk_write_time values
      for utility statements (i.e., non-SELECT/INSERT/UPDATE/DELETE); we zero these out for now
  • Fix log-based EXPLAIN bug where query samples could be dropped if EXPLAIN failed
  • Add U140 log event (inconsistent range bounds)
    • e.g.: ERROR: range lower bound must be less than or equal to range upper bound
  • Fix issue where incomplete schema information in snapshots was not marked correctly
    • This could lead to schema objects disappearing and being re-created
  • Fix trailing newline handling for GCP and self-hosted log streams
    • This could lead to queries being poorly formatted in the UI, or some queries
      with single-line comments being ignored
  • Include additional collector configuration settings in snapshot metadata for diagnostics
  • Ignore "insufficient privilege" queries without a queryid
    • Previously, these could all be aggregated together, yielding misleading statistics
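
  Example (illustrative sketch, not part of the release notes; the section names and
  connection values are placeholders, and the comma-separated db_name form is an
  assumption suggested by the duplicate-handling note above): a minimal config file
  touching the settings affected by the parsing improvements. The collector now fails
  fast if the [pganalyze] section is missing, ignores duplicate db_name entries, and
  validates db_url up front.

      [pganalyze]
      api_key = your_api_key

      # db_url is now validated up front instead of crashing the collector later
      [server_one]
      db_url = postgres://pganalyze:password@db.example.com:5432/mydb

      # duplicate entries in db_name (mydb listed twice) are now ignored
      [server_two]
      db_host = localhost
      db_name = mydb, otherdb, mydb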

v0.35.0

06 Dec 04:28
  • Add new "pganalyze-collector-setup" program that streamlines collector installation
    • This is initially targeted for self-managed servers to make it easier to set up
      the collector and required configuration settings for a locally running Postgres
      server
    • To start, this supports the following environments:
      • Postgres 10 and newer, running on the same server as the collector
      • Ubuntu 14.04 and newer
      • Debian 10 and newer
  • Collector test: Show server URLs to make it easier to access the servers in
    pganalyze after the test
  • Collector test+reload: In case of errors, return exit code 1
  • Ignore manual vacuums if the collector can't access pg_stat_progress_vacuum
  • Don't run log test for Heroku, instead provide info message
    • Also fixes "Unsupported log_line_prefix setting: ' sql_error_code = %e '"
      error on Heroku Postgres
  • Add pganalyze system user to adm group in Debian/Ubuntu packages
    • This gives the collector permission to read Postgres log files in a default
      install, simplifying Log Insights setup
  • Handle NULL parameters for query samples correctly
  • Add a skip_if_replica / SKIP_IF_REPLICA option (#117); see the example config after this list
    • You can use this to run the collector in a no-op mode on replicas (it only
      checks whether the monitored database is a replica), and it automatically
      switches to active monitoring once the database is no longer a replica.
  • Stop building packages for CentOS 6 and Ubuntu 14.04 (Trusty)
    • Both of these systems are now end of life, and the remaining survivor
      of the CentOS 6 line (Amazon Linux 1) will be EOL on December 31st 2020.
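
  Example (illustrative sketch; the section name and connection values are
  placeholders, and the layout assumes the standard INI config file with a
  [pganalyze] section as referenced in the 0.36 notes): enabling the new
  skip_if_replica option for a server that is currently a replica.

      [pganalyze]
      api_key = your_api_key

      [standby_server]
      db_url = postgres://pganalyze:password@standby.example.com:5432/mydb
      # no-op while this server is a replica; switches to active monitoring
      # automatically once it is promoted
      skip_if_replica = true

  In Docker or Heroku-style setups the equivalent environment variable form is
  SKIP_IF_REPLICA=true.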

v0.34.0

08 Nov 04:51
  • Check and report problematic log collection settings
    • Some Postgres settings almost always cause a drastic increase in log
      volume for little actual benefit. They tend to cause operational problems
      for the collector (due to the load of additional log parsing) and the
      pganalyze service itself (or indeed, likely for any service that would
      process collector snapshots), and do not add any meaningful insights.
      Furthermore, we found that these settings are often turned on
      accidentally.
    • To avoid these issues, the collector now includes client-side checks for
      these problematic settings.
    • If any of them are set to unsupported values, all log collection will be
      disabled for that server. The settings are re-checked on every full
      snapshot, and can be explicitly re-checked with a collector reload.
  • Log Insights improvements
    • Self-managed server: Process logs every 3 seconds, instead of on-demand
    • Self-managed server: Improve handling of multi-line log events
    • Google Cloud SQL: Always acknowledge Pub/Sub messages, even if the collector doesn't handle them
    • Optimize stitching logic for reduced CPU consumption
    • Explicitly close temporary files to avoid running out of file descriptors
  • Multiple changes to improve debugging in support situations
    • Report collector config in full snapshot
      • This reports certain collector config settings (except for passwords/keys/credentials)
        to the pganalyze servers to help with debugging.
    • Print collector version at beginning of test for better support handling
    • Print collection status and Postgres version before submitting snapshots
    • Change panic stack trace logging from Verbose to Warning
  • Add support for running the collector on ARM systems
    • Note that we don't provide packages yet, but with this the collector
      can be built on ARM systems without any additional patches.
  • Introduce API system scope fallback
    • This fallback is intended to allow changing the API scope, either based
      on user configuration (e.g. moving the collector between different
      cloud provider accounts), or because of changes in the collector identity
      system logic.
    • The new "api_system_scope_fallback" / PGA_API_SYSTEM_SCOPE_FALLBACK config
      variable is intended to be set to the old value of the scope. When the
      pganalyze backend receives a snapshot with a fallback scope set, and there
      is no server created with the regular scope, it will first search the
      servers with the fallback scope. If found, that server's scope will be
      updated to the (new) regular scope. If not found, a new server will be
      created with the regular scope. The main goal of the fallback scope is to
      avoid creating a duplicate server when changing the scope value.
  • Use new fallback scope mechanism to change scope for RDS databases
    • Previously we identified RDS databases by their ID and region only, but
      the ID does not have to be unique within a region; it only has to be
      unique within the same AWS account in that region. Thus, the scope now
      includes both the region and the AWS account ID (if configured or
      auto-detected), and the fallback scope mechanism is used to migrate
      existing servers.
  • Add support for GKE workload identity (contributed by Yash Bhutwala, #91)
  • Add support for assuming AWS instance roles (see the example config after this list)
    • Set the role to be assumed using the new aws_assume_role / AWS_ASSUME_ROLE
      configuration setting. This is useful when the collector runs in a different
      AWS account than your database.
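
  Example (illustrative sketch; the section name, instance hostname, role ARN, and
  scope value are placeholders): combining the new aws_assume_role and
  api_system_scope_fallback settings when moving the collector to a different AWS
  account.

      [pganalyze]
      api_key = your_api_key

      [rds_server]
      db_url = postgres://pganalyze:password@mydb.abcdefgh.us-east-1.rds.amazonaws.com:5432/postgres
      # assumed by the collector when it runs in a different AWS account
      # than the monitored database
      aws_assume_role = arn:aws:iam::123456789012:role/pganalyze-collector
      # set to the previous scope so the pganalyze backend updates the existing
      # server instead of creating a duplicate
      api_system_scope_fallback = previous-scope-value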

v0.33.1

11 Sep 15:50
  • Ignore internal admin databases for GCP and Azure
    • This avoids collecting data from these internal databases, which produces
      unnecessary errors when using the all databases setting.
  • Add log_line_prefix check to GCP self-test
  • Schema stats handling: Avoid crash due to nil pointer dereference
  • Add support for "%m [%p]: [%l-1] db=%d,user=%u " log_line_prefix (see the example after this list)
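
  Example (illustrative): the corresponding postgresql.conf setting for the newly
  supported prefix; the value is taken verbatim from the release note above.

      # postgresql.conf
      log_line_prefix = '%m [%p]: [%l-1] db=%d,user=%u '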

v0.33.0

04 Sep 01:07
  • Add helper for log-based EXPLAIN access and use if available
  • Avoid corrupted snapshots when OIDs get reused across databases
    • This would have shown as data not being visible in pganalyze,
      particularly for servers with many databases where tables were
      dropped and recreated often
  • Locked relations: Ignore table statistics, handle other exclusive locks
    • Tables being rewritten would cause the relation statistics query to
      fail due to statement timeout (caused by lock being held)
    • Non-relation locks held in AccessExclusiveLock mode would cause all
      relation information except the top-level relation information to
      disappear. This is due to the behaviour of NOT IN when the list contains
      NULLs (it is never true, even if an item doesn't match the list). The
      top-level relation information was using a LEFT JOIN that doesn't suffer
      from this problem. This likely caused problems reported as missing index
      information, or indices showing as recently created even though they
      have existed for a while.
  • Improvements to table partitioning reporting
  • Enable additional settings to work correctly when used in Heroku/Docker (see the example after this list)
    • DB_NAME
    • DB_SSLROOTCERT_CONTENTS
    • DB_SSLCERT_CONTENTS
    • DB_SSLKEY_CONTENTS
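
  Example (illustrative; the values are truncated placeholders): passing the now
  supported settings as environment variables in a Docker or Heroku style setup.

      DB_NAME=mydb
      DB_SSLROOTCERT_CONTENTS=-----BEGIN CERTIFICATE-----...
      DB_SSLCERT_CONTENTS=-----BEGIN CERTIFICATE-----...
      DB_SSLKEY_CONTENTS=-----BEGIN RSA PRIVATE KEY-----...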

v0.32.0

17 Aug 03:21
  • Add ignore_schema_regexp / IGNORE_SCHEMA_REGEXP configuration option (see the example config after this list)
    • This is like ignore_table_pattern, but pushed down into the actual
      stats-gathering queries to improve performance. This should work much
      better on very large schemas
    • We use a regular expression instead of the current glob-like matching
      since the former is natively supported in Postgres
    • We now warn on usage of the deprecated ignore_table_pattern field
  • Add warning for too many tables being collected (and recommend ignore_schema_regexp)
  • Allow keeping of unparsable query texts by setting filter_query_text: none
    • By default we replace everything with <unparsable query> (renamed
      from the previous <truncated query> for clarity), to avoid leaking
      sensitive data that may be contained in query texts that couldn't be
      parsed and that Postgres itself doesn't mask correctly (e.g. utility
      statements)
    • However, in some situations it may be desirable to have the original
      query texts instead, e.g. when the collector parser is outdated
      (right now the parser is based on Postgres 10, and some newer Postgres 12
      query syntax fails to parse)
    • To support this use case, a new "filter_query_text" / FILTER_QUERY_TEXT
      option is introduced which can be set to "none" to keep all query texts.
  • EXPLAIN plans / Query samples: Support log line prefix without %d and %u
    • Whilst not recommended, in some scenarios changing the log_line_prefix
      is difficult, and we want to make it easy to get EXPLAIN data even in
      those scenarios
    • In case the log_line_prefix is missing the database (%d) and/or the user
      (%u), we simply use the user and database of the collector connection
  • Log EXPLAIN: Run on all monitored databases, not just the primary database
  • Add support for stored procedures (new with Postgres 11)
  • Handle Postgres error checks using Go 1.13 error helpers
    • This is more correct going forward, and adds a required type check for
      the error type, since the database methods can also return net.OpError
    • Fixes "panic: interface conversion: error is *net.OpError, not *pq.Error"
  • Collect information on table partitions
    • Relation parents as well as partition boundary (if any)
    • Partitioning strategy in use
    • List of partitioning fields and/or expression
  • Log Insights: Track TLS protocol version as a log line detail
    • This allows verification of which TLS versions were used to connect to the
      database over time
  • Log Insights: Track host as detail for connection received event
    • This allows more detailed analysis of which IPs/hostnames have connected
      to the database over time
  • Example collector config: Use the collect all databases option in the example
    • This improves the chance that this is set up correctly from the
      beginning, without requiring a backwards-incompatible change in the
      collector
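
  Example (illustrative sketch; the section name, regular expression, and connection
  values are placeholders): using the new ignore_schema_regexp and filter_query_text
  options together.

      [pganalyze]
      api_key = your_api_key

      [my_server]
      db_url = postgres://pganalyze:password@db.example.com:5432/mydb
      # pushed down into the stats-gathering queries, unlike ignore_table_pattern
      ignore_schema_regexp = ^(temp|staging)_
      # keep original query texts even when the collector parser cannot parse them
      filter_query_text = none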

v0.31.0

24 Jun 04:46
  • Add Log Insights support for Azure Database for PostgreSQL
  • Log Insights: Avoid unnecessary "Timeout" error when there are other failures
  • Log EXPLAIN: Don't run EXPLAIN logic when there are no query samples
  • Improve non-fatal error messages to clarify the collector still works
  • Log grant failure: Explain root cause better (plan doesn't support it / fair use limit reached)

v0.30.0

12 Jun 15:57
  • Track local replication lag in bytes
  • RDS: Handle end of log files correctly
  • High-frequency query collection: Avoid race condition, run in parallel
    • This also resolves a memory leak in the collector that was causing
      increased memory usage over time for systems that have a lot of
      pg_stat_statements query texts (causing the full snapshot to take
      more than a minute, which triggered the race condition)

v0.29.0

02 Jun 09:32
  • Package builds: Use Golang 1.14.3 patch release
    • This fixes golang/go#37436 which was causing
      "mlock of signal stack failed: 12" on Ubuntu systems
  • Switch to simpler tail library to fix edge case bugs for self-managed systems
    • The hpcloud library has been unmaintained for a while, and whilst
      the new choice doesn't have much activity, in tests it has proven
      to work better, as well as having significantly fewer lines of code
    • This should also make "--test" work reliably for self-managed systems
      (before this change it returned "Timeout" most of the time)
  • Index statistics: Don't run relation_size on exclusively locked indices
    • Previously the collector would effectively hang when it encountered an
      index that had an ExclusiveLock held (e.g. due to a REINDEX)
  • Add another custom log line prefix: "%m %r %u %a [%c] [%p] " (see the example after this list)
  • RDS fixes
    • Fix handling of auto-detection of AWS regions outside of us-east-1
    • Remember log marker from previous runs, to avoid duplicate log lines
  • Add support for Postgres 13
    • This adds support for running against Postgres 13, which otherwise breaks
      due to backwards-incompatible changes in pg_stat_statements
    • Note that there are many other new statistics views and metrics that
      will be added separately
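
  Example (illustrative): the corresponding postgresql.conf setting for the newly
  added prefix; the value is taken verbatim from the release note above.

      # postgresql.conf
      log_line_prefix = '%m %r %u %a [%c] [%p] '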

v0.28.0

19 May 07:58
  • Add "db_sslkey" and "db_sslcert" options to use SSL client certificates (see the example config after this list)
  • Add Ubuntu 20.04 packages
  • Update to Go 1.14, latest libpq
  • Ensure that we set system type correctly for Heroku full snapshots
  • Detect cloud providers based on hostnames from DB_URL / db_url as well
    • Previously this was only detected for the DB_HOST / db_host setting, which was unnecessarily restrictive
    • Note that this means your instance may show up under a new ID in pganalyze after upgrading to this version
  • Log Explain
    • Ignore pg_start_backup queries
    • Support EXPLAIN for queries with parameters
  • Log Insights improvements
    • Experimental: Google Cloud SQL log download
    • Remove unnecessary increment of log line byte end position
    • Make stream-based log processing more robust
  • Add direct "http_proxy" & similar collector settings for proxy config
    • This avoids problems in some environments where it's not clear whether
      the environment variables are set. The environment variables HTTP_PROXY,
      http_proxy, HTTPS_PROXY, https_proxy, NO_PROXY and no_proxy continue to
      function as expected.
  • Fix bug in handling of state mutex in activity snapshots
    • This may have been the cause of "unlock of unlocked mutex" errors
      when multiple servers are configured.
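
  Example (illustrative sketch; the section name, file paths, and proxy URL are
  placeholders, and placing http_proxy in the [pganalyze] section is an assumption):
  using the new SSL client certificate and proxy settings together.

      [pganalyze]
      api_key = your_api_key
      # route collector traffic through a proxy without relying on environment variables
      http_proxy = http://proxy.internal.example.com:3128

      [my_server]
      db_url = postgres://pganalyze:password@db.example.com:5432/mydb
      db_sslcert = /etc/pganalyze/client.crt
      db_sslkey = /etc/pganalyze/client.key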