diff --git a/operator.md b/operator.md
new file mode 100644
index 0000000..2129fd3
--- /dev/null
+++ b/operator.md
@@ -0,0 +1,135 @@
+## Google Cloud SQL for PostgreSQL
+
+Google Cloud SQL for PostgreSQL is a fully managed database service that makes it easy to set up, maintain, manage, and administer PostgreSQL relational databases on Google Cloud Platform.
+
+### Design Decisions
+
+This Terraform setup for managing Google Cloud SQL for PostgreSQL includes the following design decisions:
+
+- **Instance Configuration**: Custom instance tiers and disk types are supported. Disk resizing is disabled by default to give more control over cost management.
+- **Database Configuration**: UTF-8 encoding and standard collation for PostgreSQL databases.
+- **Security**: SSL requirements for IP configuration are turned off to simplify connectivity. However, the instance uses private IP addresses, so traffic stays within the VPC.
+- **Backup Management**: Automated backups are enabled with retention settings specified by the user.
+- **Insights Configuration**: Query Insights is enabled, with an increased query string length.
+- **Maintenance Windows**: Set to Tuesdays at 2 AM by default.
+- **Monitoring and Alerts**: Set up for critical metrics like CPU utilization, disk capacity, and memory usage with customizable thresholds.
+
+### Runbook
+
+#### Unable to Connect to PostgreSQL Database
+
+If you are having trouble connecting to your PostgreSQL database instance, possible causes include network issues and insufficient permissions.
+
+1. Look up the instance's IP addresses using the Google Cloud SDK:
+
+```sh
+gcloud sql instances describe [INSTANCE_NAME] --project [PROJECT_ID]
+```
+
+In the `ipAddresses` list, find the entry with `type: PRIVATE` and ensure its `ipAddress` matches the IP you are using to connect.
+
+2. Ping the instance to ensure it's reachable:
+
+```sh
+ping [privateIpAddress]
+```
+
+You should see responses if the instance is reachable; otherwise, investigate network settings.
+
+3. Ensure the instance is running and accepting connections:
+
+```sh
+gcloud sql instances list --filter="name:[INSTANCE_NAME]" --format="table[box](name, region, databaseVersion, state)"
+```
+
+The `state` should be `RUNNABLE`.
+
+#### Unable to Authenticate to the Database
+
+If you can’t authenticate to your PostgreSQL database, it could be due to incorrect credentials or revoked access.
+
+1. List the users for the database instance:
+
+```sh
+gcloud sql users list --instance=[INSTANCE_NAME] --project=[PROJECT_ID]
+```
+
+Confirm that your user is listed.
+
+2. Reset the password for the user if necessary:
+
+```sh
+gcloud sql users set-password [USERNAME] --instance=[INSTANCE_NAME] --password=[NEW_PASSWORD] --project=[PROJECT_ID]
+```
+
+This command resets the user's password. Ensure it matches the one you are using in your connection string.
+
+3. Test the connection with `psql`:
+
+```sh
+psql "host=[privateIpAddress] port=5432 dbname=default user=[USERNAME] password=[NEW_PASSWORD]"
+```
+
+#### High CPU Utilization
+
+When monitoring shows high CPU utilization, it could point to various performance issues.
+
+1. Check the running queries to identify resource-intensive operations:
+
+```sql
+SELECT pid, usename, application_name, client_addr, backend_start, query_start, state_change, wait_event, state, backend_xid, query
+FROM pg_stat_activity
+WHERE state = 'active'
+ORDER BY query_start;
+```
+
+Identify long-running or resource-intensive queries and optimize them.
+
+2. Review all sessions, including idle ones, for an unexpected number of connections:
+
+```sql
+SELECT * FROM pg_stat_activity;
+```
+
+3. Create indexes to speed up frequently run queries:
+
+```sql
+CREATE INDEX idx_name ON table_name(column_name);
+```
+
+#### Disk Space Issues
+
+Identify high disk usage and take action to clean up or resize space.
+
+1. Check the disk usage on your Cloud SQL instance:
+
+```sh
+gcloud sql instances describe [INSTANCE_NAME] --project [PROJECT_ID]
+```
+
+Look for `dataDiskSizeGb` and `dataDiskType` under `settings`.
+
+2. Use PostgreSQL commands to find large tables and optimize them:
+
+```sql
+SELECT table_schema || '.' || table_name AS relation,
+pg_size_pretty(pg_relation_size(format('%I.%I', table_schema, table_name)::regclass)) AS size
+FROM information_schema.tables
+ORDER BY pg_relation_size(format('%I.%I', table_schema, table_name)::regclass) DESC
+LIMIT 10;
+```
+
+Identify large tables and, if they are bloated, reclaim space with `VACUUM FULL`. It rewrites the table and holds an exclusive lock while it runs, so schedule it for a quiet period.
+
+```sql
+VACUUM FULL [table_name];
+```
+
+3. If necessary, enable automatic storage increases from Google Cloud Console or with the gcloud command:
+
+```sh
+gcloud sql instances patch [INSTANCE_NAME] --storage-auto-increase
+```
+
+Enable this to ensure your instance can scale automatically and avoid downtime due to reaching storage limits.
+
diff --git a/operator.mdx b/operator.mdx
deleted file mode 100644
index d899f04..0000000
--- a/operator.mdx
+++ /dev/null
@@ -1,74 +0,0 @@
-# Operator Guide for gcp-cloud-sql-postgres
-
-Google Cloud SQL for PostgreSQL enables you to run a relational database with the resiliency, security, scalability, and ease of use of a fully managed cloud service. You gain most of the functionality of a conventional PostgreSQL database management system, while Google Cloud takes care of daily maintenance and regular backups, allowing you to focus on your core competencies.
-
-## Use Cases
-
-### Highly Scalable Data Store
-PostgreSQL is an open-source object-relational database-management system. With decades of development behind it and as one of the most popular solutions in its class, PostgreSQL excels as a backend database. It is versatile and adaptable.
-### ACID Compliant
-PostgreSQL supports the rigorous requirements of financial institutions and others needing exceptional reliability. It supports atomicity, consistency, isolation, and durability (ACID) and online transaction processing (OLTP).
-### Spatial Data
-PostgreSQL is highly extensible. PostGIS, an extension for a geographical information system (GIS), provides hundreds of functions to process geometric data. In the decades since its initial release, PostGIS has become one of the de facto standards in the open-source GIS world.
-### JSON Support
-PostgreSQL also supports JSON data in either json or jsonb format. For querying JSON data, PostgreSQL supports the jsonpath data type.
-
-## Configuration Presets
-
-### Development
-The development preset does not have deletion protection, allowing you to deprovision the database as needed. Since this preset is not intended for production, it will keep only one backup, and it comes with only a 20 GB disk and a shared-core CPU. This preset is suitable as a low-cost test and development instance only.
-### Staging
-The staging preset provides high availability and the same backup and deletion protection as the production preset, with seven backups kept at all times. However, staging has only 1 core and a smaller 200 GB disk.
-### Production
-The production preset comes with ten cores and a 1 TB disk. High availability, backups, and deletion protection (with seven backups) are enabled by default. For Postgres, we use the latest managed version of the 14.x series. This preset has sufficient resources to support a full production environment.
-
-## Design
-
-A typical relational database has a primary instance for writes and a read replica for offline workloads, which protects the performance of the primary database. The read replica typically acts as a failover in case the primary goes offline. In Google Cloud SQL for Postgres, we offer the same functionality through a slightly different design.
-
-### High Availability
-Google supports high availability by provisioning a standby instance in a different zone than your primary instance. While you cannot access the standby under normal conditions, Google will automatically direct traffic to it if the primary instance fails. This approach doubles the service cost in exchange for full service-level agreements (SLAs).
-### Read Replicas
-We will support read replicas in a future release of this bundle.
-
-## Best Practices
-The bundle includes a number of best practices without needing any additional work on your part:
-
-### Security
-The database is not provisioned with a public IP.
-### High availability
-This bundle supports regional high availability, which is enabled by default. In the case of a zonal outage, failover is performed automatically for you. We do not currently support failbacks (that is, moving the primary instance back to the original zone after recovery).
-### Backup and recovery
-Backups are created to recover the database in the event of disaster recovery, and cannot be disabled.
-### Replication
-Replica configuration will be supported in a future release. Google recommends that you keep fewer than ten replicas for your primary instance.
-
-## Security
-### Auto-generated password
-Upon database creation, we generate a random sixteen-character password.
-### Private deployment
-This database will be available only in the gcp-global-network to which it is connected.
-### Data encrypted in transit
-By default, all data in transit will be encrypted with Secure Sockets Layer and Transport Layer Security (SSL/TLS).
-### Data encrypted at rest
-By default, all customer content in Google Cloud is encrypted at rest using AES256.
-
-## Auditing
-This bundle allows you to configure the transaction log policy. The minimum allowed days are 1, and the maximum is 7. Transaction logs allow you to perform the following:
-* Individual transaction recovery
-* Recovery of all incomplete transactions when SQL Server is started
-* Rolling a restored database, file, filegroup, or page forward to the point of failure
-* Supporting transactional replication
-* Supporting high availability and disaster-recovery solutions (for example, database mirroring, log shipping, etc.)
-
-## Observability
-This bundle comes with three preconfigured alarms:
-
-* CPU: when the CPU usage exceeds 60%
-* Disk: when the database storage-disk usage exceeds 60%
-* Memory: when the database is using more than 60% of its allocated memory
-
-## Trade-Offs
-* This bundle does not support a public IP.
-* The bundle supports only SSL/TLS. It does not support unencrypted traffic.
-* As noted above, this bundle supports regional high availability but not failbacks. In case of a failover, the primary instance will not move back to the original zone after the original primary recovers.