Document Security Infrastructure (#2396)
ElizabethStirling committed Jan 26, 2021
1 parent ee81d6e commit 1f6ad0b
Showing 5 changed files with 514 additions and 21 deletions.
20 changes: 2 additions & 18 deletions handbook/engineering/deployments/kubernetes.md
@@ -35,25 +35,9 @@ Just run the appropriate `gcloud container clusters get-credentials` command lis

## Scaling Kubernetes clusters

To scale the number of nodes in a cluster run the following command:
Cluster scale should be managed via Terraform. The number of nodes is configured by `google_container_node_pool.primary_containerd_nodes.node_count` in [cloud's Terraform configuration](https://github.com/sourcegraph/infrastructure/blob/main/cloud/main.tf), and the current value is set by `gke_num_nodes` in the [tfvars file](https://github.com/sourcegraph/infrastructure/blob/main/cloud/terraform.tfvars). For more details, see the [Terraform provider documentation](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_node_pool#node_count).

```bash
gcloud container clusters resize $CLUSTER_NAME --zone $ZONE --num-nodes $NUM_NODES
```

For example, you may have a cluster not being actively used but want to preserve it for later use. You can scale the cluster to zero by running:

```bash
gcloud container clusters resize dev-cluster --zone us-central0-f --num-nodes 0
```

When the cluster is ready for use again, simply run the same command with the number of nodes required:

```bash
gcloud container clusters resize dev-cluster --zone us-central0-f --num-nodes 3
```

For more information, see the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/resizing-a-cluster).
Any changes to the cluster scale made via kubectl will eventually be overwritten by the values set in terraform.
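
As a rough illustration of where the node count lives, the sketch below reads and updates `gke_num_nodes` in a tfvars file. It assumes the file contains a simple `gke_num_nodes = <integer>` line, which is an assumption about the file's layout; the function names are hypothetical helpers, not part of the infrastructure repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the value assigned to gke_num_nodes in a tfvars file.
# Assumes a plain `gke_num_nodes = <integer>` line (layout is an assumption).
current_node_count() {
  local tfvars_file="$1"
  sed -n 's/^[[:space:]]*gke_num_nodes[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$tfvars_file"
}

# Rewrite gke_num_nodes in place, leaving every other line untouched.
set_node_count() {
  local tfvars_file="$1" new_count="$2"
  local tmp

  # Reject anything that is not a non-negative integer.
  case "$new_count" in
    ''|*[!0-9]*)
      echo "node count must be a non-negative integer: $new_count" >&2
      return 1
      ;;
  esac

  tmp="$(mktemp)"
  sed "s/^\([[:space:]]*gke_num_nodes[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1${new_count}/" "$tfvars_file" > "$tmp"
  # Copy the contents back so the original file keeps its permissions.
  cat "$tmp" > "$tfvars_file"
  rm -f "$tmp"
}
```

After changing the value, the change would typically be reviewed with `terraform plan` before running `terraform apply`; the exact review and deployment workflow is not described here.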

## Kubernetes backups

7 changes: 4 additions & 3 deletions handbook/engineering/environments.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,11 @@ Sourcegraph Cloud projects.

#### Sourcegraph Security

Sourcegraph Security projects.
[Sourcegraph Security projects](./security/infrastructure/index.md#projects).

- **sourcegraph-security-logging**: Infrastructure required for centralized security logging.
- **sourcegraph-security-vault**: Contains HashiCorp Vault for secret management.
- **[sourcegraph-security-logging](./security/infrastructure/index.md#logging)**: Infrastructure required for centralized security logging.
- **[sourcegraph-security-logging-stage](./security/infrastructure/index.md#logging-stage)**: Staging environment for logging infrastructure.
- **[sourcegraph-security-vault](./security/infrastructure/index.md#vault)**: Contains HashiCorp Vault for secret management.

#### Other Projects

1 change: 1 addition & 0 deletions handbook/engineering/security/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ See [security goals and priorities](goals.md)
- https://github.com/sourcegraph/security-issues
- Increase our security posture by running traditional security tools such as vulnerability scanners, SAST, and DAST tools.
- https://github.com/sourcegraph/sourcegraph/security/code-scanning
- [Infrastructure information](./infrastructure/index.md)
- Create a culture of security at Sourcegraph that empowers all of our engineers to write secure code.

## How we work
135 changes: 135 additions & 0 deletions handbook/engineering/security/infrastructure/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
# Security infrastructure

We maintain multiple flavors of infrastructure with various degrees of management.



## GCP infrastructure basics

GCP infrastructure is configured via [terraform](https://www.terraform.io/) in the [infrastructure repository](https://github.com/sourcegraph/infrastructure/). All configuration for security projects should be stored in the [security subdirectory](https://github.com/sourcegraph/infrastructure/tree/main/security). Please adhere to this [terraform style guide](../../languages/terraform.md) when working here.

For instructions on how to deploy this infrastructure, see [GCP Deployment Playbooks](./playbooks.md#gcp-deployment-playbooks).



#### Logging

To deploy logging infrastructure, see the [Deploying logging infrastructure](./playbooks.md#deploying-logging-infrastructure) playbook.

Logging configuration is currently spread across several places, which makes it complex:

* `pubsub.tf` in the [cloud](https://github.com/sourcegraph/infrastructure/blob/main/cloud/pubsub.tf) and [dogfood](https://github.com/sourcegraph/infrastructure/blob/main/dogfood/pubsub.tf) directories of the infrastructure repository.

* This should remain mostly static, but the filter may change as filtering rules are refined, and additional logging sinks may be added for the [staging environment](#logging-stage).

* Note that not all cloud pubsub configuration belongs to security.
* This creates a logging sink for cloud and dogfood which sends logs via pub/sub to the security project.

* `gke-logging.tf` in the [cloud](https://github.com/sourcegraph/infrastructure/blob/main/cloud/pubsub.tf) and [dogfood](https://github.com/sourcegraph/infrastructure/blob/main/dogfood/pubsub.tf) directories of the infrastructure repository.

* This should remain static.
* This deploys the gke-logging module to the k8s cluster.

* The `gke-logging` module in the [modules folder](https://github.com/sourcegraph/infrastructure/tree/main/modules/gke-logging).

* This should remain static.
* This module pushes GKE node audit logs to stackdriver.
* [Reference configuration](https://github.com/GoogleCloudPlatform/k8s-node-tools/blob/master/os-audit/cos-auditd-logging.yaml)

* The [logging folder](https://github.com/sourcegraph/infrastructure/tree/main/security/logging) in the security project.

* This contains the GCP configuration for the logging projects owned by security.

* The [helm directory](https://github.com/sourcegraph/infrastructure/tree/main/security/logging/helm/) in the security project.

* This contains the configuration for all helm deployments for security.
* Currently, this is only pubsubbeats.

* Elastic cloud's [production logging deployment](#elastic-logging)

* Elastic manages our logs, as well as our retention policy on our log data.
* Expected to be re-configured whenever new sources of logs are added, as well as monitored to ensure it doesn't run out of disk space.
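
To see what is currently deployed in the source projects, something like the following sketch could be used. It only uses the standard `gcloud logging sinks list` and `gcloud pubsub topics list` commands with the project IDs named in this document; the `run` and `list_logging_config` helpers are hypothetical, and by default the script only prints the commands rather than executing them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Source projects named in this document; their sinks forward logs via Pub/Sub.
SOURCE_PROJECTS=(sourcegraph-dev sourcegraph-dogfood)

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the logging sinks and Pub/Sub topics in each source project.
list_logging_config() {
  local project
  for project in "${SOURCE_PROJECTS[@]}"; do
    run gcloud logging sinks list --project "$project"
    run gcloud pubsub topics list --project "$project"
  done
}
```

Calling `DRY_RUN=0 list_logging_config` would run the commands against GCP, which requires credentials with access to both projects.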

#### Logging stage

To be implemented in [#17281](https://github.com/sourcegraph/sourcegraph/issues/17281).

This will likely be similar to the [logging infrastructure](#logging) above.



## GKE deployment basics

This section explains the tools we use to generate and deploy Kubernetes configuration to GKE.

### Helmfile

We use [helmfile](https://github.com/roboll/helmfile) instead of using helm directly. Helmfile allows basic script execution as part of the templating process, which we use to decrypt the secrets used for pubsubbeats. It also supports conditional configuration based on the deployment environment, which makes it harder to accidentally desynchronize the staging and production configurations.
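
A minimal sketch of a per-environment deployment is shown below. It relies on helmfile's `--environment` flag and its `diff` and `apply` subcommands (`diff` needs the helm-diff plugin). The environment names `staging` and `production` and the helper functions are assumptions for illustration; the real names are whatever the helmfile configuration defines. By default the script only prints the commands.

```bash
#!/usr/bin/env bash
set -euo pipefail

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Preview and then apply the helmfile releases for one environment.
# The accepted environment names are hypothetical placeholders.
deploy_environment() {
  local environment="$1"
  case "$environment" in
    staging|production) ;;
    *)
      echo "unknown environment: $environment" >&2
      return 1
      ;;
  esac
  run helmfile --environment "$environment" diff
  run helmfile --environment "$environment" apply
}
```

Running the same function for both environments keeps the commands identical, so the only differences come from the environment-specific values in the helmfile configuration.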

### Kubectl

Kubectl is used to interact with kubernetes clusters. For basic information see the existing [kubectl tips and tricks document](../../deployments/kubernetes.md). See the linked documents for examples of how kubectl is used to [configure](./playbooks.md#gke-deployment-playbooks) or [debug](./playbooks.md#debugging-logging).
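
For orientation, a few common read-only kubectl commands can be wrapped as in the sketch below. The commands themselves (`kubectl config current-context`, `kubectl get pods`, `kubectl logs`) are standard; the namespace and deployment names are placeholders supplied by the caller, and the helper functions are hypothetical. By default the script only prints the commands.

```bash
#!/usr/bin/env bash
set -euo pipefail

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show which cluster kubectl points at, then inspect one workload.
# The namespace and deployment names are placeholders chosen by the caller.
inspect_deployment() {
  local namespace="$1" deployment="$2"
  run kubectl config current-context
  run kubectl get pods --namespace "$namespace"
  run kubectl logs "deployment/$deployment" --namespace "$namespace" --tail=50
}
```

Checking the current context first helps avoid running commands against the wrong cluster.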

## Projects

These are security's current GCP projects, and what they do.

For instructions on how to deploy these projects, see [GKE Deployment Playbooks](./playbooks.md#gke-deployment-playbooks).



### sourcegraph-security-logging

Currently ingests all stackdriver logs from the projects `sourcegraph-dev` (cloud) and `sourcegraph-dogfood` (dogfood). Will later ingest logs from other sources using additional deployments within the cluster.



### sourcegraph-security-logging-stage

To be implemented in [#17281](https://github.com/sourcegraph/sourcegraph/issues/17281).

This is a testbed to allow us to test changes to logs without risking production logs. This pushes logs to the [stage logging environment](#elastic-logging-stage), so that they don't pollute production logs in [elastic](#elastic-cloud).



### sourcegraph-security-vault

Currently unused. Will eventually contain a [HashiCorp Vault](https://www.vaultproject.io/) instance for secret management. This may change depending on the state of [Managed Vault](https://www.hashicorp.com/cloud-platform). We may transition to using a [managed vault service](#hashicorp-vault).



### sourcegraph-vault-stage

Unmaintained and [to be deleted](https://github.com/sourcegraph/sourcegraph/issues/17046) - purely used as a testbed for vault. Do not add production secrets to this instance.



## Managed services

### Elastic cloud

We currently use elastic cloud to store centralized security logs. This allows us to avoid the overhead of managing it ourselves, while getting something that's reasonably performant and stable.

The Elastic cloud web portal is [here](https://cloud.elastic.co/home). Credentials are stored in 1Password.

#### Elastic logging

Currently contains all stackdriver logs from the GCP projects `sourcegraph-dogfood` and `sourcegraph-dev`. Note that stackdriver also contains OS audit logs from GKE nodes on the primary GKE clusters for those projects. This is due to the aforementioned `gke-logging` module being deployed in them as part of our [logging infrastructure](#logging).



Note that the pubsubbeat index lifecycle policy sets a maximum index size of 50GB, with rollover enabled.

Note that the index refresh interval is 30 seconds.
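
The two settings above roughly correspond to the Elasticsearch request bodies sketched below, using the index lifecycle management API and the index settings API. This is a sketch, not the deployed configuration: the policy name, index pattern, and `ES_URL` variable in the comments are hypothetical, and the real policy may contain additional phases or actions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Body for an ILM policy that rolls the index over once it reaches 50GB.
ilm_policy_body() {
  printf '%s\n' '{"policy":{"phases":{"hot":{"actions":{"rollover":{"max_size":"50gb"}}}}}}'
}

# Body for an index settings update that sets a 30 second refresh interval.
refresh_settings_body() {
  printf '%s\n' '{"index":{"refresh_interval":"30s"}}'
}

# Example requests (ES_URL, the policy name, and the index pattern are
# hypothetical placeholders; authentication is omitted):
#   curl -X PUT "$ES_URL/_ilm/policy/pubsubbeat" \
#     -H 'Content-Type: application/json' -d "$(ilm_policy_body)"
#   curl -X PUT "$ES_URL/pubsubbeat-*/_settings" \
#     -H 'Content-Type: application/json' -d "$(refresh_settings_body)"
```

A longer refresh interval such as 30 seconds trades search freshness for lower indexing overhead, which is usually acceptable for log data.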



#### Elastic logging stage

To be implemented in [#17281](https://github.com/sourcegraph/sourcegraph/issues/17281).



### HashiCorp Vault

This section is a placeholder, since we may or may not use this service.
