A Terraform module which deploys the Snowplow Stream Collector on EC2. If you want to use a custom AMI for this deployment you will need to ensure it is based on top of Amazon Linux 2.
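If you do supply your own image, one way to look it up is the AWS provider's `aws_ami` data source. The sketch below is an illustration, not part of this module: the data source label `amazon_linux_2`, the `owners` value and the name pattern are assumptions to adapt to your own AMI. Only the `amazon_linux_2_ami_id` input comes from this module.

```hcl
# Hypothetical lookup of the most recent Amazon Linux 2 image published by Amazon.
# Swap the owner and name pattern for those of your custom AMI.
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Pass the result to the collector module through its amazon_linux_2_ami_id input:
#   amazon_linux_2_ami_id = data.aws_ami.amazon_linux_2.id
```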
This module by default collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; it is very simple information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the `user_provided_id` variable to include a valid email address which we can reach you at.
To disable telemetry simply set variable `telemetry_enabled = false`.
For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
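As a rough sketch of the two telemetry settings described above, the block below sets both on the collector module. The required inputs mirror the usage example in this README and assume the `collector_lb`, `raw_stream` and `bad_1_stream` modules from that example exist; the email address is a placeholder.

```hcl
module "collector_kinesis" {
  source = "snowplow-devops/collector-kinesis-ec2/aws"

  accept_limited_use_license = true

  name       = "collector-server"
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  collector_lb_sg_id = module.collector_lb.sg_id
  collector_lb_tg_id = module.collector_lb.tg_id
  ingress_port       = module.collector_lb.tg_egress_port
  good_stream_name   = module.raw_stream.name
  bad_stream_name    = module.bad_1_stream.name

  ssh_key_name = "your-key-name"

  # Option 1: keep telemetry on and subscribe to updates (placeholder address).
  user_provided_id = "ops@example.com"

  # Option 2: turn telemetry off entirely by uncommenting the next line.
  # telemetry_enabled = false
}
```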
A Collector requires two output Kinesis Streams and a Load Balancer deployed upstream of it. The Load Balancer makes it easy to configure TLS termination later in the setup and provides a simpler mechanism for setting up DNS than single EC2 instances with EIPs.
```hcl
module "raw_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "raw-stream"
}

module "bad_1_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "bad-1-stream"
}

module "collector_lb" {
  source  = "snowplow-devops/alb/aws"
  version = "0.2.0"

  name              = "collector-lb"
  vpc_id            = var.vpc_id
  subnet_ids        = var.subnet_ids
  health_check_path = "/health"
}

module "collector_kinesis" {
  source = "snowplow-devops/collector-kinesis-ec2/aws"

  accept_limited_use_license = true

  name       = "collector-server"
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  collector_lb_sg_id = module.collector_lb.sg_id
  collector_lb_tg_id = module.collector_lb.tg_id
  ingress_port       = module.collector_lb.tg_egress_port
  good_stream_name   = module.raw_stream.name
  bad_stream_name    = module.bad_1_stream.name

  ssh_key_name     = "your-key-name"
  ssh_ip_allowlist = ["0.0.0.0/0"]
}
```
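The example above references `var.vpc_id` and `var.subnet_ids` without declaring them. A minimal sketch of the matching root-module declarations follows; the variable names come from the example, the types from the Inputs table, and the description strings are illustrative.

```hcl
variable "vpc_id" {
  description = "The VPC to deploy the load balancer and collector within"
  type        = string
}

variable "subnet_ids" {
  description = "At least two subnets in different availability zones"
  type        = list(string)
}
```

Values can then be supplied through a `terraform.tfvars` file or `-var` flags, as with any other Terraform input variable.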
Name | Version |
---|---|
terraform | >= 1.0.0 |
aws | >= 3.72.0 |
Name | Version |
---|---|
aws | >= 3.72.0 |
Name | Source | Version |
---|---|---|
instance_type_metrics | snowplow-devops/ec2-instance-type-metrics/aws | 0.1.2 |
service | snowplow-devops/service-ec2/aws | 0.2.1 |
telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |
Name | Type |
---|---|
aws_cloudwatch_log_group.log_group | resource |
aws_iam_instance_profile.instance_profile | resource |
aws_iam_policy.iam_policy | resource |
aws_iam_role.iam_role | resource |
aws_iam_role_policy_attachment.policy_attachment | resource |
aws_security_group.sg | resource |
aws_security_group_rule.egress_tcp_443 | resource |
aws_security_group_rule.egress_tcp_80 | resource |
aws_security_group_rule.egress_udp_123 | resource |
aws_security_group_rule.ingress_tcp_22 | resource |
aws_security_group_rule.ingress_tcp_webserver | resource |
aws_security_group_rule.lb_egress_tcp_webserver | resource |
aws_caller_identity.current | data source |
aws_region.current | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
bad_stream_name | The name of the bad kinesis/sqs stream that the collector will insert data into | string | n/a | yes |
collector_lb_sg_id | The ID of the load-balancer security group that sits upstream of the webserver | string | n/a | yes |
collector_lb_tg_id | The ID of the load-balancer target group to direct traffic from the load-balancer to the webserver | string | n/a | yes |
good_stream_name | The name of the good kinesis/sqs stream that the collector will insert data into | string | n/a | yes |
ingress_port | The port that the collector will be bound to and expose over HTTP | number | n/a | yes |
name | A name which will be pre-pended to the resources created | string | n/a | yes |
ssh_key_name | The name of the preexisting SSH key-pair to attach to all EC2 nodes deployed | string | n/a | yes |
subnet_ids | The list of at least two subnets in different availability zones to deploy the collector across | list(string) | n/a | yes |
vpc_id | The VPC to deploy the collector within | string | n/a | yes |
accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
amazon_linux_2_ami_id | The AMI ID to use which must be based off of Amazon Linux 2; by default the latest community version is used | string | "" | no |
app_version | App version to use. This variable facilitates dev flow; the modules may not work with anything other than the default value. | string | "3.0.1" | no |
associate_public_ip_address | Whether to assign a public ip address to this instance | bool | true | no |
bad_sqs_buffer_name | The name of the bad sqs queue to use as an overflow buffer for kinesis | string | "" | no |
byte_limit | The amount of bytes to buffer events before pushing them downstream | number | 1000000 | no |
cloudwatch_logs_enabled | Whether application logs should be reported to CloudWatch | bool | true | no |
cloudwatch_logs_retention_days | The length of time in days to retain logs for | number | 7 | no |
config_override_b64 | App config uploaded as a base64 encoded blob. This variable facilitates dev flow; if config is incorrect this can break the deployment. | string | "" | no |
cookie_domain | Optional first party cookie domain for the collector to set cookies on (e.g. acme.com) | string | "" | no |
cookie_enabled | Whether server side cookies are enabled or not | bool | true | no |
custom_paths | Optional custom paths that the collector will respond to, typical paths to override are '/com.snowplowanalytics.snowplow/tp2', '/com.snowplowanalytics.iglu/v1' and '/r/tp2'. e.g. { "/custom/path/" : "/com.snowplowanalytics.snowplow/tp2"} | map(string) | {} | no |
enable_auto_scaling | Whether to enable auto-scaling policies for the service | bool | true | no |
enable_sqs_buffer | Whether to enable the optional sqs overflow buffer for kinesis (note: only works when 'sink_type' is 'kinesis') | bool | false | no |
good_sqs_buffer_name | The name of the good sqs queue to use as an overflow buffer for kinesis | string | "" | no |
iam_permissions_boundary | The permissions boundary ARN to set on IAM roles created | string | "" | no |
instance_type | The instance type to use | string | "t3a.micro" | no |
java_opts | Custom JAVA Options | string | "-Dcom.amazonaws.sdk.disableCbor -XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
max_size | The maximum number of servers in this server-group | number | 2 | no |
min_size | The minimum number of servers in this server-group | number | 1 | no |
private_ecr_registry | The URL of an ECR registry that the sub-account has access to (e.g. '000000000000.dkr.ecr.cn-north-1.amazonaws.com.cn/') | string | "" | no |
record_limit | The number of events to buffer before pushing them downstream | number | 500 | no |
scale_down_cooldown_sec | Time (in seconds) until another scale-down action can occur | number | 600 | no |
scale_down_cpu_threshold_percentage | The average CPU percentage that we must be below to scale-down | number | 20 | no |
scale_down_eval_minutes | The number of consecutive minutes that we must be below the threshold to scale-down | number | 60 | no |
scale_up_cooldown_sec | Time (in seconds) until another scale-up action can occur | number | 180 | no |
scale_up_cpu_threshold_percentage | The average CPU percentage that must be exceeded to scale-up | number | 60 | no |
scale_up_eval_minutes | The number of consecutive minutes that the threshold must be breached to scale-up | number | 5 | no |
sink_type | The stream technology to push messages into (either 'kinesis' or 'sqs') | string | "kinesis" | no |
ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | [ | no |
tags | The tags to append to this resource | map(string) | {} | no |
telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
time_limit_ms | The amount of time to buffer events before pushing them downstream | number | 500 | no |
user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
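To show how several of the optional inputs above combine, here is a sketch that enables the SQS overflow buffer, adjusts the scaling bounds, adds a custom path and narrows SSH access. The `aws_sqs_queue` resource labels, the queue names, the scaling numbers and the CIDR range are all placeholder assumptions; the required inputs again mirror the usage example and assume its `collector_lb`, `raw_stream` and `bad_1_stream` modules exist.

```hcl
# Hypothetical overflow queues; labels and names are placeholders.
resource "aws_sqs_queue" "good_buffer" {
  name = "collector-good-buffer"
}

resource "aws_sqs_queue" "bad_buffer" {
  name = "collector-bad-buffer"
}

module "collector_kinesis" {
  source = "snowplow-devops/collector-kinesis-ec2/aws"

  accept_limited_use_license = true

  name       = "collector-server"
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  collector_lb_sg_id = module.collector_lb.sg_id
  collector_lb_tg_id = module.collector_lb.tg_id
  ingress_port       = module.collector_lb.tg_egress_port
  good_stream_name   = module.raw_stream.name
  bad_stream_name    = module.bad_1_stream.name

  ssh_key_name = "your-key-name"

  # Restrict SSH to a known range (203.0.113.0/24 is a documentation-only range).
  ssh_ip_allowlist = ["203.0.113.0/24"]

  # SQS overflow buffer; only applies while sink_type is "kinesis" (the default).
  enable_sqs_buffer    = true
  good_sqs_buffer_name = aws_sqs_queue.good_buffer.name
  bad_sqs_buffer_name  = aws_sqs_queue.bad_buffer.name

  # Scaling bounds and threshold; the values are illustrative, not recommendations.
  min_size                          = 2
  max_size                          = 4
  scale_up_cpu_threshold_percentage = 50

  # Custom path mapping, taken from the custom_paths description.
  custom_paths = {
    "/custom/path/" = "/com.snowplowanalytics.snowplow/tp2"
  }

  # First party cookie domain, using the example domain from the table.
  cookie_domain = "acme.com"
}
```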
Name | Description |
---|---|
asg_id | ID of the ASG |
asg_name | Name of the ASG |
sg_id | ID of the security group attached to the Collector Server node |
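The outputs above can be re-exported from a root module or passed to other resources. A minimal sketch, assuming the module is instantiated as `collector_kinesis` as in the usage example; the output names and descriptions here are illustrative.

```hcl
output "collector_asg_name" {
  description = "Name of the auto-scaling group running the Collector"
  value       = module.collector_kinesis.asg_name
}

output "collector_sg_id" {
  description = "Security group attached to the Collector Server nodes"
  value       = module.collector_kinesis.sg_id
}
```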
Copyright 2021-present Snowplow Analytics Ltd.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)