
Version and TF module updates #13

Closed · wants to merge 4 commits

Conversation

@git-ival (Collaborator) commented Sep 13, 2023

The changes can be summarized as follows:

  • ignore tfvars in VCS
  • add input vars for ami and instance_types
  • add options to upstream and downstream cluster objects
    • refactor of downstream_clusters local var
    • added optional datastore field
  • add more specific version constraints to tf providers
  • default to rancher-eng aws profile for aws tf provider auth
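
For the last two items, a minimal sketch of the shape of the change (the exact version constraint is an assumption; see the diff for the real values):

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint, more specific than an open-ended one
    }
  }
}

provider "aws" {
  region  = var.region
  profile = "rancher-eng" # default AWS profile used for provider auth, per this PR
}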

@moio (Owner) left a comment:

👋 @git-ival, thanks for this contribution!

Sorry for bringing in quite a lot of questions to your first PR 😇

Don't be scared, I like most of what I am seeing here 😉

terraform/main/aws/inputs.tf (review thread: outdated, resolved)
terraform/main/aws/main.tf (review thread: outdated, resolved)
variable "downstream_config" {
type = object({
name = optional(string)
server_count = number
@moio (Owner):

This opens an interesting design question: if we make those parameters variable, then a commit id will not suffice to describe the whole test environment - one will need the id plus all specified variables.

My original intent was to go all-in on hardcoding parameters in code, so that the commit id is all one needs to know to repeat the test. Of course there are bits that can't be hardcoded - namely credentials and other security material - but those would be the only exceptions.

That intent comes from the past experience that those parameters then have to be tracked, and frequently end up in another Git repo that contains workflow definitions specifying environment variables for the tests. So one needs two Git repos, and their commit ids, before knowing the full picture.

What are your reasons to keep those variables variable?

@git-ival (Collaborator, PR author):

I see, I hadn't realized that was the method being used for keeping track of the test environment specifications. I like the flexibility that input variables can offer, but I understand the desire to be able to quickly and easily reproduce an env.

I see a potential use case for corral and its "corrals" (corral-packages). Or at the very least, the concept of a sort of manifest for an env that could be updated per desired config (see an example).

Alternatively we could track the actual testing, observations/results and config (input vars, etc.) via GH issues if that suffices and is desirable.

Otherwise, we can revert this change and leave things hardcoded.

@moio (Owner):

Two aspects: first, I understand the need to override any piece of configuration as a "temporary" measure during development. At the moment that is possible via tfvars, but not all useful aspects in main.tfs are actually variables - I have no problem accepting patches to expose any useful aspect as a variable with a default value, so that it can be overridden when needed. In future, I can imagine this project running in CI always on defaults, and engineers being able to override any aspect in their setups during development. I am also open to considering alternative overriding mechanisms to make DX better.
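
A minimal sketch of that override pattern (the variable name and default value are hypothetical): every hardcoded value becomes a variable whose default preserves the current behavior, so running on defaults, as CI would, reproduces the committed configuration exactly.

variable "server_instance_type" {
  description = "EC2 instance type for cluster nodes"
  type        = string
  default     = "t3.large" # hypothetical: the previously hardcoded value becomes the default
}

A developer can then deviate temporarily with terraform apply -var-file=dev.tfvars, without that deviation ever needing to be committed.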

Second: in a future where this project runs in CI, and where there is more than one set of "configuration parameters" (eg. the testsuite running on both AWS and Azure at any given time), where should such parameters be stored? Choosing where they are stored is important because we will need to record that information long term along with test results, in order to make them repeatable/verifiable. The obvious option in my mind is some git repo, as we will need tracking/PRs/blame/etc.

I see two options:

  1. any such configuration parameter lives in the testsuite repo (this) alongside code. Implication: "environment files"/"corral files"/"tfvar files" are stored in a directory here, and a commit id to this repo contains everything needed to reproduce (modulo security-sensitive parameters such as AWS credentials). Other implication: a new test environment implies creating a new branch and changing parameters there. Further development will go on in master, and if new features (eg. new tests) are implemented, a rebase of test branches is needed to "sign off" that they now run on the new code - along with any related parameter adjustment
  2. there is a separate repo with "environment files"/"corral files"/"tfvar files". Thus a build is defined by a combination of one commit ID from this repo and one commit ID with the "parameters repo". More pragmatically the parameters repo could contain multiple environments, so a build will be defined by two commit IDs and one "identifier" of some sort (eg. directory name or manifest path). Advantage: no need to branch off from master in this repo to create a new environment. Disadvantage: two repos to keep in sync.

In my experience I have seen projects using 2., and the bookkeeping kind of bored me. In this project I tried to go with 1., and found that rebasing is also meh, but I still like it a little bit better. Still, I don't want to take a decision based on my feelings/tastes alone.

What is your experience and what do you think could serve us best?
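
To make option 1. concrete: a hypothetical "environment file" would be just a tfvars file committed next to the code, so a single commit id pins both code and parameters (path, variable names and values below are made up for illustration):

# environments/aws-medium.tfvars (hypothetical path within this repo)
num_downstreams      = 5
server_instance_type = "t3.xlarge"
ami                  = "ami-0123456789abcdef0" # placeholder AMI id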

@git-ival (Collaborator, PR author) commented Sep 21, 2023:

Yeah, the problem of where to store test env configs has been on my mind for some time, even outside of this repo's concerns. I think, like you said, it would likely be easiest to store them in a VCS/repo in order to more easily track the point-in-time configs, etc.

As an alternative to the more "brute force" git branching/commit ID method, corral is actually capable of creating and storing packages in an OCI registry. See here for an example of some packages created via corral and pushed to GitHub Packages for the repo. Corral has a command to easily publish a given package to a given registry: corral package publish </path/to/local/corral/package> <OCI registry path:tag>. I think this could help solve the config storage issue by publishing packages to an OCI registry (tagged with the commit id and/or datetime stamp); this would also make things very easily reproducible for anyone who is less dev-oriented.

@git-ival (Collaborator, PR author):

In the meantime, I think leaving the downstream cluster config as an input variable is still feasible since I have maintained the defaults for each field, and I will make num_downstreams default to 5 to stay in line with the terraform config that was in place before these changes.
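
A sketch of what that looks like (the object fields beyond those visible in the diff, and their default values, are assumed; optional() requires Terraform >= 1.3):

variable "num_downstreams" {
  type    = number
  default = 5 # matches the terraform config in place before these changes
}

variable "downstream_config" {
  type = object({
    name         = optional(string)
    server_count = number
  })
  default = {
    server_count = 1 # assumed value; the point is that every field keeps a default
  }
}

With defaults on every field, applying with no tfvars at all reproduces the previous hardcoded setup.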

@moio (Owner) commented Sep 21, 2023:

> In the meantime, I think leaving the downstream cluster config as an input variable is still feasible since I have maintained the defaults for each field, and I will make num_downstreams default to 5 to stay in line with the terraform config that was in place before these changes.

Yes, that makes total sense regardless of the rest of this discussion. Thanks.

Back to your point:

> corral is actually capable of creating and storing packages in an OCI registry

That is an interesting feature, and it helps with an important problem (ease of reproducibility), but unless I am missing something I see that as a separate design problem to solve.

Specifically, I do not see corral helping answer the question "where are test env vars tracked". Let's assume we develop a benchmark in order to, say, stress test the Steve API. That will be parametric on the Rancher version, the number of downstream clusters, the distro used underneath Rancher, the number of nodes and their sizing, the number of concurrent users hammering the API, and probably many more parameters.

IIUC corral will not help in storing those parameters, nor in giving a certain "set of environment variable values" an ID, nor in having a way to reference that ID from test results (eg. stored Prometheus metrics). Am I missing anything?

If not, where do you envision storing, and how do you envision referencing, those per-test sets of environment variable values?

@git-ival (Collaborator, PR author):

I think this might be a bit easier to discuss via Teams or Slack but:

> Specifically, I do not see corral helping answer the question "where are test env vars tracked".

To this point, I would imagine that this is where the corral package manifest comes in. Say we converted an existing module to be more "dynamic" (various input vars, maybe even some logic switches/optional resource creation if desired/needed); we could then create different corral package templates that utilize this module and pass different input vars to modify exactly what sort of infra is spun up. Thus, rather than relying on various git branches, we could have a single git branch with multiple corral package templates that have the input var values baked into the manifest. All in the same repository.

In this way we could create various templates with specific input vars baked into the manifest (much the same way as you are doing via hardcoding inputs in git branches), and we could publish them to an OCI registry (like GitHub Packages), using the git commit ID to tie them to the point-in-time versions of the modules, etc.
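
On the "logic switches/optional resource creation" idea, the usual Terraform idiom is a resource count toggled by a boolean input. A minimal sketch, with all names assumed for illustration:

variable "create_loadbalancer" {
  type    = bool
  default = false # defaults reproduce the current setup
}

resource "aws_lb" "ingress" {
  count    = var.create_loadbalancer ? 1 : 0 # resource only created when toggled on
  internal = false
  subnets  = var.public_subnet_ids # assumed input variable
}

Different corral package templates would then just bake different values for inputs like this into their manifests.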

terraform/modules/aws_loadbalancer/inputs.tf (review thread: outdated, resolved)
terraform/modules/aws_loadbalancer/main.tf (review thread: resolved)
@moio (Owner) left a comment:

I left some questions.

One more general note: this PR and its commits have now lost the original intent, and contain fixes and improvements that go in many different directions.

In future, please consider separate PRs (or at least separate commits) to make granular merging easier. Thanks.

@@ -137,7 +138,7 @@ resource "aws_route_table_association" "secondary_private" {
 }

 resource "aws_vpc_dhcp_options" "dhcp_options" {
-  domain_name = var.region == "us-east-1" ? "ec2.internal" : "${var.region}.compute.internal"
+  domain_name = contains(["us-east-1", "us-west-1"], var.region) ? "ec2.internal" : "${var.region}.compute.internal"
@moio (Owner):

It is my understanding that the original is the default behavior:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-naming.html#instance-naming-rbn

    Format for an instance in us-east-1: ec2-instance-id.ec2.internal
    Example: i-0123456789abcdef.ec2.internal
    Format for an instance in any other AWS Region: ec2-instance-id.region.compute.internal
    Example: i-0123456789abcdef.us-west-2.compute.internal

What is the reason to set .ec2.internal for us-west-1?

@git-ival (Collaborator, PR author):

It seems like you're correct, although I remember facing some issues when trying to use this module with us-west-1 without this change. I will revert this for now.

  tags = {
    Project = var.project_name
    Name    = "${var.project_name}-nat-eip"
  }
  depends_on = [aws_internet_gateway.main]
@moio (Owner):

What is the reason to add this dependency? Did you encounter errors without it?

@git-ival (Collaborator, PR author):

Yes, see the documentation for aws_eip

The relevant note: [screenshot of the aws_eip documentation, noting that an EIP may require the IGW to exist prior to association, and to use depends_on to set an explicit dependency on the IGW]
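
For context, a minimal sketch of the pattern that documentation note describes (resource and reference names assumed):

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id # assumes a VPC named "main" elsewhere in the module
}

resource "aws_eip" "nat" {
  domain = "vpc"
  # Explicit dependency: no attribute reference ties these resources together,
  # and the EIP may fail to associate if the IGW does not exist yet.
  depends_on = [aws_internet_gateway.main]
}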

@@ -0,0 +1,21 @@
+variable "subdomain" {
@moio (Owner):

What is the reason to add a load balancer to the Terraform configuration, rather than having that managed by Kubernetes?

@git-ival (Collaborator, PR author):

No particular reason other than it is what I'm accustomed to. I've personally never used k8s in order to provision and manage cloud-provider loadbalancers 🤷‍♂️. This module can be removed if it's not particularly useful.

@moio (Owner):

My problem with this is that it is not really realistic in terms of what actual prod environments look like, and it's significantly more complex than just exposing a node's IP. I proposed the latter approach in #17. Please remove this part.

@moio (Owner) commented May 3, 2024:

Work will continue as part of project dartboard; I am closing this PR for the time being.

@moio closed this May 3, 2024