
GH-524 staging environment #532

Merged 6 commits into main from ops/524-staging-environment on Jul 21, 2023
Conversation

@alabdao (Collaborator) commented Jul 20, 2023

Split out the docker setup so it can potentially be used on other nodes.

Custom vars are put into an extra-vars file that is referenced on the CLI while executing the script.

Fixes #524
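As a sketch of that workflow (the file name and variable names below are hypothetical, not taken from this PR), the extra-vars file is plain YAML handed to ansible-playbook with the -e flag:

    # extra-vars/staging.yml -- hypothetical file name and variables
    env_name: staging
    instance_count: 1

    # referenced on the CLI while executing the playbook:
    # ansible-playbook provision.yml -e @extra-vars/staging.yml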

hevans66 and others added 4 commits July 18, 2023 15:22
@vercel (bot) commented Jul 20, 2023

The latest updates on your projects:

docs: ✅ Ready, updated Jul 21, 2023 2:36pm (UTC)


# slow_start = 60

# TODO: need to figure out healthcheck for IPFS
(Collaborator) replied:

Yeah, this is a rabbit hole I started down at some point.

@hevans66 (Collaborator) left a comment

This is looking great; it's certainly a more complete setup than we had before.

Couple of notes:

  • This does not set up a receptor instance for staging. I think that's OK for now, but as soon as the receptor starts doing fancier things (like actually rejecting jobs based on criteria) we will want to add a receptor for testing purposes.
  • I'm a little worried about the requester instances having an auto scaling group, mostly because I don't really know what it means for two Bacalhau requester nodes (that are not peered together) to share the same set of compute nodes. What would happen if one requester node accepts a job and hands it off to a compute node, and the CLI later requests the job status from a different requester node? The requester node is not doing any computation, so I don't anticipate ever really needing more than one, unless the idea is to never have n > 1 for this ASG.
  • This still won't automatically run the Ansible provision scripts when an instance launches, right? For now we're still running ansible-playbook from the command line?

@@ -1,109 +1,67 @@
- name: Provision Bacalhau Compute Instance
  remote_user: ubuntu
  hosts: tag_Type_compute_only:&tag_Env_prod
(Collaborator) commented:

Are you limiting to specific Envs when running ansible-playbook, or is there some magic I am missing?

(Collaborator, Author) replied:

Yeah, using --limit tag_Env_staging while executing the ansible-playbook command.
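For context, here is a rough sketch of how that fits together, assuming the EC2 dynamic inventory exposes tag-based groups such as tag_Type_compute_only and tag_Env_staging (the playbook file name and the exact hosts pattern are assumptions, not copied from this PR):

    # Play targets the Type tag group; the Env tag is narrowed at run time with --limit
    - name: Provision Bacalhau Compute Instance
      remote_user: ubuntu
      hosts: tag_Type_compute_only
      tasks: []   # provisioning tasks omitted in this sketch

    # Invocation against staging only:
    # ansible-playbook provision_compute.yml --limit tag_Env_staging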

@alabdao (Collaborator, Author) commented Jul 21, 2023

This is looking great; it's certainly a more complete setup than we had before.

Couple of notes:

* This does not set up a receptor instance for staging. I think that's OK for now, but as soon as the receptor starts doing fancier things (like actually rejecting jobs based on criteria) we will want to add a receptor for testing purposes.

Yes, the receptor will definitely come later.

* I'm a little worried about the requester instances having an auto scaling group, mostly because I don't really know what it means for two Bacalhau requester nodes (that are not peered together) to share the same set of compute nodes. What would happen if one requester node accepts a job and hands it off to a compute node, and the CLI later requests the job status from a different requester node? The requester node is not doing any computation, so I don't anticipate ever really needing more than one, unless the idea is to never have n > 1 for this ASG.
  • The point of the ASG is to have HA capability in case a node becomes unhealthy. Work needs to be done to make Bacalhau and IPFS state preservable; EFS would probably do the job.
  • We needed an LB target to terminate public traffic. This is a step towards nodes NOT having public IP addresses and instead being accessed via a bastion/VPN/EC2 Instance Connect Endpoint.
  • Easier bootstrapping mechanism.
  • At some point we will probably have multiple requesters with stickiness enabled so a client hits the same node.
* This still won't automatically run the Ansible provision scripts when an instance launches, right? For now we're still running ansible-playbook from the command line?
  • Yes, correct. This is a step towards that: having compute nodes be completely dynamic with a headless setup.

@alabdao alabdao temporarily deployed to ci July 21, 2023 14:34 — with GitHub Actions Inactive
@alabdao alabdao temporarily deployed to ci July 21, 2023 14:35 — with GitHub Actions Inactive
@alabdao alabdao merged commit 9538de2 into main Jul 21, 2023
3 checks passed
@alabdao alabdao deleted the ops/524-staging-environment branch July 21, 2023 16:04
@thetechnocrat-dev (Contributor) left a comment

Looks good to me, but I'll defer to @hevans66 on the final approval. I do think the large strategic decision coming up is how we organize the infrastructure code for the private versus public clusters.

@@ -4,6 +4,12 @@
  vars:
    ipfs_path: /opt/local/ipfs
  tasks:
    # Must provide limit flag to ensure running against current environment
    - fail:
(Contributor) commented:

Cool, didn't know about this trick
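For anyone else who hasn't seen it, a minimal sketch of that guard pattern (the task body is truncated in the diff above, so the condition and message here are assumptions about what it does):

    # Fail fast unless the playbook was invoked with --limit
    - fail:
        msg: "Run this playbook with --limit (for example --limit tag_Env_staging) so it targets a single environment"
      when: ansible_limit is not defined

ansible_limit holds the value passed to --limit, so the play fails immediately when no limit was given.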

Development

Successfully merging this pull request may close these issues: Setup Plex Staging environment (#524)

3 participants