Cluster/hardware sizing #6

Closed
rbo opened this issue Jun 2, 2021 · 8 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@rbo
Member

rbo commented Jun 2, 2021

We have to clarify the cluster/hardware sizing.

Initially, for development & testing, I started with:

| Amount | Type | CPU | RAM | Disks |
|---|---|---|---|---|
| 3 | SB48 | Intel Core i7-7700 | 64 GB | 2x SSD SATA 256 GB |
| 1 | SB49 | Intel Core i7-6700 | 64 GB | 2x SSD SATA 256 GB |
@tumido
Member

tumido commented Jun 2, 2021

How many clusters do we plan on deploying in Hetzner? Can we get more than one?

Also, I think Operate First would appreciate clusters dedicated to:

  • storage (minimal HW requirements as per OCS docs here with raw capacity in tens of TiBs)
  • computing/GPU-focused (at least similar to the MOC Zero cluster: 9 nodes, in total 260 CPU cores, 400 GiB RAM, no storage needed, plus GPUs)

/cc @durandom @goern

@rbo
Member Author

rbo commented Jun 2, 2021

> How many clusters do we plan on deploying in Hetzner? Can we get more than one?
>
> Also, I think Operate First would appreciate clusters dedicated to:
>
>   • storage (minimal HW requirements as per OCS docs here with raw capacity in tens of TiBs)

Maybe we can try to use a compact deployment.
Don't want to waste too many resources on the control plane.

We can choose one of the AX* servers, AX51-NVME at minimum.

>   • computing/GPU-focused (at least similar to the MOC Zero cluster: 9 nodes, in total 260 CPU cores, 400 GiB RAM, no storage needed, plus GPUs)

Sadly, GPUs are not available at Hetzner.

Control plane: I recommend AX41-NVME or AX51-NVME.
Infra nodes: depends on what we want to deploy (OpenShift Logging?).
Compute: AX101 or AX161.

Maybe it makes sense to set up only one cluster to save resources and budget.

We could try to get some cheap servers from the Serverbörse.

/cc @durandom @goern

@durandom
Member

durandom commented Jun 2, 2021

Let's think about our user base for this deployment.
The audience would be solution architects and others: you get a namespace and can do whatever you want.
That is a little more relaxed than the Zero cluster.
And we should make sure we're always running at about 80% capacity, since we pay for it. Therefore I'd start with the smallest supported setup and then grow by demand.
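
The "start small, grow by demand" rule above can be sketched as a small calculation. This is only an illustration and not something proposed in the thread: the function names, the 600 GB demand figure, and the 256 GB node size are hypothetical, and RAM is used as the only capacity dimension to keep it simple.

```python
import math

TARGET_UTILIZATION = 0.8  # the 80% figure mentioned above


def utilization(demand_gb: float, nodes: int, node_ram_gb: float) -> float:
    """Fraction of the total worker RAM that the current demand occupies."""
    return demand_gb / (nodes * node_ram_gb)


def workers_needed(demand_gb: float, node_ram_gb: float,
                   target: float = TARGET_UTILIZATION,
                   min_nodes: int = 1) -> int:
    """Smallest worker count that keeps utilization at or below the target."""
    usable_per_node = node_ram_gb * target
    return max(min_nodes, math.ceil(demand_gb / usable_per_node))


if __name__ == "__main__":
    demand_gb = 600.0    # hypothetical RAM requested by all namespaces
    node_ram_gb = 256.0  # hypothetical worker size
    n = workers_needed(demand_gb, node_ram_gb)
    print(f"workers needed: {n}")
    print(f"utilization:    {utilization(demand_gb, n, node_ram_gb):.0%}")
```

Re-running the calculation as demand grows gives the point at which another worker would have to be ordered; `min_nodes` stands in for the smallest supported setup.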

@goern
Member

goern commented Jun 4, 2021

BTW, it looks like we need some GPUs for https://github.com/AICoE/Varangian
If no GPUs are available at Hetzner, @durandom do we get them from AWS?

@rbo
Member Author

rbo commented Jun 22, 2021

Because of #8, we decided to start with:

| Role | Amount | Type | CPU | RAM | Disks |
|---|---|---|---|---|---|
| Master | 3 | SB48 | Intel Core i7-7700 | 64 GB | 2x SSD SATA 256 GB |
| Worker | 3 | SB94 | Intel Xeon E5-1650V3 | 256 GB | 2x SSD SATA 480 GB |

One of the workers will be used as a bootstrap during installation and added to the cluster after installation.
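
For reference, the RAM and raw disk totals implied by the table above can be worked out with a short sketch. The per-node figures are taken from the table; the `NodeGroup` class and `totals` helper are hypothetical names introduced only for this example, and the disk figure is raw capacity that ignores any RAID or mirroring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    role: str
    count: int
    ram_gb: int    # RAM per node
    disks: int     # number of disks per node
    disk_gb: int   # capacity per disk

    @property
    def total_ram_gb(self) -> int:
        return self.count * self.ram_gb

    @property
    def total_disk_gb(self) -> int:
        # Raw capacity only; no RAID or mirroring is accounted for.
        return self.count * self.disks * self.disk_gb


# Values copied from the sizing table in this comment.
GROUPS = [
    NodeGroup("Master", count=3, ram_gb=64, disks=2, disk_gb=256),
    NodeGroup("Worker", count=3, ram_gb=256, disks=2, disk_gb=480),
]


def totals(groups):
    """Map each role to (total RAM in GB, total raw disk in GB)."""
    return {g.role: (g.total_ram_gb, g.total_disk_gb) for g in groups}


if __name__ == "__main__":
    for role, (ram, disk) in totals(GROUPS).items():
        print(f"{role}: {ram} GB RAM, {disk} GB raw disk")
```

While one worker serves as the bootstrap node during installation, the same calculation with a worker count of 2 gives the capacity available until that node joins the cluster.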

This was referenced Jun 23, 2021
@sesheta
Member

sesheta commented Oct 15, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2021
@sesheta
Member

sesheta commented Nov 14, 2021

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@sesheta sesheta added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 14, 2021
@rbo
Member Author

rbo commented Nov 15, 2021

Initial sizing is done. We can close the issue.

@rbo rbo closed this as completed Nov 15, 2021