update lustre configuration part in doc (#1284)
YinYangOfDao committed Aug 6, 2020
1 parent a774395 commit 57b7bc6
Showing 2 changed files with 81 additions and 9 deletions.
74 changes: 68 additions & 6 deletions docs/deployment/Azure/cloud_init_configure.md
@@ -55,13 +55,18 @@ azure_cluster:
- logging
number_of_instance: 1
- number_of_instance: 1
- number_of_instance: 1 # MGS, also count as 1 of the MDSs
name: checkrename-lustre-mds
vm_size : Standard_B2s
vm_image: OpenLogic:CentOS-CI:7-CI:7.7.20190920
role:
- lustre
- mds
- mgs
storage_quota:
user_soft: 3G
user_hard: 4G
user_grace_period: 4d
managed_disks:
- sku: Premium_LRS
is_os: True
@@ -73,7 +78,6 @@ azure_cluster:
fileshares:
- server_path: /lustrefs
client_mount_root: /mntdlws/lustre
client_link_root: /dlwslustre
client_links:
- src: jobfiles
@@ -91,14 +95,66 @@ azure_cluster:
role:
- lustre
- oss
mds_name: checkrename-lustre-mds
managed_disks:
- sku: Premium_LRS
is_os: True
size_gb: 64
disk_num: 1
- sku: Premium_LRS
size_gb: 128
disk_num: 2
- number_of_instance: 1 # MDT2
name: checkrename-lustre-mds2
vm_size : Standard_B2s
vm_image: OpenLogic:CentOS-CI:7-CI:7.7.20190920
role:
- lustre
- mds
storage_quota:
user_soft: 2G
user_hard: 3G
user_grace_period: 1w
managed_disks:
- sku: Premium_LRS
is_os: True
size_gb: 64
disk_num: 1
- sku: Premium_LRS
size_gb: 64
disk_num: 1
fileshares:
- server_path: /lustrefs
client_mount_root: /mntdlws/lustre2
client_link_root: /dlwslustre2
client_links:
- src: jobfiles
dst: jobfiles
- src: storage
dst: storage
- src: work
dst: work
data_disk_mnt_path: /lustre
private_ip: 192.168.249.2
- number_of_instance: 2
vm_size : Standard_B2s
vm_image: OpenLogic:CentOS-CI:7-CI:7.7.20190920
role:
- lustre
- oss
mds_name: checkrename-lustre-mds2
managed_disks:
- sku: Premium_LRS
is_os: True
size_gb: 64
disk_num: 1
- sku: Premium_LRS
size_gb: 128
disk_num: 2
dedicated_vcs:
- multimedia
- number_of_instance: 1
name: lustclean-nfs-storage
@@ -347,4 +403,10 @@ link `/mntdlws/nfs/storage` to `/dlwsdata/storage`
NFS service might fail. We use the soft-link trick because it guarantees that when the NFS service fails, operations on the linked paths also fail, so the failure becomes visible. Until it is fixed, attempted operations fail, but no serious damage is caused.
* `nfs_client_CIDR`: specifies a list of IP ranges that can access NFS servers. Private IPs are allowed.
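As an illustration, the allowed ranges might be listed like this (a sketch; the exact key placement and the CIDR values below are assumptions, not taken from a real deployment):

```
nfs_client_CIDR:
  - 192.168.0.0/16
  - 10.0.0.0/8
```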

Currently, if Lustre support is desired, the only supported Lustre server VM image is `OpenLogic:CentOS-CI:7-CI:7.7.20190920`, and the only supported client image is `Canonical:UbuntuServer:18.04-LTS:18.04.201912180`.

# Lustre Storage
The example above has four Lustre-related entries: one MGS (which also counts as one of the MDSs), one additional MDS, and two groups of OSS nodes.
`fileshares` is configured the same way as for NFS.
`storage_quota` sets up a global per-user storage quota.
`dedicated_vcs` under `managed_disks` groups OSTs together to reserve throughput ("throughput quota") for a certain group, in our case a VC.
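Once such a cluster is running, the effective limits can be checked from a Lustre client with the standard `lfs quota` command. A sketch, assuming a placeholder user name and the client mount root from the example above:

```
# show block/inode usage and limits for one user on the Lustre mount
lfs quota -u some_user /mntdlws/lustre
```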
16 changes: 13 additions & 3 deletions docs/deployment/Azure/cloud_init_readme.md
@@ -98,23 +98,33 @@ You may want to save the previous config files in advance.

After reconfiguration, use the commands below to finish deploying the new nodes into the existing cluster:

NFS/Lustre nodes do not share a common cloud-init script, because private IPs, fileshare configuration, etc. vary from instance to instance. So if you are adding an NFS/Lustre node, you need to run these commands first:

```
./cloud_init_deploy.py render
./cloud_init_deploy.py pack
./cloud_init_deploy.py docker push cloudinit
```

Workers share the same cloud-init script; rendering is required only when the shared cloud-init file is not already on the devbox. Then run the command below (start from here if you are adding workers only):
```
./cloud_init_deploy.py render
```

After the rendering step, add the machines and update the cluster:
```
./cloud_init_aztools.py -v addmachines
./cloud_init_aztools.py listcluster
./cloud_init_aztools.py interconnect
```

If NFS/Lustre nodes are added, it is also necessary to update `/opt/autoshare/mounting.yaml` on the infra node accordingly.

Any change to an infra node is considered a cluster change; for now, we redeploy the whole cluster instead of adding a single infra node. Contact us for more details.

## deleting machines
Simply delete the nodes from Kubernetes and remove the corresponding entries in `status.yaml`.
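Concretely, removing one node might look like this (a sketch; the node name is a placeholder, and the `status.yaml` entry still has to be removed by hand):

```
# detach the node from the Kubernetes cluster
kubectl delete node worker-node-01
# then delete the matching machine entry from status.yaml
```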

## dynamically scaling up/down # of workers
Specify `dynamic_worker_num` in `config.yaml`,
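For example, the setting might look like this in `config.yaml` (a sketch; the value and the placement among sibling keys are assumptions):

```
dynamic_worker_num: 4
```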
@@ -148,7 +158,7 @@ Notice that here the script path cannot be src/ClusterBootstrap since it contain
./ctl.py [-s] [-v] [-r <role1> [-r <role2>] ...] [-r <nodename1> [-r <nodename2>]] copy2 <source path> <destination path>
```
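For instance, copying a script to all worker nodes might look like the following (a sketch; the role name and both paths are placeholders):

```
./ctl.py -v -r worker copy2 ./fix.sh /home/dlwsadmin/fix.sh
```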

## start/stop/restart service
If you need to update the service config of an already deployed node, edit `status.yaml`, not `config.yaml`.
```
./ctl.py svc stop <service1, service2, ...> (e.g., ./ctl.py svc stop monitor)
