Switch to Fluent Bit from Fluentd #943
What do you mean by "nfs stores by container id"? Right now the pod name and the container id within the pod are the same. I was thinking about using "main" as the container name, but we can stick with this if changing it may break something.
Hi @xudifsd Currently the job log on NFS is grouped by container_id (https://github.com/microsoft/DLWorkspace/blob/dltsdev/src/ClusterManager/joblog_manager.py#L59-L70) while the GetJobLog API groups job logs by pod_name (https://github.com/microsoft/DLWorkspace/blob/dltsdev/src/utils/JobRestAPIUtils.py#L649-L659). Do we currently, or do we plan to, put two or more containers in a pod? If not, I will keep logs grouped by pod_name only, which is good for job scheduling.
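To make the proposed convention concrete, here is a minimal sketch of grouping log records by pod_name instead of container_id. The record shape mimics what fluent-bit's kubernetes filter attaches to each log line; the function name and field layout are illustrative, not the actual DLWorkspace code.

```python
from collections import defaultdict

def group_logs_by_pod(records):
    """Group raw log records by pod_name rather than container_id.

    `records` are dicts shaped like fluent-bit kubernetes-filter
    output (hypothetical example data, not real job logs).
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["kubernetes"]["pod_name"]].append(rec["log"])
    return dict(grouped)

records = [
    {"kubernetes": {"pod_name": "job-abc-pod", "container_name": "main"},
     "log": "line 1"},
    {"kubernetes": {"pod_name": "job-abc-pod", "container_name": "main"},
     "log": "line 2"},
]
grouped = group_logs_by_pod(records)
# With the one-pod-one-container policy, one key per pod also
# uniquely identifies the container.
```

Note that if a sidecar is later added to the job pod, lines from both containers would land under the same pod key unless container_name is folded into the grouping key as well.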
I think we'd better unify on using only the pod name. It seems we will not change this one-pod-one-container policy for jobs, but I've heard that AML will put a sidecar inside the job pod to facilitate log and metric collection, so we'd better prepare for that.
Workaround for an issue in the hostNetwork deployment: previously we were able to deploy fluentd with hostNetwork, but it does not work after switching to fluent-bit. Fluentd requests the k8s apiserver through the IP in its environment variables, whereas fluent-bit requests the k8s apiserver through the cluster-internal domain, which hostNetwork pods cannot resolve. After configuring fluent-bit to request the k8s apiserver through the IP, TLS verification fails, since the k8s CA certificate lists only the internal domain names for SNI. Unfortunately there is no clean fix yet. Tracking in fluent/fluent-bit#1615 (comment)
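For reference, the workaround amounts to something like the following fluent-bit kubernetes filter config (the apiserver IP is a placeholder for your cluster's value; disabling verification is an insecure stopgap, not a recommendation):

```ini
[FILTER]
    Name            kubernetes
    Match           kube.*
    # hostNetwork pods can't resolve the in-cluster service domain,
    # so point fluent-bit at the apiserver by IP (placeholder value).
    Kube_URL        https://10.0.0.1:443
    # The apiserver cert's SANs cover only the service domain names,
    # so TLS verification fails against a bare IP; turn it off until
    # fluent/fluent-bit#1615 is resolved.
    tls.verify      Off
```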
force-pushed from cb33fe5 to dbbdb4c
Why do we want fluent-bit to use the host network? For efficiency?
Yes
Should we bring docker.container_id back? No.
@Anbang-Hu @xudifsd Currently the restfulapi retrieves logs by pod_name while NFS stores logs by container_id. In which case are they different?