Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syslog changes Multi NPU platforms #4738

Merged
merged 8 commits into from
Jun 30, 2020

Conversation

arlakshm
Copy link
Contributor

@arlakshm arlakshm commented Jun 9, 2020

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com

- Why I did it
Add changes for syslog support for containers running in namespaces on multi NPU platforms.
On Multi NPU platforms

  • Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
  • On multi NPU platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
  • The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip

No change done for single ASIC platforms

- How I did it

  • Add new j2 file for the generating the rsyslog.conf inside the containers
  • Generate the rsyslog.conf and copy to the containers in the preStartAction function.
  • Update the rsyslog-config.sh script to update the rsyslog.conf on the host to listen to docker0 ip address in multi NPU platforms.

- How to verify it
Verify the syslogs are generated properly
Sample logs:

admin@sonic:~$ sudo grep -i orchagent /var/log/syslog.1
Jun  9 16:18:21.902661 sonic INFO swss4[612] 2020-06-09 16:18:21,902 INFO spawned: 'orchagent' with pid 42#015
Jun  9 16:18:22.142771 sonic INFO swss0[612] 2020-06-09 16:18:22,142 INFO spawned: 'orchagent' with pid 42#015
Jun  9 16:18:22.200722 sonic INFO swss1[612] 2020-06-09 16:18:22,200 INFO spawned: 'orchagent' with pid 36#015
Jun  9 16:18:22.234121 sonic INFO swss5[612] 2020-06-09 16:18:22,233 INFO spawned: 'orchagent' with pid 38#015
Jun  9 16:18:22.612859 sonic INFO swss3[612] 2020-06-09 16:18:22,611 INFO spawned: 'orchagent' with pid 36#015
Jun  9 16:18:22.802017 sonic INFO swss2[612] 2020-06-09 16:18:22,801 INFO spawned: 'orchagent' with pid 37#015
Jun  9 16:18:22.915873 sonic INFO swss4[612] 2020-06-09 16:18:22,911 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)#015
Jun  9 16:18:23.146156 sonic INFO swss0[612] 2020-06-09 16:18:23,145 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)#015
Jun  9 16:18:23.211117 sonic INFO swss1[612] 2020-06-09 16:18:23,207 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)#015
Jun  9 16:18:23.240845 sonic INFO swss5[612] 2020-06-09 16:18:23,236 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)#015
Jun  9 16:18:23.615436 sonic INFO swss3[612] 2020-06-09 16:18:23,615 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)#015
Jun  9 16:18:23.808437 sonic INFO swss2[612] 2020-06-09 16:18:23,806 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)#015

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@arlakshm arlakshm marked this pull request as ready for review June 9, 2020 16:30
@jleveque jleveque requested a review from qiluo-msft June 9, 2020 16:45
@@ -238,7 +242,9 @@ start() {
redis_dir=`echo $redis_dir_list | cut -d " " -f $id`
REDIS_MNT=" -v $redis_dir:$redis_dir:rw "
fi

# Containers in the docker network will use the syslog-driver as log-driver. This is mainly applicable for Multi NPU system
LOG_OPTS="--log-driver=syslog --log-opt syslog-address=udp://127.0.0.1:514 --log-opt syslog-format=rfc5424 \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a check to make sure docker_image_run_opt does not have--log-driver=json-file ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SuvarnaMeenakshi syslogs will not work if the log-driver is json on containers running the docker bridge network, so I think we can ignore --log-driver=json-file in this case.

Copy link
Collaborator

@qiluo-msft qiluo-msft Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you giver proof or reference that "syslogs will not work if the log-driver is json on containers running the docker bridge network"? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs

docker logs for swss running in namespace asic0.

admin@sonic:~$ docker logs swss0 | grep orchagent
2020-06-10 00:45:01,434 INFO spawned: 'orchagent' with pid 36
2020-06-10 00:45:02,439 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

same logs not available in syslog file

admin@sonic:~$ sudo zgrep -i orchagent /var/log/syslog.*
admin@sonic:~$

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. Syslogs will not work if the log-driver is json and this is not related to docker bridge network and applicable to previous single ASIC use case.

Do you want to fix both single ASIC and multiple ASIC? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didnt change the Single ASIC I didnt want to break any backward compatibility for some applications,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you need to check '--log-driver=json-file' in docker_image_run_opt because it will work if user specify it explicitly #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@SuvarnaMeenakshi SuvarnaMeenakshi self-requested a review June 10, 2020 16:55
@arlakshm arlakshm marked this pull request as draft June 10, 2020 17:11
Copy link
Collaborator

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comment

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@arlakshm arlakshm marked this pull request as ready for review June 12, 2020 07:08
intf="lo"
fi

udp_server_ip=$(ip -o -4 addr list $intf | awk '{print $4}' | cut -d/ -f1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ip -o -4 addr list $intf [](start = 16, length = 24)

There may be more than one lo interfaces. How do you handle? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in the latest commit

{%- else %}
# Containers in the docker network will use the syslog-driver as log-driver. This is mainly applicable for Multi NPU system
# In this case the syslog service on the host is listening on the docker0 address
syslog_address=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syslog_address=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1)
syslog_address=$(docker network inspect bridge --format='{{(index .IPAM.Config 0).Gateway}}')
``` #Closed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This IP address is actually available in build time. Check files/docker/docker.service.conf


In reply to: 439565102 [](ancestors = 439565102)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has been removed

TARGET_IP=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1)
CONTAINER_NAME="{{docker_container_name}}$DEV"
TMP_FILE="/tmp/rsyslog.$CONTAINER_NAME.conf"
sonic-cfggen -t /usr/share/sonic/templates/rsyslog-container.conf.j2 -a "{\"target_ip\": \"$TARGET_IP\", \"container_name\": \"$CONTAINER_NAME\" }" > $TMP_FILE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 21, length = 1)

One extra blank #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in the latest commit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove the extra blank in

sonic-cfggen  -t

In reply to: 439579965 [](ancestors = 439579965)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit

TARGET_IP=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1)
CONTAINER_NAME="{{docker_container_name}}$DEV"
TMP_FILE="/tmp/rsyslog.$CONTAINER_NAME.conf"
sonic-cfggen -t /usr/share/sonic/templates/rsyslog-container.conf.j2 -a "{\"target_ip\": \"$TARGET_IP\", \"container_name\": \"$CONTAINER_NAME\" }" > $TMP_FILE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container_name [](start = 116, length = 14)

This field make original code useless.

ARG docker_container_name

Could you remove original related code in Dockerfile(s)? #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I will do it in a separate PR. Filed this issue to track this.

# running on the namespace to reach the rsyslog service running on the host
# Also update the container name
if [[ ($NUM_ASIC -gt 1) ]]; then
TARGET_IP=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TARGET_IP=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1) [](start = 8, length = 72)

This IP address is actually available in build time. Check files/docker/docker.service.conf #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I didn't want to hard code the ip address here. So using this method to get the docker0 ip address

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean docker0 IP address is actually available in build time. Check files/docker/docker.service.conf. You are ok to use ip command, but build time constant may be easier.


In reply to: 439581391 [](ancestors = 439581391)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Your solution is future proof.


In reply to: 440542831 [](ancestors = 440542831,439581391)

if [[ ($NUM_ASIC -gt 1) ]]; then
TARGET_IP=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1)
CONTAINER_NAME="{{docker_container_name}}$DEV"
TMP_FILE="/tmp/rsyslog.$CONTAINER_NAME.conf"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TMP_FILE="/tmp/rsyslog.$CONTAINER_NAME.conf" [](start = 8, length = 44)

Could you use Dockerfile ENV and docker create --env to pass the container_name. Then no need to create tmpfile. #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tmp file is created after rendering this template /usr/share/sonic/templates/rsyslog-container.conf.j2, which has updated syslog.conf file. I don't think the tmp file can be avoided even if we pass the ENV to the container.

In Multi-NPU platforms it is possible that multi containers can be started at the same time, to avoid container using wrong syslog.conf files I am using the container name in the tmp filename

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create two ENV:

  1. RSYSLOG_SERVER_IP
  2. CONTAINER_NAME

Pass the ENV values by command docker create ... --env RSYSLOG_SERVER_IP=AA.BB.CC.DD ...

The benefit:

  1. not tmp file management
  2. works with multiple containers from single image

In reply to: 439583041 [](ancestors = 439583041)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft created a issue to track 4778 to improvements suggested in this comment.

@@ -228,6 +244,10 @@ start() {
done
fi
{%- endif %}
# single NPU systems and container running on the host network continue to use the json-file as log-driver
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single NPU systems and container running on the host network continue to use the json-file as log-driver [](start = 8, length = 106)

The different behavior of single NPU vs multiple NPU is confusing and complex deign.

What about moving all use cases to log-driver, or all use cases to json-file as before? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use json-file for all cases

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@@ -228,6 +245,7 @@ start() {
done
fi
{%- endif %}

else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change this line #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

@@ -238,7 +256,6 @@ start() {
redis_dir=`echo $redis_dir_list | cut -d " " -f $id`
REDIS_MNT=" -v $redis_dir:$redis_dir:rw "
fi

{%- if docker_container_name == "database" %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change this line #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
# running on the namespace to reach the rsyslog service running on the host
# Also update the container name
if [[ ($NUM_ASIC -gt 1) ]]; then
TARGET_IP=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TARGET_IP=$(ip -o -4 addr list docker0 | awk '{print $4}' | cut -d/ -f1)
TARGET_IP=$(docker network inspect bridge --format='{{(index .IPAM.Config 0).Gateway}}')
``` #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit

qiluo-msft
qiluo-msft previously approved these changes Jun 18, 2020
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
@arlakshm
Copy link
Contributor Author

@qiluo-msft , @SuvarnaMeenakshi , @lguohan
Please help approve if there are no more comments

@arlakshm arlakshm merged commit ef994a1 into sonic-net:master Jun 30, 2020
@arlakshm arlakshm deleted the multi_npu_syslog_changes branch June 30, 2020 13:29
abdosi pushed a commit that referenced this pull request Jul 5, 2020
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms

Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
qiluo-msft pushed a commit to qiluo-msft/sonic-buildimage that referenced this pull request Jul 12, 2020
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms

Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
qiluo-msft pushed a commit to qiluo-msft/sonic-buildimage that referenced this pull request Jul 12, 2020
Add changes for syslog support for containers running in namespaces on multi ASIC platforms.
On Multi ASIC platforms

Rsyslog service is only running on the host. There is no rsyslog service running in each namespace.
On multi ASIC platforms the rsyslog service on the host will be listening on the docker0 ip address instead of loopback address.
The rsyslog.conf on the containers is modified to have omfwd target ip to be docker0 ipaddress instead of loopback ip

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants