Added techsupport data export manager and core file manager #468

@Kalimuthu-Velappan Kalimuthu-Velappan commented Sep 16, 2019

The techsupport data management functionality is divided into two main components:

  1. Core-dump manager.
  2. Tech-support data export manager.

1. Core-dump manager

Added a new SONiC service named 'coredumpctl.service' to manage core dumps as follows:

1. Support per-process core file rotation.
2. Archive core files to optimize disk space.
3. Strip sensitive information from core files.
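
As an illustration of the per-process rotation described above, here is a minimal sketch, not the code from this PR. It assumes each core file is described by a `(process_name, mtime, path)` tuple and that a fixed number of the newest files is kept per process; the function name `cores_to_delete` and the `keep` parameter are hypothetical.

```python
from collections import defaultdict


def cores_to_delete(core_files, keep=3):
    """Return the paths of core files that exceed the per-process limit.

    core_files: iterable of (process_name, mtime, path) tuples (assumed layout).
    keep: number of newest core files retained per process (assumed policy).
    """
    by_process = defaultdict(list)
    for process, mtime, path in core_files:
        by_process[process].append((mtime, path))

    doomed = []
    for entries in by_process.values():
        entries.sort(reverse=True)  # newest first
        doomed.extend(path for _, path in entries[keep:])
    return sorted(doomed)
```

With `keep=2`, a process that has three core files would lose only its oldest one, while other processes are unaffected.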

2. Tech-support data export manager

Added a new SONiC service named 'export.service' to collect and export the tech-support data to a remote server for better offline debugging. The tech-support data is captured and exported under the following conditions:

1. When a new core dump is discovered, collect tech-support data and export it to a remote server.
2. Periodically collect and export tech-support data.
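
The two export conditions above can be sketched as a single decision function. This is only an illustration under stated assumptions: core files are tracked as sets of names, times are plain numbers in seconds, and `should_export` and all of its parameters are hypothetical names, not part of the PR.

```python
def should_export(known_cores, current_cores, last_export, now, interval):
    """Decide whether tech-support data should be collected and exported.

    known_cores / current_cores: core file names seen before and now.
    interval: seconds between periodic exports, or None if periodic
    export is not configured (assumed convention).
    Returns (export_needed, new_core_names).
    """
    new_cores = sorted(set(current_cores) - set(known_cores))
    periodic_due = interval is not None and now - last_export >= interval
    return (bool(new_cores) or periodic_due), new_cores
```

A caller would invoke this on each poll of the core dump directory and update `known_cores` and `last_export` after a successful export.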

Code PRs

sonic-net/sonic-buildimage#3447
sonic-net/sonic-utilities#643


a. Support per-process core file rotation and archiving to optimize disk space

b. Strip Core files sensitive information
Contributor

Which module does this? Debian service or ...?
Can you please share some details on how it works?
Is there a way to enable/disable it?

Contributor Author

A new service named 'coredumpctl.service' is added to handle per-process core file rotation and archive management. It can be enabled/disabled through the systemctl command.

Can this service let the user enable core dumps for a few processes instead of all at once? Basically, can we collect cores per process?


2. Add a new SONIC service to collect and export the tech-support data as follows:-

a. Collect tech-support data when a new core dump is discovered, and export it to a remote server.
Contributor

How big is the tech support data collected?
Can we register an app/plugin to customize the data collected?

Contributor Author

The techsupport data size is around 4MB. The 'generate_dump' script can be extended to collect custom data.

### Config commands

>1. Config command to enable/disable the coredump generation of processes.
>2. Config command to store the details of exporting tech-support data to an external server which includes remote server name, path, transfer protocol type and the user credentials.
Contributor

Can we register an external script to be invoked for uploading?

This way we can extend the ability to store anywhere, for example azure-cloud-storage.

Contributor Author

No external script. It uses the standard scp/sftp protocols to upload the techsupport data to the remote server.
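
For readers unfamiliar with the scp form of such an upload, a minimal sketch follows. It covers scp only and assumes key-based authentication, since scp takes no password argument; the function name `build_scp_command` and its parameters are hypothetical and not taken from this PR.

```python
def build_scp_command(archive, user, host, remote_path):
    """Build the argv for copying a techsupport archive to a remote server.

    Uses the standard scp form: scp <local-file> <user>@<host>:<remote-path>.
    Authentication is assumed to be key-based (an assumption of this sketch).
    """
    return ["scp", archive, f"{user}@{host}:{remote_path}"]
```

The resulting list can be passed to `subprocess.run` without a shell, which avoids quoting problems in file names.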

Contributor

Sure. But I think accepting a script (with full path) as part of the remote server configuration (which takes server address, user, pass, ...) would make this transparently extensible. You already have the FW, so adding this should be simple, I believe.

Contributor

Side note:
You don't mean "coredump generation of processes" here, but rather "coredump upload", correct?


>1. Config command to enable/disable the coredump generation of processes.
>2. Config command to store the details of exporting tech-support data to an external server which includes remote server name, path, transfer protocol type and the user credentials.
>2. Config command to enable/disable the tech-support export
Contributor

To clarify:
Does disable imply "uploading the core file only, w/o accompanying tech support data"?

Contributor Author

Disable applies to the whole techsupport data export, not to core file generation.

Contributor

Please help clarify:
Both the core file and the tech-support data go to the same server, correct?
Does the core file get packaged inside the tech-support data?

### Config commands

>1. Config command to enable/disable the coredump generation of processes.
>2. Config command to store the details of exporting tech-support data to an external server which includes remote server name, path, transfer protocol type and the user credentials.
Contributor

When a process cores a lot, core rotation would help keep the local count of core files at a set value, say N.
To clarify:
What about uploading? Will it upload every core file, implying that the remote server could end up with way more than N if that process keeps crashing?

Contributor Author

It uploads the current snapshot of the techsupport data.


# Tech-support export service

The tech-support data is vital information for debugging a system and is captured by collecting the device configuration, system information, log files and core files. The export service captures the tech-support data and exports it to a remote server for better offline debugging. The tech-support data is captured and exported under the following conditions:
Contributor

Is this part of the systemd-coredump service?
Will we be able to provide a custom script to generate tech-support data?

Contributor Author

The export service is a separate service. The standard "generate_dump" script is used for generating the techsupport data.

![Tech Support export Service](images/corefilemgr.png)


The export service is configured to monitor the coredump path for any new core file creation. Upon detection of a new core file, it triggers the tech-support data collection and exports it to a remote server. In addition, the export service can be configured to capture and upload the tech-support data periodically.
Contributor

Is the export service part of the systemd-coredump service?

Contributor Author

No, it is a separate service.

![Tech Support export Service](images/corefilemgr.png)


The export service is configured to monitor the coredump path for any new core file creation. Upon detection of a new core file, it triggers the tech-support data collection and exports it to a remote server. In addition, the export service can be configured to capture and upload the tech-support data periodically.

Can we also add to the tech support a text file with the backtrace of the crash?
This is a quick way to see the backtrace, instead of having to open the full core and look for the correct binary version symbols.
It can be generated by "gdb thread apply all bt full" on the core, to a file.
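
One way the suggested backtrace file could be produced is sketched below, using gdb's `-batch` and `-ex` options to run `thread apply all bt full` non-interactively. The helper names, and the assumption that the crashing binary's path is known, are illustrative only and not part of this PR.

```python
import subprocess


def build_backtrace_command(binary, core):
    """Build a gdb argv that prints a full backtrace of every thread and exits."""
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core]


def write_backtrace(binary, core, out_path):
    """Run gdb on the core and save its output (stdout and stderr) to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(
            build_backtrace_command(binary, core),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,  # gdb may exit non-zero on a damaged core; keep the partial output
        )
```

The backtrace is only as readable as the symbols available on the device, so a stripped binary would still need matching debug symbols offline.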

Contributor Author

Yes, it can be added.


The [systemd-coredump](https://www.freedesktop.org/software/systemd/man/systemd-coredump.html) is a native systemd tool that is available in Debian OS version 9 (stretch) and above. This tool provides an array of features to manage application core files. When it is installed, its base configuration provides the following functionality:

1. Configures the kernel to dump a core when an application exits unexpectedly. The process ID, UID, GID, signal received, time of termination, and command name of the terminated process are collected. Core dump generation may be affected by ulimit settings. Care should be taken that ulimit settings do not conflict with the systemd-coredump configuration.
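
Since the text above warns about ulimit settings, here is a small sketch of how the core size limit of the current process could be inspected with Python's standard `resource` module. Treating a soft limit of zero as "core dumps disabled" is an assumption of this sketch, and the helper names are hypothetical.

```python
import resource


def current_core_limit():
    """Return the soft RLIMIT_CORE value of the calling process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft


def core_limit_allows_dump(soft_limit):
    """Assumed rule: a soft limit of 0 disables dumps; anything else allows them."""
    return soft_limit != 0
```

The check only reflects the process that runs it; a daemon started by systemd may have a different limit set in its unit file.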
Contributor

Is there a way to add a few more data items to the record maintained per core, like the SONiC version?

Contributor Author

Can be added



The export service is configured to monitor the coredump path for any new core file creation. Upon detection of a new core file, it triggers the tech-support data collection and exports it to a remote server. In addition, the export service can be configured to capture and upload the tech-support data periodically.

Contributor

What is the cost of this feature with respect to the size of the image?
Will it be possible to build an image w/o this feature?
This matters mostly if the feature turns out to be expensive.

Contributor Author

Around 4MB. By default this service is disabled.


### Config DB Schema

In order to export the tech support data, remote server details have to be configured on the device. Through the CLI, an external storage server can be configured, including the server IP, path, and access information such as user credentials and transport protocol. This information is stored as part of the config DB.
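
To make the shape of such a configuration entry concrete, the sketch below validates a hypothetical entry before it would be written to the config DB. The field names (`servername`, `destdir`, `username`, `protocol`) are assumptions made for illustration and are not the schema defined by this PR; only the scp/sftp protocol choice comes from the discussion.

```python
ALLOWED_PROTOCOLS = {"scp", "sftp"}  # protocols named in the review discussion
REQUIRED_FIELDS = ("servername", "destdir", "username", "protocol")  # hypothetical field names


def validate_export_config(entry):
    """Return a list of problems found in a remote-server config entry."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not entry.get(name)]
    protocol = entry.get("protocol")
    if protocol and protocol not in ALLOWED_PROTOCOLS:
        errors.append(f"unsupported protocol: {protocol}")
    return errors
```

An empty list means the entry passed these basic checks; credentials handling is deliberately left out of the sketch.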
Contributor

Thought that the tech-support-data accompanies the core file. If so, this remote server config is for both core & tech-support-data? Is that correct?

Contributor Author

The remote server config is for the techsupport data, which includes the core file as well.
