Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make embedded diagnostic more human readable #886

Merged
merged 10 commits into from
Jul 28, 2023

Conversation

valegagge
Copy link
Member

@valegagge valegagge commented Jun 6, 2023

In this PR, I created the new diagnostic formatter and a parser for each diagnostic message class.

The goal of this work is to start to improve the robot's embedded diagnostic making it more human-readable.

Overview of the code

I developed a formatter that is called on the reception of a diagnostic message.
Depending on the diagnostic class (Motioncontrol, Hardaware,Configuartion, etc ) the formatter calls the propper parser.

Class Diagram
Screenshot from 2023-06-06 14-38-48
The DefaultParser prints the diagnostic information as now.

Sequence Diagram: yarprobotinterface receives an error belonging to the motion-control class
Screenshot from 2023-06-06 14-39-44

NOTE
If an unknown error is received or the parser doesn't know how to parse the message the default parser is invoked: it prints the massage like it now.

How does this PR affect the developer?

Change param16 and param64

If you (as a developer) want to change the contents of param16 and param64 of the error e belonging to the error-class c, you need to update also the parsing of the message in the c class parser.

Add a new error e belonging to the already existing class c

First of all, you need to update the LUT in the icub-firmware-shared as usual.
If you want to add a human-readable parser, you should add the parser case in the c class parser, otherwise, the default parser will be used when yarprobotinterface receives this new error e.

Files

The new files are:

  • diagnosticInfoFormatter.h/cpp: contains the formatter definition/implementation and some auxiliary class
  • diagnosticInfoParser.cpp: contains the definition of all new parsers
  • diagnsoticInfo.h/cpp: contains the definition/implementation of the interface class EbeddedInfo
  • diagnosticInfoFormatter_hid.h: all auxiliary classes are here defined.

Test performed

Currently, this code has been tested on the small joint setup and on iCubGenova11.

cc @MSECode

@valegagge
Copy link
Member Author

This PR relies on the robotology/icub-firmware-shared#84 PR.

@pattacini
Copy link
Member

This PR relies on the robotology/icub-firmware-shared#84 PR.

Therefore, we ought to keep it in draft 😃

@pattacini pattacini marked this pull request as draft June 6, 2023 13:00
@valegagge
Copy link
Member Author

As @marcoaccame suggested, I updated the note explaining the needed step in case a developer wants to add a new message or change an existing one.

Copy link
Contributor

@marcoaccame marcoaccame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is quite big, it touches a lot of diagnostics messages (216) and it proposes a new way to process them. So, I have preferred to analyze things first: at first a recap of how it is working now, then what is the proposed change and finally the review.

All this is useful for me as well, so that I can recall what has been done all over the years.

In robotology

In here, a recap of the operations done at reception of each diagnostic message.

  • Every 1 ms the RX thread in icub-head processes all the UDP frames coming from a board. Inside each frame there can be some ROPs with a diagnostic message w/ a given {code, par16, par64}.

  • The code identifies one of the 216 errors, which belong to 9 categories, the par16 and par64 contain parameters specific to a particular error. So, for example if code refers to eoerror_value_SYS_ctrloop_execoverflowRX from category eoerror_category_System which is emitted when duration of the control loop RX phase exceeds the maximum time, par16 and par64 contain the packed information of the latest 5 durations of the all the phases of the control loop.

  • For each diagnostic ROP the RX thread executes the callback s_eoprot_print_mninfo_status() which takes care of producing a meaningful string for the received {code, par16, par64}.

  • For the majority of messages we just pick up a constant string from a LUT inside icub-firmware-shared , append the stringised par16 and par64 and print. This can be done in efficient way but w/ poor readability.

  • Over the time we produced a formatted string only for some messages which we felt more meaningful. They are the following.

    • eoerror_value_CFG_candiscovery_started, _ok, _detectedboard, _boardsmissing, _boardinvalid. They are processed inside function s_process_category_Config() which prints the result of a CAN discovery done by the ETH board.
    • eoerror_value_SYS_canservices_canprint. The CAN PRINT messages contains a string split inside a burst of CAN frames and a eoerror_value_SYS_canservices_canprint diagnostic message contains just one CAN frame. These messages are managed inside EthResource::CANPrintHandler() which collects several values of par64 (containing the actual CAN messages of the CAN PRINT) and when the burst is finished it produces the string. It is thecan_string_eth classe which does the job. It must be persistent and have a memory for each CAN board below each ETH board. So it was placed inside EthResource in form of one instance per CAN board below the managed ETH board.

In this PR

This Pr adds more readability to all the other messages in the following way.

  • The entry point is still s_eoprot_print_mninfo_status().

  • The management of messages eoerror_value_SYS_canservices_canprint does not change and stays inside EthResource.

  • All the other messages are managed by the new function feat_manage_diagnostic() .

  • The eoerror_value_CFG_candiscovery_started etc. are now processed inside feat_manage_diagnostic() with the same code retrieved from s_process_category_Config() and placed inside a new class.

  • The function feat_manage_diagnostic() uses some new classes which produce the string for each message. The new classes are all created in runtime starting from function and then destroyed or are automatic variables constructed and destructed. They are the ones described in here.

    • Diagnostic::EmbeddedInfo: just contains the strings to be printed for the message,

    • Diagnostic::LowLevel::InfoFormatter: is the wrapper class to the various parsers and fills information inside EmbeddedInfo.

    • In its inside, InfoFormatter creates and destroys one class derived from DefaultParser which does the actual job. The specific class depends on the eOerror_category_t of the error. They are 8 + the default parser: ConfigParser , etc.. The class for eoerror_category_Debug is missing but for that it is used the DefaultParser.

The review

There are two points: one related to the behavior of the code, the second related to the SW architecture.

The behavior

There are some small changes related to the management of some messages that should correct most things. See the table.

error value note
eoerror_value_SYS_canservices_canprint The relevant code inside SysParser::parseInfo() is never executed, so it is better to remove it and add a comment where the decoding happens.
eoerror_value_CFG_candiscovery_started, _ok, _detectedboard, _boardsmissing, _boardinvalid The code inside function s_process_category_Config() is not needed anymore, it is better to clean it up.
eoerror_value_SYS_ctrloop_execoverflowRX The par64 is parsed incorrectly. It is not separated in the four values and it prints just one big number.
eoerror_value_SYS_ctrloop_execoverflowDO same as before
eoerror_value_SYS_ctrloop_execoverflowTX same as before
eoerror_value_SYS_canservices_parsingfailure It does not contain the CAN payload contained in par64
eoerror_value_SYS_canservices_monitor_lostcontact, eoerror_value_SYS_canservices_monitor_stillnocontact OK thanks. I have seen that @MSECode has added them. Just pls specify that the time is in ms.
eoerror_value_SYS_proxy_forward_callback_fails and others inside the specialize parser Some of them have a non zero par16 and par64 which are not printed. So it can be useful for them to print the values just in case. Having said that, messages like that are probably unlikely to be seen.
Those inside eoerror_category_Debug Just a memo: the specialized class is not implemented, so for them it is used the DefaultParser.

The architecture

The new classes used by feat_manage_diagnostic() such as Diagnostic::LowLevel::InfoFormatter etc are created and destroyed each time, so potentially many times in a ms. Typically we don't create and destroy in runtime, and possibly at that high frequency, to avoid un-necessary processing in a critical point.

The classes could be instantiated in advance and stay alive for all the duration of yarprobotinterface. In this way also we could also manage diagnostics messages that must be decoded in bursts such as the eoerror_value_SYS_canservices_canprint or future ones. If we keep a memoryless approach we could not for instance move what inside EthResource to process the CAN prints.

It is just a suggestion of what I would have done. About the decision I prefer to keep it under discussion.

eOerror_category_t category = eoerror_code2category(m_infobasic->properties.code);
switch(category)
{
case eoerror_category_Config: parser_ptr = std::make_unique<ConfigParser>(dnginfo, entityNameProvider); break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside switch(category) the pointer parser_ptr uses a class that is created and then destroyed at each call of this function getDiagnosticInfo() which is called at the reception of every diagnostics message received by yarprobotinterface, so potentially many times every ms.

That could be heavy.

To avoid the continuous creation and destruction of the classes, an instance of ConfigParser, MotionControlParser etc could be created at startup and stay alive for the whole life of yarprobotinterface. The function getDiagnosticInfo() would just assign to parser_ptr the pointer to the relevant class.

Copy link
Member

@pattacini pattacini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, update icub-firmware-shared tag and check.

@valegagge
Copy link
Member Author

Currently, @MSECode and I are doing some fixes related to the TODO left in the code and the changes required in the PR of fw shared PR: robotology/icub-firmware-shared#84

Copy link
Member

@pattacini pattacini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me 👍🏻

Don't know what is the status of the PR wrt @marcoaccame's comments/requests.

Any updates @valegagge @MSECode?

@valegagge
Copy link
Member Author

valegagge commented Jul 27, 2023

Today @MSECode and I tested this PR and its related PR on icub-fw-shared robotology/icub-firmware-shared#84

During the test we check some errors like fw version wrong, disconnected a mais, etc

All work fine. So we can merge.

We increase of the required fw version to 1.35.1

cc @marcoaccame

@pattacini
Copy link
Member

Great!

Awaiting robotology/icub-firmware-shared#84 to be merged before turning this PR into ready for review to enable the CI.

@marcoaccame
Copy link
Contributor

marcoaccame commented Jul 28, 2023

Great!

Awaiting robotology/icub-firmware-shared#84 to be merged before turning this PR into ready for review to enable the CI.

icub-firmware-shared is merged now! yes, we can wait for the CI and merge.

@pattacini can you pls change the status of the PR into ready for review?

@pattacini pattacini marked this pull request as ready for review July 28, 2023 07:41
@pattacini
Copy link
Member

@pattacini can you pls change the status of the PR into ready for review?

Done 👍🏻

@pattacini pattacini merged commit 17decbc into robotology:devel Jul 28, 2023
4 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants