Skip to content

Latest commit

 

History

History
202 lines (170 loc) · 12.1 KB

README.md

File metadata and controls

202 lines (170 loc) · 12.1 KB

RTI LogParser Tutorial

Quick-start guide for RTI Log Parser. Learn how to master RTI® Connext® DDS logs in a few minutes.

Set Up

  1. Install Python.

  2. Download the GitHub repository with the program and tutorial files.

    1. Get the zip file from here.
    2. Or git clone the repository by running: git clone https://github.com/rticommunity/rticonnextdds-logparser
  3. Optionally, create a standalone bundle for the program by running the script ./create_redist.sh on Unix.

To run the program, use: python rtilogparser.py or just ./rtilogparser.

Part 1 - Understanding logs

The first part of this tutorial covers how to understand logs with RTI Log Parser, using the log file logs/log1.txt. This log file was generated by enabling the highest verbosity level in the asynchronous example of the RTI Community Forum. If you can't run Log Parser, you can get a preview of the output in the file logs/parsed1.md.

  1. Open the log file logs/log1.txt with a text editor. As you can see, it contains about 1,000 lines of information that is not easy to understand. Let's take as an example one of the latest lines:

    COMMENDSrWriterService_agentFunction: writer oid 0x80000003 sends sn (0000000000,00000050)
    

    From this log we know that a DataWriter for a non-keyed topic has sent the sample with sequence number 50.

  2. Let's try to get the same information with Log Parser. Run the following command: python rtilogparser.py -i tutorial/logs/log1.txt. Now it's easier to understand the behavior of the application:

    |         |                | [App] Writing async, count 48
    |         |   W-K_800000   | Scheduled DATA [49]
    |         |                | [App] Writing async, count 49
    |         |   W-K_800000   | Scheduled DATA [50]
    |   <---  |   W-K_800000   | Sent DATA [41]
    |   <---  |   W-K_800000   | Sent DATA [42]
    |   <---  |   W-K_800000   | Sent piggyback HB to acknowledge samples in [39, 42]
    |   <---  |   W-K_800000   | Sent DATA [43]
    

    The above lines show how an asynchronous DataWriter works: when calling write(), the sample is scheduled. Later, the flow-controller sends it. The samples are also acknowledged by the remote DataReader (the samples between 39 and 42).

  3. We can find out the delay between calling write() and sending a sample in an asynchronous write by enabling the timestamps. For instance: python rtilogparser.py -i tutorial/logs/log1.txt --show-timestamp

    2016-05-11T16:25:25.018000 |          |      | W-K_800000 | Scheduled DATA [38]
    ...
    2016-05-11T16:25:25.398995 |          | <--- | W-K_800000 | Sent DATA [38]
    ...
    2016-05-11T16:25:25.400995 | H3.A2.P1 | ---> | W-K_800000 | Received ACKNACK [2] from reader R-K_800000 for 38 +0
    

    Sample 38 was in the flow-controller queue for 381 ms before being sent to all the matching DataReaders. DataReader 800000 acknowledged the sample after 2 ms.

  4. We can get information about the entities in our applications, but also about the remote entities discovered. This information is at the beginning of the parsed file:

    |           |  | Created participant, domain: 206 index: 0
    |           |  | Created publisher
    |           |  | Created topic, name: 'Example async', type: 'async'
    |           |  | Created writer for topic 'Example async'
    ...
    | H3.A2     |  | Discovered new participant (H3.A2.P1)
    | H3.A2.P1  |  | Discovered new reader R-K_800000
    
  5. For simplicity, we assign an incremental number for the hosts, applications, and participants. H3.A2.P1 means first participant discovered, in second application discovered, from third host machine discovered. At the end is a summary of the entities with their IP addresses:

    ### Assigned names:
    * Host H3: 10.70.2.213
        * App H3.A1: 06076
            * Participant H3.A1.P1: 1
        * App H3.A2: 01264
            * Participant H3.A2.P1: 1
    * Host H1: 239.255.0.1
    * Host H2: 127.0.0.1
    Number of hosts: 3
    Number of apps:  2
    Number of participants: 2
    

    Network statistics:

    ### Bandwidth statistics:
    * Address: SHMEM
        * receive: 18.61 MB (3.72 MB/s)
    
    ### Packet statistics:
    * GUID: H3.A1
        * receive: 14 packets
            * INFO_TS: 7 (50.0%)
            * DATA: 7 (50.0%)
    * GUID: W-K_800000
        * send: 23 packets
            * DATA: 20 (87.0%)
            * PIGGYBACK HB: 3 (13.0%)
    

    Threads information:

    ### Threads Information:
    * Number of threads: 5
    * Receive threads
        * Number of threads: 2
        * Thread name: rR0120617bc01
            * ID: -1
            * Priority: 2
            * Stack size: 11
    
  6. The argument --show-ip always shows the IP and port address instead of the alias H3.A2.P1. The argument --show-lines shows a new column with the equivalent log line in the file and the current output line.

  7. You can get extra information about the discovery phase by adding the argument -v. For instance:

     H3.A2   |      |                | Assert participant (H3.A2.P1 PARTICIPANT)
    H3.A2.P1 | ---> | SED_SUB_READER | Received HB [1] from writer SED_SUB_WRITER for samples in [1, 1]
    H3.A2.P1 | <--- | SED_SUB_READER | Sent NACK [2] to writer SED_SUB_WRITER for 1 count 1
    H3.A2.P1 | ---> | SED_PUB_READER | Received HB [0] from writer SED_PUB_WRITER for samples in [1, 0]
    H3.A2.P1 | ---> | SED_SUB_WRITER | Received ACKNACK [0] from reader SED_SUB_READER for 0 +0
             | <--- | SED_SUB_WRITER | Sent piggyback HB [2] from synchronous reparation to acknowledge samples in [1, 0]
    H3.A2.P1 | ---> | SED_PUB_WRITER | Received ACKNACK [0] from reader SED_PUB_READER for 0 +0
             | <--- | SED_PUB_WRITER | Sent piggyback HB [2] from synchronous reparation to acknowledge samples in [1, 1]
    H3.A2.P1 | ---> | SED_SUB_READER | Received DATA(r) [1] from writer SED_SUB_WRITER (reliable)
    H3.A2.P1 |      |                | Discovered new reader R-K_800000
    H3.A2.P1 |      |                | Assert entity R-K_800000
    
  8. More verbose output with transport related information is shown with the -vv flag. For instance:

    SHMEM:(206.1 UsUn) | <--- | | Sent data
    SHMEM              | ---> | | Received 64 bytes
    SHMEM              | ---> | | Received 64 bytes
    H3.A2              | ---> | | Received INFO_DST packet
    H3.A2              | ---> | | Received ACK packet
    
  9. If your log contains sensitive information like topic names or IP address, you can obfuscate them with the --obfuscate flag. It will use a random seed in every run. To force a seed and have consistent results, use the the --salt argument. For instance: python rtilogparser.py -i tutorial/logs/log1.txt --obfuscate --salt 1234

    | | | Created topic, name: 'd3f54b1ef13dea39c58ecaba38aed042', type: 'd5dcd65115ef85d94a2f0e407b5601c7'
    | | | Created writer for topic 'd3f54b1ef13dea39c58ecaba38aed042'
    ...
    ### Assigned names:
    * Host H1: bad79e871897ebf
    * Host H3: d8bbfc4ecabc704
        * App H3.A2: b858b
    
  10. Log Parser is able to get log input from the standard input, so you can use pipes. For instance, consider that the Routing Service configuration increased the log verbosity level; you can do: rtiroutingservice -cfgFile myconfig.xml -cfgName myconfig | python rtilogparser.py.

  11. Finally, you can redirect the output with the --output argument. The output is in Markdown format and can be renderized with editors like Atom.

Take into account that this log is from two applications communicating through Shared Memory. Out-of-the-box it's not possible to use Wireshark to get all this information

Part 2 - Validating an application: warnings and errors

The second part of this tutorial covers how to leverage Log Parser to validate the usage of the middleware. Log Parser is able to detect common errors and mistakes in the usage of the API and network issues. This tutorial uses the log file logs/log2.txt, which contains log entries from several applications. The output from Log Parser is logs/parsed2.md.

  1. Run Log Parser as follows: python rtilogparser.py -i tutorial/logs/log2.txt --colors --show-timestamp. Log Parser will be able to detect the following warning and errors:

    ## Warnings:
     0. 1x [LP-21] Decreased message_size_max for UDPv4 from 65530 to 65507
     1. 1x [LP-20] The OS limits the receive socket buffer size from 1048576 to 425984 bytes
     2. 1x System clock went backward by 1:00:02.182831.
     3. 1x System clock went forward by 0:59:58.269157.
     4. 1x Missing sample from H1.A1.P1.W-K_800000 to R-K_800000
     5. 1x [LP-11] DataReader exceeded resource limits
     6. 1x [LP-10] DataWriter exceeded resource limits
     7. 1x 2 discovery samples lost for publication SED_PUB_READER (2 in total)
    
    ## Errors:
     0. 1x Durability QoS for local reader (TransientLocal) is incompatible with remote writer (Volatile)
     1. 1x [LP-8] Cannot serialize sample
     2. 1x [LP-13] Write maximum blocking time expired
     3. 1x [LP-5] Try to get key from unkeyed type.
     4. 1x [LP-14] Cannot write because DataWriter has been deleted
     5. 1x [LP-3] Cannot write unregistered instance.
    

    Some of these problems are:

    1. The QoS was configured to use a UDPv4 message size higher than the maximum size allowed by the protocol.
    2. The QoS was configured to use a higher socket buffer size than the hard limit in the OS.
    3. The system clock changed at run-time by one hour.
    4. There are some sample losses in the network.
    5. DataReader and DataWriter reached the resource limits.
    6. Incompatible entities were found.
    7. There was a problem serializing samples.
    8. The program tried to retrieve a key from a non-keyed sample.
    9. write() was called on a deleted DataWriter.
  2. Most of the errors have an ID code associated with them, such as LP-21. This means that there is more information about this specific message like why it happened? and how to fix it? in the file MessageCodes.md at the top level of this repository. In the case of LP-21, the detailed information is:

    The transport X reduced the value for the property message_size_max from Y to Z. The reason is that the property is greater than the maximum payload possible for the transport. For instance, consider the UDPv4 protocol, the maximum payload is 65535 - 8 (UDP header) - 20 (min IP header) = 65507. The middleware gets the protocols overheads from the property protocol_overhead_max.

Part 3 - Reviewing RTI Connext DDS Micro logs

Connext DDS Micro can be configured to print any Info, Warning, or Error message. These messages, although useful, can be tricky to interpret since the information is given by numbers that need to be searched in the Connext DDS Micro User's Manual. (You can read how to interpret Micro logs at this link.) Conveniently, Log Parser directly shows this information from the documentation. For instance, given the following log:

[943921909.645099999]ERROR: ModuleID=7 Errcode=200 X=1 E=0 T=1
dds_c/DomainFactory.c:163/DDS_DomainParticipantFactory_get_instance: kind=19

RTI LogParser will show:

Error: [DDSC_LOG_OBJECT_INITIALIZE_EC] Out of resources to initialize object of the specified kind.

which is the same information from the Connext DDS Micro User's Manual (Module 7, searching for error code (200).

Support for trace logs is on the road-map (see issue [#41] (#41) for more information).