
Performance #10

Closed
johanfforsberg opened this issue Oct 28, 2015 · 5 comments

@johanfforsberg
While investigating the possibility of using cpppo in "production" to read thousands of tags as quickly as possible, I've noticed that performance is limited by CPU usage rather than by the network or the PLC. Some testing with PyPy showed a significant increase in throughput (similar to the library we're using now, which is written in C), but still apparently limited by the CPU.
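For reference, the reads go through cpppo's client connector, roughly like the following minimal sketch (based on the documented client API; the hostname and the abbreviated tag list here are placeholders):

    from cpppo.server.enip import client

    host = "my-plc-hostname"                 # placeholder PLC hostname
    tags = ["B_ProgDisable_C", "B_Reset_C"]  # abbreviated; the real list has thousands

    with client.connector(host=host) as conn:
        # parse_operations turns tag strings into read operations;
        # synchronous() issues each request and waits for its reply
        # before sending the next.
        for idx, descr, op, reply, status, value in conn.synchronous(
                operations=client.parse_operations(tags)):
            print("%3d: %-30s == %s" % (idx, descr, value))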

What are your thoughts about performance? Have you considered options like Cython for optimizing "bottlenecks"?

@pjkundert
Owner

Yes, I've noticed significant performance constraints in the processing on the "client" side of EtherNet/IP CIP requests. I haven't looked into it in much detail; PyPy seems to help, but it really shouldn't be that slow. I'll take a look, too, and see what I can find out.

Let's take a look at the mix of requests you're trying to parse, and get this code tightened up for you; you should be able to use this efficiently in production. I'm on my way back from Munich over the next 36 hours, so I may not be immediately responsive...

@johanfforsberg
Author

Sounds great! I won't have access to a PLC until Monday anyway, so it'll have to wait if you want full info. But I've essentially been using the "getattr.py" script in server/enip to do the testing, and the tags were a mixed bunch of single (non-array) types.

@johanfforsberg
Author

OK, I finally have some time to sit down with a PLC. It's a CompactLogix.

I'm running this command (these are all boolean tags):

python -m cpppo.server.enip.thruput -d 4 -m 420 -r 1000 -a w-kitslab-compactlogix-0 B_ProgDisable_C B_DigitalAlarmTag_C B_AutoChangeAlarmValue_C B_ProgAck_AD_C B_ProgAckAll_HB B_ProgEnable_HB B_Reset_C FB_ALMA01_AA.HHProgAck FB_ALMA01_AA.HHOperAck FB_ALMA01_AA.HProgAck FB_ALMA01_AA.HOperAck FB_ALMA01_AA.LProgAck FB_ALMA01_AA.LOperAck FB_ALMA01_AA.LLProgAck

... and the output, minus the very long slab of individual tag values, is:

14000 operations using 876 requests in   32.48s at pipeline depth  4; 431.0 TPS
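(That works out to 14000 operations / 32.48 s ≈ 431 TPS, with 14000 / 876 ≈ 16 reads packed into each request.)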

I came up with the numbers for depth and multiple by experimentation; larger numbers either gave errors or did not increase performance noticeably. I am wondering a bit about the -m number; normally we're able to use a request size of almost 500 bytes, but I don't know if this corresponds exactly to that number.
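The programmatic equivalent of that command is roughly the following sketch (I'm assuming conn.pipeline() accepts depth and multiple keywords corresponding to the -d/-m flags; the tag list is abbreviated):

    from cpppo.server.enip import client

    host = "w-kitslab-compactlogix-0"
    tags = ["B_ProgDisable_C", "B_DigitalAlarmTag_C", "B_Reset_C"]  # abbreviated

    with client.connector(host=host) as conn:
        # Keep up to 4 requests in flight, packing reads into Multiple
        # Service Packet requests of up to 420 bytes.
        for idx, descr, op, reply, status, value in conn.pipeline(
                operations=client.parse_operations(tags), depth=4, multiple=420):
            print("%3d: %-30s == %s" % (idx, descr, value))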

Running the same command with PyPy (only increasing -r to 10000 to account for JIT warm-up time) gives a much better result:

140000 operations using 8751 requests in  106.44s at pipeline depth  4; 1315.2 TPS

However, in both tests the CPU is pegged at 100%, suggesting that the bottleneck is not the network or the PLC. A similar test using a library written in C (https://github.com/EPICSTools/ether_ip.git) gives performance roughly at the PyPy level, while causing no measurable CPU load.
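If it helps to narrow down where the CPU time goes, I can profile one batch with the standard library, along these lines (a sketch; read_tags() is a hypothetical stand-in for the read loop above):

    import cProfile
    import pstats

    # Profile a single batch of reads; read_tags() is a placeholder for
    # the pipelined read loop shown above.
    cProfile.run("read_tags()", "thruput.prof")

    # Print the 20 entries with the highest cumulative time.
    pstats.Stats("thruput.prof").sort_stats("cumulative").print_stats(20)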

Tell me if you need more details about any of this, or if there are other interesting tests I should perform.

@datasim

datasim commented Nov 4, 2015

Interesting. Well, the PyPy test tells us that we are probably able to reach the bandwidth and/or PLC-capacity limits of performance, but at ~100% CPU usage.

I've been working on a branch 'feature-performance' in the cpppo Git repo; give that a try. So far it only gives me a ~5 to 10 percent improvement. I'm still working on this; I can't put my finger on exactly why parsing responses is still so expensive, but I am making progress.
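To try the branch, something along these lines should work (assuming the usual GitHub remote and an editable pip install):

    git clone https://github.com/pjkundert/cpppo.git
    cd cpppo
    git checkout feature-performance
    pip install -e .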

-pjk

@pjkundert
Owner

I can now get up to 300 TPS using CPython 2/3, and up to 700 TPS using PyPy, on my i7 Mac. There's still work to do, but performance is probably no longer at the top of the priority list...

pjkundert added a commit that referenced this issue Jan 30, 2018
# This is the 1st commit message:

Initial foray into support for generic CIP Service Code requests

# This is the commit message #2:

No requirement for existence of .multiple segment in failed responses

# This is the commit message #3:

Correct handling of service_code operations in client connector I/O

# This is the commit message #4:

HART Requests almost working
o Cannot derive HART from Logix; service codes overlap

# This is the commit message #5:

Initial working HART I/O card request

# This is the commit message #6:

Support intermixed Tags and already-parsed operations in parse_operations

# This is the commit message #7:

Test and decode the Read primary variable response, however:
o Still broken; the CIP Encapsulation path is still supposed to be to the
  Connection Manager @0x06/1!  The 0x52 Route Path is Port 1, Address 2,
  and the message path should be to @0x035D/8.

# This is the commit message #8:

Success.  Still needs cleanup

# This is the commit message #9:

Further attempts to refine HART pass-thru.
o HART I/O card is not responding as defined in documentation

# This is the commit message #10:

Cleanups for python3, source analysis, unit tests

# This is the commit message #11:

Attempt to parse Read Dynamic Variables reply; 3 unrecognized bytes?

# This is the commit message #12:

Update to attempt to parse real HART I/O card response
o Minimal Read Dynamic Variables status response?  Not successful
o Implement minimal simulated pass-thru Init/Query, HART commands 1,2,3
o Minor changes to client.py Send RR Data, to have timeout and ticks
  compatible with RSLogix; no difference