
Performance #10

Closed
johanfforsberg opened this issue Oct 28, 2015 · 5 comments

@johanfforsberg
While investigating the possibility of using cpppo in "production" to read thousands of tags as quickly as possible, I've noticed that performance is limited by CPU usage rather than by the network or the PLC. Some testing with PyPy showed a significant increase in throughput (similar to the library we're using now, which is written in C), but still apparently limited by the CPU.
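For reference, the reads go through cpppo's client connector, roughly like the following minimal sketch (based on the documented client API; the hostname and the abbreviated tag list here are placeholders):

    from cpppo.server.enip import client

    host = "my-plc-hostname"                 # placeholder PLC hostname
    tags = ["B_ProgDisable_C", "B_Reset_C"]  # abbreviated; the real list has thousands

    with client.connector(host=host) as conn:
        # parse_operations turns tag strings into read operations;
        # synchronous() issues each request and waits for its reply
        # before sending the next.
        for idx, descr, op, reply, status, value in conn.synchronous(
                operations=client.parse_operations(tags)):
            print("%3d: %-30s == %s" % (idx, descr, value))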

What are your thoughts about performance? Have you considered options like Cython for optimizing "bottlenecks"?

@pjkundert
Owner

Yes, I've noticed significant performance constraints in the processing on the "client" side of EtherNet/IP CIP requests. I haven't looked into it in much detail; PyPy seems to help, but it really shouldn't be that slow. I'll take a look, too, and see what I can find out.

Let's take a look at the mix of requests you're trying to parse, and get this code tightened up for you; you should be able to use this efficiently in production. I'm on my way back from Munich over the next 36 hours, so I may not be immediately responsive...

@johanfforsberg
Author

Sounds great! I won't have access to a PLC until Monday anyway, so it'll have to wait if you want full info. But I've essentially been using the "getattr.py" script in server/enip to do the testing, and the tags were a mixed bunch of single (non-array) types.

@johanfforsberg
Author

OK, I finally have some time to sit down with a PLC. It's a CompactLogix.

I'm running this command (these are all boolean tags):

python -m cpppo.server.enip.thruput -d 4 -m 420 -r 1000 -a w-kitslab-compactlogix-0 B_ProgDisable_C B_DigitalAlarmTag_C B_AutoChangeAlarmValue_C B_ProgAck_AD_C B_ProgAckAll_HB B_ProgEnable_HB B_Reset_C FB_ALMA01_AA.HHProgAck FB_ALMA01_AA.HHOperAck FB_ALMA01_AA.HProgAck FB_ALMA01_AA.HOperAck FB_ALMA01_AA.LProgAck FB_ALMA01_AA.LOperAck FB_ALMA01_AA.LLProgAck

... and the output, minus the very long slab of individual tag values, is:

14000 operations using 876 requests in   32.48s at pipeline depth  4; 431.0 TPS
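(That works out to 14000 operations / 32.48 s ≈ 431 TPS, with 14000 / 876 ≈ 16 reads packed into each request.)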

I came up with the numbers for depth and multiple by experimentation; larger numbers either gave errors or did not increase performance noticeably. I am wondering a bit about the -m number; normally we're able to use a request size of almost 500 bytes, but I don't know if this corresponds exactly to that number.
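The programmatic equivalent of that command is roughly the following sketch (I'm assuming conn.pipeline() accepts depth and multiple keywords corresponding to the -d/-m flags; the tag list is abbreviated):

    from cpppo.server.enip import client

    host = "w-kitslab-compactlogix-0"
    tags = ["B_ProgDisable_C", "B_DigitalAlarmTag_C", "B_Reset_C"]  # abbreviated

    with client.connector(host=host) as conn:
        # Keep up to 4 requests in flight, packing reads into Multiple
        # Service Packet requests of up to 420 bytes.
        for idx, descr, op, reply, status, value in conn.pipeline(
                operations=client.parse_operations(tags), depth=4, multiple=420):
            print("%3d: %-30s == %s" % (idx, descr, value))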

Running the same command with PyPy (only increasing -r to 10000 to account for JIT warm-up time) gives a much better result:

140000 operations using 8751 requests in  106.44s at pipeline depth  4; 1315.2 TPS

However, in both tests the CPU is pegged at 100%, suggesting that the bottleneck is not the network or the PLC. A similar test using a library written in C (https://github.com/EPICSTools/ether_ip.git) gives performance roughly at the PyPy level, while causing no measurable CPU load.
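If it helps to narrow down where the CPU time goes, I can profile one batch with the standard library, along these lines (a sketch; read_tags() is a hypothetical stand-in for the read loop above):

    import cProfile
    import pstats

    # Profile a single batch of reads; read_tags() is a placeholder for
    # the pipelined read loop shown above.
    cProfile.run("read_tags()", "thruput.prof")

    # Print the 20 entries with the highest cumulative time.
    pstats.Stats("thruput.prof").sort_stats("cumulative").print_stats(20)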

Tell me if you need more details about any of this, or if there are other interesting tests I should perform.

@datasim

datasim commented Nov 4, 2015

Interesting. Well, the PyPy test tells us that we are probably able to reach the bandwidth and/or PLC-capacity limits of performance, but at ~100% CPU usage.

I've been working on a branch 'feature-performance' in the cpppo Git repo; give that a try. So far it only gives me a ~5 to 10 percent improvement. I'm still working on this; I can't put my finger on exactly why parsing responses is still so expensive, but I am making progress.
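To try the branch, something along these lines should work (assuming the usual GitHub remote and an editable pip install):

    git clone https://github.com/pjkundert/cpppo.git
    cd cpppo
    git checkout feature-performance
    pip install -e .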

-pjk

@pjkundert
Owner

I can now get up to 300 TPS using CPython 2/3, and up to 700 TPS using PyPy, on my i7 Mac. There's still work to do, but performance is probably no longer at the top of the priority list...

pjkundert added a commit that referenced this issue Jan 30, 2018
# This is the 1st commit message:

Initial foray into support for generic CIP Service Code requests

# This is the commit message #2:

No requirement for existence of .multiple segment in failed responses

# This is the commit message #3:

Correct handling of service_code operations in client connector I/O

# This is the commit message #4:

HART Requests almost working
o Cannot derive HART from Logix; service codes overlap

# This is the commit message #5:

Initial working HART I/O card request

# This is the commit message #6:

Support intermixed Tags and already-parsed operations in parse_operations

# This is the commit message #7:

Test and decode the Read primary variable response, however:
o Still broken; the CIP Encapsulation path is still supposed to be to the
  Connection Manager @0x06/1!  The 0x52 Route Path is Port 1, Address 2,
  and the message path should be to @0x035D/8.

# This is the commit message #8:

Success.  Still needs cleanup

# This is the commit message #9:

Further attempts to refine HART pass-thru.
o HART I/O card is not responding as defined in documentation

# This is the commit message #10:

Cleanups for python3, source analysis, unit tests

# This is the commit message #11:

Attempt to parse Read Dynamic Variables reply; 3 unrecognized bytes?

# This is the commit message #12:

Update to attempt to parse real HART I/O card response
o Minimal Read Dynamic Variables status response?  Not successful
o Implement minimal simulated pass-thru Init/Query, HART commands 1,2,3
o Minor changes to client.py Send RR Data, to have timeout and ticks
  compatible with RSLogix; no difference