Downloading to PLC breaks tags #64

Closed
xinthose opened this issue Mar 15, 2018 · 27 comments
@xinthose xinthose commented Mar 15, 2018

Hello Kyle. I have noticed with MicroLogix PLCs that sometimes when you download to the PLC or go live with it, it breaks all of the library's tag connections. I cannot auto-recover from this state (close all tags and re-create) until the program is restarted. How can I resolve this issue without restarting the code?

After a new program has been downloaded to the PLC, I get the error PLCTAG_ERR_REMOTE_ERR when I try to read a tag (e.g. protocol=ab_eip&gateway=10.0.0.3&path=1&cpu=mlgx&elem_size=4&elem_count=1&name=myFloatArray).

Thank you.

@kyle-github kyle-github commented Mar 16, 2018

I suspect that the PLC is essentially rebooting when you change mode. That will cause it to lose all connections.

If your tags are handled by different threads, you need to make sure that they all close at the same time. If even one tag is still open, it will keep the TCP socket open to the PLC. I had plenty of problems with this until I came up with the scheme below.

What I do is have a shared flag for reconnect. Each thread looks at that. If one thread gets an error indicating that the connection to the PLC is down or bad, it sets the flag. The way I do this is a bit of a trick. I make the flag a timeout time. As long as the time is in the past, each thread keeps going. When the time is in the future, the thread immediately disconnects (plc_tag_destroy) and stays disconnected until the time is in the past, and then it reconnects.

This ensures that all tags have released the TCP connection (the EIP session). That closes it. The next time that a tag tries to connect, the TCP connection is set up again.
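A minimal sketch of the shared "timeout time" flag described above, in C11. It is not code from the library: the names reconnect_after_ms, request_reconnect and must_stay_disconnected are hypothetical, the clock is passed in as a plain millisecond value, and the rule that the deadline is only ever pushed later is an assumption about how concurrent errors should combine.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared by all tag threads: the time (in ms) until which every thread
   must stay disconnected. 0 is always "in the past", i.e. normal operation. */
static _Atomic int64_t reconnect_after_ms = 0;

/* Called by any thread that sees a connection-level error.
   Moves the shared deadline to now + holdoff, but never earlier than
   a deadline another thread has already set (assumed policy). */
static void request_reconnect(int64_t now_ms, int64_t holdoff_ms)
{
    int64_t wanted = now_ms + holdoff_ms;
    int64_t current = atomic_load(&reconnect_after_ms);

    while (current < wanted &&
           !atomic_compare_exchange_weak(&reconnect_after_ms, &current, wanted)) {
        /* A failed exchange reloads current; the loop condition re-checks it. */
    }
}

/* True while the deadline is in the future. A thread that sees true
   should destroy its tag and wait instead of reading or writing. */
static bool must_stay_disconnected(int64_t now_ms)
{
    return now_ms < atomic_load(&reconnect_after_ms);
}
```

In each thread's loop the idea would be: if must_stay_disconnected(now) is true, call plc_tag_destroy on the thread's tag and sleep briefly; otherwise create the tag if needed and carry on, calling request_reconnect(now, holdoff) whenever an operation reports that the connection is down.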

I intentionally made reconnect part of the application logic, not the library. I think, in retrospect, that may have been a mistake.

If that does not fix the problem, please add the debug level to the connection string and send me the output. Maybe there is something else going on.

It is possible that the PLC is simply dropping out completely and the OS on the system running the library does not realize that the TCP socket is dead.

@xinthose xinthose commented Mar 16, 2018

OK. Thank you sir. I do destroy all tags before recreating them. Also, in my scenario, I have two C++ programs (one dedicated for SCADA web page) connecting to the same PLC from the same Ubuntu PC. Is this not recommended, because one program could have tags open while the other has them all closed, trying to recover from this PLC reboot error?

I can verify that the cpu mlgx and path=1 works with an older MicroLogix 1100 SLC.

@kyle-github kyle-github commented Mar 16, 2018

Can you verify that all tags are closed before the first one reopens? The library tries very hard, perhaps too hard, to limit the number of TCP connections used since some of these PLCs only have resources for 32 (or sometimes fewer!) TCP connections.

If that is verified, can you save a debug trace? It sounds like something is still hanging around when it should not be. If there is something odd happening with the socket on the PC side, it is possible that something is hanging long enough for the first reconnect to find the old connection and reuse it. There is not that much code there, but it is possible that there is a bug of some sort.

Two programs should be fine (as long as they are not doing some savage and unnatural shared memory thing). They will end up with different TCP connections. You should be able to verify that via lsof or other tools on Linux. On Windows, I am not sure what you would check. Look for connections to port 44818 to the PLC IP address.

How long do you wait before reconnecting?

@kyle-github kyle-github commented Mar 17, 2018

See issue #65.

@xinthose xinthose commented Mar 17, 2018

I can verify that I close all tags before the first one opens. I can recover from unplugging the ethernet cable from the PLC, just not a download. I will try to get a debug trace. The current application I'm talking about is in the field in production.

The reason I have not raised this issue in the past is because usually a PLC in production never has its program changed.

I do not share memory between my programs (I never knew that was possible, lol).

I wait 10 seconds between tries: close all tags, create a test tag, test read, test write, close test tag, reinitialize my normal tags.

@kyle-github kyle-github commented Mar 17, 2018

What you describe with the tags should do it. It sounds like there is something else going on. Is this something you can simulate in the lab? I do not have a MicroLogix, so I cannot simulate this. I do have a PLC/5. However that has the Ethernet in a separate module, so restarting the CPU part may not do anything to the connection.

Let me know if you can get a debug trace. Add "&debug=4" to the tag string.
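As a small illustration of adding the debug attribute, the sketch below appends "&debug=<level>" to an existing attribute string. The helper name with_debug is hypothetical and not part of the library; only the "&debug=4" suffix and the example tag string come from this thread.

```c
#include <stdio.h>

/* Appends "&debug=<level>" to a base attribute string.
   Returns 0 on success, -1 if the result does not fit in out. */
static int with_debug(const char *base, int level, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s&debug=%d", base, level);

    /* snprintf reports the length it wanted to write; anything that is
       negative or not smaller than out_size means error or truncation. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With the tag string from the first post and level 4, the buffer would end in "&name=myFloatArray&debug=4", which can then be passed to the tag creation call in place of the original string.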

@xinthose xinthose commented Mar 17, 2018

OK. Will do. Thank you sir.

@kyle-github kyle-github commented Mar 18, 2018

What OS is the library running on?

I am trying to figure out how this could happen. I have done tests with pulling Ethernet cords and turning off PLCs in midstream. Nothing seems to replicate this problem so I must be doing something different than what is happening to your systems. I am using Linux (Ubuntu 17.10) against a ControlLogix and a PLC/5.

The problem must be somehow in the library or in the interface between the library and the OS. I see no other way that bouncing the program would help. The remote error is unfortunately somewhat useless. I need to decode more of the errors returned to see what is happening. There are a lot of places that will return that generic error.

@kyle-github kyle-github commented Mar 18, 2018

Do you first start getting timeouts or does it go straight to the remote error after the program update?

@kyle-github kyle-github commented Mar 23, 2018

I have been asking around to get a MicroLogix. The closest I am coming is a SLC 500. That is probably not going to be good enough as the Ethernet is separate on that. But if I can get it, I will try to recreate the problem.

@kyle-github kyle-github added the bug label Mar 23, 2018

@kyle-github kyle-github commented Mar 23, 2018

I am going to call this a bug until proven otherwise.

@xinthose xinthose commented Mar 31, 2018

Hello. I have not been able to re-simulate the error at our office. The OS used is Ubuntu 16.04 LTS. I think it goes straight to the remote error after the PLC download. I think it is related to multiple programs connected to the same PLC. I saw the error with a MicroLogix 1100 but have seen it on other Allen-Bradley PLCs before.

@kyle-github kyle-github commented Mar 31, 2018

Hmm, could it be due to some problem with resources? Perhaps the MicroLogix is running out of TCP connections?

I found this discussion about MicroLogix limits:

http://www.plctalk.net/qanda/showthread.php?t=109623

This seems to indicate that there are only 16 connections possible on a MicroLogix 1400!

How many programs are connecting to the same PLC? Could this be hitting the limit?

I will see if I can find a cheap MicroLogix on eBay or somewhere.

@xinthose xinthose commented Apr 6, 2018

I have four programs connecting to the PLC (grand total of 24 tags). Everything works fine during normal operation. If we were selling a MicroLogix, I would give you a good price (we sometimes do on eBay).

@kyle-github kyle-github commented Jun 2, 2018

I was trying to recreate this again. It is definitely a bug.

I have a theory that this has to do with exactly how the PLC is rebooting and that process interacting with the PC OS's TCP stack.

The theory is this:

The PLC bounces when you program it and essentially erases all knowledge of the TCP connections that are currently open. However, it does not do this cleanly. Instead of sending TCP shutdown commands to the PC, it just silently goes away.

The PC OS TCP stack does not know that the PLC has had a sudden attack of amnesia. So, it continues to try to send packets.

Your code/the library code finally times out and you close all tags.

During the tag close, it tries to close the CIP connection first. I think I have this as more or less blocking. So, the thread that handles IO is actually blocked. This means that the underlying structures for the TCP session to the PLC (the library level) are still there when you start creating tags again.

The session data gets picked up by the new tags even though it is bad and then nothing works. Lather, rinse, repeat.

That's my theory.

This gives me a possible way to try to recreate this. It will take a bit of coding. If this is indeed the problem, it is going to take a bit of thought on how to work around it. Race conditions are really easy to create in that code.

@xinthose xinthose commented Jun 2, 2018

OK. That sounds right. Thank you. How is version 2 going?

@kyle-github kyle-github commented Dec 26, 2018

Version 2.0 is out! Is there any way you can test this?

@xinthose xinthose commented Dec 26, 2018

Sure. I will try at my earliest convenience.

@kyle-github kyle-github commented Jan 14, 2019

I have been trying to fix a bug where, if the remote system closes the TCP connection, the library will throw a SIGPIPE. See issue #86. Could this be related? I have not had any luck getting this to repeat on demand.

@xinthose xinthose commented Jan 14, 2019

That sounds good. Hopefully I can test 2.0 this week.

@kyle-github kyle-github commented Jan 21, 2019

There is a new fix in version 2.0.9 for a problem with remote TCP connection closure causing a crash bug (use-after-free). Please pick that one up, or a later version.

@xinthose xinthose commented Jan 22, 2019

OK. Will do. Thank you.

@xinthose xinthose commented Jan 24, 2019

I still get the fault (PLCTAG_ERR_REMOTE_ERR) the instant I download to the PLC (MicroLogix 1400 Series A) over Ethernet. Library version 2.0.10. Using RSLogix 500 Pro.

@kyle-github kyle-github commented Jan 24, 2019

I would expect that you would still get an error. The remote connection is unexpectedly closing (as far as the library is concerned). However, it should not crash and you should (crosses fingers) be able to simply retry the read/write on the tag without closing and reopening it.

When an unexpected error arrives, the library will try to cleanly close down the connection (best effort, takes about 200ms), then waits about 5 seconds and tries to reopen the connection. If that succeeds, then any subsequent requests will go through.
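To make "retry the read/write on the tag without closing and reopening it" concrete, here is a rough sketch of an application-side retry loop in C. It deliberately avoids real library calls: read_fn and wait_fn are hypothetical stand-ins for a wrapper around the library's read call and a sleep function, and fake_read only simulates a PLC that fails a few times before recovering.

```c
#include <stddef.h>

/* Stand-in for a read on an already-created tag: 0 on success,
   negative on error. A real version would wrap the library's read call. */
typedef int (*read_fn)(void *ctx);

/* Stand-in for sleeping between attempts; may be NULL to skip waiting. */
typedef void (*wait_fn)(int ms);

/* Retries the same read up to max_attempts times, waiting between attempts.
   Returns the last status. The tag is never destroyed or recreated here. */
static int read_with_retry(read_fn do_read, void *ctx, wait_fn do_wait,
                           int max_attempts, int wait_ms)
{
    int rc = -1;

    for (int i = 0; i < max_attempts; i++) {
        rc = do_read(ctx);
        if (rc == 0) {
            break;
        }
        if (do_wait != NULL && i + 1 < max_attempts) {
            do_wait(wait_ms);
        }
    }
    return rc;
}

/* Simulated PLC for demonstration: fails until the counter reaches 0. */
static int fake_read(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -1;
    }
    return 0;
}
```

Since the library is described as waiting about 5 seconds before reopening the connection, the total retry window (max_attempts times wait_ms) would presumably need to be longer than that for a retry to succeed.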

@xinthose xinthose commented Jan 24, 2019

Oh yah. It recovers fine. Good work! I will test it more in our shop.

@xinthose xinthose closed this Jan 24, 2019

@kyle-github kyle-github commented Jan 24, 2019

The reconnect strategy is basically the simplest thing that could work. It could be a lot smarter. I should retry almost immediately and then slowly use exponential back off to get to longer and longer retry intervals. That seems like it is feasible.
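A sketch of what such an exponential back-off could look like in C. This is not the library's implementation: the function name and both constants are assumptions chosen for illustration, with the cap set to the roughly 5 second wait mentioned earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical policy values, chosen only for illustration. */
#define BACKOFF_FIRST_MS 100
#define BACKOFF_MAX_MS   5000

/* Delay before reconnect attempt number `attempt` (0 = first retry).
   The delay doubles on each attempt and is capped at BACKOFF_MAX_MS. */
static int64_t backoff_delay_ms(unsigned attempt)
{
    int64_t delay = BACKOFF_FIRST_MS;

    /* Stop doubling once the cap is reached so the value cannot overflow. */
    while (attempt-- > 0 && delay < BACKOFF_MAX_MS) {
        delay *= 2;
    }
    return delay < BACKOFF_MAX_MS ? delay : BACKOFF_MAX_MS;
}
```

With these values the retry intervals would be 100, 200, 400, 800, 1600 and 3200 ms, and 5000 ms from the seventh attempt onward. The attempt counter would be reset to 0 after a successful reconnect.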

@xinthose xinthose commented Jan 24, 2019

I still close all tags and re-open them after any PLC tag error, but it does recover.
