Improve the CAS error message generated when an IOC gets port-scanned #128

jesusvasquez333 · 2021-03-09T19:30:34Z

I'm copying here the issue published in the codeathon ideas list: improve-the-cas-error-message-generated-when-an-ioc-gets-port-scanned

Read this tech-talk thread.

The code that needs changing is in camessage.c::camessage(). In the case described in tech-talk above, checking the condition msg.m_cmmd < NELEMENTS(tcpJumpTable) would have indicated that the TCP socket was not opened by a real CA client. There may be better ways to distinguish the first message received over TCP from a pre-3.12.0-beta1 client, but the IOC should attempt to detect non-CA clients and report them as such.

jesusvasquez333 · 2021-03-09T22:00:33Z

Taking a look the original post, the error message was:

CAS: request from 146.137.70.29:47932http://146.137.70.29:47932/ => CAS: Client version too old
CAS: Request from 146.137.70.29:47932http://146.137.70.29:47932/ => cmmd=18245 cid=0x54502f31 type=12064 count=18516 postsize=21536
CAS: Request from 146.137.70.29:47932http://146.137.70.29:47932/ => available=0x2e310d0a N=0 paddr=0000000000000000

Printed by this function. Putting back together the standard message header, defined here, the header content was (in hex):

0x474554202F20485454502f312e310d0a

which converted to ASCII is:

GET / HTTP/1.1

So, in that case the error was produce by an HTTP GET command, and not by a port scanner.

So, the issue is now easy to reproduce. Just try to open http connection pointing to an EPICS server TCP port.
I did it for example using links (curl also works) from host 172.16.125.3, pointing it to an IOC running on host 172.16.125.2, and using port 36533:

$ links -source http://172.16.125.2:36533

0����GET / HTTP/1.1
CAS: Client version 0 too old

This generates almost an identical error message in the EPICS IOC:

epics> CAS: request from 172.16.125.3:43680 => CAS: Client version too old
CAS: Request from 172.16.125.3:43680 => cmmd=18245 cid=0x54502f31 type=12064 count=18516 postsize=21536
CAS: Request from 172.16.125.3:43680 =>   available=0x2e310d0a 	N=0 paddr=(nil)

mdavidsaver · 2021-03-09T22:13:23Z

So, in that case the error was produce by an HTTP GET command, and not by a port scanner.

FYI. security scanners like to look for common services on uncommon ports. So I suspect this was a scanner looking for unauthorized HTTP servers.

mdavidsaver · 2021-03-09T22:22:11Z

Also, an example of my PVXS library logging a short PVA message:

2021-03-09T14:15:31.665466604 DEBUG pvxs.udp.io UDP 0x7f4cc0001710 event 2
0000 : CA02C000 00000027 DC2B10D3 08ADA8C0
0010 : A6BC2225 00000001 00000000 00000000
0020 : 0000FFFF 00000000 13D30374 6370FF

Feel free to borrow (or not) from xerrlogHexPrintf() https://github.com/mdavidsaver/pvxs/blob/32debd3e40921dc85eb14689b599cffac1e994d1/src/log.cpp#L250-L283 I don't think I'm doing anything which requires C++ in this function. Note that callers to this function limit the size being printed since I don't want to print a large array update as hex.

jesusvasquez333 · 2021-03-09T22:35:51Z

FYI. security scanners like to look for common services on uncommon ports. So I suspect this was a scanner looking for unauthorized HTTP servers.

That's a good point. That would make more sense that a real HTTP connection.

marciodo · 2022-05-10T23:09:57Z

I've read through this issue, PR 141, and the EPICS tech-talk post. I've built the latest 7.0 and tested it with the links suggestion from Jesus' previous work, to confirm that the Client version too old message is still there. I've built Jesus' PR and tested it with links. A message shows up only if CASDEBUG=1: CAS: Invalid command rejected.

It is not clear to me what is missing for having this PR merged.

marciodo · 2022-05-11T18:17:00Z

After talking with @mdavidsaver, these were the points brought up:

Although the comments in the code explains the payload size tested against, we could have it explicit as a subtraction in code to increase understanding (lines 2483 and 2501 of camessage.c).
Line 2470: there's a concern regarding msg.m_count==0 as a test. Michael Davidsaver thinks that it may be ok, but we could test this case.
Test the pull request against an old version of caget to a big waveform record to see if nothing was broken.

marciodo · 2022-05-13T03:55:34Z

I think that I've found a problem. I'm testing an IOC with the code in PR 141, containing the following record:

record (waveform, "waves")
{
        field(NELM, "17000")
        field(FTVL, "SHORT")
}

I've caput to it with 1000 values, using this bash script:

elements=""

for i in {1..1000}
do
elements+="${i} "
done
caput -a waves $i $elements

The IOC complained and answered back to caput with: CAS: Invalid payload size rejected

caput requested a write command using the standard protocol, but if you see the contents of the header:

(gdb) p msg
$2 = {m_postsize = 40000, m_count = 1000, m_cid = 1, m_available = 1, m_dataType = 0, m_cmmd = 4}

The payload size is 40000, but caput used the standard protocol, not the extended one. From this document - https://epics.anl.gov/base/R3-16/0-docs/CAproto/index.html - I see

Total size of an individual message is limited. With CA versions older than CA_V49, the maximum message size is limited to 16384 (0x4000) bytes. Out of these, header has a fixed size of 16 (0x10) bytes, with the payload having a maximum size of 16368 (0x3ff0) bytes.

For compatibility, extended message form should only be used if payload size exceeds the pre-CA_V49 message size limit of 16368 bytes.

I've tested other values for caput and it only switches from standard to extended protocol when the payload has more than 65536 bytes.

Should we change the description in the protocol specification to officialy allow standard protocol with more than 16368 bytes?

Should we, instead, fix EPICS base to match the protocol description and increase the minor version to have an additional condition in a way of maintaining compatibility with other EPICS versions?

mdavidsaver · 2022-05-13T20:01:11Z

cf.

epics-base/modules/ca/src/client/comQueSend.cpp

Line 285 in 9ebd841

if ( payloadSize < 0xffff && nElem < 0xffff ) {

mdavidsaver · 2022-05-13T20:03:55Z

Good catch!

marciodo · 2022-05-13T23:12:49Z

As we’ve identified that the payload size is allowed to have any number, it can’t be used alone to identify a valid CA message. Going back to the original suggestion: check the command number to identify a valid message. @mdavidsaver suggested using CA_PROTO_LAST_CMMD + 16 to make room for newer clients with more commands to have their message identified as valid. This would currently prevent 1-(28+16)/65536 = 99.93% of non-CA clients messages that send something random in the command field.

For the remaining 0.07% of the cases, we can create a combination between the command field with another field to limit the chances of a non-CA client having a message recognized as CA valid.

In pseudo-code, this is what I’m planning:

if msg.command > CA_PROTO_LAST_CMMD + 16 then
reject and close connection

case msg.command
CA_PROTO_VERSION: if msg.payloadsize != 0 then reject and close connection
CA_PROTO_EVENT_ADD: if msg.payloadsize != 16 then reject and close connection
CA_PROTO_EVENT_CANCEL: if msg.payloadsize != 0 then reject and close connection
etc.

Please, see if you see any catches with this idea and if it is ok to implement it.

anjohnson · 2022-05-14T00:53:52Z

I don't remember what the state is that you're detecting, so don't assume that what I suggest below is sensible/possible.

Don't we have a good idea what message(s) all our past clients send at the start of a new TCP connection? I would have thought we could reject anything that doesn't start with one of those first messages, rather than potentially allowing through any message. For instance I don't see how a real client could ever send a CA_PROTO_EVENT_CANCEL early in its life; it would have to have sent other messages that created an event before it could cancel one.

There might be several possible such messages, but I would have thought this could pare down the allowed list. I'm sure if I'm wrong Michael will be able to explain how/why though.

Thanks, these messages do tend to scare our users a little so I'd love to get rid of them.

marciodo · 2022-05-14T18:55:05Z

Andrew, the original suggestion and the subsequent solution implementation was focusing on every message arriving at the IOC, no matter if it was from a client starting the communication or continuing it.

Your suggestion would imply that, if the message is not the starting point of a communication, we would just need to check if the client ID is already registered, right? Then, the only test for validity of the header format would be applied for the initial command.

I had already finished a solution in PR 141 when your message came, and I ended up pushing it. I'm happy to change everything to implement another idea if you want. This is being a nice learning experience to me.

anjohnson · 2022-05-15T02:31:41Z

Hi Márcio, no don't change what you've already completed. I don't know the code as well as Michael does so my suggestion might not even be possible, although if it is we might want consider doing it as well. I will defer any decision on the idea to him.

mdavidsaver · 2022-05-15T18:03:02Z

For the remaining 0.07% of the cases ...

I'm not sure if trying to do so much validation early in camessage() is the right approach. The state needed to fully validate becomes available as a message is processed. For example, after channel ID lookup, it is straightforward to catch an inconsistency between DBR + count and message size.

I worry we have unconsciously allowed the scope of this PR to creep. Remember that the goal is to reduce log noise. Simply omitting errlogPrintf() in some cases would be sufficient. Any added message validation is a bonus, which must be balanced against the risk of regression. (As @marciodo has thankfully caught this time, but which we didn't previously.)

marciodo · 2022-05-18T21:00:37Z

Ok, @mdavidsaver. I agree with you. I'll go back to origins and flag the CA message as invalid only if command > CA_PROTO_LAST_CMMD + 16.

Please, just confirm to me if the message should be printed on the IOC shell only if CASDEBUG != 0.

anjohnson assigned jesusvasquez333 Mar 9, 2021

anjohnson added the Codeathon label Mar 9, 2021

jesusvasquez333 linked a pull request Mar 11, 2021 that will close this issue

Improve message sanity checks in camessage.c #141

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the CAS error message generated when an IOC gets port-scanned #128

Improve the CAS error message generated when an IOC gets port-scanned #128

jesusvasquez333 commented Mar 9, 2021 •

edited

jesusvasquez333 commented Mar 9, 2021

mdavidsaver commented Mar 9, 2021

mdavidsaver commented Mar 9, 2021

jesusvasquez333 commented Mar 9, 2021

marciodo commented May 10, 2022

marciodo commented May 11, 2022

marciodo commented May 13, 2022 •

edited

mdavidsaver commented May 13, 2022

mdavidsaver commented May 13, 2022

marciodo commented May 13, 2022 •

edited

anjohnson commented May 14, 2022

marciodo commented May 14, 2022

anjohnson commented May 15, 2022

mdavidsaver commented May 15, 2022

marciodo commented May 18, 2022

Improve the CAS error message generated when an IOC gets port-scanned #128

Improve the CAS error message generated when an IOC gets port-scanned #128

Comments

jesusvasquez333 commented Mar 9, 2021 • edited

jesusvasquez333 commented Mar 9, 2021

mdavidsaver commented Mar 9, 2021

mdavidsaver commented Mar 9, 2021

jesusvasquez333 commented Mar 9, 2021

marciodo commented May 10, 2022

marciodo commented May 11, 2022

marciodo commented May 13, 2022 • edited

mdavidsaver commented May 13, 2022

mdavidsaver commented May 13, 2022

marciodo commented May 13, 2022 • edited

anjohnson commented May 14, 2022

marciodo commented May 14, 2022

anjohnson commented May 15, 2022

mdavidsaver commented May 15, 2022

marciodo commented May 18, 2022

jesusvasquez333 commented Mar 9, 2021 •

edited

marciodo commented May 13, 2022 •

edited

marciodo commented May 13, 2022 •

edited