Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor protocol version parsing bug #749

Closed
jimbobmcgee opened this issue Dec 9, 2020 · 11 comments
Closed

Minor protocol version parsing bug #749

jimbobmcgee opened this issue Dec 9, 2020 · 11 comments
Assignees
Milestone

Comments

@jimbobmcgee
Copy link

jimbobmcgee commented Dec 9, 2020

I was looking through the code around an error message I was receiving ("Server response does not contian SSH protocol identification"), and noticed that you parse the protocol identification out of the server's initial banner, using a Regex which looks to be based on section 4.2 of RFC4253. However, I think that section has been slightly misinterpreted.

Specifically, the RFC says:

4.2. Protocol Version Exchange

When the connection has been established, both sides MUST send an
identification string. This identification string MUST be

 SSH-protoversion-softwareversion SP comments CR LF

Since the protocol being defined in this set of documents is version
2.0, the 'protoversion' MUST be "2.0". The 'comments' string is
OPTIONAL. If the 'comments' string is included, a 'space' character
(denoted above as SP, ASCII 32) MUST separate the 'softwareversion'
and 'comments' strings. The identification MUST be terminated by a
single Carriage Return (CR) and a single Line Feed (LF) character
(ASCII 13 and 10, respectively).

...and the Regex at

private static readonly Regex ServerVersionRe = new Regex("^SSH-(?<protoversion>[^-]+)-(?<softwareversion>.+)( SP.+)?$", RegexOptions.Compiled);
says:

^SSH-(?<protoversion>[^-]+)-(?<softwareversion>.+)( SP.+)?$

The Regex expression looks for the literal characters SP (i.e. spaceSP) to separate the software version from the subsequent comments, while the RFC says that the SP in its syntax is actually just the literal space character (in the same way that CR LF is the literal carriage-return and newline, so the expression should probably read:

^SSH-(?<protoversion>[^-]+)-(?<softwareversion>.+)( .+)?$

I don't think it technically affects anyone, since I don't think you do anything with the third group of that Regex, anyway, but on the off-chance that someone does eventually intend to parse comments out of the banner info, you may want to clean it up (and might want to review if you are interpreting that SP syntax literally, anywhere else in your codebase).

In fact, given you don't actually use the third capture group in that Regex (and just need it to demarcate the end of the software version), you should be able to forego capturing it entirely (i.e. using (?:...) instead of (...)).

You probably do want to extract software version non-greedily, though (.+? instead of .+). As it stands right now, a banner returning SSH-2.0-alpha some string info here would parse with software version alpha some string info, with a comment of here, where it probably should parse as software version alpha and comment of some string info here.

As such, I might be inclined to go with:

^SSH-(?<protoversion>[^-]+)-(?<softwareversion>.+?)(?: .+)?$

As I say, I don't think it is the cause of my "Server response does not contian SSH protocol identification" issue -- that's more likely to be a disconnection from the server -- but I thought I'd flag it because I was in the neighbourhood and, you know, duty!

@drieseng
Copy link
Member

Thanks for reporting this. It indeed does not affect anyone, but I prefer to fix it anyway.
I'll refactor the protocol version exchange in a separate class to allow for easier unit testing.
I'll complete our implementation to also capture the 'comments' string:

^SSH-(?<protoversion>[^-]+)-(?<softwareversion>.+?)([ ](?<comment>.+))?$

Seems right?

@jimbobmcgee
Copy link
Author

@drieseng: If you are planning on capturing the comment and making a solid class, then, after further reading, I think I am settling on either of the following, depending on how lenient you want to be...

  1. Version 1: (?=(?<nul>[\x00])|\ASSH-(?<protoversion>[\x21-\x2c\x2e-\x7e]+)-(?<softwareversion>[\x21-\x2c\x2e-\x7e]+)(?:[\x20](?<comment>[^\x00\x0a\x0d]*?))?\z)
  2. Version 2: (?=(?<nul>[\x00])|\ASSH-(?<protoversion>[\x21-\x2c\x2e-\x7e]+)(?:-(?<softwareversion>[\x21-\x2c\x2e-\x7e]*?))?(?:[\x20](?<comment>[^\x00\x0a\x0d]*?))?\z)

Appreciably less-readable, but does provide the following:

  1. Support only for printable US ASCII (except whitespace and minus) in protoversion and softwareversion, as per the RFC

    Both the 'protoversion' and 'softwareversion' strings MUST consist of printable US-ASCII characters, with the exception of whitespace characters and the minus sign (-).

  2. Expressly does not allow for arbitrary characters (incl. newlines) to appear in protoversion and softwareversion, where both [^-]+ and .+/.+? would.

  3. I am not sure how forgiving you want to be if there is no softwareversion -- I am guessing it will depend on whether you've seen any in the wild:

    1. Version 1 requires both protoversion and softwareversion, with comment optional (strict)
    2. Version 2 requires only protoversion, with both softwareversion and comment optional (lenient)

    RFC says MUST, so Version 1 is the likely choice, but you might want to offer Version 2 for compatibility (maybe a Strict/Relaxed property?)

    When the connection has been established, both sides MUST send an identification string. This identification string MUST be

    SSH-protoversion-softwareversion SP comments CR LF
    
  4. Supports any character in the comment string, except the NUL character, or CR / LF.

    The null character MUST NOT be sent.

  5. Allows determining whether NUL character has been sent, to satisfy MUST NOT clause above. Validity test is therefore:

    var match = regex.Match(theInputString);
    var isValid = match.Success && !match.Groups["nul"].Success;
    
  6. Does not capture the trailing space within softwareversion when comment is empty (as opposed to .+? which technically does.

  7. Parses the start and end of the string, not a "line" within the string (see Regarding lines, below)

The tests I have looked at are (any wrapping in the table should be ignored)...

Version String v1 Match v1 protoversion v1 softwareversion v1 comment v2 Match v2 protoversion v2 softwareversion v2 comment
"SSH-2.0" No "" "" "" Yes "2.0" "" ""
"SSH-2.0-" No "" "" "" Yes "2.0" "" ""
"SSH-2.0 this is a comment" No "" "" "" Yes "2.0" "" "this is a comment"
"SSH-2.0- this is a comment" No "" "" "" Yes "2.0" "" "this is a comment"
"SSH-2.0-alpha" Yes "2.0" "alpha" "" Yes "2.0" "alpha" ""
"SSH-2.0-beta " Yes "2.0" "beta" "" Yes "2.0" "beta" ""
"SSH-2.0-gamma this is a comment" Yes "2.0" "gamma" "this is a comment" Yes "2.0" "gamma" "this is a comment"
"SSH-2.0-billsSSH_3.6.3q3" Yes "2.0" "billsSSH_3.6.3q3" "" Yes "2.0" "billsSSH_3.6.3q3" ""

I expect further testing around various strings you may have seen in the wild will be necessary. https://regex101.com/r/XKbjxI/1/ might provide a reasonable playground (although note that it is PCRE, so does not support any .NET-specific additions to expression syntax):

Regarding lines
Note that I am also assuming that the concept of a line, terminated by CRLF is handled higher in the code, so have not attempted to handle CR and/or LF in the expression. I don't know how strongly you are taking MUST, may and SHOULD in the RFC, but the spec appears contradictory when dealing with backward-compatibility:

4.2. Protocol Version Exchange
...
The identification MUST be terminated by a single Carriage Return (CR) and a single Line Feed (LF) character
(ASCII 13 and 10, respectively). Implementers who wish to maintain compatibility with older, undocumented versions of this protocol may want to process the identification string without expecting the presence of the carriage return character for reasons described in Section 5 of this document.
...
As such, an example of a valid identification string is

SSH-2.0-billsSSH_3.6.3q3<CR><LF>

This identification string does not contain the optional 'comments' string and is thus terminated by a CR and LF immediately after the 'softwareversion' string.
...

5.1. Old Client, New Server
...
In this mode, the server SHOULD NOT send the Carriage Return character (ASCII 13) after the identification string.

As such, if you are not already doing so, you may want to pre-process with .TrimEnd('\r', '\n') before handing off to the regex.

Regarding readability
You may also prefer to use pre-processing to check for NUL characters, rather than relying on the positive-lookahead + named-capture above. If so, something like:

const string vchars= @"[\x21-\x2c\x2e-\x7e]";
const string cchars = @"[^\x00\x0a\x0d]";
var strictPattern = $"\ASSH-(?<protoversion>{vchars}+)-(?<softwareversion>{vchars}+)(?:[\x20](?<comment>{cchars}*?))?\z";
var relaxedPattern = $"\ASSH-(?<protoversion>{vchars}+)(?:-(?<softwareversion>{vchars}*?))?(?:[\x20](?<comment>{cchars}*?))?\z";

if (inputString.Contains('\0')) {
    throw new SshConnectionException("Version information contains NUL characters")
}

inputString = inputString.TrimEnd('\r', '\n');

var match = isRelaxed 
          ? Regex.Match(inputString, relaxedPattern)
          : Regex.Match(inputString, strictPattern);

if (!match.Success) {
    throw new SshConnectionException("Could not parse Version information")
}

this.protocolVersion = match.Groups["protoversion"].Value;
this.softwareVersion = match.Groups["softwareversion"].Value;
this.comment = match.Groups["comment"].Value;

...probably self-documents intent better than the NUL-capturing expressions above.

@drieseng drieseng added this to the 2020.0.0-beta2 milestone Dec 31, 2020
@drieseng
Copy link
Member

@jimbobmcgee I've refactored and updated our implementation based on your feedback. However, I decided not to make our "validation" more strict with regards to the character encoding of identification string.

If you feel strongly about this, we can introduce it in the next major release and document it as a (possible) breaking change.
Don't hesitate to open a new issue for this.

@drieseng drieseng self-assigned this Dec 31, 2020
@jimbobmcgee
Copy link
Author

If you feel strongly about this...

@drieseng : Not at all, I was just exercising my RFC-reading and Regex-writing muscles, as they are both weak from underuse. The removal of the expected literal SP from the pattern is the minimum required here and, as you say, even that was not affecting anyone unexpectedly.

@drieseng
Copy link
Member

@jimbobmcgee Let me know if you come across other issues. Thx!

@mobstation
Copy link

mobstation commented Jan 11, 2021

Hello, after upgrading to 2020.0.0 we got the error message

Renci.SshNet.Common.SshConnectionException : The server response contains a null character at position 0x00000028:

  00000000  53 53 48 2D 32 2E 30 2D 4F 70 65 6E 53 53 48 5F  SSH-2.0-OpenSSH_
  00000010  37 2E 34 70 31 20 44 65 62 69 61 6E 2D 31 30 2B  7.4p1 Debian-10+
  00000020  64 65 62 39 75 34 0A 00                          deb9u4..

A server must not send a null character before the Protocol Version Exchange is complete.

We are using the atmoz/sftp docker container for starting our local ftp server

docker run -p 22:22 --name my-ftp-test --restart unless-stopped -d atmoz/sftp foo:pass:1001::upload.

When using atmos/sftp:alpine, this isn't a problem.

Not sure if this is related to this issue or not but error message wise it seems related at least.

@jimbobmcgee
Copy link
Author

Pinging @drieseng, in case you are no longer subscribed...

@Kim-SSi
Copy link

Kim-SSi commented Jan 11, 2021

I ran into this also on our unit testing. It was due to an OpenSSH server bug.
How this should be handled is another issue. Maybe an option allowing the old code (single char) for identify?

From the OpenSSH release notes:

OpenSSH 7.5/7.5p1 (2017-03-20)

  • Fix various fallout and sharp edges caused by removing SSH protocol
    1 support from the server, including the server banner string being
    incorrectly terminated with only \n (instead of \r\n), confusing
    error messages from ssh-keyscan bz#2583 and a segfault in sshd
    if protocol v.1 was enabled for the client and sshd_config
    contained references to legacy keys bz#2686.

OpenSSH 5.1/5.1p1 (2008-07-22)

  • ssh(1) and sshd(8) now send terminate protocol banners with CR+LF for
    protocol 2 to comply with RFC 4253. Previously they were terminated
    with CR alone. Protocol 1 banners remain CR terminated. (bz#1443)

@DkGr
Copy link

DkGr commented Jan 12, 2021

Hello, after upgrading to 2020.0.0 we got the error message

Renci.SshNet.Common.SshConnectionException : The server response contains a null character at position 0x00000028:

  00000000  53 53 48 2D 32 2E 30 2D 4F 70 65 6E 53 53 48 5F  SSH-2.0-OpenSSH_
  00000010  37 2E 34 70 31 20 44 65 62 69 61 6E 2D 31 30 2B  7.4p1 Debian-10+
  00000020  64 65 62 39 75 34 0A 00                          deb9u4..

A server must not send a null character before the Protocol Version Exchange is complete.

We are using the atmoz/sftp docker container for starting our local ftp server

docker run -p 22:22 --name my-ftp-test --restart unless-stopped -d atmoz/sftp foo:pass:1001::upload.

When using atmos/sftp:alpine, this isn't a problem.

Not sure if this is related to this issue or not but error message wise it seems related at least.

I also have exactly this error with the 2020.0.0 package. I'm trying to connect to a raspberry under raspbian.

@drieseng
Copy link
Member

@DkGr @Kim-SSi @jimbobmcgee Can one of your create a new issue for this ("Allow SSH identification string to be terminated by LF + null character") ? Question is whether the null character is part of the identification string or not.

Can one of your validate the proposed fix (by building from source) once available?

@DkGr
Copy link

DkGr commented Jan 12, 2021

I can build the source and validate the fix. I created issue #761

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants