Client certs removed from connection when setInsecure called, causing SSL connect failure. #7455

Closed
5 tasks done
Gor-Ren opened this issue Jul 14, 2020 · 10 comments · Fixed by #7464
Labels
waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.

Comments

@Gor-Ren

Gor-Ren commented Jul 14, 2020

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • [NA] If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP8266 device
  • Core Version: 2.7.2, 39c79d9
  • Development Env: Arduino IDE via VSCode + Visual Studio Code extension for Arduino
  • Operating System: Ubuntu 18.04

Settings in IDE

  • Module: NodeMCU v1.0 (ESP-12E Module)
  • Flash Mode: unknown
  • Flash Size: 4MB
  • lwip Variant: v2 Higher Bandwidth, also tried v1.4 Higher Bandwidth
  • Reset Method: unknown
  • Flash Frequency: unknown
  • CPU Frequency: tried both 80MHz and 160MHz
  • Upload Using: SERIAL
  • Upload Speed: 115200

Problem Description

I am attempting to publish a "hello world" MQTT message over WiFi to an Amazon Web Services (AWS) IoT endpoint, which requires SSL encryption. I have been issued a CA cert, device cert and device public & private keys, plus my account-specific AWS endpoint to publish against.

I have configured my sketch to use these credentials (not yet stored in PROGMEM), and I populate the WiFiClientSecure with my device certificate and private key. For now I am skipping server certificate verification using WiFiClientSecure::setInsecure.

My troubleshooting has included:

  • verified that I can use a non-SSL WiFiClient to publish "hello worlds" to an unencrypted public MQTT broker.
  • reviewed the SSL client docs and example sketches.
  • based on googling the error, tried the CPU at 80 MHz and 160 MHz, SSL support at "All SSL ciphers" and "Basic SSL ciphers", and lwIP variant v2 (Higher Bandwidth) and v1.4 (Higher Bandwidth), all to no avail.
  • verified that I can authenticate from a computer on the same WiFi network against the same endpoint with the same certs, using the openssl s_client CLI helper (output provided in the debug section).

The TLS handshake fails around BSSL:_wait_for_handshake: failed; please see debug output.

Further troubleshooting advice greatly appreciated.

MCVE Sketch

#include <Arduino.h>
#include <ESP8266WiFi.h>
#include <PubSubClient.h>

namespace Secrets {
const char wifiSsid[] = "myssid";
const char wifiPassword[] = "mypass";
const char awsIotEndpoint[] = "redacted.iot.eu-west-1.amazonaws.com";

const char awsCentralAuthorityCertificate[] = R"EOF(
-----BEGIN CERTIFICATE-----
redacted
-----END CERTIFICATE-----
)EOF";
const char awsDeviceCertificate[] = R"EOF(
-----BEGIN CERTIFICATE-----
redacted
-----END CERTIFICATE-----
)EOF";
const char awsDevicePrivateKey[] = R"EOF(
-----BEGIN RSA PRIVATE KEY-----
redacted
-----END RSA PRIVATE KEY-----
)EOF";
}  // namespace Secrets

void wifiConnect() {
  WiFi.mode(WIFI_STA);

  WiFi.begin(Secrets::wifiSsid, Secrets::wifiPassword);

  if (WiFi.waitForConnectResult() != WL_CONNECTED) {
    // Pass the status value: the original format string had no matching argument.
    Serial.printf("WiFi connection failed (status=[%d])\n", WiFi.status());
  } else {
    Serial.println("WiFi connected. IP address: ");
    Serial.println(WiFi.localIP());
  }
}

PubSubClient mqttClient;
WiFiClientSecure wifiClient;

void mqttSetup() {
  wifiClient.setClientRSACert(new X509List(Secrets::awsDeviceCertificate),
                              new PrivateKey(Secrets::awsDevicePrivateKey));
  wifiClient.setInsecure();  // TODO: verify server identity using CA cert

  mqttClient.setServer(Secrets::awsIotEndpoint, 8883);
  mqttClient.setClient(wifiClient);
}

void mqttReconnect() {
  while (!mqttClient.connected()) {
    Serial.printf("Connecting to MQTT broker... (MQTT client state: %d)\n",
                  mqttClient.state());
    if (!mqttClient.connect("test-client-id")) {
      char errorMessage[128];
      wifiClient.getLastSSLError(errorMessage, 128);
      Serial.printf(
          "Connecting to MQTT broker failed. (MQTT client state: %d, SSL "
          "error: %s)\n",
          mqttClient.state(), errorMessage);
    }
    delay(2500);
  }
  Serial.println("MQTT client connected to broker.");
}

void setup() {
  Serial.begin(115200);
  Serial.println("Booting");

  wifiConnect();
  mqttSetup();
}

void loop() {
  delay(5000);
  mqttReconnect();

  if (mqttClient.publish("testTopic", "hello world")) {
    Serial.println("MQTT message published successfully!");
  } else {
    Serial.println("MQTT message publish failed.");
  }

  Serial.println("Finished loop");
}

Debug Messages

[Info] Opened the serial port - /dev/ttyUSB0
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt 

connected with MyWifiSSID, channel 11
dhcp client start...
wifi evt: 0
ip:192.168.86.249,mask:255.255.255.0,gw:192.168.86.1
wifi evt: 3
WiFi connected. IP address: 
192.168.86.249
Connecting to MQTT broker... (MQTT client state: -1)
[hostByName] request IP for: redacted.iot.eu-west-1.amazonaws.com
[hostByName] Host: redacted.iot.eu-west-1.amazonaws.com IP: 52.31.xxx.xx
:ref 1
BSSL:_connectSSL: start connection
:wr 251 0
:wrc 251 251 0
:ack 251
:rn 1414
:rd 5, 1414, 0
:rdi 1414, 5
:rd 1409, 1414, 5
:rdi 1409, 1409
:c0 1409, 1414
BSSL:CERT: aa bb cc etc. REDACTED
BSSL:CERT: aa bb cc etc. REDACTED
BSSL:CERT: aa bb cc etc. REDACTED
BSSL:CERT: aa bb cc etc. REDACTED
BSSL:CERT: aa bb cc etc. REDACTED
:rn 1414
:rch 1414, 1414
:rch 2828, 1108
:rd 3936, 3936, 0
:rdi 1414, 1414
:c 1414, 1414, 3936
:rdi 1414, 1414
:c 1414, 1414, 2522
:rdi 1108, 1108
:c0 1108, 1108
BSSL:CERT: aa bb cc etc. REDACTED
:wr 82 0
:wrc 82 82 0
:wr 6 0
:wrc 6 6 0
:wr 45 0
:wrc 45 45 0
:ack 82
:rn 7
:rcl pb=0x3fff88cc sz=7
:rd 5, 7, 0
:rdi 7, 5
:rd 2, 7, 5
:rdi 2, 2
:c0 2, 7
BSSL:_wait_for_handshake: failed
BSSL:Couldn't connect. Error = 'Unknown error code.'
Connecting to MQTT broker failed. (MQTT client state: -2, SSL error: Unknown error code.)
:ack 51
Connecting to MQTT broker... (MQTT client state: -2)
<repeats>

From a terminal, I can connect successfully using the same certificates:
(the certificates redacted in the sketch above are a copy-paste of the files referenced below)

openssl s_client -connect redacted.iot.eu-west-1.amazonaws.com:8443 -CAfile AmazonRootCA1.pem -cert redacted-certificate.pem.crt -key redacted-private.pem.key

And receive successful output:

CONNECTED(00000005)
depth=2 C = US, O = Amazon, CN = Amazon Root CA 1
verify return:1
depth=1 C = US, O = Amazon, OU = Server CA 1B, CN = Amazon
verify return:1
depth=0 CN = *.iot.eu-west-1.amazonaws.com
verify return:1
---
Certificate chain
 0 s:CN = *.iot.eu-west-1.amazonaws.com
   i:C = US, O = Amazon, OU = Server CA 1B, CN = Amazon
 1 s:C = US, O = Amazon, OU = Server CA 1B, CN = Amazon
   i:C = US, O = Amazon, CN = Amazon Root CA 1
 2 s:C = US, O = Amazon, CN = Amazon Root CA 1
   i:C = US, ST = Arizona, L = Scottsdale, O = "Starfield Technologies, Inc.", CN = Starfield Services Root Certificate Authority - G2
 3 s:C = US, ST = Arizona, L = Scottsdale, O = "Starfield Technologies, Inc.", CN = Starfield Services Root Certificate Authority - G2
   i:C = US, O = "Starfield Technologies, Inc.", OU = Starfield Class 2 Certification Authority
---
Server certificate
-----BEGIN CERTIFICATE-----
redacted
-----END CERTIFICATE-----
subject=CN = *.iot.eu-west-1.amazonaws.com

issuer=C = US, O = Amazon, OU = Server CA 1B, CN = Amazon

---
No client certificate CA names sent
Client Certificate Types: RSA sign, DSA sign, ECDSA sign
Requested Signature Algorithms: ECDSA+SHA512:RSA+SHA512:ECDSA+SHA384:RSA+SHA384:ECDSA+SHA256:RSA+SHA256:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
Shared Requested Signature Algorithms: ECDSA+SHA512:RSA+SHA512:ECDSA+SHA384:RSA+SHA384:ECDSA+SHA256:RSA+SHA256:DSA+SHA256:ECDSA+SHA224:RSA+SHA224:DSA+SHA224:ECDSA+SHA1:RSA+SHA1:DSA+SHA1
Peer signing digest: SHA256
Peer signature type: RSA
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 5400 bytes and written 1620 bytes
Verification: OK
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES128-GCM-SHA256
    Session-ID: redacted
    Session-ID-ctx: 
    Master-Key: redacted
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1594729814
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: yes
---
@earlephilhower
Collaborator

Just FYI, the certs that BearSSL is dumping are the public certs sent by the remote side, so there's not much reason to redact them. No private keys are in there.

Since your MCVE can't be run by anyone but you (private client certs, etc.), it's going to be on you to debug this.

It could be 1) a BearSSL protocol issue (unlikely, but possible), 2) a timeout on the AMZN server (negotiation takes a long time on the 8266 vs. a desktop) or on the 8266 side, or 3) something else.

For 1) I would suggest you use the tests/host Makefile to build natively using the same code as your sketch. See the README.txt for more info. That will use only BearSSL and native TCP/IP to run your code.

For 2), you can look at the handshake times on the core (get millis before connect and after connect fails). Not sure what you would be able to do about it, but if you're taking 10 seconds to do the handshake, AMZN may think you're a DoS bot and drop the line.

For 3), I would suggest setting up a local Mosquitto server with your client certs and running against that. You can use Wireshark on the server to capture the handshake packets between the two. I've run that a lot, actually, without issue. If your code has trouble there, then I'd look at something in the sketch as the culprit.
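
The handshake timing suggested in 2) could be measured along the following lines. This is only a host-side sketch using std::chrono; on the device the same idea would be two millis() reads around the connect call. `timeCallMs` and `fakeConnect` are hypothetical names introduced for illustration, and `fakeConnect` is a stand-in that sleeps rather than performing a real TLS handshake.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <utility>

// Hypothetical helper: runs `fn` once and returns the elapsed time in
// milliseconds, measured with a monotonic clock.
template <typename Fn>
int64_t timeCallMs(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)
      .count();
}

// Stand-in for a slow, failing connect attempt (assumption: this only
// sleeps; it does not open a socket or negotiate TLS).
inline bool fakeConnect() {
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  return false;
}
```

Calling `timeCallMs([] { fakeConnect(); })` returns the time spent in the stand-in; comparing that figure for the real connect call against any server-side limit would show whether a timeout is plausible.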

I would also suggest using https://gitter.im/esp8266/Arduino or https://esp8266.com . I see you posted on gitter already, but you might want to start with "Anyone have a working AMZN MQTT" vs. the specific error your code's hitting to get better response.

Good luck!

@earlephilhower earlephilhower added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Jul 14, 2020
@Gor-Ren
Author

Gor-Ren commented Jul 14, 2020

Thanks for your feedback @earlephilhower! My bad redacting public certs :-)

I am generally following the AWS documentation for MQTT over SSL on an ESP32. The only real difference I encountered on the ESP8266 is that the interface for setCertificate/setPrivateKey is slightly different on the ESP8266's WiFiClientSecure.

I've looked into a few more things with no joy yet:

  • It was easiest to investigate your timeout theory (2) - I'm seeing ~895 to ~950 ms for the connect call at 160 MHz, which seems fine (?)
  • tried specifying the cert/key in binary format following this example but got the same error behaviour.
  • double-checked that all the permissions and policies on the AWS IoT console are properly granted for the certificate
  • set the clientId arg in the connect call to match exactly the AWS thingName (shouldn't strictly be necessary, but can't hurt)

I will look into your other suggestions next. Thanks again.

@Jeroen88
Contributor

@Gor-Ren You could also add some debug statements in _run_until. Looking at the BearSSL source I think this function will return a -1 in your case. Some returns from this function will generate a log, but a lot don't. Maybe you can drill down the issue in this way.

@earlephilhower earlephilhower changed the title WifiClientSecure errors with BSSL:_wait_for_handshake: failed when establishing SSL connection Client certs removed from connection when setInsecure called, causing SSL connect failure. Jul 17, 2020
@earlephilhower
Collaborator

I built a standalone HTTPS test and a Flask server requiring client certs, and it worked fine. Without the client cert, it failed. With the cert, I connected and got an HTTPS GET request serviced.

But the order of setInsecure and setClientRSACert is the issue.

        client.setInsecure();
        client.setClientRSACert(&client_crt, &client_pk);

is good.

        client.setClientRSACert(&client_crt, &client_pk);
        client.setInsecure();

failed because setInsecure clears the client cert, too. Change the order in your app and it probably will get going.

I think that it may make sense to NOT touch client certs when setInsecure is called. The insecurity is on verifying the server at the other end, not stopping us from identifying ourselves (w/a public cert which is safe if it escapes into the wild).

earlephilhower added a commit to earlephilhower/Arduino that referenced this issue Jul 17, 2020
WiFiClientSecure.setInsecure() was clearing the secret key (but not the
_chain public client cert) incorrectly.  The other server authentication
modes also had the same effect.

The only way for it to work would be if the app first set the server
authentication method and then the client keys.  There's no good reason
for this.

Adjust the connection to only clear the server id methods and leave the
client ID untouched.

Fixes esp8266#7455
@devyte
Collaborator

devyte commented Jul 17, 2020

@earlephilhower I remember that clearing the cert on setInsecure() was implemented on purpose, i.e. you had a good reason for it.
I consider the current behavior correct: the methods are imperative and not declarative, so whichever is called last rules.
But this is your area, so your choice.

@earlephilhower
Collaborator

I vaguely remember the discussion, but I am pretty sure the current setup doesn't make sense.

setInsecure/setFingerprint/setKnownKey/setTrustAnchors all refer to the server identification. They're mutually exclusive, so I factored out the clearAuthentication() method to call before applying them.

The client public cert/secret key only refer to the client proving it's who it says it is. It's not really related to the prior 4 calls and should be orthogonal to them.

I think I just goofed. For example, the current setup clears the secret key but not the cert, so I send crap to the BearSSL backend (which then just doesn't send any client cert). It should either clear them both or neither.
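
The ordering dependency described in this thread can be illustrated with a simplified host-side model. This is not the library's code: `ClientModel`, its fields, and the `clearClientKeyToo` switch are hypothetical names chosen only to mimic the behaviour described here, in which the shared clear step wipes the secret key but leaves the client cert chain in place.

```cpp
#include <string>

// Simplified, hypothetical model of the behaviour discussed in this issue.
// It is NOT the WiFiClientSecure implementation.
struct ClientModel {
  bool clearClientKeyToo;  // true: reported behaviour; false: proposed fix
  bool insecure = false;   // server-identification mode
  std::string chain;       // client public cert chain
  std::string secretKey;   // client private key

  explicit ClientModel(bool buggy) : clearClientKeyToo(buggy) {}

  // Shared reset run before a server-identification mode is applied.
  void clearAuthentication() {
    insecure = false;
    if (clearClientKeyToo) {
      secretKey.clear();  // the chain is deliberately left untouched
    }
  }

  void setInsecure() {
    clearAuthentication();
    insecure = true;
  }

  void setClientRSACert(const std::string& cert, const std::string& key) {
    chain = cert;
    secretKey = key;
  }

  // A client cert can only be presented if both halves are still set.
  bool canPresentClientCert() const {
    return !chain.empty() && !secretKey.empty();
  }
};
```

In this model, `ClientModel(true)` loses the ability to present a client cert when setInsecure() is called after setClientRSACert(), while the reverse order works; `ClientModel(false)` keeps the client identity regardless of call order, which is the outcome argued for above.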

@Gor-Ren
Author

Gor-Ren commented Jul 17, 2020

I can confirm swapping the order of the setClientRSACert() and setInsecure() calls fixed the problem and I was able to successfully connect to the AWS endpoint and subsequently publish MQTT messages. Thanks for the help.

earlephilhower added a commit that referenced this issue Jul 17, 2020
WiFiClientSecure.setInsecure() was clearing the secret key (but not the
_chain public client cert) incorrectly.  The other server authentication
modes also had the same effect.

The only way for it to work would be if the app first set the server
authentication method and then the client keys.  There's no good reason
for this.

Adjust the connection to only clear the server id methods and leave the
client ID untouched.

Fixes #7455
@moritzlerch

Hey, any updates on this? Unfortunately I'm getting a _wait_for_handshake: failed error, too. Please look at this:

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP-12F
  • Core Version: SDK:2.2.2-dev(38a443e)/Core:3.0.1=30001000/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-48-g7421258/BearSSL:c0b69df
  • Development Env: Platformio
  • Operating System: Windows

Settings in IDE

  • Module: Wemos D1 mini r2
  • Flash Mode: qio
  • Flash Size: 4MB
  • lwip Variant: v2 Lower Memory
  • Reset Method: ck
  • Flash Frequency: 40MHz
  • CPU Frequency: 80MHz
  • Upload Using: SERIAL
  • Upload Speed: 115200

Problem Description

I am having trouble completing the handshake with my Tesla Powerwall (a solar battery storage system), which I'm building a project around. The Powerwall received an update, and since then I can no longer connect to it: I get the _wait_for_handshake: failed error. However, the same request with curl on my PC works just fine, and a Raspberry Pi also works. So the problem seems to be related to the BearSSL WiFiClientSecure library.

Getting stuck here:

(In my project in lib/Powerwall/Powerwall.h:54)

const char* powerwall_ip = "192.168.178.38";  // address of the Powerwall on the local network

WiFiClientSecure httpsClient;
httpsClient.setInsecure();
httpsClient.setTimeout(10000);
int retry = 0;

while ((!httpsClient.connect(powerwall_ip, 443)) && (retry < 15)) {
    delay(100);
    Serial.print(".");
    retry++;
}

if (retry >= 15) {
    return ("CONN-FAIL");
}

Debug Messages

(DEV: doing GET-request to 192.168.178.38/api/system_status/soe)
:ref 1
BSSL:_connectSSL: start connection

_iobuf_in:       0x3fff1744
_iobuf_out:      0x3fffb2a4
_iobuf_in_size:  16709
_iobuf_out_size: 597
:wr 137 0
:wrc 137 137 0
:ack 137
:rcl pb=0 sz=-1
:abort

BSSL:_wait_for_handshake: failed
BSSL:Couldn't connect. Error = 'Unknown error code.'
.:ur 1
:dsrcv 0
:del

Debugging

I did some debugging with a friend. Starting from the _wait_for_handshake function (bool) in the WiFiClientSecure class, we got to the _run_until function and searched for the -1. We found that the connection is failing here:

if (!(_client->state() == ESTABLISHED) && !WiFiClient::available()) {
  return (state & target) ? 0 : -1;
}

After this we printed the values of state and target via DEBUG_BSSL and found that they were 4 and 8. Since 4 & 8 is 0, the expression above returns -1, so this appears to be where the failure is reported.

These would be the two values:

/** \brief SSL engine state: engine may receive records from the peer. */
#define BR_SSL_RECVREC   0x0004
/** \brief SSL engine state: engine may accept application data to send. */
#define BR_SSL_SENDAPP   0x0008

Perhaps someone can help.
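
The arithmetic behind that observation can be checked in isolation. The sketch below reuses only the two constants quoted above; `runUntilResult` is a hypothetical name that mirrors the quoted return expression and is not the library function itself.

```cpp
// Constants as quoted above from the BearSSL header.
#define BR_SSL_RECVREC 0x0004
#define BR_SSL_SENDAPP 0x0008

// Hypothetical stand-in for the quoted return expression:
// 0 when the engine state contains the target flag, -1 otherwise.
inline int runUntilResult(unsigned state, unsigned target) {
  return (state & target) ? 0 : -1;
}
```

With state = BR_SSL_RECVREC and target = BR_SSL_SENDAPP the result is -1. Read that way, the engine was still expecting records from the peer when the TCP connection went away, which would be consistent with the `:abort` line in the log; the -1 itself looks like a symptom of the dropped connection rather than its cause.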

@ElectricBeat

@earlephilhower

Hello sir, and first of all sorry for my poor English.

I am working with a NodeMCU 1.0 board, v2.4.2. I uploaded our CA certificate, private key certificate and secure key to my board, and it works fine.
But when I upgrade the board's version, I get the handshake error and rc = -2.

How can I rectify this error?

@cjacky475

@ElectricBeat, did you fix the problem with rc = -2?
