Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault in libreswan 3.32 #390

Closed
markdgray opened this issue Jan 9, 2021 · 21 comments
Closed

Segmentation fault in libreswan 3.32 #390

markdgray opened this issue Jan 9, 2021 · 21 comments

Comments

@markdgray
Copy link

kernel: pluto[7958]: segfault at 0 ip 00005562dc0fa6f6 sp 00007ffe9fc46a50 error 4 in pluto[5562dc067000+18c000]

Unfortunately, I don't have a core dump but I do have the location:

addr2line -f -a -e /lib/debug/usr/libexec/ipsec/pluto-3.32-7.el8_3.x86_64.debug +0x936f6
0x00000000000936f6
ikev1_out_sa
/usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/ikev1_spdb_struct.c:396

It looks like 'sadb' is null?

 386         if (oakley_mode) {
 387                 /*
 388                  * Construct the proposals by combining ALG_INFO_IKE
 389                  * with the AUTH (proof of identity) extracted from
 390                  * the (default?) SADB.  As if by magic, attrs[2] is
 391                  * always the authentication method.
 392                  *
 393                  * XXX: Should replace SADB with a simple map to the
 394                  * auth method.
 395                  */
 396                 struct db_attr *auth = &sadb->prop_conjs[0].props[0].trans[0].attrs[2];
 397                 passert(auth->type.oakley == OAKLEY_AUTHENTICATION_METHOD);
 398                 enum ikev1_auth_method auth_method = auth->val;
@markdgray
Copy link
Author

I managed to get some logs. Luckily they are short for this node in the cluster:

Jan  8 07:52:58.012091: NSS DB directory: sql:/etc/ipsec.d
Jan  8 07:52:58.013147: Initializing NSS
Jan  8 07:52:58.013170: Opening NSS database "sql:/etc/ipsec.d" read-only
Jan  8 07:52:58.131825: NSS crypto library initialized
Jan  8 07:52:58.131849: FIPS Mode: NO
Jan  8 07:52:58.131854: FIPS mode disabled for pluto daemon
Jan  8 07:52:58.131858: FIPS HMAC integrity support [disabled]
Jan  8 07:52:58.131971: libcap-ng support [enabled]
Jan  8 07:52:58.131979: Linux audit support [enabled]
Jan  8 07:52:58.132302: Linux audit activated
Jan  8 07:52:58.132322: Starting Pluto (Libreswan Version 3.32 XFRM(netkey) XFRMI esp-hw-offload FORK PTHREAD_SETSCHEDPRIO GCC_EXCEPTIONS NSS (IPsec profile) (NSS-PRF) DNSSEC SYSTEMD_WATCHDOG LABELED_IPSEC SECCOMP LIBCAP_NG LINUX_AUDIT XAUTH_PAM NETWORKMANAGER CURL(non-NSS) LDAP(non-NSS)) pid:180
Jan  8 07:52:58.132327: core dump dir: /run/pluto
Jan  8 07:52:58.132330: secrets file: /etc/ipsec.secrets
Jan  8 07:52:58.132334: leak-detective enabled
Jan  8 07:52:58.132337: NSS crypto [enabled]
Jan  8 07:52:58.132340: XAUTH PAM support [enabled]
Jan  8 07:52:58.132462: Initializing libevent in pthreads mode: headers: 2.1.8-stable (2010800); library: 2.1.8-stable (2010800)
Jan  8 07:52:58.132535: NAT-Traversal support  [enabled]
Jan  8 07:52:58.132647: Encryption algorithms:
Jan  8 07:52:58.132659:   AES_CCM_16              IKEv1:     ESP     IKEv2:     ESP     FIPS  {256,192,*128}  aes_ccm, aes_ccm_c
Jan  8 07:52:58.132665:   AES_CCM_12              IKEv1:     ESP     IKEv2:     ESP     FIPS  {256,192,*128}  aes_ccm_b
Jan  8 07:52:58.132671:   AES_CCM_8               IKEv1:     ESP     IKEv2:     ESP     FIPS  {256,192,*128}  aes_ccm_a
Jan  8 07:52:58.132676:   3DES_CBC                IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS  [*192]  3des
Jan  8 07:52:58.132681:   CAMELLIA_CTR            IKEv1:     ESP     IKEv2:     ESP           {256,192,*128}
Jan  8 07:52:58.132687:   CAMELLIA_CBC            IKEv1: IKE ESP     IKEv2: IKE ESP           {256,192,*128}  camellia
Jan  8 07:52:58.132693:   AES_GCM_16              IKEv1:     ESP     IKEv2: IKE ESP     FIPS  {256,192,*128}  aes_gcm, aes_gcm_c
Jan  8 07:52:58.132698:   AES_GCM_12              IKEv1:     ESP     IKEv2: IKE ESP     FIPS  {256,192,*128}  aes_gcm_b
Jan  8 07:52:58.132704:   AES_GCM_8               IKEv1:     ESP     IKEv2: IKE ESP     FIPS  {256,192,*128}  aes_gcm_a
Jan  8 07:52:58.132709:   AES_CTR                 IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS  {256,192,*128}  aesctr
Jan  8 07:52:58.132714:   AES_CBC                 IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS  {256,192,*128}  aes
Jan  8 07:52:58.132720:   SERPENT_CBC             IKEv1: IKE ESP     IKEv2: IKE ESP           {256,192,*128}  serpent
Jan  8 07:52:58.132725:   TWOFISH_CBC             IKEv1: IKE ESP     IKEv2: IKE ESP           {256,192,*128}  twofish
Jan  8 07:52:58.132730:   TWOFISH_SSH             IKEv1: IKE         IKEv2: IKE ESP           {256,192,*128}  twofish_cbc_ssh
Jan  8 07:52:58.132736:   NULL_AUTH_AES_GMAC      IKEv1:     ESP     IKEv2:     ESP     FIPS  {256,192,*128}  aes_gmac
Jan  8 07:52:58.132740:   NULL                    IKEv1:     ESP     IKEv2:     ESP           []
Jan  8 07:52:58.132745:   CHACHA20_POLY1305       IKEv1:             IKEv2: IKE ESP           [*256]  chacha20poly1305
Jan  8 07:52:58.132755: Hash algorithms:
Jan  8 07:52:58.132759:   MD5                     IKEv1: IKE         IKEv2:                 
Jan  8 07:52:58.132764:   SHA1                    IKEv1: IKE         IKEv2:             FIPS  sha
Jan  8 07:52:58.132768:   SHA2_256                IKEv1: IKE         IKEv2:             FIPS  sha2, sha256
Jan  8 07:52:58.132773:   SHA2_384                IKEv1: IKE         IKEv2:             FIPS  sha384
Jan  8 07:52:58.132777:   SHA2_512                IKEv1: IKE         IKEv2:             FIPS  sha512
Jan  8 07:52:58.132799: PRF algorithms:
Jan  8 07:52:58.132804:   HMAC_MD5                IKEv1: IKE         IKEv2: IKE               md5
Jan  8 07:52:58.132809:   HMAC_SHA1               IKEv1: IKE         IKEv2: IKE         FIPS  sha, sha1
Jan  8 07:52:58.132821:   HMAC_SHA2_256           IKEv1: IKE         IKEv2: IKE         FIPS  sha2, sha256, sha2_256
Jan  8 07:52:58.132826:   HMAC_SHA2_384           IKEv1: IKE         IKEv2: IKE         FIPS  sha384, sha2_384
Jan  8 07:52:58.132831:   HMAC_SHA2_512           IKEv1: IKE         IKEv2: IKE         FIPS  sha512, sha2_512
Jan  8 07:52:58.132836:   AES_XCBC                IKEv1:             IKEv2: IKE               aes128_xcbc
Jan  8 07:52:58.132871: Integrity algorithms:
Jan  8 07:52:58.132876:   HMAC_MD5_96             IKEv1: IKE ESP AH  IKEv2: IKE ESP AH        md5, hmac_md5
Jan  8 07:52:58.132882:   HMAC_SHA1_96            IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  sha, sha1, sha1_96, hmac_sha1
Jan  8 07:52:58.132888:   HMAC_SHA2_512_256       IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  sha512, sha2_512, sha2_512_256, hmac_sha2_512
Jan  8 07:52:58.132894:   HMAC_SHA2_384_192       IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  sha384, sha2_384, sha2_384_192, hmac_sha2_384
Jan  8 07:52:58.132900:   HMAC_SHA2_256_128       IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  sha2, sha256, sha2_256, sha2_256_128, hmac_sha2_256
Jan  8 07:52:58.132904:   HMAC_SHA2_256_TRUNCBUG  IKEv1:     ESP AH  IKEv2:         AH      
Jan  8 07:52:58.132909:   AES_XCBC_96             IKEv1:     ESP AH  IKEv2: IKE ESP AH        aes_xcbc, aes128_xcbc, aes128_xcbc_96
Jan  8 07:52:58.132914:   AES_CMAC_96             IKEv1:     ESP AH  IKEv2:     ESP AH  FIPS  aes_cmac
Jan  8 07:52:58.132919:   NONE                    IKEv1:     ESP     IKEv2: IKE ESP     FIPS  null
Jan  8 07:52:58.132936: DH algorithms:
Jan  8 07:52:58.132941:   NONE                    IKEv1:             IKEv2: IKE ESP AH  FIPS  null, dh0
Jan  8 07:52:58.132946:   MODP1024                IKEv1: IKE ESP AH  IKEv2: IKE ESP AH        dh2
Jan  8 07:52:58.132950:   MODP1536                IKEv1: IKE ESP AH  IKEv2: IKE ESP AH        dh5
Jan  8 07:52:58.132955:   MODP2048                IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  dh14
Jan  8 07:52:58.132959:   MODP3072                IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  dh15
Jan  8 07:52:58.132963:   MODP4096                IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  dh16
Jan  8 07:52:58.132968:   MODP6144                IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  dh17
Jan  8 07:52:58.132972:   MODP8192                IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS  dh18
Jan  8 07:52:58.132977:   DH19                    IKEv1: IKE         IKEv2: IKE ESP AH  FIPS  ecp_256, ecp256
Jan  8 07:52:58.132982:   DH20                    IKEv1: IKE         IKEv2: IKE ESP AH  FIPS  ecp_384, ecp384
Jan  8 07:52:58.132986:   DH21                    IKEv1: IKE         IKEv2: IKE ESP AH  FIPS  ecp_521, ecp521
Jan  8 07:52:58.132991:   DH31                    IKEv1: IKE         IKEv2: IKE ESP AH        curve25519
Jan  8 07:52:58.132994: testing CAMELLIA_CBC:
Jan  8 07:52:58.132998:   Camellia: 16 bytes with 128-bit key
Jan  8 07:52:58.133077:   Camellia: 16 bytes with 128-bit key
Jan  8 07:52:58.133116:   Camellia: 16 bytes with 256-bit key
Jan  8 07:52:58.133157:   Camellia: 16 bytes with 256-bit key
Jan  8 07:52:58.133195: testing AES_GCM_16:
Jan  8 07:52:58.133199:   empty string
Jan  8 07:52:58.133236:   one block
Jan  8 07:52:58.133295:   two blocks
Jan  8 07:52:58.133337:   two blocks with associated data
Jan  8 07:52:58.133373: testing AES_CTR:
Jan  8 07:52:58.133377:   Encrypting 16 octets using AES-CTR with 128-bit key
Jan  8 07:52:58.133415:   Encrypting 32 octets using AES-CTR with 128-bit key
Jan  8 07:52:58.133455:   Encrypting 36 octets using AES-CTR with 128-bit key
Jan  8 07:52:58.133496:   Encrypting 16 octets using AES-CTR with 192-bit key
Jan  8 07:52:58.133534:   Encrypting 32 octets using AES-CTR with 192-bit key
Jan  8 07:52:58.133574:   Encrypting 36 octets using AES-CTR with 192-bit key
Jan  8 07:52:58.133622:   Encrypting 16 octets using AES-CTR with 256-bit key
Jan  8 07:52:58.133659:   Encrypting 32 octets using AES-CTR with 256-bit key
Jan  8 07:52:58.133696:   Encrypting 36 octets using AES-CTR with 256-bit key
Jan  8 07:52:58.133742: testing AES_CBC:
Jan  8 07:52:58.133746:   Encrypting 16 bytes (1 block) using AES-CBC with 128-bit key
Jan  8 07:52:58.133784:   Encrypting 32 bytes (2 blocks) using AES-CBC with 128-bit key
Jan  8 07:52:58.133826:   Encrypting 48 bytes (3 blocks) using AES-CBC with 128-bit key
Jan  8 07:52:58.133868:   Encrypting 64 bytes (4 blocks) using AES-CBC with 128-bit key
Jan  8 07:52:58.133920: testing AES_XCBC:
Jan  8 07:52:58.133924:   RFC 3566 Test Case 1: AES-XCBC-MAC-96 with 0-byte input
Jan  8 07:52:58.134087:   RFC 3566 Test Case 2: AES-XCBC-MAC-96 with 3-byte input
Jan  8 07:52:58.134279:   RFC 3566 Test Case 3: AES-XCBC-MAC-96 with 16-byte input
Jan  8 07:52:58.134449:   RFC 3566 Test Case 4: AES-XCBC-MAC-96 with 20-byte input
Jan  8 07:52:58.134618:   RFC 3566 Test Case 5: AES-XCBC-MAC-96 with 32-byte input
Jan  8 07:52:58.134788:   RFC 3566 Test Case 6: AES-XCBC-MAC-96 with 34-byte input
Jan  8 07:52:58.135007:   RFC 3566 Test Case 7: AES-XCBC-MAC-96 with 1000-byte input
Jan  8 07:52:58.135393:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 16)
Jan  8 07:52:58.135560:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 10)
Jan  8 07:52:58.135773:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 18)
Jan  8 07:52:58.136097: testing HMAC_MD5:
Jan  8 07:52:58.136102:   RFC 2104: MD5_HMAC test 1
Jan  8 07:52:58.140853:   RFC 2104: MD5_HMAC test 2
Jan  8 07:52:58.141107:   RFC 2104: MD5_HMAC test 3
Jan  8 07:52:58.141415: 4 CPU cores online
Jan  8 07:52:58.141424: starting up 3 crypto helpers
Jan  8 07:52:58.141487: started thread for crypto helper 0
Jan  8 07:52:58.141522: started thread for crypto helper 1
Jan  8 07:52:58.141552: started thread for crypto helper 2
Jan  8 07:52:58.141566: Using Linux XFRM/NETKEY IPsec kernel support code on 4.18.0-240.10.1.el8_3.x86_64
Jan  8 07:52:58.142132: selinux support is NOT enabled.
Jan  8 07:52:58.148293: systemd watchdog not enabled - not sending watchdog keepalives
Jan  8 07:52:58.216279: listening for IKE messages
Jan  8 07:52:58.219473: Kernel supports NIC esp-hw-offload
Jan  8 07:52:58.219502: adding interface ovn-k8s-gw0/ovn-k8s-gw0 (esp-hw-offload not supported by kernel) 169.254.0.1:500
Jan  8 07:52:58.219584: adding interface ovn-k8s-gw0/ovn-k8s-gw0 169.254.0.1:4500
Jan  8 07:52:58.219615: adding interface ovn-k8s-mp0/ovn-k8s-mp0 (esp-hw-offload not supported by kernel) 10.129.0.2:500
Jan  8 07:52:58.219641: adding interface ovn-k8s-mp0/ovn-k8s-mp0 10.129.0.2:4500
Jan  8 07:52:58.219669: adding interface br-ex/br-ex (esp-hw-offload not supported by kernel) 10.0.164.113:500
Jan  8 07:52:58.219695: adding interface br-ex/br-ex 10.0.164.113:4500
Jan  8 07:52:58.219724: adding interface lo/lo (esp-hw-offload not supported by kernel) 127.0.0.1:500
Jan  8 07:52:58.219751: adding interface lo/lo 127.0.0.1:4500
Jan  8 07:52:58.219944: adding interface lo/lo (esp-hw-offload not supported by kernel) [::1]:500
Jan  8 07:52:58.230436: loading secrets from "/etc/ipsec.secrets"
Jan  8 07:52:58.230517: no secrets filename matched "/etc/ipsec.d/*.secrets"
Jan  8 07:52:59.322396: packet from 10.0.244.89:500: initial parent SA message received on 10.0.164.113:500 but no suitable connection found with IKEv2 policy
Jan  8 07:52:59.322432: packet from 10.0.244.89:500: responding to IKE_SA_INIT (34) message (Message ID 0) from 10.0.244.89:500 with unencrypted notification NO_PROPOSAL_CHOSEN
Jan  8 07:52:59.919060: loading secrets from "/etc/ipsec.secrets"
Jan  8 07:53:00.095147: added connection description "ovn-cc3669-0-in-1"
Jan  8 07:53:00.108411: "ovn-cc3669-0-in-1" #1: initiating IKEv2 IKE SA
Jan  8 07:53:00.108441: "ovn-cc3669-0-in-1": local IKE proposals (IKE SA initiator selecting KE): 
Jan  8 07:53:00.108452: "ovn-cc3669-0-in-1":   1:IKE=AES_GCM_C_256-HMAC_SHA2_256-NONE-MODP2048+MODP3072+MODP4096+MODP8192+ECP_256+ECP_384+ECP_521+CURVE25519
Jan  8 07:53:00.117315: "ovn-cc3669-0-in-1" #1: STATE_PARENT_I1: sent v2I1, expected v2R1
Jan  8 07:53:00.117574: "ovn-cc3669-0-in-1" #1: ERROR: asynchronous network error report on br-ex (10.0.164.113:500) for message to 10.0.175.161 port 500, complainant 10.0.175.161: Connection refused [errno 111, origin ICMP type 3 code 3 (not authenticated)]
Jan  8 07:53:00.189356: added connection description "ovn-cc3669-0-out-1"
Jan  8 07:53:00.207400: "ovn-cc3669-0-out-1": queuing pending IPsec SA negotiating with 10.0.175.161 IKE SA #1 "ovn-cc3669-0-in-1"
Jan  8 07:53:00.292386: added connection description "ovn-b486b9-0-in-1"
Jan  8 07:53:00.309134: "ovn-b486b9-0-in-1" #2: initiating IKEv2 IKE SA
Jan  8 07:53:00.309188: "ovn-b486b9-0-in-1": local IKE proposals (IKE SA initiator selecting KE): 
Jan  8 07:53:00.309199: "ovn-b486b9-0-in-1":   1:IKE=AES_GCM_C_256-HMAC_SHA2_256-NONE-MODP2048+MODP3072+MODP4096+MODP8192+ECP_256+ECP_384+ECP_521+CURVE25519
Jan  8 07:53:00.313353: "ovn-b486b9-0-in-1" #2: STATE_PARENT_I1: sent v2I1, expected v2R1
Jan  8 07:53:00.342978: "ovn-b486b9-0-in-1": local ESP/AH proposals (IKE SA initiator emitting ESP/AH proposals): 
Jan  8 07:53:00.343001: "ovn-b486b9-0-in-1":   1:ESP=AES_GCM_C_256-NONE-NONE-DISABLED
Jan  8 07:53:00.343059: "ovn-b486b9-0-in-1" #3: STATE_PARENT_I2: sent v2I2, expected v2R2 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}
Jan  8 07:53:00.386374: added connection description "ovn-b486b9-0-out-1"
Jan  8 07:53:00.398372: "ovn-b486b9-0-out-1": queuing pending IPsec SA negotiating with 10.0.244.89 IKE SA #2 "ovn-b486b9-0-in-1"
Jan  8 07:53:00.409440: loading root certificate cache
Jan  8 07:53:00.410751: "ovn-b486b9-0-in-1" #3: certificate verified OK: CN=b486b9ee-22dc-4c40-a21b-db485ac0d234,OU=kind,O=ovnkubernetes,C=US
Jan  8 07:53:00.411051: "ovn-b486b9-0-in-1" #3: IKEv2 mode peer ID is ID_FQDN: '@b486b9ee-22dc-4c40-a21b-db485ac0d234'
Jan  8 07:53:00.411196: "ovn-b486b9-0-in-1" #3: Authenticated using RSA with IKEv2_AUTH_HASH_SHA2_512
Jan  8 07:53:00.570332: "ovn-b486b9-0-in-1" #3: negotiated connection [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.244.89-10.0.244.89:0-65535 17]
Jan  8 07:53:00.570361: "ovn-b486b9-0-in-1" #3: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0xbae81f3f <0x95e0d8ea xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=passive}
Jan  8 07:53:00.570406: "ovn-b486b9-0-out-1": local ESP/AH proposals (ESP/AH initiator emitting proposals): 
Jan  8 07:53:00.570414: "ovn-b486b9-0-out-1":   1:ESP=AES_GCM_C_256-NONE-MODP2048-DISABLED
Jan  8 07:53:00.573153: "ovn-b486b9-0-out-1" #4: STATE_V2_CREATE_I: sent IPsec Child req wait response
Jan  8 07:53:00.617619: "ovn-cc3669-0-in-1" #1: STATE_PARENT_I1: retransmission; will wait 0.5 seconds for response
Jan  8 07:53:00.618412: "ovn-cc3669-0-in-1" #1: ERROR: asynchronous network error report on br-ex (10.0.164.113:500) for message to 10.0.175.161 port 500, complainant 10.0.175.161: Connection refused [errno 111, origin ICMP type 3 code 3 (not authenticated)]
Jan  8 07:53:00.770914: "ovn-b486b9-0-out-1" #4: negotiated connection [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
Jan  8 07:53:00.770941: "ovn-b486b9-0-out-1" #4: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0x394e9e2a <0x4040eba9 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}
Jan  8 07:53:01.119317: "ovn-cc3669-0-in-1" #1: STATE_PARENT_I1: retransmission; will wait 1 seconds for response
Jan  8 07:53:01.119566: "ovn-cc3669-0-in-1" #1: ERROR: asynchronous network error report on br-ex (10.0.164.113:500) for message to 10.0.175.161 port 500, complainant 10.0.175.161: Connection refused [errno 111, origin ICMP type 3 code 3 (not authenticated)]
Jan  8 07:53:02.120317: "ovn-cc3669-0-in-1" #1: STATE_PARENT_I1: retransmission; will wait 2 seconds for response
Jan  8 07:53:02.120535: "ovn-cc3669-0-in-1" #1: ERROR: asynchronous network error report on br-ex (10.0.164.113:500) for message to 10.0.175.161 port 500, complainant 10.0.175.161: Connection refused [errno 111, origin ICMP type 3 code 3 (not authenticated)]
Jan  8 07:53:03.502192: "ovn-cc3669-0-out-1": local IKE proposals (IKE SA responder matching remote proposals): 
Jan  8 07:53:03.502236: "ovn-cc3669-0-out-1":   1:IKE=AES_GCM_C_256-HMAC_SHA2_256-NONE-MODP2048+MODP3072+MODP4096+MODP8192+ECP_256+ECP_384+ECP_521+CURVE25519
Jan  8 07:53:03.502248: "ovn-cc3669-0-out-1" #5: proposal 1:IKE=AES_GCM_C_256-HMAC_SHA2_256-MODP2048 chosen from remote proposals 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;DH=MODP2048;DH=MODP3072;DH=MODP4096;DH=MODP8192;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519[first-match]
Jan  8 07:53:03.504464: "ovn-cc3669-0-out-1" #5: STATE_PARENT_R1: received v2I1, sent v2R1 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}
Jan  8 07:53:03.510607: "ovn-cc3669-0-out-1" #5: processing decrypted IKE_AUTH request: SK{IDi,CERT,CERTREQ,IDr,AUTH,SA,TSi,TSr,N}
Jan  8 07:53:03.511318: "ovn-cc3669-0-out-1" #5: certificate verified OK: CN=cc366937-ef3b-4cf1-aed8-47348b573c8d,OU=kind,O=ovnkubernetes,C=US
Jan  8 07:53:03.511504: "ovn-cc3669-0-out-1" #5: IKEv2 mode peer ID is ID_FQDN: '@cc366937-ef3b-4cf1-aed8-47348b573c8d'
Jan  8 07:53:03.511600: "ovn-cc3669-0-out-1" #5: Authenticated using RSA with IKEv2_AUTH_HASH_SHA2_512
Jan  8 07:53:03.514566: "ovn-cc3669-0-out-1": local ESP/AH proposals (IKE_AUTH responder matching remote ESP/AH proposals): 
Jan  8 07:53:03.514584: "ovn-cc3669-0-out-1":   1:ESP=AES_GCM_C_256-NONE-NONE-DISABLED
Jan  8 07:53:03.514593: "ovn-cc3669-0-out-1" #5: proposal 1:ESP=AES_GCM_C_256-DISABLED SPI=f61da21c chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;ESN=DISABLED[first-match]
Jan  8 07:53:03.560218: "ovn-cc3669-0-out-1" #6: negotiated connection [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.175.161-10.0.175.161:6081-6081 17]
Jan  8 07:53:03.560240: "ovn-cc3669-0-out-1" #6: STATE_V2_IPSEC_R: IPsec SA established transport mode {ESP=>0xf61da21c <0x2c6a6e0b xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=passive}
Jan  8 07:53:03.630088: "ovn-cc3669-0-out-1": local ESP/AH proposals (CREATE_CHILD_SA responder matching remote ESP/AH proposals): 
Jan  8 07:53:03.630120: "ovn-cc3669-0-out-1":   1:ESP=AES_GCM_C_256-NONE-MODP2048-DISABLED
Jan  8 07:53:03.630130: "ovn-cc3669-0-out-1" #5: proposal 1:ESP=AES_GCM_C_256-MODP2048-DISABLED SPI=d035cc8a chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED[first-match]
Jan  8 07:53:03.675445: "ovn-cc3669-0-in-1" #7: negotiated new IPsec SA [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.175.161-10.0.175.161:0-65535 17]
Jan  8 07:53:03.675488: "ovn-cc3669-0-in-1" #7: negotiated connection [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.175.161-10.0.175.161:0-65535 17]
Jan  8 07:53:03.675496: "ovn-cc3669-0-in-1" #7: STATE_V2_IPSEC_R: IPsec SA established transport mode {ESP=>0xd035cc8a <0x433a1703 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}
Jan  8 07:53:04.120648: "ovn-cc3669-0-in-1" #1: STATE_PARENT_I1: retransmission; will wait 4 seconds for response
Jan  8 07:53:04.122945: "ovn-cc3669-0-in-1" #1: state superseded by #7 try=1, drop this negotiation
Jan  8 07:53:04.123034: "ovn-cc3669-0-in-1" #1: encountered fatal error in state STATE_PARENT_I1
Jan  8 07:53:04.123046: "ovn-cc3669-0-in-1" #1: deleting state (STATE_PARENT_I1) aged 4.014s and NOT sending notification
Jan  8 07:53:04.123061: "ovn-cc3669-0-in-1" #1: deleting IKE SA but connection is supposed to remain up; schedule EVENT_REVIVE_CONNS
Jan  8 07:53:04.123142: "ovn-cc3669-0-in-1": initiating connection which received a Delete/Notify but must remain up per local policy
Jan  8 07:53:04.123168: "ovn-cc3669-0-in-1": local ESP/AH proposals (ESP/AH initiator emitting proposals): 
Jan  8 07:53:04.123174: "ovn-cc3669-0-in-1":   1:ESP=AES_GCM_C_256-NONE-MODP2048-DISABLED
Jan  8 07:53:04.124861: "ovn-cc3669-0-in-1" #8: STATE_V2_CREATE_I: sent IPsec Child req wait response
Jan  8 07:53:04.129755: "ovn-cc3669-0-in-1" #8: negotiated connection [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.175.161-10.0.175.161:0-65535 17]
Jan  8 07:53:04.129778: "ovn-cc3669-0-in-1" #8: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0xeaaa0d52 <0xe13e7cd7 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}
Jan  8 07:53:07.322186: "ovn-b486b9-0-in-1" #9: proposal 1:IKE=AES_GCM_C_256-HMAC_SHA2_256-MODP2048 chosen from remote proposals 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;DH=MODP2048;DH=MODP3072;DH=MODP4096;DH=MODP8192;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519[first-match]
Jan  8 07:53:07.324112: "ovn-b486b9-0-in-1" #9: STATE_PARENT_R1: received v2I1, sent v2R1 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}
Jan  8 07:53:07.326603: "ovn-b486b9-0-in-1": local ESP/AH proposals (CREATE_CHILD_SA responder matching remote ESP/AH proposals): 
Jan  8 07:53:07.326619: "ovn-b486b9-0-in-1":   1:ESP=AES_GCM_C_256-NONE-MODP2048-DISABLED
Jan  8 07:53:07.326630: "ovn-b486b9-0-in-1" #2: proposal 1:ESP=AES_GCM_C_256-MODP2048-DISABLED SPI=9f4138fd chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED[first-match]
Jan  8 07:53:07.329244: "ovn-b486b9-0-out-1" #10: negotiated new IPsec SA [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
Jan  8 07:53:07.329279: "ovn-b486b9-0-out-1" #10: negotiated connection [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
Jan  8 07:53:07.329290: "ovn-b486b9-0-out-1" #10: STATE_V2_IPSEC_R: IPsec SA established transport mode {ESP=>0x9f4138fd <0x3ceded52 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}
Jan  8 07:56:24.127228: "ovn-cc3669-0-out-1" #5: proposal 1:ESP=AES_GCM_C_256-MODP2048-DISABLED SPI=de452a66 chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED[first-match]
Jan  8 07:56:24.134708: "ovn-cc3669-0-out-1" #11: negotiated new IPsec SA [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.175.161-10.0.175.161:6081-6081 17]
Jan  8 07:56:24.134742: "ovn-cc3669-0-out-1" #11: negotiated connection [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.175.161-10.0.175.161:6081-6081 17]
Jan  8 07:56:24.134753: "ovn-cc3669-0-out-1" #11: STATE_V2_IPSEC_R: IPsec SA established transport mode {ESP=>0xde452a66 <0x9f0a47de xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}
Jan  8 07:56:27.326302: "ovn-b486b9-0-in-1" #9: deleting incomplete state after 200.000 seconds
Jan  8 07:56:27.326335: "ovn-b486b9-0-in-1" #9: deleting state (STATE_PARENT_R1) aged 200.004s and NOT sending notification
Jan  8 07:56:27.326342: "ovn-b486b9-0-in-1" #9: deleting IKE SA but connection is supposed to remain up; schedule EVENT_REVIVE_CONNS
Jan  8 07:56:27.326388: "ovn-b486b9-0-in-1": initiating connection which received a Delete/Notify but must remain up per local policy
Jan  8 07:56:27.328765: "ovn-b486b9-0-in-1" #12: STATE_V2_CREATE_I: sent IPsec Child req wait response
Jan  8 07:56:27.334396: "ovn-b486b9-0-in-1" #12: negotiated connection [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.244.89-10.0.244.89:0-65535 17]
Jan  8 07:56:27.334417: "ovn-b486b9-0-in-1" #12: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0xcc7a838d <0x8b3975cb xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}
Jan  8 07:58:03.514851: destroying root certificate cache
Jan  8 07:59:50.461687: forgetting secrets
Jan  8 07:59:50.461752: loading secrets from "/etc/ipsec.secrets"
Jan  8 07:59:50.476991: "ovn-cc3669-0-out-1" #5: received Delete SA payload: replace IPsec State #11 now
Jan  8 07:59:50.477063: "ovn-cc3669-0-out-1" #5: STATE_PARENT_R2: received v2I2, PARENT SA established
Jan  8 07:59:50.477115: "ovn-cc3669-0-out-1" #11: deleting state (STATE_V2_IPSEC_R) aged 206.349s and NOT sending notification
Jan  8 07:59:50.477148: "ovn-cc3669-0-out-1" #11: ESP traffic information: in=0B out=1MB
Jan  8 07:59:50.489168: "ovn-b486b9-0-in-1" #2: received Delete SA payload: replace IPsec State #10 now
Jan  8 07:59:50.489250: "ovn-b486b9-0-in-1" #2: STATE_PARENT_I3: PARENT SA established
Jan  8 07:59:50.489301: "ovn-b486b9-0-in-1": terminating SAs using this connection
Jan  8 07:59:50.489321: "ovn-b486b9-0-in-1": IKE SA is shared - only terminating IPsec SA
Jan  8 07:59:50.489331: "ovn-b486b9-0-in-1" #12: deleting state (STATE_V2_IPSEC_I) aged 203.162s and sending notification
Jan  8 07:59:50.489362: "ovn-b486b9-0-in-1" #12: ESP traffic information: in=1MB out=0B
Jan  8 07:59:50.500285: "ovn-b486b9-0-in-1" #3: deleting state (STATE_V2_IPSEC_I) aged 410.180s and sending notification
Jan  8 07:59:50.500334: "ovn-b486b9-0-in-1" #3: ESP traffic information: in=699KB out=0B
Jan  8 07:59:50.500536: "ovn-b486b9-0-in-1" #2: deleting state (STATE_PARENT_I3) aged 410.191s and sending notification
Jan  8 07:59:50.519353: "ovn-cc3669-0-out-1" #5: received Delete SA payload: delete IPsec State #6 now
Jan  8 07:59:50.519382: "ovn-cc3669-0-out-1" #6: deleting other state #6 (STATE_V2_IPSEC_R) aged 407.004s and NOT sending notification
Jan  8 07:59:50.519413: "ovn-cc3669-0-out-1" #6: ESP traffic information: in=0B out=912KB
Jan  8 07:59:50.519668: "ovn-cc3669-0-out-1" #5: STATE_PARENT_R2: received v2I2, PARENT SA established
Jan  8 07:59:50.519710: "ovn-b486b9-0-out-1" #14: initiating Main Mode

@letoams
Copy link
Member

letoams commented Jan 11, 2021

Do you have the configuration file? And possibly that of the remote end point ?

I don't see anything useful in the logs.

the sadb should not be null obviously.

@markdgray
Copy link
Author

markdgray commented Jan 11, 2021

I don't have the exact configuration at that point unfortunately but it would look something like the following (and would be similar on both ends) - this is from a different cluster:

config setup
    uniqueids=yes

conn %default
    keyingtries=%forever
    type=transport
    auto=route
    ike=aes_gcm256-sha2_256
    esp=aes_gcm256
    ikev2=insist

conn ovn-5ffa94-0-in-1
    left=%defaultroute
    right=10.0.147.215
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@5ffa942c-71f9-4fa0-b1fc-5de5bc052386
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp/6081
    rightprotoport=udp

conn ovn-5ffa94-0-out-1
    left=%defaultroute
    right=10.0.147.215
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@5ffa942c-71f9-4fa0-b1fc-5de5bc052386
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp
    rightprotoport=udp/6081

conn ovn-d811c5-0-in-1
    left=%defaultroute
    right=10.0.218.10
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@d811c5fa-35de-4423-9868-8f2c151b6638
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp/6081
    rightprotoport=udp

conn ovn-d811c5-0-out-1
    left=%defaultroute
    right=10.0.218.10
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@d811c5fa-35de-4423-9868-8f2c151b6638
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp
    rightprotoport=udp/6081

conn ovn-e33d52-0-in-1
    left=%defaultroute
    right=10.0.176.67
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@e33d52a7-5d72-446b-ac85-80ef8bed50c2
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp/6081
    rightprotoport=udp

conn ovn-e33d52-0-out-1
    left=%defaultroute
    right=10.0.176.67
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@e33d52a7-5d72-446b-ac85-80ef8bed50c2
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp
    rightprotoport=udp/6081

conn ovn-64912b-0-in-1
    left=%defaultroute
    right=10.0.201.42
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@64912ba0-bb71-49e1-91f3-ecd47428f52a
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp/6081
    rightprotoport=udp

conn ovn-64912b-0-out-1
    left=%defaultroute
    right=10.0.201.42
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@64912ba0-bb71-49e1-91f3-ecd47428f52a
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp
    rightprotoport=udp/6081

conn ovn-d6de94-0-in-1
    left=%defaultroute
    right=10.0.132.29
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@d6de9469-ea5a-411b-b9da-caf5d226e696
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp/6081
    rightprotoport=udp

conn ovn-d6de94-0-out-1
    left=%defaultroute
    right=10.0.132.29
    leftid=@f559947c-7672-4bf1-9a99-24dda76817b4
    rightid=@d6de9469-ea5a-411b-b9da-caf5d226e696
    leftcert="ovs_certkey_f559947c-7672-4bf1-9a99-24dda76817b4"
    leftrsasigkey=%cert
    rightca=%same
    leftprotoport=udp
    rightprotoport=udp/6081

@letoams
Copy link
Member

letoams commented Jan 11, 2021 via email

@cagney
Copy link
Collaborator

cagney commented Jan 11, 2021

The core dump is in IKEv1's Main Mode proposal code, but the config file seems to have ikev2=insist (the IKE= line won't work with ikev1 either).

@markdgray
Copy link
Author

markdgray commented Jan 11, 2021

@cagney does that mean it is (unexpectedly) entering (for some reason) into ikev1 mode?

@cagney
Copy link
Collaborator

cagney commented Jan 11, 2021

@cagney does that mean it is (unexpectedly) entering (for some reason) into ikev1 mode?

Ah, yes.. The logs would suggest that ovn-cc3669-0-out-1 has a very complicated life, and at some point it flips from IKEv2 to IKEv1.

Jan 8 07:59:50.519668: "ovn-cc3669-0-out-1" #5: STATE_PARENT_R2: received v2I2, PARENT SA established
Jan 8 07:59:50.519710: "ovn-b486b9-0-out-1" #14: initiating Main Mode

@markdgray
Copy link
Author

@cagney what can cause that?

@markdgray
Copy link
Author

On Mon, 11 Jan 2021, Mark Gray wrote: Subject: Re: [libreswan/libreswan] Segmentation fault in libreswan 3.32 (#390) I don't have the exact configuration at that point unfortunately but it would look something like the following (and would be the same on both ends):
Hmm that looks regular enough. I guess we would really need a gdb backtrace, as the logs you shared shows no issue whatsoever.

I will try to do this. It is a little tricky because I cannot reproduce everytime so I will need to run the CI in a loop and figure out a way to get a coredump or backtrace from the CI

@markdgray
Copy link
Author

markdgray commented Jan 14, 2021

@letoams @cagney

Does this help

(gdb) bt
#0  0x000055bd694e56f6 in ikev1_out_sa (outs=outs@entry=0x7ffdf5011250, sadb=0x55bd697fe2e0 <IKEv1_oakley_sadb_table>, st=st@entry=0x55bd6b697138, oakley_mode=oakley_mode@entry=true, aggressive_mode=aggressive_mode@entry=false, 
    np=np@entry=ISAKMP_NEXT_VID) at /usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/ikev1_spdb_struct.c:396
#1  0x000055bd694db2d0 in main_outI1 (whack_sock=<optimized out>, c=0x55bd6b683398, predecessor=0x0, policy=32, try=1, inception=<optimized out>, uctx=0x0) at /usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/ikev1_main.c:170
#2  0x000055bd694d2fb0 in ipsecdoi_initiate (whack_sock=0x0, c=0x55bd6b683398, policy=32, try=try@entry=1, replacing=5, inception=inception@entry=0x7ffdf5011400, uctx=0x0) at /usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/ipsec_doi.c:229
#3  0x000055bd694d31a9 in ipsecdoi_replace (st=st@entry=0x55bd6b68aa88, try=try@entry=1) at /usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/ipsec_doi.c:333
#4  0x000055bd694fc169 in v2_event_sa_replace (st=st@entry=0x55bd6b68aa88) at /usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/ikev2_parent.c:6457
#5  0x000055bd694d017b in timer_event_cb (unused_fd=<optimized out>, unused_event=<optimized out>, arg=<optimized out>) at /usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/timer.c:346
#6  0x00007f59ba1ffd85 in event_process_active_single_queue () from /lib64/libevent-2.1.so.6
#7  0x00007f59ba200787 in event_base_loop () from /lib64/libevent-2.1.so.6
#8  0x000055bd694cd575 in call_server (conffile=0x55bd6b5a8bb8 "/etc/ipsec.conf") at /usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/server.c:1509
#9  0x000055bd6948837d in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/libreswan-3.32-7.el8_3.x86_64/programs/pluto/plutomain.c:1819
(gdb) 
(gdb) list
391			 * always the authentication method.
392			 *
393			 * XXX: Should replace SADB with a simple map to the
394			 * auth method.
395			 */
396			struct db_attr *auth = &sadb->prop_conjs[0].props[0].trans[0].attrs[2];
397			passert(auth->type.oakley == OAKLEY_AUTHENTICATION_METHOD);
398			enum ikev1_auth_method auth_method = auth->val;
399			/*
400			 * Aggr-Mode - Max transforms == 2 - Multiple
(gdb) p sadb
$1 = (const struct db_sa *) 0x55bd697fe2e0 <IKEv1_oakley_sadb_table>
(gdb) p *sadb
$2 = {dynamic = false, parentSA = false, prop_conjs = 0x0, prop_conj_cnt = 0}
(gdb) 

@markdgray
Copy link
Author

This looks useful as well:

(gdb) p *st
$2 = {st_inception = {rt = {tv_sec = 1610656415, tv_usec = 341987}}, st_timing = {approx_seconds = 0, level = 1, last_log = {time = {tv_sec = 0, tv_nsec = 103748874}, level = 0}}, st_serialno = 15, st_clonedfrom = 0, st_ike_pred = 0, st_ipsec_pred = 0, 
  st_xauth = 0x0, st_ike_version = IKEv1, st_establishing_sa = IKE_SA, st_ikev2_anon = false, st_suppress_del_notify = false, st_connection = 0x55bd6b683398, st_whack_sock = 0x0, st_v1_rfrags = 0x0, st_v2_rfrags = 0x0, st_oakley = {
    ta_comp = IPCOMP_NONE, auth = 0, doing_xauth = false, esn_enabled = false, life_seconds = {dt = {tv_sec = 0, tv_usec = 0}}, life_kilobytes = 0, ta_encrypt = 0x0, enckeylen = 0, ta_prf = 0x0, ta_integ = 0x0, ta_dh = 0x0}, st_ah = {present = false, 
    attrs = {transattrs = {ta_comp = IPCOMP_NONE, auth = 0, doing_xauth = false, esn_enabled = false, life_seconds = {dt = {tv_sec = 0, tv_usec = 0}}, life_kilobytes = 0, ta_encrypt = 0x0, enckeylen = 0, ta_prf = 0x0, ta_integ = 0x0, ta_dh = 0x0}, 
      spi = 0, life_seconds = {dt = {tv_sec = 0, tv_usec = 0}}, life_kilobytes = 0, encapsulation = 0}, our_spi = 0, keymat_len = 0, our_keymat = 0x0, peer_keymat = 0x0, our_bytes = 0, peer_bytes = 0, our_lastused = {mt = {tv_sec = 0, tv_usec = 0}}, 
    peer_lastused = {mt = {tv_sec = 0, tv_usec = 0}}, add_time = 0}, st_esp = {present = false, attrs = {transattrs = {ta_comp = IPCOMP_NONE, auth = 0, doing_xauth = false, esn_enabled = false, life_seconds = {dt = {tv_sec = 0, tv_usec = 0}}, 
        life_kilobytes = 0, ta_encrypt = 0x0, enckeylen = 0, ta_prf = 0x0, ta_integ = 0x0, ta_dh = 0x0}, spi = 0, life_seconds = {dt = {tv_sec = 0, tv_usec = 0}}, life_kilobytes = 0, encapsulation = 0}, our_spi = 0, keymat_len = 0, our_keymat = 0x0, 
    peer_keymat = 0x0, our_bytes = 0, peer_bytes = 0, our_lastused = {mt = {tv_sec = 0, tv_usec = 0}}, peer_lastused = {mt = {tv_sec = 0, tv_usec = 0}}, add_time = 0}, st_ipcomp = {present = false, attrs = {transattrs = {ta_comp = IPCOMP_NONE, auth = 0, 
        doing_xauth = false, esn_enabled = false, life_seconds = {dt = {tv_sec = 0, tv_usec = 0}}, life_kilobytes = 0, ta_encrypt = 0x0, enckeylen = 0, ta_prf = 0x0, ta_integ = 0x0, ta_dh = 0x0}, spi = 0, life_seconds = {dt = {tv_sec = 0, tv_usec = 0}}, 
      life_kilobytes = 0, encapsulation = 0}, our_spi = 0, keymat_len = 0, our_keymat = 0x0, peer_keymat = 0x0, our_bytes = 0, peer_bytes = 0, our_lastused = {mt = {tv_sec = 0, tv_usec = 0}}, peer_lastused = {mt = {tv_sec = 0, tv_usec = 0}}, 
    add_time = 0}, st_tunnel_in_spi = 0, st_tunnel_out_spi = 0, st_ref = 0, st_refhim = 0, st_reqid = 0, st_outbound_done = false, st_pfs_group = 0x0, st_hash_negotiated = 0, st_policy = 0, st_remote_endpoint = {version = 4, 
    bytes = "\n\000\065\256", '\000' <repeats 11 times>, hport = 500}, st_interface = 0x55bd6b6660e8, st_mobike_del_src_ip = false, st_mobike_remote_endpoint = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, st_deleted_local_addr = {
    version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, st_mobike_local_endpoint = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, st_mobike_host_nexthop = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, st_v1_msgid = {
    id = 0, reserved = false, phase15 = 0}, st_used_msgids = 0x0, st_rpacket = {ptr = 0x0, len = 0}, st_v1_last_transition = 0x0, st_v1_new_iv = {len = 0, ptr = '\000' <repeats 63 times>}, st_v1_iv = {len = 0, ptr = '\000' <repeats 63 times>}, 
  st_v1_ph1_iv = {len = 0, ptr = '\000' <repeats 63 times>}, st_viable_parent = false, st_accepted_ike_proposal = 0x0, st_accepted_esp_or_ah_proposal = 0x0, st_original_role = 0, st_sa_role = (unknown: 0), st_v2_msgid_wip = {initiator = 0, 
    responder = 0}, st_v2_msgid_windows = {initiator = {sent = 0, recv = 0, pending = 0x0}, responder = {sent = 0, recv = 0, pending = 0x0}}, st_msgid_lastack = 0, st_msgid_nextuse = 0, st_msgid_lastrecv = 0, st_msgid_lastreplied = 0, 
  st_firstpacket_me = {ptr = 0x0, len = 0}, st_firstpacket_him = {ptr = 0x0, len = 0}, ipseckey_dnsr = 0x0, ipseckey_fwd_dnsr = 0x0, st_active_redirect_gw = 0x0, st_seen_cfg_dns = 0x0, st_seen_cfg_domains = 0x0, st_seen_cfg_banner = 0x0, st_ike_spis = {
    initiator = {bytes = "?P\037\206\271", <incomplete sequence \375\255>}, responder = {bytes = "\000\000\000\000\000\000\000"}}, st_ike_rekey_spis = {initiator = {bytes = "\000\000\000\000\000\000\000"}, responder = {
      bytes = "\000\000\000\000\000\000\000"}}, st_gi = {ptr = 0x0, len = 0}, st_ni = {ptr = 0x0, len = 0}, st_gr = {ptr = 0x0, len = 0}, st_nr = {ptr = 0x0, len = 0}, st_dcookie = {ptr = 0x0, len = 0}, st_tpacket = {ptr = 0x0, len = 0}, 
  st_v2_tfrags = 0x0, sec_ctx = 0x0, st_myuserprotoid = 0 '\000', st_myuserport = 0, st_peeruserprotoid = 0 '\000', st_peeruserport = 0, st_peeridentity_protocol = 0 '\000', st_peeridentity_port = 0, st_peer_alt_id = false, st_remote_certs = {
    verified = 0x0, pubkey_db = 0x0}, st_dh_secret = 0x0, st_shared_nss = 0x0, st_peer_pubkey = 0x0, st_state = 0x55bd697fb8e8 <v1_states+72>, st_pstats = {sa_type = IKE_SA, delete_reason = REASON_UNKNOWN}, st_retransmit = {delay = {dt = {tv_sec = 0, 
        tv_usec = 0}}, timeout = {dt = {tv_sec = 0, tv_usec = 0}}, start = {mt = {tv_sec = 0, tv_usec = 0}}, delays = {dt = {tv_sec = 0, tv_usec = 0}}, nr_retransmits = 0, nr_duplicate_replies = 0, limit = 0}, st_try = 1, st_replace_margin = {dt = {
      tv_sec = 0, tv_usec = 0}}, st_replace_by = {mt = {tv_sec = 0, tv_usec = 0}}, st_outbound_count = 0, st_outbound_time = {mt = {tv_sec = 0, tv_usec = 0}}, st_offloaded_task = 0x0, st_v1_offloaded_task_in_background = false, st_suspended_md = 0x0, 
  st_suspended_md_func = 0x0, st_suspended_md_line = 0, st_p1isa = {ptr = 0x0, len = 0}, st_skeyid_nss = 0x0, st_skey_d_nss = 0x0, st_skey_ai_nss = 0x0, st_skey_ar_nss = 0x0, st_skey_ei_nss = 0x0, st_skey_er_nss = 0x0, st_skey_pi_nss = 0x0, 
  st_skey_pr_nss = 0x0, st_skey_initiator_salt = {ptr = 0x0, len = 0}, st_skey_responder_salt = {ptr = 0x0, len = 0}, st_skey_chunk_SK_pi = {ptr = 0x0, len = 0}, st_skey_chunk_SK_pr = {ptr = 0x0, len = 0}, st_ppk_used = false, st_seen_ppk = false, 
  st_no_ppk_auth = {ptr = 0x0, len = 0}, st_sk_d_no_ppk = 0x0, st_sk_pi_no_ppk = 0x0, st_sk_pr_no_ppk = 0x0, st_ts_this = {ts_type = 0 '\000', ipprotoid = 0 '\000', startport = 0, endport = 0, net = {start = {version = 0, 
        bytes = '\000' <repeats 15 times>, hport = 0}, end = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, is_subnet = false}}, st_ts_that = {ts_type = 0 '\000', ipprotoid = 0 '\000', startport = 0, endport = 0, net = {start = {
        version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, end = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, is_subnet = false}}, st_enc_key_nss = 0x0, st_event = 0x0, st_serialno_list_entry = {older = 0x55bd6b68c458, 
    newer = 0x55bd6981ad60 <serialno_list_head>, data = 0x55bd6b697138, info = 0x55bd697df4f0 <serialno_list_info>}, st_hash_entries = {{older = 0x55bd69814780 <serialno_hash_slots+12416>, newer = 0x55bd69814780 <serialno_hash_slots+12416>, 
      data = 0x55bd6b697138, info = 0x55bd697fb020 <state_hashes>}, {older = 0x55bd6b68b338, newer = 0x55bd6980de20 <connection_hash_slots+1408>, data = 0x55bd6b697138, info = 0x55bd697fb058 <state_hashes+56>}, {older = 0x55bd6b68c4b8, 
      newer = 0x55bd69809a40 <reqid_hash_slots>, data = 0x55bd6b697138, info = 0x55bd697fb090 <state_hashes+112>}, {older = 0x55bd69802d20 <ike_spis_hash_slots+4000>, newer = 0x55bd69802d20 <ike_spis_hash_slots+4000>, data = 0x55bd6b697138, 
      info = 0x55bd697fb0c8 <state_hashes+168>}, {older = 0x55bd69807920 <ike_initiator_spi_hash_slots+7488>, newer = 0x55bd69807920 <ike_initiator_spi_hash_slots+7488>, data = 0x55bd6b697138, info = 0x55bd697fb100 <state_hashes+224>}}, 
  hidden_variables = {st_malformed_received = 0, st_malformed_sent = 0, st_xauth_client_done = false, st_xauth_client_attempt = 0, st_modecfg_server_done = false, st_modecfg_vars_set = false, st_got_certrequest = false, st_modecfg_started = false, 
    st_skeyid_calculated = false, st_peer_supports_dpd = false, st_nat_traversal = 0, st_nat_oa = {version = 4, bytes = '\000' <repeats 15 times>, hport = 0}, st_natd = {version = 4, bytes = '\000' <repeats 15 times>, hport = 0}}, 
  st_xauth_username = '\000' <repeats 127 times>, st_xauth_password = {ptr = 0x0, len = 0}, st_last_liveness = {mt = {tv_sec = 0, tv_usec = 0}}, st_pend_liveness = false, st_liveness_event = 0x0, st_rel_whack_event = 0x0, st_send_xauth_event = 0x0, 
  st_addr_change_event = 0x0, st_last_dpd = {mt = {tv_sec = 0, tv_usec = 0}}, st_dpd_seqno = 0, st_dpd_expectseqno = 0, st_dpd_peerseqno = 0, st_dpd_rdupcount = 0, st_dpd_event = 0x0, st_seen_nortel_vid = false, quirks = {xauth_ack_msgid = false, 
    modecfg_pull_mode = false, qnat_traversal_vid = VID_none, xauth_vid = false}, st_xauth_soft = false, st_seen_fragvid = false, st_seen_hashnotify = false, st_seen_fragments = false, st_seen_no_tfc = false, st_seen_use_transport = false, 
  st_seen_use_ipcomp = false, st_seen_mobike = false, st_sent_mobike = false, st_seen_nonats = false, st_seen_initialc = false, st_seen_redirect_sup = false, st_sent_redirect = false, st_redirected_in_auth = false, st_requested_ca = 0x0, 
  st_reply_xchg = 0 '\000', st_peer_wants_null = false}
(gdb) 
(gdb) p *outs
$4 = {container = 0x55bd6982fb60 <reply_stream>, desc = 0x55bd697ea920 <isakmp_hdr_desc>, name = 0x55bd6959ea72 "ISAKMP Message", start = 0x55bd6981fb60 <reply_buffer> "?P\037\206\271", <incomplete sequence \375\255>, 
  cur = 0x55bd6981fb7c <reply_buffer+28> '\372' <repeats 199 times>, <incomplete sequence \372>..., roof = 0x55bd6982fb60 <reply_stream> "", lenfld = 0x55bd6981fb78 <reply_buffer+24> '\372' <repeats 199 times>, <incomplete sequence \372>..., 
  lenfld_desc = 0x55bd697f6100 <isa_fields+224>, next_payload_chain = {loc = 0x0, sd = 0x0, fp = 0x0}, last_substructure = {loc = 0x0, sd = 0x0, fp = 0x0}}
(gdb) 

@cagney
Copy link
Collaborator

cagney commented Jan 15, 2021

So definitely fallen back to IKEv1:

  st_ike_version = IKEv1

could you be able to print *st->st_connection (I'm interested in policy and ike_version)?

At the end is a running commentry on the log. I noticed two things:

  • a revive falls back from IKEv2 to IKEv1 when it shouldn't have
  • a delete for a CHILD SA triggered deleting the IKE SA

4.x's code is significantly different to 3.x for the above:

  • the remaining IKEv2->IKEv1 fallback code was removed
  • the revival and delete code was improved
    would you be able to try a snapshot from:
    https://download.libreswan.org/development/
    or mainline from git (if building from mainline add USE_NSS_KDF=false).

.

Outgoing connection ovn-b486b9-0-in-1 starts establishing an IKE#2 and CHILD#3

 added connection description "ovn-b486b9-0-in-1"
 "ovn-b486b9-0-in-1" #2: initiating IKEv2 IKE SA
 "ovn-b486b9-0-in-1": local IKE proposals (IKE SA initiator selecting KE): 
 "ovn-b486b9-0-in-1":   1:IKE=AES_GCM_C_256-HMAC_SHA2_256-NONE-MODP2048+MODP3072+MODP4096+MODP8192+ECP_256+ECP_384+ECP_521+CURVE25519
 "ovn-b486b9-0-in-1" #2: STATE_PARENT_I1: sent v2I1, expected v2R1
 "ovn-b486b9-0-in-1": local ESP/AH proposals (IKE SA initiator emitting ESP/AH proposals): 
 "ovn-b486b9-0-in-1":   1:ESP=AES_GCM_C_256-NONE-NONE-DISABLED
 "ovn-b486b9-0-in-1" #3: STATE_PARENT_I2: sent v2I2, expected v2R2 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}

Second outgoing connection ovn-b486b9-0-out-1 queued up waiting for ovn-b486b9-0-in-1 to establish:

 added connection description "ovn-b486b9-0-out-1"
 "ovn-b486b9-0-out-1": queuing pending IPsec SA negotiating with 10.0.244.89 IKE#2 "ovn-b486b9-0-in-1"

ovn-b486b9-0-in-1 establishes IKE#2 and CHILD#3

 "ovn-b486b9-0-in-1" #3: certificate verified OK: CN=b486b9ee-22dc-4c40-a21b-db485ac0d234,OU=kind,O=ovnkubernetes,C=US
 "ovn-b486b9-0-in-1" #3: IKEv2 mode peer ID is ID_FQDN: '@b486b9ee-22dc-4c40-a21b-db485ac0d234'
 "ovn-b486b9-0-in-1" #3: Authenticated using RSA with IKEv2_AUTH_HASH_SHA2_512
 "ovn-b486b9-0-in-1" #3: negotiated connection [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.244.89-10.0.244.89:0-65535 17]
 "ovn-b486b9-0-in-1" #3: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0xbae81f3f <0x95e0d8ea xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=passive}

and ovn-b486b9-0-out-1 initiates CHILD#4 under ovn-b486b9-0-in-1 IKE#2:

 "ovn-b486b9-0-out-1": local ESP/AH proposals (ESP/AH initiator emitting proposals): 
 "ovn-b486b9-0-out-1":   1:ESP=AES_GCM_C_256-NONE-MODP2048-DISABLED
 "ovn-b486b9-0-out-1" #4: STATE_V2_CREATE_I: sent IPsec Child req wait response
 "ovn-b486b9-0-out-1" #4: negotiated connection [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
 "ovn-b486b9-0-out-1" #4: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0x394e9e2a <0x4040eba9 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}

so far so good.

But then there's an incoming IKE#9 for ovn-b486b9-0-out-1 and it starts to establish (if it establishes it will take over - the assumption is the other end feels the need to replace):

 "ovn-b486b9-0-in-1" #9: proposal 1:IKE=AES_GCM_C_256-HMAC_SHA2_256-MODP2048 chosen from remote proposals 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;DH=MODP2048;DH=MODP3072;DH=MODP4096;DH=MODP8192;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519[first-match]
 "ovn-b486b9-0-in-1" #9: STATE_PARENT_R1: received v2I1, sent v2R1 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}

But then the remote end initiates CHILD#10 for ovn-b486b9-0-out-1 under IKE#2 (I suspect it's noticed that IKE#2 is established and usable so starts establishing the child) (should the CHILD#4 it replaced have been deleted, perhaps it is with a timeout?)

 "ovn-b486b9-0-in-1": local ESP/AH proposals (CREATE_CHILD_SA responder matching remote ESP/AH proposals): 
 "ovn-b486b9-0-in-1":   1:ESP=AES_GCM_C_256-NONE-MODP2048-DISABLED
 "ovn-b486b9-0-in-1" #2: proposal 1:ESP=AES_GCM_C_256-MODP2048-DISABLED SPI=9f4138fd chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED[first-match]
 "ovn-b486b9-0-out-1" #10: negotiated new IPsec SA [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
 "ovn-b486b9-0-out-1" #10: negotiated connection [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
 "ovn-b486b9-0-out-1" #10: STATE_V2_IPSEC_R: IPsec SA established transport mode {ESP=>0x9f4138fd <0x3ceded52 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}

time passes (this end times out):

The incomplete IKE#9 times out (presumably the remote end abandoned it). This triggers a "must-remain-up" (I suspect it shouldn't have got that far - it was already up):

 "ovn-b486b9-0-in-1" #9: deleting incomplete state after 200.000 seconds
 "ovn-b486b9-0-in-1" #9: deleting state (STATE_PARENT_R1) aged 200.004s and NOT sending notification
 "ovn-b486b9-0-in-1" #9: deleting IKE SA but connection is supposed to remain up; schedule EVENT_REVIVE_CONNS

which it does, but since IKE#2 is still available it grabs that:

 "ovn-b486b9-0-in-1": initiating connection which received a Delete/Notify but must remain up per local policy
 "ovn-b486b9-0-in-1" #12: STATE_V2_CREATE_I: sent IPsec Child req wait response
 "ovn-b486b9-0-in-1" #12: negotiated connection [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.244.89-10.0.244.89:0-65535 17]
 "ovn-b486b9-0-in-1" #12: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0xcc7a838d <0x8b3975cb xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}

time passes (suspect other end timed out)

remote sends a delete for ovn-b486b9-0-out-1 CHILD#10; which triggers a call to event_force(EVENT_SA_REPLACE, st):

 "ovn-b486b9-0-in-1" #2: received Delete SA payload: replace IPsec State #10 now
 "ovn-b486b9-0-in-1" #2: STATE_PARENT_I3: PARENT SA established
 "ovn-b486b9-0-in-1": terminating SAs using this connection
 "ovn-b486b9-0-in-1": IKE SA is shared - only terminating IPsec SA

But then it goes on and wipes out IKE#2 and all other CHILD SAs.

 "ovn-b486b9-0-in-1" #12: deleting state (STATE_V2_IPSEC_I) aged 203.162s and sending notification
 "ovn-b486b9-0-in-1" #12: ESP traffic information: in=1MB out=0B
 "ovn-b486b9-0-in-1" #3: deleting state (STATE_V2_IPSEC_I) aged 410.180s and sending notification
 "ovn-b486b9-0-in-1" #3: ESP traffic information: in=699KB out=0B
 "ovn-b486b9-0-in-1" #2: deleting state (STATE_PARENT_I3) aged 410.191s and sending notification

finally, the attempt to revive ovn-b486b9-0-out-1 has flipped IKE versions:

 "ovn-b486b9-0-out-1" #14: initiating Main Mode

@markdgray
Copy link
Author

Note this was a different run to the original log so the connection name could be different:

(gdb) print *st->st_connection 
$1 = {name = 0x55bd6b679c88 "ovn-3d1e56-0-out-1", ike_version = IKEv2, foodgroup = 0x0, connalias = 0x0, policy = 1412889889997350, 
  sighash_policy = 7, sa_ike_life_seconds = {dt = {tv_sec = 3600, tv_usec = 0}}, sa_ipsec_life_seconds = {dt = {tv_sec = 28800, 
      tv_usec = 0}}, sa_rekey_margin = {dt = {tv_sec = 540, tv_usec = 0}}, sa_rekey_fuzz = 100, sa_keying_tries = 0, sa_priority = 0, 
  sa_tfcpad = 0, send_no_esp_tfc = false, sa_replay_window = 32, sa_marks = {in = {val = 0, mask = 0, unique = false}, out = {val = 0, 
      mask = 0, unique = false}}, vti_iface = 0x0, vti_routing = false, vti_shared = false, xfrmi = 0x0, r_interval = {dt = {tv_sec = 0, 
      tv_usec = 500000}}, r_timeout = {dt = {tv_sec = 60, tv_usec = 0}}, sa_reqid = 0, encapsulation = 2, nic_offload = yna_auto, 
  dpd_delay = {dt = {tv_sec = 0, tv_usec = 0}}, dpd_timeout = {dt = {tv_sec = 0, tv_usec = 0}}, dpd_action = DPD_ACTION_HOLD, 
  nat_keepalive = true, initial_contact = false, cisco_unity = false, fake_strongswan = false, mobike = false, send_vendorid = false, 
  ikev1_natt = NATT_BOTH, encaps = yna_auto, nmconfigured = false, policy_label = 0x0, remotepeertype = NON_CISCO, xauthby = XAUTHBY_FILE, 
  xauthfail = XAUTHFAIL_HARD, log_file_name = 0x0, log_file = 0x0, log_link = {cqe_next = 0x0, cqe_prev = 0x0}, log_file_err = false, 
  spd = {spd_next = 0x0, this = {id = {kind = ID_FQDN, ip_addr = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, name = {
          ptr = 0x55bd6b673d38 "531330f1-b6a0-4663-ac62-8982f187b82f", len = 36}}, host_type = KH_IPADDR, host_addr_name = 0x0, 
      host_addr = {version = 4, bytes = "\n\000\060?", '\000' <repeats 11 times>, hport = 0}, host_nexthop = {version = 4, 
        bytes = "\n\000\060\001", '\000' <repeats 11 times>, hport = 0}, host_srcip = {version = 0, bytes = '\000' <repeats 15 times>, 
        hport = 0}, client = {addr = {version = 4, bytes = "\n\000\060?", '\000' <repeats 11 times>, hport = 0}, maskbits = 32}, 
      host_vtiip = {addr = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, maskbits = 0}, ifaceip = {addr = {version = 0, 
          bytes = '\000' <repeats 15 times>, hport = 0}, maskbits = 0}, key_from_DNS_on_demand = false, has_client = false, 
      has_client_wildcard = false, has_port_wildcard = false, has_id_wildcards = false, updown = 0x55bd6b670258 "ipsec _updown", 
      host_port = 500, port = 0, protocol = 17 '\021', sendcert = CERT_ALWAYSSEND, cert = {ty = CERT_X509_SIGNATURE, u = {
          nss_cert = 0x55bd6b667d10}}, ca = {
        ptr = 0x55bd6b681558 "081604\006\003U\004\003\f-openshift-ovn-kubernetes_signer-ca@1610655483", len = 58}, virt = 0x0, 
      authby = AUTH_RSASIG, xauth_server = false, xauth_client = false, xauth_username = 0x0, xauth_password = 0x0, pool_range = {start = {
          version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, end = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, 
        is_subnet = false}, has_lease = false, has_internal_address = false, modecfg_server = false, modecfg_client = false, cat = false, 
      has_cat = false}, that = {id = {kind = ID_FQDN, ip_addr = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, name = {
          ptr = 0x55bd6b6810b8 "3d1e5609-2bdd-48dc-8cbd-3a61700aff5d", len = 36}}, host_type = KH_IPADDR, 
      host_addr_name = 0x55bd6b681c98 "10.0.53.174", host_addr = {version = 4, bytes = "\n\000\065\256", '\000' <repeats 11 times>, 
        hport = 6081}, host_nexthop = {version = 4, bytes = "\n\000\060?", '\000' <repeats 11 times>, hport = 0}, host_srcip = {
        version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, client = {addr = {version = 4, 
          bytes = "\n\000\065\256", '\000' <repeats 11 times>, hport = 6081}, maskbits = 32}, host_vtiip = {addr = {version = 0, 
          bytes = '\000' <repeats 15 times>, hport = 0}, maskbits = 0}, ifaceip = {addr = {version = 0, bytes = '\000' <repeats 15 times>, 
          hport = 0}, maskbits = 0}, key_from_DNS_on_demand = false, has_client = false, has_client_wildcard = false, 
      has_port_wildcard = false, has_id_wildcards = false, updown = 0x55bd6b674088 "ipsec _updown", host_port = 500, port = 6081, 
      protocol = 17 '\021', sendcert = CERT_ALWAYSSEND, cert = {ty = CERT_NONE, u = {nss_cert = 0x0}}, ca = {
        ptr = 0x55bd6b67a498 "081604\006\003U\004\003\f-openshift-ovn-kubernetes_signer-ca@1610655483", len = 58}, virt = 0x0, 
      authby = AUTH_RSASIG, xauth_server = false, xauth_client = false, xauth_username = 0x0, xauth_password = 0x0, pool_range = {start = {
          version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, end = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, 
        is_subnet = false}, has_lease = false, has_internal_address = false, modecfg_server = false, modecfg_client = false, cat = false, 
      has_cat = false}, eroute_owner = 5, routing = RT_ROUTED_TUNNEL, reqid = 16400}, instance_serial = 0, prio = 4202497, 
  instance_initiation_ok = false, kind = CK_PERMANENT, interface = 0x55bd6b6660e8, temp_vars = {revive_delay = 0, num_redirects = 0, 
    first_redirect_time = {rt = {tv_sec = 0, tv_usec = 0}}, redirect_ip = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}, 
--Type <RET> for more, q to quit, c to continue without paging--c
    old_gw_address = {version = 0, bytes = '\000' <repeats 15 times>, hport = 0}}, newest_isakmp_sa = 0, newest_ipsec_sa = 5, extra_debugging = {set = 0, clr = 0}, extra_impairing = {set = 0, clr = 0}, policy_next = 0x0, ike_proposals = {p = 0x55bd6b67a578}, child_proposals = {p = 0x55bd6b683978}, v2_ike_proposals = 0x0, v2_ike_auth_child_proposals = 0x55bd6b619168, v2_create_child_proposals = 0x0, v2_create_child_proposals_default_dh = 0x0, host_pair = 0x55bd6b682158, hp_next = 0x0, ac_next = 0x55bd6b66e8d8, send_ca = CA_SEND_NONE, dnshostname = 0x0, pool = 0x0, modecfg_dns = 0x0, modecfg_domains = 0x0, modecfg_banner = 0x0, metric = 0, connmtu = 0, statsval = 0, nflog_group = 0, ike_window = 1, redirect_to = 0x0, accept_redirect_to = 0x0}

@markdgray
Copy link
Author

Thanks @cagney

could you be able to print *st->st_connection (I'm interested in policy and ike_version)?

I did this in a separate comment above

At the end is a running commentry on the log. I noticed two things:

I have added my additional commentary.

* a revive falls back from IKEv2 to IKEv1 when it shouldn't have

* a delete for a CHILD SA triggered deleting the IKE SA

4.x's code is significantly different to 3.x for the above:

* the remaining IKEv2->IKEv1 fallback code was removed

* the revival and delete code was improved
  would you be able to try a snapshot from:
  https://download.libreswan.org/development/
  or mainline from git (if building from mainline add USE_NSS_KDF=false).

Really difficult to do this because it is running in an OpenShift cluster and is failing intermittently. I have no way to reproduce this other than waiting so I don't think I can easily upgrade libreswan and then get it to reproduce to test it.

.

Outgoing connection ovn-b486b9-0-in-1 starts establishing an IKE#2 and CHILD#3

 added connection description "ovn-b486b9-0-in-1"
 "ovn-b486b9-0-in-1" #2: initiating IKEv2 IKE SA
 "ovn-b486b9-0-in-1": local IKE proposals (IKE SA initiator selecting KE): 
 "ovn-b486b9-0-in-1":   1:IKE=AES_GCM_C_256-HMAC_SHA2_256-NONE-MODP2048+MODP3072+MODP4096+MODP8192+ECP_256+ECP_384+ECP_521+CURVE25519
 "ovn-b486b9-0-in-1" #2: STATE_PARENT_I1: sent v2I1, expected v2R1
 "ovn-b486b9-0-in-1": local ESP/AH proposals (IKE SA initiator emitting ESP/AH proposals): 
 "ovn-b486b9-0-in-1":   1:ESP=AES_GCM_C_256-NONE-NONE-DISABLED
 "ovn-b486b9-0-in-1" #3: STATE_PARENT_I2: sent v2I2, expected v2R2 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}

Second outgoing connection ovn-b486b9-0-out-1 queued up waiting for ovn-b486b9-0-in-1 to establish:

Each of these pairs represents and connection between this node in the cluster and another node in the cluster.

 added connection description "ovn-b486b9-0-out-1"
 "ovn-b486b9-0-out-1": queuing pending IPsec SA negotiating with 10.0.244.89 IKE#2 "ovn-b486b9-0-in-1"

ovn-b486b9-0-in-1 establishes IKE#2 and CHILD#3

 "ovn-b486b9-0-in-1" #3: certificate verified OK: CN=b486b9ee-22dc-4c40-a21b-db485ac0d234,OU=kind,O=ovnkubernetes,C=US
 "ovn-b486b9-0-in-1" #3: IKEv2 mode peer ID is ID_FQDN: '@b486b9ee-22dc-4c40-a21b-db485ac0d234'
 "ovn-b486b9-0-in-1" #3: Authenticated using RSA with IKEv2_AUTH_HASH_SHA2_512
 "ovn-b486b9-0-in-1" #3: negotiated connection [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.244.89-10.0.244.89:0-65535 17]
 "ovn-b486b9-0-in-1" #3: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0xbae81f3f <0x95e0d8ea xfrm=AES_GCM_16_256-NONE NATOA=none NATD=none DPD=passive}

and ovn-b486b9-0-out-1 initiates CHILD#4 under ovn-b486b9-0-in-1 IKE#2:

 "ovn-b486b9-0-out-1": local ESP/AH proposals (ESP/AH initiator emitting proposals): 
 "ovn-b486b9-0-out-1":   1:ESP=AES_GCM_C_256-NONE-MODP2048-DISABLED
 "ovn-b486b9-0-out-1" #4: STATE_V2_CREATE_I: sent IPsec Child req wait response
 "ovn-b486b9-0-out-1" #4: negotiated connection [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
 "ovn-b486b9-0-out-1" #4: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0x394e9e2a <0x4040eba9 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}

so far so good.

But then there's an incoming IKE#9 for ovn-b486b9-0-out-1 and it starts to establish (if it establishes it will take over - the assumption is the other end feels the need to replace):

As the configuration on each node is the same, the remote node is trying to establish the same connection with this node. This happens at cluster install so all the nodes come up at about the same time.

 "ovn-b486b9-0-in-1" #9: proposal 1:IKE=AES_GCM_C_256-HMAC_SHA2_256-MODP2048 chosen from remote proposals 1:IKE:ENCR=AES_GCM_C_256;PRF=HMAC_SHA2_256;DH=MODP2048;DH=MODP3072;DH=MODP4096;DH=MODP8192;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519[first-match]
 "ovn-b486b9-0-in-1" #9: STATE_PARENT_R1: received v2I1, sent v2R1 {auth=IKEv2 cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_256 group=MODP2048}

But then the remote end initiates CHILD#10 for ovn-b486b9-0-out-1 under IKE#2 (I suspect it's noticed that IKE#2 is established and usable so starts establishing the child) (should the CHILD#4 it replaced have been deleted, perhaps it is with a timeout?)

 "ovn-b486b9-0-in-1": local ESP/AH proposals (CREATE_CHILD_SA responder matching remote ESP/AH proposals): 
 "ovn-b486b9-0-in-1":   1:ESP=AES_GCM_C_256-NONE-MODP2048-DISABLED
 "ovn-b486b9-0-in-1" #2: proposal 1:ESP=AES_GCM_C_256-MODP2048-DISABLED SPI=9f4138fd chosen from remote proposals 1:ESP:ENCR=AES_GCM_C_256;DH=MODP2048;ESN=DISABLED[first-match]
 "ovn-b486b9-0-out-1" #10: negotiated new IPsec SA [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
 "ovn-b486b9-0-out-1" #10: negotiated connection [10.0.164.113-10.0.164.113:0-65535 17] -> [10.0.244.89-10.0.244.89:6081-6081 17]
 "ovn-b486b9-0-out-1" #10: STATE_V2_IPSEC_R: IPsec SA established transport mode {ESP=>0x9f4138fd <0x3ceded52 xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}

time passes (this end times out):

The incomplete IKE#9 times out (presumably the remote end abandoned it). This triggers a "must-remain-up" (I suspect it shouldn't have got that far - it was already up):

 "ovn-b486b9-0-in-1" #9: deleting incomplete state after 200.000 seconds
 "ovn-b486b9-0-in-1" #9: deleting state (STATE_PARENT_R1) aged 200.004s and NOT sending notification
 "ovn-b486b9-0-in-1" #9: deleting IKE SA but connection is supposed to remain up; schedule EVENT_REVIVE_CONNS

which it does, but since IKE#2 is still available it grabs that:

 "ovn-b486b9-0-in-1": initiating connection which received a Delete/Notify but must remain up per local policy
 "ovn-b486b9-0-in-1" #12: STATE_V2_CREATE_I: sent IPsec Child req wait response
 "ovn-b486b9-0-in-1" #12: negotiated connection [10.0.164.113-10.0.164.113:6081-6081 17] -> [10.0.244.89-10.0.244.89:0-65535 17]
 "ovn-b486b9-0-in-1" #12: STATE_V2_IPSEC_I: IPsec SA established transport mode {ESP=>0xcc7a838d <0x8b3975cb xfrm=AES_GCM_16_256-NONE-MODP2048 NATOA=none NATD=none DPD=passive}

time passes (suspect other end timed out)

remote sends a delete for ovn-b486b9-0-out-1 CHILD#10; which triggers a call to event_force(EVENT_SA_REPLACE, st):

 "ovn-b486b9-0-in-1" #2: received Delete SA payload: replace IPsec State #10 now
 "ovn-b486b9-0-in-1" #2: STATE_PARENT_I3: PARENT SA established
 "ovn-b486b9-0-in-1": terminating SAs using this connection
 "ovn-b486b9-0-in-1": IKE SA is shared - only terminating IPsec SA

But then it goes on and wipes out IKE#2 and all other CHILD SAs.

Should it do that if it is shared?

 "ovn-b486b9-0-in-1" #12: deleting state (STATE_V2_IPSEC_I) aged 203.162s and sending notification
 "ovn-b486b9-0-in-1" #12: ESP traffic information: in=1MB out=0B
 "ovn-b486b9-0-in-1" #3: deleting state (STATE_V2_IPSEC_I) aged 410.180s and sending notification
 "ovn-b486b9-0-in-1" #3: ESP traffic information: in=699KB out=0B
 "ovn-b486b9-0-in-1" #2: deleting state (STATE_PARENT_I3) aged 410.191s and sending notification

finally, the attempt to revive ovn-b486b9-0-out-1 has flipped IKE versions:

 "ovn-b486b9-0-out-1" #14: initiating Main Mode

One other thing to note is that we found a bug in our code that, while the cluster is starting, ipsec connections are deleted ipsec auto --delete and then immediately started again ipsec auto --start as nodes are added. So, for example, if you were the last node up (node X), this would not happen to you but some other nodes may have some connections deleted/started in response to the addition of node X. In the log above, I suspect this is one of the first nodes up and it could be that this connection is deleted at some stage (or maybe the remote one). We have fixed that bug now so I am waiting to see if this segmentation fault happens again. However, I would like to understand if that could cause this to see if they are related.

@markdgray
Copy link
Author

markdgray commented Jan 20, 2021

This still appears to be happening - even with the previously mentioned "fix". Do you have any idea what could cause this segfault? @cagney

@cagney
Copy link
Collaborator

cagney commented Jan 25, 2021

This still appears to be happening - even with the previously mentioned "fix". Do you have any idea what could cause this segfault? @cagney

Were you testing mainline from git?

v3.32's pick_initiator() has the check:
if (policy & c->policy & POLICY_IKEV2_ALLOW) {
} else {
it should be just:
if (c->ike_version == IKEv2) {

in current sources that code's been replaced by the more robust:
if (policy & c->policy & POLICY_IKEV2_ALLOW) {
if (policy & c->policy & POLICY_IKEV1_ALLOW) {

@markdgray
Copy link
Author

markdgray commented Jan 25, 2021

This still appears to be happening - even with the previously mentioned "fix". Do you have any idea what could cause this segfault? @cagney

Were you testing mainline from git?

v3.32's pick_initiator() has the check:
if (policy & c->policy & POLICY_IKEV2_ALLOW) {
} else {
it should be just:
if (c->ike_version == IKEv2) {

in current sources that code's been replaced by the more robust:
if (policy & c->policy & POLICY_IKEV2_ALLOW) {
if (policy & c->policy & POLICY_IKEV1_ALLOW) {

Thanks @cagney, I am using the RHEL 3.32-6 package. I had a quick look at the 3.32-7 code and I see the following in ipsec_doi.c

                                          lset_t policy)
{
        if (policy & c->policy & POLICY_IKEV2_ALLOW) {
                return ikev2_parent_outI1;
        } else {
                /* we may try V1; Aggressive or Main Mode? */
                return (policy & POLICY_AGGRESSIVE) ? aggr_outI1 : main_outI1;
        }

What would cause the else clause to trigger? What would help me at the moment would be a reliable way to reproduce or (prevent this problem). Is there a race condition?

@letoams
Copy link
Member

letoams commented Feb 2, 2021

the policy can either contain IKEV1_ALLOW or IKEV2_ALLOW, but not both. But the check fails to take into consideration that policy could have neither set. Then it falls back to ikev1 mistakenly. Of course, the code should never be called with a policy that contains neither - we are expecting all connections to have one of these set.
I'm not sure what could trigger this though. If it is triggered bya quick succession of delete + add, perhaps the fact that we are scheduling a delete for 0s in the future, which we assume would be the next event, ends up being the next-next event ?

@letoams
Copy link
Member

letoams commented Feb 22, 2021

this should be fixed in libreswan 4.3

@letoams
Copy link
Member

letoams commented Mar 31, 2021

please re-open if this is still an issue, but we believe it has been addressed successfully.

@letoams letoams closed this as completed Mar 31, 2021
@markdgray
Copy link
Author

please re-open if this is still an issue, but we believe it has been addressed successfully.

Thanks, when this gets integrated into OpenShift, I will have a better idea if it is resolved. Thanks for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants