LDAP request failed timed out #788

nunohorta · 2015-10-21T13:44:12Z

I have ejabberd 15.07 running with LDAP authentication. This works fine for 95% of the times. Randomly a user is unable to authenticate. Looking at ejabberd.log the following log is displayed moments after it communicates with the server.

2015-10-21 14:20:47.227 [error] <0.7412.0>@eldap_pool:do_request:75 LDAP request failed: timed out 2015-10-21 14:20:47.227 [info] <0.7412.0>@ejabberd_c2s:wait_for_feature_request:756 ({socket_state,p1_tls,{tlssock,#Port<0.28105>,#Port<0.28107>},<0.7411.0>}) Accepted authentication for johndoe by ejabberd_auth_ldap from 31.112.96.168 2015-10-21 14:20:47.227 [debug] <0.7412.0>@ejabberd_c2s:send_text:1901 Send XML on stream = <<"<success xmlns='urn:ietf:params:xml:ns:xmpp-sasl'/>">> 2015-10-21 14:20:47.227 [debug] <0.7412.0>@ejabberd_socket:send:187 Error in p1_tls:send: {error,einval}

If I try again it will happen the same, over and over, while other users can log in just fine and the chat keeps working correctly. Waiting for longer periods, let's say 10 ou 20 minutes, or restart the server, will allow the user to log in again.
Is there anything that I should change on my side to fix this issue?

The text was updated successfully, but these errors were encountered:

zinid · 2015-10-21T13:54:04Z

2015-10-21 14:20:47.227 [error] <0.7412.0>@eldap_pool:do_request:75 LDAP request failed: timed out

Did you check LDAP logs? Did you investigate the disconnections via tcpick or something? There is not enougth information for us to give any advice.

nunohorta · 2015-10-21T14:05:41Z

I'll investigate those logs as soon as possible!

nunohorta · 2015-10-25T13:14:00Z

sorry but I can´t find any relevant LDAP logs. I'll do a better investigation next time this occurs. It can be closed for now, thanks

eLvErDe · 2015-11-11T08:28:50Z

Hello,

I can confirm this behavior, especially under heavy load. The LDAP server is running on a different machine and there's a Cisco ASA between them.
I highly suspect ejabberd to have a permanent connection to LDAP, Cisco ASA reset it after a while, ejabberd is unable to detect the broken link.

I found this blog post saying nearly the same:
http://start.nwt.fhstp.ac.at/blog/?p=307

eLvErDe · 2015-11-11T08:31:45Z

Another one: http://lists.jabber.ru/pipermail/ejabberd/2006-September/002167.html

nunohorta · 2015-11-11T09:09:47Z

Yeah I have the same scenario where LDAP is running on a different server. It's not happening since I created the issue, but it did happen 3 ou 4 times before

eLvErDe · 2015-11-11T09:31:13Z

According to ejabberd code, I think it now requests for TCP keepalive when establishing LDAP connection:
https://github.com/processone/ejabberd/blob/master/src/eldap.erl#L1064

However, we may have a network setup issue here: Default Linux keepalive is 7200s (2 hours) and there's a Cisco ASA firewall between the ejabberd server and the LDAP server which seems to reset the connection after 1 hour of inactivity.

I chanded the server's keepalive parameters:

root@server:~# cat /etc/sysctl.d/98-local-keepalived.conf 
### Cisco ASA closes connection after 1hours inactivity
###
### ejabberd triggers keepalive but never closes and restarts LDAP
### connections pool
###
### Workaround by configuring keepalive shorter than Cisco timeout
# Start keepalive after 30min since last real packet
net.ipv4.tcp_keepalive_time = 1800
# Send keepalive packet every 1 minute
net.ipv4.tcp_keepalive_intvl    = 60
# Consider the connection being dead after 5 tries (5x1min)
net.ipv4.tcp_keepalive_probes   = 5

And I'm quite confident it'll fix the issue.

However, I don't think ejabberd behavior is sane. First I think it shouldn't expect for permanent connection. It should have a maximum lifetime setting and should handle the LDAP connection pool by closing too old connections and opening new ones. Second, it should be able to detect dead connection without the help of keepalive mechanism. What's the point requesting again and again and established connection that returns nothing ? Just close it, reopen and give another try.

I'm not a dev, so I may be saying bullshit.

nunohorta · 2015-11-13T11:35:51Z

it's happening to me again...I'll try your solution to see if it fixes.

eLvErDe · 2015-11-13T12:13:31Z

Problem looks fixed for us with my workaround

eLvErDe · 2015-12-01T11:10:38Z

uh?
Closed? Why?

There's obviously something wrong in ejabberd behavior. It cannot expect lifetime connection with no ability to recover from a failure imho.

Regards, Adam.

nunohorta · 2015-12-02T11:15:58Z

Yeah it's still happening here(running 15.10). I have installed tcpick but I can't seem to get it working properly. If you guys could help me with tcpick so I can catch the logs that would be great. Every time this happens I have to restart the server and I simply cannot change what @eLvErDe has changed on this server. Update: I did the same changes as @eLvErDe but it keeps timing out.

k-poole · 2016-07-19T01:20:47Z

Yep same issue here. No Cisco in between though, just one VM running CentOS 7.2 x64 with eJabberd 14.07 from http://files.lucidsolutions.co.nz/linux/centos/ejabberd-el7/ejabberd-14.07-6.el7.centos.x86_64.rpm talking to another VM running MS SBS 2011 AD DS.

btlogy · 2017-11-27T14:58:15Z

It looks like I see this problem too, whole the way to 17.08
And as @eLvErDe pointed out: I do believe a client (LDAP or else) should never assume endless tcp session life time (especially when using a pool - see src/eldap_pool.erl).
From what I've collected, it is recommended to implement a periodical no-op or send an abandon messageId=0 to implement a keep-alive.
Unfortunately, my experience with Erlang does not allow me to propose a PR to implement this: :-(

zavinator · 2018-03-09T09:58:59Z

Any news on this? We've this problem several times a day :(
Centos 7, ejabberd 18.01

Bundesferkel · 2018-07-20T09:38:34Z

I request honestly a solution to this too...

Nearly all users experience an "invalid user / password", because the connection to both (two domaincontrollers) were interrupted. After a few retries (saved login-data) they're able to login (new connection from ejabberd to ldap).

Server: Debian
LDAP: Windows AD (2012 r2)
Clients: Gajim (Windows), Conversations (Android)

stefanotoro · 2018-07-27T13:54:00Z

Same problem for us. LDAP timeout even when low trafic.
Ubuntu 16.04.5 LTS
ejabberd 16.01-2
Active Directory 2012 R2
pfSense 2.3.4-p1
OS X clients with Apple Messages

photoninger · 2018-12-17T15:37:05Z

Today one of the three LDAP servers was not reachable, because it was upgraded. I assumed ejabberd would failover to the other two, but no, I get [error] <0.581.0>@eldap:connect_bind:1078 LDAP connection to ldap.server:636 failed: connection refused and (some) users cannot login.
What is the point in having a list of LDAP servers if ejabberd does no failover if one of them is unavailable?
ejabberd 18.06-1
debian stretch

Neustradamus · 2021-07-04T23:37:04Z

Any news about it?

Neustradamus · 2023-05-14T13:16:38Z

@rkujawa, @eLvErDe, @photoninger, @zavinator, @btlogy, @k-poole, @stefanotoro, @Bundesferkel: What do you think with current version?

mremond closed this as completed Dec 1, 2015

mremond added the Component:LDAP label Dec 1, 2015

mremond reopened this Dec 1, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LDAP request failed timed out #788

LDAP request failed timed out #788

nunohorta commented Oct 21, 2015

zinid commented Oct 21, 2015

nunohorta commented Oct 21, 2015

nunohorta commented Oct 25, 2015

eLvErDe commented Nov 11, 2015

eLvErDe commented Nov 11, 2015

nunohorta commented Nov 11, 2015

eLvErDe commented Nov 11, 2015

nunohorta commented Nov 13, 2015

eLvErDe commented Nov 13, 2015

eLvErDe commented Dec 1, 2015

nunohorta commented Dec 2, 2015

k-poole commented Jul 19, 2016

btlogy commented Nov 27, 2017

zavinator commented Mar 9, 2018

Bundesferkel commented Jul 20, 2018

stefanotoro commented Jul 27, 2018

photoninger commented Dec 17, 2018

Neustradamus commented Jul 4, 2021

Neustradamus commented May 14, 2023

LDAP request failed timed out #788

LDAP request failed timed out #788

Comments

nunohorta commented Oct 21, 2015

zinid commented Oct 21, 2015

nunohorta commented Oct 21, 2015

nunohorta commented Oct 25, 2015

eLvErDe commented Nov 11, 2015

eLvErDe commented Nov 11, 2015

nunohorta commented Nov 11, 2015

eLvErDe commented Nov 11, 2015

nunohorta commented Nov 13, 2015

eLvErDe commented Nov 13, 2015

eLvErDe commented Dec 1, 2015

nunohorta commented Dec 2, 2015

k-poole commented Jul 19, 2016

btlogy commented Nov 27, 2017

zavinator commented Mar 9, 2018

Bundesferkel commented Jul 20, 2018

stefanotoro commented Jul 27, 2018

photoninger commented Dec 17, 2018

Neustradamus commented Jul 4, 2021

Neustradamus commented May 14, 2023