Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SERVER_DOWN exception raised during batch operations #516

Open
azmeuk opened this issue Mar 9, 2023 · 7 comments
Open

SERVER_DOWN exception raised during batch operations #516

azmeuk opened this issue Mar 9, 2023 · 7 comments

Comments

@azmeuk
Copy link
Contributor

azmeuk commented Mar 9, 2023

Operating system: Archlinux
Python version: 3.10.9
python-ldap version: 3.4.3
ldap server: slapd 2.6.4

When I try to perform a large number (several thousands) of async operations (groups of search, add, modify), there is a point where a SERVER_DOWN exception is raised.

msgids = [conn.add(dn, modlist) for dn, modlist in items]
for msgid in msgids:
    conn.result(msgid)

The exception is randomly raised by add or result, but if the list is, say 5000 items long, it never gets to the end. It feels like I open too much connections to the ldap server and at a moment I reach some kind of limit.

Is it due to:

  • a misusage of python-ldap?
  • a misconfiguration from slap?
  • a misconfiguration from my system? I have tried to increase the file descriptor limit, but this had no effect.

Thank you for your help

@droideck
Copy link
Contributor

We certainly need more details.
What is the whole python-ldap script that you wrote?
What is your Slapd Server Configuration?

Generally though, I suggest asking on the OpenLDAP mailing list - openldap-technical@openldap.org - as it's probably the configuration issue.

@azmeuk
Copy link
Contributor Author

azmeuk commented May 30, 2023

I am not certain if the bug I observe is due to a misconfiguration of OpenLDAP or a misusage of python-ldap, this is the point of the issue.

I will try to share an example project that reproduces the issue soon.

@quanah
Copy link
Contributor

quanah commented May 31, 2023

Why are you using async? Also, performance wise, it is best to use a single connection and do the modifications serially. You will slow the system down by trying to do many modifications concurrently. I.e., your code is far from optimal as described.

@azmeuk
Copy link
Contributor Author

azmeuk commented Aug 13, 2023

Sorry to have taken so long to answer.

@quanah

Why are you using async?

Naively I thought that would help performances.

Also, performance wise, it is best to use a single connection and do the modifications serially. You will slow the system down by trying to do many modifications concurrently. I.e., your code is far from optimal as described.

Out of curiosity, why is that?

I can still reproduce the error with async methods. Here is a MRE for the bug I describe in OP. I use python 3.11.3 and python-ldap 3.4.3. You will need python-slapd to run the snippet (or adapt it to another LDAP server):

>>> import ldap
>>> import slapd
>>> NB_ITEMS = 5000
>>> server = slapd.Slapd(
>>>     schemas=[
>>>         "core.ldif",
>>>         "cosine.ldif",
>>>         "inetorgperson.ldif",
>>>     ],
>>> )
>>> server.start()
>>> server.init_tree()
>>> conn = ldap.initialize(server.ldap_uri)
>>> conn.simple_bind_s(server.root_dn, server.root_pw)
>>> for i in range(NB_ITEMS):
...    conn.add(
...        f"cn=cn{i},{server.suffix}",
...        [
...            ("objectClass", [b"inetOrgPerson"]),
...            ("cn", [f"cn{i}".encode()]),
...            ("sn", [b"foo"]),
...        ],
...    )
Traceback (most recent call last):
  File ".../test.py", line 16, in <module>
    conn.add(
  File "/.../lib/python3.11/site-packages/ldap/ldapobject.py", line 233, in add
    return self.add_ext(dn,modlist,None,None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/site-packages/ldap/ldapobject.py", line 218, in add_ext
    return self._ldap_call(self._l.add_ext,dn,modlist,RequestControlTuples(serverctrls),RequestControlTuples(clientctrls))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../lib/python3.11/site-packages/ldap/ldapobject.py", line 128, in _ldap_call
    result = func(*args,**kwargs)
             ^^^^^^^^^^^^^^^^^^^^
ldap.SERVER_DOWN: {'result': -1, 'desc': "Can't contact LDAP server", 'errno': 32, 'ctrls': [], 'info': 'Broken pipe'}

With 5000 items I get the error at all times, but decreasing the number of items to ~1000 won't crash.

@mistotebe
Copy link
Contributor

I suspect the server is getting frustrated because you're not reading the response side of the stream: you should call result(msgid=ldap.RES_ANY, all=1) every so often to drain the responses or switch your add to add_s.

Check your server logs, especially if the above doesn't help.

@azmeuk
Copy link
Contributor Author

azmeuk commented Aug 14, 2023

For the record, here is the last logs from the server:

64da06df.1baf9ee9 0x7ff80f9fe6c0 connection_input: conn=1002 deferring operation: pending operations
64da06df.1bafb33c 0x7ff80f9fe6c0 ber_get_next
64da06df.1bafbffd 0x7ff80f9fe6c0 ldap_read: want=8, got=8
64da06df.1bafc78f 0x7ff80f9fe6c0   0000:  30 74 02 02 03 f1 68 6e                            0t....hn
64da06df.1bafd3ca 0x7ff80f9fe6c0 ldap_read: want=110, got=110
64da06df.1bafdb58 0x7ff80f9fe6c0   0000:  04 2d 63 6e 3d 63 6e 31  30 30 37 2c 64 63 3d 73   .-cn=cn1007,dc=s
64da06df.1bafe2d2 0x7ff80f9fe6c0   0010:  6c 61 70 64 2d 74 65 73  74 2c 64 63 3d 70 79 74   lapd-test,dc=pyt
64da06df.1bafea4f 0x7ff80f9fe6c0   0020:  68 6f 6e 2d 6c 64 61 70  2c 64 63 3d 6f 72 67 30   hon-ldap,dc=org0
64da06df.1baff1bd 0x7ff80f9fe6c0   0030:  3d 30 1e 04 0b 6f 62 6a  65 63 74 43 6c 61 73 73   =0...objectClass
64da06df.1baff92f 0x7ff80f9fe6c0   0040:  31 0f 04 0d 69 6e 65 74  4f 72 67 50 65 72 73 6f   1...inetOrgPerso
64da06df.1bb000a9 0x7ff80f9fe6c0   0050:  6e 30 0e 04 02 63 6e 31  08 04 06 63 6e 31 30 30   n0...cn1...cn100
64da06df.1bb0081d 0x7ff80f9fe6c0   0060:  37 30 0b 04 02 73 6e 31  05 04 03 66 6f 6f         70...sn1...foo
64da06df.1bb00fcc 0x7ff80f9fe6c0 ber_get_next: tag 0x30 len 116 contents:
64da06df.1bb017e4 0x7ff80f9fe6c0 ber_dump: buf=0x7ff80025ced0 ptr=0x7ff80025ced0 end=0x7ff80025cf44 len=116
64da06df.1bb01f67 0x7ff80f9fe6c0   0000:  02 02 03 f1 68 6e 04 2d  63 6e 3d 63 6e 31 30 30   ....hn.-cn=cn100
64da06df.1bb026db 0x7ff80f9fe6c0   0010:  37 2c 64 63 3d 73 6c 61  70 64 2d 74 65 73 74 2c   7,dc=slapd-test,
64da06df.1bb02e50 0x7ff80f9fe6c0   0020:  64 63 3d 70 79 74 68 6f  6e 2d 6c 64 61 70 2c 64   dc=python-ldap,d
64da06df.1bb035bd 0x7ff80f9fe6c0   0030:  63 3d 6f 72 67 30 3d 30  1e 04 0b 6f 62 6a 65 63   c=org0=0...objec
64da06df.1bb03d32 0x7ff80f9fe6c0   0040:  74 43 6c 61 73 73 31 0f  04 0d 69 6e 65 74 4f 72   tClass1...inetOr
64da06df.1bb044a1 0x7ff80f9fe6c0   0050:  67 50 65 72 73 6f 6e 30  0e 04 02 63 6e 31 08 04   gPerson0...cn1..
64da06df.1bb04c15 0x7ff80f9fe6c0   0060:  06 63 6e 31 30 30 37 30  0b 04 02 73 6e 31 05 04   .cn10070...sn1..
64da06df.1bb0536c 0x7ff80f9fe6c0   0070:  03 66 6f 6f                                        .foo
64da06df.1bb06b01 0x7ff80f9fe6c0 op tag 0x68, time 1692010207
64da06df.1bb07369 0x7ff80f9fe6c0 connection_input: conn=1002 deferring operation: pending operations
64da06df.1bb088b8 0x7ff80f9fe6c0 ber_get_next
64da06df.1bb095b4 0x7ff80f9fe6c0 ldap_read: want=8, got=8
64da06df.1bb09d45 0x7ff80f9fe6c0   0000:  30 74 02 02 03 f2 68 6e                            0t....hn
64da06df.1bb0aa28 0x7ff80f9fe6c0 ldap_read: want=110, got=110
64da06df.1bb0b1c0 0x7ff80f9fe6c0   0000:  04 2d 63 6e 3d 63 6e 31  30 30 38 2c 64 63 3d 73   .-cn=cn1008,dc=s
64da06df.1bb0b93e 0x7ff80f9fe6c0   0010:  6c 61 70 64 2d 74 65 73  74 2c 64 63 3d 70 79 74   lapd-test,dc=pyt
64da06df.1bb0c91c 0x7ff80f9fe6c0   0020:  68 6f 6e 2d 6c 64 61 70  2c 64 63 3d 6f 72 67 30   hon-ldap,dc=org0
64da06df.1bb0d0ae 0x7ff80f9fe6c0   0030:  3d 30 1e 04 0b 6f 62 6a  65 63 74 43 6c 61 73 73   =0...objectClass
64da06df.1bb0d81c 0x7ff80f9fe6c0   0040:  31 0f 04 0d 69 6e 65 74  4f 72 67 50 65 72 73 6f   1...inetOrgPerso
64da06df.1bb0df88 0x7ff80f9fe6c0   0050:  6e 30 0e 04 02 63 6e 31  08 04 06 63 6e 31 30 30   n0...cn1...cn100
64da06df.1bb0e6f5 0x7ff80f9fe6c0   0060:  38 30 0b 04 02 73 6e 31  05 04 03 66 6f 6f         80...sn1...foo
64da06df.1bb0eec8 0x7ff80f9fe6c0 ber_get_next: tag 0x30 len 116 contents:
64da06df.1bb0f6ac 0x7ff80f9fe6c0 ber_dump: buf=0x7ff80025d3a0 ptr=0x7ff80025d3a0 end=0x7ff80025d414 len=116
64da06df.1bb0fe2b 0x7ff80f9fe6c0   0000:  02 02 03 f2 68 6e 04 2d  63 6e 3d 63 6e 31 30 30   ....hn.-cn=cn100
64da06df.1bb105a2 0x7ff80f9fe6c0   0010:  38 2c 64 63 3d 73 6c 61  70 64 2d 74 65 73 74 2c   8,dc=slapd-test,
64da06df.1bb10d54 0x7ff80f9fe6c0   0020:  64 63 3d 70 79 74 68 6f  6e 2d 6c 64 61 70 2c 64   dc=python-ldap,d
64da06df.1bb114c7 0x7ff80f9fe6c0   0030:  63 3d 6f 72 67 30 3d 30  1e 04 0b 6f 62 6a 65 63   c=org0=0...objec
64da06df.1bb11c3b 0x7ff80f9fe6c0   0040:  74 43 6c 61 73 73 31 0f  04 0d 69 6e 65 74 4f 72   tClass1...inetOr
64da06df.1bb123ae 0x7ff80f9fe6c0   0050:  67 50 65 72 73 6f 6e 30  0e 04 02 63 6e 31 08 04   gPerson0...cn1..
64da06df.1bb12b1a 0x7ff80f9fe6c0   0060:  06 63 6e 31 30 30 38 30  0b 04 02 73 6e 31 05 04   .cn10080...sn1..
64da06df.1bb1326d 0x7ff80f9fe6c0   0070:  03 66 6f 6f                                        .foo
64da06df.1bb13afb 0x7ff80f9fe6c0 op tag 0x68, time 1692010207
64da06df.1bb142cb 0x7ff80f9fe6c0 connection_input: conn=1002 deferring operation: pending operations
64da06df.1bb158cd 0x7ff80f9fe6c0 connection_read(12): input error=-1 id=1002, closing.
64da06df.1bb16271 0x7ff80f9fe6c0 connection_closing: readying conn=1002 sd=12 for close
64da06df.1bb3fea7 0x7ff80f9fe6c0 connection_close: deferring conn=1002 sd=12
64da06df.1bb4968e 0x7ff80e9fc6c0 connection_resched: attempting closing conn=1002 sd=12
64da06df.1bb4b04a 0x7ff80e9fc6c0 connection_close: deferring conn=1002 sd=12
64da06df.1bb4dda2 0x7ff80f1fd6c0 connection_resched: attempting closing conn=1002 sd=12
64da06df.1bb4fb97 0x7ff80f1fd6c0 connection_close: deferring conn=1002 sd=12
64da06df.1bb59409 0x7ff80e1fb6c0 connection_resched: attempting closing conn=1002 sd=12
64da06df.1bb5abab 0x7ff80e1fb6c0 connection_close: deferring conn=1002 sd=12
64da06df.1bb5dba0 0x7ff80d9fa6c0 connection_resched: attempting closing conn=1002 sd=12
64da06df.1bb5f0b9 0x7ff80d9fa6c0 connection_close: deferring conn=1002 sd=12
64da06df.1bb61cab 0x7ff80d1f96c0 connection_resched: attempting closing conn=1002 sd=12
64da06df.1bb62eba 0x7ff80d1f96c0 connection_close: deferring conn=1002 sd=12
64da06df.1bb65596 0x7ff80c9f86c0 connection_resched: attempting closing conn=1002 sd=12
64da06df.1bb664a4 0x7ff80c9f86c0 connection_close: deferring conn=1002 sd=12
64da06df.1bb68596 0x7ff7effff6c0 connection_resched: attempting closing conn=1002 sd=12
64da06df.1bb69d06 0x7ff7effff6c0 connection_close: deferring conn=1002 sd=12
64da06df.1bb6c691 0x7ff7ef7fe6c0 connection_resched: attempting closing conn=1002 sd=12
64da06df.1bb6d939 0x7ff7ef7fe6c0 connection_close: conn=1002 sd=12
64da06df.1bb70590 0x7ff7ef7fe6c0 daemon: removing 12
64da06df.1bb7c70e 0x7ff7ef7fe6c0 conn=1002 fd=12 closed (connection lost)
64da06df.1bb8c9d8 0x7ff8101ff6c0 slapd shutdown: waiting for 0 operations/tasks to finish
64da06df.1bc0bf04 0x7ff84ce57880 slapd shutdown: initiated
64da06df.1bc560e1 0x7ff84ce57880 slapd destroy: freeing system resources.
64da06df.1bc857b6 0x7ff84ce57880 slapd stopped.

I understand by your answers that this might be a server configuration issue, but also that there is a misusage on my side.

If async is not optimal and sync can only perform one operation at a time, how would you recommend to perform batch operations?

@mistotebe
Copy link
Contributor

64da06df.1bb158cd 0x7ff80f9fe6c0 connection_read(12): input error=-1 id=1002, closing.

Looks like the connection got lost somehow.

Since you're using OpenLDAP, back-mdb will serialise your writes anyway, if you're concerned that network latency slows down your operations, limit your batches to a small amount, only issuing new operations as the old ones have been confirmed (using one of the result<x> methods).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants