Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python Bindings : Xpath_Ctx - Should it be used with Python? #2021

Closed
johnbeckettn2e opened this issue Jun 15, 2020 · 11 comments
Closed

Python Bindings : Xpath_Ctx - Should it be used with Python? #2021

johnbeckettn2e opened this issue Jun 15, 2020 · 11 comments

Comments

@johnbeckettn2e
Copy link
Contributor

johnbeckettn2e commented Jun 15, 2020

Hi,

I've encountered an issue testing sysrepo v1.4.2 with python3, as part of the debug process I have issued a few xpath_ctx calls whilst running the interpreter .

In making some manual calls into Sysrepo and running in Valgrind I'm seeing that an Xpath_Ctx recover() will write to memory after it has been freed.

I'm running the equivalent of the following lines

import sysrepo
xpath = "/mymodule:cap[foo='BAR']"
ctx = sysrepo.Xpath_Ctx()
n = ctx.last_node(xpath)
ctx.recover()

I believe what is happening is that the underlying c code is modifying the xpath string (as per the Xpath_ctx section of the documentation) and the call to recover is returning the string to it's original contents.

From the SWIG documentation
http://www.swig.org/Doc3.0/SWIGDocumentation.html - 9.3.1 Default string handling
(example is shown on how a string argument is generated, then this note follows...) it is NOT safe for a function to modify this data (doing so may corrupt the interpreter and lead to a crash).

I think it happens to work for python2 because the underlying string is a c string and SWIG returns a pointer to the underlying data. Whereas for python3 a temporary string (due to unicode conversion) is generated, but gets freed after the original xpath function returns, leaving the Xpath_Ctx with a pointer to a freed string.

Is this API intended for use with the python bindings?

For reference the valgrind output as I pasted the commands, I've removed some lines for clarity. It appears that the life of the xpath string is only within the autogenerated function _wrap_Xpath_Ctx_last_node (sysrepoPYTHON_wrap.cxx:88116 to sysrepoPYTHON_wrap.cxx:88139)

$  valgrind  --track-origins=yes --leak-check=full  python3

#### Skip a bunch of warning relating to python before first prompt

>>> import sysrepo
>>> xpath = "/mymodule:cap[foo='BAR']"
>>> ctx = sysrepo.Xpath_Ctx()
>>> n = ctx.last_node(xpath)
>>> ctx.recover()
==1201== Invalid write of size 1
==1201==    at 0x8C3D3D9: sr_xpath_recover (xpath.c:584)
==1201==    by 0x80FD736: sysrepo::Xpath_Ctx::recover() (Xpath.hpp:88)
==1201==    by 0x8032705: _wrap_Xpath_Ctx_recover (sysrepoPYTHON_wrap.cxx:88311)
...rest of stack dump outside sysrepo...
==1201==  Address 0x9f52e6d is 13 bytes inside a block of size 25 free'd
==1201==    at 0x4C3173B: operator delete[](void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1201==    by 0x8031950: _wrap_Xpath_Ctx_last_node (sysrepoPYTHON_wrap.cxx:88139)
...rest of stack dump outside sysrepo...
==1201==  Block was alloc'd at
==1201==    at 0x4C3089F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1201==    by 0x7EC9C71: SWIG_AsCharPtrAndSize(_object*, char**, unsigned long*, int*) (sysrepoPYTHON_wrap.cxx:3999)
==1201==    by 0x8031886: _wrap_Xpath_Ctx_last_node (sysrepoPYTHON_wrap.cxx:88116)
...rest of stack dump outside sysrepo...
@michalvasko
Copy link
Collaborator

Hi,
what you are saying makes sense to me and thanks for properly investigating it. As there are new bindings being worked on, disabling this in the current ones seems fine to me, @lpaulic?

Regards,
Michal

@johnbeckettn2e
Copy link
Contributor Author

Hi @michalvasko , @lperkov ,

Thanks for clarifying

Do you have any more information on the new bindings?

For instance

  • How complete are they?
  • When are they expected to be available?

John

@michalvasko
Copy link
Collaborator

Hi,
@rjarry is the one working on them so maybe he can share some details.

Regards,
Michal

@rjarry
Copy link
Collaborator

rjarry commented Jun 16, 2020

Hi @johnbeckettn2e,

I am still working on the bindings. I am planning to create a draft pull request in the coming weeks.

So far, here is the (high level) API coverage:

  • subscribe to module changes
  • subscribe for operational data requests
  • subscribe for rpc calls
  • get data
  • get/set/delete individual items
  • edit/replace configuration
  • validate/apply/discard pending changes
  • get the yang schemas (via get_ly_context())
  • install/uninstall yang modules
  • enable/disable yang features

The python API itself is very different from the existing SWIG bindings.

Side note: I have added asyncio support. It is not perfect since I had to cut some corners in order to make it work. I am hoping we can discuss the issues I encountered when submitting the PR.

@rjarry
Copy link
Collaborator

rjarry commented Jun 16, 2020

Here are some dummy code examples:

# operational data callback (threads)
import signal
import time
import socket
import sysrepo

def get_items_cb(module, xpath, private_data):
    return {'hostname': socket.gethostname()}

do_run = True
def stop(signum, frame):
    global do_run
    do_run = False
signal.signal(signal.SIGINT, stop)

with sysrepo.Connection() as conn:
    with conn.start_session() as sess:
        sess.subscribe_get_items(
            'example', '/example:state/system/hostname', get_items_cb)
        while do_run:
            time.sleep(1)
# rpc callback (asyncio)
import asyncio
import pwd
import sysrepo

async def create_user(xpath, rpc_input, private_data):
    proc = await asyncio.create_subprocess_exec(
        ['useradd', '-b', rpc_input['home'], rpc_input['name']])
    if await proc.wait() != 0:
        raise sysrepo.SysrepoOperationFailedError('useradd failed')
    u = pwd.getpwnam(rpc_input['name'])
    return {'uid': u.pw_uid, 'gid': u.pw_gid}

loop = asyncio.get_event_loop()
stop_event = asyncio.Event()
loop.add_signal_handler(signal.SIGINT, stop_event.set)

with sysrepo.Connection() as conn:
    with conn.start_session() as sess:
        sess.subscribe_rpc_call(
            '/example:create-user', create_user, asyncio_register=True)
        loop.run_until_complete(stop_event.wait())
# client

import sysrepo

with sysrepo.Connection(no_sched_changes=True) as conn:
    with conn.start_session('operational') as sess:
        dic = sess.get_data_dict('/example:state')
        # {'state': {'system': {'hostname': 'foobar'}}}
    with conn.start_session() as sess:
        inp = {'name': 'foo', 'home': '/home/foo'}
        out = sess.rpc_send_dict('/example:create-user', inp)
        # {'uid': 1003, 'gid': 100}

@johnbeckettn2e
Copy link
Contributor Author

johnbeckettn2e commented Jun 17, 2020

Thanks @rjarry ,

It seems that the callbacks will now receive a dictionary of key value pairs.

What happens if there are duplicate keys in the xpath?
/module:thing/higher[name='acme']/lower[name='thing'][port='1']

The current API lets you specify the node and key when getting the value:
ctx.key_value(xpath, 'higher', 'name')
ctx.key_value(xpath, 'lower', 'name')

John

@rjarry
Copy link
Collaborator

rjarry commented Jun 17, 2020

Hi John,

I'm not sure what you mean. If there are nested lists, the dictionary will reflect this:

For example, your xpath /module:thing/higher[name='acme']/lower[name='thing'][port='1'] would be represented by such a python dictionary:

{
    'thing': {
        'higher': [
            {
                'name': 'acme',
                'lower': [
                    {
                        'name': 'thing',
                        'port': 1,
                    },
                ],
            },
        ],
    },
}

The actual data format is handled by the libyang CFFI bindings which I recently pushed in CESNET/libyang#1087. You can see how it looks like here: https://github.com/CESNET/libyang/blob/v1.0.176/python/libyang/data.py#L206

@johnbeckettn2e
Copy link
Contributor Author

Hi @rjarry - thanks for clarifying.

I was thrown off because I was expecting more in the xpath. Using rpc_input['home'], rpc_input['name'] from above implies that the module only contains the keys home and name at the root level

@rjarry
Copy link
Collaborator

rjarry commented Jun 17, 2020

Actually, module change callbacks will be called with only the subtree that is affected.

For example, if you subscribe with /module:thing/higher you will receive the subtree without the first level as the config callback argument.

    {
        'higher': [
            {
                'name': 'acme',
                'lower': [
                    {
                        'name': 'thing',
                        'port': 1,
                    },
                ],
            },
        ],
    }

@rjarry
Copy link
Collaborator

rjarry commented Jun 17, 2020

I am still thinking about all this. Maybe if there were some additions in the sysrepo C API, I could have something more flexible. I'll discuss this with more concrete examples when actually submitting the PR.

@johnbeckettn2e
Copy link
Contributor Author

Its definitely the right direction! I just tried to illustrate that a comment above the example callback that includes the xpath would have been nice, but that's one for documentation and comments in the example code when it's written.

This has been very useful for me.

Thanks again
John

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants