Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/CLN: Vendored ujson Module #49604

Open
lithomas1 opened this issue Nov 9, 2022 · 8 comments
Open

BUG/CLN: Vendored ujson Module #49604

lithomas1 opened this issue Nov 9, 2022 · 8 comments
Labels
Bug Clean IO JSON read_json, to_json, json_normalize

Comments

@lithomas1
Copy link
Member

While working on the meson build integration, I've discovered that the JSON Module is doing some sketchy things.

  1. The ujson extension module should not be named json.

    This can result in a really nasty segfault, since our ujson module tries to initialize numpy's array api(via import_array). In the C code on numpy's side, numpy then tries to import itself. During this process, it needs to initialize its version process via versionneer. Well, it turns out versioneer needs to import json to figure out version stuff. If versioneer picks up pandas' json module, bad things happen since its like a circular import but worse (best case: Python exception saying json failed to initialize, worst case: segfault).

  2. We are using POSIX functions.

    If you're not paying attention to the build warnings, you'll get a segfault, since in C99 mode strdup gets excluded from string.h declarations. Now, Fix build warning for use of strdup in ultrajson #48369 was supposed to fix this so I'm not sure what's going on here.

All in all, this burned like a week of my time, so it would be nice to fix this sometime soon.

cc @WillAyd

@lithomas1 lithomas1 added Bug IO JSON read_json, to_json, json_normalize Clean labels Nov 9, 2022
@WillAyd
Copy link
Member

WillAyd commented Nov 9, 2022

Yea I think we can internally reference it always as ujson to avoid conflicts with the stdlib. I did this in the CMake POC if you need a reference.

As far as issue 2 is concerned are we appropriately including the header files? If the fix from that issue didn't help can you provide the build warning to review?

In the past we would explicitly fail setup on any build warnings but stopped because setuptools could issue spurious things. With a real build system however I would be all for setting -Werror again. Most gcc warnings are in fact serious issues that should be corrected

@WillAyd
Copy link
Member

WillAyd commented Nov 9, 2022

See #33315 for history

@lithomas1
Copy link
Member Author

 ../../pandas/_libs/src/ujson/lib/ultrajsonenc.c: In function ‘JSON_EncodeObject’:
  ../../pandas/_libs/src/ujson/lib/ultrajsonenc.c:1180:18: warning: implicit declaration of function ‘strdup’; did you mean ‘strcmp’? [-Wimplicit-function-declaration]
   1180 |         locale = strdup(locale);
        |                  ^~~~~~
        |                  strcmp
  ../../pandas/_libs/src/ujson/lib/ultrajsonenc.c:1180:16: warning: assignment to ‘char *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
   1180 |         locale = strdup(locale);
        |                ^
  [135/182] Compiling C object pandas/_libs/json.cpython-38-x86_64-linux-gnu.so.p/src_ujson_lib_ultrajsondec.c.o
  ../../pandas/_libs/src/ujson/lib/ultrajsondec.c: In function ‘JSON_DecodeObject’:
  ../../pandas/_libs/src/ujson/lib/ultrajsondec.c:1178:18: warning: implicit declaration of function ‘strdup’; did you mean ‘strcmp’? [-Wimplicit-function-declaration]
   1178 |         locale = strdup(locale);
        |                  ^~~~~~
        |                  strcmp
  ../../pandas/_libs/src/ujson/lib/ultrajsondec.c:1178:16: warning: assignment to ‘char *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
   1178 |         locale = strdup(locale);
        | 

Here's the snippet that builds the json module.

    'json': {'sources': ['src/ujson/python/ujson.c',
                         'src/ujson/python/objToJSON.c',
                         'src/ujson/python/date_conversions.c',
                         'src/ujson/python/JSONtoObj.c',
                         'src/ujson/lib/ultrajsonenc.c',
                         'src/ujson/lib/ultrajsondec.c',
                         'tslibs/src/datetime/np_datetime.c',
                         'tslibs/src/datetime/np_datetime_strings.c'],
             'include_dirs': [inc_datetime, 'src/ujson/lib', 'src/ujson/python']}

@lithomas1
Copy link
Member Author

The only thing I can think of right now, is to remove src/ujson/lib and src/ujson/python from the include dirs, since meson includes the source directories as include directories implicitly I think.

I vaguely remember meson messing up headers if they happen to get double included.

@WillAyd
Copy link
Member

WillAyd commented Nov 9, 2022

Hmm I don't think meson would be doing anything with the headers. Most headers in C should have an include guard to prevent them from being double injected, but that's outside the scope of what meson would do I think.

What if you changed the header in ujson.c and ultrajson.h to be #include headers/portable.h and then added pandas/_libs/src to your include_dirs?

@WillAyd
Copy link
Member

WillAyd commented Nov 9, 2022

You might also want to try and execute meson via VERBOSE=1 <meson_invocation>, which I'm guessing would show you the actual command that gcc tries to execute. That would be very helpful to diagnose as part of the traceback

@eli-schwartz
Copy link
Contributor

eli-schwartz commented Nov 10, 2022

Well, ninja -v specifically. There's an open PR for meson-python to support passing pip install --config-settings options through into meson itself.

@jbrockmendel
Copy link
Member

we've done some renaming of the functions in the module recently; there shouldn't be any resistance to renaming the module itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Clean IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

No branches or pull requests

4 participants