Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap: String access is truncated or causes NullPointerException #28

Open
666asd opened this issue Jul 4, 2018 · 11 comments
Open

mmap: String access is truncated or causes NullPointerException #28

666asd opened this issue Jul 4, 2018 · 11 comments
Labels

Comments

@666asd
Copy link

666asd commented Jul 4, 2018

image
In the first pic, i used mmap to read file with python. the second is jython and JyNI. the string is truncated when i used JyNI.

@Stewori
Copy link
Owner

Stewori commented Jul 5, 2018

Thanks for reporting this issue. However, a copy/paste-able form would have been convenient for reproduction, so I'll share it here:

import sys
sys.path.append('/usr/lib/python2.7/lib-dynload')
import mmap
print mmap
f = open('jython.jar', "rb")
mp = mmap.mmap(int(f.fileno()), 0, access=mmap.ACCESS_READ)
print repr(mp[:8])
print len(mp)

Added a length output, which at least yields the correct value.
Does the truncation always occur at a line break? That might be a starting point.
I cannot tell right now when I will find time to investigate this further.
Feel free to track down this issue in native code, then I would help as far as I can.

@Stewori
Copy link
Owner

Stewori commented Jul 5, 2018

I just tried it with some text file and there it works fine. Seems to be unrelated to newline characters.
On the other hand I tried it on a distinct binary file, where I could even provoke a crash:

Traceback (most recent call last):
  File "/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py", line 17, in <module>
    print repr(mp[:60])
java.lang.NullPointerException
	at org.python.core.__builtin__.repr(__builtin__.java:1089)
	at org.python.core.BuiltinFunctions.__call__(__builtin__.java:105)
	at org.python.core.PyObject.__call__(PyObject.java:460)
	at org.python.pycode._pyx0.f$0(/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py:20)
	at org.python.pycode._pyx0.call_function(/data/workspace/linux/JyNI/JyNI-Demo/src/issue_28.py)
	at org.python.core.PyTableCode.call(PyTableCode.java:171)
	at org.python.core.PyCode.call(PyCode.java:18)
	at org.python.core.Py.runCode(Py.java:1589)
	at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:296)
	at org.python.util.jython.run(jython.java:366)
	at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException

So, this seems to depend somehow on file content. print repr(mp[:60]) and any higher value cause the crash for that file while e.g. print repr(mp[:50]) succeeds. There must be specific values in the data that blow up string conversion. Maybe this is a general issue with converting arbitrary data to strings or an encoding issue. Maybe it actually boils down to a Jython or even a Java thing. Does non-string access to the data work okay to some extend?

@666asd
Copy link
Author

666asd commented Jul 6, 2018

Thanks for your help. I think the problem is due to \x00. i have changed the function JySync_Init_JyString_From_PyString in JyNI-C/JySync.c . it seems to work fine.

aa

this is my change.

jobject JySync_Init_JyString_From_PyString(PyObject* src, jclass subtype)
{
    jstring jstr;
    env(NULL);
    int size = ((PyStringObject *)(src))->ob_size;
    jclass strClass = (*env)->FindClass(env, "Ljava/lang/String;");
    jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BLjava/lang/String;)V");
    jbyteArray bytes = (*env)->NewByteArray(env, size);
    (*env)->SetByteArrayRegion(env, bytes, 0, size, (jbyte*) ((PyStringObject *)(src))->ob_sval);
    jstring encoding = (*env)->NewStringUTF(env, "GB2312");
    jstr = (*env)->NewObject(env, strClass, ctorID, bytes, encoding);
    if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
        jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
    return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);
}

this result
image

@666asd
Copy link
Author

666asd commented Jul 6, 2018

But the crash still appears.

>>> mp[13:14]
'J'
>>> mp[12:14]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
java.lang.NullPointerException
	at org.python.core.PySystemState.displayhook(PySystemState.java:1489)
	at org.python.core.PySystemStateFunctions.__call__(PySystemState.java:1905)
	at org.python.core.PyObject.invoke(PyObject.java:3727)
	at org.python.core.Py.printResult(Py.java:2268)
	at org.python.pycode._pyx40.f$0(<stdin>:1)
	at org.python.pycode._pyx40.call_function(<stdin>)
	at org.python.core.PyTableCode.call(PyTableCode.java:171)
	at org.python.core.PyCode.call(PyCode.java:18)
	at org.python.core.Py.runCode(Py.java:1614)
	at org.python.core.Py.exec(Py.java:1658)
	at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:276)
	at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:131)
	at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:116)
	at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:62)
	at org.python.util.InteractiveConsole.push(InteractiveConsole.java:187)
	at org.python.util.InteractiveConsole._interact(InteractiveConsole.java:168)
	at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:126)
	at org.python.util.jython.run(jython.java:419)
	at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException
>>> mp[13:14]
'J'
>>> mp[10:11]
'n'
>>> mp[10:12]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
java.lang.NullPointerException
	at org.python.core.PySystemState.displayhook(PySystemState.java:1489)
	at org.python.core.PySystemStateFunctions.__call__(PySystemState.java:1905)
	at org.python.core.PyObject.invoke(PyObject.java:3727)
	at org.python.core.Py.printResult(Py.java:2268)
	at org.python.pycode._pyx43.f$0(<stdin>:1)
	at org.python.pycode._pyx43.call_function(<stdin>)
	at org.python.core.PyTableCode.call(PyTableCode.java:171)
	at org.python.core.PyCode.call(PyCode.java:18)
	at org.python.core.Py.runCode(Py.java:1614)
	at org.python.core.Py.exec(Py.java:1658)
	at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:276)
	at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:131)
	at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:116)
	at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:62)
	at org.python.util.InteractiveConsole.push(InteractiveConsole.java:187)
	at org.python.util.InteractiveConsole._interact(InteractiveConsole.java:168)
	at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:126)
	at org.python.util.jython.run(jython.java:419)
	at org.python.util.jython.main(jython.java:142)
java.lang.NullPointerException: java.lang.NullPointerException

@Stewori Stewori changed the title The string is truncated. mmap: String access is truncated or causes NullPointerException Jul 6, 2018
@Stewori
Copy link
Owner

Stewori commented Jul 6, 2018

Okay, these seem to be two separate things. The idea to fix it in JySync_Init_JyString_From_PyString is promising. However, I don't see why changing to another encoding would solve it. As far as I understand it, we get bytes from a binary file that cannot be encoded as a string properly, using utf8. How can you know that this wouldn't fail with some other input for GB2312 as well? Actually I would think that an exception on attempting to turn arbitrary bytes into a string is justified. However, it does not occur with CPython, so they have some way to resolve it. We must look how it is done there and port that solution to JyNI. Do they use GB2312 in CPython? (I doubt, but haven't looked it up so far).

For the NullPointerException we'll have to understand why PySystemState.displayHook gets a null-argument...

@666asd
Copy link
Author

666asd commented Jul 6, 2018

I modified the way to get the length of C string when it was converted to jstring. In the previous way, when the string of C was converted to jstring, the string was truncated because of \x00. Modifying the encoding method to GB2312 does not solve the problem of string truncation. The code is copied from the network, and the original code is GB2312 encoded. I found that the code can take effect so it doesn't change. I have the same results for utf8.

@Stewori
Copy link
Owner

Stewori commented Jul 6, 2018

Oh, okay \x00 is really the plain 0, I overlooked that. This causes C string handling to terminate it. This makes sense now. Then I'd suggest to use the encoding UTF-8 which should be close to the original NewStringUTF-based implementation.
The lines

jclass strClass = (*env)->FindClass(env, "Ljava/lang/String;");
jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BLjava/lang/String;)V");

can be removed because in JyNI these IDs are already cached at startup. I suggest this implementation:

jstring charsetName;
jobject strByteArray, jstr;
jbyte* strBytes;
Py_ssize_t len = PyString_GET_SIZE(src);

env(NULL);
charsetName = (*env)->NewStringUTF(env, "UTF-8");
strByteArray = (*env)->NewByteArray(env, len);
strBytes = (*env)->GetByteArrayElements(env, strByteArray, NULL);
memcpy(strBytes, PyString_AS_STRING(src), len);
(*env)->ReleaseByteArrayElements(env, strByteArray, strBytes, 0); //copy back and free buffer
jstr = (*env)->NewObject(env, stringClass,
		string_fromBytesAndCharsetNameConstructor, strByteArray, charsetName);
//todo: check interned-regulations on jython-side
if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
	jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);

We should move charsetName = (*env)->NewStringUTF(env, "UTF-8"); to be a global constant since this here might be a heavily used function. Same with charsetName = (*env)->NewStringUTF(env, "UTF-16BE"); in the PyUnicode equivalent. But that's another todo.

@666asd
Copy link
Author

666asd commented Jul 9, 2018

@Stewori Did you analyze the cause of NullPointerException? I couldn't find the reason.

@Stewori
Copy link
Owner

Stewori commented Jul 9, 2018

Sorry, this is currently in conflict with my work on #22 in my local branch, causing additional errors. Sure, I could make a second clone for this, but anyway -- I prefer to work on one issue at a time. Given that #22 is tricky, it might take a while until I can follow up here. So far it's limited to rough comments, review and style hints, e.t.c. from my side.

@666asd
Copy link
Author

666asd commented Jul 12, 2018

i find pyfile in jython use the function to convent byte[] to string.
image

then i change the function JySync_Init_JyString_From_PyString.

jobject JySync_Init_JyString_From_PyString(PyObject* src, jclass subtype)
{
        jstring charsetName;
        jobject strByteArray, jstr;
        jbyte* strBytes;
        Py_ssize_t len = PyString_GET_SIZE(src);

        env(NULL);
        charsetName = (*env)->NewStringUTF(env, "UTF-8");
        strByteArray = (*env)->NewByteArray(env, len);
        strBytes = (*env)->GetByteArrayElements(env, strByteArray, NULL);
        memcpy(strBytes, PyString_AS_STRING(src), len);
        (*env)->ReleaseByteArrayElements(env, strByteArray, strBytes, 0); //copy back and free buff$
        jclass strClass = (*env)->FindClass(env, "java/lang/String");
        jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BIII)V");

        jstr = (*env)->NewObject(env, strClass, ctorID, strByteArray, 0, 0, len);
        //todo: check interned-regulations on jython-side
        if (JyNI_HasJyAttribute(AS_JY_NO_GC(src), JyAttributeStringInterned))
                jstr = (*env)->CallObjectMethod(env, jstr, string_intern);
                return (*env)->CallStaticObjectMethod(env, pyPyClass, pyPy_newString, jstr);
}

now it can work.
image

@Stewori Stewori added the bug label Jul 12, 2018
@Stewori
Copy link
Owner

Stewori commented Jul 12, 2018

Did you check that the conversion matches CPython behavior?
The constructor you suggest now would fill the upper 8 bits with zeros. This should be the same like creating an array of jchars and assigning the bytes to it in a loop (goodbye memcopy). Based on this we should get the same result using the constructor String(char[] value, int offset, int count), but avoiding a deprecated constructor.

With that taken into account, would you turn your fix into a PR?

If you would turn it into a PR, these two lines should be adjusted to fit with JyNI's approach to handle JNI:

jclass strClass = (*env)->FindClass(env, "java/lang/String");
jmethodID ctorID = (*env)->GetMethodID(env, strClass, "<init>", "([BIII)V");

The first lookup can be avoided by directly using stringClass, which is globally cached on JyNI startup. The second line should move to JyNI_JNI.c, see this post for a rough usage overview.

Otherwise I can refactor it into a commit as soon as I find time.
Thanks for finding this isue and thanks even more for solving it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants