Python: MegaApi.startStreaming() only gives me a fraction of the data #2612

Open
Eboreg opened this issue Mar 5, 2022 · 7 comments

Comments

@Eboreg

Eboreg commented Mar 5, 2022

I am trying to use MegaApi.startStreaming() via the Python bindings, but my MegaTransferListener.onTransferUpdate() reports huge differences between the values returned by MegaTransfer.getDeltaSize() and the lengths of the byte arrays I actually get from MegaTransfer.getLastBytes(), and so only a fraction of the file is actually received.

My debug listener:

class TransferListener(MegaTransferListener):
    def __init__(self):
        self.buffers = []
        self.size = 0
        super().__init__()

    def onTransferStart(self, api: "MegaApi", transfer: "MegaTransfer"):
        logger.info(f"onTransferStart: transfer={transfer}")

    def onTransferUpdate(self, api: "MegaApi", transfer: "MegaTransfer"):
        buffer = transfer.getLastBytes().encode("utf-8", errors="surrogateescape")
        size = transfer.getDeltaSize()
        self.buffers.append(buffer)
        self.size += size
        logger.info(
            f"onTransferUpdate: getTotalBytes()={transfer.getTotalBytes()}, "
            f"getTransferredBytes()={transfer.getTransferredBytes()}, "
            f"getDeltaSize()={size}, "
            f"getLastBytes() length={len(buffer)}"
        )

    def onTransferFinish(self, api: "MegaApi", transfer: "MegaTransfer", error: "MegaError"):
        buffers_size = sum([len(b) for b in self.buffers])
        logger.info(
            f"onTransferFinish: transfer={transfer}, "
            f"error={error}, "
            f"reported size from accumulated getDeltaSize()={self.size}, "
            f"actual total size of received data={buffers_size}"
        )

    def onTransferTemporaryError(self, api: "MegaApi", transfer: "MegaTransfer", error: "MegaError"):
        logger.error(f"onTransferTemporaryError: transfer={transfer}, error={error}")

    def onTransferData(self, api: "MegaApi", transfer: "MegaTransfer", buffer: str, size: int) -> bool:
        return True

I got the .encode() thing I do on the received string from the SWIG docs, so I guess it's the correct way to do it?
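For what it's worth, the .encode() call on its own looks sound: the generated wrapper decodes with "surrogateescape" (see the PyUnicode_DecodeUTF8 call in the wrapper code further down), and encoding with the same error handler reverses that. A minimal sketch, independent of the MEGA SDK and using made-up sample data, suggesting the round trip is lossless for arbitrary bytes:

```python
# Every possible byte value, including ones that are not valid UTF-8.
original = bytes(range(256))

# What the default SWIG wrapper does on Python 3 (no STRICT_BYTE_CHAR).
as_str = original.decode("utf-8", errors="surrogateescape")

# What the listener does to turn the str back into bytes.
recovered = as_str.encode("utf-8", errors="surrogateescape")

print(recovered == original)
```

So if data goes missing, it is probably lost before the string reaches Python, not in the encode step.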

I also tried handling the returned data in onTransferData() instead, but it just had the exact same result.

I am testing this out by using a MegaNode belonging to a known file, and sending it to startStreaming() like so:

node = api.getNodeByHandle(150729868582434)
size = api.getSize(node)
# size = 29458186, which is consistent with the size of the actual file
transfer_listener = TransferListener()
api.startStreaming(node, 0, size, transfer_listener)

However, this is some of what the listener above logs:

onTransferStart: transfer=DOWNLOAD
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=28960, getDeltaSize()=28960, getLastBytes() length=0
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=36160, getDeltaSize()=7200, getLastBytes() length=0
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=50640, getDeltaSize()=14480, getLastBytes() length=10
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=57920, getDeltaSize()=7280, getLastBytes() length=38
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=65120, getDeltaSize()=7200, getLastBytes() length=82
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=72400, getDeltaSize()=7280, getLastBytes() length=192
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=79600, getDeltaSize()=7200, getLastBytes() length=94
[... lots of lines cut ...]
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=29407200, getDeltaSize()=7200, getLastBytes() length=445
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=29414480, getDeltaSize()=7280, getLastBytes() length=427
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=29436160, getDeltaSize()=21680, getLastBytes() length=785
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=29443440, getDeltaSize()=7280, getLastBytes() length=68
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=29450640, getDeltaSize()=7200, getLastBytes() length=409
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=29458186, getDeltaSize()=7546, getLastBytes() length=116
Listener.onTransferFinish: transfer=DOWNLOAD, error=No error, continue_event=True
onTransferFinish: transfer=DOWNLOAD, error=No error, reported size from accumulated getDeltaSize()=29458186, actual total size of received data=770162

So as you see, getLastBytes() consistently returns a much smaller amount of data than what getDeltaSize() reports. And I know for a fact that the actual file is 29458186 bytes.
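As a quick sanity check on those numbers, here is a small sketch that re-adds the first three log lines and computes the received fraction from the totals in the final line. The regular expression and variable names are only for illustration.

```python
import re

# Three lines copied from the log above.
log = """\
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=28960, getDeltaSize()=28960, getLastBytes() length=0
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=36160, getDeltaSize()=7200, getLastBytes() length=0
onTransferUpdate: getTotalBytes()=29458186, getTransferredBytes()=50640, getDeltaSize()=14480, getLastBytes() length=10
"""

pattern = re.compile(
    r"getTransferredBytes\(\)=(\d+), getDeltaSize\(\)=(\d+), getLastBytes\(\) length=(\d+)"
)

delta_total = 0
received_total = 0
for transferred, delta, length in pattern.findall(log):
    delta_total += int(delta)
    received_total += int(length)
    # The running sum of getDeltaSize() should track getTransferredBytes().
    assert delta_total == int(transferred)

print(delta_total, received_total)

# Totals reported by onTransferFinish in the log above.
fraction = 770162 / 29458186
print(f"{fraction:.1%} of the file was actually received")
```

The getDeltaSize() values add up exactly to getTransferredBytes(), so the size bookkeeping looks right; only the returned buffers come up short.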

I compiled with the following configure arguments (but I have tried various other combinations as well):

--disable-silent-rules --enable-python --with-python3 --disable-examples --enable-debug --enable-doxygen-html

Is there something obvious I'm missing here? I really hope someone can help me out a little.

@Eboreg
Author

Eboreg commented Mar 6, 2022

Update: I find it works fine for text files for some reason, but only if I take the strings returned by MegaTransfer.getLastBytes(), convert them to bytes, and then crop them to the length given by MegaTransfer.getDeltaSize(). Like so:

buffer_str = transfer.getLastBytes()
delta_size = transfer.getDeltaSize()
buffer_bin = buffer_str.encode("utf-8", errors="surrogateescape")[:delta_size]
self.buffers_bin.append(buffer_bin)

I can then do b"".join(transfer_listener.buffers_bin).decode(), which gives me an exact copy of the original text.

Why it fails so miserably for binary files, though, is still a mystery to me.

@Eboreg
Author

Eboreg commented Mar 6, 2022

I would like to try building the Python bindings with SWIG_PYTHON_STRICT_BYTE_CHAR, as per the SWIG documentation: http://swig.org/Doc4.0/Python.html#Python_nn77

How to do that is unfortunately beyond my competence at the moment.

@Eboreg
Author

Eboreg commented Mar 6, 2022

Been doing a little more debugging, and it seems that whatever generates the return value of MegaTransfer.getLastBytes() stops as soon as it encounters a null character.

E.g. if I do api.startStreaming(node, 0, 1000, transfer_listener), and there is a null at position 10 in the file, I only get characters 0 through 9 in return, even if character 11 is non-null.

I guess this makes sense, as SWIG assumes that a returned char * value is a null-terminated string (source). But that's not really helpful in this case.
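A rough pure-Python model of that behaviour, with no SDK involved. read_c_string is a hypothetical helper that mimics a strlen()-based copy. The second case is only a guess at why cropping to getDeltaSize() was needed for text files; nothing in this thread confirms that the underlying buffer lacks a terminator.

```python
def read_c_string(memory: bytes, start: int) -> bytes:
    """Model of a strlen()-based copy: read from start up to the next null."""
    end = memory.index(b"\x00", start)
    return memory[start:end]


# Case 1: the example above. 1000 bytes requested, null at position 10.
chunk = b"ABCDEFGHIJ" + b"\x00" + b"K" * 989
assert len(chunk) == 1000
memory = chunk + b"\x00"
truncated = read_c_string(memory, 0)
print(len(truncated))  # 10 bytes instead of 1000

# Case 2 (assumption): a text chunk with no nulls, followed by stale bytes
# because the buffer itself is not null-terminated.
text = b"plain text, no null bytes"
memory = text + b"stale" + b"\x00"
raw = read_c_string(memory, 0)
cropped = raw[:len(text)]
print(cropped == text)  # cropping to the reported size recovers the chunk
```

Under this model, cropping can repair a read that ran too long, but nothing on the Python side can recover bytes that were cut off at an embedded null.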

@Eboreg
Author

Eboreg commented Mar 8, 2022

I managed to build the SDK with #define SWIG_PYTHON_STRICT_BYTE_CHAR. Everything is indeed bytes instead of str now, but unfortunately that didn't solve anything. The returned values still stop at the first null character.

Am I configuring the build wrong? Or is startStreaming() simply not meant to be used for binary files?

From the generated bindings/python/megaapi_wrap.cpp:

SWIGINTERN PyObject *_wrap_MegaTransfer_getLastBytes(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
  PyObject *resultobj = 0;
  mega::MegaTransfer *arg1 = (mega::MegaTransfer *) 0 ;
  void *argp1 = 0 ;
  int res1 = 0 ;
  PyObject *swig_obj[1] ;
  char *result = 0 ;
  
  if (!args) SWIG_fail;
  swig_obj[0] = args;
  res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_mega__MegaTransfer, 0 |  0 );
  if (!SWIG_IsOK(res1)) {
    SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "MegaTransfer_getLastBytes" "', argument " "1"" of type '" "mega::MegaTransfer const *""'"); 
  }
  arg1 = reinterpret_cast< mega::MegaTransfer * >(argp1);
  {
    SWIG_PYTHON_THREAD_BEGIN_ALLOW;
    result = (char *)((mega::MegaTransfer const *)arg1)->getLastBytes();
    SWIG_PYTHON_THREAD_END_ALLOW;
  }
  resultobj = SWIG_FromCharPtr((const char *)result);
  return resultobj;
fail:
  return NULL;
}


SWIGINTERNINLINE PyObject * 
SWIG_FromCharPtr(const char *cptr)
{ 
  return SWIG_FromCharPtrAndSize(cptr, (cptr ? strlen(cptr) : 0));
}


SWIGINTERNINLINE PyObject *
SWIG_FromCharPtrAndSize(const char* carray, size_t size)
{
  if (carray) {
    if (size > INT_MAX) {
      swig_type_info* pchar_descriptor = SWIG_pchar_descriptor();
      return pchar_descriptor ? 
	SWIG_InternalNewPointerObj(const_cast< char * >(carray), pchar_descriptor, 0) : SWIG_Py_Void();
    } else {
#if PY_VERSION_HEX >= 0x03000000
#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
      return PyBytes_FromStringAndSize(carray, static_cast< Py_ssize_t >(size));
#else
      return PyUnicode_DecodeUTF8(carray, static_cast< Py_ssize_t >(size), "surrogateescape");
#endif
#else
      return PyString_FromStringAndSize(carray, static_cast< Py_ssize_t >(size));
#endif
    }
  } else {
    return SWIG_Py_Void();
  }
}

I notice SWIG_FromCharPtrAndSize() is called with a size argument generated by strlen(). And that function of course assumes it's dealing with a null-terminated string. The question is, could and should I do anything differently in order to avoid this? It seems to me like the reasonable thing would be for _wrap_MegaTransfer_getLastBytes() to call SWIG_FromCharPtrAndSize() directly, using the same size as reported by getDeltaSize().
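To illustrate the difference between the two conversions, here is a sketch using ctypes rather than the real SWIG wrapper. from_char_ptr and from_char_ptr_and_size are hypothetical stand-ins for the C functions of similar name, the sample chunk is made up, and len(chunk) plays the role of getDeltaSize().

```python
import ctypes


def from_char_ptr(ptr, strict_bytes=False):
    """Like SWIG_FromCharPtr: the length comes from a strlen()-style scan."""
    data = ctypes.string_at(ptr)  # no size given: stops at the first null
    # STRICT_BYTE_CHAR only changes the Python type, not the length.
    return data if strict_bytes else data.decode("utf-8", "surrogateescape")


def from_char_ptr_and_size(ptr, size):
    """Like calling SWIG_FromCharPtrAndSize with an explicit size."""
    return ctypes.string_at(ptr, size)


# A binary chunk with embedded nulls, copied into a C char array.
chunk = b"\x89PNG\x00\x01\x02\x00tail"
buf = (ctypes.c_char * (len(chunk) + 1)).from_buffer_copy(chunk + b"\x00")

print(from_char_ptr(buf, strict_bytes=True))             # b'\x89PNG'
print(from_char_ptr_and_size(buf, len(chunk)) == chunk)  # True
```

This matches what I saw with SWIG_PYTHON_STRICT_BYTE_CHAR: the return type changes, but the length is decided before that branch is reached, so only an explicit size helps.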

@jorgeajimenezl

I had this problem recently. My solution is based on some changes to the megaapi_wrap.cpp file; I'm attaching the patch that I applied to version 3.12.0. You can apply it using patch megaapi_wrap.cpp megaapi_wrap.txt
megaapi_wrap.txt

@Eboreg
Author

Eboreg commented May 23, 2022

@jorgeajimenezl Thanks! I was thinking along the same lines myself. Manually patching an auto generated file is of course not the optimal solution, but it's better than nothing. :)

@ShareefshaF

ShareefshaF commented May 24, 2022 via email
