Supplying a large-ish JSON object to the Python child process for processing #179

@CharlieBickerton

Description

Hi,

I am trying to use python-shell (excellent package, btw) to run Python child processes that process large amounts of data for a web app, on a machine running Ubuntu 18.04. I currently pass in the script name, and the data to be processed as part of the options, in PythonShell.run:

return new Promise((done, reject) =>
  PythonShell.run(
    resolve('src/python/' + pythonScript + '.py'),
    options,
    (err: Error, results: JSON) => (err ? reject(err) : done(results)),
  ),
);

where

const options: PythonShellOptions = {
  args: [
    JSON.stringify(arg1),
    JSON.stringify(arg2),
    moreData,
  ],
};

This works great until the data I pass in as arguments reaches a certain size (around 200 KB). With larger datasets, I receive this error:

Error: { Error: spawn E2BIG 
at ChildProcess.spawn (internal/child_process.js:313:11) 
at Object.exports.spawn (child_process.js:508:9) 
at new PythonShell (******************/server/node_modules/python-shell/index.ts:138:29) 
at Function.run (******************/server/node_modules/python-shell/index.ts:266:23) 
at /****************** 
at new Promise (<anonymous>) 
at executePythonScript (******************) 
at Object.<anonymous> (******************) 
at step (******************) 
at Object.next (******************) 
at ****************** 
at new Promise (<anonymous>) 
at __awaiter (******************) 
at ****************** 
at Array.map (<anonymous>) 
at Object.<anonymous> (******************) 
errno: 'E2BIG', code: 'E2BIG', syscall: 'spawn' }

This appears to be because the data being sent to the Python child process is too large to be passed as command-line arguments through PythonShell.run.
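For context (my own digging, not from the python-shell docs): E2BIG is the kernel refusing execve because the combined size of argv plus the environment exceeds ARG_MAX; on Linux there is also a per-argument cap of roughly 128 KiB, which lines up with things breaking around 200 KB. You can inspect the overall limit with:

```shell
# Maximum combined size of argv + environment the kernel accepts for execve.
getconf ARG_MAX
```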

Does anyone know anything about this? Is it possible to pass larger amounts of data to a python-shell child process using the .send() function? What would you suggest as the best way of handling large amounts of data?
