Node.js Readline not writing to output file #1292

Closed
Fl4m3Ph03n1x opened this issue May 22, 2018 · 10 comments

Fl4m3Ph03n1x commented May 22, 2018

  • Node.js Version: 10.1.0
  • OS: Windows/Ubuntu
  • Scope: code
  • Module: Readline

Background

Dear awesome people,

I am trying to read a file several GB in size, line by line. I want to process each line and then write it to a file. I don't want to (nor can I) put everything into memory.

It is important that the order in which I read the lines is the order in which I write them to the file.

Code

To achieve this I tried using the Node.js Readline interface:

const fs = require( "fs" ),
    readline = require( "readline" );

const readStream = fs.createReadStream( "./logs/report.csv" );
const writeStream = fs.createWriteStream( "./logs/out.csv", { encoding: "utf8"} );

const rl = readline.createInterface({
    input: readStream,
    output: writeStream,
    terminal: false,
    historySize: 0
});

rl.on( "line", function(line) {
    //Do your stuff ...
    const transformedLine = line.toUpperCase();
    console.log(transformedLine);

    //Then write to outstream
    rl.write(transformedLine );
});

Problem

As you can see, I am trying to read a line, parse it, and write it into a file called out.csv.

The problem is that the output file is always empty. Nothing is ever written into it.

I have read all the methods, events and options, but clearly I am missing something.

Question

Why is this code not writing into the file?


shellberg commented May 22, 2018

You are not writing the 'line' on the writeStream that you've opened. That is, the write() method on readline is a way of adding data into the input buffer, as documented:

"The rl.write() method will write the data to the readline Interface's input as if it were provided by the user."

You probably mean something like:

const fs = require( "fs" ),
    readline = require( "readline" );

const readStream = fs.createReadStream( "./logs/report.csv" );
const writeStream = fs.createWriteStream( "./logs/out.csv", { encoding: "utf8"} );

const rl = readline.createInterface({
    input: readStream,
//    output: writeStream,
    terminal: false,
    historySize: 0
});

console.log(rl);

rl.on( "line", function(line) {
    console.log(line);
//Do your stuff ...
//Then write to outstream
    writeStream.write(line);
});


Fl4m3Ph03n1x commented May 22, 2018

I guess I understand it better now.

However, if I do it that way, in order to preserve the order of the logs, I need to resume and pause the Readline stream, correct?

rl.on( "line", function(line) {
    console.log(line);
 //Do your stuff ...
    const transformedLine = line.toUpperCase();
    console.log(transformedLine);

    //Then write to outstream
    rl.pause();
    writeStream.write(transformedLine, () => rl.resume() );
});

Like this?


Also, I know I missed that part in the docs, but I fail to see why I would want my write method to feed input into the next iteration of my read event... Is there a good use case for this?

It mainly sounds counter-intuitive (at least to me), that's all.

shellberg commented May 22, 2018

Are you actually finding that the order of your output is being distorted?

I didn't observe this (yet), although I'm certainly not testing with large input files. However, you could certainly use those controls on readline to govern the rate of consumption/transformation. You might also wish to only begin processing the rl.on( "line", ...) events after the writeStream.on( "ready", ...) event fires, if you want to be that prescriptive.
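A minimal sketch of that kind of flow control, reusing the file names from this thread: pause the interface whenever writeStream.write() signals backpressure (it returns false once its internal buffer passes the highWaterMark) and resume on the 'drain' event. This is one possible pattern, not the only one:

const fs = require( "fs" ),
    readline = require( "readline" );

const readStream = fs.createReadStream( "./logs/report.csv" );
const writeStream = fs.createWriteStream( "./logs/out.csv", { encoding: "utf8" } );

const rl = readline.createInterface({
    input: readStream,
    terminal: false
});

rl.on( "line", function( line ) {
    const transformedLine = line.toUpperCase();

    // write() returns false when the stream's internal buffer is full;
    // stop reading until it has drained.
    if ( !writeStream.write( transformedLine + "\n" ) ) {
        rl.pause();
        writeStream.once( "drain", () => rl.resume() );
    }
});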


shellberg commented May 22, 2018

I fail to see why I would want my write method to feed input into the next iteration of my read event... Is there a good use case for this?

I'll have to defer to others on that point! However, I expect that there is, otherwise the option would not have been created... And I believe it probably relates to use with a TTY rather than with files. Perhaps in the case of implementing a REPL-based CLI?
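For illustration only, a sketch of that speculative use case: with a TTY-backed interface, rl.write() injects text into the input as if the user had typed it, which a REPL-style CLI could use to pre-fill a command (the "help" default here is made up):

const readline = require( "readline" );

const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
    prompt: "repl> "
});

rl.prompt();
// Pre-fill the prompt as if the user had typed it; the user
// still has to press Enter for the 'line' event to fire.
rl.write( "help" );

rl.on( "line", ( line ) => {
    console.log( "received: " + line );
    rl.prompt();
});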

advanceddeveloper commented May 22, 2018

However, if I do it that way, in order to preserve the order of the logs, I need to resume and pause the Readline stream, correct?

No, the order will be preserved with or without pausing the stream.

Fl4m3Ph03n1x commented May 22, 2018

No, the order will be preserved with or without pausing the stream.

How, if writeStream.write is async?


advanceddeveloper commented May 22, 2018

Yes, but there is an internal buffer that stores the data in the same order in which it came in.
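A standalone sketch to illustrate the point (the file name is made up): successive write() calls are queued by the writable stream in call order, so the file comes out ordered even though the actual disk I/O is asynchronous:

const fs = require( "fs" );
const ws = fs.createWriteStream( "order-test.txt" );

for ( let i = 0; i < 1000; i++ ) {
    ws.write( "line " + i + "\n" );   // each call is queued in call order
}

// order-test.txt ends up containing line 0 .. line 999, in order.
ws.end( () => console.log( "done" ) );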

Fl4m3Ph03n1x commented May 22, 2018

Ahh, thank you for the feedback.
With that in mind, it also means I don't need to wait for writeStream.on( "ready", ...) because the data will be saved in the buffer, am I correct?


advanceddeveloper commented May 22, 2018

I don't need to wait for writeStream.on( "ready", ...) because it will be saved in the buffer, am I correct?

You don't need to wait as long as you write the data in separate asynchronous calls and as long as the "other stuff" that you do between reading a line and writing the line is synchronous.

For example, the following will crash the application for large data:

var fs = require('fs');
var stream = fs.createWriteStream('test.txt');

while(1)
  stream.write('some data');

because the internal buffer can't be freed until you return control to the event loop. However, the following code works fine:

setInterval(() => {
  stream.write('some data');
});

because you're writing data from a function that is called asynchronously by setInterval, so the buffer can be freed after each chunk.


Now, back to your code. Your example works fine if you replace rl.write with writeStream.write, as long as you call it synchronously after reading the line. In the following example, it is guaranteed that the application will not crash and that the order will be preserved:

rl.on('line', line => {
  //Do your stuff ...
  var transformedLine = line.toUpperCase();
  console.log(transformedLine);

  //Then write to outstream
  writeStream.write(transformedLine + '\n');
});

However, if you call writeStream.write asynchronously, it is not guaranteed that the order will be preserved. This is how you should not do it:

rl.on('line', line => {
  //Do your stuff ...
  var transformedLine = line.toUpperCase();
  console.log(transformedLine);

  setTimeout(() => {
    //Then write to outstream
    writeStream.write(transformedLine + '\n');
  });
});

Also, notice that if you want to write the data on separate lines, you need to add '\n' when writing the data (or '\r\n' if you're on Windows).
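As a small aside, Node's built-in os module exposes the platform's line terminator as os.EOL ("\n" on POSIX, "\r\n" on Windows), so the separator doesn't have to be hard-coded. The write call inside the 'line' handler above could become:

const os = require( "os" );

// drop-in replacement for the write call inside the 'line' handler
writeStream.write( transformedLine + os.EOL );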


Fl4m3Ph03n1x commented May 23, 2018

Got it!
Thanks for the explanation!
