Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reliable way to strip BOM from a stream #221

Closed
vsemozhetbyt opened this issue Jul 13, 2016 · 6 comments
Closed

Reliable way to strip BOM from a stream #221

vsemozhetbyt opened this issue Jul 13, 2016 · 6 comments

Comments

@vsemozhetbyt
Copy link

Till recently I've used this way (this demo script reads and outputs itself line by line):

const fs = require('fs');
const rl = require('readline');

let lineNumber = 0;

rl.createInterface({
  input: fs.createReadStream(__filename, { encoding: 'utf8' }),
}).on('line', line => {
  if (++lineNumber === 1) line = line.replace(/^\uFEFF/, '');
  console.log(line);
}).on('close', () => {
  console.log('EOF');
});

However, it makes me uneasy that each iteration except the first one should increment and compare the lineNumber in vain. So I've come to this solution:

const fs = require('fs');
const rl = require('readline');

const input = fs.createReadStream(__filename, { encoding: 'utf8' });

input.once('readable', () => {
  const maybeBOM = input.read(1);
  if (maybeBOM !== '\uFEFF') input.unshift(maybeBOM);

  rl.createInterface({
    input,
  }).on('line', line => {
    console.log(line);
  }).on('close', () => {
    console.log('EOF');
  });
});

However, now I doubt if this flow is reliable. Could it somehow mess up the stream internal mechanics? Is this sequence of stream modes and methods safe?

@addaleax
Copy link
Member

Is this sequence of stream modes and methods safe?

I think so, yes. If you want to make sure it stays that way, working the above script into a test file for Node core seems feasible? ;)

@vsemozhetbyt
Copy link
Author

@addaleax Thank you. I am not sure, how a user can present such a test. The team makes it clear that BOM stripping is for userland. And it seems that nodejs/node#7337 makes such a test code currently unreliable.

@addaleax
Copy link
Member

I wouldn’t worry about nodejs/node#7337; realistically, I wouldn’t expect fs streams to touch the file contents in the future, so the BOM will remain included.

I’m thinking of something like this: A test fixture containing data like

abc
def
ghi
jkl
mno
pqr

(no BOM) and a test script like

const input = fs.createReadStream('testfile', { encoding: 'utf8' });
const lines = [];

input.once('readable', () => {
  assert.strictEqual(input.read(1), 'a');

  rl.createInterface({
    input,
  }).on('line', (line) => {
    lines.push(line);
  }).on('close', () => {
    assert.deepStrictEqual(lines, ['bc', 'def', 'ghi', 'jkl', 'mno', 'pqr']);
  });
});

plus things like common.mustCall everywhere, putting the test file in test/fixtures, the necessary imports, etc.

As a side note, I don’t know if that’s applicable to your situation, but '\uFEFFabc'.trim() === 'abc' might provide another reasonable answer to your original question.

@vsemozhetbyt
Copy link
Author

vsemozhetbyt commented Jul 13, 2016

@addaleax Making tests is a completely terra incognita for me now. I am not familiar with API nor with file/directory structure. Maybe I could do something like that hereafter, when I know more, but not in the near future, unfortunately.

And thank you again for the trim() tip. However it is not a universal solution for me: I often write scripts for processing big source files of digital dictionaries and leading spaces have a special meaning in them.

@addaleax
Copy link
Member

Making tests is a completely terra incognita for me now. I am not familiar with API nor file/directory structure.

https://github.com/nodejs/node/blob/master/doc/guides/writing_tests.md might be pretty helpful. Also, I’d be glad to help you find your way around, and generally you can always go to #node-dev on freenode for questions.

@vsemozhetbyt
Copy link
Author

@addaleax Thank you) I shall try.

addaleax pushed a commit to nodejs/node that referenced this issue Jul 27, 2016
Sometimes it is necessary to preprocess some initial bit
of a stream data before giving the entire stream
to the main processing function. Sometimes this bit should be extracted
from the stream before the main processing; sometimes it should be
returned to the stream. This test checks an order of stream
modes, methods and events for a possible preprocessing algorithm.
Stream BOM stripping is selected as a use case.

See nodejs/help#221 as the prehistory.

PR-URL: #7741
Reviewed-By: Anna Henningsen <anna@addaleax.net>
cjihrig pushed a commit to nodejs/node that referenced this issue Aug 10, 2016
Sometimes it is necessary to preprocess some initial bit
of a stream data before giving the entire stream
to the main processing function. Sometimes this bit should be extracted
from the stream before the main processing; sometimes it should be
returned to the stream. This test checks an order of stream
modes, methods and events for a possible preprocessing algorithm.
Stream BOM stripping is selected as a use case.

See nodejs/help#221 as the prehistory.

PR-URL: #7741
Reviewed-By: Anna Henningsen <anna@addaleax.net>
MylesBorins pushed a commit to nodejs/node that referenced this issue Sep 30, 2016
Sometimes it is necessary to preprocess some initial bit
of a stream data before giving the entire stream
to the main processing function. Sometimes this bit should be extracted
from the stream before the main processing; sometimes it should be
returned to the stream. This test checks an order of stream
modes, methods and events for a possible preprocessing algorithm.
Stream BOM stripping is selected as a use case.

See nodejs/help#221 as the prehistory.

PR-URL: #7741
Reviewed-By: Anna Henningsen <anna@addaleax.net>
rvagg pushed a commit to nodejs/node that referenced this issue Oct 18, 2016
Sometimes it is necessary to preprocess some initial bit
of a stream data before giving the entire stream
to the main processing function. Sometimes this bit should be extracted
from the stream before the main processing; sometimes it should be
returned to the stream. This test checks an order of stream
modes, methods and events for a possible preprocessing algorithm.
Stream BOM stripping is selected as a use case.

See nodejs/help#221 as the prehistory.

PR-URL: #7741
Reviewed-By: Anna Henningsen <anna@addaleax.net>
MylesBorins pushed a commit to nodejs/node that referenced this issue Oct 26, 2016
Sometimes it is necessary to preprocess some initial bit
of a stream data before giving the entire stream
to the main processing function. Sometimes this bit should be extracted
from the stream before the main processing; sometimes it should be
returned to the stream. This test checks an order of stream
modes, methods and events for a possible preprocessing algorithm.
Stream BOM stripping is selected as a use case.

See nodejs/help#221 as the prehistory.

PR-URL: #7741
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants