Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs readdir with buffer and file types problem #33348

Closed
vasyanewlifehere opened this issue May 11, 2020 · 9 comments
Closed

fs readdir with buffer and file types problem #33348

vasyanewlifehere opened this issue May 11, 2020 · 9 comments
Labels
confirmed-bug Issues with confirmed bugs. fs Issues and PRs related to the fs subsystem / file system. good first issue Issues that are suitable for first-time contributors.

Comments

@vasyanewlifehere
Copy link

vasyanewlifehere commented May 11, 2020

  • Version: 12.16.3
  • Platform: linux64
  • Subsystem: gentoo

What steps will reproduce the bug?

reading directory with files names using buffer as path does not work.

const fs = require('fs');
fs.readdir( Buffer.from( "."),{withFileTypes:true,encoding:"buffer"},(e,d)=>console.log("dir",d));

It works correct in version 12.14.0 but after upgrade it stopped to work.
using it without files types works:
// fs.readdir( Buffer.from( "."),{withFileTypes:false,encoding:"buffer"},(e,d)=>console.log("dir",d));

How often does it reproduce? Is there a required condition?

every time

What is the expected behavior?

get directory entries with file types i.e. array of Dirent objects

What do you see instead?

error message:

Uncaught:
TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received an instance of Buffer
    at validateString (internal/validators.js:117:11)
    at Object.join (path.js:1039:7)
    at getDirents (internal/fs/utils.js:159:39)
    at FSReqCallback.req.oncomplete (fs.js:858:7) {
  code: 'ERR_INVALID_ARG_TYPE'
}

Additional information

This bug is degradation because it doesn't exists in version 12.14.0 and present in versions 12.16.1 and 12.16.3 at least.
I use buffer instead of string because working with my old archives with non utf-8 names.

@addaleax addaleax added fs Issues and PRs related to the fs subsystem / file system. confirmed-bug Issues with confirmed bugs. labels May 11, 2020
@addaleax
Copy link
Member

This seems like a real bug, although it would be helpful to know what kind of directory entries there are in your directory since that will determine whether an error occurs or not.

@vasyanewlifehere
Copy link
Author

vasyanewlifehere commented May 11, 2020

Sorry.
After you comment I've checked once again.
It doesn't work just on my UDF blue ray mounted file systems.
It work correct in other cases.
than it is possible not a degradation.
but in any case the code throws "uncaught" exception and my program fails.

@addaleax
Copy link
Member

Well, as I mentioned, it seems like a real bug for sure – but it would have been nice to have a reproduction available as a basis for a regression test. 👍

Basically, the issue is that this line:

lazyLoadFs().lstat(pathModule.join(path, name), (err, stats) => {

uses path.join, which can only handle strings, but path and/or name can be Buffer instances as well (so 4 combinations in total – string, string + Buffer, string + string, Buffer + Buffer, Buffer). This happens a few times in that file, all of which probably need to be updated – it might make sense to put that in a separate join function that takes any of the 4 argument combinations.

@vasyanewlifehere If you like, you are welcome to work on a PR?

@addaleax addaleax added good first issue Issues that are suitable for first-time contributors. mentor-available labels May 11, 2020
@shackijj
Copy link
Contributor

shackijj commented May 11, 2020

I could help with this issue.

(UPD): will start working on it tommorow if there will be no answer from @vasyanewlifehere

@vasyanewlifehere are you planning to work on PR?

@shackijj
Copy link
Contributor

shackijj commented May 13, 2020

it would have been nice to have a reproduction available as a basis for a regression test

The following lines of code work if a dirent type is UV_DIRENT_UNKNOWN (which is crossplatform equivalent of DT_UNKNOWN).

lazyLoadFs().lstat(pathModule.join(path, name), (err, stats) => {

lazyLoadFs().lstat(pathModule.join(path, name), (err, stats) => {

As to create a reproduction we need to force OS to return DT_UNKNOWN. I spent an hour on researching, and still don't know how to achive this on macOS.

Currently, only some filesystems (among them: Btrfs, ext2, ext3, and ext4) have full support for returning the file type in d_type. All applications must properly handle a return of DT_UNKNOWN.

The paragraph above is from this manpage. It got me thinking, that for reporoducing the bug on CI we will need to mount such a FS that will cause OS to return UV_DIRENT_UNKNOWN.

Alternatively, according to this and that, we could find such a system where DT_UNKNOWN if undefined, so libuv will always return UV_DIRENT_UNKNOWN

All that sounds really tough). @addaleax Are there other ways to force OS to return DT_UNKNOWN?

Instead of having a reproduction, we could create unit tests for getDirents and getDirent. In the tests we might mock OS calls.

In code it will look something like

function getDirent(path, name, type, callback, loadFs = lazyLoadFs) {
  if (typeof callback === 'function') {
    if (type === UV_DIRENT_UNKNOWN) {
      loadFs().lstat(pathModule.join(path, name), (err, stats) => {
      // rest of the code
}

Tests

   function callback(err) {
      assert.strictlyEqual(err, null);
   }
   function lazyLoadFsMock() {
      // return something
   }
   getDirent(Buffer('.'), 'foo', 'bar', callback, lazyLoadFsMock)

UPD: I created a reproduction on macOS. I definetely won't work on Windows. The test doesn't fail on Ubuntu.

'use strict';
const tmpdir = require('../common/tmpdir');
const fs = require('fs');
const common = require('../common');
const assert = require('assert');

fs.readdir(
    Buffer.from("/dev"),
    {withFileTypes: true, encoding: "buffer"},
    common.mustCall((error,d) => {
        assert.strictEqual(error, null)
    })
)

@addaleax will it be OK if add unit tests only?

@shackijj
Copy link
Contributor

@addaleax Can we assume that name and path args are correct and buffers are in 'utf-8' when we get string + Buffer or Buffer + string or Buffer + Buffer combination?
If yes, then the join function will look like that

path: string,               -> path
path: string, name: Buffer  -> Buffer.concat([Buffer.from(path + path.sep)), name])
path: string, name: string  -> path.join(string, string)
path: Buffer, name: Buffer  -> Buffet.concat([path,  Buffer.from(path.sep), name])
path: Buffer, name: string  -> Buffer.concat([path, Buffer.from(path.sep + name)])
path: Buffer                -> path

If not, it seems that we need to implement something similar to path.join for buffers.

> path.join('123/////', '////123')
'123/123'
> path.join('123/////', '////123/////')
'123/123/'

Considereing the fact that a buffer may contain data in different encodings, the algorythm will be quite complex. Also, it seems to me that we can't stringify buffers and use path.join since we don't know a Buffer's encoding.

@kkz13250
Copy link

kkz13250 commented Jun 8, 2020

I'd like to work on this issue. Has it been fixed?

@shackijj
Copy link
Contributor

shackijj commented Jun 12, 2020

@kkz13250 It's not been fixed. I started to work on this issue and created a reproduction here.

I think I know how to do it. We could use options.encoding field as to perform convertions from Buffer to string and vice versa. will continue working on PR ASAP.

@addaleax
Copy link
Member

@shackjjj Thanks for the ping – no, unfortunately we can’t assume that Buffers contain UTF-8, otherwise we wouldn’t need to have a Buffer variant in the fs API. We can, however, assume that / is the path separator, so joining two Buffer paths in this case here boils down to Buffer.concat([buf1, slashBuffer, buf2]).

shackijj added a commit to shackijj/node that referenced this issue Jun 13, 2020
shackijj added a commit to shackijj/node that referenced this issue Jun 23, 2020
shackijj added a commit to shackijj/node that referenced this issue Jun 25, 2020
Refs nodejs#33348

Co-authored-by: James M Snell <jasnell@gmail.com>
codebytere pushed a commit that referenced this issue Jun 27, 2020
Fixes: #33348

PR-URL: #33395
Refs: #33348
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere pushed a commit that referenced this issue Jun 30, 2020
Fixes: #33348

PR-URL: #33395
Refs: #33348
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere pushed a commit that referenced this issue Jul 10, 2020
Fixes: #33348

PR-URL: #33395
Refs: #33348
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere pushed a commit that referenced this issue Jul 12, 2020
Fixes: #33348

PR-URL: #33395
Refs: #33348
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere pushed a commit that referenced this issue Jul 14, 2020
Fixes: #33348

PR-URL: #33395
Refs: #33348
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. fs Issues and PRs related to the fs subsystem / file system. good first issue Issues that are suitable for first-time contributors.
Projects
None yet
Development

No branches or pull requests

4 participants