Node occasionally gives multiple files/folders the same inode #12115

Closed
alutman opened this Issue Mar 29, 2017 · 53 comments

Comments

Projects
None yet
@alutman

alutman commented Mar 29, 2017

Moderator's note (@Fishrock123): Off-topic comments have and will be deleted.


  • Version: v7.2.1
  • Platform: Windows 10 Pro 64bit, Windows 7 Enterprise 64bit (Both using NTFS)
  • Subsystem: File System

Node sometimes reports different files/folders to have identical ino values.
I can't reproduce this consistently and copying/reproducing a folder structure that contains dupes elsewhere doesn't replicate the issue.

I did encounter lots of duplicates under C:\Users\%USER%\AppData but it may be different for other people

Example

Specific example I encountered

# Structure
│   ServerStore.jsx
│
├───constants
│       Api.jsx
│
└───stores
        UserStore.jsx

> fs.lstatSync("stores").ino
5910974511014218
> fs.lstatSync("stores/UserStore.jsx").ino
24206847997202570
> fs.lstatSync("constants").ino //Duplicate
9851624184963316
> fs.lstatSync("constants/Api.jsx").ino //Duplicate
9851624184963316
> fs.lstatSync("ServerStore.jsx").ino
3659174697792238

Test Script

Here's a hacky node script to loop through a directory and look for duplicate inodes.
Running it on most of my other folders didn't yield a result, until I ran it on C:\Users\%USER%\AppData where I encounted loads of duplicates

Usage: node dupe.js [dir]

var fs = require('fs');
var path = require('path');
var process = require('process');

// Recursively walks a directory and looks for duplicate inodes
// loop from http://stackoverflow.com/questions/5827612/node-js-fs-readdir-recursive-directory-search

var dir = process.argv[2];
if (dir == undefined) {
    dir = '.';
}

var walk = function(dir, done) {

  var results = [];
  fs.readdir(dir, function(err, list) {

    if (err) return done(err);

    var pending = list.length;
    if (!pending) return done(null, results);

    list.forEach(function(file) {

      file = path.resolve(dir, file);
      fs.stat(file, function(err, stat) {
        if(stat && stat.ino) {
            results.push({
                file: file,
                ino: stat.ino
            });
        }

        if (stat && stat.isDirectory()) {
          walk(file, function(err, res) {
            if(res) {
               results = results.concat(res);
            }
            if (!--pending) done(null, results);
          });
        } 
        else {
          if (!--pending) done(null, results);
        }
      });
    });
  });
};

walk(dir, function(err, results) {
    var merge = {};
    results.forEach(function(it) {
        if (!merge[it.ino]) {
            merge[it.ino] = [];
        }
        merge[it.ino].push(it.file);
    });
    var dupes = Object.keys(merge).filter(key => merge[key].length > 1);

    dupes.forEach(it => console.log(it, merge[it]));
})

@mscdex mscdex added the fs label Mar 29, 2017

@bnoordhuis

This comment has been minimized.

Show comment
Hide comment
@bnoordhuis

bnoordhuis Mar 29, 2017

Member

I suppose it could be either a Windows bug or quirk1 but it could also be caused by node.js converting libuv's 64 bits st_ino field to a double, which won't be lossless for large numbers.

I expect it's the latter that is happening here. You should be able to verify that by instrumenting src/node_file.cc if you are so inclined. Grep for 'st_ino', it's used in only one location.

1 The inode number is derived from NtQueryInformationFile(FileAllInformation). It has some documented quirks and probably a few undocumented as well, see e.g. the section 'Remarks' on this page.

Member

bnoordhuis commented Mar 29, 2017

I suppose it could be either a Windows bug or quirk1 but it could also be caused by node.js converting libuv's 64 bits st_ino field to a double, which won't be lossless for large numbers.

I expect it's the latter that is happening here. You should be able to verify that by instrumenting src/node_file.cc if you are so inclined. Grep for 'st_ino', it's used in only one location.

1 The inode number is derived from NtQueryInformationFile(FileAllInformation). It has some documented quirks and probably a few undocumented as well, see e.g. the section 'Remarks' on this page.

@alutman

This comment has been minimized.

Show comment
Hide comment
@alutman

alutman Mar 30, 2017

I had a little play with that Windows API and ran it directly against files that node reported with the same ino (Using C# DLL import)

Example

Node 
> fs.lstatSync('one.gif').ino
9851624185071828
> fs.lstatSync('two.gif').ino
9851624185071828

Windows 7 NTFS via NtQueryInformationFile
(QuadPart is what is used in libuv)

file: one.gif
type: LARGE_INTEGER
QuadPart: 9851624185071827

file: two.gif
type: LARGE_INTEGER
QuadPart: 9851624185071829

Looks like you were right about the problem being in the double conversion. Here's a C++ snippet highlighting it.

#include <iostream>

int main() {
    long long a = 9851624185071827;
    long long b = 9851624185071829;

    std::cout.precision(100);
    std::cout << a << std::endl; //9851624185071827
    double da = a;
    std::cout << da << std::endl; //9851624185071828

    std::cout << b << std::endl; //9851624185071829
    double db = b;
    std::cout << db << std::endl; //9851624185071828

    return 0;
}

Closest possible fix I could think of is to change the fs_stats_field_array type from double to long long but that sounds like it would just make more problems.

alutman commented Mar 30, 2017

I had a little play with that Windows API and ran it directly against files that node reported with the same ino (Using C# DLL import)

Example

Node 
> fs.lstatSync('one.gif').ino
9851624185071828
> fs.lstatSync('two.gif').ino
9851624185071828

Windows 7 NTFS via NtQueryInformationFile
(QuadPart is what is used in libuv)

file: one.gif
type: LARGE_INTEGER
QuadPart: 9851624185071827

file: two.gif
type: LARGE_INTEGER
QuadPart: 9851624185071829

Looks like you were right about the problem being in the double conversion. Here's a C++ snippet highlighting it.

#include <iostream>

int main() {
    long long a = 9851624185071827;
    long long b = 9851624185071829;

    std::cout.precision(100);
    std::cout << a << std::endl; //9851624185071827
    double da = a;
    std::cout << da << std::endl; //9851624185071828

    std::cout << b << std::endl; //9851624185071829
    double db = b;
    std::cout << db << std::endl; //9851624185071828

    return 0;
}

Closest possible fix I could think of is to change the fs_stats_field_array type from double to long long but that sounds like it would just make more problems.

@bnoordhuis

This comment has been minimized.

Show comment
Hide comment
@bnoordhuis

bnoordhuis Mar 30, 2017

Member

Yes, that wouldn't be a solution. JS doesn't have a 64 bits integral type, only a 64 bits floating-point type. Values >= 253 and <= -253 cannot be represented exactly and get rounded up or down.

We could detect such out-of-range values and present them as strings instead of numbers but it's an open question if a type change like that qualifies as a bug fix or a backwards-incompatible semver-major change.

cc @mscdex since you worked on that code recently.

Member

bnoordhuis commented Mar 30, 2017

Yes, that wouldn't be a solution. JS doesn't have a 64 bits integral type, only a 64 bits floating-point type. Values >= 253 and <= -253 cannot be represented exactly and get rounded up or down.

We could detect such out-of-range values and present them as strings instead of numbers but it's an open question if a type change like that qualifies as a bug fix or a backwards-incompatible semver-major change.

cc @mscdex since you worked on that code recently.

@mscdex

This comment has been minimized.

Show comment
Hide comment
@mscdex

mscdex Mar 30, 2017

Contributor

There really is no good solution for this until JS gets 64-bit types. IIRC this isn't the only place where node hopes for the best by using a double for 64-bit integer values.

Contributor

mscdex commented Mar 30, 2017

There really is no good solution for this until JS gets 64-bit types. IIRC this isn't the only place where node hopes for the best by using a double for 64-bit integer values.

@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Mar 30, 2017

Member

Best thing to do might be make a native module that copies the core part that you need but returns a string?

Member

Fishrock123 commented Mar 30, 2017

Best thing to do might be make a native module that copies the core part that you need but returns a string?

@addaleax

This comment has been minimized.

Show comment
Hide comment
@addaleax

addaleax Mar 30, 2017

Member

Alternatively we might think about having a second Stats constructor that treats the uint64_t fields returned by libuv faithfully… we’ve also had people asking for nanosecond resolution in the timer fields (that is otherwise lost by new Date()), this could be a good chance to tackle that, too.

Member

addaleax commented Mar 30, 2017

Alternatively we might think about having a second Stats constructor that treats the uint64_t fields returned by libuv faithfully… we’ve also had people asking for nanosecond resolution in the timer fields (that is otherwise lost by new Date()), this could be a good chance to tackle that, too.

@mscdex

This comment has been minimized.

Show comment
Hide comment
@mscdex

mscdex Mar 30, 2017

Contributor

Alternatively we might think about having a second Stats constructor

How would that work (having two different constructors at the same time)? Does someone opt-in to the new one with a flag passed to the fs.*stat*() methods or something?

that treats the uint64_t fields returned by libuv faithfully

Define 'faithfully.' Is this an array with the upper and lower 32-bits? A string? An 8-byte Buffer? A new BigInt class of our own? Something else?

we’ve also had people asking for nanosecond resolution in the timer fields (that is otherwise lost by new Date())

I am all for getting rid of the Date instances, it only adds more overhead if users just end up extracting the integer timestamp out of it anyway. At least if we just provide the number, end users could just pass it to new Date() if they need that kind of functionality and everyone else would get a speedup.

Contributor

mscdex commented Mar 30, 2017

Alternatively we might think about having a second Stats constructor

How would that work (having two different constructors at the same time)? Does someone opt-in to the new one with a flag passed to the fs.*stat*() methods or something?

that treats the uint64_t fields returned by libuv faithfully

Define 'faithfully.' Is this an array with the upper and lower 32-bits? A string? An 8-byte Buffer? A new BigInt class of our own? Something else?

we’ve also had people asking for nanosecond resolution in the timer fields (that is otherwise lost by new Date())

I am all for getting rid of the Date instances, it only adds more overhead if users just end up extracting the integer timestamp out of it anyway. At least if we just provide the number, end users could just pass it to new Date() if they need that kind of functionality and everyone else would get a speedup.

@addaleax

This comment has been minimized.

Show comment
Hide comment
@addaleax

addaleax Mar 30, 2017

Member

How would that work (having two different constructors at the same time)?

Yes, a flag (not a fan) or a new set of functions on fs (more of a fan because that won’t slow down any existing code).

Define 'faithfully.' Is this an array with the upper and lower 32-bits?

Something like that. It’s not actually important to me which of your suggestions – they all make sense – but generally something that people could reimplement the current Stats upon in any way they want.

Member

addaleax commented Mar 30, 2017

How would that work (having two different constructors at the same time)?

Yes, a flag (not a fan) or a new set of functions on fs (more of a fan because that won’t slow down any existing code).

Define 'faithfully.' Is this an array with the upper and lower 32-bits?

Something like that. It’s not actually important to me which of your suggestions – they all make sense – but generally something that people could reimplement the current Stats upon in any way they want.

@mscdex

This comment has been minimized.

Show comment
Hide comment
@mscdex

mscdex Mar 30, 2017

Contributor

something that people could reimplement the current Stats upon in any way they want

I still don't understand this, how they are changing/providing Stats? Are they monkey patching/overwriting fs.Stats with their own implementation? Should we really be encouraging that?

Contributor

mscdex commented Mar 30, 2017

something that people could reimplement the current Stats upon in any way they want

I still don't understand this, how they are changing/providing Stats? Are they monkey patching/overwriting fs.Stats with their own implementation? Should we really be encouraging that?

@addaleax

This comment has been minimized.

Show comment
Hide comment
@addaleax

addaleax Mar 30, 2017

Member

@mscdex Sorry, probably not expressing myself clearly. Basically, what I’d imagine is a 2nd API on Node’s side, not changing the current API’s behaviour. If somebody wanted, they could build the current API on top of that as a userland module – I’m not suggesting that they monkey-patch anything (or “change” fs.Stats at all).

Member

addaleax commented Mar 30, 2017

@mscdex Sorry, probably not expressing myself clearly. Basically, what I’d imagine is a 2nd API on Node’s side, not changing the current API’s behaviour. If somebody wanted, they could build the current API on top of that as a userland module – I’m not suggesting that they monkey-patch anything (or “change” fs.Stats at all).

@jorangreef

This comment has been minimized.

Show comment
Hide comment
@jorangreef

jorangreef Mar 31, 2017

Contributor

Personally, I would prefer if Node fixed this properly going forward.

I think passing the inode as a String would be appropriate since an inode is in essence an identifier. It happens to look like an integer, but it may as well be hex for all intents and purposes. I'm not sure if passing the inode as a String would have any performance implications? AFAIK it's a minimum of 24 bytes allocation compared to an integer? I wouldn't mind a Buffer either if that's lighter.

Referring to @mscdex's comment, it would be terrific if we could do away with instantiating thousands of Date objects, when users really just need the timestamp.

Contributor

jorangreef commented Mar 31, 2017

Personally, I would prefer if Node fixed this properly going forward.

I think passing the inode as a String would be appropriate since an inode is in essence an identifier. It happens to look like an integer, but it may as well be hex for all intents and purposes. I'm not sure if passing the inode as a String would have any performance implications? AFAIK it's a minimum of 24 bytes allocation compared to an integer? I wouldn't mind a Buffer either if that's lighter.

Referring to @mscdex's comment, it would be terrific if we could do away with instantiating thousands of Date objects, when users really just need the timestamp.

@kkoopa

This comment has been minimized.

Show comment
Hide comment
@kkoopa

kkoopa Apr 2, 2017

Just split the 64-bit word into a high and low 32-bit word and leave arithmetic to the user.

kkoopa commented Apr 2, 2017

Just split the 64-bit word into a high and low 32-bit word and leave arithmetic to the user.

@jorangreef

This comment has been minimized.

Show comment
Hide comment
@jorangreef

jorangreef Apr 3, 2017

Contributor

Just another idea, the 64-bit inode integer returned by Windows could be remapped through a hash into the 53-bit address space supported by JS integers. This should give less collisions than the current folding.

Contributor

jorangreef commented Apr 3, 2017

Just another idea, the 64-bit inode integer returned by Windows could be remapped through a hash into the 53-bit address space supported by JS integers. This should give less collisions than the current folding.

@stickywes

This comment has been minimized.

Show comment
Hide comment
@stickywes

stickywes Apr 17, 2017

nodejs/node-v0.x-archive#2670

Here is a conversation from a few years ago about what to do with Windows inodes. It is interesting to see the same conversation about the issue happening all over again!

nodejs/node-v0.x-archive#2670

Here is a conversation from a few years ago about what to do with Windows inodes. It is interesting to see the same conversation about the issue happening all over again!

@stickywes

This comment has been minimized.

Show comment
Hide comment
@stickywes

stickywes Apr 17, 2017

It seems like a combination of the string solution @jorangreef advocates (and @piscisaureus advocated nearly 2 years ago) and the 64 bit high/low fields @kkoopa suggests is the way to go, to me. Or, rather, I don't see why we need to choose between the two.

A string will allow Node to provide a consistent cross-platform representation at the cost of performance during comparisons. Switching an integer to a string would be a breaking change, though, so you'd need to account for that.

The 64 high/low fields would allow you to do performant inode comparisons if you need them.

It seems like a combination of the string solution @jorangreef advocates (and @piscisaureus advocated nearly 2 years ago) and the 64 bit high/low fields @kkoopa suggests is the way to go, to me. Or, rather, I don't see why we need to choose between the two.

A string will allow Node to provide a consistent cross-platform representation at the cost of performance during comparisons. Switching an integer to a string would be a breaking change, though, so you'd need to account for that.

The 64 high/low fields would allow you to do performant inode comparisons if you need them.

@sebadoom

This comment has been minimized.

Show comment
Hide comment
@sebadoom

sebadoom Apr 18, 2017

I still find it odd no big-integer library is part of standard Node.js. It would have solved problems like this until JS got proper support for 64-bit integers.

I still find it odd no big-integer library is part of standard Node.js. It would have solved problems like this until JS got proper support for 64-bit integers.

@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Apr 18, 2017

Member

I still find it odd no big-integer library is part of standard Node.js. It would have solved problems like this until JS got proper support for 64-bit integers.

I suppose we could have done that like with Buffer. I'm not sure why we didn't but I am quite sure there would have been some history of that coming up that pre-dates me.

Just split the 64-bit word into a high and low 32-bit word and leave arithmetic to the user.

I agree with this, we could just return a Uint32Array of 2.

The questions then become:

  • Is "BigNum" support far enough out that this is worth it?
  • Can we transition easier to "BigNum" from the existing APIs, or from from a new API, or will we end up with 3?
Member

Fishrock123 commented Apr 18, 2017

I still find it odd no big-integer library is part of standard Node.js. It would have solved problems like this until JS got proper support for 64-bit integers.

I suppose we could have done that like with Buffer. I'm not sure why we didn't but I am quite sure there would have been some history of that coming up that pre-dates me.

Just split the 64-bit word into a high and low 32-bit word and leave arithmetic to the user.

I agree with this, we could just return a Uint32Array of 2.

The questions then become:

  • Is "BigNum" support far enough out that this is worth it?
  • Can we transition easier to "BigNum" from the existing APIs, or from from a new API, or will we end up with 3?
@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Apr 18, 2017

Member

Actually, on second look we'd be adding a property to fs.Stats, which seems generally less troublesome.

Member

Fishrock123 commented Apr 18, 2017

Actually, on second look we'd be adding a property to fs.Stats, which seems generally less troublesome.

@dchest

This comment has been minimized.

Show comment
Hide comment
@dchest

dchest Apr 18, 2017

Why is there even discussion about big or small numbers? inode is a number only by accident (because it was an index into some internal array on early UNIX), in reality and semantically it's just an arbitrary identifier. Arithmetic on it is useless, so make it a string, object, whatever, it doesn't matter, just make sure it can be converted back and forth into the OS representation internally.

dchest commented Apr 18, 2017

Why is there even discussion about big or small numbers? inode is a number only by accident (because it was an index into some internal array on early UNIX), in reality and semantically it's just an arbitrary identifier. Arithmetic on it is useless, so make it a string, object, whatever, it doesn't matter, just make sure it can be converted back and forth into the OS representation internally.

@deckar01

This comment has been minimized.

Show comment
Hide comment
@deckar01

deckar01 Apr 19, 2017

FWIW comparing hex encoded strings is actually faster not that much slower than comparing UInt32Arrays in V8. Theoretically string comparison should be ~6.67% slower, but V8 optimizations make it 17% faster.

https://jsperf.com/64bit-comparison
https://jsperf.com/64bit-comparison-es-5

Strings also make better keys than arrays. 😉

deckar01 commented Apr 19, 2017

FWIW comparing hex encoded strings is actually faster not that much slower than comparing UInt32Arrays in V8. Theoretically string comparison should be ~6.67% slower, but V8 optimizations make it 17% faster.

https://jsperf.com/64bit-comparison
https://jsperf.com/64bit-comparison-es-5

Strings also make better keys than arrays. 😉

@TimothyGu

This comment has been minimized.

Show comment
Hide comment
@TimothyGu

TimothyGu Apr 19, 2017

Member

@deckar01 your benchmark is a bit flawed as it uses the let syntax which probably forces V8 to use TurboFan, which does not optimize TypedArrays as well as Crankshaft currently does. However, even with that, my Chrome 57 still runs faster on the TypedArray case than on strings. And if I eliminate the use of let, TypedArray is quite a bit faster than strings (158 vs. 102), which makes sense.

Member

TimothyGu commented Apr 19, 2017

@deckar01 your benchmark is a bit flawed as it uses the let syntax which probably forces V8 to use TurboFan, which does not optimize TypedArrays as well as Crankshaft currently does. However, even with that, my Chrome 57 still runs faster on the TypedArray case than on strings. And if I eliminate the use of let, TypedArray is quite a bit faster than strings (158 vs. 102), which makes sense.

@dchest

This comment has been minimized.

Show comment
Hide comment
@dchest

dchest Apr 19, 2017

Typed arrays are mutable though, so string is a better choice for immutable identifiers. (This is a kind of a design decision that should be considered and discussed before doing benchmarks.)

Strings are also transparently encoded and decoded as JSON, unlike typed arrays, so programs that store inodes in JSON-encoded structures will continue doing so without issues. Finally, strings can be compared with ===, just like numbers.

dchest commented Apr 19, 2017

Typed arrays are mutable though, so string is a better choice for immutable identifiers. (This is a kind of a design decision that should be considered and discussed before doing benchmarks.)

Strings are also transparently encoded and decoded as JSON, unlike typed arrays, so programs that store inodes in JSON-encoded structures will continue doing so without issues. Finally, strings can be compared with ===, just like numbers.

@Fishrock123

This comment has been minimized.

Show comment
Hide comment
@Fishrock123

Fishrock123 Apr 19, 2017

Member

I, personally, am totally fine with a string if that works well.

Member

Fishrock123 commented Apr 19, 2017

I, personally, am totally fine with a string if that works well.

@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell

jasnell Apr 19, 2017

Member

given the lack of anything else more suitable, I'm good with string also

Member

jasnell commented Apr 19, 2017

given the lack of anything else more suitable, I'm good with string also

@Memnarch

This comment has been minimized.

Show comment
Hide comment
@Memnarch

Memnarch Apr 20, 2017

I'd just have gone with low/high 32bit or 8Byte array.

Btw just to mention, since we are at using strings anyway as it seems: ReFS uses 128Bit identifiers. The 64bit part is not unique there:

The ReFS file system, introduced with Windows Server 2012, includes 128-bit file identifiers. To retrieve the 128-bit file identifier use the GetFileInformationByHandleEx function with FileIdInfo to retrieve the FILE_ID_INFO structure. The 64-bit identifier in this structure is not guaranteed to be unique on ReFS.

It seems that NtQueryInformationFile FILE_INTERNAL_INFORMATION structure and GetFileInformationByHandle(Ex) deliver the FileIDs, except the later one supports returning the 128Bit one in FILE_ID_INFO structure

FILE_INTERNAL_INFORMATION
https://msdn.microsoft.com/de-de/library/windows/hardware/ff540318(v=vs.85).aspx
FILE_ID_INFO
https://msdn.microsoft.com/en-us/library/hh802691(v=vs.85).aspx

Memnarch commented Apr 20, 2017

I'd just have gone with low/high 32bit or 8Byte array.

Btw just to mention, since we are at using strings anyway as it seems: ReFS uses 128Bit identifiers. The 64bit part is not unique there:

The ReFS file system, introduced with Windows Server 2012, includes 128-bit file identifiers. To retrieve the 128-bit file identifier use the GetFileInformationByHandleEx function with FileIdInfo to retrieve the FILE_ID_INFO structure. The 64-bit identifier in this structure is not guaranteed to be unique on ReFS.

It seems that NtQueryInformationFile FILE_INTERNAL_INFORMATION structure and GetFileInformationByHandle(Ex) deliver the FileIDs, except the later one supports returning the 128Bit one in FILE_ID_INFO structure

FILE_INTERNAL_INFORMATION
https://msdn.microsoft.com/de-de/library/windows/hardware/ff540318(v=vs.85).aspx
FILE_ID_INFO
https://msdn.microsoft.com/en-us/library/hh802691(v=vs.85).aspx

@decal

This comment has been minimized.

Show comment
Hide comment
@decal

decal Apr 20, 2017

I did encounter lots of duplicates under C:\Users%USER%\AppData but it may be different for other people

I would venture a guess that this is because the AppData folder contains many files that are <= the block size; causing new file control block creation in large numbers, and therefore inodes as well.. This almost sounds like a race condition in the underlying file system driver? shrug

decal commented Apr 20, 2017

I did encounter lots of duplicates under C:\Users%USER%\AppData but it may be different for other people

I would venture a guess that this is because the AppData folder contains many files that are <= the block size; causing new file control block creation in large numbers, and therefore inodes as well.. This almost sounds like a race condition in the underlying file system driver? shrug

@bacondropped

This comment has been minimized.

Show comment
Hide comment
@bacondropped

bacondropped Apr 20, 2017

@decal it's probably more because of the large number of config files, caches, and whatnot stored in there for the whole zoo of Windows applications. Create a lot of files in the same directory and some of their inodes are bound to be close.

@decal it's probably more because of the large number of config files, caches, and whatnot stored in there for the whole zoo of Windows applications. Create a lot of files in the same directory and some of their inodes are bound to be close.

@bnoordhuis

This comment has been minimized.

Show comment
Hide comment
@bnoordhuis

bnoordhuis Apr 20, 2017

Member

Off-topic comment deleted. Next offender gets a ban.

(Added a note to the OP -@Fishrock123)

Member

bnoordhuis commented Apr 20, 2017

Off-topic comment deleted. Next offender gets a ban.

(Added a note to the OP -@Fishrock123)

@wmertens

This comment has been minimized.

Show comment
Hide comment
@wmertens

wmertens Apr 20, 2017

Given that filesystem programming is not the main focus of Node, it seems to me that Strings would be fine. They can even be returned instead of the current Numbers; most programs will keep on working.

Does the string have to be hex-encoded though? How about a more efficient encoding, base-64 or higher? Do the strings have to be limited to non-control ASCII? The strings don't have to look nice, they should just be identifiers.

Given that filesystem programming is not the main focus of Node, it seems to me that Strings would be fine. They can even be returned instead of the current Numbers; most programs will keep on working.

Does the string have to be hex-encoded though? How about a more efficient encoding, base-64 or higher? Do the strings have to be limited to non-control ASCII? The strings don't have to look nice, they should just be identifiers.

@jorangreef

This comment has been minimized.

Show comment
Hide comment
@jorangreef

jorangreef Apr 20, 2017

Contributor

Does the string have to be hex-encoded though? How about a more efficient encoding, base-64 or higher?

Strings have a minimum 24 byte memory overhead in V8 if I recall correctly - irrespective of how little you store in them. Anything less than 24 bytes will still use 24 bytes. Using an encoding would only add cpu overhead and make the string harder to compare with stats returned directly by the filesystem.

Contributor

jorangreef commented Apr 20, 2017

Does the string have to be hex-encoded though? How about a more efficient encoding, base-64 or higher?

Strings have a minimum 24 byte memory overhead in V8 if I recall correctly - irrespective of how little you store in them. Anything less than 24 bytes will still use 24 bytes. Using an encoding would only add cpu overhead and make the string harder to compare with stats returned directly by the filesystem.

@jorangreef

This comment has been minimized.

Show comment
Hide comment
@jorangreef

jorangreef Apr 20, 2017

Contributor

Would returning the inode as a raw Buffer incur less overhead than a String?

Contributor

jorangreef commented Apr 20, 2017

Would returning the inode as a raw Buffer incur less overhead than a String?

@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell

jasnell Apr 20, 2017

Member

Would returning the inode as a raw Buffer incur less overhead than a String

Unfortunately, no. A Buffer object has a number of issues associated with it. In this particular case, one of the most significant is the fact that a Buffer is not immutable.

Member

jasnell commented Apr 20, 2017

Would returning the inode as a raw Buffer incur less overhead than a String

Unfortunately, no. A Buffer object has a number of issues associated with it. In this particular case, one of the most significant is the fact that a Buffer is not immutable.

@sam-github

This comment has been minimized.

Show comment
Hide comment
@sam-github

sam-github Apr 20, 2017

Member

Why does it matter that a buffer is mutable? The stat object returned is itself mutable. Upside of buffers is they have APIs to extract the high and low parts of them. Downside is two buffers with same contents are not comparable, which is a big downside I think for inodes.

I don't think the stat should be a number unless its out of representational range, and then be a string, hopefully that is not one of the suggestions.

If its a hex string, which I support, should it be prefixed with 0x?

Member

sam-github commented Apr 20, 2017

Why does it matter that a buffer is mutable? The stat object returned is itself mutable. Upside of buffers is they have APIs to extract the high and low parts of them. Downside is two buffers with same contents are not comparable, which is a big downside I think for inodes.

I don't think the stat should be a number unless its out of representational range, and then be a string, hopefully that is not one of the suggestions.

If its a hex string, which I support, should it be prefixed with 0x?

@robjtede

This comment has been minimized.

Show comment
Hide comment
@robjtede

robjtede Apr 20, 2017

Strings also seem to make more sense looking ahead to 128-bit identifiers as @Memnarch mentioned. It would make comparisons simpler; we would not have to check the TypedArray length to determine how many elements to compare.

Though I'd prefer BigInt/Larger Integer Size support in ES itself.

Strings also seem to make more sense looking ahead to 128-bit identifiers as @Memnarch mentioned. It would make comparisons simpler; we would not have to check the TypedArray length to determine how many elements to compare.

Though I'd prefer BigInt/Larger Integer Size support in ES itself.

@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell

jasnell Apr 20, 2017

Member

I'm good with it being a hex string. The leading 0x wouldn't seem necessary.

Member

jasnell commented Apr 20, 2017

I'm good with it being a hex string. The leading 0x wouldn't seem necessary.

@wmertens

This comment has been minimized.

Show comment
Hide comment
@wmertens

wmertens Apr 20, 2017

@TimothyGu

This comment has been minimized.

Show comment
Hide comment
@TimothyGu

TimothyGu Apr 20, 2017

Member

I'm good with it being a string. Now the question is, do we want to change the type of ino property or create a new one for compatibility?

Member

TimothyGu commented Apr 20, 2017

I'm good with it being a string. Now the question is, do we want to change the type of ino property or create a new one for compatibility?

@wmertens

This comment has been minimized.

Show comment
Hide comment
@wmertens

wmertens Apr 24, 2017

@TimothyGu we could make ino a getter? But what would the new name be?

I think there will be very little breakage by making ino a string though. I don't think it's sane to be doing arithmetic on inodes, or using it as a sparse array index…

@TimothyGu we could make ino a getter? But what would the new name be?

I think there will be very little breakage by making ino a string though. I don't think it's sane to be doing arithmetic on inodes, or using it as a sparse array index…

@TimothyGu

This comment has been minimized.

Show comment
Hide comment
@TimothyGu

TimothyGu Apr 25, 2017

Member

@wmertens I don't see the advantages in making it a getter if all it returns is a constant string anyway.

I'd err on the safe side (as we usually do regarding compatibility). How Node.js is used can sometimes be very... creative.

Member

TimothyGu commented Apr 25, 2017

@wmertens I don't see the advantages in making it a getter if all it returns is a constant string anyway.

I'd err on the safe side (as we usually do regarding compatibility). How Node.js is used can sometimes be very... creative.

@wmertens

This comment has been minimized.

Show comment
Hide comment
@wmertens

wmertens Apr 25, 2017

@vojtechkral

This comment has been minimized.

Show comment
Hide comment
@vojtechkral

vojtechkral May 9, 2017

I can't reproduce this consistently and copying/reproducing a folder structure that contains dupes elsewhere doesn't replicate the issue.

Presumably this could be made reproducible with a filesystem image mounted with DiskPart

I can't reproduce this consistently and copying/reproducing a folder structure that contains dupes elsewhere doesn't replicate the issue.

Presumably this could be made reproducible with a filesystem image mounted with DiskPart

@Bablakeluke

This comment has been minimized.

Show comment
Hide comment
@Bablakeluke

Bablakeluke May 13, 2017

Couldn't this just be a reinterpretation of the 64 bit integer to the 64 bit double rather than a conversion (which rounds and causes these issues)? I.e. the bit pattern is exactly the same - it's just treated as a double. The benefit of this being it's a simple change and it's doesn't break anything either.

I don't see why you'd need to convert to a string when a 64 bit double already has perfectly enough capacity - it's just the conversion from the 64 bit long to the 64 bit double which is being done incorrectly in this instance.

Bablakeluke commented May 13, 2017

Couldn't this just be a reinterpretation of the 64 bit integer to the 64 bit double rather than a conversion (which rounds and causes these issues)? I.e. the bit pattern is exactly the same - it's just treated as a double. The benefit of this being it's a simple change and it's doesn't break anything either.

I don't see why you'd need to convert to a string when a 64 bit double already has perfectly enough capacity - it's just the conversion from the 64 bit long to the 64 bit double which is being done incorrectly in this instance.

@dchest

This comment has been minimized.

Show comment
Hide comment
@dchest

dchest May 13, 2017

@Bablakeluke unfortunately, this will break comparison for inodes whose bit pattern result in NaN, since NaN != NaN. Example: https://play.golang.org/p/SaWpMu60Xt

dchest commented May 13, 2017

@Bablakeluke unfortunately, this will break comparison for inodes whose bit pattern result in NaN, since NaN != NaN. Example: https://play.golang.org/p/SaWpMu60Xt

@Bablakeluke

This comment has been minimized.

Show comment
Hide comment
@Bablakeluke

Bablakeluke May 13, 2017

@dchest Excellent point - I doubt that would ever actually happen however, considering the NaN bit patterns are all enormous numbers and inodes are fundamentally indices (i.e. no filesystem would support indices that high); i.e. you would apparently need 9,218,868,437,227,405,312 or more files for it to occur.

@dchest Excellent point - I doubt that would ever actually happen however, considering the NaN bit patterns are all enormous numbers and inodes are fundamentally indices (i.e. no filesystem would support indices that high); i.e. you would apparently need 9,218,868,437,227,405,312 or more files for it to occur.

@dchest

This comment has been minimized.

Show comment
Hide comment
@dchest

dchest May 13, 2017

That's pretty much the original issue — we can't and should not predict how inodes are generated by OS — they may even be randomized.

dchest commented May 13, 2017

That's pretty much the original issue — we can't and should not predict how inodes are generated by OS — they may even be randomized.

@vojtechkral

This comment has been minimized.

Show comment
Hide comment
@vojtechkral

vojtechkral May 13, 2017

@dchest I don't get what the problem in your example is. The fact that different NaNs (created from different inodes) don't compare true is a good thing. You don't want different inodes to compare true - that's the original bug.

@dchest I don't get what the problem in your example is. The fact that different NaNs (created from different inodes) don't compare true is a good thing. You don't want different inodes to compare true - that's the original bug.

@wmertens

This comment has been minimized.

Show comment
Hide comment
@wmertens

wmertens May 13, 2017

@dchest

This comment has been minimized.

Show comment
Hide comment
@dchest

dchest May 13, 2017

@vojtechkral sorry if it wasn't clear: the example showed that converting many different numbers give NaN value. But as @wmertens said, NaN is not equal to NaN even if it's represented by the same bits (that is, the same inode): https://play.golang.org/p/2EGBDaKpgM

Additional confusion adds the equality of -0 and +0: they are represented by different bits, but the values are equal in JavaScript: -0 === +0 is true.

dchest commented May 13, 2017

@vojtechkral sorry if it wasn't clear: the example showed that converting many different numbers give NaN value. But as @wmertens said, NaN is not equal to NaN even if it's represented by the same bits (that is, the same inode): https://play.golang.org/p/2EGBDaKpgM

Additional confusion adds the equality of -0 and +0: they are represented by different bits, but the values are equal in JavaScript: -0 === +0 is true.

@vojtechkral

This comment has been minimized.

Show comment
Hide comment
@vojtechkral

vojtechkral May 14, 2017

@dchest Ah, makes sense, thanks.

@dchest Ah, makes sense, thanks.

@wmertens

This comment has been minimized.

Show comment
Hide comment
@wmertens

wmertens May 14, 2017

@raffis

This comment has been minimized.

Show comment
Hide comment
@raffis

raffis Sep 15, 2017

We have encountered the same issue, inode collisions. I shortly wrote this alternative: https://github.com/gyselroth/node-windows-fsstat maybe it helps...

raffis commented Sep 15, 2017

We have encountered the same issue, inode collisions. I shortly wrote this alternative: https://github.com/gyselroth/node-windows-fsstat maybe it helps...

@XadillaX

This comment has been minimized.

Show comment
Hide comment
@XadillaX

XadillaX Nov 2, 2017

Member

I came from #16679.

IMHO my suggestion is to add a raw data into fs.Stats, or add a new function fs.safeStat to make ino stringify.

Member

XadillaX commented Nov 2, 2017

I came from #16679.

IMHO my suggestion is to add a raw data into fs.Stats, or add a new function fs.safeStat to make ino stringify.

XadillaX added a commit to XadillaX/node that referenced this issue Nov 7, 2017

fs: add `inoString` to fs.Stats
Since `ino` is an unsigned 64-bit integer, it may lost accuracy in
JavaScript. This situation may cause some strange bugs in filesystem. So
a string type `inoString` is added to `fs.Stats`.

Fixes: nodejs#16679
Fixes: nodejs#12115
Refs: http://docs.libuv.org/en/v1.x/fs.html#c.uv_stat_t

XadillaX added a commit to XadillaX/node that referenced this issue Nov 7, 2017

fs: add `inoString` to fs.Stats
Since `ino` is an unsigned 64-bit integer, it may lost accuracy in
JavaScript. This situation may cause some strange bugs in filesystem. So
a string type `inoString` is added to `fs.Stats`.

Fixes: nodejs#16679
Fixes: nodejs#12115
Refs: http://docs.libuv.org/en/v1.x/fs.html#c.uv_stat_t

@XadillaX XadillaX referenced this issue Nov 7, 2017

Closed

fs: add `inoString` to fs.Stats #16857

4 of 4 tasks complete

joyeecheung added a commit that referenced this issue Jun 7, 2018

doc: document BigInt support in fs.Stats
PR-URL: #20220
Fixes: #12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

targos added a commit that referenced this issue Jun 7, 2018

fs: support BigInt in fs.*stat and fs.watchFile
Add the `bigint: true` option to all the `fs.*stat` methods and
`fs.watchFile`.

PR-URL: #20220
Fixes: #12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

targos added a commit that referenced this issue Jun 7, 2018

doc: document BigInt support in fs.Stats
PR-URL: #20220
Fixes: #12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

shisama added a commit to shisama/node that referenced this issue Jun 9, 2018

fs: support BigInt in fs.*stat and fs.watchFile
Add the `bigint: true` option to all the `fs.*stat` methods and
`fs.watchFile`.

PR-URL: nodejs#20220
Fixes: nodejs#12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

shisama added a commit to shisama/node that referenced this issue Jun 9, 2018

doc: document BigInt support in fs.Stats
PR-URL: nodejs#20220
Fixes: nodejs#12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

shisama added a commit to shisama/node that referenced this issue Jun 11, 2018

fs: support BigInt in fs.*stat and fs.watchFile
Add the `bigint: true` option to all the `fs.*stat` methods and
`fs.watchFile`.

PR-URL: nodejs#20220
Fixes: nodejs#12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

shisama added a commit to shisama/node that referenced this issue Jun 11, 2018

doc: document BigInt support in fs.Stats
PR-URL: nodejs#20220
Fixes: nodejs#12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

shisama added a commit to shisama/node that referenced this issue Jun 12, 2018

fs: support BigInt in fs.*stat and fs.watchFile
Add the `bigint: true` option to all the `fs.*stat` methods and
`fs.watchFile`.

PR-URL: nodejs#20220
Fixes: nodejs#12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

shisama added a commit to shisama/node that referenced this issue Jun 12, 2018

doc: document BigInt support in fs.Stats
PR-URL: nodejs#20220
Fixes: nodejs#12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

targos added a commit that referenced this issue Jun 13, 2018

fs: support BigInt in fs.*stat and fs.watchFile
Add the `bigint: true` option to all the `fs.*stat` methods and
`fs.watchFile`.

PR-URL: #20220
Fixes: #12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

targos added a commit that referenced this issue Jun 13, 2018

doc: document BigInt support in fs.Stats
PR-URL: #20220
Fixes: #12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

shisama added a commit to shisama/node that referenced this issue Jun 17, 2018

fs: support BigInt in fs.*stat and fs.watchFile
Add the `bigint: true` option to all the `fs.*stat` methods and
`fs.watchFile`.

PR-URL: nodejs#20220
Fixes: nodejs#12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

shisama added a commit to shisama/node that referenced this issue Jun 17, 2018

doc: document BigInt support in fs.Stats
PR-URL: nodejs#20220
Fixes: nodejs#12115
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

@wmertens wmertens referenced this issue in mixu/file-dedupe Jul 9, 2018

Open

stat.ino can be truncated #4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment