fs.Stats.ino always returns 0 on Windows #2670

Open
marisks opened this Issue Feb 2, 2012 · 23 comments

marisks commented Feb 2, 2012

fs.Stats.ino always returns 0 on Windows. Libraries that depend on it work incorrectly on Windows; "findit" is one example.

"findit" has an open issue related to this: substack/node-findit#5. @svelez mentions that fileID from FILE_ID_BOTH_DIR_INFO could be used for fs.Stats.ino.
More about FILE_ID_BOTH_DIR_INFO: http://msdn.microsoft.com/en-us/library/windows/desktop/aa364226%28v=vs.85%29.aspx

piscisaureus was assigned Feb 2, 2012

vvs commented Feb 26, 2012

Just wanted to add that this totally breaks 'findit', and as a consequence, every library that depends on findit is badly broken. That includes various test frameworks (e.g. jasmine-node), archivers, and build systems.

This is a very big deal. Essentially, node.js on Windows is hosed due to this. It is worth noting that this breakage is not very obvious. For example, jasmine-node would only execute a single spec file, ignoring the rest.

As a workaround for findit being broken, walker skips the inode check when the reported inode is 0.
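
A rough sketch of that kind of workaround, for illustration only. This is not walker's actual code; `seen` and `alreadyVisited` are hypothetical names, and the only assumption is a stats-like object with `dev` and `ino` fields.

```javascript
// Hypothetical helper: remember visited files by dev + ino, but skip the
// check entirely when ino is 0 (what Windows reported when this was filed).
const seen = new Set();

function alreadyVisited(stat) {
  if (stat.ino === 0) {
    // No usable inode, so the file can never be reported as a duplicate.
    return false;
  }
  const key = stat.dev + ':' + stat.ino;
  if (seen.has(key)) {
    return true;
  }
  seen.add(key);
  return false;
}
```

The trade-off is that cycle detection (for example through directory links) is effectively disabled wherever ino is 0.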

More generally, this plus the fact that windows fs.watchFile throws an exception means that many significant libraries are completely broken on windows and node is unusable for my use case without hacking libraries.

Is there any news on this? I think this is a big problem for node on Windows.

edit: I tested with both v0.6.15 and v0.7.8, the issue remains.

isaacs commented Nov 2, 2012

Node pretty much just blindly takes what libuv gives it here. Moved to joyent/libuv#613

isaacs closed this Nov 2, 2012

krisnye referenced this issue in michaelficarra/commonjs-everywhere May 10, 2013

Closed

-w option does not work on windows #60

piscisaureus reopened this Aug 24, 2013

Member

piscisaureus commented Aug 24, 2013

ino can reasonably be a 64-bit value on windows. We have to change the stats object to accommodate these.

@tjfontaine suggested that we could change the type of ino and dev to string?

korve commented Feb 5, 2014

Any news on that?

Owner

jasnell commented May 15, 2015

@piscisaureus ... what's the status on this one?

orangemocha added the P-2 label May 15, 2015

Member

piscisaureus commented May 15, 2015

The ino field has been supported on windows for a while. I believe this was already the case with node v0.10; the output below is from node v0.12.2.

The issue that remains is that the ino on windows is a 64-bit int, but javascript can't represent 64-bit integer values precisely. For this reason we should probably make the ino field a string.

C:\Users\Bert Belder>node -pe "require('fs').statSync('/')"
{ dev: -1630166251,
  mode: 16822,
  nlink: 1,
  uid: 0,
  gid: 0,
  rdev: 0,
  blksize: undefined,
  ino: 1407374883553285,
  size: 0,
  blocks: undefined,
  atime: Fri May 15 2015 00:14:22 GMT-0700 (Pacific Daylight Time),
  mtime: Fri May 15 2015 00:14:22 GMT-0700 (Pacific Daylight Time),
  ctime: Fri May 15 2015 00:14:22 GMT-0700 (Pacific Daylight Time),
  birthtime: Mon Jul 13 2009 19:38:56 GMT-0700 (Pacific Daylight Time) }
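
To illustrate the precision concern, here is a small sketch. It assumes a JavaScript engine with BigInt support, which is newer than the node versions discussed here; `idA` and `idB` are made-up values, not real file IDs.

```javascript
// A double represents every integer exactly only up to 2^53 - 1.
const inoFromOutput = 1407374883553285; // the ino value printed above
console.log(Number.isSafeInteger(inoFromOutput)); // this one still fits

// Two distinct 64-bit values just above the safe range (illustrative only).
const idA = 2n ** 53n;
const idB = 2n ** 53n + 1n;

const distinctAsBigInt = idA !== idB;
const collideAsDouble = Number(idA) === Number(idB);
console.log(distinctAsBigInt, collideAsDouble);
```

So a 64-bit file index can survive the conversion to a double, as the one above does, but that is not guaranteed for every file.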

Owner

jasnell commented May 15, 2015

Ok. Changing ino would be a breaking change, no? Perhaps adding a secondary strino field would work as an alternative? Or do we feel a break would be ok?

Member

piscisaureus commented May 15, 2015

Ok. Changing ino would be a breaking change, no?

Yes, indeed.

Perhaps adding a secondary strino field would work as an alternative?

That may be better indeed. However, as we're at it, let's consider that the only thing that makes ino useful is the invariant that dev+ino uniquely identifies a file. So we might also concatenate both values and put it in a field named uid or something.
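
A minimal sketch of what such a combined identifier could look like. `fileKey` is a hypothetical helper, not an existing node API, and the values are passed as BigInt or string so that no precision is lost before concatenation.

```javascript
// Hypothetical helper: build one string from dev and ino.
// A separator is needed, otherwise (dev=1, ino=23) and (dev=12, ino=3)
// would both produce "123".
function fileKey(dev, ino) {
  return String(dev) + ':' + String(ino);
}

const keyA = fileKey(1, 23);
const keyB = fileKey(12, 3);
const keyBig = fileKey(-1630166251, 1407374883553285n);
console.log(keyA, keyB, keyBig);
```

The separator character is an arbitrary choice here; any character that cannot appear in a decimal number would do.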

The issue that remains is that the ino on windows is a 64-bit int

Mostly 64bit. However, It's 128bit for Windows Server 2012 ReFS.

For the 64bit part of the problem (the majority of Windows file systems), a 32bit number type is still useful, since Windows returns file information that has separate 32bit high and 32bit low index values. On Windows, 'ino' could contain the lower index, and a separate ino_high (or something like that) could hold the higher one.

ReFS is another story and requires additional attention. i.e. the device number is also 64 bit.
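
The high/low split can be sketched like this, assuming BigInt support and a non-negative 64-bit index. `splitIndex` and `joinIndex` are hypothetical names, and the sample value is the ino from the statSync output earlier in this thread.

```javascript
const MASK_32 = 0xFFFFFFFFn;

// Split an unsigned 64-bit index into two 32-bit halves that each fit
// comfortably in a regular JavaScript number.
function splitIndex(index64) {
  return {
    high: Number(index64 >> 32n),
    low: Number(index64 & MASK_32),
  };
}

// Recombine the halves when the full value is actually needed.
function joinIndex(high, low) {
  return (BigInt(high) << 32n) | BigInt(low);
}

const parts = splitIndex(1407374883553285n);
const roundTrip = joinIndex(parts.high, parts.low);
console.log(parts, roundTrip);
```

For that sample value the halves come out as high = 327680 and low = 5.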

Owner

jasnell commented May 18, 2015

@piscisaureus The uid idea with concatenated dev and ino makes sense. /cc @orangemocha: any additional thoughts on this one?

@obastemur my key concern with splitting the high and low identifiers like that is basic usability. How would you expect it to be used from within node that would not be covered by the concatenated uid idea?

Member

piscisaureus commented May 18, 2015

Mostly 64bit. However, It's 128bit for Windows Server 2012 ReFS.

@obastemur Do you have any background on that? I use the IndexNumber field for ino, which is 64-bit, and afaict this API hasn't changed in windows 10.

https://github.com/libuv/libuv/blob/a6fa3ca99a379912167c45b78a8d03ad23fa6d33/src/win/fs.c#L1087

Mostly 64bit. However, It's 128bit for Windows Server 2012 ReFS.
Do you have any background on that?
@piscisaureus MSDN: https://msdn.microsoft.com/en-us/library/aa363788%28v=vs.85%29.aspx

@jasnell The original problem here is that currently node fs.stat.ino does return 0 for Windows. @piscisaureus offered uid because of a) 64bit index number on Windows b) node eventually concatenates dev and ino to use it internally.

However imagine an application or native module would simply use Stats object from fs assuming to find the relevant information. Right now ino is 0, and after we add uid, ino will still be 0. Besides, we would be introducing a new string property into Stats, which is far more expensive than a DWORD / uint32_t etc. On Windows we could combine dev + highIndex + lowIndex to build whatever we need, only when it's needed. Meanwhile lowerIndex alone serves the purpose (as long as the file system is not ReFS) of making ino non-zero.

I would also like to add that on Windows the lowerIndex serves as the file reference index, while the higher index is a sequence number that changes only when e.g. a new file is created. On NTFS this information is only reliable during a process lifetime (although I didn't see the values change, it's quite possible that the file system reuses an index, e.g. on a networked file system), so we also shouldn't encourage people to store a string value we created, which may no longer be relevant when the process is restarted in the future.

Member

piscisaureus commented May 18, 2015

@obastemur

MSDN: https://msdn.microsoft.com/en-us/library/aa363788%28v=vs.85%29.aspx

Thanks for the link; I hadn't noticed that msft had mutilated the API in Windows 2012 (changing an API whose sole purpose is to return a unique identifier, and then making it return a non-unique identifier instead, is just plain stupid).

However imagine an application or native module would simply use Stats object from fs assuming to find the relevant information.

Most of what you're saying seems reasonable, but I don't really understand what you're trying to suggest we should do.

The problem we have is 1) we need a solution that allows users to detect file equality on windows. 2) we don't want to break the existing api. My suggestion is to add another API, using a string, to allow for 1).

The numeric ino field is too small for windows. There's nothing we can do about it, this API will never work correctly on windows. Let's move on.

we don't want to break the existing api.
@piscisaureus I don't see how my proposition breaks the api. Using lowerIndex value on ino wouldn't break the existing api. Instead of adding uid string, and keeping ino 0 on Windows, we may add a uint inoHigh or something like that and store lowerIndex on ino.

I don't think we are the only people trying to solve inode puzzle for ntfs. My concerns are;

  1. introduce a string into stat that the developer may save it for a future reference (which may end up failing)
  2. leaving ino 0 and not knowing what uid actually means (or how long it can survive)
  3. putting yet another string on a performance critical object.

Instead of easing an operating system variable into something only we know, we may store it as is. So the developer may find something that makes sense (i.e. on msdn see what these numbers actually mean).

Member

piscisaureus commented May 18, 2015

leaving ino 0 and not knowing what uid actually means (or how long it can survive)

It's not 0 right now; it gets truncated (albeit in a different way than you suggest; you're suggesting chopping off the topmost 32 bits whereas currently the ino is cast to a double).

introduce a string into stat that the developer may save it for a future reference (which may end up failing)

Is that not a concern on unix? I don't think linux has a way to provide a temporally constant ino on network filesystem either.

putting yet another string on a performance critical object.

That's a reasonable concern, although in reality we don't know how big the performance impact would be. I think it would make less than 10% difference.

A much bigger impact is to be expected from reading the 128-bit index number, since that requires an additional syscall.

We already fill out the st_blksize with a fixed value because actually reading that value would also incur another syscall.

For performance reasons we might want to support a "lite" version of fs.stat which only reads the most commonly used fields.

Instead of easing an operating system variable into something only we know, we may store it as is. So the developer may find something that makes sense (i.e. on msdn see what these numbers actually mean).

No! this is very much the opposite of how I have approached windows support in node. I've always tried to find "common denominator" APIs so people could (as much as possible) assume that node would behave the same on all platforms. I really don't want to make people look up what the semantic differences between 'ino' and 'indexNumber' are on windows, only to find out that there really aren't any but we couldn't fiddle the second value into the first field.

Instead, try to be a bit more imaginative.

  • The use case is: figure out if two files are the same
  • Node provides an API with the following semantics:
    • it retrieves a unique string associated with an open file / directory entry
    • the string is guaranteed to be unique
      • across all filesystems (different files on different filesystems never have the same uid)
      • between process restarts (after restarting node the uids haven't changed)
      • not across reboots
      • on the same machine only (in case of a network filesystem, different machines may see different uids)
    • a uid may not be available if node wasn't able to determine such a string (uid set to null in case of fs.stat)
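
A sketch of how a caller might use such a field under these semantics. Everything here is hypothetical: no `uid` string of this kind exists on fs.Stats, and `isSameFile` is an illustrative name.

```javascript
// Hypothetical: each stats-like object carries a `uid` that is either a
// string or null (null meaning node could not determine an identifier).
function isSameFile(a, b) {
  if (a.uid == null || b.uid == null) {
    return null; // unknown: the caller has to fall back to something else
  }
  return a.uid === b.uid;
}

console.log(isSameFile({ uid: 'vol1:42' }, { uid: 'vol1:42' }));
console.log(isSameFile({ uid: 'vol1:42' }, { uid: null }));
```

Returning null rather than false keeps "different files" and "could not tell" apart, which matters for the use case above.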

Member

piscisaureus commented May 18, 2015

@obastemur

BTW, I may sound a little aggressive but I'm really happy that you're questioning the way libuv abstracts these things. In the past 4 years nobody has really taken an interest in it and I don't want to do this on my own forever. But let's align ourselves a little bit on goals, so here's my perspective:

Node (and by proxy, libuv) should first and foremost try to "plaster over" api differences between platforms, esp. if the underlying feature is the same. e.g. TCP is really the same on windows and unix, so there's absolutely no excuse for the APIs to be platform-specific.

Sometimes there are conceptual differences between APIs. In that case I try to set up the APIs such that, when used for the intended use case, the behavior will be similar. So for example:

  • Use case: running child processes:
    • On unix, spawn() searches the PATH, and executes a file based on the shebang line and when the "x" permission is granted. Since unix supports "passing an array of arguments" to a process, libuv passes the array of arguments as-is to the child process.
    • On windows, spawn() searches PATH and PATHEXT (*), and executes a file based on its file extension. Since windows doesn't support an array of arguments, libuv forms a "command line" that would be interpreted as a list of strings when parsed by either msvcrt or the shell (*).
  • Use case: "daemonize" a child process:
    • On unix, the UV_PROCESS_DETACHED flag detaches the process from its controlling terminal, so it won't be terminated when the terminal is closed.
    • On windows, the UV_PROCESS_DETACHED flag runs the process in a separate console session, so it won't be terminated when the console is closed.

etc.
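
From the JavaScript side, the child-process use case looks the same on both platforms. A small sketch using child_process.spawnSync; the argument values are arbitrary examples, and it assumes `node -e` passes the extra arguments through in process.argv.

```javascript
const { spawnSync } = require('child_process');

// The caller always passes an array of arguments. On unix it is handed to
// the child as-is; on windows libuv builds a quoted command line from it.
const result = spawnSync(process.execPath, [
  '-e', 'console.log(JSON.stringify(process.argv.slice(1)))',
  'hello world', 'plain',
], { encoding: 'utf8' });

// The child should report the same arguments, including the embedded space.
const received = JSON.parse(result.stdout);
console.log(received);
```

The `detached: true` option of child_process.spawn is the node-level counterpart of the detach flag mentioned above.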

Sometimes the conceptual differences are too big. I haven't been able to meaningfully support "user groups" and "user ids" on windows. This affects e.g. fs.stat(), fs.chown(), process.setuid().

The way fs.chmod() behaves on windows is worse than useless.

There is no way to read/modify file attributes on windows (e.g. hidden, readonly).

(*) Currently not supported

It's not 0 right now; it gets truncated (albeit in a different way than you suggest; you're suggesting chopping off the topmost 32 bits whereas currently the ino is cast to a double).

Last time I've checked it was 0 for node 0.10.x. For node 0.12.x (assuming it takes the higher index), ino represents a sequence number which is mostly useless alone. However lowerIndex represents the file reference index which is most likely unique per app instance.

My suggestion is that they could both hold lowerIndex etc.

Is that not a concern on unix? I don't think linux has a way to provide a temporally constant ino on network filesystem either.

No, it's not. BTW, which format are you referring to? The Windows file index number is a dynamic variable that you shouldn't rely on. However, on ext2, ext3 etc. it's a static identifier.

That's a reasonable concern, although in reality we don't know how big the performance impact would be. I think it would make less than 10% difference.

For a regular server, I wouldn't expect much of a performance issue, since a system call would consume much more. However, a smaller device, which is what we are working on with JXcore, would suffer from it.

A much bigger impact is to be expected from reading the 128-bit index number, since that requires an additional syscall.

If this statement refers to supporting ReFS, I think we need much more than that and yet supporting it would require a small api break anyways.

For performance reasons we might want to support a "lite" version of fs.stat which only reads the most commonly used fields.

Why bother, instead of fixing what ino returns and exposing the higher index through another property? I doubt there is any guarantee that a unix distro always returns a 32bit value for the inode. It all depends on the kernel and file system setup.

the string is guaranteed to be unique....

I'm definitely not an NTFS expert, but I wouldn't expect a dynamic index to serve this purpose. I don't see many similarities between the unix inode and the Windows fileIndex apart from the unique id aspect. In detail they behave differently.

I agree with you that uid is the simplest option (though it may not be the most reliable one), yet I have other concerns. We are trying to keep node compatibility with JXcore while figuring out how to reduce the footprint. I would really appreciate it if we used as few strings as possible in the critical parts of the library. I just wanted to share an option that doesn't need a string.

Besides, given my experience with Windows 10 ARM (IoT), I have a hard time believing that a dynamic file index is something reliable. Maybe it's just a preview edition problem, but who knows.

BTW, I may sound a little aggressive

No hard feelings :) We are just discussing to do the best.

Member

piscisaureus commented May 18, 2015

Last time I've checked it was 0 for node 0.10.x. For node 0.12.x (assuming it takes the higher index), ino represents a sequence number which is mostly useless alone. However lowerIndex represents the file reference index which is most likely unique per app instance.

It takes both the higher and lower index as a single 64-bit value. This gets cast to a double, which is where information is lost.

obastemur added a commit to jxcore/jxcore that referenced this issue May 20, 2015

Windows: fix inode problem 967ff3f

I've run some tests on a Windows 8 installation that is more than two years old (real HDD), in other words one with a large number of files and their historical duplicates. I also ran them on a similarly old VMware Fusion virtual machine (Server 2012).

Here is the test app

var fs = require('fs');
var path = require('path');

var dict = {};
var counter = 0;

function walk_on(location) {
  var dirs = null;
  try {
    dirs = fs.readdirSync(location);
  } catch (e) {
    return;
  }

  for (var i = 0, ln = dirs.length; i < ln; i++) {
    var file = path.join(location, dirs[i]);
    var stat;

    try {
      stat = fs.statSync(file);
    } catch (e) {
      continue;
    }

    var windowsFolder = stat.isDirectory();

    // dev + ino is not reliable with folders
    // (stat.sequenceId only exists with the jxcore patch; elsewhere it is undefined)
    if (stat.nlink == 1 && !windowsFolder) {
      var marker = stat.dev + ":" + stat.ino + ":" + stat.sequenceId;
      if (dict.hasOwnProperty(marker)) {
        var statOld = fs.statSync(dict[marker]);

        // basic check if they are the same file
        if (statOld.size != stat.size) {
          console.log(dict[marker], statOld);
          console.log(file, stat);
          throw new Error(dict[marker] + " != " + file + " (" + marker + ")");
        }
      }

      dict[marker] = file;
      counter++;

      if (counter > 5e5) {
        // compared enough number of files
        return;
      }

      if (counter % 1e4 == 0) {
        process.stdout.write(".");
      }
    }

    if (stat.isDirectory()) {
      walk_on(file);
    }
  }
}

walk_on("c:\\");

console.log("Test Passed");

This test is of no use on current node 0.10.x, since there is no ino or dev support on Windows. It also fails very quickly with the latest node.js 0.12.3. However, jxcore/jxcore@967ff3f fixes the problem on the jxcore side.

The fix is based on the fact that the st_ino, st_uid, st_gid and st_rdev memory blocks are not used by node at all (on Windows). So the solution uses that empty space (3 x short + 1 x uint) for 2 x DWORDs.

What has changed;

  • adds a uint 'sequenceId' property into fs.stat result. (nFileIndexHigh)
  • assigns deviceSerialNumber into dev instead of a 0
  • defines ino from nFileIndexLow

Some results from the tests;

  • nFileIndexLow has proved itself as a unique identifier (as expected)
  • nFileIndexHigh is shared by many files (sequence id)

Wasn't really surprised;

  • The files under Recycle Bin may have the same dev + low + high combination (Windows re-uses the index for a file inside the Recycle Bin) Before running the test app, make sure you have an empty recycle bin.

Problems;

  • Although the libUV used by node 0.10.x/jxcore has no other use for the given data structure, using the st_rdev etc. memory for other variables is an ugly hack, especially in terms of look and feel. On the other hand, using a separate data structure is not an option. It would break the memory alignment for a module that was already compiled against older header files.

The proposed solution doesn't break the current API or existing solutions. (no memory aligning etc. problems. Uses the same memory structure, no change in actual types)

This was a solution for node 0.10.x. Indeed node 0.12.x uses a different libUV, WinAPI etc. If this approach (sequenceId or something like that) is good to go, I can help with the rest.

Is this fixed now? I can see a value in ino under Windows 10.

Is this value genuinely unique within a filesystem?

eight04 commented Dec 8, 2016

I don't think so. I can find two files that look like this:

{ dev: -428978813,
  mode: 33206,
  nlink: 1,
  uid: 0,
  gid: 0,
  rdev: 0,
  blksize: undefined,
  ino: 9288674231461284,
  size: 253073,
  blocks: undefined,
  atime: 2014-07-23T03:08:14.800Z,
  mtime: 2014-07-23T03:08:14.814Z,
  ctime: 2016-12-08T15:03:26.399Z,
  birthtime: 2014-07-23T03:08:14.800Z }
{ dev: -428978813,
  mode: 33206,
  nlink: 1,
  uid: 0,
  gid: 0,
  rdev: 0,
  blksize: undefined,
  ino: 9288674231461284,
  size: 133284,
  blocks: undefined,
  atime: 2014-07-23T03:08:11.663Z,
  mtime: 2014-07-23T03:08:11.675Z,
  ctime: 2016-12-08T15:03:20.245Z,
  birthtime: 2014-07-23T03:08:11.663Z }

Windows 7
Node 7.0.0
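
One possible explanation, not confirmed anywhere in this thread, is the double conversion discussed earlier: the reported ino lies above 2^53, where neighbouring 64-bit values collapse into the same number. The sketch below checks that, and then reads the value through the `bigint` option that later node releases added to fs.statSync (v10.5 and up, as far as I know).

```javascript
const fs = require('fs');

const reported = 9288674231461284; // ino value from the output above

// Above Number.MAX_SAFE_INTEGER, so distinct 64-bit IDs may map to it.
const isSafe = Number.isSafeInteger(reported);
// An illustrative neighbouring ID that rounds to the same double.
const neighbourCollides = Number(9288674231461285n) === reported;
console.log(isSafe, neighbourCollides);

// With { bigint: true } the numeric fields come back as BigInt, so a
// dev + ino key can be built without losing bits.
const stat = fs.statSync(process.cwd(), { bigint: true });
const key = `${stat.dev}:${stat.ino}`;
console.log(typeof stat.ino, key);
```

Whether dev + ino is then truly unique still depends on the file system, as discussed above for ReFS.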
