Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy 5.6.0 on Mac says Not a file when file is not writeable #681

Closed
thvv opened this issue Feb 19, 2018 · 15 comments
Closed

Tidy 5.6.0 on Mac says Not a file when file is not writeable #681

thvv opened this issue Feb 19, 2018 · 15 comments

Comments

@thvv
Copy link

thvv commented Feb 19, 2018

I check files before install with
tidy --markup no --quiet yes --show-warnings no index.html
I get the error
Document: "index.html" is not a file!
well, it certainly is. But it is mode 444. Which is fine because I am not updating it, just checking it.
Changing the file mode to 666 makes it work.

I installed this version from macports tidy-5.6.0_0.darwin_16.x86_64.tbz2

@geoffmcl
Copy link
Contributor

@thvv wow what an interesting discovery...

It seems it is not only on the MAC, but is the same in unix/linux, and Windows, using current tidy 5.7.3...

I just set a file to 444 - $ chmod 0444 temp.html, and then $ tidy temp.html and I get Document: "temp.html" is not a file! - Zute, or as Snagglepuss would say Heavens to Murgatroyd!!!

It certainly seems at least one write bit must be set, like $ chmod 0644, and tidy will be ok...

On the face of it this does not seem right. Provided you are not using the -modify option, which will try to overwite the file, it seems only read permission should be required...

AH HA! The answer is simple. The tidyDocParseFile service has the following code -

    int status = -ENOENT;
    FILE* fin = fopen( filnam, "r+" );

    if ( !fin )
    {
        TY_(ReportFileError)( doc, filnam, FILE_NOT_FILE );
        return status;
    }

    fclose( fin );

Now reading some fopen references that access string "r+" means **read/update**: Open a file for update (both for input and output). The file must exist.

And note that if I set the file read only in Windows, > attrib +R temp.html, tidy will fail at the same place...

Now after that code block, WIN32 and UNIX separate...

UNIX continues with fin = fopen( filnam, "rb" );... you will note the + has been removed...

Windows goes another code path, using in essence -

    HANDLE fp = CreateFileA( filnam, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, 0, NULL );
    ...
    fin->map = CreateFileMapping( fp, NULL, PAGE_READONLY, 0, 0, NULL );
    ...
    data->view = MapViewOfFile( data->map, FILE_MAP_READ,
                                (DWORD)( data->pos >> 32 ),
                                (DWORD)data->pos, numb );

So we note also in Windows, that the concept of + is not required, with GENERIC_READ, PAGE_READONLY, and FILE_MAP_READ... this of course needs to be verified...

Reading this far, maybe, just maybe! there is a good case for removing that initial + also... but as I review the history, maybe NOT...

It should be noted that the above fopen( filnam, "r+" ); block was added relatively recently.. I trace it back to commit ce105dc, May 7, 2017... It was to address #391 - the fact that a directory would be treated as a file in unix/MAC...

And IIRC a fopen( dirname, "r" ); will succeed in unix - need to check this...

Maybe the whole block could be replaced with a crossported stat check, which would also yield the size and time, which would make some of the subsequent code both in windows and unix redundant...

I have been using the following, well tested, crossported service, in many projects, for a long time -

#ifdef WIN32
#define M_IS_DIR _S_IFDIR
#else // !WIN32
#define M_IS_DIR S_IFDIR
#endif
static struct stat buf;
enum DiskType {
    MDT_NONE,
    MDT_FILE,
    MDT_DIR
};

static DiskType is_file_or_directory(const char * path)
{
    if (!path)
        return MDT_NONE;
    if (stat(path, &buf) == 0)
    {
        if (buf.st_mode & M_IS_DIR)
            return MDT_DIR;
        else
            return MDT_FILE;
    }
    return MDT_NONE;
}
static size_t get_last_file_size() { return buf.st_size; }
static time_t get_last_file_time() { return buf.st_mtime; }

It seems this is a bug. Tidy should NOT have a problem with a read only files.

If the -modify option is used, then this problem should be handled at the output stage...

In any case tidy should not report is not a file! when it is a file...

What do others think... thanks...

@geoffmcl geoffmcl added the Bug label Feb 20, 2018
@geoffmcl geoffmcl added this to the 5.7 milestone Feb 20, 2018
@geoffmcl
Copy link
Contributor

@thvv, just to confirm one small test done of fopen( filnam, "r" );, that is without the +, succeeds in unix, and probably in OSX, even if the input is a directory, so can not just remove it...

Am tending toward using a 'stat' test and also fail if a directory, but would need another message like say FILE_IS_DIRECTORY, "%s is a directory!" so tidy can correctly report the problem...

As stated look forward to further feedback... thanks...

@i-give-up
Copy link

i-give-up commented Mar 9, 2018

Yes, encountered the same "no file" error in Windows 8.1 when the html file was opened in gVim (which I suppose locks the file from being written by another program).

I have no idea about the code but yes it shouldn't fail if the file is read-only.

@geoffmcl
Copy link
Contributor

@hosiet repeated this bug in #789 ... any help appreciated... thanks...

@jidanni
Copy link
Contributor

jidanni commented Jul 30, 2019

[Seen on 5.6. I wrote this offline before connecting to search for
already filed bugs.]

Tidy can't deal with readonly files even though we are not requesting overwrite

$ tidy -eq  /usr/share/doc/aptitude/html/en/ch02s04s05.html
Document: "/usr/share/doc/aptitude/html/en/ch02s04s05.html" is not a file!

First bug: It is. So don't say that. Proof:

$ tidy -eq < /usr/share/doc/aptitude/html/en/ch02s04s05.html
line 676 column 39 - Warning: trimming empty <p>

So what's the problem?

$ strace tidy -q /usr/share/doc/aptitude/html/en/ch02s04s05.html 2>&1
openat(AT_FDCWD, "/usr/share/doc/aptitude/html/en/ch02s04s05.html", O_RDWR) = -1 EACCES (Permission denied)

This is crazy.

$ cp /usr/share/doc/aptitude/html/en/ch02s04s05.html /tmp/g.html
$ strace tidy -q /tmp/g.html 2>&1
openat(AT_FDCWD, "/tmp/g.html", O_RDWR) = 3
close(3)                                = 0
openat(AT_FDCWD, "/tmp/g.html", O_RDONLY) = 3

Second bug: I am not asking tidy to overwrite the file. So don't use
that test in such cases!

Else one cannot pretty print any readonly file, no matter to STDOUT, a
pipe, or another file.

@geoffmcl
Copy link
Contributor

geoffmcl commented Aug 2, 2019

@i-give-up, do not know about gVim in Windows, but have encountered other editors, that use some form of file locking, sometimes by just keeping the file open, but have not tested how tidy would react in that circumstance, especially that libTidy uses file mapping... that would need to be explored...

And @jidanni thank you for repeating this known bug...

In discussion, the code problem has been identified, and code solutions offered, at least for the read-only files...

Just waiting for someone to step up with a patch, or PR, to fix this crazy bug... thanks...

@geoffmcl
Copy link
Contributor

geoffmcl commented Oct 7, 2020

20201007:

@thvv, @hosiet, all, any, still waiting for someone to step up with a patch, PR, to fix this crazy bug...

The discussion above offers solutions... need help... hopefully more that a +1... thanks...

@geoffmcl
Copy link
Contributor

@thvv, @hosiet - #789, @i-give-up, @jidanni, @jmadiot, @d0ugm - #918, etc, have got around to testing the following patch...

diff --git a/src/tidylib.c b/src/tidylib.c
index 75cb1d9..4d57510 100644
--- a/src/tidylib.c
+++ b/src/tidylib.c
@@ -1125,19 +1125,26 @@ int TIDY_CALL  tidyParseSource( TidyDoc tdoc, TidyInputSource* source )
     return tidyDocParseSource( doc, source );
 }
 
-
+#ifdef WIN32
+#define M_IS_DIR _S_IFDIR
+#else // !WIN32
+#define M_IS_DIR S_IFDIR
+#endif
 int   tidyDocParseFile( TidyDocImpl* doc, ctmbstr filnam )
 {
     int status = -ENOENT;
-    FILE* fin = fopen( filnam, "r+" );
-
-    if ( !fin )
+    FILE* fin = 0;
+    struct stat sbuf = { 0 }; /* Is. #681 - read-only files */
+    if ( stat(filnam,&sbuf) != 0 )
     {
         TY_(ReportFileError)( doc, filnam, FILE_NOT_FILE );
         return status;
     }
-
-    fclose( fin );
+    if (sbuf.st_mode & M_IS_DIR) /* and /NOT/ if a DIRECTORY */
+    {
+        TY_(ReportFileError)(doc, filnam, FILE_NOT_FILE);
+        return status;
+    }
 
 #ifdef _WIN32
     return TY_(DocParseFileWithMappedFile)( doc, filnam );
@@ -1147,7 +1154,6 @@ int   tidyDocParseFile( TidyDocImpl* doc, ctmbstr filnam )
 
 #if PRESERVE_FILE_TIMES
     {
-        struct stat sbuf = { 0 };
         /* get last modified time */
         TidyClearMemory(&doc->filetimes, sizeof(doc->filetimes));
         if (fin && cfgBool(doc, TidyKeepFileTimes) &&

At present only tested it in Windows 10, and it seems to work ok...

As time permits, will get around the creating an issue-681 branch, or the like, so it can be easily pulled, checked out, and test in all OSes, distros, etc... especially the MAC...

Meantime, would really appreciate others applying the patch, and testing it in your local system, and reporting, so we can get rid of this silly BUG... thanks...

geoffmcl added a commit that referenced this issue Feb 21, 2021
@geoffmcl
Copy link
Contributor

@thvv, @hosiet - #789, @i-give-up, @jidanni, @jmadiot, @d0ugm - #918, etc, have now pushed branch issue-681 for testing...

$ cd tidy-html5 # get to source
$ git pull # update
$ git checkout issue-681 # switch to branch
$ cd build/cmake # get to build folder
$ build and test

Look forward to feedback... thanks...

@relphj
Copy link

relphj commented Feb 23, 2021

issue-681 seems to fix the problem for me on Ubuntu 20.04.

@geoffmcl
Copy link
Contributor

@relphj, thank you for the positive feedback on Ubuntu 20.04.

In addition to Windows 10 testing, I have also now tested the issue-681 branch, on 1. Ubuntu 18.04, (must get around to upgrading this!), and 2. my RPI ARM 4.1.19 (Rasbian 8.0), and it works fine... for both read-only, chmod 0444 temp1.html, files, and any directory...

And, of course it fails in all, if I add the -m - modify current file - option... with output Document: Can't open "temp1.html"...

If the input html is otherwise ok, related to issue #921, am very disturbed that the exit value can be 0! This does not seem right... but that is a separate issue...

Anyway, @thvv, hope you, or others, can confirm it is OK for the MAC, where the this issue started...

And then maybe this can be merged to next... thanks...

@geoffmcl
Copy link
Contributor

Added PR #926 for testing... look forward to reports/comment/feedback... thanks...

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 5, 2021

@jmadiot, thank for the linux testing, and reporting...

I guess you missed that I prefer the reporting be back in this open issue... but no probs... I copied your script here...

mkdir /tmp/testtidy
cd /tmp/testtidy
git clone https://github.com/htacg/tidy-html5.git
cd tidy-html5/
git switch issue-681
cd build/cmake/
cmake ../.. -DCMAKE_BUILD_TYPE=Release
make
echo "foo" > foo.html
./tidy foo.html
# correct output
chmod -w foo.html
./tidy foo.html
# also correct output!
# I also check that the branch next had the problem:
git switch next
make
./tidy foo.html
# expectedly got the error of issue 681: Document: "foo.html" is not a file!
chmod +w foo.html
./tidy foo.html
# normal output again

Hope we get a report from a MAC user soonest, using the above script @jmadiot has provided - @thvv?, or others?...

And the PR #926 can be merged... thanks...

@thvv
Copy link
Author

thvv commented Apr 5, 2021 via email

@geoffmcl
Copy link
Contributor

geoffmcl commented Apr 7, 2021

@thvv, thank you for the lengthy report...

Yes, I too noted the git switch vs checkout, that I use mostly... depends on the git version... switch is also not in my git 2.17.0.window.1... maybe you can upgrade your git installation...

You did seem to miss the final steps - checking out next, rebuild, and ./tidy foo.html should fail - but since you started the issue using tidy-5.6.0, I guess that is ok...

Of course you now have the latest tidy 5.7.45.I681 available... lucky you ;=))

I do not know about the mac, but you may only need $ sudo make install, to make it the active tidy in your system, but unsure about the mac, since do not have one... maybe others could help with that...

And maybe you could now offer further help with other PR/patch testing, and reporting... thanks...

Hopefully macports can offer a later tidy version, also... maybe file a report...

But we now have positive reports from the 3 major OS'es, and no negatives, so will get to merging #926 soonest... thanks...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants