TidyHtml not working properly in C++ #707

Closed
bandito40 opened this issue Apr 2, 2018 · 10 comments

Comments

@bandito40

bandito40 commented Apr 2, 2018

I am using TidyHtml to clean up HTML files in my application in a Linux environment. I do realize that libtidy is a C library, so I wrapped my function in the extern "C"{} syntax as shown in my code. This is the example from the TidyHtml website, adapted by me for use in C++, with printf() calls added inside the conditions.

There are two problems. First, tidyOptSetBool() fails, so the printf("%s\n", "1"); line is never reached. Second, the lines

tidyBufFree( &output ); tidyBufFree( &errbuf );

cause a segmentation fault (core dumped).

This is the full code example:

`
#include <stdio.h>
#include <tidy.h>
#include <tidybuffio.h>
#include <stdio.h>
#include <errno.h>
using namespace std;

extern "C" {
int tidyHtml(){
    const char* input = "<title>Foo</title><p>Foo!";
    TidyBuffer output = {0};
    TidyBuffer errbuf = {0};
    int rc = -1;
    Bool ok;

    TidyDoc tdoc = tidyCreate();                     // Initialize "document"
    printf( "Tidying:\t%s\n", input );

    ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes );  // Convert to XHTML
    if ( ok ){
        rc = tidySetErrorBuffer( tdoc, &errbuf );    // Capture diagnostics
        printf("%s\n", "1");
    }

    if ( rc >= 0 ){
        rc = tidyParseString( tdoc, input );         // Parse the input
        printf("%s\n", "2");
    }
    if ( rc >= 0 ){
        rc = tidyCleanAndRepair( tdoc );             // Tidy it up!
        printf("%s\n", "3");
    }
    if ( rc >= 0 ){
        rc = tidyRunDiagnostics( tdoc );             // Kvetch
        printf("%s\n", "4");
    }
    if ( rc > 1 ){                                   // If error, force output.
        rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
        printf("%s\n", "5");
    }
    if ( rc >= 0 ){
        rc = tidySaveBuffer( tdoc, &output );        // Pretty Print
        printf("%s\n", "6");
    }

    if ( rc >= 0 )
    {
        if ( rc > 0 )
            printf( "\nDiagnostics:\n\n%s", errbuf.bp );
        printf( "\nAnd here is the result:\n\n%s", output.bp );
    }
    else
        printf( "A severe error (%d) occurred.\n", rc );

    tidyBufFree( &output );
    tidyBufFree( &errbuf );
    tidyRelease( tdoc );
    return rc;
}

}

int main(int argc, char *argv[]){
tidyHtml();
}`

Here is my compile command, g++ -o main main.cpp -ltidy, which reports 0 errors.

and this is output when the application is run:

Tidying:	<title>Foo</title><p>Foo!
A severe error (-1) occurred.
Segmentation fault (core dumped)

@geoffmcl
Contributor

geoffmcl commented Apr 2, 2018

@bandito40 thank you for your issue...

However, I just compiled and ran your sample and have no problems...

And of course I should not have problems because it is basically exactly per our sample code...

Full output:

F:\Projects\tidy-test\build.x64>release\issue-707.exe
Tidying:        <title>Foo</title><p>Foo!
1
2
3
4
6

Diagnostics:

line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 19 - Warning: inserting implicit <body>
Info: Document content looks like XHTML5
Tidy found 2 warnings and 0 errors!


And here is the result:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Windows version 5.7.3" />
<title>Foo</title>
</head>
<body>
<p>Foo!</p>
</body>
</html>

Now I did this build in Windows, and I linked with the static lib tidys.lib, but this should make no difference... unless your strange use of extern "C" { .... } declaration fooled, messed up, g++, or something...

That declaration around int tidyHtml() is just not needed. libTidy can be used inside a C++ program without any changes... Both tidy.h, and tidybuffio.h already have extern "C" { .... } declarations...
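As a rough illustration of why the extra wrapper is unnecessary: C headers meant to be included from C++ usually carry a guard like the one sketched below, so the C++ caller can be written as ordinary C++. The function name demo_c_add and the "header" layout are hypothetical stand-ins and are not copied from tidy.h.

```cpp
// Sketch of the linkage guard that C library headers commonly carry.
// Everything named demo_* is hypothetical and only stands in for a real C API.

// ---- what would normally live in a C header, e.g. "demo.h" ----
#ifdef __cplusplus
extern "C" {
#endif

int demo_c_add(int a, int b);   // gets C linkage whether included from C or C++

#ifdef __cplusplus
}
#endif

// ---- what would normally live inside the C library itself ----
extern "C" int demo_c_add(int a, int b) {
    return a + b;
}

// ---- ordinary C++ code: no extern "C" block is needed around the caller ----
int callFromCpp() {
    return demo_c_add(2, 3);
}
```

The caller keeps normal C++ linkage; only the library's own declarations need C linkage, and the header already takes care of that.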

But I just copied my issue-707.cxx sample - note that file has window line endings - into Ubuntu 16.04 64-bit linux, and ran $ dos2unix issue-707.cxx; g++ -o issue-707 issue-707.cxx -ltidy, and thus built issue-707 app, with no errors, and it runs fine, with exactly the same output as the above in Windows... the 1 is output, and no segfault...

And just to be sure I removed your extern "C" { .... } around tidyHtml(), and it still compiled, and ran fine...

So at this point I can not duplicate your problem... sorry...

BTW what version of libtidy are you using? Where did you get and install it from? Do you have more than one install of libtidy, and its headers? Just trying to search for what can be wrong... that sample code works for me... in 2 OS'es...

Maybe give more feedback, or something... thanks...

@geoffmcl geoffmcl added this to the 5.7 milestone Apr 2, 2018
@bandito40
Author

bandito40 commented Apr 2, 2018

I did originally download it, but I am not 100% sure from where. I did the cmake/install/make install in the same folder as my code. For some reason I couldn't get it to work, so I deleted the folder and installed it using sudo apt-get install libtidy-dev (I am doing this on Ubuntu). I think the reason I couldn't get it to work was that I was including buffio.h instead of tidybuffio.h. I found this out after I installed libtidy-dev.

Anyhow, I just purged libtidy-dev. The code still compiled and ran with the same results as before. After issuing sudo updatedb and locate tidy I got this:

/usr/lib/libtidy-0.99.so.0
/usr/lib/libtidy-0.99.so.0.0.0
/usr/lib/libtidy.so
/usr/local/lib/libtidy.so
/usr/local/lib/libtidy.so.5
/usr/local/lib/libtidy.so.5.6.0
/usr/local/lib/libtidys.a
/usr/share/doc/libtidy-0.99-0
/usr/share/doc/libtidy-0.99-0/changelog.Debian.gz
/usr/share/doc/libtidy-0.99-0/copyright
/var/cache/apt/archives/libtidy-dev_20091223cvs-1.5_amd64.deb
/var/lib/dpkg/info/libtidy-0.99-0.list
/var/lib/dpkg/info/libtidy-0.99-0.md5sums
/var/lib/dpkg/info/libtidy-0.99-0.postinst
/var/lib/dpkg/info/libtidy-0.99-0.postrm
/var/lib/dpkg/info/libtidy-0.99-0.shlibs

So I guess libtidy is still installed. Not sure how to remove it though.

@geoffmcl
Contributor

geoffmcl commented Apr 3, 2018

@bandito40 sorry to hear you still have a problem, and from what you showed, it seems this may be due to confusion over multiple versions installed...

And regrettably this is also a big problem in the Ubuntu distribution. It seems they are way, WAY out of date, offering only a 2009 version called libtidy-0.99-0, which would include an out-of-date buffio.h... and this should be removed!

Now I am not a linux expert, and hope others will help in this... In just googling around I found -

$ dpkg --list | grep tidy
$ sudo apt-get remove package_name
$ sudo apt-get purge package_name
$ sudo apt-get autoremove
$ sudo apt-get clean

Or you can use the Synaptic Package Manager - search libtidy, uncheck, and apply...

Then do the $ dpkg --list again to make sure it has been removed... But as stated, hope others will step in and help here if there is a better way...

But your locate tidy listing also indicates you have installed libtidy.so.5.6.0 at some time not using the package manager, so the above will not remove it. That must be done manually I think...

And I think I would also remove all tidy headers. The list would be tidy.h, tidyplatform.h, tidyenum.h, tidybuffio.h. And that would include any old names platform.h and buffio.h. And that would be from directories like /usr/include and /usr/local/include... And remove any /usr/include/tidy or /usr/local/include/tidy should they exist.

You must get to a position where your sample code will not compile... it should fail on missing headers, and a missing library...

Then get the current source -

$ cd some-project-root-dir
$ git clone https://github.com/htacg/tidy-html5.git tidy-html5
$ cd tidy-html5/build/cmake # do not build in the source
$ cmake ../.. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr # do not forget these two options
$ make
$ sudo make install

Note the addition of the cmake options is important so you do not get the cmake default for these...

Alternatively, you can download a ZIP file of the source, and unzip it into a folder of your choice, then do the above out-of-source build... But a ZIP is limited to one specific version at a time, while with git all versions are available, and you can even update later to keep with the latest, the best...

For sure with mixed up headers, and/or mixed up libraries, while the code can appear to compile, it can also crash!
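One way such a mix-up can bite, sketched under loud assumptions: if option identifiers are plain enum values, a header from one version and a library from another can disagree about what a given number means. The code below is a hypothetical model with invented names (old_lib, new_header, OptIndent and so on); it is not libtidy's real option table and not a confirmed diagnosis of this issue.

```cpp
// Hypothetical model of a header/library version mismatch.
// None of these names or numbers come from libtidy.

// "Old" library: compiled when the option enum had this numbering.
namespace old_lib {
enum OptionId { OptIndent = 0, OptWrap = 1 };

// Takes a raw integer, which is all that survives across a C library boundary.
bool setBool(int optionId, bool /*value*/) {
    // In this imaginary old version only OptIndent (0) is a boolean option.
    return optionId == OptIndent;
}
}  // namespace old_lib

// "New" header: an option was inserted at the front, shifting the numbers.
namespace new_header {
enum OptionId { OptEncoding = 0, OptIndent = 1, OptWrap = 2 };
}  // namespace new_header

// Caller compiled against the new header but linked to the old library:
// it passes 1, which the old library does not treat as a boolean option.
bool mismatchedCall() {
    return old_lib::setBool(new_header::OptIndent, true);
}

// Caller whose header matches the library it links against.
bool matchedCall() {
    return old_lib::setBool(old_lib::OptIndent, true);
}
```

Both callers compile and link cleanly; only the mismatched one gets a failure at run time, which is why a clean build says little when several versions are installed.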

Look forward to further feedback... thanks...

OT: Note you should use 3 backticks to quote a block of text. A single backtick is only for inline quoting. See mastering-markdown or the specific github help...

@bandito40
Author

Thanks again. I followed your suggestion. Install worked. All the tidy files are now located in the install folder tidy-html5 and nowhere else. It compiles fine, but I get this error when executing: error while loading shared libraries: libtidy-0.99.so.0: cannot open shared object file: No such file or directory. It looks to be Linux-specific, but I may be wrong. Googling as we speak.

P.S. thanks for pointing out the mastering-markdown page. I was looking on how to format my post correctly.

@geoffmcl
Contributor

geoffmcl commented Apr 6, 2018

@bandito40 not sure you followed all my suggestions...

Remember I suggested you delete all installed tidy, all... that is libs, links, and headers... when done your source compile should fail... That execution error indicates the compiler/linker found some residual lib pieces... and maybe some older tidy headers...

Specifically neither $ locate tidy, nor $ dpkg --list must have any entries for libtidy-0.99, especially in the /usr/ folders... and especially not libtidy-0.99.so, libtidy-0.99.so.0, nor libtidy-0.99.so.0.0.0...

You must get to an absolutely clean no tidy state, specifically in the /usr/ folders...

Now to get back a working tidy, you must compile source tidy-html5, and indicate that the install prefix is to be /usr, not some other place...

I do not understand that it is in an install folder! What install folder??? It should only be installed to /usr, nowhere else...

And because this is a root folder you have to be either operating as root, or use sudo make install otherwise...

You should end up with just -

/usr/lib/libtidy.so # this should be a link to
/usr/lib/libtidy.so.5 # this should be a link to
/usr/lib/libtidy.so.5.7.3 # this is the actual lib file

Of course, before the install those 3 entries will also be in the folder where you built tidy, like <some-path>/tidy-html5/build/cmake/, if you built it as suggested...

And likewise after install the 4 tidy headers should be in /usr/include/... only 4, no others... exactly tidy.h, tidyenum.h, tidyplatform.h, tidybuffio.h... they should come from <some-path>/tidy-html5/include/... check their date, size...

AND after install you should be able to type $ tidy -v anywhere in your system, because there should be an installed /usr/bin/tidy executable, and it should report the same version as the above library...

AND type $ man tidy anywhere, and see the same version number in the top line... from the installed /usr/share/man/man1/tidy.1

So is this clear? Do I need to add anything? Any deviation from the above suggests something wrong, and could potentially lead to problems...

So I ask again "what install folder" are you referring to?

Now with tidy installed, as above, back to your sample source... now it should compile and run fine...

Look forward to your report of success! Thanks...

@bandito40
Author

bandito40 commented Apr 7, 2018

By install folder I meant /usr. Before my last post I had deleted every trace of any file that has the word tidy in its name. I uninstalled tidy with sudo apt purge libtidy* followed by sudo apt autoremove and sudo apt autoclean. I downloaded and installed libtidy following the steps you suggested, which compiled fine. The error does not happen while compiling, only when I execute the program, and I get the error: error while loading shared libraries: libtidy-0.99.so.0: cannot open shared object file: No such file or directory.

This command 'locate tidy' shows the following installed files after installation.

/usr/include/tidy.h
/usr/include/tidybuffio.h
/usr/include/tidyenum.h
/usr/include/tidyplatform.h
/usr/lib/libtidy.so
/usr/lib/libtidy.so.5
/usr/lib/libtidy.so.5.7.3
/usr/lib/libtidys.a
/usr/lib/pkgconfig/tidy.pc
/usr/local/lib/pkgconfig/tidy.pc
/usr/local/share/man/man1/tidy.1
/usr/share/man/man1/tidy.1

And if I use dpkg --list|grep tidy it returns nothing.

So it appears to me that some other file is asking for libtidy-0.99.so.0. I even tried a recursive text search through every file in /usr to find whatever refers to libtidy-0.99.so.0, but that returned nothing.

@bandito40
Author

I fixed it. Well, it's more of a bandage approach. I created the following symbolic link: sudo ln -s /usr/lib/libtidy.so /usr/lib/libtidy-0.99.so.0. Now when I execute my code it gives the following results:

1
2
3
4
6

Diagnostics:

line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 19 - Warning: inserting implicit <body>
Info: Document content looks like XHTML5
Tidy found 2 warnings and 0 errors!


And here is the result:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.7.3" />
<title>Foo</title>
</head>
<body>
<p>Foo!</p>
</body>
</html>

@geoffmcl
Contributor

geoffmcl commented Apr 7, 2018

@bandito40 well glad you got some form of success from creating a link but I would like to understand more so if you could bear with me a little longer...

What does it show when you do $ readelf -d your-exe, which I called issue-707... I get -

$ readelf -d issue-707

Dynamic section at offset 0x1e18 contains 25 entries:
  Tag        Type                Name/Value
 0x0000000000000001 (NEEDED)    Shared library: [libtidy.so.5]
 0x0000000000000001 (NEEDED)    Shared library: [libc.so.6]
  skip the rest...

I guess you are going to see [libtidy-0.99.so.0], which is why you then need the link libtidy-0.99.so.0 => libtidy.so. If I am wrong then forget it... on the wrong track...

If I am right the question becomes WHY???

Is there some cache that needs to be corrected? And that led me to ldconfig...

What do you get from $ ldconfig -p | grep tidy? I get only my current links libtidy.so and libtidy.so.5 in /usr/lib/. What do you get?
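Alongside readelf and ldconfig, a program can also report which shared objects the dynamic loader actually mapped into it. The sketch below uses glibc's dl_iterate_phdr from <link.h> for that; the helper names collectObject and loadedObjectsMatching are made up for this example, and it assumes a dynamically linked Linux build.

```cpp
#include <link.h>      // dl_iterate_phdr, struct dl_phdr_info (Linux)
#include <cstddef>
#include <string>
#include <vector>

// Callback run once per loaded object; stores its path if it has one.
static int collectObject(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0') {
        names->push_back(info->dlpi_name);
    }
    return 0;  // returning 0 tells dl_iterate_phdr to keep going
}

// Returns the paths of currently loaded shared objects that contain `needle`.
std::vector<std::string> loadedObjectsMatching(const std::string& needle) {
    std::vector<std::string> all;
    dl_iterate_phdr(collectObject, &all);

    std::vector<std::string> hits;
    for (const auto& name : all) {
        if (name.find(needle) != std::string::npos) {
            hits.push_back(name);
        }
    }
    return hits;
}
```

Called as loadedObjectsMatching("tidy") from a program linked with -ltidy, this would list the path the loader really resolved, which can then be compared with the NEEDED entry and the ldconfig cache.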

And if you see libtidy-0.99.so.0 how to fix, remove that...

Your locate tidy looks great, so why are you having this problem?

I really seek to understand so I can help better should someone else run into this problem, especially while some distros continue to have that ugly old libtidy-0.99.so... I downright do not like your link bandaid approach...

While that has worked, and good job finding it, it is not really a solution...

Can someone with more linux experience weigh in here? We need help! Thanks...

@bandito40
Author

bandito40 commented Apr 7, 2018

The results from readelf -d main|grep tidy are:

0x0000000000000001 (NEEDED) Shared library: [libtidy.so.5]

And the results of ldconfig -p | grep tidy are:

	libtidy.so.5 (libc6,x86-64) => /usr/lib/libtidy.so.5
	libtidy.so (libc6,x86-64) => /usr/lib/libtidy.so

I also rebooted my computer and tried the commands again in case something was cached. Both commands gave the same results. However, after rebooting and running the two commands you suggested, I deleted the symbolic link, and my code compiles and executes fine, so something somewhere was cached.

@geoffmcl
Contributor

geoffmcl commented Apr 7, 2018

@bandito40 AH! HA! rebooting clears some cache...

You know, I should have thought of this! As a developer working on a lot of projects, I have certainly run into this secret cache in linux before...

I have had cases where my just compiled project fails, missing something... I immediately see the problem and fix it, re-compile, but have seen the project continue to fail. Sometimes it was only necessary to close the terminal session I was in, and start a new one, but other times I needed to reboot the machine... very strange...

I am now happy your main works using the installed libtidy, without that very iffy symbolic link bandaid ;=)) Now if we can just get to the real explanation...

Maybe someone with unix/linux experience can explain this to us... or at least provide some pointer where to look, read...

But thanks for your feedback...
