99 changes: 93 additions & 6 deletions README.md
@@ -1,9 +1,96 @@
# HTML Tidy with HTML5 support
# HTACG HTML Tidy

All READMEs and related materials can be found in [README/][1].
All other READMEs and related materials can be found in [README/][100]. Although all of our materials should be linked in this README, be sure to check this directory for documents we’ve not yet added to this document.

For build instructions please see [README/README.md][2].
## Building HTML Tidy

[1]: https://github.com/htacg/tidy-html5/tree/master/README
[2]: https://github.com/htacg/tidy-html5/blob/master/README/README.md

- For build instructions please see [README/BUILD.md][115].

## Branches and Versions

Learn about which branches are available, which branch you should use, and how HTML Tidy’s versioning scheme works.

- Learn about version numbering in [README/VERSION.md][160].
- Learn about our repository branches in [README/BRANCHES.md][110].

## Contributing and Development Guides

We gladly accept PRs! Read about some of our contribution guidelines, and check out some of the additional explanatory documents that will aid your understanding of how to accomplish certain things in HTML Tidy.

### General Contribution Guidelines

These are some general guidelines that will help you help us when it comes to making your own contributions to HTML Tidy.

- Learn about our contributing guidelines in [README/CONTRIBUTING.md][125].
- Understand HTML Tidy’s source code style in [README/CODESTYLE.md][120].

### Adding Features Guides

When you’re ready to add a great new feature, these write-ups may be useful.

- Learn how to add new element attributes to HTML Tidy by reading [README/ATTRIBUTES.md][105].
- Discover how to add new tags to Tidy in [README/TAGS.md][130].
- If you want to add new messages to Tidy, read [README/MESSAGE.md][150].
- Configuration options can be added according to [README/OPTIONS.md][155].

### Language Localization Guides

Tidy supports localization, and welcomes translations into various languages. Please read up on how to localize HTML Tidy.

- The general README for localizing can be found in [/README/LOCALIZE.md][140].
- And [/localize/README.md][145] contains specific instructions for localizing.


## Other Important Links

- site: [http://www.html-tidy.org/][4]
- source: [https://github.com/htacg/tidy-html5][5]
- binaries: [http://binaries.html-tidy.org][6]
- bugs: [https://github.com/htacg/tidy-html5/issues][7]
- list: [https://lists.w3.org/Archives/Public/html-tidy/][8]
- api and quickref: [http://api.html-tidy.org/][9]

[4]: http://www.html-tidy.org/
[5]: https://github.com/htacg/tidy-html5
[6]: http://binaries.html-tidy.org
[7]: https://github.com/htacg/tidy-html5/issues
[8]: https://lists.w3.org/Archives/Public/html-tidy/
[9]: http://api.html-tidy.org/


## History

This repository should be considered canonical for HTML Tidy as of 2015-January-15.

- This repository was originally transferred from [w3c.github.com/tidy-html5][20], now redirected to the current site.

- It was first moved to GitHub from [tidy.sourceforge.net][21]. Note that this site is kept only for historic reasons, and is not now well maintained.

**Tidy is the granddaddy of HTML tools, with support for modern standards.** Have fun...

[20]: http://w3c.github.com/tidy-html5/
[21]: http://tidy.sourceforge.net


## License

HTML Tidy and LibTidy are free and open source software with a permissive license.

- You can read the complete license in [README/LICENSE.md][135].



[100]: README/
[105]: README/ATTRIBUTES.md
[110]: README/BRANCHES.md
[115]: README/BUILD.md
[120]: README/CODESTYLE.md
[125]: README/CONTRIBUTING.md
[130]: README/TAGS.md
[135]: README/LICENSE.md
[140]: /README/LOCALIZE.md
[145]: /localize/README.md
[150]: README/MESSAGE.md
[155]: README/OPTIONS.md
[160]: README/VERSION.md

19 changes: 12 additions & 7 deletions README/ATTRIBUTES.md
@@ -1,21 +1,26 @@
# Tidy Element Attributes

This is about adding a **new** `attribute=value` for one or more html `element`, here called `tags`.
This is about adding a **new** HTML attribute to one or more HTML tags, i.e., a new attribute such as `attribute=value`.

Tidy supports a large number of `attributes`, first defined in `tidyenum.h`, to give it a value, then defined in `attrs.c` to give it a unique **string** name, and a `function` to verify the atrribute **value**. Then in `attrdict.c` the attribute is defined, giving what version(s) of html support this attribute. Finally, what tags support this attrinute, is done in `tags.c`, where each attribute is allowed on that tag, or not, in the `tag_defs[]` table.
Tidy’s large number of attributes are supported via a number of files:

- `tidyenum.h` is where you first define a new attribute in order to give it an internal value.
- `attrs.c` is where you give a unique **string** name to the attribute, as well as a **function** to verify the **value**.
- `attrdict.c` further refines the definition of your attribute, specifying which version(s) of HTML support this attribute.
- `tags.c`, finally, determines which tags support the attribute, in the `tag_defs[]` table.

So, adding a new `attribute=value` to one or more existing tags consists of the following simple steps:

1. tidyenum.h - Give the attribute an internal name, like `TidyAttr_XXXX`, and thus a value. While there were some initial steps to keep this `TidyAttrId` enumeration alphabetic, now just add the new `TidyAttr_XXXX` just before the last entry 'N_TIDY_ATTRIBS'.
1. `tidyenum.h` - Give the attribute an internal name, like `TidyAttr_XXXX`, and thus a value. While there were some initial steps to keep this `TidyAttrId` enumeration alphabetic, now just add the new `TidyAttr_XXXX` just before the last entry `N_TIDY_ATTRIBS`.

2. attrs.c - Assign the string value of the attribute. Of course this must be unique. And then assign a `function` to verify the attribute value. There are already a considerable number of defined functions to verify specific attribute values, but maybe this new attribute requires a new function, so that should be written, and defined.
2. `attrs.c` - Assign the string value of the attribute. Of course this must be unique. And then assign a `function` to verify the attribute value. There are already a considerable number of defined functions to verify specific attribute values, but maybe this new attribute requires a new function, so that should be written, and defined.

3. attrdict.c - If this attribute only relates to specific `tags`, then it should be added to their list. There are some `general` attributes that are allowed on every, or most tags, so this new attribute and value should be added accordingly.
3. `attrdict.c` - If this attribute only relates to specific tags, then it should be added to their list. There are some general attributes that are allowed on every, or most tags, so this new attribute and value should be added accordingly.

4. tags.c - Now the new attribute will be verified for each tag it is associate with in the `tag_defs[]` table. Like for example the `<button ...>`, `{ TidyTag_BUTTON, ...` has `&TY_(W3CAttrsFor_BUTTON)[0]` assigned.
4. `tags.c` - Now the new attribute will be verified for each tag it is associated with in the `tag_defs[]` table. For example, the `<button ...>` entry, `{ TidyTag_BUTTON, ...`, has `&TY_(W3CAttrsFor_BUTTON)[0]` assigned.

So, normally, just changing 3 files, `tidyenum.h`, `attrs.c`, and `attrdict.c`, will already adjust `tags.c` to accept a new `attribute=value` for any tag, or all tags. Simple...
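
As a quick sanity check after following the steps above, a small shell helper can confirm that the new attribute identifier appears in each file you edited. This is only a sketch: the function name `check_attr` and the file paths in the usage note are assumptions for illustration, not part of Tidy's own tooling.

```shell
# Report which of the given files mention an attribute identifier.
# Usage: check_attr TidyAttr_XXXX file...
# check_attr is a hypothetical helper name, not something Tidy ships.
check_attr() {
    attr="$1"
    shift
    missing=0
    for f in "$@"; do
        # -F treats the identifier as a fixed string, -q suppresses output.
        if grep -qF -- "$attr" "$f" 2>/dev/null; then
            echo "found: $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    # Non-zero status if any file lacks the identifier.
    return "$missing"
}
```

For example, `check_attr TidyAttr_XXXX include/tidyenum.h src/attrs.c src/attrdict.c` would list each file as found or missing, assuming those are the locations of the files in your checkout.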

Now, one could argue that this is not the **best** way to verify every attribute and value, for every tag, but that is a mute point - that is how tidy does it!
Now, one could argue that this is not the **best** way to verify every attribute and value, for every tag, but that is a moot point - that is how Tidy does it!

; eof 20170205
28 changes: 28 additions & 0 deletions README/BRANCHES.md
@@ -0,0 +1,28 @@
# HTML Tidy Branches

## About Branches

Starting with **HTML Tidy** 5.4.0, HTACG will adopt a new branch management strategy utilizing **master** as the _release branch_, and **next** as the active development branch.

As described thoroughly in our [VERSION.md](VERSION.md) document, this means that **master** will always consist of an even-numbered minor version, and activity will remain relatively quiet unless we backport a critical bug fix from **next**.

The **next** branch, then, will host the majority of our development activity, and any contributions and PRs should be made against this branch. This means that **next** will always consist of an odd minor version number.
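
The even/odd rule can be expressed as a short shell function. This is a minimal sketch, assuming a plain `major.minor.patch` version string from 5.4.0 onward; the function name `branch_for_version` is hypothetical and used only for illustration.

```shell
# Map a "major.minor.patch" version string to the branch it belongs to,
# using the rule above: even minor -> master, odd minor -> next.
# branch_for_version is a hypothetical name for this example.
branch_for_version() {
    # Take the second dot-separated field as the minor version.
    minor=$(echo "$1" | cut -d. -f2)
    if [ $((minor % 2)) -eq 0 ]; then
        echo master
    else
        echo next
    fi
}
```

Calling `branch_for_version` with the output of your installed Tidy's version number tells you which branch that build came from, under the assumptions stated above.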


## About Versioning

You can read the specifics about version numbers in our [VERSION.md](VERSION.md) document.


## FAQs

### Which version or branch should I choose?

As described above, the branch is very strongly correlated with the version. If you require a stable API and relatively stable output and don’t require the features and enhancements of an odd-numbered **next** version, then you should stick to **master**, even-numbered versions.

On the other hand if you are primarily a console application user, then the API isn’t likely as important to you, and you probably want the latest and greatest. If this describes you, you probably want to at least try out **next**.

If you are developing for Tidy, then you _definitely_ want to stick to **next**, even for bug fixes meant for **master**. If it’s a critical enough bug fix, then one of our friendly team will back-port the fix to **master**.



66 changes: 66 additions & 0 deletions README/BUILD.md
@@ -0,0 +1,66 @@
# HTACG HTML Tidy

## Prerequisites

1. git - [http://git-scm.com/book/en/v2/Getting-Started-Installing-Git][1]

2. cmake - [http://www.cmake.org/download/][2]

3. appropriate build tools for the platform

4. the [xsltproc][3] tool is required to build and install the `tidy.1` man page on Unix-like platforms.

CMake comes in two forms - command line and GUI. Some installations only install one or the other, but sometimes both. The build commands below are only for command line use.

Also, the actual build tools vary for each platform. But that is one of the great features of CMake: it can generate various 'native' build files. Running `cmake --help` should list the generators available on that platform. One of the common ones is "Unix Makefiles", which needs `make` installed, but many other generators are supported.

In Windows CMake offers various versions for MSVC. Again below only the command line use of MSVC is shown, but the tidy solution (*.sln) file can be loaded into the MSVC IDE, and the building done in there.


## Build the tidy library and command line tool

1. `cd build/cmake`

2. `cmake ../.. -DCMAKE_BUILD_TYPE=Release [-DCMAKE_INSTALL_PREFIX=/path/for/install]`

3. Windows: `cmake --build . --config Release`
Unix/OS X: `make`

4. Install, if desired:
Windows: `cmake --build . --config Release --target INSTALL`
Unix/OS X: `[sudo] make install`

By default cmake sets the install path to `/usr/local/bin` in Unix. If you wanted the binary in say `/usr/bin` instead, then in 2. above use `-DCMAKE_INSTALL_PREFIX=/usr`.

Also, in Unix if you want to build the release library without any debug `assert` in the code then add `-DCMAKE_BUILD_TYPE=Release` in step 2. This adds a `-DNDEBUG` macro to the compile switches. This is normally added in Windows builds for the `Release` config.

In Windows the default install is to `C:\Program Files\tidy`, or `C:\Program Files (x86)\tidy`, which is not very useful. After the build, `tidy.exe` is in the `Release` directory, and can be copied to any directory in your `PATH` environment variable for global use.

If you do **not** need the tidy library built as a 'shared' (DLL) library, then in step 2 add the option `-DBUILD_SHARED_LIB:BOOL=OFF`. This option is **ON** by default. The static library is always built and linked with the command line tool, for convenience in Windows, and so that the binary can be run as part of the man page build on Unix without the shared library being installed.

See the `CMakeLists.txt` file for other CMake **options** offered.
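
To show how the optional settings above combine with step 2, here is a small sketch that assembles the configure command line. The function name `tidy_configure_cmd` and its two positional arguments are assumptions made for this example; only the `cmake` options themselves come from the instructions above.

```shell
# Print the step 2 configure command with optional settings applied.
# $1: install prefix (e.g. /usr); empty keeps CMake's default.
# $2: ON or OFF for the shared library; empty keeps the default (ON).
# tidy_configure_cmd is a hypothetical helper name for illustration.
tidy_configure_cmd() {
    prefix="$1"
    shared="$2"
    cmd="cmake ../.. -DCMAKE_BUILD_TYPE=Release"
    if [ -n "$prefix" ]; then
        cmd="$cmd -DCMAKE_INSTALL_PREFIX=$prefix"
    fi
    if [ -n "$shared" ]; then
        cmd="$cmd -DBUILD_SHARED_LIB:BOOL=$shared"
    fi
    echo "$cmd"
}
```

For example, `tidy_configure_cmd /usr OFF` prints the command line for a build without the shared library, installed under `/usr`. Run the printed command from `build/cmake`, then continue with step 3.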

## Build PHP with the tidy-html5 library

Due to API changes in the PHP source, `buffio.h` needs to be renamed to `tidybuffio.h` in the file `ext/tidy/tidy.c` in PHP's source.

That is - prior to configuring PHP run this in the PHP source directory:
```
# Rename the include only where it still has the old name, so re-running is safe.
sed -i 's/\([<"]\)buffio\.h/\1tidybuffio.h/' ext/tidy/*.c
```

And then continue with (just an example here, use your own PHP config options):

```
./configure --with-tidy=/usr/local
make
make test
make install
```

[1]: http://git-scm.com/book/en/v2/Getting-Started-Installing-Git
[2]: http://www.cmake.org/download/
[3]: http://xmlsoft.org/XSLT/xsltproc2.html


; eof
12 changes: 6 additions & 6 deletions README/CODESTYLE.md
@@ -1,18 +1,18 @@
# HTML Tidy Code Style

The source code of **libTidy**, and console app **tidy**, follow the preferences of the original maintainers. Perhaps some of these decisions were arbitrary and based on their sense of aesthetics at the time, but it is good to have all the code looking the same even if it is not exactly what everyone would prefer.
The source code of **libTidy** and console app **tidy** mostly follow the preferences of the original maintainers. Perhaps some of these decisions were arbitrary and based on their sense of aesthetics at the time, but it is good to have all the code looking the same even if it is not exactly what everyone would prefer.

Developers adding code to **Tidy!** are urged to try to follow the existing code style. Code that does not follow these conventions may be accepted, but may be modified as time goes by to best fit the `Tidy Style`.
Developers adding code to HTML Tidy are urged to try to follow the existing code style. Code that does not follow these conventions may be accepted, but may be modified as time goes by to best fit the “Tidy Style.”

There has been a suggestion of using available utilities to make the style consistent, like [Uncrusty](https://github/bengardener/uncrusty) - see [issue #245](https://github.com/htacg/tidy-html5/issues/245), and maybe others...
There has been a suggestion of using available utilities to make the style consistent, like [Uncrustify](https://github.com/uncrustify/uncrustify) - see [issue #245](https://github.com/htacg/tidy-html5/issues/245), and maybe others.

Others have suggested the [AStyle](http://astyle.sourceforge.net/) formatting program with say '-taOHUKk3 -M8' arguments, to conform, but there are a few bugs in AStyle.
Others have suggested the [AStyle](http://astyle.sourceforge.net/) formatting program with say `-taOHUKk3 -M8` arguments, to conform, but there are a few bugs in AStyle.

But again these, and other tools, may not produce code that everybody agrees with... and are presently not formally used in Tidy!
But again, these and other tools may not produce code that everybody agrees with, and are presently not formally used in Tidy!

#### Known Conventions

From reading of the Tidy source, some things are self evident... in no particular order...
From a reading of the Tidy source, some things are self-evident, in no particular order...

- Use of 4-space indenting, and no tabs.
- No C++ single line comments using `//`.