# Quick Start
These steps show how to use `lexbor` in your code. They assume you are using Linux and `gcc`.
1. [Install](#installation) the `lexbor` library on your system.
2. Let's parse some sample HTML markup.
Save the following code as `myhtml.c`:
```c
#include <stdio.h>
#include <stdlib.h>

#include <lexbor/html/parser.h>
#include <lexbor/dom/interfaces/element.h>

int
main(int argc, const char *argv[])
{
    lxb_status_t status;
    const lxb_char_t *tag_name;
    lxb_html_document_t *document;

    static const lxb_char_t html[] = "<div>Works fine!</div>";
    size_t html_len = sizeof(html) - 1;

    /* Create an empty HTML document object. */
    document = lxb_html_document_create();
    if (document == NULL) {
        exit(EXIT_FAILURE);
    }

    /* Parse the markup into a document tree. */
    status = lxb_html_document_parse(document, html, html_len);
    if (status != LXB_STATUS_OK) {
        lxb_html_document_destroy(document);
        exit(EXIT_FAILURE);
    }

    /* Get the qualified name of the document's body element. */
    tag_name = lxb_dom_element_qualified_name(lxb_dom_interface_element(document->body),
                                              NULL);

    printf("Element tag name: %s\n", (const char *) tag_name);

    lxb_html_document_destroy(document);

    return EXIT_SUCCESS;
}
```
3. Compile `myhtml.c` and run the resulting executable:
```sh
gcc myhtml.c -llexbor -o myhtml
./myhtml
```
## Installation
To install `lexbor` from binary packages, refer to the [Download](download.md) section.
## Source Code
The source code is available on [GitHub](https://github.com/lexbor/lexbor).
To build and install `lexbor` from source, use [CMake](https://cmake.org/), an open-source, cross-platform build system.
### Linux, *BSD, macOS
At the project root:
```sh
cmake .
make
sudo make install
```
Optional flags recognized by the `cmake` command:
| Flag | Default | Description |
|-------------------------------|:-------:|----------------------------------------------------------------------------------------------------------|
| LEXBOR_OPTIMIZATION_LEVEL | -O2 | Optimization level for building. |
| LEXBOR_C_FLAGS | | Default `C` compilation flags. See the `port.cmake` files in the [ports](https://github.com/lexbor/lexbor/tree/master/source/lexbor/ports) directory. |
| LEXBOR_CXX_FLAGS | | Default `C++` compilation flags. |
| LEXBOR_WITHOUT_THREADS | ON | Reserved for future use. |
| LEXBOR_BUILD_SHARED | ON | Create a shared library. |
| LEXBOR_BUILD_STATIC | ON | Create a static library. |
| LEXBOR_BUILD_SEPARATELY | OFF | Build all modules separately. Each module will have its own library (shared and static). |
| LEXBOR_BUILD_EXAMPLES | OFF | Build example programs. |
| LEXBOR_BUILD_TESTS | OFF | Build tests. |
| LEXBOR_BUILD_TESTS_CPP | ON | Build C++ tests to verify library operation in C++. Requires `LEXBOR_BUILD_TESTS`. |
| LEXBOR_TEST_AMALGAMATION | OFF | Build tests for the amalgamation file. Requires `LEXBOR_BUILD_TESTS`. |
| LEXBOR_BUILD_UTILS | OFF | Build project utilities and helpers. |
| LEXBOR_BUILD_WITH_ASAN | OFF | Enable Address Sanitizer if possible. |
| LEXBOR_INSTALL_HEADERS | ON | Install library headers (`.h` files). |
| LEXBOR_PRINT_MODULE_DEPENDENCIES| OFF | Print module dependencies. |
### Windows
Use the [CMake](https://cmake.org/) GUI tool.
For Windows with [MSYS2](https://www.msys2.org/):
```sh
cmake . -G "Unix Makefiles"
make
make install
```
### Command Line Examples
We recommend building the project in a separate directory, because `cmake` generates many temporary files and an out-of-source build is easier to clean up:
```sh
mkdir build
cd build
```
To build a debug version of `lexbor` with Address Sanitizer enabled:
```sh
cmake .. -DCMAKE_C_FLAGS="-fsanitize=address -g" -DLEXBOR_OPTIMIZATION_LEVEL="-O0" -DLEXBOR_BUILD_TESTS=ON -DLEXBOR_BUILD_EXAMPLES=ON
make
make test
```
To build `lexbor` with tests:
```sh
cmake .. -DLEXBOR_BUILD_TESTS=ON
make
make test
sudo make install
```
To set the installation location (`prefix`):
```sh
cmake .. -DCMAKE_INSTALL_PREFIX=/my/path/usr
make
make install
```
To install only the shared library without headers:
```sh
cmake .. -DLEXBOR_BUILD_STATIC=OFF -DLEXBOR_INSTALL_HEADERS=OFF
make
sudo make install
```
## Code Samples
All code samples are available in the `lexbor` repository under the [`/examples/` directory](https://github.com/lexbor/lexbor/tree/master/examples).
To build and run the samples:
```sh
cmake .. -DLEXBOR_BUILD_EXAMPLES=ON
make
./examples/lexbor/html/element_create
./examples/lexbor/html/document_title
```
## General Considerations
We focus on minimal dependencies, custom algorithms, and platform-specific solutions:
- The project is written in pure `C` without external prerequisites. We believe
in a "go hard or go home" approach.
- While we're not reinventing every algorithm known to humankind, we handle
object creation and memory management in our own way. Many classic algorithms
used in `lexbor` are adapted to meet the specific needs of the project.
- We're open to using third-party code, but it’s often simpler to start from
scratch than to add extra dependencies (looking at you, Node.js).
- Some functions are platform-dependent, such as threading, timers, I/O, and
blocking primitives (spinlocks, mutexes). For these, we have a separate `port`
module with its own structure and build rules, distinct from the other
modules.
## Memory Management
There are four main dynamic memory functions:
```c
void *
lexbor_malloc(size_t size);
void *
lexbor_calloc(size_t num, size_t size);
void *
lexbor_realloc(void *dst, size_t size);
void *
lexbor_free(void *dst);
```
These functions:
- Are declared in `/source/lexbor/core/lexbor.h` (in the [core](#core) module).
- Are implemented in `/source/lexbor/ports/*/lexbor/core/memory.c` (in the `port`
module).
- Can be redefined if needed.
As the names suggest, they serve as replacements for the standard `malloc`,
`calloc`, `realloc`, and `free`. However, unlike `free`, the `lexbor_free`
function returns a `void *` that is always `NULL`. This simplifies the process
of nullifying freed variables:
```c
if (object->table != NULL) {
    object->table = lexbor_free(object->table);
}
```
Without this, you'd need to explicitly nullify `object->table`:
```c
if (object->table != NULL) {
    lexbor_free(object->table);
    object->table = NULL;
}
```
We'll discuss other differences later.
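
To illustrate the idiom in isolation, here is a minimal standalone sketch that builds on the standard `free` only. The names `demo_free`, `demo_object_t`, and `demo_object_release` are hypothetical and are not part of `lexbor`; they only mirror the "free returns `NULL`" convention described above.

```c
#include <stdlib.h>

/* Hypothetical wrapper, not part of lexbor: frees `ptr` and returns NULL. */
static void *
demo_free(void *ptr)
{
    free(ptr);
    return NULL;
}

/* Hypothetical object that owns a dynamically allocated table. */
typedef struct {
    int *table;
} demo_object_t;

/* Releases the table and clears the pointer in a single statement. */
static void
demo_object_release(demo_object_t *object)
{
    /* free(NULL) is a no-op, so calling this twice is harmless. */
    object->table = demo_free(object->table);
}
```

Because the pointer is reset in the same statement that frees it, a second call to `demo_object_release` cannot free the same block twice.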
## Status Codes
If a function can fail, it should report the failure. We follow two main rules when working with status codes:
- If the status is `LXB_STATUS_OK` (`0`), everything is fine; otherwise,
**something went wrong**.
- Always return **meaningful** status codes. For example, if memory allocation
fails, return `LXB_STATUS_ERROR_MEMORY_ALLOCATION`, not a generic value like
`0x1f1f`.
Status codes are passed as `lxb_status_t`. This type is used throughout the
codebase and is defined in `/source/lexbor/core/types.h`; all available status
codes are listed in `/source/lexbor/core/base.h`.
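
The two rules can be sketched with a small standalone example. The `demo_status_t` type, the `DEMO_STATUS_*` codes, and `demo_buffer_init` are hypothetical stand-ins, used here so the sketch compiles without `lexbor`; real code would use `lxb_status_t` and the codes from `base.h`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status type and codes that mirror the convention above. */
typedef unsigned int demo_status_t;

enum {
    DEMO_STATUS_OK = 0x0000,
    DEMO_STATUS_ERROR_MEMORY_ALLOCATION,
    DEMO_STATUS_ERROR_OBJECT_IS_NULL
};

/* Hypothetical buffer used only for this illustration. */
typedef struct {
    unsigned char *data;
    size_t        size;
} demo_buffer_t;

/* Each failure path returns a code that names the actual cause. */
static demo_status_t
demo_buffer_init(demo_buffer_t *buf, size_t size)
{
    if (buf == NULL) {
        return DEMO_STATUS_ERROR_OBJECT_IS_NULL;
    }

    buf->data = malloc(size);
    if (buf->data == NULL) {
        buf->size = 0;
        return DEMO_STATUS_ERROR_MEMORY_ALLOCATION;
    }

    buf->size = size;

    return DEMO_STATUS_OK;
}
```

A caller only needs to compare the result with the `OK` value to detect failure, and can inspect the specific code when it wants to report or handle the cause.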
## Function Naming
Most functions follow this naming pattern:
[naming1]: img/naming1.png
![Common Naming Pattern][naming1]
<style>
img[alt="Common Naming Pattern"] { height: 305px; display: block; margin: auto; }
</style>
The exception is the [core](#core) module (`/source/lexbor/core/`), which uses a
different pattern:
[naming2]: img/naming2.png
![Core Naming Pattern][naming2]
<style>
img[alt="Core Naming Pattern"] { height: 305px; display: block; margin: auto; }
</style>
In other words, all `lexbor_*` functions are located in the `core` module,
without exceptions.
## Header Locations
All paths are relative to the `/source/` directory. For example, to include a
header file from the [html](#html) module located in `/source/lexbor/html/`,
use:
```c
#include "lexbor/html/tree.h"
```
## Data Structures
Most structures and objects have an API for creating, initializing, cleaning,
and deleting them. This follows the general pattern:
```c
<structure-name> *
<function-prefix>_create(void);

lxb_status_t
<function-prefix>_init(<structure-name> *obj);

void
<function-prefix>_clean(<structure-name> *obj);

void
<function-prefix>_erase(<structure-name> *obj);

<structure-name> *
<function-prefix>_destroy(<structure-name> *obj, bool self_destroy);
```
- The `*_init` function can accept any number of arguments and always returns
`lxb_status_t`.
- Cleanup functions, `*_clean` and `*_erase`, may return any value, but they
typically return `void`.
- If `NULL` is passed as the first argument (the object) to the `*_init`
function, it returns `LXB_STATUS_ERROR_OBJECT_IS_NULL`.
- When the `*_destroy` function is called with `self_destroy` set to `true`, the
returned value is always `NULL`; otherwise, the object (`obj`) is returned.
- The `*_destroy` functions always check if the object is `NULL`; if so, they
return `NULL`.
- If the `*_destroy` function doesn’t take the `bool self_destroy` argument, the
object can only be created using the `*_create` function (i.e., not on the
stack).
Typical usage:
```c
lexbor_avl_t *avl = lexbor_avl_create();
lxb_status_t status = lexbor_avl_init(avl, 1024);

if (status != LXB_STATUS_OK) {
    /* Destroy the internals and the heap-allocated object itself. */
    lexbor_avl_destroy(avl, true);
    exit(EXIT_FAILURE);
}

/* Do something super useful */

lexbor_avl_destroy(avl, true);
```
Now, with an object on the stack:
```c
lexbor_avl_t avl = {0};
lxb_status_t status = lexbor_avl_init(&avl, 1024);

if (status != LXB_STATUS_OK) {
    /* Destroy only the internals; the object itself lives on the stack. */
    lexbor_avl_destroy(&avl, false);
    exit(EXIT_FAILURE);
}

/* Do something even more useful */

lexbor_avl_destroy(&avl, false);
```
Note that this approach is not an absolute requirement, even though it is
common. There are cases where a different API may be more suitable.
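
To make the lifecycle concrete, here is a minimal standalone sketch of a structure that follows the same pattern. Everything named `demo_*` or `DEMO_*` is hypothetical and is not part of `lexbor`; the sketch only assumes the rules listed above (`*_init` rejects `NULL`, `*_destroy` checks for `NULL` and honors `self_destroy`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
typedef unsigned int demo_status_t;

enum {
    DEMO_STATUS_OK = 0,
    DEMO_STATUS_ERROR_MEMORY_ALLOCATION,
    DEMO_STATUS_ERROR_OBJECT_IS_NULL
};

/* Hypothetical fixed-capacity array of integers. */
typedef struct {
    int    *list;
    size_t length;   /* number of used entries */
    size_t size;     /* allocated capacity */
} demo_array_t;

/* Allocates a zeroed object on the heap; it still needs *_init. */
static demo_array_t *
demo_array_create(void)
{
    return calloc(1, sizeof(demo_array_t));
}

static demo_status_t
demo_array_init(demo_array_t *array, size_t size)
{
    if (array == NULL) {
        return DEMO_STATUS_ERROR_OBJECT_IS_NULL;
    }

    array->length = 0;
    array->size = 0;

    array->list = malloc(size * sizeof(int));
    if (array->list == NULL) {
        return DEMO_STATUS_ERROR_MEMORY_ALLOCATION;
    }

    array->size = size;

    return DEMO_STATUS_OK;
}

/* Forgets the stored entries but keeps the allocated memory for reuse. */
static void
demo_array_clean(demo_array_t *array)
{
    array->length = 0;
}

/* Frees the internals; frees the object itself only if self_destroy is set. */
static demo_array_t *
demo_array_destroy(demo_array_t *array, bool self_destroy)
{
    if (array == NULL) {
        return NULL;
    }

    free(array->list);
    array->list = NULL;
    array->length = 0;
    array->size = 0;

    if (self_destroy) {
        free(array);
        return NULL;
    }

    return array;
}
```

Returning the object from `demo_array_destroy` when `self_destroy` is `false` lets a stack-allocated object be reinitialized later, while the `true` case pairs naturally with an assignment such as `array = demo_array_destroy(array, true);`.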
## Modules
The `lexbor` project is designed to be modular, allowing each module to be built
separately if desired. Modules can depend on each other; for instance, all
modules currently rely on the [core](#core) module.
Each module is located in a subdirectory within the `/source/` directory of the
project.
### Module Versioning
Each module records its version in the `base.h` file located at the module root.
For example, see `/source/lexbor/html/base.h`:
```c
#define <MODULE-NAME>_VERSION_MAJOR 1
#define <MODULE-NAME>_VERSION_MINOR 0
#define <MODULE-NAME>_VERSION_PATCH 3
#define <MODULE-NAME>_VERSION_STRING LXB_STR(<MODULE-NAME>_VERSION_MAJOR) LXB_STR(.) \
LXB_STR(<MODULE-NAME>_VERSION_MINOR) LXB_STR(.) \
LXB_STR(<MODULE-NAME>_VERSION_PATCH)
```
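
The version string above relies on preprocessor stringizing and on the concatenation of adjacent string literals. The following standalone sketch shows the mechanism with hypothetical `DEMO_*` macros; the two-level `DEMO_STR` helper is an assumption about how a macro like `LXB_STR` could be written, not a copy of the library's definition.

```c
/* Two levels are needed so that macro arguments are expanded before `#`. */
#define DEMO_STR_HELPER(x) #x
#define DEMO_STR(x) DEMO_STR_HELPER(x)

#define DEMO_VERSION_MAJOR 1
#define DEMO_VERSION_MINOR 0
#define DEMO_VERSION_PATCH 3

/* Expands to "1" "." "0" "." "3", which the compiler joins into "1.0.3". */
#define DEMO_VERSION_STRING DEMO_STR(DEMO_VERSION_MAJOR) DEMO_STR(.) \
                            DEMO_STR(DEMO_VERSION_MINOR) DEMO_STR(.) \
                            DEMO_STR(DEMO_VERSION_PATCH)

/* Returns the version as a single string literal. */
static const char *
demo_version_string(void)
{
    return DEMO_VERSION_STRING;
}

/* Returns nonzero if the compiled-in version is at least major.minor.patch. */
static int
demo_version_at_least(int major, int minor, int patch)
{
    if (DEMO_VERSION_MAJOR != major) {
        return DEMO_VERSION_MAJOR > major;
    }

    if (DEMO_VERSION_MINOR != minor) {
        return DEMO_VERSION_MINOR > minor;
    }

    return DEMO_VERSION_PATCH >= patch;
}
```

Keeping the numeric components as separate macros allows compile-time or run-time comparisons, while the string form is convenient for logging.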
### Core
This is the base module, implementing essential algorithms for the project, such
as AVL and BST trees, arrays, and strings. It also handles memory management.
The module is continuously evolving with new algorithms being added and existing
ones optimized.
Documentation for this module will be available later.
### DOM
This module implements the [DOM specification](https://dom.spec.whatwg.org/).
Its functions manage the DOM tree, including its nodes, attributes, and events.
Documentation for this module will be available later.
### HTML
This module implements the [HTML
specification](https://html.spec.whatwg.org/multipage/).
Current implementations include: Tokenizer, Tree Builder, Parser, Fragment
Parser, and Interfaces for HTML Elements.
Documentation for this module will be available later. For guidance, refer to
the
[HTML examples](https://github.com/lexbor/lexbor/tree/master/examples/lexbor/html) in our repo
or the corresponding [articles](articles/index).
### Encoding
This module implements the [Encoding
specification](https://encoding.spec.whatwg.org/).
Current implementations include streaming encode/decode. Available encodings:
```
big5, euc-jp, euc-kr, gbk, ibm866, iso-2022-jp, iso-8859-10, iso-8859-13,
iso-8859-14, iso-8859-15, iso-8859-16, iso-8859-2, iso-8859-3, iso-8859-4,
iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-8-i, koi8-r, koi8-u,
shift_jis, utf-16be, utf-16le, utf-8, gb18030, macintosh, replacement,
windows-1250, windows-1251, windows-1252, windows-1253, windows-1254,
windows-1255, windows-1256, windows-1257, windows-1258, windows-874,
x-mac-cyrillic, x-user-defined
```
Documentation for this module will be available later. For guidance, refer to
the [Encoding
examples](https://github.com/lexbor/lexbor/tree/master/examples/lexbor/encoding)
in our repo or the corresponding [articles](articles/index).
### CSS
This module implements the [CSS specification](https://drafts.csswg.org/).
Documentation for this module will be available later. For guidance, refer to
the [CSS
examples](https://github.com/lexbor/lexbor/tree/master/examples/lexbor/css) in
our repo or the corresponding [articles](articles/index).