## Process for adding c_extensions

There are two types of extensions to add:

1. Unicode character features
1. Transformation functions

### Unicode character features

* Unicode character features (like ```isspace```, ```isalpha```, etc.) are precompiled into a lookup table accessible from Python
* To add a new feature, add entries to this lookup table

For example, suppose we want to add an ```is_apostrophe``` feature.

First, decide what it means to be an apostrophe. For this example, both the ASCII single quote (0x27) and the Unicode right single quotation mark (U+2019) are ```True``` for ```is_apostrophe```, and all other characters are ```False```.
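
The definition above can be written down as a small pure-Python predicate, which is handy for checking the lookup table later. This is only a sketch: the names ```APOSTROPHE_CODE_POINTS``` and ```is_apostrophe``` are hypothetical helpers for illustration, not part of latok.

```python
# Code points treated as apostrophes in this example (hypothetical helper,
# not part of latok): ASCII single quote and right single quotation mark.
APOSTROPHE_CODE_POINTS = frozenset({0x0027, 0x2019})


def is_apostrophe(ch: str) -> bool:
    """Return True if the single character ch is one of the chosen apostrophes."""
    return ord(ch) in APOSTROPHE_CODE_POINTS


# Show which characters of a sample string count as apostrophes.
sample = "it\u2019s 'ok'"
print([(c, is_apostrophe(c)) for c in sample])
```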

Next we edit ```latok/scripts/unicode/makeunicodedata.py```:


Add a mask:

```python
CHAR_APOS_MASK = 0x0100000
```

If needed, update the sizing masks. For example, change:

```python
SIZING_MASK = 0x100000
DESIZING_MASK = 0x0FFFFF
```

to:

```python
SIZING_MASK = 0x1000000
DESIZING_MASK = 0x0FFFFFF
```
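
With the example values, the new mask ```0x0100000``` is the same bit as the old ```SIZING_MASK```, which is why the sizing masks move up. The check below is a minimal sketch of that reasoning. It assumes, based only on the values shown here, that feature flags must fit inside ```DESIZING_MASK``` and must not touch ```SIZING_MASK```; the helper ```fits_below_sizing``` is hypothetical.

```python
CHAR_APOS_MASK = 0x0100000


def fits_below_sizing(mask: int, sizing_mask: int, desizing_mask: int) -> bool:
    """True if mask lies entirely within desizing_mask and avoids sizing_mask.

    Assumption (not confirmed by the source): this is the constraint the
    sizing masks impose on feature flags.
    """
    return (mask & desizing_mask) == mask and (mask & sizing_mask) == 0


# Old sizing values: the new flag collides with SIZING_MASK.
old_ok = fits_below_sizing(CHAR_APOS_MASK, 0x100000, 0x0FFFFF)
# Updated sizing values: the new flag fits.
new_ok = fits_below_sizing(CHAR_APOS_MASK, 0x1000000, 0x0FFFFFF)
print(old_ok, new_ok)  # False True
```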

Add logic for setting the flag:

```python
def makeunicodetype(unicode, trace):
    ...
    for char in unicode.chars:
        ...
        if record:
            ...
            # Set apostrophe flag
            if char == 0x0027 or char == 0x2019:
                flags |= CHAR_APOS_MASK
            ...
        ...
    ...
```

Add or change the generated output in ```makeunicodetype```:

```python
def makeunicodetype(unicode, trace):
    ...
    print("#define CHAR_APOS_MASK 0x0100000", file=fp)
    ...
    print("#define CHAR_APOS_IDX 25", file=fp)
    print("#define FEATURE_COUNT 26", file=fp)
    ...
    print("CHAR_APOS_MASK = 0x0100000", file=fp)
    ...
    print("CHAR_APOS_IDX = 25", file=fp)
    print("FEATURE_COUNT = 26", file=fp)
    ...
```

Then run the script to generate the code:

```bash
pushd scripts/unicode; python ./makeunicodedata.py; popd
```

Next, edit ```gen_parse_matrix``` in ```latok.c```, adding a reference to the new mask:

```c
static PyObject *
gen_parse_matrix(PyObject *self, PyObject *args)
{
    ...
    for (i = 0; i < length; i++)
    {
        ...
        /* Write 1 into the apostrophe column if the flag bit is set, else 0 */
        *(m + CHAR_APOS_IDX) = (flags & CHAR_APOS_MASK) ? 1 : 0;
        ...
    }
    ...
}
```
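
In Python terms, the added line writes a 0 or 1 into the new feature's column for the current character. The sketch below mirrors just that one assignment. It assumes a row is a flat list of ```FEATURE_COUNT``` integers, and ```fill_apos_column``` is a hypothetical name used only for illustration; the real matrix layout lives in ```latok.c```.

```python
CHAR_APOS_MASK = 0x0100000
CHAR_APOS_IDX = 25
FEATURE_COUNT = 26


def fill_apos_column(row: list, flags: int) -> None:
    """Mirror of the C line: set the apostrophe column to 1 if the bit is set."""
    row[CHAR_APOS_IDX] = 1 if flags & CHAR_APOS_MASK else 0


# One row per character; here a single row with all features cleared.
row = [0] * FEATURE_COUNT
# Flags with the apostrophe bit (and one unrelated low bit) set.
fill_apos_column(row, CHAR_APOS_MASK | 0x1)
print(row[CHAR_APOS_IDX])
```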

Then add a name for the new feature to ```FEATURE_NAMES``` in ```latok_utils.py```, at the correct position:

```python
FEATURE_NAMES = [
    ...
    'Apos',
    ...
]
```
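
A quick sanity check can catch a misplaced name. The sketch below assumes that "correct position" means the name sits at index ```CHAR_APOS_IDX``` and that the list has ```FEATURE_COUNT``` entries; both are inferences from the generated constants rather than something this guide states. The placeholder names and ```check_feature_names``` are hypothetical.

```python
CHAR_APOS_IDX = 25
FEATURE_COUNT = 26

# Placeholder names stand in for the existing entries, which are elided above.
FEATURE_NAMES = [f"feature_{i}" for i in range(CHAR_APOS_IDX)] + ["Apos"]


def check_feature_names(names, idx, count, expected="Apos"):
    """Raise AssertionError if the name list disagrees with the constants."""
    assert len(names) == count, f"expected {count} names, got {len(names)}"
    assert names[idx] == expected, f"names[{idx}] is {names[idx]!r}, not {expected!r}"


check_feature_names(FEATURE_NAMES, CHAR_APOS_IDX, FEATURE_COUNT)
```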

Finally, install the modifications into your environment:

```bash
pip install -e .
```

### Transformation functions

To add a new transformation or helper function as a C extension:

1. Edit ```latok/core/src/latok/latok.c``` to:
    1. Implement the function (e.g., see ```gen_block_mask```)
    1. Set up the name mapping in the methods table (see ```_C_latokMethods[]```):
        ```c
        ...
        {"_gen_block_mask", gen_block_mask, METH_VARARGS},
        ...
        ```
1. Install the modifications:
    ```bash
    pip install -e .
    ```
1. Refer to the newly mapped function from your Python code:
    ```python
    from latok.latok import _gen_block_mask
    _gen_block_mask(a1, a2)
    ```