Testing how much faster Rust is than Python at counting the lines in a large file.

In [1]:
import os
from pathlib import Path
from pyeio import opt

Using a set of Wikidata entities; the file is ~14.7 GB.

In [5]:
fp = Path.home() / "user/project/generic/personal/entitect/local/sample/items.jsonl"
assert fp.exists()
print(fp.name, "= {:,} bytes".format(os.path.getsize(fp)))
fp = str(fp)

items.jsonl = 14,669,314,232 bytes


The optimized function is exposed here:

```python
opt.count_lines_in_file(...)
```

The Rust functions can also be accessed directly with:

```python
opt.rs.count_lines_in_file(...)
```

Or the Python comparison/fallback functions with:

```python
opt.py.count_lines_in_file(...)
```
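
For reference, a minimal sketch of what a chunked pure-Python line counter could look like. This is not the actual `opt.py.count_lines_in_file` implementation; the name `count_lines_chunked` is hypothetical, and the approach (binary reads of `chunk_size` bytes, counting `\n` in each chunk) is an assumption based on the `chunk_size` parameter used in the timings.

```python
import os
import tempfile


def count_lines_chunked(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes by reading the file in fixed-size binary chunks.

    Hypothetical helper for illustration; not part of pyeio.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            count += chunk.count(b"\n")
    return count


# Small self-check on a throwaway file (not the 14 GB sample).
with tempfile.NamedTemporaryFile("wb", suffix=".jsonl", delete=False) as tmp:
    tmp.write(b'{"id": "Q1"}\n{"id": "Q2"}\n{"id": "Q3"}\n')
    tmp_path = tmp.name

# A tiny chunk size forces several reads, so lines span chunk boundaries.
n_lines = count_lines_chunked(tmp_path, chunk_size=8)
os.remove(tmp_path)
print(n_lines)
```

Note that this counts newline bytes, so a final line without a trailing newline would not be counted.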

In [None]:
%%timeit
opt.py.count_lines_in_file(path=fp, chunk_size=1<<20)

6.02 s ± 42.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
%%timeit
opt.rs.count_lines_in_file(path=fp, chunk_size=1<<20, num_threads=8)

382 ms ± 8.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [21]:
speed_increase = round(6.02 / 0.382)
print(f"rust is ~{speed_increase}x faster")

rust is ~16x faster
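
The same timings can also be expressed as throughput, using the file size printed earlier. A quick sketch; `throughput_gb_per_s` is just a helper defined here for illustration, and GB means 10^9 bytes.

```python
file_size_bytes = 14_669_314_232  # size of items.jsonl reported above
py_seconds = 6.02                 # mean %%timeit result, Python version
rs_seconds = 0.382                # mean %%timeit result, Rust version (8 threads)


def throughput_gb_per_s(size_bytes: int, seconds: float) -> float:
    """Bytes processed per second, in GB/s (1 GB = 1e9 bytes)."""
    return size_bytes / seconds / 1e9


py_rate = throughput_gb_per_s(file_size_bytes, py_seconds)
rs_rate = throughput_gb_per_s(file_size_bytes, rs_seconds)
speedup = py_seconds / rs_seconds

print(f"python: {py_rate:.2f} GB/s")
print(f"rust:   {rs_rate:.2f} GB/s")
print(f"speedup: {speedup:.1f}x")
```

That works out to roughly 2.4 GB/s for Python and 38 GB/s for Rust. A rate that high is well above what typical disks sustain, so the file was probably being served from the OS page cache during these runs; that is an inference, not something measured here.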


This can probably be optimized further.