# `numba` tests

This notebook is a scratchpad that documents the exploration of `numba`-based acceleration for areal interpolation.

**NOTE**: to be removed or relocated if and when this functionality is merged.

---

**IMPORTANT**

As of Dec. 17, 2020, the multi-core implementation requires the development versions of `pygeos` and `geopandas` from their `main` branches. In an environment with the latest released versions (such as `gds_env:5.0`), these can be installed with:

```shell
pip install --no-deps git+https://github.com/pygeos/pygeos.git
pip install --no-deps git+https://github.com/geopandas/geopandas.git
```
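To confirm which builds are active after the installation above, a small check like the following can be run. It is a sketch: `installed_version` and `looks_like_dev_build` are hypothetical helpers, and the development-build heuristic (a `+` local version segment or a `dev` marker in the version string) is an assumption about how source installs are versioned, not a guarantee.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def looks_like_dev_build(version_string):
    """Heuristic (assumption): source installs usually carry a '+...' local
    segment or a 'dev' marker in their version string."""
    if version_string is None:
        return False
    return "+" in version_string or "dev" in version_string

for pkg in ("pygeos", "geopandas"):
    v = installed_version(pkg)
    print(f"{pkg}: {v} (looks like a dev build: {looks_like_dev_build(v)})")
```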

---

In [1]:
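```python
from tobler.area_weighted.area_interpolate import (
    _area_tables_binning,
    _area_tables_binning_parallel,
)
import geopandas
import pandas

def summary(src, tgt):
    """Print the sizes of the source and target layers."""
    print(f"Transfer {src.shape[0]} polygons into {tgt.shape[0]}")

def down_load(p):
    """Read the file at URL `p`, downloading it to /home/jovyan first if needed."""
    # Cache under the last component of the URL so each URL gets its own file
    fn = f"/home/jovyan/{p.split('/')[-1]}"
    try:
        return geopandas.read_file(fn)
    except Exception:
        # Not cached yet (or unreadable): fetch it with the IPython shell escape
        ! wget $p -O $fn
        return geopandas.read_file(fn)
```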
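As a pure-Python alternative to the `! wget` shell escape in `down_load`, the caching step can be written with the standard library. This is a sketch: `cache_path` and `download_cached` are hypothetical helpers, and the cache directory is whatever path you pass in (for example `/home/jovyan`). The file name is taken from the last component of the URL path, so each URL is cached under its own name.

```python
import os
import urllib.request
from urllib.parse import urlparse

def cache_path(url, cache_dir):
    """Local path for `url`: the last component of its path, inside `cache_dir`."""
    name = os.path.basename(urlparse(url).path.rstrip("/"))
    return os.path.join(cache_dir, name)

def download_cached(url, cache_dir):
    """Download `url` into `cache_dir` unless already cached; return the local path."""
    os.makedirs(cache_dir, exist_ok=True)
    path = cache_path(url, cache_dir)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

The returned path can then be handed to `geopandas.read_file(download_cached(p, "/home/jovyan"))`.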

## Data setup

- Minimal problem: San Diego census tracts and an H3 hexagonal grid

In [2]:
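```python
# Source: San Diego census tracts, index relabelled as strings ("i0", "i1", ...)
p = ("https://geographicdata.science/book/_downloads/"
     "f2341ee89163afe06b42fc5d5ed38060/sandiego_tracts.gpkg")
src = down_load(p).rename(lambda i: 'i'+str(i))

# Target: H3 hexagonal grid over San Diego, reprojected to the source CRS
p = ("https://geographicdata.science/book/_downloads/"
     "d740a1069144baa1302b9561c3d31afe/sd_h3_grid.gpkg")
tgt = down_load(p).rename(lambda i: 'i'+str(i)).to_crs(src.crs)

# Target bounding box (minx, miny, maxx, maxy); uncomment to clip src to it
w, s, e, n = tgt.total_bounds
#src = src.cx[w:e, s:n]
summary(src, tgt)
```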

Transfer 628 polygons into 628
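`total_bounds` returns `(minx, miny, maxx, maxy)`, which is why it unpacks as `w, s, e, n`, and the commented-out `src.cx[w:e, s:n]` would keep only the source polygons that intersect that box. The sketch below mimics the idea on plain bounding-box tuples; `total_bounds` and `clip_by_bbox` here are hypothetical stand-ins. Unlike `.cx`, which tests the geometries themselves, comparing bounding boxes can keep a few extra polygons whose boxes overlap the target box while their shapes do not.

```python
def total_bounds(bounds):
    """Union of (minx, miny, maxx, maxy) boxes, in the same order as total_bounds."""
    minxs, minys, maxxs, maxys = zip(*bounds)
    return (min(minxs), min(minys), max(maxxs), max(maxys))

def clip_by_bbox(bounds, box):
    """Positions of boxes that overlap `box`, given as (w, s, e, n)."""
    west, south, east, north = box
    return [
        i
        for i, (minx, miny, maxx, maxy) in enumerate(bounds)
        if minx <= east and maxx >= west and miny <= north and maxy >= south
    ]

tgt_boxes = [(0, 0, 2, 2), (2, 0, 4, 2)]
src_boxes = [(1, 1, 3, 3), (5, 5, 6, 6), (-1, -1, 0.5, 0.5)]
tgt_box = total_bounds(tgt_boxes)
print(tgt_box)                           # (0, 0, 4, 2)
print(clip_by_bbox(src_boxes, tgt_box))  # [0, 2]
```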


- Slightly larger problem: tracts replicated 5× and precincts replicated 4×

In [3]:
```python
# Tracts, replicated 5 times to enlarge the problem
p = "https://ndownloader.figshare.com/files/20460645"
src = down_load(p)
src = pandas.concat([src]*5)

# Precincts, reprojected to the tracts' CRS and replicated 4 times
p = "https://ndownloader.figshare.com/files/20460549"
tgt = down_load(p).to_crs(src.crs)
tgt = pandas.concat([tgt]*4)
summary(src, tgt)
```

Transfer 3140 polygons into 2512
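Replicating the layers inflates the work multiplicatively. Without a spatial index, an overlay would have to test every source/target pair, and because the copies are identical, every genuinely overlapping pair is also repeated 5 × 4 = 20 times. The arithmetic below uses the polygon counts printed above; `candidate_pairs` is an illustrative helper.

```python
def candidate_pairs(n_src, n_tgt):
    """Number of source/target pairs an all-pairs overlay would have to test."""
    return n_src * n_tgt

minimal = candidate_pairs(628, 628)    # 394,384 pairs
larger = candidate_pairs(3140, 2512)   # 7,887,680 pairs
print(larger / minimal)                # 20.0
```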


## Correctness

In [4]:
```python
cross2 = _area_tables_binning_parallel(src, tgt, n_jobs=1)
cross = _area_tables_binning(src, tgt, 'auto')
# Count the entries where the two tables differ (0 means they are identical)
(cross != cross2).sum()
```

0
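The correctness check compares two sparse source-by-target tables of intersection areas and counts the entries where they differ. The toy sketch below reproduces that check under simplifying assumptions: polygons are axis-aligned rectangles `(minx, miny, maxx, maxy)`, pairs are enumerated by brute force instead of through a spatial index, and the parallel variant splits the source rows into contiguous chunks and runs them on a thread pool. Chunking by source rows is an assumption for illustration, not a description of how `_area_tables_binning_parallel` partitions its work, and all helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def rect_overlap(a, b):
    """Intersection area of two rectangles given as (minx, miny, maxx, maxy)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0

def area_table(src_rects, tgt_rects, offset=0):
    """Sparse table {(row, col): area}; rows are numbered starting at `offset`."""
    table = {}
    for i, a in enumerate(src_rects, start=offset):
        for j, b in enumerate(tgt_rects):
            area = rect_overlap(a, b)
            if area > 0:
                table[(i, j)] = area
    return table

def area_table_chunked(src_rects, tgt_rects, n_jobs=2):
    """Build the same table from contiguous chunks of source rows, then merge."""
    size = max(1, -(-len(src_rects) // n_jobs))  # ceiling division
    starts = range(0, len(src_rects), size)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        parts = list(pool.map(
            lambda s: area_table(src_rects[s:s + size], tgt_rects, offset=s),
            starts,
        ))
    merged = {}
    for part in parts:
        merged.update(part)  # chunks cover disjoint rows, so keys never clash
    return merged

src_rects = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 4, 4)]
tgt_rects = [(1, 1, 3, 3), (3, -1, 5, 1)]

serial = area_table(src_rects, tgt_rects)
chunked = area_table_chunked(src_rects, tgt_rects, n_jobs=2)
# Analogue of (cross != cross2).sum(): number of differing entries
keys = serial.keys() | chunked.keys()
n_diff = sum(serial.get(k) != chunked.get(k) for k in keys)
print(serial)  # {(0, 0): 1, (1, 0): 1, (1, 1): 1, (2, 0): 2}
print(n_diff)  # 0
```

Because each chunk fills a disjoint set of rows, merging the partial tables cannot clash, which is what makes the chunked result identical to the serial one. Threads only illustrate the split-and-merge structure here: pure-Python, CPU-bound work like this needs separate processes to actually run faster, since the GIL lets only one thread execute Python bytecode at a time.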

## Performance

Results using all observations of the first dataset:

In [5]:
```python
%timeit cross = _area_tables_binning(src, tgt, 'auto')
```

2.22 s ± 20.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [6]:
```python
%timeit cross2 = _area_tables_binning_parallel(src, tgt, n_jobs=1)
```

2.22 s ± 25.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [7]:
```python
%timeit cross3 = _area_tables_binning_parallel(src, tgt, n_jobs=-1)
```

756 ms ± 21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


---

Results using the second dataset:

In [5]:
```python
%time cross = _area_tables_binning(src, tgt, 'auto')
```

CPU times: user 47.5 s, sys: 15.8 ms, total: 47.5 s
Wall time: 47.6 s


In [8]:
```python
%time cross2 = _area_tables_binning_parallel(src, tgt, n_jobs=1)
```

CPU times: user 46.8 s, sys: 108 ms, total: 46.9 s
Wall time: 46.9 s


In [6]:
```python
%time cross3 = _area_tables_binning_parallel(src, tgt, n_jobs=-1)
```

CPU times: user 1.86 s, sys: 488 ms, total: 2.35 s
Wall time: 9.61 s
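`%timeit` reports a mean over repeated runs, while `%time` reports a single run's CPU and wall time, so the fair basis for comparison is wall time. For the `n_jobs=-1` run on the second dataset, `%time` shows only 2.35 s of CPU time against 9.61 s of wall time, which is consistent with the work being done in separate worker processes whose CPU time the notebook process does not count. The speedups implied by the reported wall times (means, for the `%timeit` runs) work out as follows; `speedup` is just an illustrative helper.

```python
# Wall-clock times reported above, in seconds: (serial baseline, n_jobs=-1)
runs = {
    "first dataset, n_jobs=1 vs n_jobs=-1": (2.22, 0.756),
    "second dataset, n_jobs=1 vs n_jobs=-1": (46.9, 9.61),
    "second dataset, _area_tables_binning vs n_jobs=-1": (47.6, 9.61),
}

def speedup(serial_s, parallel_s):
    """Ratio of serial to parallel wall time (>1 means the parallel run is faster)."""
    return serial_s / parallel_s

for label, (serial_s, parallel_s) in runs.items():
    print(f"{label}: {speedup(serial_s, parallel_s):.2f}x")
# first dataset, n_jobs=1 vs n_jobs=-1: 2.94x
# second dataset, n_jobs=1 vs n_jobs=-1: 4.88x
# second dataset, _area_tables_binning vs n_jobs=-1: 4.95x
```

Parallel efficiency (speedup divided by the number of cores used) cannot be derived from these numbers, because the notebook does not report how many cores were available to `n_jobs=-1`.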
