# `speciesgrids` product

[speciesgrids](https://github.com/iobis/speciesgrids) is a Python package for building WoRMS-aligned species distribution datasets that combine OBIS and GBIF data. The resulting dataset is available at several resolutions on AWS S3. It can be downloaded locally for best performance, or queried directly from the S3 bucket. For more details about downloading and using the dataset, see the [speciesgrids README](https://github.com/iobis/speciesgrids).

The product aggregates all marine data from the two major databases onto a very fine grid using Uber's H3 system. H3 is a hierarchical grid system, which enables both very fast queries and aggregations. To learn more about H3, see the [H3 documentation](https://h3geo.org/).

<img src="https://h3geo.org/images/parent-child.png" height=200></img>
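
As a minimal sketch of what "hierarchical" means in practice, the snippet below indexes a single point at resolution 7 and then looks up the coarser resolution-4 cell that contains it. It assumes the `sf` and `h3jsr` packages are installed. The coordinates are an arbitrary point chosen only for illustration, not a value taken from the dataset.

```r
library(h3jsr)

# An arbitrary point (longitude, latitude) in WGS84, used only for illustration
pt <- sf::st_sfc(sf::st_point(c(160, -35)), crs = 4326)

# Index the point at resolution 7 (a fine grid)
cell_7 <- point_to_cell(pt, res = 7)

# Look up the coarser resolution-4 cell that contains the resolution-7 cell
cell_4 <- get_parent(cell_7, res = 4)

# List every resolution-7 cell nested inside that resolution-4 cell
children_7 <- get_children(cell_4, res = 7)[[1]]

# Resolutions of the two cells
get_res(c(cell_7, cell_4))

# The fine cell is one of the children of its parent
cell_7 %in% children_7
```

Because every fine cell maps to a parent at any coarser resolution, moving between resolutions is a lookup on the cell index rather than a spatial operation.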

The dataset is served as a `parquet` file. [Parquet](https://parquet.apache.org/) is a performant, lightweight, column-based format maintained as an Apache project. We can work with `parquet` files in R and Python using the [`arrow` package](https://arrow.apache.org/docs/r/index.html). You can learn more about Parquet in [this tutorial](https://resources.obis.org/tutorials/arrow-obis/).
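
To illustrate how `arrow` evaluates queries lazily, here is a small self-contained sketch that does not touch the S3 bucket. It writes a toy table to a temporary Parquet file and reads back only the rows and columns it needs. The table `toy` and its values are made up for the example. Its column names are borrowed from those used later in this notebook.

```r
library(arrow)
library(dplyr)

# Toy table with made-up values; the cell identifiers are placeholders, not real H3 indices
toy <- data.frame(
    species = c("Amphiprion clarkii", "Amphiprion ocellaris", "Gadus morhua"),
    genus   = c("Amphiprion", "Amphiprion", "Gadus"),
    cell    = c("cell_a", "cell_b", "cell_c"),
    records = c(12L, 3L, 40L)
)

# Write the table to a temporary Parquet file
toy_path <- tempfile(fileext = ".parquet")
write_parquet(toy, toy_path)

# Opening the dataset does not load any rows into memory
toy_ds <- open_dataset(toy_path)

# The filter and column selection are only executed when collect() is called
toy_amph <- toy_ds |>
    filter(genus == "Amphiprion") |>
    select(species, records) |>
    collect()

toy_amph
```

Selecting only the columns you need before `collect()` is where a column-based format helps most, since the remaining columns do not have to be read.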

In [None]:
library(arrow)
library(dplyr)
library(ggplot2)

ds_path <- "s3://obis-products/speciesgrids/h3_7"

ds <- open_dataset(ds_path)

ds

Let's get all records for the genus **Amphiprion**.

In [None]:
amph_records <- ds |> 
    filter(genus == "Amphiprion") |> 
    collect()

nrow(amph_records)

head(amph_records[,c("species", "genus", "cell", "records")])

We can aggregate the data by H3 cell to find, for example, the total number of species per cell.

In [None]:
amph_total_cell <- amph_records |> 
    group_by(cell) |> 
    count()

head(amph_total_cell)
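
The aggregation above runs in R after the records have been collected. As a hedged alternative, the same kind of summary can be pushed into the `arrow` query so that only the aggregated table is brought into memory. The sketch below uses a made-up toy dataset written to a temporary directory rather than the real bucket, and assumes one row per species and cell.

```r
library(arrow)
library(dplyr)

# Toy rows with made-up values: one row per species and cell
toy <- data.frame(
    species = c("Amphiprion clarkii", "Amphiprion ocellaris", "Amphiprion clarkii"),
    cell    = c("cell_a", "cell_a", "cell_b"),
    records = c(12L, 3L, 5L)
)

# Write the toy table as a Parquet dataset in a temporary directory
toy_dir <- tempfile()
write_dataset(toy, toy_dir)

# Group and summarise inside the arrow query, then collect the small result
toy_by_cell <- open_dataset(toy_dir) |>
    group_by(cell) |>
    summarise(
        n_species     = n_distinct(species),
        total_records = sum(records)
    ) |>
    arrange(cell) |>
    collect()

toy_by_cell
```

The same verbs could in principle be applied to `ds` before `collect()`, which may reduce the amount of data transferred when querying S3 directly.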

Because the H3 system is hierarchical, we can aggregate the data to coarser cells.

In [None]:
amph_h3_4 <- amph_records |> 
    mutate(h3_4 = h3jsr::get_parent(cell, res = 4))
head(amph_h3_4)

We can then calculate and plot, for example, the total number of records per coarser cell.

In [None]:
amph_h3_4_agg <- amph_h3_4 |> 
    group_by(h3_4) |> 
    summarise(total_records = sum(records))

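# Convert the H3 cell indices to polygons (an sf object with an `h3_address` column)
amph_h3_4_agg_sf <- h3jsr::cell_to_polygon(amph_h3_4_agg$h3_4, simple = FALSE)

print(amph_h3_4_agg_sf)

# Rename the cell column so that it matches the join key in the polygons
colnames(amph_h3_4_agg)[1] <- "h3_address"

# Attach the record totals to the polygons
amph_h3_4_agg <- left_join(amph_h3_4_agg_sf, amph_h3_4_agg, by = "h3_address")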

wrld <- rnaturalearth::ne_countries(returnclass = "sf")

ggplot() +
    geom_sf(data = wrld) +
    geom_sf(data = amph_h3_4_agg, aes(fill = total_records)) +
    theme_light() +
    coord_sf(xlim = c(150, 170), ylim = c(-40, -30))
