In [8]:
%load_ext autoreload
%autoreload 2

# import packages
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile
from pathlib import Path

import file_ops

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Downloading reef presence/absence datasets

This notebook downloads records of tropical coral presence. The following datasets are currently covered:
- UNEP-WCMC Global Distribution of Coral Reefs
- Allen Coral Atlas
- Global Reef Tracker, Reef Check
- XL Catlin Seaview Survey

N.B. Due to the large quantities of data involved, individual cells can take several minutes to run.
Users of macOS may have to install certificates for their Python instance in order to download files using urllib. Follow the instructions [here](https://stackoverflow.com/questions/68275857/urllib-error-urlerror-urlopen-error-ssl-certificate-verify-failed-certifica).

### File structure

```
data
├── gdcr
├── aca
├── rc
└── seaview
```
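
A layout like the one above can be created with `pathlib` alone. The following is a minimal sketch, not the notebook's own `file_ops.create_directory_structure` (whose implementation is not shown here); the function name `ensure_subdirs` is hypothetical and the printed message simply mirrors the style of the notebook's output.

```python
from pathlib import Path


def ensure_subdirs(root: Path, names: list[str]) -> list[Path]:
    """Create root/name for each name and return the directories newly created.

    Hypothetical stand-in for file_ops.create_directory_structure.
    """
    created = []
    for name in names:
        path = Path(root) / name
        if path.is_dir():
            # Leave existing directories untouched
            print(f"Directory already exists: {path}")
        else:
            # parents=True also creates the root directory if it is missing
            path.mkdir(parents=True)
            created.append(path)
    return created
```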
----

In [13]:
# initialise file structure
data_dir_fp = Path("data")
sub_dirs = ["gdcr", "aca", "rc", "seaview"]
file_ops.create_directory_structure(data_dir_fp, sub_dirs)

Directory already exists: data/gdcr
Directory already exists: data/aca
Directory already exists: data/rc
Directory already exists: data/seaview


## [UNEP-WCMC Global Distribution of Coral Reefs](https://data.unep-wcmc.org/datasets/1)

Purported to be the "most comprehensive global dataset of warm-water coral reefs to date", the UNEP-WCMC Global Distribution of Coral Reefs (GDCR) is compiled from various sources including the Millennium Coral Reef Mapping Project (IMaRS-USF and IRD 2005, IMaRS-USF 2005) and the World Atlas of Coral Reefs (Spalding et al. 2001).

*UNEP-WCMC, WorldFish Centre, WRI, TNC (2021). Global distribution of warm-water coral reefs, compiled from multiple sources including the Millennium Coral Reef Mapping Project. Version 4.1. Includes contributions from IMaRS-USF and IRD (2005), IMaRS-USF (2005) and Spalding et al. (2001). Cambridge (UK): UN Environment World Conservation Monitoring Centre. Data DOI: https://doi.org/10.34892/t2wk-5t34*

Current (January 2024) data download url: https://datadownload-production.s3.us-east-1.amazonaws.com/WCMC008_CoralReefs2021_v4_1.zip

| Coverage           | Spatial Resolution (m) | Actively updated? | Methodology                           | Format                                            | DOI |
|--------------------|------------------------|---------|---------------------------------------|---------------------------------------------------|---| 
| Global             | 30                     | No     | Historic in-situ surveys, remote sensing | GeoDataFrame containing multipolygon objects stored as .shp file. Additional columns detailing species type, date of record, and survey methodology included but often not filled. | https://doi.org/10.34892/t2wk-5t34 | 
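
Because the archive is large, it can help to skip the download when the target folder already holds files. The sketch below shows one way to do that, assuming an "already populated" check is good enough; the function name `download_and_extract` and its `overwrite` parameter are hypothetical and not part of this notebook or of `file_ops`.

```python
from io import BytesIO
from pathlib import Path
from urllib.request import urlopen
from zipfile import ZipFile


def download_and_extract(url: str, dest_dir: Path, overwrite: bool = False) -> bool:
    """Download a ZIP archive from url and extract it into dest_dir.

    Returns True if a download happened, False if it was skipped because
    dest_dir already contains files. (Hypothetical helper.)
    """
    dest_dir = Path(dest_dir)
    # Skip the download when the destination is already populated
    if not overwrite and dest_dir.is_dir() and any(dest_dir.iterdir()):
        return False
    dest_dir.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        # ZipFile needs a seekable file object, so buffer the response in memory
        with ZipFile(BytesIO(response.read())) as archive:
            archive.extractall(dest_dir)
    return True
```

With the URL above, a call might look like `download_and_extract(zipurl_gdcr, Path("data/gdcr"))`. Note that this check only tests whether the folder is non-empty; it does not verify that an earlier download completed.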


In [11]:
# TODO: set up file system for handling all these
# TODO: make sure you're not redownloading the same file
zipurl_gdcr = "https://datadownload-production.s3.us-east-1.amazonaws.com/WCMC008_CoralReefs2021_v4_1.zip"

with urlopen(zipurl_gdcr) as zipresp:
    with ZipFile(BytesIO(zipresp.read())) as zfile:
        # extractall() takes a target directory, so extract into the gdcr folder
        zfile.extractall("data/gdcr")

---
## [Seaview Survey Photo-quadrat and Image Classification Dataset](https://www.catlinseaviewsurvey.com/)

The XL Catlin Seaview Survey used an SVII camera system along 2 km survey transects to collect geo-referenced images of the benthos. In total, 1,400 km of reef was surveyed. The survey is no longer active.

Data is available via THREDDS hosted by the [University of Queensland](https://espace.library.uq.edu.au/view/UQ:734799).

*González-Rivero, Manuel, Rodriguez-Ramirez, Alberto, Beijbom, Oscar, Dalton, Peter, Kennedy, Emma V., Neal, Benjamin P., Vercelloni, Julie, Bongaerts, Pim, Ganase, Anjani, Bryant, Dominic E.P., Brown, Kristen, Kim, Catherine, Radice, Veronica Z., Lopez-Marcano, Sebastian, Dove, Sophie, Bailhache, Christophe, Beyer, Hawthorne L., and Hoegh-Guldberg, Ove (2019). Seaview Survey Photo-quadrat and Image Classification Dataset. The University of Queensland. Data Collection. https://doi.org/10.14264/uql.2019.930*

Current (January 2024) data download url: http://data.qld.edu.au/public/Q1281/tabular-data.zip

| Coverage           | Spatial Resolution (m) | Actively updated? | Methodology                           | Format                                            | DOI |
|--------------------|------------------------|---------|---------------------------------------|---------------------------------------------------|---| 
| Global, sparse | 1 | No | ML-classified underwater photo transects | csv files of reef cover, one per region | https://doi.org/10.14264/uql.2019.930 | 


In [18]:
words = ["asdf","sdfgdsfgssadf","sdfgdsfgasdf"]

[word for word in words if "asdf" in word]

['asdf', 'sdfgdsfgasdf']

In [21]:
zipurl_seaview = "http://data.qld.edu.au/public/Q1281/tabular-data.zip"
seaview_dir = Path("data/seaview")


with urlopen(zipurl_seaview) as zipresp:
    with ZipFile(BytesIO(zipresp.read())) as zfile:
        # Find the reef cover files in the ZIP archive
        reefcover_fps = [file for file in zfile.namelist() if "reefcover" in file]

    print(reefcover_fps)
        # Check if there is at least one appropriate file
        # if reefcover_fps:
        #     # Extract the first "reefcover"-containing files into 'seaview' folder
        #     zfile.extract(reefcover_fps[0], seaview_dir)
        #     print(f"Extracted {reefcover_fps[0]} to {seaview_dir}")
        # else:
        #     print("No file found containing 'reefcover' in the ZIP archive.")



['tabular-data/seaviewsurvey_reefcover_atlantic.csv', 'tabular-data/seaviewsurvey_reefcover_indianocean.csv', 'tabular-data/seaviewsurvey_reefcover_pacificaustralia.csv', 'tabular-data/seaviewsurvey_reefcover_pacifichawaii.csv', 'tabular-data/seaviewsurvey_reefcover_southeastasia.csv']
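
The listing above contains one reef cover file per region, so extracting only the first match would drop the rest. The following sketch extracts every member whose name contains a keyword; the function name `extract_matching` is hypothetical, and it operates on archive bytes that have already been downloaded.

```python
from io import BytesIO
from pathlib import Path
from zipfile import ZipFile


def extract_matching(zip_bytes: bytes, keyword: str, dest_dir: Path) -> list[Path]:
    """Extract every archive member whose name contains keyword.

    Returns the paths of the extracted files. (Hypothetical helper.)
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with ZipFile(BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip directory entries, which end with a slash
            if keyword in name and not name.endswith("/"):
                # extract() returns the path of the file it wrote
                extracted.append(Path(archive.extract(name, dest_dir)))
    return extracted
```

Inside the cell above this could be called as `extract_matching(zipresp.read(), "reefcover", seaview_dir)`. Members keep their path inside the archive, so the files would land under `data/seaview/tabular-data/`.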


---
## TODO: [Allen Coral Atlas](https://allencoralatlas.org/)

The latest, and highest-resolution, global coral map, the Allen Coral Atlas (ACA) uses a Random Forest algorithm to classify 3.7 m Planet satellite imagery into various benthic classes including coral/algae, seagrass, microalgal mats etc. While there are numerous limitations (restriction to shallows, fairly low classification accuracy), the ACA is being constantly improved, and could be updated to allow monitoring of the change of coverage.

The web app which hosts the ACA requires the creation of an account and only allows manual download of individual regions, or user-defined areas.

*Allen Coral Atlas (2022). Imagery, maps and monitoring of the world's tropical coral reefs. doi.org/10.5281/zenodo.3833242*

| Coverage           | Spatial Resolution (m) | Actively updated? | Methodology                           | Format                                            |
|--------------------|------------------------|---------|---------------------------------------|---------------------------------------------------|
| Global             | 5                     | Yes     | Remote sensing | GeoDataFrame containing multipolygon objects stored as .gpkg file. |


---
## [Reef Check](https://www.reefcheck.org/)

Reef Check is a non-profit organisation which trains local volunteer citizen scientist divers to collect data on reef health.

Downloading the csv requires filling in a Google Form which is processed manually before the dataset is sent via email.

*Reef Check Foundation. Reef Check Global Reef Dataset. data.reefcheck.org*

| Coverage           | Spatial Resolution (m) | Actively updated? | Methodology                           | Format                                            | DOI |
|--------------------|------------------------|---------|---------------------------------------|---------------------------------------------------|---| 
| Global, sparse | 100m transects, 20m intervals | Yes | Continuous, undirected in-situ surveys by citizen scientists | csv file detailing coordinates (to 6dp), date, substrate type, "total" substrate, depth | n/a | 
