We come to crop and sort insects.
We combine art and science to create beauty and knowledge.
Our approach was powered entirely by open source tools that helped us download, share, and process high-resolution scans of insect specimen drawers, provided under a CC0 license by the Museum für Naturkunde Berlin for the Coding da Vinci Hackathon 2017.
After experimenting with Linux command-line tools, we adopted ImageJ as our main tool for feature detection (finding the outlines of bugs and butterflies) and for cropping individual insect specimens from the high-resolution drawer scans, following a protocol similar to the one used for counting cells in microscopy images.
We quickly rejected our first idea of loading all the data into this GitHub repository: there are about 300 GB of files to process, and they will generate at least another 100 GB of results. That is probably too much for a GitHub repository (though we did not check GitHub's limits). Instead, we used a private Nextcloud instance with about 700 GB of disk space available. To maximise download speed, we used `wget` to download the data directly into the Nextcloud data folder. For this, we prepared a text file containing all download links (the metadata CSV file provided contained errors and was thus not directly usable).
We copied and pasted the text from http://gbif.naturkundemuseum-berlin.de/hackathon/Insektenkasten/High_resolution/ into a text file and used the multiple selection feature of Sublime Text to turn it into the `wget` input file `data/highResUrls.txt`. We then ran `wget` in the Nextcloud data directory (configured in `nextcloud/config/config.php`) where the files should be located:
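```shell
# highResUrls.txt is assumed to be a copy of data/highResUrls.txt from this repository
cd /var/lib/nextcloud/data/myUserName/files/bug-cruncher/highRes
wget -i highResUrls.txt
```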
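The URL list could also be built without manual editing. As a hedged alternative to the Sublime Text approach, the sketch below extracts file links from a saved copy of the directory listing. It assumes an Apache-style index page in which each file is linked with a relative `href="..."`; the function name `listing_to_urls` is hypothetical, and if the server formats its listing differently the filters would need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a saved HTML directory listing into a wget input
# file (one absolute URL per line).
# Assumption: an Apache-style index page whose file links are relative
# href="..." values. HTML entities such as &amp; are not decoded.
listing_to_urls() {
  local base="${1%/}" html="$2" name
  # 1. pull out every href="..." attribute
  # 2. strip the href=" prefix and the closing quote
  # 3. skip directories (trailing /), sort links (?C=...), anchors,
  #    absolute paths and absolute URLs (scheme:...)
  grep -o 'href="[^"]*"' "$html" \
    | sed -e 's/^href="//' -e 's/"$//' \
    | grep -v -e '/$' -e '^?' -e '^#' -e '^/' -e '^[a-zA-Z][a-zA-Z0-9+.-]*:' \
    | while IFS= read -r name; do
        # prefix each relative file name with the listing's base URL
        printf '%s/%s\n' "$base" "$name"
      done
}

# Example use (not run here):
#   wget -O index.html http://gbif.naturkundemuseum-berlin.de/hackathon/Insektenkasten/High_resolution/
#   listing_to_urls http://gbif.naturkundemuseum-berlin.de/hackathon/Insektenkasten/High_resolution/ index.html > data/highResUrls.txt
```

Working from the raw HTML rather than the rendered page text keeps the exact file names as the server links them, which avoids copy-and-paste mistakes.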
Since the download took over 8 hours even at the full 10 MB/s, we had to cancel it and resume it later. Running `wget` again with `-N` (timestamping) skips files whose local copy is already complete and up to date:
```shell
wget -N -i highResUrls.txt
```
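A rough estimate shows why resuming was unavoidable: 300 GB at 10 MB/s is about 30,000 s, a little over 8 hours. To see how many files are still missing between runs, a small helper can compare the URL list with the target directory. This is a hypothetical sketch (`missing_downloads` is not part of the project), and it only checks whether a file exists, so a partially downloaded file counts as present and is left to the `wget -N` run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every URL from a wget input file whose target
# file does not exist yet in the given directory.
# Assumption: wget saves each file under the last path segment of its URL
# (true for plain URLs without query strings or percent-encoding).
# Only presence is checked: a partially downloaded file counts as present.
missing_downloads() {
  local url_file="$1" dir="$2" url name
  while IFS= read -r url || [ -n "$url" ]; do
    url="${url%$'\r'}"                 # tolerate Windows line endings
    [ -z "$url" ] && continue          # skip blank lines
    name="${url##*/}"                  # file name = last URL path segment
    [ -e "$dir/$name" ] || printf '%s\n' "$url"
  done < "$url_file"
}

# Example: count the files still missing in the current directory
#   missing_downloads highResUrls.txt . | wc -l
```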
We then used the Nextcloud command-line tool `occ` to rescan the files and add them to the Nextcloud database.
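```shell
# run as the web server user from the Nextcloud installation directory (where occ is located)
sudo -u www-data php occ files:scan myUserName --path myUserName/files/bug-cruncher
```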
We were then able to mount the Nextcloud directory on Linux and Windows machines via WebDAV. This let us process the data on a fast computer while sharing the results directly.
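For reference, Nextcloud serves each user's files over WebDAV under `<server>/remote.php/dav/files/<user>/`. The sketch below only assembles that URL; the server name `cloud.example.org`, the mount point `/mnt/bug-cruncher` and the function name `nextcloud_webdav_url` are placeholders, and the commented mount command assumes the `davfs2` package on Linux. On Windows, the same URL can typically be used when mapping a network drive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the WebDAV URL of a user's (sub)folder on a
# Nextcloud server. Folder names with special characters would still need
# URL encoding, which this sketch does not do.
nextcloud_webdav_url() {
  local base="${1%/}" user="$2" folder="${3:-}"
  printf '%s/remote.php/dav/files/%s/%s\n' "$base" "$user" "$folder"
}

# Example mount on Linux with davfs2 installed (not run here; needs root):
#   sudo mkdir -p /mnt/bug-cruncher
#   sudo mount -t davfs "$(nextcloud_webdav_url https://cloud.example.org myUserName bug-cruncher)" /mnt/bug-cruncher
```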
- masonry grid
- d3 treemap