haxorpoda/bug-cruncher

bug-cruncher

We come to crop and sort insects.

We combine art and science to create beauty and knowledge.

Approach

Our approach was fully powered by open source tools that helped us download, share, and process high-resolution scans of insect specimen drawers provided by the Museum für Naturkunde Berlin for the Coding da Vinci Hackathon 2017 under a CC0 license.

After experimenting with Linux command line tools, we adopted ImageJ as our main tool for feature detection (finding the outlines of bugs and butterflies) and for cropping individual insect specimens from the high-resolution drawer scans, using a protocol similar to the one used for counting cells in microscopy images.

Loading and sharing the raw data

Our first idea, loading all the data into this GitHub repository, was quickly rejected: about 300 GB of files need to be processed, and they generate at least another 100 GB of result data. That is probably too much for a GitHub repository (though we did not check GitHub's limits). Instead, we used a private Nextcloud instance where we had about 700 GB of disk space available. To maximise the download speed, we downloaded the data directly into the data folder of the Nextcloud instance using wget. For this, we prepared a text file containing all download links (the provided metadata CSV file contained errors and was thus not directly usable).

We copied the text from http://gbif.naturkundemuseum-berlin.de/hackathon/Insektenkasten/High_resolution/ into a text file, used Sublime Text's multi-selection to create the wget input file data/highResUrls.txt, and executed wget in the Nextcloud data directory (configured in nextcloud/config/config.php) where the files should be located.

cd /var/lib/nextcloud/data/myUserName/files/bug-cruncher/highRes
wget -i highResUrls.txt
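
The URL list was built by hand, but the same step could be scripted. The following is only a minimal sketch, not what we ran: it assumes the directory listing has been saved locally (listing.html is a hypothetical file name), that the page is a plain auto-generated index whose files appear as relative href="..." links, and that sort links, parent-directory links, and subdirectories should be skipped. The function name extract_urls is likewise made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a saved directory listing into a wget input file.
# Assumption: files are linked with relative href="..." attributes.

extract_urls() {
    # $1: saved HTML listing, $2: base URL (must end with a slash)
    grep -o 'href="[^"]*"' "$1" |                 # pull out every href attribute
        sed -e 's/^href="//' -e 's/"$//' |        # strip the href="..." wrapper
        grep -v -e '^?' -e '^/' -e '/$' -e '://' |  # drop sort links, absolute paths, directories, external links
        sed "s|^|$2|"                             # prefix each file name with the base URL
}

# Usage (listing.html is an assumed file name):
# extract_urls listing.html \
#   "http://gbif.naturkundemuseum-berlin.de/hackathon/Insektenkasten/High_resolution/" \
#   > highResUrls.txt
```

The output should still be checked by eye before starting a multi-hour download, since a listing page with a different layout would need different filters.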

Even at the full 10 MB/s, the download took over 8 hours (about 300 GB at 10 MB/s is roughly 8.3 hours), so we had to cancel it and resume at a later point.

wget -N -i highResUrls.txt
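
After an interrupted download it can be useful to see which files from the list have not arrived yet. The sketch below is an illustration under stated assumptions, not part of our original workflow: it assumes one URL per line, that wget stored each file under the last path component of its URL (its default behaviour), and that the file names contain no URL-encoded characters. The function name list_missing is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the URLs whose target file is not yet present.
# It only detects files that are absent, not files that are incomplete.

list_missing() {
    # $1: URL list (one URL per line), $2: download directory
    while IFS= read -r url; do
        [ -n "$url" ] || continue            # skip blank lines
        name=${url##*/}                      # last path component = local file name
        [ -e "$2/$name" ] || printf '%s\n' "$url"
    done < "$1"
}

# Usage: list_missing highResUrls.txt . > missing.txt
```

An empty result only means every file exists; partially downloaded files would still need a size comparison or a rerun of wget to be caught.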

We then used the Nextcloud command line tool occ to rescan the files and add them to the Nextcloud database.

sudo -u www-data php occ files:scan myUserName --path myUserName/files/bug-cruncher

We were now able to mount the Nextcloud directory on Linux and Windows machines using WebDAV. This enabled us to use a fast computer to process the data while directly sharing the results.

HTML Galleries

  • masonry grid
  • d3 treemap
