bug-cruncher

We come to crop and sort insects.

We combine art and science to create beauty and knowledge.

Approach

Our approach was fully powered by open source tools that helped us download, share, and process high-resolution scans of insect specimen drawers provided by the Museum für Naturkunde Berlin for the Coding da Vinci Hackathon 2017 under a CC0 license.

After experimenting with Linux command line tools, we adopted ImageJ as our main tool for feature detection (finding the outlines of bugs and butterflies) and for cropping individual insect specimens from the high-resolution drawer scans, using a protocol similar to the one used for counting cells in microscopy images.

Loading and sharing the raw data

Our first idea, loading all the data into this GitHub repository, was quickly rejected: there are about 300 GB of files to process, and processing them generates at least another 100 GB of result data. That is likely too much for a GitHub repository (though we did not check GitHub's limits). Instead, we used a private Nextcloud instance with about 700 GB of disk space available. To maximise download speed, we downloaded the data directly into the Nextcloud data folder using wget. For this, we prepared a text file containing all download links (the provided metadata CSV file contained errors and was thus not directly usable).

We copied the text from http://gbif.naturkundemuseum-berlin.de/hackathon/Insektenkasten/High_resolution/ into a text file, used Sublime Text's multi-selection to create the wget input file data/highResUrls.txt, and executed wget in the Nextcloud data directory (configured in nextcloud/config/config.php) where the files should be located.

cd /var/lib/nextcloud/data/myUserName/files/bug-cruncher/highRes
wget -i highResUrls.txt
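The wget input file described above was built by hand in an editor. As a rough alternative, the same list could be generated with a small shell function. This is only a sketch: the function name make_url_list, the set of image extensions, and the assumption that file names contain no spaces are ours, not part of the original workflow.

```shell
# Hypothetical helper: turn a pasted directory listing into a wget input file.
# BASE_URL is the listing page named in this README.
BASE_URL="http://gbif.naturkundemuseum-berlin.de/hackathon/Insektenkasten/High_resolution/"

make_url_list() {
    # $1: text file containing the pasted listing
    # Split the text into whitespace-delimited tokens, keep those that end in
    # an (assumed) image extension, drop duplicates, and prefix the base URL.
    tr -s '[:space:]' '\n' < "$1" \
        | grep -E -i '\.(tif|tiff|jpg|jpeg|png)$' \
        | sort -u \
        | sed "s|^|${BASE_URL}|"
}

# Usage (not run here):
#   make_url_list pastedListing.txt > highResUrls.txt
```

Tokens such as dates or file sizes in the listing are dropped because they do not end in one of the assumed extensions.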

Even at the full 10 MB/s, the download took over 8 hours, so we had to cancel it and resume at a later point.

wget -N -i highResUrls.txt
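When a long download is interrupted, it can help to see which files from the list have not arrived yet. The sketch below is one way to check this; pending_urls is a hypothetical helper of ours, and it only tests whether a file with the expected name exists, so a partially downloaded file still counts as present.

```shell
# Hypothetical helper: print every URL from the list whose file is not yet
# present in the download directory.
pending_urls() {
    # $1: URL list (one URL per line), $2: download directory
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue      # skip blank lines
        file="$2/${url##*/}"           # local name = last path segment of the URL
        [ -e "$file" ] || printf '%s\n' "$url"
    done < "$1"
}

# Usage (not run here):
#   pending_urls highResUrls.txt . > pending.txt
#   wget -c -i pending.txt    # -c continues partially downloaded files
```

Because of the existence-only check, running wget with -c over the full list remains the safer way to complete files that were cut off mid-transfer.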

We then used the Nextcloud command line tool to rescan the files and add them to the Nextcloud database.

sudo -u www-data php occ files:scan myUserName --path myUserName/files/bug-cruncher

We were now able to mount the Nextcloud directory on Linux ~~and Windows~~ machines using WebDAV. This let us process the data on a fast computer while sharing the results directly.
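As an illustration of the WebDAV mount, Nextcloud exposes each user's files under remote.php/dav/files/USERNAME/ on the server. The sketch below builds that URL; the function name nextcloud_webdav_url, the server address cloud.example.org, the mount point, and the use of davfs2 are assumptions for illustration, since the exact setup used here is not recorded.

```shell
# Hypothetical helper: build the WebDAV endpoint for a user's files
# on a Nextcloud server.
nextcloud_webdav_url() {
    # $1: server base URL (a trailing slash is tolerated), $2: user name
    printf '%s/remote.php/dav/files/%s/\n' "${1%/}" "$2"
}

# Usage with davfs2 (not run here; needs root and the davfs2 package):
#   sudo mount -t davfs "$(nextcloud_webdav_url https://cloud.example.org myUserName)" /mnt/bug-cruncher
```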

HTML Galleries

masonry grid
d3 treemap
