Backups

Jatin Khilnani edited this page Jun 26, 2023 · 34 revisions

Rook

Our acoustic data are stored in the /volume1/data folder on Rook. Box backs up this folder to the lab's Box account folder, data-backup.

When new data comes in, we:

  • Create a folder on an external hard drive named after the first date that recorders from this deployment were brought back (referred to here as <upload_folder_path>)

  • Remove microSD cards from recorders and check recorder & card IDs against deployment sheet

  • Insert microSD cards into Robin one-by-one, slowly, making sure that each inserted card is mounted, and that the mounted card has the same name as is written on the microSD card.

  • Use Robin to transfer all data from microSD cards to the upload folder on the external HD

    • Use this tool: https://github.com/kitzeslab/sd-transfer - located on Robin at /Volumes/lacie/scripts/sd-transfer/SD-tool/

    • Open the program "iTerm2" on the computer; this gives you access to the command line

    • Copy and paste the following into the command line, then press Enter. This switches your current directory to the one that contains the SD transfer tool:

        cd /Volumes/lacie/scripts/sd-transfer/dist/SD-tool
      
    • Copy and paste the following into the command line, but do not press "Enter" yet. Note that the command ends with "-l" followed by a space; the upload folder path will be added to the end in the next steps.

        ./SD-tool -p MSD -l 
      
    • Go to the Finder window, then navigate to the upload folder you created. Select the upload folder by clicking on it once, without opening it. Then press command + option + c to copy this path.

    • Go back to iTerm. Paste the path that you copied from the Finder window in the previous step. Then press Enter. This will run the SD transfer tool. Wait for cards to finish uploading.

  • After all cards have been uploaded, check that the proper number of files have been uploaded for each card
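
The count check above can be scripted. A small sketch, assuming a helper of our own naming (the function name and example paths are placeholders, not part of the lab tooling):

```shell
# count_files DIR: print the number of regular files under DIR, so the
# card and its upload folder can be compared side by side.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Example usage (paths are placeholders):
# count_files /Volumes/MSD-0123
# count_files /Volumes/externalHD/2023-06-26/MSD-0123
```

If the two counts differ, re-copy that card before moving on.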

  • Create a .txt file describing the data. Save this file in the upload folder you created.

  • Create a spreadsheet (Excel or .csv format) describing what data from each microSD card should be used, with one row per SD card and three columns:

    • name: the name of the SD card
    • dropoff_date: the dropoff date of the SD card
    • pickup_date: the pickup date of the SD card

    Save this spreadsheet in the upload folder you created.

    • For deployments where all recorders should have the same amount of data: only use data taken after the last overall deployment date, and before the first overall return date (across all recorders). For other deployments: only use data taken after the recorder's deployment date and before its pickup date.
    • Identifying boundary days allows us to avoid people/handling noise, and makes sure all our recordings start and end on the same date
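
For example, a two-card deployment's spreadsheet might look like this (the card names and dates below are made up for illustration):

```csv
name,dropoff_date,pickup_date
MSD-0001,2023-05-01,2023-06-26
MSD-0002,2023-05-02,2023-06-25
```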
  • Make sure nobody in the lab is using the hard drive that you saved the data on. Eject that hard drive from Robin, carefully remove its power and connection cords, then connect the drive to Rook

  • Drag & drop the upload folder on Rook so it sits within /volume1/data/field-data/<location_code_name>, replacing <location_code_name> with the 4-letter code, e.g. pnre, ssfo.

  • After data are done uploading to Rook, use Mac's Disk Utility program to reformat all cards to the MS-DOS (FAT) format
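
The reformatting step can also be done from the command line with macOS's diskutil; the disk number and volume name below are placeholders, so double-check the device with `diskutil list` first, since erasing the wrong disk is destructive:

```shell
# Find the card's device identifier (e.g. /dev/disk4)
diskutil list

# Erase the card as MS-DOS (FAT). FAT volume names must be uppercase
# and at most 11 characters. Replace MSD0123 and disk4 with real values.
diskutil eraseDisk MS-DOS MSD0123 MBR /dev/disk4
```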

Robin

  • Project files are stored on lacie, attached to Robin
  • seagate2 and seagate3 are used for generic data storage and transfer.
  • Time machine backups of Robin's hard drive are stored on seagate1.
  • Backblaze backs up Robin and all external hard drives except for seagate1.
    • Warning: if an external drive has been detached for over 30 days, Backblaze will stop backing it up and delete the remote copy of its backups!

Emu

  • emu is the new storage structure attached to our computing workhorse, snowy. For details on how to use snowy and access emu, see Computing resources

  • There are two stages of backup performed for each dataset (field or annotated) as described below.

    1. Local: Once the datasets are collated and finalized per the dataset management protocol, they need to be saved to the corresponding storage location on emu. The paths are as follows:
      • Field: /media/emu/datasets/aru
      • Annotated: /media/emu/datasets/labeled

    If the datasets are being moved over from external locations or drives, the /media/emu/copyto_emu.sh script can be used to perform the copy, as it provides a log and statistics for the entire process, with verification before and after the copy.

    2. Cloud: All finalized datasets should then be copied to the lab's AWS Glacier bucket. This can be done using the /media/emu/copyto_cloud.sh script, run on snowy through the lab account. Details of script usage are provided in the docstring.
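
For reference, an upload like the one the script performs can be sketched with the AWS CLI, assuming the target is an S3 bucket with a Glacier storage class; the bucket name and paths below are placeholders, and the actual invocation is defined by copyto_cloud.sh:

```shell
# Recursively upload a finalized dataset into a Glacier-class S3 bucket.
# Replace the bucket name and prefix with the lab's actual values.
aws s3 cp /media/emu/datasets/aru/pnre_2023-06-26 \
    s3://lab-backup-bucket/datasets/aru/pnre_2023-06-26 \
    --recursive --storage-class GLACIER
```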