Dropbox

Matthew Gentzkow edited this page Mar 18, 2021 · 1 revision

Locations

We use the gslab_data team folder as our primary storage location. Raw files are stored in its raw subdirectory and intermediate files in intermediate.

We also use a Temporary file share to pass large temporary files and directories between one another when necessary. Nothing in this directory should be considered stable by default.

Permission control

  • The default permission for the gslab_data, raw, and intermediate directories is can edit, so that all members of the share can add new subdirectories to these folders.
  • Set stable subdirectories to can view.
  • To edit a stable directory, an administrator (MG, JMS, or an RA with admin responsibilities) will need to change the permission of that directory from can view to can edit. Note that all subdirectories of that directory will inherit this change.

Storage and revisions

We have theoretically unlimited potential storage on Dropbox, but our capacity at any point in time is limited. Before adding new assets, check that they will not exceed the cap. If they will exceed the cap, or if we seem to be running low on storage space, contact MG.

Dropbox supports unlimited version history for all files, unless the history is explicitly deleted. We should always use the most up-to-date version of each file. If there is an error with the most recent version, it should be fixed (possibly using the version history).

Local copies

We need to be careful that Dropbox does not automatically sync the entire gslab_data team folder to our hard drives. A full sync can fill up all available disk space and cause serious problems (including corrupting your User Account on macOS). We recommend three possible remedies: Dropbox sync settings, rclone, or partitioning. (GSLab RAs currently mix the first two.)

Dropbox sync settings

Use some combination of Dropbox's (team) selective sync and smart sync protocols.

  • Team selective sync allows a Dropbox admin to opt all users out of syncing any team folder. Team members can override.
  • Selective sync allows you to opt out of syncing specific folders and sub-folders.
  • Smart sync syncs only pointers to the hard drive and converts them to full files on access.

Pro: Team selective sync keeps Dropbox from flooding newly connected devices. Selective sync allows users to use Dropbox syncing but only on desired folders. Smart sync keeps the files in these synced folders small and easy to manage.

Con: Cannot be integrated into an SCons workflow. Team selective sync and smart sync require a paid Dropbox account.

Note: Some applications have trouble with spaces in file paths. If Dropbox is installed in your home directory as "Dropbox (GSLab)", we recommend creating a symbolic link named just "Dropbox" that points to it. From your home directory, run ln -s Dropbox\ \(GSLab\) Dropbox.
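The symbolic link step can also be wrapped in a small guard so that it is safe to rerun on a machine that is already set up. This is only a sketch: link_dropbox is a hypothetical helper name, and the optional argument (defaulting to your home directory) is an assumption added for convenience.

```shell
# link_dropbox [HOME_DIR]: create HOME_DIR/Dropbox as a space-free alias
# for "HOME_DIR/Dropbox (GSLab)". Hypothetical helper, not part of Dropbox.
link_dropbox() {
    home_dir="${1:-$HOME}"

    # Nothing to link if the team installation is not where we expect it.
    if [ ! -d "$home_dir/Dropbox (GSLab)" ]; then
        echo "No \"Dropbox (GSLab)\" folder in $home_dir" >&2
        return 1
    fi

    # Do not overwrite an existing file, folder, or link named Dropbox.
    if [ -e "$home_dir/Dropbox" ] || [ -L "$home_dir/Dropbox" ]; then
        echo "$home_dir/Dropbox already exists; leaving it alone" >&2
        return 0
    fi

    # Same relative link as the one-line command: ln -s Dropbox\ \(GSLab\) Dropbox
    (cd "$home_dir" && ln -s "Dropbox (GSLab)" Dropbox)
}
```

Calling link_dropbox with no arguments acts on your own home directory. Running it a second time leaves the existing link untouched.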

rclone

Use rclone to selectively download directories from Dropbox.
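As a rough sketch, downloading a single subdirectory might look like the following. It assumes an rclone remote named dropbox has already been created with rclone config, that the team folder is reachable at gslab_data on that remote, and that local copies live under ~/gslab_data. The remote name, both paths, and the helper name fetch_dir are illustrative assumptions, not lab conventions.

```shell
# Assumed names: adjust to match your own rclone configuration.
RCLONE_REMOTE="${RCLONE_REMOTE:-dropbox}"      # remote created via `rclone config`
LOCAL_ROOT="${LOCAL_ROOT:-$HOME/gslab_data}"   # where local copies are kept

# fetch_dir SUBDIR: copy one subdirectory of gslab_data to the local root.
fetch_dir() {
    subdir="$1"
    if [ -z "$subdir" ]; then
        echo "usage: fetch_dir SUBDIR" >&2
        return 2
    fi
    # `rclone copy` transfers new or changed files only and never deletes
    # files at the destination.
    rclone copy "$RCLONE_REMOTE:gslab_data/$subdir" "$LOCAL_ROOT/$subdir"
}
```

For example, fetch_dir raw/some_dataset (a placeholder path) would download only that directory and leave the rest of the team folder on Dropbox.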

Pro: Can be integrated into an SCons workflow to check synchronization with the upstream copy before a data build. Much more portable than setting up a new partition. Saves storage space.

Con: If the subdirectory has a lot of small files, rclone may take slightly longer than Dropbox sync. There's also a small fixed cost in starting work on a new project since you'll have to rclone a new directory instead of looking in your local Dropbox sync.
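The synchronization check mentioned under Pro could be sketched with rclone check, which compares two locations and exits with a non-zero status when they differ. The remote name dropbox, the gslab_data paths, and the helper name check_synced are assumptions for illustration. How the check is hooked into a given SCons build is left open here.

```shell
# check_synced SUBDIR: succeed only if every file in the upstream copy of
# SUBDIR is present and identical in the local copy. Hypothetical helper.
check_synced() {
    subdir="$1"
    remote="${RCLONE_REMOTE:-dropbox}:gslab_data/$subdir"   # assumed remote layout
    local_dir="${LOCAL_ROOT:-$HOME/gslab_data}/$subdir"     # assumed local layout

    # --one-way checks only that upstream files exist locally and match,
    # so extra local files do not count as a difference.
    if rclone check "$remote" "$local_dir" --one-way; then
        echo "$subdir is in sync with upstream"
    else
        echo "$subdir differs from upstream; re-download before building" >&2
        return 1
    fi
}
```

A build script could call check_synced on each input directory and stop if any call fails, so that a data build never starts from a stale local copy.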

Partitioning

Partition your hard drive into two parts: one with OS/User Account information and another where the Dropbox local sync resides. Since Dropbox automatically syncs newly added subfolders under team-shared folders, we use an email protocol under which members are notified of any new upload of more than 10 GB.
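To decide whether an upload falls under the email protocol, the size of the directory can be checked before it is added. The following is a minimal sketch. The helper name needs_notice is hypothetical, and the 10 GB limit is treated as 10 × 1024 × 1024 KB of disk usage as reported by du, which is an interpretation rather than a stated rule.

```shell
# needs_notice DIR [LIMIT_GB]: succeed if DIR uses more than LIMIT_GB
# (default 10) of disk space. Hypothetical helper.
needs_notice() {
    dir="$1"
    limit_gb="${2:-10}"

    # du -sk prints the total disk usage of DIR in 1024-byte units.
    size_kb="$(du -sk "$dir" | awk '{print $1}')"
    limit_kb=$((limit_gb * 1024 * 1024))

    [ "$size_kb" -gt "$limit_kb" ]
}
```

For example, needs_notice ~/new_upload && echo "Email the team before uploading" prints the reminder only for directories over the limit (~/new_upload is a placeholder path).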

Pro: There is automatic syncing when there are upstream changes.

Con: Packing problem. High setup cost for a new machine. Automatic download can still overfill available space. The email protocol is not robust.