
Releases

Ana Carolina Paixao de Queiroz edited this page Aug 1, 2022 · 13 revisions

A release is an export of all or part of a repository that we want to be able to access for some downstream use. Examples include:

  • Draft of a paper for posting or submission
  • Slide deck for a presentation
  • Cleaned data files to be used in other repositories or projects
  • Intermediate data files to be used in the current repository that we want to maintain in a stable and replicable state

Releases should include the files we intend to use downstream as well as sufficient information to reproduce those files. That typically means recording the commit at which the released files were produced and/or the state of the full repository at that time.

When the released files are PDFs of papers, talks, etc., we create releases on GitHub. When the released files are data or other large files, we create releases on Dropbox.

Creating a release

GitHub Release

A release on GitHub consists of a tag pointing to a specific commit, a zip archive of the repository (excluding any large files handled by Git LFS), and any additional binary files attached by hand. To create a GitHub release:

  • Navigate to the Releases section of the repository on GitHub in a web browser
  • Choose "Draft a new release"
  • Choose a descriptive title (e.g., "Econometrica Second Submission 10_2020") and a descriptive short tag (e.g., "Ecma2ndSubmission")
  • Attach the released files (e.g., PDF of a paper or talk) as binaries
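The same release can also be created from the command line with the GitHub CLI (gh). This is a minimal sketch, assuming gh is installed and authenticated with gh auth login; the function name github_release and the DRY_RUN switch are hypothetical conveniences, not part of our tooling. By default it only prints the gh release create command it would run, so the tag, title, target commit, and attached files can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: create a GitHub release with the GitHub CLI (gh).
# github_release and DRY_RUN are hypothetical; gh must be installed and logged in.
# Usage: github_release TAG TITLE COMMIT [FILE...]
github_release() {
  local tag="$1" title="$2" commit="$3"
  shift 3                                   # remaining arguments are files to attach
  local cmd
  # --target sets the commit the new tag points to; --notes records it in the release text
  cmd=(gh release create "$tag" "$@" --title "$title" --target "$commit" --notes "Commit: $commit")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%q ' "${cmd[@]}"                # print the command, shell-quoted, without running it
    printf '\n'
  else
    "${cmd[@]}"
  fi
}

# Example (paper.pdf is a placeholder file name):
# DRY_RUN=0 github_release Ecma2ndSubmission "Econometrica Second Submission 10_2020" \
#   "$(git rev-parse HEAD)" paper.pdf
```

Recording the commit hash in the release notes keeps the link between the attached PDF and the code that produced it even if the tag is later moved.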

Dropbox Release

A release on Dropbox consists of an export of a file, directory, or entire repository. We save these within the releases/ directory of the GS Lab Dropbox, in a subdirectory per project, with each release stored in a folder named for the date the release was made (YYYY_MM_DD), which can be cross-referenced with the commit history on GitHub. If more than one release is made on the same day, the script below appends a letter (_b, _c, and so on) to the folder name.

Our preferred tool for creating these releases is rclone. The script below shows one way to do this in practice. Before running it, set RCLONE_DROPBOX to the name of your rclone Dropbox remote, PROJECT_NAME to your project's GitHub name, and EXT_NAME to either full_repo (for the whole repository) or an informative name such as cleaned_data. Run the script from the directory you want to release: the current working directory is copied in full. If you only want to copy certain files or types of files, consult the rclone documentation on filtering. The script also writes a release_readme.md file recording the repository's remote URL, branch, and commit hash; this file is copied along with the release and then deleted locally.

	# Edit these settings before running
	PROJECT_NAME="<YOUR PROJECT NAME>"    # the project's GitHub name
	EXT_NAME="<NAME OF SUBDIRECTORY>"     # full_repo, or e.g. cleaned_data
	RCLONE_DROPBOX=dropbox                # name of your rclone Dropbox remote

	# Record the state of the repository in release_readme.md
	URL=$(git remote get-url origin)
	BRANCH_NAME=$(git rev-parse --abbrev-ref HEAD)
	HASH_CODE=$(git rev-parse HEAD)
	printf '## URL\n%s\n## Branch\n%s\n## Hash\n%s\n' "$URL" "$BRANCH_NAME" "$HASH_CODE" > release_readme.md

	DATE=$(date +%Y_%m_%d)
	BASE="$RCLONE_DROPBOX:release/$PROJECT_NAME/$EXT_NAME"

	# Use DATE if that folder does not exist yet, otherwise the first free DATE_b ... DATE_z
	DEST=""
	if ! rclone lsf "$BASE/$DATE" > /dev/null 2>&1; then
		DEST="$BASE/$DATE"
	else
		for letter in {b..z}; do
			if ! rclone lsf "$BASE/${DATE}_$letter" > /dev/null 2>&1; then
				DEST="$BASE/${DATE}_$letter"
				break
			fi
		done
	fi

	if [ -n "$DEST" ]; then
		echo "Copying to: $DEST"
		rclone mkdir "$DEST"
		rclone copy --skip-links ./ "$DEST"
		echo "Done copying"
	else
		echo "No free release folder left for $DATE" >&2
	fi
	rm release_readme.md

If you do not have rclone set up, see "How do I set up rclone?" in the FAQ below.
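The folder-naming rule used by the copy script can also be factored into a small function that is easy to check without touching Dropbox. This is a sketch, and the name next_release_name is hypothetical: it takes the date and the names of the folders that already exist, and prints the first free name. Listing the parent folder once with rclone lsf --dirs-only replaces one rclone call per candidate letter.

```shell
#!/usr/bin/env bash
# next_release_name DATE [EXISTING_FOLDER...]
# Hypothetical helper implementing the naming rule of the copy script:
# print DATE if no existing folder has that name, otherwise the first free
# DATE_b ... DATE_z. Returns 1 (printing nothing) if all 26 names are taken.
next_release_name() {
  local date="$1"
  shift
  local candidate name taken
  for candidate in "$date" "${date}_"{b..z}; do
    taken=0
    for name in "$@"; do
      # rclone lsf prints directories with a trailing slash; ignore it
      if [ "${name%/}" = "$candidate" ]; then
        taken=1
        break
      fi
    done
    if [ "$taken" = "0" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example use with the variables from the copy script
# (relies on release folder names containing no spaces):
# BASE="$RCLONE_DROPBOX:release/$PROJECT_NAME/$EXT_NAME"
# NAME=$(next_release_name "$DATE" $(rclone lsf --dirs-only "$BASE" 2>/dev/null))
# rclone copy --skip-links ./ "$BASE/$NAME"
```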

Replication packages

For many, but not all, releases we want an associated replication package that can be posted or distributed online and that contains everything needed to produce the relevant paper and slides. A replication package should not contain intermediate outputs or unnecessary code. It can be prepared locally, or in a branch to allow for collaboration.

  1. Remove all files and directories besides code/, external.txt, input.txt, and make.py. The only exception is that paper_slides/output should still contain the final paper and slides.
  2. Ensure that all comments within the LyX documents are removed.
  3. Delete any unused code files and remove them from the corresponding make.py.
  4. Replace any git submodules (e.g., gslab_make) with directories committed directly to the repository.
  5. Ensure that there is a well-documented README explaining how to obtain any relevant data that we do not have permission to share.
  6. Move the repo as it currently stands to a new repo and delete the git history by removing .git/. Leave .gitignore and .gitattributes within the repo.
  7. Initialize a new git history by using the command git init.
  8. Zip the folder.
  9. Run the entire repository with the command python run_all.py to confirm that it produces the appropriate output. If it does not run successfully, make your changes in the zipped version of the repository, since it does not yet contain the intermediate files created by the run, and then return to step 7.
  10. Attach the zipped folder as a binary to your release.
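Step 1 can be scripted one module directory at a time. This is a minimal sketch, assuming each module (for example paper_slides/) should keep only code/, external.txt, input.txt, and make.py, plus any extra names passed as arguments, such as output for paper_slides. The function name prune_module and the DRY_RUN switch are hypothetical. By default it only lists what it would delete, and hidden files such as .gitignore are left alone because the shell glob does not match them.

```shell
#!/usr/bin/env bash
# prune_module DIR [EXTRA_NAME...]
# Hypothetical helper for step 1: inside DIR, delete every top-level entry except
# code/, external.txt, input.txt, make.py and any EXTRA_NAME given.
# Hidden entries (e.g. .gitignore) are not matched by the glob and are kept.
# With DRY_RUN=1 (the default) it only prints what it would delete.
prune_module() {
  local dir="$1"
  shift
  local entry base keep k
  for entry in "$dir"/*; do
    [ -e "$entry" ] || continue            # glob matched nothing
    base=$(basename "$entry")
    keep=0
    for k in code external.txt input.txt make.py "$@"; do
      if [ "$base" = "$k" ]; then keep=1; fi
    done
    [ "$keep" = "1" ] && continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would remove: $entry"
    else
      rm -rf -- "$entry"
    fi
  done
}

# Example: preview first, then prune, keeping paper_slides/output in full
# prune_module paper_slides output
# DRY_RUN=0 prune_module paper_slides output
```

Keeping output as a whole is coarse: anything other than the final paper and slides inside paper_slides/output still has to be removed by hand.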

Using releases

Dropbox data

If the repository incorporates data stored on Dropbox, make sure that the outside data is in the proper state for any re-run. Dropbox has a "rewind" feature, so you can go to the relevant folders and restore them to their state on the date the release was made. Rewinding affects all users of the Dropbox folder, so be sure to restore the most up-to-date version after use.

In certain cases, the release should also link to a corresponding data snapshot on Dropbox. The snapshot is a folder gslab_data_snapshots/[name-of-repo]-[release-commit-#]. The folder should contain a copy of the repository at the release (repo) and a copy of the raw Dropbox folders used by the repository (data_dropbox). Ideally, the folders and files should be compressed at the highest level possible, and each .zip file should contain no more than 50 GB of uncompressed data. A lab member with Team admin access should add the folder to gslab_data_snapshots using the Admin console and confirm that all subfolders are view-only for users.
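The naming convention and the size limit can be checked with two small shell functions. Both are hypothetical helpers, sketched under stated assumptions: snapshot_name builds the folder name from the remote URL and the release commit (pass whichever form of the hash the lab uses), and under_size_limit compares a folder's disk usage, as reported by du -sk, against a limit in kilobytes that defaults to 50 GiB. Disk usage only approximates the uncompressed size that ends up in a .zip file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing a Dropbox data snapshot.

# snapshot_name REMOTE_URL COMMIT
# Prints gslab_data_snapshots/<name-of-repo>-<commit> for https or ssh remote URLs.
snapshot_name() {
  local repo
  repo=$(basename "$1" .git)          # ".../my-project.git" -> "my-project"
  printf 'gslab_data_snapshots/%s-%s\n' "$repo" "$2"
}

# under_size_limit DIR [LIMIT_KB]
# Succeeds if DIR uses at most LIMIT_KB kilobytes on disk (default 50 GiB).
# Disk usage only approximates the uncompressed size of the data.
under_size_limit() {
  local limit_kb="${2:-$((50 * 1024 * 1024))}"
  local used_kb
  used_kb=$(du -sk "$1" | cut -f1)
  [ "$used_kb" -le "$limit_kb" ]
}

# Example, run from the repository at the release commit:
# snapshot_name "$(git remote get-url origin)" "$(git rev-parse HEAD)"
# under_size_limit data_dropbox || echo "data_dropbox is over 50 GiB; split it across several .zip files"
```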

Outside repositories

When a project needs output from another repository, we recommend using stable output stored in the releases folder. When the two projects must be developed side by side, rsync is useful and can be added to the make files to ensure that the data is up to date before each release.
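A downstream project can pin one specific dated release rather than whatever is latest. The sketch below assumes the folder layout used by the copy script above (release/PROJECT_NAME/EXT_NAME/DATE on the rclone remote); pull_release and DRY_RUN are hypothetical names. For projects developed side by side, a make-file step such as rsync -a ../other_repo/output/ input/other_repo/ plays the same role, with ../other_repo as a placeholder path.

```shell
#!/usr/bin/env bash
# Sketch: copy one dated Dropbox release into a local input folder.
# pull_release and DRY_RUN are hypothetical; the path layout mirrors the copy script.
# Usage: pull_release REMOTE PROJECT EXT_NAME DATE DEST
pull_release() {
  local remote="$1" project="$2" ext="$3" date="$4" dest="$5"
  local cmd
  # rclone copy creates DEST if needed and skips files that are already identical
  cmd=(rclone copy "$remote:release/$project/$ext/$date" "$dest")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%q ' "${cmd[@]}"                # print the command without running it
    printf '\n'
  else
    "${cmd[@]}"
  fi
}

# Example (my_project and the date are placeholders):
# DRY_RUN=0 pull_release dropbox my_project cleaned_data 2022_08_01 input/my_project
```

Hard-coding the release date in the downstream make file means a new upstream release changes nothing until someone deliberately updates that date.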

FAQ

How do I set up rclone?

On your personal machine, run the command brew install rclone if you do not already have rclone installed. On Sherlock, make sure that module load rclone is in your ~/.bash_profile. In either case, configure a Dropbox remote by following rclone's Dropbox instructions (rclone config). Recommended remote names are dropbox and gsbox.

How do I use Dropbox rewind?

Dropbox's own help documentation on rewinding a folder is quite helpful.