
Beginners introduction to Git and GitHub

Overview

Morning session (Sergio and Mark) https://tinyurl.com/git-intro-zoology:

Afternoon session (Mark and Anne) https://tinyurl.com/git-course: markdown, pages and wikis; creating good README files; issue tracking; software licenses

Background and motivation

  • When repeating / reviewing previous work, researchers greatly benefit from having access to detailed documentation of the methods used.

  • Keeping track of the different versions of your project is one more way of making your work reproducible.

  • Some of the manual approaches to version control have clear limitations:

(http://phdcomics.com/comics/archive_print.php?comicid=1531)

  • The scientific community is beginning to consider the value of peer reviewing computer code - see Nature Methods August 2018 editorial Easing the burden of code review:

An increasing share of modern research relies on analytical code and software. In turn, a good deal of irreproducible research can be attributed to computational tools that are difficult to decipher, use or recreate. Through the concerted efforts of computational researchers and stricter guidelines from publishers, the culture of scientific software is now more open and geared toward dissemination than ever [...]

  • GitHub is becoming the go-to site for releasing / sharing the code associated with a manuscript or the scripts developed within a project.

What is version control? What is Git? What is GitHub?

Version control is the management of changes (a.k.a. revisions) to any type of information:

  • Simple versioning: adding v1.0, v1.1, v1.2, v2.0 ... to file names
  • Basic tools: Google Drive, Dropbox ...
  • Advanced tools: Git

The first version control systems were created by groups writing software. Fortunately, they can now be used not only for computer code but for any type of file 😄

(adapted from http://lhzuigao.com/309note.html)

Advantages of distributed (right) over centralised (left) version control systems include:

  • If the central repository (server) crashes, it can be restored from any of the local repositories created e.g. by the researcher, collaborator or group leader.
  • Each person can make changes to their local repository offline, then integrate those changes into the central repository (server) once they are back online.
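The recovery scenario can be tried out locally. This is a minimal sketch, assuming a scratch directory and placeholder names (researcher, collaborator, server.git, recovered.git, "Example User") that are not part of the workshop; a bare repository on disk stands in for the central server.

```shell
# Scratch area: nothing here touches GitHub or your real projects.
work=$(mktemp -d)
cd "$work"

# The researcher starts a repository and commits while offline.
git init -q researcher
cd researcher
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "step 1" > analysis.txt
git add analysis.txt
git commit -q -m "Add analysis"
cd "$work"

# A bare repository plays the role of the central server.
git clone -q --bare researcher server.git

# A collaborator clones the server and receives the full history.
git clone -q server.git collaborator

# Simulate a server crash, then rebuild the server from the collaborator's clone.
rm -rf server.git
git clone -q --bare collaborator recovered.git
git --git-dir=recovered.git log --oneline
```

The final log should list the same commit the researcher made, because every clone carries the complete history rather than just the latest files.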

Git is a distributed version control system that keeps track of the history of changes made to your scripts and files and lets you compare them. It allows groups of people to work on the same documents at the same time without stepping on each other's toes. It was created by Linus Torvalds in 2005 for the development of the Linux kernel. It is free and open source and helps you with:

  • Creating repositories to host your projects using the command-line
  • Tracking changes in the files and folders within your repositories

GitHub is a platform to share and showcase your work online with collaborators and a wider audience. It helps you build projects that are collaborative, well documented and version-controlled. It provides you with:

  • A place to host and backup your repositories online
  • A nice web interface to your repositories
  • A strategy to collaborate with colleagues

Versions in Git and GitHub are called commits. Each commit is identified by a hash, usually shown in abbreviated form, e.g. 60363b1, and is associated with a timestamp and the person making the change. Commits can be compared, restored and, for some types of files (mainly plain text), merged.
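To make this concrete, here is a small sketch of listing, comparing and restoring revisions from the command line. It assumes a throwaway repository in a scratch directory; the file name notes.txt and the identity "Example User" are placeholders.

```shell
# Throwaway repository in a scratch directory.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# First revision.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
first=$(git rev-parse --short HEAD)       # abbreviated hash of the first commit

# Second revision.
echo "second draft" > notes.txt
git commit -q -a -m "Revise notes"

# Each revision with its abbreviated hash, author, date and message.
git log --format='%h  %an  %ad  %s'

# Compare the two revisions of the file.
git diff "$first" HEAD -- notes.txt

# Restore the file as it was in the first revision.
git checkout "$first" -- notes.txt
cat notes.txt
```

After the last step, notes.txt contains "first draft" again while the history still holds both revisions.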

There are other version control tools similar to Git, e.g. Subversion (svn). There are also other online platforms similar to GitHub for sharing and collaborating on code, e.g. GitLab.

How can you use Git and GitHub? How can they be useful for you?

The interfaces to Git and GitHub are:

  • Via the command-line using git

  • GitHub Desktop (available for Mac and Windows)

(https://programminghistorian.org/lessons/getting-started-with-github-desktop)

For this workshop, we will use Git commands and GitHub's online interface.

Examples

In the context of our research group

  • Communication is key as most projects have both experimental and computational leaders

  • Building from the classical ways of sharing - conversations/meetings, email, Dropbox, shared folders ... we want to build an environment where:

    • Computational colleagues can share code, figures and tables, review others' work and get credit for their collaborative work
    • Experimental colleagues can follow computational developments, access results and learn methods of data analysis
  • And ideally avoiding situations like ...

(http://phdcomics.com/comics.php?f=1689)

  • Perhaps a happier lifetime for a research project:

(https://github.com/semacu/20170703_GitHubintheLab_CRUK-CI)

Public (free) and private repositories

If you want to start creating repositories in GitHub, you first need to open an account:

  • Public repositories are free, and can be browsed and downloaded by anyone
  • Private repositories have associated costs - see pricing of plans. The developer plan costs $7/month but it is free if you are a student or an academic

Alternatively, GitLab uses a different business strategy with free private repositories and cost plans for public ones. There are other alternatives e.g. Bitbucket.

Markdown

  • GitHub uses Markdown for text editing: a lightweight plain-text formatting syntax (bold, italics, checkboxes, lists, etc.) that is rendered as formatted pages online (like HTML but easier to write). You can use this syntax in text files (.md), commit messages, issues, blog posts, and more.

  • Markdown is important because GitHub automatically renders anything written in Markdown. This can be specific files (e.g. README) or your comments and issues.

  • Some examples of Markdown syntax are available here.
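As a quick illustration of the syntax, the sketch below writes a small README.md from the Terminal. The file content is made up for this example, and the shell here-document is used only so the snippet can be pasted as a whole; typing the same text in any editor works equally well.

```shell
# Write an example README.md into a scratch directory.
work=$(mktemp -d)
cd "$work"

cat > README.md <<'EOF'
# My first repository

This is a *test* repository with some **bold** text.

## To do

- [x] Create the repository
- [ ] Add the analysis script

1. First step
2. Second step

See [GitHub](https://github.com) for more.
EOF

# Show the plain text that GitHub would render as a formatted page.
cat README.md
```

Lines starting with # become headings, single and double asterisks give italics and bold, and the bracketed x marks a ticked checkbox.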

Practical session: working with Git and GitHub

We have four possible tutorials:

Create a GitHub account

  • Go to https://github.com
  • Fill in your Username, Email and Password. Then click on the green button "Sign up for GitHub".

  • Choose your personal plan page. Select "Free plan" and then click on "Continue".

  • Tailor your experience page. Choose the boxes that apply to you and click on "Submit". Otherwise, just go to "skip this step".

  • You have created a GitHub account! 😄

Create your first repository

  • If you are not already signed in, sign in to GitHub using the Username/Email and Password created before.
  • Click on the top-right "avatar icon" and select "Your profile". Have a quick browse through your page.

  • Click on the top-right "+" icon and select "New repository". If you are asked to verify your email address, look for the email that GitHub sent to the address provided before and click on "Verify email address".

  • Create a new repository page. Fill in a "Repository name", e.g. "my_first_repository" or "my_analysis_script". Write a short description of your repository e.g. "This is a test repository". For now choose "Public" and select the box to initialize this repository with a README. Finally, click on "Create repository".

  • You created your first repository! 🚀

Explore your first repository and GitHub account

Your first repository

  • Click on README.md and go to the right pencil "Edit this file". Type anything to change the file, e.g. "GitHub is fun!".

  • Scroll down. Enter a commit message, e.g. "My first update", and select the radio button "Commit directly to the master branch". Then click on "Commit changes". Voilà!

  • To view your history of commits for README.md, click on README.md and then on the "History" button on the right.
  • Alternatively, to view your history of commits for your first repository, click on the name of your repository and select the tab depicting a small clock and the number of commits next to it.

Bonus points (5 min):

  • Try to create a new file
  • In your new repository, have a look at the "Settings" tab, explore "Collaborators" and try to add the person sitting next to you.

Your GitHub account

  • Click on your top-right "avatar" icon and select "Settings".

  • Explore the tabs "Profile", "Account" and "Emails".

Key glossary:

  • Repository: it can be thought of as a project folder. A repository contains all of the project files, issues, wikis and more. It also stores the history and versions of each file.

  • Commit: equivalent to saving your changes to a file. When you commit you usually include a brief description of the changes you made so you can identify versions later if you want to undo a change.

  • Branch: an identical copy of a project at a particular point in time kept separate from the 'master' branch (primary copy). This keeps your code in the 'master' branch safe while you make changes and experiment with code on the new branch. You can merge your new branch back into the 'master' branch when you want to publish your changes.

  • Master: the default branch in your repository.

  • Collaborator: someone with read and write privileges to a repository as approved by the repository owner.
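The branch entry in the glossary can be sketched on the command line as follows. This is only an illustration in a scratch repository; the branch name experiment, the file script.txt and the identity are placeholders.

```shell
# Throwaway repository in a scratch directory.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# A first commit on master.
echo "stable analysis" > script.txt
git add script.txt
git commit -q -m "Add script"

# Create a branch and experiment there.
git checkout -q -b experiment
echo "experimental step" >> script.txt
git commit -q -a -m "Try an experimental step"

# Back on master the file is unchanged.
git checkout -q master
cat script.txt

# Merge the branch once you are happy with it.
git merge -q experiment
cat script.txt
```

The first cat prints only the stable line; after the merge, master also contains the experimental step.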

Making changes using Git in the command-line

Check if Git is already installed on your computer, otherwise install it

  • On a Mac, go to Finder -> Applications -> Utilities -> Terminal and type git --version.

    • If you get as output something like git version 2.5.4 (Apple Git-61), then Git is already installed -> Jump to the next section.
    • If you get something like git: command not found, keep reading.
  • To install Git on a Mac, follow one of these strategies:

    1. When running one of the commands git --version, git config or xcode-select --install, you may be prompted to install the developer command line tools. Accept the offer and continue with "Install".
    2. Go to https://git-scm.com/downloads and download git. Double click on the downloaded executable and follow instructions.
    3. If you have homebrew installed, type the following in the Terminal: brew install git.
  • Check the following for installing in Windows or Linux.
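The check described above can also be written as a small script, which may be handy when setting up several machines. This is a sketch; the variable name git_status is arbitrary.

```shell
# Report whether the git command is available on this machine.
if command -v git >/dev/null 2>&1; then
    git_status="installed: $(git --version)"
else
    git_status="not found, see the installation options above"
fi
echo "Git $git_status"
```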

Tell Git who you are (your GitHub username) and what your email address is

Example:

cd ~/Desktop
git config --global user.name "semacu"
git config --global user.email "sermarcue@gmail.com"

Remember to change "semacu" and "sermarcue@gmail.com" to the username and email you used when creating the GitHub account above.

Check:

git config --list
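The --global flag applies the identity to every repository on your computer. If you ever need a different identity for a single project, the same settings can be stored per repository. A minimal sketch, assuming a scratch repository and a placeholder identity:

```shell
# Throwaway repository in a scratch directory.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# Store the identity for this repository only (written to .git/config).
git config --local user.name "Example User"
git config --local user.email "user@example.com"

# Read individual values back.
git config --get user.name
git config --get user.email

# List only the settings stored in this repository.
git config --local --list
```

Settings stored locally take precedence over the global ones inside that repository.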

Clone the repository created before

git clone https://github.com/semacu/my_first_repository.git
cd my_first_repository
ls -lh

Your first repository created using GitHub (my_first_repository) is now a local repository located in your Desktop folder. Remember what we discussed earlier about Git being a distributed version control system.

Tell Git the URL of your remote repository for pulling and pushing commits

cd ~/Desktop/my_first_repository
git remote set-url origin https://semacu@github.com/semacu/my_first_repository.git

Check:

git remote -v
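To see what set-url changes without touching your real clone, you can try it in a scratch repository. The example.com URLs below are placeholders; adding or changing a remote URL does not contact the server.

```shell
# Throwaway repository in a scratch directory.
work=$(mktemp -d)
cd "$work"
git init -q local_copy
cd local_copy

# Register a remote called "origin" (placeholder URL).
git remote add origin https://example.com/old/project.git
git remote -v

# Point "origin" at a different URL and check the result.
git remote set-url origin https://example.com/new/project.git
git remote -v
```

The second listing shows the new URL for both fetch and push.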

Make a change to the README.md file using your favourite text editor

  • In your Desktop, use Finder to go to the cloned folder and open README.md with your favourite text editor, e.g. TextEdit.
  • Change README.md, e.g. add a new line "This is my second line of script" and save changes.
  • Now, go back to the Terminal and check how changes are tracked by Git:
cd ~/Desktop/my_first_repository
git status

The status of README.md is modified but the changes are not staged (red).
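Besides git status, two related commands are useful at this point: git status --short gives a compact summary and git diff shows exactly which lines changed. The sketch below reproduces the situation in a scratch repository, with a placeholder identity, so you can experiment freely.

```shell
# Throwaway repository in a scratch directory.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# Commit a first version of README.md.
echo "My first line" > README.md
git add README.md
git commit -q -m "Add README"

# Edit the file without staging the change.
echo "This is my second line of script" >> README.md

# Compact status: an "M" in the second column means modified, not staged.
git status --short

# Line-by-line view of the unstaged change.
git diff
```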

Stage and commit the change

Staging:

git add README.md
git status

The status of README.md is modified and now the changes are staged (green) and ready to commit.

Committing:

git commit -a -m "My second update"
git status
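A note on the -a flag used above: it asks Git to stage all modified tracked files automatically before committing, so it also works without a preceding git add, but it does not pick up brand-new files. The sketch below shows the difference in a scratch repository; the file names and identity are placeholders.

```shell
# Throwaway repository in a scratch directory.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# A file that Git already tracks.
echo "line 1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

# Modify the tracked file and create a brand-new one.
echo "line 2" >> tracked.txt
echo "new" > untracked.txt

# -a stages the modified tracked file, but not the new file.
git commit -q -a -m "Update tracked file"

# The new file is still listed as untracked ("??").
git status --short
```

New files always need an explicit git add before they can be committed.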

Push changes to your online GitHub repository

git push origin master

Now check that your change to README.md made it to your online GitHub repository.

Bonus points (5 min):

  • Make another change to README.md using the online GitHub repository and pull the change to your local repository (Hint: use git pull).

Key glossary:

  • Clone: a copy of an online repository on your local computer so you can make edits on your own personal copy without having to be online. You can sync changes between your clone and the remote copy (GitHub) when you are online.

  • Remote: a version of your project repository that is hosted on the Internet or network somewhere (e.g. copy of your project on GitHub vs. on your local computer).

  • Stage and commit:

(https://git-scm.com/book/en/v2/Getting-Started-Git-Basics)

  • Push: sends the recent commit history from your local repository up to GitHub.

  • Pull: grabs any changes from the remote GitHub repository and merges them into your local repository.
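The push and pull round trip can be rehearsed entirely offline. In this sketch a bare repository on disk (hosted.git) stands in for GitHub, and two clones (laptop and office) stand in for two computers; all of these names and the identity are placeholders.

```shell
# Scratch area with a bare repository standing in for GitHub.
work=$(mktemp -d)
cd "$work"
git init -q --bare hosted.git

# First computer: create a commit and push it.
git init -q laptop
cd laptop
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "v1" > README.md
git add README.md
git commit -q -m "First commit"
git remote add origin "$work/hosted.git"
git push -q origin master

# Second computer: clone, change the file and push.
cd "$work"
git clone -q -b master hosted.git office
cd office
git config user.name "Example User"
git config user.email "user@example.com"
echo "v2" >> README.md
git commit -q -a -m "Second commit"
git push -q origin master

# First computer: pull the new commit.
cd "$work/laptop"
git pull -q --ff-only origin master
cat README.md
```

After the pull, both clones point at the same commit and README.md on the first computer contains both lines.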

Extras

  • Create an issue
  • Create a new branch, open a pull request and merge the newly created branch with the master branch
  • Fork a repository from another user e.g. https://github.com/githubtraining/hellogitworld, make some changes to the README.txt and create a pull request

Outlook

  • This workshop gave a basic overview of the core functionality of Git and GitHub

  • Next steps for computational reproducibility, going back to the Nature Methods August 2018 editorial Easing the burden of code review:

[...] Yet, even in the era of Git repositories, peer reviewing code can be frustrating and time consuming [...] Computational tools are complex objects that depend on many components to run. Dependencies include the operating system, programming language, external code libraries, configuration settings and run parameters. Reproducing these conditions is made even harder by the fact that components typically exist in multiple versions. Many come with their own prerequisites, creating a maddening rabbit hole of dependencies on dependencies [...]

In other words, the next step is being able to execute code directly online (in the cloud). Two new resources are beginning to make a difference in this area - check them out 😉

  • Code Ocean: Nature Methods, Nature Biotechnology and Nature Machine Intelligence have launched a trial to facilitate the peer review of computational methods and to improve their reproducibility
  • Binder

The End

Many thanks for your attention! Enjoy Git and GitHub! :octocat:

Feedback: please complete the following short survey

Any later questions about this workshop or the materials? Just email: sermarcue@gmail.com or mark.fernandes@cruk.cam.ac.uk

