<h1>Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Version-control" data-toc-modified-id="Version-control-1">Version control</a></span><ul class="toc-item"><li><span><a href="#git" data-toc-modified-id="git-1.1">git</a></span><ul class="toc-item"><li><span><a href="#Installation" data-toc-modified-id="Installation-1.1.1">Installation</a></span></li><li><span><a href="#Configuration" data-toc-modified-id="Configuration-1.1.2">Configuration</a></span></li><li><span><a href="#Cloning" data-toc-modified-id="Cloning-1.1.3">Cloning</a></span></li></ul></li></ul></li><li><span><a href="#Creating-a-repository" data-toc-modified-id="Creating-a-repository-2">Creating a repository</a></span><ul class="toc-item"><li><span><a href="#README" data-toc-modified-id="README-2.1">README</a></span><ul class="toc-item"><li><span><a href="#Markdown" data-toc-modified-id="Markdown-2.1.1">Markdown</a></span><ul class="toc-item"><li><span><a href="#Headings" data-toc-modified-id="Headings-2.1.1.1">Headings</a></span></li><li><span><a href="#Links" data-toc-modified-id="Links-2.1.1.2">Links</a></span></li></ul></li></ul></li><li><span><a href="#gitignore" data-toc-modified-id="gitignore-2.2">gitignore</a></span></li></ul></li><li><span><a href="#Working-locally" data-toc-modified-id="Working-locally-3">Working locally</a></span><ul class="toc-item"><li><span><a href="#Atom" data-toc-modified-id="Atom-3.1">Atom</a></span><ul class="toc-item"><li><span><a href="#Committing-changes" data-toc-modified-id="Committing-changes-3.1.1">Committing changes</a></span></li></ul></li></ul></li></ul></div>

In [1]:
import shutil

def cleanup():
    for dirname in ['example_repo', 'my_fabulous_repo']:
        try:
            shutil.rmtree(dirname)
        except FileNotFoundError:
            pass

cleanup()

# GitHub

When we start to write 'real' programs that other people will actually want to use, we will need to clean up our act a bit and adopt a more organized way of working. In particular, if we have to collaborate with other programmers, there are some important things that we will need to organize to avoid getting into a sticky mess:

* **A [repository](extras/glossary.ipynb#repository)**. We would like a single central location where the current version of our program is stored and available to others, instead of just storing our program on our own computer and emailing it to people then emailing them again when we correct a mistake and then emailing them a third time when we forget the attachment.
* **[Version control](extras/glossary.ipynb#versioning)**. We would also like to keep track of different versions of our program. We may need to work on some changes provisionally, until we are confident that they are correct, or we may need to undo some changes once we discover that they have completely broken everything.
* **Documentation**. We should organize and make available some documents that explain to others how to use our program or how it works. Ideally these should be a little bit prettier than just plain text [docstrings](extras/glossary.ipynb#docstring).
* **Communication**. We would like to provide our users and collaborators with a channel through which they can notify us of all the embarrassing mistakes that they have discovered in the program, or request unnecessarily complex new features that they would urgently like the program to have.
* **Collaboration**. We may eventually be lucky enough to have assistants or kind mentors helping us write the program. We should let them know clearly what their tasks are (and are not), and provide a place for them to record their changes.

[GitHub](https://github.com/about) is a web platform that makes all of these things possible. We can store our computer programs on the GitHub website, track successive versions as we make changes, add attractive documentation, get notifications from others who comment on our project, and so on. GitHub isn't the only platform that provides these services, but it is one of the most widely used, and is relatively easy to navigate. In this lesson we will learn the basics of using GitHub to manage programming projects.

## Version control

'Version control', or simply 'versioning', broadly refers to the practice of tracking multiple versions of something. That something needn't especially be a computer program. You have probably practiced version control at some point in your career, for example with an essay or thesis that you wrote at college when you created a series of documents called *thesis_v2.docx*, *thesis_v3.docx*, *thesis_final.docx*, *thesis_final_adviser_corrections.docx*, *thesis_really_final.docx*, and so on.

Version control is particularly important for computer programs. In part this is because they often span more than a single file (for example a main [script](extras/glossary.ipynb#script) along with one or more [modules](extras/glossary.ipynb#module)), and in part because they are fragile; small mistakes can break them entirely, and changes to one part of the program need to be carefully tested to make sure that they are compatible with existing parts.

For these reasons, version control for computer programs is often automated; a version control program monitors our changes continuously in the background, records them, and later allows us to review, undo, or redo them in various combinations. An automated version control program is essential to any moderately large programming project. Imagine the hassle and the mistakes you could make if you had to manually name and store all the separate versions of all the files in your computer program in the same way as for a college thesis.

### git

There are various automatic version control programs available, but by far the most widely used is called *git*. git is a program that monitors changes to groups of files, and records those changes.

git and GitHub are two separate things, but they work together. git is a program on our computer that does the job of recording our changes locally and offline, whereas GitHub is a website on which to store the records that a git program makes. In a very loose analogy, you can think of git as being like the 'track changes' feature of a word processor program, and GitHub as being the equivalent of Dropbox or Google Drive, where you can store and view documents and review changes to those documents.

So to follow along with the examples in this lesson, you will need both an online account at the GitHub site (it's free and you can sign up [here](https://github.com/join)) and you will need to have git installed on your computer.

(And in case you are wondering how git got its name, there are a few speculations [on Wikipedia](https://en.wikipedia.org/wiki/Git#Naming).)

#### Installation

Various data science or software development-related tools make use of git. So it is possible that you have already indirectly installed git on your computer in the process of installing some other program. To check, try `git` as a command line command at the Spyder console (don't forget the `!`). For example, you can ask git what version it is:

In [2]:
! git --version

git version 2.17.1


If you see a response like the one above (don't worry if you get a slightly different version number), then git is already installed. If instead the response is something like 'not found' or 'not recognized', then you will need to install git if you want to follow along with all of the examples in this lesson. The simplest way to do this is via Anaconda. The command line command `conda install` instructs Anaconda to install new packages (see the [online documentation for Anaconda](https://docs.anaconda.com/anaconda/packages/pkg-docs/#anaconda-package-lists)). So try it as a command line command at the Spyder console:

> `! conda install git`

You should see some printed output as git is downloaded and installed.

**Linux**: If you are on Ubuntu rather than macOS or Windows and you are not using Anaconda as your Python [package manager](extras/glossary.ipynb#package), then you will need to go to the terminal (*not Spyder*) and enter `sudo apt-get install git` instead. You will be prompted to enter your password before installing.
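If you would rather do the check from Python than from the command line, here is a minimal sketch. The function name `git_version` is made up for this example; it relies only on the standard `shutil` and `subprocess` modules, and it assumes that git, if installed, can be found on your system's search path.

```python
import shutil
import subprocess

def git_version():
    """Return git's version string, or None if git cannot be found."""
    # shutil.which() searches the system path for a program,
    # in the same way that the command line does.
    if shutil.which('git') is None:
        return None
    result = subprocess.run(['git', '--version'],
                            capture_output=True, text=True)
    return result.stdout.strip()

print(git_version())
```

If this prints `None`, git is not installed (or at least Python cannot find it).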

#### Configuration

In order to synchronize programming projects on your own computer with your online storage at the GitHub website, git needs to know who you are on GitHub. To achieve this, you can configure git so that it knows your GitHub username and email. The command `git config`, together with the `--global` option, sets the global configuration for git on your computer ('global' in the sense that it applies across all of the git projects that you work on, and is not special to just one project).

Use it to set the `user.name` and `user.email` options. Like this:

In [3]:
! git config --global user.name 'luketudge'
! git config --global user.email 'luketudge@gmail.com'

Make sure that the username and email that you enter are the same ones you used when you signed up for a GitHub account.

You can check that these commands have taken effect correctly using the `--list` option for `git config` to list your full global configuration:

In [4]:
! git config --global --list

user.name=luketudge
user.email=luketudge@gmail.com


A few more details about configuring git are given at the [git website](https://git-scm.com/book/en/Getting-Started-First-Time-Git-Setup), but the configuration steps shown above are the only ones you are likely to ever need for basic use.
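You can also read individual configuration options from Python. The sketch below is just one way of doing it; the function name `get_global_git_option` is an invention for this example. It uses the `--get` option of `git config`, which prints the value of a single option.

```python
import shutil
import subprocess

def get_global_git_option(name):
    """Return the value of a global git option, or None if unavailable."""
    if shutil.which('git') is None:
        return None
    result = subprocess.run(['git', 'config', '--global', '--get', name],
                            capture_output=True, text=True)
    # git finishes with a non-zero exit code if the option is not set.
    if result.returncode != 0:
        return None
    return result.stdout.strip()

for option in ['user.name', 'user.email']:
    print(option, '=', get_global_git_option(option))
```

An option that shows up as `None` here has not been set yet.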

#### Cloning

There are a few different ways of starting a new programming project with git. One of these is to copy an existing online [repository](extras/glossary.ipynb#repository) from GitHub onto your computer. This is known as 'cloning' the repository. Let's see an example.

I have created an example project at the GitHub website. This project is there just for demonstrating the use of git; it does not in fact contain any Python files or indeed many files at all. This means that you should be able to clone it to your computer reasonably quickly without having to wait for a lot of files to download.

Go to the main page of the example project [here](https://github.com/luketudge/example_repo). The main page lists all the files contained in the project. There are just four small text files:

![](images/files_view.png)

Now look for the button marked 'Clone or download'. It looks like this:

![](images/clone.png)

You can use this button to just download all the project files as a *zip* archive file in the normal way. But don't do this. If we just do this, we will get the files, but git won't recognize them as something to track. Instead, copy the URL that is shown inside a text box when you click on the button. It will look like this:

`https://github.com/luketudge/example_repo.git`

Note that it has the file extension '*.git*'. This indicates that we are downloading something that git will recognize and will track changes to.

Use this URL with git's `clone` command to clone the repository to your computer:

In [5]:
! git clone https://github.com/luketudge/example_repo.git

Cloning into 'example_repo'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 10 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (10/10), done.


If you go to your file explorer, you should now see a new directory called 'example_repo' inside whatever working directory you were in when you ran the command above. (If you are not sure what directory you are in, remember that you can use Spyder's `%pwd` command to check.)

In [6]:
%pwd

'/home/lt/GitHub/introduction-to-programming/content'

You can go and check the new directory in your file explorer to verify that it has worked. You should see the same four text files that are listed on the main page of the GitHub repository. We won't look at the content of these files now. They aren't particularly interesting; this is just a demonstration of how `git clone` works.

If you see only three text files, this is probably because one of them, '.gitignore', has a name beginning with a dot, which on Linux and macOS marks a file that is intended to be 'hidden' (i.e. not shown by default).

If you are already feeling nostalgic for Python after all this command line work, you could try a Python command to verify that the hidden file is there. For example with the `listdir()` function from the `os` module:

In [7]:
import os

os.listdir('example_repo')

['LICENSE', '.gitignore', '.git', 'README.md', '1.txt']

Alternatively, if you would like to be able to see hidden files in general on your computer (something that can often be useful if you go further with programming), then you can tell your file explorer to show them. Follow the instructions for your operating system:

* [Windows](https://support.microsoft.com/en-us/help/14201/windows-show-hidden-files)
* [macOS](http://osxdaily.com/2018/02/12/show-hidden-files-mac-keyboard-shortcut/)
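Staying with Python for a moment: the `.git` entry that `os.listdir()` reports in a cloned directory is a hidden directory in which git keeps its records. A plain *zip* download does not contain it, which is why git does not recognize a zip download as something to track. Here is a small sketch that picks out the hidden entries in a directory and checks for `.git`. The function names are made up for this example, and the check is only a rough one.

```python
import os

def hidden_entries(dirname):
    """Return the names in a directory that begin with a dot."""
    return sorted(name for name in os.listdir(dirname)
                  if name.startswith('.'))

def looks_like_git_repo(dirname):
    """Check for the '.git' subdirectory that git creates when cloning."""
    return os.path.isdir(os.path.join(dirname, '.git'))

# Only try this out if the example repository has been cloned.
if os.path.isdir('example_repo'):
    print(hidden_entries('example_repo'))
    print(looks_like_git_repo('example_repo'))
```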

Sometimes you may wish to clone a repository into a directory with a different name, for example if its name conflicts with the name of an existing directory on your computer or if you just don't like its name. You can add the name of the directory to clone *into* at the end of the `git clone` command:

In [8]:
! git clone https://github.com/luketudge/example_repo.git my_fabulous_repo

Cloning into 'my_fabulous_repo'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 10 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (10/10), done.
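When we do not give a directory name, git works one out from the URL, which is where 'example_repo' came from in the first example. The sketch below imitates that rule in a simplified form, so that you can see in advance what directory a clone will land in. The function name `default_clone_dir` is made up for this example, and git's real rule covers a few more special cases than this.

```python
def default_clone_dir(url):
    """Guess the directory name that git clone uses when none is given."""
    # Take the last part of the URL, ignoring any trailing slash.
    name = url.rstrip('/').split('/')[-1]
    # Drop the '.git' extension if it is there.
    if name.endswith('.git'):
        name = name[:-len('.git')]
    return name

print(default_clone_dir('https://github.com/luketudge/example_repo.git'))
```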


Once we have cloned a repository to our computer, git is aware of it, and will track changes that we make to the files. We can also use git to update our copies of the files if the online repository changes.

We will learn how to do these things in a moment, but for now you can delete the directories that you have just created if you tried out the example `git clone` commands above. A directory containing a git project can safely be deleted in the normal way. Just go to your file explorer and delete them. git then forgets about them.

If you like, you can now use `git clone` to download the materials for the *Introduction to Programming* class to your computer. You can find the GitHub page for the class [here](https://github.com/luketudge/introduction-to-programming). The advantage of doing this rather than just downloading the class notes individually or as a *zip* file is that you will be able to update your local copy with a single click or git command whenever you want to make sure you have the latest version (we will see how to do this in a moment).

## Creating a repository

You can use the `git clone` command to clone a repository that somebody else has created, as we just saw. But more commonly at the start of your programming career you will want to create your own repositories in order to manage your own programming projects. You can create a repository using git on your computer and then upload it to the GitHub website, or you can first create the repository directly on the GitHub website and then clone it to your computer. The second of these options is considerably simpler, so we will learn only about this one.

To follow along, you now need to head to the [GitHub website](https://github.com) and log in. Then look for the button for creating a new repository:

![](images/new_repo.png)

GitHub will then walk you through some options for your repository:

* **Repository name**. Name it after your project or program, so people can find it easily.
* **Description**. An optional short summary of what your project does.
* **Public or private**. A public repository is visible to anyone. Usually you will eventually want to make your repository public, so that people can download and test out your program, or so that you can include a link to your work in a job application. But you might want to make it private to begin with, until you are happy with it. You can always make a private repository public later.
* **Initialize this repository with a README**. A [readme](extras/glossary.ipynb#readme) is a file that provides information about a program. You should always include one. We will see in a moment what to write in the README file.
* **Add .gitignore**. We will learn about this in a moment. For a repository that contains a Python programming project, you should select the *Python* option from the dropdown menu here.
* **Add a license**. For a real project that you intend to make public you should always include a license file. This file provides the legal framework for your project and lets people know what they can and can't do with it. [choosealicense.com](https://choosealicense.com) provides some help with choosing a license. For most projects you can choose the 'MIT License', a standard 'permissive' license that allows people to do what they like with your program.

To test out the examples below, just create yourself a new private repository with some wacky name. You can delete it later when you have finished testing things out. Tick the option to include a README, and add a Python .gitignore file and the MIT License file.

Once you have created your repository you will see its main GitHub page, which looks a bit like this:

![](images/github_repo_frontpage.png)

On the front page of your repository, GitHub displays a table of the files that it contains. You can see that the three files we requested when setting up the repository are in there. Let's look at two of these in more detail.

### README

The file *README.md* not only appears in the table of files; its contents are also displayed immediately beneath. This is a feature of the GitHub site. If you put a file with the name *README* in your repository, its contents will be shown on the main page. So you can use this file to act as the front page of your repository, explaining the nature of your project, and providing links to the most important files.

At the moment, your *README* file contains only the name of your repository. Let's edit it to see what else we can put in there. You can edit a file directly on the GitHub website via the edit button:

![](images/github_edit.png)

#### Markdown

You may have wondered what type of file the *md* file extension refers to. 'md' here stands for 'markdown'. Markdown is a programming language (of sorts), but an extremely limited one. It consists mainly of normal human-readable text, with just a few extra symbols that instruct a web browser (or some other text display program) to add 'special effects' to the text. A language of this kind, that adds extra symbols to plain text in order to organize how it is displayed, is known as a [markup language](extras/glossary.ipynb#markup).

Markdown can add many features to your README file. The [GitHub guide to markdown](https://guides.github.com/features/mastering-markdown) gives a good overview. The two most useful ones are:

##### Headings

One or more hash symbols `#` at the beginning of a line turn that line into a heading. (Note that this is completely different from the role of the hash symbol in Python.) The more hash symbols, the smaller the heading.

For example:

`# Main heading`

`## Sub-heading`

`### And so on`
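To make the rule concrete, here is a rough Python sketch of how a program might recognize a heading line and work out its level. This is not how GitHub actually processes markdown; the function name `parse_heading` is made up, and the pattern assumes the common convention of one to six hash symbols followed by a space.

```python
import re

def parse_heading(line):
    """Return (level, text) for a markdown heading line, or None otherwise."""
    # One to six hash symbols, then at least one space, then the text.
    match = re.match(r'(#{1,6})\s+(.*)', line)
    if match is None:
        return None
    hashes, text = match.groups()
    return len(hashes), text.strip()

for line in ['# Main heading', '## Sub-heading', '### And so on', 'Just text']:
    print(parse_heading(line))
```

The level is simply the number of hash symbols, so a larger number means a smaller heading.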

##### Links

To turn some text into a clickable link, enclose the text in square brackets `[]`, then immediately afterwards enclose the address that you would like to link to in round parentheses `()`.

The address can be the address of a web page:

`Learn about the candiru fish [here](https://en.wikipedia.org/wiki/Candiru).`

Or it can be the [relative path](extras/glossary.ipynb#path) to a file in your repository. For example:

`The most important Python module in this project is [important.py](python_files/modules/important.py).`
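If you ever want to check the links in a README automatically, the link syntax is simple enough to pick out with a [regular expression](https://docs.python.org/3/library/re.html). The sketch below is a simplification: the names `LINK_PATTERN` and `find_links` are made up for this example, and the pattern does not cope with addresses that themselves contain parentheses.

```python
import re

# Some text in square brackets, immediately followed by
# an address in round parentheses.
LINK_PATTERN = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')

def find_links(text):
    """Return a list of (link text, address) pairs found in some markdown."""
    return LINK_PATTERN.findall(text)

readme = ('Learn about the candiru fish '
          '[here](https://en.wikipedia.org/wiki/Candiru). '
          'See also [important.py](python_files/modules/important.py).')

for link_text, address in find_links(readme):
    print(link_text, '->', address)
```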

Try editing your README file to include some headings and a link to the license file. You can check whether you have got the markdown right by clicking 'Preview changes'.

To save your changes, click on the 'commit' button at the bottom of the editing page:

![](images/commit.png)

Ignore the other text fields and options above the commit button for the moment. We will learn about these later.

Then the README display on the front page of your repository will look something like this, depending on what headings and text you put in:

![](images/readme_markdown.png)

### gitignore

In a moment you will clone the new repository to your computer, where you can start adding new files, making changes, and so on. Each time you make changes you can send your files back to the GitHub website to store them there. But you might not always want to upload all of the files in your project folder.

In particular, there are certain temporary files that get created automatically when you run Python programs. You might already have noticed these appearing in your working folder after you have been doing some Python work. For example, Python creates a directory called *\_\_pycache\_\_* to store compiled versions of [modules](extras/glossary.ipynb#module) that have already been [imported](extras/glossary.ipynb#import), so that it does not have to compile them again the next time you import them. The contents of \_\_pycache\_\_ directories and other temporary files are not really part of a programming project, so you don't want to publish them to your GitHub repository.

A 'gitignore' file instructs git to ignore certain files and not include them as part of the project. git won't track changes to these files, and it also won't upload them to the GitHub website when you publish your changes. Take a look at the contents of the gitignore file in your repository (you can just click on the file name in the files table). It contains the names of files and folders that git should ignore. You will see that '\_\_pycache\_\_/' is one of the first things that appears in the file.

GitHub provides a ready-made gitignore file full of the names of temporary files that might get created in the course of developing a Python program. It is this gitignore file that we chose when we created the repository. (Other programming languages create other temporary files, and GitHub provides ready-made gitignore files for these too.)
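To get a feel for how the entries in a gitignore file are applied, here is a very simplified imitation in Python. It is only a sketch: the function name `is_ignored` and the short `PATTERNS` list are made up for this example, and git's real matching rules handle several cases that this ignores. The idea is that an entry ending in a slash names a directory, and other entries are patterns for file names, in which `*` stands for any run of characters.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Two entries of the kind found in a Python gitignore file (illustrative only).
PATTERNS = ['__pycache__/', '*.py[cod]']

def is_ignored(path, patterns=PATTERNS):
    """Very simplified check of a file path against gitignore-style patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith('/'):
            # Directory entry: ignore any file inside a directory of that name.
            dirname = pattern.rstrip('/')
            if any(fnmatch(part, dirname) for part in parts[:-1]):
                return True
        elif fnmatch(parts[-1], pattern):
            # File entry: compare the pattern with the file name only.
            return True
    return False

for path in ['important.py', 'notes.pyc', '__pycache__/important.cpython-36.pyc']:
    print(path, is_ignored(path))
```

With these two patterns, the Python file itself is kept, while the compiled files are ignored.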

If you are interested in the legal issues involved in developing software, you can take a look at the contents of the 'LICENSE' file too. But for now we are finished with exploring the new repository on GitHub.

## Working locally

The next step is to clone the new repository to your computer and start working on it there. Go to the main page of your repository, copy the cloning link, and clone it to your computer with the `git clone` command, as we did for the first example repository above.

Now let's see how to work on the repository 'locally', i.e. on your own computer. Find the newly-cloned repository folder in your file explorer. It works entirely like a normal folder. You can copy or save files into it, create new subdirectories, delete files, etc. For this example, you can start by just copying one of your Python programs into the repository folder. For example, you can use one of the solutions you wrote to the exercises in earlier lessons.

You have just made a change to the repository directory by adding this file. Let's now see how to review the change and send it back to the online repository on GitHub.

### Atom

It is entirely possible to work with git using only command line commands, and many people work this way. However, we won't torture ourselves unnecessarily. We will use a graphical interface to git instead. The command line may come in useful later in your career, for example if you have to write a program to automate some git tasks, or if you have to work on a web server computer that has no screen. But for now, don't be a masochist just for the sake of looking like a cool hacker.

Atom is a text editor produced by the developers of GitHub. As well as editing text files, Atom works with git to let you manage your project by clicking on buttons instead of typing commands. If you have not already installed Atom, you can download it [here](https://flight-manual.atom.io/getting-started/sections/installing-atom). Then go and find it among your apps. The icon looks something like this:

![](images/Atom_1.0_icon.png)

Most of what you see when you start up Atom will look familiar. Like any other editing program (and like the Spyder editor), it has a 'File' menu, from which you can open existing files or create new ones. Go to the **File** menu and select **Open Folder** (*not 'Open File'*), then find the folder that you cloned your new repository into, and open this folder.

You should see a few panels displayed side-by-side. At the left is a panel showing the files contained in the repository:

![](images/atom_files.png)

Atom color-codes files according to what has happened to them since you last saved the state of the project:

* **green**: new file
* **orangey-brown**: file has been changed

From this file menu, you can open and edit files as you would in any other editing program. You can try it out with your new file.

#### Committing changes

git has been tracking changes to your project, and it should have noticed that you have added a new file since you cloned the repository.

In [9]:
cleanup()