# Intro to Github

FISH 510

Week 8 Class

<br>
**References**: 

"Bioinformatics Data Skills, Chapter 5: Git for Scientists". 

Git's [Online Reference Guide](https://git-scm.com/docs)

Github's [cheatsheet](https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf)


<br>
**Topics Covered**:

1. Creating a Github account online
2. Installing Github on the desktop and the Git GUI
3. Creating and cloning repositories
4. Repository-related functions online
5. Committing
6. Interacting with other Git users
7. Branching
8. Publishing and Privacy

## Getting Setup: Create an Account, Install Git

Sign up for github [here](https://github.com/join?source=header-home). 

Then, [download the desktop version](https://desktop.github.com/). 

If you are working on WINDOWS, then you'll also want to [install Git Bash](https://git-scm.com/download/win). 


<br>
<br>
## Basics of repositories

### Creating repositories

A repository is like a root directory or a filing cabinet. It will contain all of the files for your project. Repositories can be created either on your desktop or online. We'll go through both ways here, by making two practice repositories. You can delete them later. 
<br>

**Creating an online repository**

When working with bioinformatics data, you'll want to initialize the repository with both a README file and a .gitignore file. Online, this can be done by simply checking a box for each when you create the repository. On the desktop, you'll have to create them yourself. 

*option 1*
- navigate to your profile
- go to the tab 'repositories'
- select the green "NEW" button


*option 2* 
- in the upper right corner of the screen, there is a `+` button
- on the dropdown menu, select "New Repository"


<br>
**Creating a repository on the desktop**

*option 1*
- in the Github GUI, select the `+` in the upper left corner

*option 2*
- in your file directory, create a new folder and assign it the repository name. 
- open your terminal or git bash window and navigate to that directory
- from within that folder, use the following code to initiate the repository: 
`git init .`


In [1]:
pwd

u'/mnt/hgfs/Pacific cod/DataAnalysis/PCod-Korea-repo/notebooks'

In [2]:
cd ../../../

/mnt/hgfs/Pacific cod


In [3]:
!mkdir practice_repo

In [4]:
cd practice_repo

/mnt/hgfs/Pacific cod/practice_repo


In [None]:
!git init .

<img src = "initialize_git.png">


<br>
**Including a README**
- create a text file called: README.md in the root folder of your repository
- Best Practices - what to include: overall description of the project, all programs and the version of those programs that you'll be using, an outline of the folders in the repo and what they contain. Here is [an example](https://github.com/mfisher5/mf-fish546-PCod/blob/master/README.md)

*Discussion: what makes a good README? what do collaborators find most helpful?*
<br>

**Including a .gitignore**
- open an empty text file in your repository >> save as >> file name: .gitignore >> file type: "all types"
- if your computer won't let you save as anything other than a text file, then you can save it with the extension of a text file and then manually remove the extension by renaming it in your folder browser. 
- the .gitignore file will show up as an unnamed file in your repository, with a text file icon. 
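
The save-as steps above can also be done from the command line. This is a minimal sketch, assuming a throwaway directory made with `mktemp`; the ignore patterns (`*.fastq`, `.ipynb_checkpoints/`) and the file name `sample.fastq` are illustrations only, not a recommended list.

```shell
set -eu
# Work in a throwaway directory so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .

# Write a .gitignore from the command line; these patterns are only examples.
cat > .gitignore <<'EOF'
# large raw sequence files
*.fastq
*.fastq.gz
# Jupyter autosave folders
.ipynb_checkpoints/
EOF

# Create one file that should be ignored and one that should not.
touch sample.fastq README.md

# Hidden files such as .gitignore only appear with the -a flag.
ls -a

# git status lists README.md and .gitignore as untracked, but not sample.fastq.
status_output="$(git status --porcelain --untracked-files=all)"
echo "$status_output"
```

Each line of a `.gitignore` is a pattern; any file matching a pattern is left out of `git status` and `git add .`.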


<br>
<br>
### Cloning repositories

We now have two repositories, one on the desktop and one online. Let's start by getting both repositories online, and both repositories on the desktop. We'll work mostly from the command line here, but you can also do this through the github GUI. 

<br>
**Cloning from online to the desktop**
- in your browser, navigate to the main page of the repository
- select "clone or download", and then copy the URL
- in the command line, navigate to the directory where you want to add your repository and use the following code: 

`git clone $COPIED_LINK`



<img src = "clone_git.png">


In [6]:
cd ../

/mnt/hgfs/Pacific cod


In [7]:
cd FISH_510

/mnt/hgfs/Pacific cod/FISH_510


In [8]:
ls

[0m[01;32mREADME.md[0m*


**Cloning from the desktop to online**

My practice repository on the desktop has a `README.md` and a `.gitignore` file. Note that the `.gitignore` file doesn't show up when you list the files on the command line with `ls`, because its name starts with a dot, but it is still there. Use `ls -a` to see it. 

In [9]:
cd ../practice_repo/

/mnt/hgfs/Pacific cod/practice_repo


In [10]:
ls

[0m[01;32mREADME.md[0m*


- create a new repository online with the same name, but without the README or .gitignore files
- when the "quick setup" page pops up, copy the remote repository HTTPS URL
- switch back to the command line on your desktop
- the steps are below; we'll go more into what they mean in the next section

<br>
*Checking your repository status*
- at the command line, navigate to your repository folder. Enter 

        `git status` 


<br>
*Tracking files*
- in order to move files between your desktop and online, the files must be "tracked." Right now, Git is not tracking your files. To make Git track all of your files, use the command 

        `git add .` 
        
- if you only want to track certain files, you can instead use the command 

        `git add README.md`



*Committing files* 
- now that you've started tracking your files, you can record a snapshot of them in your local repository using the command below. Nothing is uploaded to the remote repository until you push. 

        `git commit -m "First Commit" `
 

<br>
*Setting up the remote repository*
- before you can push these changes to your remote repository, you have to tell Git where that repository exists. You also want to name that remote repository "origin", so that when you push or pull changes in the future, you can always refer to the remote as "origin". You can do this with the command 

        `git remote add origin $ONLINE_REMOTE_URL`, where the URL is the one you copied above. 

- verify that the remote exists using 
        `git remote -v`


*Pushing to the remote*
- once you have committed the changes, you can "push" these changes to your new remote repository with the command 

        `git push origin master` 
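
The whole sequence above can be tried without touching Github. This is a rough sketch under some assumptions: a local bare repository (`remote.git`) stands in for the empty repository created online, the identity `Demo User` is made up, and everything lives in a throwaway directory.

```shell
set -eu
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the empty repository created online (an assumption for this sketch).
git init -q --bare remote.git
ONLINE_REMOTE_URL="$work_dir/remote.git"

# The local practice repository.
mkdir practice_repo
cd practice_repo
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named "master"
git config user.name "Demo User"          # hypothetical identity, stored only in this repo
git config user.email "demo@example.com"
echo "# practice_repo" > README.md

git status --short                 # README.md is listed as untracked (??)
git add .                          # start tracking every file
git commit -q -m "First Commit"    # record a snapshot locally
git remote add origin "$ONLINE_REMOTE_URL"
git remote -v                      # verify that the remote exists
git push -q origin master          # copy the commit to the remote

# Ask the stand-in remote for the subject line of its newest commit.
pushed_message="$(git --git-dir="$ONLINE_REMOTE_URL" log -1 --format=%s master)"
echo "$pushed_message"
```

With a real Github repository, `ONLINE_REMOTE_URL` would instead be the HTTPS URL copied from the "quick setup" page.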
        

<br>
In the git bash terminal, this process looks like: 
<img src = "clone_desktop_to_online.png">

### Deleting a Repository

You now have two working repositories. For the rest of this demonstration, you'll only need one. You can go ahead and manually delete one of those repositories from both your online profile and your computer. 

<br>
An important note about deleting repositories: These steps **permanently** delete **everything** related to that repository - that includes the repo, wiki, issues, and comments. If you are working with a private repository, you will also delete all of its related forks. 

<br>
<br>



## Committing

While cloning our repository from our desktop to online, we've already gone through the process of committing changes and pushing them to a remote repository. But we'll review it here in more detail. 

Committing records a snapshot of the changes you make on your desktop in your local repository's history; pushing then copies those commits to the remote repository online. This flowchart, from the book *Bioinformatics Data Skills*, gives a good overview of the process. 

<img src = "commit_flowchart.png">


### How to commit changes made on the desktop 

**Exercise**: we need some changes to send to our remote repository. Edit your README.md file on your desktop to describe your latest project or, if you're familiar with markdown, include a picture of your study organism.  
<br>

Working at the command line, from within your git repository: 

<br>
`git status` : Describes the current state of your repository. It (1) Tells you which branch you are working on. The default branch is the "master." (2) Gives you information about the files in your repository, including which files have changed since your last commit, which files you are tracking, and which files you are not tracking. 

<br>
`git add`: This has two functions. The one we see here is that it tells git that it needs to start tracking files. Git *will not* submit changes to your remote repository unless they are being tracked. You can tell git to track *all* files with `git add .`, or to track only specific files with `git add README.md`. 

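What if we accidentally tell git to track a file that we don't want tracked? Use the command `git rm --cached README.md` to tell git to stop tracking that file while leaving it in your folder. (Plain `git rm README.md` also deletes the file from your folder.) 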

The second function of `git add` is to stage changes made to an already tracked file. This comes up when you have already gone through the commit process once, have made changes to a file, and are now going back to commit the new changes. When you enter `git status`, it will show you something like this: 

<img src = "git_add.png">

What is happening here? There is actually a subtle difference between *tracked* files, and *tracked files with staged changes*. Git will continue tracking changes made to a file after you first tell it to with `git add`, but these new changes will not be included in the next commit unless they are staged. 
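
To see the difference between a tracked file and a tracked file with staged changes, here is a small sketch in a throwaway repository. The identity `Demo User` and the file contents are made up. It uses `git status --porcelain`, a compact form of `git status` with two status columns: the first for staged changes, the second for unstaged ones.

```shell
set -eu
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git config user.name "Demo User"          # hypothetical identity, stored only in this repo
git config user.email "demo@example.com"

echo "first draft" > README.md
git add README.md                  # first use: start tracking the file
git commit -q -m "Add README"

echo "second draft" >> README.md   # edit a file that is already tracked
before_staging="$(git status --porcelain)"   # " M README.md": modified, not staged

git add README.md                  # second use: stage the new changes
after_staging="$(git status --porcelain)"    # "M  README.md": modified and staged

echo "before: $before_staging"
echo "after:  $after_staging"
```

The `M` moves from the second column to the first once the change is staged, which is the same shift that the longer `git status` output describes in words.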

<br>
`git commit -m "your message here"`: commits all of the tracked files with staged changes to be pushed to the remote repository. You can customize the message to help you keep track of what was included in that commit. 

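What if you commit changes that you don't want to push to your remote repository? Use the command `git reset --soft HEAD~1` to undo the most recent commit while keeping its changes staged. (Plain `git reset` only unstages changes; it does not undo a commit.) 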
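
One way to undo the most recent commit, assuming it has not been pushed yet, is `git reset --soft HEAD~1`, which removes the commit from the history but keeps its changes staged. A minimal sketch in a throwaway repository (the file `notes.txt` and the identity are hypothetical):

```shell
set -eu
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git config user.name "Demo User"          # hypothetical identity, stored only in this repo
git config user.email "demo@example.com"

echo "first draft" > README.md
git add README.md
git commit -q -m "Add README"

echo "not ready to share" > notes.txt
git add notes.txt
git commit -q -m "Commit made by mistake"

# Undo the last commit, but keep its changes staged.
git reset -q --soft HEAD~1

last_message="$(git log -1 --format=%s)"          # the earlier commit is the newest again
staged_files="$(git diff --cached --name-only)"   # notes.txt is still staged
echo "$last_message"
echo "$staged_files"
```

This rewrites local history, so it is best kept for commits that exist only on your own computer.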

<br>
`git push origin master` : This takes all of the changes you committed and pushes them from the working branch of your local repository ("master") to your remote repository ("origin").


<br> 
**The shortcut:** If you, like me, are not a complex and intense github user, you will probably always want to commit every change made to every file. You can use the following shortcut to stage and commit all changes to files that are already tracked in one command (brand new files still need `git add` first):

`git commit -a -m "your message here" `

<br>
<br>
### How to pull changes made online to the desktop

**Exercise** : use `git status` to make sure that you have no new changes on your local repository. Online, edit your README.md file to include your name, lab, and the initialization date. 


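`git pull`: Since we assigned our remote repository as the "origin", and our local master branch tracks the remote master branch (`git clone` sets this up automatically; for a repository created with `git init`, push once with `git push -u origin master`), this fetches changes from the remote repository and automatically merges them into our local repository. 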

`git fetch origin` or

`git fetch $REMOTE_NAME` : This downloads any new work that has been pushed to the remote repository since you last cloned, or pulled, from it. However, it does not automatically merge these changes with your local repository. This is a good alternative when you are working on a repository with several collaborators, or if you are dealing with a merge conflict. 
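
The difference between fetching and pulling can be sketched without Github. The assumptions here: a local bare repository (`remote.git`) plays the remote, and a second clone (`online_edit`) stands in for an edit made online. All names and the identity are hypothetical.

```shell
set -eu
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the remote repository on Github.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# The "desktop" repository, pushed once with -u so that master tracks origin/master.
mkdir desktop
cd desktop
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "# practice" > README.md
git add README.md
git commit -q -m "First Commit"
git remote add origin "$work_dir/remote.git"
git push -q -u origin master
cd ..

# A second clone plays the role of an edit made online.
git clone -q remote.git online_edit
cd online_edit
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Name: Demo User" >> README.md
git commit -q -a -m "Edit README online"
git push -q origin master
cd ../desktop

# fetch downloads the new commit but leaves the local branch where it was.
git fetch -q origin
commits_behind="$(git rev-list --count HEAD..origin/master)"
message_after_fetch="$(git log -1 --format=%s)"

# pull fetches and then merges, so the local branch catches up.
git pull -q
message_after_pull="$(git log -1 --format=%s)"

echo "behind by $commits_behind commit(s) after fetch"
echo "after fetch: $message_after_fetch"
echo "after pull:  $message_after_pull"
```

After the fetch, `git diff HEAD origin/master` would show what is about to come in, which is handy when several collaborators share a repository.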


<br>
<br>

### Troubleshooting: Merge Conflicts

A **merge conflict** occurs when the same part of a file has been changed in both your remote repository and your local repository, so that Git cannot combine the two versions automatically when you pull. (If you try to push first, Git rejects the push and asks you to pull.) The error message looks something like this: 

<img src = "merge_conflict_error.png">


Some best practices to AVOID having to troubleshoot a merge conflict: 
- try to make changes only on your desktop; avoid making changes both online and on your desktop
- unless you have a specific reason not to, add >> commit >> push the new versions of all of your files EVERY TIME you make changes to them

<br>
<br>
You have several options to help you resolve a merge conflict: 


1. Use `git status`: Will inform you which file in your local repository has a merge conflict, which you can then manually check out.
    
    - navigate to that file
    - search the file for the conflict marker `<<<<<<<`. 
    - changes from the base branch are marked after the line `<<<<<<< HEAD`. 
    - next, you'll see `=======`, which divides your changes from the changes in the other branch, followed by `>>>>>>> BRANCH-NAME`
    - Decide if you want to keep only your branch's changes, keep only the other branch's changes, or make a brand new change, which may incorporate changes from both branches. Delete the conflict markers `<<<<<<<`, `=======`, `>>>>>>>` and make the changes you want in the final merge
    - stage and commit your changes, then push them to the remote. 
    

2. `git diff` : this lets you see the differences between the files in your local repository and what has been staged for commit. If nothing is staged for commit, it shows the difference between your last commit and the current versions of the files. 
    - after receiving a "merge conflict" message, you may have to use `git reset` to unstage the changes
    - use `git diff` to see the changes you have made to your files in your local repository. It will show the conflict markers described in option one, just in the terminal. 
    - go into the file, delete the conflict markers, and keep only the changes you want
    - stage and commit your corrected file. 
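
A merge conflict can be produced and resolved on purpose in a throwaway setup. This is only a sketch: a local bare repository (`remote.git`) plays the remote, a second clone (`online_edit`) stands in for the edit made online, and the file contents and identity are made up.

```shell
set -eu
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the remote repository on Github.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

mkdir desktop
cd desktop
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "Project: cod" > README.md
git add README.md
git commit -q -m "First Commit"
git remote add origin "$work_dir/remote.git"
git push -q -u origin master
cd ..

# Edit made "online" (a second clone stands in for the website).
git clone -q remote.git online_edit
cd online_edit
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Project: Pacific cod (online edit)" > README.md
git commit -q -a -m "Online edit"
git push -q origin master
cd ../desktop

# A different edit to the same line on the desktop, committed but not pushed.
echo "Project: Pacific cod (desktop edit)" > README.md
git commit -q -a -m "Desktop edit"

# The pull stops with a merge conflict, so its non-zero exit status is expected here.
git pull -q --no-rebase origin master || true
git status --short                       # "UU README.md" marks the conflicted file
conflict_markers="$(grep -c '^<<<<<<<' README.md)"

# Resolve by hand: keep one combined line and drop the conflict markers.
echo "Project: Pacific cod (desktop and online edits)" > README.md
git add README.md
git commit -q -m "Resolve merge conflict in README"
git push -q origin master

remaining_markers="$(grep -c '^<<<<<<<' README.md || true)"
remote_head_message="$(git --git-dir="$work_dir/remote.git" log -1 --format=%s master)"
echo "markers before: $conflict_markers, after: $remaining_markers"
echo "$remote_head_message"
```

In practice the resolution step is done in a text editor rather than with `echo`; the overwrite here just keeps the sketch short.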


<br>

And if all else fails, you can delete either the remote or desktop repository and clone again. 


**Exercise:** Create and solve a merge conflict. Make an edit to your README.md on your desktop, and then, without pushing the changes to the remote, make an edit to your README.md online. Then try to push or pull your changes. 


<br>

<img src = "merge_conflict_corrected.png">

<br>
<br>

## Back to the Basics: Exploring Github


Your profile 

Your repositories 
- *Issues* let you ask questions or chat about certain items in the repository. You can post issues for everyone to answer, or request that certain people respond. 

**Exercise** : Find my profile and go into my `FISH_510` repository. You'll see that I have created an issue - respond to it! Then try creating your own issue. 

- *Wiki* is like a README.md file but is more flexible. For a good example of a wiki, check out Stephen Roberts' for his [bioinformatics class](https://github.com/sr320/course-fish546-2016/wiki)

**Discuss** : What are some of the most important functions / pages on Github that you see? 


<br>
<br>

## Branching



The branching structure of Github allows you to

1. Make experimental changes to a project without adversely affecting your main body of work (**make branches within your own repositories**)

2. Make collaboration easier (**make branches of other users' repositories**)



<br>
### Branching how to



<img src = "branching.png" >


<br>
<br>
**Local use**

Some nomenclature: *master branch* is the original version of your repo. *new branch* is the... new branch. 

1. From within your git repo, add a branch at the command line with 

    `git branch [new branch name]`
<br>
<br>
2. You'll need to instruct Git to switch you from the master to the new branch. Running `git branch` lists your branches, with a star next to the one you are currently on. 

    `git checkout [new branch name]`
<br>
<br>
3. You can now change any files, stage the changes, and commit them to the new branch. 

4. When you push these changes to your remote repo, Github will register that you have two branches in your repository. 

5. If you want to merge the branches, you want to first switch to the branch you want to move the changes *into* 
<br>
<br>
    `git checkout master`
   
    `git merge [new branch name]`
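
The local branching steps above can be sketched end to end in a throwaway repository. The branch name `experiment`, the file `idea.txt`, and the identity are hypothetical.

```shell
set -eu
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named "master"
git config user.name "Demo User"          # hypothetical identity, stored only in this repo
git config user.email "demo@example.com"
echo "stable text" > README.md
git add README.md
git commit -q -m "First Commit"

git branch experiment            # step 1: create the new branch
git checkout -q experiment       # step 2: switch to it
git branch                       # the star marks the branch you are on

echo "experimental idea" > idea.txt       # step 3: change, stage, commit
git add idea.txt
git commit -q -m "Try an idea on the experiment branch"

git checkout -q master           # step 5: switch to the branch that receives the changes
files_before_merge="$(ls)"       # idea.txt is not on master yet
git merge -q experiment
files_after_merge="$(ls)"        # idea.txt has arrived
current_branch="$(git rev-parse --abbrev-ref HEAD)"

echo "before merge: $files_before_merge"
echo "after merge:"
echo "$files_after_merge"
```

Because master had no new commits of its own, Git simply moves master forward to the tip of `experiment` (a fast-forward merge), so no conflict can occur in this case.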
    

**Exercise:** First, make sure that your local repository is up to date with your remote. Then, create a new branch called "new-readme". From within this new branch, make a change to your README.md file. Then merge the "new-readme" branch back into the master branch. 

<br>
<br>
**On Github**

Some nomenclature: *your repo* is the copied version of the original repository that you are editing. *creator's repo* is the original repository that you created a forked branch from. 

1. copy someone else's repository by creating a "fork". This essentially takes their entire repository and copies it to your profile. You can now edit the contents of your repo, either online or on the desktop. 

2. when you push commits from your local files to the remote for this repo, it *will not change the creator's repo*. The only copy that changes is yours. 

3. if you want to merge your changes in your repo with the original creator's repo, create a pull request. this lets the creator know that you have made changes that you think they should consider. Pull requests can also be used as a way to open up a conversation about an issue or share a developing feature. 

4. the creator can either accept your pull request and merge your repo's changes with their original repo, or reject the pull request. 

<br>
*Troubleshooting: What if changes have been made to the creator's repo since the last time you created your own branch?*

<br>


### Exercise: Putting it all together

In this exercise, you'll create your own working branch of someone else's practice repository, add a Jupyter notebook to your copy of the repository on your desktop, push the changes to your remote repository on Github, and then submit a pull request asking the other person to merge your changes with their original branch. 


1. Using the github search bar and the "users" tab, find the profile of the person next to you. 

2. Create a "fork" of their practice repository. 

3. Clone your newly copied repository onto your computer. 

4. Access the repository from either the command line or your folder browser. Create a directory called "notebooks".

5. At the command line, open up Jupyter notebooks in the "notebooks" directory. 

6. Create a new python [default] notebook. Title it "FISH_510".

7. You can save and close the notebook. 

8. Open the README.md text file, and add a line which says "Program versions: Jupyter ___"

9. Push your changes to your remote repository.

10. When you go to your main repo page, you'll notice that it now says "This branch is 1 commit ahead...". Create a pull request to suggest that the other person incorporate your changes into their original repository. 

11. Explore the "Open a pull request" page while you're there. 

12. Housekeeping: delete your branch of their repository. 



<br>
<br>
<br>
### Branching and workflows

It's important to note that there are MANY DIFFERENT TYPES OF WORKFLOWS on Github because of the branching structure. Someone wrote [an entire blog](https://www.atlassian.com/git/tutorials/comparing-workflows#forking-workflow) about how and when to use each type of workflow. 

<br>
**centralized workflow**

The textbook discussed the use of the **centralized workflow** as a good way to collaborate, which was described as collaborators taking turns editing the contents of the repository. This works well when you carefully manage when certain collaborators are editing the repo, or if different collaborators are editing different parts of the project. However, using a shared central repository this way can lead to merge conflicts. 

*Troubleshooting: What if changes have been made to the shared repository since the last time you created your own branch?*

`git pull --rebase origin master`
You can tell git to incorporate these "upstream" changes into your repository. This command downloads all of the commits made to the shared repository since you last updated, and then replays your local commits on top of them. 

If you find that the "upstream" changes occurred in the same files where you made your local changes, you'll have to go through the process of troubleshooting the merge conflict, which you can do using the same protocol discussed previously. 
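
Here is a sketch of `git pull --rebase origin master` in a case where the upstream and local changes touch different files, so no conflict arises. The assumptions: a local bare repository (`remote.git`) plays the shared repository, a second clone (`collaborator`) plays another user, and the file names and identity are made up.

```shell
set -eu
work_dir="$(mktemp -d)"
cd "$work_dir"

# Stand-in for the shared central repository.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

mkdir desktop
cd desktop
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
echo "# shared project" > README.md
git add README.md
git commit -q -m "First Commit"
git remote add origin "$work_dir/remote.git"
git push -q origin master
cd ..

# A collaborator pushes an "upstream" change.
git clone -q remote.git collaborator
cd collaborator
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "methods go here" > methods.txt
git add methods.txt
git commit -q -m "Collaborator: add methods"
git push -q origin master
cd ../desktop

# Meanwhile, a local commit is made on the desktop.
echo "my notes" > notes.txt
git add notes.txt
git commit -q -m "Desktop: add notes"

# Download the upstream commit and replay the local commit on top of it.
git pull -q --rebase origin master

top_message="$(git log -1 --format=%s)"
second_message="$(git log -1 --skip=1 --format=%s)"
merge_commits="$(git rev-list --count --merges HEAD)"
git log --oneline
git push -q origin master    # now succeeds, because the history is up to date
```

The resulting history is a straight line with the local commit last and no merge commit, which is what distinguishes this from a plain `git pull`.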

<br>
**Other workflows** include **feature branch**, **gitflow**, **forking**. 


<br>


<br>
<br>
<br>
## Working with your commit history