# Git, GitHub, and Jupyter notebooks


#### Version control with Github
http://swcarpentry.github.io/git-novice/01-basics/index.html

![PhD Comics cartoon about keeping track of many "final" versions of a document][phd-comic]

[phd-comic]: http://swcarpentry.github.io/git-novice/fig/phd101212s.png

Why?
- because files go through many versions and we need a way to track changes and go back to previous versions
- version control is like an unlimited 'undo'
- version control allows many people to work in parallel

Where?
- Track Changes in Microsoft Word docs
- version history in Google Docs
- version history in Dropbox
- version history in Git (with repositories hosted on GitHub)
    - full version control, and better at handling conflicting changes made by different people

How?
- Version control systems start with a base version of the document and then record changes you make each step of the way.
- These recorded changes (called *commits*) are stored in a repository together with the files; GitHub keeps a copy of that repository in the cloud.




#### Create a Github account
If you already have a Github account and it's set up on your local computer, please help your neighbors!

Make account
- Go to https://github.com/

Basic GitHub accounts are free. Create a GitHub account if you don't already have one. Consider what personal information you'd like to reveal; for example, GitHub provides instructions for keeping your email address private:
https://help.github.com/articles/setting-your-commit-email-address-on-github/

#### Set up on local computer

When we use Git on a new computer for the first time, we need to configure a few things. Below are a few examples of configurations we will set as we get started with Git:

  - our name and email address,
  - what our preferred text editor is,
  - and that we want to use these settings globally (i.e. for every project).

Git command format: ```git verb options```

#### Configure git

Check configurations: ```git config --list```

There are many configuration settings you can customize to your preferences. Here are some examples:

Set the user name and email that will be attached to your subsequent Git commits (replace these example values with your own):
```
git config --global user.name 'Vlad Dracula'
git config --global user.email 'vlad@tran.sylvan.ia'
```
Set the text editor Git opens when it needs you to type something (e.g., a commit message). You can change it again at any time:
```
git config --global core.editor "nano -w"
```
More configuration examples with editors: http://swcarpentry.github.io/git-novice/02-setup/index.html
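
If you want to try these settings before changing your real configuration, here is a minimal sketch. It assumes a Bash shell and points `HOME` at a temporary directory for the duration of the script, so `--global` settings land in a throwaway `~/.gitconfig`; the name, email, and editor are the example values from above.

```shell
#!/usr/bin/env bash
# Sketch: practise `git config` in a throwaway home directory so your real
# ~/.gitconfig is not touched. HOME is overridden for this script only.
set -euo pipefail

export HOME="$(mktemp -d)"   # temporary home; --global settings are written here
unset XDG_CONFIG_HOME        # make sure --global uses $HOME/.gitconfig

git config --global user.name 'Vlad Dracula'
git config --global user.email 'vlad@tran.sylvan.ia'
git config --global core.editor "nano -w"

# Read a single setting back by giving only its key
git config --global user.name        # prints: Vlad Dracula

# List every setting together with the file it came from
git config --list --show-origin
```

To change your real settings, run the same `git config` lines without the `HOME` and `XDG_CONFIG_HOME` lines.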

#### Interaction between Github cloud and your local computer

Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).

![Diagram of how files move between the working directory, staging area, local repository, and GitHub][git-areas]

[git-areas]: https://cdn-images-1.medium.com/max/1500/1*exCFWgo1cXpgCBmrwFRUUg.png

Helpful links with nice graphics showing the process:
- https://medium.com/@abhishekj/an-intro-to-git-and-github-1a0e2c7e3a2f
- http://swcarpentry.github.io/git-novice/04-changes/index.html
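
The three local areas can be seen directly with `git status --short`. Here is a minimal sketch, assuming a Bash shell; it works in temporary directories only, and the file name `notes.txt` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: watch one file move from the working directory to the staging
# area to the local repository. Runs in temporary directories only.
set -euo pipefail

export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME   # throwaway Git config
git config --global user.name 'Vlad Dracula'
git config --global user.email 'vlad@tran.sylvan.ia'

cd "$(mktemp -d)"
git init                          # create an empty local repository here

echo "first draft" > notes.txt    # 1. change exists only in the working directory
git status --short                # ?? notes.txt   (untracked)

git add notes.txt                 # 2. copy the change into the staging area
git status --short                # A  notes.txt   (staged, not yet committed)

git commit -m "Add notes"         # 3. record the staged snapshot in the repository
git status --short                # (no output: nothing left to commit)
git log --oneline                 # the new commit, with its short hash
```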



#### Some useful Git commands

Getting help
- general help: ```git --help```
- help with a particular command (e.g., `config`): ```git config --help``` (full manual page) or ```git config -h``` (brief summary of options)

Clone a repository from Github cloud to local computer:
- ```git clone <repository URL>```

Check status of repository:
- ```git status```
- this command tells you whether
    - there are files to add, commit, or push
    - or you're all up to date and there is nothing to commit

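Check differences between your working files and the last saved version:
- ```git diff``` (changes not yet staged)
- ```git diff --staged``` (changes staged for the next commit)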

Pushing changes to Github cloud from local computer:
- ```git status```
- ```git add <file name>```
- ```git commit -m '<description of changes>'```
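- ```git push origin master``` (on repositories created on GitHub since late 2020 the default branch is called ```main```, so use ```git push origin main```)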

Pulling changes from Github cloud to local computer:
- ```git pull```
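
To practise these commands without risking a real repository, here is a minimal sketch that runs the whole cycle in temporary directories. It assumes a Bash shell and that a local "bare" repository (`cloud.git`) can stand in for the copy on GitHub; the names `cloud.git`, `laptop`, `desktop`, and `notes.txt` are made up for illustration. It pushes with `git push -u origin HEAD`, so it works whether your Git names the first branch `master` or `main`.

```shell
#!/usr/bin/env bash
# Sketch: the status/diff/add/commit/push/pull cycle, offline.
# Assumption: a local *bare* repository (cloud.git) stands in for GitHub,
# so nothing here touches the network or your GitHub account.
set -euo pipefail

export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME   # throwaway Git config
git config --global user.name 'Vlad Dracula'
git config --global user.email 'vlad@tran.sylvan.ia'

cd "$(mktemp -d)"
git init --bare cloud.git          # plays the role of the repository on GitHub
git clone cloud.git laptop         # like `git clone <repository URL>`
cd laptop

echo "first note" > notes.txt
git status                         # notes.txt is listed as untracked
git add notes.txt
git commit -m "Add notes"
git push -u origin HEAD            # push the current branch; -u remembers where to push

echo "second note" >> notes.txt
git diff                           # shows "+second note", a change not yet staged
git add notes.txt
git commit -m "Add second note"
git push                           # later pushes can omit the remote and branch

# A second clone plays a collaborator (or an edit made on the GitHub website)
cd ..
git clone cloud.git desktop
cd desktop
echo "# Notes" > README.md
git add README.md
git commit -m "Add README"
git push

# Back on the laptop, pull that change down
cd ../laptop
git pull
ls                                 # notes.txt and README.md are both here now
```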

Ok, now let's use these to create a new repository to store our Jupyter notebooks.

#### Create a new repository



1. Open GitHub in your web browser and sign in
1. Click + sign in top right corner next to your user icon to create a 'New repository'
1. Name your repository (e.g., jupyterWorkflow_9Oct18)
1. Give it a description (e.g., jupyter Workflow Example)
1. You can choose to make this repository open to the public or private
1. Check the box for initializing repository with a README file
1. Under 'Add .gitignore', select 'Python'. This will ignore temporary files associated with Python
1. Under 'Add a license', select MIT License.
    1. More information about licenses: http://swcarpentry.github.io/git-novice/11-licensing/index.html
1. Click 'Create repository'

This gives us a cloud-based back-up where we can save and recover what we're working on.

The repository has a unique web address that you can copy by clicking the green 'Clone or download' button (labelled 'Code' in newer versions of GitHub), e.g.,
https://github.com/marisalim/stonybrook_juypterworkflow.git

Now, we will clone this repository so that it exists on the cloud and on our local computer. 
1. Open a terminal (on Windows machines, open Git Bash)
1. Navigate to the directory you set up for Github
1. Enter 'git clone' and then copy/paste your repository URL like this:
    ```
    git clone <your repository URL>
    
    #git clone https://github.com/marisalim/stonybrook_juypterworkflow.git
    ```
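    If the repository is private, Git will ask for your GitHub credentials. GitHub no longer accepts your account password here; enter a personal access token (or set up an SSH key) instead.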
1. Create a notebook in the repository, or move the Jupyter notebook you've been working with into it
    ```
    mv <your path>/<your notebook name>.ipynb <name of repository>
    
    #mv ../../Dropbox/SCPythonOct8-9/Bike_count_dataset.ipynb ./stonybrook_juypterworkflow/
    ```
1. cd into the repository
    ```
    cd <name of repository>
    #cd ./stonybrook_juypterworkflow/
    git status
    ```
    `git status` should report one untracked file: our Jupyter notebook. This file exists on our local computer, but not on the cloud... yet!
    
1. We need to use a series of commands to put this notebook on the cloud too:
    ```
    # add file to staging area
    git add Bike_count_dataset.ipynb
    
    # commit the addition with a description
    # commit messages tell future you what changed and why; when you go back
    # to a previous version, they help you find the one you want
    git commit -m "Add initial analysis notebook"
    
    # lastly, push this commit to the cloud
    git push origin master
    # when prompted, enter your GitHub username and a personal access token (not your password)
    
    git status
    # this should now show that there is nothing to commit
    ```

Ok! There should now be a copy of the notebook on your local computer AND on the cloud. Refresh the repository webpage to check!

Open your notebook on the GitHub website. GitHub renders the notebook so it looks nice. Now we have a cloud-based version of this notebook on GitHub. Locally, we have the Git repository where we can make and track changes, then push them to the cloud.

- If you make changes to files locally, you can *push* those changes to the version on the cloud.
- If you make changes to files on the cloud version, you can *pull* those changes down to your local version.

#### Revert to previous version

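Set up a ```git hist``` alias that prints a compact history, one line per commit with its short hash, date, message, branch names, and author:

```git config --global alias.hist "log --pretty=format:'%h %ad | %s%d [%an]' --graph --date=short"```

More information on viewing and going back to old versions: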

https://githowto.com/getting_old_versions
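
A minimal sketch of going back to an earlier version of a file, assuming a Bash shell and a made-up file `analysis.txt` in a temporary repository. It sets up the `hist` alias above, then uses `git show <commit>:<file>` to view an old version and `git checkout <commit> -- <file>` to restore it (Git 2.23 and later also offer `git restore --source=<commit> <file>` for the same job). Restoring is recorded as a new commit, so the later version stays in the history.

```shell
#!/usr/bin/env bash
# Sketch: set up the `hist` alias, then view and restore an older version
# of a file. Runs in temporary directories only.
set -euo pipefail

export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME   # throwaway Git config
git config --global user.name 'Vlad Dracula'
git config --global user.email 'vlad@tran.sylvan.ia'
git config --global alias.hist "log --pretty=format:'%h %ad | %s%d [%an]' --graph --date=short"

cd "$(mktemp -d)"
git init
echo "version 1" > analysis.txt
git add analysis.txt
git commit -m "First version"
echo "version 2" > analysis.txt
git commit -am "Second version"      # -a stages changes to already-tracked files

git hist                              # one line per commit, newest first

# HEAD~1 means "one commit before the current one"
git show HEAD~1:analysis.txt          # prints: version 1
git checkout HEAD~1 -- analysis.txt   # replace the file with that older version
git commit -m "Restore first version of analysis.txt"
git hist
```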


#### Working with bike data and Github

1. Rerun your local version of the notebook (Kernel > Restart & Run All)
1. After the rerun, the Fremont.csv dataset is in our local repository
1. Data files can be a bit tricky with GitHub: they can be too big to store there. GitHub blocks individual files larger than 100 MB and recommends keeping repositories small. **The bike data is relatively small, so it should probably be OK.**

If you would rather not track a data file, you can tell Git to ignore it:


```
# Add the data set to the .gitignore file
ls -a            # -a also lists hidden files such as .gitignore

# Open .gitignore with a text editor
nano .gitignore
# scroll to the bottom of the file and add these two lines:
#     # data
#     Fremont.csv
# then save and exit (Ctrl+O, Enter, Ctrl+X in nano)

# push the change to the version on the cloud
git add .gitignore
git commit -m 'add data to gitignore'
git push origin master
```
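This tells Git to ignore the data file Fremont.csv: it no longer shows up as untracked in ```git status```, and ```git add .``` skips it. If Fremont.csv was already committed, ignoring it has no effect until you stop tracking it with ```git rm --cached Fremont.csv```.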
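
To check that the ignore rule works before touching your real repository, here is a minimal sketch, assuming a Bash shell. It runs in a temporary repository with an empty placeholder `Fremont.csv` and appends the same two lines with `printf` instead of nano; `git check-ignore -v` reports which `.gitignore` line matched.

```shell
#!/usr/bin/env bash
# Sketch: confirm that .gitignore hides Fremont.csv from `git status`.
# Runs in temporary directories only; Fremont.csv is an empty placeholder.
set -euo pipefail

export HOME="$(mktemp -d)"; unset XDG_CONFIG_HOME   # throwaway Git config
git config --global user.name 'Vlad Dracula'
git config --global user.email 'vlad@tran.sylvan.ia'

cd "$(mktemp -d)"
git init
touch Fremont.csv                    # stands in for the downloaded data file
git status --short                   # ?? Fremont.csv

# Same edit as in nano above, appended without opening an editor
printf '\n# data\nFremont.csv\n' >> .gitignore
git status --short                   # ?? .gitignore   (Fremont.csv is gone)
git check-ignore -v Fremont.csv      # shows which .gitignore line matches

git add .gitignore
git commit -m 'add data to gitignore'
```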

Now that we know how to use Git and Github, we're going to edit our bike data analysis script and then push changes to our repository.