<h1><center>Foundations of Data Science</center></h1>
<h3><font color='grey'><center> Delving Deeper into the Commandline and Git</center></font></h3>
<br>
<h4><center> Lecture 2 </center></h4>

**Last Time**

- Thought through why version control and documentation are useful
- Walked through basic use of the command line
- Began using Git + GitHub

<br>

**This Time**
- Continue expanding our knowledge of the commandline 
- Using branches in Git
- Using the REPL and Jupyter Notebooks for coding in Python

<h1><center>Back to the Commandline</center></h1>

<h2><center>Commandline Refresher</center></h2>

- `pwd`: check working directory
- `cd <path>`: change working directory
    + `cd ..`: move up to the parent directory
    + `cd`: go to your home directory (same as `cd ~`)
    + `cd -`: go back to the previous directory
- `ls`: list the files in the working directory (`ls -a` also shows hidden files)
- `mkdir <dir name>`: make a directory
- `mv <old path> <new path>`: move file from old path to new path
- `cp <old path> <new path>`: copy file from old path to new path
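The navigation commands above can be tried safely in a throwaway directory. This is a minimal sketch, assuming `mktemp -d` is available (it is on Linux and macOS); the directory and file names are hypothetical.

```bash
demo=$(mktemp -d)        # a temporary directory, so nothing important is touched
cd "$demo"
pwd                      # prints the temporary directory's path

mkdir projects           # make a directory
cd projects
cd ..                    # move up to the parent directory ($demo)
cd -                     # jump back to the previous directory (projects) and print it
cd "$demo"

echo 'draft' > notes.txt
cp notes.txt notes-backup.txt       # copy: both files now exist
mv notes.txt projects/notes.txt     # move the original into projects/
ls                                  # lists notes-backup.txt and projects
ls projects                         # lists notes.txt
```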

<h2><center>Commandline Refresher</center></h2>

- **Making a file**:
    - `touch <file name>`
    - `echo 'text' > file`
    - Other Ways
        - `printf 'text' > file`
        - `cat > file` (+ `text` + [ctrl + d])
        - `nano file` (+ `text` + [ctrl + o] + enter + [ctrl + x])
        - `vi file` (+ `i` + `text` + esc + `:wq`)
        
        
        
- **Editing a File**
    - `open file` (macOS: opens the file in its default application)
    - `nano file` (+ `text` + [ctrl + o] + enter + [ctrl + x])
    - `vi file` (+ `i` + `text` + esc + `:wq`)
    
    
    
- **Renaming a File**
    - `mv <old file name> <new file name>`
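The difference between `>` (overwrite) and `>>` (append), and renaming with `mv`, can be seen in a short sketch. It again assumes a hypothetical temporary directory created with `mktemp -d`.

```bash
demo=$(mktemp -d)
cd "$demo"

touch empty.txt                       # create an empty file (or update its timestamp)
echo 'first line' > notes.txt         # > creates the file, or overwrites it if it exists
echo 'second line' >> notes.txt       # >> appends instead of overwriting
printf 'a\nb\n' > letters.txt         # printf only adds newlines where you write \n
mv notes.txt lecture-notes.txt        # renaming is a move within the same directory
cat lecture-notes.txt                 # first line, then second line
```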


<h2><center>Commandline Refresher</center></h2>

- **Help**:
    - `man <command name>`
    - `<command name> --help` (some commands use `-h` instead)

<h2><center> Break an Ongoing Process </center></h2>

- **<span style='color:#c1261b'>STOP</span> whatever it is you're doing**
    - `ctrl + c`: interrupts (stops) the currently running command.
    
For example,

```bash
echo '
while True:
    print("-")
    print("--")
    print("---")' | 
python3
```
This loop never ends on its own, so it would keep printing forever. Press `ctrl + c` to stop it.

<h2><center>Viewing Files</center></h2>

- `cat file`: print the entire file
- `less file`: view the file one page at a time
    - `space` = next page, `b` = previous page, `q` = quit (`:n`/`:p` jump to the next/previous file when several files are open)
- `head`: view the first $N$ lines of a file
    - `head -n 3 file`
- `tail`: view the last $N$ lines of a file
    - `tail -n 3 file`
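A quick way to check `head` and `tail` is on a file whose contents are its own line numbers. This sketch assumes `seq` is installed (standard on Linux and macOS); `less` is left out because it is interactive.

```bash
demo=$(mktemp -d)
seq 1 10 > "$demo/numbers.txt"             # a 10-line file: 1, 2, ..., 10
cat "$demo/numbers.txt"                     # all 10 lines
head -n 3 "$demo/numbers.txt"               # 1, 2, 3
tail -n 3 "$demo/numbers.txt"               # 8, 9, 10
head -n 5 "$demo/numbers.txt" | tail -n 1   # only line 5
```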

<h2><center>Finding Files</center></h2>

- **Searching within Files**
    - `grep`: selects lines according to what they contain.
        - Relevant arguments
            - `-c`: print a count of matching lines rather than the lines themselves
            - `-h`: do not print the names of files when searching multiple files
            - `-i`: ignore case (e.g., treat "Goose" and "goose" as matches)
            - `-l`: print the names of files that contain matches, not the matches
            - `-n`: print line numbers for matching lines
            - `-v`: invert the match, i.e., only show lines that don't match
- **Liberal use of `TAB`**
    + `open my_fi` + TAB: auto-completes the name of the matching file in the working directory (press TAB twice to list all candidates when several match)
- **Wildcard operator** (glob pattern)
    + `open *.pdf`: `*` matches any sequence of characters, so this opens every PDF in the working directory
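The `grep` flags above can be compared side by side on two small, hypothetical files; the `*.txt` wildcard passes both files to `grep` at once.

```bash
demo=$(mktemp -d)
printf 'Goose\ngoose\nduck\n' > "$demo/birds.txt"
printf 'cat\ndog\n' > "$demo/pets.txt"

grep goose "$demo/birds.txt"          # goose (matching is case-sensitive by default)
grep -i goose "$demo/birds.txt"       # Goose, goose
grep -c -i goose "$demo/birds.txt"    # 2
grep -n duck "$demo/birds.txt"        # 3:duck
grep -v -i goose "$demo/birds.txt"    # duck
grep -l dog "$demo"/*.txt             # the path of pets.txt only
grep -h -i goose "$demo"/*.txt        # Goose, goose without file-name prefixes
```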

<h2><center>Piping</center></h2>

- `<command on the left> | <command on the right>`: the output of the left-hand command is passed as input to the right-hand command
    
```bash
echo 'cat cat cat 
dog dog goose cat' |
head -n 1
```
```
cat cat cat
```
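Pipes can be chained, each command feeding the next. This sketch reuses the text from the example above; `tr`, `sort`, `uniq -c`, and `wc -l` are standard utilities, though their output padding differs slightly between Linux and macOS.

```bash
# Count the lines that mention "goose": echo -> grep -> wc
echo 'cat cat cat
dog dog goose cat' | grep goose | wc -l                              # 1

# Put each word on its own line, then count each distinct word.
# uniq -c only merges adjacent duplicates, hence the sort first.
echo 'cat cat cat dog dog goose cat' | tr ' ' '\n' | sort | uniq -c  # 4 cat, 2 dog, 1 goose
```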


<h2><center>Variable Assignment</center></h2>


+ `variable=value`: assign a value (no spaces around the `=`)
+ `$variable`: access the value stored in the variable

Example,

```bash
x=4
echo x   # prints the literal letter x
echo $x  # use $ to access the value stored in the variable: prints 4
```
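Two more details about variables are worth seeing in action: quoting decides whether a variable is expanded, and braces mark where a variable name ends. A minimal sketch with hypothetical variable names:

```bash
greeting=hello                # no spaces around =
animal='duck'                 # quotes become necessary once a value contains spaces
echo "$greeting, $animal"     # hello, duck  (double quotes expand variables)
echo '$greeting'              # $greeting    (single quotes keep the text literal)
echo "${animal}s"             # ducks        (braces separate the name from the trailing s)
# x = 4 would fail: the shell would try to run a command named x
```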

<h2><center>Aliases</center></h2>


- allows a user to create simple names or abbreviations for commands no matter how complex 
- use those abbreviations in the same way that ordinary commands are used
- recognized only by the shell session in which they are created (add them to `~/.bashrc` or `~/.zshrc` to keep them across sessions)
    
Example 1: build aliases that act as shortcuts to our Desktop and Dropbox folders
```bash
alias my_desk='cd ~/Desktop/'
alias my_drop='cd ~/Dropbox/'
my_desk
my_drop
```

Example 2: a prettier view of the git log that shows the commit timeline
```bash
alias gstory='git log --oneline --decorate --all --graph'
gstory # Assuming we are in a git directory
```
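Aliases can also be inspected and removed. This sketch uses a hypothetical alias name; since aliases are meant for interactive shells, it only defines, inspects, and removes one rather than running it.

```bash
alias count_files='ls | wc -l'   # hypothetical alias: count entries in the working directory
alias count_files                # print its definition
type count_files                 # confirm the shell treats the name as an alias
alias                            # list every alias defined in this shell
unalias count_files              # remove it for the rest of this session
```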

<h2><center>Functions</center></h2>

- Similar to aliases, we can make abbreviated commands that help us work on the commandline
- again, recognized only by the shell in which they are created
    
Example: build a function that moves to our Desktop and prints a message telling us where we are going.
```bash
my_desk () {
    cd ~/Desktop/
    echo "Heading to the Desktop"
}
my_desk
```

<h2><center>Functions + arguments</center></h2>


- arguments passed to a function are referenced by position: `$1` is the first, `$2` the second, and so on
- the order can be rearranged within the function

Example: build a function that prints the first $n$ lines of a file
```bash
pp () {
    head -n "$2" "$1"   # $2 = number of lines, $1 = file name
}
pp text.txt 4
```
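Positional arguments can also be given defaults, and the same argument can be reused inside a function. A sketch assuming bash (for `local`); the 5-line default for `pp` and the `mkcd` helper are hypothetical additions, not part of the example above.

```bash
pp () {
    local file="$1"
    local n="${2:-5}"          # ${2:-5} falls back to 5 when no second argument is given
    head -n "$n" "$file"
}

mkcd () {
    mkdir -p "$1" && cd "$1"   # the same argument is used twice: create, then enter
}

demo=$(mktemp -d)
seq 1 20 > "$demo/text.txt"
pp "$demo/text.txt" 4          # lines 1-4
pp "$demo/text.txt"            # lines 1-5
mkcd "$demo/new-project"
pwd                            # ends in /new-project
```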

<h2><center> Running Shell Scripts </center></h2>

- We can construct programs that we can then pass into the shell.
- `.sh` file type
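- `source my-prog.sh` runs the commands in the *current* shell (aliases, variables, and `cd` persist); `bash my-prog.sh` runs them in a new child shell, so those changes vanish when it exits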

    
Example: write a one-line script, `my-prog.sh`, then `source` it so the alias is defined in our current shell
```bash 
echo "alias my_desk='cd ~/Desktop/'" > my-prog.sh
source my-prog.sh
my_desk
```
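The difference between `source` and `bash` shows up as soon as a script changes the shell's state. A minimal sketch with a hypothetical `setup.sh` that sets a variable and changes directory:

```bash
unset greeting
demo=$(mktemp -d)
printf 'greeting=hello\ncd /\n' > "$demo/setup.sh"   # hypothetical two-line script
cd "$demo"

bash setup.sh        # child shell: its variable and its cd are lost when it exits
echo "after bash:   greeting='$greeting', now in $(pwd)"   # greeting is empty, still in $demo

source ./setup.sh    # current shell: the changes persist
echo "after source: greeting='$greeting', now in $(pwd)"   # greeting='hello', now in /
```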

<h2><center>Commandline &rarr; A Deeper Dive</center></h2>

**The point of it all...**
1. Understand file paths on your computer 
2. Serves as a common hub from which to work
3. Reproducible sequence of steps
4. Streamline workflow
    + set projects up
    + work between languages
    + batch process heavy loads
5. Vital when speaking to a computing cluster, working on a virtual machine, or ssh-ing into a remote computer

<h1><center>Version Control with Git</center></h1>

<h2><center>Git Refresher</center></h2>

- &rarr; `git init`: start a new repository from a working directory
- &rarr; `git clone <url or location to repository>`: clone an existing repository
- &rarr; `git status`: get the current status of the repository.


- &rarr; `git add <file>`: stage a file to be committed 
- &rarr; `git add .`: stage all files to be committed
- &rarr; `git reset HEAD <file>`: un-stage a file (the changes stay in the file but are left out of the next commit)


- &rarr; `git commit -m "some message"`: commit staged changes to repository 
- &rarr; `git commit`: commit staged changes to repository (opens your text editor so you can write a message)
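The staging cycle above can be walked through in a scratch repository. This sketch assumes Git is installed; the identity is hypothetical and set without `--global`, so it only applies to this throwaway repository.

```bash
demo=$(mktemp -d)
cd "$demo"
git init -q                                # start a new repository here
git config user.name "Example User"        # hypothetical identity, this repo only
git config user.email "user@example.com"

echo 'draft' > report.md
git status --short                         # ?? report.md  (untracked)
git add report.md                          # stage it
git commit -q -m "Add report draft"        # commit the staged snapshot

echo 'more text' >> report.md
git add report.md                          # stage the edit...
git reset -q HEAD report.md                # ...then un-stage it; the edit stays in the file
git status --short                         #  M report.md  (modified, not staged)
```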

<h2><center>Git Refresher</center></h2>

- &rarr; `git fetch`: download recent changes from the remote repository (without merging them into your local branch)
- &rarr; `git pull`: download recent changes from the remote repository and merge them into your local branch


- &rarr; `git push`: push commits to the remote (e.g., a GitHub repository)
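Fetching, pulling, and pushing can be rehearsed without GitHub by using a local "bare" repository as the remote. This is a sketch under that assumption: `remote.git`, the two clones, and the identities are all hypothetical.

```bash
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"            # a local bare repo standing in for GitHub

# Alice clones the empty remote, commits, and pushes
git clone -q "$demo/remote.git" "$demo/alice"
cd "$demo/alice"
git config user.name "Alice"                     # hypothetical identity
git config user.email "alice@example.com"
echo 'v1' > data.txt
git add data.txt && git commit -q -m "Add data v1"
git push -q -u origin HEAD                       # push the current branch and track it

# Bob clones once there is something to clone
git clone -q "$demo/remote.git" "$demo/bob"

# Alice pushes a second commit
echo 'v2' > data.txt
git commit -q -am "Update data to v2"
git push -q

# fetch downloads the new commit but leaves Bob's files alone
cd "$demo/bob"
git fetch -q
cat data.txt                                     # still v1
git status -sb | head -n 1                       # the branch line reports [behind 1]

# pull = fetch + merge (a fast-forward here), so Bob's files update
git pull -q --ff-only
cat data.txt                                     # v2
```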

<h2><center>Git Refresher</center></h2>

**Getting Help**

- `git help <verb>`
- `man git-<verb>`
    
```bash
git help log 

# or 

man git-log
```

<h2><center>Git Config </center></h2>

Recall that we are generating a historical record of the project, so we want to know who made each change when reviewing it. To this end, let's configure Git with our identity:

- `git config --list`: list off all your configured settings


- `git config [args]`
    - `--system`: settings for every user on your computer.
    - `--global`: settings for your user account, applied to all of your repositories.
    - `--local`: settings for one specific repository (the default when writing a setting).
    - e.g. `git config --list --local`


- **Set up your identity**
    - `git config --global user.name "myname"`: set up a user name
    - `git config --global user.email your-email@georgetown.edu`: set up your email address
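To experiment without touching your real settings, the same commands can be scoped to a scratch repository with `--local`. A sketch with hypothetical name and email values:

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
git config --local user.name "Example User"        # hypothetical values, stored in .git/config
git config --local user.email "user@example.com"
git config --get user.name                          # Example User
git config --list --local                           # only this repository's settings
git config --list --show-origin | grep 'user\.'     # which config file each value comes from
```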

<h2><center> Reviewing Commit History </center></h2>

- `git log`: look at the commit history
    + Contains a range of useful arguments:
        + `--oneline`: view a condensed summary 
        + `--all`: show commits from all branches, not just the current one
        + `--graph`: view a text graph of the commit sequence
        + `--stat`: list the files changed in each commit, with counts of added and removed lines
        + `--since=2.weeks`: review commits within some temporal range
- Easily format the log
    + `git log --pretty=format:"%h - %an, %ar : %s"`
    + see [Git Basics on Viewing the Commit History](https://git-scm.com/book/en/v2/Git-Basics-Viewing-the-Commit-History) for more insight into the different possible configurations and customizations
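The `git log` options above are easiest to compare on a small history. This sketch builds three hypothetical commits in a scratch repository and views them several ways.

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
for step in one two three; do                # three small commits to look at
    echo "$step" >> steps.txt
    git add steps.txt
    git commit -q -m "Add step $step"
done

git log --oneline                            # three lines, newest first
git log --stat -1                            # newest commit plus the files it changed
git log --pretty=format:"%h - %an, %ar : %s" # <hash> - Example User, <time> ago : Add step three ...
git log --since=2.weeks --oneline            # all three, since they were just made
```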

<h2><center> Tracking Differences </center></h2>

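- `git diff` &rarr; show changes that have not been staged yet
    - `git diff <commit 1> <commit 2>`: compare two commits
    - Use the (abbreviated) hexadecimal commit hashes to identify the commits
        + e.g. `git diff 44d14b2 2adbea3`
- `git whatchanged`: like `git log`, but lists the files each commit touched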
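Both forms of `git diff` can be seen in one scratch repository: comparing two commits by hash, and showing unstaged edits. The file and commit messages are hypothetical.

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"

echo 'draft' > essay.txt
git add essay.txt && git commit -q -m "First draft"
first=$(git rev-parse HEAD)                  # full hash of the first commit
echo 'final' > essay.txt
git commit -q -am "Final version"
second=$(git rev-parse HEAD)

git diff "$first" "$second"                  # -draft / +final
git diff --stat "$first" "$second"           # summary: 1 file changed
echo 'typo' >> essay.txt
git diff                                     # the unstaged change only: +typo
```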

<h2><center> Tracking Movement </center></h2>

- **Move files around so that the git history is retained**
    - `git mv old-file-location new-file-location` 
    
    
- **Rename files so that the git history is retained**
    - `git mv old-file-name new-file-name` 


If we just rename or move a file with a plain `mv`, Git initially sees a deleted file and a new, untracked one; `git mv` makes the move and stages it in one step, so `git status` reports it as a rename.
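Here is a sketch of `git mv` in a scratch repository with a hypothetical file; `git log --follow` then traces the file's history across the rename.

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
echo 'notes' > old-name.txt
git add old-name.txt && git commit -q -m "Add notes"

git mv old-name.txt new-name.txt             # move and stage in one step
git status --short                           # R  old-name.txt -> new-name.txt
git commit -q -m "Rename notes file"
git log --follow --oneline -- new-name.txt   # both commits: history follows the rename
```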

<h2><center> Time Traveling </center></h2>

- **Move to prior snapshots of the project**
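    - `git checkout <commit-hash>` (this leaves you in a "detached HEAD" state; `git checkout <branch-name>` returns you to the branch)
- **Undo a prior commit**
    - `git revert <commit-hash>`: creates a new commit that reverses the changes introduced by `<commit-hash>`, leaving the history intact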
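The difference between looking at an old snapshot and undoing a commit can be sketched in a scratch repository. Here `HEAD~1` (the commit before the current one) stands in for a commit hash; the file contents are hypothetical.

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
echo 'v1' > version.txt
git add version.txt && git commit -q -m "Add v1"
echo 'v2' > version.txt
git commit -q -am "Change to v2"

git checkout -q HEAD~1        # visit the older snapshot (detached HEAD)
cat version.txt               # v1
git checkout -q -             # return to the branch we were on
cat version.txt               # v2

git revert --no-edit HEAD     # new commit that undoes "Change to v2"
cat version.txt               # v1
git log --oneline             # three commits: nothing was erased
```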

<h2><center>Git Branches </center></h2>

- A branch in Git is a lightweight, movable pointer to a commit.
- Default branch is named "**_master_**" (many newer setups, including GitHub, use "**_main_**")

**Create new branch**
```bash 
git branch <name-of-new-branch>
```

**Checkout a branch**
```bash 
git checkout <name-of-branch>
```

**Do both simultaneously**
```bash 
git checkout -b <name-of-new-branch>
```

<h2><center> Branches </center></h2>

**Merging branches** (run from the branch that should receive the changes)
```bash
git checkout <name-of-main-branch>
git merge <name-of-branch-to-be-merged>
```

**Deleting branches**
```bash 
git branch -d <name-of-branch>
```

**Seeing Last Commit on each branch**
```bash
git branch -v
```
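The full branch life cycle (create, commit, switch back, merge, delete) fits in a short sketch. Because the default branch may be called `master` or `main`, the sketch reads its name with `git rev-parse` instead of assuming one; the `feature` branch and files are hypothetical.

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
echo 'plan' > README.md
git add README.md && git commit -q -m "Initial commit"
base=$(git rev-parse --abbrev-ref HEAD)      # master or main, depending on your Git setup

git checkout -q -b feature                   # create the branch and switch to it
echo 'new idea' > idea.txt
git add idea.txt && git commit -q -m "Add idea"

git checkout -q "$base"                      # go back to the base branch...
git merge -q feature                         # ...and merge feature into it
git branch -v                                # last commit on each branch
git branch -d feature                        # safe delete: feature is fully merged
```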

Let's visualize this process &rarr; [Visualize Git](http://git-school.github.io/visualizing-git/). Let's think about how branching can serve our workflow needs.

<h2><center>Git  &rarr; Branches &rarr; Merge Conflicts </center></h2>

Sometimes there are conflicts between branches that we intend to merge. If you changed the same part of the same file differently in the two branches and then try to merge them, Git won’t be able to merge them cleanly.

Git adds standard <u>**conflict-resolution markers**</u> to the files that have conflicts, so you can open them manually and resolve those conflicts.


```
<<<<<<< HEAD
Here I'll change the text. Make a different change.
=======
Here I'll change the text. Some more.
>>>>>>> new-branch
```


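There are many ways to resolve a conflict (edit the file by hand, keep one side's version, or use a merge tool); afterward, remove the markers, `git add` the file, and `git commit` to finish the merge. **The point is that Git is very careful to force you to check when and where discrepancies exist and resolve them yourself.**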
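The conflict shown above can be reproduced and resolved in a scratch repository. This sketch resolves it by keeping the `new-branch` version with `git checkout --theirs`, which is just one of several possible resolutions; the file and branch contents are hypothetical.

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
echo "Here I'll change the text." > story.txt
git add story.txt && git commit -q -m "Add story"
base=$(git rev-parse --abbrev-ref HEAD)      # master or main

git checkout -q -b new-branch
echo "Here I'll change the text. Some more." > story.txt
git commit -q -am "Edit on new-branch"

git checkout -q "$base"
echo "Here I'll change the text. Make a different change." > story.txt
git commit -q -am "Edit on $base"

git merge new-branch || echo "Merge stopped: resolve the conflict"
cat story.txt                        # shows the <<<<<<< / ======= / >>>>>>> markers

git checkout --theirs story.txt      # keep new-branch's version of the file
git add story.txt                    # mark the conflict as resolved
git commit -q --no-edit              # finish the merge with the default message
```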

<h2><center> Remotes </center></h2>

- **Git Remote**:
    - &rarr; `git remote add origin https://github.com/user/repo.git`  
    - &rarr; `git remote add <name-of-our-remote> <REMOTE_URL>`  
    - We can add another remote, say on another Git hosting service like [Bitbucket](https://bitbucket.org/), and push specific branches to it by naming that remote explicitly (e.g., `git push <name-of-our-remote> <branch>`), since our main branch is already linked to GitHub.
    
    
- **Looking at our different remotes**
    - `git remote`: list the names of the remotes
    - `git ls-remote`: list the branches and tags (references) in the remote repository
    - `git remote -v`: show the URLs of the remotes

<h2><center> Remotes </center></h2>

- **Fetching from a remote**
    - `git fetch <remote-name>`
    
    
- **Pushing changes to the remote**
    - `git push -u <remote> <branch>`: push `<branch>` to `<remote>`; `-u` sets it as the upstream, so later a plain `git push` or `git pull` knows where to go.
    - `git push -u origin master`: e.g., push the `master` branch to `origin`.

<h2><center> Remotes </center></h2>

- **Inspecting Remotes**
    - `git remote show origin`: details about one remote (URLs, tracked branches)
    - `git remote show`: list the remotes
    
    
- **Renaming Remotes**
    - `git remote rename origin my-go-to-remote`
    
    
- **Removing Remotes**
    - `git remote remove <remote-name>`
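Adding, renaming, and removing remotes only edits the repository's configuration, so it can be tried with local bare repositories standing in for GitHub and Bitbucket. All paths and remote names here are hypothetical.

```bash
demo=$(mktemp -d)
git init -q --bare "$demo/github-standin.git"      # hypothetical local stand-ins for
git init -q --bare "$demo/bitbucket-standin.git"   # two hosting services
git init -q "$demo/project"
cd "$demo/project"

git remote add origin "$demo/github-standin.git"
git remote add backup "$demo/bitbucket-standin.git"
git remote -v                                      # fetch and push URL for each remote
git remote rename backup my-go-to-remote
git remote                                         # origin, my-go-to-remote
git remote remove my-go-to-remote
git remote                                         # origin
```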


<h2><center>Git  Ignore </center></h2>

Sometimes we do not want to track certain file types. 

For example, really big data sources that we wouldn't want to push to the repo, or log files that are constantly being updated every time we run some process, but are otherwise meaningless.

We can exclude these files by adding a `.gitignore` file to our project folder.


```bash
printf '*.ipynb_checkpoints\n*.Rdata\n' > .gitignore  # .gitignore expects one pattern per line
```
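Whether the patterns work can be checked with `git status` and `git check-ignore`. A sketch in a scratch repository with hypothetical file names:

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
printf '*.ipynb_checkpoints\n*.Rdata\n' > .gitignore   # one pattern per line

mkdir .ipynb_checkpoints
touch .ipynb_checkpoints/analysis-checkpoint.ipynb model.Rdata analysis.R
git status --short --untracked-files=all   # lists only .gitignore and analysis.R
git check-ignore -v model.Rdata            # shows the .gitignore line that matched
```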

<h2><center>Git  Aliases </center></h2>

Much like making aliases in the shell, we can define Git aliases to make our lives easier

```bash
git config --global alias.unstage 'reset HEAD --'
git config --global alias.last 'log -1 HEAD'
git config --global alias.pg 'log --oneline --graph --decorate --all'
```
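To try the `last` and `unstage` aliases without changing your global settings, they can be stored in a scratch repository instead (no `--global`). A sketch with a hypothetical file:

```bash
demo=$(mktemp -d)
cd "$demo" && git init -q
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"
echo 'notes' > notes.txt
git add notes.txt && git commit -q -m "First commit"

# Same aliases as above, stored in this repository only
git config alias.last 'log -1 HEAD'
git config alias.unstage 'reset HEAD --'

git last --oneline                  # expands to: git log -1 HEAD --oneline
echo 'more' >> notes.txt
git add notes.txt
git unstage notes.txt               # expands to: git reset HEAD -- notes.txt
git status --short                  #  M notes.txt  (changed but no longer staged)
```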

<h2><center>Git Graphical User Interfaces </center></h2>

Keep in mind that there are graphical ways to probe a repository's history:

- The repository's **GitHub page** online
- `gitk`: initiated in the commandline.
- [GitHub Desktop](https://desktop.github.com/)
- Plug-ins
    + RStudio has a great git interface
    + Atom and the like

<h3><center>References</center></h3>

Wording for some slides was taken from: Chacon, Scott, and Ben Straub. (2014). *Pro Git*, 2nd ed. https://git-scm.com/book/en/v2