## Configuring git

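You need to set a username and email in git before you can make commits. The name doesn't have to be the same as your GitHub username, but the email should be one associated with your GitHub account so that GitHub can link your commits to you.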


```
> git config --global user.name "Any Name"
> git config --global user.email "<github-email>"
```
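To check what git will actually use, run `git config user.name` or `git config user.email` with no value and git prints the current setting. The sketch below does this inside a throwaway repository and uses `--local` rather than `--global` (an assumption, so the demo doesn't touch your real settings); the name and email are placeholders.

```shell
set -euo pipefail

# Work in a throwaway repository so your real config is untouched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

# Placeholder identity, stored in this repo's .git/config only.
git config --local user.name "Any Name"
git config --local user.email "you@example.com"

# With no value given, git prints the effective setting
# (a --local value overrides a --global one).
name=$(git config user.name)
email=$(git config user.email)
echo "user.name=$name user.email=$email"
```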

## Initializing a repo

You can get a local repo in a few ways: create one on GitHub and clone it, or create one locally with `git init` and then add a remote manually. To work with someone else's repo, you can clone it directly from GitHub, or you can fork it and clone your fork. If you plan to make changes that you want to live upstream, forking is a best practice (you can't push to someone else's personal GitHub repo unless they give you access).
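Here is a minimal sketch of the second route, `git init` followed by adding a remote by hand. It runs in a temporary directory, and the remote URL uses a hypothetical `your_username`; nothing is contacted over the network until you fetch or push.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Create a new, empty repository locally.
git init -q git_tutorial
cd git_tutorial

# Connect it to a remote by hand (hypothetical URL; nothing is contacted yet).
git remote add origin git@github.com:your_username/git_tutorial.git

# Show the configured remotes with their fetch and push URLs.
git remote -v
url=$(git remote get-url origin)
```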

1. Fork the tutorial repo (https://github.com/gchronis/git_tutorial)
2. Clone it over ssh
    ```
    git clone git@github.com:<your_username>/git_tutorial.git
    ```

### Commits

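A helpful way to think about git is that it tracks changes to the repository rather than whole files: the current directory state is arrived at by layering a series of changes over one another. (Under the hood, each commit actually stores a snapshot of the whole project plus a pointer to its parent commit, and git computes the changes by comparing snapshots.) Changes saved to the repository are called commits.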

Sometimes you only want to save a little bit of work at a time. Git has three areas that help you keep track of your work: the **working directory** holds the most recent edits you've made, the **index** (or **staging area**) holds the edits you plan to add to the repo's history, and the **repository** holds the history that's already saved.

 <img src="img/staging_area.png" alt="Image">

### Check status of the repo

Right now our working directory is clean:

```
> git status
```

### Making Changes

First, make changes to the repo. We'll add a new file and do some work in it:

```
> touch "example.txt"
> echo "some work" > example.txt
```

### Committing Changes

We've changed the state of the working directory. Run the status command again. Changes shown in red are changes in your working directory that haven't been staged yet. Right now the file is **untracked**: git can see it, but it has never been added to the repo.
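For a compact view, `git status --short` (or its script-friendly cousin `--porcelain`) prints one line per path, and `??` marks an untracked file. A sketch in a temporary repository, assuming the same `example.txt` as above:

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

echo "some work" > example.txt

# "??" in the first two columns marks an untracked path.
git status --short
status=$(git status --porcelain)
```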

### git add

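Stage the changes for commit. This command adds them to the **index**, also called the **staging area**.

The `*` is a shell wildcard that matches every (non-hidden) file in the current directory. `git add .` is a common alternative that stages all changes under the current directory.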
```
git add *
```
OR
```
git add example.txt
```
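To confirm what is in the index, `git diff --cached --name-only` lists staged paths, and in the short status an `A` in the first column means "added to the index". A minimal sketch, again in a temporary repository:

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

echo "some work" > example.txt
git add example.txt

# Paths staged in the index (before the first commit, every staged path counts).
staged=$(git diff --cached --name-only)
# "A " means added to the index; the working directory matches the index.
status=$(git status --porcelain)
echo "$status"
```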

### git commit

When we have all the changes we want, we commit them to the repo. Specifically, we commit them to the branch we are on (in this case, `master`; newer repos often call this branch `main`).
```
> git commit -m "I did some work"
```
You can also run the `git commit` command without the `-m` message flag, which will open the default git editor, where you can type a longer commit message.

The `log` command shows you the history of commits on your current branch. Space bar to scroll, `q` to exit.
```
> git log
```
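`git log --oneline` condenses each commit to its abbreviated SHA and message. The sketch below makes the first commit in a temporary repository; the name and email are placeholders set locally.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"

# One line per commit: abbreviated SHA, then the message.
git log --oneline
count=$(git rev-list --count HEAD)
```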

Let's do some more work
```
> echo "moar werk" > example.txt
> git add example.txt
```

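Oh no! This overwrote the work we previously did (a single `>` replaces the file's contents). Luckily, the version of the file from our last commit is still in the repo. First look at what we staged, then throw away both the staged change and the working-directory change by checking the file out from `HEAD`, the commit at the tip of our current branch:
```
> git diff --cached
> git checkout HEAD -- example.txt
> cat example.txt
some work
```

On git 2.23 and later, `git restore --staged --worktree example.txt` does the same thing.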
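The whole overwrite-and-recover sequence can be reproduced end to end in a scratch repository. This is a sketch with placeholder identity settings; `git checkout HEAD -- <path>` restores the file in both the index and the working directory.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"

# The mistake: a single ">" replaces the file, and the damage gets staged.
echo "moar werk" > example.txt
git add example.txt
git diff --cached --stat

# Restore example.txt in both the index and the working directory from HEAD.
git checkout HEAD -- example.txt
content=$(cat example.txt)
echo "$content"
```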

We actually wanted to *add* work. For that we need the shell's append operator, which is two angle brackets (`>>`):
```
> echo "moar werk" >> example.txt
```

Checking the status of the repo now, we again see that git has noticed changes to the file. Since git is tracking this file, it can show us the difference, or `diff`, between the file in the working directory and the version in the index (which, since we haven't staged anything new, matches the latest commit on the current branch):
```
> git diff
```
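Since the diff output is meant for humans, a script-friendly check is `git diff --numstat`, which prints added and deleted line counts per file. A sketch of the append step in a scratch repository, assuming the same starting commit as above:

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"

# ">>" appends instead of overwriting.
echo "moar werk" >> example.txt

# Working directory vs. index: one line added, none deleted.
git diff
numstat=$(git diff --numstat)
```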

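Let's stage and commit these changes and push them upstream to our fork. `HEAD` is a ref that points to the tip of the branch we are currently on, so `git push origin HEAD` pushes the current branch to the branch of the same name on `origin`.

```
> git add example.txt
> git commit -m "more work is there now"
> git push origin HEAD
```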

Refresh your fork on GitHub. You should see a button that says 'Compare & pull request'. This lets you create a PR, which in this case is a request to me to pull the changes that you made into my repo. Let's try it!

Everything looks good and we could merge, but there's a typo! We don't want this version of the work committed at all. We have a few options here:

1. Make another commit that fixes the typo and push it. This will update the PR automatically.
2. Undo our last commit. This makes a cleaner history. Sometimes this is absolutely necessary, for example if you accidentally commit a secret key: you don't want that in the commit history.
3. Amend the commit. I love `git commit --amend`, which adds any staged changes to the most recent commit (and lets you edit its message). I often do this if I forgot something small and important. Like option 2, this replaces the commit with a new one, so it also rewrites history.
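Option 3 can be sketched like this in a scratch repository: a commit with a typo in the file (a hypothetical `more wrok`) is fixed and folded into the same commit with `--amend --no-edit`, which keeps the original message. The commit's SHA changes, because git creates a new commit to replace the old one.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"

echo "more wrok" >> example.txt        # the typo
git add example.txt
git commit -q -m "more work is there now"
before=$(git rev-parse HEAD)

# Fix the typo, stage it, and fold it into the last commit, keeping its message.
printf 'some work\nmore work\n' > example.txt
git add example.txt
git commit -q --amend --no-edit
after=$(git rev-parse HEAD)
```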

Let's try option two: rewriting history

### git reset

A soft reset moves the current branch back one commit but puts that commit's changes back into the index, so nothing is lost. `HEAD~1` refers to the commit before `HEAD` (its first parent), and `HEAD` almost always points to the tip of the current branch.
```
> git reset --soft HEAD~1
> git status
```
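What the soft reset does can be checked directly: the branch is one commit shorter, but the undone commit's changes are still staged. A sketch in a scratch repository with two placeholder commits:

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"
echo "more wrok" >> example.txt
git add example.txt
git commit -q -m "more work is there now"

# Move the branch back one commit, keeping that commit's changes staged.
git reset -q --soft HEAD~1
git status --short
count=$(git rev-list --count HEAD)
staged=$(git diff --cached --name-only)
```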
You should see your changes, staged and ready to commit again. First, fix the typo in `example.txt`.

Now when you run `git status` you see the file twice, once in red and once in green. The index (green) contains the version of the file we just 'uncommitted'. The working directory (red) contains the typo fix. We have to re-add the file to the index to stage the corrected version.
```
> git add example.txt
> git commit -m "did more work"
> git push origin HEAD
```

Uh oh! We've angered the beast. Git rejects the push because the history of the branch on our fork is different from the history of our local branch. It can't simply 'fast-forward' our local changes on top of the remote branch (`master` on your fork, which your local repo knows as `origin/master`).

**QUESTION:** Why is this? How does git compare the history of two branches?

Hint: The answer has to do with SHAs

### Force pushing

We'll have to do a force push. This is a very dangerous tool! Use with caution. Only do this if you are 100 percent confident of exactly what will happen. I'm showing it to you because I trust you. But I also trust you will use safe workflows 100 percent of the time that will ensure you rarely if ever have to do this. 

```
> git push --force origin HEAD
```
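The rejection and the force push can be rehearsed safely without GitHub. In this sketch a local bare repository (an assumption, standing in for your fork) plays the remote. Outside of demos, git also offers `git push --force-with-lease`, which refuses to overwrite the remote branch if it contains commits you haven't fetched.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# A local bare repository stands in for the fork on GitHub.
git init -q --bare fork.git
git clone -q fork.git work
cd work
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"
echo "more wrok" >> example.txt
git add example.txt
git commit -q -m "more work is there now"
git push -q origin HEAD

# Rewrite local history: undo the last commit, fix the typo, recommit.
git reset -q --soft HEAD~1
printf 'some work\nmore work\n' > example.txt
git add example.txt
git commit -q -m "did more work"

# A normal push is rejected because it would not be a fast-forward.
if git push -q origin HEAD 2>/dev/null; then
  echo "unexpected: push succeeded"
  exit 1
fi

# Force push: the remote branch is overwritten with our local history.
git push -q --force origin HEAD
branch=$(git symbolic-ref --short HEAD)
remote_tip=$(git --git-dir=../fork.git rev-parse "refs/heads/$branch")
```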

The branch should be updated in github and we can now make a happy, clean PR that shows our changes in a logical order that helps others to review our code. 

## BRANCHING WORKFLOWS

Git is at its most valuable and powerful (and therefore most dangerous) when used to collaborate with others. It allows us to maintain a single history of a repository, even as multiple people work in it at the same time, potentially changing the same files.



Sometimes you are working with others, and everyone has their own upstream fork. Even in this case, somebody's repo is bound to be the source of truth. How does that person make changes to the repo without messing things up for everybody?

Enter branching.

In other scenarios, for instance if you have multiple copies of a repo on a local machine and a remote server, or if you are working with others in an org that has a single shared github repo for a project, all of the local copies will have the same upstream repo. In this case, branching is absolutely crucial. 

The commandments of collaboration in git (in order of importance):

1. **NEVER force push to master** (`git push --force origin master` is forbidden)
2. **NEVER commit to master** (double check with `git branch` before you commit: are you on `master`?)
3. **ONLY merge into master through fast-forward** (i.e. with `git merge --ff-only`)

If you follow these three rules, you will never jeopardize the changes you have committed to the master branch. The last rule is the hardest to follow, because a fast-forward merge may require you to `rebase` your branch first if you or other people have made changes to the branch you are merging into.
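Here is a sketch of rule 3 refusing to do its job when histories have diverged: `feature` and the base branch each gain a commit, so `git merge --ff-only` exits with an error and leaves the base branch untouched. Branch and file names are hypothetical, and the base branch is read with `git symbolic-ref` because it may be called `master` or `main` depending on your git configuration.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"
base=$(git symbolic-ref --short HEAD)

# Our feature branch gets a commit...
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# ...and meanwhile someone else's commit lands on the base branch.
git checkout -q "$base"
echo "other work" > other.txt
git add other.txt
git commit -q -m "someone else's work"
before=$(git rev-parse HEAD)

# The histories have diverged, so a fast-forward is impossible.
if git merge --ff-only feature; then
  ff=succeeded
else
  ff=refused     # git prints an error and changes nothing
fi
after=$(git rev-parse HEAD)
```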

If you have access to an external server (like the compling GPUs) **and have already created SSH keys or have a process for pushing and pulling from GitHub**, feel free to log on there to follow along. You can also use two different local repositories to simulate collaboration between two people. Sometimes, when you work on the same project in two places, you get into git snafus with yourself.

## Checkout a new branch

The checkout command switches your branch, or creates a new one with the `-b` flag

```
> git checkout -b lots_of_work
> git branch
```

Now that we are on the `lots_of_work` branch, let's do some work there:

```
> echo "even more work" >> example.txt
> git add example.txt
> git commit -m "i've done so much work today"
```

A `git log` will show the new commit. Switch back to the main branch (`git checkout master`) and run `git log` again: the commit isn't there anymore! Now for the spicy part.
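The same experiment in a scratch repository: the commit made on `lots_of_work` is visible on that branch only. As before, the base branch name is looked up rather than assumed.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"
base=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on config

git checkout -q -b lots_of_work
echo "even more work" >> example.txt
git add example.txt
git commit -q -m "i've done so much work today"
git log --oneline          # two commits

git checkout -q "$base"
git log --oneline          # one commit: the new one lives only on lots_of_work
```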

### Clone a second copy of the repo

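Open a second tab in your terminal and clone another copy of your fork (over SSH, as before, so that you can push from it):

```
> cd Desktop
> git clone git@github.com:<your_username>/git_tutorial.git
> cd git_tutorial
```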

You can see how easily we could run into problems if everybody did everything on master: everyone's pushes would constantly collide with everyone else's history.

Let's create a merge conflict. Make another branch where we totally change the `example.txt` file.

```
> git checkout -b refactor
> echo "total rehaul" > example.txt
> git add example.txt
> git commit -m "i changed absolutely everything"
> git push origin HEAD
```

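Now push the `lots_of_work` branch from your first copy of the repo too (`git push origin HEAD`), then go back to GitHub. You should have two branches that we can compare and make PRs from. We're going to make PRs into our own fork, so make sure the base repository is your fork rather than the original. Navigate to the `lots_of_work` branch and make a PR.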

We can merge in our own PR because it's against our fork. Let's do it!

But now we can't merge the second branch (`refactor`)! Both branches edited the same file in a way that git can't resolve on its own.

We could fix merge conflicts on GitHub, but this is not recommended. Why might this be?

## git rebase

We are going to rewrite history.

First we need to get the latest changes onto our local machine: `master` has been updated on GitHub!

This is what `git pull` is for. Pull is shorthand for `git fetch` + `git merge`. The fetch step usually goes off without a hitch, but if the merge step hits conflicts, git stops and makes you resolve them and create a new merge commit. We want to avoid this! So let's run just the first half:

```
> git fetch
```

Do a git log. Where are our changes?

```
> git branch -a
> git checkout remotes/origin/main
> git log
```

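When we fetch, the upstream changes are stored in separate **remote-tracking branches** in our local repo, such as `origin/main`. These are what actually get merged in when we do a merge. (Checking one out directly puts you in a 'detached HEAD' state, which is fine for looking around.) Now switch back to `main`: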

```
> git checkout main
> git log
```

Now merge in the changes we fetched from upstream!

```
git merge --ff-only origin/main
```
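To see fetch and fast-forward merge without touching GitHub, this sketch uses a local bare repository as the shared remote and two clones, `alice` and `bob` (hypothetical names), as two collaborators. After `git fetch`, only `bob`'s remote-tracking branch has moved; the `--ff-only` merge then moves `bob`'s own branch to the same commit.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# A local bare repo plays the shared GitHub repo.
git init -q --bare shared.git
git clone -q shared.git alice
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
echo "some work" > alice/example.txt
git -C alice add example.txt
git -C alice commit -q -m "I did some work"
git -C alice push -q origin HEAD
branch=$(git -C alice symbolic-ref --short HEAD)

git clone -q shared.git bob

# Alice pushes another commit to the shared repo.
echo "more work" >> alice/example.txt
git -C alice commit -q -am "more work"
git -C alice push -q origin HEAD

cd bob
git fetch -q
# Fetch only moved origin/<branch>; bob's own branch is unchanged so far.
git log --oneline "origin/$branch"
before=$(git rev-parse HEAD)
git merge -q --ff-only "origin/$branch"
after=$(git rev-parse HEAD)
```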

We can make a git alias for commands we use very often
```
> git config --global alias.ff 'merge --ff-only'
> git ff origin/main
```
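A sketch of the alias in action, set with `--local` (an assumption, so it only exists in the scratch repository) and used to fast-forward onto a hypothetical `feature` branch. Any extra arguments after `git ff` are appended to `merge --ff-only`.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b feature
echo "feature work" >> example.txt
git commit -q -am "feature work"
git checkout -q "$base"

# The alias lives in this repo's config only.
git config --local alias.ff 'merge --ff-only'
# "git ff feature" runs "git merge --ff-only feature".
git ff feature
```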

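You are allowed to do a `git pull`! But only if you know what is going on. `git pull --ff-only` is a safe middle ground: it fetches, then refuses to merge unless the merge is a fast-forward.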

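### Now for the rebase

```
> git checkout refactor
> git rebase master refactor
```

Because `refactor` and the updated `master` both changed `example.txt`, the rebase will most likely stop partway with a conflict. Edit `example.txt` to the version you want, then run `git add example.txt` and `git rebase --continue`. Since `refactor` was already pushed, the rewritten branch has to be force pushed (`git push --force origin HEAD`); that's acceptable here because it's your own feature branch, not `master`.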
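Here is a conflict-free version of the rebase in a scratch repository, assuming `refactor` adds a hypothetical new file instead of rewriting `example.txt`. The rebase gives `refactor`'s commit a new SHA on top of the base branch, after which a fast-forward merge succeeds and the history contains no merge commits.

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Any Name"
git config user.email "you@example.com"

echo "some work" > example.txt
git add example.txt
git commit -q -m "I did some work"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b refactor
echo "total rehaul" > refactor.txt      # hypothetical file, so no conflict
git add refactor.txt
git commit -q -m "i changed absolutely everything"
old_refactor=$(git rev-parse refactor)

# Meanwhile the base branch moves ahead.
git checkout -q "$base"
echo "even more work" >> example.txt
git commit -q -am "more work on $base"

# Replay refactor's commits on top of the current tip of the base branch.
git rebase -q "$base" refactor
new_refactor=$(git rev-parse refactor)

# refactor now starts from the base tip, so a fast-forward works.
git checkout -q "$base"
git merge -q --ff-only refactor
```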

So, what just happened?

 <img src="img/before-git-rebase-command.png" alt="Image">

 <img src="img/after-git-rebase-command.png" alt="Image">

After a successful rebase of a `develop` branch onto `master`:

- The files in the `master` branch will not change.
- The `develop` branch will additionally acquire all of the `master` branch's new commits (and therefore its new files and changes).
- The `develop` branch's own commits are replayed as new commits with new SHAs.
- The `develop` branch's branch point will change:
    - Before the rebase, `develop` split from `master` at commit C.
    - After the rebase, it will appear as though `develop` split from `master` after commit E.

# TL;DR: Ideal Workflow

1. Always work off a fork of the repo if it isn't your own
2. Never work on `master`. Always check out a new branch when it's time to start developing a new feature
3. If you are collaborating, keep a branch to a reasonable, comprehensible amount of work.
4. Don't fix merge conflicts on GitHub. Do them on the command line.

# Git + Jupyter

I don't use this but I'm going to look into it!

https://www.fast.ai/posts/2022-08-25-jupyter-git.html

Creating a `.gitignore` file can help with those pesky cache files and anything else you don't want committed, like very large files or other people's data. Any untracked files matching the patterns in the file will be ignored by git (files git already tracks stay tracked). My `.gitignore` often looks like this:

```
data/
__pycache__/
.ipynb_checkpoints/
.RData
.Rhistory
.DS_Store
```
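To check which rule ignores a path, `git check-ignore -v` prints the matching `.gitignore` line. A sketch that writes the file above into a scratch repository and creates a few hypothetical files:

```shell
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

cat > .gitignore <<'EOF'
data/
__pycache__/
.ipynb_checkpoints/
.RData
.Rhistory
.DS_Store
EOF

# Hypothetical files: three that should be ignored, one that should not.
mkdir -p data __pycache__
touch data/raw.csv __pycache__/mod.cpython-311.pyc .DS_Store analysis.py

# Shows the .gitignore line that matched each ignored path.
git check-ignore -v data/raw.csv .DS_Store

# Only files that are NOT ignored show up as untracked.
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```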

## Appendix: Setting up git on remote server (not finished yet)

2. Connect to the Cisco AnyConnect VPN (link to gdoc)
3. Log on to the server
    ```
    > ssh EID@compling.la.utexas.edu 
    ```
4. Configure git with your name and your GitHub email
    ```
    > git config --global user.name "Any Name"
    > git config --global user.email githubemailaccount@company.com
    ```

### Generating an ssh key pair

The private key stays on your machine, and the public key gets uploaded to your GitHub account. GitHub's instructions are here:
https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
If you plan on working on a remote machine like the compling server, you should make a separate SSH key pair on that machine. Below is the sequence of commands and prompts that creates the key:

```
(base) [gsc685@phyl-ling-p01 ~]$ mkdir ~/.ssh
(base) [gsc685@phyl-ling-p01 ~]$ cd .ssh/
(base) [gsc685@phyl-ling-p01 .ssh]$ ssh-keygen -t ed25519 -C "gigichronis@gmail.com"
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/gsc685/.ssh/id_ed25519):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/gsc685/.ssh/id_ed25519.
Your public key has been saved in /home/gsc685/.ssh/id_ed25519.pub.
The key fingerprint is:
SHA256:4CwyZq4roMJzSSjiAjklwcYPdJo+KSpxHsJPRWg8Yk0 gigichronis@gmail.com
The key's randomart image is:
+--[ED25519 256]--+
|+.+E..           |
|.B+*.            |
|o+= ...          |
|+ o..o .         |
|+OO.. o S        |
|XX+= .           |
|@.+..            |
|*= o             |
|*.o              |
+----[SHA256]-----+
```


Start the ssh agent and add the key to it:

```
(base) [gsc685@phyl-ling-p01 .ssh]$ eval "$(ssh-agent -s)"
Agent pid 652299
(base) [gsc685@phyl-ling-p01 .ssh]$ ssh-add ~/.ssh/id_ed25519
Enter passphrase for /home/gsc685/.ssh/id_ed25519:
Identity added: /home/gsc685/.ssh/id_ed25519 (gigichronis@gmail.com)
```

### Exercise: branching

1. Fork the repository
2. Clone it to your local environment


CHEAT SHEET



LINKS

important concepts (index, HEAD, etc.): https://www.tutorialspoint.com/git/git_basic_concepts.htm


