diff --git a/using-python-for-science/_toc.yml b/using-python-for-science/_toc.yml index 5c8eebb..ea3213c 100644 --- a/using-python-for-science/_toc.yml +++ b/using-python-for-science/_toc.yml @@ -12,6 +12,11 @@ chapters: # total time 15 minutes - file: editors-and-ides # TODO jni # Total time: 15 minutes +- file: git-and-github + sections: + - file: git-installation-and-setup + - file: git-working + - file: github - file: essential-libraries-for-science # DONE # Total time: 20 minutes split into sections # diff --git a/using-python-for-science/content.md b/using-python-for-science/content.md deleted file mode 100644 index 0f6aca7..0000000 --- a/using-python-for-science/content.md +++ /dev/null @@ -1,5 +0,0 @@ -Content in Jupyter Book -======================= - -There are many ways to write content in Jupyter Book. This short section -covers a few tips for how to do so. diff --git a/using-python-for-science/git-and-github.md b/using-python-for-science/git-and-github.md new file mode 100644 index 0000000..434b822 --- /dev/null +++ b/using-python-for-science/git-and-github.md @@ -0,0 +1,64 @@ +# Git and GitHub + +## What is Git and why do you want it? + +Git is a _revision control management system_[^git][^gitbook]. It helps you +keep track of changes to your files. You've probably had the following thing +happen to you before: + +![PhD Comics: +Final.doc](http://www.phdcomics.com/comics/archive/phd101212s.gif) + +Your authors certainly have: + +![Revision hell](images/revhell.png) + +Notice: + +- Different versions of the same document have different filenames. +- The date and author of a change are encoded haphazardly in the filename. +- Serial edits by different authors are concatenated in the filename, with no + indication as to whose edit came first. +- These are just five files in one directory; other revisions are probably + scattered around my various hard drives and as email attachments. 
+ +This is known as _revision hell_, and it happens when you don't use _revision +control_ (the technical term for the category of software git belongs to), even +if you are working alone. Revision control is like a *time machine* for your +files. + +There are (at least) three stages of git enlightenment: + +1. Maintain a linear history for a file or group of files. +2. Use branches effectively to group related changes together, as well as + maintain parallel versions. +3. Use branches and pull-requests to collaborate effectively. + +It can take a long time to pass all three, and we hope this section will help +you get there faster. + +A thing to keep in mind while you're learning: time travel is *hard*, and git +is harder, because it combines time travel and parallel timelines with, shall +we say, a not-amazing user experience. Although changes to git in recent years +have improved this last point, almost everyone often finds themselves searching +for just that right git command on the internet. + +Now, after reading that, you might wonder, "why would anyone want to learn +this?" + +The answer is that time travel is an incredible superpower. Once you are +comfortable with git (and your new powers), you will be much more fearless in +your programming. This lack of fear lets you try out different things quickly, +and the resulting rapid feedback is key to improving your skills in scientific +programming. + +This doesn't even take into account the benefits of collaborating with fellow +time travellers, which, as you'll see in {doc}`github`, is essentially life +changing. + +## References + +[^git]: http://git-scm.com/ The Git homepage. A plethora of resources. + +[^gitbook]: http://git-scm.com/book The Git book. Your starting point for all + git knowledge. 
diff --git a/using-python-for-science/git-installation-and-setup.md b/using-python-for-science/git-installation-and-setup.md new file mode 100644 index 0000000..b3cd55f --- /dev/null +++ b/using-python-for-science/git-installation-and-setup.md @@ -0,0 +1,98 @@ +# Git installation and setup + +To work with this tutorial, you're going to need a few things: + +- **Git**, of course. Install this by going to the git homepage, + [git-scm.com](http://git-scm.com). On Linux, you probably already have git, + or you can install it with `sudo apt install git-all` or + `sudo yum install git`. +- **A graphical git client or browser**. This lets you visualise your git + history more easily, and understand the concepts behind git better. For a + full list of clients, see [here](http://git-scm.com/downloads/guis). On Mac, + we recommend [Git Tower](https://www.git-tower.com), which is paid software, + but free for students and academics. The cross-platform + [GitKraken](https://www.gitkraken.com) works on Mac, Windows, and Linux and + is free for local use and for use with public repositories. +- **A text editor**. We recommend [Microsoft Visual Studio + Code](https://code.visualstudio.com/). To set up VS Code as your + default git text editor, type + `git config --global core.editor "code --wait --disable-extensions"` into + your terminal. Note: programs like Microsoft Word or TextEdit are *not* + valid text editors here because they don't produce plain text files, but + rather more elaborate file formats that include text formatting information. + For more on text editors, see {doc}`editors-and-ides`. +- **A GitHub account**. Create an account by going to + [github.com](https://github.com). You can alternatively use + [gitlab.com](https://gitlab.com), though the screenshots and exact buttons + won't match. But the concepts and workflows are the same. +- **SSH keys to access GitHub**. 
Without these, you will need to type your + GitHub password every time you try to read from or write to your + GitHub account. (Which will be many, many times! ;) Follow the instructions + [here](https://help.github.com/articles/generating-ssh-keys/), making sure + that you are seeing the instructions for your OS (Mac, Windows, or Linux). + +## Notes + +When typing a passphrase, it might seem that the keyboard isn't working. +However, this is just a security feature (similar to the `*`s you might see +when typing a password on the web). Just go ahead and type the passphrase, +then repeat it as requested. + +**For Windows users**: Windows does not have an ssh agent running in the +background by default. If you see the error: + +```console +ssh-add ~/.ssh/id_rsa +Could not open a connection to your authentication agent. +``` + +you will need to use this command to start the ssh-agent: + +```console +eval `ssh-agent -s` +``` + +(Be careful to use the proper backtick symbol, usually just above the "Tab" +key on most keyboards; NOT the single quote/apostrophe character.) + +Then type: + +```console +ssh-add ~/.ssh/id_rsa +``` + +(You might need to change the filename from `id_rsa` to whatever you used.) +See [this StackOverflow answer](http://stackoverflow.com/a/17848593) for more +info. + +You need to keep the window in which you launched the ssh-agent open. + +## Setup + +Additionally, you'll want to set up git so that it knows your full name and +email address. Fire up a console/terminal, and type: + +```console +git config --global user.name "Your Name" +git config --global user.email your.name@email.com +``` + +(Use the same email you used for your GitHub account.) 
+ +The following command also lets you see a rudimentary graphic of your history +without needing a GUI git client: + +```console +git config --global alias.lsd "log --graph --decorate --pretty=oneline --abbrev-commit --all" +``` + +Then you can get a nice history *within your terminal* by typing: + +```console +git lsd +``` + +--- + +Whew! That's quite a lot of stuff! But I hope by the end of the tutorial you'll +find it all useful and worth getting! (Plus: free stuff!) diff --git a/using-python-for-science/git-working.md b/using-python-for-science/git-working.md new file mode 100644 index 0000000..fa163e1 --- /dev/null +++ b/using-python-for-science/git-working.md @@ -0,0 +1,700 @@ +# Working with git + +In this section, we will get some hands-on experience with the above concepts: + + - getting started with git + - checkouts, commits, sprouting + - working copy, staging area, history + - pushing and pulling + +Throughout, an important message is to *trust* git. Your data is safe. It’s +very hard to actually lose data/code. + +## Configuring git + +Your git settings live in a file called `.gitconfig` in your home directory: +`/Users/username/` in Mac OS X, `$HOME`, usually `/home/username/` in Linux, +and `C:\Users\username\` in Windows (7 and above). + +The first thing you should do after installing git is setting your name and +email, so that your changes are properly attributed when you commit them to a +repo: + +```console +git config --global user.name "Your Name" +git config --global user.email "your.name@your.institute.edu" +``` + +(The `--global` tag means that this configuration should be used for all your +git projects. You can change the setting for specific project, e.g. if you want +to use a different institute's email address, by using the command without the +tag inside a specific project.) + +## Exercise 1: tracking your change history with git + +Let's jump right in! + +Git is a tool to control revisions. GitHub is a web service centered on +git. 
It provides repository hosting as well as a suite of workflows to +aid collaboration. + +You will learn a bit more about GitHub in later sections, but for now, +you should create your new project on GitHub, as it offers some nice +conveniences when starting a new project: the addition of a README, a +gitignore, and a license. + +> A quick note on licenses: +> +> The three most popular free and open source (FOSS) licenses are the +> GNU Public License (GPL), the MIT license, and the BSD 3-clause license. +> The main difference between them is that the GPL requires released +> modifications, and even linking software, to themselves be licensed under +> GPL-compatible licenses, while MIT and BSD make no such prescriptions. +> +> We're big proponents of BSD. Jake Vanderplas, then Director of Research at +> the University of Washington's eScience Institute, has a fantastic [blog +> post](http://www.astrobetter.com/blog/2014/03/10/the-whys-and-hows-of-licensing-scientific-code/) +> explaining why. +> +> Having said that, as Jake says, what's important is that you license +> your software and use one of the above licenses. Which license you use is +> a personal choice, not a technical one. A good rule of thumb is, when in +> Rome, do as the Romans do: if writing scientific Python, use BSD. If +> writing R packages, use either GPL or BSD. + +Follow [GitHub's +instructions](https://help.github.com/articles/creating-a-new-repository) +to create a new repository called "pycalc", with the description, +"A module to do simple arithmetic from strings." + +Once created, copy the **SSH** URL from the box in the top-right, and open a +terminal window to *clone* the project from GitHub to your local machine. + +```console +git clone git@github.com:/pycalc.git +cd pycalc +``` + +You can type `git status` to see that there is a git-tracked project there, +but nothing to add to the history: + +```console +git status +``` + +Now you can start working on the project. 
We are going to create a function +that takes in strings like '2 + 34' and produces the result, 36. + +In your editor, type: + +```python +def compute(expression): + values = expression.split(' ') + num0 = int(values[0]) + operator = values[1] + num1 = int(values[2]) + if operator == '+': + return num0 + num1 + else: + print('unknown operator!') + return 0 +``` + +And save it to a file called `calc.py` in the `pycalc` directory. + +Now we can see that git knows about the file, but it doesn't know whether it +needs to worry about it. That's why it's listed under "Untracked files". +This is handy for playing around with external files (e.g. dummy data files) +that you don't necessarily want to keep for posterity. + +```console +git status +``` + +Which should output: + +``` +On branch main +Your branch is up-to-date with 'origin/main'. +Untracked files: + (use "git add ..." to include in what will be committed) + + calc.py + + nothing added to commit but untracked files present (use "git add" to + track) +``` + +Tell git to add the file to its tracking system: + +```console +git add calc.py +git status +``` + +This should output: + +``` +On branch main +Your branch is up-to-date with 'origin/main'. +Changes to be committed: + (use "git reset HEAD ..." to unstage) + + new file: calc.py +``` + +Then, *commit* the changes to the history: + +```console +$ git commit --message "Initial work on a Python string calculator" +[main c85a965] Initial work on a Python string calculator + 1 file changed, 12 insertions(+) + create mode 100644 calc.py +``` + +The quoted string is called a *commit message*. This should summarise the +changes you made and *why* you made them. The message should be readable +without (much) context. + +You may want to enter a much longer message (one or two paragraphs). For that, +it is convenient to *set up a git editor*. 
You can find instructions for doing +that in GitHub's documentation page [Associating Text Editors with +Git](https://help.github.com/articles/associating-text-editors-with-git/)). +Once you have configured a git text editor, you can just type `git commit`, and +git will launch your text editor so you can write your commit message. + +```{note} +**An aside on commit messages** + +A bugbear of ours and others': messages should be in the present imperative. +“Fix bug X”, “Add feature Y”, “Document function Z”, rather than past tense +("Fixed", "Added", "Documented") or present tense ("Fixes", "Adds", +"Documents"). The present imperative meshes with git’s automatic commit +messages, e.g. "Merge branch A from repository B". Find more information about +conventions to follow in git log messages +[in this page](http://365git.tumblr.com/post/3308646748/writing-git-commit-messages). + +In addition to the above English style guidelines, two other conventions are +followed in the community: + +> Ensure first line is at most 50 chars long +> +> It should also not have a period at the end. The subsequent lines +> should be separated from the one-line summary by a blank line, and +> should be wrapped at 72 characters. This ensures readability in all +> text terminals. +``` + +Let's continue to improve our code. We shouldn't return 0 on error, because it +can be confused with a perfectly valid result. Instead, Python provides a handy +value for missing or invalid data: `None`. + +You should edit your file so that, if the operator is not recognised, it +returns `None` instead of 0. + +```python +def compute(expression): + values = expression.split(' ') + num0 = int(values[0]) + operator = values[1] + num1 = int(values[2]) + if operator == '+': + return num0 + num1 + else: + print('unknown operator!') + return None +``` + +As before, we can check the status with `git status`, which should output: + +``` +On branch main +Your branch is ahead of 'origin/main' by 1 commit. 
+ (use "git push" to publish your local commits) +Changes not staged for commit: + (use "git add ..." to update what will be committed) + (use "git checkout -- ..." to discard changes in working directory) + + modified: calc.py + +no changes added to commit (use "git add" and/or "git commit -a") +``` + +This time, the output is a bit different: rather than showing you an +"Untracked file", your changes to `calc.py` are now listed as "not staged +for commit". What does this mean? + +If you think of git as taking snapshots of changes over the life of a +project, `git add` specifies *what* will go in a snapshot by putting them +into a *staging area*. Then, `git commit` *actually takes* the snapshot, +making a permanent record of it. This two-step process allows you to group +related changes together. For example, when writing a paper, you might +want to commit your new Discussion section, but wait to add your as-yet +unfinished changes to the Introduction. + +If you don't have anything staged when you type `git commit`, git will +prompt you to use `git commit -a` or `git commit --all`, which is kind +of like saying, "ok, *everyone in this one!*". However, it's almost +always better to explicitly add things to the staging area, because +you might otherwise commit changes you forgot you made. Going back to +the snapshots analogy, you might get the extra with the incomplete +makeup walking in and ruining the picture because you used `-a`! Try +to stage things manually, or you might find yourself searching for +"git undo commit" more than you would like! + +Ok let's just commit everything now. + +BUT FIRST! Let's actually check that "everything" is what we want. 
In +addition to the one-line, "files changed" summary that we've seen, we can +ask git for a line-by-line summary of changes by typing `git diff`, which +should output something like the following: + +``` +diff --git a/calc.py b/calc.py +index ffadff3..012d92a 100644 +--- a/calc.py ++++ b/calc.py +@@ -7,4 +7,4 @@ def compute(expression): + return num0 + num1 + else: + print('unknown operator!') +- return 0 ++ return None +``` + +You can see that changes to a line are encoded in git as deletion of +that line and addition of the modified line. Git works on a line by line +basis, which is why it's perfect for code and plain text files, but not +so great for big binary files such as images. + +At any rate, the list of all the changes is indeed what we want to commit, +so we can use `--all`: + +```console +$ git commit --all --message "Return None, not 0, on invalid input" +``` + +Finally, let's add support for subtraction: + +```python +def compute(expression): + values = expression.split(' ') + num0 = int(values[0]) + operator = values[1] + num1 = int(values[2]) + if operator == '+': + return num0 + num1 + elif operator == '-': + return num0 - num1 + else: + print('unknown operator!') + return None +``` + +Let's commit the changes: + +```console +git add calc.py +git commit -m "Add support for subtraction" +``` + +(`-m` is a shortcut for `--message`.) 
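Before moving on, it can be worth a quick sanity check that the function you just committed behaves as intended. The sketch below repeats the `compute` function from above and exercises it with a few inputs; the specific inputs are only illustrative, and this is not a substitute for a proper test suite.

```python
def compute(expression):
    # Same function as committed above: parse "<int> <operator> <int>".
    values = expression.split(' ')
    num0 = int(values[0])
    operator = values[1]
    num1 = int(values[2])
    if operator == '+':
        return num0 + num1
    elif operator == '-':
        return num0 - num1
    else:
        print('unknown operator!')
        return None


print(compute('2 + 34'))  # addition
print(compute('5 - 8'))   # subtraction, newly supported
print(compute('5 * 8'))   # not supported yet: warns, then returns None
```

Running a check like this before each commit makes it less likely that a broken version ends up in your history.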
+ +We now have a *history* that you can look at and interact with: + +```console +$ git log +commit a4e5f6d6c9bd5dbcad86b4c5269b9c1995a1f321 (HEAD -> main) +Author: Juan Nunez-Iglesias +Date: Tue Dec 10 16:21:35 2019 +1100 + + Add support for subtraction + +commit 18b68bcab248b571e4c4264c43b013c1cedbd7d5 +Author: Juan Nunez-Iglesias +Date: Tue Dec 10 16:16:33 2019 +1100 + + Return None, not 0, on invalid input + +commit f817c9aeb2ff15e3b3a91a8a9124c87f41a6cbb8 +Author: Juan Nunez-Iglesias +Date: Tue Dec 10 16:14:24 2019 +1100 + + Initial work on a Python string calculator +``` + +You can *check out* earlier versions of your code using the hash of +a particular snapshot, like so (note: you should use the hash from your own +`git log` output, which will be different from ours): + +```console +git checkout 18b68bcab248b571e4c4264c43b013c1cedbd7d5 +``` + +Which will output: + +``` +Note: switching to '18b68bcab248b571e4c4264c43b013c1cedbd7d5'. + +You are in 'detached HEAD' state. You can look around, make experimental +changes and commit them, and you can discard any commits you make in this +state without impacting any branches by switching back to a branch. + +If you want to create a new branch to retain commits you create, you may +do so (now or later) by using -c with the switch command. Example: + + git switch -c + +Or undo this operation with: + + git switch - + +Turn off this advice by setting config variable advice.detachedHead to false + +HEAD is now at 18b68bc Return None, not 0, on invalid input +``` + +"detached HEAD" is actually not as bad as it sounds! If we go back to the +analogy of git as a time machine, this means that we can just *observe* the +past: any changes we make will not be recorded, and will have no effect on any +timeline. + +You can verify that the file in your directory is now the older +version. 
It should have the following contents: + +```python +def compute(expression): + values = expression.split(' ') + num0 = int(values[0]) + operator = values[1] + num1 = int(values[2]) + if operator == '+': + return num0 + num1 + else: + print('unknown operator!') + return None +``` + +````{note} +Note: you can use just the first few digits of the hash when checking +out a specific revision: + +```console +git checkout f817c +``` +``` +Previous HEAD position was 18b68bc Return None, not 0, on invalid input +HEAD is now at f817c9a Initial work on a Python string calculator +``` +```` + +It's easy to go back to the latest version, the "present" part of the timeline: + +```console +$ git switch main +Previous HEAD position was c0a11b0... Initial work on a Python string calculator +Switched to branch 'main' +``` + +## Exercise 2: branches + +Now we will undertake a major change to the structure of the function. +Because it's such a big +change, you want to work in a different _branch_ from "main", so that +you can keep using that one while fixing up the new stuff. In practice, +almost _every_ change is significant enough to warrant a new branch, +because "sprouting" one is cheap and easy. + +In the time machine analogy, we are creating a parallel universe in which to +change things, without affecting the main timeline. + +```console +git switch --create use-unpacking +``` + +You should read that as: switch into a *new* branch +called `use-unpacking`. + +Now let's use *iterable unpacking* to get the values from our string: + +```python +def compute(expression): + num0, operator, num1 = expression.split(' ') + if operator == '+': + return num0 + num1 + elif operator == '-': + return num0 - num1 + else: + print('unknown operator!') + return None +``` + +And let's commit that change: + +```console +git add calc.py +git commit -m "Use fancy-schmancy iterable unpacking" +``` + +... Oops! It looks like we've broken our function! 
+```python +>>> import calc +>>> calc.compute('5 + 8') +'58' +``` + +Meanwhile, our supervisor wants a working version of the program *now*! No time +to debug. Plus, she has no patience for broken software! Or +functions without documentation! Let's go back to our working version: + +```console +git switch main +``` + +And edit the file to add an informative comment above the function +definition: + +```python +# Perform simple arithmetic encoded in an input string: +# '1 + 2' -> 3, or '1 - 2' -> -1. +def compute(expression): + values = expression.split(' ') + num0 = int(values[0]) + operator = values[1] + num1 = int(values[2]) + if operator == '+': + return num0 + num1 + elif operator == '-': + return num0 - num1 + else: + print('unknown operator!') + return None +``` + +We can commit those changes: + +```console +git commit -a -m "Add function documentation" +``` + +Which outputs: + +``` +[main ef26741] Add function documentation + 1 file changed, 2 insertions(+) +``` + +Whew! Now that we have a working version with documentation (you should make +sure that the main branch always works!), we can send it to our supervisor and +go back to fixing our fancy iterable-unpacking version. + +```console +git switch use-unpacking +``` + +Now, more calmly, we see that in our rush to implement unpacking, we've +forgotten to convert our strings to numbers! A simple fix: + +```python +def compute(expression): + num0, operator, num1 = expression.split(' ') + num0, num1 = int(num0), int(num1) + if operator == '+': + return num0 + num1 + elif operator == '-': + return num0 - num1 + else: + print('unknown operator!') + return None +``` + +And we commit it: + +```console + $ git commit -a -m "Convert num strings to int" +[use-unpacking 81794b8] Convert num strings to int + 1 file changed, 1 insertion(+) +``` + +Now your `use-unpacking` branch is ready to become the main +branch of your program. 
But you also don't want to throw out the documentation +changes you made on the main branch! + +`git merge` can often automatically reconcile changes in two branched +histories: + +```console + $ git switch main +Switched to branch 'main' + + $ git merge use-unpacking +Auto-merging calc.py +Merge made by the 'recursive' strategy. + calc.py | 6 ++---- + 1 file changed, 2 insertions(+), 4 deletions(-) +``` + +Git has automatically done the work of resolving the different changes! + +```python +# Perform simple arithmetic encoded in an input string: +# '1 + 2' -> 3, or '1 - 2' -> -1. +def compute(expression): + num0, operator, num1 = expression.split(' ') + num0, num1 = int(num0), int(num1) + if operator == '+': + return num0 + num1 + elif operator == '-': + return num0 - num1 + else: + print('unknown operator!') + return None +``` + + +## Exercise 3: merge conflicts + +Sometimes, it's not clear *how* two branching changes should be merged, for +example if the same line is changed in both histories. Which change should +take precedence? Git does the right thing and defers the decision to you. +Let's see how it does this. + +Create a branch to add the multiplication operator: + +```console +git switch main --create add-multiplication +``` + +It's just a matter of adding the following lines: + +```python + elif operator == '*': + return num0 * num1 +``` + +Commit that change: + +```console +git commit -a -m "Add support for multiplication operator" +``` + +Now, repeat the same procedure, including the new branch, for division: + +```console +git switch main --create add-division +# ... make changes to python file... 
+git commit -a -m "Add support for division operator" +``` + +Finally, bring those changes into the main branch: + +```console +git switch main +git merge add-multiplication +git merge add-division +``` + +At this point git will, to use a technical term, *chuck a hissy fit*, and +refuse to perform the merge: + +```console +Auto-merging calc.py +CONFLICT (content): Merge conflict in calc.py +Automatic merge failed; fix conflicts and then commit the result. +``` + +```{note} +**Read error messages carefully!** + +git error messages can be scary, but it's worth taking the time to read +them carefully. They often provide the solution to the current problem. +``` + +In this case, we must go into the file and manually fix the conflicting +changes. Git places markers on the file where it has found conflicts, +so you can quickly identify those locations and decide on a fix: + +```python +# Perform simple arithmetic encoded in an input string: +# '1 + 2' -> 3, or '1 - 2' -> -1. +def compute(expression): + num0, operator, num1 = expression.split(' ') + num0, num1 = int(num0), int(num1) + if operator == '+': + return num0 + num1 + elif operator == '-': + return num0 - num1 +<<<<<<< HEAD + elif operator == '*': + return num0 * num1 +======= + elif operator == '/': + return num0 / num1 +>>>>>>> add-division + else: + print('unknown operator!') + return None +``` + +Git is saying that in the current branch ("HEAD"), the file at that +position contains the multiplication lines, while the +`add-division` branch, which we are trying to merge, contains the division +lines *at the very same spot*. + +You fix merge conflicts by navigating to the `=======` markers, comparing +the two conflicting versions, and leaving the file as you want it to be +committed: removing the markers, and combining the changes in a way +that makes sense, which a system like git can't figure out by itself. 
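Leftover markers are an easy mistake to commit, so once you have edited a conflicted file it can help to confirm that none remain. The following is a minimal sketch of such a check; `find_conflict_markers` and `CONFLICT_PREFIXES` are hypothetical names for illustration, not part of git, and the sample text is an excerpt of the conflicted file shown above.

```python
# Prefixes git writes at the start of a line to delimit a conflict.
CONFLICT_PREFIXES = ('<<<<<<<', '=======', '>>>>>>>')


def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines that start like a marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(CONFLICT_PREFIXES):
            hits.append((number, line))
    return hits


# Excerpt of the conflicted calc.py from above.
conflicted = """\
        return num0 - num1
<<<<<<< HEAD
    elif operator == '*':
        return num0 * num1
=======
    elif operator == '/':
        return num0 / num1
>>>>>>> add-division
    else:
"""

for number, line in find_conflict_markers(conflicted):
    print(number, line)
```

This is a simple prefix match, so it can report false positives, for example a Markdown heading underlined with `=======`. Treat any hit as a prompt to look at the file rather than as proof of a conflict.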
+ +In this case, we simply remove all the markers, and leave the two line pairs +one after the other inside the file. + +```python +# Perform simple arithmetic encoded in an input string: +# '1 + 2' -> 3, or '1 - 2' -> -1. +def compute(expression): + num0, operator, num1 = expression.split(' ') + num0, num1 = int(num0), int(num1) + if operator == '+': + return num0 + num1 + elif operator == '-': + return num0 - num1 + elif operator == '*': + return num0 * num1 + elif operator == '/': + return num0 / num1 + else: + print('unknown operator!') + return None +``` + +Finally, tell git you've fixed the problem by `git add`ing the file, +then commit, and the merge will complete! + +```console +$ git add calc.py +$ git commit +``` + +You can check the history of our project using the alias we created, `git +lsd`. (See {doc}`git-installation-and-setup` if `git lsd` doesn't work for you.) + +```console +$ git lsd +* 96b6056 (HEAD -> main) Merge branch 'add-division' +|\ +| * f07a196 (add-division) Add support for division operator +* | 81d75fc (add-multiplication) Add support for multiplication operator +|/ +* 8ab4b31 Merge branch 'use-unpacking' +|\ +| * 81794b8 (use-unpacking) Convert num strings to int +| * 215e7b8 Use fancy-schmancy iterable unpacking +* | a0d2ca0 Add function documentation +|/ +* a4e5f6d Add support for subtraction +* 18b68bc Return None, not 0, on invalid input +* f817c9a Initial work on a Python string calculator +``` diff --git a/using-python-for-science/github.md b/using-python-for-science/github.md new file mode 100644 index 0000000..7cdf838 --- /dev/null +++ b/using-python-for-science/github.md @@ -0,0 +1,256 @@ +# Working with GitHub + +(Or other git collaboration platforms such as GitLab.) + +Git is a tool to control revisions. GitHub is a web service centered on git. It +provides repository hosting as well as a suite of workflows to aid +collaboration. 
+ +In this section you will learn about *remote repositories*, which are histories +that are not in your working directory. The operations involved in using +these include clone, push, pull, fork, and pull requests. + +## Pushing your project back to GitHub + +You started out creating a project on GitHub, which you cloned to your +own computer. All the changes you recorded since then are stored +*locally*, on your computer. You need to explicitly tell git that you +want to *push* the changes back to GitHub. + +To understand how to do this, you need to keep in mind that git is +distributed: every copy of your data contains the entire history. There +is nothing special about your local copy vs GitHub's copy vs the copy +on your department's computing servers. Therefore, you have to deal +with "remotes", which are your local git's address book of other copies +of the history. Check it out: + +```console +$ git remote +origin +``` + +Git is saying that it knows about a single remote repository, called +origin. To see what it knows *about* this remote, use the `-v` (verbose) +flag: + +```console +$ git remote -v +origin git@github.com:jni/pycalc (fetch) +origin git@github.com:jni/pycalc (push) +``` + +That tells you both the name of the remote, *and its location*. "origin" is +the default name for the remote from which you cloned the current +repository. + +Think of the output like an *address book* for the repo: the names on the +left, and the addresses on the right. + +Now, push your local changes back to the origin: + +```console +$ git push origin main:main +Counting objects: 36, done. +Delta compression using up to 4 threads. +Compressing objects: 100% (36/36), done. +Writing objects: 100% (36/36), 3.33 KiB | 0 bytes/s, done. +Total 36 (delta 25), reused 0 (delta 0) +To git@github.com:jni/pycalc + 8ab0457..8de6fb7 main -> main +``` + +Read the above as "push to "origin" my branch "main" onto its branch +"main". 
Branches are managed locally for each repository, so the branch names +don't actually have to match. That is, we could easily have written: + +```console +$ git push origin main:other-branch-name +``` + +and then the contents of our branch `main` locally would be mirrored in the +remote branch `other-branch-name` on `origin`. In order to tell git to keep +track of matching branch names, use the option `--set-upstream`: + +```console +$ git push origin --set-upstream main +``` + +This tells git: "push `main` onto `origin`'s `main`, and note that they are +mirrors of each other." This means that later, we only need to do: + +```console +$ git push +``` + +And git will know that `main` goes onto `origin`'s `main`. + +After this, you'll be able to refresh your page on GitHub and +browse your code's history. + +## Exercise 4: GitHub pull requests + +For this exercise you will have to pair up with a buddy, whom +we will name Alice. (Your name is Bob, in keeping with the computer +science literature.) Find a buddy to do this exercise with and decide now who +will be Bob and who will be Alice in the pair. + +As Bob, you should delete your "pycalc" repository on GitHub (this is done +under "Settings"): you've realised that Alice has her +own version and that you can both save effort by collaborating on this project. + +You've been wanting to do some arithmetic on some data, but the first +number you need to add is often a decimal number. In those cases, the +`compute` function falls short. (How?) + +You want to modify it so that the first number is allowed to be a decimal +number. + +Navigate to Alice's repository on GitHub +(https://github.com/[Alice's username]/pycalc), and click the "Fork" +button. + +This will create a copy of Alice's repo on your GitHub account, which +you can then clone on your machine as before. But note that you need to +delete your existing work, or git will complain! 
Instructions below (some of +the directories, and obviously the "bob" username, need to be changed to +yours!): + +```console +$ pwd +/Users/bob/projects/pycalc +$ cd .. +$ rm -rf pycalc +$ git clone git@github.com:bob/pycalc +$ cd pycalc +$ git switch --create decimals +``` + +Edit the `calc.py` file so that `num0` is converted with `float` +instead of `int`. (Leave `num1` unchanged for now.) + +Now commit those changes and push them *to a new branch on GitHub*. If you use +the `--set-upstream` flag, git creates a branch with the same name on GitHub +and sets your local branch to "track" it. This makes future pushes easier. + +```console +$ git add calc.py +$ git commit -m "Allow num0 to be any decimal number" +$ git push origin --set-upstream decimals +``` + +Go to the GitHub page for the project. You should see a new button +showing that you've recently updated a branch and prompting you to +*initiate a pull request*. (You can also copy the "new pull request" address +from git's message when you push.) + +Here's how this works: you don't know Alice. You probably have never met +her. So it's natural that you can't just push random stuff willy-nilly to her +repository. However, you do have a *shared history*, because you *forked* hers, +and *she* has access to your new changes. So, instead of *pushing* your +changes, you *request* that she *pulls* from your own history. + +The pull request (PR, from now on) will tell Alice that you've made some +changes to the code and you would like her to incorporate them into +her project. Notice that you did this *without needing any special +access from Alice!* This is the magic of GitHub and open source. + +Check out the impact that GitHub has had on a few open source Python projects: + +![GitHub's impact on FOSS](images/gh.png) + +Click on the PR button and fill in the form. Filling in a useful +title and message here is very important!
+ +```{note} +**Pull request etiquette** + +Make sure your title and description are informative. 99% of the time, when +you make a pull request (PR), the person on the other end is very busy, has +no background on the PR and doesn't understand why you made the changes +you made. In general, the onus is on the requester to comply with all the +repository's formatting guidelines and so forth. By convention, many +repositories have a `CONTRIBUTING.txt` file explaining the contribution +process. If it's there, be sure to read it before submitting a PR! + +When in Rome, do as the Romans do. Look at their existing codebase +and try to follow their example. (This is not to say that you can't +improve on it; but make sure your documentation and testing *at least* +meet their standards.) +``` + +Alice should get an email notification that there is a pull request to +her project. Clicking on it, she will be taken to the web page for the +PR, where she can examine the changes that Bob has made (the "Files changed" +tab). + +Alice will note that this great change would be made much more useful if it +also used `float` for `num1`! She comments on the +PR page: "This is a great addition, thanks! Could you please do the same for +num1?" + +On his machine, Bob makes the requested change, commits, and pushes his changes: + +```console +$ # ... edit calc.py ... +$ git add calc.py +$ git commit -m "Use float for num1's conversion also" +$ git push # no need to specify repo or branch anymore, thanks to `--set-upstream` +``` + +If either Bob or Alice goes back to the PR page, they will see that the PR has +been automagically updated with Bob's new changes! (They may need to +refresh the page.) + +Alice, satisfied with the update, can now click on the "Merge pull +request" button and incorporate Bob's changes into her code! + +One last thing needs to happen to really synchronise everyone's histories.
+Although Alice has Bob's changes, *Bob doesn't have Alice's commit +incorporating his changes.* If he continues to work on his `decimals` branch, +their histories will diverge. And if he works on his `main` branch, his +changes won't be there! + +The solution is for him to *pull* the main branch *from Alice's repository*. +For this, he needs to add it to his list of remotes (remember remotes?): + +```console +$ git remote -v +origin git@github.com:bob/pycalc (fetch) +origin git@github.com:bob/pycalc (push) +$ git remote add upstream git@github.com:alice/pycalc +$ git remote -v +origin git@github.com:bob/pycalc (fetch) +origin git@github.com:bob/pycalc (push) +upstream git@github.com:alice/pycalc (fetch) +upstream git@github.com:alice/pycalc (push) +$ git switch main +$ git pull upstream main # get upstream's main branch, and merge +$ git push origin main +``` + +```{note} +In 2021, GitHub added a feature to synchronise forks from the web UI directly. +See [Syncing a +fork](https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/syncing-a-fork) +on the GitHub help pages. +``` + +Bob can now inspect his history log and see that both his changes and Alice's +merge are there. Use GitKraken/GitTower/other GUI for this, or the `lsd` +alias we learned earlier; a simple `git log` will also do. + +## Bonus exercise 1: self-PRs and code review + +Now try the reverse approach, which is a bit different. Alice has gained a +collaborator in Bob. Even though she still maintains control of the project +repository, she wants to enlist his help. She creates a branch, adds a line or +two (for example, she might want to add a test function, `test_compute`, that +runs `compute` for a few known values and makes sure the results match up), then +*creates a PR against her own repository.* She then asks Bob to review her +change by mentioning his username (e.g. `@bob`, as in Twitter) in a comment on +the PR page.
Only when Bob gives his ok (perhaps after spotting a typo for her to fix) does +she merge. + +This practice of requesting code reviews for pull requests, even when you +control the repository, is widespread among programming teams, because it +dramatically improves the quality of the code. Two pairs of eyes are more than +twice as effective as one pair. diff --git a/using-python-for-science/images/gh.png b/using-python-for-science/images/gh.png new file mode 100644 index 0000000..aa8c88b Binary files /dev/null and b/using-python-for-science/images/gh.png differ diff --git a/using-python-for-science/images/revhell.png b/using-python-for-science/images/revhell.png new file mode 100644 index 0000000..674016f Binary files /dev/null and b/using-python-for-science/images/revhell.png differ