# STA 141b Week 2

_TA: Nick Ulle (naulle@ucdavis.edu)_

## Links

* [Python 2 vs 3](http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html)

* [How to Set Up a Virtual Machine](http://nick-ulle.github.io/virtual-machine/)

* [How to Set Up Bash on Win10](https://msdn.microsoft.com/en-us/commandline/wsl/install_guide)

## Git Merge Notes (from Wed Lecture)

Git is a distributed version control system (DVCS).

Git is _distributed_ because it can share repositories between different computers. Your computer and any repositories on its hard drive are _local_. Someone else's computer and any repositories they have are _remote_. When you _clone_ a repository, you're copying the repository from a remote computer (typically GitHub) to your computer.

Git is a _version control system_ because it keeps track of changes to files. A _commit_ is a bundle of saved changes. Commits are like checkpoints for your files.

When you work on a GitHub repository with other people, they might change a file, commit the changes, and then _push_ the commit to GitHub. Your local copy of the file won't change unless you _pull_ the new commit from GitHub. In other words, your local repository can easily get out of sync with the remote repository on GitHub. If you then change your local copy of the same file and commit the changes, you create a _conflict_. If you try to push the conflicting commit to GitHub, you'll see an error message:
```
git push

To github.com:USERNAME/REPOSITORY.git
 ! [rejected]        master -> master (fetch first)
error: failed to push some refs to 'git@github.com:USERNAME/REPOSITORY.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
```
When you see an error, __don't panic!__ The error message hints that you should try pulling commits from GitHub before pushing your commit. If you pull commits from GitHub, you might see another error message:
```
git pull

remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From github.com:USERNAME/REPOSITORY
   6fe289c..48e44d3  master     -> origin/master
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Automatic merge failed; fix conflicts and then commit the result.
```
This is okay! Git tried to automatically fix the conflict by _merging_ your commit with the other person's commit, but couldn't figure out how because both commits changed the same part of the same file (`README.md` in the example). An automatic merge succeeds when the commits being merged changed different files, or different, non-overlapping parts of the same file. Otherwise, it's up to you to resolve the conflict manually. If you open the file causing the conflict in a text editor, you'll see something like this:
```
# Our README.md

<<<<<<< HEAD
Here are the changes you made.
=======
Here are the changes the other person made.
>>>>>>> 48e44d3a60af614f3a0da794a1701d040221d40f

Here's some text that was added to the file in an earlier commit.
```
Git automatically marked which parts of the file conflict. Changes from your commit are shown between `<<<<<<<` and `=======`. Changes from the other person's commit are shown between `=======` and `>>>>>>>`. All you need to do is edit the file to look the way you want. If you wanted to keep your changes and the other person's changes (the polite thing to do), you could edit the file to look like this:
```
# Our README.md

Here are the changes you made.

Here are the changes the other person made.

Here's some text that was added to the file in an earlier commit.
```
When you're done editing, save the file, mark the conflict as resolved with `git add`, and then commit. This commit is called a _merge commit_. Git will automatically provide a commit message indicating that you merged your commit with the other person's commit:
```
[master 9594c15] Merge branch 'master' of github.com:USERNAME/REPOSITORY
```
Finally, you can push your commit along with the merge commit to GitHub:
```
git push

Counting objects: 6, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (6/6), 602 bytes | 0 bytes/s, done.
Total 6 (delta 0), reused 0 (delta 0)
To github.com:USERNAME/REPOSITORY.git
   48e44d3..9594c15  master -> master
```
Note that if you pull and Git asks you to merge a file, but you'd like to back out of the merge and make more changes first, you can use the command `git merge --abort`. Git will remind you about "unmerged paths" in the `git status` message when it's waiting for you to merge a file:
```
git status

On branch master
Your branch and 'origin/master' have diverged,
and have 1 and 1 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Unmerged paths:
  (use "git add <file>..." to mark resolution)

        both modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")
```
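
The conflict markers described above follow a fixed layout, so the two sides can be picked apart mechanically. The following is only a sketch to show how the markers divide the file: `split_conflict` is a hypothetical helper (not part of Git), and it assumes exactly one conflict region with the default marker style. It is written to run under both Python 2 and Python 3.

```python
def split_conflict(text):
    """Split text containing one Git conflict into (ours, theirs) line lists.

    Hypothetical helper for illustration only; assumes a single conflict
    region that uses Git's default markers.
    """
    ours, theirs = [], []
    section = None  # None outside the conflict, else "ours" or "theirs"
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            section = "ours"      # your changes start here
        elif line.startswith("=======") and section == "ours":
            section = "theirs"    # the other person's changes start here
        elif line.startswith(">>>>>>>"):
            section = None        # end of the conflict region
        elif section == "ours":
            ours.append(line)
        elif section == "theirs":
            theirs.append(line)
    return ours, theirs


example = """# Our README.md

<<<<<<< HEAD
Here are the changes you made.
=======
Here are the changes the other person made.
>>>>>>> 48e44d3a60af614f3a0da794a1701d040221d40f

Here's some text that was added to the file in an earlier commit.
"""

ours, theirs = split_conflict(example)
print(ours)    # ['Here are the changes you made.']
print(theirs)  # ['Here are the changes the other person made.']
```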

## Discussion Notes

### Getting Help

Python is [well-documented](https://docs.python.org/2.7/)!

You can also access documentation with the `help()` function.

In [2]:
help(xrange)

Help on class xrange in module __builtin__:

class xrange(object)
 |  xrange(stop) -> xrange object
 |  xrange(start, stop[, step]) -> xrange object
 |  
 |  Like range(), but instead of returning a list, returns an object that
 |  generates the numbers in the range on demand.  For looping, this is 
 |  slightly faster than range() and more memory efficient.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __len__(...)
 |      x.__len__() <==> len(x)
 |  
 |  __reduce__(...)
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  __reversed__(...)
 |      Returns a reverse iterator.
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> 
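
The text that `help()` shows comes from _docstrings_, so you can document your own functions the same way. A minimal sketch, where `last_item` is a made-up function used only for illustration; it runs under both Python 2 and Python 3:

```python
def last_item(values):
    """Return the last element of a non-empty sequence."""
    return values[-1]


# help(last_item) displays the docstring above; __doc__ holds the same text.
print(last_item.__doc__)
print(last_item([3, 1, 4]))  # 4
```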

### Modules & Packages

Python has _modules_ and also _packages_. What's the difference?

A module is a single Python script (a `.py` file). You can load a module with the `import` statement.

A package is a collection of modules prepared for distribution. You can install a package with `conda` or `pip`. Some packages only have one module.

Which of the built-in modules are important?

Module      | Description
----------- | -----------
sys         | info about Python (version, etc)
pdb         | Python debugger
os.path     | tools for file paths
collections | additional data structures
string      | string processing
re          | regular expressions
urlparse    | parse URLs (renamed `urllib.parse` in Python 3)
math        | simple math (but we'll mostly use NumPy instead)
itertools   | tools for iterators
functools   | tools for functions

In [3]:
import sys

sys.version

'2.7.13 (default, Dec 21 2016, 07:16:46) \n[GCC 6.2.1 20160830]'
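
As a quick tour, here is one small task for a few of the modules in the table. The inputs (`"notes.txt"`, `"banana"`, and so on) are made up for illustration, and the sketch sticks to modules whose names are the same in Python 2 and Python 3.

```python
import collections
import math
import os.path
import re

# os.path: split a file name from its extension.
print(os.path.splitext("notes.txt"))  # ('notes', '.txt')

# collections: count how often each letter appears.
counts = collections.Counter("banana")
print(counts["a"])  # 3

# re: find every run of digits in a string.
print(re.findall(r"[0-9]+", "STA 141b Week 2"))  # ['141', '2']

# math: square root, always returned as a float.
print(math.sqrt(16))  # 4.0
```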

### Division

__Be careful:__ In Python 2, the division operator `/` has different meanings depending on the type of the operands.

In [4]:
5.0 / 2.0

2.5

In [5]:
5.0 / 2

2.5

In [6]:
5 / 2.0

2.5

In [7]:
5 / 2

2

In [8]:
type(5)

int

In [9]:
type(5.0)

float

In Python 3, the division operator `/` always does floating-point division, even when both operands are integers.

You can import the Python 3 behavior from the `__future__` module.

This is a good idea for any new scripts you write!

In [10]:
from __future__ import division

In [11]:
5 / 2

2.5
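
When you do want integer division, the floor division operator `//` says so explicitly and behaves the same way in Python 2 and Python 3. A short sketch (the numbers are arbitrary examples):

```python
# Floor division rounds down (toward negative infinity), even for floats.
print(7 // 2)     # 3
print(-7 // 2)    # -4
print(7.0 // 2)   # 3.0

# divmod() returns the floor quotient and the remainder together.
quotient, remainder = divmod(7, 2)
print(quotient)   # 3
print(remainder)  # 1
```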

### Iterators, Generators, and List Comprehensions

The `range()` function creates a list of integers. Each integer is stored in memory.

In [13]:
range(10)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Creating a very long list will use lots of memory and may crash Python:

In [28]:
# A list with 1 billion elements, enough to crash Python on Nick's machine.
# x = range(1000000000)

On the other hand, if you wanted to print the first 10 numbers, starting from 0, you might write:

In [29]:
for x in range(10):
    print x

0
1
2
3
4
5
6
7
8
9


How can you extend this to print the first 1 billion numbers? Clearly you can't use `range()` here.

One solution is to use a while-loop:

In [34]:
x = 0
while x < 10: # use 1000000000 here to print the first 1 billion numbers
    print x
    x = x + 1

0
1
2
3
4
5
6
7
8
9


The while-loop is cumbersome since you have to set up and keep track of `x` yourself, but notice that only 1 integer, `x`, needs to be stored in memory. This is because the while-loop "forgets" each number as soon as it's been printed.

You can keep track of a range the same way. The only information that needs to be stored is the start, end, and current position. This is what `xrange()` does. You can make an xrange of any size without crashing Python:

In [36]:
xrange(1000000000)

xrange(1000000000)

To use the xrange directly, turn it into an iterator with `iter()` and get each successive element with `.next()`:

In [47]:
x = iter(xrange(1000))
x.next()

0

In [48]:
x.next()

1

Most of the time you won't need `iter()` and `.next()`, because you'll use xranges in for-loops:

In [5]:
for x in xrange(10):
    print x

0
1
2
3
4
5
6
7
8
9
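
To make the idea concrete, here is a rough sketch of how an object like an xrange could work. `SimpleRange` is a hypothetical, simplified stand-in (not how `xrange` is actually implemented): it stores only the stopping point and the current position, and unlike a real xrange it can be looped over only once. It defines both `__next__` and `next`, so it runs under Python 2 and Python 3.

```python
class SimpleRange(object):
    """Hypothetical, simplified stand-in for xrange(stop)."""

    def __init__(self, stop):
        self.stop = stop
        self.current = 0  # the only state that changes between steps

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration  # tells the for-loop to finish
        value = self.current
        self.current += 1
        return value

    next = __next__  # Python 2 calls this method .next()


# A for-loop calls iter() once and then asks for the next value
# repeatedly until StopIteration is raised.
total = 0
for x in SimpleRange(100000):
    total += x
print(total)  # 4999950000
```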


You can also use xranges in list comprehensions. A _list comprehension_ applies an operation to each element of an iterable (such as a list or an xrange) and collects the results in a new list.

If you're familiar with set notation like $\bigl\{ x + 2 : x \in \{0, 1, 2, 3\}\bigr\}$ from mathematics, the syntax of list comprehensions is similar:

In [50]:
[x + 2 for x in xrange(4)]

[2, 3, 4, 5]

If you're familiar with R, you can also think of list comprehensions as the Python equivalent of apply functions.

The name of the variable in a list comprehension doesn't matter:

In [51]:
[y + 2 for y in xrange(4)]

[2, 3, 4, 5]
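
A list comprehension can also filter elements with an `if` clause. One way to read a comprehension is as shorthand for a loop that appends to a list, as this sketch shows. It uses `range()` rather than `xrange()` only so that it runs under both Python 2 and Python 3.

```python
# List comprehension with a filter: squares of the even numbers below 10.
squares = [x ** 2 for x in range(10) if x % 2 == 0]

# The same computation written as an explicit loop.
squares_loop = []
for x in range(10):
    if x % 2 == 0:
        squares_loop.append(x ** 2)

print(squares)                  # [0, 4, 16, 36, 64]
print(squares == squares_loop)  # True
```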

For more complex tasks, you can use several list comprehensions in a row:

In [54]:
# Sum (x^2 + 2) over all even x in {0, 1, ..., 9}
vals = [x ** 2 for x in xrange(10) if x % 2 == 0] # x^2 for all even x
vals = [x + 2 for x in vals] # add 2
sum(vals) # sum

130

Like ranges, list comprehensions can use a lot of memory if you aren't careful. _Generator expressions_ are like list comprehensions, but don't compute anything until a value is requested, and forget the previous value as soon as the next value is requested.

In other words, generator expressions are similar to xranges, while list comprehensions are similar to ranges.

The syntax for a generator expression is the same as for a list comprehension, but surrounded by parentheses `()` instead of brackets `[]`:

In [56]:
# Sum (x^2 + 2) over all even x in {0, 1, ..., 9}
vals = (x ** 2 for x in xrange(10) if x % 2 == 0) # x^2 for all even x
vals = (x + 2 for x in vals) # add 2
sum(vals) # sum

130
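
Because a generator expression forgets each value once it has moved on, it can be consumed only once. The sketch below illustrates this with the built-in `next()` function; the variable names are arbitrary, and `range()` is used so the code runs under both Python 2 and Python 3.

```python
squares = (x ** 2 for x in range(5))

# Nothing has been computed yet; next() asks for one value at a time.
first = next(squares)
second = next(squares)
print(first)   # 0
print(second)  # 1

# The remaining values can be consumed only once.
rest = sum(squares)
print(rest)    # 29, from 4 + 9 + 16
again = sum(squares)
print(again)   # 0, because the generator is now exhausted
```

If you need to go over the values more than once, build a list instead, or create the generator expression again.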

Whenever possible, use xranges and generator expressions rather than ranges and list comprehensions. This can substantially reduce the memory your code uses, because only one value needs to be held in memory at a time.
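
One rough way to see the difference is to compare object sizes with `sys.getsizeof()`. This is only a sketch: the size `n` is arbitrary, the exact byte counts depend on your Python version and platform, and `getsizeof()` measures the container itself rather than the integers it refers to.

```python
import sys

n = 100000
as_list = [x ** 2 for x in range(n)]
as_generator = (x ** 2 for x in range(n))

# The list holds a reference to every element; the generator holds
# only enough state to produce the next one.
list_size = sys.getsizeof(as_list)
generator_size = sys.getsizeof(as_generator)
print(list_size)
print(generator_size)

# Both produce the same total.
same_total = sum(as_list) == sum(as_generator)
print(same_total)  # True
```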

### Modular Arithmetic

Suppose it's 9am and you have a meeting in 2 hours. If someone asks when the meeting is, you'd say 11am, since 9 + 2 = 11.

What if the meeting is in 6 hours? Although 9 + 6 = 15, you'd probably say 3pm, since 15 - 12 = 3.

When you do clock arithmetic, you're actually doing _modular arithmetic_! In the example, you're computing $(9 + 6) \bmod 12 = 3$. Modular arithmetic is useful in any situation where the numbers wrap around, like the hours on a clock (0 to 11) or the days of the week (0 to 6). Note that modular arithmetic is zero-based, so $12 \bmod 12 = 0$ rather than $12$.

You can also think of modular arithmetic as computing the remainder after division.

In Python, the modulo operator is `%`:

In [57]:
8 % 7

1

In [58]:
6 % 2

0
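
The clock example from the start of this section can be written directly with `%`. In this sketch, `hour_after` is a hypothetical helper and hours are numbered 0 to 11, so 12 o'clock shows up as 0. It runs under both Python 2 and Python 3.

```python
def hour_after(start_hour, hours_later):
    """Hypothetical helper: hour on a 12-hour clock, numbered 0 to 11."""
    return (start_hour + hours_later) % 12


print(hour_after(9, 2))  # 11
print(hour_after(9, 6))  # 3
print(hour_after(9, 3))  # 0, which is 12 o'clock on a zero-based clock

# Days of the week numbered 0 to 6: 10 days after day 5 is day 1.
print((5 + 10) % 7)      # 1
```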

## Student Questions

__Q:__ How can I extract the last 2 digits of a number?

Use the modulo operator with a power of 10 to divide out the larger digits. For the last 2 digits, use $10^2 = 100$:

In [60]:
356 % 100

56
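
More generally, the last $k$ digits of a non-negative integer are its remainder after division by $10^k$. A small sketch, where `last_digits` is a hypothetical helper name and the inputs are arbitrary examples:

```python
def last_digits(number, k):
    """Hypothetical helper: the last k digits of a non-negative integer."""
    return number % 10 ** k


print(last_digits(356, 1))   # 6
print(last_digits(356, 2))   # 56
print(last_digits(2017, 2))  # 17
print(last_digits(1905, 2))  # 5, since leading zeros are not kept
```

Note the last case: the result is an integer, so `05` comes back as `5`. If you need the leading zero, work with a string instead.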

If you have a string rather than an integer, you can extract characters (including digits) by slicing:

In [63]:
x = "fuzzy cat"
x[6:] # from 7th character to end of string

'cat'

Note that negative indexes are counted from the right side of the string:

In [66]:
x[-3:] # from 3rd-to-last character to end of string

'cat'
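
The same slicing works on a number once it has been converted to a string with `str()`. A brief sketch using an arbitrary example number; it runs under both Python 2 and Python 3:

```python
digits = str(356)      # convert the integer to the string '356'
last_two = digits[-2:]
print(last_two)        # 56, the last 2 characters
print(digits[:-2])     # 3, everything except the last 2 characters

# Converting back to an integer agrees with the modulo approach.
print(int(last_two) == 356 % 100)  # True
```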