# Python for Data Analysis

These are notes on the book *Python for Data Analysis*. They cover the following topics:

* IPython
* NumPy
* pandas
* Matplotlib
* SciPy

The supplemental files and examples can be found [here](https://resources.oreilly.com/examples/0636920023784/) and [here](https://github.com/wesm/pydata-book/tree/2nd-edition/datasets).

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# IPython

## Tab Completion

Press `Tab` after a partially typed object name, attribute, or file path to complete it.

## Introspection

Use `?` before or after an object to display general information about it.

Using `??` also shows the function's source code, if possible.

`?` can also be combined with the wildcard `*` to search the IPython namespace.

In [3]:
np.*load*?
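
The wildcard search above can be approximated in plain Python. This is a minimal sketch, not how IPython implements `?`; it assumes the standard-library `fnmatch` module is an acceptable stand-in for IPython's pattern matching.

```python
import fnmatch

import numpy as np

# Collect the names in the numpy namespace that match the pattern,
# similar in spirit to what `np.*load*?` lists.
matches = sorted(
    name for name in dir(np) if fnmatch.fnmatchcase(name, "*load*")
)
print(matches)
```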

## Magic Command

%run - run a script; all of its variables are then accessible in the IPython shell
* -i - give the script access to variables already defined in the IPython namespace
* -d - invoke the debugger before executing any code; press `s` (step) to enter the script
* -b(num) - start the debugger with a breakpoint at line `num`
* -p (or use %prun) - run under the profiler, same as `cProfile`
    * -s - sort order of the profile output
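
Since the `-p` option is described as equivalent to `cProfile`, the same kind of report can be produced with the standard library directly. The sketch below is only an illustration: `slow_sum` is a hypothetical workload, and sorting by `"cumulative"` is one assumed choice of sort order.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload that gives the profiler something to measure."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Sort the report by cumulative time, comparable to the -s sort option.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)  # limit the report to five entries
report = buffer.getvalue()
print(report)
```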

%paste - take the text in the clipboard and execute it as a single block

%cpaste - open a special prompt where you can paste as much code as you like before executing; use `Ctrl-C` to break out

%xmode

%time - run a statement once; report the total execution time

%timeit - run a statement multiple times to produce an accurate average runtime
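
Outside IPython, the difference between the two magics can be roughly reproduced with the standard library. This is a sketch under assumptions: the array size and the number of runs (`n_runs`) are arbitrary illustrative values, and `%timeit` itself picks the number of runs automatically.

```python
import time
import timeit

import numpy as np

arr = np.arange(1_000_000)

# Single run, in the spirit of %time: cheap to measure, but noisy.
start = time.perf_counter()
arr.sum()
single_run = time.perf_counter() - start

# Repeated runs, in the spirit of %timeit: average over many executions.
n_runs = 100
total = timeit.timeit(arr.sum, number=n_runs)
average_run = total / n_runs

print(f"single run: {single_run:.6f} s")
print(f"average of {n_runs} runs: {average_run:.6f} s")
```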


%reset
* -f - force reset without asking for confirmation
* -s - only clear namespace, leaving history intact

%automagic

%quickref

%magic

%debug
* u - up
* d - down
* s - step
* c - continue
* n - next
* ! - !variable to examine the contents
* h(elp) - display command list
* help command - show documentation for `command`
* q(uit) - exit debugger without executing any more code
* b(reak) number - set breakpoint at `number` in current file
* a(rgs) - show arguments for current function
* l(ist) statement - show current position and context at current level of stack
* w(here) - print full stack trace with context at current position

%hist - print all or part of the input history

%pdb

%who, %who_ls, %whos

%xdel variable - remove all references to a particular object

%logstart - start logging the entire console session

%logoff

%logon

%logstate

%logstop

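%bookmark [name] [path] - save an alias for a directory, for use with %cd
* -l - list all the bookmarks
* -b - (passed to %cd) override a directory with the same name and use the bookmark location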



## Keyboard Shortcuts

|Command|Description|
|---|---|
|`Ctrl-p` or `up-arrow`|Search backward in history|
|`Ctrl-n` or `down-arrow`|Search forward in history|
|`Ctrl-r`|Readline-style reverse history search|
|`Ctrl-Shift-v`|Paste text from clipboard|
|`Ctrl-C`|Interrupt currently-executing code|
|`Ctrl-a`|Move cursor to beginning of line|
|`Ctrl-e`|Move cursor to end of line|
|`Ctrl-k`|Delete text from cursor until end of line|
|`Ctrl-u`|Discard all text on current line|
|`Ctrl-f`|Move cursor forward one character|
|`Ctrl-b`|Move cursor back one character|
|`Ctrl-l`|Clear screen|

# NumPy

## ndarray

### Create

In [6]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([ 6. ,  7.5,  8. ,  0. ,  1. ])

In [7]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [8]:
arr2.ndim

2

In [9]:
arr2.shape

(2, 4)

In [10]:
arr2.dtype

dtype('int64')

In [11]:
np.zeros(10)

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [12]:
np.zeros((3, 7))

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]])

In [13]:
np.empty((2, 3, 2))

array([[[ -2.68156159e+154,  -2.68677871e+154],
        [ -2.68156159e+154,   2.00389111e+000],
        [  1.27319747e-313,   1.27319747e-313]],

       [[  2.96439388e-323,   1.90979621e-313],
        [  0.00000000e+000,   2.75859453e-313],
        [  0.00000000e+000,   8.34402697e-309]]])

In [14]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [15]:
np.eye(5)

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

### Data Types

In [17]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr1

array([ 1.,  2.,  3.])

In [18]:
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype

dtype('int32')

In [20]:
arr3 = arr2.astype(np.float32)
arr3.dtype

dtype('float32')
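
Two details of `astype` are easy to miss: casting floats to integers truncates the fractional part, and the call returns a new array instead of modifying the original. A minimal sketch with made-up values:

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5, 10.9])

# Casting float to int drops the fractional part (truncation toward zero).
ints = floats.astype(np.int32)
print(ints)  # [ 3 -1  0 10]

# astype returns a new array; the original keeps its dtype and values.
print(floats.dtype, ints.dtype)  # float64 int32

# Strings that represent numbers can be converted as well.
numeric_strings = np.array(["1.25", "-9.6", "42"])
as_floats = numeric_strings.astype(np.float64)
print(as_floats)
```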

### Operations

In [21]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [22]:
arr * arr

array([[ 1,  4,  9],
       [16, 25, 36]])

In [23]:
arr - arr

array([[0, 0, 0],
       [0, 0, 0]])

In [24]:
1 / arr

array([[ 1.        ,  0.5       ,  0.33333333],
       [ 0.25      ,  0.2       ,  0.16666667]])

In [25]:
arr ** 0.5

array([[ 1.        ,  1.41421356,  1.73205081],
       [ 2.        ,  2.23606798,  2.44948974]])

### Indexing & Slicing

In [27]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [28]:
arr[5]

5

In [29]:
arr[5:8]

array([5, 6, 7])

In [30]:
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [31]:
arr_slice = arr[5:8]
arr_slice[1] = 12345
arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])

In [32]:
arr_slice[:] = 64
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
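
The cells above change `arr` through `arr_slice` because a slice is a view on the original data, not a copy. The following sketch contrasts a view with an explicit `.copy()`; the variable names are illustrative only.

```python
import numpy as np

arr = np.arange(10)

# A slice is a view: it shares memory with the original array.
view = arr[5:8]
view[:] = 64
print(arr)  # [ 0  1  2  3  4 64 64 64  8  9]

# An explicit copy owns its data, so edits do not propagate back.
independent = arr[5:8].copy()
independent[:] = -1
print(arr[5:8])  # [64 64 64]

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```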

In [36]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [37]:
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [38]:
arr3d[1, 0]

array([7, 8, 9])

In [39]:
arr[1:6]

array([ 1,  2,  3,  4, 64])

In [41]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [42]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [43]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [44]:
arr2d[1, :2]

array([4, 5])

In [45]:
arr2d[2, :1]

array([7])

In [46]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])
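
The results above differ in dimensionality: an integer index removes its axis, while a slice keeps it. A short sketch that makes the shapes explicit:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# An integer index removes that axis; a slice keeps it.
row = arr2d[1, :2]        # integer on axis 0 -> 1-D result
column_1d = arr2d[:, 0]   # integer on axis 1 -> 1-D result
column_2d = arr2d[:, :1]  # slice on axis 1   -> 2-D result with one column

print(row.shape)        # (2,)
print(column_1d.shape)  # (3,)
print(column_2d.shape)  # (3, 1)
```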

### Boolean Indexing

In [54]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], 
      dtype='<U4')

In [59]:
len(names)

7

In [55]:
data = np.random.randn(7, 4)
data

array([[-0.40080931,  0.73838598,  1.16037982,  1.08658554],
       [ 0.79158458, -0.46731747, -3.35068232, -0.32557877],
       [ 1.43340106,  0.05358676, -2.15654469, -1.81515339],
       [ 0.63816787,  1.17016115,  0.21288172,  0.60525099],
       [ 0.7065364 , -1.24656332,  0.30147498, -3.44365324],
       [-0.48499366, -1.47569519,  0.91633472,  2.9595649 ],
       [-2.30994799, -1.29350272, -0.57238644, -1.12780553]])

In [56]:
names == 'Bob'

array([ True, False, False,  True, False, False, False], dtype=bool)

In [57]:
data[names == 'Bob']

array([[-0.40080931,  0.73838598,  1.16037982,  1.08658554],
       [ 0.63816787,  1.17016115,  0.21288172,  0.60525099]])

In [60]:
data[names == 'Bob', 2:]

array([[ 1.16037982,  1.08658554],
       [ 0.21288172,  0.60525099]])

In [61]:
data[~(names == 'Bob')]

array([[ 0.79158458, -0.46731747, -3.35068232, -0.32557877],
       [ 1.43340106,  0.05358676, -2.15654469, -1.81515339],
       [ 0.7065364 , -1.24656332,  0.30147498, -3.44365324],
       [-0.48499366, -1.47569519,  0.91633472,  2.9595649 ],
       [-2.30994799, -1.29350272, -0.57238644, -1.12780553]])

In [62]:
data[names != 'Bob']

array([[ 0.79158458, -0.46731747, -3.35068232, -0.32557877],
       [ 1.43340106,  0.05358676, -2.15654469, -1.81515339],
       [ 0.7065364 , -1.24656332,  0.30147498, -3.44365324],
       [-0.48499366, -1.47569519,  0.91633472,  2.9595649 ],
       [-2.30994799, -1.29350272, -0.57238644, -1.12780553]])

In [65]:
mask = (names == 'Bob') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False], dtype=bool)

In [66]:
data[mask]

array([[-0.40080931,  0.73838598,  1.16037982,  1.08658554],
       [ 1.43340106,  0.05358676, -2.15654469, -1.81515339],
       [ 0.63816787,  1.17016115,  0.21288172,  0.60525099],
       [ 0.7065364 , -1.24656332,  0.30147498, -3.44365324]])

In [67]:
data[data < 0] = 0
data

array([[ 0.        ,  0.73838598,  1.16037982,  1.08658554],
       [ 0.79158458,  0.        ,  0.        ,  0.        ],
       [ 1.43340106,  0.05358676,  0.        ,  0.        ],
       [ 0.63816787,  1.17016115,  0.21288172,  0.60525099],
       [ 0.7065364 ,  0.        ,  0.30147498,  0.        ],
       [ 0.        ,  0.        ,  0.91633472,  2.9595649 ],
       [ 0.        ,  0.        ,  0.        ,  0.        ]])
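
Two behaviours in the cells above are worth spelling out: selecting with a boolean mask returns a copy, whereas assigning through a mask changes the original array in place. The sketch below assumes a seeded generator (`np.random.default_rng(0)`) purely so that it is repeatable; the notebook itself uses `np.random.randn`.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
data = rng.standard_normal((7, 4))
original = data.copy()

# Combine conditions with & and |; the Python keywords `and`/`or`
# do not work element-wise on boolean arrays.
mask = (names == "Bob") | (names == "Will")

# Selecting with a boolean mask returns a copy, not a view.
selected = data[mask]
selected[:] = 0.0
print(np.array_equal(data, original))  # True: the original rows are untouched

# Assigning through a mask modifies the original array in place.
data[data < 0] = 0
print((data >= 0).all())  # True
```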

### Fancy Indexing