#Python NumPy: Arrays and Vectorized Computation

This is a personal study note for Data Wrangling. It is meant to be both a quick guide and a reference for further research into these topics.

*Reference: Python for Data Analysis by Wes McKinney*

##Array

A NumPy array is a grid of values, all of the **same type**, indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array (the "ndim" attribute); the **shape** of an array is a tuple of integers giving the size of each dimension. Arrays also have a "size" attribute: the total number of elements, which is the product of the entries in the shape. For a 1-dimensional array this equals its length.

The easiest way to create an array is to use the *array* function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.


In [23]:
import numpy as np
a = np.array([1, 2, 3])  # Create a rank 1 array
print type(a), a.ndim, a.shape, a.size           

<type 'numpy.ndarray'> 1 (3,) 3


In [7]:
a[0] = 5                 # Change an element of the array
print a     

[5 2 3]


In [8]:
b = np.array([[1,2,3],[4,5,6]])   # Create a rank 2 array
print b

[[1 2 3]
 [4 5 6]]


In [9]:
print b.ndim, b.shape, b.size

2 (2, 3) 6


More on array creation: [Array creation routines](http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html)
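
Besides *array*, a few of the creation routines come up all the time. The sketch below is my own illustration rather than something from the book; the variable names are arbitrary, and it uses single-argument print() calls so that it behaves the same on Python 2 and Python 3.

```python
import numpy as np

zeros = np.zeros((2, 3))             # 2x3 array filled with 0.0 (float64 by default)
ones = np.ones(4, dtype=np.int64)    # length-4 array of integer 1s
steps = np.arange(0, 10, 2)          # like range(): 0, 2, 4, 6, 8
grid = np.linspace(0.0, 1.0, 5)      # 5 evenly spaced points, both endpoints included
ident = np.eye(3)                    # 3x3 identity matrix

print(zeros.shape)    # (2, 3)
print(steps)          # [0 2 4 6 8]
print(grid.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Note that arange excludes the stop value while linspace includes it by default.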

##Datatypes

NumPy infers a datatype from the data when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype.

In [13]:
x = np.array([1, 2])  # Let numpy choose the datatype
y = np.array([1, 2], dtype=np.int64)  # Force a particular datatype
z = x.astype(np.float64)  #Cast an array from one dtype to another 

print x.dtype, y.dtype, z.dtype

int64 int64 float64


More on dtype: [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)
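
Two dtype behaviours that are easy to trip over are how mixed input is promoted and what astype does to fractional values. A minimal sketch, with made-up values:

```python
import numpy as np

mixed = np.array([1, 2.5])           # int and float together are promoted to float64
print(mixed.dtype)                   # float64

f = np.array([1.7, -1.7, 2.0])
as_int = f.astype(np.int64)          # float -> int drops the fractional part (truncates toward zero)
print(as_int)                        # [ 1 -1  2]

small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize)                # 1 (bytes per element)
```

astype always returns a new array, so f itself is unchanged after the cast.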

##Array Indexing

###Slicing
Similar to Python lists, numpy arrays can be sliced. Unlike a list slice, which is a copy, a slice of an array is a view into the *same data*. Since arrays may be multidimensional, you can specify a slice for each dimension of the array; any trailing dimensions you leave out are taken in full.

In [22]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print b

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print a[0, 1]  
b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]
print a[0, 1]   

[[2 3]
 [6 7]]
2
77


In [26]:
c = np.array([[[1],[2],[3]], [[4],[5],[6]]])
print c.shape

#If the number of objects in the selection tuple is less than ndim ,
#then : is assumed for any subsequent dimensions.
d = c[1:2,0:2]  
print d,d.shape   


(2, 3, 1)
[[[4]
  [5]]] (1, 2, 1)
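
When a slice should not write through to the original array, take an explicit copy. The sketch below contrasts the two cases; the names view and independent are just for illustration, and np.shares_memory is used only to confirm what is going on.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                  # shares memory with a
independent = a[:2, 1:3].copy()    # owns its own data

independent[0, 0] = 99             # does not touch a
print(a[0, 1])                     # 2
view[0, 0] = 77                    # writes through to a
print(a[0, 1])                     # 77

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False
```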


###Integer array indexing
When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:

In [9]:
a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array will have shape (3,).
print a[[0, 1, 2], [0, 1, 0]]  

# The above example of integer array indexing is equivalent to this:
print np.array([a[0, 0], a[1, 1], a[2, 0]])  

# When using integer array indexing, you can reuse the same
# element from the source array:
print a[[0, 0], [1, 1]]  

# Equivalent to the previous integer array indexing example
print np.array([a[0, 1], a[0, 1]])  

[1 4 5]
[1 4 5]
[2 2]
[2 2]
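
A common use of integer array indexing is to pick (or update) one element from each row. The sketch below uses an illustrative 4x3 array and a hypothetical cols array of column indices; it also shows that, unlike a slice, the result of integer indexing is a copy.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
cols = np.array([0, 2, 0, 1])        # one column index per row

picked = a[np.arange(4), cols]       # a[0, 0], a[1, 2], a[2, 0], a[3, 1]
print(picked)                        # [ 1  6  7 11]

picked[0] = 100                      # picked is a copy, so a is unaffected
print(a[0, 0])                       # 1

a[np.arange(4), cols] += 10          # indexed assignment does modify a
print(a)
```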


We can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. 

In [28]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1,:]    # Rank 1 view of the second row of a  
row_r2 = a[1:2,:]  # Rank 2 view of the second row of a
print row_r1, row_r1.shape, row_r1.ndim  
print row_r2, row_r2.shape, row_r2.ndim 



[5 6 7 8] (4,) 1
[[5 6 7 8]] (1, 4) 2


###Boolean array indexing

Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:



In [3]:
a = np.array([[1,2], [3, 4], [5, 6]])
bool_idx = (a > 2)  # Find the elements of a that are bigger than 2;
                    # this returns a numpy array of Booleans of the same
                    # shape as a, where each slot of bool_idx tells
                    # whether that element of a is > 2.
bool_idx

array([[False, False],
       [ True,  True],
       [ True,  True]], dtype=bool)

We can use boolean array indexing to construct a *rank 1* array consisting of the elements of a corresponding to the True values of bool_idx:

In [36]:
print a[bool_idx] 

# We can do all of the above in a single concise statement:
print a[a > 2]     

[3 4 5 6]
[3 4 5 6]


Because True counts as 1 and False as 0, we can apply the sum function to a Boolean array to count the number of True values, that is, the number of elements that satisfy the condition:

In [5]:
#number of elements in array that are greater than 2
print (a > 2).sum()

4
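
Boolean masks can also be combined and used on the left-hand side of an assignment. A small sketch reusing the same array as above (the names mid, even_count and b are only illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and), | (or), ~ (not). The parentheses are
# required because these operators bind more tightly than comparisons.
mid = a[(a > 2) & (a < 6)]
print(mid)                         # [3 4 5]

even_count = (a % 2 == 0).sum()    # count of even elements
print(even_count)                  # 3

b = a.copy()
b[b > 4] = 0                       # assign to every element selected by the mask
print(b)
```

Python's own and / or keywords do not work elementwise on arrays, which is why the bitwise operators are used here.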


##Array Math

Basic mathematical functions (universal functions) operate **elementwise** on arrays, and are available both as operator overloads and as functions in the numpy module. 

[Universal functions documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html)
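
A short sketch of what "elementwise" means in practice, with two small arrays chosen for illustration. The matrix product is included only for contrast, since * on arrays is not matrix multiplication.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

print(x + y)          # operator form; same result as np.add(x, y)
print(x * y)          # elementwise product: [[5, 12], [21, 32]]
print(np.sqrt(x))     # unary ufunc applied to every element
print(x.dot(y))       # matrix product, for contrast: [[19, 22], [43, 50]]
```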

##Sorting

Sorting works much like it does with built-in lists. The np.sort() function is a pure function that returns a sorted copy of the array while leaving the original array untouched, whereas the .sort() method is a modifier that sorts the array in place.



In [37]:
int_arr = np.random.randint(0,10,8) #generate 8 integers from range(10)
print int_arr

np.sort(int_arr)  #returns a sorted copy; the result is discarded here

print np.sort(int_arr)
print int_arr

[5 0 2 9 9 8 0 6]
[0 0 2 5 6 8 9 9]
[5 0 2 9 9 8 0 6]


In [38]:
int_arr.sort()
print int_arr

[0 0 2 5 6 8 9 9]


We can sort multidimensional arrays by passing in the axis along which to sort. For a 2D array, axis 0 sorts the values within each column (moving down the rows) and axis 1 sorts the values within each row. Each column or row is sorted independently, so whole rows are not kept together:

In [40]:
twod_int_arr = np.random.randint(0,10,(4,4))
print twod_int_arr

print np.sort(twod_int_arr,0) #sort the values within each column

print np.sort(twod_int_arr,1) #sort the values within each row

[[6 8 5 3]
 [7 6 4 0]
 [6 7 4 6]
 [6 7 4 5]]
[[6 6 4 0]
 [6 7 4 3]
 [6 7 4 5]
 [7 8 5 6]]
[[3 5 6 8]
 [0 4 6 7]
 [4 6 6 7]
 [4 5 6 7]]


np.argsort returns the indices that would sort an array.

In [28]:
arr = np.random.randint(0,10,10)
print arr

print arr.argsort()      
print arr.argsort()[::-1]   #reversed: indices for descending order

[1 3 1 4 4 4 4 3 4 6]
[0 2 1 7 3 4 5 6 8 9]
[9 8 6 5 4 3 7 1 2 0]
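
The indices from argsort are mostly useful for reordering something else. A sketch with made-up names and scores, plus one way to sort the rows of a 2D array by a single column:

```python
import numpy as np

# Hypothetical data: names[i] belongs to scores[i].
names = np.array(["bob", "alice", "carol", "dave"])
scores = np.array([72, 95, 88, 60])

order = scores.argsort()          # indices from lowest to highest score
print(names[order])               # ['dave' 'bob' 'carol' 'alice']
print(names[order[::-1]])         # highest score first

# Sort whole rows of a 2D array by the values in column 1.
table = np.array([[3, 9], [1, 4], [2, 7]])
by_col1 = table[table[:, 1].argsort()]
print(by_col1)
```

Unlike np.sort with an axis argument, this keeps each row intact.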


##Some Useful NumPy Functions

numpy.where(condition[, x, y]) returns elements chosen from x or y depending on condition: where condition is True it yields the element from x, otherwise the element from y.

In [14]:
a = np.array([[1,2],[3,4]])
b = np.array([[9,8],[7,6]])
c = np.array([[True,False],[True,False]])

print np.where(c,a,b)

[[1 8]
 [3 6]]
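
Two further uses of where, sketched with an illustrative array: x and y may be scalars, and when only the condition is given, where returns the indices of the True entries instead.

```python
import numpy as np

arr = np.array([[-2, 5], [7, -1]])

clipped = np.where(arr < 0, 0, arr)   # negatives replaced by 0, the rest kept
print(clipped)

rows, cols = np.where(arr < 0)        # condition only: one index array per dimension
print(rows)                           # [0 1]
print(cols)                           # [0 1]
```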


The in1d() function tests each value of an input array for membership in a target array or set. It returns an array of Booleans indicating which of the input values can be found in the target:

In [16]:
arr = np.random.randint(0,10,10)
print arr

print np.in1d([3,9,6],arr) 

[9 2 8 9 1 0 4 3 2 5]
[ True  True False]
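
Newer NumPy releases recommend np.isin for this kind of test (in1d is deprecated as of NumPy 2.0). A sketch using the same values as the output above, assuming a NumPy version that provides isin (1.13 or later); grid and mask are illustrative names:

```python
import numpy as np

arr = np.array([9, 2, 8, 9, 1, 0, 4, 3, 2, 5])

print(np.isin([3, 9, 6], arr))     # [ True  True False]

# isin keeps the shape of its first argument, so the result can be
# used directly as a Boolean mask on a 2D array.
grid = np.array([[3, 6], [9, 7]])
mask = np.isin(grid, arr)
print(mask)
print(grid[mask])                  # [3 9]
```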


The unique() function returns a sorted array of the unique values found in the input array:

In [20]:
arr = np.random.randint(0,10,10)
print arr

print np.unique(arr)

[4 8 7 9 6 3 5 1 1 9]
[1 3 4 5 6 7 8 9]
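
unique can also report how often each value occurs through its return_counts argument. A sketch using the same values as the output above:

```python
import numpy as np

arr = np.array([4, 8, 7, 9, 6, 3, 5, 1, 1, 9])

values, counts = np.unique(arr, return_counts=True)
print(values)                  # [1 3 4 5 6 7 8 9]
print(counts)                  # [2 1 1 1 1 1 1 2]

print(values[counts > 1])      # values that appear more than once: [1 9]
```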


More:
[NumPy Reference](http://docs.scipy.org/doc/numpy/reference/routines.html)