## Gentle Intro to Arrays through Numpy ##

Before we get into what the Numpy library is good for, let's look at what libraries are, for those who are new to coding.

To put it really simply (maybe a little too simply), libraries are collections of pre-written code that you can use for specific purposes. You could technically code everything from scratch using just basic Python, but why re-invent the wheel when someone else has already built it?

For the analysis of data, the most common libraries that we will use, alongside what they are good for, are as follows:

- **Numpy** - Arrays, linear algebra operations, Fourier transformations, and random number capabilities
- **Pandas** - Data structures and functions for analyzing data: labeled array data structures (Series and DataFrame; the older Panel type has been removed from recent releases), aggregation functions, datetime functions, and I/O operations
- **Matplotlib** - line plots, contour plots, scatter plots, and Basemap plots
- **Scikit-learn** - supports various machine learning models, such as classification, regression, and clustering algorithms

Don't worry too much about the technical terms above. We will get into them in subsequent articles. For now, let's focus on Numpy.

If you are familiar with Matlab, then this is going to feel familiar. Numpy lets you work with the same kinds of matrices and matrix operations. If not, just google for an introduction to matrix algebra (Khan Academy, or any of the many sources out there).

Let's first import Numpy and create our first array.

In [1]:
import numpy as np

new_array = np.array([[1,2],[3,4], [5,6]]) # creating a new 3x2 array - using a list of lists
new_array

array([[1, 2],
       [3, 4],
       [5, 6]])

In [2]:
new_array.ndim # dimensions

2

In [3]:
new_array.shape # shape of array

(3, 2)

In [4]:
len(new_array) # length of the first dimension (number of rows)

3

In [5]:
new_array.dtype # getting the type

dtype('int64')
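To see how these attributes relate on something other than a 2-D array, here is a small sketch using a hypothetical 3-D array named `cube` (it is not part of the walkthrough above). It also uses `size`, an attribute not shown above, which gives the total number of elements.

```python
import numpy as np

# A hypothetical 3-D array: 2 blocks, each with 3 rows and 4 columns.
cube = np.zeros((2, 3, 4))

n_dims = cube.ndim      # number of axes
shape = cube.shape      # length of each axis
first_len = len(cube)   # length of the first axis only
total = cube.size       # total number of elements (2 * 3 * 4)

print(n_dims, shape, first_len, total)
```

Notice that `len` only reports the first entry of `shape`, which is why it returned 3 for the 3x2 array above.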

The data types that can be stored in arrays include:
* bool
* int8, uint8, int16, uint16, int32, uint32, int64, uint64 - signed and unsigned integers of various bits
* float16, float32, float64
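* complex64, complex128, complex256 (complex256 is only available on some platforms)
* object
* string_ (named bytes_ in NumPy 2.0 and later)
* unicode_ (named str_ in NumPy 2.0 and later)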
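Why does the data type matter? A rough sketch, using made-up arrays (`small`, `large` and `prices` are names invented for this example), of how the dtype affects memory use and what happens when floats are converted to integers:

```python
import numpy as np

# The same three values stored with two different integer types.
small = np.array([1, 2, 3], dtype=np.int8)    # 1 byte per element
large = np.array([1, 2, 3], dtype=np.int64)   # 8 bytes per element

print(small.itemsize, large.itemsize)  # bytes used by one element
print(small.nbytes, large.nbytes)      # bytes used by the whole array

# Converting floats to integers drops the fractional part
# (it truncates toward zero, it does not round).
prices = np.array([1.9, -1.9, 2.5])
truncated = prices.astype(np.int32)
print(truncated)
```

Smaller types save memory on big datasets, but they also hold a smaller range of values, so choose with care.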


In [6]:
new_array_float = new_array.astype(np.float64)
new_array_float.dtype

dtype('float64')

In [7]:
new_array_float

array([[1., 2.],
       [3., 4.],
       [5., 6.]])

**Creating Arrays**

In [8]:
# creating an empty array - beware that the contents are uninitialized (arbitrary values, not zeros)
np.empty([2,2], dtype=np.float64) 

array([[1.72723371e-077, 1.72723371e-077],
       [2.17477370e-314, 2.78134232e-309]])

In [9]:
# creating an identity matrix
np.eye(3, dtype=int) # np.int was removed in newer Numpy versions; the built-in int works

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

In [10]:
# create array of ones
np.ones([3,3], dtype=int)

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

In [11]:
# create array of zeros
np.zeros([4,3])

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [12]:
# array based on a range
np.arange(3,10)

array([3, 4, 5, 6, 7, 8, 9])
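`np.arange` also accepts a step, and a close cousin, `np.linspace`, lets you ask for a number of points instead of a step size. A short sketch (the variable names are just for illustration):

```python
import numpy as np

# start, stop (excluded), step
evens = np.arange(0, 10, 2)
print(evens)

# 5 evenly spaced values between 0 and 1, with both ends included
points = np.linspace(0.0, 1.0, 5)
print(points)
```

As with Python's `range`, the stop value of `arange` is excluded, while `linspace` includes its end point by default.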

In [13]:
# fill array with specified number
np.full((3,3), 4, dtype=np.float32)

array([[4., 4., 4.],
       [4., 4., 4.],
       [4., 4., 4.]], dtype=float32)

In [14]:
# convert list to array
my_list = [1,2,3,4,5,6] # avoid naming a variable "list", which hides Python's built-in list
a = np.asarray(my_list)
a

array([1, 2, 3, 4, 5, 6])

In [15]:
b = np.copy(a)
b

array([1, 2, 3, 4, 5, 6])
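The difference between `np.asarray` and `np.copy` only shows up when the input is already an array. A minimal sketch, with names invented for this example:

```python
import numpy as np

original = np.array([1, 2, 3])

same = np.asarray(original)     # input is already an array, so nothing is copied
duplicate = np.copy(original)   # an independent copy of the data

original[0] = 99                # change the original after the fact

print(same is original)   # asarray handed back the very same object
print(same[0])            # sees the change
print(duplicate[0])       # still holds the old value
```

So reach for `np.copy` when you want to modify an array without touching the source.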

**Slicing and Dicing**

In [16]:
a = np.array([[1,3,4],[3,5,7],[4,7,9],[2,3,4]])
a

array([[1, 3, 4],
       [3, 5, 7],
       [4, 7, 9],
       [2, 3, 4]])

In [17]:
a[0,2] # row 0, col 2

4

In [18]:
a[:,2] # all rows, col 2

array([4, 7, 9, 4])

In [19]:
a[2,:] # row 2, all col

array([4, 7, 9])

In [20]:
a[:3] # rows 0 to 2, because the stop index (3) is not included

array([[1, 3, 4],
       [3, 5, 7],
       [4, 7, 9]])

In [21]:
a[-1] # last row

array([2, 3, 4])

In [22]:
a[(a > 4)] # select only items that are more than 4

array([5, 7, 7, 9])
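The expression inside the brackets is itself an array of True/False values, usually called a mask. A small sketch of a few more things you can do with one, using the same `a` as above (`mask`, `between` and `capped` are names made up for this example):

```python
import numpy as np

a = np.array([[1, 3, 4], [3, 5, 7], [4, 7, 9], [2, 3, 4]])

mask = a > 4                     # boolean array with the same shape as a
count = int(mask.sum())          # True counts as 1, so this counts the matches
print(count)

# Combine conditions with & (and) or | (or); the parentheses are required.
between = a[(a > 2) & (a < 7)]
print(between)

# A mask can also be used to assign values.
capped = a.copy()
capped[capped > 4] = 4
print(capped)
```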

In [23]:
a[[0,1]] # select row 0 and 1

array([[1, 3, 4],
       [3, 5, 7]])

In [24]:
a[[0,1],[0,2]] # get 0,0 and 1,2 elements

array([1, 7])
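Passing two lists pairs them up element by element, which is why the result above has two items and not a 2x2 block. If you want the block (every listed row crossed with every listed column), `np.ix_` is one way to get it. A quick sketch:

```python
import numpy as np

a = np.array([[1, 3, 4], [3, 5, 7], [4, 7, 9], [2, 3, 4]])

pairs = a[[0, 1], [0, 2]]           # elements (0,0) and (1,2)
block = a[np.ix_([0, 1], [0, 2])]   # rows 0 and 1 crossed with columns 0 and 2

print(pairs)
print(block)
```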

In [25]:
a * 2 # operations are element wise

array([[ 2,  6,  8],
       [ 6, 10, 14],
       [ 8, 14, 18],
       [ 4,  6,  8]])

In [26]:
b = a + a
b

array([[ 2,  6,  8],
       [ 6, 10, 14],
       [ 8, 14, 18],
       [ 4,  6,  8]])

In [27]:
a == b

array([[False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]])

In [28]:
np.array_equal(a,a)

True
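`==` compares element by element, while `np.array_equal` gives a single True/False for the whole array. For floating point data, exact equality is often too strict, and `np.allclose` (not used above) compares within a small tolerance. A sketch with made-up values:

```python
import numpy as np

a = np.array([[1, 3, 4], [3, 5, 7]])
b = a + a

elementwise = (a * 2 == b)            # one True/False per element
all_match = bool(elementwise.all())   # collapse to a single answer
same = np.array_equal(a * 2, b)       # the same check in one call

# Floating point rounding makes exact comparison fragile.
x = np.array([0.1 + 0.2])
y = np.array([0.3])
exact = np.array_equal(x, y)   # False, because 0.1 + 0.2 is not exactly 0.3
close = np.allclose(x, y)      # True, equal within a small tolerance

print(all_match, same, exact, close)
```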

In [29]:
a.shape

(4, 3)

In [30]:
a.reshape(3,4)

array([[1, 3, 4, 3],
       [5, 7, 4, 7],
       [9, 2, 3, 4]])

In [31]:
a

array([[1, 3, 4],
       [3, 5, 7],
       [4, 7, 9],
       [2, 3, 4]])
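As the output above shows, `reshape` returns a new array object and leaves the shape of `a` alone. You can also pass -1 for one dimension and let Numpy work it out. A brief sketch (`wide` and `flat` are names invented for this example):

```python
import numpy as np

a = np.array([[1, 3, 4], [3, 5, 7], [4, 7, 9], [2, 3, 4]])

wide = a.reshape(2, -1)   # -1 means "infer this dimension": 12 / 2 = 6
flat = a.reshape(-1)      # flatten to one dimension

print(wide.shape)
print(flat.shape)
print(a.shape)            # the original shape is untouched

# For a freshly created array like this one, the reshaped result is a view
# that shares the same underlying data as a.
print(np.shares_memory(a, wide))
```

Because the data is shared here, writing into `wide` would also change `a`; use `.copy()` if that is not what you want.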

In [32]:
a.swapaxes(0,1)

array([[1, 3, 4, 2],
       [3, 5, 7, 3],
       [4, 7, 9, 4]])

In [33]:
a.T

array([[1, 3, 4, 2],
       [3, 5, 7, 3],
       [4, 7, 9, 4]])

In [34]:
np.dot(a, a.T) # multiplying a 4x3 by a 3x4

array([[ 26,  46,  61,  27],
       [ 46,  83, 110,  49],
       [ 61, 110, 146,  65],
       [ 27,  49,  65,  29]])
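It is easy to mix up matrix multiplication with the element-wise `*` shown earlier. A short sketch of the difference; it also uses the `@` operator, which for 2-D arrays like these gives the same result as `np.dot`:

```python
import numpy as np

a = np.array([[1, 3, 4], [3, 5, 7], [4, 7, 9], [2, 3, 4]])

product = a @ a.T        # matrix product: (4x3) times (3x4) gives 4x4
elementwise = a * a      # element-wise: shapes must match, result stays 4x3

print(product.shape)
print(elementwise.shape)
print(np.array_equal(product, np.dot(a, a.T)))
```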

In [35]:
b = ([2,4,5,3],[5,9,1,3])
b

([2, 4, 5, 3], [5, 9, 1, 3])

In [36]:
np.sort(b)

array([[2, 3, 4, 5],
       [1, 3, 5, 9]])

In [37]:
np.sort(b, axis=0) # sorting over the first axis

array([[2, 4, 1, 3],
       [5, 9, 5, 3]])

In [38]:
c = np.argsort(b) # indexes for sort
c

array([[0, 3, 1, 2],
       [2, 3, 0, 1]])
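What are those indexes good for? One common use is reordering a second array to follow the sort order of the first. A minimal sketch with made-up data (`scores` and `names` are hypothetical):

```python
import numpy as np

scores = np.array([70, 95, 82])
names = np.array(["ann", "bob", "cy"])

order = np.argsort(scores)          # positions of scores from lowest to highest
lowest_first = names[order]         # names arranged by ascending score
highest_first = names[order[::-1]]  # reverse the order for descending

print(lowest_first)
print(highest_first)
```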

In [39]:
np.argmax(b) # position of the largest value, counted in the flattened array

5
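The 5 above is a position in the flattened data, not a row/column pair. A small sketch of how to turn it back into 2-D coordinates with `np.unravel_index`, or to ask for the maximum per row with the `axis` argument:

```python
import numpy as np

b = np.array([[2, 4, 5, 3], [5, 9, 1, 3]])

flat_index = np.argmax(b)                          # position in the flattened array
row, col = np.unravel_index(flat_index, b.shape)   # convert to (row, column)
print(row, col)

per_row = np.argmax(b, axis=1)                     # column of the max in each row
print(per_row)
```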

Some other useful functions available in Numpy include:
* sin, cos, tan, cosh, sinh, tanh, arccos, arctan, deg2rad
* around, round, rint, fix, floor, ceil, trunc
* sqrt, square, exp, expm1, exp2, log, log10, log1p, logaddexp
* add, negative, multiply, divide, power, subtract, mod, modf, remainder
* greater, greater_equal, less, less_equal, equal, not_equal
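All of these work element by element on whole arrays. A quick sketch trying out a handful of them on made-up values:

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(values)           # square root of each element
plus_one = np.add(values, 1)      # same as values + 1
is_big = np.greater(values, 3)    # same as values > 3

# Trig functions expect radians, so convert from degrees first.
angles = np.deg2rad(np.array([0.0, 90.0]))
sines = np.sin(angles)

print(roots, plus_one, is_big)
print(np.round(sines, 6))
```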

In [40]:
a

array([[1, 3, 4],
       [3, 5, 7],
       [4, 7, 9],
       [2, 3, 4]])

In [41]:
np.sum(a, axis=0) # sum down col

array([10, 18, 24])

In [42]:
np.prod(a, axis=1) # mul along rows

array([ 12, 105, 252,  24])

In [43]:
np.diff(a, axis=0)

array([[ 2,  2,  3],
       [ 1,  2,  2],
       [-2, -4, -5]])

In [44]:
np.gradient(a) # calculate the gradient

[array([[ 2. ,  2. ,  3. ],
        [ 1.5,  2. ,  2.5],
        [-0.5, -1. , -1.5],
        [-2. , -4. , -5. ]]), array([[2. , 1.5, 1. ],
        [2. , 2. , 2. ],
        [3. , 2.5, 2. ],
        [1. , 1. , 1. ]])]
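The difference between `np.diff` and `np.gradient` is easier to see on a 1-D array. In this sketch (with made-up values), `diff` subtracts neighbours, so the result is one element shorter, while `gradient` keeps the original length by using the average slope across both neighbours in the interior and a one-sided difference at the two ends:

```python
import numpy as np

y = np.array([1.0, 4.0, 9.0, 16.0])

steps = np.diff(y)        # y[i+1] - y[i], one element shorter than y
slopes = np.gradient(y)   # same length as y

print(steps)
print(slopes)
```

For a 2-D input, `np.gradient` does this along every axis, which is why the result above contains two arrays.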

In [45]:
np.cross(a, a) # cross product

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [46]:
np.std(a) # std dev of an array

2.211083193570267

In [47]:
np.mean(a)

4.333333333333333
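`np.std` and `np.mean` accept the same `axis` argument as `np.sum`. Also worth knowing: `np.std` divides by N by default (the population standard deviation). Passing `ddof=1` divides by N-1, which is the sample standard deviation that many statistics texts use. A short sketch:

```python
import numpy as np

a = np.array([[1, 3, 4], [3, 5, 7], [4, 7, 9], [2, 3, 4]])

col_means = np.mean(a, axis=0)   # one mean per column
row_means = np.mean(a, axis=1)   # one mean per row

pop_std = np.std(a)              # divides by N (the default)
sample_std = np.std(a, ddof=1)   # divides by N - 1

print(col_means)
print(row_means)
print(pop_std, sample_std)
```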

In [48]:
np.unique(a)

array([1, 2, 3, 4, 5, 7, 9])

In [49]:
x = [2,3,4,5]
y = [5,6,7,8]
np.intersect1d(x,y)

array([5])

In [50]:
np.save('test.npy', a)

In [51]:
np.savez('test_many.npz', arr0=a, arr1=x)

In [52]:
dic = np.load('test_many.npz')

In [53]:
dic['arr0']

array([[1, 3, 4],
       [3, 5, 7],
       [4, 7, 9],
       [2, 3, 4]])

In [54]:
np.savetxt('test3.out', x, delimiter=',')

In [55]:
np.loadtxt('test3.out', delimiter=',')

array([2., 3., 4., 5.])
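Note that the text file came back as floats, while the `.npy` and `.npz` formats keep the original dtype. Here is a sketch of a round trip that checks this. It writes into a temporary folder (an assumption made so the example cleans up after itself), and the file names are made up:

```python
import os
import tempfile

import numpy as np

a = np.array([[1, 3, 4], [3, 5, 7]])

with tempfile.TemporaryDirectory() as tmp:
    # Single array: .npy
    path = os.path.join(tmp, "roundtrip.npy")
    np.save(path, a)
    restored = np.load(path)

    # Several named arrays: .npz
    archive_path = os.path.join(tmp, "many.npz")
    np.savez(archive_path, arr0=a)
    with np.load(archive_path) as archive:
        names = list(archive.files)   # names of the stored arrays
        from_archive = archive["arr0"]

print(np.array_equal(restored, a), restored.dtype == a.dtype)
print(names)
```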

**Illustrating Numpy's strengths ...**

In [56]:
# Let's generate a random sequence of numbers; these could be treated as, say, returns

import numpy.random as npr
randlist = []
npr.seed(1000)
for i in range(0, 10000000):
    randlist.append(npr.standard_normal())

In [57]:
randlist[:10]

[-0.8044583035248052,
 0.3209315470898572,
 -0.025482880472072204,
 0.6443238284268146,
 -0.3007966727870205,
 0.3894745542873072,
 -0.10743730169089667,
 -0.4799830753607686,
 0.5950355020765573,
 -0.4646675261953411]

#### Compare the difference in the time taken .... ####

In [58]:
%%time
rand_sum=0
for num in randlist:
    rand_sum = rand_sum + num

print("Mean is {0:.5f}".format(rand_sum/len(randlist)))

Mean is 0.00021
CPU times: user 981 ms, sys: 21.8 ms, total: 1 s
Wall time: 1 s


In [59]:
%%time
np_avg = np.mean(randlist)
print("Mean is %.5f" % np_avg)

Mean is 0.00021
CPU times: user 337 ms, sys: 43 ms, total: 380 ms
Wall time: 376 ms
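One caveat on the comparison above: `randlist` is a plain Python list, so `np.mean` first has to convert it into an array, and that conversion is part of the measured time. Below is a rough sketch of a comparison where the data already lives in an array. It assumes a smaller sample of 1,000,000 values and the newer `np.random.default_rng` generator; both are choices made for this example, not the setup used above. Timings vary from machine to machine, so no numbers are quoted here.

```python
import time

import numpy as np

rng = np.random.default_rng(1000)
values = rng.standard_normal(1_000_000)   # all numbers drawn in one call
as_list = values.tolist()                 # the same numbers as a plain list

# Pure Python: sum the list and divide.
start = time.perf_counter()
loop_mean = sum(as_list) / len(as_list)
loop_seconds = time.perf_counter() - start

# Numpy: the data is already an array, so no conversion is needed.
start = time.perf_counter()
array_mean = values.mean()
array_seconds = time.perf_counter() - start

print("Python list: mean %.5f in %.4f s" % (loop_mean, loop_seconds))
print("Numpy array: mean %.5f in %.4f s" % (array_mean, array_seconds))
```

Drawing all the random numbers in a single call, as done here, also avoids the slow Python loop used to build `randlist`.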
