# NumPy Array Basics
http://numpy.org

NumPy is the fundamental package for scientific computing in Python. It contains:

- N-dimensional array objects
- Vectorized functions
- Tools for integrating C/C++ and Fortran code
- Linear algebra, Fourier transform, and random number tools

NumPy is the basis for many other packages, including scikit-learn, SciPy, and pandas, but it also provides a lot of power in and of itself. Those packages keep NumPy fairly abstract; even so, it provides a strong foundation for learning some of the operations and concepts we'll be applying later on.

In [2]:
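#import packages
from __future__ import division  #only needed on Python 2; harmless on Python 3
import numpy as np
import pandas as pd
from numpy.random import randn

#show 4 decimal places and suppress scientific notation when printing arrays
np.set_printoptions(precision=4, suppress=True)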

### Creating arrays
There are several ways to do this, depending on what you're trying to do.  
Here are a few examples that demonstrate how to:
* create an array based on a range of values
* create an array with randomly generated numbers
* create an array by converting an existing list

In [3]:
#create an array of 10 values
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Note that Python starts counting at zero, so np.arange(10) returns the values 0 through 9 and does not include 10.  This may seem trivial, but it can be confusing at times.
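To make the counting rule concrete, here is a minimal sketch of the other common ways to call np.arange. The variable names (first_ten, one_to_ten, evens) are only illustrative.

```python
import numpy as np

# start defaults to 0 and the stop value is excluded
first_ten = np.arange(10)

# give an explicit start to count from 1 through 10
one_to_ten = np.arange(1, 11)

# a third argument sets the step size
evens = np.arange(0, 10, 2)

print(first_ten)
print(one_to_ten)
print(evens)
```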

We created an array of values, but we didn't store it anywhere.  To reference it later, we need to assign it to a variable.

In [4]:
#create an object that is an array containing 10 values
array1 = np.arange(10)
array1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [5]:
#create an array with randomly generated numbers
array2 = randn(2,3)
array2

array([[-1.1348,  0.4209,  0.0816],
       [-1.3106,  0.5281,  0.2784]])

In [6]:
#create an array based on an existing list
list1 = [6,8,20,16]
print(list1)
print(type(list1))

print('~~~~~~~~~')

array3 = np.array(list1)
print(array3)
print(type(array3))

[6, 8, 20, 16]
<class 'list'>
~~~~~~~~~
[ 6  8 20 16]
<class 'numpy.ndarray'>


Why do we need to worry about lists vs. arrays?  As you get into more analysis, you'll find that some functions expect one object type or the other, and that the same operator can behave differently on each.
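One concrete difference, sketched here with illustrative variable names (values_list, values_array): multiplying a list by an integer repeats the list, while multiplying an array by a number scales every element.

```python
import numpy as np

values_list = [6, 8, 20, 16]
values_array = np.array(values_list)

# multiplying a list by an integer repeats the list
repeated = values_list * 2

# multiplying an array by a number scales each element
scaled = values_array * 2

print(repeated)
print(scaled)
```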

### Data types

In [7]:
#create two arrays containing the same values but different data types
arr4 = np.array([1, 2, 3], dtype=np.float64)
arr5 = np.array([1, 2, 3], dtype=np.int32)
print(arr4.dtype)
print(arr5.dtype)

float64
int32


In [8]:
arr6 = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr6)
print(arr6.dtype)

print('~~~~~~~')

print(arr6.astype(np.int32))


[ 3.7 -1.2 -2.6  0.5 12.9 10.1]
float64
~~~~~~~
[ 3 -1 -2  0 12 10]


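Note the rounding behavior: converting floats to integers with astype does not round to the nearest integer.  It drops the fractional part (truncates toward zero), so 3.7 becomes 3 and -2.6 becomes -2.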
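If you want the nearest integer instead, one option is to round before converting. The sketch below compares the two approaches on the same values; the names truncated and rounded are only illustrative. Keep in mind that np.round sends exact halves to the nearest even value, so 0.5 becomes 0.

```python
import numpy as np

values = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

# astype alone drops the fractional part
truncated = values.astype(np.int32)

# rounding first, then converting, gives the nearest integer
rounded = np.round(values).astype(np.int32)

print(truncated)
print(rounded)
```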

".astype()" can be applied to the entire array as in this example or you can apply it to specific columns of data.  We've already looked at this example, but let's revisit it.  You'll likely use this a lot.

In [None]:
#import data
video = pd.read_csv("Video_Store.csv")

In [None]:
#print a list of the data types
video.dtypes

In [None]:
#change a data type
video['Cust ID'] = video['Cust ID'].astype('object')

In [None]:
#check to make sure it worked appropriately
video.dtypes

In [None]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_) #np.string_ was removed in NumPy 2.0; np.bytes_ is the equivalent
numeric_strings

A numeric string simply means you have a number stored as text.
The dtype "S4" just means the array holds byte strings of at most 4 characters each.
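The width in the dtype is set by the longest string in the array when it is created. As a small sketch (the names numeric_text, wider, and clipped are illustrative), you can inspect the width with dtype.itemsize, and see that a longer value assigned later is silently cut to fit.

```python
import numpy as np

numeric_text = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)

# the longest entries have 4 characters, so each slot is 4 bytes wide
print(numeric_text.dtype.kind, numeric_text.dtype.itemsize)

# adding a longer string at creation time widens the dtype
wider = np.array(['1.25', '-9.6', '42', '1000.5'], dtype=np.bytes_)
print(wider.dtype.itemsize)

# assigning a longer string into an existing array truncates it
clipped = numeric_text.copy()
clipped[2] = b'123456'
print(clipped[2])
```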

### Using basic operators with your array

In [None]:
#remember array1?
array1

In [None]:
#multiply all values by 10
array1 * 10

In [None]:
#double values by adding the array to itself
array1 + array1

Do you think these operators will work on our numeric strings?  Why or why not?

In [None]:
#numeric_strings * 2

What's the problem?
We saw above that it's very easy to use basic operators on numeric arrays.  However, since this particular array contains numbers stored as text, it won't work.

In [None]:
numeric_strings = numeric_strings.astype(float)
numeric_strings * 2

This is just one of many reasons it's important to check your data types.  Error messages don't always make it obvious that the data type is the issue.
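One way to check before doing arithmetic is to ask NumPy whether an array's dtype is numeric. This is a minimal sketch rather than the only approach; text_values, float_values, and doubled are illustrative names.

```python
import numpy as np

text_values = np.array(['1.25', '-9.6', '42'])  # numbers stored as text
float_values = text_values.astype(float)        # converted to real numbers

# np.issubdtype reports whether a dtype belongs to the numeric family
text_is_numeric = np.issubdtype(text_values.dtype, np.number)
float_is_numeric = np.issubdtype(float_values.dtype, np.number)

print(text_values.dtype, text_is_numeric)
print(float_values.dtype, float_is_numeric)

# arithmetic is safe once the dtype is numeric
doubled = float_values * 2
print(doubled)
```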

In [None]:
np.sqrt(array1)

In [None]:
np.exp(array1)

### Indexing and slicing

We're going to look at a few examples of indexing.  More information can be found here:
https://docs.scipy.org/doc/numpy-1.10.0/user/basics.indexing.html


In [None]:
copy1 = array1.copy() #create a copy because we're going to change some values
copy2 = array1.copy() #create a copy because we're going to change some values

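Note that we would have run into issues if we had just written copy1 = array1.  That statement does not copy anything: it creates a second name for the same array, so changing values through one name changes them for the other as well.
This stems from how Python handles mutable vs. immutable objects.  We're not going to go into a ton of detail about this, but you can read about it here: https://opensource.com/article/17/6/3-things-i-did-wrong-learning-python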
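Here is a small sketch of the difference between plain assignment and .copy(). The names original, alias, and independent are illustrative and are not used elsewhere in this notebook.

```python
import numpy as np

original = np.arange(10)

alias = original               # a second name for the same array
independent = original.copy()  # a separate array with its own data

alias[0] = 99        # also changes original
independent[1] = -1  # leaves original untouched

print(original)
print(alias is original)
print(independent is original)
```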

The same zero-based counting method applies when selecting specific records.  

In [None]:
print(copy1)
print(copy1[0]) #select the first record in the array

print(copy1[5]) #select the sixth record in the array

In [None]:
print(copy1[5:8]) #print records 6-8
copy1[5:8] = 12 #change the value of records 6-8
print(copy1) 

In [None]:
print(copy2)
arr_slice = copy2[5:8]
arr_slice[1] = 12345
print(copy2)
arr_slice[:] = 64
print(copy2)
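The cell above changes copy2 even though we only assigned to arr_slice. That is because basic slicing returns a view onto the same data rather than a new array. As a hedged sketch (base, view, and separate are illustrative names), np.shares_memory can confirm this, and .copy() gives an independent slice when you need one.

```python
import numpy as np

base = np.arange(10)

view = base[5:8]             # basic slicing returns a view
separate = base[5:8].copy()  # an explicit copy owns its own data

view_shares = np.shares_memory(base, view)
copy_shares = np.shares_memory(base, separate)
print(view_shares, copy_shares)

# writing through the view changes the base array
view[:] = 64
print(base)
print(separate)
```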

More examples of selecting records

In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d)
print(arr2d[2])

In [None]:
arr2d[0][2]
arr2d[0, 2]

In [None]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

In [None]:
arr3d[0]

#### Indexing with slices

In [None]:
array1[1:6]

In [None]:
arr2d
arr2d[:2]

In [None]:
arr2d[:2, 1:]

In [None]:
arr2d[1, :2]
arr2d[2, :1]

In [None]:
arr2d[:, :1]

#### Boolean indexing

In [None]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = randn(7, 4)
print(names)
print(data)

In [None]:
names == 'Bob'

In [None]:
data[names == 'Bob']

In [None]:
data[names == 'Bob', 2:]
data[names == 'Bob', 3]

In [None]:
names != 'Bob'
data[~(names == 'Bob')]

In [None]:
mask = (names == 'Bob') | (names == 'Will')
mask
data[mask]

In [None]:
data[names != 'Joe'] = 0
data

### Transposing arrays and swapping axes

In [None]:
arr = np.arange(15).reshape((3, 5))
arr

In [None]:
arr.T

In [None]:
arr = np.random.randn(6, 3)
np.dot(arr.T, arr)

In [None]:
arr = np.arange(16).reshape((2, 2, 4))
arr
arr.transpose((1, 0, 2))

In [None]:
arr
arr.swapaxes(1, 2)

### Finding basic summary statistics

In [None]:
array1

In [None]:
array1.mean()

In [None]:
array1.sum()

In [None]:
array1.max()

In [None]:
array1.min()

### Sorting

In [None]:
arr = randn(8)
print(arr)

print('~~~~~~~')
arr.sort()
print(arr)

In [None]:
arr = randn(5, 3)
print(arr)

print('~~~~~~~')

arr.sort()
print(arr)

Note that each row is sorted independently: by default, sort() sorts along the last axis.
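To sort down the columns instead, you can pass an axis argument. The sketch below uses a small fixed array (table is an illustrative name) so the result is easy to follow, and it uses np.sort, which returns a sorted copy and leaves the input unchanged.

```python
import numpy as np

table = np.array([[3, 1, 2],
                  [9, 7, 8],
                  [6, 4, 5]])

# axis=1 sorts within each row (same as the default, the last axis)
by_row = np.sort(table, axis=1)

# axis=0 sorts within each column
by_col = np.sort(table, axis=0)

print(by_row)
print(by_col)
```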