In [1]:
import numpy as np
import pandas as pd

# 4 NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is a package for numerical computing in Python. Its core is the ndarray, an efficient multidimensional array with flexible broadcasting capabilities.

In [2]:
my_arr = np.arange(1000000)

In [3]:
my_list = list(range(1000000))

In [5]:
%time for _ in range(10): my_arr2 = my_arr * 2

CPU times: user 17.3 ms, sys: 14.8 ms, total: 32.1 ms
Wall time: 43.2 ms


In [6]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

CPU times: user 249 ms, sys: 75.2 ms, total: 324 ms
Wall time: 344 ms


NumPy-based algorithms are generally 10 to 100 times faster than their pure Python counterparts and use significantly less memory.
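A rough way to check both claims on your own machine, as a sketch only: the names `n`, `array_bytes` and `list_bytes` are made up for this example, the list size is an approximation, and the exact numbers depend on the platform and NumPy version.

```python
import sys
import timeit

import numpy as np

n = 1_000_000
my_arr = np.arange(n)
my_list = list(range(n))

# Time 10 repetitions of each approach (seconds; machine-dependent).
t_arr = timeit.timeit(lambda: my_arr * 2, number=10)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)
print(f"ndarray: {t_arr:.4f}s, list: {t_list:.4f}s")

# Memory: the array stores fixed-size integers in one contiguous buffer,
# while the list stores pointers to separate Python int objects.
array_bytes = my_arr.nbytes
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)
print(f"ndarray: {array_bytes} bytes, list: about {list_bytes} bytes")
```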

## 4.1 The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object (ndarray), which is a fast, flexible container for large datasets in Python.

In [7]:
data = np.random.randn(2, 3)  # generate some random data

In [8]:
data 

array([[-0.82570814, -0.68080259, -0.46071212],
       [-0.90995994, -0.12167156, -0.05096927]])

In [9]:
data * 10

array([[-8.2570814 , -6.8080259 , -4.60712123],
       [-9.0995994 , -1.21671559, -0.50969274]])

In [10]:
data + data

array([[-1.65141628, -1.36160518, -0.92142425],
       [-1.81991988, -0.24334312, -0.10193855]])

An ndarray is a generic multidimensional container for homogeneous data (all elements must be the same type).  
Every array has a shape and a dtype.  
.shape is a tuple indicating the size of each dimension.  
.dtype is an object describing the data type of the array.

In [11]:
data.shape

(2, 3)

In [12]:
data.dtype

dtype('float64')

### Creating ndarrays

To create an array, use the array function. It accepts any sequence-like object (including other arrays) and produces a new NumPy array.

In [13]:
data1 = [6, 7.5, 8, 0, 1]

In [14]:
arr1 = np.array(data1)

In [15]:
arr1

array([6. , 7.5, 8. , 0. , 1. ])
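np.array is not limited to lists. A small sketch (the names `from_tuple`, `original` and `copied` are only for this example) showing a tuple as input, and that passing an existing array makes a copy by default:

```python
import numpy as np

from_tuple = np.array((1, 2, 3))   # a tuple works as well as a list
original = np.array([1, 2, 3])
copied = np.array(original)        # passing an array copies its data by default

copied[0] = 99                     # modify the copy only
print(original)
print(copied)
```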

Nested sequences will be converted into a multidimensional array:

In [16]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

In [17]:
arr2 = np.array(data2)

In [18]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions, with its shape inferred from the data. We can confirm this by inspecting the ndim (number of dimensions) and shape attributes:

In [19]:
arr2.ndim

2

In [20]:
arr2.shape

(2, 4)

np.array tries to infer a good data type for the array it creates. 

In [21]:
arr1.dtype

dtype('float64')

In [22]:
arr2.dtype

dtype('int64')

Other functions for creating new arrays:

- zeros and ones create arrays of 0s or 1s with a given length or shape
- empty creates an array without initializing its values to any particular value

To create a higher-dimensional array with these functions, pass a tuple for the shape.
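For completeness, a minimal sketch of ones with both a length and a tuple shape (the variable names are made up for this example):

```python
import numpy as np

ones_1d = np.ones(4)        # length 4, float64 by default
ones_2d = np.ones((2, 3))   # shape passed as a tuple
print(ones_1d)
print(ones_2d.shape, ones_2d.dtype)
```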

In [23]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [24]:
np.zeros((3, 6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [25]:
np.empty((2, 3, 2))

array([[[0.00000000e+000, 1.00937611e-320],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 1.16095484e-028]],

       [[6.59798177e+246, 2.59027907e-144],
        [4.82412328e+228, 1.04718130e-142],
        [3.99914461e+252, 1.46030983e-319]]])

**np.empty can return an array of all 0s, or it might return uninitialized "garbage" values**
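Because of this, an array from np.empty is only safe to read after every element has been written. One possible pattern, as a sketch (the name `buf` and the fill value are arbitrary):

```python
import numpy as np

buf = np.empty((2, 3))   # contents are arbitrary at this point
buf[:] = 7.0             # overwrite every element before reading
print(buf)
```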

arange is an array-valued version of the built-in Python range function:

In [26]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
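Like range, arange also takes start, stop and step arguments. A short sketch comparing the two (the variable names are only for this example):

```python
import numpy as np

start_stop = np.arange(5, 10)     # start and stop (stop is excluded)
stepped = np.arange(0, 10, 3)     # start, stop, step
builtin = list(range(0, 10, 3))   # the built-in range gives the same values
print(start_stop)
print(stepped)
print(builtin)
```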

### Data Types for ndarrays

The data type, or dtype, is a special object containing the information (metadata) the ndarray needs to interpret a chunk of memory as a particular type of data:
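Some of that metadata can be inspected directly on the dtype object. A minimal sketch (the names `arr_f` and `dt` are made up here):

```python
import numpy as np

arr_f = np.array([1, 2, 3], dtype=np.float64)
dt = arr_f.dtype
print(dt.name)        # type name
print(dt.itemsize)    # bytes per element
print(dt.kind)        # one-letter kind code ('f' for floating point)
print(arr_f.nbytes)   # total bytes = number of elements * itemsize
```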

In [29]:
arr1 = np.array([1, 2, 3], dtype=np.float64)

In [31]:
arr2 = np.array([1, 2, 3], dtype=np.int32)

In [32]:
arr1.dtype

dtype('float64')

In [33]:
arr2.dtype

dtype('int32')

You can convert, or cast, an array from one dtype to another using ndarray's **astype** method:

In [34]:
arr = np.array([1, 2, 3, 4, 5])

In [36]:
arr.dtype

dtype('int64')

In [37]:
float_arr = arr.astype(np.float64)

In [38]:
float_arr.dtype

dtype('float64')

If you cast floating-point numbers to an integer dtype, the decimal part will be truncated:

In [39]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

In [40]:
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [41]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10], dtype=int32)
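Truncation drops the fractional part (toward zero), which is not the same as rounding. A sketch contrasting it with np.floor and np.rint, assuming you want one of those behaviors instead (the variable names are only for this example):

```python
import numpy as np

vals = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
truncated = vals.astype(np.int32)           # drops the fractional part (toward zero)
floored = np.floor(vals).astype(np.int32)   # rounds down (toward negative infinity)
rounded = np.rint(vals).astype(np.int32)    # rounds to nearest, ties go to the even value
print(truncated)
print(floored)
print(rounded)
```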

An array of strings representing numbers can be converted to numeric form using **astype**:

In [42]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0; np.bytes_ is the equivalent

In [43]:
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])
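If any string cannot be parsed as a number, the cast fails with a ValueError. A small sketch with a deliberately bad value (`bad_strings` and `converted` are names made up for this example):

```python
import numpy as np

bad_strings = np.array(['1.25', 'abc', '42'])
converted = None
try:
    converted = bad_strings.astype(float)
except ValueError as exc:
    # 'abc' cannot be parsed as a float, so no array is produced
    print('conversion failed:', exc)
```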

You can also use another array's dtype attribute:

In [44]:
int_array = np.arange(10)

In [45]:
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)

In [46]:
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [47]:
empty_uint32 = np.empty(8, dtype='u4')

In [48]:
empty_uint32

array([         0, 1075314688,          0, 1075707904,          0,
       1075838976,          0, 1072693248], dtype=uint32)
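The 'u4' passed to np.empty above is a shorthand type code for uint32 (unsigned integer, 4 bytes). A quick check of a few such codes, as a sketch (the variable names are only for this example):

```python
import numpy as np

is_uint32 = np.dtype('u4') == np.uint32     # unsigned integer, 4 bytes
is_int64 = np.dtype('i8') == np.int64       # signed integer, 8 bytes
is_float64 = np.dtype('f8') == np.float64   # floating point, 8 bytes
print(is_uint32, is_int64, is_float64)
```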

**Calling astype always creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.**
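One way to see the copy, as a sketch using np.shares_memory (the names `a` and `b` are only for this example):

```python
import numpy as np

a = np.arange(5)
b = a.astype(a.dtype)            # same dtype, still a new array
print(np.shares_memory(a, b))    # False: b has its own buffer
b[0] = 100
print(a[0])                      # a is unchanged
```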

### Arithmetic with NumPy Arrays

Arrays enable you to express batch operations on data without writing any for loops. This is called _vectorization_. Any arithmetic operation between equal-size arrays applies the operation element-wise:

In [49]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [50]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [51]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [52]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])
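To make "element-wise" concrete, here is a sketch of the explicit loops that arr * arr replaces (the names `looped` and `vectorized` are made up for this example):

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Explicit loops over rows and columns...
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j]

# ...compute the same result as the single vectorized expression.
vectorized = arr * arr
print(np.array_equal(looped, vectorized))
```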

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [53]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [54]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

Comparisons between arrays of the same size yield boolean arrays:

In [None]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])  # illustrative values with the same shape as arr
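A minimal sketch of such a comparison, with made-up values (the names `left`, `right` and `mask` are only for this example):

```python
import numpy as np

left = np.array([[0., 4., 1.], [7., 2., 12.]])
right = np.array([[1., 2., 3.], [4., 5., 6.]])
mask = left > right   # element-wise comparison, same shape as the inputs
print(mask)
print(mask.dtype)
```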