<a href="https://colab.research.google.com/github/k-messick/ds1002-vmv6mp/blob/main/notebooks/07-numpy-continued.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

```
Course:   DS1002
Module:   Module 5
Topic:    NumPy Continued
```

### PREREQUISITES
- import / import as
- variables
- creating basic arrays

### SOURCES
- https://numpy.org/
- https://en.wikipedia.org/wiki/NumPy
- https://www.scipy.org/
- https://en.wikipedia.org/wiki/SciPy
- https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html
- https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html

### OBJECTIVES
- Introduction to NumPy

### CONCEPTS
- The NumPy package contains useful functions for math operations
- The ndarray is the workhorse of the package

In [1]:
import numpy as np

# Data Types

One way to control the data type of a NumPy array is to declare it when the array is created using the `dtype` keyword argument. Take a look at the data type NumPy uses by default when creating an array with `np.zeros()`. Could it be updated?

* Using `np.zeros()`, create an array of zeros that has three rows and two columns; call it `zero_array`.  
* Print the data type of `zero_array`.


In [3]:
# Create an array of zeros with three rows and two columns
zero_array = np.zeros((3, 2))

# Print the data type of zero_array
# The 64 in float64 is the number of bits used to store each element
print(zero_array.dtype)
print(zero_array)

float64
[[0. 0.]
 [0. 0.]
 [0. 0.]]
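
The `64` in `float64` is the number of bits each element occupies. A minimal sketch of how to see that through the `itemsize` and `nbytes` attributes (`small_array` is an illustrative name, not part of the exercise):

```python
import numpy as np

zero_array = np.zeros((3, 2))

# float64 stores each element in 64 bits, i.e. 8 bytes
print(zero_array.dtype)      # float64
print(zero_array.itemsize)   # 8  (bytes per element)
print(zero_array.nbytes)     # 48 (6 elements * 8 bytes)

# The same shape with a 32-bit dtype takes half the memory
small_array = np.zeros((3, 2), dtype=np.int32)
print(small_array.itemsize)  # 4
print(small_array.nbytes)    # 24
```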


* Create a new array of zeros called `zero_int_array`, which will also have three rows and two columns, but the data type should be `np.int32`.  

* Print the data type of `zero_int_array`.

In [5]:
# Create a new array of int32 zeros with three rows and two columns
zero_int_array = np.zeros((3, 2), dtype=np.int32)

# Print the data type of zero_int_array
print(zero_int_array.dtype)
print(zero_int_array)

int32
[[0 0]
 [0 0]
 [0 0]]


In [9]:
data1 = [6, 7.5, 8, 0, 1]        # a plain Python list mixing ints and floats
arr1 = np.array(data1)           # convert the list to a NumPy array (ndarray)
arr1
# ndarrays must be homogeneous: every element shares one data type, so the ints are upcast to float

array([6. , 7.5, 8. , 0. , 1. ])
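
Because an ndarray holds a single data type, NumPy picks a common type that can represent every value in the input. A small sketch of that behavior (the variable names are illustrative):

```python
import numpy as np

# Integers mixed with a float are all upcast to float64
mixed = np.array([6, 7.5, 8, 0, 1])
print(mixed.dtype)            # float64

# A list containing only integers stays an integer array
ints_only = np.array([6, 7, 8, 0, 1])
print(ints_only.dtype.kind)   # 'i' (signed integer; the exact width depends on the platform)

# np.result_type reports the common type NumPy would choose
print(np.result_type(np.int32, np.float64))  # float64
```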

In [11]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]       # create a 2-dimensional list (it's just a list of lists)
arr2 = np.array(data2)                     # turn that list into a numpy array
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [12]:
arr2.ndim       # get the dimension

2

In [13]:
arr2.shape      # get the shape: the size of the array along each dimension (2 rows, 4 columns)

(2, 4)

In [14]:
arr1.dtype      # get the data type for arr1

dtype('float64')

In [15]:
arr2.dtype      # get the data type for arr2

dtype('int64')

In [18]:
arr1 = np.array([1, 2, 3], dtype=np.float64)  # request float64 explicitly, even though the values are integers
arr1.dtype

str(arr1)  # convert the array to its string representation (the quotes in the output show it is a str)

'[1. 2. 3.]'

In [17]:
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr2.dtype

dtype('int32')

In [19]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

dtype('int64')

In [20]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

In [21]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In [22]:
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10], dtype=int32)
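
Note that casting floats to integers with `astype` drops the fractional part rather than rounding. If rounding is what you want, one option is to round first and cast afterwards, as in this sketch (`truncated` and `rounded` are illustrative names):

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

truncated = arr.astype(np.int32)          # drops the fractional part (truncates toward zero)
rounded = np.rint(arr).astype(np.int32)   # rounds to the nearest integer, then casts

print(truncated)  # [ 3 -1 -2  0 12 10]
print(rounded)    # [ 4 -1 -3  0 13 10]
```

`np.rint` rounds exact halves to the nearest even integer, which is why `0.5` becomes `0` here.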

In [23]:
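# np.bytes_ is the byte-string dtype (older NumPy versions called it np.string_, an alias removed in NumPy 2.0)
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)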
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])

# Basic Array Manipulations + Calculations

NumPy has over 500 basic operations, most of which can be applied directly to array data. Here are some common examples:

In [24]:
# Start with a basic two-dimensional array and manipulate in basic ways:
foo = np.array([[1,2,3,4,5],[6,7,8,9,10]])

# flip - reverse the order of the elements along every axis (pass axis= to flip only one)
oof = np.flip(foo)
print(oof)

# copy - copy an array to an entirely separate array
goo = np.copy(foo)
print(goo)

# concatenate - join the rows of the 2-D array end to end into a single 1-D array
new_foo = np.concatenate(foo)
print(new_foo)

# min
foomin = np.min(foo)
print(foomin)

# max
foomax = np.max(foo)
print(foomax)

# mean
foomean = np.mean(foo)
print(foomean)

# sin of each element, in radians (np.cos and np.tan work the same way)
foosin = np.sin(foo)
print(foosin)

# standard deviation (the population standard deviation by default, ddof=0)
foostd = np.std(foo)
print(foostd)

[[10  9  8  7  6]
 [ 5  4  3  2  1]]
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
[ 1  2  3  4  5  6  7  8  9 10]
1
10
5.5
[[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
 [-0.2794155   0.6569866   0.98935825  0.41211849 -0.54402111]]
2.8722813232690143
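
The calls above reduce the whole array to one number. Most of these functions also accept an `axis` argument to reduce along just one dimension. A short sketch, using illustrative variable names:

```python
import numpy as np

foo = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

col_means = np.mean(foo, axis=0)  # collapse the rows: one mean per column
row_means = np.mean(foo, axis=1)  # collapse the columns: one mean per row
row_max = np.max(foo, axis=1)     # largest value in each row

print(col_means)  # [3.5 4.5 5.5 6.5 7.5]
print(row_means)  # [3. 8.]
print(row_max)    # [ 5 10]
```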


# Inserting + Dropping Array Values

Sometimes it is useful to drop the first or last element of an array, or the element at a specific index.

In [25]:
# Drop the first item
# Indices start at 0, so slicing from index 1 onward skips the first element

myarr = np.array([10,15,20,25,30,35,40,45,50])

drop_start = myarr[1:]
print(drop_start)

[15 20 25 30 35 40 45 50]


In [26]:
# Drop the last item

drop_end = myarr[:-1]
print(drop_end)

[10 15 20 25 30 35 40 45]


In [27]:
# Drop the element at a specific index (index 2 is the third item, 20)

drop_second_index = np.delete(myarr, 2)
print(drop_second_index)

[10 15 25 30 35 40 45 50]
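
One difference worth knowing between the two approaches above: a slice such as `myarr[1:]` is a view onto the original data, while `np.delete` returns a new array. A minimal sketch (`view_drop` and `copy_drop` are illustrative names):

```python
import numpy as np

myarr = np.array([10, 15, 20, 25, 30, 35, 40, 45, 50])

view_drop = myarr[1:]            # slice: a view onto myarr's data
copy_drop = np.delete(myarr, 0)  # np.delete: a new, independent array

myarr[1] = -1                    # modify the original array

print(view_drop[0])  # -1 (the view sees the change)
print(copy_drop[0])  # 15 (the copy does not)
print(np.shares_memory(myarr, view_drop))  # True
print(np.shares_memory(myarr, copy_drop))  # False
```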


In [28]:
# Drop every other item in the array
# Removes every other item starting with 0

every_other = np.delete(myarr, np.arange(0, len(myarr), 2))
print(every_other)

# Another version that removes every other item starting with index 1
every_other = np.delete(myarr, np.arange(1, len(myarr), 2))
print(every_other)

[15 25 35 45]
[10 20 30 40 50]
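
For the inserting side of this section, `np.insert` and `np.append` are the usual counterparts to `np.delete`. A brief sketch, where the inserted values are arbitrary examples:

```python
import numpy as np

myarr = np.array([10, 15, 20, 25, 30, 35, 40, 45, 50])

inserted = np.insert(myarr, 2, 99)     # place 99 before the element at index 2
appended = np.append(myarr, [55, 60])  # add values to the end

print(inserted)  # [10 15 99 20 25 30 35 40 45 50]
print(appended)  # [10 15 20 25 30 35 40 45 50 55 60]
print(myarr)     # [10 15 20 25 30 35 40 45 50]
```

Like `np.delete`, both functions return a new array and leave `myarr` unchanged.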


# Slicing

**Higher Dimensional Arrays**

In [29]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [30]:
arr2d[2]

array([7, 8, 9])

In [31]:
arr2d[0][2]
# select row 0 first, then take the element at index 2 of that row

3



**Slicing: Simplified notation**

In [32]:
arr2d[0, 2]
# same result in one step: row index 0, column index 2

3

A nice visual of a 2D array

<img src="https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781449323592/files/httpatomoreillycomsourceoreillyimages2172112.png" height="50%" width="50%"/>

**Two-Dimensional Array Slicing**

<img src="https://learning.oreilly.com/api/v2/epubs/urn:orm:book:9781449323592/files/httpatomoreillycomsourceoreillyimages2172114.png" height="50%" width="50%"/>
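
Two-dimensional slices take one slice (or index) per axis, separated by a comma. A few examples on the `arr2d` array defined earlier, as a sketch (the result names are illustrative):

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

top_right = arr2d[:2, 1:]   # first two rows, columns 1 onward
first_cols = arr2d[:, :2]   # every row, first two columns
row_part = arr2d[1, :2]     # row 1 only, first two columns (result is 1-D)

print(top_right)         # [[2 3]
                         #  [5 6]]
print(first_cols.shape)  # (3, 2)
print(row_part)          # [4 5]
```

Mixing an integer index with a slice, as in `arr2d[1, :2]`, drops that dimension from the result.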

**3D arrays**

In [33]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [None]:
arr3d.shape

In [None]:
arr3d

NumPy's way of showing the data can be a bit difficult to parse visually.

💡 **Here is a way to visualize 3 and higher dimensional data:**

```python
[ # AXIS 0                     CONTAINS 2 ELEMENTS (arrays)
    [ # AXIS 1                 CONTAINS 2 ELEMENTS (arrays)
        [1, 2, 3], # AXIS 2    CONTAINS 3 ELEMENTS (integers)
        [4, 5, 6]  # AXIS 2
    ],  
    [ # AXIS 1
        [7, 8, 9],
        [10, 11, 12]
    ]
]
```
Each axis is a level in the nested hierarchy, i.e. a tree or DAG (directed-acyclic graph).

* Each axis is a container.
* There is only one top container.
* Only the bottom containers have data.
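
The nesting levels line up with the entries of `shape`, which can be checked directly. A small sketch using the same `arr3d` values:

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr3d.ndim)         # 3 (three levels of nesting)
print(arr3d.shape)        # (2, 2, 3)

# The length of each nesting level matches the corresponding shape entry
print(len(arr3d))         # 2 -> axis 0
print(len(arr3d[0]))      # 2 -> axis 1
print(len(arr3d[0][0]))   # 3 -> axis 2

# Reducing over an axis removes that level from the shape
summed = arr3d.sum(axis=0)
print(summed.shape)       # (2, 3)
```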

**Omit later indices**

In multidimensional arrays, if you omit later indices, the returned object will be a **lower-dimensional ndarray** consisting of all the data along the remaining (omitted) dimensions.

So in the 2 × 2 × 3 array `arr3d`:

In [None]:
arr3d[0]

Saving data before modifying an array.

In [None]:
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

Putting the data back.

In [None]:
arr3d[0] = old_values
arr3d
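
The `.copy()` call matters here: without it, the saved values would be a view of `arr3d` and would be overwritten by the assignment. A minimal sketch of the difference (`not_saved` and `saved` are illustrative names):

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

not_saved = arr3d[0]      # a view: still tied to arr3d's data
saved = arr3d[0].copy()   # an independent copy

arr3d[0] = 42             # the scalar is broadcast over the whole 2 x 3 block

print(not_saved)  # [[42 42 42]
                  #  [42 42 42]]
print(saved)      # [[1 2 3]
                  #  [4 5 6]]
```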

Similarly, `arr3d[1, 0]` gives you all of the values whose indices start with (1, 0), forming a 1-dimensional array:

In [None]:
arr3d[1, 0]

In [None]:
x = arr3d[1]
x

In [None]:
x[0]
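
Each index supplied removes one dimension from the result, and chained indexing reaches the same data as the comma form. A short sketch that checks this on `arr3d` (`block`, `row`, and `value` are illustrative names):

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

block = arr3d[1]         # one index    -> 2-D array
row = arr3d[1, 0]        # two indices  -> 1-D array
value = arr3d[1, 0, 2]   # three indices -> a single element

print(block.shape)  # (2, 3)
print(row)          # [7 8 9]
print(value)        # 9

# Chained indexing gives the same values as the comma form
print(np.array_equal(arr3d[1][0], arr3d[1, 0]))  # True
```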