# Introduction to NumPy
NumPy Notes:

In [75]:
import numpy as np
np.__version__

'2.2.4'

## Differences between lists and NumPy arrays
* Lists - items can be added or removed, size can change, can mix data types
* Arrays - fixed size once created, one data type for all elements, typically stored in one contiguous block of memory
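
A minimal sketch of the two points above, using made-up values (the names `mixed_list`, `mixed_array`, and `longer` are only for illustration): an array settles on a single data type, and "adding" an element really produces a new array.

```python
import numpy as np

# A list keeps each item's own type (int, float, int here).
mixed_list = [1, 2.5, 3]

# An array picks one dtype for every element; the ints are upcast to float64.
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)

# np.append does not grow the array in place; it builds and returns a new one.
longer = np.append(mixed_array, 4.0)
print(mixed_array.size, longer.size)
```

The original array keeps its size; only the returned array has the extra element.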

In [76]:
gpas_as_list = [3.4, 3.2, 3.6]

In [77]:
# Can add elements to it
gpas_as_list.append(4.0)
# Can have multiple data types
gpas_as_list.insert(1, "Whatevs")
# Can have items removed
gpas_as_list.pop(1)

'Whatevs'

In [78]:
gpas_as_list

[3.4, 3.2, 3.6, 4.0]

In [79]:
gpas = np.array(gpas_as_list)

In [80]:
?gpas

Type:        ndarray
String form: [3.4 3.2 3.6 4. ]
Length:      4
File:        c:\users\blaise\appdata\local\programs\python\python313\lib\site-packages\numpy\__init__.py
Docstring:
ndarray(shape, dtype=float, buffer=None, offset=0,
        strides=None, order=None)

An array object represents a multidimensional, homogeneous array
of fixed-size items.  An associated data-type object describes the
format of each element in the array (its byte-order, how many bytes it
occupies in memory, whether it is an integer, a floating point number,
or something else, etc.)

Arrays should be constructed using `array`, `zeros` or `empty` (refer
to the See Also section below).  The parameters given here refer to
a low-level method (`ndarray(...)`) for instantiating an array.

For more information, refer to the `numpy` module and examine the
methods and attributes of an array.

Parameters
----------
(for the __new__ method; see Notes below)

shape : 

In [81]:
gpas.dtype

dtype('float64')

In [82]:
gpas.itemsize

8

In [83]:
gpas.size

4

In [84]:
len(gpas)

4

In [85]:
gpas.nbytes

32

## Multidimensional arrays
* Keep [floating point precision](https://numpy.org/doc/stable/user/basics.types.html#extended-precision) in mind when adding floats taken from an array
* The data structure is called `ndarray`, short for **n**-**d**imensional array
* Arrays can have multiple dimensions, which you declare on creation
* Dimensions help define what each element in the array represents. A two-dimensional array can be thought of as an array of arrays
* **Rank** defines how many dimensions an array contains (available as `ndim`)
* **Shape** defines the length of each of the array's dimensions
* Each dimension is also referred to as an **axis**, and they are zero-indexed. Multiples are called **axes**
* A 2D array is also known as a **matrix**
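
A small sketch of the precision and axis points above, assuming a made-up 2x2 `grid` of `float16` values: the stored value is only close to what was typed in, and summing along an axis removes that axis from the shape.

```python
import numpy as np

# Hypothetical 2x2 array stored at half precision.
grid = np.array([[3.4, 3.2], [1.2, 2.4]], dtype=np.float16)
print(grid.ndim, grid.shape)

# float16 cannot represent 3.4 exactly, so the stored value is only close to it.
stored = float(grid[0, 0])
print(stored, stored == 3.4)

# Summing along axis 1 collapses the columns, leaving one total per row.
# Accumulating in float64 limits extra rounding during the addition.
row_totals = grid.sum(axis=1, dtype=np.float64)
print(row_totals.shape)
```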

In [86]:
student_gpas = np.array([
    [3.4, 3.2, 3.6, 4.0],
    [3.2, 3.8, 4.0, 4.0],
    [1.2, 2.4, 3.5, 3.8]
], np.float16)
student_gpas

array([[3.4, 3.2, 3.6, 4. ],
       [3.2, 3.8, 4. , 4. ],
       [1.2, 2.4, 3.5, 3.8]], dtype=float16)

In [87]:
student_gpas.ndim

2

In [88]:
student_gpas.shape

(3, 4)

In [89]:
student_gpas.size

12

In [90]:
len(student_gpas)

3

In [91]:
student_gpas.itemsize

2

In [92]:
student_gpas.itemsize * student_gpas.size

24

In [93]:
%whos ndarray

Variable        Type       Data/Info
------------------------------------
fake_log        ndarray    7: 7 elems, type `uint16`, 14 bytes
gpas            ndarray    4: 4 elems, type `float64`, 32 bytes
index           ndarray    2x2: 4 elems, type `int64`, 32 bytes
not_copied      ndarray    7x6: 42 elems, type `int64`, 336 bytes
orders          ndarray    4x4: 16 elems, type `int64`, 128 bytes
practice        ndarray    7x6: 42 elems, type `int64`, 336 bytes
practice_view   ndarray    3x14: 42 elems, type `int64`, 336 bytes
prices          ndarray    4: 4 elems, type `float64`, 32 bytes
student_gpas    ndarray    3x4: 12 elems, type `float16`, 24 bytes
study_minutes   ndarray    3x7: 21 elems, type `uint16`, 42 bytes
totals          ndarray    4: 4 elems, type `float64`, 32 bytes


In [94]:
np.info(student_gpas)

class:  ndarray
shape:  (3, 4)
strides:  (8, 2)
itemsize:  2
aligned:  True
contiguous:  True
fortran:  False
data pointer: 0x1c09a9e33c0
byteorder:  little
byteswap:  False
type: float16


In [95]:
student_gpas[2]

array([1.2, 2.4, 3.5, 3.8], dtype=float16)

In [96]:
student_gpas[2][3]

np.float16(3.8)

## About data types
* NumPy [data types](https://docs.scipy.org/doc/numpy-1.14.0/user/basics.types.html) are documented here
* Data types are maintained by wrapping values in a [scalar representation](https://numpy.org/doc/stable/reference/arrays.scalars.html)
* `np.zeros` is a convenient way to create an array of zeros with a specified data type
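
As a rough illustration of these points (the `counts` array is invented for the example), the sketch below creates a typed array of zeros, shows that a single element comes back as a NumPy scalar rather than a plain `int`, and checks the range of the chosen type.

```python
import numpy as np

# Hypothetical 2x3 array of unsigned 16-bit integers, all zero.
counts = np.zeros((2, 3), dtype=np.uint16)
counts[0, 1] = 150

# Pulling out one element returns a NumPy scalar that remembers its dtype.
value = counts[0, 1]
print(type(value).__name__)

# np.iinfo reports the limits of an integer type.
print(np.iinfo(np.uint16).min, np.iinfo(np.uint16).max)

# astype returns a converted copy; the original keeps its dtype.
as_float = counts.astype(np.float64)
print(counts.dtype, as_float.dtype)
```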

In [97]:
study_minutes = np.zeros(7, np.uint16)
study_minutes

array([0, 0, 0, 0, 0, 0, 0], dtype=uint16)

In [98]:
%whos

Variable             Type           Data/Info
---------------------------------------------
copied               list           n=4
fake_log             ndarray        7: 7 elems, type `uint16`, 14 bytes
first_day_minutes    uint16         Shape: ()
fruit                list           n=4
gpas                 ndarray        4: 4 elems, type `float64`, 32 bytes
gpas_as_list         list           n=4
index                ndarray        2x2: 4 elems, type `int64`, 32 bytes
not_copied           ndarray        7x6: 42 elems, type `int64`, 336 bytes
np                   module         Shape: <function shape at 0x000001C0C2451F80>
orders               ndarray        4x4: 16 elems, type `int64`, 128 bytes
practice             ndarray        7x6: 42 elems, type `int64`, 336 bytes
practice_view        ndarray        3x14: 42 elems, type `int64`, 336 bytes
prices               ndarray        4: 4 elems, type `float64`, 32 bytes
rand                 RandomState    RandomState(MT19937)
second_day_

In [99]:
study_minutes[0] = 150

In [100]:
first_day_minutes = study_minutes[0]

In [101]:
first_day_minutes

np.uint16(150)

In [102]:
type(first_day_minutes)

numpy.uint16

In [103]:
study_minutes[1] = 60

In [104]:
second_day_minutes = study_minutes[1]

In [105]:
second_day_minutes

np.uint16(60)

In [106]:
study_minutes[2:6] = [80, 60, 30, 90]

## Creation 
* The `np.random` package creates arrays of random values
  * `RandomState` lets you seed randomness in a repeatable way
* You can append rows:
   * Use the `np.append` function, which returns a new array. Make sure the new row has the same shape as the existing rows
   * Create a new array (or reassign the name) by passing the existing array as one of the items in a list (enclosed in square brackets)
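
A minimal sketch of the ideas above, with placeholder names (`log`, `new_row`) that are not part of these notes: two generators seeded the same way give the same numbers, and both append approaches produce a new two-row array.

```python
import numpy as np

# Two generators seeded with the same value produce identical sequences.
rand_a = np.random.RandomState(42)
rand_b = np.random.RandomState(42)
first = rand_a.randint(30, 180, size=7)
second = rand_b.randint(30, 180, size=7)
print(np.array_equal(first, second))

# Hypothetical existing 1x7 log and a new row of the same length.
log = np.zeros((1, 7), dtype=np.uint16)
new_row = np.ones(7, dtype=np.uint16)

# Option 1: np.append along axis 0; the new row is wrapped in a list so it is 2D.
appended = np.append(log, [new_row], axis=0)

# Option 2: build a new array from a list containing the existing row and the new one.
rebuilt = np.array([log[0], new_row])

print(appended.shape, rebuilt.shape)
```

Neither option changes `log` itself; reassign the result to the original name if you want to keep using it.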

## Indexing
* Use the indexing shortcut of separating dimensions with a comma, e.g. `study_minutes[1, 1]` instead of `study_minutes[1][1]`
* Index using a `list` or `np.array`. The values at those indices are pulled out into a new array (fancy indexing)
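
The two indexing styles can be sketched on a small made-up array (`grid` is only an illustration):

```python
import numpy as np

# Hypothetical 3x4 array holding the values 0 through 11.
grid = np.arange(12).reshape(3, 4)

# Comma shortcut: row 1, column 2. Same element as grid[1][2].
print(grid[1, 2], grid[1][2])

# Fancy indexing with a list picks whole rows 0 and 2.
rows = grid[[0, 2]]
print(rows.shape)

# Two index lists are paired up: elements (0, 1) and (2, 3).
picked = grid[[0, 2], [1, 3]]
print(picked)
```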

In [107]:
study_minutes = np.array([
    study_minutes,
    np.zeros(7, np.uint16)
])

In [108]:
study_minutes.shape

(2, 7)

In [109]:
study_minutes[1][0] = 60

In [110]:
rand = np.random.RandomState(42)
fake_log = rand.randint(30, 180, size=7, dtype=np.uint16)
fake_log

array([132, 122, 128,  44, 136, 129, 101], dtype=uint16)

In [111]:
[fake_log[3], fake_log[4]]

[np.uint16(44), np.uint16(136)]

In [112]:
fake_log[[3, 4]]

array([ 44, 136], dtype=uint16)

In [113]:
index = np.array([
    [3, 4],
    [0, 1]
])
fake_log[index]

array([[ 44, 136],
       [132, 122]], dtype=uint16)

In [114]:
study_minutes = np.append(study_minutes, [fake_log], axis=0)
study_minutes

array([[150,  60,  80,  60,  30,  90,   0],
       [ 60,   0,   0,   0,   0,   0,   0],
       [132, 122, 128,  44, 136, 129, 101]], dtype=uint16)

In [115]:
study_minutes[1, 1] = 360

## Boolean Array Indexing
* Create a boolean array by using comparison operators on an array
  * Use boolean arrays for fancy indexing
  * Boolean arrays can be combined using the bitwise operators (`&`, `|`)
      * Do not use the `and` keyword (it raises an error for arrays with more than one element)
      * Order of operations matters when combining comparisons, so wrap each comparison in parentheses
* Boolean indexing returns a new array, but the existing array can be updated using a boolean index
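
A short sketch of these rules on an invented `minutes` array. It shows the mask being combined with `&`, the error from `and`, and the difference between reading through a mask (new array) and assigning through one (updates in place).

```python
import numpy as np

minutes = np.array([30, 0, 90, 44, 150])

# & binds tighter than < and >, so each comparison needs its own parentheses.
mask = (minutes > 0) & (minutes < 60)
short_sessions = minutes[mask]
print(short_sessions)

# Reading with a mask gives a new array; changing it leaves the original alone.
short_sessions[0] = 999
print(minutes[0])

# The `and` keyword asks for a single truth value, which an array cannot give.
try:
    (minutes > 0) and (minutes < 60)
    raised = False
except ValueError:
    raised = True
print(raised)

# Assigning through a mask updates the original array in place.
minutes[minutes < 60] = 0
print(minutes)
```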

In [116]:
fake_log[fake_log < 60] 

array([44], dtype=uint16)

In [117]:
study_minutes[study_minutes < 60]

array([30,  0,  0,  0,  0,  0,  0, 44], dtype=uint16)

In [118]:
# Output is True only where the values at the same index in both arrays are True
np.array([False, True, True]) & np.array([True, False, True])

array([False, False,  True])

In [119]:
# Parentheses control the order of operations
study_minutes[(study_minutes < 60) & (study_minutes > 0)]

array([30, 44], dtype=uint16)

In [120]:
study_minutes[study_minutes < 60] = 0

In [121]:
study_minutes

array([[150,  60,  80,  60,   0,  90,   0],
       [ 60, 360,   0,   0,   0,   0,   0],
       [132, 122, 128,   0, 136, 129, 101]], dtype=uint16)

## Slicing
* Similar to normal list slicing
* Use commas to separate each dimension slice
* Basic slicing of an ndarray always returns a view of the data, not a copy
* The base object can be accessed using the `ndarray.base` attribute
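
A minimal sketch of the view behaviour, using a throwaway `numbers` array. It also shows two ways to get independent data when a view is not wanted: calling `.copy()` on the slice, or indexing with a list.

```python
import numpy as np

numbers = np.arange(10)

# A slice is a view: it shares memory with the array it came from.
window = numbers[2:5]
print(window.base is numbers)

# Writing through the view changes the original.
window[0] = 99
print(numbers[2])

# .copy() gives independent data.
independent = numbers[2:5].copy()
independent[0] = -1
print(numbers[2])

# Indexing with a list (fancy indexing) also returns new data, not a view.
fancy = numbers[[2, 3, 4]]
print(np.shares_memory(numbers, fancy))
```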

In [122]:
fruit = ['apple', 'banana', 'cherry', 'blueberry']

In [123]:
# Slicing is exclusive ("up to but not including")
fruit[1:3]

['banana', 'cherry']

In [124]:
fruit[:3]

['apple', 'banana', 'cherry']

In [125]:
fruit[3:]

['blueberry']

In [126]:
# Slicing a list returns a copy
copied = fruit[:]

In [127]:
copied[3] = 'cheese'
copied, fruit

(['apple', 'banana', 'cherry', 'cheese'],
 ['apple', 'banana', 'cherry', 'blueberry'])

In [128]:
fruit[::2]

['apple', 'cherry']

In [129]:
fruit[::-1]

['blueberry', 'cherry', 'banana', 'apple']

In [130]:
# Works like range, but returns the values as an array
np.arange(20)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [131]:
practice = np.arange(42)
practice.shape = (7, 6)
practice

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41]])

In [132]:
practice[2:5, 3::2]

array([[15, 17],
       [21, 23],
       [27, 29]])

In [133]:
# Slicing an ndarray returns a view, not a copy
not_copied = practice[:]
not_copied[0, 0] = 90210
practice, not_copied

(array([[90210,     1,     2,     3,     4,     5],
        [    6,     7,     8,     9,    10,    11],
        [   12,    13,    14,    15,    16,    17],
        [   18,    19,    20,    21,    22,    23],
        [   24,    25,    26,    27,    28,    29],
        [   30,    31,    32,    33,    34,    35],
        [   36,    37,    38,    39,    40,    41]]),
 array([[90210,     1,     2,     3,     4,     5],
        [    6,     7,     8,     9,    10,    11],
        [   12,    13,    14,    15,    16,    17],
        [   18,    19,    20,    21,    22,    23],
        [   24,    25,    26,    27,    28,    29],
        [   30,    31,    32,    33,    34,    35],
        [   36,    37,    38,    39,    40,    41]]))

In [134]:
practice.base is None

True

In [135]:
not_copied.base is None

False

In [136]:
not_copied.base is practice

True

In [137]:
practice.flags['OWNDATA'], not_copied.flags['OWNDATA']

(True, False)

## Array Manipulation
* Documentation on manipulation is [here](https://numpy.org/doc/stable/reference/routines.array-manipulation.html)
* `ravel` returns a flattened array, as a view when possible
* `flatten` always returns a flattened copy of the array
* `reshape` returns the array with a new shape, as a view when possible
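
The difference between the three can be checked with `np.shares_memory`. This is a sketch on a small made-up `table`, not a complete account of when NumPy has to copy.

```python
import numpy as np

table = np.arange(6).reshape(2, 3)

# ravel on a contiguous array gives a view onto the same memory.
flat_view = table.ravel()
print(np.shares_memory(table, flat_view))

# flatten always copies.
flat_copy = table.flatten()
print(np.shares_memory(table, flat_copy))

# reshape reuses the same memory when the layout allows it.
reshaped = table.reshape(3, 2)
print(np.shares_memory(table, reshaped))

# -1 asks NumPy to work out that dimension from the total size.
print(table.reshape(-1, 2).shape)

# Raveling a transposed array cannot be done as a view, so a copy is made.
transposed_flat = table.T.ravel()
print(np.shares_memory(table, transposed_flat))
```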

In [138]:
practice_view = practice.reshape(3, 14)
practice, practice_view, practice_view.base is practice

(array([[90210,     1,     2,     3,     4,     5],
        [    6,     7,     8,     9,    10,    11],
        [   12,    13,    14,    15,    16,    17],
        [   18,    19,    20,    21,    22,    23],
        [   24,    25,    26,    27,    28,    29],
        [   30,    31,    32,    33,    34,    35],
        [   36,    37,    38,    39,    40,    41]]),
 array([[90210,     1,     2,     3,     4,     5,     6,     7,     8,
             9,    10,    11,    12,    13],
        [   14,    15,    16,    17,    18,    19,    20,    21,    22,
            23,    24,    25,    26,    27],
        [   28,    29,    30,    31,    32,    33,    34,    35,    36,
            37,    38,    39,    40,    41]]),
 True)

In [139]:
practice.reshape(-1, 2).shape

(21, 2)

In [140]:
practice.ravel()

array([90210,     1,     2,     3,     4,     5,     6,     7,     8,
           9,    10,    11,    12,    13,    14,    15,    16,    17,
          18,    19,    20,    21,    22,    23,    24,    25,    26,
          27,    28,    29,    30,    31,    32,    33,    34,    35,
          36,    37,    38,    39,    40,    41])

In [141]:
np.ravel?

Signature:       np.ravel(a, order='C')
Call signature:  np.ravel(*args, **kwargs)
Type:            _ArrayFunctionDispatcher
String form:     <function ravel at 0x000001C0C2451D00>
File:            c:\users\blaise\appdata\local\programs\python\python313\lib\site-packages\numpy\_core\fromnumeric.py
Docstring:
Return a contiguous flattened array.

A 1-D array, containing the elements of the input, is returned.  A copy is
made only if needed.

As of NumPy 1.10, the returned array will have the same type as the input
array. (for example, a masked array will be returned for a masked array
input)

Parameters
----------
a : array_like
    Input array.  The elements in `a` are read in the order specified by
    `order`, and packed as a 1-D array.
order : {'C','F', 'A', 'K'}, optional

    The elements of `a` are read using this index order. 'C' means
    to index the elements in row-major, C-style order,
    with the l

## Linear Algebra
* NumPy module for linear algebra, [linalg](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)
* Solve a system of equations using the [solve function](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve)
    * Build a square two-dimensional coefficient matrix and a vector of constants, then solve for the variable that each column represents
    * Double check the answer using the inner product or [dot](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html#numpy.dot).
* Use the `@` operator to compute the matrix product of two arrays (the dot product for 1D arrays).
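
A small worked sketch of the same idea on an invented two-variable system (2x + y = 5 and x + 3y = 10). The names `coefficients`, `constants`, and `solution` are placeholders.

```python
import numpy as np

# Each row is one equation; each column is one unknown (x, y).
coefficients = np.array([
    [2, 1],
    [1, 3],
])
constants = np.array([5, 10])

solution = np.linalg.solve(coefficients, constants)
print(solution)

# Multiplying back should reproduce the constants.
# np.allclose tolerates tiny floating point differences.
print(np.allclose(coefficients @ solution, constants))
print(np.allclose(coefficients.dot(solution), constants))
```

Comparing with `np.allclose` rather than `==` avoids false mismatches from rounding in the solver.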

In [142]:
# columns: tacos, burritos, horchatas, cokes
orders = np.array([
    [2, 0, 0, 0],
    [4, 1, 2, 2],
    [0, 1, 0, 1],
    [6, 0, 1, 2]
])
totals = np.array([3, 20.50, 10, 14.25])
prices = np.linalg.solve(orders, totals)
prices

array([1.5 , 8.  , 1.25, 2.  ])

In [143]:
# Matrix product of orders and prices; should reproduce totals
orders @ prices

array([ 3.  , 20.5 , 10.  , 14.25])

In [144]:
orders.dot(prices)

array([ 3.  , 20.5 , 10.  , 14.25])