# new notes on (re-)learning NumPy

(this time with Treehouse courses)

# Unit 1: Meet NumPy

## Getting Set Up

In [1]:
import numpy as np
np.__version__

'1.14.0'

## Introducing Arrays

In [2]:
gpa_list = [4.0, 3.2, 3.5, 3.9]

In [3]:
gpa_array = np.array(gpa_list)

NumPy Array help menu

In [4]:
?gpa_array

### Key Array Attributes

In [5]:
gpa_array.dtype

dtype('float64')

In [6]:
gpa_array.itemsize

8

In [7]:
gpa_array.size

4

In [8]:
len(gpa_array)

4

In [9]:
gpa_array.nbytes

32

What is `gpa_array.nbytes`?

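    `gpa_array.nbytes` is simply a product:
    
    `gpa_array.size * gpa_array.itemsize`

(for a 1-D array like this one, `len(gpa_array)` equals `gpa_array.size`; for multi-dimensional arrays `len()` only counts the first axis)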
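A quick check of that product, as a minimal sketch. The 2-D array `grid` is a made-up example (not from the course) to show where `len()` and `.size` differ:

```python
import numpy as np

gpa_array = np.array([4.0, 3.2, 3.5, 3.9])

# 1-D case: len() and .size agree, so either product matches nbytes
print(gpa_array.size * gpa_array.itemsize)  # 32
print(gpa_array.nbytes)                     # 32

# 2-D case (hypothetical example): len() only counts rows
grid = np.zeros((3, 4), dtype=np.float64)
print(len(grid))     # 3
print(grid.size)     # 12
print(grid.nbytes)   # 96  (12 elements * 8 bytes each)
```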

## conceptual notes

two key characteristics of a NumPy array:

- the array data is **restricted**; the datatype of the elements must be uniform

- the array size is fixed:

    - the size is set when the array is created

    - elements can neither be removed nor inserted in place (functions like `np.append` return a new array)
    
    - elements can be changed

REF: [Datatypes (SciPy)](https://docs.scipy.org/doc/numpy/user/basics.types.html)
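Both characteristics can be seen in a small sketch. The variable names `mixed`, `ints`, and `longer` are only for illustration:

```python
import numpy as np

# Uniform datatype: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# Elements can be changed, but the value is cast to the array's dtype
ints = np.array([1, 2, 3])
ints[0] = 7.9        # stored as 7, because the array holds integers
print(ints[0])       # 7

# Fixed size: np.append does not grow `ints`, it builds a new array
longer = np.append(ints, 4)
print(ints.size, longer.size)            # 3 4
print(np.shares_memory(ints, longer))    # False
```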

### comparison of NumPy Arrays to Python Lists

There is a trade-off between the convenience of constructing and manipulating iterables and computation speed.

- A List relies on lots of Python behind-the-scenes operations that we've been *taking for granted*

    - a List stores references to its elements, and the elements themselves can live anywhere in memory
    
    - mutable properties: insert, append, pop; varying size of elements; indexing/iterating through

    - result of these behind-the-scenes operations: lots of overhead 

- NumPy Array

    - Elements are stored contiguously, with no space between them

        - this makes locating an element simple arithmetic: start address + index × item size

    - All of an array's elements must be of the same [data type](https://docs.scipy.org/doc/numpy-1.14.0/user/basics.types.html).

- **slicing**: slicing a List vs slicing an Array:

    - `copied = List[:]`  # successful copy
    
    - `not_copied = nparray[:]`  # NOT a copy -- this is a view of the same memory address as the nparray
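The slicing difference above, as a small sketch. The names `scores_list` and `scores_array` are made up for this example:

```python
import numpy as np

# Slicing a list makes a new list
scores_list = [1, 2, 3]
copied = scores_list[:]
copied[0] = 99
print(scores_list)    # [1, 2, 3]  (original untouched)

# Slicing an array makes a view onto the same data
scores_array = np.array([1, 2, 3])
not_copied = scores_array[:]
not_copied[0] = 99
print(scores_array)   # [99  2  3]  (original changed too)

# Ask for a copy explicitly when you need an independent array
real_copy = scores_array.copy()
real_copy[1] = -1
print(scores_array)   # [99  2  3]
```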
    
    

### About data types

ages = np.array([29, 42, 6, 3], np.uint8)

(np.uint16 also works)

* By choosing the proper [data type](https://docs.scipy.org/doc/numpy-1.14.0/user/basics.types.html) you can greatly reduce the size required to store objects
* Data types are maintained by wrapping values in a [scalar representation](https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.scalars.html)
* `np.zeros` is a handy way to create an array of a given size and data type, pre-filled with zeros.

Python’s floating-point numbers are usually 64-bit floating-point numbers, nearly equivalent to np.float64
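A rough sketch of the points above. `ages_default` and `ages_small` are illustrative names, and the default integer size depends on the platform:

```python
import numpy as np

ages_default = np.array([29, 42, 6, 3])                # platform default integer
ages_small = np.array([29, 42, 6, 3], dtype=np.uint8)  # 1 byte per element

print(ages_small.nbytes)     # 4
print(ages_default.nbytes)   # platform dependent, typically 32 with 64-bit ints

# Value ranges of the unsigned types mentioned above
print(np.iinfo(np.uint8).max)    # 255
print(np.iinfo(np.uint16).max)   # 65535

# Pulling one value out gives a NumPy scalar, which keeps the dtype
print(type(ages_small[0]))       # <class 'numpy.uint8'>

# np.zeros: a pre-sized array filled with zeros
print(np.zeros(5, dtype=np.uint16))   # [0 0 0 0 0]
```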


## *Exercise: Creating the Study Log*

In [15]:
study_minutes = np.zeros(100, np.uint16)

# uint16 is this datatype: unsigned integer of 16 bits (0 to 2^16-1, or 65535)

# additional info: %whos

%whos

Variable        Type       Data/Info
------------------------------------
gpa_array       ndarray    4: 4 elems, type `float64`, 32 bytes
gpa_list        list       n=4
np              module     <module 'numpy' from '/ho<...>kages/numpy/__init__.py'>
study_minutes   ndarray    100: 100 elems, type `uint16`, 200 bytes


In [14]:
study_minutes[0] = 150

study_minutes[1] = 60

study_minutes[2:6] = [80, 60, 30, 90]

# Unit 2: Array Organization

[SciPy array manipulation](https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.array-manipulation.html?highlight=array%20manipulation)

a useful alternative to the help menu (works in the NumPy 1.x used here; `np.lookfor` was removed in NumPy 2.0):

In [3]:
import numpy as np

np.lookfor("flat")

Search results for 'flat'
-------------------------
numpy.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.flatiter
    Flat iterator object to iterate over arrays.
numpy.put
    Replaces specified elements of an array with given values.
numpy.flatnonzero
    Return indices that are non-zero in the flattened version of a.
numpy.ravel
    Return a contiguous flattened array.
numpy.ma.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.unravel_index
    Converts a flat index or array of flat indices into a tuple
numpy.matrix.flatten
    Return a flattened copy of the matrix.
numpy.ma.flatten_mask
    Returns a completely flattened version of the mask, where nested fields
numpy.chararray.flatten
    Return a copy of the array collapsed into one dimension.
numpy.chararray.put
    Set ``a.flat[n] = values[n]`` for all `n` in indices.
numpy.ravel_multi_index
    Converts a tuple of index arrays into an array of fl

In [4]:
np.ravel?

## Indexing

You can use an indexing shortcut by separating dimensions with a comma.  
Example, given this array:


In [6]:
quarterly_revenue_by_year = np.array([
    [22.72, 29.13, 25.36, 35.75],
    [29.13, 30.4, 32.71, 43.74],
    [35.71, 37.96, 43.74, 60.5]
])

# these two expressions return the same element (row 1, column 3)
quarterly_revenue_by_year[1, 3]

quarterly_revenue_by_year[1][3]


43.74

You can index using a `list` or `np.array`.  Values will be pulled out at each of the listed indices.  This is known as fancy indexing.
  * The shape of the resulting array matches the layout of the index array.  Be careful to distinguish between the tuple shortcut (`a[1, 3]`, one element) and fancy indexing (`a[[1, 3]]`, several elements).
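To make the distinction concrete, here is a sketch reusing the revenue array from above. The index lists `[0, 2]` and `[1, 3]` are chosen just for illustration:

```python
import numpy as np

quarterly_revenue_by_year = np.array([
    [22.72, 29.13, 25.36, 35.75],
    [29.13, 30.4, 32.71, 43.74],
    [35.71, 37.96, 43.74, 60.5]
])

# Tuple shortcut: row 1, column 3 -> a single value
print(quarterly_revenue_by_year[1, 3])            # 43.74

# Fancy indexing with one list: picks whole rows 0 and 2
print(quarterly_revenue_by_year[[0, 2]].shape)    # (2, 4)

# Fancy indexing with two lists: pairs them up as (0, 1) and (2, 3)
print(quarterly_revenue_by_year[[0, 2], [1, 3]])  # [29.13 60.5 ]
```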

### REF: For more info:

[indexing with boolean or mask index arrays](https://docs.scipy.org/doc/numpy-1.14.0/user/basics.indexing.html#boolean-or-mask-index-arrays)


In [7]:
# Q: How do I retrieve [29.13, 35.75] from this?
quarterly_revenue = np.array([22.72, 29.13, 25.36, 35.75])

# ANS:
quarterly_revenue[[1, 3]]


array([29.13, 35.75])

[Advanced Indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing)


A different solution

There are actually a couple of ways to add a new axis to your array other than wrapping it with brackets like this: `[some_log]`

Another popular solution is to use the `np.newaxis` constant (an alias for `None`). So the code would look something like this (`some_log` is created in the Creation section below):

* Slice all the rows and add a new axis

In [10]:
some_log[:, np.newaxis]
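A small sketch of what each approach does to the shape. The values in `some_log` are copied from the output shown later in these notes, so this block runs on its own:

```python
import numpy as np

some_log = np.array([36, 31, 49, 40, 58, 32], dtype=np.uint16)
print(some_log.shape)                   # (6,)

# New axis in front: one row of six values (same as wrapping in brackets)
print(some_log[np.newaxis, :].shape)    # (1, 6)
print(np.array([some_log]).shape)       # (1, 6)

# New axis at the end: six rows of one value each
print(some_log[:, np.newaxis].shape)    # (6, 1)

# np.newaxis is just a readable name for None
print(np.newaxis is None)               # True
```

Note that `some_log[:, np.newaxis]` produces a column, while wrapping with brackets produces a row; which one you want depends on what you are stacking it with.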


### Creation

* You can create a random but bounded set of values using the `np.random` module.  
  * `RandomState` lets you seed your randomness in a way that is repeatable.
* You can append a row in a couple of ways
   * You can use the `np.append` function with `axis=0`.  Make sure the new row has the same number of dimensions and the same row length as the existing array.
   * You can create/reassign a new array by including the existing array as part of the iterable in creation.
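A sketch of the ideas above. The names `study_log`, `new_row`, `appended`, and `rebuilt` are made up for this example, and no random values are printed because they depend on the seed:

```python
import numpy as np

# Two generators with the same seed produce the same "random" values
rand = np.random.RandomState(42)
rand_again = np.random.RandomState(42)
first = rand.randint(30, 60, size=6, dtype=np.uint16)
second = rand_again.randint(30, 60, size=6, dtype=np.uint16)
print(np.array_equal(first, second))    # True

# Start a 2-D log with one row, then append another row
study_log = np.array([first])           # shape (1, 6)
new_row = rand.randint(30, 60, size=6, dtype=np.uint16)

# Way 1: np.append with axis=0; the new row is wrapped so it is 2-D too
appended = np.append(study_log, [new_row], axis=0)
print(appended.shape)                   # (2, 6)

# Way 2: build a new array from the existing rows
rebuilt = np.array([first, new_row])
print(np.array_equal(appended, rebuilt))  # True
```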


In [8]:
rand = np.random.RandomState(42)

some_log = rand.randint(30, 60, size=6, dtype=np.uint16)


The shape of the resulting array is defined by the shape of the index array:

In [11]:
index = np.array([
    [3,4],
    [7,8]
])
vector_1D = np.array(["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"])
vector_1D[index]

array([['D', 'E'],
       ['H', 'I']], dtype='<U1')

## Boolean Array Indexing

You can create a boolean array by using comparison operators on an array.

- You can use boolean arrays for fancy indexing.

- Boolean arrays can be combined by using bitwise operators (`&`, `|`)

    - Do not use the `and` keyword.

    - Remember to mind the order of operations when combining: wrap each comparison in parentheses.

- Even though boolean indexing returns a new array, you can update an existing array using a boolean index.
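The points above in one sketch. The `some_log` values are copied from the output shown further down, and the thresholds are arbitrary:

```python
import numpy as np

some_log = np.array([36, 31, 49, 40, 58, 32], dtype=np.uint16)

# Combine two comparisons with &, each one wrapped in parentheses
mask = (some_log > 35) & (some_log < 50)
print(mask)             # [ True False  True  True False False]
print(some_log[mask])   # [36 49 40]

# Assigning through a boolean index updates the original array in place
some_log[some_log < 35] = 0
print(some_log)         # [36  0 49 40 58  0]
```

Without the parentheses, `&` binds tighter than the comparisons, so `some_log > 35 & some_log < 50` is parsed differently and typically ends in a `ValueError`.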


In [33]:
some_log[some_log < 30]

fruit = ['apple', 'banana', 'cherry', 'durian']

import numpy as np

np.arange(20)

practice = np.arange(42)

practice.shape = 6, 7

practice[2:5, 3::2]


array([[17, 19],
       [24, 26],
       [31, 33]])

Another example

In [22]:
some_log

array([36, 31, 49, 40, 58, 32], dtype=uint16)

In [21]:
some_log[some_log < 45]

array([36, 31, 40, 32], dtype=uint16)

## Array Data Views

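Q: Why do Data Views exist, and how are they efficient?

ANS: A data view does not copy the data. It shares the memory of the array it views, so creating one costs almost no additional memory. The flip side: changing the view changes the original.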
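One way to check this, as a sketch. `np.shares_memory` is a real NumPy function; the names `view` and `copy` are just for this example:

```python
import numpy as np

practice = np.arange(42).reshape(6, 7)

# A slice is a view: no data is copied
view = practice[2:5, 3::2]
print(np.shares_memory(practice, view))   # True
print(view.flags['OWNDATA'])              # False

# Writing through the view changes the underlying array
view[0, 0] = -1
print(practice[2, 3])                     # -1

# An explicit copy owns its own data and is independent
copy = view.copy()
print(np.shares_memory(practice, copy))   # False
print(copy.flags['OWNDATA'])              # True
```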


In [23]:
not_copied = practice[:]

not_copied[0, 0] = 9999


### original object vs Data View


To find out whether an array is an original object or a Data View, check its `base` attribute:


In [25]:
practice.base is None  # True

True

In [29]:
not_copied.base is None  # False

False

In [30]:
not_copied.base

array([[9999,    1,    2,    3,    4,    5,    6],
       [   7,    8,    9,   10,   11,   12,   13],
       [  14,   15,   16,   17,   18,   19,   20],
       [  21,   22,   23,   24,   25,   26,   27],
       [  28,   29,   30,   31,   32,   33,   34],
       [  35,   36,   37,   38,   39,   40,   41]])

In [31]:
practice.flags  # helpful

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [32]:
practice.flags['OWNDATA']  # True

not_copied.flags['OWNDATA']  # False

False

## Array Manipulation

### Concept Notes

[SciPy array manipulation](https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.array-manipulation.html?highlight=array%20manipulation)

There's an important difference between: 

A) changing the shape of the array itself, in place (e.g. `array.shape = 3,4`), vs...

B) creating a data view of that array with a different shape, leaving the original shape untouched (e.g. `array_view = array.reshape(2,6)`)


### Code Notes

In [35]:

revenue = np.array([
    [22.72, 29.13, 25.36, 35.75],
    [29.13, 30.4, 32.71, 43.74],
    [35.71, 37.96, 43.74, 60.5]
])

revenue_v2 = revenue.reshape(2,6)

revenue_v2.base is revenue

True

In [37]:
revenue.base is None

True

### `array.shape` vs `array.reshape`

In [None]:

array1 = np.arange(0,12)

In [None]:
# A) changing the shape of the array 

array1.shape = 6,2

In [None]:
# B) changing the shape of a DATA VIEW of that array (w/o affecting array1)

array1_view = array1.reshape(3,4)
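A sketch of what A) and B) leave behind. As far as I know, newer NumPy documentation discourages assigning to `.shape` and prefers `reshape`, so treat option A as something to recognize rather than a recommendation:

```python
import numpy as np

array1 = np.arange(0, 12)

# A) assigning to .shape changes array1 itself
array1.shape = 6, 2
print(array1.shape)         # (6, 2)

# B) reshape returns a new view; array1 keeps its shape
array1_view = array1.reshape(3, 4)
print(array1.shape)         # (6, 2)
print(array1_view.shape)    # (3, 4)

# Both still look at the same data
print(np.shares_memory(array1, array1_view))  # True
array1_view[0, 0] = 100
print(array1[0, 0])         # 100
```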

In [55]:
# You DON'T have to do the math!
# Pass -1 for one dimension and NumPy infers it from the array's size

MONTHS_IN_YEAR = 12

months = np.arange(1,1+MONTHS_IN_YEAR)
months

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [53]:
months.shape = -1, 4
months

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [54]:
months.shape = 4, -1
months

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [56]:
# Flatten with array.ravel()

months.ravel()  # returns a flattened view when possible; months itself is not modified

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
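A sketch comparing `ravel` with `flatten` (which shows up in the `np.lookfor("flat")` results earlier). The names `flat_view` and `flat_copy` are only for illustration:

```python
import numpy as np

months = np.arange(1, 13).reshape(3, 4)

flat_view = months.ravel()     # view when the memory layout allows it
flat_copy = months.flatten()   # always a copy

print(months.shape)                           # (3, 4)  (ravel did not modify months)
print(np.shares_memory(months, flat_view))    # True
print(np.shares_memory(months, flat_copy))    # False

# When the layout does not allow a view (e.g. after a transpose),
# ravel falls back to making a copy
print(np.shares_memory(months, months.T.ravel()))  # False
```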

In [58]:
x = np.array([[1,2],[3,4]])
x

array([[1, 2],
       [3, 4]])

In [61]:
x.T  # transpose

array([[1, 3],
       [2, 4]])