### NumPy First Steps

### Programming for Data Science
### Last Updated: Jan 15, 2023
---  

### PREREQUISITES
- import
- functions
- for ... in

### SOURCES 
- **Python for Data Analysis, Chapter 4 (be sure to read this)**
- https://numpy.org/
- https://en.wikipedia.org/wiki/NumPy
- https://www.scipy.org/
- https://en.wikipedia.org/wiki/SciPy
- https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html
- https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html

### OBJECTIVES
- Take your first steps with numpy
 


### CONCEPTS

- The numpy package contains useful functions for math operations
- The ndarray is the workhorse of the package


---

### numpy

In this lesson we will explore the Python package *numpy*.  
It provides multidimensional arrays and fast mathematical functions that operate on them.

Our focus will be on:
- importing and calling some random number generating functions
- illustrating operations on ndarrays
- learning how to subset ndarrays

### Motivation

I need a random number! I heard numpy can do that!  

In [2]:
import numpy as np
np.random.randint(1,7)

1

Learn what this function does:

In [4]:
help(np.random.randint)

Help on built-in function randint:

randint(...) method of numpy.random.mtrand.RandomState instance
    randint(low, high=None, size=None, dtype=int)

    Return random integers from `low` (inclusive) to `high` (exclusive).

    Return random integers from the "discrete uniform" distribution of
    the specified dtype in the "half-open" interval [`low`, `high`). If
    `high` is None (the default), then results are from [0, `low`).

    .. note::
        New code should use the `~numpy.random.Generator.integers`
        method of a `~numpy.random.Generator` instance instead;
        please see the :ref:`random-quick-start`.

    Parameters
    ----------
    low : int or array-like of ints
        Lowest (signed) integers to be drawn from the distribution (unless
        ``high=None``, in which case this parameter is one above the
        *highest* such integer).
    high : int or array-like of ints, optional
        If provided, one above the largest (signed) integer to be drawn
     

Pass the `size` parameter to generate 10 random integers between 1 and 6, inclusive:

In [6]:
np.random.randint(1,7,10)

array([2, 1, 3, 2, 5, 6, 1, 2, 1, 3])
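The help text above recommends the newer `Generator` interface for new code. As a minimal sketch of the same die-roll idea, assuming an arbitrary seed of 42 and the illustrative names `rng`, `rolls`, and `rolls_inclusive` (none of which appear in the original lesson):

```python
import numpy as np

# A seeded Generator makes the draws reproducible; the seed value 42 is arbitrary.
rng = np.random.default_rng(42)

# Ten simulated die rolls: low=1 is inclusive, high=7 is exclusive.
rolls = rng.integers(1, 7, size=10)
print(rolls)

# endpoint=True makes the upper bound inclusive instead.
rolls_inclusive = rng.integers(1, 6, size=10, endpoint=True)
print(rolls_inclusive)
```

Both calls draw from the integers 1 through 6; they differ only in how the upper bound is written.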

---

### The ndarray in NumPy

Details [here](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)

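An ndarray represents a multidimensional, homogeneous array of fixed-size items: every element has the same data type.

Operations on an ndarray run in compiled code, which makes them much faster than equivalent Python loops.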

Arrays should be constructed using `array()`, `zeros()` or `empty()`.
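To make "homogeneous" and "fixed-size" concrete, the short sketch below builds arrays with the three constructors named above. The variable names `a`, `b`, and `c` are illustrative only.

```python
import numpy as np

# Mixed ints and floats are upcast to one common dtype (homogeneous storage).
a = np.array([1, 2, 3.5])
print(a.dtype)        # float64

# zeros() fills with 0.0; empty() only allocates, so its contents are arbitrary.
b = np.zeros((2, 3))
c = np.empty((2, 3))
print(b.shape, c.shape)

# The element size is fixed per array: every float64 item takes 8 bytes.
print(a.itemsize)     # 8
```

Because `empty()` does not initialize its values, it is best used only when every element will be overwritten afterwards.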

### NumPy ndarray Operations


Create an array with some data

In [8]:
x = np.array(([2,3,1],[3,1,2]))
x

array([[2, 3, 1],
       [3, 1, 2]])

Get the shape of the data (rows, columns)

In [10]:
print(x.shape)

(2, 3)


In [12]:
# generate matrix of random normals 

x = np.random.randn(2, 3)
x

array([[-0.42816227,  0.33832832,  0.35428758],
       [-0.01562752,  0.8703654 , -1.25000794]])

In [14]:
help(np.random.randn)

Help on built-in function randn:

randn(...) method of numpy.random.mtrand.RandomState instance
    randn(d0, d1, ..., dn)

    Return a sample (or samples) from the "standard normal" distribution.

    .. note::
        This is a convenience function for users porting code from Matlab,
        and wraps `standard_normal`. That function takes a
        tuple to specify the size of the output, which is consistent with
        other NumPy functions like `numpy.zeros` and `numpy.ones`.

    .. note::
        New code should use the
        `~numpy.random.Generator.standard_normal`
        method of a `~numpy.random.Generator` instance instead;
        please see the :ref:`random-quick-start`.

    If positive int_like arguments are provided, `randn` generates an array
    of shape ``(d0, d1, ..., dn)``, filled
    with random floats sampled from a univariate "normal" (Gaussian)
    distribution of mean 0 and variance 1. A single float randomly sampled
    from the distribution is returned
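As the help text notes, `standard_normal` on a `Generator` is the suggested replacement for `randn`. A small sketch, assuming an arbitrary seed of 0 and the illustrative name `x_new`:

```python
import numpy as np

rng = np.random.default_rng(0)  # the seed 0 is arbitrary

# standard_normal takes the shape as a single tuple, unlike randn(2, 3).
x_new = rng.standard_normal((2, 3))
print(x_new.shape)    # (2, 3)
```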

Scale the matrix - notice how this operates elementwise

In [16]:
x * 2

array([[-0.85632454,  0.67665664,  0.70857516],
       [-0.03125504,  1.7407308 , -2.50001587]])

Addition

In [18]:
x + x

array([[-0.85632454,  0.67665664,  0.70857516],
       [-0.03125504,  1.7407308 , -2.50001587]])

Reciprocal

In [20]:
1 / x

array([[ -2.33556311,   2.95570883,   2.82256579],
       [-63.98968339,   1.14894273,  -0.79999492]])
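Since the random values above change on every run, here is the same set of operations on a small fixed array, with a loop-based version for comparison. This is a sketch; the names `a`, `doubled`, `summed`, `reciprocal`, and `looped` are illustrative.

```python
import numpy as np

a = np.array([[1.0, 2.0, 4.0],
              [0.5, 8.0, 10.0]])

# Each operator applies to every element; no explicit loop is needed.
doubled = a * 2
summed = a + a
reciprocal = 1 / a

# The same scaling written with nested Python loops, for comparison.
looped = np.zeros(a.shape)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] * 2

print(np.array_equal(doubled, looped))   # True
print(np.array_equal(doubled, summed))   # True
print(reciprocal[1, 0])                  # 2.0
```

The elementwise form is shorter and avoids the Python-level loop entirely.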

#### Creating Matrices with Special Structure

NumPy can create arrays of zeros, ones, and identity matrices.

In [28]:
x=np.zeros(5)
print(x)

[0. 0. 0. 0. 0.]


In [30]:
x.shape

(5,)

In [15]:
y=np.zeros([5,1])

In [17]:
y

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.]])

In [19]:
y.shape

(5, 1)

In [21]:
z=np.zeros([1,5])
print(z)

[[0. 0. 0. 0. 0.]]


In [23]:
z.shape

(1, 5)

In [34]:
q=np.zeros((2,3))
print(q)

[[0. 0. 0.]
 [0. 0. 0.]]


In [36]:
q.shape

(2, 3)

In [38]:
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [40]:
np.identity(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
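The shapes `(5,)`, `(5, 1)`, and `(1, 5)` seen above are easy to confuse. The sketch below compares them side by side and checks the defining property of the identity matrix. The names `flat`, `column`, `row`, and `v` are illustrative.

```python
import numpy as np

flat = np.zeros(5)          # 1-D: shape (5,)
column = np.zeros((5, 1))   # 2-D column: shape (5, 1)
row = np.zeros((1, 5))      # 2-D row: shape (1, 5)
print(flat.ndim, column.ndim, row.ndim)   # 1 2 2

# reshape converts between these layouts without changing the values.
print(flat.reshape(5, 1).shape)           # (5, 1)

# Multiplying by the identity matrix leaves a vector unchanged.
v = np.array([1.0, 2.0, 3.0, 4.0])
print(np.array_equal(np.identity(4) @ v, v))   # True
```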

#### Indexing and Slicing

In [42]:
z = np.random.randn(5)
z

array([ 1.09758437,  0.90691315,  0.20816374, -0.54564281,  0.17438717])

In [44]:
# select from index 1 up to, but not including, index 4
z[1:4]

array([ 0.90691315,  0.20816374, -0.54564281])

In [46]:
# does this make sense? (a negative index counts back from the end)
z[1:-1]

array([ 0.90691315,  0.20816374, -0.54564281])

In [48]:
# boolean indexing
z[z>0.15]

array([1.09758437, 0.90691315, 0.20816374, 0.17438717])

In [50]:
# assignment
z[1] = 3
z

array([ 1.09758437,  3.        ,  0.20816374, -0.54564281,  0.17438717])

In [54]:
# 2D
w = np.random.randn(3,3)
w

array([[ 0.21004436,  0.58020116,  0.51902879],
       [-0.11425864,  1.14333858, -0.35127001],
       [-0.78198449,  0.35591086,  1.32692909]])

In [56]:
# subset rows and columns
w[1:, :2]  # the stop index is exclusive, so column 2 is left out

array([[-0.11425864,  1.14333858],
       [-0.78198449,  0.35591086]])
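The indexing rules are easier to check on predictable values. As a sketch, the example below repeats the slices above on arrays built with `np.arange`; the names `a` and `m` are illustrative.

```python
import numpy as np

a = np.arange(10, 15)        # array([10, 11, 12, 13, 14])

print(a[1:4])                # [11 12 13]  (stop index 4 is excluded)
print(a[1:-1])               # [11 12 13]  (-1 refers to the last element, also excluded)
print(a[a > 12])             # [13 14]     (keeps elements where the mask is True)

m = np.arange(9).reshape(3, 3)
# Rows 1 onward, columns 0 and 1 (the stop index 2 is excluded).
print(m[1:, :2])
# [[3 4]
#  [6 7]]
```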

#### TRY FOR YOURSELF
Write code to generate a new array *w2* that starts as a copy of *w* and sets all negative values to 0.
Then print *w2* and *w*.  

Be sure to use:  
w2 = w.copy()  
Otherwise *w2* is just another name for *w*, and *w* will be updated as well.

In [59]:
w2 = w.copy()
w2[w2 < 0] = 0
print('w2:\n',w2)
print('w:\n',w)  # \n is a newline character

w2:
 [[0.         0.13475363 0.        ]
 [0.         0.         0.37176886]
 [0.28199424 1.54262175 0.        ]]
w:
 [[-6.97146740e-02  1.34753626e-01 -4.56655091e-01]
 [-1.45489265e-03 -1.56867574e+00  3.71768857e-01]
 [ 2.81994243e-01  1.54262175e+00 -1.34299131e-02]]


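The same cell from a different run. Here *w* already contains no negative values (apparently from an earlier in-place update), so *w2* and *w* print identically:

In [55]: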
w2 = w.copy()
w2[w2 < 0] = 0
print('w2:\n', w2)
print('')
print('w:\n', w)

w2:
 [[1.65902537 0.04439418 0.71550868]
 [0.         0.49429218 0.        ]
 [1.09127261 0.         1.27228613]]

w:
 [[1.65902537 0.04439418 0.71550868]
 [0.         0.49429218 0.        ]
 [1.09127261 0.         1.27228613]]


In [61]:
w2 = w
w2[w2 < 0] = 0
print('w2:\n', w2)
print('')
print('w:\n', w)  # without copy(), the original w is changed as well

w2:
 [[0.         0.13475363 0.        ]
 [0.         0.         0.37176886]
 [0.28199424 1.54262175 0.        ]]

w:
 [[0.         0.13475363 0.        ]
 [0.         0.         0.37176886]
 [0.28199424 1.54262175 0.        ]]
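The difference between `w2 = w` and `w2 = w.copy()` can be checked directly. The sketch below uses a small fixed array and the illustrative names `alias`, `clone`, and `clipped`; `np.maximum` is shown as one possible alternative that never modifies its input.

```python
import numpy as np

w = np.array([[-1.0, 2.0],
              [3.0, -4.0]])

alias = w            # a second name for the same array
clone = w.copy()     # an independent array with its own data

print(alias is w)                    # True
print(np.shares_memory(clone, w))    # False

clone[clone < 0] = 0                 # changes only the copy
print(w[0, 0])                       # -1.0

# np.maximum returns a new array, so w is left untouched here as well.
clipped = np.maximum(w, 0)
print(clipped[1, 1], w[1, 1])        # 0.0 -4.0

alias[alias < 0] = 0                 # changes w itself
print(w[0, 0])                       # 0.0
```

Plain assignment never copies array data in Python; it only binds another name to the same object.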


---  

#### Next Steps

This notebook covered the basics: generating random numbers, operating on ndarrays, and subsetting them.

To learn more, please refer to your textbook and the links in the Sources section at the top.