# Creating and Reshaping Arrays

The exercises in this notebook will teach you to use a variety of common functions for creating and reshaping arrays.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

## `np.array`

The most common way to create an array is to construct one from a Python list.

**Example:** Passing a list of scalars produces a 1-dimensional array.

In [2]:
np.array([1, 2, 3])

array([1, 2, 3])

**Example:** Passing a list of lists produces a 2-dimensional array.

In [3]:
np.array([[1, 2], 
          [2, 3]])

array([[1, 2],
       [2, 3]])

**Exercise:** Create a 1-dimensional array using `np.array` containing `[1, 2, 3, 4]`.

In [4]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

**Exercise:** Create a 2-dimensional array using `np.array` with two rows containing  `[1, 2, 3]` and `[4, 5, 6]`.

In [6]:
np.array([[1, 2, 3], [4, 5, 6]])

array([[1, 2, 3],
       [4, 5, 6]])
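
To check how the nesting of the input list maps onto array dimensions, we can inspect the `ndim` and `shape` attributes of the result. The sketch below reuses the exercise inputs; the variable names `flat` and `nested` are only illustrative.

```python
import numpy as np

# A flat list of scalars produces a 1-dimensional array.
flat = np.array([1, 2, 3, 4])

# A list of equal-length lists produces a 2-dimensional array:
# one row per inner list.
nested = np.array([[1, 2, 3], [4, 5, 6]])

print(flat.ndim, flat.shape)      # 1 (4,)
print(nested.ndim, nested.shape)  # 2 (2, 3)
```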

## `np.arange`

**Exercise:** Construct an array containing the integers from 0 to 9, inclusive, in ascending order.

In [10]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

**Exercise:** Construct an array containing all the integers between 1 and 3 inclusive, in ascending order.

In [11]:
np.arange(1, 4)

array([1, 2, 3])

**Exercise:** Construct an array containing all the integers between 5 and 10 inclusive, in ascending order.

In [12]:
np.arange(5, 11)

array([ 5,  6,  7,  8,  9, 10])

**Exercise:** Construct an array containing all the integers between 1 and 10 inclusive, in descending order.

In [14]:
np.arange(10, 0, -1)

array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])

**Exercise:** Construct an array containing all the **even** integers between 2 and 10 inclusive, in ascending order.

In [17]:
np.arange(2, 11, 2)

array([ 2,  4,  6,  8, 10])
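
The exercises above all rely on the fact that `np.arange(start, stop, step)` includes `start` but excludes `stop`, which is why an "inclusive" upper bound of 10 is written as 11. A small sketch of that behaviour, with illustrative variable names:

```python
import numpy as np

# The stop value is excluded, so we pass 11 to include 10.
evens = np.arange(2, 11, 2)
print(evens)  # [ 2  4  6  8 10]

# Passing 10 as the stop value drops the final element.
evens_without_ten = np.arange(2, 10, 2)
print(evens_without_ten)  # [2 4 6 8]

# The same rule applies with a negative step: 0 is excluded here.
descending = np.arange(10, 0, -1)
print(descending[0], descending[-1])  # 10 1
```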

## `linspace`

**Exercise:** Construct an array containing 50 evenly-spaced values between -1 and 1.

In [18]:
np.linspace(-1, 1, 50)

array([-1.        , -0.95918367, -0.91836735, -0.87755102, -0.83673469,
       -0.79591837, -0.75510204, -0.71428571, -0.67346939, -0.63265306,
       -0.59183673, -0.55102041, -0.51020408, -0.46938776, -0.42857143,
       -0.3877551 , -0.34693878, -0.30612245, -0.26530612, -0.2244898 ,
       -0.18367347, -0.14285714, -0.10204082, -0.06122449, -0.02040816,
        0.02040816,  0.06122449,  0.10204082,  0.14285714,  0.18367347,
        0.2244898 ,  0.26530612,  0.30612245,  0.34693878,  0.3877551 ,
        0.42857143,  0.46938776,  0.51020408,  0.55102041,  0.59183673,
        0.63265306,  0.67346939,  0.71428571,  0.75510204,  0.79591837,
        0.83673469,  0.87755102,  0.91836735,  0.95918367,  1.        ])
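
Unlike `np.arange`, `np.linspace` takes the number of points rather than a step size, and by default it includes both endpoints. The sketch below checks this on the exercise's array; the `endpoint=False` variant is included only for comparison.

```python
import numpy as np

points = np.linspace(-1, 1, 50)

# Both endpoints are included, so 50 points span 49 equal gaps.
print(len(points))                              # 50
print(points[0], points[-1])                    # -1.0 1.0
print(np.isclose(points[1] - points[0], 2 / 49))  # True

# With endpoint=False the stop value is left out,
# and the 50 points are spaced 2 / 50 apart instead.
open_points = np.linspace(-1, 1, 50, endpoint=False)
print(np.isclose(open_points[1] - open_points[0], 2 / 50))  # True
```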

## Exercise: `zeros`, `ones`, and `full`

**Exercise:** Construct arrays with the following shapes and values:
- 1-dimensional array with 10 entries, all containing the value 0.
- 2-dimensional array with 3 rows and 5 columns, all containing the value 1.
- 2-dimensional array with 5 rows and 3 columns, all containing the value 2.
- 3-dimensional array of shape (2, 5, 10), all containing the value 0.
- 5-dimensional array of shape (1, 2, 3, 4, 5), all containing the value 42.

In [19]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [21]:
np.ones((3, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [22]:
np.full((5, 3), 2)

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

In [23]:
np.zeros((2, 5, 10))

array([[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]])

In [24]:
np.full((1, 2, 3, 4, 5), 42)

array([[[[[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]],

         [[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]],

         [[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]]],


        [[[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]],

         [[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]],

         [[42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42],
          [42, 42, 42, 42, 42]]]]])
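
Notice in the outputs above that `np.zeros` and `np.ones` print values like `0.` and `1.`, while `np.full` prints plain integers. `np.zeros` and `np.ones` default to floating point, whereas `np.full` infers its dtype from the fill value. A short sketch, assuming we want to control the dtype explicitly:

```python
import numpy as np

zeros = np.zeros(10)
twos = np.full((5, 3), 2)

print(zeros.dtype)  # float64
print(twos.dtype)   # an integer dtype (exact width depends on the platform)

# Pass dtype= to override the default, or use a float fill value.
int_zeros = np.zeros(10, dtype=int)
float_twos = np.full((5, 3), 2.0)

print(float_twos.dtype)  # float64
```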

## `identity`

**Exercise:** Construct a 5 x 5 array with 1s along the diagonal and zeros everywhere else.

In [25]:
np.identity(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
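
As an aside, `np.eye` is a closely related function that produces the same result for square arrays, and additionally supports non-square shapes and offset diagonals. This sketch is not required by the exercise; it is only meant to show how the two relate.

```python
import numpy as np

ident = np.identity(5)
eye = np.eye(5)

# For a square array the two functions agree.
print(np.array_equal(ident, eye))  # True

# np.eye can also build a 3 x 4 array with the ones
# shifted one position above the main diagonal (k=1).
shifted = np.eye(3, 4, k=1)
print(shifted.shape)  # (3, 4)
```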

## Exercise: `random`

In [26]:
rng = np.random.RandomState(seed=42)

Construct an array containing 10 values drawn uniformly at random from the half-open interval `[-1, 1)`.

In [27]:
rng.uniform(-1, 1, 10)

array([-0.25091976,  0.90142861,  0.46398788,  0.19731697, -0.68796272,
       -0.68801096, -0.88383278,  0.73235229,  0.20223002,  0.41614516])

Construct a 3 x 3 array with values drawn from a normal distribution centered at 0 with a standard deviation of 2.5.

In [28]:
rng.normal(0, 2.5, (3, 3))

array([[-1.17368596,  1.35640011, -1.15854423],
       [-1.16432438,  0.60490568, -4.78320061],
       [-4.31229458, -1.40571882, -2.5320778 ]])
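
Seeding the generator is what makes the outputs above reproducible. The sketch below creates two generators with the same seed and checks that they produce identical draws; the names `rng_a` and `rng_b` are illustrative.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.RandomState(seed=42)
rng_b = np.random.RandomState(seed=42)

draws_a = rng_a.uniform(-1, 1, 10)
draws_b = rng_b.uniform(-1, 1, 10)
print(np.array_equal(draws_a, draws_b))  # True

# uniform(low, high) includes low and excludes high.
print(((draws_a >= -1) & (draws_a < 1)).all())  # True

# The size argument can be a shape tuple, as in the 3 x 3 exercise.
normal_draws = rng_a.normal(0, 2.5, (3, 3))
print(normal_draws.shape)  # (3, 3)
```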

## `pandas.read_csv`

Many people use the `pandas` module to read numerical data from external sources. The `.csv` (comma-separated value) format is often used for small and medium-sized datasets.

In [29]:
import pandas as pd

We can read a CSV into a DataFrame using `pandas.read_csv`.

In [34]:
prices = pd.read_csv('prices.csv', index_col='dt', parse_dates=['dt'])
prices.head()

| dt | AAPL | MSFT | TSLA | MCD | BK |
|---|---|---|---|---|---|
| 2017-10-02 13:31:00 | 154.34 | 74.88 | 342.33 | 156.38 | 52.736 |
| 2017-10-02 13:32:00 | 154.07 | 74.832 | 341.48 | 156.66 | 52.686 |
| 2017-10-02 13:33:00 | 153.72 | 74.835 | 341.83 | 156.324 | 52.756 |
| 2017-10-02 13:34:00 | 153.69 | 74.89 | 341.24 | 156.66 | 52.726 |
| 2017-10-02 13:35:00 | 153.45 | 74.81 | 341.873 | 156.67 | 52.706 |


DataFrames are composed of three parts:

- `index`, an array of row-labels
- `columns`, an array of column-labels
- `values`, an array of table values.

We can get a numpy array for each of these parts by using the `.values` attribute of the index, of the columns, and of the DataFrame itself:

In [35]:
prices.index.values

array(['2017-10-02T13:31:00.000000000', '2017-10-02T13:32:00.000000000',
       '2017-10-02T13:33:00.000000000', ...,
       '2017-10-31T19:58:00.000000000', '2017-10-31T19:59:00.000000000',
       '2017-10-31T20:00:00.000000000'], dtype='datetime64[ns]')

In [36]:
prices.columns.values

array(['AAPL', 'MSFT', 'TSLA', 'MCD', 'BK'], dtype=object)

In [37]:
prices.values

array([[154.34 ,  74.88 , 342.33 , 156.38 ,  52.736],
       [154.07 ,  74.832, 341.48 , 156.66 ,  52.686],
       [153.72 ,  74.835, 341.83 , 156.324,  52.756],
       ...,
       [169.1  ,  83.18 , 331.44 , 166.82 ,  51.415],
       [169.13 ,  83.17 , 331.8  , 166.87 ,  51.42 ],
       [169.05 ,  83.18 , 331.52 , 166.9  ,  51.45 ]])
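
To experiment with these three parts without needing `prices.csv` on disk, we can read a small CSV from an in-memory string. This is only a sketch: the two rows and two columns are copied from the table above, and `csv_text` and `frame` are illustrative names.

```python
import io

import pandas as pd

# A tiny stand-in for prices.csv, held in memory.
csv_text = """dt,AAPL,MSFT
2017-10-02 13:31:00,154.34,74.88
2017-10-02 13:32:00,154.07,74.832
"""

frame = pd.read_csv(io.StringIO(csv_text), index_col='dt', parse_dates=['dt'])

# Row labels: a numpy array with a datetime64 dtype.
print(frame.index.values)

# Column labels.
print(list(frame.columns))  # ['AAPL', 'MSFT']

# Table values: a 2-dimensional array of floats.
print(frame.values.shape)  # (2, 2)
```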

**Exercise:** Use `pd.read_csv` to load the file "volumes.csv".

In [40]:
volumes = pd.read_csv('volumes.csv', index_col='dt', parse_dates=['dt'])
volumes.head()

| dt | AAPL | MSFT | TSLA | MCD | BK |
|---|---|---|---|---|---|
| 2017-10-02 13:31:00 | 420042.0 | 409211.0 | 49907.0 | 85774.0 | 30276.0 |
| 2017-10-02 13:32:00 | 161960.0 | 49207.0 | 18480.0 | 6866.0 | 4511.0 |
| 2017-10-02 13:33:00 | 118283.0 | 24043.0 | 47039.0 | 3000.0 | 3001.0 |
| 2017-10-02 13:34:00 | 103544.0 | 62383.0 | 13444.0 | 4364.0 | 900.0 |
| 2017-10-02 13:35:00 | 88012.0 | 40175.0 | 36556.0 | 820.0 | 1500.0 |


**Exercise:** Get a numpy array of datetimes representing the row-labels of the DataFrame.

In [41]:
volumes.index.values

array(['2017-10-02T13:31:00.000000000', '2017-10-02T13:32:00.000000000',
       '2017-10-02T13:33:00.000000000', ...,
       '2017-10-31T19:58:00.000000000', '2017-10-31T19:59:00.000000000',
       '2017-10-31T20:00:00.000000000'], dtype='datetime64[ns]')

**Exercise:** Get a numpy array of strings representing the column-labels of the DataFrame.

In [43]:
volumes.columns.values

array(['AAPL', 'MSFT', 'TSLA', 'MCD', 'BK'], dtype=object)

**Exercise:** Get a numpy array of floats representing the table values of the DataFrame.

In [44]:
volumes.values

array([[420042., 409211.,  49907.,  85774.,  30276.],
       [161960.,  49207.,  18480.,   6866.,   4511.],
       [118283.,  24043.,  47039.,   3000.,   3001.],
       ...,
       [308468., 191973.,  24702.,   8959.,  32288.],
       [343843., 198143.,  35814.,  13696.,  56411.],
       [661452., 610933.,  59772.,  58947., 177300.]])

## Reshaping Arrays

Once we've created or loaded an array, a common next step is to reshape the array.

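The most general way to reshape an array is to use the `.reshape` method of `ndarray`. `.reshape` accepts the new dimensions, either as a tuple or as separate arguments, and returns an array containing the same elements arranged in the new shape. The total number of elements must stay the same.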

In [45]:
data = np.arange(12)

**Exercise:** Reshape `data` into an array with three rows and four columns.

In [46]:
data.reshape(3, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

**Exercise:** Reshape `data` into an array with four rows and three columns:

In [47]:
data.reshape(4, 3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

**Exercise:** Reshape `data` into an array of shape `(2, 2, 3)`.

In [48]:
data.reshape(2, 2, 3)

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
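
Two details of `.reshape` are worth trying out: one dimension may be given as `-1`, in which case numpy infers it from the array's size, and a shape whose total size does not match raises a `ValueError`. A minimal sketch using the same 12-element array:

```python
import numpy as np

data = np.arange(12)

# -1 asks numpy to infer the dimension: 12 elements / 3 rows = 4 columns.
inferred = data.reshape(3, -1)
print(inferred.shape)  # (3, 4)

# The shape can be passed as a tuple or as separate arguments.
print(data.reshape((2, 2, 3)).shape)  # (2, 2, 3)

# 5 * 3 = 15 does not match the 12 elements, so this fails.
try:
    data.reshape(5, 3)
except ValueError as error:
    print("reshape failed:", error)
```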

## Transpose

A common pattern, especially when doing linear algebra with 2D arrays, is to need to swap the rows and columns of an array, so that the entry at row `i`, column `j` moves to row `j`, column `i`. This operation is commonly known as "transposing" the array.

**Exercise:** Use the `.transpose()` method to convert data from a `2 x 4` array into a `4 x 2` array.

In [49]:
data = np.arange(8).reshape(2, 4)
data

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [50]:
data.transpose()

array([[0, 4],
       [1, 5],
       [2, 6],
       [3, 7]])

Transposing arrays is so common in linear algebra that numpy provides a shorthand for it. The `.T` property provides a transposed view of an array.

**Exercise:** Transpose `data` using the `.T` property.

In [51]:
data.T

array([[0, 4],
       [1, 5],
       [2, 6],
       [3, 7]])
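
Because `.T` returns a view rather than a copy, the transposed array shares memory with the original, and writing through one is visible in the other. The sketch below demonstrates this; call `.copy()` on the result if an independent array is needed.

```python
import numpy as np

data = np.arange(8).reshape(2, 4)
transposed = data.T

print(transposed.shape)                    # (4, 2)
print(np.shares_memory(data, transposed))  # True

# Entry [0, 1] of the transpose is entry [1, 0] of the original.
transposed[0, 1] = 100
print(data[1, 0])  # 100

# A copy is independent of the original.
independent = data.T.copy()
print(np.shares_memory(data, independent))  # False
```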

## Measuring Performance of Numpy vs. Pure Python

We've seen that numpy allows us to run simple numerical computations much faster than pure Python. To show that, we used a few different functions and tools:

- We used a dot product function implemented in pure Python:

```python
def python_dot_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))
```
- We used two numpy implementations of the dot product:

```python
def manual_numpy_dot(xs, ys):
    return (xs * ys).sum()

def native_numpy_dot(xs, ys):
    return xs.dot(ys)
```

- We used IPython's `%%timeit` magic as a simple way to measure how long a cell takes to run on average.

Unfortunately, nothing in programming comes for free. Numpy allows us to speed up computations on large arrays by performing one complex dispatch **per array** instead of a cheap dispatch **per array element**. **This only gives us a speedup if we have many array elements.**

### Exercise:
Using the `%%timeit` magic, figure out how many data points you need to have for a numpy dot product to be faster than a pure-Python implementation.

You can use the `make_list` function below to create Python lists of a given size. Use any of the functions from the exercises above to make numpy arrays. Be sure not to include the list/array creation in your timings (that probably means you want to use separate cells for constructing arrays and testing timings).

In [53]:
def make_list(size):
    return list(range(size))

def python_dot_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

def manual_numpy_dot(xs, ys):
    return (xs * ys).sum()

def native_numpy_dot(xs, ys):
    return xs.dot(ys)

On my machine, the cutoff seems to be around 5 elements.

In [78]:
l = make_list(5)
n = np.arange(5)

In [79]:
%%timeit
python_dot_product(l, l)

826 ns ± 5.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [80]:
%%timeit
native_numpy_dot(n, n)

694 ns ± 2.06 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
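
Outside of IPython, a similar comparison can be sketched with the standard library's `timeit` module. The `compare` helper below is hypothetical (it is not part of the notebook), and the numbers it prints will vary by machine. Wrapping each call in a `lambda` also adds a small overhead to both measurements.

```python
import timeit

import numpy as np


def python_dot_product(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def native_numpy_dot(xs, ys):
    return xs.dot(ys)


def compare(size, repeats=1000):
    """Return average seconds per call for (pure Python, numpy)."""
    # Build the inputs up front so creation is not included in the timing.
    xs_list = list(range(size))
    xs_array = np.arange(size)

    python_time = timeit.timeit(
        lambda: python_dot_product(xs_list, xs_list), number=repeats
    )
    numpy_time = timeit.timeit(
        lambda: native_numpy_dot(xs_array, xs_array), number=repeats
    )
    return python_time / repeats, numpy_time / repeats


for size in (1, 5, 10, 100, 1000):
    python_avg, numpy_avg = compare(size)
    print(f"size={size:5d}  python={python_avg * 1e9:10.0f} ns  "
          f"numpy={numpy_avg * 1e9:10.0f} ns")
```

Scanning the printed rows for the first size where the numpy column is smaller gives a rough estimate of the cutoff on a given machine.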
