# Advanced NumPy

In [None]:
import numpy as np
import pandas as pd
np.random.seed(12345)
import matplotlib.pyplot as plt
plt.rc('figure', figsize=(10, 6))
PREVIOUS_MAX_ROWS = pd.options.display.max_rows
pd.options.display.max_rows = 20
np.set_printoptions(precision=4, suppress=True)

* In this appendix, I will go deeper into the NumPy library for array computing.
* This will include more internal detail about the ndarray type and more advanced array
manipulations and algorithms.


## A.1  ndarray Object Internals

* The NumPy ndarray provides a means to interpret a block of homogeneous data
(either contiguous or strided) as a multidimensional array object. 
* The data type, or
dtype, determines how the data is interpreted as being floating point, integer, boolean,
or any of the other types we’ve been looking at.


* Part of what makes ndarray flexible is that every array object is a strided view on a
block of data. 
* You might wonder, for example, how the array view arr[::2, ::-1]
does not copy any data. 
* The reason is that the ndarray is more than just a chunk of
memory and a dtype; it also has “striding” information that enables the array to move
through memory with varying step sizes. 


More precisely, the ndarray internally consists of the following:


* A *pointer to data*—that is, a block of data in RAM or in a memory-mapped file
* The *data type* or dtype, describing fixed-size value cells in the array
* A tuple indicating the array’s *shape*
* A tuple of *strides*, integers indicating the number of bytes to “step” in order to
advance one element along a dimension

<img style="float: left;" src="pic/pic_A_1.png" width="700">

For example, a 10 × 5 array would have shape (10, 5):

In [None]:
np.ones((10, 5)).shape

A typical 3 × 4 × 5 array of float64 (8-byte) values has strides (160, 40, 8).
Knowing about the strides can be useful because, in general, the larger the strides
on a particular axis, the more costly it is to perform computation along that axis.

In [None]:
np.ones((3, 4, 5), dtype=np.float64).strides

While it is rare that a typical NumPy user would be interested in the array strides,
they are the critical ingredient in constructing “zero-copy” array views. Strides can
even be negative, which enables an array to move “backward” through memory (this
would be the case, for example, in a slice like obj[::-1] or obj[:, ::-1]).
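As a rough illustration of the striding idea, the sketch below checks that a stepped, reversed slice shares memory with its parent array. The array contents and the explicit int64 dtype are assumptions chosen so that the byte counts are predictable.

```python
import numpy as np

# Assumed example data: a 3 x 4 array of 8-byte integers
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

# Every other row, with the columns reversed
view = arr[::2, ::-1]

print(arr.strides)    # (32, 8): 4 * 8 bytes per row step, 8 bytes per column step
print(view.strides)   # (64, -8): row step doubled, column step negated
print(np.shares_memory(arr, view))  # True: no data was copied

# Writing through the view changes the original array
view[0, 0] = 100
print(arr[0, 3])      # 100
```

Only the shape and strides differ between the two objects; the data pointer refers to the same block of memory.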

## A.2  Advanced Array Manipulation

### Reshaping Arrays

* In many cases, you can convert an array from one shape to another without copying
any data. 
* To do this, pass a tuple indicating the new shape to the reshape array
instance method. 

For example, suppose we had a one-dimensional array of values
that we wished to rearrange into a matrix (the result is shown in Figure A-3):

<img style="float: left;" src="pic/pic_A_2.png" width="600">

In [None]:
arr = np.arange(8)

In [None]:
arr

In [None]:
arr.reshape((4, 2))

* A multidimensional array can also be reshaped.

In [None]:
arr.reshape((4, 2)).reshape((2, 4))

* One of the passed shape dimensions can be -1, in which case the value used for that
dimension will be inferred from the data.

In [None]:
arr = np.arange(15)

In [None]:
arr.reshape((5, -1))

* Since an array’s shape attribute is a tuple, it can be passed to reshape, too.

In [None]:
other_arr = np.ones((3, 5))

In [None]:
other_arr.shape

In [None]:
arr.reshape(other_arr.shape)

* The opposite operation of reshape from one-dimensional to a higher dimension is
typically known as *flattening* or *raveling*.

In [None]:
arr = np.arange(15).reshape((5, 3))

In [None]:
arr

In [None]:
arr.ravel()

In [None]:
arr.flatten()
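The two methods differ in whether they copy: **ravel** returns a view when the memory layout allows it, while **flatten** always returns a copy. A small check, using np.shares_memory as one way to test this:

```python
import numpy as np

arr = np.arange(15).reshape((5, 3))

raveled = arr.ravel()      # a view here, because the data is already contiguous
flattened = arr.flatten()  # always a new copy

print(np.shares_memory(arr, raveled))    # True
print(np.shares_memory(arr, flattened))  # False

raveled[0] = -1    # visible through arr
flattened[1] = -2  # not visible through arr
print(arr[0, 0], arr[0, 1])  # -1 1
```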

### Concatenating and Splitting Arrays

* **numpy.concatenate** takes a sequence (tuple, list, etc.) of arrays and joins them
together in order along the input axis.

In [None]:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

In [None]:
arr1

In [None]:
arr2

In [None]:
np.concatenate([arr1, arr2], axis=0)

In [None]:
np.concatenate([arr1, arr2], axis=1)

* There are some convenience functions, like **vstack** and **hstack**, for common kinds of
concatenation.

In [None]:
np.vstack((arr1, arr2))

In [None]:
np.hstack((arr1, arr2))

* **split**, on the other hand, slices apart an array into multiple arrays along an axis.

In [None]:
arr = np.random.randn(5, 2)

In [None]:
arr

In [None]:
first, second, third = np.split(arr, [1, 3])

* The values [1, 3] passed to **np.split** indicate the indices at which to split the array
into pieces.

In [None]:
first

In [None]:
second

In [None]:
third

In [None]:
first, second, third = np.split(arr, [2,4])

In [None]:
first

In [None]:
second

In [None]:
third
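To make the meaning of the split indices concrete, here is a small sketch with an assumed integer array (instead of random values) so that the pieces are easy to inspect. It also checks that concatenating the pieces along the same axis recovers the original array.

```python
import numpy as np

# Assumed example data: 5 rows, 2 columns
arr = np.arange(10).reshape((5, 2))

# Split before row 1 and before row 3: rows [0:1], [1:3], [3:5]
pieces = np.split(arr, [1, 3])
shapes = [p.shape for p in pieces]
print(shapes)  # [(1, 2), (2, 2), (2, 2)]

# concatenate along axis 0 undoes the split
rejoined = np.concatenate(pieces, axis=0)
print(np.array_equal(rejoined, arr))  # True
```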

<img style="float: left;" src="pic/pic_A_3.png" width="600">

#### Stacking helpers: r_ and c_

# Omitted

* There are two special objects in the NumPy namespace, r_ and c_, that make stacking
arrays more concise.

In [None]:
#arr = np.arange(6)

In [None]:
#arr1 = arr.reshape((3, 2))

In [None]:
#arr2 = np.random.randn(3, 2)

In [None]:
#arr

In [None]:
#arr1

In [None]:
#arr2

In [None]:
#np.r_[arr1, arr2]

In [None]:
#np.c_[np.r_[arr1, arr2], arr]

* These additionally can translate slices to arrays.

In [None]:
#np.c_[1:6, -10:-5]

### Repeating Elements: tile and repeat

* Two useful tools for repeating or replicating arrays to produce larger arrays are the **repeat** and **tile** functions. 
* **repeat** replicates each element in an array some number
of times, producing a larger array.

In [None]:
arr = np.arange(3)

In [None]:
arr

In [None]:
arr.repeat(3)

<img style="float: left;" src="pic/pic_0_2.png">

The need to replicate or repeat arrays can be less common with
NumPy than it is with other array programming frameworks like
MATLAB. One reason for this is that broadcasting, the subject of
the next section, often fills this need better.

* By default, if you pass an integer, each element will be repeated that number of times.
* If you pass an array of integers, each element can be repeated a different number of
times:

In [None]:
arr.repeat([2, 3, 4])

* Multidimensional arrays can have their elements repeated along a particular axis.

In [None]:
arr = np.random.randn(2, 2)

In [None]:
arr

In [None]:
arr.repeat(2, axis=0)

* Note that if no axis is passed, the array will be flattened first, which is likely not what
you want. 
* Similarly, you can pass an array of integers when repeating a multidimensional
array to repeat a given slice a different number of times:

In [None]:
arr.repeat([2, 3], axis=0)

In [None]:
arr.repeat([2, 3], axis=1)

In [None]:
arr.repeat(2)

* **tile**, on the other hand, is a shortcut for stacking copies of an array along an axis.
* Visually you can think of it as being akin to “laying down tiles”:

In [None]:
arr

In [None]:
np.tile(arr, 2)

* The second argument is the number of tiles; with a scalar, the copies are laid side by
side along the last axis, so each row gets longer rather than new rows being added.
* The second argument to tile can be a tuple
indicating the layout of the “tiling”:

In [None]:
arr

In [None]:
np.tile(arr, (2, 1))

In [None]:
np.tile(arr, (3, 2))
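The difference between **repeat** and **tile** is easier to see with fixed values than with random ones. The following sketch uses an assumed 2 × 2 integer array and compares the three layouts discussed above.

```python
import numpy as np

# Assumed example data
arr = np.array([[1, 2],
                [3, 4]])

# repeat duplicates each row in place: rows 0, 0, 1, 1
repeated = arr.repeat(2, axis=0)
print(repeated.tolist())    # [[1, 2], [1, 2], [3, 4], [3, 4]]

# tile with (2, 1) stacks whole copies vertically: rows 0, 1, 0, 1
tiled_rows = np.tile(arr, (2, 1))
print(tiled_rows.tolist())  # [[1, 2], [3, 4], [1, 2], [3, 4]]

# tile with a scalar lays copies side by side along the last axis
tiled_cols = np.tile(arr, 2)
print(tiled_cols.tolist())  # [[1, 2, 1, 2], [3, 4, 3, 4]]
```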

## A.3  Broadcasting

* *Broadcasting* describes how arithmetic works between arrays of different shapes.   
* It can be a powerful feature, but one that can cause confusion, even for experienced
users. 
* The simplest example of broadcasting occurs when combining a scalar value
with an array:

In [None]:
arr = np.arange(4)

In [None]:
arr

In [None]:
arr * 5

In [None]:
arr * np.array([5,5,5,5])

* Here we say that the scalar value 5 has been broadcast to all of the other elements in
the multiplication operation.

### Broadcasting over axis 0 of a 2D array

In [None]:
arr = np.arange(4)

In [None]:
arr

In [None]:
arr1 = arr.repeat(3)

In [None]:
arr1

In [None]:
arr2=arr1.reshape(4,3)

In [None]:
arr2

In [None]:
arr2.shape

In [None]:
arr3=np.arange(1,4)

In [None]:
arr3

In [None]:
arr3.shape

<img style="float: left;" src="pic/pic_A_4.png" width="600">

In [None]:
arr2+arr3

For another example, we can demean each column of an array by subtracting the column
means.

In [None]:
arr = np.random.randn(4, 3)

In [None]:
arr

In [None]:
arr.mean(0)

In [None]:
demeaned = arr - arr.mean(0)

In [None]:
demeaned

In [None]:
demeaned.mean(0)

### Broadcasting over axis 1 of a 2D array

In [None]:
arr2

In [None]:
arr4=np.arange(1,5).reshape((4,1))

In [None]:
arr4

<img style="float: left;" src="pic/pic_A_5.png" width="600">

In [None]:
arr2+arr4

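As another example, we can demean each row by subtracting the row means, reshaped to (4, 1) so that they broadcast across the columns.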

In [None]:
arr

In [None]:
row_means = arr.mean(1)

In [None]:
row_means

In [None]:
row_means.shape

In [None]:
row_means.reshape((4, 1))

In [None]:
demeaned = arr - row_means.reshape((4, 1))

In [None]:
demeaned

In [None]:
demeaned.mean(1)

In [None]:
demeaned.mean(1).reshape((4, 1))

### Broadcasting over axis 0 of a 3D array

In [None]:
arr1=np.arange(24).reshape((3,4,2))

In [None]:
arr1

In [None]:
arr2=np.arange(8).reshape(4,2)

In [None]:
arr2

In [None]:
arr1+arr2

<img style="float: left;" src="pic/pic_A_6.png" width="600">

## Rules of Broadcasting

* Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
* Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
* Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
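The three rules can be written out as a short function. This is only a sketch for illustration: broadcast_shape is a hypothetical helper, not part of NumPy. The result is compared against np.broadcast_shapes, which is available in NumPy 1.20 and later.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)   # sizes agree, or b is stretched (rule 2)
        elif dim_a == 1:
            result.append(dim_b)   # a is stretched (rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"shapes {shape_a} and {shape_b} are incompatible")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))   # (2, 3)
print(broadcast_shape((3, 1), (3,)))   # (3, 3)
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))      # (8, 7, 6, 5)
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```

Calling broadcast_shape((3, 2), (3,)) raises ValueError, because after padding the trailing sizes are 2 and 3 and neither is 1.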

To make these rules clear, let's consider a few examples in detail.

<pre>
Let's consider an operation on these two arrays. The shapes of the arrays are

M.shape = (2, 3)
a.shape = (3,)

We see by rule 1 that the array a has fewer dimensions, so we pad it on the left with ones:

M.shape -> (2, 3)
a.shape -> (1, 3)

By rule 2, we now see that the first dimension disagrees, so we stretch this dimension to match:

M.shape -> (2, 3)
a.shape -> (2, 3)

The shapes match, and we see that the final shape will be (2, 3).
</pre>

------------------------------------------------------------------------------------------

<pre>

a.shape = (3, 1)
b.shape = (3,)

Rule 1 says we must pad the shape of b with ones:

a.shape -> (3, 1)
b.shape -> (1, 3)

And rule 2 tells us that we upgrade each of these ones to match the corresponding size of the other array:

a.shape -> (3, 3)
b.shape -> (3, 3)

Because the result matches, these shapes are compatible. 
</pre>

--------------------------------------------------------------

<pre>
M.shape = (3, 2)
a.shape = (3,)

Again, rule 1 tells us that we must pad the shape of a with ones:

M.shape -> (3, 2)
a.shape -> (1, 3)

By rule 2, the first dimension of a is stretched to match that of M:

M.shape -> (3, 2)
a.shape -> (3, 3)

Now we hit rule 3: the final shapes do not match, so these two arrays are incompatible.
</pre>


#### Exercises
<br>  
<pre>

A      (3d array):  256 x 256 x 3
B      (1d array):              3
Result (3d array):  

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  

A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  
    
A      (1d array):  3
B      (1d array):  4         
Result           :

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3 
Result           :
    
A                :          3  
B                :  5 x 4 x 3 
Result           :  

A                :          5  
B                :  5 x 4 x 3 
Result           :  

A                :      5  
B                :  5 x 4 
Result           :  
</pre>


#### Answers

<pre>

A      (3d array): 256 x 256 x 3
B      (1d array):             3
Result (3d array): 256 x 256 x 3
    
A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5

A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5
    
A      (1d array):  3
B      (1d array):  4          
Result           :  error       # trailing dimensions do not match

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3  
Result           :  error       # second from last dimensions mismatched    
    
A                :          3  
B                :  5 x 4 x 3 
Result           :  5 x 4 x 3 

A                :          5  
B                :  5 x 4 x 3
Result           :  error

A                :      5  
B                :  5 x 4 
Result           :  error
</pre>



In [None]:
x=np.arange(20).reshape((5,4))
x

In [None]:
y=np.arange(5)
y

In [None]:
x+y  # raises ValueError: shapes (5, 4) and (5,) are not broadcast-compatible
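The answers to the exercises can also be checked without building any arrays. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available, and runs a selection of the shape pairs from the exercises.

```python
import numpy as np

# A selection of shape pairs taken from the exercises above
cases = [
    ((256, 256, 3), (3,)),
    ((8, 1, 6, 1), (7, 1, 5)),
    ((15, 3, 5), (3, 1)),
    ((3,), (4,)),
    ((2, 1), (8, 4, 3)),
    ((5,), (5, 4, 3)),
    ((5,), (5, 4)),
]

results = []
for shape_a, shape_b in cases:
    try:
        out = np.broadcast_shapes(shape_a, shape_b)
    except ValueError:
        out = "error"   # the shapes violate rule 3
    results.append(out)
    print(shape_a, shape_b, "->", out)
```

The last case is the same situation as the x + y cell: a length-5 array cannot be broadcast against an array whose trailing dimension has length 4.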

### Broadcasting Over Other Axes

* Broadcasting with higher dimensional arrays can seem even more mind-bending (that is, difficult), but
it is really a matter of following the rules.
* If you don’t, you’ll get an error like this:

In [None]:
arr = np.random.randn(4, 3)

In [None]:
arr

In [None]:
arr - arr.mean(1)  # raises ValueError: shapes (4, 3) and (4,) do not broadcast

In [None]:
arr.mean(1)

* It’s quite common to want to perform an arithmetic operation with a lower dimensional
array across axes other than axis 0.
* According to the broadcasting rule, the
“broadcast dimensions” must be 1 in the smaller array.
* In the example of row
demeaning shown here, this meant reshaping the row means to be shape (4, 1)
instead of (4,):

In [None]:
arr - arr.mean(1).reshape((4, 1))
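An alternative to reshaping by hand is the keepdims argument of **mean**, which keeps the reduced axis with length 1 so the result broadcasts directly. The sketch below uses an assumed seeded random array rather than the one from the cells above.

```python
import numpy as np

# Assumed example data, seeded for reproducibility
rng = np.random.default_rng(0)
arr = rng.standard_normal((4, 3))

# keepdims=True keeps axis 1 as a length-1 axis: shape (4, 1) instead of (4,)
row_means = arr.mean(axis=1, keepdims=True)
print(row_means.shape)  # (4, 1)

demeaned = arr - row_means
print(np.allclose(demeaned.mean(axis=1), 0))  # True

# Same result as the explicit reshape
same = np.allclose(demeaned, arr - arr.mean(1).reshape((4, 1)))
print(same)  # True
```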

* In the three-dimensional case, broadcasting over any of the three dimensions is only
a matter of reshaping the data to be shape-compatible.
* Figure A-7 nicely visualizes the
shapes required to broadcast over each axis of a three-dimensional array.

<img style="float: left;" src="pic/pic_A_7.png" width="600">

* A common problem, therefore, is needing to add a new axis with length 1 specifically
for broadcasting purposes.
* Using **reshape** is one option, but inserting an axis requires constructing a tuple
indicating the new shape, which can often be a tedious exercise.
* NumPy arrays offer a special syntax for inserting new axes by indexing.
* We use the special **np.newaxis** attribute along with “full” slices to insert the new
axis.

In [None]:
arr = np.zeros((4, 4))

In [None]:
arr

In [None]:
arr.shape

In [None]:
arr_3d = arr[:, np.newaxis, :]

In [None]:
arr_3d.shape

In [None]:
arr_3d

In [None]:
arr_1d = np.random.normal(size=3)

In [None]:
arr_1d[:, np.newaxis]

In [None]:
arr_1d[:, np.newaxis].shape

In [None]:
arr_1d[np.newaxis, :]

In [None]:
arr_1d[np.newaxis, :].shape

* Thus, if we had a three-dimensional array and wanted to demean axis 2, say, we
would need to write:

In [None]:
arr = np.random.randn(3, 4, 5)

In [None]:
arr

In [None]:
depth_means = arr.mean(2)

In [None]:
depth_means

In [None]:
depth_means.shape

In [None]:
demeaned = arr - depth_means[:, :, np.newaxis]

In [None]:
demeaned.mean(2)

In [None]:
demeaned_2 = arr - depth_means  # raises ValueError: shapes (3, 4, 5) and (3, 4) do not broadcast
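The np.newaxis pattern can be generalized to any axis by building the indexer programmatically. The function below is a sketch under that assumption; demean_axis is an illustrative name, not a NumPy function.

```python
import numpy as np

def demean_axis(arr, axis=0):
    """Illustrative helper: subtract the mean along `axis` using broadcasting."""
    means = arr.mean(axis)

    # Build [:, :, ..., np.newaxis, ..., :] with np.newaxis at position `axis`
    indexer = [slice(None)] * arr.ndim
    indexer[axis] = np.newaxis

    # Index with a tuple so the new length-1 axis lines up with the reduced axis
    return arr - means[tuple(indexer)]

# Assumed example data, seeded for reproducibility
rng = np.random.default_rng(0)
sample = rng.standard_normal((3, 4, 5))

for axis in range(sample.ndim):
    result = demean_axis(sample, axis)
    print(axis, result.shape, np.allclose(result.mean(axis), 0))
```

For axis=2 the indexer is equivalent to [:, :, np.newaxis], which is the expression used for depth_means above.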

### Setting Array Values by Broadcasting

* The same broadcasting rule governing arithmetic operations also applies to setting
values via array indexing. 

In a simple case, we can do things like:


In [None]:
arr = np.zeros((4, 3))

In [None]:
arr

In [None]:
arr[:] = 5

In [None]:
arr

However, if we have a one-dimensional array of values that we want to set into the columns of the array, we can do that as long as the shape is compatible:

In [None]:
col = np.array([1.28, -0.42, 0.44, 1.6])

In [None]:
col

In [None]:
arr[:] = col[:, np.newaxis]

In [None]:
arr

In [None]:
arr[:2] = [[-1.37], [0.509]]

In [None]:
arr
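Assignment follows the same compatibility check as arithmetic, so an incompatible right-hand side fails in the same way. In this sketch the row array is a hypothetical addition used only to show the row-wise case.

```python
import numpy as np

arr = np.zeros((4, 3))
col = np.array([1.28, -0.42, 0.44, 1.6])

# A shape (4,) array does not broadcast into shape (4, 3)
try:
    arr[:] = col
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# With a trailing length-1 axis, each value fills one row
arr[:] = col[:, np.newaxis]

# Hypothetical row values: shape (3,) broadcasts across the selected rows
row = np.array([10.0, 20.0, 30.0])
arr[:2] = row
print(arr)
```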

## A.6  More About Sorting

* Like Python’s built-in list, the ndarray **sort** instance method is an in-place sort,
meaning that the array contents are rearranged without producing a new array.

In [None]:
arr = np.random.randn(6)

In [None]:
arr

In [None]:
arr.sort()

In [None]:
arr

* When sorting arrays in-place, remember that if the array is a view on a different
ndarray, the original array will be modified:

In [None]:
arr = np.random.randn(4, 5)

In [None]:
arr

In [None]:
arr[:, 0].sort()  # Sort first column values in-place

In [None]:
arr

In [None]:
arr[:, 3].sort()  # Sort fourth column values in-place

In [None]:
arr

* On the other hand, **numpy.sort** creates a new, sorted copy of an array. 
* Otherwise, it
accepts the same arguments (such as kind) as **ndarray.sort**.

In [None]:
arr = np.random.randn(5)

In [None]:
arr

In [None]:
np.sort(arr)

In [None]:
arr

* All of these sort methods take an axis argument for sorting the sections of data along
the passed axis independently.

In [None]:
arr = np.random.randn(3, 5)

In [None]:
arr

In [None]:
arr2=arr.copy()
arr3=arr.copy()

In [None]:
arr.sort(axis=1)

In [None]:
arr

* You may notice that none of the sort methods have an option to sort in descending
order.
* This is not a problem in practice because array slicing produces views, so reversing a
sorted array requires neither a copy nor any computational work.
* Many Python users are
familiar with the “trick” that for a list named values, values[::-1] returns a list in reverse
order.
* The same is true for ndarrays.

In [None]:
arr[:, ::-1]
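A short sketch of the descending-order idiom, with assumed integer data so the results are easy to read. It sorts in ascending order first and then reverses with a slice, which produces a view rather than a copy.

```python
import numpy as np

# Assumed example data
values = np.array([3, 1, 2, 5, 4])

ascending = np.sort(values)    # new sorted array
descending = ascending[::-1]   # reversed view, no data copied
print(descending.tolist())     # [5, 4, 3, 2, 1]
print(np.shares_memory(ascending, descending))  # True

# For a 2D array, sort each row and then reverse along axis 1
arr2d = np.array([[3, 1, 2],
                  [9, 7, 8]])
arr2d.sort(axis=1)
rows_descending = arr2d[:, ::-1]
print(rows_descending.tolist())  # [[3, 2, 1], [9, 8, 7]]
```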

In [None]:
arr2

In [None]:
arr2.sort(axis=0)

In [None]:
arr2

In [None]:
arr3

In [None]:
arr4=arr3.copy()
arr4

In [None]:
arr3.sort()

In [None]:
arr3

In [None]:
arr4.sort(1)
arr4

# Up to here

### Indirect Sorts: argsort and lexsort

* In data analysis you may need to reorder datasets by one or more keys. For example, a
table of data about some students might need to be sorted by last name, then by first
name. 
* This is an example of an *indirect* sort, and if you’ve read the pandas-related
chapters you have already seen many higher-level examples. 
* Given a key or keys (an
array of values or multiple arrays of values), you wish to obtain an array of integer
*indices* (I refer to them colloquially as *indexers*) that tells you how to reorder the data
to be in sorted order. 
* Two methods for this are **argsort** and **numpy.lexsort**. 

As an example:

In [None]:
values = np.array([5, 0, 1, 3, 2])

In [None]:
values

In [None]:
indexer = values.argsort()

In [None]:
indexer

In [None]:
values[indexer]
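An indexer is useful because the same integer array can reorder other arrays that are aligned with the key. In the sketch below, labels is a hypothetical companion array; reversing the indexer gives a descending order.

```python
import numpy as np

values = np.array([5, 0, 1, 3, 2])
labels = np.array(['a', 'b', 'c', 'd', 'e'])  # hypothetical aligned data

indexer = values.argsort()
print(indexer.tolist())          # [1, 2, 4, 3, 0]

# Reorder the companion array with the same indexer
print(labels[indexer].tolist())  # ['b', 'c', 'e', 'd', 'a']

# Reverse the indexer for descending order
descending = indexer[::-1]
print(values[descending].tolist())  # [5, 3, 2, 1, 0]
```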

As a more complicated example, this code reorders a two-dimensional array by its
first row.

In [None]:
arr = np.random.randn(3, 5)

In [None]:
arr

In [None]:
arr[0] = values

In [None]:
arr

In [None]:
arr[:, arr[0].argsort()]

* **lexsort** is similar to argsort, but it performs an indirect lexicographical sort on multiple key arrays. 
* Suppose we wanted to sort some data identified by first and last
names.

In [None]:
first_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Brittany'])

In [None]:
first_name

In [None]:
last_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])

In [None]:
last_name

In [None]:
np.lexsort?

In [None]:
sorter = np.lexsort((first_name, last_name)) # sort by last_name, then first_name

In [None]:
sorter

* **lexsort** can be a bit confusing the first time you use it because the order in which the
keys are used to order the data starts with the last array passed. 
* Here, last_name was
used before first_name.

In [None]:
x=zip(last_name[sorter], first_name[sorter])

In [None]:
print(tuple(x))
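One way to convince yourself of the key order is to compare the **lexsort** result with Python's built-in sorting of (last name, first name) tuples. This is a sketch for checking only; the conversion to plain str is an assumption made to keep the comparison simple.

```python
import numpy as np

first_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Brittany'])
last_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])

# The last key passed (last_name) is the primary sort key
sorter = np.lexsort((first_name, last_name))
print(sorter.tolist())  # [1, 2, 3, 0, 4]

pairs = [(str(l), str(f)) for l, f in zip(last_name[sorter], first_name[sorter])]

# Tuples sort by their first element, then their second
expected = sorted((str(l), str(f)) for l, f in zip(last_name, first_name))
print(pairs == expected)  # True
```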