<a href="https://colab.research.google.com/github/fbeilstein/machine_learning/blob/master/lecture_2_numpy_arrays.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![np array vs list](https://raw.githubusercontent.com/fbeilstein/machine_learning/master/lecture_01_and_lecture_02_intro_and_python/python_list.png)

#The Basics of NumPy Arrays

##NumPy Array Attributes

| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (alias removed in NumPy 2.0; use ``float64``).| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128`` (alias removed in NumPy 2.0; use ``complex128``).| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 

In [0]:
import numpy as np
np.random.seed(0) # seed for reproducibility
x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array

In [0]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

In [0]:
print("dtype:", x3.dtype)

In [0]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

##Array Indexing: Accessing Single Elements

In [0]:
x1

In [0]:
x1[0]

In [0]:
x1[4]

In [0]:
x1[-1]

In [0]:
x1[-2]

In [0]:
x2

In [0]:
x2[0, 0]

In [0]:
x2[2, 0]

In [0]:
x2[2, -1]

In [0]:
x2[0, 0] = 12
x2

In [0]:
x1[0] = 3.14159 # this will be truncated!
x1
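
The truncation above happens because a NumPy array has one fixed dtype, so a float written into an integer array loses its fractional part without any warning. The following is a minimal sketch of that behavior and of one way around it; the array names `ints` and `floats` are illustrative and not part of the original notebook.

```python
import numpy as np

ints = np.array([5, 0, 3])      # integer dtype inferred from the values
ints[0] = 3.14159               # fractional part is dropped on assignment
truncated = ints[0]

floats = ints.astype(float)     # astype returns a converted copy
floats[0] = 3.14159             # the full value is stored now

print(ints.dtype.kind, truncated)
print(floats.dtype, floats[0])
```

Converting with `astype` before assigning is only one option; creating the array with `dtype=float` from the start would work as well.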

##Array Slicing: Accessing Subarrays

In [0]:
x = np.arange(10)
x

In [0]:
x[:5] # first five elements

In [0]:
x[5:] # elements from index 5 onward

In [0]:
x[4:7] # middle subarray

In [0]:
x[::2] # every other element

In [0]:
x[1::2] # every other element, starting at index 1

In [0]:
x[::-1] # all elements, reversed

In [0]:
x[5::-2] # reversed every other from index 5

In [0]:
x2

In [0]:
x2[:2, :3] # two rows, three columns

In [0]:
x2[:3, ::2] # all rows, every other column

In [0]:
x2[::-1, ::-1]

In [0]:
print(x2[:, 0]) # first column of x2

In [0]:
print(x2[0, :]) # first row of x2

In [0]:
print(x2[0]) # equivalent to x2[0, :]

In [0]:
print(x2)

In [0]:
x2_sub = x2[:2, :2]
print(x2_sub)

In [0]:
print(x2)

In [0]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

In [0]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

In [0]:
print(x2)

##Reshaping of Arrays

In [0]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

In [0]:
x = np.array([1, 2, 3])
# row vector via reshape
x.reshape((1, 3))

In [0]:
x[np.newaxis, :]

In [0]:
x.reshape((3, 1))

In [0]:
x[:, np.newaxis]

##Array Concatenation and Splitting

In [0]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

In [0]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

In [0]:
grid = np.array([[1, 2, 3], [4, 5, 6]])
# concatenate along the first axis
np.concatenate([grid, grid])

In [0]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

In [0]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7], [6, 5, 4]])
# vertically stack the arrays
np.vstack([x, grid])

In [0]:
# horizontally stack the arrays
y = np.array([[99], [99]])
np.hstack([grid, y])

In [0]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

In [0]:
grid = np.arange(16).reshape((4, 4))
grid

In [0]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

In [0]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

#Computation on NumPy Arrays: Universal Functions

| Operator      | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|

##The Slowness of Loops

In [0]:
import numpy as np
np.random.seed(0)


def compute_reciprocals(values):
  output = np.empty(len(values))
  for i in range(len(values)):
    output[i] = 1.0 / values[i]
  return output


values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

In [0]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

##Introducing UFuncs

In [0]:
print(compute_reciprocals(values))
print(1.0 / values)

In [0]:
%timeit (1.0 / big_array)

In [0]:
np.arange(5) / np.arange(1, 6)

In [0]:
x = np.arange(9).reshape((3, 3))
2 ** x

##Exploring NumPy’s UFuncs

In [0]:
x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2) # floor division

In [0]:
print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)

In [0]:
-(0.5*x + 1) ** 2

In [0]:
np.add(x, 2)

In [0]:
x = np.array([-2, -1, 0, 1, 2])
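print(abs(x))          # Python built-in, dispatches to NumPy
print(np.absolute(x))  # the underlying ufunc
print(np.abs(x))       # alias for np.absolute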

In [0]:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)

In [0]:
theta = np.linspace(0, np.pi, 3)
print("theta = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

In [0]:
x = [-1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

In [0]:
x = [1, 2, 3]
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))

In [0]:
x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))

In [0]:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))

In [0]:
from scipy import special
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))

In [0]:
# Error function (integral of Gaussian)
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x) =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))

##Advanced Ufunc Features

In [0]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

In [0]:
y = np.zeros(10)
np.power(2, x, out=y[::2])
print(y)

In [0]:
x = np.arange(1, 6)
np.add.reduce(x)

In [0]:
np.multiply.reduce(x)

In [0]:
np.add.accumulate(x)

In [0]:
np.multiply.accumulate(x)

In [0]:
x = np.arange(1, 6)
np.multiply.outer(x, x)

#Aggregations: Min, Max, and Everything in Between

##Summing the Values in an Array

In [0]:
import numpy as np
L = np.random.random(100)
sum(L)

In [0]:
np.sum(L)

In [0]:
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

##Minimum and Maximum


|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |

In [0]:
min(big_array), max(big_array)

In [0]:
np.min(big_array), np.max(big_array)

In [0]:
%timeit min(big_array)
%timeit np.min(big_array)

In [0]:
print(big_array.min(), big_array.max(), big_array.sum())

In [0]:
M = np.random.random((3, 4))
print(M)

In [0]:
M.sum()

In [0]:
M.min(axis=0)

In [0]:
M.max(axis=1)

#Example: What Is the Average Weight of US Cars?

In [0]:
import numpy as np
from vega_datasets import data
data.list_datasets()

In [0]:
d_cars = data.cars()
d_cars.head()

In [0]:
d_weights = np.array(d_cars['Weight_in_lbs'])
print(d_weights)

In [0]:
print("Mean weight: ", d_weights.mean())
print("Standard deviation:", d_weights.std())
print("Minimum weight: ", d_weights.min())
print("Maximum weight: ", d_weights.max())

In [0]:
print("25th percentile: ", np.percentile(d_weights, 25))
print("Median: ", np.median(d_weights))
print("75th percentile: ", np.percentile(d_weights, 75))

In [0]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # set plot style
plt.hist(d_weights)
plt.title('Weight Distribution of US Cars')
plt.xlabel('weight (lbs)')
plt.ylabel('number');

#Computation on Arrays: Broadcasting

![broadcast](https://jakevdp.github.io/PythonDataScienceHandbook/figures/02.05-broadcasting.png)

##Introducing Broadcasting

In [0]:
import numpy as np
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

In [0]:
a + 5

In [0]:
M = np.ones((3, 3))
M

In [0]:
M + a

In [0]:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]
print(a)
print(b)
a + b

##Rules of Broadcasting

*   Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
*   Rule 2: If the shapes of the two arrays do not match in some dimension, the array with size 1 in that dimension is stretched to match the other shape.
*   Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
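
The three rules can be traced by hand on shape tuples alone. The sketch below does that with a hypothetical helper, `broadcast_shape`, which is written for illustration and is not a NumPy function. The last line compares its answer against what NumPy actually produces.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shape tuples (illustrative helper)."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for m, n in zip(a, b):
        if m == n or n == 1:
            result.append(m)    # sizes agree, or b is stretched (Rule 2)
        elif m == 1:
            result.append(n)    # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))    # (3,) is padded to (1, 3), then stretched
print(broadcast_shape((3, 1), (3,)))    # both operands are stretched
print((np.ones((2, 3)) + np.arange(3)).shape)
```

Calling `broadcast_shape((3, 2), (3,))` raises `ValueError`, because after padding the trailing sizes are 2 and 3 and neither is 1.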



In [0]:
M = np.ones((2, 3))
a = np.arange(3)
print("M = ", M)
print("a = ", a)
M + a

In [0]:
a = np.arange(3).reshape((3, 1))
b = np.arange(3)
a + b

In [0]:
M = np.ones((3, 2))
a = np.arange(3)
print("ERROR SHOULD BE GENERATED")
M + a

In [0]:
print(a[:, np.newaxis].shape)
M + a[:, np.newaxis]

In [0]:
np.logaddexp(M, a[:, np.newaxis]) # broadcasting works with many other functions

##Broadcasting in Practice

In [0]:
X = np.random.random((10, 3))

In [0]:
Xmean = X.mean(0)
Xmean

In [0]:
X_centered = X - Xmean
X_centered.mean(0)

In [0]:
# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

%matplotlib inline
import matplotlib.pyplot as plt

plt.imshow(z, origin='lower', extent=[0, 5, 0, 5],
           cmap='viridis')
plt.colorbar();

#Comparisons, Masks, and Boolean Logic

##Example: Obesity Rate

In [0]:
import numpy as np
from vega_datasets import data

d_obesity = data.obesity()
d_rate = np.array(d_obesity['rate'])

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # set plot styles
plt.hist(d_rate, 10);

##Comparison Operators as ufuncs

| Operator      | Equivalent ufunc    |
|---------------|---------------------|
|``==``         |``np.equal``         |
|``!=``         |``np.not_equal``     |
|``<``          |``np.less``          |
|``<=``         |``np.less_equal``    |
|``>``          |``np.greater``       |
|``>=``         |``np.greater_equal`` |
In [0]:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
print(x)
print(x < 3)  # less than
print(x > 3)  # greater than
print(x <= 3) # less than or equal
print(x == 3) # equal 
print(x != 3) # not equal
print(x >= 3) # greater than or equal 
print((2 * x) == (x ** 2))

In [0]:
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
x

In [0]:
x < 6

##Working with Boolean Arrays

In [0]:
print(x)

In [0]:
# how many values less than 6?
np.count_nonzero(x < 6)

In [0]:
np.sum(x < 6)

In [0]:
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)

In [0]:
# are there any values greater than 8?
np.any(x > 8)

In [0]:
# are there any values less than zero?
np.any(x < 0)

In [0]:
# are all values less than 10?
np.all(x < 10)

In [0]:
# are all values equal to 6?
np.all(x == 6)

In [0]:
# are all values in each row less than 8?
np.all(x < 8, axis=1)

In [0]:
import numpy as np
from vega_datasets import data

d_obesity = data.obesity()
d_rate = np.array(d_obesity['rate'])

print(np.sum((d_rate > 0.1) & (d_rate < 1.0)))
print(np.sum(~( (d_rate <= 0.1) | (d_rate >= 1) )))

In [0]:
print("Number of states with zero obesity rate: ", np.sum(d_rate == 0))
print("Number of states with nonzero obesity rate: ", np.sum(d_rate != 0))
print("States with more than 0.15 obesity rate:", np.sum(d_rate > 0.15))
print("States with < 0.15 obesity rate:", np.sum((d_rate > 0) & (d_rate < 0.15)))

##Boolean Arrays as Masks

In [0]:
x

In [0]:
x < 5

In [0]:
x[x < 5]

In [0]:
import numpy as np
from vega_datasets import data

d_cars = data.cars()
d_cars.head()
d_weights = np.array(d_cars['Weight_in_lbs'])
d_cylinders = np.array(d_cars['Cylinders'])
d_power = np.array(d_cars['Horsepower'])

d_heavy = (d_weights > 2500)
d_four_cylinders = (d_cylinders == 4)

# NaN values are present! Use the NaN-safe versions of median and max
print("Median power of heavy cars (horsepower): ",
 np.nanmedian(d_power[d_heavy]))
print("Median power of four-cylinder cars (horsepower): ",
 np.nanmedian(d_power[d_four_cylinders]))
print("Maximum power of heavy cars (horsepower): ",
 np.nanmax(d_power[d_heavy]))
print("Median power of non-heavy four-cylinder cars (horsepower):",
 np.nanmedian(d_power[d_four_cylinders & ~d_heavy]))

In [0]:
bool(42), bool(0)

In [0]:
bool(42 and 0)

In [0]:
bool(42 or 0)

In [0]:
bin(42), bin(59)

In [0]:
bin(42 & 59)

In [0]:
bin(42 | 59)

In [0]:
A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)
A | B

In [0]:
print("ERROR SHOULD BE GENERATED")
A or B

In [0]:
x = np.arange(10)
(x > 4) & (x < 8)

In [0]:
print("ERROR SHOULD BE GENERATED")
(x > 4) and (x < 8)
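
The two cells marked as errors fail because `and` and `or` need a single truth value for each operand, and NumPy does not define one for an array with more than one element. The sketch below catches that exception and shows the element-wise alternatives; the variable names `mask_op` and `mask_ufunc` are illustrative.

```python
import numpy as np

x = np.arange(10)

# `and` asks each operand for one truth value, which is ambiguous
# for a multi-element array, so NumPy raises ValueError
try:
    (x > 4) and (x < 8)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# element-wise alternatives: the & operator or the np.logical_and ufunc
mask_op = (x > 4) & (x < 8)
mask_ufunc = np.logical_and(x > 4, x < 8)
print(x[mask_op])
```

The parentheses around each comparison matter with `&`, because it binds more tightly than `>` and `<`.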

#Fancy Indexing

##Exploring Fancy Indexing

In [0]:
import numpy as np
rand = np.random.RandomState(42)

x = rand.randint(100, size=10)
print(x)

In [0]:
[x[3], x[7], x[2]]

In [0]:
ind = [3, 7, 4]
x[ind]

In [0]:
ind = np.array([[3, 7], [4, 5]])
x[ind]

In [0]:
X = np.arange(12).reshape((3, 4))
X

In [0]:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

**Note:** fancy indexing follows broadcasting rules

In [0]:
X[row[:, np.newaxis], col]

In [0]:
row[:, np.newaxis] * col

##Combined Indexing

In [0]:
import numpy as np
rand = np.random.RandomState(42)

X = np.arange(12).reshape((3, 4))
X

In [0]:
X[2, [2, 0, 1]] #  fancy and simple indices

In [0]:
X[1:, [2, 0, 1]] # fancy indexing with slicing

In [0]:
mask = np.array([1, 0, 1, 0], dtype=bool)
X[row[:, np.newaxis], mask] # fancy indexing with masking

##Example: Selecting Random Points

In [0]:
mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape

In [0]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # for plot styling
plt.scatter(X[:, 0], X[:, 1]);

In [0]:
indices = np.random.choice(X.shape[0], 20, replace=False)
indices

In [0]:
selection = X[indices] # fancy indexing here
selection.shape

In [0]:
plt.scatter(X[:, 0], X[:, 1], alpha=1.0)
plt.scatter(selection[:, 0], selection[:, 1], facecolor='red', s=200, alpha=0.2);

##Modifying Values with Fancy Indexing

In [0]:
x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x)

In [0]:
x[i] -= 10
print(x)

In [0]:
x = np.zeros(10)
x[[0, 0]] = [4, 6] # note: ~ x[0] = 4, followed by x[0] = 6
print(x)

In [0]:
i = [2, 3, 3, 4, 4, 4]
x[i] += 1 # note: x[i] += 1 ~ x[i] = x[i] + 1, NO ACCUMULATION
x

In [0]:
x = np.zeros(10)
np.add.at(x, i, 1) # if accumulation is needed
print(x)
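
The difference between the buffered `+=` and the unbuffered `np.add.at` is easiest to see side by side. The sketch below repeats both on fresh arrays and, as a cross-check, counts the indices with `np.bincount`, which is a swapped-in alternative that only applies to non-negative integer indices. The names `no_accum`, `accum` and `counts` are illustrative.

```python
import numpy as np

i = [2, 3, 3, 4, 4, 4]

# fancy-index +=: x[i] + 1 is computed once, then assigned,
# so a repeated index still ends up incremented by just 1
no_accum = np.zeros(10)
no_accum[i] += 1

# unbuffered in-place addition: every occurrence of an index adds 1
accum = np.zeros(10)
np.add.at(accum, i, 1)

# counting occurrences of each non-negative integer gives the same numbers
counts = np.bincount(i, minlength=10)

print(no_accum)
print(accum)
print(counts)
```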

##Example: Binning Data

In [0]:
np.random.seed(42)
x = np.random.randn(100)
# compute a histogram by hand
bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)
# find the appropriate bin for each x
i = np.searchsorted(bins, x)
# add 1 to each of these bins
np.add.at(counts, i, 1)
# plot the results
plt.plot(bins, counts, drawstyle='steps');

In [0]:
plt.hist(x, bins, histtype='step'); # standard way

In [0]:
# small size
print("NumPy routine:")
%timeit counts, edges = np.histogram(x, bins)
print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins, x), 1)

In [0]:
# large size
x = np.random.randn(1000000)
print("NumPy routine:")
%timeit counts, edges = np.histogram(x, bins)
print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins, x), 1)

#Sorting Arrays

In [0]:
import numpy as np

def selection_sort(x):
  for i in range(len(x)):
    swap = i + np.argmin(x[i:])
    (x[i], x[swap]) = (x[swap], x[i])
  return x

x = np.array([2, 1, 4, 3, 5])
selection_sort(x)

In [0]:
# bogosort: shuffle until the array happens to be sorted (illustration only)
def bogosort(x):
  while np.any(x[:-1] > x[1:]):
    np.random.shuffle(x)
  return x

x = np.array([2, 1, 4, 3, 5])
bogosort(x)

##Fast Sorting in NumPy: np.sort and np.argsort

In [0]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)

In [0]:
x.sort()
print(x)

In [0]:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print("indices", i)
x[i]

In [0]:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)

In [0]:
# sort each column of X
np.sort(X, axis=0)

In [0]:
# sort each row of X
np.sort(X, axis=1)

##Partial Sorts: Partitioning

In [0]:
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3) # the 3 smallest values come first, in arbitrary order
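
What `np.partition` guarantees can be checked directly. The sketch below, with the illustrative names `k` and `part`, verifies that the element at position `k` is where a full sort would put it and that the smaller values sit to its left.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3
part = np.partition(x, k)

# the element at index k is in its final sorted position
print(part[k] == np.sort(x)[k])
# everything before it is no larger, everything from it onward is no smaller
print(part[:k].max() <= part[k] <= part[k:].min())
# so the first k entries are the k smallest values, in arbitrary order
print(np.sort(part[:k]))
```

The order inside each of the two partitions is unspecified, which is why the last line sorts the left part before printing it.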

In [0]:
np.partition(X, 2, axis=1)

##Example: k-Nearest Neighbors

In [0]:
X = rand.rand(10, 2)
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # Plot styling
plt.scatter(X[:, 0], X[:, 1], s=100);

In [0]:
dist_sq = np.sum((X[:,np.newaxis,:] - X[np.newaxis,:,:]) ** 2, axis=-1)
dist_sq.diagonal() # check that each point has 0 distance to itself

In [0]:
nearest = np.argsort(dist_sq, axis=1)
print(nearest)

In [0]:
K = 2
nearest_partition = np.argpartition(dist_sq, K + 1, axis=1)

plt.scatter(X[:, 0], X[:, 1], s=100)
# draw lines from each point to its two nearest neighbors
for i in range(X.shape[0]):
  for j in nearest_partition[i, :K+1]:
    # plot a line from X[i] to X[j]
    # use some zip magic to make it happen:
    plt.plot(*zip(X[j], X[i]), color='black')

#Structured Data: NumPy’s Structured Arrays

In [0]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

In [0]:
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

In [0]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

In [0]:
# Get all names
print(data['name'])
# Get first row of data
print(data[0])
# Get the name from the last row
print(data[-1]['name'])
print(data[data['age'] < 30]['name'])

##Creating Structured Arrays

| Character        | Description           | Example                             |
| ---------        | -----------           | -------                             | 
| ``'b'``          | Byte                  | ``np.dtype('b')``                   |
| ``'i'``          | Signed integer        | ``np.dtype('i4') == np.int32``      |
| ``'u'``          | Unsigned integer      | ``np.dtype('u1') == np.uint8``      |
| ``'f'``          | Floating point        | ``np.dtype('f8') == np.float64``    |
| ``'c'``          | Complex floating point| ``np.dtype('c16') == np.complex128``|
| ``'S'``, ``'a'`` | String                | ``np.dtype('S5')``                  |
| ``'U'``          | Unicode string        | ``np.dtype('U') == np.str_``        |
| ``'V'``          | Raw data (void)       | ``np.dtype('V') == np.void``        |

In [0]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':('U10', 'i4', 'f8')})

In [0]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':((np.str_, 10), int, np.float32)})

In [0]:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

In [0]:
np.dtype('S10,i4,f8')

##More Advanced Compound Types

In [0]:
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(1, dtype=tp)
print(X[0])
print(X['mat'][0])

##RecordArrays: Structured Arrays with a Twist

In [0]:
print(data['age'])
data_rec = data.view(np.recarray)
print(data_rec.age) # fields can be accessed as attributes

In [0]:
%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age

# Few points to recall

**Slices are views, not copies**

In [0]:
x = np.arange(3 * 5).reshape(3, 5)
print("Input array:")
print(x)

y = x[1:, 2:4]
print("Sliced array:")
print(y)

y[0, 0] = 123

print("X after setting y[0,0] = 123")
print(x)

print("Y after setting y[0,0] = 123")
print(y)
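
Whether two arrays share the same underlying buffer can also be tested without modifying either of them. The sketch below uses `np.shares_memory` for that; the names `view` and `copy` are illustrative.

```python
import numpy as np

x = np.arange(3 * 5).reshape(3, 5)

view = x[1:, 2:4]           # basic slicing returns a view on x's data
copy = x[1:, 2:4].copy()    # an explicit copy owns its own data

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, copy))    # False
```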

**flatten() and ravel()**

In [0]:
x = np.arange(3 * 5).reshape(3, 5)
print("Input array:")
print(x)

y = x.flatten()
print("Flattened array:")
print(y)

z = x.ravel()
print("Raveled array:")
print(z)

y[0] = 123
z[1] = 987

print("X:")
print(x)
print("Y:")
print(y)
print("Z:")
print(z)
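
The output above shows that writing to the flattened array leaves `x` untouched, while writing to the raveled one changes it. A short sketch of the underlying rule follows: `flatten()` always copies, whereas `ravel()` returns a view when the memory layout allows it and falls back to a copy otherwise. The names `flat`, `rav` and `rav_t` are illustrative.

```python
import numpy as np

x = np.arange(3 * 5).reshape(3, 5)

flat = x.flatten()      # always a new array
rav = x.ravel()         # a view here, because x is C-contiguous
rav_t = x.T.ravel()     # x.T is not C-contiguous, so ravel has to copy

print(np.shares_memory(x, flat))     # False
print(np.shares_memory(x, rav))      # True
print(np.shares_memory(x, rav_t))    # False
```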

**transpose does not copy**

In [0]:
x = np.arange(3 * 5).reshape(3, 5)
print("Input array:")
print(x)

y = x.transpose(0, 1)
print("After setting axes order to (0, 1) nothing should be changed:")
print(y)

z = x.transpose(1, 0)
print("After setting axes order to (1, 0) we get transposed array:")
print(z)

In [0]:
x = np.arange(2 * 3 * 2 * 3).reshape(2, 3, 2, 3)
print("Input array:")
print(x)

y = x.transpose(2, 0, 1, 3)
print("After setting axes order to (2, 0, 1, 3):")
print(y)

print("Comparing x[1, 2, 0, 2] and y[0, 1, 2, 2]:")
print(x[1, 2, 0, 2] == y[0, 1, 2, 2])

In [0]:
x = np.arange(3 * 5).reshape(3, 5)
print("Input array:")
print(x)

y = x.T
print("In case of 2-dimensional array we can use the np.ndarray.T attribute for transposing:")
print(y)

In [0]:
x = np.arange(3 * 5).reshape(3, 5)
print("Input array:")
print(x)

y = x.transpose(1, 0)
y[:] = 500

print("X after setting all values of transposed array to 500:")
print(x)

print("Y after setting all values of transposed array to 500:")
print(y)

**broadcasting with many axes**

In [0]:
a = np.arange(2 * 3 * 4).reshape(2, 3, 4)

b1 = a.reshape(1, 2, 1, 3, 4)
b2 = (a * 1000).reshape(2, 1, 3, 1 ,4)

print("b1 shape:", b1.shape)
print("b2 shape:", b2.shape)
print("b1+b2 shape:", (b1+b2).shape)

**numerical derivative (x[1:] - x[:-1]) / dt**

In [0]:
x = np.arange(3 * 4).reshape(3, 4)**2
dt0 = 1e-5
dt1 = 1e-5

print("Input x:")
print(x)

print("dx/dt0 (partial over axis=0):")
print((x[1:]-x[:-1]) / dt0)

print("dx/dt1 (partial over axis=1):")
print((x[:, 1:]-x[:, :-1]) / dt1)
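
The shifted-slice differences above are the same first-order differences that `np.diff` computes along a chosen axis. The sketch below checks this equivalence; the single step `dt` and the variable names are illustrative choices, not taken from the cell above.

```python
import numpy as np

x = np.arange(3 * 4).reshape(3, 4) ** 2
dt = 1e-5    # illustrative step size

# differences written out with shifted slices
manual_0 = (x[1:] - x[:-1]) / dt
manual_1 = (x[:, 1:] - x[:, :-1]) / dt

# the same differences via np.diff
diff_0 = np.diff(x, axis=0) / dt
diff_1 = np.diff(x, axis=1) / dt

print(np.array_equal(manual_0, diff_0))
print(np.array_equal(manual_1, diff_1))
print(diff_0.shape, diff_1.shape)
```

Either way the result is one element shorter along the differenced axis than the input.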