Using NumPy


In [0]:
import numpy as np

The first thing we will do is create an array. An array is a structure whose elements all share the same data type. Typically, we will deal with integers and floating-point numbers

In [0]:
a = np.array([1, 2, 3, 5])

We can see what that array looks like using print

In [84]:
print(a)

[1 2 3 5]
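To make the "same value types" point concrete, here is a small sketch that checks the dtype attribute of two arrays. The names ints and mixed are just for illustration, and the exact integer width depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3, 5])
mixed = np.array([1, 2.5, 3])   # integers and a float together

# Every element of an array shares one dtype.
print(ints.dtype)    # an integer type (width depends on the platform)
print(mixed.dtype)   # float64: the integers were upcast to floats
```

Mixing integers and floats does not raise an error; NumPy picks a type that can hold all of the values.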


We can look at specific parts of this array and slice it in different ways


In [35]:
a[0]

1

In [40]:
a[1]

2

In [42]:
a[0:2]

array([1, 2])

In [46]:
a[1:3]

array([2, 3])

In [47]:
a[-1]

5
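A few more slicing patterns, sketched on a separate array (arr is a made-up name so that a from above is left alone). The part worth remembering is that a basic slice is a view of the original data, not a copy.

```python
import numpy as np

arr = np.array([1, 2, 3, 5])

every_other = arr[::2].copy()   # start:stop:step -> [1, 3]
last_two = arr[-2:].copy()      # negative start counts from the end -> [3, 5]

# A basic slice is a view: writing to it changes the original array.
view = arr[0:2]
view[0] = 100                   # arr is now [100, 2, 3, 5]

# Use .copy() when an independent array is needed.
independent = arr[0:2].copy()
independent[1] = -1             # arr is not affected by this
```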

With an array, we can use various built-in functions and attributes. A lot of these will be very useful for us

In [27]:
a.shape
np.shape(a)

(4,)

In [29]:
a.mean()
np.mean(a)

2.75

In [30]:
a.min()
np.min(a)

1

In [31]:
a.max()
np.max(a)

5

In [78]:
np.log(a)


array([0.        , 0.69314718, 1.09861229, 1.60943791])

In [79]:
np.sqrt(a)

array([1.        , 1.41421356, 1.73205081, 2.23606798])

In [82]:
a.reshape(4,1)
np.reshape(a,[4,1])

array([[1],
       [2],
       [3],
       [5]])

In [83]:
a+1

array([2, 3, 4, 6])
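One reshape detail that comes up a lot: passing -1 for a dimension asks NumPy to work out that size from the rest. A minimal sketch, with col and grid as illustrative names:

```python
import numpy as np

a = np.array([1, 2, 3, 5])

col = a.reshape(-1, 1)   # -1 is inferred from the other dimension -> shape (4, 1)
grid = a.reshape(2, 2)   # [[1, 2], [3, 5]]

# reshape returns a new array object; a itself keeps its shape.
print(col.shape, grid.shape, a.shape)   # (4, 1) (2, 2) (4,)
```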

Let's create a slightly more complicated array

In [89]:
b = np.array([[1,2,3],[4,5,6]])
print(b)

[[1 2 3]
 [4 5 6]]


We can then apply those previous functions, but along different axes

In [97]:
b.sum(axis=0)

array([5, 7, 9])
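To see what the axis argument changes, here is the same 2x3 array reduced in different directions. The array is called grid here only to keep the example self-contained.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = grid.sum(axis=0)    # collapse the rows -> [5, 7, 9]
row_sums = grid.sum(axis=1)    # collapse the columns -> [6, 15]
total = grid.sum()             # no axis: sum over everything -> 21
col_means = grid.mean(axis=0)  # [2.5, 3.5, 4.5]
```

A handy way to think about it: the axis you pass is the one that disappears from the result.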

NumPy is also useful for creating arrays filled with zeros or ones


In [56]:
b = np.zeros((1,2))
print(b)                                

[[0. 0.]]


In [63]:
c = np.ones((2,2))   
print(c) 

[[1. 1.]
 [1. 1.]]


You can also make arrays of random integers

In [62]:
e = np.random.randint(0,10,[2,2])
print(e)

[[2 8]
 [4 6]]


And arrays of random numbers drawn from a standard normal distribution

In [54]:
f = np.random.randn(2,3)
print(f)

[[ 0.62453697  0.38675541  0.31681719]
 [ 0.703783   -0.43577772  0.44724003]]
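Random arrays differ on every run, which can make results hard to reproduce. One option, sketched below, is NumPy's Generator interface with a fixed seed. This is an alternative to the np.random.randint and np.random.randn calls above, not what the cells above use, and the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers.
rng = np.random.default_rng(seed=42)
ints = rng.integers(0, 10, size=(2, 2))     # integers in [0, 10)
normals = rng.standard_normal((2, 3))       # standard normal samples

rng_again = np.random.default_rng(seed=42)
same_ints = rng_again.integers(0, 10, size=(2, 2))

print(np.array_equal(ints, same_ints))      # True
```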


We can manipulate our arrays using a variety of functions


In [68]:
np.add(e,c)
e+c

array([[3., 9.],
       [5., 7.]])

In [71]:
e - b
np.subtract(e, b)

[[2. 8.]
 [4. 6.]]
[[2. 8.]
 [4. 6.]]


In [80]:
e * b
np.multiply(e, b)

array([[0., 0.],
       [0., 0.]])

In [76]:
e / c
np.divide(e, c)

array([[2., 8.],
       [4., 6.]])

One thing that will come in handy often is the ability to use boolean comparisons with arrays

In [6]:
1 > 0

True

In [23]:
1 < 0

False

In [24]:
1 == 0

False

In [15]:
x = np.array([1,2,3,4,5,6,7,8])
print(x)

[1 2 3 4 5 6 7 8]


In [16]:
boolean = x > 4
print(boolean)

[False False False False False  True  True  True]


In [20]:
boolean = x >= 4
print(boolean)

[False False False  True  True  True  True  True]


In [21]:
x[boolean]

array([4, 5, 6, 7, 8])
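Masks can also be combined. The sketch below uses & (and) and | (or) with parentheses around each comparison, counts matches with sum, and uses np.where to replace values. The variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])

between = x[(x > 2) & (x < 7)]    # both conditions hold -> [3, 4, 5, 6]
outside = x[(x < 3) | (x > 6)]    # either condition holds -> [1, 2, 7, 8]
count = (x > 4).sum()             # True counts as 1 -> 4

# np.where(condition, value_if_true, value_if_false)
capped = np.where(x > 4, 4, x)    # [1, 2, 3, 4, 4, 4, 4, 4]
```

Python's plain and / or do not work elementwise on arrays, which is why & and | are used here.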

The last thing I briefly want to cover is appending and concatenating. Sometimes, we want to add numbers to our array.

In [25]:
x

array([1, 2, 3, 4, 5, 6, 7, 8])

In [0]:
y = np.array([9,10])

In [32]:
np.concatenate([x,y])

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [34]:
z = np.array([11,12,13]).reshape(3,1)
print(z)

[[11]
 [12]
 [13]]


In [35]:
np.concatenate([x,y,z])

ValueError: ignored
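The error comes from the shapes: x and y are one-dimensional, while z has shape (3, 1), and np.concatenate needs all inputs to have the same number of dimensions. One possible fix, sketched here, is to flatten z with ravel before joining.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([9, 10])
z = np.array([11, 12, 13]).reshape(3, 1)

# Flatten z to one dimension so all three inputs match.
joined = np.concatenate([x, y, z.ravel()])
print(joined.shape)   # (13,)

# Two-dimensional arrays can be joined along a chosen axis instead.
side_by_side = np.concatenate([z, z], axis=1)
print(side_by_side.shape)   # (3, 2)
```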

In [0]:
import pandas as pd

In [0]:
data = pd.read_csv('sample_data/california_housing_train.csv')

In [0]:
data.describe()

statistic,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
count,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0,17000.0
mean,-119.562108,35.625225,28.589353,2643.664412,539.410824,1429.573941,501.221941,3.883578,207300.912353
std,2.005166,2.13734,12.586937,2179.947071,421.499452,1147.852959,384.520841,1.908157,115983.764387
min,-124.35,32.54,1.0,2.0,1.0,3.0,1.0,0.4999,14999.0
25%,-121.79,33.93,18.0,1462.0,297.0,790.0,282.0,2.566375,119400.0
50%,-118.49,34.25,29.0,2127.0,434.0,1167.0,409.0,3.5446,180400.0
75%,-118.0,37.72,37.0,3151.25,648.25,1721.0,605.25,4.767,265000.0
max,-114.31,41.95,52.0,37937.0,6445.0,35682.0,6082.0,15.0001,500001.0
