## NumPy Learning

This is my attempt to learn NumPy from the ground up instead of just learning it on the go.


### Notes

[Beginning read](https://www.datacamp.com/community/tutorials/python-numpy-tutorial#gs.h3DvLnk)
- A NumPy array is comparable to a Python list. (NumPy = Numeric Python)
- At the structural level, an array is basically a combination of a memory address, a data type, a shape and strides.
    - The strides are the number of bytes that must be skipped in memory to get to the next element along each axis. If your strides are (10,1), you move 1 byte to get to the next column and 10 bytes to get to the next row.
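- Axis numbering
    - axis = 0 indexes the rows (so an operation along axis 0, e.g. `sum(axis=0)`, collapses the rows and gives one result per column)
    - axis = 1 indexes the columns
    - axis = 2 is the next dimension in a 3-D array (you get the idea)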
- There are different data types we can explicitly pass when creating an `np.array`. [Refer to this cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)
- To load data directly:
    - `genfromtxt()`
    - `loadtxt()` (least preferred)
    - or read it with pandas and convert it to a NumPy array (imho)
- Functions for IO are listed in the [docs here](https://docs.scipy.org/doc/numpy/reference/routines.io.html)
- When operating on arrays with different shapes, NumPy applies *broadcasting* automatically (see NumPy Broadcasting below).

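Here is a minimal sketch of the axis and stride notes above. It uses small, made-up arrays (`a` and `b` are illustrative names only):

```python
import numpy as np

# A small 2-D array: 2 rows, 3 columns -> [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

# axis=0 collapses the rows: one result per column
col_sums = a.sum(axis=0)   # array([3, 5, 7])

# axis=1 collapses the columns: one result per row
row_sums = a.sum(axis=1)   # array([ 3, 12])

# A C-ordered uint8 array with 10 columns has strides (10, 1):
# 1 byte to reach the next column, 10 bytes to reach the next row
b = np.zeros((3, 10), dtype=np.uint8)
print(b.strides)           # (10, 1)

# Transposing returns a view with the strides swapped; no data is copied
print(b.T.strides)         # (1, 10)
```

The transpose example shows why strides matter. The same block of memory can be read as a different shape just by changing how many bytes each step skips.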


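Here is a small sketch of the two NumPy loaders. It assumes a comma-separated file; the CSV text below is made up and is read from an in-memory `io.StringIO` instead of a real file:

```python
import io

import numpy as np

# Hypothetical CSV contents standing in for a file on disk
csv_text = """a,b,c
1,2,3
4,,6
"""

# genfromtxt tolerates missing fields: the empty 'b' in the second row becomes nan
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data.shape)  # (2, 3)

# loadtxt is simpler and expects every value to be present
clean = np.loadtxt(io.StringIO("1,2,3\n4,5,6\n"), delimiter=",")
print(clean.shape)  # (2, 3)
```

The ability to handle missing values is the main reason `genfromtxt()` is usually preferred over `loadtxt()` for messy files.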

```python
import numpy as np
```

```python
ar = np.array([1, 2, 3, 4, 5])
# Print out the buffer (a memoryview) that holds the array's data
print(ar.data)

# Print out the shape of `ar`
print(ar.shape)

# Print out the data type of `ar`
print(ar.dtype)

# Print out the strides of `ar`
print(ar.strides)
```

```
<memory at 0x110b32a70>
(5,)
int64
(8,)
```


```python
# np.bytes_ is the fixed-width byte-string type (np.string_ was an alias, removed in NumPy 2.0)
st_ar = np.array(['aaasdf', 'ab', 'gdf'], dtype=np.bytes_)
# Print out the buffer (a memoryview) that holds the array's data
print(st_ar.data)

# Print out the shape of `st_ar`
print(st_ar.shape)

# Print out the data type of `st_ar`
print(st_ar.dtype)

# Print out the strides. Note that each element takes the bytes required to store the longest string
print(st_ar.strides)
```

```
<memory at 0x110b32a70>
(3,)
|S6
(6,)
```

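Every element of a fixed-width `S` array occupies the same number of bytes. As a result, a longer value assigned later does not fit. Here is a quick check of that behaviour; the replacement value is just an arbitrary example:

```python
import numpy as np

st_ar = np.array(['aaasdf', 'ab', 'gdf'], dtype=np.bytes_)
print(st_ar.dtype)   # |S6: each element reserves 6 bytes

# Assigning a longer value silently truncates it to the 6-byte width
st_ar[1] = b'longer_than_six'
print(st_ar[1])      # b'longer'
```

If values can grow, create the array with a wider dtype up front (for example `dtype='S20'`).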

```python
# np.bytes_ replaces np.string_, which was removed in NumPy 2.0
np.ones((2, 3, 1, 3), dtype=np.bytes_)
```

```
array([[[[b'1', b'1', b'1']],

        [[b'1', b'1', b'1']],

        [[b'1', b'1', b'1']]],


       [[[b'1', b'1', b'1']],

        [[b'1', b'1', b'1']],

        [[b'1', b'1', b'1']]]], dtype='|S1')
```

```python
# Create a 2x2 array filled with the value 7
np.full((2, 2), 7)
```

```
array([[7, 7],
       [7, 7]])
```

```python
# Create evenly spaced values with a given step, like Python's range() (the stop value 65 is excluded)
np.arange(10, 65, 5)
```

```
array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60])
```

```python
# Create a given number (9) of evenly spaced values between start and stop, both included by default
np.linspace(1, 9, 9)
```

```
array([1., 2., 3., 4., 5., 6., 7., 8., 9.])
```

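The two functions differ in what you specify. With `arange` you choose the step; with `linspace` you choose the number of samples. Here is a short comparison using values chosen to be exact in floating point:

```python
import numpy as np

# arange: you choose the step, and the stop value is excluded
a = np.arange(0, 1, 0.25)     # [0.  , 0.25, 0.5 , 0.75]

# linspace: you choose the sample count, and the stop value is included by default
b = np.linspace(0, 1, 5)      # [0.  , 0.25, 0.5 , 0.75, 1.  ]

# endpoint=False drops the stop value; retstep=True also returns the spacing
c, step = np.linspace(0, 1, 4, endpoint=False, retstep=True)
print(c, step)
```

For non-integer steps, `linspace` is generally the safer choice. With `arange`, floating-point rounding can make the number of elements hard to predict.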
```python
# 21 rows, 6 columns, ones on the main diagonal and zeros elsewhere
np.eye(21, 6)
```

```
array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
```

```python
# Square 10x10 identity matrix
np.identity(10)
```

```
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
```

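`np.identity(n)` always produces a square matrix, while `np.eye` is more flexible. It accepts a separate column count and a diagonal offset `k`. Here is a small sketch; the sizes are arbitrary:

```python
import numpy as np

# identity(n) is always square with ones on the main diagonal
i3 = np.identity(3)

# k > 0 shifts the diagonal above the main one, k < 0 below it
upper = np.eye(3, k=1)      # ones at (0, 1) and (1, 2)
lower = np.eye(3, 4, k=-1)  # 3x4, ones at (1, 0) and (2, 1)
print(upper)
print(lower)
```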
**NOTE:** When working with larger datasets, the following attributes can be handy for understanding an array's size and memory footprint.

```python
# Print the number of dimensions of `ar`
print(ar.ndim)

# Print the number of elements in `ar`
print(ar.size)

# Print information about the memory layout of `ar`
print(ar.flags)

# Print the length of one array element in bytes
print(ar.itemsize)

# Print the total bytes consumed by the elements of `ar`
print(ar.nbytes)
```

```
1
5
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
8
40
```

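Here is a rough sketch of how these attributes help estimate memory use before an array gets too big. The 1000 x 1000 size is arbitrary:

```python
import numpy as np

# float64 is the default dtype: 8 bytes per element
big = np.zeros((1000, 1000))
print(big.nbytes)          # 8000000 bytes, i.e. size * itemsize (about 8 MB)

# A smaller dtype shrinks the footprint proportionally
small = big.astype(np.float32)
print(small.nbytes)        # 4000000
```

Because `nbytes` is just `size * itemsize`, you can estimate the memory a dataset will need from its shape and dtype before loading it.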

### NumPy Broadcasting 

(very important in practical data science)

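1. To make sure that broadcasting succeeds, the shapes of your arrays need to be compatible. NumPy compares the shapes dimension by dimension, starting from the trailing (rightmost) one. Two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension counts as 1.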

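Here is the compatibility rule written out as a small pure-Python function. `broadcast_shape` is a hypothetical helper for illustration, not a NumPy API. It is cross-checked against `np.broadcast_shapes`, which assumes NumPy 1.20 or newer:

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shapes.

    Shapes are aligned on their trailing dimensions; missing leading
    dimensions count as 1. Sizes must be equal or one of them must be 1,
    and the result takes the larger size.
    """
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same length
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)


print(broadcast_shape((3, 4), (4,)))           # (3, 4)
print(broadcast_shape((3, 4), (5, 1, 4)))      # (5, 3, 4)
print(np.broadcast_shapes((3, 4), (5, 1, 4)))  # (5, 3, 4), NumPy agrees
```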
```python
# Shapes (N, M) and (M,) are compatible: y is added to every row of x
x = np.ones((3, 4))
print(x.shape)
print(x)
y = np.random.random((4,))
print(y)

print(x + y)
```

```
(3, 4)
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[0.03633351 0.75439175 0.01032737 0.25677923]
[[1.03633351 1.75439175 1.01032737 1.25677923]
 [1.03633351 1.75439175 1.01032737 1.25677923]
 [1.03633351 1.75439175 1.01032737 1.25677923]]
```


```python
x = np.ones((3, 4))
print(x.shape)
y = np.arange(4)
print(y.shape)

print(x - y)
```

```
(3, 4)
(4,)
[[ 1.  0. -1. -2.]
 [ 1.  0. -1. -2.]
 [ 1.  0. -1. -2.]]
```


```python
# (3, 4) + (5, 1, 4): the result has shape (5, 3, 4)
x = np.ones((3, 4))
print(x)
y = np.random.random((5, 1, 4))
print(y)

print(x + y)
```

```
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[[1.22395417e-01 4.41625292e-01 9.14463037e-01 4.18771330e-04]]

 [[7.75391499e-01 1.63862166e-01 2.29606305e-01 1.52346782e-02]]

 [[4.63351438e-01 2.11646036e-01 6.82347742e-01 6.45281786e-01]]

 [[5.65436517e-01 9.82006835e-01 3.58801873e-01 8.58727981e-01]]

 [[6.17594810e-01 9.44329798e-01 8.32953733e-01 2.81887730e-01]]]
[[[1.12239542 1.44162529 1.91446304 1.00041877]
  [1.12239542 1.44162529 1.91446304 1.00041877]
  [1.12239542 1.44162529 1.91446304 1.00041877]]

 [[1.7753915  1.16386217 1.2296063  1.01523468]
  [1.7753915  1.16386217 1.2296063  1.01523468]
  [1.7753915  1.16386217 1.2296063  1.01523468]]

 [[1.46335144 1.21164604 1.68234774 1.64528179]
  [1.46335144 1.21164604 1.68234774 1.64528179]
  [1.46335144 1.21164604 1.68234774 1.64528179]]

 [[1.56543652 1.98200684 1.35880187 1.85872798]
  [1.56543652 1.98200684 1.35880187 1.85872798]
  [1.56543652 1.98200684 1.35880187 1.85872798]]

 [[1.61759481 1.9443298  1.83295373 1.28188773]
  [1.61759481 1.9443298  1.83295373 1.28188773]
  [1.61759481 1.9443298  1.83295373 1.28188773]]]
```

**Along each dimension, the result takes the larger of the two sizes (after the shapes are aligned from the right), so (3, 4) and (5, 1, 4) combine into (5, 3, 4).**

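Broadcasting can also stretch both operands at once. For example, it can combine a column vector with a row vector, the "NxM + Mx1" style case. Here is a minimal sketch with arbitrary small arrays, followed by a pair of shapes that cannot broadcast:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

# Both operands are stretched: (3, 1) and (4,) -> (3, 4)
table = col * 10 + row
print(table.shape)                 # (3, 4)
print(table)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

# Trailing sizes 4 and 3 differ and neither is 1, so NumPy raises ValueError
try:
    np.ones((3, 4)) + np.ones(3)
except ValueError as err:
    print("cannot broadcast:", err)
```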
```python
# Every element is 1, so the standard deviation is 0 and the mean is 1
y = np.ones((5, 1, 4))
print("STD: {}".format(np.std(y)))
print("MEAN: {}".format(np.mean(y)))
```

```
STD: 0.0
MEAN: 1.0
```


```python
x = np.zeros((3, 5))
z = np.ones((3, 5))
```

```python
print(x)
print(z)
```

```
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
```


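```python
# False: x has shape (3, 5) while y has shape (5, 1, 4)
print(np.array_equal(x, y))
# False: same shape, but zeros vs ones
print(np.array_equal(x, z))
```

```
False
False
```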

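`np.array_equal` is only one of several ways to compare arrays. Here is a short sketch contrasting it with element-wise `==` and the tolerance-based `np.allclose`, using the same zeros/ones arrays as above:

```python
import numpy as np

x = np.zeros((3, 5))
z = np.ones((3, 5))

# array_equal returns a single bool: True only if shapes AND all elements match
print(np.array_equal(x, x.copy()))      # True
print(np.array_equal(x, z))             # False
print(np.array_equal(x, np.zeros(5)))   # False: different shapes, no error

# == compares element by element (with broadcasting) and returns a bool array
print((x == 0).all())                   # True

# For floating-point results, compare with a tolerance instead of exact equality
print(0.1 + 0.2 == 0.3)                 # False
print(np.allclose(0.1 + 0.2, 0.3))      # True
```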