## Basics of Primary Data Science Libraries - DataBrains

We're going to go over some of the basics of the main Python libraries that data scientists use. This overview is by no means exhaustive.

## NumPy

In data science, we deal a lot with matrices and linear algebra. This is because the data itself is often represented as a matrix: each row is an individual observation, and each column is a particular feature of the data. We'll cover this in more detail when we get to Pandas.
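
As a small illustration of that layout, here is a sketch with made-up numbers. The feature meanings (height, weight) and all values are hypothetical, chosen only to show which axis is which.

```python
import numpy as np

# Hypothetical data: 4 observations (rows), 2 features (columns).
# Column 0 is height in cm, column 1 is weight in kg (made-up values).
data = np.array([
    [170.0, 65.0],
    [182.0, 80.0],
    [158.0, 52.0],
    [175.0, 72.0],
])

# shape is (number of rows, number of columns)
n_observations, n_features = data.shape
print(n_observations, n_features)

# One observation is one row; one feature is one column.
first_observation = data[0]
heights = data[:, 0]
print(first_observation)
print(heights)
```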

In [3]:
# import NumPy under its conventional alias
import numpy as np

# create a numpy array from a list
lst = [1,2,3]
np_arr = np.array(lst)
np_arr

array([1, 2, 3])

In [4]:
# see the shape of the array
# (3,) means a one-dimensional array with 3 entries;
# it is neither a row vector nor a column vector
# (a column vector would have shape (3, 1))
print(np_arr.shape)

(3,)
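
The shape `(3,)` describes a one-dimensional array, which is not the same thing as a 3x1 column vector. Here is a minimal sketch of the difference using `reshape`; the variable names are just for illustration.

```python
import numpy as np

flat = np.array([1, 2, 3])      # one-dimensional array
column = flat.reshape(3, 1)     # 3 rows, 1 column
row = flat.reshape(1, 3)        # 1 row, 3 columns

print(flat.shape, flat.ndim)      # (3,) 1
print(column.shape, column.ndim)  # (3, 1) 2
print(row.shape, row.ndim)        # (1, 3) 2
```

The distinction starts to matter once you multiply arrays together, because the number of dimensions affects how the shapes have to line up.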


In [6]:
# now let's create a 3x3 matrix
# pass a list of lists, one inner list per row
three_by_three = np.array([[1,2,3],[4,5,6],[7,8,9]])
three_by_three

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

We index NumPy arrays in much the same way as lists. Indexing a two-dimensional array works like indexing a list of lists.

In [8]:
# get the entry in the first row, third column
# (three_by_three[0, 2] is equivalent)
three_by_three[0][2]

3

What if you wanted to get the third column of the matrix?
NumPy lets us select rows and columns at once by separating the row index and the column index with a comma:

In [9]:
# all rows of the third column (index 2)
# the colon slices just like with lists
three_by_three[:,2]

array([3, 6, 9])
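
The same comma notation also selects whole rows and sub-blocks. Here is a short sketch on the same 3x3 matrix; the variable names are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

second_row = m[1, :]    # all columns of the second row
top_left = m[:2, :2]    # first two rows, first two columns
entry = m[0, 2]         # same element as m[0][2]

print(second_row)   # [4 5 6]
print(top_left)
print(entry)        # 3
```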

There are more complex ways to index arrays; we have only covered the basics. You can find more on this here:
https://github.com/trapatsas/Python-Data-Science-and-Machine-Learning-Bootcamp/blob/master/Python-for-Data-Analysis/NumPy/Numpy%20Indexing%20and%20Selection.ipynb

We can also do arithmetic on NumPy arrays, much like we can with matrices. Just as with matrices, the dimensions have to be compatible.

In [14]:
# addition (entry by entry)
A = np.array([[1,2],[3,4]])
B = np.array([[4,5],[7,5]])
C = A + B
print("sum: \n {}".format(C))

# matrix multiplication (not element-wise)
D = A.dot(B)
print("product: \n {}".format(D))

sum: 
 [[ 5  7]
 [10  9]]
product: 
 [[18 15]
 [40 35]]
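
One point that is easy to trip over: `*` multiplies arrays entry by entry, while `.dot` (or the `@` operator) performs matrix multiplication. The sketch below contrasts the two on the same `A` and `B`, and shows what happens when shapes are incompatible. The extra matrix `C` and the flag `mismatch_raised` are illustrative names, not part of the original example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[4, 5], [7, 5]])

elementwise = A * B      # multiplies matching entries
matrix_product = A @ B   # same result as A.dot(B)

print("element-wise: \n {}".format(elementwise))
print("matrix product: \n {}".format(matrix_product))

# Incompatible shapes: a (2, 2) matrix times a (3, 2) matrix is undefined,
# so NumPy raises a ValueError.
C = np.ones((3, 2))
mismatch_raised = False
try:
    A @ C
except ValueError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```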


These are the basics of NumPy. There are other useful things you can do, including generating an evenly spaced array between two numbers, generating a matrix of numbers sampled from the normal distribution, reshaping an array, and more. For a good tutorial/reference guide, go here: https://github.com/trapatsas/Python-Data-Science-and-Machine-Learning-Bootcamp/blob/master/Python-for-Data-Analysis/NumPy/NumPy%20Arrays.ipynb
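
To give a flavor of those three operations, here is a brief sketch. It assumes a NumPy version recent enough to provide `np.random.default_rng` (1.17 or later). The seed and the sizes are arbitrary choices made for the example.

```python
import numpy as np

# 5 evenly spaced values from 0 to 1, endpoints included
evenly_spaced = np.linspace(0, 1, 5)
print(evenly_spaced)

# 2x3 matrix of samples from the standard normal distribution
rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily
samples = rng.standard_normal((2, 3))
print(samples.shape)

# rearrange 6 values into 2 rows and 3 columns
reshaped = np.arange(6).reshape(2, 3)
print(reshaped)
```

Note that `reshape` only works when the total number of entries stays the same: 6 values fit into a 2x3 or 3x2 array, but not into a 4x2 one.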