# Introduction to NumPy and Pandas

## What is NumPy?

NumPy is a Python package for fast, scalable numerical computation. Its core is a special array object that stores data in less memory and supports faster access and modification than ordinary Python list objects. It also provides a large number of useful mathematical functions for linear algebra, Fourier transforms, etc. We're going to focus mainly on the array object today.

## Installation of Packages
Before we dive into NumPy, we need to install the package and then import it. First, make sure that you have pip3 installed. pip is a tool that makes it really easy to install the numerous packages found on the Python Package Index. Chances are that your computer already has pip installed, but you need pip3 since we're working in Python 3.

Go to your terminal and type:

```terminal
python3 -m ensurepip --upgrade

```
The older `easy_install` command is deprecated, so the command above uses Python's built-in `ensurepip` module instead. If it succeeds, pip is available for your Python 3 installation. Now anytime you want to install a package, you type in the terminal:

```terminal
pip3 install insert_package_name_here
```
Or, if you want to install from within the Jupyter Notebook, type:
```terminal
!pip3 install insert_package_name_here
```
Try installing 'numpy' on your own!

## Importing Packages

You only have to install a new package once, using the terminal. However, installing a package does not automatically make it available in your program; you need to import it first. We are going to import NumPy under the abbreviated name np, so anytime we use a NumPy array or function, we can write np instead of typing the whole name (numpy).

In [None]:
# Note that there is no output when you import a package
import numpy as np
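
A quick way to confirm that the install and import both worked is to print the version string. This is only a small sketch; the variable name `numpy_version` is made up for illustration, and the value you see depends on your installation.

```python
import numpy as np

# np.__version__ is a string; its exact value depends on what pip installed
numpy_version = np.__version__
print("NumPy version:", numpy_version)
```

If the import fails with `ModuleNotFoundError`, the package was most likely installed for a different Python interpreter than the one running the notebook.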

## NumPy Arrays
The NumPy array is a homogeneous, multidimensional array. Let's break that down. 

## INSERT PIC OF NUMPY ARRAY VS. PYTHON LIST

1. Homogeneous 
    * Every element in a NumPy array is of the same datatype. 
        * As a result, an array needs much less memory than a List holding the same data.
    * They can be all integers, all floats, or all strings. 
    * What happens if there's a mix of datatypes in your NumPy array? 
        * You'll experiment shortly and see what happens.


2. Multidimensional
    * You can have 1D NumPy arrays, 2D NumPy matrices, and more! 
    * Multidimensional NumPy arrays are much easier to work with than multidimensional Python Lists, as you will see.

3. Array
    * An array is an ordered group of elements stored in contiguous memory. 
    * Python's built-in list is not an array in this sense. 
        * A list stores references to its elements, and the elements themselves can be scattered all across memory!
    * NumPy arrays have their elements stored in one continuous block of memory. 
        * The result is much faster access to elements!
        
For more information on how NumPy arrays work "under the hood", check out the [documentation](https://docs.scipy.org/doc/numpy/reference/internals.html).
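
To make the memory and contiguity points above more concrete, here is a minimal sketch that compares a list and an array holding the same 1000 integers. The names `py_list`, `np_arr`, and `list_bytes` are made up for this example, the dtype is fixed to `int64` so the array sizes are predictable, and the list figure is only a rough estimate that varies by Python version and platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # fix the dtype so the sizes are predictable

# Every element of the array occupies the same number of bytes
print(np_arr.itemsize)  # 8 bytes per int64 element
print(np_arr.nbytes)    # 8000 bytes of element data in total

# The list stores references; each int object takes extra memory on top of that.
# This is a rough estimate and varies by Python version and platform.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(list_bytes)

# The array's elements sit in one contiguous block of memory
print(np_arr.flags["C_CONTIGUOUS"])  # True
```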

Let's actually get started!

## Creating a NumPy Array

In [None]:
# There are many ways to create an array! 

# Creating an array from a list
lis = [[1,2,3],[4,5,6]]
arr = np.array(lis)
print(arr)
print(type(arr))

print("\n")  # \n creates a new line break

# Or creating an array using the arange function.
arr2 = np.arange(8)
print(arr2)
arr2 = np.arange(1, 9)
print(arr2)
arr2 = np.arange(1, 9, 2) 
print(arr2)

print("\n")
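
Besides `np.array` and `np.arange`, NumPy has a few other creation functions that come up often. The sketch below is not part of the original walkthrough; it just shows four of them, and the variable names are chosen for illustration.

```python
import numpy as np

ones = np.ones((2, 3))         # 2x3 array filled with 1.0
sevens = np.full((2, 2), 7)    # 2x2 array filled with the value 7
identity = np.eye(3)           # 3x3 identity matrix
points = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, endpoints included

print(ones.shape)       # (2, 3)
print(sevens.tolist())  # [[7, 7], [7, 7]]
print(points.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Note that `np.arange` takes a step size and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default.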


In [None]:
# The shape of a 2D array is (number of rows, number of columns).

# The shape is an attribute of a NumPy array object.
# That means you can get its value with dot notation (arr.shape) or with a function (np.shape(arr))
print("The shape of a 2D array is " + str(arr.shape))
# In the last print statement, try replacing 'arr.shape' with 'np.shape(arr)'
print("\n")
# If we want to change the shape of an array, we can use the reshape function.
arr2 = np.arange(1,9).reshape(2,4)
print(arr2)
arr2 = np.arange(1,9).reshape(2,-1) # What happens if one dimension is -1?
print(arr2)
print("\n")


# You can use the zeros function to create an array of place-holder zeros with a certain shape
arr3 = np.zeros((3,5))
print(arr3)
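
The `-1` in `reshape` is worth a closer look. Here is a small sketch, with made-up variable names, showing that NumPy infers the missing dimension from the total number of elements and raises an error when no whole number fits.

```python
import numpy as np

values = np.arange(1, 13)  # 12 elements

# -1 asks NumPy to infer that dimension from the total size
inferred = values.reshape(3, -1)
print(inferred.shape)  # (3, 4)

# np.shape(x) and x.shape report the same tuple
print(np.shape(inferred) == inferred.shape)  # True

# A shape that does not fit the number of elements raises ValueError
try:
    values.reshape(5, -1)  # 12 is not divisible by 5
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```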


### Explore:
 Use the next cell to briefly experiment with the shape of different 1D arrays and 3D arrays. 
 Also figure out what the properties 'size' and 'ndim' do without looking them up in the documentation. 

In [None]:
# We can also stack arrays on top of each other or next to each other
x = np.arange(0,10,2)                     
y = np.arange(5)   
print(x)
print(y)
print()

# For vstack, 1D arrays must have the same length; hstack simply joins 1D arrays end to end
xTopOfY = np.vstack([x,y])  
print(xTopOfY)
print() 
xNextToY = np.hstack([x,y])   
print(xNextToY)
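
To see when the shapes matter for stacking, here is a minimal sketch with one extra, shorter array. The name `short` and the flag `vstack_failed` are made up for this example.

```python
import numpy as np

x = np.arange(0, 10, 2)  # 5 elements
y = np.arange(5)         # 5 elements
short = np.arange(3)     # 3 elements

stacked = np.vstack([x, y])     # each 1D array becomes one row
joined = np.hstack([x, short])  # 1D arrays are joined end to end
print(stacked.shape)  # (2, 5)
print(joined.shape)   # (8,)

# Rows of different lengths cannot be stacked vertically
try:
    np.vstack([x, short])
    vstack_failed = False
except ValueError:
    vstack_failed = True
print(vstack_failed)  # True
```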

## NumPy Array Datatypes

In [None]:
# An important feature of NumPy arrays is that every element is of the same data type
# We can use the dtype attribute to find out which data type an array holds
arr = np.arange(1,9).reshape(2,4)
print(arr)
print("Every element in the array has datatype: " + str(arr.dtype))

print("\n")
# You can also cast every element to a given type by passing the type as a second argument
arr = np.array([[1,2], [3,4]], float)
print(arr)
print("Every element in the array has datatype: " + str(arr.dtype))
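
You can also convert an array that already exists by using the `astype` method. This is a small sketch with made-up variable names; note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

floats = np.array([1.7, -2.3, 3.9])
ints = floats.astype(np.int64)  # returns a new array; values are truncated toward zero

print(ints.tolist())  # [1, -2, 3]
print(floats.dtype)   # float64 (the original array is unchanged)
print(ints.dtype)     # int64
```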


### Explore:
Use the next cell to figure out what happens if you create an array with a mixture of types (float and int), (int and string), etc. What is the resulting datatype of the array? The behavior that you will observe is called upcasting. 

## Iterating through NumPy Arrays

In [None]:
# Now let's iterate through a NumPy array
a = np.arange(5, 35, 5).reshape((2,3))  # a 2x3 example array to iterate over
print(a)
print("\n")

for row in a:
    print(row)
print("\n")

for col in a.T: # T means transpose (switching rows and columns)
    print(col)
print("\n")
 
for element in a.flatten(): # flatten returns a 1D copy of a multidimensional array
    print(element)


## Arithmetic on NumPy Arrays

In [None]:
# Arithmetic operations on NumPy arrays are applied element by element
a = np.arange(5, 35, 5).reshape((2,3))
print(str(a) + "\n")
b = np.array([[3, 2, 4], [8, 7, 14]])
print(str(b) + "\n")
print(a-b)

print("\n")

print(a>15)
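
Here is a short sketch of a few more element-by-element operations on the same 2x3 array, with made-up variable names. It also contrasts the element-wise `*` with the matrix product `@`, since the two are easy to mix up.

```python
import numpy as np

a = np.arange(5, 35, 5).reshape((2, 3))  # [[5, 10, 15], [20, 25, 30]]

scaled = a * 2            # the scalar is applied to every element
elementwise = a * a       # element-by-element product, same shape as a
matrix_product = a @ a.T  # (2, 3) @ (3, 2) gives a (2, 2) matrix product

print(scaled.tolist())          # [[10, 20, 30], [40, 50, 60]]
print(elementwise.shape)        # (2, 3)
print(matrix_product.tolist())  # [[350, 800], [800, 1925]]

# A comparison gives a Boolean array; True counts as 1 when summed
count_above_15 = int((a > 15).sum())
print(count_above_15)  # 3
```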

## NumPy Indexing 

In [None]:
# My favorite part of NumPy is the powerful indexing you can do when selecting in an array

# Creating a list of the squares of 1 through 12 using a list comprehension (a compact way of making a list)
squares = [(i+1)**2 for i in range(12)]

# With a plain Python list, selecting elements at multiple indices is a bit clumsy
# Say we want to print the first, fifth, ninth, and fourth elements in that order. 
print([squares[0], squares[4], squares[8], squares[3]])

# In NumPy, we can use an array of indices to select certain values in an array
squares_arr = np.array(squares)
indices = np.array([0,4,8,3]) # indices is just an array of integers
print(squares_arr[indices])

print()

# If the values are all next to each other, we can use slicing to get the values too! 
print(squares_arr[2:7])
print()
# Select squares_arr values which are greater than 20 
# This is called Boolean Indexing
print(squares_arr[squares_arr>20])
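
One difference between these indexing styles is easy to miss: a slice gives you a view onto the original data, while indexing with an array of indices gives you a copy. The sketch below uses made-up names `window` and `picked` to show this.

```python
import numpy as np

squares_arr = np.array([(i + 1) ** 2 for i in range(12)])

window = squares_arr[2:7]                     # slicing returns a view
picked = squares_arr[np.array([0, 4, 8, 3])]  # index-array selection returns a copy

print(np.shares_memory(squares_arr, window))  # True
print(np.shares_memory(squares_arr, picked))  # False

window[0] = -1         # writes through to squares_arr
picked[0] = -5         # only changes the copy
print(squares_arr[2])  # -1
print(squares_arr[0])  # 1
```

If you need an independent array from a slice, call `.copy()` on it.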


In [None]:
squares_arr = squares_arr.reshape((3,4))
print(squares_arr)
print("\n")

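# 3rd row, 3rd col (a single [row, col] index avoids building an intermediate row)
print(squares_arr[2, 2])
print("\n")

# printing all entries in the fourth column
print(squares_arr[:,3])

# Index arrays are paired up element by element: this selects entries (0, 1) and (2, 2)
row_indices = np.array([0,2])
col_indices = np.array([1,2])
print(squares_arr[row_indices, col_indices])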

# Try printing 49, 81, 16 in one list using the above array indexing technique
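
Index arrays are paired up position by position, which is not the same as selecting every combination of the given rows and columns. The sketch below, with made-up names `grid`, `paired`, and `block`, contrasts the two; `np.ix_` is one way to get the full sub-grid.

```python
import numpy as np

grid = np.array([(i + 1) ** 2 for i in range(12)]).reshape((3, 4))
rows = np.array([0, 2])
cols = np.array([1, 2])

# Paired indexing: picks entries (0, 1) and (2, 2)
paired = grid[rows, cols]
print(paired.tolist())  # [4, 121]

# np.ix_ builds an open mesh: every listed row crossed with every listed column
block = grid[np.ix_(rows, cols)]
print(block.tolist())   # [[4, 9], [100, 121]]
```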



## Wrapping Up NumPy

We just finished up the basics of NumPy. Although we didn't cover all that NumPy has to offer, you can see that its arrays are faster, more memory-efficient, and more flexible to index than a list. Now we move on to Pandas, which is built on top of NumPy. A good knowledge of NumPy is helpful for understanding how Pandas works. 

## What is Pandas? 

## Sources:
Adapted from NumPy's QuickStart Tutorial
https://docs.scipy.org/doc/numpy-dev/user/quickstart.html