# Intro to numpy

####  Review and Outline

Great work! We have made it this far: we know some basic calculations, the built-in data types and structures (lists, tuples, strings, dictionaries), and some key operations such as `if`/`else` conditionals and `for` loops.

Where are we going now? We will get into one of the key scientific computing packages in python: **numpy**.

What is numpy? The name is short for **Numerical Python**. It can be used for high-performance computing and data analysis.
* **Efficiency**: it provides `ndarray`, a very efficient data structure for this type of computing. Imagine needing to run calculations on more than 200k rows with 10k columns over and over again.
* **Data analysis**: although it does not itself provide high-level data-analysis functions the way `pandas` does, understanding it will help us use the tools in pandas with less pain.

[This notebook largely follows the discussion in the Book.](https://nyudatabootcamp.gitbooks.io/data-bootcamp/content/py-fun2.html)

#### Python

First we need to import the `numpy` package. 

Then we will learn the key data structures in **numpy** and their attributes and methods. Moreover, we will learn how to select data in **`ndarray`** and then do computations afterwards.

**Buzzwords.** `ndarray`

---
## Basics

The cell below imports the package `numpy`; the `as np` part says to call it `np` (our alias).
This just simplifies our life: instead of always typing `numpy`, we just
type `np`. If you're lost on this, go back to our chapter on [importing packages](https://nyudatabootcamp.gitbooks.io/data-bootcamp/content/packages.html).

Let's first get to know the most important data structure in `numpy`.

In [None]:
import numpy as np

### Array

The `ndarray` is the primary building block of numpy. It lets us perform mathematical computations efficiently, using syntax similar to the equivalent scalar operations we learned in python fundamental notebook 1. So let's create an array object via the `array` function in `numpy`.
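As a minimal sketch of what creating an array looks like (the name `arr1` is used later in this notebook, but the five values here are only illustrative assumptions):

```python
import numpy as np

# Build a 1-d array from a plain Python list (illustrative values).
arr1 = np.array([1, 2, 3, 4, 5])

print(type(arr1))  # <class 'numpy.ndarray'>
print(arr1.ndim)   # 1, i.e. a one-dimensional array
print(arr1)        # [1 2 3 4 5]
```

Note that the list goes inside `np.array(...)`; the result prints without commas, which is a quick way to tell an array from a list.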

In [None]:
#create an array


In [None]:
# Let's create another array


Now we can do some simple computations like we've done for scalars in python fundamental notebook 1.
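Here is a small sketch of what these computations might look like. The name `arr2` and all the values are assumptions made for illustration; any two arrays of the same length would behave the same way.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([10, 20, 30, 40, 50])  # hypothetical second array

total = arr1 + arr2    # element-wise addition
product = arr1 * arr2  # element-wise multiplication (not a dot product)

print(total)    # [11 22 33 44 55]
print(product)  # [ 10  40  90 160 250]
```

Compare this with lists, where `+` would glue the two lists together instead of adding the numbers.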

In [None]:
#add the arrays


In [None]:
#multiply the arrays


In [None]:
#look at shape


It seems that something is missing after the comma. Why? Is it wrong or undefined?

No, it is not wrong: a shape like `(n,)` describes a 1-d array, which simply has no second dimension. It can, however, lead to unexpected results in computations, especially in operations that mix matrices with this type of array. So we recommend using the `reshape` method in `numpy` to specify the second dimension as 1.
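To make the difference concrete, here is a short sketch comparing the shape `(5,)` with `(5, 1)`. The values in `arr1` and the name `col` are assumptions for illustration only.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
print(arr1.shape)          # (5,)   one dimension of length 5

col = arr1.reshape(5, 1)   # explicit 5-by-1 column
print(col.shape)           # (5, 1)

# Transposing a 1-d array changes nothing,
# while transposing the 5-by-1 array gives a 1-by-5 row.
print(arr1.T.shape)        # (5,)
print(col.T.shape)         # (1, 5)

# An "unexpected result": mixing the two shapes produces a 5-by-5 array
# instead of adding element by element.
print((arr1 + col).shape)  # (5, 5)
```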

In [None]:
#reshape arr1


In [None]:
# reshape b


Three more ways to initialize 1-d or 2-d arrays:
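A minimal sketch of these three initializers, assuming `np.zeros`, `np.ones`, and `np.eye` are the functions meant here (the variable names and sizes are made up for illustration):

```python
import numpy as np

zeros_2x3 = np.zeros((2, 3))  # 2-by-3 array of zeros; the shape is passed as a tuple
ones_vec = np.ones(4)         # 1-d array holding four ones
identity3 = np.eye(3)         # 3-by-3 array with ones on the diagonal, zeros elsewhere

print(zeros_2x3.shape)  # (2, 3)
print(ones_vec.shape)   # (4,)
print(identity3)
```

All three return floating-point arrays by default.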

In [None]:
# Initialize an array with zeros


In [None]:
# Initialize an array with ones


In [None]:
# Initialize an array with ones only on the diagonal


In fundamental notebook 2, we learned about the `range` object when using it with for loops. Here we present the `numpy` array versions of it.
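As a quick sketch (start, stop, and step values are chosen only for illustration): `np.arange` works like `range` but returns an array, while `np.linspace` asks for a number of evenly spaced points and includes the end point.

```python
import numpy as np

# Like range(0, 10, 2): start at 0, stop before 10, step by 2.
steps = np.arange(0, 10, 2)
print(steps)  # [0 2 4 6 8]

# Five evenly spaced points from 0 to 1, end point included.
grid = np.linspace(0, 1, 5)
print(grid)   # values 0, 0.25, 0.5, 0.75, 1
```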

In [None]:
#arange


In [None]:
#linspace


### Transpose an array

In `numpy`, transposing a 2-d array is super easy and fast via `.T` (for a 1-d array, `.T` simply returns the array unchanged).
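A short sketch of both spellings, using a 2-by-3 array with the same values as `arr3` in the next cell (the names `m`, `long_form`, and `short_form` are just for illustration):

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns

long_form = m.transpose()  # method call
short_form = m.T           # shorthand attribute, same result

print(m.shape)           # (2, 3)
print(short_form.shape)  # (3, 2)
print(short_form)        # rows are now [1 4], [2 5], [3 6]
```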


In [None]:
#transpose
arr3 = np.array([[1, 2, 3], [4, 5, 6]])


In [None]:
#shorthand method


### A Gentle Touch on Broadcasting

Arrays with different shapes cannot, in general, be added, subtracted, or otherwise combined element by element.

One way to overcome this is to (conceptually) duplicate the smaller array so that it has the same dimensionality and size as the larger array. This is called array **broadcasting**; `numpy` applies it automatically when performing array arithmetic, which can greatly reduce and simplify your code.

For example, what will be the result?

In [None]:
# add 2


It broadcasts the scalar value **2** five times and adds it to each value in **arr1**.
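The sketch below spells this out, assuming `arr1` holds five illustrative values. The second half (the names `table` and `row`) is a hypothetical extra example showing the same idea with a row being stretched over a 2-d array.

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])

shifted = arr1 + 2                # the scalar 2 is broadcast to every element
by_hand = arr1 + np.full(5, 2)    # the same thing written out explicitly
print(shifted)                    # [3 4 5 6 7]

# Broadcasting a 1-d row across each row of a 2-d array.
table = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
print(table + row)                # rows become [11 22 33] and [14 25 36]
```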

---
### Time to practice

**Exercises.** Initialize a 4 by 1 array filled with the number 2 and name it arrE1.

In [None]:
arrE1 = ''

**Exercises.** Initialize a 1 by 4 array filled with the number 3 and name it arrE2.

In [None]:
arrE2 = ''

**Exercises.** Can you perform an element-wise addition of arrE1 and arrE2?

**Exercises (challenging).** How would you create a 3 by 3 array with zeros on the diagonal and 2 everywhere else?

In [None]:
arrE3 = ''

---
## Slicing

Slicing a `numpy` array works much like slicing a list. Let's first define a two-dimensional array and then review what we have learned.


In [None]:
arr4=np.array([[2,3,4],[8,5,7]])
arr4

How do we get the number **3** from the 2-dimensional array above?
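One possible answer, sketched with the same `arr4` (the names `a` and `b` are only for illustration). Counting starts at 0, so the number 3 sits in row 0, column 1.

```python
import numpy as np

arr4 = np.array([[2, 3, 4], [8, 5, 7]])

a = arr4[0, 1]   # row and column in a single pair of brackets
b = arr4[0][1]   # first take row 0, then take element 1 of that row

print(a, b)      # 3 3
```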

In [None]:
#slice row, col


In [None]:
#slice row, then col


In [None]:
#slice using :


Can you figure out why this line of code returns only one number instead of both 3 and 4? The slice `1:2` stops before position 2, so only position 1 is selected. Be careful when moving to pandas: the position-based `iloc` follows this same convention, but the label-based `loc` includes the end label. Indexing hassles like this differ between data structures and can cause errors that are hard to identify.

Let's see an example first; we will cover more details in the next "intro to pandas" notebook.
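A small sketch of the slice in question, assuming the cell above was `arr4[0, 1:2]` (the names `one` and `two` are made up for illustration):

```python
import numpy as np

arr4 = np.array([[2, 3, 4], [8, 5, 7]])

one = arr4[0, 1:2]  # positions 1 up to, but not including, 2
two = arr4[0, 1:3]  # positions 1 and 2

print(one)          # [3]
print(two)          # [3 4]
print(one.shape)    # (1,)  still an array, not a plain number
```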

In [None]:
import pandas as pd

arr4_datafram=pd.DataFrame(arr4)
arr4_datafram.iloc[0,1:2]

In addition, we can keep using a **forward** counter, a **backward** counter, and the **:** operator when selecting data, just as we did with lists and strings.
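For instance, here is a sketch with the same `arr4` (the variable names on the left are just illustrative):

```python
import numpy as np

arr4 = np.array([[2, 3, 4], [8, 5, 7]])

last = arr4[-1, -1]         # backward counter: last row, last column
first_col = arr4[:, 0]      # ':' alone means "every row"
second_row = arr4[1, :]     # forward counter for the row, every column
last_two = arr4[:, -2:]     # every row, last two columns

print(last)        # 7
print(first_col)   # [2 8]
print(second_row)  # [8 5 7]
print(last_two)    # rows [3 4] and [5 7]
```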

---
## Useful Math Methods in Numpy


### Elementwise Methods

Remember in python fundamental notebook 1, when we tried to compute the log of a scalar, python returned an error saying it was not defined. That is expected: math operations like log, exp and so on are not built into python itself; they are provided by packages such as `numpy`.

Let's see the following examples...
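A minimal sketch, using an illustrative array `x` (not one defined elsewhere in this notebook). Each function is applied to every element separately and returns an array of the same shape.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

logs = np.log(x)    # natural log of each element
exps = np.exp(x)    # e raised to each element
roots = np.sqrt(x)  # square root of each element

print(roots)        # [1. 2. 3.]
print(logs[0])      # 0.0, since log(1) = 0
print(np.log(2.0))  # the same functions also accept a single scalar
```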

In [None]:
#log


In [None]:
#e^x


In [None]:
#sqrt


### Array-wise Operation

In [None]:
arr3

What will we get in the following?

In [None]:
#sum the array


Interesting, it only returns one number which is the sum of all the elements of the array. 

But can we perform row or column sum? 

Yes, we can...

In [None]:
arr3

In [None]:
# Column Sum -- axis 0


In [None]:
# Row Sum -- axis 1


The meaning of **axis** in the function calls above may seem confusing right now. Let's remember one principle: when setting axis, always think about the operation first, whether it will be done across columns or across rows. If the former, set axis = 1; otherwise, set axis = 0.

And we'll see more examples of this in the next "intro to pandas" notebook.
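The three sums can be sketched as follows, using the values of `arr3` defined earlier (the names `total`, `col_sums`, and `row_sums` are illustrative):

```python
import numpy as np

arr3 = np.array([[1, 2, 3], [4, 5, 6]])

total = arr3.sum()           # every element added together
col_sums = arr3.sum(axis=0)  # one sum per column (adds down the rows)
row_sums = arr3.sum(axis=1)  # one sum per row (adds across the columns)

print(total)     # 21
print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
```

A handy check: the axis you name is the one that disappears. A 2-by-3 array summed with `axis=0` leaves 3 numbers; summed with `axis=1` it leaves 2.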

---
### Time to practice

**Exercises.** How do we compute the **column** mean?

**Exercises.** How do we compute the **column** mean of the second and third columns only?

**Exercises.** How do we compute the **row** mean?

### Random number generator 

We can use the `randn` random number generator to create a `numpy` array of samples from a “standard normal” distribution with a specified shape.

For example, we generate a 2 by 4 random number array...
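A minimal sketch, assuming `np.random.randn` is the generator meant here. The seed line is an optional addition so that the same numbers come back on every run; the name `sample` is illustrative.

```python
import numpy as np

np.random.seed(0)               # optional: makes the draw repeatable

sample = np.random.randn(2, 4)  # 2 rows, 4 columns of standard normal draws

print(sample.shape)             # (2, 4)
print(sample)
```

Note that `randn` takes the dimensions as separate arguments, not as a tuple.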

In [None]:
#random 2 x 4 array


### Saving array objects

In [None]:
#create a large array X


In [None]:
#save the array


In [None]:
#load back in
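The three cells above might be filled in along these lines. This is only a sketch: it assumes `np.save` and `np.load` are the intended functions, and the array size, the file name `X.npy`, and the use of a temporary folder are illustrative choices, not part of the original notebook.

```python
import os
import tempfile

import numpy as np

# Create a largish array X (size chosen for illustration).
X = np.random.randn(1000, 100)

# Save to a temporary folder so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X.npy")  # hypothetical file name
    np.save(path, X)                   # write the array in numpy's .npy format
    X_loaded = np.load(path)           # read it back into memory

print(X_loaded.shape)               # (1000, 100)
print(np.array_equal(X, X_loaded))  # True
```

In your own work you would normally save to a regular path such as `"X.npy"` in your working folder instead of a temporary one.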


---
## Summary

**Congratulations!** First, it's amazing that you have made it this far. Reflect on what you knew before working through this notebook, namely what we did in python fundamental notebooks. Now reflect on what you can do...AMAZING!!! Let us summarize some key things that we covered.

* **Numpy Core Objects**: An `array` with one dimension is essentially just a vector of data, while an `array` with two dimensions can be thought of as a table of data with rows and columns. We will not cover more than 2 dimensions in this course.

* **Understanding the 2-d `Array`**:
    * Learn how to initialize an array with desired values and dimensions. 
    * Become familiar with python's built-in arithmetic operators, e.g., `+`, `-`, applied to 1-d or 2-d arrays, and the implicit use of broadcasting in them.
    * Know how to grab elements from an array; the result could be a single number or part of the original array.
    * Two types of useful mathematical methods for arrays:
        * Operations performed on each individual element of the array, e.g., `np.log`.
        * Operations across columns or rows, e.g., `np.sum`. These require correctly setting the `axis` parameter in the `numpy` methods.

* **Axis Understanding**: when setting **axis**, always think about the operation first, whether it will be done across columns or across rows. If the former, set axis = 1; if the latter, set axis = 0. For this course, the **axis** will always be **0** or **1**. We will cover more examples in the "intro to pandas" notebook.
