### Numpy arrays ###

Why do we want numpy arrays anyway?
* arrays can be n-dimensional
* arrays are pre-allocated, so they
  * use a predictable amount of space 
  * can't easily change size, say, with `.append()`
* easily perform elementwise and matrix operations
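
The points above can be sketched in a few lines. This is only an illustrative sketch; the names `cube`, `small`, `bigger`, and `a` are made up for this example and are not used elsewhere in the lesson.

```python
import numpy as np

# A 3-dimensional array: 2 blocks, each with 3 rows and 4 columns.
cube = np.zeros((2, 3, 4))
print(cube.ndim, cube.shape)

# The memory footprint is fixed by the shape and dtype:
# 24 elements of float64, 8 bytes each.
print(cube.nbytes)

# Arrays have no .append() method; np.append() builds a new, larger array
# and leaves the original untouched.
small = np.array([1, 2, 3])
bigger = np.append(small, 4)

# Elementwise arithmetic versus matrix multiplication.
a = np.array([[1, 2], [3, 4]])
elementwise = a * a   # multiplies matching entries
matrix = a @ a        # matrix product
print(elementwise)
print(matrix)
```

Because `np.append()` copies the whole array every time, growing an array one element at a time is slow; it is usually better to build a list first and convert it once.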

There is a pretty good lesson on NumPy by Ariel Rokem (now at U. Washington):
    https://github.com/dgasmith/SICM2-Software-Summer-School-2014/blob/master/Software_Carpentry/Numerical_Analysis_NumPy/numpy.ipynb
    

In [4]:
log(10)

NameError: name 'log' is not defined

In [5]:
import numpy as np
import pandas as pd

In [6]:
log(10)

NameError: name 'log' is not defined

In [7]:
np.log(10)

2.302585092994046

In [12]:
lol = [[ 1,2,3], [2,4,6], [7,14,21]]  # list of lists 
# lol 

In [11]:
print(lol)

[[1, 2, 3], [2, 4, 6], [7, 14, 21]]


In [13]:
type(lol)

list

In [14]:
type(lol[0])

list

In [15]:
lol[0]

[1, 2, 3]

In [16]:
type(lol[0][0])

int

In [18]:
arr = np.array(lol)
print(arr)

[[ 1  2  3]
 [ 2  4  6]
 [ 7 14 21]]


In [19]:
arr

array([[ 1,  2,  3],
       [ 2,  4,  6],
       [ 7, 14, 21]])

In [20]:
# So can I get the first row of lol? 
lol[1]   

[2, 4, 6]

In [None]:
# Why is this not the first row?

In [21]:
lol[1,1]   # This doesn't work for lists-of-lists.

TypeError: list indices must be integers or slices, not tuple

In [22]:
arr[1][1]  # This works for arrays

4

In [23]:
arr[1,1]  # and so does this. 

4
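
The two indexing styles above give the same answer for a single element, but they work differently. The sketch below rebuilds `lol` and `arr` so it runs on its own; the names `chained`, `direct`, `column`, and `message` are made up for illustration.

```python
import numpy as np

lol = [[1, 2, 3], [2, 4, 6], [7, 14, 21]]
arr = np.array(lol)

# Chained indexing: pull out row 1, then element 1 of that row.
chained = arr[1][1]

# Tuple indexing: a single lookup with (row, column).
direct = arr[1, 1]

# Tuple indexing can also select a whole column,
# which a list of lists cannot do directly.
column = arr[:, 1]

# A plain list rejects the tuple form with a TypeError.
try:
    lol[1, 1]
except TypeError as err:
    message = str(err)

print(chained, direct, column)
print(message)
```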

In [24]:
arr

array([[ 1,  2,  3],
       [ 2,  4,  6],
       [ 7, 14, 21]])

In [25]:
arr * 2

array([[ 2,  4,  6],
       [ 4,  8, 12],
       [14, 28, 42]])

In [26]:
np.sqrt(arr)

array([[1.        , 1.41421356, 1.73205081],
       [1.41421356, 2.        , 2.44948974],
       [2.64575131, 3.74165739, 4.58257569]])

NumPy arrays have the attribute `.dtype`, which records the data type used to store every element. It can be displayed:

In [27]:
arr.dtype

dtype('int64')

Or set when calling `np.array()`:

In [28]:
arrfloat = np.array(lol, dtype=float)  # np.float was removed in NumPy 1.24; the built-in float works

In [29]:
arrfloat

array([[ 1.,  2.,  3.],
       [ 2.,  4.,  6.],
       [ 7., 14., 21.]])

If we needed to convert an existing numpy array to a particular data type, how would we do it?


In [31]:
float(arr)

TypeError: only size-1 arrays can be converted to Python scalars

In [None]:
# Well, that didn't work. Time to search:
# https://letmegooglethat.com/?q=convert+numpy+array+dtype

Google tells us to use the `.astype()` method: a subroutine that we call by appending `.astype()` to the end of the name of a numpy array. It returns a converted copy and leaves the original array unchanged.


Sometimes we use **functions** that are built-in, like 

    print() or type()
    
Sometimes we use **functions included from libraries**, like

    np.sqrt(), plt.scatter(), pd.DataFrame()
    
Sometimes we access **attributes** -- python objects associated with our data

    arr.dtype, df.iloc, arr.T 
    
and sometimes we are going to have to use **methods** that are subroutines associated with our data.  

    arr.astype(), df.head()
   
and finally, for many data structures, we will use **indexes** to reach inside a data structure and get one element out

    arr[1]
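
As a rough side-by-side sketch, here is each kind of syntax applied to the same small array. The variable names on the left (`kind`, `roots`, `shape`, and so on) are made up for this example.

```python
import numpy as np

arr = np.array([[1, 2, 3], [2, 4, 6], [7, 14, 21]])

kind = type(arr)               # built-in function: name first, then ()
roots = np.sqrt(arr)           # library function: library prefix, then ()
shape = arr.shape              # attribute: after the object, no ()
transposed = arr.T             # attribute: after the object, no ()
as_float = arr.astype(float)   # method: after the object, with ()
row = arr[1]                   # index: square brackets after the object

print(kind, shape, as_float.dtype)
print(row)
```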

In [32]:
# To get our array of integers as floats, append .astype(float) **to the end of the name of the array**.
arrfloat2 = arr.astype(float)
arrfloat2

array([[ 1.,  2.,  3.],
       [ 2.,  4.,  6.],
       [ 7., 14., 21.]])

Maddeningly, all of these magic spells use different punctuation, and some of them are written before the dataset and some are written after it.  You just have to memorize which ones are written after and which are written before, sorry.

Function and method calls always have ().
Square brackets always index.


In [None]:
df.head()

In [None]:
df.iloc[1:,1:]

In [None]:
lol[1:][1:]

In [None]:
arr[1:,1:]

In [None]:
df[1:,1:]

In [None]:
arr

In [None]:
lol


In [None]:
df

In [None]:
lol[1]

In [None]:
lol[1][1]

In [None]:
arr[1]

In [None]:
arr[1:,1:]

In [None]:
df[1]

In [None]:
df.loc[1]

In [None]:
df.iloc[1]

In [None]:
df.iloc[1:,1:]

In [None]:
lol[1,1]

In [None]:
1:4

In [None]:
list1 = [1, 2, 3, 4]
list1[1:3]
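
The slicing cells above are worth comparing side by side. The sketch below rebuilds `lol` and `arr` so it runs on its own; `middle`, `twice_sliced`, and `sub` are names made up for this example. Note that a bare `1:4` is a syntax error: slice notation only means something inside square brackets.

```python
import numpy as np

list1 = [1, 2, 3, 4]
middle = list1[1:3]   # positions 1 and 2; the stop position 3 is excluded

lol = [[1, 2, 3], [2, 4, 6], [7, 14, 21]]
arr = np.array(lol)

# On a list of lists, each [1:] slices the outer list,
# so this drops the first row twice and never touches the columns.
twice_sliced = lol[1:][1:]

# On an array, the comma separates the axes:
# drop the first row and the first column.
sub = arr[1:, 1:]

print(middle)
print(twice_sliced)
print(sub)
```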

In [33]:
import pandas as pd
minarddata = pd.read_csv("minard.csv")

In [34]:
minarddata

Unnamed: 0,Longitude,Latitude,City,Direction,Survivors
0,32.0,54.8,Smolensk,Advance,145000
1,33.2,54.9,Dorogobouge,Advance,140000
2,34.4,55.5,Chjat,Advance,127100
3,37.6,55.8,Moscou,Advance,100000
4,34.3,55.2,Wixma,Retreat,55000
5,32.0,54.6,Smolensk,Retreat,24000
6,30.4,54.4,Orscha,Retreat,20000
7,26.8,54.3,Moiodexno,Retreat,12000


In [35]:
minarddata[0]


KeyError: 0

In [36]:
minarddata[Latitude]


NameError: name 'Latitude' is not defined

In [37]:
minarddata["Latitude"]


0    54.8
1    54.9
2    55.5
3    55.8
4    55.2
5    54.6
6    54.4
7    54.3
Name: Latitude, dtype: float64
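
Plain square brackets on a DataFrame look up column labels, which is why `minarddata[0]` raises a `KeyError` and `minarddata["Latitude"]` works. The sketch below shows the main ways in; it types in the first three rows of the table by hand as `minard_head` (a made-up name) so that it runs without the CSV file.

```python
import pandas as pd

# First three rows of the table above, entered by hand for illustration.
minard_head = pd.DataFrame({
    "Longitude": [32.0, 33.2, 34.4],
    "Latitude": [54.8, 54.9, 55.5],
    "City": ["Smolensk", "Dorogobouge", "Chjat"],
})

latitudes = minard_head["Latitude"]           # column by label (a string)
first_row = minard_head.iloc[0]               # row by integer position
one_value = minard_head.iloc[0, 1]            # row 0, column 1, by position
same_value = minard_head.loc[0, "Latitude"]   # row label 0, column label

# No column is named 0, so plain brackets with 0 fail.
try:
    minard_head[0]
except KeyError:
    missing = True

print(latitudes.tolist(), one_value, same_value, missing)
```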