## Data structures with numpy

In this worksheet we will work on creating data structures with numpy. In scientific computing, one of our biggest challenges is often working with the data set we are given. That can mean many different things: arranging data, summarizing data, or dealing with incomplete observations. But before we can do any of that, we have to get our data into some kind of structure.

Numpy (the name comes from **Num**erical **Py**thon) is a library for python that contains many powerful functions for working with data. The principal data structure we will use is called a **numpy array**.

To work with any python library, we first need to import that library into our workspace. This allows us to access any functions or other elements contained within that library as if we defined them ourselves as part of our python script.

In [2]:
import numpy as np

Once you have run the code above, we will have access to numpy's functions. Note that we created a shorthand for the library, `np`, so we can call any function contained within the library by using a command of the form:
``` 
np.function_name()
```
In the command below, we will use the numpy function `zeros()` to create a numpy array with a single dimension. This array holds five pieces of data, all of which the function sets to zero.

In [33]:
v = np.zeros(5)
print(v) 

[0. 0. 0. 0. 0.]


Numpy is fundamentally a system for holding and performing mathematical operations on numerical data, so `zeros()` gives the elements of the array the type `float64` by default. Note that the array can be indexed just like a list. We can change the values in the array by assigning a value to a position in the array designated by its index.

In [24]:
print(type(v[0]))      # Figure out and report what the data type of each element is

v[0]=6.5               # Substitute a different value for the first item in the array
print(v)


<class 'numpy.float64'>
[6.5 0.  0.  0.  0. ]
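
The element type belongs to the array as a whole, and we can ask for it directly. The sketch below assumes we want to check that type and, optionally, pick a different one; the `dtype` attribute and the `dtype` argument are standard numpy, while the name `counts` is made up for illustration.

```python
import numpy as np

v = np.zeros(5)                    # no type requested, so numpy uses float64
print(v.dtype)

counts = np.zeros(5, dtype=int)    # ask for integers instead of floats
print(counts.dtype)                # an integer type; the exact width depends on the platform
print(counts)
```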


In [26]:
v[3]= "banana"  

ValueError: could not convert string to float: 'banana'

Try running the code above. Unlike a list, a numpy array holds a single type of data, so assigning a value that cannot be converted to that type (here, a string that is not a number) raises an error.
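
To see the difference between a value numpy can convert and one it cannot, here is a small sketch that catches the error instead of stopping. It assumes a fresh array `v` like the one above.

```python
import numpy as np

v = np.zeros(5)

v[1] = 7                 # an integer can be converted, so it is stored as the float 7.0
print(v)

try:
    v[3] = "banana"      # this string cannot be converted to a float
except ValueError as err:
    print("ValueError:", err)

print(v)                 # position 3 still holds its old value
```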

The following code uses a new numpy function, `ones`. The output should not surprise you.

In [37]:
w = np.ones(6)
print(w)

[1. 1. 1. 1. 1. 1.]


We can also use the numpy function `array` in a couple of different ways. First, we can use it to define an array manually by putting square brackets, as if we were defining a list, inside the parentheses of the numpy function. The example below creates a one dimensional array (which you might call a vector). Numpy arrays can also be defined with more than one dimension.

In [39]:
manual_array = np.array([1.735, 3.14, 2.89, 7.5])
print(manual_array)

[1.735 3.14  2.89  7.5  ]


Likewise, we can convert a `list` to a numpy array by using it as the input to the `array` function. 

In [43]:
list_to_convert = [42.3, 74.1, 89.3, 100.2]
print(type(list_to_convert))                   

converted_list = np.array(list_to_convert)
print(type(converted_list))

<class 'list'>
<class 'numpy.ndarray'>
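
One reason the conversion matters is that arithmetic behaves differently on the two types. The short sketch below reuses the names from the cell above and multiplies each by 2.

```python
import numpy as np

list_to_convert = [42.3, 74.1, 89.3, 100.2]
converted_list = np.array(list_to_convert)

print(list_to_convert * 2)    # a list is repeated, giving eight items
print(converted_list * 2)     # an array is multiplied element by element
```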


Numpy arrays have certain characteristics that make them useful for data manipulation. One of them is that every numpy array has an attribute known as `shape`. We have already used shapes: when we defined our initial numpy array above, we told numpy to create an array with shape 5.

In [44]:
print(v.shape)

(5,)


This notation means that the array is one dimensional, with a length of five. We can also assign a new value to the `shape` attribute to change the shape of a data structure.

In [55]:
x = np.ones(24)
print("old shape " + str(x.shape))   # report the shape
print(x)

x.shape = (4,6)                      # now let's change the shape
print("new shape " + str(x.shape))   # report the new shape
print(x)

old shape (24,)
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
new shape (4, 6)
[[1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]]
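
Assigning to `shape` changes the array in place. An alternative, sketched here, is the `reshape` method, which hands back a reshaped array and leaves the original's shape alone. This example uses `np.arange` (not used elsewhere in this worksheet) so that each position holds a different number and the new layout is easy to follow; the names `grid` and `wide` are made up for illustration.

```python
import numpy as np

x = np.arange(24)            # the integers 0 through 23
grid = x.reshape(4, 6)       # a 4 by 6 arrangement of the same values
print(x.shape)               # x itself is still one dimensional
print(grid.shape)
print(grid)

wide = x.reshape(3, -1)      # -1 asks numpy to work out that dimension (24 / 3 = 8)
print(wide.shape)
```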


We can also specify the shape when we create the array in the first place. Just note that for any array with more than one dimension, the shape specification must be enclosed in parentheses, even within the call to the array-making function (`ones` in this case).
Also, when we index data, we now need to use both dimensions to specify which data point we are talking about.

In [59]:
w = np.ones((6,6))
print("original array")
print(w)

w[0,0] = 0
w[5,5] = 8
print("array with substitutions")
print(w)

original array
[[1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]]
array with substitutions
[[0. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 8.]]


Just like a list, the indices of an array will count from 0, so an array of length 6 will consist of positions 0 through 5. 
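
As a small sketch of what those positions look like in two dimensions, the code below rebuilds the `w` array from above and picks out a single element, a whole row, and a whole column. The colon syntax is the same slicing notation that lists use.

```python
import numpy as np

w = np.ones((6, 6))
w[0, 0] = 0
w[5, 5] = 8

print(w[5, 5])     # row 5, column 5: the last element in the array
print(w[0])        # a single index selects a whole row
print(w[:, 5])     # a colon means "every row", so this is the last column
```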

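The shape also limits what we can ask of an array. Note that a two dimensional array has two axes: axis 0 counts the rows and axis 1 counts the columns.

There are 36 spaces in the `w` array. Below, we first try to change it to a 4x4 array, which has only 16 spaces. That doesn't work, because the new shape must hold exactly as many elements as the array contains; a 4x9 array would work instead. We then ask for an element at column index 6, and the `IndexError` statement tells us which axis was out of bounds and how large that axis is.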

In [65]:
w.shape = (4,4)    # rows times columns must equal the number of elements (36); 4 x 4 = 16 does not

ValueError: cannot reshape array of size 36 into shape (4,4)
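
To check which shapes are allowed, we can compare the product of the new dimensions against the total number of elements. The sketch below uses the `size` attribute and tries a few candidate shapes, chosen here only for illustration.

```python
import numpy as np

w = np.ones((6, 6))
print(w.size)                          # total number of elements: 36

for shape in [(4, 4), (4, 9), (3, 12)]:
    try:
        reshaped = w.reshape(shape)    # only succeeds if rows * columns == 36
        print(shape, "works")
    except ValueError as err:
        print(shape, "fails:", err)
```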

In [62]:
w[3,6]    # column index 6 does not exist; valid indices on each axis are 0 through 5


IndexError: index 6 is out of bounds for axis 1 with size 6
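
The axis named in the message tells us which index was the problem. As a sketch, the loop below tries one bad column index, one bad row index, and one valid position on a 6 by 6 array, and prints what numpy reports for each.

```python
import numpy as np

w = np.ones((6, 6))

for row, col in [(3, 6), (6, 3), (5, 5)]:
    try:
        value = w[row, col]
        print((row, col), "is valid, value", value)
    except IndexError as err:
        # axis 0 refers to the row index, axis 1 to the column index
        print((row, col), "->", err)
```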

In [3]:
z = np.linspace(0,100,21)
z

array([  0.,   5.,  10.,  15.,  20.,  25.,  30.,  35.,  40.,  45.,  50.,
        55.,  60.,  65.,  70.,  75.,  80.,  85.,  90.,  95., 100.])

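`linspace` stands for "linearly spaced": the values in the array are spaced out evenly. Here we asked for 21 values from 0 to 100, including both endpoints, so consecutive values are 5 apart.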
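
The spacing follows from the three arguments: with both endpoints included, the gap between values is (stop - start) / (number of points - 1). A short sketch to confirm this, where the second array, `quarters`, is made up for illustration:

```python
import numpy as np

z = np.linspace(0, 100, 21)        # start, stop, number of points
print(len(z))                      # 21 values, including both 0 and 100
print(z[1] - z[0])                 # (100 - 0) / (21 - 1) = 5.0

quarters = np.linspace(0, 1, 5)    # five points, so the gap is 1 / 4
print(quarters)
```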

Importing data from an external file directly into a numpy array. 
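
As a minimal sketch of what that can look like, assuming the data lives in a plain comma-separated text file containing only numbers, numpy's `loadtxt` function reads the file into an array. The file contents and the temporary file used here are made up so that the example runs on its own; with real data you would pass the path to your own file.

```python
import os
import tempfile

import numpy as np

# Write a small made-up data file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("1.0,2.0,3.0\n4.0,5.0,6.0\n")
    path = f.name

data = np.loadtxt(path, delimiter=",")   # each line of the file becomes a row
print(data)
print(data.shape)

os.remove(path)                          # clean up the temporary file
```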