<a href="https://colab.research.google.com/github/neuralabc/PythonTools4Neuroimaging/blob/main/PSYC458_01_Help_Data_Vectors_Matrices_Plotting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Working with data
This week we will walk through some exercises to introduce you to how data is represented, accessed, changed, and plotted within python. We will use two of the **most commonly** used packages in all of scientific computation.  
## numpy and matplotlib
- Working with numerical data in Python: https://numpy.org/
  - numpy is **the** fundamental package for working with numerical data in python. It has many many many functions, most of which we will not talk about or use. The documentation is, in general, excellent, and accomplishing the task that you need to accomplish can likely be done in more than one way. The number of functions in numpy can make working with it confusing. Looking things up, exploring functions, and *most importantly* testing with toy examples that you can easily understand are the best ways to learn about and use these functions properly.
  - to import numpy, execute the following command in a code block: `import numpy as np`
    - `import numpy` tells the notebook that you would like to have access to the functions within numpy, `as np` tells it that you would like to create an alias to numpy functions. This is just so that you do not have to type `numpy` many many times when you want to access functions. In this case, by common convention, we use `np` as the alias for numpy. Once this command has been run, you can refer to any functions within numpy by typing `np.`

- Plotting in Python: https://matplotlib.org/
  - matplotlib was initially designed to cover the same functionality as matlab plotting. It has since diverged a bit, but it is generally the case that anything that you can plot with matlab can be plotted with matplotlib
  - we generally use the matplotlib.pyplot functions, which are commonly imported as follows: `import matplotlib.pyplot as plt`
    - in notebooks you can additionally run the _magic command_ `%matplotlib inline` so that figures are displayed directly below the cell that creates them
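
As a minimal sketch of what an alias does (the variable name `x` is made up for illustration), the snippet below imports numpy both with and without the alias and shows that the two names refer to the same package:

```python
import numpy            # full package name
import numpy as np      # same package, shorter alias

# both names point to the same module, so the functions are identical
print(np is numpy)      # True

x = np.arange(0, 5)     # same call as numpy.arange(0, 5)
print(x)                # [0 1 2 3 4]
```

The same idea applies to `plt`, which is only a shorter name for `matplotlib.pyplot`.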


In [None]:
#lets get started, import numpy here, and create an alias for it so that you don't need to type "numpy" all the time
import numpy as np

In [None]:
#now, refer to a specific function within numpy
np.arange(0,10) #this function creates a vector (a 1d array) that starts at 0 and has 10 elements (0 through 9)

# 1. Getting help
1. Within python
  - once you have imported a package, you can use `tab completion` to see what functions are available at the current level (note, you can think of the package as giving you access to a single directory, using the `.` syntax allows you to look into the next subdirectory of the package to find functions)
    - in google colab, you simply need to type the package name followed by a dot (e.g., `np.`) and wait for a small popup window to show you what functions are available at this level. You can also traverse to deeper levels to see what is there.
    - once you have found your function, you can use the `?` character to request the in-line help for this function
2. On the web
  - package documentation: the official documentation of the package 
    - e.g. web search: https://duckduckgo.com/?q=create+random+vector+with+numpy&t=brave&ia=web
    - first hit gives you access to the official documentation, take a look at it
      - `numpy.random.rand()`, in our case this would be `np.random.rand()`
        - this tells us we are accessing the function `rand` within the "subdirectory" `random` within the "directory" `np` (this is our alias for `numpy`)
    - in general, you can find most official documentation by searching for the package name and `documentation`
  - stack exchange: loads of information here, but sometimes at a very high level
    - https://stackexchange.com/
  - other web sites: any web search will likely produce **MANY** hits, you will have to filter through them to find the information that you want.
3. Talk to your colleagues!
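
Outside of a notebook popup, roughly the same information can be reached with plain Python. The sketch below is one possible way to do it, not the only one: `dir()` lists the names available one level down, and the `__doc__` attribute holds the help text of a function. The variable names `names` and `doc` are made up for this example.

```python
import numpy as np

# list the public names available one level down, inside np.random
names = [name for name in dir(np.random) if not name.startswith("_")]
print(len(names), "names found, for example:", names[:5])

# the help text of a function is stored in its docstring
doc = np.random.rand.__doc__
print(doc[:200])        # show only the beginning of the help text
```

The built-in `help(np.random.rand)` prints the full text in one go.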


In [None]:
#try tab completion here, using numpy.random
#type the "." then wait while the notebook loads the available functions at this level
#once they are there, use the up and down arrows to navigate
np.

In [None]:
# type np.arange() and then wait, you should see a popup with some brief contextual help

In [None]:
#now learn about a specific function by putting "?" at the end of it
#check out the help popup! compare that to the official package documentation from your earlier web search
np.random.rand?

In [None]:
#based on the help, lets use this function to create a random vector
vec_length = 10 #first define a variable that will determine the length of the vector (i.e., the number of values in the vector)
vec = np.random.rand(vec_length) # here we assign the output of the np.random.rand call to the variable "vec"
#lets check its shape to make sure we did what we thought we were doing
vec.shape #if you leave a command like this at the end of the code block, you will see the output of running the command. If you ran something else after it you would not
#in that case, you could use the "print" command to print the output of the command - print just prints output back to the screen for you

# A side note on using the `print` function
  - put commands or variables inside a `print()` call to ensure that they are output to the screen. This **does not** save any results, it **only** presents them on the screen for you to see
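
A small sketch of the point that printing does not save anything (the variable names `shown` and `kept` are made up for illustration):

```python
# print() shows a value on the screen but does not hand the value back
shown = print(2 * 2)    # prints 4
print(shown)            # prints None: nothing was saved by print

# to keep a result, assign it to a variable, then print the variable
kept = 2 * 2
print(kept)             # prints 4
```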

In [None]:
#compare this cell to the following cell
2*2
"Hi there! You won't see this because it is not the last command to be run in the cell and you didn't call print before it!"
2*100

In [None]:
#compare this cell to the previous cell
print(2*2)
print("Hi there!")
2*100

# 2. Working with vectors

In [None]:
#ok, now lets see what that vector we created looks like, we can print it!
print(vec) #yep, those are numbers between 0 and 1

In [None]:
#we can also plot it with matplotlib 
import matplotlib.pyplot as plt
#lets pull up the help
plt.plot?
plt.figure() #create a new figure first (you don't need to do this if you don't have any other figures, plt.plot will do this automatically for you)
plt.plot(vec)

#lets give it some axes too
plt.figure() #first create a new figure, otherwise we would plot over top of the previous figure
plt.plot(vec) #plot that data
plt.xlabel("Time")
plt.ylabel("Interest in cats")

In [None]:
#lets create some more data, this time two vectors that are correlated with each other
# to do this, we need to use another function to generate correlated data
#from: https://stackoverflow.com/questions/18683821/generating-random-correlated-x-and-y-points-using-numpy
import numpy as np

num_els = 1000 #number of elements in each vector
xx = np.array([0, 1]) #range of x-values
yy = np.array([0, 20]) #range of y-values
means = [xx.mean(), yy.mean()]  
stds = [xx.std() / 3, yy.std() / 3]
corr = 0.8         # correlation
covs = [[stds[0]**2          , stds[0]*stds[1]*corr], 
        [stds[0]*stds[1]*corr,           stds[1]**2]] 

var1, var2 = np.random.multivariate_normal(means, covs, num_els).T
plt.plot(var1,var2,'.')

In [None]:
#we asked for a specific correlation between x and y, lets check out if it worked
np.corrcoef?
print(np.corrcoef(var1,var2))

# 3. Working with 2d arrays 

In [None]:
#we can create random ones in a similar way

mat = np.random.rand(5,10) #we pass one integer per dimension (5 rows, 10 columns) to define the shape of the data that we want to generate

In [None]:
# this tells us the shape, it should look different from the 1d array we made above
print(mat.shape) 

In [None]:
print(mat) #note how this array is defined

In [None]:
#plt.plot is not the right tool anymore: given a 2d array it draws each column as a separate line
#in this case, we use "imshow"
plt.imshow?


In [None]:
plt.imshow(mat)
plt.colorbar() #we should figure out how colors are mapped to values

In [None]:
#add some axes!

# 4. Accessing individual datapoints within arrays
  - using slice syntax `[]`
    - use numerical indices, starting at 0, for each dimension that will be accessed
    - you _can_ get fancy too https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing
    - use `:` to denote "all" or "the remainder" in a dimension
      - e.g., `vec[:]` is the entire vector, `vec[1:]` is from the 1st element (i.e., the _2nd_ element in the vector because python is 0-based) to the end of the vector
  - indexing can get complicated, **always** test what you are doing with some sample data to confirm your understanding
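
The indexing rules above can be tried out on toy data first. This is a minimal sketch; `demo_vec` and `demo_mat` are made-up names so that they do not overwrite the variables used in the cells below.

```python
import numpy as np

demo_vec = np.array([10, 20, 30, 40, 50])   # small 1d example
print(demo_vec[0])      # 10, the first element
print(demo_vec[1:])     # [20 30 40 50], index 1 to the end
print(demo_vec[:2])     # [10 20], start up to (not including) index 2
print(demo_vec[-1])     # 50, negative indices count from the end

demo_mat = np.array([[1, 2, 3],
                     [4, 5, 6]])            # 2 rows, 3 columns
print(demo_mat[1, 2])   # 6, row 1, column 2
print(demo_mat[:, 0])   # [1 4], all rows, column 0
```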

In [None]:
#lets create a simple vector
vec = np.array([0.5,10,22,5,33]) #can use np.array() to explicitly create an array
a_list = [1,2,3,4] #this creates a list, which is a basic Python type and cannot be used to do everything that np.array can be used to do! 
print(vec)
print(a_list) #but they look the same when you print them!

In [None]:
print(type(vec)) #but these variables have different types
print(type(a_list))

In [None]:
#so lets convert that list to an array
an_array = np.array(a_list) #easy!

In [None]:
#python is 0-based, 0 corresponds to the first element in the given dimension
#for this 1d array, we only need one indexing value
print(vec[0]) # the first element of the array

In [None]:
print(vec[3])

In [None]:
#we do the same thing for multi-dimensional arrays, except we need to specify as many indices as there are dimensions
mat = np.array([[0,1],[10,20],[30,40]])
print(mat)
print(f"matrix shape: {mat.shape}")

In [None]:
print(mat[0]) #gives us the first element of the array, does it do what was expected?

In [None]:
# a clearer and unambiguous way to select a row
print(mat[0,:]) #0th row, all elements


In [None]:
# to get specific element in the array, specify all dimensions
print(mat[0,2]) #what is wrong here?

In [None]:
print(mat[2,1])

In [None]:
#we can also select multiple elements from within the array with more complex syntax
print(mat[1:,:])
print("") #prints a blank line between outputs
print(mat[(1,2),:])
print("")
print(mat[(0,2),:])

In [None]:
# practice indexing a 3d array
import numpy as np
vol = np.random.rand(4,4,4) #lets start small, so that we can see it
print(vol) #again, look at how this is stored, it is a concatenation of 2d arrays (which, in turn, are concatenations of 1d arrays)

In [None]:
print(vol[0,0,0]) 

In [None]:
#it is hard to know where we are, so lets overwrite this vol variable with something easier to follow
#to do so, we create an empty array (with zeros, but we could fill it with anything) then we create incremental values from 0 to the total number of elements in the array, then we fill the array with the values
el_per_dim = 4 #number of elements per dimension
np.zeros?
vol = np.zeros((el_per_dim,el_per_dim,el_per_dim)) #create an array of zeros
incremental_values_vec = np.arange(0,el_per_dim**3) #we need this many values, incrementing by one
vol[:] = incremental_values_vec.reshape((el_per_dim,el_per_dim,el_per_dim)) #overwriting the contents of the array

In [None]:
print(vol) #why does this work?

In [None]:
# hmmm, we didn't even need to create an empty array to start with!
vol = incremental_values_vec.reshape((el_per_dim,el_per_dim,el_per_dim)) #gives us everything we want!

In [None]:
#now see if you can grab specific values from this array
vol[0,1,3]


In [None]:
#try plotting each of the 2d arrays that make up this array

