## Introducing ```numpy``` and arrays

To begin processing image data, we need to understand what's going on behind the scenes.

We can do that using a library called ```numpy```, which stands for __Numerical Python__. 

In general, use this library whenever you need to do mathematical operations on numbers, especially when those numbers are stored in arrays or matrices.

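Here is a small sketch with made-up values that makes "operations on arrays" concrete. With a plain Python list, ```* 2``` repeats the list. With a ```numpy``` array, the same operator is applied to every element:

```python
import numpy as np

# A plain Python list: * repeats the list, so element-wise maths needs a loop
values_list = [1, 2, 3, 4]
repeated = values_list * 2                   # [1, 2, 3, 4, 1, 2, 3, 4]
doubled_list = [x * 2 for x in values_list]  # [2, 4, 6, 8]

# The same numbers as a numpy array: operators apply to every element
values = np.array(values_list)
doubled = values * 2          # [2, 4, 6, 8]
total = values.sum()          # 10
combined = values + doubled   # [3, 6, 9, 12]: arrays of the same shape add element by element
```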
In [None]:
# tools for interacting with the operating system
import os

# tool for working with arrays
# creating an abbreviation to save keystrokes
import numpy as np

In [None]:
# load data
path = "../data/sample-data/sample-data-01.csv"

# The pandas way (header=None: the file has no header row, so the first line is data)
import pandas as pd
df_pan = pd.read_csv(path, header=None)

# The numpy way
df_num = np.loadtxt(fname=path, delimiter=",")

In [None]:
# show the first three rows of the array
df_num[0:3]

The expression ```numpy.loadtxt(...)``` (written ```np.loadtxt(...)``` here, because we imported ```numpy``` as ```np```) is a function call that asks Python to run the function ```loadtxt```, which belongs to the ```numpy``` library. This dotted notation is used everywhere in Python: the thing that appears before the dot contains the thing that appears after.

Here we give ```numpy.loadtxt``` two arguments: the name of the file we want to read and the delimiter that separates values on a line. Both are character strings (or strings for short), so we put them in quotes.

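This sketch shows what the ```delimiter``` argument does without needing the sample file. It feeds ```np.loadtxt``` a tiny CSV held in memory. The values are invented, and ```io.StringIO``` stands in for a real file. Without ```delimiter=","```, ```loadtxt``` would split on whitespace and fail to parse these lines:

```python
import io

import numpy as np

# Hypothetical CSV content: two rows, three comma-separated values each
csv_text = "0,1,2\n3,4,5\n"

# np.loadtxt accepts a file name or any file-like object; StringIO stands in for a file here
arr = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(arr.shape)  # (2, 3)
print(arr.dtype)  # float64, loadtxt's default dtype
```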
__Assign to variable__

In [None]:
# load array
df_num = np.loadtxt(path, delimiter=",")

In [None]:
# inspect the pandas DataFrame (rows 1 and 2)
df_pan[1:3]

In [None]:
# inspect array
df_num[1:3]

In [None]:
# print data type
print(type(df_num))

__numpy.ndarray__ tells us that we are working with an N-dimensional array.

In this case, it's 2-dimensional: it has rows and columns.

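You can check how many dimensions an array has with its ```ndim``` attribute. Here is a minimal sketch with two made-up arrays:

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])           # 1-dimensional
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2-dimensional: rows x columns

print(type(matrix))  # <class 'numpy.ndarray'>
print(vector.ndim)   # 1
print(matrix.ndim)   # 2
```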
In [None]:
# print type of data points
print(df_num.dtype)

In [None]:
# print shape
print(df_num.shape)

In [None]:
# total number of values = rows * columns (60 x 40 here)
60 * 40

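You don't have to multiply by hand. You can unpack the shape, and the ```size``` attribute gives the element count directly. This sketch uses a small made-up array in place of ```df_num```:

```python
import numpy as np

# A made-up 3 x 4 array standing in for df_num
toy_data = np.zeros((3, 4))

n_rows, n_cols = toy_data.shape  # unpack the shape tuple
print(n_rows * n_cols)           # 12
print(toy_data.size)             # 12, the total number of elements
```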
__Index__

Indexing works much like it does for lists and strings, but for a 2-dimensional array we need to include both a row and a column index: ```df_num[row, column]```.

In [None]:
# a single index selects a whole row (here, row 0)
df_num[0]

In [None]:
# row 0, column 0: the first value in the data
df_num[0,0]

__Question:__ What is the middle value of the array?

In [None]:
# the array has 60 rows and 40 columns, so [30, 20] is roughly the middle
middle_value = df_num[30, 20]

Print the value of ```middle_value``` to the screen:

In [None]:
print(middle_value)

<img src="../data/viz/python-zero-index.svg">

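A small made-up array makes the row/column order easier to see. In this sketch each value encodes its own position as ```row * 10 + column```, so you can check every index at a glance:

```python
import numpy as np

# Made-up 3 x 4 array: the value at [row, column] is row * 10 + column
grid = np.array([[0, 1, 2, 3],
                 [10, 11, 12, 13],
                 [20, 21, 22, 23]])

first_row = grid[0]        # one index selects a whole row: [0, 1, 2, 3]
first_value = grid[0, 0]   # row 0, column 0 -> 0
value = grid[2, 1]         # row 2, column 1 -> 21
last_value = grid[-1, -1]  # negative indices count back from the end -> 23
same_value = grid[2][1]    # chained indexing also works, but grid[2, 1] is the idiomatic form -> 21
```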
__Slice__

An index like [30, 20] selects a single element of an array, but we can select whole sections as well. 

For example, we can select the first ten columns of values for the first four rows like this:

In [None]:
# first four rows (0-3), first ten columns (0-9)
df_num[0:4,0:10]

The first ten columns of rows 5 to 10 (the stop index 11 is not included):

In [None]:
# rows 5-10, first ten columns
df_num[5:11,0:10]

__Select only one row__

In [None]:
# select row 0 and every column (a bare : means from beginning to end)
df_num[0,:]

__Select only one column__

In [None]:
# select column 0 and every row (a bare : means from beginning to end)
df_num[:,0]

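You can check these slicing rules on a small made-up array. The start index is included and the stop index is not. Selecting a single row or column gives back a 1-dimensional array:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # made-up values 0..11 in 3 rows and 4 columns

block = grid[0:2, 1:3]        # rows 0-1 and columns 1-2: [[1, 2], [5, 6]]
first_two_rows = grid[:2, :]  # an omitted start means "from the beginning"
row = grid[0, :]              # every column of row 0: shape (4,)
col = grid[:, 0]              # every row of column 0: [0, 4, 8], shape (3,)
```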
__Numpy functions__

```numpy``` comes with a range of built-in functions which allow you to quickly and efficiently calculate descriptive statistics for an array.

In [None]:
# mean, standard deviation, maximum and minimum over all patients and all days
print(np.mean(df_num))
print(np.std(df_num))
print(np.max(df_num))
print(np.min(df_num))

In [None]:
# Assign multiple variables at once
max_value, min_value, mean_value = np.max(df_num), np.min(df_num), np.mean(df_num)

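Most of these statistics are also available as methods on the array itself, so ```np.mean(df_num)``` and ```df_num.mean()``` give the same result. The median is the exception: it exists only as the function ```np.median```, and arrays have no ```.median()``` method. Here is a sketch with made-up values:

```python
import numpy as np

toy_scores = np.array([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])  # made-up values

mean_fn = np.mean(toy_scores)    # function form -> 3.5
mean_method = toy_scores.mean()  # method form, same result -> 3.5
median = np.median(toy_scores)   # only available as a function -> 3.5
largest, smallest = toy_scores.max(), toy_scores.min()  # 6.0 and 1.0
```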


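Type ```np.``` and press Tab to see the full range of functions available, and use ```help()``` (for example ```help(np.mean)```) to read the documentation for any of them.

__Operation along columns__

Passing ```axis=0``` to a function such as ```np.mean``` collapses the rows and returns one value per column, e.g. the average score per day: ```np.mean(df_num, axis=0)```.

__Operation along rows__

Passing ```axis=1``` collapses the columns and returns one value per row, e.g. the average score per patient: ```np.mean(df_num, axis=1)```.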

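This is a minimal sketch of the ```axis``` argument on a made-up 2 x 3 array. It assumes that rows are patients and columns are days:

```python
import numpy as np

# Made-up scores: 2 patients (rows) x 3 days (columns)
toy_scores = np.array([[1.0, 2.0, 3.0],
                       [3.0, 4.0, 5.0]])

per_day = np.mean(toy_scores, axis=0)      # collapse the rows -> one value per column: [2, 3, 4]
per_patient = np.mean(toy_scores, axis=1)  # collapse the columns -> one value per row: [2, 4]
```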
<img src="../data/viz/numpy-axes.png">

This is a good overview of how axes work in ```numpy```:

https://www.sharpsightlabs.com/blog/numpy-axes-explained/

## Exercise

- We saw how to calculate descriptive statistics for a single array. The data folder contains more sample data in the folder [data/sample-data](../data/sample-data).
  - Write some code which does the following steps:
    - Load every CSV data file in the input folder, one at a time
    - For each CSV file, calculate:
      - The mean and median values for each patient
        - Store them as a list of tuples for each CSV, e.g. ```[(patient0_mean, patient0_median), (patient1_mean, patient1_median), ...]```
      - The same as above, but this time calculating the mean, median, and modal (most frequent) values for each day

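Below is one possible way to approach the exercise, and not the only solution. The function names ```mode_1d```, ```summarise``` and ```summarise_folder``` are hypothetical. Rows are assumed to be patients and columns days. The mode is computed with ```np.unique``` because ```numpy``` has no built-in mode function, and ties go to the smallest value:

```python
import glob
import os

import numpy as np


def mode_1d(values):
    """Most frequent value in a 1D array; ties go to the smallest value."""
    unique, counts = np.unique(values, return_counts=True)  # unique is sorted
    return unique[np.argmax(counts)]  # argmax picks the first (smallest) of any tie


def summarise(data):
    """Per-patient (mean, median) and per-day (mean, median, mode) tuples for one 2D array."""
    per_patient = [(np.mean(row), np.median(row)) for row in data]              # rows = patients
    per_day = [(np.mean(col), np.median(col), mode_1d(col)) for col in data.T]  # columns = days
    return per_patient, per_day


def summarise_folder(folder):
    """Load each CSV in `folder` one at a time and summarise it, keyed by file name."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        data = np.loadtxt(path, delimiter=",", ndmin=2)  # ndmin=2 keeps one-row files 2D
        results[os.path.basename(path)] = summarise(data)
    return results


# Example usage with the folder named in the exercise:
# results = summarise_folder(os.path.join("..", "data", "sample-data"))
```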
## Basic image processing with OpenCV

We start by loading all of the modules we'll need for this class.

In [None]:
# Add the parent directory to Python's module search path, so we can import our own utils module.
import sys
sys.path.append("..")

In [None]:
# python framework for working with images
import cv2

# some utility functions for plotting images
from utils.imutils import jimshow

__Read image__

We can load an image with OpenCV's ```cv2.imread()``` function.

In [None]:
# Build the file path with os.path.join so the same code runs on both macOS and Windows
# (os.path.join inserts the correct separator, "/" or "\", between the parts)
path_to_image = os.path.join("..", "data", "img", "trex.png")

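To see what ```os.path.join``` produces on your machine, print the result. It inserts ```os.sep``` between the parts, so the same code gives ```../data/img/trex.png``` on macOS and Linux and ```..\data\img\trex.png``` on Windows. In this minimal sketch, the variable name ```example_path``` is only for illustration:

```python
import os

# Hypothetical variable name; same parts as path_to_image above
example_path = os.path.join("..", "data", "img", "trex.png")

print(os.sep)        # "/" on macOS and Linux, "\" on Windows
print(example_path)
```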
In [None]:
# load image using opencv
image = cv2.imread(path_to_image)

In [None]:
# Show the actual image
jimshow(image, "my cute dino")

__Save image__

In [None]:
# Defining output file path
outpath = os.path.join("..", "data", "img", "trex2.png")

In [None]:
# Write the image array to disk; cv2.imwrite returns True if the file was written
cv2.imwrite(outpath, image)

__Inspect image__

In [None]:
# the loaded image is a numpy array
print(type(image))

## What is an image?

__Remember how ```numpy``` arrays work!__

ROWSxCOLUMNS == HEIGHTxWIDTH

In [None]:
# image.shape is (height, width, channels)
print(image.shape)

In our image, there are 228*350 = 79,800 pixels

__What about the last one?__

In [None]:
# your code here

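For a colour image, ```cv2.imread``` with its default flags returns an array of shape ```(height, width, 3)```. The last number is the count of colour channels per pixel. This sketch uses a made-up black image so the numbers are easy to check:

```python
import numpy as np

# Made-up black colour image: 4 rows (height), 6 columns (width), 3 channels
toy_image = np.zeros((4, 6, 3), dtype=np.uint8)

height, width, channels = toy_image.shape
n_pixels = height * width  # 24 pixels
n_values = toy_image.size  # 72 numbers: one per channel per pixel
```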
__NB!__

```OpenCV``` stores colour channels in REVERSE ORDER: each pixel is a (Blue, Green, Red) tuple, not (Red, Green, Blue).

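Reversing the last axis of the array swaps BGR and RGB. This matters when you hand an OpenCV image to a library that expects RGB, such as ```matplotlib```. The sketch below uses a made-up image. OpenCV's own ```cv2.cvtColor(image, cv2.COLOR_BGR2RGB)``` does the same conversion:

```python
import numpy as np

# One made-up pixel as OpenCV stores it: (Blue, Green, Red) = pure blue
bgr_pixel = np.array([255, 0, 0], dtype=np.uint8)
blue, green, red = bgr_pixel

# A made-up 2 x 2 image where every pixel is pure blue in BGR order
toy_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
toy_bgr[:, :, 0] = 255

# Reversing the channel axis gives RGB; .copy() makes a regular contiguous array
toy_rgb = toy_bgr[:, :, ::-1].copy()
```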
__What colour is a specific pixel?__

In [None]:
# OpenCV returns the channels of a pixel in BGR order
(b, g, r) = image[0, 0]

In [None]:
print(f"[INFO] pixel at (0, 0) - Red: {r}, Green: {g}, Blue: {b}")

__Modify colour__

In [None]:
# set pixel (0, 0) to pure red; the tuple is in BGR order
image[0, 0] = (0, 0, 255)
(b, g, r) = image[0, 0]

In [None]:
print(f"[INFO] pixel at (0, 0) - Red: {r}, Green: {g}, Blue: {b}")

__Image slice__

In [None]:
# your code here

In [None]:
# your code here

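An image slice works exactly like slicing ```df_num```, with the channel axis kept whole. The sketch below uses a made-up image. Note that a slice is a view onto the original pixels, so call ```.copy()``` if you want to edit it independently:

```python
import numpy as np

# Made-up 4 x 6 BGR image with distinct values so slices are easy to inspect
toy = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)

# Rows (y) first, then columns (x); the channel axis is kept whole
top_left = toy[0:2, 0:3]            # shape (2, 3, 3)
independent = toy[0:2, 0:3].copy()  # same values, separate memory

print(np.shares_memory(top_left, toy))     # True: the slice is a view
print(np.shares_memory(independent, toy))  # False: the copy has its own data
```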
__Change corner colour__

In [None]:
# your code here

In [None]:
# your code here
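
Assigning a single BGR tuple to a slice sets every pixel in that region, because ```numpy``` broadcasts the tuple across the slice. This sketch uses a made-up black image. The 2 x 3 corner size and the green colour are arbitrary choices:

```python
import numpy as np

# Made-up all-black 4 x 6 BGR image
toy = np.zeros((4, 6, 3), dtype=np.uint8)

# One BGR tuple is broadcast to every pixel in the slice: the top-left 2 x 3 corner turns green
toy[0:2, 0:3] = (0, 255, 0)
```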