# Code

## Reading Data

In [None]:
import numpy as np

### Loading Data from File

In [None]:
X = np.genfromtxt("ionosphere.txt", delimiter=",") # load data from file 

In [None]:
print(X[:3])

### Extracting Data

- `delimiter` defines the string used to separate values
- The default `delimiter` (whitespace) is not used here; the file is comma-separated, so `,` is passed instead
- `usecols` specifies which columns to read, with 0 being the first
  - For example, `usecols = (1, 4, 5)` will extract the 2nd, 5th and 6th columns.
  - `np.arange(34)` creates an array of the integers from 0 to 33
  - This means that the first 34 columns (indices 0 to 33) will be extracted
  - This is because the number of features (the columns of the feature matrix, i.e. the independent variables) is 34

In [None]:
X = np.genfromtxt("ionosphere.txt", delimiter=",", usecols = np.arange(34)) # load data from file 

- Now that the features are extracted, the labels can be stored in the variable `y` (the dependent variable vector)
- The labels sit in the column after the 34 features, so `usecols = 34` selects that column only
- `dtype` of `int` specifies that the values are to be stored as integers

In [None]:
y = np.genfromtxt("ionosphere.txt", delimiter=",", usecols = 34, dtype='int') # load labels from the last column

In [None]:
print(y)
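
The feature/label split described above can be tried without the data file. The following is a minimal sketch that assumes a made-up, comma-separated sample with three feature columns and one integer label column; the names `sample`, `X_small` and `y_small` are illustrative and not part of the original notebook.

```python
import io
import numpy as np

# Hypothetical stand-in for ionosphere.txt: 3 feature columns plus a label column.
sample = "0.5,1.0,-0.2,1\n0.1,0.3,0.9,-1\n0.7,-0.4,0.0,1"

# Columns 0..2 hold the features; np.arange(3) is the array [0, 1, 2].
X_small = np.genfromtxt(io.StringIO(sample), delimiter=",", usecols=np.arange(3))

# Column 3 (the last one) holds the labels, read as integers.
y_small = np.genfromtxt(io.StringIO(sample), delimiter=",", usecols=3, dtype="int")

print(X_small.shape)  # (3, 3): three rows, three feature columns
print(y_small)        # the three labels 1, -1, 1
```

The same pattern scales to the real file by replacing `3` with `34`.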

## Concatenating 

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5,])

- `np.concatenate()` concatenates 2 or more sequences together
- For simple concatenations it takes only one argument, a tuple containing the sequences to be concatenated
  - This is arguably not Pythonic design, as it is possible to pass multiple arguments for one parameter using tuple unpacking (`*args`)

In [None]:
np.concatenate((a, b))
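
As a small sketch of the call shape: the tuple can hold any number of arrays, and keyword parameters such as `axis` come after it, which is one plausible reason the arrays are bundled into a single argument. The arrays `c`, `left` and `right` below are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])
c = np.array([6])

# All arrays go in one tuple; any number of them can be joined at once.
joined = np.concatenate((a, b, c))
print(joined)  # the six values 1..6 in one array

# Keyword parameters such as `axis` follow the tuple.
left = np.array([[1, 2], [3, 4]])
right = np.array([[5], [6]])
side_by_side = np.concatenate((left, right), axis=1)
print(side_by_side)  # rows become [1 2 5] and [3 4 6]
```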

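- Checking that a list comprehension over `range(10 ** 6)` produces the same elements as `np.arange(10 ** 6)`, which is true
- `np.arange()` generates an array, in this case of the integers from 0 up to (but not including) `10 ** 6`
- `np.array_equal()` compares the two element by element; the `in` operator would only test whether any element matches

In [None]:
print(np.array_equal([x for x in range(10 ** 6)], np.arange(10 ** 6)))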

- `np.array([4])` creates an array with the single element 4 in it

In [None]:
print(np.array([4]))

- 2 arrays, `np.arange(10 ** 6)` and `np.array([4])` are concatenated together

In [None]:
my_array = np.concatenate((np.arange(10 ** 6), np.array([4])))

## Computing Minimum

- The Nearest Neighbour method depends on the ability to compute the minimum of an array of real numbers
- In evaluating conformal predictors you may need to compute the largest p-value

In [None]:
import math

- `math.inf` is infinity
- Infinity is the initial value of the minimum until something smaller/closer is found
- For each element in the array, the current element is compared with the current minimum
- If the current element is smaller/closer than the current minimum, then the current element becomes the new minimum
- This process is repeated for the entire list

In [None]:
current_minimum = math.inf # infinity
for index, element in enumerate(my_array): # iterate over array
	if current_minimum > element: # check if current element is smaller than current minimum
		current_minimum = element # update current minimum if necessary
print(current_minimum)
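
The same pattern, mirrored, finds a maximum, which is what computing the largest p-value requires. This is a minimal sketch; the p-values in `p_values` are invented purely for illustration.

```python
import math
import numpy as np

# Hypothetical p-values, invented purely for illustration.
p_values = np.array([0.12, 0.57, 0.03, 0.41])

current_maximum = -math.inf  # start below every possible value
for element in p_values:  # iterate over array
    if element > current_maximum:  # found something larger
        current_maximum = element  # update current maximum
print(current_maximum)

# Cross-check against NumPy's built-in reduction.
print(current_maximum == np.max(p_values))
```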

# Exercises

## Question 1
Describe briefly the array `my_array` in English.

- It is a one-dimensional array (vector) of `10 ** 6 + 1` integers: the numbers 0 to 999999 in increasing order, followed by a single extra 4 at the end

## Question 2
Why did we repeat opening and closing parentheses for `np.concatenate()`?

- Because the function takes only one argument when concatenating, a tuple containing the sequence types to be concatenated: the outer parentheses belong to the function call and the inner ones create the tuple
- This is arguably not Pythonic, as multiple arguments (sequence types) can be passed for one parameter via tuple unpacking (`*args`)

## Question 3
Why do you think the result for `sum(4 >= my_array)` is 6?

In [None]:
print(sum(4 >= np.array([1, 2, 3, 4, 5, 6]))) # number of elements in array that are less than or equal to 4
print(sum(np.array([1, 2, 3, 4, 5, 6]))) # sum of all elements in array

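- `4 >= my_array` compares every element with 4 and produces a Boolean array
- `sum()` adds the elements up, counting `True` as 1 and `False` as 0, so the result is the number of elements that are less than or equal to 4
- In `my_array` these are 0, 1, 2, 3, 4 from `np.arange(10 ** 6)` plus the appended 4, which gives 6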
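
A short sketch of where the 6 comes from, assuming `my_array` is built as in the Concatenating section. `np.count_nonzero()` is used here instead of the built-in `sum()` only because it is faster on large arrays; the count is the same.

```python
import numpy as np

my_array = np.concatenate((np.arange(10 ** 6), np.array([4])))

mask = 4 >= my_array            # Boolean array, True where the element is <= 4
count = np.count_nonzero(mask)  # number of True entries
matches = my_array[mask]        # the elements that satisfied the comparison

print(count)    # 6
print(matches)  # the values 0, 1, 2, 3, 4 and the appended 4
```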

## Question 4
Modify the code to also find the position of the minimum (i.e., the index of the smallest element of the array; you may assume that there is a unique smallest element).

In [None]:
current_minimum = math.inf # infinity
minimum_index = -1
for index, element in enumerate(my_array): # iterate over array
	if current_minimum > element: # check if current element is smaller than current minimum
		current_minimum = element # update current minimum if necessary
		minimum_index = index # update index of current minimum
print(current_minimum, minimum_index)
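
One way to check the loop is to compare it with `np.min()` and `np.argmin()`. The sketch below wraps the loop in a helper, `min_with_index`, a name chosen here for illustration, and runs it on a small made-up array.

```python
import math
import numpy as np

def min_with_index(array):
    """Return (minimum, index) using the same loop as above."""
    current_minimum, minimum_index = math.inf, -1
    for index, element in enumerate(array):  # iterate over array
        if current_minimum > element:  # smaller element found
            current_minimum, minimum_index = element, index
    return current_minimum, minimum_index

values = np.array([7.5, 2.0, 9.1, 3.3])  # illustrative data
loop_min, loop_index = min_with_index(values)

print(loop_min, loop_index)               # minimum 2.0 at index 1
print(np.min(values), np.argmin(values))  # NumPy agrees
```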

## Question 5/6/7
Write a function for computing the squared Euclidean norm of a vector represented as a NumPy array. You may use for loops. Your function should take two inputs, the array and its length.

In [None]:
def squared_euclidean_norm(array: list) -> float:
	return float(sum(x ** 2 for x in array)) # no square root: the squared norm is wanted

In [None]:
print(squared_euclidean_norm([1, 2, 3]))
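
The question asks for a function with two inputs, the array and its length. A minimal sketch of that variant follows; `squared_norm_loop` is a name chosen here for illustration. The squared Euclidean norm is the sum of the squared components with no square root taken, so `[1, 2, 3]` gives 1 + 4 + 9 = 14, which `np.dot()` confirms.

```python
import numpy as np

def squared_norm_loop(array, length):
    """Sum of squared components, using an explicit loop over `length` entries."""
    total = 0.0
    for i in range(length):  # visit each component by index
        total += array[i] ** 2  # add its square
    return total

v = np.array([1, 2, 3])
loop_result = squared_norm_loop(v, len(v))
dot_result = np.dot(v, v)  # dot product of a vector with itself

print(loop_result)  # 14.0
print(dot_result)   # 14
```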

# Revision Questions

## Question 1
Give the definition of a conformity measure in the context of conformal prediction.

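- A conformity measure is a function that assigns to each example in a data set a real number (its conformity score) measuring how well that example conforms to the other examples; the higher the score, the more typical the example. The scores must not depend on the order in which the examples are listed.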

## Question 2
Give the definition of a nonconformity measure in the context of conformal prediction.

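- A nonconformity measure is a function that assigns to each example in a data set a real number (its nonconformity score) measuring how strange that example is compared with the other examples; the higher the score, the less typical the example. The scores must not depend on the order in which the examples are listed.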

## Question 3
Define the notion of a conformal predictor.

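- A conformal predictor is a prediction method built on a conformity (or nonconformity) measure. For a test object it tries each possible label in turn, computes the p-value of the test example with that label, and, for a chosen significance level, outputs the prediction set of all labels whose p-value exceeds that level.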

## Question 4
Compare and contrast nonconformity measures and conformity measures.

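- Both assign a real-valued score to each example, computed relative to the other examples in the data set
- A conformity measure scores how typical an example is (a higher score means it conforms better), while a nonconformity measure scores how strange it is (a higher score means it conforms worse)
- Either one can be turned into the other, for example by negating the score; only the direction of the comparison used when computing p-values changes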

## Question 5
In the context of conformal prediction, what is the minimal possible p value for a training set of size `n`?


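The minimal possible p-value for a training set of size n is $$\frac{1}{n + 1}$$ because the test example is ranked together with the n training examples and always counts itself.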

## Question 6
In the context of conformal prediction, what are the possible p-values for a training set of size `n`?

The possible p-values for a training set of size n are $$ \frac{1}{n + 1}, \frac{2}{n + 1}, \ldots, \frac{n}{n + 1}, 1 $$
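
A minimal sketch of how such a p-value is computed, assuming nonconformity scores (larger means stranger) and the usual definition in which the test example's own score is included in the count, so that n training examples give n + 1 scores. The function name `p_value` and the scores are made up for illustration.

```python
import numpy as np

def p_value(train_scores, test_score):
    """p-value of a test example from nonconformity scores (larger = stranger)."""
    scores = np.append(train_scores, test_score)  # n + 1 scores in total
    # Fraction of scores at least as large as the test score (it counts itself).
    return np.count_nonzero(scores >= test_score) / len(scores)

train_scores = np.array([0.2, 0.5, 0.9])  # hypothetical scores, n = 3

smallest = p_value(train_scores, 1.5)  # stranger than every training example
largest = p_value(train_scores, 0.1)   # less strange than every training example

print(smallest)  # 0.25, i.e. 1 / (n + 1)
print(largest)   # 1.0
```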