# Code

## Reading Data

In [1]:
import numpy as np

### Loading Data from File

In [2]:
X = np.genfromtxt("ionosphere.txt", delimiter=",") # load data from file 

In [3]:
print(X[:3])

[[ 1.       0.       0.99539 -0.05889  0.85243  0.02306  0.83398 -0.37708
   1.       0.0376   0.85243 -0.17755  0.59755 -0.44945  0.60536 -0.38223
   0.84356 -0.38542  0.58212 -0.32192  0.56971 -0.29674  0.36946 -0.47357
   0.56811 -0.51171  0.41078 -0.46168  0.21266 -0.3409   0.42267 -0.54487
   0.18641 -0.453    1.     ]
 [ 1.       0.       1.      -0.18829  0.93035 -0.36156 -0.10868 -0.93597
   1.      -0.04549  0.50874 -0.67743  0.34432 -0.69707 -0.51685 -0.97515
   0.05499 -0.62237  0.33109 -1.      -0.13151 -0.453   -0.18056 -0.35734
  -0.20332 -0.26569 -0.20468 -0.18401 -0.1904  -0.11593 -0.16626 -0.06288
  -0.13738 -0.02447 -1.     ]
 [ 1.       0.       1.      -0.03365  1.       0.00485  1.      -0.12062
   0.88965  0.01198  0.73082  0.05346  0.85443  0.00827  0.54591  0.00299
   0.83775 -0.13644  0.75535 -0.0854   0.70887 -0.27502  0.43385 -0.12062
   0.57528 -0.4022   0.58984 -0.22145  0.431   -0.17365  0.60436 -0.2418
   0.56045 -0.38238  1.     ]]


### Extracting Data

- `delimiter` defines the string to be used for separating values
- The default `delimiter` (`None`) splits on whitespace; since the file is comma-separated, a comma `,` is passed instead
- `usecols` selects which columns to read, with 0 being the first
  - For example, `usecols = (1, 4, 5)` will extract the 2nd, 5th and 6th columns
  - `np.arange(34)` creates an array of the integers 0 to 33 (similar to `list(range(34))`)
  - This means that columns 0 to 33, i.e. the first 34 columns, will be extracted
  - This is because there are 34 features (the independent variables); the 35th column (index 34) holds the label
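To see `usecols` and `dtype` working together without the real file, the sketch below reads a tiny made-up CSV from memory through `io.StringIO`. The data and the names `csv_text`, `X_demo` and `y_demo` are illustrative only; the split into feature columns and one integer label column mirrors what the cells below do for `ionosphere.txt`.

```python
import io

import numpy as np

# Made-up stand-in for ionosphere.txt: 3 feature columns followed by 1 label column
csv_text = "1,0.5,-0.2,1\n0,0.1,0.3,-1\n1,-0.7,0.9,1\n"
n_features = 3  # for ionosphere.txt this would be 34

# Features: columns 0 .. n_features - 1, read as floats
X_demo = np.genfromtxt(io.StringIO(csv_text), delimiter=",", usecols=np.arange(n_features))
# Labels: the single column right after the features, read as integers
y_demo = np.genfromtxt(io.StringIO(csv_text), delimiter=",", usecols=n_features, dtype=int)

print(X_demo.shape)  # (3, 3)
print(y_demo)        # [ 1 -1  1]
```

Selecting a single column gives a one-dimensional array, which is the usual shape for a label vector.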

In [4]:
X = np.genfromtxt("ionosphere.txt", delimiter=",", usecols = np.arange(34)) # load data from file 

- The labels are stored in the last column (index 34), so they are read separately into the variable `y` which represents the labels (dependent variable vector)
- `dtype` of `int` specifies that the values should be stored as integers

In [5]:
y = np.genfromtxt("ionosphere.txt", delimiter=",", usecols = 34, dtype='int') # load labels from file 

In [6]:
print(y[:3])

[ 1 -1  1]


## Concatenating 

In [7]:
a = np.array([1, 2, 3])
b = np.array([4, 5,])

- `np.concatenate()` joins a sequence of arrays (two or more) end to end along an existing axis
- The arrays are passed as a single argument: a tuple (or list) containing the arrays to be concatenated
  - This could be considered un-Pythonic, since a function can instead accept any number of positional arguments using `*args`

In [8]:
np.concatenate((a, b))

array([1, 2, 3, 4, 5])
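`np.concatenate()` is not limited to two arrays, and for multi-dimensional arrays its `axis` parameter chooses the direction of joining. A small illustration with made-up arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])
c = np.array([6])

# Any number of arrays can be placed in the single tuple (or list) argument
print(np.concatenate((a, b, c)))  # [1 2 3 4 5 6]

# For 2-D arrays, `axis` chooses the direction of joining
m = np.array([[1, 2], [3, 4]])
print(np.concatenate((m, m), axis=0).shape)  # (4, 2): extra rows
print(np.concatenate((m, m), axis=1).shape)  # (2, 4): extra columns
```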

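- Checking whether the list comprehension holds the same values as `np.arange()`, which is true
- `np.arange(10 ** 6)` generates an array of the integers from 0 up to `10 ** 6 - 1`
- `np.array_equal()` compares the two sequences element by element; the `in` operator is not suitable here, because `lst in arr` evaluates `(arr == lst).any()` and is true as soon as a single position matches

In [9]:
print(np.array_equal([x for x in range(10 ** 6)], np.arange(10 ** 6)))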

True
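A minimal illustration of why `in` is not an equality test for NumPy arrays, using a small made-up array:

```python
import numpy as np

arr = np.arange(5)

# `in` on an ndarray evaluates (arr == value).any()
print([0, 9, 9, 9, 9] in arr)                # True: only the first position matches
print(np.array_equal([0, 9, 9, 9, 9], arr))  # False: the sequences differ
print(np.array_equal(list(range(5)), arr))   # True
```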


- `np.array([4])` creates an array with the single element 4 in it

In [10]:
print(np.array([4]))

[4]


- The two arrays `np.arange(10 ** 6)` and `np.array([4])` are concatenated, so `my_array` contains 0, 1, ..., 999999 followed by 4

In [11]:
my_array = np.concatenate((np.arange(10 ** 6), np.array([4])))

## Computing Minimum

- The Nearest Neighbour method depends on the ability to compute the minimum of an array of real numbers
- In evaluating conformal predictors you may need to compute the largest p-value

In [12]:
import math

- `math.inf` is positive infinity (a `float`)
- Infinity is used as the initial value of the minimum, so the first element is guaranteed to be smaller
- Each element of the array is compared with the current minimum
- If the element is smaller than the current minimum, it becomes the new minimum
- After the whole array has been scanned, the current minimum is the minimum of the array

In [13]:
current_minimum = math.inf # infinity
for index, element in enumerate(my_array): # iterate over array
	if current_minimum > element: # check if current element is smaller than current minimum
		current_minimum = element # update current minimum if necessary
print(current_minimum)

0
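In practice the loop is usually replaced by NumPy's built-in reductions. A sketch, with a made-up array of p-values to illustrate the "largest p-value" case mentioned above:

```python
import numpy as np

my_array = np.concatenate((np.arange(10 ** 6), np.array([4])))

# Vectorised equivalent of the explicit loop
print(my_array.min())  # 0

# The same idea gives the largest value, e.g. of some made-up p-values
p_values = np.array([0.25, 0.5, 0.125])
print(p_values.max())  # 0.5
```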


# Exercises

## Question 1
Describe briefly the array `my_array` in English.

- `my_array` is a one-dimensional NumPy array of 1,000,001 integers: the numbers 0, 1, ..., 999999 in increasing order, followed by one extra element 4
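A quick check of this description (the `.tolist()` calls only make the printed output easier to read):

```python
import numpy as np

my_array = np.concatenate((np.arange(10 ** 6), np.array([4])))

print(my_array.ndim, len(my_array))                   # 1 1000001
print(my_array[:3].tolist(), my_array[-3:].tolist())  # [0, 1, 2] [999998, 999999, 4]
```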

## Question 2
Why did we repeat opening and closing parentheses for `np.concatenate()`?

- The inner parentheses build a tuple of the arrays and the outer parentheses are the function call: `np.concatenate()` takes the arrays to be joined as one argument (a tuple or list), not as separate arguments
- This could be considered un-Pythonic, since a function can instead accept any number of positional arguments through `*args`

## Question 3
Why do you think the result for `sum(4 >= my_array)` is 6?

In [14]:
print(sum(4 >= np.array([1, 2, 3, 4, 5, 6]))) # number of elements in array that are less than or equal to 4 
print(sum(np.array([1, 2, 3, 4, 5, 6]))) # sum of all elements in array

4
21


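- `4 >= my_array` is evaluated element by element and gives a Boolean array that is `True` wherever an element is less than or equal to 4
- `sum()` adds these Booleans, counting each `True` as 1, so the result is the number of elements that are at most 4 (for `[1, 2, 3, 4, 5, 6]` above this is 4)
- `my_array` contains 0, 1, 2, 3 and 4 from `np.arange(10 ** 6)` plus the appended 4, so six elements satisfy the condition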
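Applying the same reasoning to `my_array` itself shows where the six comes from; `np.count_nonzero` is a faster equivalent of the built-in `sum()` for Boolean arrays:

```python
import numpy as np

my_array = np.concatenate((np.arange(10 ** 6), np.array([4])))
mask = 4 >= my_array              # Boolean array, True where the element is <= 4

print(my_array[mask].tolist())    # [0, 1, 2, 3, 4, 4]
print(sum(mask))                  # 6
print(np.count_nonzero(mask))     # 6
```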

## Question 4
Modify the code to also find the position of the minimum (i.e., the index of the smallest element of the array; you may assume that there is a unique smallest element)

In [15]:
current_minimum = math.inf # infinity
minimum_index = -1
for index, element in enumerate(my_array): # iterate over array
	if current_minimum > element: # check if current element is smaller than current minimum
		current_minimum = element # update current minimum if necessary
		minimum_index = index # update index of current minimum
print(current_minimum, minimum_index)

0 0
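NumPy offers a vectorised equivalent: `argmin()` returns the index of the first occurrence of the minimum, which matches the loop above because the loop only updates on a strictly smaller element.

```python
import numpy as np

my_array = np.concatenate((np.arange(10 ** 6), np.array([4])))

# argmin() returns the index of the first occurrence of the minimum
print(my_array.min(), my_array.argmin())  # 0 0
```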


## Question 5/6/7
Write a function for computing the squared Euclidean norm of a vector represented as a NumPy array. You may use for loops. Your function should take two inputs, the array and its length.

In [16]:
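def squared_euclidean_norm(array: np.ndarray, length: int) -> float:
	# squared norm = sum of the squared components (no square root), looping over the first `length` elements
	return float(sum(array[i] ** 2 for i in range(length)))

In [17]:
print(squared_euclidean_norm(np.array([1, 2, 3]), 3))

14.0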
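For Nearest Neighbour methods the squared norm is mostly used through squared distances. Because the square root is increasing, comparing squared distances picks the same nearest neighbour as comparing distances while skipping the square roots. A vectorised sketch (the function names are illustrative):

```python
import numpy as np


def squared_norm_vectorised(v: np.ndarray) -> float:
    # Sum of squared components, computed as a dot product of v with itself
    return float(np.dot(v, v))


def squared_distance(u: np.ndarray, v: np.ndarray) -> float:
    # Squared Euclidean distance = squared norm of the difference vector
    return squared_norm_vectorised(u - v)


print(squared_norm_vectorised(np.array([1, 2, 3])))                   # 14.0
print(squared_distance(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # 4.0
```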


# Revision Questions

## Question 1
Give the definition of a conformity measure in the context of conformal prediction.

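- A conformity measure is a function $A$ that maps a sequence of examples $(z_1, \dots, z_n)$, where $z_i = (x_i, y_i)$, to a sequence of numbers $(\alpha_1, \dots, \alpha_n)$, called conformity scores, in such a way that permuting the examples permutes the scores in the same way; $\alpha_i$ measures how well $z_i$ conforms to the other examples (a larger score means a more typical example)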

## Question 2
Give the definition of a nonconformity measure in the context of conformal prediction.

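- A nonconformity measure has the same form as a conformity measure (a function mapping $(z_1, \dots, z_n)$ to scores $(\alpha_1, \dots, \alpha_n)$ that are permuted along with the examples), but $\alpha_i$ measures how strange $z_i$ is relative to the other examples (a larger score means a less typical example)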

## Question 3
Define the notion of a conformal predictor.

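- Given a training set $z_1, \dots, z_n$, a test object $x$ and a conformity measure, a conformal predictor considers every possible label $y$, computes the conformity scores $\alpha_1, \dots, \alpha_{n+1}$ of the augmented sequence $z_1, \dots, z_n, (x, y)$, and from them the p-value $p(y) = |\{i : \alpha_i \le \alpha_{n+1}\}| / (n+1)$. At significance level $\epsilon$ it outputs the prediction set $\Gamma^\epsilon = \{y : p(y) > \epsilon\}$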
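A minimal sketch of a conformal predictor for classification: for each candidate label it appends the test example with that label, computes the p-value from conformity scores, and keeps the labels whose p-value exceeds $\epsilon$. The interface (`train` as a list of `(object, label)` pairs, `conformity` as a function returning one score per example) and the toy conformity measure (minus the distance to the nearest other example with the same label, for 1-D objects) are assumptions for illustration.

```python
def neg_nearest_same(examples):
    """Toy conformity measure for 1-D objects: minus the distance to the
    nearest other example with the same label (larger = more typical)."""
    scores = []
    for i, (x_i, y_i) in enumerate(examples):
        d = min(abs(x_i - x_j) for j, (x_j, y_j) in enumerate(examples)
                if j != i and y_j == y_i)
        scores.append(-d)
    return scores


def conformal_prediction_set(train, x_test, labels, epsilon, conformity):
    prediction_set = []
    for y in labels:
        scores = conformity(train + [(x_test, y)])  # test example goes last
        p = sum(s <= scores[-1] for s in scores) / len(scores)
        if p > epsilon:
            prediction_set.append(y)
    return prediction_set


train = [(0.0, "a"), (1.0, "a"), (10.0, "b"), (12.0, "b")]
print(conformal_prediction_set(train, 0.5, ["a", "b"], 0.25, neg_nearest_same))  # ['a']
print(conformal_prediction_set(train, 0.5, ["a", "b"], 0.1, neg_nearest_same))   # ['a', 'b']
```

Label `"b"` gets p-value 1/5 = 0.2 here, so it is excluded at $\epsilon = 0.25$ but included at $\epsilon = 0.1$: smaller significance levels give larger prediction sets.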

## Question 4
Compare and contrast nonconformity measures and conformity measures.

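- Both assign a score to each example relative to the other examples, and both can be used to compute p-values and hence to build a conformal predictor
- A large conformity score means an example is typical, whereas a large nonconformity score means it is strange
- They are interchangeable: negating a conformity measure gives a nonconformity measure and vice versa, with the inequality in the p-value formula reversed ($\alpha_i \le \alpha_{n+1}$ for conformity, $\alpha_i \ge \alpha_{n+1}$ for nonconformity)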

## Question 5
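In the context of conformal prediction, what is the minimal possible p-value for a training set of size `n`?


The minimal possible p-value for a training set of size n is $$\frac{1}{n+1}$$ because a p-value is a fraction of the n + 1 examples (the training set plus the test example), and the test example always counts itself.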

## Question 6
In the context of conformal prediction, what are the possible p-values for a training set of size `n`?

The possible p-values for a training set of size n are $$ \frac{1}{n+1}, \frac{2}{n+1}, \dots, \frac{n+1}{n+1} = 1 $$

## Question 7
What is the main property of validity for conformal prediction?

- Under the IID assumption, a conformal predictor is valid: at any significance level $\epsilon$, the probability that the prediction set does not contain the true label is at most $\epsilon$
- Equivalently, the coverage probability of the prediction sets is at least the confidence level $1 - \epsilon$

## Question 8
What is meant by the efficiency of a conformal predictor?

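- The efficiency of a conformal predictor describes how informative its prediction sets are; validity is guaranteed automatically, so efficiency is what distinguishes good conformal predictors from poor ones
- Efficiency requires the prediction sets to be as small as possible (few labels in classification, narrow intervals in regression); equivalently, the p-values of the false labels should be small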

## Question 9
Give three examples of conformity measures based on the method of Nearest Neighbours. 

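- $\alpha_i$ = (distance from $x_i$ to the nearest example of a different class) / (distance from $x_i$ to the nearest example of the same class)
- $\alpha_i$ = distance from $x_i$ to the nearest example of a different class
- $\alpha_i$ = minus the distance from $x_i$ to the nearest example of the same class
- Each of these can be combined with any distance, such as the Euclidean, Manhattan or Minkowski distance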

## Question 10
Consider the following training set in a multiclass classification problem:
- samples of class A: (−1, 0) and (−1, −1);
- samples of class B: (0, 0) and (0, 1);
- samples of class C: (1, 1), (2, 1), and (2, 0).

The test sample is `(1, 0)`. When answering the following questions, use the conformity measure defined as the distance to the nearest sample of a different class divided by the distance to the nearest sample of the same class.

### Part A
What are the three p-values?

### Part B
What are the point prediction, confidence, and credibility?

### Part C
Suppose the label of the test sample is B. Compute the average false p-value.
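The sketch below brute-forces Parts A to C with the conformity measure stated in the question and Euclidean distance. It assumes the usual definitions: $p(y)$ is the fraction of the $n+1$ examples whose score is at most the test example's score (the test example included), the point prediction is the label with the largest p-value, confidence is one minus the second-largest p-value, credibility is the largest p-value, and the average false p-value is the mean p-value of the labels other than the true one.

```python
import math

# Training set from Question 10: (object, label) pairs
train = [((-1, 0), "A"), ((-1, -1), "A"),
         ((0, 0), "B"), ((0, 1), "B"),
         ((1, 1), "C"), ((2, 1), "C"), ((2, 0), "C")]
test_object = (1, 0)
labels = ["A", "B", "C"]


def conformity_scores(examples):
    """alpha_i = distance to the nearest example of a different class
    divided by the distance to the nearest example of the same class."""
    scores = []
    for i, (x_i, y_i) in enumerate(examples):
        same = min(math.dist(x_i, x_j) for j, (x_j, y_j) in enumerate(examples)
                   if j != i and y_j == y_i)
        diff = min(math.dist(x_i, x_j) for x_j, y_j in examples if y_j != y_i)
        scores.append(diff / same)
    return scores


def p_value(label):
    # Augment the training set with the test object carrying the postulated label
    scores = conformity_scores(train + [(test_object, label)])
    return sum(s <= scores[-1] for s in scores) / len(scores)


# Part A
p = {label: p_value(label) for label in labels}
print(p)  # {'A': 0.125, 'B': 0.75, 'C': 0.625}

# Part B
ranked = sorted(p.values(), reverse=True)
prediction = max(p, key=p.get)
confidence = 1 - ranked[1]
credibility = ranked[0]
print(prediction, confidence, credibility)  # B 0.375 0.75

# Part C: true label B, so A and C are the false labels
false_p = [p[label] for label in labels if label != "B"]
average_false_p = sum(false_p) / len(false_p)
print(average_false_p)  # 0.375
```

So the p-values are 1/8, 6/8 and 5/8 for A, B and C; the point prediction is B with confidence 0.375 and credibility 0.75, and the average false p-value is 0.375.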

## Question 11
How are conformity measures used for computing p-values in the context of conformal prediction?

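- Conformity scores $\alpha_1, \dots, \alpha_{n+1}$ are computed for the $n$ training examples and the test example with a postulated label $y$ (the last score); the p-value is the fraction of the $n+1$ examples that conform no better than the test example: $$p(y) = \frac{|\{i : \alpha_i \le \alpha_{n+1}\}|}{n+1}$$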

## Question 12
How are nonconformity measures used for computing p-values in the context of conformal prediction?

- Nonconformity scores $\alpha_1, \dots, \alpha_{n+1}$ are computed for the $n$ training examples and the test example with a postulated label $y$ (the last score); the p-value is the fraction of the $n+1$ examples that are at least as strange as the test example: $$p(y) = \frac{|\{i : \alpha_i \ge \alpha_{n+1}\}|}{n+1}$$
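A sketch of how a p-value is computed from scores: with conformity scores it is the fraction of the $n+1$ scores that are at most the test example's score, and with nonconformity scores the fraction that are at least it. The scores below are made up, and the last one belongs to the test example.

```python
def conformity_p_value(scores):
    """Conformity scores; scores[-1] belongs to the test example."""
    return sum(s <= scores[-1] for s in scores) / len(scores)


def nonconformity_p_value(scores):
    """Nonconformity scores; scores[-1] belongs to the test example."""
    return sum(s >= scores[-1] for s in scores) / len(scores)


scores = [3.0, 1.0, 2.0, 0.5]  # made-up scores
print(conformity_p_value(scores))     # 0.25: as conformity scores, the test example is least typical
print(nonconformity_p_value(scores))  # 1.0: as nonconformity scores, it is least strange
print(nonconformity_p_value([-s for s in scores]) == conformity_p_value(scores))  # True
```

The last line checks that negating the scores turns one kind of measure into the other.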

## Question 13
Give at least two examples of nonconformity measures suitable for regression problems.

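- Squared error $(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is a prediction for $y_i$ made by a regression method
- Absolute error $|y_i - \hat{y}_i|$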

## Question 14
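Discuss advantages and disadvantages of the nonconformity measures $\alpha_i = |y_i - \hat{y}_i|$ and $\alpha_i = |y_i - \hat{y}_i| / \sigma_i$, where $\hat{y}_i$ is a prediction for $y_i$ and $\sigma_i > 0$ is an estimate of its accuracy.

*Advantages:*
- $\alpha_i = |y_i - \hat{y}_i|$ is simple: it only needs the point prediction $\hat{y}_i$ and is cheap to compute
- $\alpha_i = |y_i - \hat{y}_i| / \sigma_i$ adapts to the difficulty of each object, giving narrower prediction intervals where the label is easy to predict and wider ones where it is hard


*Disadvantages:*
- $\alpha_i = |y_i - \hat{y}_i|$ ignores how difficult each object is, so its prediction intervals have roughly the same width everywhere
- $\alpha_i = |y_i - \hat{y}_i| / \sigma_i$ needs an extra model to estimate $\sigma_i$, which costs more computation, and a poor estimate of $\sigma_i$ can make the intervals less efficient

Both measures lead to valid conformal predictors; they differ in efficiency.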

## Question 15
How would you define a nonconformity measure based on the Nearest Neighbour algorithm and suitable for regression problems?

- A nonconformity measure based on the Nearest Neighbour algorithm and suitable for regression problems can be defined as $\alpha_i = |y_i - \hat{y}_i|$, where $\hat{y}_i$ is the label of the nearest neighbour of $x_i$ among the other examples

## Question 16
How would you define a nonconformity measure based on the K Nearest Neighbours algorithm and suitable for regression problems?

- A nonconformity measure based on the K Nearest Neighbours algorithm and suitable for regression problems can be defined as $\alpha_i = |y_i - \hat{y}_i|$, where $\hat{y}_i$ is the average of the labels of the K nearest neighbours of $x_i$ among the other examples
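A sketch of the K Nearest Neighbours regression measure $\alpha_i = |y_i - \hat{y}_i|$, where $\hat{y}_i$ averages the labels of the K nearest other examples; the Nearest Neighbour version is the case K = 1. It assumes objects stored as rows of a NumPy array and Euclidean distance; the data are made up.

```python
import numpy as np


def knn_nonconformity_scores(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """alpha_i = |y_i - yhat_i|, where yhat_i is the mean label of the k nearest
    neighbours of x_i among the other examples (k = 1: Nearest Neighbour)."""
    scores = np.empty(len(y))
    for i in range(len(y)):
        dists = np.linalg.norm(X - X[i], axis=1)  # Euclidean distances to x_i
        dists[i] = np.inf                          # x_i is not its own neighbour
        neighbours = np.argsort(dists)[:k]
        scores[i] = abs(y[i] - y[neighbours].mean())
    return scores


X = np.array([[0.0], [1.0], [3.0], [10.0]])  # made-up objects, one feature each
y = np.array([0.0, 1.0, 2.0, 4.0])           # made-up labels
print(knn_nonconformity_scores(X, y, k=1).tolist())  # [1.0, 1.0, 1.0, 2.0]
print(knn_nonconformity_scores(X, y, k=2).tolist())  # [1.5, 0.0, 1.5, 2.5]
```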

## Question 17
Write a pseudocode (or Python code) for using a grid for conformal prediction in the problem of regression. You may assume that the prediction set is an interval.

*Input:*
- Training data: $(x_1, y_1), \dots, (x_n, y_n)$
- Test object: $x$
- Significance level: $\epsilon$
- Parameters: a grid of candidate labels $g_1 < g_2 < \dots < g_m$ covering the plausible range of labels, and a nonconformity measure

*Output:*
- A prediction interval for the label of $x$

*Steps:*
1. For each grid point $g_j$, add the example $(x, g_j)$ to the training data and compute the nonconformity scores $\alpha_1, \dots, \alpha_{n+1}$ of all $n+1$ examples (the last one belongs to $(x, g_j)$).
2. Compute the p-value $p(g_j) = |\{i : \alpha_i \ge \alpha_{n+1}\}| / (n+1)$.
3. Output the interval from the smallest to the largest grid point $g_j$ with $p(g_j) > \epsilon$.
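A Python sketch of the grid procedure: each grid label is postulated for the test object, its nonconformity p-value is computed, and the end points of the accepted grid labels are reported. The nonconformity measure (K Nearest Neighbours regression residual with K = 2), the 1-D objects, the data and the grid range and step are all assumptions chosen for illustration.

```python
import numpy as np


def knn_scores(X: np.ndarray, y: np.ndarray, k: int = 2) -> np.ndarray:
    """Assumed nonconformity measure: |y_i - mean label of the k nearest other examples|."""
    scores = np.empty(len(y))
    for i in range(len(y)):
        dists = np.abs(X - X[i])  # 1-D objects, for simplicity
        dists[i] = np.inf         # an example is not its own neighbour
        neighbours = np.argsort(dists)[:k]
        scores[i] = abs(y[i] - y[neighbours].mean())
    return scores


def grid_prediction_interval(X_train, y_train, x_test, grid, epsilon, k=2):
    """Return (lowest, highest) accepted grid label, or None if none is accepted."""
    X_aug = np.append(X_train, x_test)
    accepted = []
    for y_candidate in grid:
        y_aug = np.append(y_train, y_candidate)  # postulate y_candidate as the test label
        scores = knn_scores(X_aug, y_aug, k)
        p = np.mean(scores >= scores[-1])         # nonconformity p-value
        if p > epsilon:
            accepted.append(y_candidate)
    if not accepted:
        return None
    # The prediction set is assumed to be an interval, so report its end points
    return float(min(accepted)), float(max(accepted))


X_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.1, 0.9, 2.2, 2.8, 4.1])
grid = np.linspace(-5.0, 10.0, 301)
print(grid_prediction_interval(X_train, y_train, 2.4, grid, epsilon=0.2))
```

A finer grid gives more precise end points at the cost of more nonconformity computations; the grid must also be wide enough that its own end points are rejected, otherwise the interval is cut off by the grid.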

## Question 19
Briefly explain how conformal prediction can be used for anomaly detection.

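Conformal prediction can be used for anomaly detection by treating each new example as a test example and computing its p-value with respect to the previous (normal) examples, using a nonconformity measure. If the p-value is at most a chosen significance level $\epsilon$, the example is flagged as an anomaly; under the IID assumption, the probability of flagging a normal example (a false alarm) is at most $\epsilon$.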
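A sketch of the p-value used for anomaly detection. The nonconformity measure (distance to the nearest other point) and the 2-D data are assumptions chosen for illustration.

```python
import numpy as np


def anomaly_p_value(train: np.ndarray, new_point: np.ndarray) -> float:
    """Nonconformity p-value of new_point; each point's nonconformity score
    is assumed to be its distance to the nearest other point."""
    data = np.vstack([train, new_point])
    scores = np.empty(len(data))
    for i in range(len(data)):
        d = np.linalg.norm(data - data[i], axis=1)
        d[i] = np.inf  # ignore the point itself
        scores[i] = d.min()
    return float(np.mean(scores >= scores[-1]))


train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
print(anomaly_p_value(train, np.array([0.5, 0.0])))    # 1.0: a typical point
print(anomaly_p_value(train, np.array([10.0, 10.0])))  # 0.16666666666666666 = 1/6
```

With only five previous points the smallest possible p-value is 1/6, so a significance level below 1/6 could never flag anything; more data allows smaller $\epsilon$.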

## Question 20
Define the point prediction, confidence, and credibility in the context of conformal prediction.

- The **point prediction** is the label with the largest p-value
- The **confidence** is one minus the second-largest p-value, i.e. the largest confidence level $1 - \epsilon$ at which the prediction set contains only the point prediction
- The **credibility** is the largest p-value

## Question 21
Let the size of the training set be `n`. Prove that the probability of error for a conformal predictor at significance level `ϵ = 1/(n+ 1)` does not exceed `ϵ`. (As usual in machine learning, you may make the IID assumption.)

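- Let $\alpha_1, \dots, \alpha_n$ be the conformity scores of the training examples and $\alpha_{n+1}$ the score of the test example with its true label, all computed from the same $n+1$ examples. The p-value of the true label is $p = |\{i : \alpha_i \le \alpha_{n+1}\}| / (n+1)$
- An error at significance level $\epsilon$ means the true label is not in the prediction set $\{y : p(y) > \epsilon\}$, i.e. $p \le \epsilon = 1/(n+1)$
- Since the test example always counts itself, $p \le 1/(n+1)$ happens only when $\alpha_{n+1}$ is strictly smaller than all the other scores, and at most one of the $n+1$ examples can have a strictly smallest score
- Under the IID assumption the $n+1$ examples are exchangeable and the conformity measure treats them symmetrically, so each of them is equally likely to be the one in position $n+1$
- Therefore $P(\text{error}) = \frac{\mathbb{E}[\#\text{examples with a strictly smallest score}]}{n+1} \le \frac{1}{n+1} = \epsilon$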

## Question 22
Let the size of the training set be n. Prove that the probability of error for a conformal predictor at significance level `ϵ = 2/(n+ 1)` does not exceed `ϵ`.

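- Let $\alpha_1, \dots, \alpha_{n+1}$ be the conformity scores of the $n$ training examples and of the test example with its true label (the last score). An error at $\epsilon = 2/(n+1)$ means the true label's p-value satisfies $p = |\{i : \alpha_i \le \alpha_{n+1}\}| / (n+1) \le 2/(n+1)$, i.e. at most one example other than the test example has a score $\le \alpha_{n+1}$
- Call example $j$ *extreme* if $|\{i : \alpha_i \le \alpha_j\}| \le 2$. There are at most two extreme examples: among any three of them, the one with the largest score would have at least three scores $\le$ its own
- Under the IID assumption the $n+1$ examples are exchangeable and the conformity measure treats them symmetrically, so each example is equally likely to be the test example
- Therefore $P(\text{error}) = P(\text{example } n+1 \text{ is extreme}) = \frac{\mathbb{E}[\#\text{extreme examples}]}{n+1} \le \frac{2}{n+1} = \epsilon$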
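A quick Monte Carlo sanity check (not a proof) of the bounds in Questions 21 and 22. It assumes standard normal "examples" and the arbitrary conformity measure $\alpha_i = -|z_i|$; with continuous scores ties have probability zero, so the error rate at $\epsilon = k/(n+1)$ should be close to $k/(n+1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 9, 20000
errors = {1: 0, 2: 0}  # k -> number of errors at significance level k / (n + 1)
for _ in range(trials):
    z = rng.normal(size=n + 1)              # n training examples plus the test example (last)
    scores = -np.abs(z)                     # assumed conformity measure: -|z_i|
    count = np.sum(scores <= scores[-1])    # the p-value is count / (n + 1)
    for k in errors:
        errors[k] += int(count <= k)        # error iff p <= k / (n + 1)
rates = {k: e / trials for k, e in errors.items()}
print(rates)  # close to {1: 0.1, 2: 0.2}, up to sampling noise
```

Comparing integer counts instead of floating-point p-values avoids rounding issues in the test $p \le k/(n+1)$.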

## Question 23
When is a p-value regarded as statistically significant? Highly statistically significant?


- A p-value is generally regarded as statistically significant if it is less than `0.05`. A p-value is highly statistically significant if it is less than `0.01`.

## Question 24
Define randomized p-values in the context of conformal prediction.

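Randomized (smoothed) p-values break ties between conformity scores at random. With conformity scores $\alpha_1, \dots, \alpha_{n+1}$ (the last one for the test example with postulated label $y$) and $\tau$ drawn from the uniform distribution on $[0, 1]$: $$p(y) = \frac{|\{i : \alpha_i < \alpha_{n+1}\}| + \tau \, |\{i : \alpha_i = \alpha_{n+1}\}|}{n+1}$$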
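A sketch of the randomized p-value computed from conformity scores: examples with strictly smaller scores count fully and ties (including the test example itself) count with weight $\tau$. The scores are made up; with $\tau = 1$ the formula reduces to the ordinary p-value.

```python
import random


def smoothed_p_value(scores, tau):
    """Randomized p-value from conformity scores; scores[-1] belongs to the test
    example and tau is a number in [0, 1], normally drawn uniformly at random."""
    test = scores[-1]
    strictly_smaller = sum(s < test for s in scores)
    ties = sum(s == test for s in scores)  # includes the test example itself
    return (strictly_smaller + tau * ties) / len(scores)


scores = [1.0, 2.0, 1.0, 1.0]                          # made-up conformity scores
print(smoothed_p_value(scores, tau=0.5))               # 0.375 = (0 + 0.5 * 3) / 4
print(smoothed_p_value(scores, tau=1.0))               # 0.75, the deterministic p-value
print(smoothed_p_value(scores, tau=random.random()))   # somewhere in [0, 0.75)
```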

## Question 25
Define randomized prediction sets in the context of conformal prediction. 

- Randomized prediction sets are the prediction sets $\{y : p(y) > \epsilon\}$ obtained when the p-values $p(y)$ are randomized (smoothed) p-values instead of deterministic ones

## Question 26
State the main property of validity of randomized conformal predictors in the online mode of prediction.

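- In the online mode, if the examples are IID, the randomized p-values of the true labels are independent and uniformly distributed on $[0, 1]$. Consequently, at significance level $\epsilon$ errors occur independently at each step, each with probability exactly $\epsilon$, so the long-run frequency of errors converges to $\epsilon$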

## Question 27
Define the average false p-value in the context of conformal prediction.

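- The average false p-value is the mean p-value of the false labels, averaged over the test examples: for a test example with true label $y$ and label set $Y$, it is $\frac{1}{|Y| - 1} \sum_{y' \ne y} p(y')$. Smaller values indicate a more efficient conformal predictor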