# Fingerprinting analysis - Matrix-like data: Step-by-step

Ok, it's all great to know about the rationale and everything behind fingerprinting, but now let's get to the fun part of it: how do we actually use the module?

Here, I demonstrate, step-by-step, how to run the fingerprinting analysis. You can follow along once you have installed `sihnpy` and opened a Jupyter Notebook.

```{note}
Note that currently the fingerprinting module will only work with matrix-like data, as is common for functional or structural connectivity. A future release of `sihnpy` will also include a way to do fingerprinting when you have tabular data.
```

Specifically, the module takes two folder paths, each containing the matrices you want to fingerprint against each other.

Already read the tutorial and just want the code (a.k.a. too long; didn't read)? Head over to the {ref}`tl;dr section <1.fingerprinting/fingerprinting_module:tl;dr>`.

## 1. Preparing the data

To run the fingerprinting module on matrix-like data, we need three things:
* The path to a list of participants to analyze
* The path to the folder containing the matrices of the first session of brain imaging
* The path to the folder containing the matrices of the second session of brain imaging

If you already have the above for your data, you can skip ahead to {ref}`the next section <1.fingerprinting/fingerprinting_module:2. Importing the data for fingerprinting>`. Otherwise, `sihnpy` also offers a small subset of data from the Prevent-AD cohort. Remember that by using the Prevent-AD data to practice using the code, [you agree to the terms of use](../license.md). You can access the data using the code below:

In [1]:
from sihnpy.datasets import pad_fp_input

id_list, path_participant_list, path_data_fp = pad_fp_input()

As we discuss in the {ref}`section on using Prevent-AD data <0.pad_data/datasets_usage:Using the datasets module in sihnpy>`, the `id_list` variable (or whatever you want to call it) contains the IDs and the basic information on participants included in the dataset.

In [2]:
id_list #Or use print(id_list) if you are not using a Jupyter Notebook

| | participant_id | sex | test_language | handedness_score | handedness_interpretation |
|---|---|---|---|---|---|
| 0 | sub-1000173 | Male | French | 100 | Right-handed |
| 1 | sub-1002928 | Female | French | 100 | Right-handed |
| 2 | sub-1004359 | Female | French | 90 | Right-handed |
| 3 | sub-1016072 | Female | French | -100 | Left-handed |
| 4 | sub-1031654 | Male | French | 100 | Right-handed |
| 5 | sub-1072774 | Female | French | 100 | Right-handed |
| 6 | sub-1076159 | Female | French | 100 | Right-handed |
| 7 | sub-1121981 | Female | French | 100 | Right-handed |
| 8 | sub-1154932 | Male | French | 30 | Ambidextrous |
| 9 | sub-1176949 | Female | French | 80 | Right-handed |


For the fingerprinting analysis, the other variables in the dataframe above don't matter for now. The only important one is the column `participant_id`, which should always be the first column in the dataset (though the column header can be named whatever you like). We will also need the variable `path_participant_list` for the fingerprinting. The `id_list` returned by `pad_fp_input()` is mostly there so you can quickly check which participants are included.

Next, we'll take a look at the variable `path_data_fp`. This one is specific to the fingerprinting module.

In [3]:
path_data_fp

{'BL00': {'rest_run1': '/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/BL00/rest_run1',
  'rest_run2': '/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/BL00/rest_run2',
  'encoding': '/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/BL00/encoding',
  'retrieval': '/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/BL00/retrieval'},
 'FU12': {'rest_run1': '/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/FU12/rest_run1',
  'rest_run2': '/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/FU12/rest_run2',
  'encoding': '/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/FU12/encoding',
  'retrieval': '/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/FU12/retrieval'}}

For this specific module, we get a nested dictionary with the paths to the functional connectivity matrices, which we will use to launch the fingerprinting.

Let's unpack this a bit. If you're not familiar with dictionaries, you might feel like "Woah! What's all that weird output??". I won't go into too much detail, but the idea is that we create "dictionary entries", where each **key** (before the colon) has a **value** (after the colon); here, the values are the paths to the connectivity matrices. These entries tell us **which path we should use for the fingerprinting**, depending on what interests us.

For this fingerprinting practice, we will use the functional connectivity matrices derived from the **resting state (run 1) at baseline** and the functional connectivity matrices derived from the **resting state (run 1) 12 months later**. Looking at our dictionary, we see two higher-level entries, `BL00` and `FU12`, meaning baseline and 12-month follow-up respectively. Both `BL00` and `FU12` hold a nested dictionary with a **key** named **rest_run1**, each pointing to a different directory.

```{tip}
Feel free to use any of the paths provided while you are practicing. Do you see any differences in the results?
```

Once we understand the dictionary, it becomes really easy to get the path we are interested in. The syntax is simply `dictionary_name[key_level1][key_level2]`:

In [4]:
print(path_data_fp['BL00']['rest_run1'])
print(path_data_fp['FU12']['rest_run1'])

/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/BL00/rest_run1
/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/FU12/rest_run1


Easy as pie! We now have the paths we need to launch the fingerprinting.
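If you want to see every session/task combination before picking one, you can loop over the nested dictionary. The sketch below uses a small stand-in dictionary with made-up paths (`toy_paths` is hypothetical); on your machine you would loop over `path_data_fp` instead.

```python
# Stand-in for `path_data_fp`; the paths are placeholders, not real folders
toy_paths = {
    "BL00": {"rest_run1": "/data/BL00/rest_run1", "encoding": "/data/BL00/encoding"},
    "FU12": {"rest_run1": "/data/FU12/rest_run1", "encoding": "/data/FU12/encoding"},
}

# Outer keys are sessions, inner keys are tasks, values are folder paths
combos = []
for session, tasks in toy_paths.items():
    for task, path in tasks.items():
        combos.append((session, task, path))
        print(f"{session} / {task}: {path}")
```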

````{tip}
The paths you get on your own computer might look a lot weirder than the ones you see above, depending on where `sihnpy` is installed in your Python installation. However, they should be easily accessible on your own computer. You can test it out by doing the following in your terminal:

```bash
# Replace the path below by the path sihnpy is telling you the data is in
$ ls ~/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/BL00/rest_run1
```

Consequently, since they are accessible, it also means you can take a look at the individual matrices should you wish to do so. I do love to look at a good connectivity matrix.
````

## 2. Importing the data for fingerprinting

```{warning}
If you skipped section 1 (or if you did the tutorial and are now using your own data), you need to make sure of a couple of very important things before you start:

1. `sihnpy` will accept the file with participant IDs if it is a `.csv`, `.tsv` or `.txt` file (with or without the `.txt` extension). However, it expects the file to be formatted in columns and to have a column header (which can be named whatever you want). Otherwise, it might drop the first participant in the list.
2. The participant ID column **MUST** be the first column. Otherwise, whatever comes first will be used as the IDs.
3. Each participant ID in the file **MUST** appear in full in the name of the file corresponding to that participant. For instance, if the list of participants includes `sub-666`, then `sub-666` must also be in the file name (e.g., `sub-666_whatever_modality`).
4. There should be no duplicates within each modality. For instance, there should not be two files for participant `sub-666` in the folder for the first modality. Otherwise, `sihnpy` will throw an error.
```
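Before pointing `sihnpy` at your own data, it can help to check points 3 and 4 of the warning yourself. The helper below is a minimal sketch and is not part of `sihnpy`: `check_ids_against_files` is a hypothetical name and the file names are invented for illustration.

```python
def check_ids_against_files(ids, file_names):
    """Return IDs with no matching file and IDs matching more than one file."""
    missing, duplicated = [], []
    for sub_id in ids:
        # A file "belongs" to a participant if the full ID appears in its name
        matches = [f for f in file_names if sub_id in f]
        if len(matches) == 0:
            missing.append(sub_id)
        elif len(matches) > 1:
            duplicated.append(sub_id)
    return missing, duplicated


ids = ["sub-666", "sub-777", "sub-888"]
files = ["sub-666_rest.tsv", "sub-666_rest_copy.tsv", "sub-777_rest.tsv"]
missing, duplicated = check_ids_against_files(ids, files)
print("No file found for:", missing)
print("More than one file for:", duplicated)
```

With real data, you would build `files` from the folder content (e.g., with `os.listdir`). Note that this sketch uses plain substring matching, so an ID that is contained in another one (say `sub-66` and `sub-666`) would match both files.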

The first step in running the fingerprinting module is to import the package we need. 

In [5]:
from sihnpy import fingerprinting #That's the only module we need here

Next, we need to import the participant IDs. At this stage, we just need the path to the file containing the IDs.

In [6]:
list_of_ids = fingerprinting.import_fingerprint_ids(path_participant_list) 
print(list_of_ids)

['sub-1000173', 'sub-1002928', 'sub-1004359', 'sub-1016072', 'sub-1031654', 'sub-1072774', 'sub-1076159', 'sub-1121981', 'sub-1154932', 'sub-1176949', 'sub-1177880', 'sub-1263509', 'sub-1283278', 'sub-1284264', 'sub-1322140']


What happened here is that:
1. `sihnpy` imported the `.tsv` containing the IDs of the participants (using `pandas`)
2. It then extracted the first column (`participant_id`)
3. And converted the values into a simple list

We see that we have our 15 participants, as expected.
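The three steps above can be reproduced on a toy file. `sihnpy` relies on `pandas` for this; the sketch below uses the `csv` module from the standard library instead so that it runs without extra packages. The file content and IDs are invented.

```python
import csv
import io

# Toy tab-separated file: a header row, then one participant per row
toy_tsv = "participant_id\tsex\nsub-001\tFemale\nsub-002\tMale\nsub-003\tFemale\n"

reader = csv.reader(io.StringIO(toy_tsv), delimiter="\t")
header = next(reader)                        # first row is treated as the column header
toy_ids = [row[0] for row in reader if row]  # keep only the first column

print(header[0])
print(toy_ids)
```

This also illustrates why a file without a header is a problem: the first row is consumed as the header, so the first participant would be lost.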

## 3. Create a "fingerprinting object"

For the next step, `sihnpy` needs the list of IDs we just created and the paths to the functional connectivity matrices. To facilitate some internal processing (or well... because at the time I thought it was the best way to code this), `sihnpy` creates a Python class (i.e., it uses object-oriented programming). You don't really need to understand object-oriented programming here; this is just to give you context.

The code here is quite straightforward:

In [7]:
fp_mats = fingerprinting.FingerprintMats(list_of_ids, path_data_fp['BL00']['rest_run1'], path_data_fp['FU12']['rest_run1'])

What we get is an object called `fp_mats` (the name doesn't matter much, you can call it what you want as long as you keep using the same name going forward). The object contains the original variables we gave it to start:

In [8]:
print(fp_mats.id_ls)
print(fp_mats.path_m1)
print(fp_mats.path_m2)

['sub-1000173', 'sub-1002928', 'sub-1004359', 'sub-1016072', 'sub-1031654', 'sub-1072774', 'sub-1076159', 'sub-1121981', 'sub-1154932', 'sub-1176949', 'sub-1177880', 'sub-1263509', 'sub-1283278', 'sub-1284264', 'sub-1322140']
/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/BL00/rest_run1
/Users/fredericst-onge/Desktop/sihnpy/src/sihnpy/data/pad_conp_minimal/FU12/rest_run1


As you can see, if we look at variables in the object, we find the variables we fed to the function. We're ready to move to the next step.
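If classes are new to you, here is a toy illustration of what "an object that holds its inputs" means. `ToyFingerprint` is a made-up class for illustration only; just the attribute names (`id_ls`, `path_m1`, `path_m2`) mirror what is printed above, and the real `FingerprintMats` class does a lot more.

```python
class ToyFingerprint:
    """Toy stand-in: stores its three inputs as attributes, nothing else."""

    def __init__(self, id_ls, path_m1, path_m2):
        self.id_ls = id_ls      # list of participant IDs
        self.path_m1 = path_m1  # folder with the first set of matrices
        self.path_m2 = path_m2  # folder with the second set of matrices


toy = ToyFingerprint(["sub-001", "sub-002"], "/data/session1", "/data/session2")
print(toy.id_ls)
print(toy.path_m1)
print(toy.path_m2)
```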

```{hint}
As you probably noticed here, I used the dictionary from the Prevent-AD data that we saw in the {ref}`first section of this tutorial <1.fingerprinting/fingerprinting_module:1. Preparing the data>`. Each entry of that dictionary is simply a string containing a path.

When running the function on your own data, simply replace `path_data_fp['BL00']['rest_run1']` and `path_data_fp['FU12']['rest_run1']` by the paths to the functional connectivity matrices on your computer.
```

## 4. File and subject selection

We are all set up to start actually getting the functional connectivity matrices from the files. In this step, we are going to import the names of all the matrix files we have and match them to our list of participant IDs, so that when we run the **fingerprinting** we can import the individual matrices to be correlated.

So first, import the names of the matrices. This function does not require any argument.

In [9]:
files_m1, files_m2 = fp_mats.fetch_matrix_file_names()
print(files_m1) #Print the file names of the first modality
print(files_m2) #Print the file names of the second modality

['sub-1031654_ses-BL00_task-rest_run-1.tsv', 'sub-1322140_ses-BL00_task-rest_run-1.tsv', 'sub-1284264_ses-BL00_task-rest_run-1.tsv', 'sub-1283278_ses-BL00_task-rest_run-1.tsv', 'sub-1002928_ses-BL00_task-rest_run-1.tsv', 'sub-1154932_ses-BL00_task-rest_run-1.tsv', 'sub-1176949_ses-BL00_task-rest_run-1.tsv', 'sub-1263509_ses-BL00_task-rest_run-1.tsv', 'sub-1072774_ses-BL00_task-rest_run-1.tsv', 'sub-1004359_ses-BL00_task-rest_run-1.tsv', 'sub-1016072_ses-BL00_task-rest_run-1.tsv', 'sub-1177880_ses-BL00_task-rest_run-1.tsv', 'sub-1076159_ses-BL00_task-rest_run-1.tsv', 'sub-1000173_ses-BL00_task-rest_run-1.tsv', 'sub-1121981_ses-BL00_task-rest_run-1.tsv']
['sub-1176949_ses-FU12_task-rest_run-1.tsv', 'sub-1154932_ses-FU12_task-rest_run-1.tsv', 'sub-1284264_ses-FU12_task-rest_run-1.tsv', 'sub-1322140_ses-FU12_task-rest_run-1.tsv', 'sub-1000173_ses-FU12_task-rest_run-1.tsv', 'sub-1076159_ses-FU12_task-rest_run-1.tsv', 'sub-1177880_ses-FU12_task-rest_run-1.tsv', 'sub-1016072_ses-FU12_task-res

Ok great, we have the names of the files.

```{hint}
Do you notice something is different between the two lists? I'll get back to that in a second.
```

Once we have these lists, we intersect them with our subject list. This will confirm how many participants we will keep in the end (i.e., how many people have both modalities). This function requires the two variables `files_m1` and `files_m2` that we created earlier (or whatever you named them).

In [10]:
sub_final, final_m1, final_m2 = fp_mats.subject_selection(files_m1=files_m1, files_m2=files_m2, verbose=True)

We have 15 subjects in the list.
We have in total 15 participants in modality 1 & 10 participants in modality 2.
A total of 10 have both modalities. Only these are used.
['sub-1000173', 'sub-1004359', 'sub-1016072', 'sub-1072774', 'sub-1076159', 'sub-1154932', 'sub-1176949', 'sub-1177880', 'sub-1284264', 'sub-1322140']


Now do you see the difference between the first and second list of files? There are participants missing! This can happen for a bunch of reasons. In the case of Prevent-AD, it can be because participants simply didn't come back for a follow-up, or because they had not yet had their 12-month follow-up at the time of the data freeze.

Thankfully, `sihnpy` is ready. Under the hood, it computes the **intersection** between the list of IDs we give it, the files of the first modality and the files of the second modality, and keeps only the participants who appear in both modalities.
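To make the idea of an intersection concrete, here is a minimal sketch with Python sets. It is not `sihnpy`'s actual implementation: the helper `ids_with_file`, the IDs and the file names are all invented for illustration.

```python
ids = ["sub-001", "sub-002", "sub-003", "sub-004"]
files_first = ["sub-001_ses-BL00.tsv", "sub-002_ses-BL00.tsv", "sub-003_ses-BL00.tsv"]
files_second = ["sub-003_ses-FU12.tsv", "sub-001_ses-FU12.tsv"]


def ids_with_file(ids, file_names):
    """Set of IDs that appear in at least one file name."""
    return {sub_id for sub_id in ids if any(sub_id in f for f in file_names)}


in_first = ids_with_file(ids, files_first)
in_second = ids_with_file(ids, files_second)

# Keep only participants present in both sets, sorted for a stable order
kept = sorted(in_first & in_second)
print(f"{len(in_first)} in modality 1, {len(in_second)} in modality 2, {len(kept)} in both")
print(kept)
```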

Note that if you don't want `sihnpy` to tell you all that information, you can turn it off by setting `verbose` to `False`.

```{important}
Also note that {ref}`if you did not heed the warnings I made earlier when we prepared the data <1.fingerprinting/fingerprinting_module:2. Importing the data for fingerprinting>`, it is possible that the number of participants `sihnpy` gives you will be incorrect, that `sihnpy` will not find any files or that `sihnpy` will throw an error.
```

## 5. Fingerprinting 

We are finally there! The best part of the fingerprinting module is... well... the fingerprinting function. The function is quite straightforward to run, but there is one important detail to decide first: which nodes you want to use for the fingerprinting.

As I mention in {ref}`the fingerprinting introduction <1.fingerprinting/fp_intro:Use cases and limitations>`, an advantage of this script is that you can directly specify **which part of the functional connectivity matrices** you want to use. It's a bit unpolished, but the general idea is that `sihnpy` accepts a list of integer values, where `0` is the first node and `n-1`, where `n` is the total number of nodes, is the last node (remember here that Python is 0-indexed, which is why the first node is 0).

In the Prevent-AD data, the connectivity matrices are based on the Schaefer atlas (400 nodes). More info on {ref}`how the data was preprocessed here. <0.pad_data/datasets_usage:Additional information on brain imaging preprocessing>` For a first pass, let's simply use all the nodes to compute the fingerprinting. To generate the list, we simply need to write `list(range(0,400))`, which gives a list going from 0 to 399 (the end value, 400, is excluded):

`[0, 1, 2, 3... 397, 398, 399]`

Let's run the code!

In [11]:
similarity_matrix = fp_mats.fingerprint_mats(nodes_index_within=list(range(0,400)), norm=True, corr_type="Pearson", verbose=True)

Working on participant 1: sub-1000173
Working on participant 2: sub-1004359
Working on participant 3: sub-1016072
Working on participant 4: sub-1072774
Working on participant 5: sub-1076159
Working on participant 6: sub-1154932
Working on participant 7: sub-1176949
Working on participant 8: sub-1177880
Working on participant 9: sub-1284264
Working on participant 10: sub-1322140


The output of this function is a similarity matrix, where **each cell represents a correlation** between the functional connectivity patterns of two participants. The **diagonal** of the matrix is **the correlation within each participant** (so the matrices of the same participant over time in this case) and the **off-diagonal** elements are the **correlations between different participants**. See below the matrix we get in our case.

In [12]:
similarity_matrix

array([[0.25927082, 0.14739691, 0.13083276, 0.13045085, 0.13452714,
        0.15191495, 0.15016591, 0.13866468, 0.13834814, 0.13457094],
       [0.14739691, 0.20955892, 0.12416142, 0.12060363, 0.1355009 ,
        0.13762913, 0.14931803, 0.13266136, 0.13901744, 0.13040703],
       [0.13083276, 0.12416142, 0.20563803, 0.13469094, 0.13910196,
        0.15017463, 0.16069675, 0.14068546, 0.136542  , 0.13224686],
       [0.13045085, 0.12060363, 0.13469094, 0.22875025, 0.13224248,
        0.14234101, 0.15750932, 0.13240777, 0.12790694, 0.12975919],
       [0.13452714, 0.1355009 , 0.13910196, 0.13224248, 0.22381991,
        0.14654018, 0.15156584, 0.13428944, 0.13406215, 0.13890551],
       [0.15191495, 0.13762913, 0.15017463, 0.14234101, 0.14654018,
        0.27093165, 0.16421014, 0.13795648, 0.13711563, 0.13150307],
       [0.15016591, 0.14931803, 0.16069675, 0.15750932, 0.15156584,
        0.16421014, 0.28130816, 0.14959845, 0.15347509, 0.15238234],
       [0.13866468, 0.13266136, 0.1406854

Ok, so it's not the most straightforward output to understand. Let me put it differently:

```{image} ../images/similarity_matrix_example.png
:align: center
:scale: 25
```

In the image above, you see the same matrix as above, but this time, with the participant IDs and some color. The **diagonal** (in green) represents the correlations within the same participants while the **off-diagonal** elements (in orange) represent the correlations between participants.

For example, the first green cell at the top left, cell B2, is the correlation between the *functional connectivity matrix at baseline during rest* and the *functional connectivity matrix at the 12-month follow-up during rest* for participant **sub-1000173**. The orange cell C2 is the correlation between the *functional connectivity matrix of participant **sub-1000173** at baseline during rest* and the *functional connectivity matrix of participant **sub-1004359** at the 12-month follow-up during rest*.

One thing you will probably notice is that the matrix is **symmetric**, meaning that the top part is a mirror of the bottom part. For instance, cell B3 is the same as cell C2.

I'll get more into the specifics of how we compute the different metrics I mention in the {ref}`fingerprinting introduction <1.fingerprinting/fp_intro:Definitions>` in the next section. But, in a nutshell, **the hard step is done and you've effectively fingerprinted functional connectivity! Congrats!**
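If you are curious about what "correlating connectivity matrices" looks like in code, here is a toy sketch with `numpy`. It is not `sihnpy`'s implementation: the data are simulated, there is no normalization step, and all variable names are made up. It only illustrates the general idea of vectorizing the edges of each matrix and computing a Pearson correlation for every pair of participants.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_nodes = 4, 10


def random_symmetric(n):
    """Random symmetric matrix standing in for a connectivity matrix."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / 2


# Each toy participant has a stable pattern plus session-specific noise
stable = [random_symmetric(n_nodes) for _ in range(n_subjects)]
session1 = [s + 0.3 * random_symmetric(n_nodes) for s in stable]
session2 = [s + 0.3 * random_symmetric(n_nodes) for s in stable]

# Keep each edge once: upper triangle without the diagonal
upper = np.triu_indices(n_nodes, k=1)

toy_similarity = np.zeros((n_subjects, n_subjects))
for i in range(n_subjects):
    for j in range(n_subjects):
        # Pearson correlation between participant i (session 1) and j (session 2)
        toy_similarity[i, j] = np.corrcoef(session1[i][upper], session2[j][upper])[0, 1]

print(np.round(toy_similarity, 2))
```

Because each simulated participant keeps a stable pattern across sessions, the diagonal of `toy_similarity` should stand out from the rest of the matrix, which is exactly what fingerprinting relies on.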

```{admonition} Intermediate topic: Normalization and correlation options
:class: warning

**Normalization**

You probably noticed above that we have an argument called `norm` in the `fingerprint_mats` function. The idea is to normalize the functional connectivity matrices so that any issues due to the data can be resolved. In the original work by Finn et al.,[^Finn_2015] this is done using the [Fisher transformation](https://en.wikipedia.org/wiki/Fisher_transformation), which was adapted in this script.

To date, however, there is no consensus (to my knowledge) on whether this normalization should be done. In the Prevent-AD data included in `sihnpy`, there were only very marginal differences between using normalization or not. By default, normalization is done. If you find better ways to normalize the data, or if you know whether a consensus exists for this type of normalization, feel free to let me know [by opening an issue!](https://github.com/stong3/sihnpy/issues)

**Correlation**

In most of the literature to date, the correlation between functional connectivity matrices used to get the fingerprint measures is a Pearson correlation (or product-moment correlation). However, others have used different methods such as Spearman correlations.[^Ousdal_2020] Currently, Pearson correlation is the only method implemented in `sihnpy`, but other methods can be implemented quite easily as needed. Request it on GitHub [by opening an issue!](https://github.com/stong3/sihnpy/issues)

```
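For reference, the Fisher transformation itself is a one-liner in `numpy`. The sketch below only shows the transformation on a few example correlation values; how exactly `sihnpy` applies it internally is not shown here.

```python
import numpy as np

r = np.array([-0.9, -0.5, 0.0, 0.5, 0.9])

# Fisher transformation: z = arctanh(r) = 0.5 * ln((1 + r) / (1 - r))
z = np.arctanh(r)

# The inverse (tanh) recovers the original correlations
r_back = np.tanh(z)

for r_i, z_i in zip(r, z):
    print(f"r = {r_i:+.1f}  ->  z = {z_i:+.3f}")
```

The transformation stretches values close to -1 or 1 and leaves values near 0 almost unchanged. Correlations of exactly -1 or 1 map to infinity, which is worth keeping in mind if your matrices contain such values.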

````{admonition} Advanced topic: Selecting specific nodes (within- and between-network)
:class: danger

**Selecting nodes - Within-network**

An interesting point from multiple papers on fingerprinting is that the number of nodes, the functional network used and the location of the nodes all seem to play a role in how well fingerprinting works.[^Finn_2015],[^Amico_2018],[^St_Onge_2023]

As a first pass, I definitely encourage you to take all the nodes available like in the example. It simplifies the analyses a lot and gives you a good idea of whether fingerprinting works in your sample or not. Most of the time, using all nodes available should result in high fingerprinting capacity.[^Finn_2015],[^St_Onge_2023]

In `sihnpy`, you can specify which nodes you want to use with **a list of integers** identifying their positions in the connectivity matrix. When extracting functional connectivity with `nilearn` using the Schaefer atlas, `nilearn` outputs labels for every node of the functional connectivity matrix. For instance, the **visual** network spans nodes 0 to 30 and 200 to 229 inclusively (again, Python is 0-indexed, so everything starts at 0 instead of 1, and the end value given to `range` is not included; a bit confusing, I know). If you wanted to use these nodes only, you could create a variable in advance and pass it to the `fingerprint_mats` method:

```python
>>> visual_net_nodes = list(range(0, 31)) + list(range(200, 230))
>>> similarity_matrix = fp_mats.fingerprint_mats(nodes_index_within=visual_net_nodes, norm=True, corr_type="Pearson", verbose=True)
```

The code above will restrict the matrices to the 61 nodes of the visual network (31 + 30) by selecting specifically the nodes we identify. A fun thing with this is that it is really flexible and should work across parcellations. One thing we did in the paper[^St_Onge_2023] was check whether a random collection of nodes also led to good fingerprinting. You can do something similar with just a few lines:

```python
>>> import numpy as np
>>> np.random.seed(667) #Set a seed for reproducibility of the random array
>>> random_net_array = list(np.sort(np.random.choice(400, size=91, replace=False))) #91 distinct random node indices between 0 and 399
>>> similarity_matrix = fp_mats.fingerprint_mats(nodes_index_within=random_net_array, norm=True, corr_type="Pearson", verbose=True)
```

However, this method does come with the drawback that you need to make sure the nodes in the matrix are indeed organized in the order you are expecting and you need to make sure that the nodes you are selecting are the right ones. The script won't be able to tell you whether you selected the nodes correctly or not.

**Selecting between-network edges**

Most of the literature before our paper[^St_Onge_2023] focused on using within-network edges in the matrices to do fingerprinting. But within-network edges only represent a small portion of edges; between-network edges (i.e., connections between different networks) are more numerous.

In our results, we found that fingerprinting using these edges works even better than within-network edges. And you can test that out for yourself too using the `nodes_between_index` argument in `fingerprint_mats`! But how do we select between-network edges? First, a quick, very simplified illustration of within- and between-network edges.

```{image} ../images/between_network.png
:align: center
:scale: 25
```
In this example, we imagine three brain networks: the visual, the default mode (DMN) and the frontoparietal (FP) networks, each containing a single node (i.e., 1 row per network). This is a very much simplified example, but really just to illustrate the idea. 

Each square in the matrix is representing an edge (i.e., a link between two nodes). When I write about **within-network** edges, I refer to the blue squares. For example, the first blue square at the top left corner represents **the link within the visual network nodes**. However, if I want to talk about **between-network** edges, I mean all the other colored squares that **aren't blue**. For instance, the between-network edges between the visual network and the default-mode and frontoparietal networks are illustrated in orange, in the top row. It represents the **link between the visual and the two other networks**. 

Once you understand that, it becomes easy to adapt it in `sihnpy` for your own needs: `sihnpy` simply needs to know the indices of the **rows** of the matrix we want (`nodes_index_within`) and the indices of the **columns** we need (`nodes_between_index`).

When we give `sihnpy` only the `nodes_index_within` argument, it assumes that we want the same rows and columns (i.e., within-network), and that the sub-matrix we are selecting is symmetric. In `numpy` notation, this is equivalent to `matrix[nodes_index_within][:,nodes_index_within]`. In our toy example, it would grab only the blue squares. However, in a real-life example with connectivity data, it will discard the diagonal and the lower triangle before computing the fingerprinting.

On the other hand, if we give the `nodes_between_index` argument, `sihnpy` will grab the specific sub-matrix we specify. In `numpy` notation, the two arguments I mention roughly translate to `matrix[nodes_index_within][:,nodes_between_index]`. In our toy example, this should only grab the orange squares if we are interested in the visual network.

Overall, this application of the method allows for a lot of flexibility, because you can select virtually any between-network edges you want. For instance, you could specifically request edges between the visual and default-mode network only to do the fingerprinting.
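Here is a small `numpy` sketch of the two selections on a toy 6-node matrix. The matrix, the node indices and the variable names are all invented; the point is only to show what the row/column indexing described above returns.

```python
import numpy as np

# Toy symmetric "connectivity" matrix with 6 nodes
toy = np.arange(36, dtype=float).reshape(6, 6)
toy = (toy + toy.T) / 2

rows = [0, 1, 2]   # e.g., nodes of a first network
cols = [3, 4]      # e.g., nodes of a second network

# Within-network: same indices for rows and columns -> square, symmetric block
within_block = toy[rows][:, rows]
# Each within-network edge appears twice, so keep the upper triangle (no diagonal)
within_edges = within_block[np.triu_indices(len(rows), k=1)]

# Between-network: different indices for rows and columns -> rectangular block
between_block = toy[rows][:, cols]
between_edges = between_block.flatten()

print(within_block.shape, within_edges.size)    # (3, 3) 3
print(between_block.shape, between_edges.size)  # (3, 2) 6
```

With 3 nodes in the first network, there are only 3 unique within-network edges, but 6 edges linking it to the 2 nodes of the second network.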

```{warning}
With great flexibility comes great danger.

`sihnpy` won't be able to tell you whether the nodes you selected are the right ones. So if you forget a node or include a node that shouldn't have been included, `sihnpy` will still proceed ahead and it will affect your results. This is further complicated by the 0-indexed Python notation (i.e., the "5th" column is actually index 4), which means that human error is also likely, particularly for those not used to Python.

When using between-network edges, `sihnpy` also doesn't remove duplicated edges. This is because, technically, there shouldn't be any duplicated edges since we aren't looking at within-network edges. If your indices for rows (`nodes_index_within`) and for columns (`nodes_between_index`) overlap, you will include duplicate edges in your analysis, which will affect your results.
```
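Since `sihnpy` won't check the indices for you, a few sanity checks before running the fingerprinting can save you some headaches. The function below is a hypothetical helper (`check_node_indices` is not part of `sihnpy`) sketching the kind of checks you could run on your own lists of row and column indices.

```python
def check_node_indices(rows, cols, n_nodes):
    """Return a list of human-readable problems found in the index lists."""
    problems = []
    for label, idx in (("rows", rows), ("columns", cols)):
        if any(i < 0 or i >= n_nodes for i in idx):
            problems.append(f"{label}: index outside 0..{n_nodes - 1}")
        if len(set(idx)) != len(idx):
            problems.append(f"{label}: repeated index")
    shared = sorted(set(rows) & set(cols))
    if shared:
        problems.append(f"rows and columns overlap on {shared}")
    return problems


print(check_node_indices([0, 1, 2], [3, 4], n_nodes=6))
print(check_node_indices([0, 1, 2, 2], [2, 6], n_nodes=6))
```

An empty list means none of these problems was found. It does not mean the nodes are the ones you intended, so it is still worth double-checking them against your atlas labels.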

````

## 6. Fingerprinting metrics

Almost all done! The next step is to extract the {ref}`fingerprinting metrics <1.fingerprinting/fp_intro:Definitions>`. Thankfully, `sihnpy` comes with a function that does it for you: no need to do anything! Isn't it great?

The function only takes two arguments, the similarity matrix you just computed and a "name", and it returns a `pandas.DataFrame` with the relevant information computed. The **"name"** argument is simply a string that will be added to the column names and can be anything you like: it's really just for you to keep track of what you fingerprinted.

In [15]:
coef_data = fp_mats.fp_metrics_calc(similar_matrix=similarity_matrix, name='tutorial')
coef_data

| ID | si_tutorial | oi_tutorial | fia_tutorial | di_tutorial |
|---|---|---|---|---|
| sub-1000173 | 0.259271 | 0.139652 | 1.0 | 0.119618 |
| sub-1004359 | 0.209559 | 0.135188 | 1.0 | 0.07437 |
| sub-1016072 | 0.205638 | 0.138793 | 1.0 | 0.066845 |
| sub-1072774 | 0.22875 | 0.134212 | 1.0 | 0.094538 |
| sub-1076159 | 0.22382 | 0.138526 | 1.0 | 0.085294 |
| sub-1154932 | 0.270932 | 0.144376 | 1.0 | 0.126556 |
| sub-1176949 | 0.281308 | 0.154325 | 1.0 | 0.126984 |
| sub-1177880 | 0.239168 | 0.141358 | 1.0 | 0.097809 |
| sub-1284264 | 0.247956 | 0.141618 | 1.0 | 0.106337 |
| sub-1322140 | 0.240154 | 0.138231 | 1.0 | 0.101922 |


In order, the script outputs the **self-identifiability** (`si`), the **others-identifiability** (`oi`), the **fingerprint identification accuracy** (`fia`) and the **differential identifiability** (`di`). 

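`si` and `oi` are correlation coefficients, so they can in principle range from -1 to 1, although in the table above they are all positive. `di` is the difference between the two (`si` minus `oi`), so larger values mean a participant resembles themselves more than they resemble others. `fia` will be either 0 or 1, where 1 represents an accurate identification of the participant. `oi` is the average of all between-individual correlations for a specific individual (i.e., on average, how similar a participant is to the rest of the cohort).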
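To see how these metrics relate to the similarity matrix, here is a toy computation with `numpy`. It is a sketch and not the code behind `fp_metrics_calc`. Taking `si` from the diagonal, `oi` as the row average of the off-diagonal cells and `di` as `si` minus `oi` is consistent with the table above (e.g., 0.259271 - 0.139652 ≈ 0.1196). The rule used for `fia` (the diagonal must be the largest value of its row) is my assumption for illustration.

```python
import numpy as np

# Toy similarity matrix: rows/columns are 3 participants
sim = np.array([
    [0.60, 0.20, 0.10],
    [0.30, 0.50, 0.10],
    [0.10, 0.50, 0.40],
])
n = sim.shape[0]

si = np.diag(sim)                             # self-identifiability
off_diag = ~np.eye(n, dtype=bool)             # mask of between-participant cells
oi = (sim * off_diag).sum(axis=1) / (n - 1)   # others-identifiability (row mean)
di = si - oi                                  # differential identifiability
# Assumption: a participant is identified when the diagonal is the row maximum
fia = (sim.argmax(axis=1) == np.arange(n)).astype(int)

print("si :", si)
print("oi :", np.round(oi, 3))
print("di :", np.round(di, 3))
print("fia:", fia)
```

In this toy matrix, the third participant is more similar to the second participant (0.50) than to themselves (0.40), so their `fia` is 0 while the other two get 1.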

```{tip}
For the "name" variable, you can use your own naming scheme. What I found most effective during the writing of the package for the paper was to use the following (here only used as an illustration if it can help you out):

`mod1_mod2_edges_parcellation_network_correlationtype`

* `mod1`: name of the first modality/session used for fingerprinting
* `mod2`: name of the second modality/session used for fingerprinting
* `edges`: `within` or `between` network edges
* `parcellation`: name of the atlas used for fingerprinting (e.g., Schaefer)
* `network`: name of the network used (e.g., default-mode)
* `correlationtype`: whether I used partial correlation (pcor) or Pearson correlation (corr) to generate the functional connectivity matrices

```
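If you run many combinations, building the name programmatically avoids typos. The small helper below is hypothetical (`build_run_name` is not part of `sihnpy`) and simply joins the six parts of the scheme above with underscores; the example values are invented.

```python
def build_run_name(mod1, mod2, edges, parcellation, network, correlationtype):
    """Join the six parts of the naming scheme with underscores."""
    if edges not in ("within", "between"):
        raise ValueError("edges should be 'within' or 'between'")
    return "_".join([mod1, mod2, edges, parcellation, network, correlationtype])


run_name = build_run_name("rest1BL", "rest1FU12", "within", "schaefer", "visual", "corr")
print(run_name)  # rest1BL_rest1FU12_within_schaefer_visual_corr
```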

## 7. Exporting the results

Ok, so now we have all the results we need, but they still only live within Python: we need to export them so we can use them for our awesome paper. Again, `sihnpy` makes this simple with a dedicated function. It requires you to provide 1) the full path where the output folder will be created, 2) the variable with the computed coefficients, 3) the variable with the similarity matrix and 4) the "name" you want the folders and files to bear.

```python
fp_mats.fp_mat_export("~/Desktop/test_output_fp", coef_data=coef_data, similar_matrix=similarity_matrix, name='tutorial', out_full=True, dir_struct=True)
```

Doing this, you will now have a directory within `test_output_fp` called `tutorial` (i.e., the name you are giving) looking like so:

```
tutorial
|___fp_metrics_tutorial.csv
|___similarity_matrices
    |___similarity_matrix_tutorial.csv
|___subject_list
    |___subject_list_tutorial.csv
```

The `fp_metrics_tutorial.csv` file is the `.csv` document containing the fingerprint metrics. In the subfolder `similarity_matrices` you will see the similarity matrix (i.e., the correlations between each pair of participants) and the subfolder `subject_list` contains the list of participants included in the fingerprinting run.

Most of the time, you will likely only need the `.csv` file with the computed metrics. However, if you want to check the correlation between specific participants (e.g., if you have twins or siblings), you can use the similarity matrix to zoom in on specific pairs. To facilitate manipulations on the similarity matrix, you will notice that it bears no subject IDs. This is where the `subject_list` file comes into play: it holds the subject IDs, with no column header, in the same order as the similarity matrix.
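As an illustration of how the two files fit together, the sketch below writes two tiny stand-in files to a temporary folder and then re-attaches the IDs to the matrix. The file layout is an assumption on my part (comma-separated values, no header row in either file); check your own exported files before reusing this.

```python
import csv
import os
import tempfile

# Write two toy files that mimic the exported layout (assumed: no header rows)
tmp_dir = tempfile.mkdtemp()
sim_path = os.path.join(tmp_dir, "similarity_matrix_toy.csv")
ids_path = os.path.join(tmp_dir, "subject_list_toy.csv")
with open(sim_path, "w", newline="") as f:
    csv.writer(f).writerows([[0.6, 0.2], [0.3, 0.5]])
with open(ids_path, "w", newline="") as f:
    csv.writer(f).writerows([["sub-001"], ["sub-002"]])

# Read them back
with open(sim_path, newline="") as f:
    matrix = [[float(v) for v in row] for row in csv.reader(f)]
with open(ids_path, newline="") as f:
    subjects = [row[0] for row in csv.reader(f) if row]

# Label every cell with its (row ID, column ID) pair
labelled = {
    (row_id, col_id): matrix[i][j]
    for i, row_id in enumerate(subjects)
    for j, col_id in enumerate(subjects)
}
print(labelled[("sub-001", "sub-002")])  # 0.2
```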

```{tip}
By default, `sihnpy` will create sub-folders for the similarity matrices and subject_list. This is somewhat an artefact from how the code was built before, but also I do like the organization in separate folders. That said, `sihnpy` is also flexible. You can set the argument `dir_struct` to `False` to output the `similarity_matrix` and the `subject_list` in the same directory. 

If you decide you don't want the `similarity_matrix` or the `subject_list`, you can ask `sihnpy` to not output it by setting `out_full` to `False`.
```

## Conclusion

You made it through the whole fingerprinting tutorial! (Or well, you skipped ahead to here.) I hope I was able to make the steps clear for you and that you enjoyed following along. If things weren't clear in the documentation, please [submit an issue on GitHub](https://github.com/stong3/sihnpy/issues).

Don't forget to {ref}`cite the package and the paper <index:Authors>` describing this method if you end up using it in one of your papers!

If you want to learn more, I also discuss a more advanced topic below.

## tl;dr

Got bored during the tutorial? You already finished the tutorial and just want a quick reminder of the main functions you need? Or you just want to bash ahead with the code without reading? I got you. Here's a condensed form of the code:

```python
from sihnpy import fingerprinting #Module with the fingerprinting functions
from sihnpy.datasets import pad_fp_input #Import data for fingerprinting

#Preparation
id_list, path_participant_list, path_data_fp = pad_fp_input() #Basic info, path to the basic info and path to the matrices. Start here if you want to use the Prevent-AD data
list_of_ids = fingerprinting.import_fingerprint_ids(path_participant_list) #Import the list of IDs in `sihnpy`. If you have your own data, you start here and replace `path_participant_list` by the path to your file with the IDs as the first column

#Fingerprinting initialization
fp_mats = fingerprinting.FingerprintMats(list_of_ids, path_data_fp['BL00']['rest_run1'], path_data_fp['FU12']['rest_run1']) #Replace second and third argument with your own paths if using your own data
files_m1, files_m2 = fp_mats.fetch_matrix_file_names() #Get name of files for both modalities
sub_final, final_m1, final_m2 = fp_mats.subject_selection(files_m1=files_m1, files_m2=files_m2, verbose=True) #Extract participant IDs and figure out who has data in both modalities

#Running the fingerprinting and computing the metrics
similarity_matrix = fp_mats.fingerprint_mats(nodes_index_within=list(range(0,400)), norm=True, corr_type="Pearson", verbose=True) #Takes a long time if running a lot of participants
coef_data = fp_mats.fp_metrics_calc(similar_matrix=similarity_matrix, name='tutorial') #Replace the name by whatever you prefer

#Export the data
fp_mats.fp_mat_export("~/Desktop/test_output_fp", coef_data=coef_data, similar_matrix=similarity_matrix, name='tutorial', out_full=True, dir_struct=True) #Replace the path by whatever path you prefer on your local computer.
```

## Advanced topic: Command line-based script for fingerprinting and high performance computing

The script presented above is great when you have a few networks or modalities to run. But what if you have a lot more? How do you organize it? In the coming weeks, I will prepare a tutorial on how to write a command-line script to run the fingerprinting for many different modalities and options.

## References

List of relevant references for this script. {ref}`More info and key papers on fingerprinting are available in the intro. <1.fingerprinting/fp_intro:References>`

[^Finn_2015]: Finn et al. (2015). Nat Neuro. [10.1038/nn.4135](https://doi.org/10.1038/nn.4135)
[^Ousdal_2020]: Ousdal et al. (2020). Hum Brain Mapp. [10.1002/hbm.24833](https://doi.org/10.1002/hbm.24833)
[^Amico_2018]: Amico et al. (2018). Sci Reports. [10.1038/s41598-018-25089-1](https://doi.org/10.1038/s41598-018-25089-1)
[^St_Onge_2023]: St-Onge et al. (2023). In revision.