# 0. Create project structure and import utility functions
After doing this, put the required files in the "Input_files" folder. With approximate names, these are:
- Slide-tags spatial data   :   "df_whitelist_{Sample/Project/Run}.csv"
- 10x barcode pairs         :   "3M-february-2018.txt"                      or equivalent
- Known cell barcodes       :   "barcodes_{Sample}.csv"                     Single-cell derived barcodes
- Slide-tags results        :   "{Sample}_spatial.csv"                      Slide-tags results, used to check the reconstruction results but not strictly required until then
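
Before running the later steps, it can help to confirm that the expected inputs are in place. The sketch below is not part of the pipeline: `find_missing_inputs` and the glob patterns are hypothetical and derived only from the approximate file names listed above, so adjust them to the actual file names.

```python
from pathlib import Path

# Glob patterns assumed from the approximate names above; adjust as needed.
EXPECTED_INPUTS = {
    "Slide-tags spatial data": "df_whitelist_*.csv",
    "10x barcode pairs": "3M-february-2018.txt",
    "Known cell barcodes": "barcodes_*.csv",
    "Slide-tags results": "*_spatial.csv",
}


def find_missing_inputs(folder="Input_files", expected=None):
    """Return the descriptions of expected inputs with no matching file in `folder`."""
    expected = EXPECTED_INPUTS if expected is None else expected
    folder = Path(folder)
    if not folder.is_dir():
        # Nothing can match if the folder itself is absent.
        return list(expected)
    return [
        description
        for description, pattern in expected.items()
        if not any(folder.glob(pattern))
    ]
```

Calling `find_missing_inputs()` after `create_structure()` returns an empty list once every expected file has been added.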

The config can also be re-loaded at any step if it is changed along the way.

For the tonsil analyses, leave the config as is. To perform the unipartite mouse embryonic analysis, switch the config to "config_standard_processes_mouse_embryo_uni.py". For anything else, the config has to be changed manually or a new config created.

In [None]:
from Utils import *
create_structure()
# default options: config_standard_processes.py or config_standard_processes_mouse_embryo_uni.py
config = ConfigLoader('config_standard_processes.py') 


# 1. Process raw edge file

The edge file (df_whitelist_{Project}.txt) generated by Russell et al.'s pipeline is slightly modified for easier downstream handling.

- The first cell loads packages and creates the project structure if not already present. After running it, the required files should be added to the "Input_files" folder.

- The second step is mandatory and entails switching the cell barcodes with their counterparts in the 10x-provided barcode file "3M-february-2018.txt".

- The third step is optional and entails filtering the edge list to keep only cells with a known position, as determined by Russell et al. This can also be used to remove any other barcodes you do not want to analyze. Both of these files are downloaded from the NCBI single-cell database for each specific experiment.
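
The second and third steps can be sketched as follows. This is an illustration only, not the code behind `perform_preprocessing`: the function names, the (cell, bead, UMI count) edge layout, the direction of the barcode translation, and the choice to drop cells without a counterpart are all assumptions.

```python
def load_barcode_pairs(lines):
    """Parse whitespace-separated barcode pairs, one pair per line (assumed layout)."""
    pairs = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs[fields[0]] = fields[1]
    return pairs


def swap_and_filter(edges, barcode_pairs, known_cells=None):
    """Translate cell barcodes and optionally keep only known cells.

    edges: iterable of (cell_barcode, bead_barcode, n_umi) tuples (assumed layout).
    """
    kept = []
    for cell, bead, n_umi in edges:
        translated = barcode_pairs.get(cell)
        if translated is None:
            continue  # assumption: cells without a counterpart are dropped
        if known_cells is not None and translated not in known_cells:
            continue  # optional step: keep only cells with a known position
        kept.append((translated, bead, n_umi))
    return kept
```

Passing `known_cells=None` skips the optional filtering and only performs the mandatory barcode switch.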

In [None]:
from initial_processing_functions import perform_preprocessing
perform_preprocessing(config)

# 2. Filter edgelist
The main purpose is to remove beads with undesirable properties.
By default, the options are those of the tonsil analysis in the manuscript:
- Lower limit of 2 UMIs per bead
- Upper limit of 1500 UMIs per bead
- Removal of any bead with an "N" in the bead sequence 

There are also other options for filtering that are not active by default:
- Filtering beads based on degree
- Imposing a lower limit on the number of UMIs per edge
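
The default bead filters can be sketched as below. This is a minimal illustration, not the implementation of `perform_filtering`: the function name and edge layout are hypothetical, "UMIs per bead" is assumed to mean the sum of UMI counts over all edges of a bead, and both limits are assumed to be inclusive.

```python
from collections import defaultdict


def filter_beads(edges, min_umis=2, max_umis=1500, drop_n=True):
    """Keep edges whose bead passes the UMI limits and the "N" check.

    edges: iterable of (cell_barcode, bead_sequence, n_umi) tuples (assumed layout).
    """
    edges = list(edges)  # the edges are traversed twice

    # Total UMIs per bead, summed over all of its edges (assumed definition).
    umis_per_bead = defaultdict(int)
    for _cell, bead, n_umi in edges:
        umis_per_bead[bead] += n_umi

    def keep(bead):
        if drop_n and "N" in bead:
            return False
        return min_umis <= umis_per_bead[bead] <= max_umis

    return [edge for edge in edges if keep(edge[1])]
```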

In [None]:
from filtering_functions import perform_filtering

perform_filtering(config)

# 3. Generate subgraphs
A subgraph is a connected component of a network that is not fully connected, and it is how all networks are referred to in the pipeline. This function lets us impose multiple lower-bound filters and analyze whether and how subgraphs form at each filtering threshold.
By default, the options are those of the tonsil analysis in the manuscript:
- The bead-cell bipartite network is used
- Only subgraphs with 50 or more nodes (cells and/or beads) are kept
- No other filtering is applied
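
The idea of a subgraph with a lower size limit can be sketched as a connected-component search on the bipartite edge list. The function name and edge layout are hypothetical, and the pipeline may well use a graph library instead of the plain breadth-first search shown here.

```python
from collections import defaultdict, deque


def find_subgraphs(edges, min_nodes=50):
    """Return connected components (sets of nodes) with at least `min_nodes` nodes.

    edges: iterable of (cell, bead, ...) tuples; extra fields are ignored.
    Nodes are tagged so that a cell and a bead with the same label stay distinct.
    """
    adjacency = defaultdict(set)
    for cell, bead, *_ in edges:
        u, v = ("cell", cell), ("bead", bead)
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_nodes:
            components.append(component)

    # Largest subgraph first.
    return sorted(components, key=len, reverse=True)
```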

Other subgraph generation options include:
- Switching network type - bipartite bead-cell (default) AND/OR unipartite cell-cell
- Changing subgraph lower size limit (50 default)
- Unipartite filtering type - Unipartite cell-cell networks can be filtered based on either beads-per-edge or UMIs-per-edge
- Changing filtering thresholds - Multiple lower thresholds can be specified; all of them are applied and each generates its own subgraphs (default: None)
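
As a rough illustration of the unipartite option with beads-per-edge filtering, the bipartite edge list can be projected onto cells, weighting each cell-cell edge by the number of beads the two cells share, and then filtered at several lower thresholds. The function names are hypothetical and the weighting is an assumption about what "beads-per-edge" means; the UMIs-per-edge variant is not shown.

```python
from collections import defaultdict
from itertools import combinations


def project_to_cells(edges):
    """Return {(cell_a, cell_b): number of shared beads} for a bipartite edge list."""
    cells_per_bead = defaultdict(set)
    for cell, bead, *_ in edges:
        cells_per_bead[bead].add(cell)

    shared_beads = defaultdict(int)
    for cells in cells_per_bead.values():
        # Every pair of cells attached to the same bead shares that bead.
        for a, b in combinations(sorted(cells), 2):
            shared_beads[(a, b)] += 1
    return dict(shared_beads)


def apply_thresholds(weighted_edges, thresholds):
    """Return one filtered edge dictionary per lower threshold."""
    return {
        t: {pair: w for pair, w in weighted_edges.items() if w >= t}
        for t in thresholds
    }
```

Each filtered edge dictionary can then be split into subgraphs separately, which is how a single run can yield one set of subgraphs per threshold.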


In [None]:
from subgraph_processing_functions import perform_subgraph_generation_by_filtering
perform_subgraph_generation_by_filtering(config)

# 4. Reconstruct
After generating the desired subgraphs, they can be reconstructed. There are several options:
- Number of reconstructions - performing multiple reconstructions, e.g. to assess variance in quality metrics (default: 1)

There are multiple tiered options for choosing exactly which subgraphs to reconstruct:
1. Choosing which dimension to reconstruct into - the reconstruction algorithm can produce final coordinates of any dimension, although only 2 and 3 dimensions are supported in this pipeline (default: 2D)
2. Choosing to reconstruct all or not - if True, all subgraphs from all thresholds are reconstructed (default: False)
3. Choosing the network type to reconstruct - bipartite (bead-cell) or unipartite (cell-cell). If unipartite, also the filtering type (beads, umi, or both), while bipartite only has the "umi" filtering type (default: bipartite, umi)
4. Choosing to reconstruct all filtering thresholds or a selection of specific ones (default: all)
5. Choosing which specific subgraphs to reconstruct - options include all subgraphs, the biggest subgraph, or a specific one chosen by its number (default: all)

One additional option is whether to delete the files in the STRND structure after the reconstruction completes. This is True by default; since all files are copied to other locations, it is recommended to keep it True.

In summary, the default is that all subgraphs (of which there should generally be only one) of the bipartite bead-cell network are reconstructed in 2D once, without additional filtering. If the defaults were also used for the prior steps, the reconstruction should take around 10 minutes to complete, depending on the computing hardware.

In [None]:
from reconstruction_functions import interpret_config_and_reconstruct
interpret_config_and_reconstruct(config)

# 5. Reconstruction analysis

Assessing the results of the reconstruction is the final step of a first-pass reconstruction, and of this notebook.
Note that it uses a separate config file. Whereas the config for the previous steps was used to choose processing steps, the analysis config is used to identify the subgraphs of interest. Similarly to the preprocessing, two configs are available by default: tonsil bipartite and mouse embryonic unipartite.

This function does two main things:
1. Produces three files required in further analysis: a detailed edgelist, a summary of the reconstructed positions for all reconstructions, and a summary of the per-node quality metrics
2. Produces a plot showing the reconstruction side by side with the available reference positions, including the quality metrics CPD and KNN
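
The two quality metrics can be illustrated with a small pure-Python sketch. The definitions used here are assumptions and may differ from those in `perform_analysis_actions`: KNN is taken as the mean fraction of each node's K nearest neighbours that are shared between the reference and the reconstruction, and CPD as the squared Pearson correlation between the pairwise distances in the two point sets. The function names are hypothetical.

```python
import math


def _nearest(points, i, k):
    """Indices of the k nearest neighbours of point i (excluding i itself)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def knn_score(reference, reconstructed, k=15):
    """Mean fraction of shared k nearest neighbours (assumed KNN definition)."""
    n = len(reference)
    k = min(k, n - 1)  # cannot have more neighbours than other points
    shared = [
        len(_nearest(reference, i, k) & _nearest(reconstructed, i, k)) / k
        for i in range(n)
    ]
    return sum(shared) / n


def cpd_score(reference, reconstructed):
    """Squared Pearson correlation of pairwise distances (assumed CPD definition)."""
    n = len(reference)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ref = [math.dist(reference[i], reference[j]) for i, j in pairs]
    rec = [math.dist(reconstructed[i], reconstructed[j]) for i, j in pairs]
    mean_ref = sum(ref) / len(ref)
    mean_rec = sum(rec) / len(rec)
    cov = sum((a - mean_ref) * (b - mean_rec) for a, b in zip(ref, rec))
    var_ref = sum((a - mean_ref) ** 2 for a in ref)
    var_rec = sum((b - mean_rec) ** 2 for b in rec)
    return cov ** 2 / (var_ref * var_rec)
```

Both scores are unaffected by rotating, translating, or uniformly scaling the reconstruction, which matters because a reconstruction from network topology has no fixed orientation or scale.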

The options for choosing subgraphs to analyse are similar to previous steps, i.e. reconstruction dimension, network type, and various filtering parameters. The default is to find the subgraphs produced by the previous defaults.

In the context of this first-pass reconstruction notebook, notable options are:
- Choosing *K* for the KNN quality metric (Default: 15)
- Various visualization options, including:
  - Plotting one or multiple reconstructions (if present)
  - Showing the reconstructed points or their distortion, and choosing between unmorphed or morphed versions (Default: Base reconstruction)
  - Cell coloring scheme (Default: by cell type, with colors provided by cell_colors.py)
  - Choosing colormap (Default: magma_r)
  - Image output format (Default: PNG)

There are also other options, but they are more relevant after further analysis performed in other notebooks.
After performing this step, there are several options on how to proceed:
1. Perform more in-depth analysis on the reconstruction with many options in the "additional_subgraph_analysis" notebook
2. Perform an iterative reconstruction in the "subgraph_modification" notebook
3. Perform the biological analysis using the R-based "slidetags-network" R project

In [None]:
from Utils import *
from subgraph_analysis_functions import perform_analysis_actions
# Default configs: config_subgraph_analysis.py or config_subgraph_analysis_mouse_embryo_uni.py
config_analysis = ConfigLoader('config_subgraph_analysis.py') 

perform_analysis_actions(config_analysis)