In [1]:
import os
os.getcwd()




ToDo: DSP => Digital Spatial Profiler

# DSP Protein nCounter workflow

# Contents
[1. How to use this notebook](#howTo)

[2. Overview of the workflow in this notebook](#overview)




[3. View and clean annotations](#clean)

[4. View and QC data](#qc)

[5. Normalise data](#norm)

[6. Set up comparisons](#compare)

[7. Run DGE](#dge)

[8. Convert EdgeR plots to volcano plots](#convert)


# 1. How to use this notebook <a class="anchor" id="howTo"></a>

<!-- Cell from Paul Watmore -->

This is a [Jupyter notebook](https://jupyter.org/) for DSP data exploration, normalisation and analysis. 

Jupyter notebooks are interactive documents that contain 'live code', which allows the user to complete an analysis by running code 'cells', which can be modified, updated or added to by the user.

Each Jupyter notebook runs on a specific 'kernel', or analysis environment (usually a programming language). This particular notebook is based on R. To see which version of R this notebook is based on, and as an example of running a code cell, click on the cell below and press the 'Run' button (top of the page).

# Overview of the workflow in this notebook <a class="anchor" id="overview"></a>

This notebook links to a number of auxiliary notebooks that walk through a series of steps for data exploration, data cleaning and data analysis.

To help with reproducibility, input and output files have been standardised as much as possible (this is a work in progress). <span style="color:red">The folder structure is shown in section ###, and descriptions of the key files are shown in section ###.</span>

This workflow is designed to be run collaboratively on QUT compute facilities. As such, it uses a number of different compute resources, each with different access restrictions and security protocols. Every effort has been made to ensure that sensitive data is protected; however, this workflow may not be suitable for every project. It is every user's responsibility to ensure that data is stored and secured properly. No encryption is currently implemented in this workflow; it relies on data being used and stored according to QUT policies and procedures.





The high level overview of the workflow is as follows:

1. Decide on a working directory on the HPC to set up the file structure.
   NOTE: HPC storage is preferred to facilitate collaboration and help with data security.

2. View and clean annotations

3. View and QC data

4. Merge or exclude AOIs

5. Normalise data

6. Set up comparisons

7. Run DGE

8. Convert EdgeR plots to volcano plots

### Some basic setup

In [None]:
# input the root directory (shared work folder)
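
A minimal sketch of this setup step, assuming the root is a shared work folder given as a path. The `DSP_ROOT` environment variable and the `resolve_root` helper are hypothetical names used for illustration; they are not part of the existing workflow.

```python
import os
from pathlib import Path


def resolve_root(default="."):
    """Return the shared work folder as an absolute path.

    Reads the hypothetical DSP_ROOT environment variable and falls back
    to `default` (the current working directory) when it is not set.
    """
    root = Path(os.environ.get("DSP_ROOT", default)).expanduser().resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"Root directory does not exist: {root}")
    return root


ROOT_DIR = resolve_root()
print(ROOT_DIR)
```

Keeping the root in one variable means the auxiliary notebooks can build every other path relative to it, rather than hard-coding server locations.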

### File structure
```

## Raw data files that should be kept secure:


 ├─ RDSS
 #   Paths:
 #   smb://qut.edu.au/Documents/Research/Acquisitions/ife/carf_spatial_omics/caf_dsp_20230921
 #   /mnt/hpccs01/work
 #   /external/rdss/acquisitions
 ├─ Research
 │ ├─ Acquisitions
 │ │ ├─ ife
 │ │ │ ├─ carf_spatial_omics
 │ │ │ │ ├─ caf_dsp_20230921
 │ │ │ │ │ ├─ Images
 │ │ │ │ │ │ ├ GRC Series A IP_Syndecan-1.png
 │ │ │ │ │ │ ├ GRC Series A IP_Syndecan-1_clean.png
 │ │ │ │ │ │ ├ GRC Series A IP_Syndecan-1.zip
 │ │ │ │ │ │ ├ GRC Series A IP_Syndecan-1.ome.tiff
 │ │ │ │ │ │ ├ GRC Series B IP_Syndecan-1.png
 │ │ │ │ │ │ ├ GRC Series B IP_Syndecan-1_clean.png
 │ │ │ │ │ │ ├ GRC Series B IP_Syndecan-1.zip
 │ │ │ │ │ │ ├ GRC Series B IP_Syndecan-1.ome.tiff
 │ │ │ │ │ ├─ Worksheets
 │ │ │ │ │ │ ├ Lab worksheet
 │ │ │ │ │ │ ├ rcs file?
 │ │ │ │ │ ├─ Data
 │ │ │ │ │ │ ├ RCC files

## Processed data files that should be kept secure, but can also be edited and shared by researchers:

 ├─ HPCFS
 │ ├─ Project_Folder (multiple researchers with access)
 │ │ ├─ DSP_Data_Analysis
 │ │ │ ├ Initial Dataset.xlsx
 │ │ │ ├ Default_QC.xlsx

 │ │ │ ├ failAOIs.csv         # Check location exportPath
 │ │ │ ├ FailProbes.csv       # Check location exportPath

 │ │ │ ├─ Normalisation
 │ │ │ │ ├ QC_#Researcher#_#Project#_RUV.csv
 │ │ │ │ ├ 
 │ │ │ │ ├ **RUV output (NS Norm)**
 │ │ │ │ │ ├ 
 │ │ │ │ │ ├ 

 │ │ │ ├─ EdgeR_Norm25


## Files on Github
## Should not contain any hard links to QUT servers

 ├─ Git



## Config file
## To be sent separately to researchers and contain the following details:
# - Usernames with read access to RDSS Acquisitions folder
# - Folder name(s) on RDSS to read raw data from
# - Folder name on HPCFS to write data to while processing data


```



### File structure
```
- Root
 ├─ DSP_Protein_Data
 │ ├ 
 │ ├─ Initial Dataset.xlsx
 │ ├─ Default_QC.xlsx
 │ ├─ 
 │ ├─ Lab_Worksheet_P100###Plate#1.txt
 │ ├─ Lab_Worksheet_P100###Plate#2.txt
 │ ├─ 
 │ ├─ AOI_Well_Mappings_Plate1.csv
 │ ├─ AOI_Well_Mappings_Plate2.csv
 │
 ├─ DSP QC
 │ ├─ AOI_Well_Mappings_Plate2.csv |- files
 │ ├─ AOI_Well_Mappings_Plate2.csv
 │
 ├─ DSP EDA
 │ ├─ AOI_Well_Mappings_Plate2.csv |- files
 │
 │
 ├─ Data_Normalisation
 │ ├─ RUVIII_***_NSNorm.R
 │ ├─ ERCC_***_RUV_Expressed.csv
 │ ├─ SampleInfo***.csv
 │ ├─
 │ ├─ RUVIII_NSNorm_Grouped_Expressed
 │ │ ├─ NanoString_mRNA_norm...
 │ │ ├─ NanoStringNorm_28_none_mean_housekeeping.geo.mean.csv
 │
 │
 │
 │ │ EdgeR_Grouped_28
 │ │ ├─
 │ │ ├─
 │ │ ├─
 │ │ ├─
 │
 │
 ├─ EdgeR
 │ ├─ files
 │
 │
 │
 │
 │


DataNorm Output
/data/bak/QUT/upton6/Documents/Nanostring/projects/NS_Liver_HCC_DSP/Data_Normalisation/RUVIII_NSNorm_Grouped_Expressed/

```
ToDo: Add write output functionality to all files




# Prepare config file for running analysis
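
The file structure notes earlier in this notebook list three details the config file should carry: usernames with read access to the RDSS Acquisitions folder, the folder name(s) on RDSS to read raw data from, and the folder name on HPCFS to write to while processing. The sketch below shows one possible way to store and check those details. The JSON format, the key names and the example values are all assumptions; no config format has been fixed for this workflow yet.

```python
import json

# Hypothetical key names, one per detail listed in the file structure notes
REQUIRED_KEYS = {
    "rdss_read_users": list,      # usernames with read access to RDSS Acquisitions
    "rdss_folders": list,         # folder name(s) on RDSS to read raw data from
    "hpcfs_output_folder": str,   # folder on HPCFS to write to while processing
}


def load_config(text):
    """Parse a JSON config string and check the required keys and types."""
    config = json.loads(text)
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            raise KeyError(f"Missing config entry: {key}")
        if not isinstance(config[key], expected_type):
            raise TypeError(f"{key} should be a {expected_type.__name__}")
    return config


# Placeholder values for illustration only
example = """
{
  "rdss_read_users": ["user_a", "user_b"],
  "rdss_folders": ["example_rdss_folder"],
  "hpcfs_output_folder": "example_project/DSP_Data_Analysis"
}
"""
config = load_config(example)
print(config["hpcfs_output_folder"])
```

Because the config is sent to researchers separately, the notebooks themselves need not contain any server paths or usernames.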

# View and clean annotations <a class="anchor" id="clean"></a>

First, confirm that all data entered into the DSP files is correct and clean.


1. Download "Annotation template file" from DSP

2. Add in factors for AOI annotation

3. Manually review all AOIs

4. Ensure the comment line has been deleted (row 1 in downloaded file). The header row should be row 1.

5. Upload file to DSP and select replace tags and factors
  
6. Note: tags and factors are case sensitive. No additional characters should be present. All tags must be comma separated.

7. Note: the "Initial Dataset" and "Default QC" files must be re-generated if AOI annotations are updated.
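
The tag rules in the notes above (case sensitive, no additional characters, comma separated) can be checked before uploading. The following is a rough sketch, not part of the existing QC notebook: `check_tags` is a hypothetical helper, the tag names are invented, and it assumes that whitespace around a tag counts as an additional character.

```python
def check_tags(tag_field, allowed_tags):
    """Split a comma-separated tag field and report tags that do not match.

    Matching is exact (case sensitive, no surrounding whitespace).
    Returns a list of (tag, suggestion) tuples, where suggestion is the
    allowed tag that matches after stripping whitespace and ignoring
    case, or None when nothing similar is found.
    """
    lookup = {tag.lower(): tag for tag in allowed_tags}
    problems = []
    for raw in tag_field.split(","):
        if raw in allowed_tags:
            continue
        suggestion = lookup.get(raw.strip().lower())
        problems.append((raw, suggestion))
    return problems


allowed = {"Tumour", "Stroma", "Immune"}   # hypothetical tag names
problems = check_tags("Tumour,stroma, Immune,Necrosis", allowed)
print(problems)
```

An empty result means every tag matches the expected spelling exactly; anything else is worth fixing in the annotation file before it is uploaded to DSP.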



<i>Note: AOIs were correlated to plates and wells using the lab worksheet documents, by matching the surface areas in those sheets with the surface areas in the DSP output Excel files. 231206_DSP_nCounter_Protein_QC_Subramaniam_HCC_TMA_01 contains code for this but may not be completely up to date.</i>
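
The surface-area matching described in the note can be sketched as follows. This is an illustration only and is not the code from the notebook named above: `match_by_area`, the tolerance value, and the AOI names, well IDs and areas are all hypothetical.

```python
def match_by_area(aoi_areas, well_areas, tolerance=0.5):
    """Pair each AOI with the single well whose surface area matches.

    aoi_areas:  dict of AOI name -> surface area from the DSP output
    well_areas: dict of well ID  -> surface area from the lab worksheet
    Returns (matches, unmatched): matches maps AOI name -> well ID, and
    unmatched lists AOIs with zero or more than one candidate well.
    """
    matches, unmatched = {}, []
    for aoi, area in aoi_areas.items():
        candidates = [well for well, well_area in well_areas.items()
                      if abs(well_area - area) <= tolerance]
        if len(candidates) == 1:
            matches[aoi] = candidates[0]
        else:
            unmatched.append(aoi)   # ambiguous or missing: review by hand
    return matches, unmatched


# Hypothetical values for illustration only
aoi_areas = {"AOI_001": 15234.2, "AOI_002": 48210.7, "AOI_003": 999.0}
well_areas = {"A02": 48210.7, "A03": 15234.2}
matches, unmatched = match_by_area(aoi_areas, well_areas)
print(matches)
print(unmatched)
```

The sketch does not check whether two AOIs end up mapped to the same well, so AOIs with near-identical areas still need a manual review.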

# View and QC data <a class="anchor" id="qc"></a>

The next step is to check the quality of the data obtained from the DSP run.

This is done for both the probes and the AOIs to determine whether any probes or AOIs should be excluded from analysis.


[link to QC notebook](240123_DSP_nCounter_Protein_QC_Git.ipynb)

<i>Note: The above notebook also contains the code for cleaning and filling out data annotations. May want to break this down into separate notebooks for clarity.
</i>

#### ToDo:

1. Merge pre-norm portions of the EDA notebook into the QC notebook.

2. Return AOIs to ignore in a flat text file.

3. Return AOI groups with the whole annotations of all AOIs in each group. What is the best file format to use for this?
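
One possible shape for the flat text file of AOIs to ignore is a plain list with one AOI name per line. The sketch below is only a suggestion for that ToDo item; the file name, the helper names and the AOI names are hypothetical, and the '#' comment convention is an assumption.

```python
import tempfile
from pathlib import Path


def write_ignore_list(path, aoi_names):
    """Write one AOI name per line, sorted and without duplicates."""
    lines = sorted(set(aoi_names))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_ignore_list(path):
    """Read AOI names back, skipping blank lines and '#' comments."""
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


with tempfile.TemporaryDirectory() as tmp:
    ignore_file = Path(tmp) / "ignore_AOIs.txt"   # hypothetical file name
    write_ignore_list(ignore_file, ["AOI_007", "AOI_002", "AOI_007"])
    ignored = read_ignore_list(ignore_file)

print(ignored)
```

A one-name-per-line file is easy to read from both Python and R, which matters here because normalisation and DGE are run in R scripts.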

[EDA notebook](240119_DSP_nCounter_Protein_Pre-Norm_EDA.ipynb)


# Normalise data <a class="anchor" id="norm"></a>

Normalisation was done in a separate R script.

<i>ToDo: Create a jupyter notebook to run this R script remotely
</i>

### R script for normalising data using the Nanostring library:

RUVIII_#Researcher#_#Project#_NSNorm.R
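
As a starting point for the ToDo above, the script name pattern can be filled in and handed to `Rscript` from a Python cell. This is a sketch under assumptions: the helper names are hypothetical, the researcher and project values are placeholders, and it runs the script on the local machine only (running it remotely is not covered).

```python
import shutil
import subprocess


def norm_script_name(researcher, project):
    """Fill the RUVIII_#Researcher#_#Project#_NSNorm.R naming pattern."""
    return f"RUVIII_{researcher}_{project}_NSNorm.R"


def build_rscript_command(script_path, *args):
    """Return the argument list for running an R script with Rscript."""
    return ["Rscript", str(script_path), *map(str, args)]


def run_if_available(command):
    """Run the command if its executable is on PATH; return None otherwise."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=True)


script = norm_script_name("Researcher", "Project")   # placeholder values
command = build_rscript_command(script)
print(command)
```

Calling `run_if_available(command)` would then execute the script where R is installed, and raise an error if the script exits with a non-zero status.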


# Set up comparisons <a class="anchor" id="compare"></a>

Currently done in EdgeR.

This will require setting up linear models in Python to be useful in Jupyter.
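
As a first step towards doing this in Python, the set of pairwise comparisons can be written out as contrast vectors over the group labels, which is one common way to express comparisons for a linear model with one coefficient per group. The sketch below is an assumption about how this could look, not a port of the existing EdgeR setup; the group labels and the `pairwise_contrasts` helper are hypothetical.

```python
from itertools import combinations


def pairwise_contrasts(groups):
    """Build one contrast vector per pair of groups.

    Each vector has +1 for the first group, -1 for the second and 0
    elsewhere, in the order given by `groups` (labels must be unique).
    """
    contrasts = {}
    for first, second in combinations(groups, 2):
        vector = [0] * len(groups)
        vector[groups.index(first)] = 1
        vector[groups.index(second)] = -1
        contrasts[f"{first}_vs_{second}"] = vector
    return contrasts


groups = ["Tumour", "Stroma", "Normal"]   # hypothetical group labels
contrasts = pairwise_contrasts(groups)
for name, vector in contrasts.items():
    print(name, vector)
```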

# Run DGE (EdgeR or DESeq) <a class="anchor" id="dge"></a>

[EdgeR R script for grouped samples](/Users/upton6/Documents/Nanostring/projects/NS_Liver_HCC_DSP/EdgeR/NS_HCC_GLM_Grouped_02.R)

# Convert EdgeR plots to volcano plots <a class="anchor" id="convert"></a>

[EdgeR to Volcano plot notebook](231130_EdgeR_to_Volcano_plots_NS_msWTA.ipynb)
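
For orientation, the core of a volcano plot conversion is small: each target is placed at its log fold change on the x axis and at -log10 of its p-value on the y axis, then labelled by significance. The sketch below is not taken from the linked notebook. It assumes result rows with `logFC`, `PValue` and `FDR` fields, as in edgeR result tables; the `name` field, the cutoffs and the example rows are hypothetical.

```python
import math


def volcano_points(rows, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Convert DGE result rows into volcano plot coordinates.

    rows: iterable of dicts with 'name', 'logFC', 'PValue' and 'FDR' keys
    Returns a list of (name, x, y, status) tuples where x is logFC,
    y is -log10(PValue) and status is 'up', 'down' or 'ns'.
    """
    points = []
    for row in rows:
        log_fc = float(row["logFC"])
        p_value = max(float(row["PValue"]), 1e-300)   # avoid log10(0)
        significant = (float(row["FDR"]) < fdr_cutoff
                       and abs(log_fc) >= lfc_cutoff)
        status = ("up" if log_fc > 0 else "down") if significant else "ns"
        points.append((row["name"], log_fc, -math.log10(p_value), status))
    return points


# Hypothetical rows for illustration only
rows = [
    {"name": "Target_A", "logFC": 2.0, "PValue": 0.001, "FDR": 0.01},
    {"name": "Target_B", "logFC": -1.5, "PValue": 0.01, "FDR": 0.04},
    {"name": "Target_C", "logFC": 0.2, "PValue": 0.5, "FDR": 0.7},
]
points = volcano_points(rows)
for point in points:
    print(point)
```

The resulting tuples can be passed to any plotting library, with `status` used to colour the points.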


# Create HCC tables summary

[HCC Tables summary notebook](HCC_Tables_Summary.ipynb)








In [None]:
# Create HCC tables summary