![image](https://cdn.discordapp.com/attachments/996200880351215636/1065002848355631165/New_Atlantis.png) 

---
# Introduction
---

## Initialization Code 

In [1]:
#Code to make directories
#Run this code once when you start from a new template to automatically generate the accompanying directories
#This also creates an empty requirements file and an empty utility notebook to be filled in
import json
import os

if not os.path.exists('./src'):
    os.makedirs('./src')
    os.makedirs('./Data')
    os.makedirs('./Data/Input_Data')
    os.makedirs('./Data/Output_Data')
    os.makedirs('./Data/Reference_Data')
    #create an empty requirements file
    with open('./src/requirements.txt', 'w') as f:
        pass
    #create a minimal valid notebook, named to match the later "from src.NB_utility import *"
    #(opening without 'w' would fail because the file does not exist yet)
    with open('./src/NB_utility.ipynb', 'w') as nb:
        json.dump({'cells': [], 'metadata': {}, 'nbformat': 4, 'nbformat_minor': 4}, nb)

The purpose of an Execution Notebook is to provide a complete template for running a single tool or protocol hosted on the New Atlantis Cloud Data Lab (CDL). By reading and running an Execution Notebook, the user should learn what data is used as input, what the tool does, and what the output data looks like, and should have the code necessary to run the tool on their own data. The Execution Notebook should also include two directories: one for source code, environment information, and utility functions, and one for input and output data. The end product should be both functional as a notebook-based interface for using the tool and educational, so that the user understands what the tool is doing.

For questions about using this template contact Kelvin: 
- kelvin@newatlantis.io 
- kellyd73 on Discord 




Remove this cell from final product.

## Tool Purpose  
---

Markdown cell giving a brief explanation of the tool. Cover what insight the tool is expected to generate and give an extremely brief description of its methods.

## Input Data 
---

Description of the data that must be input into the tool.
- Qualitative (DNA, Long Reads, Ocean Salinity etc)
- File types and formatting (.fasta, .fastq etc)
- Source of the sample data used for demonstration purposes.

## Output Data 
---

Description of the data that the tool produces as output.
- Qualitative (Functional Capacity, Proteomic Assembly etc)
- File types and formatting (.fasta, Blast6, csv/dataframe with schema etc)

---
# Environment
---

This section should contain all the information required to replicate the operating environment on another system.
It is also recommended to keep an accompanying requirements.txt file in the src folder, for either pip or conda. Be sure to specify all package versions in the requirements.txt file.
Eventually, we will be able to deploy conda environments directly onto the JHub, but this section is still important for documentation purposes.
If any packages require special install steps (no-deps, overriding warnings), demonstrate those steps here.
If installing any packages using !pip install in this section, please add the -q flag to the command to hide long console printouts.
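
One possible way to collect pinned versions for requirements.txt is to query the metadata of the packages installed in the current environment. The sketch below uses only the standard library; `pinned_requirements` is a hypothetical helper name, and the package list is a placeholder to be replaced with the tool's real dependencies.

```python
from importlib import metadata


def pinned_requirements(packages):
    """Return 'name==version' lines for each package in `packages`."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag missing packages instead of silently skipping them
            lines.append(f"# {name} is not installed in this environment")
    return lines


# Placeholder package list (assumption): swap in the tool's dependencies
print("\n".join(pinned_requirements(["pip", "import-ipynb"])))
```

The printed lines can be pasted into src/requirements.txt once every package resolves to a version.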

__NOTES__:

* Are conda environments allowed here? If so, besides requirements.txt, we could include an environment.yml file for these cases.

---
## Dependencies (point to requirements.txt and textual list) 
---

This section should point to where the requirements file is stored and should list the necessary dependencies, install instructions, and any auxiliary information required to set up the environment.
All requirements files should include the "import-ipynb" package in addition to any packages required to run the tool. This package allows utility functions to be stored and documented in a notebook and imported later.

In [None]:
#install for import-ipynb that will be applied later
!pip install import-ipynb -q

__NOTES__:

* Don't see the difference between this section and the previous one, perhaps they could be merged.

---
## Import Statements (code)(import ipynb)
---

In [None]:
import import_ipynb 

#import all utility functions from the NB_utility notebook
from src.NB_utility import *

Put all python import statements in a single cell in this section.
If any global settings for packages are altered (such as setting matplotlib to inline) do that here in a separate cell immediately after the import statements.

__NOTES__:

* Would a regular python module / file work to store helper functions? Using a module seems more natural than using a notebook to store helper functions.

---
# Parameters
---

This section should describe all parameters and settings passed to the tool. All user-supplied arguments should be defined and explained in this section.
This section should alternate markdown and code: the markdown explains what a parameter does and what the options are, and the code sets it.
The first parameter cell should only specify data locations, so that a user only has to edit this cell to point the notebook to their data. The next parameter cell should contain paths to any config files or reference databases. The rest of the section should alternate between markdown explaining a parameter and code setting that parameter to a default value, in either Python or environment variables. Duplicate the following examples for each parameter used during data cleaning and execution of the tool.

## Input and output data directories 
---

These variables are strings that contain the paths to the input and output data directories.

In [2]:
input_dir = 'path/to/data'
output_dir = 'path/to/data'
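
Because every later step depends on these two paths, it can help to check them right after they are set. The following is a minimal sketch, not part of the template itself: `resolve_data_dirs` is a hypothetical helper, and a temporary directory stands in for real data locations so the cell runs anywhere.

```python
import tempfile
from pathlib import Path


def resolve_data_dirs(input_dir, output_dir):
    """Check that input_dir exists and create output_dir if needed."""
    in_path = Path(input_dir).expanduser()
    out_path = Path(output_dir).expanduser()
    if not in_path.is_dir():
        raise FileNotFoundError(f"Input directory not found: {in_path}")
    # Create the output directory (and any parents) if it is missing
    out_path.mkdir(parents=True, exist_ok=True)
    return in_path, out_path


# Demonstration: a temporary directory stands in for real data paths
with tempfile.TemporaryDirectory() as tmp:
    demo_input = tmp
    demo_output = str(Path(tmp) / "results")
    in_path, out_path = resolve_data_dirs(demo_input, demo_output)
    output_created = out_path.is_dir()
    print("output directory created:", output_created)
```

Failing early with a clear message is usually easier to debug than a missing-file error raised deep inside the tool.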

This cell sets an environment variable to be used later. This is a useful way to set parameters for shell scripts or bash used later in the notebook.

In [3]:
import os 

os.environ['MESSAGE'] = 'Hello Worlds'
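
To illustrate how such a variable reaches later commands: child processes started from the notebook inherit `os.environ`, so a script can read the value without it being passed as an argument. This sketch launches a small Python child process via `sys.executable` purely for portability (an assumption made for the example); a bash script stored in src would read `$MESSAGE` in the same way.

```python
import os
import subprocess
import sys

os.environ['MESSAGE'] = 'Hello Worlds'

# The child process inherits the notebook's environment by default
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MESSAGE'])"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError if the child exits with an error
)
child_message = result.stdout.strip()
print(child_message)
```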

---
# Data Precleaning (if required) 
---

This section aims to demonstrate any modifications made to the data to get from the original source to the format the tool needs for processing. Following the instructions in this section, a user should be able to take the original data set (or any similar data set) and perform the requisite transformations required by the tool. If any metadata, augmentations to reference databases, or similar are required to run your tool, all those steps should also be included here.
This section should follow these steps:
- Load data into notebook (pd.read_csv, bash csv, etc)(if required)
- Validate that the loaded data is in the correct format (for user replication)
- Perform any other necessary validations (depending on protocol/tool)
- Alternate markdown cells explaining each manipulation with code cells executing it.
- Mark each step heading with ### to insert it into the table of contents

If any of these steps are not required for your specific tool, omit them.
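
As an illustration of the format validation step, the sketch below checks FASTA-formatted lines before they are handed to a tool. It is a hypothetical example only: `validate_fasta`, the nucleotide alphabet, and the sample records are assumptions, and a real tool may need stricter or different checks.

```python
def validate_fasta(lines, alphabet="ACGTN"):
    """Return a list of problems found in FASTA-formatted lines (empty if none)."""
    problems = []
    seen_header = False
    allowed = set(alphabet)
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            seen_header = True
        elif not seen_header:
            problems.append(f"line {number}: sequence data before first header")
        elif set(line.upper()) - allowed:
            problems.append(f"line {number}: unexpected characters")
    if not seen_header:
        problems.append("no FASTA header found")
    return problems


# Made-up records for demonstration
good = [">seq1", "ACGTTGCA", ">seq2", "NNACGT"]
bad = ["ACGT", ">seq1", "ACGX"]
print(validate_fasta(good))
print(validate_fasta(bad))
```

Returning a list of problems, rather than raising on the first one, lets the notebook display every issue in a single pass.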

---
# Execution of Tool 
---

This section aims to demonstrate how to execute the tool and performs a sample run on test data. This portion of the notebook may be fairly code intensive and is the most important part of the notebook. To improve readability and clarity, most of the more verbose code sections should be written as functions in Python, as shell scripts stored in the src directory, or in whatever language is necessary for the tool you are using.

If the step you hope to perform involves more than a couple of lines of code, please see the function definition format in src/NB_utility.ipynb and wrap your code in a function using that format. If you prefer to use bash scripts, write your commands into a shell file, and then execute them in this portion of the main deck. Once your code is wrapped as a function in the utility NB, you can import and run it using the format below.
Each step should be separated by a markdown heading with a brief explanation followed by the necessary code.
Use the parameter variables defined earlier in the notebook as arguments for functions written here. If there is an input that is not already included in the parameters section, include it there.

Example function:

In [2]:
#remember this code was executed earlier 
import import_ipynb
from src.NB_utility import *

#now I can execute the hello_world function that was defined in the utility_nb
hello_world()

importing Jupyter notebook from /home/jovyan/shared/Active_Projects/Templates/src/NB_utility.ipynb
Hello World!


True
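
The function definition format used in src/NB_utility.ipynb is not reproduced in this template, so the following is only a guess at what a documented utility function could look like. It is written to match the output shown above (a printed greeting and a return value of True); the `name` parameter and the docstring layout are assumptions.

```python
def hello_world(name="World"):
    """Print a greeting and report success.

    Parameters
    ----------
    name : str
        Who to greet. Hypothetical parameter added for illustration.

    Returns
    -------
    bool
        True once the greeting has been printed.
    """
    print(f"Hello {name}!")
    return True


status = hello_world()
```

Keeping a short docstring on each utility function means `help(hello_world)` in the notebook can show users what the step expects and returns.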

---
# Data Post Processing (if required) 
---

## Write to output directory
---
If the tool does not do it automatically, use this cell to write the output data to the output directory defined in the parameter section.

This section aims to contain all the code necessary to perform the data cleaning, formatting, or analysis that would be performed on the output of this tool. Use the same formatting as in the execution section of the notebook:
- Offload long code sections to src/NB_utility.ipynb and import the code
- Add validation to catch errors and irregularities in the data
- Alternate code and markdown cells
- Include a markdown header for each step using ### to add it to the table of contents
- Display data and transformations where necessary.
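
For a tool that does not write its own output, writing results to the output directory with a basic validation step might look like the sketch below. `write_results`, the column names, and the file name are all hypothetical, and a temporary directory stands in for the `output_dir` parameter.

```python
import csv
import tempfile
from pathlib import Path


def write_results(rows, output_dir, filename="results.csv"):
    """Validate a list of dict rows and write them as CSV into output_dir."""
    if not rows:
        raise ValueError("no rows to write")
    fieldnames = list(rows[0].keys())
    for index, row in enumerate(rows):
        # Catch irregular rows before anything is written to disk
        if list(row.keys()) != fieldnames:
            raise ValueError(f"row {index} does not match columns {fieldnames}")
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Made-up rows; a temporary directory stands in for output_dir
demo_rows = [{"sample": "A", "count": 3}, {"sample": "B", "count": 5}]
with tempfile.TemporaryDirectory() as tmp:
    written_path = write_results(demo_rows, tmp)
    written_lines = written_path.read_text().splitlines()
    print(written_lines)
```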

---
# Visualization 
---

If there is a visualization you would like to include, generate it here.
Write the code used to generate the visualization as a function in the format mentioned in the execution section of this notebook.
Place the function in the utility NB so that it can be reused to generate new visualizations on future data.
If the visualization has additional options and parameters, there is no need to add them to the parameters section; they can instead go in a miniature parameter section within this section.

---
# Conclusion
---
Include any final parting thoughts in this section.
This section may also include:
- Common mistakes and fixes. 
- Debugging tips.
- Contact for the author.
- Any other information you would like to include