# Description of Inputs
----------

coralME takes a total of 7 inputs, 2 required and 5 optional:

### Required
1. __Genome file__ (<code>genome.gb</code>)
2. __M-model__ (<code>m_model.json</code> or <code>m_model.xml</code>)

### Optional
These files can be downloaded from an existing **BioCyc** database under <code>Special SmartTables</code>. If an optional file is not provided, coralME fills in the missing information from <code>genome.gb</code>.

3. __Genes file__, by default: <code>genes.txt</code>
4. __Proteins file__, by default: <code>proteins.txt</code>
5. __RNAs file__, by default: <code>RNAs.txt</code>
6. __TUs file__, by default: <code>TUs.txt</code>
7. __Sequences file__, by default: <code>sequences.fasta</code>

<img src="./pngs/inputs.png" alt="Drawing" style="width: 800px;"/>

## 1. Genome (<code>genome.gb</code>)
----------

The genome file provides coralME with:
* Gene annotations.
* Gene sequences.

#### Requirements
1. **Locus tags (locus_tag or old_locus_tag) MUST be consistent with <code>m_model.json</code>**. Make sure you download the same genome file that was used to reconstruct the M-model.
2. Is named <code>genome.gb</code>.
3. Is a GenBank-compliant file that BioPython can read correctly.
4. Contains the entire genome sequence. Make sure to enable <code>Customize View</code>><code>Show Sequence</code> before downloading the GenBank file from NCBI.
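
As a rough pre-flight check of requirements 1 and 4, the sketch below scans the text of a GenBank file for <code>locus_tag</code> and <code>old_locus_tag</code> qualifiers and for a sequence block after <code>ORIGIN</code>. It uses only the Python standard library, the helper name <code>summarize_genbank</code> is hypothetical (it is not part of coralME), and it does not replace parsing the file with BioPython.

```python
import re


def summarize_genbank(text):
    """Return (locus tags found, whether a sequence block is present).

    Hypothetical helper: a plain-text scan, not a full GenBank parser.
    """
    # Qualifiers look like /locus_tag="..." or /old_locus_tag="..."
    tags = set(re.findall(r'/(?:old_)?locus_tag="([^"]+)"', text))
    # The sequence follows a line starting with ORIGIN; each sequence
    # line begins with a position number followed by bases.
    has_sequence = re.search(
        r"^ORIGIN.*\n\s*\d+\s+[A-Za-z]", text, flags=re.MULTILINE
    ) is not None
    return tags, has_sequence
```

Calling <code>summarize_genbank(open("genome.gb").read())</code> returns the set of tags and a flag. If the flag is <code>False</code>, the file was probably downloaded without <code>Show Sequence</code> enabled. For multi-record files the flag only confirms that at least one record carries a sequence.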

<b>See an example of [genome.gb](./helper_files/pputida_tutorial/inputs/genome.gb) and [sequences.fasta](./helper_files/pputida_tutorial/inputs/sequences.fasta)</b>

## 2. M-model (<code>m_model.json</code>)
----------

The M-model provides coralME with the metabolic model components:
* Metabolic network (M-matrix)
* Gene-protein-reaction associations
* Environmental and internal constraints
* Reaction subsystems
* Biomass composition

#### Requirements
This file should meet the following requirements:
1. **Gene identifiers MUST be consistent with <code>genome.gb</code> locus_tag or old_locus_tag**. Make sure you download the same genome file that was used to reconstruct the M-model.
2. Is named <code>m_model.json</code> (or <code>m_model.xml</code>).
3. Is COBRApy-compliant. Must be readable by COBRApy 0.25.0.
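
Requirement 1 can be spot-checked before running coralME. The sketch below is a minimal illustration that assumes the JSON layout written by COBRApy, with a top-level <code>genes</code> list whose entries have an <code>id</code> field. The function name <code>genes_missing_from_genome</code> is hypothetical, and the locus tags are expected to come from <code>genome.gb</code> by whatever means you prefer.

```python
import json


def genes_missing_from_genome(model_json_text, locus_tags):
    """List M-model gene IDs that are not among the genome locus tags.

    Assumes a COBRApy-style JSON document with a top-level "genes" list.
    """
    model = json.loads(model_json_text)
    model_genes = {gene["id"] for gene in model.get("genes", [])}
    # Any ID left over has no matching locus_tag or old_locus_tag
    return sorted(model_genes - set(locus_tags))
```

An empty list means every gene in the M-model has a matching tag. A long list usually indicates that the genome file is not the one used to reconstruct the M-model.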

<b>See an example of [m_model.json](./helper_files/m_model.json)</b>

## 3. Gene dictionary (<code>genes.txt</code>) [optional]
----------

<code>genes.txt</code> is a gene information table that can be downloaded from the <b>All genes of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. Click <code>Export</code>><code>to Spreadsheet File</code>><code>frame IDs</code>. <b>This file is optional and is meant to complement the information from <code>genome.gb</code> in case the latter is missing genes.</b>

<code>genes.txt</code> provides coralME with:
* Gene locus tags
* Gene names
* Gene annotations
* Gene positions
* Gene products (protein, tRNA, etc.)

#### Requirements
This file should meet the following requirements:
1. Contains the index <b>Gene Name</b> and columns <b>Accession-1</b>, <b>Left-End-Position</b>, <b>Right-End-Position</b>, and <b>Product</b>.
2. **<b>Accession-1</b> MUST be consistent with the gene IDs in the GPRs of <code>m_model.json</code> and with the locus_tag (or old_locus_tag) in <code>genome.gb</code>**.
3. <b>Gene Name</b> is consistent with:
    * Column <b>Genes of polypeptide, complex, or RNA</b> of <code>proteins.txt</code>
    * Column <b>Gene</b> of <code>RNAs.txt</code> 
    * Column <b>Genes of transcription unit</b> of <code>TUs.txt</code>
    * Gene identifiers in <code>sequences.fasta</code>
4. <b>Product</b> is consistent with:
    * Index of <code>proteins.txt</code>
    * Index of <code>RNAs.txt</code>
5. Must be tab-separated
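
Requirements 1 and 5 can be checked together by reading the header row as tab-separated text. The following is a small sketch using the standard <code>csv</code> module; <code>missing_columns</code> and <code>REQUIRED_GENE_COLUMNS</code> are hypothetical names introduced here, and the column list is copied from requirement 1.

```python
import csv
import io

# Column names taken from requirement 1 of genes.txt
REQUIRED_GENE_COLUMNS = [
    "Gene Name",
    "Accession-1",
    "Left-End-Position",
    "Right-End-Position",
    "Product",
]


def missing_columns(table_text, required):
    """Return the required column names absent from a tab-separated header."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = [name.strip() for name in next(reader, [])]
    return [name for name in required if name not in header]
```

If the file was saved with commas instead of tabs, the header collapses into a single field and every required column is reported as missing, which makes the separator problem easy to spot.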
    
#### Notes
* <b>Requirements 3, 4 and 5</b> (ID consistency and tab separation) should be met automatically if the files are downloaded from the same BioCyc database.
* <b>Left-End-Position</b> and <b>Right-End-Position</b> do not need to be consistent with the positions in <code>genome.gb</code>. coralME keeps the positions in <code>genome.gb</code> over those specified in <code>genes.txt</code>.
    

<b>See an example of [genes.txt](./helper_files/genes.txt)</b>

## 4. Proteins (<code>proteins.txt</code>) [optional]
----------

<code>proteins.txt</code> is a protein complex information table that can be downloaded from the <b>All proteins of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. Click <code>Export</code>><code>to Spreadsheet File</code>><code>frame IDs</code>. <b>This file is optional and is meant to complement the information from <code>genome.gb</code>.</b>

<code>proteins.txt</code> provides coralME with:
* Protein complex compositions

#### Requirements
This file should meet the following requirements:
1. Contains the index <b>(Proteins Complexes)</b> and columns <b>Common-Name</b>, <b>Genes of polypeptide, complex, or RNA</b>, and <b>Locations</b>.
2. <b>(Proteins Complexes)</b> is consistent with:
    * Column <b>Product</b> of <code>genes.txt</code>
3. <b>Genes of polypeptide, complex, or RNA</b> is consistent with:
    * Index <b>Gene Name</b> of <code>genes.txt</code>
4. Must be tab-separated
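
Requirement 3 can be illustrated with a short cross-check between the two tables. This sketch assumes that cells holding several genes separate them with <code>//</code>, which is how BioCyc spreadsheet exports usually look; verify this against your own file. The function name <code>unknown_genes</code> is hypothetical.

```python
import csv
import io


def unknown_genes(genes_text, proteins_text):
    """Map each protein or complex to gene names not found in genes.txt.

    Assumption: multiple genes in one cell are separated by "//".
    """
    genes = csv.DictReader(io.StringIO(genes_text), delimiter="\t")
    known = {row["Gene Name"] for row in genes}

    proteins = csv.DictReader(io.StringIO(proteins_text), delimiter="\t")
    unknown = {}
    for row in proteins:
        cell = row["Genes of polypeptide, complex, or RNA"] or ""
        names = [name.strip() for name in cell.split("//") if name.strip()]
        missing = [name for name in names if name not in known]
        if missing:
            unknown[row["(Proteins Complexes)"]] = missing
    return unknown
```

An empty dictionary means every gene referenced in <code>proteins.txt</code> appears in the <b>Gene Name</b> index of <code>genes.txt</code>.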
    
#### Notes
* <b>Requirements 2, 3 and 4</b> (ID consistency and tab separation) should be met automatically if the files are downloaded from the same BioCyc database.

<b>See an example of [proteins.txt](./helper_files/proteins.txt)</b>

## 5. RNAs (<code>RNAs.txt</code>) [optional]
----------

<code>RNAs.txt</code> is an RNA annotation table that can be downloaded from the <b>All RNAs of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. Click <code>Export</code>><code>to Spreadsheet File</code>><code>frame IDs</code>. <b>This file is optional and is meant to complement the information from <code>genome.gb</code>.</b>

<code>RNAs.txt</code> provides coralME with:
* Genes annotated as RNA products (e.g. tRNA, rRNA, etc.)
* RNA gene annotations (e.g. amino acids - tRNA associations)

#### Requirements
This file should meet the following requirements:
1. Contains the index <b>(All-tRNAs Misc-RNAs rRNAs)</b> and columns <b>Common-Name</b> and <b>Gene</b>.
2. <b>(All-tRNAs Misc-RNAs rRNAs)</b> is consistent with:
    * Column <b>Product</b> of <code>genes.txt</code>
3. <b>Gene</b> is consistent with:
    * Index <b>Gene Name</b> of <code>genes.txt</code>
4. Must be tab-separated
    
#### Notes
* <b>Requirements 2, 3 and 4</b> (ID consistency and tab separation) should be met automatically if the files are downloaded from the same BioCyc database.

<b>See an example of [RNAs.txt](./helper_files/RNAs.txt)</b>

## 6. TUs (<code>TUs.txt</code>) [optional]
----------

<code>TUs.txt</code> is a transcription unit annotation table that can be downloaded from the <b>All TUs of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. Click <code>Export</code>><code>to Spreadsheet File</code>><code>frame IDs</code>. <b>This file is optional and is meant to complement the information from <code>genome.gb</code>.</b>

<code>TUs.txt</code> provides coralME with:
* Co-transcribed genes (operons).
* Direction of transcription.
* TU IDs.

#### Requirements
This file should meet the following requirements:
1. Contains the index <b>Transcription-Units</b> and columns <b>Genes of transcription unit</b> and <b>Direction</b>.
2. <b>Genes of transcription unit</b> is consistent with:
    * Index <b>Gene Name</b> of <code>genes.txt</code>
3. Must be tab-separated
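
To see what the table encodes, the sketch below reads <code>TUs.txt</code> into a dictionary keyed by TU ID. It is only an illustration: <code>read_tus</code> is a hypothetical helper, the <code>//</code> separator between genes is an assumption about the BioCyc export, and the <b>Direction</b> value is kept as the raw string found in the file.

```python
import csv
import io


def read_tus(tus_text):
    """Parse a tab-separated TU table into {TU ID: genes and direction}.

    Assumption: co-transcribed genes in one cell are separated by "//".
    """
    reader = csv.DictReader(io.StringIO(tus_text), delimiter="\t")
    tus = {}
    for row in reader:
        cell = row["Genes of transcription unit"] or ""
        genes = [gene.strip() for gene in cell.split("//") if gene.strip()]
        tus[row["Transcription-Units"]] = {
            "genes": genes,
            "direction": (row["Direction"] or "").strip(),
        }
    return tus
```

The gene lists produced this way can then be compared against the <b>Gene Name</b> index of <code>genes.txt</code> to confirm requirement 2.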
    
#### Notes
* <b>Requirements 2 and 3</b> (ID consistency and tab separation) should be met automatically if the files are downloaded from the same BioCyc database.

<b>See an example of [TUs.txt](./helper_files/TUs.txt)</b>

## 7. Gene sequences (<code>sequences.fasta</code>) [optional]
----------

<code>sequences.fasta</code> is a nucleotide FASTA file that can be downloaded from the <b>All genes of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. Click <code>Export</code>><code>FASTA</code>><code>Find sequences</code>. <b>This file is optional and is meant to complement the information from <code>genome.gb</code> in case the latter is missing genes.</b>

<code>sequences.fasta</code> provides coralME with:
* Gene sequences

#### Requirements
This file should meet the following requirements:
1. Gene identifiers are consistent with:
    * Index <b>Gene Name</b> of <code>genes.txt</code>    
2. Must be a valid nucleotide FASTA file
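
Requirement 1 can be checked by comparing FASTA record identifiers with the gene names. The sketch below assumes the identifier is the first whitespace-delimited token after <code>></code> on each header line, which is the usual FASTA convention but may not match every export. Both function names are hypothetical.

```python
def fasta_ids(fasta_text):
    """Return the identifier of every record in a FASTA text.

    Assumption: the identifier is the first token after ">".
    """
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def ids_not_in_genes(fasta_text, gene_names):
    """List FASTA identifiers that are absent from the given gene names."""
    known = set(gene_names)
    return [record_id for record_id in fasta_ids(fasta_text) if record_id not in known]
```

Here <code>gene_names</code> would be the values of the <b>Gene Name</b> index in <code>genes.txt</code>. Any identifier returned by <code>ids_not_in_genes</code> has no matching entry there.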
    
#### Notes
* <b>Requirement 1</b> regarding ID consistency should be met automatically if the files are downloaded from the same BioCyc database.
    

<b>See an example of [sequences.fasta](./helper_files/sequences.fasta)</b>