# Description of Inputs

coralME takes a total of 6 inputs, 2 required and 4 optional:

### Required
1. Genome file (genome.gb)
2. M-model (m_model.json)

### Optional
Downloadable from an existing BioCyc database:

3. genes.txt
4. RNAs.txt
5. proteins.txt
6. TUs.txt


![](./images/inputs.png)

## 1. Genome (<code>genome.gb</code>)

The genome file provides coralME with:
* Gene annotations.
* Gene positions.

#### Requirements

1. Is named <code>genome.gb</code>.
2. Is a GenBank-compliant file that BioPython can read correctly.
3. If it does not contain the entire genome sequence, you must provide the sequence in a second file, in FASTA format, called <code>sequence.fasta</code>.
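
As a rough pre-check for requirement 3, the sketch below scans the text of a GenBank file and reports whether a companion <code>sequence.fasta</code> is likely needed. It relies only on the flat-file layout (one `LOCUS` line per record, sequence rows after an `ORIGIN` line). The function name `needs_sequence_fasta` is hypothetical and not part of coralME. Parsing the file with BioPython remains the actual requirement; this heuristic does not replace it.

```python
import re

# A GenBank sequence row looks like "        1 agcttttcat tctgactgca ...".
_SEQ_ROW = re.compile(r"^\s*\d+(\s+[A-Za-z]+)+\s*$")


def needs_sequence_fasta(genbank_text):
    """Heuristic: True if any record lacks sequence rows after ORIGIN."""
    lines = genbank_text.splitlines()
    n_records = 0
    n_with_sequence = 0
    for i, line in enumerate(lines):
        if line.startswith("LOCUS"):
            n_records += 1
        elif line.startswith("ORIGIN"):
            # The first sequence row, if any, directly follows ORIGIN.
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            if _SEQ_ROW.match(next_line):
                n_with_sequence += 1
    # No records at all, or fewer sequences than records: FASTA is needed.
    return n_records == 0 or n_with_sequence < n_records
```

A typical call would be `needs_sequence_fasta(open("genome.gb").read())`.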

## 2. M-model (<code>m_model.json</code>)

The M-model provides coralME with the metabolic model components:
* Metabolic network (M-matrix)
* Gene-protein-reaction associations
* Environmental and internal constraints
* Reaction subsystems
* Biomass composition

#### Requirements
This file should meet the following requirements:
1. Is named <code>m_model.json</code>.
2. Is COBRApy-compliant. Must be readable by cobrapy-0.25.0.
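
Before running coralME, it can help to confirm that the file looks like a COBRApy JSON model and to list its gene IDs, since other inputs must agree with them. The sketch below uses only the standard library and assumes the usual COBRApy JSON layout with top-level `metabolites`, `reactions`, and `genes` lists. The function name `model_gene_ids` is hypothetical. The definitive check is still loading the file with COBRApy itself, for example with `cobra.io.load_json_model`.

```python
import json


def model_gene_ids(model_json_text):
    """Return the set of gene IDs listed in a COBRApy-style JSON model."""
    model = json.loads(model_json_text)
    # Assumption: a COBRApy JSON model has these three top-level lists.
    missing = [key for key in ("metabolites", "reactions", "genes")
               if key not in model]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    return {gene["id"] for gene in model["genes"]}
```

A typical call would be `model_gene_ids(open("m_model.json").read())`.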

## 3. Gene dictionary (<code>genes.txt</code>) [optional]

<code>genes.txt</code> is a gene information table that can be downloaded from the <b>All genes of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. <b>This file is optional and is meant to complement the information from <code>genome.gb</code> in case it is missing genes.</b>

<code>genes.txt</code> provides coralME with:
* Gene locus tags
* Gene names
* Gene annotations
* Gene positions
* Gene products (protein, tRNA, etc.)

#### Requirements
This file should meet the following requirements:
1. Contains the index <b>Gene Name</b> and columns <b>Accession-1</b>, <b>Left-End-Position</b>, <b>Right-End-Position</b>, and <b>Product</b>.
2. <b>Gene Name</b> is consistent with:
    * Column <b>Genes of polypeptide, complex, or RNA</b> of <code>proteins.txt</code>
    * Column <b>Gene</b> of <code>RNAs.txt</code> 
    * Column <b>Genes of transcription unit</b> of <code>TUs.txt</code>
3. <b>Accession-1</b> is consistent with the gene IDs in the GPRs of <code>m_model.json</code> and with the locus_tag (or old_locus_tag) in <code>genome.gb</code>.
4. <b>Product</b> is consistent with:
    * Index of <code>proteins.txt</code>
    * Index of <code>RNAs.txt</code>
    
#### Notes
* <b>Requirements 2, 3 and 4</b> regarding ID consistency should be directly met if the files are downloaded from the same BioCyc database.
* <b>Left-End-Position</b> and <b>Right-End-Position</b> do not need to be consistent with the positions in <code>genome.gb</code>. When both files give a position for a gene, coralME keeps the one from <code>genome.gb</code>.
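
Requirements 1 and 3 can be checked with a few lines of Python. This is a minimal sketch, assuming the SmartTable was exported as tab-delimited text with a header row. The function names `load_genes_table` and `model_ids_missing_from_genes` are hypothetical and not part of coralME.

```python
import csv
import io

REQUIRED_GENE_COLUMNS = [
    "Gene Name", "Accession-1",
    "Left-End-Position", "Right-End-Position", "Product",
]


def load_genes_table(text):
    """Parse genes.txt (assumed tab-delimited) into {Gene Name: row}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_GENE_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"genes.txt is missing columns: {missing}")
    return {row["Gene Name"]: row for row in reader}


def model_ids_missing_from_genes(genes, model_gene_ids):
    """Model gene IDs that never appear in the Accession-1 column."""
    accessions = {row["Accession-1"] for row in genes.values()}
    return sorted(set(model_gene_ids) - accessions)
```

Here `model_gene_ids` would be the gene IDs used in the GPRs of <code>m_model.json</code>. An empty result suggests that requirement 3 holds for the M-model side.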
    

## 4. Proteins (<code>proteins.txt</code>) [optional]

<code>proteins.txt</code> is a protein complex information table that can be downloaded from the <b>All proteins of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. <b>This file is optional and is meant to complement the information from <code>genome.gb</code>.</b>

<code>proteins.txt</code> provides coralME with:
* Protein complex compositions

#### Requirements
This file should meet the following requirements:
1. Contains the index <b>(Proteins Complexes)</b> and columns <b>Common-Name</b>, <b>Genes of polypeptide, complex, or RNA</b>, and <b>Locations</b>.
2. <b>(Proteins Complexes)</b> is consistent with:
    * Column <b>Product</b> of <code>genes.txt</code>
3. <b>Genes of polypeptide, complex, or RNA</b> is consistent with:
    * Index <b>Gene Name</b> of <code>genes.txt</code>
    
#### Notes
* <b>Requirements 2 and 3</b> regarding ID consistency should be directly met if the files are downloaded from the same BioCyc database.    

## 5. RNAs (<code>RNAs.txt</code>) [optional]

<code>RNAs.txt</code> is an RNA annotation table that can be downloaded from the <b>All RNAs of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. <b>This file is optional and is meant to complement the information from <code>genome.gb</code>.</b>

<code>RNAs.txt</code> provides coralME with:
* Genes that are annotated as RNA products (e.g. tRNA, rRNA, etc.)
* RNA gene annotations (e.g. amino acids - tRNA associations)

#### Requirements
This file should meet the following requirements:
1. Contains the index <b>(All-tRNAs Misc-RNAs rRNAs)</b> and columns <b>Common-Name</b> and <b>Gene</b>.
2. <b>(All-tRNAs Misc-RNAs rRNAs)</b> is consistent with:
    * Column <b>Product</b> of <code>genes.txt</code>
3. <b>Gene</b> is consistent with:
    * Index <b>Gene Name</b> of <code>genes.txt</code>
    
#### Notes
* <b>Requirements 2 and 3</b> regarding ID consistency should be directly met if the files are downloaded from the same BioCyc database.

## 6. TUs (<code>TUs.txt</code>) [optional]

<code>TUs.txt</code> is a transcription unit annotation table that can be downloaded from the <b>All TUs of <i>organism</i> SmartTable</b> of the <b>[BioCyc](https://biocyc.org/)</b> database. <b>This file is optional and is meant to complement the information from <code>genome.gb</code>.</b>

<code>TUs.txt</code> provides coralME with:
* Co-transcribed genes (operons).
* Direction of transcription.
* TU IDs.

#### Requirements
This file should meet the following requirements:
1. Contains the index <b>Transcription-Units</b> and columns <b>Genes of transcription unit</b> and <b>Direction</b>.
2. <b>Genes of transcription unit</b> is consistent with:
    * Index <b>Gene Name</b> of <code>genes.txt</code>
    
#### Notes
* <b>Requirement 2</b> regarding ID consistency should be directly met if the files are downloaded from the same BioCyc database.    
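
If the files come from different sources, the gene-name consistency requirements can be checked by hand. The sketch below lists gene names used in one table that do not occur in the <b>Gene Name</b> index of <code>genes.txt</code>. It assumes tab-delimited exports and assumes that cells holding several genes separate them with `" // "`; adjust `sep` if your export differs. The function names are hypothetical and not part of coralME.

```python
import csv
import io


def column_values(table_text, column, sep=" // "):
    """Collect all values in `column`, splitting multi-valued cells on `sep`."""
    values = set()
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        cell = (row.get(column) or "").strip()
        if cell:
            values.update(v.strip() for v in cell.split(sep) if v.strip())
    return values


def unknown_gene_names(table_text, column, gene_names):
    """Names used in `column` that are absent from the genes.txt index."""
    return sorted(column_values(table_text, column) - set(gene_names))
```

The same function applies to the other optional tables: pass the column <b>Genes of transcription unit</b> for <code>TUs.txt</code>, <b>Gene</b> for <code>RNAs.txt</code>, or <b>Genes of polypeptide, complex, or RNA</b> for <code>proteins.txt</code>. An empty list means every referenced gene name was found.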