# Description of Inputs

coralME takes 7 input files (2 required and 5 optional), plus 2 configuration files:

## Types of inputs

### Required

1. __Genome file__ (**<code>genome.gb</code>**)
2. __M-model__ (**<code>m_model.json</code>** or **<code>m_model.xml</code>**)

### Optional

These files can be downloaded from an existing **BioCyc** database under **<code>Special SmartTables</code>**. If no optional files are provided, coralME fills in the missing information from **<code>genome.gb</code>**.

3. __Genes file__, by default: **<code>genes.txt</code>**
4. __RNAs file__, by default: **<code>RNAs.txt</code>**
5. __Proteins file__, by default: **<code>proteins.txt</code>**
6. __TUs file__, by default: **<code>TUs.txt</code>**
7. __Sequences file__, by default: **<code>sequences.fasta</code>**

### Configuration
8. __Paths file__, by default: **<code>inputs.json</code>**
9. __Parameters file__, by default: **<code>organism.json</code>**

<img src="./pngs/inputs.png" alt="Drawing" style="width: 800px;"/>

## Description
### Genome (**<code>genome.gb</code>**)

#### Description

The genome file provides coralME with:

* Gene annotations.
* Gene sequences.

#### Requirements

1. Locus tags (<code>locus_tag</code> or <code>old_locus_tag</code>) MUST be consistent with the gene identifiers in **<code>m_model.json</code>**. Make sure you download the same genome file that was used to reconstruct the M-model.
2. Must be named **<code>genome.gb</code>**.
3. Must be a GenBank-compliant file that BioPython can read correctly.
4. Must contain the entire genome sequence. Make sure to enable **<code>Customize View</code>**>**<code>Show Sequence</code>** before downloading the GenBank file from NCBI.

See an example of [genome.gb](./helper_files/tutorial/inputs/genome.gb) and [sequences.fasta](./helper_files/tutorial/inputs/sequences.fasta)
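
The requirements above can be spot-checked before building a model. The following is a minimal sketch using only the Python standard library; it is not how coralME reads the file and does not replace parsing with BioPython. The helper name `summarize_genbank` is hypothetical. It collects every `locus_tag` and `old_locus_tag` qualifier and checks that a sequence follows the `ORIGIN` line.

```python
import re

# Matches both /locus_tag="..." and /old_locus_tag="..." qualifiers.
LOCUS_TAG_RE = re.compile(r'/(?:old_)?locus_tag="([^"]+)"')
# A record with its sequence has an ORIGIN line followed by a line
# that starts with position 1 and then nucleotides.
SEQUENCE_RE = re.compile(r'^ORIGIN.*\n\s*1\s+[A-Za-z]', re.MULTILINE)


def summarize_genbank(text):
    """Return (set of locus tags, whether a sequence section was found)."""
    locus_tags = set(LOCUS_TAG_RE.findall(text))
    has_sequence = SEQUENCE_RE.search(text) is not None
    return locus_tags, has_sequence


# Made-up fragment used only to illustrate the call.
example = (
    '     gene            190..255\n'
    '                     /locus_tag="tag_0001"\n'
    '                     /old_locus_tag="old_0001"\n'
    'ORIGIN\n'
    '        1 agcttttcat tctgactgca\n'
    '//\n'
)
tags, has_sequence = summarize_genbank(example)
```

If `has_sequence` is false, the file was most likely downloaded without **<code>Show Sequence</code>** enabled.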

### M-model (**<code>m_model.json</code>**)

#### Description

The M-model provides coralME with the metabolic model components:

* Metabolic network (M-matrix)
* Gene-protein-reaction associations
* Environmental and internal constraints
* Reaction subsystems
* Biomass composition

#### Requirements

1. Gene identifiers MUST be consistent with **<code>genome.gb</code>** locus_tag or old_locus_tag. Make sure you download the same genome file that was used to reconstruct the M-model.
2. Must be named **<code>m_model.json</code>**.
3. Must be COBRApy-compliant, i.e. readable by cobrapy 0.25.0.

See an example of [m_model.json](./helper_files/tutorial/inputs/m_model.json)
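
Requirement 1 can be checked by comparing the gene identifiers in the model with the locus tags found in the genome. The sketch below assumes the usual COBRApy JSON layout, in which the top-level `genes` entry is a list of objects with an `id` field; the function names are hypothetical and the example identifiers are made up.

```python
import json


def model_gene_ids(model_json_text):
    """Collect gene identifiers from the text of a COBRApy-style JSON model."""
    model = json.loads(model_json_text)
    return {gene["id"] for gene in model.get("genes", [])}


def genes_missing_from_genome(model_json_text, genome_locus_tags):
    """Return model gene IDs that do not appear among the genome locus tags."""
    return sorted(model_gene_ids(model_json_text) - set(genome_locus_tags))


# Made-up model with two genes, one of which is absent from the genome.
example_model = json.dumps({
    "genes": [{"id": "tag_0001"}, {"id": "tag_9999"}],
    "reactions": [],
    "metabolites": [],
})
missing = genes_missing_from_genome(example_model, {"tag_0001", "tag_0002"})
```

A non-empty result usually means the genome file is not the one that was used to reconstruct the M-model.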

### Gene dictionary (**<code>genes.txt</code>**) [optional]

#### Description

**<code>genes.txt</code>** is a gene information table that can be downloaded from the **All genes of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>to Spreadsheet File</code>**>**<code>frame IDs</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>** in case the latter is missing genes.

**<code>genes.txt</code>** provides coralME with:

* Gene locus tags
* Gene names
* Gene annotations
* Gene positions
* Gene products (protein, tRNA, etc.)

#### Requirements

1. Contains the index **Gene Name** and columns **Accession-1**, **Left-End-Position**, **Right-End-Position**, and **Product**.
2. **Accession-1** MUST be consistent with the gene IDs in the GPRs of **<code>m_model.json</code>** and with the locus_tag (or old_locus_tag) in **<code>genome.gb</code>**.
3. **Gene Name** is consistent with:

    * Column **Genes of polypeptide, complex, or RNA** of **<code>proteins.txt</code>**
    * Column **Gene** of **<code>RNAs.txt</code>** 
    * Column **Genes of transcription unit** of **<code>TUs.txt</code>**
    * Gene identifiers in **<code>sequences.fasta</code>**
4. **Product** is consistent with:

    * Index of **<code>proteins.txt</code>**
    * Index of **<code>RNAs.txt</code>**
    
5. Must be tab-separated

See an example of [genes.txt](./helper_files/tutorial/inputs/genes.txt)

<div class="alert alert-info">
**Note:** **Requirement 2** regarding ID consistency should be directly met if the files are downloaded from the correct BioCyc database.
</div>

<div class="alert alert-info">
**Note:** **Requirements 3 and 4** regarding ID consistency, and **Requirement 5** regarding tab separation, should be met automatically if the files are downloaded from the same BioCyc database.
</div>
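
A quick way to confirm Requirements 1 and 5 is to read the header row as tab-separated text and look for the expected column names. This is a minimal sketch with the standard `csv` module; `missing_columns` is a hypothetical helper and the example row is made up.

```python
import csv
import io

# Column names listed in Requirement 1 for genes.txt.
REQUIRED_GENE_COLUMNS = [
    "Gene Name",
    "Accession-1",
    "Left-End-Position",
    "Right-End-Position",
    "Product",
]


def missing_columns(table_text, required):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader, [])
    return [name for name in required if name not in header]


# Made-up table used only to illustrate the call.
example_table = (
    "Gene Name\tAccession-1\tLeft-End-Position\tRight-End-Position\tProduct\n"
    "geneA\ttag_0001\t190\t255\tPRODUCT-A\n"
)
absent = missing_columns(example_table, REQUIRED_GENE_COLUMNS)
```

If the file was saved with commas instead of tabs, the whole header is read as a single field and every required column is reported as missing. The same helper can be reused for the other tables by passing their own column lists.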


    

### Proteins (**<code>proteins.txt</code>**) [optional]

#### Description
**<code>proteins.txt</code>** is a protein complex information table that can be downloaded from the **All proteins of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>to Spreadsheet File</code>**>**<code>frame IDs</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>**.

**<code>proteins.txt</code>** provides coralME with:
* Protein complex compositions

#### Requirements

1. Contains the index **(Proteins Complexes)** and columns **Common-Name**, **Genes of polypeptide, complex, or RNA**, and **Locations**.
2. **(Proteins Complexes)** is consistent with:
    * Column **Product** of **<code>genes.txt</code>**
3. **Genes of polypeptide, complex, or RNA** is consistent with:
    * Index **Gene Name** of **<code>genes.txt</code>**
4. Must be tab-separated

See an example of [proteins.txt](./helper_files/tutorial/inputs/proteins.txt)

<div class="alert alert-info">
**Note:** **Requirements 2 and 3** regarding ID consistency, and **Requirement 4** regarding tab separation, should be met automatically if the files are downloaded from the same BioCyc database.
</div>
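
The cross-file consistency in Requirement 3 can be sketched as a set comparison between the gene names used in **<code>proteins.txt</code>** and those defined in **<code>genes.txt</code>**. The code below assumes that a cell holding several values separates them with <code> // </code>; adjust the `separator` argument if your export differs. The function names and the example rows are hypothetical.

```python
import csv
import io


def column_values(table_text, column, separator=" // "):
    """Collect the distinct values of one column in a tab-separated table.

    Cells that hold several values are split on `separator` (an assumption
    about the export format, not something coralME guarantees).
    """
    values = set()
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        cell = row.get(column) or ""
        values.update(part.strip() for part in cell.split(separator) if part.strip())
    return values


def unknown_gene_names(genes_text, proteins_text):
    """Return gene names used in proteins.txt that genes.txt does not define."""
    known = column_values(genes_text, "Gene Name")
    used = column_values(proteins_text, "Genes of polypeptide, complex, or RNA")
    return sorted(used - known)


# Made-up tables used only to illustrate the call.
genes_example = (
    "Gene Name\tAccession-1\tLeft-End-Position\tRight-End-Position\tProduct\n"
    "geneA\ttag_0001\t190\t255\tPRODUCT-A\n"
)
proteins_example = (
    "(Proteins Complexes)\tCommon-Name\tGenes of polypeptide, complex, or RNA\tLocations\n"
    "COMPLEX-1\texample complex\tgeneA // geneX\t\n"
)
unknown = unknown_gene_names(genes_example, proteins_example)
```

The same comparison applies to the **Gene** column of **<code>RNAs.txt</code>** and the **Genes of transcription unit** column of **<code>TUs.txt</code>** by changing the column name.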



### RNAs (**<code>RNAs.txt</code>**) [optional]

#### Description
**<code>RNAs.txt</code>** is an RNA annotation table that can be downloaded from the **All RNAs of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>to Spreadsheet File</code>**>**<code>frame IDs</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>**.

**<code>RNAs.txt</code>** provides coralME with:

* Genes annotated as RNA products (e.g. tRNA, rRNA, etc.)
* RNA gene annotations (e.g. amino acids - tRNA associations)

#### Requirements

1. Contains the index **(All-tRNAs Misc-RNAs rRNAs)** and columns **Common-Name** and **Gene**.
2. **(All-tRNAs Misc-RNAs rRNAs)** is consistent with:

    * Column **Product** of **<code>genes.txt</code>**
3. **Gene** is consistent with:

    * Index **Gene Name** of **<code>genes.txt</code>**
4. Must be tab-separated
    
See an example of [RNAs.txt](./helper_files/tutorial/inputs/RNAs.txt)

<div class="alert alert-info">
**Note:** **Requirements 2 and 3** regarding ID consistency, and **Requirement 4** regarding tab separation, should be met automatically if the files are downloaded from the same BioCyc database.
</div>

### TUs (**<code>TUs.txt</code>**) [optional]

#### Description

**<code>TUs.txt</code>** is a transcription unit annotation table that can be downloaded from the **All TUs of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>to Spreadsheet File</code>**>**<code>frame IDs</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>**.

**<code>TUs.txt</code>** provides coralME with:

* Co-transcribed genes (operons).
* Direction of transcription.
* TU IDs.

#### Requirements

1. Contains the index **Transcription-Units** and columns **Genes of transcription unit** and **Direction**.
2. **Genes of transcription unit** is consistent with:

    * Index **Gene Name** of **<code>genes.txt</code>**
3. Must be tab-separated
    
See an example of [TUs.txt](./helper_files/tutorial/inputs/TUs.txt)
    
<div class="alert alert-info">
**Note:** **Requirement 2** regarding ID consistency, and **Requirement 3** regarding tab separation, should be met automatically if the files are downloaded from the same BioCyc database.
</div>

### Gene sequences (**<code>sequences.fasta</code>**) [optional]

#### Description
**<code>sequences.fasta</code>** is a nucleotide FASTA file that can be downloaded from the **All genes of <i>organism</i> SmartTable** of the **[BioCyc](https://biocyc.org/)** database. Click **<code>Export</code>**>**<code>FASTA</code>**>**<code>Find sequences</code>**. This file is optional and is meant to complement the information from **<code>genome.gb</code>** in case the latter is missing genes.

**<code>sequences.fasta</code>** provides coralME with:

* Gene sequences

#### Requirements

1. Gene identifiers are consistent with:

    * Index **Gene Name** of **<code>genes.txt</code>**    
2. Must be a valid nucleotide FASTA file
    
See an example of [sequences.fasta](./helper_files/tutorial/inputs/sequences.fasta)
    
<div class="alert alert-info">
**Note:** **Requirements 1 and 2** should be met automatically if the files are downloaded from the same BioCyc database.
</div>
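
To inspect the identifiers and sequences in a FASTA file, a small standard-library reader is often enough. The sketch below is an illustration, not coralME's own parser. It assumes the identifier is the first whitespace-delimited token of each header line, which may not match the exact header layout of a BioCyc export, so adapt the extraction to your file. Function names and example records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping identifier to sequence."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the identifier is the first token of the header.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


def non_nucleotide_records(records, alphabet="ACGTN"):
    """Return identifiers whose sequence has characters outside `alphabet`."""
    allowed = set(alphabet)
    return sorted(name for name, seq in records.items() if set(seq.upper()) - allowed)


# Made-up records used only to illustrate the calls.
example_fasta = ">geneA example description\nATGC\nGGTA\n>geneB\nATGXX\n"
records = read_fasta(example_fasta)
suspicious = non_nucleotide_records(records)
```

The keys of `records` can then be compared with the **Gene Name** index of **<code>genes.txt</code>**.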

### Configuration of paths to files (**<code>inputs.json</code>**)

#### Description
**<code>inputs.json</code>** is a JSON file containing paths to input files for coralME.

**<code>inputs.json</code>** provides coralME with:

* Paths to input files

#### Requirements

1. Must be JSON-compliant
2. Must contain paths to required files (M-model and Genome).
3. All files it refers to must exist.

See an example of [input.json](./helper_files/tutorial/input.json)
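
Requirements 1 and 3 can be checked together: parse the file as JSON, then confirm that each path it names exists. The sketch below makes two assumptions that may not hold for your configuration: the file is a flat JSON object, and every non-empty string value is a path. The key names in the example are hypothetical and are not coralME's actual field names; refer to the example file for those.

```python
import json
import os


def missing_input_files(config_text, base_dir="."):
    """Return {key: path} for every string value that is not an existing path.

    json.loads raises json.JSONDecodeError if the text is not valid JSON.
    Relative paths are resolved against `base_dir`.
    """
    config = json.loads(config_text)
    missing = {}
    for key, value in config.items():
        if isinstance(value, str) and value:
            path = value if os.path.isabs(value) else os.path.join(base_dir, value)
            if not os.path.exists(path):
                missing[key] = value
    return missing


# Hypothetical keys; "." always exists, the second path should not.
example_config = json.dumps({
    "existing-path": ".",
    "absent-path": "definitely_missing_file_xyz.json",
})
not_found = missing_input_files(example_config)
```

An empty result means every path named in the file was found.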

### Configuration of parameters (**<code>organism.json</code>**)

#### Description

**<code>organism.json</code>** is a JSON file containing the ME-modeling parameters for coralME.

**<code>organism.json</code>** provides coralME with:

* ME-modeling parameters

#### Requirements

1. Must be JSON-compliant
2. Must contain the standard fields.

See an example of [organism.json](./helper_files/tutorial/organism.json)