## Goal: Annotate the genome of Aspergillus nidulans in JBrowse


![Image](picsproject/JGI+portal+image+Aspergillus+nidulans+by+Yainitza+Hernandez+Rodriguez+at+UGA+adjusted.jpg)

## Run Maker

### Modify CTL files

In [None]:
/data/project/Asp_nidulans/runmaker$ vim maker_opts.ctl

![Image](picsproject/Screen Shot 2017-10-30 at 3.13.59 PM.png)

In [None]:
protein_gff=  #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/data/project/applications/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm= #GeneMark HMM file
augustus_species=aspergillus_fumigatus #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
run_evm=1 #run EvidenceModeler, 1 = yes, 0 = no
est2genome=0 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=0 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
snoscan_meth= #-O-methylation site file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
allow_overlap= #allowed gene overlap fraction (value from 0 to 1, blank for default)

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
min_intron=20 #minimum intron length (used for alignment polishing)
single_exon=1 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=1 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files
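With this many options it is easy to lose track of which ones were actually set. The following is a minimal sketch, not part of MAKER, that reads control-file text in the `key=value #comment` layout shown above and lists the options that have a value. The function name `parse_ctl` and the shortened `sample` text are illustrative assumptions.

```python
def parse_ctl(text):
    """Parse MAKER-style control-file text into {option: value}.

    Assumes every option line has the form `key=value #comment`
    and that values never contain a '#' character.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop the trailing comment
        if not line or '=' not in line:
            continue  # blank line or section header
        key, value = line.split('=', 1)
        options[key.strip()] = value.strip()
    return options


# Shortened excerpt of the options shown above (illustrative only)
sample = """\
#-----Gene Prediction
snaphmm= #SNAP HMM file
augustus_species=aspergillus_fumigatus #Augustus gene prediction species model
run_evm=1 #run EvidenceModeler, 1 = yes, 0 = no
cpus=1 #max number of cpus to use in BLAST and RepeatMasker
"""

opts = parse_ctl(sample)
for key, value in opts.items():
    if value:  # report only options that were given a value
        print(f"{key} = {value}")
```

To check a real file, the text could be read with `open('maker_opts.ctl').read()` and passed to the same function.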

In [None]:
/data/project/Asp_nidulans/runmaker$ vim maker_exe.ctl

![Image](picsproject/Screen Shot 2017-10-30 at 3.15.32 PM.png)

In [None]:
#-----Location of Executables Used by MAKER/EVALUATOR
makeblastdb=/data/project/applications/maker/bin/../exe/blast/bin/makeblastdb #location of NCBI+ makeblastdb executable
blastn=/data/project/applications/maker/bin/../exe/blast/bin/blastn #location of NCBI+ blastn executable
blastx=/data/project/applications/maker/bin/../exe/blast/bin/blastx #location of NCBI+ blastx executable
tblastx=/data/project/applications/maker/bin/../exe/blast/bin/tblastx #location of NCBI+ tblastx executable
formatdb= #location of NCBI formatdb executable
blastall= #location of NCBI blastall executable
xdformat= #location of WUBLAST xdformat executable
blasta= #location of WUBLAST blasta executable
prerapsearch= #location of prerapsearch executable
rapsearch= #location of rapsearch executable
RepeatMasker=/data/project/applications/maker/bin/../exe/RepeatMasker/RepeatMasker #location of RepeatMasker executable
exonerate=/data/project/applications/maker/bin/../exe/exonerate/bin/exonerate #location of exonerate executable

#-----Ab-initio Gene Prediction Algorithms
snap=/usr/bin/snap #location of snap executable
gmhmme3= #location of eukaryotic genemark executable
gmhmmp= #location of prokaryotic genemark executable
augustus=/data/project/applications/maker/exe/augustus/bin/augustus #location of augustus executable
fgenesh= #location of fgenesh executable
evm=/data/project/applications/EVidenceModeler-1.1.1/evidence_modeler.pl #location of EvidenceModeler executable
tRNAscan-SE= #location of trnascan executable
snoscan= #location of snoscan executable

#-----Other Algorithms
probuild= #location of probuild executable (required for genemark)

### Command to run Maker

In [None]:
nohup /data/project/applications/maker/bin/maker &>> makererr.log &

This command was run eight times in total, once per chromosome, with each run on its own CPU.
MAKER produced GFF3 and FASTA files as output.
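Running one chromosome per job requires a separate input sequence for each chromosome. The notebook does not record how those inputs were prepared, so the following is only a sketch of one possible approach: splitting a multi-record FASTA stream into one record per chromosome. The function name `split_fasta` and the demo sequences are hypothetical.

```python
def split_fasta(lines):
    """Yield (record_id, record_lines) for each record in a FASTA stream."""
    name, chunk = None, []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            if name is not None:
                yield name, chunk
            # Record ID = first word after '>' (fallback for empty headers)
            name = (line[1:].split() or ['unnamed'])[0]
            chunk = [line]
        elif name is not None and line:
            chunk.append(line)
    if name is not None:
        yield name, chunk


# Made-up two-record input standing in for a genome FASTA file
demo = ['>ChrI demo\n', 'ACGT\n', 'GGCC\n', '>ChrII demo\n', 'TTAA\n']

records = dict(split_fasta(demo))
for name, chunk in records.items():
    # In a real run, each chunk would be written to its own file,
    # for example f'{name}.fasta', and used as the input for one job.
    print(name, len(''.join(chunk[1:])), 'bp')
```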

In [None]:
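#!/usr/bin/env python3
"""Split a MAKER GFF3 file into one file per source (column 2)."""

import sys

with open(sys.argv[1], 'r') as fh:
    for line in fh:
        # Skip comments and directives such as ##gff-version
        if line.startswith('#'):
            continue
        line = line.rstrip()
        col = line.split('\t')
        # Skip lines without a source column (e.g. sequences after ##FASTA)
        if len(col) < 2:
            continue
        # Compare by value: "is not" tests identity, not string equality
        if col[1] != '.':
            # Append, so every feature from the same source lands in one file
            with open(col[1] + '.gff', 'a') as outfile:
                outfile.write(line + '\n')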
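The script above writes one file per value of the GFF source column. Before splitting, it can help to see which sources and feature types a file contains. This is a small sketch under that assumption; `count_by_source` and the demo lines are made up for illustration and are not taken from the actual MAKER output.

```python
from collections import Counter


def count_by_source(lines):
    """Count GFF feature lines per (source, type) pair."""
    counts = Counter()
    for line in lines:
        if line.startswith('#'):
            continue  # comment or directive
        col = line.rstrip('\n').split('\t')
        # A GFF3 feature line has exactly nine tab-separated columns
        if len(col) == 9 and col[1] != '.':
            counts[(col[1], col[2])] += 1
    return counts


# Hypothetical feature lines, for illustration only
demo = [
    '##gff-version 3\n',
    'ChrI\tmaker\tgene\t100\t900\t.\t+\t.\tID=g1\n',
    'ChrI\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1\n',
    'ChrI\taugustus_masked\tmatch\t120\t880\t.\t+\t.\tID=a1\n',
    'ChrI\t.\tcontig\t1\t5000\t.\t.\t.\tID=ChrI\n',
]

counts = count_by_source(demo)
for (source, ftype), n in sorted(counts.items()):
    print(source, ftype, n, sep='\t')
```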

![Image](picsproject/Screen Shot 2017-10-30 at 3.02.25 PM.png)

![Image](picsproject/Screen Shot 2017-10-30 at 3.05.16 PM.png)

## Upload GFF3 files to JBrowse

In [None]:
$ bin/flatfile-to-json.pl --gff /data/project/Asp_nidulans/Aspnid1.filtered_proteins.AspGD_genes.gff3 --trackType CanvasFeatures --trackLabel jgi_models
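If each per-source GFF file should become its own track, the same command has to be repeated with a different `--gff` path and `--trackLabel`. The sketch below only builds and prints those command lines, using the three flags from the command above. The helper name, the choice of the file stem as the label, and the two file names are assumptions. `shlex.join` requires Python 3.8 or later.

```python
import shlex
from pathlib import Path


def jbrowse_load_command(gff_path, track_type='CanvasFeatures'):
    """Build a flatfile-to-json.pl argument list for one GFF file.

    The track label is derived from the file name (an assumption made
    here for convenience, not a JBrowse requirement).
    """
    label = Path(gff_path).stem
    return [
        'bin/flatfile-to-json.pl',
        '--gff', str(gff_path),
        '--trackType', track_type,
        '--trackLabel', label,
    ]


# Hypothetical per-source files such as those written by the split script
for gff in ['maker.gff', 'augustus_masked.gff']:
    print(shlex.join(jbrowse_load_command(gff)))
```

Printing the commands first makes it possible to review them before running anything from the JBrowse directory.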

## Link to JBrowse

http://jbrownies.programmingforbiology.org/jbrowse/?loc=ChrIII_A_nidulans_FGSC_A4%3A1979485..1979993&tracks=DNA%2Calignments&highlight=
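
The link encodes the displayed region and the active tracks in its query string. As a small illustration, the standard library can decode it; the variable names are arbitrary.

```python
from urllib.parse import urlsplit, parse_qs

url = ('http://jbrownies.programmingforbiology.org/jbrowse/'
       '?loc=ChrIII_A_nidulans_FGSC_A4%3A1979485..1979993'
       '&tracks=DNA%2Calignments&highlight=')

# parse_qs percent-decodes values and drops empty ones such as highlight=
query = parse_qs(urlsplit(url).query)

# loc has the form <sequence>:<start>..<end>
seq_id, span = query['loc'][0].rsplit(':', 1)
start, end = (int(x) for x in span.split('..'))
tracks = query['tracks'][0].split(',')

print('sequence:', seq_id)
print('region:  ', start, 'to', end, f'({end - start + 1} bp)')
print('tracks:  ', ', '.join(tracks))
```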