# Motif: Search for the presence of mutational motifs in samples
Use `mutagene motif` to search mutational data for the presence of mutational motifs. A motif is a characteristic pattern of DNA mutation together with its local DNA context, and it is often associated with a specific carcinogen or biological process. MutaGene writes a motif as a string in which the characters inside brackets give the single-base substitution and the characters outside the brackets give the unmutated flanking context. For example, `A[C>A]G` represents the DNA sequence ACG mutated into AAG. On the command line, enclose the motif in quotes (for example, `-m 'A[C>A]G'`); otherwise the shell would interpret characters such as `>` as output redirection.
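
To make the notation concrete, here is a minimal Python sketch that splits a motif string into the reference and mutated sequences it describes. `parse_motif` is a hypothetical helper written for illustration, not part of MutaGene's API. It assumes the flanking context may use IUPAC codes (as in the `R[C>T]GY` example from the help text) while the substitution itself uses plain A, C, G and T.

```python
from __future__ import annotations

import re

# Hypothetical pattern: optional IUPAC context, a bracketed substitution of
# plain bases, then optional IUPAC context, e.g. "A[C>A]G" or "R[C>T]GY".
MOTIF_RE = re.compile(r"^([ACGTRYSWKMBDHVN]*)\[([ACGT])>([ACGT])\]([ACGTRYSWKMBDHVN]*)$")


def parse_motif(motif: str) -> tuple[str, str]:
    """Return (reference sequence, mutated sequence) for a motif string."""
    m = MOTIF_RE.match(motif.upper())
    if m is None:
        raise ValueError(f"not a valid motif: {motif!r}")
    left, ref, alt, right = m.groups()
    return left + ref + right, left + alt + right


print(parse_motif("A[C>A]G"))  # ('ACG', 'AAG')
```

Likewise, `parse_motif("R[C>T]GY")` returns `('RCGY', 'RTGY')`: the context letters are kept as written and only the bracketed base changes.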

## 1. Collect the necessary files
If you do not have the sample or reference files, [follow the instructions in this notebook](fetch.ipynb) to obtain them.

## 2. Get help on the required and optional arguments of the motif command

```
!mutagene motif -h
```

```
usage: mutagene motif [-h] [--infile INFILE] [--genome GENOME]
                      [--input-format {MAF,VCF}] [--motif MOTIF]
                      [--outfile [OUTFILE]] [--window-size WINDOW_SIZE]
                      [--strand {+,-,=,+-=}] [--threshold THRESHOLD]

optional arguments:
  -h, --help            show this help message and exit

Required arguments:
  --infile INFILE, -i INFILE
                        Input file in MAF or VCF format with one or multiple
                        samples
  --genome GENOME, -g GENOME
                        Location of genome assembly file in 2bit format

Optional arguments:
  --input-format {MAF,VCF}, -f {MAF,VCF}
                        Input format: MAF, VCF
  --motif MOTIF, -m MOTIF
                        Motif to search for, use the 'R[C>T]GY' syntax for the
                        motif. Use quotes
  --outfile [OUTFILE], -o [OUTFILE]
                        Name of output file, will be generated in TSV format

Advanced arguments:
  --
```
## 3. Search for all pre-identified motifs in `skcm_yale/data_mutations_mskcc.txt` using genome hg19 on any strand
Note: if the command prints more than three or four warnings, consider rerunning it with a different genome assembly.

```
!mutagene motif -i ./skcm_yale/data_mutations_mskcc.txt -g hg19 -s "="
```

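
This step searches on any strand. As background, and as an assumption about what "any strand" covers rather than a description of MutaGene's implementation, a motif read from the opposite DNA strand is its reverse complement: every base is complemented and the order is reversed, so the left and right contexts swap sides. The hypothetical `reverse_complement_motif` helper below illustrates this, using the standard IUPAC complements for the context letters.

```python
import re

# Complement of each IUPAC nucleotide code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")
MOTIF_RE = re.compile(r"^([ACGTRYSWKMBDHVN]*)\[([ACGT])>([ACGT])\]([ACGTRYSWKMBDHVN]*)$")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_complement_motif(motif: str) -> str:
    """Rewrite a motif such as 'C[A>T]' as it reads on the opposite strand."""
    m = MOTIF_RE.match(motif.upper())
    if m is None:
        raise ValueError(f"not a valid motif: {motif!r}")
    left, ref, alt, right = m.groups()
    # The right-hand context becomes the left-hand context on the other strand.
    return f"{revcomp(right)}[{revcomp(ref)}>{revcomp(alt)}]{revcomp(left)}"


print(reverse_complement_motif("C[A>T]"))   # [T>A]G
print(reverse_complement_motif("A[C>A]G"))  # C[G>T]T
```

Applying the function twice returns the original motif, which is a quick sanity check that no context base was lost.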

## 4. Search for the presence of the `C[A>T]` motif in `skcm_yale/data_mutations_mskcc.txt` using genome hg19
Note: if the command prints more than three or four warnings, consider rerunning it with a different genome assembly.

```
!mutagene motif -i ./skcm_yale/data_mutations_mskcc.txt -g hg19 -m 'C[A>T]'
```

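
Conceptually, checking a single mutation against `C[A>T]` means checking that it is an A>T substitution immediately preceded by a C. The sketch below shows that check on one given strand only. The function `matches_motif` and the sample mutations are hypothetical, and the code assumes the flanking bases have already been read from the reference genome (presumably what the `--genome` file is used for). MutaGene's actual search may differ, for example in how it handles the opposite strand.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Hypothetical pattern for motifs such as "C[A>T]" or "R[C>T]GY".
MOTIF_RE = re.compile(r"^([ACGTRYSWKMBDHVN]*)\[([ACGT])>([ACGT])\]([ACGTRYSWKMBDHVN]*)$")


def matches_motif(motif, left_flank, ref, alt, right_flank):
    """Return True if the substitution ref>alt, with reference flanks given on
    the same strand, matches the motif on that strand."""
    m = MOTIF_RE.match(motif.upper())
    if m is None:
        raise ValueError(f"not a valid motif: {motif!r}")
    m_left, m_ref, m_alt, m_right = m.groups()
    if (ref, alt) != (m_ref, m_alt):
        return False
    if len(left_flank) < len(m_left) or len(right_flank) < len(m_right):
        return False  # not enough context to decide
    # Take the bases directly adjacent to the mutated position.
    near_left = left_flank[len(left_flank) - len(m_left):]
    near_right = right_flank[:len(m_right)]
    return all(base in IUPAC[code]
               for base, code in zip(near_left + near_right, m_left + m_right))


# Hypothetical mutations: (left flank, ref, alt, right flank).
mutations = [
    ("GC", "A", "T", "GA"),  # C before the A>T change: matches
    ("GT", "A", "T", "GA"),  # T before the change: no match
    ("GC", "A", "G", "GA"),  # A>G, not A>T: no match
]
count = sum(matches_motif("C[A>T]", *mut) for mut in mutations)
print(count)  # 1
```

The same function handles IUPAC context: `R[C>T]GY` accepts an A or G on the left and a C or T in the second position on the right.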