# 1 Import motifs from GimmeMotifs databases

GimmeMotifs provides many motif databases built from public motif collections, including CIS-BP, ENCODE, HOMER, and JASPAR. See the GimmeMotifs documentation for details:
https://gimmemotifs.readthedocs.io/en/master/overview.html

## 1.1 gimme.vertebrate.v5.0.

By default, GimmeMotifs uses a non-redundant, clustered database of known vertebrate motifs.
These motifs come from CIS-BP (http://cisbp.ccbr.utoronto.ca/) and other sources.
This database can be loaded with a single command, as shown below.

If your data come from mouse or human, this database is a good default choice.

```python
# Load the default motif database shipped with GimmeMotifs
from gimmemotifs.motif import default_motifs
motifs = default_motifs()

# Check the first 10 motifs
motifs[:10]
```

```
[GM.5.0.Sox.0001_AACAAT,
 GM.5.0.Homeodomain.0001_AGCTGTCAnnA,
 GM.5.0.Mixed.0001_snnGGsssGGs,
 GM.5.0.Nuclear_receptor.0001_TAwsTrGGTCAsTrGGTCA,
 GM.5.0.Mixed.0002_GCTAATTA,
 GM.5.0.Nuclear_receptor.0002_wnyrCTTCCGGGkC,
 GM.5.0.bHLH.0001_ACGTG,
 GM.5.0.Myb_SANT.0001_rrCCGTTAAACnGyy,
 GM.5.0.C2H2_ZF.0001_GCGkGGGCGG,
 GM.5.0.GATA.0001_TTATCTsnnnnnnnCA]
```
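The printed labels appear to consist of the motif ID, an underscore, and a consensus sequence in which lowercase letters are IUPAC ambiguity codes (for example `s` = C/G, `w` = A/T, `y` = C/T, `n` = any base). In the default database the ID also seems to encode the DNA-binding domain family (`Sox`, `Homeodomain`, `bHLH`, and so on). The sketch below works on the label strings printed above using plain Python only; `parse_gm_label` and `consensus_matches` are hypothetical helpers, not part of GimmeMotifs, and the ID layout is inferred from these ten examples.

```python
from collections import Counter

# IUPAC nucleotide codes; matching below is case-insensitive
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def parse_gm_label(label):
    """Split e.g. 'GM.5.0.Sox.0001_AACAAT' into ID, family and consensus.

    Hypothetical helper: assumes the consensus follows the last underscore
    and the ID looks like GM.<major>.<minor>.<family>.<number>.
    """
    motif_id, consensus = label.rsplit("_", 1)
    family = motif_id.split(".")[3]  # family names may contain '_' but not '.'
    return {"id": motif_id, "family": family, "consensus": consensus}

def consensus_matches(consensus, seq):
    """Return True if every base of `seq` is allowed by the IUPAC consensus."""
    if len(consensus) != len(seq):
        return False
    return all(base.upper() in IUPAC[code.upper()]
               for code, base in zip(consensus, seq))

# Labels copied from the output above
labels = [
    "GM.5.0.Sox.0001_AACAAT",
    "GM.5.0.Homeodomain.0001_AGCTGTCAnnA",
    "GM.5.0.Mixed.0001_snnGGsssGGs",
    "GM.5.0.Nuclear_receptor.0001_TAwsTrGGTCAsTrGGTCA",
    "GM.5.0.Mixed.0002_GCTAATTA",
    "GM.5.0.Nuclear_receptor.0002_wnyrCTTCCGGGkC",
    "GM.5.0.bHLH.0001_ACGTG",
    "GM.5.0.Myb_SANT.0001_rrCCGTTAAACnGyy",
    "GM.5.0.C2H2_ZF.0001_GCGkGGGCGG",
    "GM.5.0.GATA.0001_TTATCTsnnnnnnnCA",
]

parsed = [parse_gm_label(x) for x in labels]
family_counts = Counter(p["family"] for p in parsed)
print(family_counts)

mixed = parse_gm_label("GM.5.0.Mixed.0001_snnGGsssGGs")
print(consensus_matches(mixed["consensus"], "CATGGCGCGGG"))  # True
print(consensus_matches(mixed["consensus"], "AATGGCGCGGG"))  # False: 's' excludes A
```

Splitting on the last underscore matters because family names such as `C2H2_ZF` and JASPAR IDs such as `MA0006.1_Ahr::Arnt` contain underscores themselves.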

## 1.2 Other motif databases provided with the GimmeMotifs package

GimmeMotifs also ships many other motif databases. You can load them as follows.

### 1.2.1. Get the list of motif databases

```python
# Get the path of the folder that stores the motif databases
import os
from gimmemotifs.config import MotifConfig

config = MotifConfig()
motif_dir = config.get_motif_dir()

# List the motif database files (*.pfm) in that folder
motifs_data_name = sorted(i for i in os.listdir(motif_dir) if i.endswith(".pfm"))
motifs_data_name
```

```
['CIS-BP.pfm',
 'ENCODE.pfm',
 'HOCOMOCOv10_HUMAN.pfm',
 'HOCOMOCOv10_MOUSE.pfm',
 'HOCOMOCOv11_HUMAN.pfm',
 'HOCOMOCOv11_MOUSE.pfm',
 'HOMER.pfm',
 'IMAGE.pfm',
 'JASPAR2018.pfm',
 'JASPAR2018_fungi.pfm',
 'JASPAR2018_insects.pfm',
 'JASPAR2018_nematodes.pfm',
 'JASPAR2018_plants.pfm',
 'JASPAR2018_urochordates.pfm',
 'JASPAR2018_vertebrates.pfm',
 'JASPAR2020.pfm',
 'JASPAR2020_fungi.pfm',
 'JASPAR2020_insects.pfm',
 'JASPAR2020_nematodes.pfm',
 'JASPAR2020_plants.pfm',
 'JASPAR2020_urochordates.pfm',
 'JASPAR2020_vertebrates.pfm',
 'RSAT_insects.pfm',
 'RSAT_plants.pfm',
 'RSAT_vertebrates.pfm',
 'SwissRegulon.pfm',
 'factorbook.pfm',
 'gimme.vertebrate.v5.0.pfm']
```
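The file names encode the source database and, for several collections, the release and the taxon or species (for example `HOCOMOCOv11_MOUSE.pfm` or `JASPAR2020_vertebrates.pfm`). A small filter can help pick the candidates that match your organism. The helper `select_databases` below is hypothetical and only does case-insensitive substring matching on file names; it runs on a hard-coded subset of the names listed above so that it does not need GimmeMotifs installed. In the notebook you would pass `motifs_data_name` instead of `available`.

```python
def select_databases(file_names, *keywords):
    """Return the .pfm file names that contain every keyword (case-insensitive).

    Hypothetical helper for illustration; it only inspects the file names.
    """
    keys = [k.lower() for k in keywords]
    return [
        name for name in file_names
        if name.endswith(".pfm") and all(k in name.lower() for k in keys)
    ]

# Subset of the file names listed above
available = [
    "CIS-BP.pfm",
    "HOCOMOCOv11_HUMAN.pfm",
    "HOCOMOCOv11_MOUSE.pfm",
    "JASPAR2020_plants.pfm",
    "JASPAR2020_vertebrates.pfm",
    "RSAT_vertebrates.pfm",
    "gimme.vertebrate.v5.0.pfm",
]

print(select_databases(available, "mouse"))
# ['HOCOMOCOv11_MOUSE.pfm']
print(select_databases(available, "jaspar2020", "vertebrates"))
# ['JASPAR2020_vertebrates.pfm']
print(select_databases(available, "vertebrate"))
# ['JASPAR2020_vertebrates.pfm', 'RSAT_vertebrates.pfm', 'gimme.vertebrate.v5.0.pfm']
```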

### 1.2.2. Load motifs

```python
# Motif files can be loaded with read_motifs
from gimmemotifs.motif import read_motifs

path = os.path.join(motif_dir, "JASPAR2020_vertebrates.pfm")
motifs = read_motifs(path)

# Check the first 10 motifs
motifs[:10]
```

```
[MA0006.1_Ahr::Arnt_yGCGTG,
 MA0854.1_Alx1_nGnnyTAATTArTnnnn,
 MA0634.1_ALX3_nnyAATTAnn,
 MA0853.1_Alx4_CGnnyTAATTArnnnnn,
 MA0007.3_Ar_rGGwACAynGTGTwCyn,
 MA1463.1_ARGFX_nCTAATTArn,
 MA0151.1_Arid3a_ATyAAA,
 MA0601.1_Arid3b_nTATTAATwnn,
 MA0602.1_Arid5a_nyAATATTGnnAnn,
 MA0004.1_Arnt_CACGTG]
```
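To show roughly what `read_motifs` consumes, the sketch below parses a tiny PFM text in what we assume is the layout of GimmeMotifs' `.pfm` files: a `>` line with the motif ID, followed by one line per position with four whitespace-separated values for A, C, G and T. It is an illustration only, not a replacement for `read_motifs`. The two motifs and their values are made up, and `read_pfm` and `simple_consensus` are hypothetical helpers. The most-frequent-base consensus is a deliberate simplification: the GimmeMotifs consensus strings shown above also use IUPAC ambiguity codes, which this sketch does not reproduce.

```python
from io import StringIO

BASES = "ACGT"

def read_pfm(handle):
    """Parse '>'-headed PFM records into {motif_id: [[A, C, G, T], ...]}."""
    motifs = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(">"):
            current = line[1:].strip()
            motifs[current] = []
        else:
            if current is None:
                raise ValueError("matrix row found before any '>' header")
            row = [float(x) for x in line.split()]
            if len(row) != 4:
                raise ValueError(f"expected 4 columns (A C G T), got {len(row)}")
            motifs[current].append(row)
    return motifs

def simple_consensus(matrix):
    """Most frequent base per position (ties go to the first of A, C, G, T)."""
    return "".join(BASES[row.index(max(row))] for row in matrix)

# Made-up records, for illustration only
example = StringIO(
    ">toy_motif_1\n"
    "0.1\t0.7\t0.1\t0.1\n"
    "0.0\t0.0\t1.0\t0.0\n"
    "0.9\t0.0\t0.05\t0.05\n"
    ">toy_motif_2\n"
    "0.25\t0.25\t0.25\t0.25\n"
    "0.0\t0.0\t0.0\t1.0\n"
)

pfms = read_pfm(example)
for motif_id, matrix in pfms.items():
    print(motif_id, len(matrix), simple_consensus(matrix))
# toy_motif_1 3 CGA
# toy_motif_2 2 AT
```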

You can use these motifs for motif analysis in CellOracle. Each motif also records its associated transcription factors in the `factors` attribute.

```python
# Take the first motif and show its associated transcription factors
m = motifs[0]
m.factors
```

```
{'direct': ['Arnt', 'Ahr'], 'indirect\nor predicted': []}
```
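`m.factors` maps an evidence category (`'direct'` or `'indirect\nor predicted'`) to a list of transcription factor names. For downstream analysis it can be convenient to invert this into a table from each factor to the motifs associated with it. The sketch below does that with plain dictionaries. The first entry reuses the `factors` value printed above under the ID `MA0006.1_Ahr::Arnt` (assumed to be the ID part of the label `MA0006.1_Ahr::Arnt_yGCGTG`); the second entry, including `TF_X`, is hypothetical. In the notebook you could build the input with `{mo.id: mo.factors for mo in motifs}`, assuming each motif exposes `id` and `factors` attributes. The helper `factors_to_motifs` is hypothetical, not a GimmeMotifs or CellOracle function.

```python
from collections import defaultdict

def factors_to_motifs(motif_factors, categories=None):
    """Invert {motif_id: {category: [TF, ...]}} into {TF: sorted motif IDs}.

    If `categories` is given, only those evidence categories are used.
    """
    tf_to_motifs = defaultdict(set)
    for motif_id, factors in motif_factors.items():
        for category, tfs in factors.items():
            if categories is not None and category not in categories:
                continue
            for tf in tfs:
                tf_to_motifs[tf].add(motif_id)
    return {tf: sorted(ids) for tf, ids in tf_to_motifs.items()}

motif_factors = {
    # Value copied from the m.factors output above
    "MA0006.1_Ahr::Arnt": {"direct": ["Arnt", "Ahr"], "indirect\nor predicted": []},
    # Hypothetical entry for illustration only
    "hypothetical_motif": {"direct": ["Arnt"], "indirect\nor predicted": ["TF_X"]},
}

print(factors_to_motifs(motif_factors))
print(factors_to_motifs(motif_factors, categories={"direct"}))
```

Restricting `categories` to `{"direct"}` keeps only factors with direct evidence, which is a stricter choice than using predicted associations as well.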