# 1. Prepare input data

To use another motif data source, you need to make a list of gimmemotifs `Motif` objects.
The easiest way to make such objects is to use the `read_motifs` function provided by the gimmemotifs package.

This function loads motif data from text files.
You need to prepare two files: XXX.motif2factors.txt and XXX.pfm.



## 3.1 XXX.motif2factors.txt
The text file XXX.motif2factors.txt contains the transcription factor (TF) annotation for each motif.
It should be a tab-separated (TSV) file like the example below.

- The first column is the motif name. It must match the motif name in the pfm file.
- The second column is the gene symbol.
- The third column is the data source name. Its value is not important, but it must not contain spaces.
- The fourth column is additional information for this factor. Enter "Y" if the factor information was confirmed by experimental evidence. Otherwise, enter "N".
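The column rules above can be checked before loading the file. The following is a minimal sketch, not part of gimmemotifs or CellOracle: `check_motif2factors` is a hypothetical helper, and the expected header names are an assumption taken from the example file shown in this section.

```python
import csv
import io

# Header names as they appear in the example file (assumption: your file uses the same).
EXPECTED_HEADER = ["Motif", "Factor", "Evidence", "Curated"]


def check_motif2factors(handle):
    """Return a list of problems found in a motif2factors-style TSV (hypothetical helper)."""
    problems = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        problems.append(f"line 1: unexpected header {header}")
    for line_no, row in enumerate(reader, start=2):
        if len(row) != 4:
            problems.append(f"line {line_no}: expected 4 columns, found {len(row)}")
            continue
        _motif, _factor, evidence, curated = row
        # The data source name must not contain spaces.
        if " " in evidence:
            problems.append(f"line {line_no}: data source contains a space")
        # The fourth column is either "Y" or "N".
        if curated not in ("Y", "N"):
            problems.append(f"line {line_no}: fourth column must be Y or N")
    return problems


# Small in-memory example; the second data row is deliberately invalid.
example = (
    "Motif\tFactor\tEvidence\tCurated\n"
    "M00008_2.00\thmga1a\tPBM\tN\n"
    "M00045_2.00\tfoxj2\tChIP seq\tmaybe\n"
)
problems = check_motif2factors(io.StringIO(example))
for problem in problems:
    print(problem)
```

To check a real file, pass an open file handle instead of the `io.StringIO` object.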

In [1]:
# Download example XXX.motif2factors.txt data
!wget https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/CisBP_ver2_Danio_rerio.motif2factors.txt

# If you are using macOS, please try the following command.
#!curl -O https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/CisBP_ver2_Danio_rerio.motif2factors.txt

--2021-07-16 14:22:51--  https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/CisBP_ver2_Danio_rerio.motif2factors.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2813285 (2.7M) [text/plain]
Saving to: ‘CisBP_ver2_Danio_rerio.motif2factors.txt’


2021-07-16 14:22:52 (72.3 MB/s) - ‘CisBP_ver2_Danio_rerio.motif2factors.txt’ saved [2813285/2813285]



In [2]:
path_motif2factors = "CisBP_ver2_Danio_rerio.motif2factors.txt"

# Print the contents.
with open(path_motif2factors, "r") as f:
    for i, j in enumerate(f):
        print(j)
        if i>5:
            break

Motif	Factor	Evidence	Curated

M00008_2.00	hmga1a	PBM	N

M00008_2.00	hmga2	PBM	N

M00045_2.00	foxj2	PBM	N

M00056_2.00	en2a	PBM	N

M00056_2.00	gbx2	PBM	N

M00056_2.00	uncx4.1	PBM	N



## 3.2 XXX.pfm
The second file, XXX.pfm, should contain the motif PWM information.
The file should look like the example below.

The motif names in this pfm file must exactly match the motif names in the XXX.motif2factors.txt file.
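One way to confirm that the names agree is to compare the two sets of motif names directly. This is a minimal sketch using only the standard library; the helper names `pfm_motif_names` and `motif2factors_names` are hypothetical, and the sketch assumes that motif names in the pfm file sit on lines starting with `>` and that the motif2factors file has one header line.

```python
import io


def pfm_motif_names(handle):
    """Collect motif names from header lines that start with '>'."""
    return {line[1:].strip() for line in handle if line.startswith(">")}


def motif2factors_names(handle):
    """Collect first-column motif names, skipping the header line."""
    lines = iter(handle)
    next(lines, None)  # skip the header
    return {line.split("\t")[0].strip() for line in lines if line.strip()}


# Small in-memory stand-ins for the two files (values shortened for illustration).
pfm_text = (
    ">M00008_2.00\n0.28\t0.25\t0.25\t0.22\n"
    ">M00045_2.00\n0.22\t0.01\t0.76\t0.01\n"
)
m2f_text = (
    "Motif\tFactor\tEvidence\tCurated\n"
    "M00008_2.00\thmga1a\tPBM\tN\n"
    "M00056_2.00\ten2a\tPBM\tN\n"
)

pfm_names = pfm_motif_names(io.StringIO(pfm_text))
m2f_names = motif2factors_names(io.StringIO(m2f_text))

# Motifs with a matrix but no factor annotation, and the reverse.
missing_annotation = sorted(pfm_names - m2f_names)
missing_matrix = sorted(m2f_names - pfm_names)
print("In pfm only:", missing_annotation)
print("In motif2factors only:", missing_matrix)
```

With real files, replace the `io.StringIO` objects with `open(path_pfm)` and `open(path_motif2factors)`.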

In [3]:
# Download example XXX.pfm data
!wget https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/CisBP_ver2_Danio_rerio.pfm
    
# If you are using macOS, please try the following command.
#!curl -O https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/CisBP_ver2_Danio_rerio.pfm

--2021-07-16 14:23:50--  https://raw.githubusercontent.com/morris-lab/CellOracle/master/docs/demo_data/CisBP_ver2_Danio_rerio.pfm
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3796293 (3.6M) [text/plain]
Saving to: ‘CisBP_ver2_Danio_rerio.pfm’


2021-07-16 14:23:50 (70.8 MB/s) - ‘CisBP_ver2_Danio_rerio.pfm’ saved [3796293/3796293]



In [5]:
path_pfm = "CisBP_ver2_Danio_rerio.pfm"

with open(path_pfm, "r") as f:
    for i, j in enumerate(f):
        print(j)
        if i>10:
            break

>M00008_2.00

0.28295895050258896	0.248464961579856	0.24599269390504502	0.22258339401250998

0.31665902396267703	0.18453249282953102	0.236944410987863	0.261864072219928

0.377777871501523	0.11265233353358699	0.20941258176308603	0.300157213201805

0.6163930334523089	0.0689847041998677	0.11148441459121801	0.203137847756605

0.675764369206464	0.0427703219414491	0.0631709440515372	0.21829436480055

0.326032179952778	0.133886990300119	0.0820227041025529	0.458058125644551

0.321153050529938	0.137265609257083	0.11238357564255498	0.429197764570424

>M00045_2.00

0.222345132743363	0.00110619469026549	0.7754424778761059	0.00110619469026549

0.00110619469026549	0.00110619469026549	0.00110619469026549	0.9966814159292041

0.9966814159292041	0.00110619469026549	0.00110619469026549	0.00110619469026549
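The layout printed above (a `>` header line followed by rows of four values) can be parsed with a few lines of Python to sanity-check a custom file. This is a rough sketch under that assumed layout; `parse_pfm` is a hypothetical helper, not the parser gimmemotifs uses, and the data are the first three rows of M00045_2.00 as printed above.

```python
import io


def parse_pfm(handle):
    """Return {motif_name: list of 4-value rows} from pfm-style text (hypothetical helper)."""
    motifs = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new motif.
            current = line[1:]
            motifs[current] = []
        else:
            if current is None:
                raise ValueError("matrix row found before any '>' header")
            row = [float(value) for value in line.split()]
            if len(row) != 4:
                raise ValueError(f"{current}: expected 4 values per row, found {len(row)}")
            motifs[current].append(row)
    return motifs


pfm_text = (
    ">M00045_2.00\n"
    "0.222345132743363\t0.00110619469026549\t0.7754424778761059\t0.00110619469026549\n"
    "0.00110619469026549\t0.00110619469026549\t0.00110619469026549\t0.9966814159292041\n"
    "0.9966814159292041\t0.00110619469026549\t0.00110619469026549\t0.00110619469026549\n"
)
motifs_parsed = parse_pfm(io.StringIO(pfm_text))

# In the example data each row sums to about 1; report any row that does not.
for name, rows in motifs_parsed.items():
    for index, row in enumerate(rows):
        if abs(sum(row) - 1.0) > 1e-6:
            print(f"{name} row {index}: values sum to {sum(row)}")
```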



## 3.3 Load files as motif list
We can load the files using the `read_motifs` function in gimmemotifs.

First, place the two files, XXX.motif2factors.txt and XXX.pfm, in the same directory.
If the two files are in different locations, we cannot use the `read_motifs` function.
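Before loading, it can help to confirm that the companion file is where it is expected. The sketch below assumes that the two files share the same XXX prefix, as the file names in this section suggest; `companion_motif2factors` is a hypothetical helper, not a gimmemotifs function.

```python
from pathlib import Path


def companion_motif2factors(path_pfm):
    """Return the motif2factors path expected next to a pfm file (hypothetical helper).

    Assumption: both files share the same prefix and directory,
    e.g. XXX.pfm and XXX.motif2factors.txt.
    """
    path = Path(path_pfm)
    return path.with_name(path.stem + ".motif2factors.txt")


expected = companion_motif2factors("CisBP_ver2_Danio_rerio.pfm")
if expected.exists():
    print(f"Found {expected}")
else:
    print(f"Missing {expected}: place it in the same directory as the pfm file")
```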



In [7]:
from gimmemotifs.motif import read_motifs

# Check path for pfm file
print(path_pfm)

# Read motifs
motifs = read_motifs(path_pfm)

# Check first 10 motifs
motifs[:10]

CisBP_ver2_Danio_rerio.pfm


[M00008_2.00_nnnAAww,
 M00045_2.00_GTAAACAA,
 M00056_2.00_TAATAAAT,
 M00066_2.00_nsGTTGCyAn,
 M00070_2.00_nrAACAATAnn,
 M00111_2.00_nGCCynnGGs,
 M00112_2.00_CCTsrGGCnA,
 M00113_2.00_nsCCnnAGGs,
 M00114_2.00_nnGCCynnGG,
 M00115_2.00_nnATnAAAn]