## Pre-processing omics trait data for downstream analysis

This notebook shows how to obtain omics data for heterogeneous stock (HS) rats from GeneNetwork and pre-process it for downstream association studies.

```julia
using CSV, DataFrames, DelimitedFiles  # packages for manipulating data
using GeneNetworkAPI                   # access GeneNetwork data through its API
```

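First, find the name of the HS rat group on GeneNetwork (GN). The code takes the fourth row of the rat group table; that index reflects GN's group list when this notebook was run and may change.

```julia
Rats_groups = list_groups("rat");   # all rat groups available on GN
HS_group = Rats_groups[4, :Name];   # name of the HS rat group (row 4 at the time of writing)
```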

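Requests to the GeneNetwork API can fail transiently, for example when the connection is closed before a response arrives. A minimal sketch of retrying such a call with Julia's built-in `retry` and `ExponentialBackOff`; `flaky_fetch` is a hypothetical stand-in for a call such as `list_groups("rat")`, and the number of retries and the delays are arbitrary choices.

```julia
# Sketch: retry a flaky call with exponential back-off (Base Julia only).
# `flaky_fetch` is a hypothetical stand-in for a GeneNetwork API call.

attempts = Ref(0)  # counts how many times the call was made

function flaky_fetch()
    attempts[] += 1
    # Simulate two transient failures before a successful response.
    attempts[] < 3 && throw(EOFError())
    return "response"
end

# Retry only on EOFError, up to 4 times, waiting roughly 0.1 s, 0.2 s, 0.4 s, ...
# (with a small random jitter) between attempts.
fetch_with_retry = retry(flaky_fetch;
                         delays = ExponentialBackOff(n = 4, first_delay = 0.1, factor = 2.0),
                         check = (state, err) -> err isa EOFError)

result = fetch_with_retry()
```

For a real GN call, wrap it in a closure (for example `retry(() -> list_groups("rat"); ...)`) and adapt `check` to the exception type actually observed, since the HTTP client may wrap the underlying error.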

Next, list the datasets available for the HS group:

```julia
HS_datasets = list_datasets(HS_group);
```


```julia
HS_datasets
```

| Row | AvgID (Int64) | CreateTime (String) | DataScale (String) | FullName (String) | Id (Int64) | Long_Abbreviation (String) | ProbeFreezeId (Int64) | ShortName (String) | Short_Abbreviation (String) | confidentiality (Int64) | public (Int64) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 24 | Mon, 27 Aug 2018 00:00:00 GMT | log2 | HSNIH-Palmer Nucleus Accumbens Core RNA-Seq (Aug18) rlog | 860 | HSNIH-Rat-Acbc-RSeq-Aug18 | 347 | HSNIH-Palmer Nucleus Accumbens Core RNA-Seq (Aug18) rlog | HSNIH-Rat-Acbc-RSeq-0818 | 0 | 1 |
| 2 | 24 | Sun, 26 Aug 2018 00:00:00 GMT | log2 | HSNIH-Palmer Infralimbic Cortex RNA-Seq (Aug18) rlog | 861 | HSNIH-Rat-IL-RSeq-Aug18 | 348 | HSNIH-Palmer Infralimbic Cortex RNA-Seq (Aug18) rlog | HSNIH-Rat-IL-RSeq-0818 | 0 | 1 |
| 3 | 24 | Sat, 25 Aug 2018 00:00:00 GMT | log2 | HSNIH-Palmer Lateral Habenula RNA-Seq (Aug18) rlog | 862 | HSNIH-Rat-LHB-RSeq-Aug18 | 349 | HSNIH-Palmer Lateral Habenula RNA-Seq (Aug18) rlog | HSNIH-Rat-LHB-RSeq-0818 | 0 | 1 |
| 4 | 24 | Fri, 24 Aug 2018 00:00:00 GMT | log2 | HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) rlog | 863 | HSNIH-Rat-PL-RSeq-Aug18 | 350 | HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) rlog | HSNIH-Rat-PL-RSeq-0818 | 0 | 1 |
| 5 | 24 | Thu, 23 Aug 2018 00:00:00 GMT | log2 | HSNIH-Palmer Orbitofrontal Cortex RNA-Seq (Aug18) rlog | 864 | HSNIH-Rat-VoLo-RSeq-Aug18 | 351 | HSNIH-Palmer Orbitofrontal Cortex RNA-Seq (Aug18) rlog | HSNIH-Rat-VoLo-RSeq-0818 | 0 | 1 |
| 6 | 24 | Fri, 14 Sep 2018 00:00:00 GMT | log2 | HSNIH-Palmer Nucleus Accumbens Core RNA-Seq (Aug18) log2 | 868 | HSNIH-Rat-Acbc-RSeqlog2-Aug18 | 347 | HSNIH-Palmer Nucleus Accumbens Core RNA-Seq (Aug18) log2 | HSNIH-Rat-Acbc-RSeqlog2-0818 | 0 | 0 |
| 7 | 24 | Fri, 14 Sep 2018 00:00:00 GMT | log2 | HSNIH-Palmer Infralimbic Cortex RNA-Seq (Aug18) log2 | 869 | HSNIH-Rat-IL-RSeqlog2-Aug18 | 348 | HSNIH-Palmer Infralimbic Cortex RNA-Seq (Aug18) log2 | HSNIH-Rat-IL-RSeqlog2-0818 | 0 | 0 |
| 8 | 24 | Fri, 14 Sep 2018 00:00:00 GMT | log2 | HSNIH-Palmer Lateral Habenula RNA-Seq (Aug18) log2 | 870 | HSNIH-Rat-LHB-RSeqlog2-Aug18 | 349 | HSNIH-Palmer Lateral Habenula RNA-Seq (Aug18) log2 | HSNIH-Rat-LHB-RSeqlog2-0818 | 0 | 0 |
| 9 | 24 | Fri, 14 Sep 2018 00:00:00 GMT | log2 | HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) log2 | 871 | HSNIH-Rat-PL-RSeqlog2-Aug18 | 350 | HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) log2 | HSNIH-Rat-PL-RSeqlog2-0818 | 0 | 0 |
| 10 | 24 | Fri, 14 Sep 2018 00:00:00 GMT | log2 | HSNIH-Palmer Orbitofrontal Cortex RNA-Seq (Aug18) log2 | 872 | HSNIH-Rat-VoLo-RSeqlog2-Aug18 | 351 | HSNIH-Palmer Orbitofrontal Cortex RNA-Seq (Aug18) log2 | HSNIH-Rat-VoLo-RSeqlog2-0818 | 0 | 0 |


Select the Prelimbic Cortex (PL) rlog dataset, row 4 of the table above:

```julia
HS_dataset_to_test = HS_datasets[4, :]
```

| Row | AvgID (Int64) | CreateTime (String) | DataScale (String) | FullName (String) | Id (Int64) | Long_Abbreviation (String) | ProbeFreezeId (Int64) | ShortName (String) | Short_Abbreviation (String) | confidentiality (Int64) | public (Int64) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 24 | Fri, 24 Aug 2018 00:00:00 GMT | log2 | HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) rlog | 863 | HSNIH-Rat-PL-RSeq-Aug18 | 350 | HSNIH-Palmer Prelimbic Cortex RNA-Seq (Aug18) rlog | HSNIH-Rat-PL-RSeq-0818 | 0 | 1 |
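Indexing by row number (`HS_datasets[4, :]`) silently picks a different dataset if GN adds or reorders entries. A sketch of selecting the row by its `Short_Abbreviation` instead; the toy table copies a few values from the dataset listing above, and in the notebook `HS_datasets` would take its place.

```julia
using DataFrames

# Toy stand-in for `HS_datasets`, with values copied from the dataset listing.
datasets = DataFrame(
    Id = [862, 863, 871],
    Short_Abbreviation = ["HSNIH-Rat-LHB-RSeq-0818",
                          "HSNIH-Rat-PL-RSeq-0818",
                          "HSNIH-Rat-PL-RSeqlog2-0818"],
    public = [1, 1, 0],
)

target = "HSNIH-Rat-PL-RSeq-0818"

# Keep exactly the row whose abbreviation matches; `only` throws if there is
# no match or more than one, instead of silently returning the wrong row.
selected = only(eachrow(filter(:Short_Abbreviation => ==(target), datasets)))

selected[:Id]
```

The same pattern applies to the group lookup, matching on the `:Name` column of the group table.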


#### Get the raw omics trait data

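Download the omics traits (RNA-Seq expression values) for the selected dataset. In this run the call took about 947 seconds, roughly 16 minutes:

```julia
@time omic_pheno = get_omics(HS_dataset_to_test[:Short_Abbreviation]);
```

```
947.436818 seconds (4.85 M allocations: 2.239 GiB, 0.05% gc time, 0.03% compilation time)
```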
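Because the download is slow, it may be worth caching the table on disk and reloading it on later runs. A minimal sketch, assuming a local CSV file is an acceptable cache; `cached_table` and `slow_fetch` are hypothetical names, with `slow_fetch` standing in for the `get_omics` call. Declaring the `id` column as `String` keeps sample IDs from being parsed as numbers when the CSV is read back.

```julia
using CSV, DataFrames

# Return the table stored at `path` if it exists; otherwise call `fetcher`,
# save its result to `path`, and return it.
function cached_table(fetcher::Function, path::AbstractString)
    if isfile(path)
        # Read `id` as String so sample IDs are never parsed as numbers.
        return CSV.read(path, DataFrame; types = Dict(:id => String))
    end
    df = fetcher()
    CSV.write(path, df)
    return df
end

# Hypothetical stand-in for the slow `get_omics` call.
calls = Ref(0)
function slow_fetch()
    calls[] += 1
    return DataFrame(id = ["0001", "00071F4FAF"], gene1 = [1.5, missing])
end

path = joinpath(mktempdir(), "omics_cache.csv")
first_run  = cached_table(slow_fetch, path)   # fetches and writes the cache
second_run = cached_table(slow_fetch, path)   # reads the cache; no second fetch
```

In the notebook this could be called as `cached_table(() -> get_omics(HS_dataset_to_test[:Short_Abbreviation]), "raw_omics.csv")`, where the file name is a hypothetical choice.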


```julia
omic_pheno
```

Each row of `omic_pheno` is a sample, identified by `id` (a `String15`), and each remaining column is an Ensembl rat gene ID (`ENSRNOG…`) of type `Float64?`, that is `Union{Missing, Float64}`. For the first ten samples, every displayed gene value is `missing`:

| Row | id | ENSRNOG00000000001 | ENSRNOG00000000007 | ENSRNOG00000000008 | ⋯ |
|---|---|---|---|---|---|
| 1 | 00071F4FAF | missing | missing | missing | ⋯ |
| 2 | 00071F6771 | missing | missing | missing | ⋯ |
| 3 | 00071F768E | missing | missing | missing | ⋯ |
| 4 | 00071F95F9 | missing | missing | missing | ⋯ |
| 5 | 00071FB160 | missing | missing | missing | ⋯ |
| 6 | 00071FB747 | missing | missing | missing | ⋯ |
| 7 | 00072069AD | missing | missing | missing | ⋯ |
| 8 | 0007207A73 | missing | missing | missing | ⋯ |
| 9 | 0007207BE7 | missing | missing | missing | ⋯ |
| 10 | 00072126F3 | missing | missing | missing | ⋯ |


```julia
HS_dataset_to_test[:Short_Abbreviation]
```

```
"HSNIH-Rat-PL-RSeq-0818"
```

#### Remove missing values: keep only samples with no missing values

First, compute the proportion of missing entries in the omics data table:

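```julia
# Count missing entries over all columns. `count` is a Base function,
# so it is not reused as a variable name.
n_missing = sum(count(ismissing, col) for col in eachcol(omic_pheno));
```

Dividing by the total number of cells gives the proportion:

```julia
n_missing / (nrow(omic_pheno) * ncol(omic_pheno))
```

```
0.9870058818601999
```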
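An overall proportion near 99% does not say how the missing values are spread across samples. A sketch of the per-sample (row-wise) fraction of missing traits, on a small hypothetical table; if every sample were either fully observed or fully missing, the fractions would be only 0.0 or 1.0, and any value strictly in between flags a partially observed sample.

```julia
using DataFrames

# Hypothetical toy table: sample IDs plus three traits.
df = DataFrame(
    id = ["s1", "s2", "s3"],
    g1 = [1.0, missing, 2.0],
    g2 = [0.5, missing, missing],
    g3 = [3.0, missing, 1.0],
)

traits = df[:, Not(:id)]   # trait columns only

# Fraction of missing trait values in each row (sample).
row_missing = [count(ismissing, r) / ncol(traits) for r in eachrow(traits)]

# Samples that are neither fully observed nor fully missing.
partial = df.id[0 .< row_missing .< 1]
```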

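Next, keep only the samples (rows) with no missing values. The code assumes that a sample with a value in the first trait column (column 2) has values for every trait, and keeps exactly those samples; the assumption is checked afterwards.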

```julia
is_nonmissing = .!ismissing.(omic_pheno[:, 2]);   # samples observed in the first trait column
true_samples = omic_pheno[is_nonmissing, 1];      # their sample IDs
omic_pheno_true = omic_pheno[is_nonmissing, :];   # their full rows
```
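Selecting samples by the first trait column alone relies on that assumption. A sketch on a hypothetical table where it fails, compared with DataFrames' `completecases`, which flags rows with no missing value in any column:

```julia
using DataFrames

# Hypothetical toy table in which sample s3 is only partially observed.
df = DataFrame(
    id = ["s1", "s2", "s3"],
    g1 = [1.0, missing, 2.0],
    g2 = [0.5, missing, missing],
)

# Heuristic: keep rows observed in the first trait column.
by_first_trait = .!ismissing.(df[:, 2])

# Strict criterion: keep rows with no missing value in any column.
strict = completecases(df)

df.id[by_first_trait]   # ["s1", "s3"]: s3 still has a missing g2
df.id[strict]           # ["s1"]
```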

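Check that the filtered table has no missing values left:

```julia
n_missing_true = sum(count(ismissing, col) for col in eachcol(omic_pheno_true))
```

```
0
```

```julia
n_missing_true / (nrow(omic_pheno_true) * ncol(omic_pheno_true))
```

```
0.0
```

As a cross-check, `dropmissing` removes every row with a missing value in any column. Its sample IDs match those selected from column 2 at all 80 positions, so both selections contain the same 80 samples:

```julia
to_test = dropmissing(omic_pheno);
sum(to_test[:, 1] .== true_samples)
```

```
80
```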

#### Finally, write the omics traits with no missing values to a file

```julia
# Uncomment to save the complete-case omics traits for downstream analysis.
# CSV.write("HSNIH-Rat-PL-RSeq-0818_nomissing.csv", omic_pheno_true)
```