# SRA ChIP-Seq Data

We want Pol II ChIP-Seq data from the testis. There is one dataset in GEO, but I want to check SRA for any additional datasets. This notebook uses my MongoDB from ncbi_remap for easy querying of SRA.

In [1]:
# %load defaults.py
# Imports
import os
import sys
from pathlib import Path
import re

from IPython.display import display, HTML, Markdown
import numpy as np
import pandas as pd

import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Project level imports
from larval_gonad.notebook import Nb

In [2]:
# Setup notebook
nbconfig = Nb.setup_notebook()

last updated: 2018-07-19 
Git hash: 09c6fb287994a19d25111e9a64bdc2fd8dcb68b3


In [3]:
from pymongo import MongoClient
host = 'localhost'
mongoClient = MongoClient(host=host, port=27017)
db = mongoClient['sra']
ncbi = db['ncbi']

In [4]:
# Collect the IDs of all experiments whose library strategy is RNA-Seq
rnadat = [x['_id'] for x in ncbi.aggregate([
    {
        '$match': {
            'sra.experiment.library_strategy': 'RNA-Seq'
        }
    },
    {
        '$project': {
            '_id': 1
        }
    }
])]
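
The introduction targets ChIP-Seq, and the same query shape could be pointed at another library strategy. What follows is only a minimal sketch, assuming the `ncbi` collection keeps the field layout used above. `strategy_pipeline` is a hypothetical helper name, and `'ChIP-Seq'` is assumed to be how that strategy is spelled in this database.

```python
def strategy_pipeline(library_strategy):
    """Build an aggregation pipeline that returns only the _id of matching experiments.

    Hypothetical helper: it mirrors the RNA-Seq query above with the
    library strategy passed in as a parameter.
    """
    return [
        {'$match': {'sra.experiment.library_strategy': library_strategy}},
        {'$project': {'_id': 1}},
    ]


# Building the pipeline needs no database connection.
chip_pipeline = strategy_pipeline('ChIP-Seq')  # assumed spelling of the strategy value

# With a live connection it would be run like the cell above:
# chipdat = [x['_id'] for x in ncbi.aggregate(chip_pipeline)]
```

Keeping the pipeline in a function makes it easy to rerun the later cells for a different library strategy without editing the query by hand.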

In [58]:
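# For each RNA-Seq experiment, gather the sample title, all attribute values, and the BioProject accession
aggr = list(ncbi.aggregate([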
    {
        '$match': {
            '_id': {'$in': rnadat}
        }
    },
    {
        '$unwind': {
            'path': '$sra.sample.attributes'
        }
    },
    {
        '$project': {
            'title': '$sra.sample.title',
            'attr_name': '$sra.sample.attributes.name',
            'attr_val': '$sra.sample.attributes.value',
            'bp': '$bioproject.bioproject_accn',
        }
    },
    {
        '$group': {
            '_id': '$_id',
            'title': {'$first': '$title'},
            'vals': {'$addToSet': '$attr_val'},
            'bp': {'$first': '$bp'}
        }
    },
]))

def quick_query(query):
    """Case-insensitive regex search over each sample's title and attribute values."""
    regex = re.compile(query, re.IGNORECASE)
    res = []
    for i in aggr:
        # Build one searchable string from the title (if any) and the attribute values
        _title = i.get('title', False)
        if _title:
            attrs = [_title]
        else:
            attrs = []
        attrs.extend(i['vals'])
        string = ' '.join(attrs)
        if regex.search(string):
            res.append((i['bp'], i['_id'], string))
    # Report the number of matching experiments
    print(len(res))
    return pd.DataFrame(res, columns=['BioProject', 'SRX', 'desc']).set_index(['BioProject', 'SRX']).sort_index()
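
To try the search logic without a MongoDB connection, the matching step can be run on a few hand-made records. This is a sketch only: the records, accessions, and the name `search_records` are invented for illustration and are not part of the database or of ncbi_remap.

```python
import re

# Toy records shaped like the aggregation output (all values invented).
toy_aggr = [
    {'_id': 'SRX0000001', 'title': 'bam mutant testis', 'vals': ['testis', 'male'], 'bp': 'PRJNA000001'},
    {'_id': 'SRX0000002', 'title': None, 'vals': ['ovary', 'female'], 'bp': 'PRJNA000002'},
]


def search_records(records, query):
    """Return (BioProject, SRX, desc) tuples whose description matches the regex."""
    regex = re.compile(query, re.IGNORECASE)
    rows = []
    for rec in records:
        # Join the title (when present) and the attribute values into one string.
        parts = [rec['title']] if rec.get('title') else []
        parts.extend(rec['vals'])
        desc = ' '.join(parts)
        if regex.search(desc):
            rows.append((rec.get('bp'), rec['_id'], desc))
    return rows


toy_hits = search_records(toy_aggr, r'test[ie]s')
```

The returned tuples have the same column order that `quick_query` passes to `pd.DataFrame`, so they could be wrapped the same way if a table is wanted.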

In [59]:
pd.options.display.max_colwidth = 1000
pd.options.display.max_rows = 1000

In [60]:
quick_query(r'diap')

8


BioProject,SRX,desc
PRJEB7251,ERX555844,midgut UAS-dIAP overexpression source C1 UAS-dIAP no1
PRJEB7251,ERX555853,midgut UAS-dIAP overexpression source C4 UAS-dIAP no4
PRJEB7251,ERX555854,midgut UAS-esg+UAS-dIAP overexpression source B4 UAS-esg+dIAP no4
PRJEB7251,ERX555855,midgut UAS-dIAP overexpression source C3 UAS-dIAP no3
PRJEB7251,ERX555859,midgut UAS-esg+UAS-dIAP overexpression source B1 UAS-esg+dIAP no1
PRJEB7251,ERX555860,midgut UAS-esg+UAS-dIAP overexpression source B2 UAS-esg+dIAP no2
PRJEB7251,ERX555865,UAS-esg+UAS-dIAP overexpression midgut source B3 UAS-esg+dIAP no3
PRJEB7251,ERX555866,midgut UAS-dIAP overexpression source C2 UAS-dIAP no2


In [79]:
quick_query(r'dred')

0


BioProject,SRX,desc


In [61]:
quick_query(r'bam')

22


BioProject,SRX,desc
PRJNA117723,SRX014984,Bam mutant testes testis mRNA from bam[1]/bam[114] mutant males testis
PRJNA117723,SRX014986,Bam mutant ovaries ovary mRNA from bam[1]/bam[delta86] mutant female ovaries
PRJNA143877,SRX079960,Bam Mutant Bam Mutant ovaries bam(delta)86/bam(delta)86 Bam Mutant (developmental control)
PRJNA169498,SRX156291,pKF63 BamHI S2 S2 cell culture line pRS425 circular pKF63 BamHI cut Firefly luciferase PCR product
PRJNA169498,SRX156292,pKF63 circular S2 S2 cell culture line pKF63 circular Firefly luciferase PCR product pRS425 BamHI
PRJNA306537,SRX1493953,OSS control Model organism or animal 5 hr 0.1% ethanol (carrier only) w[1118]; P[w+ hsp-70 bam+] bam[D86] ry e/d bam[D86] P[ovo-lacZ] P[vas-egfp] OSS Drosophila Genome Resource Center stock 190 not applicable ovary adult female
PRJNA306537,SRX1493954,OSS ecdysone Model organism or animal w[1118]; P[w+ hsp-70 bam+] bam[D86] ry e/d bam[D86] P[ovo-lacZ] P[vas-egfp] OSS Drosophila Genome Resource Center stock 190 not applicable 5 hr 1E-6 M 20-hydroxyecdysone ovary adult female
PRJNA327202,SRX1884277,ribosomal RNA depleted total RNA from OSS cells w[1118]; P [w+ hsp-70 bam+] 11-d bam[D86] ry e/d bam[D86] P[ovo-lacZ) P[vas-egfp] no treatment ovarian somatic sheath cells (OSS) OSS cells
PRJNA343120,SRX2166019,SG-bam-1 bam mutant mitotic germ cells testis
PRJNA343120,SRX2166020,SG-bam-2 bam mutant mitotic germ cells testis


In [62]:
nos = quick_query(r'nos')
# Drop ovary, female, embryo, and eye samples. The match is case-sensitive,
# so capitalized terms such as "Ovary" are not excluded.
nos[~nos.desc.str.contains('ovar|female|embryo|eye')]

114


BioProject,SRX,desc
PRJNA139041,SRX056910,MiT[w-]3R2 MiT[w-]3R2 retrieved from Minos-based insertional mutagenesis screen adults resistant line
PRJNA159443,SRX142027,Zeus knockdown testis expression profiling 1 Zeus testis knockdown 1-7d adult dissected testis Oregon-R nosGal4>>UAS-Zeus-RNAi
PRJNA159443,SRX142028,Zeus knockdown testis expression profiling 2 Zeus testis knockdown 1-7d adult dissected testis Oregon-R nosGal4>>UAS-Zeus-RNAi
PRJNA159443,SRX142029,Caf40 knockdown testis expression profiling 1 Caf40 testis knockdown 1-7d adult nosGal4>>UAS-Caf40-RNAi dissected testis Oregon-R
PRJNA159443,SRX142030,Caf40 knockdown testis expression profiling 2 Caf40 testis knockdown 1-7d adult nosGal4>>UAS-Caf40-RNAi dissected testis Oregon-R
PRJNA187504,SRX220312,Ovary shWhite Rep2 Ovaries shWhite x nos-gal4; control knockdown of a protein irrelevant to the system we studied (White) Bloomington Drosophila Stock Center Ovary shWhite_RNA-Seq
PRJNA277742,SRX970009,Drosophila melanogaster larva challenged by trypanosomatid tryp_control Model organism or animal Oregon-R larva whole organism not collected
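
The exclusion filter above is case-sensitive, which is why a row beginning with "Ovary" survives it. A minimal sketch of the difference on invented descriptions, assuming a case-insensitive exclusion is what is wanted; it relies on the `case` parameter of pandas `Series.str.contains`.

```python
import pandas as pd

# Toy descriptions invented for illustration.
toy = pd.DataFrame({'desc': [
    'Zeus knockdown testis nosGal4',
    'Ovary shWhite Rep2 Ovaries shWhite x nos-gal4',
    'nos mutant embryo',
]})

exclude = r'ovar|female|embryo|eye'

# Case-sensitive (the default): the capitalized "Ovary" row is kept.
kept_sensitive = toy[~toy['desc'].str.contains(exclude)]

# Case-insensitive: the "Ovary" row is excluded as well.
kept_insensitive = toy[~toy['desc'].str.contains(exclude, case=False)]
```

Applied to the notebook's own table, the same `case=False` argument would go on the `nos.desc.str.contains(...)` call.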


In [63]:
quick_query(r'test[ie]s')

155


BioProject,SRX,desc
PRJDB5877,DRX090704,Harwich_testis_S1_L001_I1_001 testis testis1 Harwich
PRJDB5877,DRX090705,OM5_testis_S2_L001_I1_001 testis testis2 OM5
PRJDB5877,DRX090706,testis testis3 KY74_testis_S3_L001_I1_001 KY74
PRJDB5877,DRX090707,KY101_testis_S4_L001_I1_001 testis4 testis KY101
PRJEB22205,ERX2162339,Male_Testis_1; FlyAtlas2 Male_Testis 2017-09-06 ERS1885685 Testis male 2017-08-25
PRJEB22205,ERX2162340,Male_Testis_2 Male_Testis 2017-09-06 ERS1885686 Testis male 2017-08-25
PRJEB22205,ERX2162341,Male_Testis_3 Male_Testis 2017-09-06 Testis male 2017-08-25 ERS1885687
PRJEB2414,ERX010180,E-MTAB-493:s3 testes ERS022741 7 days mixed_sex Canton S wild type
PRJNA117723,SRX014984,Bam mutant testes testis mRNA from bam[1]/bam[114] mutant males testis
PRJNA117723,SRX014985,"Wild type testes testis mRNA from wild-type y,w male fly testis"
