In [1]:
import sragent
import pandas as pd

In [2]:
projects = ['PRJNA262623', 'PRJNA227448', 'PRJNA140547', 'PRJNA989169',
            'PRJNA954824', 'PRJNA912607', 'PRJNA831793', 'PRJNA783027',
            'PRJNA753826', 'PRJNA737490', 'PRJNA721183', 'PRJNA672715',
            'PRJNA643248', 'PRJNA588479', 'PRJNA559331', 'PRJNA492238',
            'PRJNA487157', 'PRJNA450434', 'PRJNA384583', 'PRJNA320298',
            'PRJNA278334', 'PRJNA274975', 'PRJNA254082', 'PRJNA231240',
            'PRJNA153387']


The `annotate()` function takes one or more project IDs (as a string or a list) and returns annotated experiment metadata.\
It works in two steps:
1. A fetch step, which pulls metadata from the SRA with the Entrez toolkit.
2. An annotation step, which prompts one of the OpenAI models to summarize and annotate the experiment metadata.

By default, `annotate()` performs both steps for every project it is passed.
To pull the metadata without annotating it, set `annotate = False`:

In [3]:
meta = sragent.annotate(projects, model = 'gpt-4o-mini', annotate = False)

Fetching PRJNA262623...
PRJNA262623 fetch complete...
Fetching PRJNA227448...
PRJNA227448 fetch complete...
Fetching PRJNA140547...
PRJNA140547 fetch complete...
Fetching PRJNA989169...
PRJNA989169 fetch complete...
Fetching PRJNA954824...
PRJNA954824 fetch complete...
Fetching PRJNA912607...
PRJNA912607 fetch complete...
Fetching PRJNA831793...
PRJNA831793 fetch complete...
Fetching PRJNA783027...
PRJNA783027 fetch complete...
Fetching PRJNA753826...
PRJNA753826 fetch complete...
Fetching PRJNA737490...
PRJNA737490 fetch complete...
Fetching PRJNA721183...
PRJNA721183 fetch complete...
Fetching PRJNA672715...
PRJNA672715 fetch complete...
Fetching PRJNA643248...
PRJNA643248 fetch complete...
Fetching PRJNA588479...
PRJNA588479 fetch complete...
Fetching PRJNA559331...
PRJNA559331 fetch complete...
Fetching PRJNA492238...
PRJNA492238 fetch complete...
Fetching PRJNA487157...
PRJNA487157 fetch complete...
Fetching PRJNA450434...
PRJNA450434 fetch complete...
Fetching PRJNA384583...
PRJN

`meta` is a pandas DataFrame containing the project IDs, experiment IDs, abstracts, protocols, experiment titles, and experiment attributes.\
`annotate()` can also take this DataFrame as input, so we don't have to repeat metadata pulls from the SRA.\
It also lets us manipulate and subset the metadata before we annotate.\
For instance, this vignette uses 25 projects that include histone PTM ChIP-seq experiments in yeast.\
These projects also contain non-histone ChIP-seq experiments that we don't care about for now.\
Let's try to filter down to just the experiments with histone targets.

In [11]:
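# Lowercase substrings that identify the histone PTMs of interest
ptms = ['k4me','k4me2','k4me3','k27ac']
# Print every experiment title that contains one of the substrings.
# A title is printed once per matching substring, so 'k4me2' and 'k4me3'
# titles appear twice (they also match 'k4me').
for x in meta['title']:
    for ptm in ptms:
        if ptm in x.lower():
            print(x)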

h3k4me3_0
h3k4me3_0
h3k4me3_4
h3k4me3_4
h3k4me3_8
h3k4me3_8
h3k4me3_15
h3k4me3_15
h3k4me3_30
h3k4me3_30
h3k4me3_60
h3k4me3_60
h3k4me2_0
h3k4me2_0
h3k4me2_4
h3k4me2_4
h3k4me2_8
h3k4me2_8
h3k4me2_15
h3k4me2_15
h3k4me2_30
h3k4me2_30
h3k4me2_60
h3k4me2_60
h3k4me_0
h3k4me_4
h3k4me_8
h3k4me_15
h3k4me_30
h3k4me_60
h3k27ac_0
h3k27ac_4
h3k27ac_8
h3k27ac_15
h3k27ac_30
h3k27ac_60
H3K4me3_ChIP-seq_t16
H3K4me3_ChIP-seq_t16
H3K4me3_ChIP-seq_t15
H3K4me3_ChIP-seq_t15
H3K4me3_ChIP-seq_t14
H3K4me3_ChIP-seq_t14
H3K4me3_ChIP-seq_t13
H3K4me3_ChIP-seq_t13
H3K4me3_ChIP-seq_t12
H3K4me3_ChIP-seq_t12
H3K4me3_ChIP-seq_t11
H3K4me3_ChIP-seq_t11
H3K4me3_ChIP-seq_t10
H3K4me3_ChIP-seq_t10
H3K4me3_ChIP-seq_t9
H3K4me3_ChIP-seq_t9
H3K4me3_ChIP-seq_t8
H3K4me3_ChIP-seq_t8
H3K4me3_ChIP-seq_t7
H3K4me3_ChIP-seq_t7
H3K4me3_ChIP-seq_t6
H3K4me3_ChIP-seq_t6
H3K4me3_ChIP-seq_t5
H3K4me3_ChIP-seq_t5
H3K4me3_ChIP-seq_t4
H3K4me3_ChIP-seq_t4
H3K4me3_ChIP-seq_t3
H3K4me3_ChIP-seq_t3
H3K4me3_ChIP-seq_t2
H3K4me3_ChIP-seq_t2
H3K4me3_ChIP-s
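
The loop above only prints matching titles. To actually subset the metadata, one option is a boolean mask built with pandas string matching. The sketch below is illustrative rather than part of `sragent`: it uses a small stand-in DataFrame, and only the `title` column name is taken from this vignette (the `project` column and all of its values are hypothetical). Combining the substrings into a single case-insensitive pattern also means each experiment is kept once, however many substrings its title matches.

```python
import re
import pandas as pd

# Stand-in for `meta`: only the 'title' column name comes from the vignette;
# the 'project' column and every value here are hypothetical.
meta = pd.DataFrame({
    'project': ['PRJ_A', 'PRJ_A', 'PRJ_B', 'PRJ_B'],
    'title': ['h3k4me3_0', 'h3k27ac_15',
              'H3K4me3_ChIP-seq_t16', 'Pol2_ChIP-seq_t16'],
})

ptms = ['k4me', 'k4me2', 'k4me3', 'k27ac']

# Join the substrings into one regex alternation so each row is tested once.
pattern = '|'.join(re.escape(p) for p in ptms)

# Case-insensitive match on the title; missing titles count as non-matches.
is_histone = meta['title'].str.contains(pattern, case=False, regex=True, na=False)

# Keep only the rows whose title mentions one of the histone PTMs.
histone_meta = meta[is_histone].reset_index(drop=True)
print(histone_meta['title'].tolist())
```

Since `annotate()` accepts a DataFrame as input, a subset built this way from the real `meta` could then presumably be passed on for annotation, for example `sragent.annotate(histone_meta, model = 'gpt-4o-mini')`; that exact call is an assumption based on the description above and has not been checked against the package.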

In [10]:
!pip install -e ..

Defaulting to user installation because normal site-packages is not writeable
Obtaining file:///work/users/m/j/mjn15/sragent
Installing collected packages: sragent
  Running setup.py develop for sragent
Successfully installed sragent
