Tree data (#47) #62

dwinter · 2016-01-25T05:04:07Z

What do you think about this as a solution to #47? Here's how it works at the moment.

Demonstration

There are two functions for getting external IDs, one for studies...

study_external_IDs("pg_1940")

External data identifiers for study 
 $doi:  10.1017/S001667231000008X 
 $pubmed_id:  20433773 
 $popset_ids: vector of 5 IDs 
 $nucleotide_ids: vector of 164 IDs
 $external_data_url http://purl.org/phylo/treebase/phylows/study/TB2:S10691

... and another for taxa

taxon_external_IDs(712902)

  source      id
1   ncbi  325167
2   gbif 4827728
3   gbif 4267261
4  irmng 1249869
5  irmng 1452570
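
Assuming the result is an ordinary data frame with `source` and `id` columns, as the printed output suggests, the IDs can be grouped by database with base R. This is only a sketch: the data frame below is rebuilt by hand from the values printed above rather than fetched, so it runs offline, and storing the IDs as character strings is an assumption.

```r
# Rebuild the printed result by hand (assumption: taxon_external_IDs()
# returns a data frame shaped like this one).
taxon_ids <- data.frame(
  source = c("ncbi", "gbif", "gbif", "irmng", "irmng"),
  id     = c("325167", "4827728", "4267261", "1249869", "1452570"),
  stringsAsFactors = FALSE
)

# One character vector of IDs per source database.
ids_by_source <- split(taxon_ids$id, taxon_ids$source)

# IDs to hand to a GBIF client, for example.
ids_by_source$gbif
```

Each element of `ids_by_source` can then be passed to whichever package talks to that database.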

Those two only get you as far as finding IDs, rather than importing data into an R session. I did write some (currently unexported) functions to summarise a set of nucleotide or popset IDs. Here's one example:

summarize_nucleotide_data(ids$nucleotide_ids[1:10])

                uid
295388256 295388256
295388254 295388254
295388253 295388253
295388251 295388251
295388249 295388249
295388247 295388247
295388245 295388245
295388243 295388243
295388241 295388241
295388239 295388239
                                                                                                                      title
295388256                Hirtodrosophila thoracis cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388254                 Hirtodrosophila duncani cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388253 Hirtodrosophila sp. KVDL-2010 cytochrome c oxidase subunit III-like (COIII) gene, partial sequence; mitochondrial
295388251                Mycodrosophila claytonae cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388249                     Drosophila pinicola cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388247                   Drosophila macrospina cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388245                    Drosophila guttifera cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388243                      Drosophila falleni cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388241                      Zaprionus indianus cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
295388239                     Zaprionus sepsoides cytochrome c oxidase subunit III (COIII) gene, partial cds; mitochondrial
          slen                      organism completeness
295388256  445      Hirtodrosophila thoracis             
295388254  445       Hirtodrosophila duncani             
295388253  363 Hirtodrosophila sp. KVDL-2010             
295388251  445      Mycodrosophila claytonae             
295388249  445           Drosophila pinicola             
295388247  445         Drosophila macrospina             
295388245  445          Drosophila guttifera             
295388243  445            Drosophila falleni             
295388241  445            Zaprionus indianus             
295388239  445           Zaprionus sepsoides

I'm not sure how useful this really is. Might it be better just to document this sort of workflow, including the packages for NCBI, GBIF and so on?

What's in the PR

In addition to the functions, there are:

Documentation for the exported functions
A new section in the mashups vignette demonstrating their use
Tests for these functions and their print method
A small tweak to strip_ott_ids that lets users optionally replace underscores with spaces
A new import for rentrez to get the NCBI data
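
To illustrate the underscore option in the list above, here is a minimal base-R sketch. `strip_ott_suffix` is a hypothetical stand-in, not rotl's actual implementation, and it assumes tip labels end in `_ott` followed by digits; the OTT numbers in the example are made up.

```r
# Hypothetical helper, not rotl's code: drop a trailing "_ott<digits>"
# suffix and optionally turn the remaining underscores into spaces.
strip_ott_suffix <- function(labels, remove_underscores = FALSE) {
  stripped <- sub("_ott[0-9]+$", "", labels)
  if (remove_underscores) {
    stripped <- gsub("_", " ", stripped, fixed = TRUE)
  }
  stripped
}

# Made-up labels for illustration only.
tips <- c("Drosophila_falleni_ott123456", "Zaprionus_indianus_ott654321")

strip_ott_suffix(tips)
strip_ott_suffix(tips, remove_underscores = TRUE)
```

Keeping the underscores by default leaves existing behaviour unchanged; the space-separated form is handy when the names are passed to another taxonomic service.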

Study external fxn handles missing data gracefully; added function to retrieve taxonomic IDs

fmichonneau · 2016-01-30T22:17:43Z

That looks good, @dwinter. Thanks for doing that!

Were you thinking that all of this should go into a vignette or just the summary functions?

dwinter · 2016-02-01T16:58:51Z

Hi @fmichonneau,

So, I was really questioning whether the summarize* functions were going to be very useful. It's hard to predict what users will want to do with the IDs, and we can't really write wrappers for every package that might use them. So it might make more sense to have rotl functions for gathering IDs, and let users do whatever they want with them.

If you and @josephwb agree, maybe I'll modify this PR to remove the summarize* functions, but include a similar example in the vignette (maybe finding sequences for a given taxon). We can use that to talk about packages that can make use of the other IDs.

*SQUASH* rewrite tree_data docs
Explicitly print for tests of printed output (new for testthat 1.0)
taxonomy_taxon -> taxonomy_taxon_info for v3
Pick study_external test case that doesn't throw warning
Comment out not-very-helpful summary functions
Add sequencing fetching e.g. to data mashup vignette
Use new strip_ott_id fxn in metaanalysis vignette

dwinter · 2016-04-21T18:57:06Z

Hey @josephwb and @fmichonneau

I think this should now pass all tests.

Just to remind you, it adds two functions, study_external_IDs and taxon_external_IDs, which gather whatever external data is available for studies or taxa. There are also tests and vignette examples for these.

I decided to remove the summarize_nucleotide_data function, which I just don't think would be very helpful. Instead I added an example of how to fetch DNA sequences using the external IDs to the vignette.
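
For readers following along, a workflow of that kind might look roughly like the sketch below. It is not the vignette's code: `fetch_study_sequences` is a hypothetical wrapper, it assumes the result of study_external_IDs has a `nucleotide_ids` element (as in the demonstration earlier), and it needs the rotl and rentrez packages plus network access.

```r
# Sketch only: requires rotl, rentrez and a network connection.
fetch_study_sequences <- function(study_id, n = 10) {
  ext <- rotl::study_external_IDs(study_id)
  # Assumption: nucleotide IDs are stored in `nucleotide_ids`.
  # Take at most the first n of them to keep the request small.
  ids <- utils::head(ext$nucleotide_ids, n)
  # entrez_fetch() returns the records as one FASTA-formatted string.
  rentrez::entrez_fetch(db = "nucleotide", id = ids, rettype = "fasta")
}

# Example call, left commented out because it hits the network:
# fasta <- fetch_study_sequences("pg_1940")
# cat(substr(fasta, 1, 300))
```

A study with no nucleotide IDs would make the fetch fail, so real code would want to check for that first.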

Oh, and a new version of rentrez is about to go to CRAN, which should remove the message about guessing the encoding is UTF-8 in the vignette.

Tell me what you think.

fmichonneau · 2016-04-21T21:25:38Z

Thanks for this! I think it's going to be really useful.

dwinter added 13 commits November 3, 2015 21:09

Start on #47, functions to summarize external IDs for study

eb1e7ad

Merge branch 'master' into tree_data

4d8dfad

Tests for #47

f2ad698

Work towards #47

6af322f

Study external fxn handles missing data gracefully; added function to retrieve taxonomic IDs

Create docs for external ID fxns

2d4fba9

Add option to replace "_" with " " in strip_IDs

1e97207

Include some #47 examples in the mashup vignette

d56d007

Add import for rentrez

d8d575a

export external data print fxn

f29d833

Add external data fxns .Rd files

c3118c0

fix external IDs example

64e52ed

recompile docs to include external ID e.g. fix

1cdcabe

Finally fix the documentation?

dfeb2c6

dwinter added 2 commits April 21, 2016 11:02

Merge all the v3 changes into tree_data

1319357

fmichonneau merged commit 3f883ca into master Apr 21, 2016

fmichonneau deleted the tree_data branch June 12, 2023 16:40
