extract_form_esummary matrix cannot be cleanly written to csv #65

gadepallivs · 2015-09-16T22:59:01Z

Hi David,
Below is the example. I don't understand why the text in the title, fulljournalname and pubtype fields spills over into a second column when the table is written to CSV.

PM.ID <- c("26287849", "25979833", "25667274", "25430497", "24968756", "24846037", "24296758", "24281417", "24128713", "24055406","23489023")
p.data <- entrez_summary(db = "pubmed", id = PM.ID  )
pubrecord.table <- extract_from_esummary(esummaries = p.data,
                                         elements = c("uid", "title", "fulljournalname", "pubtype",
                                                      "volume", "issue", "pages", "lastauthor",
                                                      "pmcrefcount", "issn", "pubdate"))
is(pubrecord.table) #  "matrix"         "array"          "structure"      "vector"         "vectorORfactor"
pubrecord.table <- t(pubrecord.table) # transpose the rows into columns
write.csv(pubrecord.table , file = "test12.csv" )


dwinter · 2015-09-17T01:47:08Z

This is not really a problem with rentrez, just a property of NCBI records and R objects.

In this case, the pubtype field is variably-sized:

sapply(pubrecord.table[4,], length)

26287849 25979833 25667274 25430497 24968756 24846037 24296758 24281417 
       2        2        1        2        1        3        1        2 
24128713 24055406 23489023 
       1        2        2

When you try to write the matrix, R represents each vector the way you'd type it in (c(..., ...)). That adds commas inside the cell, which breaks the CSV format.

In this case, you can collapse the vectors:

pubrecord.table[4,] <- sapply(pubrecord.table[4,], paste, collapse=" & ")

and then unlist each matrix row so that the matrix can be written out:

f <- tempfile()
write.csv( apply(pubrecord.table, 1, unlist), f)
re_read <- read.csv(f)
re_read$pmcrefcount

 [1]  0  1  3  2  1 26 10  4  3  2 21
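
If it isn't known in advance which fields are variably-sized, the same collapse can be applied to every cell instead of just row 4. The following is only a rough sketch that runs without a network connection: `records`, `tab` and `flat` are made-up names, and the toy matrix is an assumption about the shape `extract_from_esummary` returns (fields as rows, records as columns, one list element per cell).

```r
# Toy stand-in for the list-matrix returned by extract_from_esummary()
# (hypothetical data: rows are fields, columns are records).
records <- list(
  "111" = list(uid = "111", title = "A, B and C",
               pubtype = c("Journal Article", "Review")),
  "222" = list(uid = "222", title = "Another title",
               pubtype = "Journal Article")
)
tab <- sapply(records, function(r) r[c("uid", "title", "pubtype")])

# Collapse every cell to a single string. paste() with collapse returns ""
# for an empty cell, so empty fields do not break the matrix shape.
flat <- matrix(vapply(tab, paste, character(1), collapse = " & "),
               nrow = nrow(tab), dimnames = dimnames(tab))

# Transpose so each record is a row, then write and re-read the CSV.
f <- tempfile(fileext = ".csv")
write.csv(as.data.frame(t(flat), stringsAsFactors = FALSE), f, row.names = FALSE)
re_read <- read.csv(f, stringsAsFactors = FALSE)
re_read$pubtype
```

Because `write.csv` quotes character values, a comma inside a title stays within its own column once the cell is a single string.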

gadepallivs · 2015-10-12T17:19:56Z

Hi david,
The solution above works for some PMID queries, but for others I still get an error. Depending on the PMID, the variable-length fields turn up in the title, journal name, pubtype or something else. I thought that just removing the row number would fix the issue, but I get an error when trying to write the table in R Shiny:
pubrecord.table[,] <- sapply(pubrecord.table[,], paste, collapse=" & ")

Error in apply(pubrecord.reference, 1, unlist) : dim(X) must have a positive length
P.S. Why was the function extract_from_esummary designed to return a matrix? The data it extracts is a mix of character and numeric vectors, so a data frame would seem ideal for storing this kind of data, while a matrix is expected to hold data of a single type.

dwinter · 2015-10-12T17:54:05Z

I'm not sure what you are trying to do in the example, but it seems like it's hitting empty fields?

extract_from_esummary is really a wrapper around sapply. It doesn't return data.frames because I think most users don't expect data.frame columns to contain vectors, like this:

df <- as.data.frame(t(pubrecord.table))
df$pubtype

$`26287849`
[1] "Journal Article"   "Multicenter Study"

$`25979833`
[1] "Journal Article"             "Randomized Controlled Trial"

$`25667274`
[1] "Journal Article"
.
.
.

Structured data like that would seem to fit a list better than a data.frame, and you can get that by setting simplify=FALSE.
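
As a rough illustration of working with the list form, the sketch below uses made-up data. `recs` is only an assumption about what `simplify=FALSE` gives back (one list entry per record), and `df` and `long` are hypothetical names. It shows two ways to get to a data.frame: a list column, or a "long" table with one row per record/pubtype pair, which writes to CSV without any collapsing.

```r
# Hypothetical stand-in for extract_from_esummary(..., simplify = FALSE):
# one list entry per record, each holding the requested fields.
recs <- list(
  "111" = list(uid = "111", pubtype = c("Journal Article", "Review")),
  "222" = list(uid = "222", pubtype = "Journal Article")
)

# One row per record, keeping the multi-valued field as a list column.
df <- data.frame(uid = vapply(recs, function(r) r$uid, character(1)),
                 stringsAsFactors = FALSE)
df$pubtype <- lapply(recs, function(r) r$pubtype)

# Long format: one row per (record, pubtype) pair, all columns atomic.
long <- data.frame(
  uid = rep(df$uid, lengths(df$pubtype)),
  pubtype = unlist(df$pubtype, use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```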

gadepallivs · 2015-10-12T19:00:10Z

*Edited, noted the issue *
Hi david, I noted the issue was with empty abstract fields for some entries.

library(rentrez)
library(XML)  # provides xpathApply, xmlValue and xmlChildren

PM.ID <- c("26391251","26372702","26372699","26371045","26338018","26317919",
            "26315966","26301800","26301799","26258891")
# fetch the full records as parsed XML (assumes PM.ID holds the IDs of interest)
fetch.pubmed <- entrez_fetch(db = "pubmed", id = PM.ID,
                              rettype = "xml", parsed = TRUE)
abstracts <- xpathApply(fetch.pubmed, '//PubmedArticle//Article',
                        function(x) xmlValue(xmlChildren(x)$Abstract))

This results in NA for PMIDs whose abstracts are empty. But when the table is rendered in R Shiny, it fails to display: the page just shows "Processing" and the table never appears. I need to learn more about that.
This is not related to the rentrez package.
Thank you

dwinter · 2015-10-13T16:38:17Z

OK, good luck getting to the bottom of the shiny problem :)

dwinter changed the title ~~extract_form_esummary results matrix of irrelevant column data~~ extract_form_esummary matrix cannot be cleanly written to csv Sep 17, 2015

dwinter closed this as completed Sep 17, 2015
