Serialize into a data.frame? #5

Closed
karthik opened this issue Sep 6, 2014 · 3 comments

karthik commented Sep 6, 2014

It would be nice to provide some helper functions to serialize the results into a data.frame, especially since the fields returned are often (but not always) standard. For example:

Arrrr> library(aRxiv)
Arrrr> z <- arxiv_search(id_list = "1403.3048,1402.2633,1309.1192")
Arrrr> sapply(z, length)
entry entry entry
   17    17    15

The fields returned are also not super consistent:

Arrrr> sapply(z, names)
$entry
 [1] "id"               "updated"          "published"        "title"
 [5] "summary"          "author"           "author"           "author"
 [9] "author"           "doi"              "link"             "comment"
[13] "journal_ref"      "link"             "link"             "primary_category"
[17] "category"

$entry
 [1] "id"               "updated"          "published"        "title"
 [5] "summary"          "author"           "author"           "author"
 [9] "author"           "author"           "author"           "author"
[13] "comment"          "link"             "link"             "primary_category"
[17] "category"

$entry
 [1] "id"               "updated"          "published"        "title"
 [5] "summary"          "author"           "doi"              "link"
 [9] "comment"          "journal_ref"      "link"             "link"
[13] "primary_category" "category"         "category"

This helper function could take an rbind.fill approach to return a rectangular data.frame, or you could consult the API for the complete list of field names and construct a standard data.frame into which search results can be coerced. Feel free to discard the idea; I'm just throwing out a suggestion.
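As a rough illustration of the rbind.fill idea, here is a minimal base R sketch. The records are made-up stand-ins for the uneven entries shown above, and `collapse_record` is a hypothetical helper, not part of aRxiv. It collapses repeated fields within each record, then pads missing fields with NA so the rows can be bound together.

```r
# Made-up records mimicking the uneven fields above (not real API output).
records <- list(
  list(id = "a1", title = "First",  author = "A", author = "B", doi = "10.0/x"),
  list(id = "a2", title = "Second", author = "C"),
  list(id = "a3", title = "Third",  author = "D", author = "E", comment = "5 pages")
)

# Hypothetical helper: merge repeated field names within one record
# into a single string, assuming every field holds one character value.
collapse_record <- function(rec, sep = "|") {
  values <- unlist(rec, use.names = FALSE)
  vapply(split(values, names(rec)), paste, character(1), collapse = sep)
}

rows <- lapply(records, collapse_record)

# Union of all field names seen across the records.
all_fields <- unique(unlist(lapply(rows, names)))

# Pad each record with NA for the fields it lacks, then bind the rows.
filled <- lapply(rows, function(r) {
  out <- setNames(rep(NA_character_, length(all_fields)), all_fields)
  out[names(r)] <- r
  as.data.frame(as.list(out), stringsAsFactors = FALSE)
})
result <- do.call(rbind, filled)
```

With plyr installed, the padding and binding step could presumably be replaced by `plyr::rbind.fill()` on a list of one-row data frames; the base R version is shown only to keep the sketch dependency-free.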

karthik commented Sep 6, 2014

The link fields could use some tidyr and get collapsed into a list within a data.frame (otherwise you'll repeat the other information across multiple rows). 🐙
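One way to picture the list-column idea, sketched in base R rather than tidyr: each row keeps a single id, and the `link` column holds a character vector of any length. The entries and URLs below are placeholders, not real arXiv results.

```r
# Placeholder entries; the URLs are invented for illustration.
entries <- list(
  list(id = "a1",
       link = "http://example.org/abs/a1",
       link = "http://example.org/pdf/a1"),
  list(id = "a2",
       link = "http://example.org/abs/a2")
)

df <- data.frame(
  id = vapply(entries, function(e) e$id, character(1)),
  stringsAsFactors = FALSE
)

# Assign the list column after construction: passing a list straight to
# data.frame() would spread its elements across separate columns.
df$link <- lapply(entries, function(e) {
  unlist(e[names(e) == "link"], use.names = FALSE)
})
```

Here `df` has one row per entry, and `df$link[[1]]` returns both links for the first entry without repeating its id.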

kbroman commented Sep 7, 2014

This is what I'm working on next. My plan was to combine things like the multiple author names into a single field with | separators, and then put it all into a data frame.
The links are particularly weird: there are abstract, PDF, and DOI links, distinguished by different attributes. But I think I'm on top of this.

kbroman commented Sep 7, 2014

With PR #7, I have arxiv_search() returning a data frame.

Maybe I should have made columns like authors into lists within the data frame; instead, I pasted the values together to make a single string with | separators.
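The two representations are easy to convert between, so the choice is not locked in. A minimal sketch, assuming a hypothetical data frame with an `authors` column of |-separated strings (the column name and values here are invented, not taken from the package):

```r
# Invented data frame with authors pasted into one string per row.
df <- data.frame(
  id = c("a1", "a2"),
  authors = c("A. Author|B. Author", "C. Author"),
  stringsAsFactors = FALSE
)

# fixed = TRUE matters here: without it "|" is treated as the regex
# alternation operator and the string is split between every character.
df$authors_list <- strsplit(df$authors, "|", fixed = TRUE)

# Going the other way: paste each vector back into a single string.
repasted <- vapply(df$authors_list, paste, character(1), collapse = "|")
```

The pasted string prints and writes to CSV cleanly, while the list column makes per-author operations simpler; which is preferable depends on how the results get used.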

kbroman closed this as completed Sep 7, 2014