This repository has been archived by the owner on Sep 9, 2022. It is now read-only.
<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{rplos tutorial}
-->
rplos tutorial
=====
```{r echo=FALSE}
knitr::opts_chunk$set(
comment = "#>",
warning = FALSE,
message = FALSE
)
```
The `rplos` package interacts with the API services of [PLoS](http://www.plos.org/) (Public Library of Science) Journals. An API key used to be required to work with this package; that is no longer the case.
This tutorial will go through three use cases to demonstrate the kinds
of things possible in `rplos`.
* Search across PLoS papers in various sections of papers
* Search for terms and visualize results as a histogram OR as a plot through time
* Text mining of scientific literature
### Load package from CRAN
```{r eval=FALSE}
install.packages("rplos")
```
```{r}
library('rplos')
```
### Search across PLoS papers in various sections of papers
`searchplos` is the general search function. In this case it searches for the term
**Helianthus** and returns the DOIs of matching papers
```{r searchplos1}
searchplos(q= "Helianthus", fl= "id", limit = 5)
```
Get only full article DOIs
```{r searchplos2}
searchplos(q="*:*", fl='id', fq='doc_type:full', start=0, limit=5)
```
Get DOIs for only PLoS One articles
```{r searchplos3}
searchplos(q="*:*", fl='id', fq='cross_published_journal_key:PLoSONE', start=0, limit=5)
```
Get DOIs for full articles in PLoS One
```{r searchplos4}
searchplos(q="*:*", fl='id',
fq=list('cross_published_journal_key:PLoSONE', 'doc_type:full'),
start=0, limit=5)
```
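Each element of the `fq` list above is a single `field:value` string. As a minimal sketch, the hypothetical helper `build_fq()` below (not part of `rplos`) assembles such a list from a named list of filters, which may be convenient when the filters are chosen programmatically.

```r
# Hypothetical helper, not part of rplos: turn a named list of filters
# into the list of "field:value" strings passed to `fq`.
build_fq <- function(filters) {
  stopifnot(is.list(filters), !is.null(names(filters)))
  as.list(paste(names(filters), unlist(filters, use.names = FALSE), sep = ":"))
}

fq <- build_fq(list(cross_published_journal_key = "PLoSONE", doc_type = "full"))
str(fq)

# Assumed usage (needs network access):
# searchplos(q = "*:*", fl = "id", fq = fq, start = 0, limit = 5)
```

The resulting `fq` holds the same two strings as the hand-written list in the `searchplos4` chunk.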
Search for many terms
```{r searchplos5}
q <- c('ecology','evolution','science')
lapply(q, function(x) searchplos(x, limit=2))
```
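The `lapply` call above returns an unnamed list, so it is not obvious which result belongs to which term. A small sketch of one way to keep them paired follows; `search_terms()` and `fake_search()` are hypothetical names introduced here, and `fake_search()` only stands in for `searchplos` so the example runs offline.

```r
# Hypothetical helper: run a search function once per term and name each
# result after the term that produced it.
search_terms <- function(terms, search_fun, ...) {
  results <- lapply(terms, function(term) search_fun(q = term, ...))
  stats::setNames(results, terms)
}

# Offline stand-in for searchplos(); it just echoes its arguments.
fake_search <- function(q, limit = 10) list(query = q, limit = limit)

res <- search_terms(c("ecology", "evolution", "science"), fake_search, limit = 2)
names(res)

# Assumed usage with rplos (needs network access):
# res <- search_terms(c("ecology", "evolution", "science"), searchplos, limit = 2)
```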
### Search on specific sections
A suite of functions was created as light wrappers around `searchplos`, as shorthand for searching specific sections of a paper.
* `plosauthor` searches in authors
* `plosabstract` searches in abstracts
* `plostitle` searches in titles
* `plosfigtabcaps` searches in figure and table captions
* `plossubject` searches in subject areas
`plosauthor` searches across authors, and in this case returns the authors of the matching papers. The `fl` parameter determines which fields are returned
```{r plosauthor}
plosauthor(q = "Eisen", fl = "author", limit = 5)
```
`plosabstract` searches across abstracts, and in this case returns the id and title of the matching papers
```{r plosabstract}
plosabstract(q = 'drosophila', fl='id,title', limit = 5)
```
`plostitle` searches across titles, and in this case returns the title and journal of the matching papers
```{r plostitle}
plostitle(q='drosophila', fl='title,journal', limit=5)
```
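One way to picture these wrappers is as a field-qualified `searchplos` query. The sketch below assumes the wrappers prefix the search term with a Solr field name and quote it; that is an assumption about the internals, not something this tutorial states, and `field_query()` is a hypothetical helper.

```r
# Hypothetical helper illustrating the assumed wrapper pattern:
# prefix the term with a field name and wrap it in double quotes.
field_query <- function(field, term) {
  sprintf('%s:"%s"', field, term)
}

title_q <- field_query("title", "drosophila")
cat(title_q, "\n")

# Assumed to be roughly what plostitle(q = "drosophila", ...) sends as `q`
# (needs network access):
# searchplos(q = title_q, fl = "title,journal", limit = 5)
```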
### Faceted search
Facet by journal
```{r facet1}
facetplos(q='*:*', facet.field='journal')
```
Using `facet.query` to get counts
```{r facet2}
facetplos(q='*:*', facet.field='journal', facet.query='cell,bird')
```
Date faceting
```{r facet3}
facetplos(q='*:*', facet.date='publication_date',
  facet.date.start='NOW/DAY-5DAYS', facet.date.end='NOW', facet.date.gap='+1DAY')
```
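The date arguments use Solr date math: `NOW/DAY` rounds the current time down to the start of the day, `-5DAYS` steps back five days, and `+1DAY` sets the bucket width. The following sketch reproduces the bucket start times in plain R for a fixed, made-up value of "now"; it illustrates the arithmetic only and makes no API call.

```r
# Fixed "now" so the example is reproducible; Solr would use the current time.
now <- as.POSIXct("2020-01-10 15:30:00", tz = "UTC")

# NOW/DAY rounds down to midnight; -5DAYS then steps back five days.
start <- as.POSIXct(trunc(now, units = "days")) - 5 * 24 * 60 * 60

# +1DAY gap: one bucket per day, starting at `start` and not passing `now`.
buckets <- seq(start, now, by = "day")
format(buckets, "%Y-%m-%d", tz = "UTC")
```

With these settings there is one bucket for each of the five previous days plus one for the current day.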
### Highlighted search
Search for the term _alcohol_ in the abstracts of articles, and return only 2 results
```{r high1}
highplos(q='alcohol', hl.fl = 'abstract', rows=2)
```
Search for the term _alcohol_ in the abstracts of articles, limit each highlighted fragment to 20 characters, and return only 2 results
```{r high2}
highplos(q='alcohol', hl.fl='abstract', hl.fragsize=20, rows=2)
```
Search for the term _experiment_ across all sections of an article, return only the `id` (DOI) and `title` fields, search in full articles only (via `fq='doc_type:full'`), and return only 2 results
```{r high3}
highplos(q='everything:"experiment"', fl='id,title', fq='doc_type:full',
rows=2)
```
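Highlighted fragments mark the matched term with markup. As a sketch, assuming the result is a list keyed by DOI whose elements hold fragments with `<em>` tags around each match, the tags can be stripped with a regular expression. The `hl` object and its DOI below are made up for illustration and are not real `highplos` output.

```r
# Made-up stand-in for a highplos() result: a list keyed by DOI, each
# element holding the highlighted fragments for the requested field.
hl <- list(
  "10.1371/journal.example.0000001" = list(
    abstract = "effects of <em>alcohol</em> on sleep"
  )
)

# Remove opening and closing <em> tags from a character vector.
strip_em <- function(x) gsub("</?em>", "", x)

# Apply to every fragment of every document.
clean <- lapply(hl, function(doc) lapply(doc, strip_em))
clean[[1]]$abstract
```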
### Search for terms and visualize results as a histogram OR as a plot through time
`plosword` allows you to search for 1 to K words and visualize the results
as a histogram, comparing number of matching papers for each word
```{r plosword1}
out <- plosword(list("monkey", "Helianthus", "sunflower", "protein", "whale"),
vis = "TRUE")
out$table
```
```{r plosword1plot, fig.width=6, fig.height=4}
out$plot
```
You can also pass in curl options; in this case, `verbose=TRUE` prints detailed information about the curl call.
```{r plosword2}
plosword('Helianthus', callopts=list(verbose=TRUE))
```
### Visualize terms
`plot_throughtime` allows you to search for up to two terms and visualize the results as a line plot, comparing the number of matching articles through time. The plot is built with the ggplot2 package, so ggplot2 layers can be added to it.
```{r throughtime1, fig.width=6, fig.height=4}
plot_throughtime(terms = "phylogeny", limit = 200) + ggplot2::geom_line(size=2, color='black')
```