forked from CenterForOpenScience/scrapi
-
Notifications
You must be signed in to change notification settings - Fork 0
/
CHANGELOG
261 lines (210 loc) · 7.22 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
*********
ChangeLog
*********
0.10.7 (2015-11-16)
===================
- Update Cambridge long name
- Update UTAustin url and approved sets
0.10.6 (2015-11-10)
===================
- Fix springer API key name
0.10.5 (2015-11-09)
===================
- Make single document query much faster for apiserver
- Add status route, so the status can be checked without being super slow
0.10.4 (2015-11-06)
===================
- Prevent Django from executing a count query for pagination
0.10.3 (2015-11-05)
===================
- Prevent unicode decode errors when stripping bytes out of json blobs
0.10.2 (2015-11-05)
===================
- Log single failed documents on the cross_db migration, but don't halt it
0.10.1 (2015-11-05)
===================
- Fix invalid database serialization declaration
0.10.0 (2015-11-04)
===================
- Add harvesters:
- Portland State University
- Philadelphia College of Osteopathic
- University of Tennessee Knoxville
- Springer Link
- Kent State
- Lake Winnipeg Basin Information Network Data Hub
- VIVO
- Research Online @ University of Wollongong
- University of Cambridge
- CiteSeerX
- UKY
- University of Illinois
- Chapman
- Neurovault
- NIH
- Add Django REST API for raw and normalized documents, as well as
a postgres backend for scrAPI
- Prevent mapper parsing exceptions in elasticsearch from preventing a
document from being indexed when it is a date field that breaks
- Make pubmedcentral API endpoint point to the correct place
- Add support for running doctests
- Various other testing and test coverage improvements
- FRONTEND_KEYS is no longer a required setting, and will default to allowing
all fields
- Various documentation fixes
0.9.10 (2015-10-26)
===================
- Fix incorrect map serialization in OAI harvesters
0.9.9 (2015-10-26)
==================
- Scitech documents will now process again
0.9.8 (2015-10-20)
==================
- Fix url-gathering for pubmedcentral (after they had a schema change)
0.9.7 (2015-09-18)
==================
- Bugfixes for OAI url gathering
0.9.6 (2015-09-17)
==================
- Elasticsearch now retries requests after connection errors
- Calhoun harvester now ignores that the SSL cert is invalid
- OAI url parser now terminates the regex capture after finding an invalid DOI
character
- harvester invoke task now puts the default start date as settings.DAYS_BACK
days before the end date
- scrapi.requests now exposes the requests.exceptions module
- Update README.md with updated date information
0.9.5 (2015-09-14)
==================
- Clinical Trials harvester now dumps lxml elements to dicionaries in
the otherProperties field
0.9.4 (2015-09-10)
==================
- Biomedcentral harvester now filters out results from the future
0.9.3 (2015-09-01)
==================
- Capture more uris from pubmedcentral harvester
- Update favicons so that all favicons are .icos (fixes IE display bug)
- Fix longname for Portland State University harvester
0.9.2 (2015-09-01)
==================
- fix specification of canonicalUri requirements in schema
- update harvesters to reflect change in specification
0.9.1 (2015-09-01)
==================
- Document __repr__ no longer throw exceptions (allowing errors to be reported)
0.9.0 (2015-08-27)
==================
- Update setup documentation
- add harvesters:
- WHOAS at MBLWHOI Library
- The OAKTrust Digital Repository at Texas A&M
- DigitalCommons@PCOM
- PDXScholar
- ScholarsArchive@OSU
- stricter date/time, email, uri validation
- automated data cleaning (strip out optional values with no semantic information)
- extraction of PDF links for OAI harvesters
- more consistent date formatting
- more consistent DOI extraction
- fix URLs for auto generated OAI harvesters
- OSF harvester now sorts by correct date
0.8.4 (2015-08-21)
==================
- Fix url gathering for datacite harvester
0.8.3 (2015-08-19)
==================
- Add funding information to the crossref harvester
0.8.2 (2015-08-11)
==================
- Add harvester for Washington University Open Scholarship
0.8.1 (2015-08-06)
==================
- Scitech harvester now uses the correct start and end dates
0.8.0 (2015-07-28)
==================
- Add harvesters for Smithsonian Digital Repository, Hacettepe,
Harvard Dataverse, Cyberleninka, Howard University, Scholarworks Umass,
Inter-University Consortium for Political and Social Research
- Python 3 support
- Fix DOI harvesting for OAI harvesters
- Fix OAI harvesters having their otherProperties overwritten when they
defined a new schema.
- Fix resumption tokens in OAI harvesters
- Fix date parsing for DOE schema harvesters
- Stop JSON processor from swallowing exceptions
- Update harvesters to make their schemas more closely match the spec
0.7.6 (2015-07-10)
==================
- Fix language harvesting for DOE and OAI harvesters
0.7.5 (2015-07-10)
==================
- Fix shareok harvester (SSL verification failures ignored)
0.7.4 (2015-07-08)
==================
- Fix probabilistic test failures
0.7.3 (2015-07-07)
==================
- Add Daily SSRN harvester
0.7.2 (2015-06-30)
==================
- Make harvesters run monday-sunday by default
0.7.1 (2015-06-15)
==================
- Base OAI schema now includes DOIs as object URIs
- If a migration begins to fail due to cassandra connection errors, we now
attempt to re-establish the connection
0.7.0 (2015-06-12)
==================
- Add University of Delaware, Harvard Dash,
Data Dryad, and Iowa Research harvesters
- Update skip logic for shareok
- Rewrote cassandra models to partition data to make migrations more efficient
- Added migration script for new models
- Rewrote migrations to take advantage of celery
- Added automatic malformed XML recovery
0.6.6 (2015-06-08)
==================
- Fixed small bug in dryad where documents without URIs were created
0.6.5 (2015-06-08)
==================
- Add harvard-dash, iowa research, and data dryad harvesters
- Make migrations a little more resilient (with autoretries)
- Fix a bug with introspection into function arguments for logging
0.6.0 (2015-05-04)
==================
- Better logging
- Add tests for harvesters
- Add the rename migration script
- Add the delete migration script
- Add the Zenodo, Scholarsbank, SHARE OK, CU Scholar, Calhoun, Caltech
Authors, BHL, and CogPrints harvesters
0.5.0 (2015-04-13)
==================
- Adds the Osf harvester
0.4.0 (2015-04-10)
==================
- Data One now uses XMLHarvester
- PLoS now uses XMLHarvester
- Crossref is no longer limited to collect 1000 documents
- Add the BioMed harvester
- Requests no longer crashes when recording is turned off
- Cassandra now only stores new versions of documents, no more duplicate
versions
- Use the jsonschema library for JSON transformer
- Implement the new schema
0.2.0 (2015-03-16)
==================
- Requests made with scrapi.requests are now recorded and replayed via
cassandra
- Improved test coverage
- Removed website, see erinspace/shareregistration or osf.io/share/ for its
replacement
- Manifest system for harvesters removed and replaced with metaclassing
- Added an img/ folder that stores the favicons of providers
- Implemented the transformer system which refactors how normalize is defined
for xml based harvesters
- Removed the storage module
0.1.0 (2015-03-09)
==================
Initial release