4865 file reingest api #4914

Merged: 13 commits, Aug 6, 2018
40 changes: 29 additions & 11 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -503,22 +503,27 @@ The review process can sometimes resemble a tennis match, with the authors submi


Files
~~~~~
-----

.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
Adding Files
~~~~~~~~~~~~

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* ::
.. Note:: Files can be added via the native API but the operation is performed on the parent object, which is a dataset. Please see the Datasets_ endpoint above for more information.

GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
Accessing (downloading) files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. Note:: Access API has its own section in the Guide: :doc:`/api/dataaccess`

Adding Files
^^^^^^^^^^^^
**Note** Data Access API calls can now be made using persistent identifiers (in addition to database ids). This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* ::

GET http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB
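
As a sketch for readers following along, the same request can be issued with curl (``$SERVER`` is a placeholder, as in the guide's other curl examples)::

    curl "http://$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB"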

.. note:: Please note that files can be added via the native API but the operation is performed on the parent object, which is a dataset. Please see the "Datasets" endpoint above for more information.

Restrict Files
^^^^^^^^^^^^^^
~~~~~~~~~~~~~~

Restrict or unrestrict an existing file where ``id`` is the database id of the file or ``pid`` is the persistent id (DOI or Handle) of the file to restrict. Note that some Dataverse installations do not allow files to be restricted.

@@ -531,7 +536,7 @@ A curl example using a ``pid``::
curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -d true http://$SERVER/api/files/:persistentId/restrict?persistentId={pid}

Replacing Files
^^^^^^^^^^^^^^^
~~~~~~~~~~~~~~~

Replace an existing file where ``id`` is the database id of the file to replace or ``pid`` is the persistent id (DOI or Handle) of the file. Note that metadata such as description and tags are not carried over from the file being replaced.

@@ -610,10 +615,23 @@ Example python code to replace a file. This may be run by changing these parame
Uningest a File
~~~~~~~~~~~~~~~

Reverse the ingest process performed on a file where ``id`` is the database id of the file to process. Note that this requires "super user" credentials::
Reverse the tabular data ingest process performed on a file where ``{id}`` is the database id of the file to process. Note that this requires "super user" credentials::

POST http://$SERVER/api/files/{id}/uningest?key={apiKey}
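
For illustration, that call can be made with curl like so (the file id ``24`` is an assumption; ``$SERVER`` and ``$API_TOKEN`` follow the placeholders used elsewhere in this guide)::

    curl -X POST "http://$SERVER/api/files/24/uningest?key=$API_TOKEN"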


Reingest a File
~~~~~~~~~~~~~~~

Attempt to ingest an existing datafile as tabular data. This API can be used on a file that was not ingested as tabular back when it was uploaded. For example, a Stata v.14 file that was uploaded before ingest support for Stata 14 was added (in Dataverse v.4.9). It can also be used on a file that failed to ingest due to a bug in the ingest plugin that has since been fixed (hence the name "re-ingest").

Note that this requires "super user" credentials::

POST http://$SERVER/api/files/{id}/reingest?key={apiKey}

POST http://$SERVER/api/files/{id}/uningest?key=$apiKey
(``{id}`` is the database id of the file to process)

Also note that, at present, the API cannot be used on a file that's already ingested as tabular.
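
As a sketch, the new endpoint can be exercised with curl (file id ``24`` is an assumption; the success message mirrors the one returned by the ``reingest`` method in ``Files.java`` below)::

    curl -X POST "http://$SERVER/api/files/24/reingest?key=$API_TOKEN"

On success the result is presumably wrapped in the API's standard JSON envelope, e.g. ``{"status":"OK","data":{"message":"Datafile 24 queued for ingest"}}``.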

Provenance
~~~~~~~~~~
1 change: 1 addition & 0 deletions scripts/database/upgrades/upgrade_v4.9.1_to_v4.9.2.sql
@@ -1 +1,2 @@
ALTER TABLE datavariable ADD COLUMN factor BOOLEAN;
ALTER TABLE ingestrequest ADD COLUMN forceTypeCheck BOOLEAN;
2 changes: 1 addition & 1 deletion src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -2671,7 +2671,7 @@ public String save() {

// Call Ingest Service one more time, to
// queue the data ingest jobs for asynchronous execution:
ingestService.startIngestJobs(dataset, (AuthenticatedUser) session.getUser());
ingestService.startIngestJobsForDataset(dataset, (AuthenticatedUser) session.getUser());

//After dataset saved, then persist prov json data
if(systemConfig.isProvCollectionEnabled()) {
@@ -1347,7 +1347,7 @@ public String save() {
// Call Ingest Service one more time, to
// queue the data ingest jobs for asynchronous execution:
if (mode == FileEditMode.UPLOAD) {
ingestService.startIngestJobs(dataset, (AuthenticatedUser) session.getUser());
ingestService.startIngestJobsForDataset(dataset, (AuthenticatedUser) session.getUser());
}

if (mode == FileEditMode.SINGLE && fileMetadatas.size() > 0) {
83 changes: 83 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/Files.java
@@ -4,13 +4,15 @@
import com.google.gson.JsonObject;
import edu.harvard.iq.dataverse.DataFile;
import edu.harvard.iq.dataverse.Dataset;
import edu.harvard.iq.dataverse.DatasetLock;
import edu.harvard.iq.dataverse.DatasetServiceBean;
import edu.harvard.iq.dataverse.DatasetVersionServiceBean;
import edu.harvard.iq.dataverse.DataverseRequestServiceBean;
import edu.harvard.iq.dataverse.DataverseServiceBean;
import edu.harvard.iq.dataverse.EjbDataverseEngine;
import edu.harvard.iq.dataverse.UserNotificationServiceBean;
import static edu.harvard.iq.dataverse.api.AbstractApiBean.error;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.authorization.users.User;
import edu.harvard.iq.dataverse.datasetutility.AddReplaceFileHelper;
import edu.harvard.iq.dataverse.datasetutility.DataFileTagException;
@@ -24,11 +26,15 @@
import edu.harvard.iq.dataverse.engine.command.impl.UningestFileCommand;
import edu.harvard.iq.dataverse.export.ExportException;
import edu.harvard.iq.dataverse.export.ExportService;
import edu.harvard.iq.dataverse.ingest.IngestRequest;
import edu.harvard.iq.dataverse.ingest.IngestServiceBean;
import edu.harvard.iq.dataverse.settings.SettingsServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.FileUtil;
import edu.harvard.iq.dataverse.util.StringUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ResourceBundle;
import java.util.logging.Level;
@@ -328,6 +334,83 @@ public Response uningestDatafile(@PathParam("id") String id) {
}

}

// reingest attempts to queue an *existing* DataFile
// for tabular ingest. It can be used on non-tabular datafiles: to try to
// ingest a file that has previously failed ingest, or to ingest a file of a
// type for which ingest was not previously supported.
// We are considering making it possible, in the future, to reingest
// a datafile that's already ingested as Tabular; for example, to address a
// bug that has been found in an ingest plugin.

@Path("{id}/reingest")
@POST
public Response reingest(@PathParam("id") String id) {

AuthenticatedUser u;
try {
u = findAuthenticatedUserOrDie();
if (!u.isSuperuser()) {
return error(Response.Status.FORBIDDEN, "This API call can be used by superusers only");
}
} catch (WrappedResponse wr) {
return wr.getResponse();
}

DataFile dataFile;
try {
dataFile = findDataFileOrDie(id);
} catch (WrappedResponse ex) {
return error(Response.Status.NOT_FOUND, "File not found for given id.");
}

Dataset dataset = dataFile.getOwner();

if (dataset == null) {
return error(Response.Status.BAD_REQUEST, "Failed to locate the parent dataset for the datafile.");
}

if (dataFile.isTabularData()) {
return error(Response.Status.BAD_REQUEST, "The datafile is already ingested as Tabular.");
}

boolean ingestLock = dataset.isLockedFor(DatasetLock.Reason.Ingest);

if (ingestLock) {
return error(Response.Status.FORBIDDEN, "Dataset already locked with an Ingest lock");
}

if (!FileUtil.canIngestAsTabular(dataFile)) {
return error(Response.Status.BAD_REQUEST, "Tabular ingest is not supported for this file type (id: "+id+", type: "+dataFile.getContentType()+")");
}

dataFile.SetIngestScheduled();

if (dataFile.getIngestRequest() == null) {
dataFile.setIngestRequest(new IngestRequest(dataFile));
}

dataFile.getIngestRequest().setForceTypeCheck(true);

// update the datafile, to save the new ingest request in the database:
dataFile = fileService.save(dataFile);

// queue the data ingest job for asynchronous execution:
String status = ingestService.startIngestJobs(new ArrayList<>(Arrays.asList(dataFile)), u);

if (!StringUtil.isEmpty(status)) {
// This most likely indicates some sort of a problem (for example,
// the ingest job was not put on the JMS queue because of the size
// of the file). But we are still returning the OK status - because
// from the point of view of the API, it's a success - we have
// successfully gone through the process of trying to schedule the
// ingest job...

return ok(status);
}
return ok("Datafile " + id + " queued for ingest");

}

/**
* Attempting to run metadata export, for all the formats for which we have
@@ -345,7 +345,7 @@ DepositReceipt replaceOrAddFiles(String uri, Deposit deposit, AuthCredentials au
throw returnEarly("EJBException: " + sb.toString());
}

ingestService.startIngestJobs(dataset, user);
ingestService.startIngestJobsForDataset(dataset, user);

ReceiptGenerator receiptGenerator = new ReceiptGenerator();
String baseUrl = urlManager.getHostnamePlusBaseUrlPath(uri);
@@ -1808,7 +1808,7 @@ private boolean step_100_startIngestJobs(){
// start the ingest!
//

ingestService.startIngestJobs(dataset, dvRequest.getAuthenticatedUser());
ingestService.startIngestJobsForDataset(dataset, dvRequest.getAuthenticatedUser());

msg("post ingest start");
return true;
@@ -42,7 +42,6 @@ public IngestMessage() {

public IngestMessage(int messageLevel) {
this.messageLevel = messageLevel;
//dataFiles = new ArrayList<DataFile>();
datafile_ids = new ArrayList<Long>();
}

@@ -52,7 +51,6 @@ public IngestMessage(int messageLevel) {
private Long datasetVersionId;
private String versionNote;
private String datasetVersionNumber;
//private List<DataFile> dataFiles;
private List<Long> datafile_ids;

public String getVersionNote() {
@@ -114,5 +112,4 @@ public void setFileIds(List<Long> file_ids) {
public void addFileId(Long file_id) {
datafile_ids.add(file_id);
}

}
@@ -68,9 +68,7 @@ public void onMessage(Message message) {

Iterator iter = ingestMessage.getFileIds().iterator();
datafile_id = null;
// TODO:
// is it going to work if multiple files are submitted for ingest?
// -- L.A. Aug. 13 2014

while (iter.hasNext()) {
datafile_id = (Long) iter.next();

23 changes: 20 additions & 3 deletions src/main/java/edu/harvard/iq/dataverse/ingest/IngestRequest.java
@@ -38,9 +38,6 @@ public void setId(Long id) {
this.id = id;
}

//@ManyToOne
//@JoinColumn(nullable=false)

@OneToOne(cascade={CascadeType.MERGE,CascadeType.PERSIST})
@JoinColumn(name="datafile_id")
private DataFile dataFile;
@@ -51,6 +48,15 @@ public void setId(Long id) {

private String labelsFile;

private Boolean forceTypeCheck;

public IngestRequest() {
}

public IngestRequest(DataFile dataFile) {
this.dataFile = dataFile;
}

public DataFile getDataFile() {
return dataFile;
}
@@ -83,6 +89,17 @@ public void setLabelsFile(String labelsFile) {
this.labelsFile = labelsFile;
}

public void setForceTypeCheck(boolean forceTypeCheck) {
this.forceTypeCheck = forceTypeCheck;
}

// forceTypeCheck is persisted as a nullable Boolean (the column added by the
// v4.9.1_to_v4.9.2 upgrade script); treat an unset value as false:
public boolean isForceTypeCheck() {
if (forceTypeCheck != null) {
return forceTypeCheck;
}
return false;
}

@Override
public int hashCode() {
int hash = 0;