Create a query that has no submitted text, only parameters. The parameters should include:
a URL to an input S3 bucket to which TAP has read/write access (read-only access is sufficient if the optional output bucket is provided). Start with S3 only, but potentially allow for other URLs in future, including connections to databases or pub/sub systems
a sequence of pipelines with accompanying parameters, e.g. [{"pipeline": "importAndClean", "parameters": {"cleanType": "utf8"}}, {"pipeline": "moves", "parameters": {"grammar": "reflective"}}]
an optional URL to a separate output S3 bucket to which TAP has write access. If this parameter is not supplied, the input bucket needs to be read/write. TAP will create a subdirectory __TAP_OUTPUT (if it does not already exist) in whichever bucket receives the output and write to it.
an optional input format - initially only UTF-8 TXT is supported, which is the default
an optional output format - initially only JSON is supported, which is the default, but perhaps CSV, HTML, PDF, etc. in future
NOTE: TAP should overwrite existing files. It is the user's responsibility to ensure file integrity within the buckets (for example, if TAP is called multiple times with the same URL).
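The parameter list above could be modelled roughly as follows. This is only a sketch to make the shape of the request concrete: the class name `BatchParameters`, the field names, the format labels `"txt"` and `"json"`, and the rule that at least one pipeline must be given are all assumptions, not part of TAP.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

OUTPUT_SUBDIR = "__TAP_OUTPUT"  # subdirectory name taken from the description above


@dataclass
class BatchParameters:
    """Hypothetical container for the parameters of a batch query."""
    input_url: str
    pipelines: List[Dict[str, Any]]
    output_url: Optional[str] = None
    input_format: str = "txt"    # assumed label for UTF-8 TXT (the default)
    output_format: str = "json"  # assumed label for JSON (the default)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request looks usable."""
        problems = []
        for name, url in (("input_url", self.input_url), ("output_url", self.output_url)):
            if url is None:
                continue
            parsed = urlparse(url)
            # Only S3 URLs are accepted initially; other schemes may come later.
            if parsed.scheme != "s3" or not parsed.netloc:
                problems.append(f"{name} must be an s3:// URL, got {url!r}")
        if not self.pipelines:
            # Assumption: an empty pipeline sequence is treated as an error.
            problems.append("at least one pipeline is required")
        for i, step in enumerate(self.pipelines):
            if not isinstance(step, dict) or not isinstance(step.get("pipeline"), str):
                problems.append(f"pipelines[{i}] is missing a 'pipeline' name")
            elif not isinstance(step.get("parameters", {}), dict):
                problems.append(f"pipelines[{i}] 'parameters' must be an object")
        if self.input_format != "txt":
            problems.append(f"unsupported input format {self.input_format!r}")
        if self.output_format != "json":
            problems.append(f"unsupported output format {self.output_format!r}")
        return problems

    def output_location(self) -> str:
        """Output goes to the output bucket if given, otherwise to the input bucket."""
        base = self.output_url or self.input_url
        return base.rstrip("/") + "/" + OUTPUT_SUBDIR
```

With only `input_url` set, `output_location()` points into the input bucket, which is why that bucket then needs write access.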
A batch mode query should return a single UUID that is linked to the batch job. A subsequent query (with UUID) can be made to check on the progress of the batch.
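The submit-then-poll interaction might look something like the sketch below. The function names `submit_batch` and `batch_status`, the in-memory registry, and the status fields are hypothetical; a real implementation would more likely read progress from the metadata file in the bucket.

```python
import uuid
from typing import Dict

# Hypothetical in-memory registry standing in for persistent job state.
_jobs: Dict[str, Dict[str, object]] = {}


def submit_batch(parameters: dict) -> str:
    """Register a batch job and return the single UUID that identifies it."""
    job_id = str(uuid.uuid4())
    _jobs[job_id] = {"state": "running", "documents_done": 0, "parameters": parameters}
    return job_id


def batch_status(job_id: str) -> dict:
    """Answer a follow-up query that carries the UUID of an earlier batch query."""
    job = _jobs.get(job_id)
    if job is None:
        return {"error": f"unknown batch job {job_id}"}
    return {"id": job_id, "state": job["state"], "documents_done": job["documents_done"]}
```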
The batch process should:
Create the __TAP_OUTPUT directory in the appropriate bucket (verifying permissions in the process)
Create a metadata file with the UUID of the batch job e.g. BATCH_xxxx-xxxx-xxx-xxxx.txt
Check the list of pipelines for validity
Record the start time in the metadata file and spawn the job to a new process
Return either a UUID for the batch job, or an appropriate error message
Write the output of the batch job to the __TAP_OUTPUT directory and update metadata periodically (e.g. average document size, average analysis time per document, etc)
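The steps above can be sketched end to end as follows. To stay runnable without AWS credentials, this sketch makes several substitutions that should be read as assumptions: a local directory stands in for the S3 bucket, a thread stands in for the separate process, the metadata file holds JSON, the set of known pipelines contains only the two names mentioned in this issue, and the "analysis" is a placeholder that counts characters.

```python
import json
import threading
import uuid
from datetime import datetime, timezone
from pathlib import Path

OUTPUT_SUBDIR = "__TAP_OUTPUT"
# Assumption: only the two pipeline names mentioned in this issue are known.
KNOWN_PIPELINES = {"importAndClean", "moves"}


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def _run_job(input_dir: Path, out_dir: Path, meta_path: Path, meta: dict) -> None:
    """Process each document, overwriting existing output and updating metadata."""
    sizes = []
    for doc in sorted(input_dir.glob("*.txt")):
        text = doc.read_text(encoding="utf-8")
        result = {"file": doc.name, "characters": len(text)}  # placeholder analysis
        (out_dir / (doc.stem + ".json")).write_text(json.dumps(result), encoding="utf-8")
        sizes.append(len(text))
        meta["documents_done"] = len(sizes)
        meta["average_document_size"] = sum(sizes) / len(sizes)
        meta_path.write_text(json.dumps(meta), encoding="utf-8")
    meta["finished"] = _now()
    meta_path.write_text(json.dumps(meta), encoding="utf-8")


def start_batch(input_dir, pipelines, output_dir=None) -> dict:
    """Return {"id": ..., "worker": ...} on success or {"error": ...} on failure."""
    out_dir = Path(output_dir or input_dir) / OUTPUT_SUBDIR
    job_id = str(uuid.uuid4())
    meta_path = out_dir / f"BATCH_{job_id}.txt"
    meta = {"id": job_id}
    try:
        # Creating the directory and metadata file doubles as a permission check.
        out_dir.mkdir(parents=True, exist_ok=True)
        meta_path.write_text(json.dumps(meta), encoding="utf-8")
    except OSError as exc:
        return {"error": f"cannot write to {out_dir}: {exc}"}
    unknown = [p.get("pipeline") for p in pipelines if p.get("pipeline") not in KNOWN_PIPELINES]
    if unknown:
        return {"error": f"unknown pipelines: {unknown}"}
    meta["started"] = _now()
    meta_path.write_text(json.dumps(meta), encoding="utf-8")
    # A thread is used here for simplicity; the issue calls for a new process.
    worker = threading.Thread(target=_run_job, args=(Path(input_dir), out_dir, meta_path, meta))
    worker.start()
    return {"id": job_id, "worker": worker}  # worker is returned so callers can wait on it
```

Returning the worker handle is purely a convenience for this sketch; the caller described in the issue would receive only the UUID or an error message.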