
Improve OMERO.table download usability #300

Merged: 9 commits merged into ome:master on Aug 19, 2021

Conversation

will-moore (Member) commented Jun 23, 2021:

This improves the experience and options when downloading OMERO.tables as CSV from /webclient/omero_table/FILEID/.

  • Two buttons allow either: download of the whole table (showing progress etc.), or download of the currently displayed page (opens the URL directly, same behaviour as before).
  • If the table is being filtered (e.g. ?query=colname>value), we offer a further option to download the whole table using the current filter (which wasn't possible before), also showing progress.
  • Downloading with progress is performed by loading a batch of rows at a time into the browser, showing the total size downloaded and the percentage of total rows (see screenshot; a rough sketch of this approach follows below). When all rows have been downloaded, we use a Blob to save the assembled CSV string for the user.

Screenshot 2021-06-23 at 14 26 45
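
For reference, here is a rough sketch of the batched-download approach (illustrative only, not the exact template code in this PR; the JSON shape, the data.rows / meta fields, and the helper names are assumptions):

// Illustrative sketch: download all rows in batches, report progress, then save via Blob.
// Assumes the table endpoint accepts offset/limit params and returns JSON shaped like
// { data: { rows: [...] }, meta: {...} } — the real code in the PR may differ.
async function downloadTableAsCsv(tableUrl, totalRows, batchRows, onProgress) {
  let csvLines = [];
  for (let offset = 0; offset < totalRows; offset += batchRows) {
    const rsp = await fetch(`${tableUrl}?offset=${offset}&limit=${batchRows}`);
    if (!rsp.ok) {
      throw new Error(`Batch at offset ${offset} failed: ${rsp.status}`);
    }
    const json = await rsp.json();
    // Each row becomes one CSV line (naive join; real code should escape values).
    csvLines = csvLines.concat(json.data.rows.map((row) => row.join(",")));
    onProgress(Math.min(offset + batchRows, totalRows), totalRows);
  }
  // Assemble a Blob from the CSV string and trigger a browser download.
  const blob = new Blob([csvLines.join("\n")], { type: "text/csv" });
  const link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = "table.csv";
  link.click();
  URL.revokeObjectURL(link.href);
}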

To test UI:

To test other URLs:

cc @chris-allan

omeroweb/settings.py — outdated review thread (resolved)
@@ -2963,6 +2963,12 @@ def _table_query(request, fileid, conn=None, query=None, lazy=False, **kwargs):
if request.GET.get("limit") is not None
else rows
)
if limit > settings.MAX_TABLE_DOWNLOAD_ROWS:
Member:

I know we've repeated the pattern of returning HTTP 200 with an error in a number of places, but is that something we actually want to repeat here? Perhaps we could use HTTP 429 [1]? Might be outside the scope here, though.

  1. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429

Member:

Another potential candidate might be HTTP 413 (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/413)?

Member:

Also an option, @sbesson. Again possibly outside the scope of this PR but having a consistent and easily identifiable way of signaling to users that they are asking too much of the server at any one time is probably a very good idea.

Member Author:

Since _table_query() is used directly as a view method, table_query = login_required()(jsonp(_table_query)) (although we don't think we use that URL directly), it would have to return a custom HttpResponse if we want any status other than 200.

However, since it is also called by other view methods such as object_table_query(), via tableData = _table_query(request, annotation["file"], conn, **kwargs), we'd have to check there for:

if isinstance(tableData, HttpResponse):
    return tableData

The alternative is to have the @jsonp decorator handle a dict with error and status keys and update the HttpResponse it returns accordingly. This could lead to unexpected results if you really wanted to return a JsonResponse with error and status keys but a different actual status code (unlikely). It keeps the view methods a bit cleaner, but maybe it's too much 'magic' behaviour for @jsonp, and we might want to get rid of the json-padding callback behaviour at some point.

Any strong feelings either way?
It's probably better to handle this in a different PR, though, since it's orthogonal to the changes here.

Member Author:

Opened a separate PR for discussion there: #301

will-moore (Member Author):

@chris-allan Other than the response-status discussion above, do you (or others at GS) have any other feedback on this, and/or are you able to test? (see description)

chris-allan (Member):

Currently testing on our infrastructure with tables containing ~500K fairly sizeable rows. @erindiel will provide some feedback, @will-moore.

erindiel (Member) commented Jul 8, 2021:

Thanks for these changes @will-moore. We have a few points of feedback:

  • The first status shows "NaN undefined" rather than nothing, 0, etc.
    Screen Shot 2021-07-08 at 10 57 43 AM
  • Should there be a way for the user to cancel the download (other than closing their browser window)?
  • We tested with a MAX_BATCH_ROWS of 3000, which was more reasonable for large tables: each batch of 3k rows (~2 MB) downloaded in ~2 sec.
  • It is still possible to get an incomplete CSV without notification to the user. We saw an example where a series of batches failed, but the user simply sees the progress bar moving quickly. If some batches succeed before and/or after, a CSV with a subset of the rows will be downloaded. Thoughts on retrying and/or failing completely if any batch fails?
    Screen Shot 2021-07-08 at 10 48 37 AM

will-moore (Member Author):

I've added a Cancel button and fixed the other 2 minor points.
But the retrying needs a bit more thought.
If it's quite a rare event, then I think it'd be OK to abort and just say "Try again".
But if it happens more frequently, then retrying the same chunk sounds like a good idea. Maybe try 3 or 5 times per chunk and abort if it still hasn't worked. Or maybe just keep retrying each chunk for as long as the user is prepared to wait?
I could keep track of the total number of failed chunks and, if it's over a certain limit (e.g. 1.5 times the number of successful chunks), ask the user if they want to continue?

will-moore (Member Author):

In the last commit, I've handled failed requests by simply re-trying them.
So if you get requests failing at a high rate, the download will just appear to be going a bit slower.
If you get a request that NEVER returns, then the download will appear to be stuck (until you Cancel).
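
Roughly, fetching a single batch now looks something like this (a simplified sketch for illustration rather than the exact code; the function and variable names are made up):

// Illustrative sketch: keep retrying one batch until it succeeds or the user cancels.
// `cancelled` would be set to true by the Cancel button's click handler.
let cancelled = false;

async function fetchBatchWithRetry(url, retryDelayMs = 1000) {
  while (!cancelled) {
    try {
      const rsp = await fetch(url);
      if (rsp.ok) {
        return await rsp.json();
      }
    } catch (err) {
      // Network error: fall through and retry.
    }
    // Pause briefly before retrying, so we don't hammer the server.
    await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
  }
  throw new Error("Download cancelled");
}
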
Any suggestions for different ways to handle this?

I've not spent too much time on making this UI look slick. Hopefully the Cancel button is clear enough:

Screenshot 2021-07-08 at 23 21 50

chris-allan (Member):

The errors @erindiel was seeing on our testing system were a direct result of the race condition seen by @kkoz and outlined in ome/omero-py#292.

Traceback for reference:

2021-07-08 15:57:38,490 ERROR [                 omeroweb.feedback.views] (proc.03125) handler500():163 handler500: Server error
2021-07-08 15:57:38,491 ERROR [                 omeroweb.feedback.views] (proc.03125) handler500():166 Traceback (most recent call last):

  File "/opt/v2-demo-dev/venv36/lib64/python3.6/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)

  File "/opt/v2-demo-dev/venv36/lib64/python3.6/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)

  File "/opt/v2-demo-dev/venv36/lib64/python3.6/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)

  File "/opt/v2-demo-dev/venv36/lib64/python3.6/site-packages/omeroweb/decorators.py", line 538, in wrapped
    retval = f(request, *args, **kwargs)

  File "/opt/v2-demo-dev/venv36/lib64/python3.6/site-packages/omeroweb/decorators.py", line 597, in wrapper
    context = f(request, *args, **kwargs)

  File "/opt/v2-demo-dev/venv36/lib64/python3.6/site-packages/omeroweb/webclient/views.py", line 3190, in omero_table
    request, file_id, conn=conn, query=query, offset=offset, limit=limit, lazy=lazy

  File "/opt/v2-demo-dev/venv36/lib64/python3.6/site-packages/omeroweb/webgateway/views.py", line 2936, in _table_query
    cols = t.getHeaders()

  File "/opt/v2-demo-dev/venv36/lib64/python3.6/site-packages/omero_Tables_ice.py", line 1052, in getHeaders
    return _M_omero.grid.Table._op_getHeaders.invoke(self, ((), _ctx))

omero.InternalException: exception ::omero::InternalException
{
    serverStackTrace = Traceback (most recent call last):
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/util/decorators.py", line 69, in exc_handler
    rv = func(*args, **kwargs)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/util/decorators.py", line 29, in handler
    return func(*args, **kwargs)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/tables.py", line 205, in getHeaders
    rv = self.storage.cols(None, current)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/util/decorators.py", line 91, in with_lock
    return func(*args, **kwargs)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/hdfstorageV2.py", line 423, in cols
    names = self.__mea.colnames
AttributeError: 'NoneType' object has no attribute 'colnames'

    serverExceptionClass =
    message = Internal exception
}

Once we fix that with ome/omero-py#292 and make a release, I think errors will be a rare event. Right now they're a given on almost every download on a production-like system setup.

/cc @joshmoore

will-moore (Member Author):

@erindiel @chris-allan Let me know if you need any other fixes/updates to this: I'm around till Thursday.

erindiel (Member):

Thanks @will-moore.

We confirmed that we can cancel the download, and the UI for this is clear.

We can recreate the race condition by downloading the same table in two separate windows, and we confirmed that the retries fix the issue of missed rows.

What do you think about retrying ~5 times, then failing the entire download? Multiple failed requests on the same batch likely reflect a more fundamental issue with OMERO.tables or the particular file (although you'd be unlikely to reach this point if that were the case).

Overall, a great improvement in user experience, so thanks again!

will-moore (Member Author):

@erindiel I just show a simple alert() dialog with this message:
Screenshot 2021-07-14 at 13 51 12

will-moore (Member Author):

How's this PR looking / working for you, @erindiel?
Anything else needed, or is it good to merge?

erindiel (Member):

Everything here is good from our perspective, thanks @will-moore.

@@ -687,6 +687,13 @@ def leave_none_unset_int(s):
"Prevent multiple files with total aggregate size greater than this "
"value in bytes from being downloaded as a zip archive.",
],
"omero.web.max_table_download_rows": [
Member:

When this is released, a corresponding release of the documentation will be required so that this new parameter appears in the docs.

const rowCount = filter ? parseInt("{{ meta.totalCount }}") : parseInt("{{ meta.rowCount }}");
const tableName = "{{ data.name }}.csv";
// Use 10 batches, or max 3000 rows per batch
const MAX_BATCH_ROWS = 3000;
Member:

Do we want these values to be configurable too?

Member Author:

I'm not sure we need to make it configurable unless we know it's useful. Adding code that will likely never be used (and testing it, etc.) seems like a poor use of time.
Like any feature, if we find there's a user who wants or needs it, we can add it later.

Member:

It's probably also dangerous. Using any more than 3000 rows is likely to cause server problems.

Member:

Did you try with other values to see a change in performance? Or is this the "optimal" value?

Member Author:

I think @erindiel tested various values and reported 3000 as the best at #300 (comment).
Previously it was 1000 (see a3c1c48).

Member:

thanks

jburel (Member) commented Aug 19, 2021:

Thanks all. Merging.

jburel merged commit ed8ba63 into ome:master on Aug 19, 2021.