Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core Improvement] Reign in the amount of content sent to the frontend client when things are updated. #395

Closed
JackUrb opened this issue Jun 19, 2018 · 26 comments
Assignees
Labels
0.2.1 Features planned for the 2.1 release enhancement

Comments

@JackUrb
Copy link
Contributor

JackUrb commented Jun 19, 2018

Noted in #393, a lot of data gets sent to the client as an environment grows. As seen in #386, this is not a sustainable practice. Moving forward we'll need to be able to send update packets from the server rather than the entire contents of an environment's json every time a window in that json gets updated. The client should only request the complete contents when the environment changes or if an inconsistency is detected (can be done using a hash of the stored data).

@JackUrb JackUrb added enhancement 0.2.1 Features planned for the 2.1 release labels Jun 19, 2018
@JackUrb JackUrb self-assigned this Jun 19, 2018
@xssChauhan
Copy link
Contributor

@JackUrb I would like to help with this

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 2, 2018

Hi @xssChauhan - this is a fairly involved change that I've already started work for, but one segment I haven't touched is the last part (the hashing step). This involves sending a request from the web client to a new endpoint with an env id and window id and receiving in return a hash/checksum of the layout and data of that window. You can then also implement the same hash/checksum function on the web side. Does this seem like something you'd want to work on?

@xssChauhan
Copy link
Contributor

@JackUrb This is certainly interesting. Would be glad to work on it. Where do i begin?

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 3, 2018

Alright so you'll start in the server file by introducing a new endpoint. This should be something like the exists handler: https://github.com/facebookresearch/visdom/blob/ec91e738e100a9cbf946c332c7783ffd12c644b5/py/visdom/server.py#L155
It will take a window and eid in the same way:
https://github.com/facebookresearch/visdom/blob/ec91e738e100a9cbf946c332c7783ffd12c644b5/py/visdom/server.py#L506
Then it should perform a hash operation on the json stored in handler.state[eid]['jsons']. Preferably this will be an MD5 of the json string. It should then return that hash by using handler.write().

@xssChauhan
Copy link
Contributor

@JackUrb Understood. Is there a branch for this already? Or do i make one from master only?

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 4, 2018

You can start a new branch for this - it's somewhat independent of the rest of the work

@xssChauhan
Copy link
Contributor

Cool. Getting started now. Thank you for the starting pointers.

@xssChauhan
Copy link
Contributor

@JackUrb Here's my implementation for the hash handler. Will you please take a look at it and tell me your review?

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 10, 2018

@xssChauhan - feel free to open it as a pull request and I can better comment there!

@xssChauhan
Copy link
Contributor

xssChauhan commented Jul 11, 2018

@JackUrb Shall i also add a function to get the hash of window in visdom/init.py?

This function would be similar to visdom.Visdom.win_exists

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 11, 2018

We actually need the function in both __init__.py and main.js as well. Also you'll need to get the window rather than the whole json for an environment in your current implementation

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 11, 2018

The one in init should call the server endpoint, the one in main should perform its own hash

@xssChauhan
Copy link
Contributor

xssChauhan commented Jul 13, 2018

@JackUrb Yup, that's what I had in mind for the function in __init__.

You're right. JSON of a window should be hashed. Will fix it.

For md5 hashing on the JS end, should i use an existing library like md5?

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 13, 2018

Yup, that would work!

@xssChauhan
Copy link
Contributor

@JackUrb Just so that my understanding is clear, for the JS side, we need implementation for the following:

  • Requesting the server endpoint to get the hash of a window
  • A function which returns the hash of the content of a window in state.panes

This would then enable the web client to decide if the there is change in the content of a window by comparing the local and server hash. Then the client can decide to fetch the new changes.

Is my understanding correct?

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 14, 2018

Yup that's the idea

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 25, 2018

Window hashing is implemented in #422.

Next step is more complicated: determining the difference in json that is going to be created by an update or post. This would amount to creating an update package for what happens to p in both the UpdateHandler and PostHandler.

In the UpdateHandler: https://github.com/facebookresearch/visdom/blob/52c453a885836a9449056315d6452ad9a66ce115/py/visdom/server.py#L527
this update packet should include changes that happen in the update_window call. (This can be achieved by having update window return a pair of (updated_pane, update dict))

In the PostHandler: https://github.com/facebookresearch/visdom/blob/52c453a885836a9449056315d6452ad9a66ce115/py/visdom/server.py#L489
this update packet should include if an existing window pane was replaced with the new contents.

This is going to have to be created as a python dict that will convert to JSON. The implementation for this update packet is up to you. Below is a sample for a possible update packet implementation for inserting a trace:

update_packet = {
    'content': {
        'data': {
            'UPDATE_TYPE': 'append',
            'UPDATE_VALUE': new_trace_dict
        }
    }    
} 

Or for deleting a trace (that with idx 3 in the array):

update_packet = {
    'content': {
        'data': {
            3: {UPDATE_TYPE: 'delete'}
        }
    }    
} 

You can test by logging the update_packet before the return calls in both of the handlers.

@xssChauhan
Copy link
Contributor

@JackUrb Thank you for providing the details. I shall have a look and then get back.

@xssChauhan
Copy link
Contributor

@JackUrb Just to make my understanding clear:

For UpdateHandler, we need to update the update_window function so that it returns the (updated_window, updated_data) and to appropriately include it in the packet p.

This can be achieved by comparing the incoming data with the existing data on the handler and then mark the changes in the packet as well.

@JackUrb
Copy link
Contributor Author

JackUrb commented Aug 7, 2018

Hi @xssChauhan - I've done a bit of research since last posting that. I'm going to describe the new set of steps for the overall goal of this core improvement.

The simplest implementation here will be to create a diff between the old json representation of a plot and the new one. This update_packet should be returned alongside p at any return statement in the update call here: https://github.com/facebookresearch/visdom/blob/ecc1c04bb79a6a327862a236f78c095cb6b80184/py/visdom/server.py#L527

  1. We should create update_packet to match an existing dict/json diff implementation, such as this one. As we already have an idea of what's going to be updated, it makes sense to be able do this on the python end.

  2. That would make this line read p, update_packet = UpdateHandler.update(p, args) here: https://github.com/facebookresearch/visdom/blob/ecc1c04bb79a6a327862a236f78c095cb6b80184/py/visdom/server.py#L609

  3. Following that we'll need to send the update_packet along with the final hash via the broadcast method that is called here (rather than sending p): https://github.com/facebookresearch/visdom/blob/ecc1c04bb79a6a327862a236f78c095cb6b80184/py/visdom/server.py#L612
    To do that we'll have to wrap it in a packet like {'command': 'window_update', 'win': args['win'], 'env': eid, 'content': update_packet, 'final_hash': <hash for window>}.

  4. Lastly, a case will need to be created for 'window_update' in _handleMessage in the js here: https://github.com/facebookresearch/visdom/blob/0b89c92a4edf61a1bab6306bfd553627ffc21dd9/js/main.js#L255
    This case should apply the patch to the correct window on the javascript side using the above library. After that it should calculate the local hash for the window and compare it to the final_hash. If applying the patch fails or the hash comparison fails, the client should request the complete data using postForEnv, otherwise the update should be queued with addPaneBatched.

At this point no individual step of this will be self contained, the whole thing will need to be implemented to merge it, but if you get started on this path you can open a PR and then I can contribute as well whenever I get a chance.

@xssChauhan
Copy link
Contributor

@JackUrb Thank you for describing in detail. I shall get started, and open a PR soon. We can have further discussions there.

@amitibo
Copy link
Contributor

amitibo commented Nov 7, 2018

Maybe it will help (for other contributors) to elaborate what is the state of this PR and what is still missing.

Also, is there a benchmark showing that the sending/receiving of diffs is worth the computation overhead introduced in this PR (hashing / calculating diff / applying diff etc.)?

@JackUrb
Copy link
Contributor Author

JackUrb commented Nov 9, 2018

@amitibo Of course. The current state of this development is as follows.

There now exists infrastructure to ensure parity between the server and the frontend client at each update by creating and comparing hashes of the data for an object. This means that the consistency checks are in place to move towards sending plot updates rather then the entirety of the plot data when something changes. There are 2 primary ways that this can be implemented.

  1. Update the frontend data management to be able to handle all of the different update types that the server does in the same way. On the upside this would have little computation on the server-side but on the downside it would result in a lot of duplicated logic, and changes to how the server handles update packets would have to also be reflected in the frontend data management. (I don't personally prefer this solution)
  2. Have the server calculate a packet that would represent the changes needed to be made to a window's underlying data and layout structures. This could be done naively (by copying the data before applying an update, applying the update, then using some kind of object diff between the old and the new), or (preferably) it can be directly coded using knowledge of how the update happens serverside (and manually creating a diff packet that can be consumed by some existing JS json diff library).

I haven't benchmarked this change, but now that we have webgl support for plots with huge amounts of points in #431, and I'm trying to introduce image data over time in #393, and we've already seen data send issues on long running servers with large plots in #386, I'm fairly confident that the IO gains would be worthwhile. The change would also expose logic on the frontend for applying specific updates to plot data, which should help expose a way to solve longtime issue #171.

@amitibo
Copy link
Contributor

amitibo commented Nov 10, 2018

@JackUrb maybe you have some example json files that can demonstrate such a situation, or an example (python) script that creates such files? It can help in debugging/testing/benchmarking this change.

@amitibo
Copy link
Contributor

amitibo commented Nov 10, 2018

Also, I understand that there is a problem finding a json-diff libraries for JS and Python that share the same diff format. Maybe a simple text diff tool can do the job: https://github.com/google/diff-match-patch

@JackUrb
Copy link
Contributor Author

JackUrb commented Jan 22, 2019

Implemented in #552

@JackUrb JackUrb closed this as completed Jan 22, 2019
facebook-github-bot pushed a commit that referenced this issue Apr 4, 2019
Summary:
Closes #286 by implementation.
Fixes #369 for real.

Adds a way to store image history. The `store_history` option, when provided repeatedly for the same window will allow for appending multiple images to the same pane, accessed using a slider. The most recently sent image will be selected by default, and updates from the python client will reset the selection to the most recent. This doesn't break any existing behavior:
<img width="1271" alt="screen shot 2018-06-19 at 5 42 28 pm" src="https://user-images.githubusercontent.com/1276867/41627014-87e384ac-73ec-11e8-85bc-f3473f5f8ca3.png">
This also begins the process for adding 'widgets' to arbitrary panes, allowing for functionality like long requested in #25. (Standardizing these widgets and allowing users to attach them to variables in the plot/layout state will be the last part of implementing #25)

Full implementation flow:
- Adds an option to `image` that attaches a store_history option to the request. Sends the request to the update endpoint if the window already exists. (Note: It'd be nice to make it so that the server doesn't require the `store_history` flag every time an additional image is sent, but it complicates the flow a lot. May revisit down the road)
- Creates flows in the create and update handlers of the server that recognize `image_history` types and append data to the content rather than making it new.
- Adds logic to `ImagePane` that allow them to track the state of the slider, if it exists
- Adds logic to `Panes` that allows specialized panes to render additional content 'widgets' at the bottom of the pane.

Gotchas:
- ~~This PR doesn't have a demo, will include in a follow-up~~ Added in #593
- This PR doesn't have feedback sent back to the python client on slider updates, will implement in a follow-up
- This PR doesn't have a method for the python client to select a slider value, will implement in a follow-up alongside a demo that can sync the contents of multiple image panes to one slider!
- ~~This PR brings up the issue of the size of content being sent during window updates to a head, as each additional image added to a history forces a send of the entire history every single time. Furthermore each update to any window in the same environment will force this same send.~~ This was implemented in #395
- This PR doesn't properly handle margins, even though I can calculate how tall I want them to be. Need to review some CSS magic.
<img width="1268" alt="screen shot 2018-06-19 at 5 50 16 pm" src="https://user-images.githubusercontent.com/1276867/41627411-3652d406-73ee-11e8-86b5-5236ebff6b28.png">
Pull Request resolved: #393

Differential Revision: D14787240

Pulled By: JackUrb

fbshipit-source-id: 6452df6b12b759030efaa254a4cf3f2d4aadb724
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.2.1 Features planned for the 2.1 release enhancement
Projects
None yet
Development

No branches or pull requests

3 participants