
Merlion dashboard app #129

Merged: 61 commits from the dashboard branch into main on Nov 8, 2022

Conversation

yangwenzhuo08
Contributor

@yangwenzhuo08 yangwenzhuo08 commented Oct 25, 2022

This PR implements a web-based visualization dashboard for Merlion. Users can get it set up by installing Merlion with the optional dashboard dependency, i.e. pip install salesforce-merlion[dashboard]. Then, they can start it up with python -m merlion.dashboard, which will start up the dashboard on port 8050. The dashboard has 3 tabs: a file manager where users can upload CSV files & visualize time series; a forecasting tab where users can try different forecasting algorithms on different datasets; and an anomaly detection tab where users can try different anomaly detection algorithms on different datasets. This dashboard thus provides a no-code interface for users to rapidly experiment with different algorithms on their own data, and examine performance both qualitatively (through visualizations) and quantitatively (through evaluation metrics).

We also provide a Dockerfile which runs the dashboard as a microservice on port 80. The Docker image can be built with docker build . -t merlion-dash -f docker/dashboard/Dockerfile from the Merlion root directory. It can be deployed with docker run -dp 80:80 merlion-dash.

Contributor

@aadyotb aadyotb left a comment


Thanks for the contribution, Wenzhuo! Here are some of my initial comments:

  1. Can you update the PR description to outline what you've done, how the code is organized, and what the major files are for?
  2. Why is the dashboard a separate folder, rather than part of the Merlion package? Specifically, I'm wondering if it's possible to do something like python -m merlion.dashboard instead of python app.py to launch the app server. You could include a subprocess call in merlion/dashboard/__main__.py like this StackOverflow answer (see the sketch after this list). Note that you could list dashboard as an optional dependency in Merlion's setup.py and throw an ImportError in merlion/dashboard/__init__.py if the dashboard dependencies are not installed. If you do this, please also change all relative import paths to absolute paths.
  3. Can you install pre-commit and make sure the formatting & copyright headers are applied to all python files? See here.
  4. Can you provide an overview of this dashboard in the repo's main README.md? I'm thinking you can add this as a new section before "Getting Started". And you can reproduce the same information in docs/source/index.rst.
  5. What is the purpose of test_anomaly.py and test_forecast.py? Seems like they are redundant with existing test coverage. test_models.py makes sense though, since it's testing your new model classes.
  6. Can you move the new tests to the main tests folder instead of dashboard/tests? This also follows from point (2).
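
For reference, here is a minimal sketch of what point (2) could look like. The file layout and the merlion.dashboard.server module / app attribute are assumptions for illustration, not necessarily how this PR ends up organizing the code:

```python
# merlion/dashboard/__init__.py (sketch): fail fast with a helpful message if the
# optional dashboard dependencies are not installed.
try:
    import dash  # noqa: F401
except ImportError as e:
    raise ImportError(
        "Dashboard dependencies are not installed. "
        "Install them with `pip install salesforce-merlion[dashboard]`."
    ) from e

# merlion/dashboard/__main__.py (sketch): lets `python -m merlion.dashboard` start the
# app server directly, instead of requiring `python app.py` from a separate folder.
# (Assumes the Dash app object lives in merlion/dashboard/server.py as `app`.)
from merlion.dashboard.server import app

if __name__ == "__main__":
    app.run_server(host="0.0.0.0", port=8050)
```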

@aadyotb
Contributor

aadyotb commented Oct 27, 2022

@yangwenzhuo08 thanks for your changes! This looks great. I've finished what you started in terms of restructuring the module. Now, merlion.dashboard is fully integrated into Merlion itself. The dashboard's dependencies have been added as optional requirements in setup.py, so the user can install the dashboard with pip install salesforce-merlion[dashboard]. The user may manually start up the dashboard with python -m merlion.dashboard, or serve it via Gunicorn with gunicorn -b 0.0.0.0:80 merlion.dashboard.server:server. Additionally, the dashboard is now able to handle exogenous regressors.

In terms of my original comments, can you add the documentation I requested previously? Besides this, I have a couple of new requests.

  1. Would it be possible for you to unify the train/test interface for anomaly detection and forecasting? I think both tasks should allow the user to either (a) upload separate train/test files, or (b) upload a single file and choose a train/test split.
  2. Can you allow max_forecast_steps = None to be a valid specification? It's actually the default setting for most models and is necessary for long-horizon forecasting.
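
For reference, a small hedged example of the configuration point (2) asks the dashboard to accept, based on Merlion's Arima forecaster (the order argument is arbitrary and only for illustration):

```python
from merlion.models.forecast.arima import Arima, ArimaConfig

# max_forecast_steps=None removes the cap on the forecast horizon; it is the default
# for most Merlion forecasters and is what long-horizon forecasting needs.
model = Arima(ArimaConfig(max_forecast_steps=None, order=(4, 1, 2)))
```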

@yangwenzhuo08
Contributor Author

@aadyotb Thanks for the revision. For the forecasting tab, we can split the train file and test file the same way the anomaly tab does. However, I'm not sure what the best layout is for combining these two UIs (uploading two separate files vs. uploading a single file with a split fraction). Do you have suggestions on the UI design for this part? For forecasting it may be straightforward, e.g., two dropdown lists, one for the train file and the other for the test file, plus a slider to set the split fraction used to split the training data into "train" and "validation". But for anomaly detection, such a split is problematic when the number of labels is small, i.e., the resulting validation dataset may contain no anomalies.

@aadyotb
Contributor

aadyotb commented Oct 28, 2022

@yangwenzhuo08 I envision something like the following: you can have a radio box which can select "use same file for train/test" or "use separate test file". If you select "use same file for train/test", you get the slider where you specify the train/test fraction. If you select "use separate test file", you get a prompt to choose the test file. If you specify "use separate test file", the module should throw an error if the test data is not given. What do you think?

And in terms of anomaly detection, it's kind of a well-known issue that the labels are sparse. The evaluation metrics are implemented in such a way that they have reliable fallback options if there are no true positives present in the data. Maybe you can use the plot_anoms helper function in merlion.plot to plot the ground truth anomalies (if they are specified), and then also report the evaluation metrics on both train and test?
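
A rough sketch of that suggestion, following the plot_anoms usage pattern from Merlion's README; the synthetic data and the choice of DefaultDetector stand in for whatever the dashboard actually loads and trains:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from merlion.models.defaults import DefaultDetector, DefaultDetectorConfig
from merlion.plot import plot_anoms
from merlion.utils import TimeSeries

# Synthetic stand-ins for the CSVs the dashboard's file manager would load.
idx = pd.date_range("2022-01-01", periods=500, freq="H")
values = pd.DataFrame({"value": np.random.randn(500).cumsum()}, index=idx)
labels = pd.DataFrame({"anomaly": np.zeros(500, dtype=int)}, index=idx)
labels.iloc[400:410] = 1  # a hand-placed "ground truth" anomaly window

train_data = TimeSeries.from_pd(values.iloc[:300])
test_data = TimeSeries.from_pd(values.iloc[300:])
test_labels = TimeSeries.from_pd(labels.iloc[300:])

# Train a detector, plot its anomaly scores, then shade the ground-truth anomalies
# on the same axes with plot_anoms.
model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data)
fig, ax = model.plot_anomaly(time_series=test_data)
plot_anoms(ax=ax, anomaly_labels=test_labels)
plt.show()
```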

@yangwenzhuo08
Contributor Author

So the layout is like this:

  1. A radio button to select "single file" or "separate"
  2. A dropdown list to select the train file
  3. If "single" is selected, a slider is shown to set the split ratio. If "separate" is selected, a dropdown list is shown for choosing the test file.

Is this OK?

@aadyotb
Contributor

aadyotb commented Oct 28, 2022

Yes, this sounds good.
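
For what it's worth, here is a minimal Dash sketch of the agreed layout; component IDs, labels, and the callback are made up for illustration and are not the dashboard's actual implementation:

```python
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)

app.layout = html.Div([
    # (1) Radio button: use a single file with a split, or separate train/test files.
    dcc.RadioItems(
        id="file-mode",
        options=[
            {"label": "Single file (train/test split)", "value": "single"},
            {"label": "Separate train & test files", "value": "separate"},
        ],
        value="single",
    ),
    # (2) Dropdown for the train file (options would be filled from uploaded files).
    dcc.Dropdown(id="train-file", options=[], placeholder="Select train file"),
    # (3a) Slider for the train/test split fraction, shown only in single-file mode.
    html.Div(
        dcc.Slider(id="split-fraction", min=0.5, max=0.95, step=0.05, value=0.8),
        id="split-div",
    ),
    # (3b) Dropdown for the test file, shown only in separate-files mode.
    html.Div(
        dcc.Dropdown(id="test-file", options=[], placeholder="Select test file"),
        id="test-div",
    ),
])


@app.callback(
    Output("split-div", "style"),
    Output("test-div", "style"),
    Input("file-mode", "value"),
)
def toggle_inputs(mode):
    # Show the slider for "single", the test-file dropdown for "separate".
    if mode == "single":
        return {}, {"display": "none"}
    return {"display": "none"}, {}


if __name__ == "__main__":
    app.run_server(debug=True, port=8050)
```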

yangwenzhuo08 and others added 22 commits November 4, 2022 15:24
For endogenous variables X and exogenous variables Y, the old implementation of sklearn_base predicted X_t = f(X_{t-1}, Y_{t-1}). Now, we predict X_t = f(X_{t-1}, Y_t), i.e. we actually use the future value of the exogenous regressors.

Now, the user can manually select which features they want to use for multivariate forecasting (instead of just using all non-exogenous features by default).
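
A small illustrative sketch (not Merlion's actual sklearn_base code) of the difference described in the first commit message above, assuming the endogenous and exogenous data arrive as pandas DataFrames indexed by timestamp:

```python
import pandas as pd

# Hypothetical illustration only; names and structure are not Merlion's actual code.
def build_design_matrix(endog: pd.DataFrame, exog: pd.DataFrame, use_future_exog: bool) -> pd.DataFrame:
    """Build features for predicting X_t from lagged endogenous (and exogenous) values."""
    lagged_endog = endog.shift(1).add_suffix("_lag1")  # X_{t-1}
    if use_future_exog:
        # New behavior: X_t = f(X_{t-1}, Y_t), i.e. use the exogenous values at time t.
        features = pd.concat([lagged_endog, exog], axis=1)
    else:
        # Old behavior: X_t = f(X_{t-1}, Y_{t-1}), i.e. lag the exogenous values too.
        features = pd.concat([lagged_endog, exog.shift(1).add_suffix("_lag1")], axis=1)
    return features.dropna()
```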
@aadyotb aadyotb merged commit c0c852e into main Nov 8, 2022
@aadyotb aadyotb deleted the dashboard branch November 8, 2022 16:17