Slow page loads for sketches with high datasource count #3075

Open
mbartle-sf opened this issue Apr 18, 2024 · 0 comments

Describe the bug
If a sketch comprises more than a few dozen datasources, requests to /api/v1/sketches/<sketch_id> start to slow down, because the server issues dozens of database queries to compile information about all of the datasources related to the sketch. This is exacerbated by #3052 when dozens of timelines must also be loaded and added to the response. Consider removing the datasources from the sketch response and loading them on demand instead.

To Reproduce
Use the following script to produce 1000 datasources in a sketch.

from timesketch_api_client import client as timesketch_client
from timesketch_import_client import importer


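def upload_n_events(sketch, n):
    """Upload n single-event batches so that the sketch ends up with n datasources."""
    for i in range(n):
        entry = {"message": i, "datetime": "1970-01-01T00:00:00.000Z", "timestamp_desc": "test"}
        # Open a new ImportStreamer per event so every event is a separate upload.
        with importer.ImportStreamer() as streamer:
            streamer.set_sketch(sketch)
            streamer.set_timeline_name('uploads')
            streamer.add_dict(entry)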


def main():
    client = timesketch_client.TimesketchApi(host_uri='http://127.0.0.1:5000', username='dev', password='dev')
    sketch = client.get_sketch(1)
    upload_n_events(sketch, 1000)


if __name__ == "__main__":
    main()

Then attempt to load the sketch. If Postgres is on the same machine, the request to /api/v1/sketches/<id> takes a couple of seconds. If the database is on a remote server, the load time is much higher, on the order of minutes.
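To see why the gap between a local and a remote database is so large, here is a rough back-of-the-envelope sketch. It assumes the load time is dominated by one network round trip per query. The round-trip times are illustrative assumptions, not measurements from this setup, and `estimated_load_seconds` is a hypothetical helper.

```python
def estimated_load_seconds(n_queries, round_trip_ms):
    """Estimate total time when queries run one after another.

    Assumes each query costs roughly one round trip to the database
    and that nothing runs in parallel.
    """
    return n_queries * round_trip_ms / 1000.0


# Assumed, illustrative round-trip times (not measured values).
ASSUMED_ROUND_TRIPS_MS = {
    "database on the same machine": 2.0,
    "database on a remote server": 60.0,
}

if __name__ == "__main__":
    n_queries = 1000  # one query per datasource, as in the reproduction script
    for label, rtt_ms in ASSUMED_ROUND_TRIPS_MS.items():
        seconds = estimated_load_seconds(n_queries, rtt_ms)
        print(f"{label}: about {seconds:.1f} s for {n_queries} queries")
```

Under these assumptions the total grows linearly with both the number of datasources and the per-query latency, which is consistent with a couple of seconds locally and roughly a minute remotely.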

If you enable Postgres query logging, you can see that Timesketch issues one SELECT query per object related to the sketch, i.e., 1000 queries for 1000 datasources (plus the timeline and sketch queries).
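The pattern described above can be illustrated in isolation. The following is a minimal sketch using an in-memory SQLite database. The table and column names (`timeline`, `datasource`, `original_filename`) and the helper names are simplified assumptions for illustration, not Timesketch's actual schema or code. It compares one SELECT per datasource against a single JOIN that returns the same rows.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(n_datasources):
    """Create a hypothetical, simplified schema with one timeline and n datasources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE timeline (id INTEGER PRIMARY KEY, sketch_id INTEGER, name TEXT);
        CREATE TABLE datasource (
            id INTEGER PRIMARY KEY, timeline_id INTEGER, original_filename TEXT
        );
        """
    )
    conn.execute("INSERT INTO timeline (id, sketch_id, name) VALUES (1, 1, 'uploads')")
    conn.executemany(
        "INSERT INTO datasource (timeline_id, original_filename) VALUES (1, ?)",
        [(f"batch-{i}.jsonl",) for i in range(n_datasources)],
    )
    conn.commit()
    return conn


def load_per_object(db, sketch_id):
    """One query for the ids, then one SELECT per datasource."""
    ids = db.query(
        "SELECT d.id FROM datasource d JOIN timeline t ON d.timeline_id = t.id "
        "WHERE t.sketch_id = ? ORDER BY d.id",
        (sketch_id,),
    )
    return [
        db.query("SELECT id, original_filename FROM datasource WHERE id = ?", (row[0],))[0]
        for row in ids
    ]


def load_in_one_query(db, sketch_id):
    """Fetch every datasource of the sketch with a single JOIN."""
    return db.query(
        "SELECT d.id, d.original_filename FROM datasource d "
        "JOIN timeline t ON d.timeline_id = t.id "
        "WHERE t.sketch_id = ? ORDER BY d.id",
        (sketch_id,),
    )


if __name__ == "__main__":
    conn = build_db(1000)
    slow, fast = CountingDb(conn), CountingDb(conn)
    assert load_per_object(slow, 1) == load_in_one_query(fast, 1)
    print(f"per-object loading: {slow.count} queries")
    print(f"single JOIN: {fast.count} queries")
```

Both functions return identical rows, but the per-object version needs one query per datasource plus one for the ids. A batched or eager load keeps the query count constant regardless of how many datasources the sketch has. Whether that or on-demand loading fits Timesketch better is a design decision for the maintainers.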

Expected behavior
The sketch loads almost instantly when the database is on the same machine, or in a couple of seconds when the database is on a remote server.

Desktop (please complete the following information):

  • OS: macOS Sonoma 14.4.1
  • Browser: Firefox
  • Version: 124.0.2 (64-bit)

Additional context
We prefer to load large timelines to our Timesketch server in batches to keep request sizes reasonable, which is how we can end up with hundreds or thousands of datasources in a single sketch.

@mbartle-sf mbartle-sf added the Bug label Apr 18, 2024