
Attempting to add connection with both Library Management and Movie Collection schema results in a 502 #3423

Closed
pavish opened this issue Jan 26, 2024 · 5 comments · Fixed by #3448


pavish commented Jan 26, 2024

Description

Hunch

  • Could it be that Caddy is killing the request if it takes too long to complete?

Expected behavior

  • The request should succeed.
pavish added the type: bug, work: backend, restricted: maintainers, and needs: troubleshooting labels on Jan 26, 2024
pavish added this to the v0.1.4 milestone on Jan 26, 2024
seancolsen assigned Anish9901 and unassigned mathemancer on Jan 31, 2024
seancolsen moved this from the v0.1.4 milestone to the v0.1.5 milestone on Feb 1, 2024
@Anish9901 (Member) commented:

I tried to troubleshoot this issue; here are the findings:

  • Gunicorn has a default request timeout of 30s.
  • Adding both datasets at the same time takes about 25s locally, but about 3.1 minutes for a remote DB.
  • This time can vary depending on where the remote DB is hosted, i.e. on the latency between the server hosting Mathesar and the remote DB server.

Possible solutions:

  • Increase the request timeout to something large enough (see the sketch after this list).
  • Remove 'Movie Collection' and introduce a smaller dataset.
  • Make the data loading pipeline efficient for larger datasets like the 'Movie Collection'.
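A minimal sketch of the first option, assuming Gunicorn is configured via a `gunicorn.conf.py` file; the file name and the 240s value are illustrative, not Mathesar's shipped configuration:

```python
# gunicorn.conf.py -- illustrative sketch, not Mathesar's actual config.
# Gunicorn config files are plain Python; setting `timeout` here overrides
# the 30s default, after which a silent worker is killed and restarted.

timeout = 240  # seconds; must exceed the slowest expected dataset load
```

Note that any reverse proxy in front of Gunicorn (e.g. Caddy, per the hunch above) may enforce its own read timeout and would need a matching bump.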


kgodey commented Feb 14, 2024

@Anish9901 thanks for the update!

> Make the data loading pipeline efficient for larger datasets like the 'Movie Collection'.

Could you provide more detail on what this would entail?

@Anish9901 (Member) commented:

> Could you provide more detail on what this would entail?

One way I can think of is to use both .sql and .csv files: we use the .sql file to set up the schema, tables, and FKs, then use the .csv file to bulk load data with COPY, which is supposed to be much more efficient than regular INSERTs.

If this works, we should do it for all our datasets, as it would also make starting a new demo instance for users more efficient.
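A rough sketch of the two-step load, assuming psycopg2; the file names, connection string, and table name are hypothetical:

```python
# Hypothetical sketch of the .sql + COPY approach; names are illustrative.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/mathesar")
with conn, conn.cursor() as cur:
    # Step 1: create the schema, tables, and FKs from a plain SQL file.
    with open("movie_collection_schema.sql") as f:
        cur.execute(f.read())

    # Step 2: bulk load rows with COPY, which streams the whole CSV in a
    # single command instead of issuing one INSERT per row.
    with open("movies.csv") as f:
        cur.copy_expert(
            'COPY "Movie Collection".movies FROM STDIN WITH (FORMAT csv, HEADER)',
            f,
        )
```

COPY skips per-statement parsing, planning, and client round trips, which is where most of the speedup over row-by-row INSERTs comes from.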

@Anish9901 (Member) commented:

I just tried this and observed major improvements.

Adding both datasets now takes:

  • 12s instead of 3.1 mins for a remote DB (a 93.5% improvement)!
  • 9s instead of 25s for a local DB (a 64% improvement)!


kgodey commented Feb 15, 2024

That's awesome, @Anish9901! Very nice work.
