
Attempting to add connection with both Library Management and Movie Collection schema results in a 502 #3423

Closed
pavish opened this issue Jan 26, 2024 · 5 comments · Fixed by #3448


pavish commented Jan 26, 2024

Description

Hunch

  • Could it be that Caddy is killing the request if it takes too long to complete?

Expected behavior

  • The request should succeed.
pavish added the type: bug, work: backend, restricted: maintainers, and needs: troubleshooting labels on Jan 26, 2024
pavish added this to the v0.1.4 milestone on Jan 26, 2024
seancolsen assigned Anish9901 and unassigned mathemancer on Jan 31, 2024
seancolsen moved this from the v0.1.4 milestone to the v0.1.5 milestone on Feb 1, 2024
@Anish9901 (Member) commented:

I tried to troubleshoot this issue; here are the findings:

  • Gunicorn has a default request timeout of 30s.
  • Adding both datasets at the same time takes about 25s locally, but about 3.1 minutes for a remote DB.
  • This time can vary depending on where the remote DB is hosted, i.e. on the latency between the server hosting Mathesar and the remote DB server.

Possible solutions:

  • Increase the request timeout to something large enough (see the sketch after this list).
  • Remove 'Movie Collection' and introduce a smaller dataset.
  • Make the data loading pipeline efficient for larger datasets like the 'Movie Collection'.
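A minimal sketch of the first option, assuming Gunicorn is configured via a `gunicorn.conf.py` file; the file name and the 240s value are illustrative, not Mathesar's shipped configuration:

```python
# gunicorn.conf.py -- illustrative sketch, not Mathesar's actual config.
# Gunicorn config files are plain Python; setting `timeout` here overrides
# the 30s default, after which a silent worker is killed and restarted.

timeout = 240  # seconds; must exceed the slowest expected dataset load
```

Note that any reverse proxy in front of Gunicorn (e.g. Caddy, per the hunch above) may enforce its own read timeout and would need a matching bump.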


kgodey commented Feb 14, 2024

@Anish9901 thanks for the update!

> Make the data loading pipeline efficient for larger datasets like the 'Movie Collection'.

Could you provide more detail on what this would entail?

@Anish9901 (Member) commented:

> Could you provide more detail on what this would entail?

One way I can think of is to use both .sql and .csv files: we use the .sql file to set up the schema, tables, and FKs, then use the .csv file to bulk load data with COPY, which is supposed to be much more efficient than regular INSERTs.

If this works, we should do it for all our datasets, as it would also make starting a new demo instance for users more efficient.
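A rough sketch of the two-step load, assuming psycopg2; the file names, connection string, and table name are hypothetical:

```python
# Hypothetical sketch of the .sql + COPY approach; names are illustrative.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/mathesar")
with conn, conn.cursor() as cur:
    # Step 1: create the schema, tables, and FKs from a plain SQL file.
    with open("movie_collection_schema.sql") as f:
        cur.execute(f.read())

    # Step 2: bulk load rows with COPY, which streams the whole CSV in a
    # single command instead of issuing one INSERT per row.
    with open("movies.csv") as f:
        cur.copy_expert(
            'COPY "Movie Collection".movies FROM STDIN WITH (FORMAT csv, HEADER)',
            f,
        )
```

COPY skips per-statement parsing, planning, and client round trips, which is where most of the speedup over row-by-row INSERTs comes from.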

@Anish9901 (Member) commented:

I just tried this and observed major improvements.

Adding both datasets now takes:

  • 12s instead of 3.1 mins for a remote DB (a 93.5% improvement)!
  • 9s instead of 25s for a local DB (a 64% improvement)!


kgodey commented Feb 15, 2024

That's awesome, @Anish9901! Very nice work.
