Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle SQL queries that are too large for a GET request #58

Closed
simonw opened this issue Apr 10, 2021 · 6 comments
Closed

Handle SQL queries that are too large for a GET request #58

simonw opened this issue Apr 10, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@simonw
Copy link
Owner

simonw commented Apr 10, 2021

Sometimes it's useful to run GIANT queries - queries with a huge copy-pasted list of IDs in them for example.

Right now the POST works but the GET redirect may cause an error.

If this happens, you can instead create a saved dashboard and the query will execute fine. Some kind of utility mechanism for spotting this and automatically handling it might be nice.

@simonw simonw added the enhancement New feature or request label Apr 10, 2021
@alecfwilson
Copy link

I am writing apparently GIANT queries that don't seem so giant to me. Is giant related to query execution time? Or length of the query in lines? Example:

WITH reports AS 
(
SELECT 
  r.id AS report_id
  , r.appointment_details
  , r.public_notes
  , r.internal_notes
  , r.created_at::DATE AS created_date
  , r.created_at
  , r.appointment_tag_id
  , r.location_id
  , r.is_pending_review
  , r.reported_by_id
  , c.auth0_role_names AS roles
  , ARRAY_AGG(at.id ORDER BY at.id ASC) AS availability_tag_id
  , ARRAY_AGG(at.slug ORDER BY at.id) AS availability
  , ARRAY_AGG(at.group ORDER BY at.id) AS grouping
FROM (SELECT * FROM report WHERE soft_deleted = False AND report_source = 'ca') r
JOIN call_report_availability_tag crat ON r.id=crat.report_id
JOIN availability_tag at ON crat.availabilitytag_id=at.id
JOIN reporter c ON r.reported_by_id=c.id
GROUP BY 1,2,3,4,5,6,7,8,9,10,11
)

, reports_ext AS
(
SELECT 
  *
  , CASE
      WHEN (array_position(availability,'vaccinating_18_plus') OR array_position(availability,'vaccinating_16_plus'))
           AND (
           		array_position(availability_tag_id,1) IS NOT NULL
           		AND array_position(availability_tag_id,2) IS NOT NULL
           		AND array_position(availability_tag_id,3) IS NOT NULL
           		AND array_position(availability_tag_id,4) IS NOT NULL
           		AND array_position(availability_tag_id,5) IS NOT NULL
           		AND array_position(availability_tag_id,6) IS NOT NULL
           		AND array_position(availability_tag_id,7) IS NOT NULL
           		AND array_position(availability_tag_id,10) IS NOT NULL
           		AND array_position(availability_tag_id,11) IS NOT NULL
           		AND array_position(availability_tag_id,12) IS NOT NULL
           		AND array_position(availability_tag_id,13) IS NOT NULL
           		AND array_position(availability_tag_id,14) IS NOT NULL
           		AND array_position(availability_tag_id,20) IS NOT NULL
           		AND array_position(availability_tag_id,21) IS NOT NULL
           		AND array_position(availability_tag_id,25) IS NOT NULL
           		AND array_position(availability_tag_id,32) IS NOT NULL
           		AND array_position(availability_tag_id,33) IS NOT NULL
           		AND array_position(availability_tag_id,24) IS NOT NULL
           		AND array_position(availability_tag_id,34) IS NOT NULL
           		AND array_position(availability_tag_id,35) IS NOT NULL
           		AND array_position(availability_tag_id,36) IS NOT NULL
           		AND array_position(availability_tag_id,37) IS NOT NULL
           	   )
          THEN 1 ELSE 0 END AS is_unrestricted_age_eligibility
  , CASE 
      WHEN (array_position(grouping, 'yes') IS NOT NULL AND array_position(grouping, 'no') IS NOT NULL) 
           OR (array_length(availability_tag_id,1) > 1 AND array_position(availability_tag_id,20) IS NOT NULL)
               OR (
                  array_position(grouping, 'other') IS NOT NULL 
                  AND (
                    array_position(availability_tag_id,1) IS NOT NULL
                    OR array_position(availability_tag_id,2) IS NOT NULL
                    OR array_position(availability_tag_id,3) IS NOT NULL
                    OR array_position(availability_tag_id,5) IS NOT NULL
                    OR array_position(availability_tag_id,6) IS NOT NULL
                    OR array_position(availability_tag_id,7) IS NOT NULL
                       )
         ) THEN 1 ELSE 0 END AS has_contradictory_tags
  , MAX(created_at) OVER (partition by location_id) AS latest_report_for_location_including_skips
  , MAX(CASE WHEN array_position(availability_tag_id,20) IS NULL THEN created_at ELSE '2020-12-31'::DATE END) OVER (partition by location_id) AS latest_report_for_location_no_skips
FROM reports 
)
SELECT 
  *
  , CASE WHEN created_at = latest_report_for_location_including_skips THEN 1 ELSE 0 END AS is_latest_report_for_location_including_skips
  , CASE WHEN created_at = latest_report_for_location_no_skips THEN 1 ELSE 0 END AS is_latest_report_for_location_no_skips
FROM reports_ext

@simonw
Copy link
Owner Author

simonw commented Apr 25, 2021

GIANT here is purely based on the length of the SQL text, since that gets put in a ?sql=... querystring argument and there are plenty of systems out there (like Google Cloud Run) which impose a limit on how long the overall URL can be - limits which don't apply to POST data.

@simonw
Copy link
Owner Author

simonw commented Apr 25, 2021

The Cloud Run limit isn't documented on https://cloud.google.com/run/quotas but https://stackoverflow.com/questions/64097343/if-theres-any-limitation-of-query-parameter-length-in-cloud-run says that it's actually a limit of Google's load balancer, which https://cloud.google.com/load-balancing/docs/quotas#https-lb-header-limits says:

This limit cannot be increased.
The combined size of the request URL and request header must be less than or equal to 64 KB.

@simonw
Copy link
Owner Author

simonw commented Apr 25, 2021

IE used to have a 2048 total URL limit. I'm going to borrow that one, since hopefully anyone else who built a system with a limit took it into account: https://support.microsoft.com/en-us/topic/maximum-url-length-is-2-083-characters-in-internet-explorer-174e7c8a-6666-f4e0-6fd6-908b53c12246

I'll actually go with "if the URL-encoded SQL query is longer than 1800 characters stick with POST rather than redirecting to a GET" because that gives me a bit of breathing room.

@simonw
Copy link
Owner Author

simonw commented Apr 25, 2021

Really good comprehensive answer at https://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers via https://twitter.com/dracos/status/1386380086576926730

This is a popular question, and as the original research is ~14 years old I'll try to keep it up to date: As of Sep 2020, the advice still stands. Even though IE11 may possibly accept longer URLs, the ubiquity of older IE installations plus the search engine limitations mean staying under 2000 chars is the best general policy.

@simonw simonw closed this as completed in 4e325ed Apr 25, 2021
@simonw
Copy link
Owner Author

simonw commented Apr 25, 2021

I don't think this will work correctly with named parameters in queries, because this bit only reads from request.GET:

parameter_values = {
parameter: request.GET.get(parameter, "")
for parameter in parameters
if parameter != "sql"
}

@simonw simonw reopened this Apr 25, 2021
@simonw simonw closed this as completed in 8eae16c Apr 25, 2021
simonw added a commit that referenced this issue Apr 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants