Failure to delete ingress data rows #612

Closed
Karakatiza666 opened this issue Aug 31, 2023 · 13 comments
Labels
bug Something isn't working Web Console Related to the browser based UI

Comments

@Karakatiza666
Collaborator

Running a delete query returns error 500 with the message:
Error forwarding HTTP request to pipeline '018a0825-c038-71f2-9747-17f154da8c5c': 'Timeout while waiting for response'
This can be tested with the "delete" button that appears when rows are selected in the Data Browser on the feat-ui branch.

Example query:

http://localhost:8080/v0/pipelines/018a0825-c038-71f2-9747-17f154da8c5c/ingress/GREEN_TRIPDATA?format=json&array=true

body:

[{"delete":{"LPEP_PICKUP_DATETIME":"2014-01-01 00:00:00","LPEP_DROPOFF_DATETIME":"2014-01-01 00:49:25","PICKUP_LOCATION_ID":264,"DROPOFF_LOCATION_ID":226,"TRIP_DISTANCE":1.22,"FARE_AMOUNT":7}},{"delete":{"LPEP_PICKUP_DATETIME":"2014-01-01 00:00:00","LPEP_DROPOFF_DATETIME":"2014-01-01 00:52:03","PICKUP_LOCATION_ID":264,"DROPOFF_LOCATION_ID":186,"TRIP_DISTANCE":9.55,"FARE_AMOUNT":33.5}},{"delete":{"LPEP_PICKUP_DATETIME":"2014-01-01 00:00:00","LPEP_DROPOFF_DATETIME":"2014-01-01 01:08:06","PICKUP_LOCATION_ID":264,"DROPOFF_LOCATION_ID":254,"TRIP_DISTANCE":6.47,"FARE_AMOUNT":20}}]
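
A minimal Python sketch of how the request above could be assembled, for reproducing the issue outside the UI. The helper name `build_delete_request` and the `BASE_URL` constant are hypothetical; the URL shape, query parameters, and the `{"delete": {...}}` wrapper are copied from the example request. The sketch only builds the URL and body and does not send anything.

```python
import json
from urllib.parse import urlencode

# Assumption: the manager listens on localhost:8080, as in the example URL.
BASE_URL = "http://localhost:8080/v0"


def build_delete_request(pipeline_id, table, rows):
    """Return (url, body) for deleting `rows` from `table` via the ingress endpoint.

    Hypothetical helper; it mirrors the example request and is not part of any client library.
    """
    # format=json&array=true: the body is a single JSON array of updates.
    query = urlencode({"format": "json", "array": "true"})
    url = f"{BASE_URL}/pipelines/{pipeline_id}/ingress/{table}?{query}"
    # Each row is wrapped in a {"delete": row} object, as in the example body.
    body = json.dumps([{"delete": row} for row in rows])
    return url, body


# First row from the example body.
rows = [
    {
        "LPEP_PICKUP_DATETIME": "2014-01-01 00:00:00",
        "LPEP_DROPOFF_DATETIME": "2014-01-01 00:49:25",
        "PICKUP_LOCATION_ID": 264,
        "DROPOFF_LOCATION_ID": 226,
        "TRIP_DISTANCE": 1.22,
        "FARE_AMOUNT": 7,
    }
]

url, body = build_delete_request(
    "018a0825-c038-71f2-9747-17f154da8c5c", "GREEN_TRIPDATA", rows
)
print(url)
print(body)
```

The resulting `url` and `body` can then be sent as a POST with any HTTP client.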
@ryzhyk
Contributor

ryzhyk commented Aug 31, 2023

Do you see anything in the log when running with RUST_LOG=debug?

@ryzhyk
Contributor

ryzhyk commented Aug 31, 2023

Also, can you try submitting the same command with curl (please share the curl command you use)?

@Karakatiza666
Collaborator Author

The log from receiving the request to timeout:

2023-08-31 17:53:26 DEBUG [manager] Received 
HttpRequest HTTP/1.1 POST:/v0/pipelines/018a0825-c038-71f2-9747-17f154da8c5c/ingress/GREEN_TRIPDATA
  query: ?"format=json&array=true"
  params: Path { path: Url { uri: /v0/pipelines/018a0825-c038-71f2-9747-17f154da8c5c/ingress/GREEN_TRIPDATA?format=json&array=true, path: None }, skip: 73, segments: [("pipeline_id", Segment(14, 50)), ("table_name", Segment(59, 73))] }
  headers:
    "accept-language": "en-US,en;q=0.9,ru;q=0.8,bg;q=0.7"
    "referer": "http://localhost:3000/"
    "host": "localhost:8080"
    "sec-ch-ua-platform": "\"Linux\""
    "origin": "http://localhost:3000"
    "connection": "keep-alive"
    "sec-fetch-mode": "cors"
    "content-length": "581"
    "content-type": "text/csv"
    "sec-ch-ua-mobile": "?0"
    "accept-encoding": "gzip, deflate, br"
    "sec-fetch-dest": "empty"
    "accept": "application/json"
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
    "sec-ch-ua": "\"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"114\", \"Google Chrome\";v=\"114\""
    "sec-fetch-site": "same-site"

2023-08-31 17:53:26 DEBUG [manager] Pipeline_id PipelineId(018a0825-c038-71f2-9747-17f154da8c5c)
2023-08-31 17:53:26 DEBUG [manager] Table name "GREEN_TRIPDATA"
2023-08-31 17:53:26 DEBUG [manager] preparing query s819: SELECT location, desired_status, current_status, status_since, error, created
                FROM pipeline_runtime_state
                WHERE id = $1 AND tenant_id = $2
2023-08-31 17:53:26 DEBUG [manager] executing statement s819 with parameters: [018a0825-c038-71f2-9747-17f154da8c5c, 00000000-0000-0000-0000-000000000000]
2023-08-31 17:53:26 DEBUG [pipeline-018a0825-c038-71f2-9747-17f154da8c5c] 
HttpRequest HTTP/1.1 POST:/ingress/GREEN_TRIPDATA
  query: ?"format=json&array=true"
  params: Path { path: Url { uri: /ingress/GREEN_TRIPDATA?format=json&array=true, path: None }, skip: 23, segments: [("table_name", Segment(9, 23))] }
  headers:
    "origin": "http://localhost:3000"
    "date": "Thu, 31 Aug 2023 17:53:26 GMT"
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
    "accept": "application/json"
    "sec-ch-ua": "\"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"114\", \"Google Chrome\";v=\"114\""
    "accept-encoding": "gzip, deflate, br"
    "sec-fetch-site": "same-site"
    "host": "localhost:8080"
    "content-type": "text/csv"
    "sec-fetch-mode": "cors"
    "sec-fetch-dest": "empty"
    "transfer-encoding": "chunked"
    "sec-ch-ua-mobile": "?0"
    "referer": "http://localhost:3000/"
    "sec-ch-ua-platform": "\"Linux\""
    "accept-language": "en-US,en;q=0.9,ru;q=0.8,bg;q=0.7"

2023-08-31 17:53:26 DEBUG [pipeline-018a0825-c038-71f2-9747-17f154da8c5c] HTTP input endpoint 'api-ingress-GREEN_TRIPDATA-182d2a26-e7b8-4886-8e4c-32d7cf1a9e8c': start of request
2023-08-31 17:53:27 DEBUG [manager] preparing query s820: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:27 DEBUG [manager] executing statement s820 with parameters: [018a0825-3aca-7447-aeb6-6c48925f162e, 7]
2023-08-31 17:53:27 DEBUG [manager] preparing query s821: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:27 DEBUG [manager] executing statement s821 with parameters: [018a0822-f68a-7acf-9d08-c517539bc664, 1]
2023-08-31 17:53:27 DEBUG [manager] preparing query s822: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:27 DEBUG [manager] executing statement s822 with parameters: [018a0826-1824-7282-b6b0-ff5e3bc88400, 1]
2023-08-31 17:53:27 DEBUG [manager] preparing query s823: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:27 DEBUG [manager] executing statement s823 with parameters: [018a0824-f0b1-7dd0-8cda-2d2b28960230, 1]
2023-08-31 17:53:27 DEBUG [manager] preparing query s824: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:27 DEBUG [manager] executing statement s824 with parameters: [018a0825-3aca-7447-aeb6-6c48925f162e, 1]
2023-08-31 17:53:27 DEBUG [manager] preparing query s825: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:27 DEBUG [manager] executing statement s825 with parameters: [018a0824-f0b1-7dd0-8cda-2d2b28960230, 5]
2023-08-31 17:53:27 DEBUG [manager] preparing query s826: SELECT id, version, tenant_id FROM program WHERE status = 'pending' AND status_since = (SELECT min(status_since) FROM program WHERE status = 'pending')
2023-08-31 17:53:27 DEBUG [manager] executing statement s826 with parameters: []
2023-08-31 17:53:27 DEBUG [manager] preparing query s827: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:27 DEBUG [manager] executing statement s827 with parameters: [018a0825-c029-7836-972d-1dea50b38b5d, 1]
2023-08-31 17:53:27 DEBUG [manager] preparing query s828: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:27 DEBUG [manager] executing statement s828 with parameters: [018a0825-c029-7836-972d-1dea50b38b5d, 8]
2023-08-31 17:53:27 DEBUG [pipeline-018a0825-c038-71f2-9747-17f154da8c5c] HTTP output endpoint 'api-watch-GREEN_TRIPDATA-neighborhood-367c1721-65c0-4f2e-80bb-87d31f4ed4a4': sending chunk #5 (23 bytes)
2023-08-31 17:53:28 DEBUG [manager] preparing query s829: SELECT id, version, tenant_id FROM program WHERE status = 'pending' AND status_since = (SELECT min(status_since) FROM program WHERE status = 'pending')
2023-08-31 17:53:28 DEBUG [manager] executing statement s829 with parameters: []
2023-08-31 17:53:29 DEBUG [manager] preparing query s830: SELECT id, version, tenant_id FROM program WHERE status = 'pending' AND status_since = (SELECT min(status_since) FROM program WHERE status = 'pending')
2023-08-31 17:53:29 DEBUG [manager] executing statement s830 with parameters: []
2023-08-31 17:53:30 DEBUG [manager] preparing query s831: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:30 DEBUG [manager] executing statement s831 with parameters: [018a0825-3aca-7447-aeb6-6c48925f162e, 7]
2023-08-31 17:53:30 DEBUG [manager] preparing query s832: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:30 DEBUG [manager] executing statement s832 with parameters: [018a0822-f68a-7acf-9d08-c517539bc664, 1]
2023-08-31 17:53:30 DEBUG [manager] preparing query s833: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:30 DEBUG [manager] executing statement s833 with parameters: [018a0826-1824-7282-b6b0-ff5e3bc88400, 1]
2023-08-31 17:53:30 DEBUG [manager] preparing query s834: SELECT id, version, tenant_id FROM program WHERE status = 'pending' AND status_since = (SELECT min(status_since) FROM program WHERE status = 'pending')
2023-08-31 17:53:30 DEBUG [manager] executing statement s834 with parameters: []
2023-08-31 17:53:30 DEBUG [manager] preparing query s835: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:30 DEBUG [manager] executing statement s835 with parameters: [018a0824-f0b1-7dd0-8cda-2d2b28960230, 1]
2023-08-31 17:53:30 DEBUG [manager] preparing query s836: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:30 DEBUG [manager] executing statement s836 with parameters: [018a0825-3aca-7447-aeb6-6c48925f162e, 1]
2023-08-31 17:53:30 DEBUG [manager] preparing query s837: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:30 DEBUG [manager] executing statement s837 with parameters: [018a0824-f0b1-7dd0-8cda-2d2b28960230, 5]
2023-08-31 17:53:30 DEBUG [manager] preparing query s838: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:30 DEBUG [manager] executing statement s838 with parameters: [018a0825-c029-7836-972d-1dea50b38b5d, 1]
2023-08-31 17:53:30 DEBUG [manager] preparing query s839: SELECT EXISTS(SELECT 1 FROM program prog
                    WHERE prog.id = $1
                    AND prog.version = $2)
                OR
                EXISTS(SELECT 1 FROM pipeline p, pipeline_history ph, program_history progh
                    WHERE ph.program_id = $1
                    AND ph.revision = p.last_revision
                    AND ph.program_id = progh.id
                    AND progh.version = $2)
2023-08-31 17:53:30 DEBUG [manager] executing statement s839 with parameters: [018a0825-c029-7836-972d-1dea50b38b5d, 8]
2023-08-31 17:53:30 DEBUG [pipeline-018a0825-c038-71f2-9747-17f154da8c5c] HTTP output endpoint 'api-watch-GREEN_TRIPDATA-neighborhood-367c1721-65c0-4f2e-80bb-87d31f4ed4a4': sending chunk #6 (23 bytes)
2023-08-31 17:53:31 DEBUG [manager] preparing query s840: SELECT id, version, tenant_id FROM program WHERE status = 'pending' AND status_since = (SELECT min(status_since) FROM program WHERE status = 'pending')
2023-08-31 17:53:31 DEBUG [manager] executing statement s840 with parameters: []
2023-08-31 17:53:31 ERROR [manager] [HTTP error response] HttpForwardError: Error forwarding HTTP request to pipeline '018a0825-c038-71f2-9747-17f154da8c5c': 'Timeout while waiting for response'
2023-08-31 17:53:31 DEBUG [manager] Error in response: RunnerError { runner_error: HttpForwardError { pipeline_id: PipelineId(018a0825-c038-71f2-9747-17f154da8c5c), error: "Timeout while waiting for response" } }

@Karakatiza666
Collaborator Author

@ryzhyk

curl -X 'POST' 'http://localhost:8080/v0/pipelines/018a0825-c038-71f2-9747-17f154da8c5c/ingress/GREEN_TRIPDATA?format=json&array=true' --header "Content-Type: application/json" --data '[{"delete":{"LPEP_PICKUP_DATETIME":"2014-01-01 00:00:00","LPEP_DROPOFF_DATETIME":"2014-01-01 00:49:25","PICKUP_LOCATION_ID":264,"DROPOFF_LOCATION_ID":226,"TRIP_DISTANCE":1.22,"FARE_AMOUNT":7}},{"delete":{"LPEP_PICKUP_DATETIME":"2014-01-01 00:00:00","LPEP_DROPOFF_DATETIME":"2014-01-01 00:52:03","PICKUP_LOCATION_ID":264,"DROPOFF_LOCATION_ID":186,"TRIP_DISTANCE":9.55,"FARE_AMOUNT":33.5}},{"delete":{"LPEP_PICKUP_DATETIME":"2014-01-01 00:00:00","LPEP_DROPOFF_DATETIME":"2014-01-01 01:08:06","PICKUP_LOCATION_ID":264,"DROPOFF_LOCATION_ID":254,"TRIP_DISTANCE":6.47,"FARE_AMOUNT":20}}]'
{"message":"Error forwarding HTTP request to pipeline '018a0825-c038-71f2-9747-17f154da8c5c': 'Timeout while waiting for response'","error_code":"HttpForwardError","details":{"error":"Timeout while waiting for response","pipeline_id":"018a0825-c038-71f2-9747-17f154da8c5c"}}

@ryzhyk
Contributor

ryzhyk commented Aug 31, 2023

I was only able to reproduce this when the pipeline is in the PAUSED state. Is this also where this happens for you? If so, we need to figure out what's the correct behavior in this case.

@Karakatiza666
Collaborator Author

Karakatiza666 commented Sep 1, 2023

Yes, I actually tested this in the PAUSED state. I'm not sure about the semantics of this state. Can the delete request be queued while the pipeline is paused (and be reflected in the data browser), then applied to the output views once the pipeline is resumed?

@ryzhyk
Contributor

ryzhyk commented Sep 1, 2023

This isn't really specific to deletes. Insertions are handled in the same way. Buffering inputs indefinitely is probably not a good idea. Sure, people are not going to submit millions of updates via the UI, but the same API is used for regular HTTP input streaming. Ideally we would just process the updates by unpausing just one input connector. It's totally doable but needs some extra machinery. For now maybe we can just refuse to process deletes in the paused state. Let me look into it a bit more and come up with a suggestion.

@Karakatiza666
Collaborator Author

Karakatiza666 commented Sep 1, 2023

Great! In the meantime I'll disable deletion in non-RUNNING states

@ryzhyk
Contributor

ryzhyk commented Sep 1, 2023

I guess you mean in PAUSED state. We should do the same with insertions.

@ryzhyk
Contributor

ryzhyk commented Sep 1, 2023

So here are the options:

  1. Keep the current behavior and have the UI throw an error when trying to insert or delete in paused state. Not a nice solution, as it is actually useful to be able to pause the pipeline and feed some test data.
  2. Change the HTTP input connector to not pause at all even if the pipeline is paused. (Pausing a pipeline amounts to pausing all input connectors. If a connector ignores the pause command, the pipeline will continue processing data from this connector.) This is probably not what we want.
  3. Add an argument to the HTTP /ingress endpoint to send data to the pipeline even if it is paused.
  4. Extend the REST API to pause/unpause individual connectors. This is probably the most principled solution and is something we are going to need anyway, but I won't have the cycles to do it.

Option 3 seems like the best option to me. What do you guys think? @gz, @Karakatiza666

@Karakatiza666
Collaborator Author

Option 3 sounds like a workaround. If the effort to implement it on the backend is more than 33% of option 4, I'd say go with option 1 right now and do option 4 when possible. If less, option 3 for now.

@mihaibudiu mihaibudiu added Web Console Related to the browser based UI bug Something isn't working labels Sep 1, 2023
@ryzhyk
Contributor

ryzhyk commented Sep 1, 2023

Yes, it's easy to implement. In fact, I now realize that option 4 won't even work, since HTTP connectors are created implicitly and are not even exposed through the API. So option 3 it is.
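
To make option 3 concrete, here is a toy Python model of an ingress call with an override for the paused state. Everything in it is an assumption for illustration: the class `ToyPipeline`, the argument name `force`, the `insert` update key (by analogy with the `delete` key in the request body), and the immediate rejection when paused (the thread reports a timeout instead). It is not the actual backend implementation.

```python
class PipelinePaused(Exception):
    """Raised when data is pushed to a paused pipeline without the override."""


class ToyPipeline:
    """Hypothetical stand-in for a pipeline with a single table."""

    def __init__(self):
        self.paused = False
        self.rows = []  # current table contents

    def ingress(self, updates, force=False):
        """Apply insert/delete updates; `force` is an assumed name for the option 3 argument."""
        if self.paused and not force:
            # Simplification: reject right away instead of waiting for a timeout.
            raise PipelinePaused("pipeline is paused; pass force=True to push data anyway")
        for update in updates:
            if "insert" in update:
                self.rows.append(update["insert"])
            elif "delete" in update:
                # Remove one matching row if present; ignore the delete otherwise.
                if update["delete"] in self.rows:
                    self.rows.remove(update["delete"])
            else:
                raise ValueError(f"unknown update kind: {update!r}")
        return len(updates)


pipeline = ToyPipeline()
pipeline.ingress([{"insert": {"PICKUP_LOCATION_ID": 264, "FARE_AMOUNT": 7}}])

pipeline.paused = True
try:
    pipeline.ingress([{"delete": {"PICKUP_LOCATION_ID": 264, "FARE_AMOUNT": 7}}])
    rejected = False
except PipelinePaused:
    rejected = True  # without the override, a paused pipeline refuses the update

# With the override, the same delete goes through even though the pipeline is paused.
applied = pipeline.ingress(
    [{"delete": {"PICKUP_LOCATION_ID": 264, "FARE_AMOUNT": 7}}], force=True
)
```

Making the override opt-in keeps the default behavior for regular HTTP input streaming unchanged, while the UI can pass it explicitly to feed test data into a paused pipeline.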

@ryzhyk
Contributor

ryzhyk commented Sep 4, 2023

Solved in #630.

@ryzhyk ryzhyk closed this as completed Sep 4, 2023