Data extract #59
Want the first one? |
I can get all of that fairly simply from a single SQL query, except for the success/failure timestamp, because we're not caching delivery_status here yet :(
|
Sorry, that was referring to the user-supplied state description ("This was successful" / "they didn't have what I wanted")
|
Aha! (Thanks.) In that case, something like:
SELECT ir.id,
ir.title,
pb.name AS public_body_name,
ir.created_at,
MIN(response.created_at) AS first_response_at,
MAX(response.created_at) AS last_response_at,
MAX(status.created_at) AS last_status_update_at,
MAX(event.created_at) AS requester_updated_at,
CASE
WHEN event.described_state = 'successful' OR event.described_state = 'partially_successful'
THEN 'Success'
ELSE 'Fail'
END AS outcome
FROM info_requests ir
INNER JOIN public_bodies pb ON ir.public_body_id = pb.id
INNER JOIN info_request_events response ON ir.id = response.info_request_id
INNER JOIN info_request_events status ON ir.id = status.info_request_id
JOIN info_request_events event ON ir.id = event.info_request_id
WHERE response.event_type = 'response'
AND status.event_type = 'status_update'
AND (event.described_state = 'successful' OR
event.described_state = 'partially_successful' OR
event.described_state = 'rejected' OR
event.described_state = 'not_held')
GROUP BY ir.id, ir.title, pb.name, ir.created_at, event.described_state
ORDER BY ir.id DESC
LIMIT 10; |
Nice SQL skills 😎 Some comments inline. Looks pretty close, though!
SELECT ir.id,
ir.title,
pb.name AS public_body_name,
ir.created_at,
MIN(response.created_at) AS first_response_at,
MAX(response.created_at) AS last_response_at,
MAX(status.created_at) AS last_status_update_at,
--
-- The idea behind "Last Success/Failure Status Timestamp" and "Last
-- Status Update Timestamp" was that a user may mark a request as
-- successful, and then ask for more information, changing the state of
-- the request to e.g. "waiting_response".
--
-- I was imagining we'd have a timestamp for each, so that you can spot
-- the above case when those timestamps differ. Actually, it's probably
-- too confusing and an edge case, so I think we just stick with the
-- last_status_update_at line above and do away with requester_updated_at
-- completely.
--
MAX(event.created_at) AS requester_updated_at,
--
-- I think we should just return the last described state here. It didn't
-- necessarily fail if e.g. the authority said they don't have the info – they
-- did their job and responded.
--
-- event.described_state AS current_status
--
CASE
WHEN event.described_state = 'successful' OR event.described_state = 'partially_successful'
THEN 'Success'
ELSE 'Fail'
END AS outcome
FROM info_requests ir
INNER JOIN public_bodies pb ON ir.public_body_id = pb.id
INNER JOIN info_request_events response ON ir.id = response.info_request_id
INNER JOIN info_request_events status ON ir.id = status.info_request_id
JOIN info_request_events event ON ir.id = event.info_request_id
WHERE response.event_type = 'response'
AND status.event_type = 'status_update'
--- Why are we ignoring a bunch of other possible described states?
--- I don't think we want this filter at all
AND (event.described_state = 'successful' OR
event.described_state = 'partially_successful' OR
event.described_state = 'rejected' OR
event.described_state = 'not_held')
GROUP BY ir.id, ir.title, pb.name, ir.created_at, event.described_state
ORDER BY ir.id DESC
---
--- I know you know this, but don't forget to remove the LIMIT.
---
LIMIT 10; |
Thanks! I once shared an office with three DBAs; after the first couple of years they let me be an honorary DBA 😃 (er, which they'll probably revoke after looking at the execution plan for this).
That filter was an attempt to include only user-generated states when isolating the most recent user update (but I was concerned I'd cut the list down too much). I'll take it out.
Best not to assume! Updated (simplified) query...
|
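The execution-plan remark above has a concrete cause: joining info_request_events twice (once as `response`, once as `status`) yields one row per (response, status update) pair before GROUP BY collapses them, so a request with r responses and s status updates contributes r × s rows. MIN/MAX ignore the duplication, which is why the results still look right. The sketch below shows this with Python's built-in sqlite3 module standing in for PostgreSQL, on a hypothetical cut-down schema with made-up data. It also shows conditional aggregation, a swapped-in alternative (not what this thread settled on) that reads the events through a single join.

```python
import sqlite3

# Hypothetical, cut-down tables: only the columns this sketch needs.
# Timestamps are ISO-8601 text, which sorts chronologically, so MIN/MAX
# behave as they would on real timestamp columns.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE info_requests (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE info_request_events (id INTEGER PRIMARY KEY,
    info_request_id INTEGER, event_type TEXT, described_state TEXT,
    created_at TEXT);
INSERT INTO info_requests VALUES (1, 'Example request');
""")
# Made-up events: 3 responses and 2 status updates for request 1.
db.executemany(
    "INSERT INTO info_request_events "
    "(info_request_id, event_type, described_state, created_at) "
    "VALUES (?, ?, ?, ?)",
    [(1, "response", None, "2016-01-10 10:00:00"),
     (1, "response", None, "2016-01-20 10:00:00"),
     (1, "response", None, "2016-02-01 10:00:00"),
     (1, "status_update", "waiting_response", "2016-01-11 12:00:00"),
     (1, "status_update", "successful", "2016-02-02 12:00:00")])

JOINS = """
FROM info_requests ir
JOIN info_request_events response
  ON ir.id = response.info_request_id AND response.event_type = 'response'
JOIN info_request_events status
  ON ir.id = status.info_request_id AND status.event_type = 'status_update'
"""

# Rows produced by the double self-join before grouping: 3 * 2 = 6.
fanout_rows = db.execute("SELECT COUNT(*)" + JOINS).fetchone()[0]

# The thread's approach: two joins, then aggregate.
two_join = db.execute(
    "SELECT ir.id, MIN(response.created_at), MAX(response.created_at), "
    "MAX(status.created_at)" + JOINS + "GROUP BY ir.id").fetchall()

# Conditional aggregation: one join; each aggregate only sees its own event
# type (CASE yields NULL otherwise, and MIN/MAX/COUNT skip NULLs). HAVING
# keeps the inner-join behaviour of dropping requests with no response or
# no status update.
single_join = db.execute("""
SELECT ir.id,
       MIN(CASE WHEN e.event_type = 'response' THEN e.created_at END),
       MAX(CASE WHEN e.event_type = 'response' THEN e.created_at END),
       MAX(CASE WHEN e.event_type = 'status_update' THEN e.created_at END)
FROM info_requests ir
JOIN info_request_events e
  ON ir.id = e.info_request_id
 AND e.event_type IN ('response', 'status_update')
GROUP BY ir.id
HAVING COUNT(CASE WHEN e.event_type = 'response' THEN 1 END) > 0
   AND COUNT(CASE WHEN e.event_type = 'status_update' THEN 1 END) > 0
""").fetchall()

print(fanout_rows)               # 6
print(two_join == single_join)   # True
```

The SQL inside `single_join` uses only standard constructs (CASE, IN, HAVING with COUNT), so it should carry over to PostgreSQL largely unchanged; whether it is actually faster there would need checking against the real execution plan.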
@garethrees got time for a quick sanity check? I think I've implemented your changes correctly |
That returns multiple rows for the same request (looks like it might be one for each state it's been in)… |
Wait, it's worse than that: I forgot to change the join type when I took the restricting clause out. |
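The duplicate rows follow from how GROUP BY works: any column taken from the joined events (such as `status.described_state`) that appears in GROUP BY splits each request into one group per distinct value, so a request that has been through two states comes back twice, and each row's `MAX(status.created_at)` is only the last time that particular state was set. Grouping only by per-request columns such as `info_requests.described_state` gives one row per request. A minimal sketch, again with sqlite3 standing in for PostgreSQL and a hypothetical cut-down schema with made-up data:

```python
import sqlite3

# Hypothetical cut-down schema (only the columns used here) and made-up data:
# request 1 was marked "waiting_response", then later "successful".
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE info_requests (id INTEGER PRIMARY KEY, described_state TEXT);
CREATE TABLE info_request_events (id INTEGER PRIMARY KEY,
    info_request_id INTEGER, event_type TEXT, described_state TEXT,
    created_at TEXT);
INSERT INTO info_requests VALUES (1, 'successful');
INSERT INTO info_request_events
    (info_request_id, event_type, described_state, created_at) VALUES
    (1, 'status_update', 'waiting_response', '2016-01-11 12:00:00'),
    (1, 'status_update', 'successful', '2016-02-02 12:00:00');
""")

# Grouping by a column from the joined events: one row per distinct state.
per_event_state = db.execute("""
SELECT ir.id, status.described_state, MAX(status.created_at)
FROM info_requests ir
JOIN info_request_events status
  ON ir.id = status.info_request_id AND status.event_type = 'status_update'
GROUP BY ir.id, status.described_state
ORDER BY ir.id, status.described_state
""").fetchall()

# Grouping by the request's own (current) state: one row per request.
per_request_state = db.execute("""
SELECT ir.id, ir.described_state, MAX(status.created_at)
FROM info_requests ir
JOIN info_request_events status
  ON ir.id = status.info_request_id AND status.event_type = 'status_update'
GROUP BY ir.id, ir.described_state
""").fetchall()

print(per_event_state)
# [(1, 'successful', '2016-02-02 12:00:00'), (1, 'waiting_response', '2016-01-11 12:00:00')]
print(per_request_state)
# [(1, 'successful', '2016-02-02 12:00:00')]
```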
Simplify all the things (we're not getting anything useful from
SELECT ir.id,
ir.title,
pb.name AS public_body_name,
ir.created_at,
MIN(response.created_at) AS first_response_at,
MAX(response.created_at) AS last_response_at,
MAX(status.created_at) AS last_status_update_at,
ir.described_state
FROM info_requests ir
INNER JOIN public_bodies pb ON ir.public_body_id = pb.id
INNER JOIN info_request_events response ON ir.id = response.info_request_id
INNER JOIN info_request_events status ON ir.id = status.info_request_id
WHERE response.event_type = 'response'
AND status.event_type = 'status_update'
GROUP BY ir.id, ir.title, pb.name, ir.created_at, ir.described_state
ORDER BY ir.id DESC
Edit: adding described_state from info_request_events was causing multiple entries for some requests, and as we only wanted the current status, I've updated this to use info_requests.described_state instead. |
...then make them complicated again:
(you may have to squash this onto one line to make it work because |
Hopefully final version:
SELECT ir.id,
ir.title,
pb.name AS public_body_name,
to_char(ir.created_at, 'YYYY-MM-DD HH24:MI:SS') AS created_at,
to_char(MIN(response.created_at), 'YYYY-MM-DD HH24:MI:SS') AS first_response_at,
to_char(MAX(response.created_at), 'YYYY-MM-DD HH24:MI:SS') AS last_response_at,
to_char(MAX(status.created_at), 'YYYY-MM-DD HH24:MI:SS') AS last_status_update_at,
ir.described_state
FROM info_requests ir
INNER JOIN public_bodies pb
ON ir.public_body_id = pb.id
INNER JOIN info_request_events response
ON ir.id = response.info_request_id AND response.event_type = 'response'
INNER JOIN info_request_events status
ON ir.id = status.info_request_id AND status.event_type = 'status_update'
GROUP BY ir.id, ir.title, pb.name, ir.created_at, ir.described_state
ORDER BY ir.id DESC;
And to export to CSV:
\copy (SELECT ir.id,
ir.title,
pb.name AS public_body_name,
to_char(ir.created_at, 'YYYY-MM-DD HH24:MI:SS') AS created_at,
to_char(MIN(response.created_at), 'YYYY-MM-DD HH24:MI:SS') AS first_response_at,
to_char(MAX(response.created_at), 'YYYY-MM-DD HH24:MI:SS') AS last_response_at,
to_char(MAX(status.created_at), 'YYYY-MM-DD HH24:MI:SS') AS last_status_update_at,
ir.described_state
FROM info_requests ir
INNER JOIN public_bodies pb
ON ir.public_body_id = pb.id
INNER JOIN info_request_events response
ON ir.id = response.info_request_id AND response.event_type = 'response'
INNER JOIN info_request_events status
ON ir.id = status.info_request_id AND status.event_type = 'status_update'
GROUP BY ir.id, ir.title, pb.name, ir.created_at, ir.described_state
ORDER BY ir.id DESC) TO '/tmp/asktheeu-20161206.csv' WITH CSV HEADER DELIMITER ',' |
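Where running psql's `\copy` is awkward (it takes the rest of its line as its argument, hence the need to squash the query onto one line), roughly the same export can be produced from a script. The sketch below is a hedged stand-in, not how the emailed file was made: it uses Python's sqlite3 and csv modules with a hypothetical cut-down schema and made-up data, and SQLite's `strftime()` in place of PostgreSQL's `to_char()`. Against the real database, a PostgreSQL DB-API driver (for example psycopg2) would take sqlite3's place and `to_char()` would go back in. Writing the column names first mirrors `WITH CSV HEADER`.

```python
import csv
import io
import sqlite3

# Hypothetical cut-down schema and made-up data standing in for the real
# Alaveteli database.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE public_bodies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE info_requests (id INTEGER PRIMARY KEY, title TEXT,
    public_body_id INTEGER, created_at TEXT, described_state TEXT);
CREATE TABLE info_request_events (id INTEGER PRIMARY KEY,
    info_request_id INTEGER, event_type TEXT, described_state TEXT,
    created_at TEXT);
INSERT INTO public_bodies VALUES (1, 'Example Body');
INSERT INTO info_requests VALUES
    (1, 'Example request', 1, '2016-01-01 09:00:00', 'successful');
INSERT INTO info_request_events
    (info_request_id, event_type, described_state, created_at) VALUES
    (1, 'response', NULL, '2016-01-10 10:00:00'),
    (1, 'response', NULL, '2016-02-01 10:00:00'),
    (1, 'status_update', 'successful', '2016-02-02 12:00:00');
""")

# The final query from this thread, with strftime() standing in for to_char()
# and explicit aliases so the header row is predictable.
FINAL_QUERY = """
SELECT ir.id AS id,
       ir.title AS title,
       pb.name AS public_body_name,
       strftime('%Y-%m-%d %H:%M:%S', ir.created_at) AS created_at,
       strftime('%Y-%m-%d %H:%M:%S', MIN(response.created_at)) AS first_response_at,
       strftime('%Y-%m-%d %H:%M:%S', MAX(response.created_at)) AS last_response_at,
       strftime('%Y-%m-%d %H:%M:%S', MAX(status.created_at)) AS last_status_update_at,
       ir.described_state AS described_state
FROM info_requests ir
INNER JOIN public_bodies pb
  ON ir.public_body_id = pb.id
INNER JOIN info_request_events response
  ON ir.id = response.info_request_id AND response.event_type = 'response'
INNER JOIN info_request_events status
  ON ir.id = status.info_request_id AND status.event_type = 'status_update'
GROUP BY ir.id, ir.title, pb.name, ir.created_at, ir.described_state
ORDER BY ir.id DESC
"""

def export_csv(conn, out):
    """Write FINAL_QUERY's result to the file-like `out` as CSV with a header."""
    cur = conn.execute(FINAL_QUERY)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # like HEADER
    writer.writerows(cur)

buf = io.StringIO()
export_csv(db, buf)
print(buf.getvalue())
```

To write a file rather than a string buffer, open it with `open(path, "w", newline="")` and pass that as `out`; the `path` is up to you.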
Looks fine to me. |
File emailed, so this can probably be closed, but I'm leaving it open for a few days in case of feedback. |
See https://groups.google.com/a/mysociety.org/forum/#!topic/alaveteli/ZTR_7sd_nw0
TL;DR so far is to make a CSV with the following headings: