The data is duplicated in the source data: this vehicle location appears in multiple siri snapshots, as indicated by the siri_snapshot_id value. The siri snapshot id you extracted is 268538, but the same data also appears in other siri snapshots.
I think it's better to reflect all source data as-is, because the duplication is itself important information that we don't want to lose (e.g. it can be used to find errors on the MOT side). Also, the process that loads the siri data should be as minimal as possible, so that we can process and load all the source data in the shortest possible time and with minimal strain on the DB.
Given the above points, and given that it's easy to fix this when getting the data from the API, I think we should not fix it for now. If this does require a fix, we would need to find a solution that does not modify the source data, and that may require significant development effort. Closing for now because I don't think we will get to it, but feel free to reopen if you think it's critical.
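For anyone who needs deduplicated data in the meantime, here is a minimal sketch of the client-side fix mentioned above: treat rows that differ only in their snapshot id as the same vehicle location and keep the first one. The field names other than siri_snapshot_id and line_ref (id, recorded_at_time, lon, lat), the sample values, and the helper name drop_snapshot_duplicates are assumptions for illustration, not the actual API schema.

```python
from typing import Any, Dict, Iterable, List, Tuple


def drop_snapshot_duplicates(
    rows: Iterable[Dict[str, Any]],
    ignore_fields: Tuple[str, ...] = ("id", "siri_snapshot_id"),
) -> List[Dict[str, Any]]:
    """Keep the first occurrence of each row, comparing every field
    except those in ignore_fields (hypothetical field names)."""
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from all fields that are not ignored.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ignore_fields))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up sample rows: the first two differ only in id and snapshot id.
rows = [
    {"id": 1, "siri_snapshot_id": 268538, "line_ref": 1,
     "recorded_at_time": "2022-03-01T07:00:00", "lon": 34.78, "lat": 32.08},
    {"id": 2, "siri_snapshot_id": 268539, "line_ref": 1,
     "recorded_at_time": "2022-03-01T07:00:00", "lon": 34.78, "lat": 32.08},
    {"id": 3, "siri_snapshot_id": 268539, "line_ref": 1,
     "recorded_at_time": "2022-03-01T07:01:00", "lon": 34.79, "lat": 32.08},
]
deduplicated = drop_snapshot_duplicates(rows)
```

This keeps the DB untouched and leaves the choice of which fields define a duplicate to the person doing the analysis.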
Hi, I think that for some reason there are many duplicated rows in the db. From an analytical point of view this could be handled, but I assume it's more efficient to drop those duplicates.
I've extracted https://openbus-stride-public.s3.eu-west-1.amazonaws.com/stride-siri-requester/2022/03/01/07/00.br, and filtered out one line_ref:
As can be seen, the source itself seems to hold one duplicated row, which could be removed.
In the db, this duplicated row seems to grow to 8 records:
code for getting the db data: