Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix 347 #365

Merged
merged 13 commits into from
Sep 25, 2020
Merged

fix 347 #365

merged 13 commits into from
Sep 25, 2020

Conversation

chicco785
Copy link
Contributor

fix #347

all tests assumed fiware service path null, so the 'path' selective delete was not performed and tables were always dropped, which of course is wrong behaviour.

what is strange anyhow is the fact that select count (*) in crate return values different from 0 until you don't commit.

@chicco785
Copy link
Contributor Author

#363 need to be merged first

@c0c0n3
Copy link
Member

c0c0n3 commented Sep 22, 2020

@chicco785 if you could please rebase on master since #363 just got merged :-)

* added code to group entities by servicepath when multiple notifications arrive at once (#208)
* add tests and integration tests (so we have also test with orion triggering such case)
* fix notification tests to rely only on api calls (remove call to translator)
* changed error code to 400 when due to invalid request
* upgraded python to 3.8.5
* fix integration test to use new docker image with entrypoint (#362)
* remove build (we can assume a build is available)
* remove volumes at the end of the test
* increase timesleep for one test
* improve code
* fix wrong delete logic not keeping into account correctly servicePath!=None
* add test for #347 bug
 * add test for selective service path delete
@chicco785
Copy link
Contributor Author

rebase completed

@c0c0n3
Copy link
Member

c0c0n3 commented Sep 22, 2020

@chicco785 so I've had a second thought about these delete ops. My naive interpretation was that delete should be some kind of maintenance operation you do to clean up tables after you realise you won't need them ever again---e.g. the devices that write to those tables got decommissioned. If that was the case, the current implementation would be good enough since no data would typically be written to those tables while running a delete. But I don't think this is a realistic assumption anymore and there could be cases where people want to delete data while the table is still being used. The QL spec is also a bit ambiguous about it in that it says "delete all ..." on the one hand and then gives you parameters to select a subset of the data on the other.

Problem: concurrent inserts

With the current implementation we risk accidentally deleting rows that the user wanted to keep. In fact, we run the following operations outside of transaction boundaries and possibly with an eventually consistent DB back-end:

  1. Count how many rows t there are in the target table.
  2. Delete the r rows selected by the input predicate.
  3. If t = r, we assume the table is empty and can be dropped.

Obviously anything can happen in between those steps, in particular new rows could be inserted after we run step 2 but before we get to execute step 3. If that happens those new rows will be lost. Here's an example scenario. Say I've got a daily clean up script to delete rows older than 24 hours. Because of network downtime (or whatever other reason) the devices whose entities get written to the table couldn't send new data in the last 24 hours, but they buffered the data and are ready to send it as soon as the network comes back up. (Or perhaps there's a message queue providing that kind of service.) The script starts running while those devices hammer the system with hundreds of records to insert. As you can imagine we could easily lose hundreds or even thousands of records---think bulk insert.

Proposed solution

Change the SQL translator implementation so we only delete entities but not the table, e.g.

def delete_entity(self, entity_id, entity_type=None, from_date=None,
                      to_date=None, fiware_service=None,
                      fiware_servicepath=None):
        ...
        # delete entries from table
        table_name = self._et2tn(entity_type, fiware_service)
        where_clause = self._get_where_clause([entity_id, ],
                                              from_date,
                                              to_date,
                                              fiware_servicepath)
        op = "delete from {} {}".format(table_name, where_clause)

        try:
            self.cursor.execute(op)
            return self.cursor.rowcount
        except Exception as e:
            logging.error("{}".format(e))
            return 0

This is consistent with the QL spec. Then we provide an additional endpoint to drop the table, i.e. to call _drop_table. This way the API client can make a conscious decision to actually ditch the whole lot when it isn't really needed anymore. Also it looks like having this additional endpoint is in line with what API users would expect---e.g. see @ilarimikkonen's suggestion in #347.

I've got a patch I could push to this branch if we decide to go ahead with this solution.

Alternative solution

Leave things as they are but open an issue about it so the problem won't fly under our radar.

@chicco785
Copy link
Contributor Author

Proposed solution

Change the SQL translator implementation so we only delete entities but not the table, e.g.

def delete_entity(self, entity_id, entity_type=None, from_date=None,
                      to_date=None, fiware_service=None,
                      fiware_servicepath=None):
        ...
        # delete entries from table
        table_name = self._et2tn(entity_type, fiware_service)
        where_clause = self._get_where_clause([entity_id, ],
                                              from_date,
                                              to_date,
                                              fiware_servicepath)
        op = "delete from {} {}".format(table_name, where_clause)

        try:
            self.cursor.execute(op)
            return self.cursor.rowcount
        except Exception as e:
            logging.error("{}".format(e))
            return 0

This is consistent with the QL spec. Then we provide an additional endpoint to drop the table, i.e. to call _drop_table. This way the API client can make a conscious decision to actually ditch the whole lot when it isn't really needed anymore. Also it looks like having this additional endpoint is in line with what API users would expect---e.g. see @ilarimikkonen's suggestion in #347.

I've got a patch I could push to this branch if we decide to go ahead with this solution.

rather than an additional end point, we could have an optional parameter dropTable=true

be aware that this may break several tests because if the table stays in and next insert for the same attribute has a different format, it won't work. i.e. insert will fail

@c0c0n3
Copy link
Member

c0c0n3 commented Sep 23, 2020

rather than an additional end point, we could have an optional parameter dropTable=true

cool, I'll push the patch I have in the meantime which has an extra endpoint, then I'll get rid of it later. Doing this b/c I have other commits on top of that patch.

this may break several tests

yep, the patch fixes the tests too. Have a look and see if there's anything to change. Like I said, later on I'll get rid of the new endpoint. Also I've fixed the delete for timescale (the other commits I was talking about earlier), if you agree I think the easiest is for me to push to this PR those changes too and rename this PR to e.g. "Improved delete endpoints" or something like that.

@chicco785
Copy link
Contributor Author

@c0c0n3 if that's only for your tests, is ok, but if the plan is to have an option on the existing endpoint, I won't merge this to master

@c0c0n3
Copy link
Member

c0c0n3 commented Sep 23, 2020

@chicco785 yea, like I said earlier I'll get rid of the new endpoint it's just that I have some commits in between for timescale, so what I'm asking is this:

  1. Is it okay to change the delete functionality to fix concurrency issues---if yes, I'll get rid of the endpoint and use an input param instead, i.e. dropTable.
  2. Is it okay to add to this PR the fix to the delete methods so that they'll work with any translator as opposed to the currenlty hard-coded Crate translator?

If not, we could have separate PRs for (1) and (2). But I think it'd be more overhead?

@chicco785
Copy link
Contributor Author

@chicco785 yea, like I said earlier I'll get rid of the new endpoint it's just that I have some commits in between for timescale, so what I'm asking is this:

  1. Is it okay to change the delete functionality to fix concurrency issues---if yes, I'll get rid of the endpoint and use an input param instead, i.e. dropTable.

yep

  1. Is it okay to add to this PR the fix to the delete methods so that they'll work with any translator as opposed to the currenlty hard-coded Crate translator?

yep

If not, we could have separate PRs for (1) and (2). But I think it'd be more overhead?

@c0c0n3
Copy link
Member

c0c0n3 commented Sep 24, 2020

@chicco785 I think it's ready for review.

To sum up:

  1. Race conditions. (5be2a2d) The delete API doesn't silently drop the entity table anymore which avoids race conditions when inserts are still possible. We now have a dropTable query param the client can use to drop the table when they know no more inserts are possible and the table is of no further use to them.
  2. Timescale delete. (5e9b53e) The delete API now works with Timescale too.
  3. Timescale delete tests. (55bafc4) Stopgap solution to automate delete tests with Timescale as backend. We can ditch most of the files in this commit as soon as we have a working implementation of Timescale queries.

@c0c0n3 c0c0n3 merged commit d1a6984 into master Sep 25, 2020
@c0c0n3 c0c0n3 deleted the fix-347 branch September 25, 2020 09:11
@c0c0n3 c0c0n3 mentioned this pull request Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Programmatic deletion of Table from Crate DB
2 participants