
Comparing timestamps can lead to data inconsistencies in soft deleted records #8496

Closed
gabegorelick opened this issue Oct 17, 2017 · 16 comments


@gabegorelick
Contributor

I've written about this before a few times but I will summarize my concerns again here. Apologies for eschewing the issue template and for the length of this post, but I wanted to use this as a place for summarizing all the issues around Sequelize's practice of comparing timestamps for soft deletes.

Use of client time instead of DB time

Before #8485, Sequelize would soft delete records using the client's time, but query for them using the database's time (by "client's time" I mean a client of the database, e.g. an app server, ad hoc script, etc). This led to inconsistencies that have been well documented: namely, if client and DB clocks aren't in sync, you can potentially see a soft deleted record.

#8485 normalized this so that both delete and query operations use the client's time. This is good in that you no longer have to worry about time zone differences or clock drift between the DB and the client.

But the safer choice would arguably be to use the DB's time rather than the client's. This is because a typical app will have a number of clients (load balanced app instances, ad hoc scripts running on dev machines, etc), all of which need to keep their clocks in sync with each other, compared to only a handful of DB instances (and often only 1).

This problem still exists when using DB time if you have multiple database instances in a cluster, but typically there will be far fewer DB instances than clients. Additionally, DB instances tend to be fairly homogeneous and under tight control, so keeping their clocks in sync tends to be much easier than keeping client clocks in sync.
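To make this concrete, here is a minimal sketch of soft deleting with the database clock instead (not Sequelize's built-in behavior; MyModel and someId are hypothetical placeholders, and this bypasses the paranoid destroy entirely):

// a sketch only: let the DB server evaluate NOW() instead of using Date.now()
await MyModel.update(
  { deletedAt: Sequelize.fn('NOW') },
  { where: { id: someId } }
);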

That said, I still think it's dangerous to rely on DB clocks being in sync, especially when you have multiple DB instances and non-monotonic time sources.

Non-monotonic time source

JavaScript's Date object is not monotonic: in the presence of NTP adjustments, leap seconds, users manually setting the system time, etc., it can tick backwards. When that happens, your previously deleted records are suddenly scheduled to be deleted in the future.

For single node deployments, this can potentially be fixed by using something like process.hrtime as a time source. But clock drift between nodes is still a problem.
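For example, here's a single-process sketch (my own illustration) that anchors the wall clock once and advances it with process.hrtime, which cannot tick backwards:

// never goes backwards within this process; does nothing about cross-node drift
const startWall = Date.now();
const [startSec, startNano] = process.hrtime();

function monotonicNow() {
  const [sec, nano] = process.hrtime();
  const elapsedMs = (sec - startSec) * 1e3 + (nano - startNano) / 1e6;
  return new Date(startWall + elapsedMs);
}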

Clock drift between nodes

Even if we use a monotonic time source, keeping multiple clocks in sync is really hard. Google resorts to atomic clocks to do it! The linked article mentions that NTP, which I imagine is far more typical than atomic clocks among Sequelize's users, will typically give you a clock-offset upper bound of "somewhere between 100ms and 250ms."

Prior art

I haven't researched this too much, but a cursory search reveals that alternatives like ActiveRecord's ActsAsParanoid do not compare timestamps and instead compare against a sentinel value. I'd be curious whether they consciously chose not to support future deletes or just never implemented them. Regardless, if Sequelize's behavior in this space really is unique, that seems like more evidence that it's not a safe practice, especially when it's in service of a feature (scheduled deletes) that few may use.

Conclusion

I think scheduled soft deletes weaken Sequelize's consistency: it is entirely possible to see a soft deleted record 250ms or more after it was soft deleted. Sequelize should consider changing the default behavior to something safer.

@crispkiwi

Comparing timestamps for soft deleted records has made the Sequelize v4 migration a real pain. I believe it was a mistake to introduce this. I hope this can be changed back in v5; until then we may have to run a fork.

The main issue for us is that soft deleted records cannot be excluded from an index. Under v3 we have the condition WHERE deletedAt IS NULL on all our indexes. We obviously want to keep them slim, and including soft deleted records would add unnecessary bloat. Including a timestamp comparison in the lookup has a big implication: our existing indexes won't be used, and updating them so lookups don't require a full scan would pull every soft deleted record into the index.
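For reference, this is the kind of slim partial index I mean, as a migration sketch (hypothetical table and column names; assumes Postgres, where the where index option emits CREATE INDEX ... WHERE ...):

// only live rows enter the index, keeping it slim
await queryInterface.addIndex('users', ['email'], {
  unique: true,
  where: { deletedAt: null }
});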

We encountered this problem attempting to migrate to v4; I raised my initial concerns in #7884.

The original change #5880 was aimed at solving a semantic problem, and I don't think its consequences were properly thought through. Supporting future deletions of records seems like an afterthought: functionality that, in my opinion, belongs in the application layer, not the ORM. deletedAt was providing useful metadata for auditing and acting as the boolean for the deleted state. Semantically, I didn't see it as a problem.

The issues we have had trying to migrate and the problems outlined by @gabegorelick outweigh the benefit to the few who actually want the feature of future deletions. I do hope this can be addressed.

@gabegorelick
Contributor Author

If you make deletedAt NOT NULL and give it a default value like Infinity (which Sequelize recently added support for), then Sequelize will generate queries closer to what you want (WHERE deletedAt = 'Infinity'). Not using NULL also means any unique constraints that include deletedAt will work in SQL-compliant DBs like Postgres, where NULL <> NULL.
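Roughly like this (a sketch assuming Postgres, which accepts 'Infinity' for timestamps; MyModel is a placeholder):

// NOT NULL deletedAt with an Infinity default instead of NULL
const MyModel = sequelize.define('MyModel', {
  name: Sequelize.STRING,
  deletedAt: {
    type: Sequelize.DATE,
    allowNull: false,       // NOT NULL, so unique constraints including it behave
    defaultValue: Infinity  // live rows store 'Infinity' rather than NULL
  }
}, { paranoid: true, timestamps: true });
// paranoid lookups then become roughly: WHERE "deletedAt" = 'Infinity'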

@crispkiwi

Thanks for suggesting the above @gabegorelick. I've had a look at the _paranoidClause function on the Model class. What you suggested would indeed work. Setting a default value on the deletedAt property removes the timestamp comparison from lookups completely, instead falling back to a straight WHERE deletedAt = 'defaultValue'.

This raises the question: why do the time comparison at all? Setting a default value should not have the unintended consequence of negating the scheduled deletion "feature". It again leads me to believe that it was, at least in part, an afterthought tacked onto a fix for a "semantic problem".

I'm not sure I want to use this as a workaround for my scenario of keeping soft deleted records out of indexes, as this behaviour is confusing and therefore has the potential to change.

I hope you don't mind me bringing up these points here; I don't want to derail the discussion from your initial points, which are spot on.

@jeremija

jeremija commented Oct 31, 2017

I agree with @crispkiwi; my team has also recently run into issues with the new default deletedAt time comparison. While it is nice to have this as a feature, we'd prefer to be able to disable it. We can use a hack like this one to disable the comparison:

// set this after model definition because `sequelize.sync`'s `CREATE TABLE`
// statement does not work when this is set in the model definition
MyModel.attributes.deletedAt.defaultValue = { [Sequelize.Op.eq]: null }

However, this breaks Model.restore(), Instance.restore(), and who knows what else. We are using MySQL, which AFAIK does not support Infinity. Also, it seems unnatural that the deletedAt field of every undeleted record has to be set to some non-null value just to get better query performance, which should IMHO be the default.
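For what it's worth, we can still restore manually with something like the sketch below (someId is a placeholder; if your Sequelize version's update does not honor paranoid: false, a raw UPDATE works too):

// clear deletedAt ourselves instead of calling the broken restore()
await MyModel.update(
  { deletedAt: null },
  { where: { id: someId }, paranoid: false } // reach soft-deleted rows
);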

@blacksun1

Hey, upfront I'd like to say that our whole application runs on Sequelize, so I really appreciate this product. That said, the v4 major release has been a nightmare for us.

This turned out to be the main issue we were having. We've had to pause our migration for now, as we are not going to go any further until we can keep the old behaviour.

If we were to create "proper indexes", as @sushantdhiman advised via the Slack channel, our indexes would become massive, and queries still run noticeably slower than before. The fact is that we DID have proper indexes before; they just excluded any row where deletedAt was not null.

We are also looking into subclassing the Model class as a solution, but realistically we would prefer a long-term fix. I would love to contribute the code back to the main project, but before I start I want to know whether the change would be welcomed. Of course, I would make this switchable rather than the current all-or-nothing approach that has been dumped on us.

Thanks in advance,

@sushantdhiman added the "status: in discussion" and "think-about" labels on Oct 31, 2017
@holm
Contributor

holm commented Feb 6, 2018

We are trying to move to Sequelize 4 as well, and are running into this same issue. I similarly believe this was a mistake to force on everyone by default. We also have indices that use `deletedAt is not null`, and I would really prefer to keep those. We have no interest in soft deletes scheduled for the future, which I think is not a real use case for the paranoid system.

Could this not be made an option to enable/disable at least?

@holm
Contributor

holm commented Feb 6, 2018

I have added a PR to make restoring the old behavior an option: #9007

@sushantdhiman
Contributor

This issue is now fixed in the v5 branch: 014f2f5

@yeliex

yeliex commented Aug 17, 2019

Now we have a problem when migrating from v4 to v5. What should we do to keep querying like
WHERE deletedAt IS NULL OR deletedAt < now()?

@papb
Member

papb commented Aug 18, 2019

Now we have a problem when migrating from v4 to v5. What should we do to keep querying like
WHERE deletedAt IS NULL OR deletedAt < now()?

Why do you want to perform this query? To me, that query would evaluate to true in every normal circumstance (I can only see it being false if a record was "deleted in the future").

@papb removed the "status: in discussion" label on Aug 18, 2019
@gabegorelick
Contributor Author

Why do you want to perform this query?

Not OP, but if they are asking how to emulate v4's future deletion feature, then that's a valid question.

Assuming that's what you want to do, I've had success monkey patching Model._paranoidClause. Caveat: I did that to fix v4's consistency issue; I don't know whether it's feasible to emulate v4 behavior in v5.
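The idea looks roughly like the sketch below. To be clear, this is illustrative rather than my exact patch, and it leans on undocumented v4 internals (_timestampAttributes, rawAttributes), so pin your Sequelize version if you try it:

// compare deletedAt against the DB clock (NOW()) instead of the client's Date
const Sequelize = require('sequelize');
const { Op } = Sequelize;

Sequelize.Model._paranoidClause = function (model, options = {}) {
  if (!model.options.paranoid || options.paranoid === false) {
    return options; // non-paranoid model, or the caller opted out
  }
  const attribute = model._timestampAttributes.deletedAt; // usually 'deletedAt'
  const field = model.rawAttributes[attribute].field || attribute;
  const notDeleted = {
    [field]: {
      [Op.or]: {
        [Op.eq]: null,                       // never deleted
        [Op.gt]: Sequelize.literal('NOW()')  // "deleted" in the future
      }
    }
  };
  options.where = options.where
    ? { [Op.and]: [options.where, notDeleted] }
    : notDeleted;
  return options;
};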

@papb
Member

papb commented Aug 19, 2019

Not OP, but if they are asking how to emulate v4's future deletion feature, then that's a valid question.

@gabegorelick What is this v4's future deletion feature?

@gabegorelick
Contributor Author

What is this v4's future deletion feature?

@papb In Sequelize v4, if deletedAt was set to a timestamp in the future, the record would not be considered deleted. This was useful for scheduling records for deletion, but as pointed out in this issue, it can lead to consistency problems. It was therefore removed in v5 in favor of considering a record deleted whenever deletedAt is set at all.
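Illustratively (placeholder model and record; the SQL in the comments is approximate):

const future = new Date(Date.now() + 24 * 60 * 60 * 1000);
await record.update({ deletedAt: future }); // "schedule" deletion, v4 style

await MyModel.findAll();
// v4: WHERE ("deletedAt" IS NULL OR "deletedAt" > '<client now>') -> record still visible
// v5: WHERE "deletedAt" IS NULL                                   -> record hidden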

@yeliex

yeliex commented Aug 20, 2019

@gabegorelick @papb what about providing an option to make the deletedAt query more flexible? Something like:

deletedAt: {
    type: DataTypes.DATE,
    defaultValue: new Date(0), // for example, use 1970-01-01 00:00:00 as the default so records can be recreated after deletion
    defaultQuery: [null, { [Op.gt]: Date.now() }] // proposed option; would just default to matching defaultValue
}

@yeliex

yeliex commented Aug 20, 2019

After all, it is irresponsible to add a feature and then remove it.

@papb
Copy link
Member

papb commented Aug 20, 2019

@yeliex I have converted your idea into a complete feature request: #11336.
