Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Neg clausify #168

Merged
merged 14 commits into from
Apr 16, 2024
Merged

Fix Neg clausify #168

merged 14 commits into from
Apr 16, 2024

Conversation

brossignol
Copy link
Contributor

Neg clausify is broken, it returns itself as a clause but I don't think it should ever do that.
An example:

from pysat.formula import Atom, And, Or, Neg
f = Or(Atom(1), Neg(Atom(2)), Neg(Atom(2)))
print(list(f))
>>> [[-2], [-2], [1, -2, -2]]

This assume that all models contain -2, but [1, 2] is a valid model. (The correct clause should be only [1, -2, -2], without the [-2]).

I propose a fix where Neg clauses is always empty. An extra variable clausified is used to avoid multiple clausify of the subformula.

@alexeyignatiev
Copy link
Collaborator

I can't check this behaviour right now - will be back to my computer on Monday. But negation should return a clause if it is a highest level (outer most) term.

@alexeyignatiev
Copy link
Collaborator

I suppose this is caused by the use of the duplicate literal. I should have handled it properly and this is what requires a fix. I can't try it now without a computer but my guess is it will work if you consider a non-repeated Neg(Atom(2)).

@alexeyignatiev
Copy link
Collaborator

If my guess is correct then fixing should be done by changing the condition for clauses creation. Instead of checking only if the clauses are empty, it should also take into account whether the name exists already.

@brossignol
Copy link
Contributor Author

brossignol commented Apr 14, 2024

When listing the clauses __iter__ is used for the highest level and _iter otherwise.
A solution could be to overwrite Neg.__iter__ to return only subformula and have Neg._iter return itself.

Also, this could be extended to other classes.
This could solves differences in behaviour that appear if you clausify subformula directly:

f = And(Atom(1), Atom(2))
g = And(f, Atom(3))
list(f)
>>> [[1], [2]]
f = And(Atom(1), Atom(2))
g = And(f, Atom(3))
g.clausify()
list(f)
>>> [[1, -4], [2, -4], [4, -1, -2]]

@alexeyignatiev
Copy link
Collaborator

I agree!

@alexeyignatiev
Copy link
Collaborator

alexeyignatiev commented Apr 15, 2024

So if you think you can reimplement the patch implementing these changes and taking into account that Neg should return a clause if that is the outermost negation then I will be happy to merge! Alternatively, I can do it myself but I will have to decline the PR.

@brossignol
Copy link
Contributor Author

I fixed for Neg.

from pysat.formula import Atom, And, Neg

print(list(Neg(Atom(2))))
>>> [[-2]]

f = Or(Atom(1), Neg(Atom(2)), Neg(Atom(2)))
print(list(f))
>>> [[1, -2, -2]]

I will look for other classes too.

@alexeyignatiev
Copy link
Collaborator

Thanks!

@brossignol
Copy link
Contributor Author

Should a single atom return itself as a clause when it is outermost? (basically a single atom formula)
Currently

list(Atom(1))
>>>[] 

@brossignol
Copy link
Contributor Author

brossignol commented Apr 15, 2024

I finished to fix all classes.

  • A new attribut _clauses_tseitin is used to store clauses after Tseitin transformation.
  • The attribut .clauses keeps the raw clauses.
  • _iter has a new argument outermost to decide witch clauses to return between clauses and _clauses_tseitin.
  • Add a new test test_clausification.py with simple example reproducing current problems.

This fix avoids all the inner/outermost clause problems and avoid recomputing clausification multiple times.
The only drawback is it double memory footprint.

I made _clauses_tseitin hidden but we can rename it to visible variable clauses_tseitin or even take another name.

edit: The build fails for python 3.11 and 3.12 only, I don't know why.

@alexeyignatiev
Copy link
Collaborator

But why do you need to have another variable for storing clauses? Why is clauses not enough? All the formula types except for Neg seem have no issue with the outermost case.

Yes, Atom should return a clause if is outermost but I could not find a use for that.

@brossignol
Copy link
Contributor Author

It may not be important, but if you read the subformula clauses, you may get different results.
Currently:

f = And(Atom(1),Atom(2))
g = And(f, Atom(3))

print(list(f))
>>> [[1], [2]]

g.clausify()
print(list(f))
>>> [[1, -4], [2, -4], [4, -1, -2]]

I aimed to have always the same results when called directly (still return Tseitin variables when called as a subformula)

print(list(f))
>>> [[1], [2]]

@alexeyignatiev
Copy link
Collaborator

alexeyignatiev commented Apr 16, 2024

I see what you mean. This makes sense although I feel somehow uneasy doubling the number of clauses. Do you make it work uniformly across all the formula types, including CNF?

UPDATE: No worries, I can see now that CNF is among the classes updated.

@brossignol
Copy link
Contributor Author

I think I can make that only the requested clauses (.clauses or .clauses_tseitin) are stored. This way, the subformula won't store any unnecessary clauses.
This will increase calculation time when both types of clauses are called on the same formula. This seems more acceptable.

@alexeyignatiev
Copy link
Collaborator

Well, re-encoding every time is a problem. 🙂

@alexeyignatiev
Copy link
Collaborator

So I think it is a lesser evil to do the work once (although it spends more memory).

@brossignol
Copy link
Contributor Author

I was thinking more as re-encoding twice at most. Once for .clauses (if needed) and once for .clauses_tseitin (if needed).
Given that .clauses_tseitin require .clauses, if the call order is .clauses then .clauses_tseitin there is no overcost.

@alexeyignatiev
Copy link
Collaborator

Yes, but every time a sub-formula appears in a larger formula, you have to "re-tseitinize" it.

@brossignol
Copy link
Contributor Author

But if we store the tseitin result inside .clauses_tseitin there is no problem. Am I wrong?

@alexeyignatiev
Copy link
Collaborator

My current solution was to keep the clauses representing a non-tseitinized (unnamed) variant of the clauses and then give it a name by applying Tseitin transformation on demand. You correctly pointed out that a user may want to consider clauses representing not only the most complex formula but also its specific sub-formulas, which requires one to keep both versions of the clauses after clausification or re-clausifying on the fly every time. I don't see how you can do it by keeping a single version only and encoding on the fly only once.

@alexeyignatiev
Copy link
Collaborator

But if we store the tseitin result inside .clauses_tseitin there is no problem. Am I wrong?

In this case, there is no problem of re-encoding but there is a problem of a duplicate of the clauses. 🙂

@brossignol
Copy link
Contributor Author

There is a duplicate only if the user asks both regular clauses and tseistin clauses on the same formula.
My guess is it will not happen very often.

@brossignol
Copy link
Contributor Author

brossignol commented Apr 16, 2024

I could aim that clausifying a big formula gives the result without extra computing or memory.
But if regular clauses of subformula are required then there are extra cost of memory and computation. (at most x2).

@alexeyignatiev
Copy link
Collaborator

I could aim that clausifying a big formula once gives the result without extra computing or memory. But if regular clauses of subformula are required then there are extra cost of memory and computation. (at most x2).

I presume this is what your current patch is doing?

@brossignol
Copy link
Contributor Author

For the moment, it's doubling the memory, but I think I can achieve this goal relatively easily.
I need to tell .clausify to not store .clauses when not required.

@alexeyignatiev
Copy link
Collaborator

For the moment, it's doubling the memory, but I think I can achieve this goal relatively easily. I need to tell .clausify to not store .clauses when not required.

But the thing is you never know if a user will need that clausal representation of the sub-formula. It is safer to keep it.

@brossignol
Copy link
Contributor Author

brossignol commented Apr 16, 2024

They will have access to the tseitin clauses of subformula at no cost, but accessing the regular clause will add cost.

To be fair I have now idea how regular users will use the code (I am not a SAT expert). The most common case can guide how the code is optimized for.

@brossignol
Copy link
Contributor Author

I updated with the memory saving. The results seem convincing to me.

And example to display the new behaviour:

f = And(Atom('1'), Atom('2'))
g = And(f, Atom('3'))

print('f', f.clauses, 'tseitin:', f.clauses_tseitin)
print('g', g.clauses, 'tseitin:', g.clauses_tseitin)
>>> f [] tseitin: []
>>> g [] tseitin: []

g.clausify()
print('f', f.clauses, 'tseitin:', f.clauses_tseitin)
print('g', g.clauses, 'tseitin:', g.clauses_tseitin)
>>>f [] tseitin: [[1, -3], [2, -3], [3, -1, -2]]
>>>g [[3], [4]] tseitin: []

f.clausify()
print('f', f.clauses, 'tseitin:', f.clauses_tseitin)
print('g', g.clauses, 'tseitin:', g.clauses_tseitin)
>>>f [[1], [2]] tseitin: [[1, -3], [2, -3], [3, -1, -2]]
>>>g [[3], [4]] tseitin: []

@alexeyignatiev
Copy link
Collaborator

This looks good, thank you!

@alexeyignatiev alexeyignatiev merged commit 9d22f3e into pysathq:master Apr 16, 2024
1 check passed
@alexeyignatiev
Copy link
Collaborator

Just in case you will continue working on something related to these changes, I've renamed the variable self.clauses_tseitin to self.encoded. Again, thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants