New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory error for big discrete Bayesian networks #114
Comments
Hi @shiruizhao, thank you for trying SPPL. The issue is stemming from the Python standard library, specifically the parser in the ast module. The string As a workaround could you try writing all the if/else as standard code blocks, instead of using the syntax with nested expressions? That is, instead of C_NI_12_45 ~= choice({'3' : 0.5, '4' : 0.4, '5' : 0.1, '6' : 0.0}) if (C_NI_12_30 == '3') else (choice({'3' : 0.2, '4' : 0.55, '5' : 0.2, '6' : 0.05}) if (C_NI_12_30 == '4') else (choice({'3' : 0.1, '4' : 0.3, '5' : 0.5, '6' : 0.1}) if (C_NI_12_30 == '5') else (choice({'3' : 0.0, '4' : 0.15, '5' : 0.25, '6' : 0.6})))) try if (C_NI_12_30 == '3'):
C_NI_12_45 ~= choice({'3' : 0.5, '4' : 0.4, '5' : 0.1, '6' : 0.0})
elif (C_NI_12_30 == '4'):
C_NI_12_45 ~= choice({'3' : 0.2, '4' : 0.55, '5' : 0.2, '6' : 0.05}
elif (C_NI_12_30 == '5'):
C_NI_12_45 ~= choice({'3' : 0.1, '4' : 0.3, '5' : 0.5, '6' : 0.1}
else:
C_NI_12_45 ~= choice({'3' : 0.0, '4' : 0.15, '5' : 0.25, '6' : 0.6} |
Thanks for your reply. @fsaad |
I have two suggestions regarding your use case.
That is, try to change CKNI_12_00 ~= choice({'20_MG_L' : 0.3333333,'30_MG_L' : 0.3333333,'40_MG_L' : 0.3333333}) to CKNI_12_00 ~= choice({'20_MG_L' : 1./3, '30_MG_L' : 1./3, '40_MG_L' : 1./3})
That is, try to change if ((CNOD_12_00 == '0_5_MG_L') and (CBODN_12_00 == '5_MG_L') and (CKNN_12_00 == '0_5_MG_L') and (CNON_12_00 == '2_MG_L')): CNON_12_15 ~= choice({'2_MG_L' : 0.9555, '4_MG_L' : 0.0445, '6_MG_L' : 0.0, '10_MG_L' : 0.0})
elif ((CNOD_12_00 == '0_5_MG_L') and (CBODN_12_00 == '5_MG_L') and (CKNN_12_00 == '0_5_MG_L') and (CNON_12_00 == '4_MG_L')): CNON_12_15 ~= choice({'2_MG_L' : 0.0056, '4_MG_L' : 0.9944, '6_MG_L' : 0.0, '10_MG_L' : 0.0}) to if CNOD_12_00 == '0_5_MG_L':
if CBODN_12_00 == '5_MG_L':
if CKNN_12_00 == '0_5_MG_L':
if CNON_12_00 == '2_MG_L':
CNON_12_15 ~= choice({'2_MG_L' : 0.9555, '4_MG_L' : 0.0445, '6_MG_L' : 0.0, '10_MG_L' : 0.0})
elif CNON_12_00 == '2_MG_L'
CNON_12_15 ~= choice({'2_MG_L' : 0.0056, '4_MG_L' : 0.9944, '6_MG_L' : 0.0, '10_MG_L' : 0.0})
... N.B: In SPPL, the conjunction symbol is the logical operator |
Thanks for your help! Following your suggestion, it works now. |
I ran your code end-to-end and the compilation phase took around 40 minutes using Intel Core i7-8665U @ 1.90GHz, 4 cores, 8 threads. Here is the output on the query phase:
(FYI, your program
This is a really good question. One way to think of model compilation in SPPL is that, as a subroutine, it "pre-solves" a large collection of inference problems (which correspond not only to all marginals, but also arbitrary subsets of all marginals, among others). These solutions are indirectly encoded in the compiled artifact. Thus, solving queries against the compiled representation, like the ones in your benchmark, is designed to be fast and the representation reusable across multiple different queries. To save the compiled model, you can serialize it as a JSON string and then write to and load it from disk, see, e.g., https://github.com/probcomp/sppl/blob/f10146ab4fd45460051a67c414f4317ba329405e/tests/test_spe_to_dict.py#L35-L40 In general, when going from a Bayes Net to a sum-product network, it is perfectly possible for the representation to scale exponentially. Because Bayes Net inference is known to scales exponentially in the tree-width (which in a bad case could be roughly equal to the size of the original graph) and SPN inference scales linearly in the size of the graph for simple queries, for these bounds to match up, the corresponding SPN must be exponentially large. The actual size of the SPN will depend on the conditional independences in the Bayes Net: the more CIs there are, the smaller the compiled model will be. That said, there are many opportunities to speed up the translation process, which is currently implemented in unoptimized Python 3 with interpretation overhead and dynamic error checking. |
Thanks for your detail explaination! As a hareware designer, I'd like to build energy-efficent chip for prob. inference with real-time constraint, I assume the translation could be done offline on a server whilethe queries of SPN may need to be run on the edge device with more power/runtime constraint, so I focus on the inference phase. I am running a whole bunch of benchmark of discrete BNs, if you are interested, I can share with you the results when I finish it. |
You are very welcome.
That is right. For large discrete Bayesian network with arbitrary conditional probability tables we expect that the translation runtime will exceed the query (inference) runtime. Thus, translation is best done "offline" whereas queries can be solved quickly in the online phase.
Please do share your results, and feel free to open any more tickets if you have questions or issues with the software. |
I tried to use SPPL to rn several BNs from bnrepository https://www.bnlearn.com/bnrepository/, The script I create is attached at https://github.com/shiruizhao/swift/blob/master/example/water_benchmark.py. It reported following error, could you have a look if I use the library wrong?
"python3.8/site-packages/sppl/compilers/sppl_to_python.py", line 360, in compile
tree = ast.parse(source_prime)
File "python3.8/ast.py", line 47, in parse
return compile(source, filename, mode, flags,
MemoryError
The text was updated successfully, but these errors were encountered: