Some questions about formula from the paper "Learning Bayesian Networks with Thousands of Variables". #10

AlbertYang1995 · 2018-11-15T08:19:46Z

Hi, I am trying to use the formula from the paper but I met some troubles.
I don't know whether I misunderstand it.

Here is the formula:
BIC(x, pa) = BIC*(x, pa1) + BIC*(x, pa2) + inter(x, pa1, pa2) + N * ( I (pa1;pa2| x ) - I (pa1;pa2) )
inter(x, pa1, pa2) = logN * ( |x| - 1 ) * ( |pa1| + |pa2| - |pa1| |pa2| - 1) - BIC( x )

I know pa1Upa2 = pa, so we can decompose BIC, and it's the most amazing part. But I am confused about |x|, |pa1| and |pa2|, do they mean the number of states as mentioned in the paper?

for example : x can be 0, 1, 2 (discrete data), so |x| = 3?

But I found that this formula didn't equal to original BIC score, because the |pa1| |pa2| will increase with parents number increase, eventually inter will become negative, and the score will far from the original one.

Did I do something wrong? Thank you very much.

mauro-idsia · 2018-11-15T10:03:40Z

Dear Albert, the BIC* is an approximation of the BIC. A bounded approximation (and this is the core finding of that paper), but nevertheless it is not equal to the exact BIC score. Bests, Mauro

…

On Thu, Nov 15, 2018 at 9:19 AM Albert Yang ***@***.***> wrote: Hi, I am trying to use the formula from the paper but I met some troubles. I don't know whether I misunderstand it. Here is the formula: BIC(x, pa) = BIC*(x, pa1) + BIC*(x, pa2) + inter(x, pa1, pa2) + N * ( I (pa1;pa2| x ) - I (pa1;pa2) ) inter(x, pa1, pa2) = logN * ( |x| - 1 ) * ( |pa1| + |pa2| - |pa1| |pa2| - 1) - BIC( x ) I know pa1Upa2 = pa, so we can decompose BIC, and it's the most amazing part. But I am confused about |x|, |pa1| and |pa2|, do they mean the number of states as mentioned in the paper? for example : x can be 0, 1, 2 (discrete data), so |x| = 3? But I found that this formula didn't equal to original BIC score, because the *|pa1| |pa2|* will increase with parents number increase, eventually *inter* will become negative, and the score will far from the original one. Did I do something wrong? Thank you very much. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#10>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AWpFFsVhONLYUa3al7QmXAAsA1VVMswOks5uvSOigaJpZM4YfSwq> .

-- --- Skana "Just ask yourself the right question."

AlbertYang1995 · 2018-11-15T11:16:04Z

Dear Mauro,

I find that | x | means the number of values of x, but |pa1| and |pa2| mean the number of parent nodes of subset.

Albert

mauro-idsia · 2018-11-15T12:25:53Z

From pag. 2 of [Scanagatta et all, 2015]: "and | · | indicates the size of the Cartesian product space of the variables given as arguments (instead of the number of variables)" so |pa| means the number of possible combinations of values of the variables contained in |pa| Hope this clarifies the issue, Mauro

…

On Thu, Nov 15, 2018 at 12:16 PM Albert Yang ***@***.***> wrote: Dear Mauro, I find that | x | means the number of values of x, but |pa1| and |pa2| mean the number of parent nodes of subset. Albert — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#10 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AWpFFqm2NFv8XCVyjuVtA3nkxOyVfRfhks5uvUz0gaJpZM4YfSwq> .

-- --- Skana "Just ask yourself the right question."

AlbertYang1995 · 2018-11-15T13:01:05Z

At the very beginning I thought |pa| means the number of possible combination of values.

For example: pa have 3 variables(a, b, c), and each variable has 3 values(0, 1, 2), so the |pa| = 27

But when I coded and ran, the result wasn't right, so I came here ask for help.

During the time when I waited for reply, I guessed that the |pa| may be represent the number of parent nodes and I had a try. The result shows that these two BIC scores are the same.

Maybe I am wrong, but I really doubt the meaning of |pa|.

mauro-idsia · 2018-11-15T16:38:46Z

Dear Albert, I can assure you that it means the number of possible combinations. Another way to think of it, it is directly related to the number of independent parameters k in the standard definition of BIC: BIC = − 2LL + k*log(n) in the CPTs of a Bayesian network you need to specify a complete distribution for each of the possible combination of values of the parents. Bests, Mauro

…

On Thu, Nov 15, 2018 at 2:01 PM Albert Yang ***@***.***> wrote: At the very beginning I thought |pa| means the number of possible combination of values. For example: *pa* have 3 variables(a, b, c), and each variable has 3 values(0, 1, 2), so the |pa| = 27 But when I coded and ran, the result wasn't right, so I came here ask for help. During the time when I waited for reply, I guessed that the |pa| may be represent the number of parent nodes and I had a try. The result shows that these two BIC scores are the same. Maybe I am wrong, but I really doubt the meaning of |pa|. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#10 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AWpFFi2RKS_Ftg7DVz74yz7WM86kCQ7Mks5uvWWRgaJpZM4YfSwq> .

-- --- Skana "Just ask yourself the right question."

AlbertYang1995 · 2018-11-16T03:28:01Z

Dear Mauro,

You are totally right. Indeed the BIC* is an approximation of the BIC.

And I find that I didn't change the penalty of BIC*, so when I assumed that |pa| mean the number of parent nodes, those BIC scores turned out to be the same coincidently.

I am so sorry and thank you for your time.

Albert

AlbertYang1995 · 2018-11-16T09:48:51Z

Dear Mauro,

I have another question, we already know that BIC can be decomposed, but when the number of parent nodes increase, how to calculate conditional mutual information between two parents subsets?

For example:
Assuming that we have 30 parent nodes in all, then we can divide them into pa1 and pa2 these two subsets, pa1 have 10 variables and pa2 have 20 variables, after that pa2 can be divided into another two subsets again, so we can calculate BIC*. But each time we decompose parents subset, we have to calculate conditional mutual information and mutual entropy between pa1 and pa2 ( I (pa1;pa2 | x) and I (pa1;pa2) ).

There are too many combinations, how to calculate the mutual information between these two huge subsets?

Thank you very much.

Albert

mauro-idsia · 2018-11-19T07:38:40Z

Dear Albert, In general, the mathematical formulas remain the same regardless of the size of the inputs. Having said that, it would be rather bizarre to have a parent set of 30 variables. In real-world case, I would say that 6 or 7 is already a big parent set. Remember that the BIC penalizes the score of a network / parent set based on the number of parameters required to represent it.

…

On Fri, Nov 16, 2018 at 10:48 AM Albert Yang ***@***.***> wrote: Dear Mauro, I have another question, we already know that BIC can be decomposed, but when the number of parent nodes increase, how to calculate conditional mutual information between two parents subsets? For example: Assuming that we have 30 parent nodes in all, then we can divide them into pa1 and pa2 these two subsets, pa1 have 10 variables and pa2 have 20 variables, after that pa2 can be divided into another two subsets again, so we can calculate BIC*. But each time we decompose parents subset, we have to calculate conditional mutual information and mutual entropy between pa1 and pa2 ( I (pa1;pa2 | x) and I (pa1;pa2) ). There are too many combinations, how to calculate the mutual information between these two huge subsets? Thank you very much. Albert — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#10 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AWpFFsxTBEsNnjyyjHyHL2Z9AYA-PCUzks5uvooEgaJpZM4YfSwq> .

-- --- Skana "Just ask yourself the right question."

AlbertYang1995 · 2018-11-20T07:47:11Z

Dear Mauro,

Thank you so much.
I am ready to close this issue.

Albert

AlbertYang1995 closed this as completed Nov 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some questions about formula from the paper "Learning Bayesian Networks with Thousands of Variables". #10

Some questions about formula from the paper "Learning Bayesian Networks with Thousands of Variables". #10

AlbertYang1995 commented Nov 15, 2018

mauro-idsia commented Nov 15, 2018 via email

AlbertYang1995 commented Nov 15, 2018

mauro-idsia commented Nov 15, 2018 via email

AlbertYang1995 commented Nov 15, 2018

mauro-idsia commented Nov 15, 2018 via email

AlbertYang1995 commented Nov 16, 2018

AlbertYang1995 commented Nov 16, 2018

mauro-idsia commented Nov 19, 2018 via email

AlbertYang1995 commented Nov 20, 2018

Some questions about formula from the paper "Learning Bayesian Networks with Thousands of Variables". #10

Some questions about formula from the paper "Learning Bayesian Networks with Thousands of Variables". #10

Comments

AlbertYang1995 commented Nov 15, 2018

mauro-idsia commented Nov 15, 2018 via email

AlbertYang1995 commented Nov 15, 2018

mauro-idsia commented Nov 15, 2018 via email

AlbertYang1995 commented Nov 15, 2018

mauro-idsia commented Nov 15, 2018 via email

AlbertYang1995 commented Nov 16, 2018

AlbertYang1995 commented Nov 16, 2018

mauro-idsia commented Nov 19, 2018 via email

AlbertYang1995 commented Nov 20, 2018