
fix Contingency Table Generator (CTGenerator) #11

Closed
oschulte opened this issue Jun 16, 2017 · 10 comments
@oschulte
Collaborator

oschulte commented Jun 16, 2017

@vidhiJain The contingency table code is in

https://github.com/sfu-cl-lab/FactorBase/blob/master/src/BayesBaseCT_SortMerge.java. It's called from RunBB.java as follows:

//assumes that dbname is in config file and that dbname_setup exists.

BayesBaseCT_SortMerge.CTGenerator();

  1. So it should not be difficult to just call it by itself. It would be progress if we could do that.

The key procedure is CTGenerator()
This is what builds the CT tables. Unfortunately, Zhensong made a version of CTGenerator for working with groundings, called target, and merged it with the nontarget code. He also merged in a copy for the case where we are interested in only a subset of the functor nodes. My suggestion is as follows.

  1. Make a new branch.

  2. In the new branch, make a copy of BayesBaseCT_SortMerge.java with all the target and subset stuff removed. I can probably even find an older version without the target stuff. See if we can run it then.

  3. Then we can design a CT generator for groundings and subsets from scratch. I think a key move would be to change the CT generator so that it takes the setup database as input rather than treating it as a global variable. Then we can use the CT generator with different (temporary) setup databases.

@oschulte
Collaborator Author

oschulte commented Jun 19, 2017

More precise plan (for discussion).

  1. move standard databases to cs-oschulte-03.cs.sfu.ca, e.g.
  • unielwin_std
  • Financial_std
  • UW_std
  2. run ct-generator to check it works

  3. revise ct-generator to drop auxiliary tables

  4. discuss how to achieve two functionalities

  • generating ct-tables for subsets of functor nodes
  • generating ct-tables that include groundings for selected population variables

@vidhiJain
Contributor

  1. The standard databases that now exist on cs-oschulte-03.cs.sfu.ca are:
  • New_Financial_std (There is some error in 'drop schema Financial_std' or 'create table...' in this database.)
  • unielwin (This is a copy of the database that exists on cs-oschulte-01.cs.sfu.ca. Is there an existing database named unielwin_std?)
  • New_UW_std (Same error as in Financial_std)
    The file directories Financial_std and UW_std seem to contain hidden files, which is why they cannot be deleted from the cs-oschulte-03.cs.sfu.ca server. These are currently empty schemas.

@oschulte
Collaborator Author

Thanks Vidhi! As for unielwin, I guess there is no _std version. Maybe rename it New_Unielwin. You could even name these things Vidhi_Financial instead of New_Financial_std. Any name is okay, but I'd suggest documenting it somewhere, maybe in a readme file (e.g. Vidhi_notes).

As for deleting the old ones, I didn't mean to delete them; we may need them in the future. But never mind. I vaguely remember this problem of dropping them; maybe it's a permission issue. Maybe just write it down as an issue here and we'll come back to it when we review the whole database. There are lots of things to delete!

So can you build CT tables for the three new databases?

@vidhiJain
Contributor

vidhiJain commented Jun 19, 2017

So the build_CT() in RunBB.java works and creates _BN, _CT and _setup databases for:

  • New_Financial_std : in 1456779 ms
  • New_UW_std : --
  • unielwin : in 4377 ms

@oschulte
Collaborator Author

Hooray! Can you check if you can delete the code with subset* and target* and still have it work?

@vidhiJain
Contributor

vidhiJain commented Jun 21, 2017

Yes, I deleted SubsetCTComputation.java and the functions called in BayesBaseCT_SortMerge.java. Execution completed (all CT tables) for:

  • New_Financial_std : 1416402 ms
  • New_UW_std : 15721 ms
  • unielwin : 986 ms

@oschulte
Collaborator Author

oschulte commented Jun 23, 2017

Great! I hope the new code is easier. For implementing the relational classification formula, we need to add functionality to BayesBaseCT_SortMerge.java. How about we discuss this tomorrow morning? If you want to get started on something, it would be helpful to make the buildCT procedure more flexible.

  1. It's the key procedure, worth putting in a file of its own like buildCT.java
  2. Right now it uses global variables. It would be more useful if we could pass it arguments, like buildCT(connection, setup_db, data_db, ct_db), such that
  • connection is a database connection
  • setup_db is a schema that contains the metadata (e.g. 1Nodes, RNodes, FNodes), e.g. unielwin_st_setup
  • data_db contains the original data (e.g. unielwin_std)
  • after running buildCT, ct_db contains the CT table for the FNodes in setup_db and the data in data_db

Right now BayesBaseCT_SortMerge.java makes the database schemas global variables and calls buildCT after setting them. If we could change BayesBaseCT_SortMerge.java so that it calls buildCT.java with the right arguments, that would make it easier to use buildCT for other purposes like relational classification. It would also help with issue #32.
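A minimal sketch of what such a parameterized entry point could look like, assuming MySQL accessed over JDBC. The class name `BuildCTSketch`, the helper `qualify`, and the `CREATE SCHEMA` statement are illustrative assumptions, not the existing FactorBase code, and the actual CT construction is left out.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch: schema names are method arguments, not global variables.
final class BuildCTSketch {

    // Builds a backtick-quoted `schema`.`table` reference (MySQL quoting assumed).
    static String qualify(String schema, String table) {
        return "`" + schema + "`.`" + table + "`";
    }

    // setupDb holds the metadata (e.g. 1Nodes, RNodes, FNodes),
    // dataDb holds the original data, and ctDb receives the CT tables.
    static void buildCT(Connection connection, String setupDb, String dataDb, String ctDb)
            throws SQLException {
        try (Statement st = connection.createStatement()) {
            // Make sure the output schema exists before writing CT tables into it.
            st.executeUpdate("CREATE SCHEMA IF NOT EXISTS `" + ctDb + "`");
            // The real work would read qualify(setupDb, "FNodes") and the tables
            // in dataDb at this point; that logic is omitted from this sketch.
        }
    }
}
```

With a signature like this, a caller could run the generator against any (temporary) setup schema by passing different names, without touching shared state.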

@oschulte
Collaborator Author

@vidhiJain @JanyQZ Hi, my comment above describes the change we want for buildCT so that we can pass it arguments (e.g. NewSetup). Once we've done that we can plan the details for extra functionality, like specifying subsets of functors for the CT tables.

@oschulte oschulte changed the title Contingency Table Problem fix Contingency Table Generator (buildCT) Jun 23, 2017
@oschulte oschulte changed the title fix Contingency Table Generator (buildCT) fix Contingency Table Generator (CTGenerator) Jun 26, 2017
@oschulte
Collaborator Author

oschulte commented Jun 27, 2017

If we can make the functionality to copy the setup database, we can delete functors from it. Like this:

Input: A datadb, a subset of functors functor_list, a setupdb

  1. Copy the setupdb to subset_setup_db
  2. Delete the functors from subset_setup_db that are not in functor_list. Or maybe rewrite CT_generator so it uses something like subset_1nodes rather than 1nodes, similarly for 2nodes and rnodes.
  3. Run CTgenerator on subset_setup_db
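Step 2 above could be sketched roughly as follows, assuming MySQL over JDBC. The class and method names are hypothetical, and the id column is passed in by the caller because the real column names in the setup tables are not given in this thread. Copying the setup schema (step 1) is not shown.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: prune a copied setup schema down to the functors in functor_list.
final class SubsetSetupSketch {

    // Builds a parameterized DELETE that removes every row whose id is NOT in the kept list.
    static String deleteOthersSql(String subsetSetupDb, String table, String idColumn, int keepCount) {
        if (keepCount < 1) {
            throw new IllegalArgumentException("functor_list must not be empty");
        }
        String placeholders = String.join(", ", Collections.nCopies(keepCount, "?"));
        return "DELETE FROM `" + subsetSetupDb + "`.`" + table + "` WHERE `" + idColumn
                + "` NOT IN (" + placeholders + ")";
    }

    // Binds the functor names to the placeholders and runs the DELETE; returns rows removed.
    static int deleteOthers(Connection connection, String subsetSetupDb, String table,
                            String idColumn, List<String> functorList) throws SQLException {
        String sql = deleteOthersSql(subsetSetupDb, table, idColumn, functorList.size());
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            for (int i = 0; i < functorList.size(); i++) {
                ps.setString(i + 1, functorList.get(i)); // JDBC parameters are 1-based
            }
            return ps.executeUpdate();
        }
    }
}
```

The same call would be repeated for each metadata table that lists functors (e.g. 1Nodes, RNodes, FNodes), with whatever id column each table actually uses.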

To add the groundings, we can add pvariable_ids to the group by clause in CTgenerator (already implemented)
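As a rough illustration of the group-by idea (not the existing CTgenerator code), the sketch below builds a counting query in which the id columns of the selected population variables are appended to the grouping columns. The class name, the method name, and the `MULT` count alias are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep counts per grounding by grouping on the pvariable id columns too.
final class GroundingGroupBySketch {

    // Builds a COUNT(*) query grouped by the functor columns plus any grounding id columns.
    static String countQuery(String source, List<String> functorColumns,
                             List<String> groundingIdColumns) {
        List<String> groupColumns = new ArrayList<>(functorColumns);
        groupColumns.addAll(groundingIdColumns);
        if (groupColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one grouping column is required");
        }
        String columns = String.join(", ", groupColumns);
        return "SELECT COUNT(*) AS MULT, " + columns
                + " FROM " + source
                + " GROUP BY " + columns;
    }
}
```

Passing an empty list of grounding id columns gives the ordinary query, so the same method would cover both the grounded and the ungrounded case.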

@oschulte
Collaborator Author

This is almost done for ct-table. It still needs to be integrated with the classifier.


3 participants