quick'n'dirty subsampling #6

Closed
riastradh-probcomp opened this issue May 21, 2015 · 3 comments

@riastradh-probcomp
Contributor

  • Initialize Crosscat on a single random subset of the real data for all models.
  • For BQL queries on rows not known to Crosscat, use hypotheticals.
    • No need to manifest insertion into Crosscat in BQL for now.
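
A minimal sketch of the second bullet, in plain Python rather than any actual Crosscat or BQL interface: given the row ids Crosscat was initialized on, a query's rows are split into those it knows and those that would be treated as hypotheticals. The function name `split_query_rows` and its arguments are hypothetical, chosen only for illustration.

```python
def split_query_rows(query_row_ids, known_row_ids):
    """Split queried row ids into (observed, hypothetical).

    Hypothetical helper: `known_row_ids` stands in for the subset of rows
    Crosscat was initialized on; every other row is treated as hypothetical.
    """
    known = set(known_row_ids)
    observed = [r for r in query_row_ids if r in known]
    hypothetical = [r for r in query_row_ids if r not in known]
    return observed, hypothetical


# Rows 1-3 are in the subsample; rows 7 and 9 are not.
observed, hypothetical = split_query_rows([1, 2, 7, 9], known_row_ids=[1, 2, 3])
```

Nothing is inserted into the model here; the split only decides how each row would be passed along.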
@riastradh-probcomp
Contributor Author

vkmvkmvkm: In the future, will we want each model to be analyzed on a (potentially) different randomly chosen subset of rows?

@vkmvkmvkmvkm
Contributor

Yes. (If it helps, I'm happy to discuss an interface to models and
meta-models that could make this easy and also help decouple BQL & its
primitives from crosscat.)

On Fri, May 29, 2015 at 6:59 PM, riastradh-probcomp <
notifications@github.com> wrote:

vkmvkmvkm: In the future, will we want each model to be analyzed on a
different subset of rows?


@riastradh-probcomp
Contributor Author

Fixed in 0780f7c.

Selection of rows is not random: it takes the first rows in the table. Randomizing it is a separate issue that requires deciding whether to do it nondeterministically or with a fixed seed, and it is not what old bayesdb did anyway.
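
To make the options concrete, here is a rough sketch in plain Python, not the code from the commit: taking the first rows, versus drawing a random subset that is reproducible with a fixed seed or nondeterministic without one. The names `first_rows` and `random_rows` are hypothetical.

```python
import random


def first_rows(row_ids, n):
    """Deterministic: take the first n row ids in table order."""
    return list(row_ids)[:n]


def random_rows(row_ids, n, seed=None):
    """Random subset of n row ids.

    A fixed `seed` gives the same subset on every run; seed=None lets
    Python seed the generator itself, so the subset varies between runs.
    """
    rng = random.Random(seed)
    row_ids = list(row_ids)
    return sorted(rng.sample(row_ids, min(n, len(row_ids))))


first = first_rows(range(1, 11), 3)
seeded_a = random_rows(range(100), 5, seed=1)
seeded_b = random_rows(range(100), 5, seed=1)
```

With a fixed seed the two draws match, which is the reproducibility argument for seeding; the unseeded variant trades that away.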

Selection of rows is not per-model: every model shares a common subsampling of rows. Crosscat is not currently set up to allow two models to be trained on two distinct sets of rows.
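
The difference between the two schemes can be sketched as follows, again in plain Python with hypothetical names (`shared_subsample`, `per_model_subsamples`) and no claim about how Crosscat would actually need to change to support the second one.

```python
import random


def shared_subsample(row_ids, n, n_models):
    """Every model gets the same rows (here, the first n in the table)."""
    rows = list(row_ids)[:n]
    return [list(rows) for _ in range(n_models)]


def per_model_subsamples(row_ids, n, n_models, seed=0):
    """Each model gets its own independently drawn subset of rows."""
    row_ids = list(row_ids)
    rng = random.Random(seed)
    k = min(n, len(row_ids))
    return [sorted(rng.sample(row_ids, k)) for _ in range(n_models)]


shared = shared_subsample(range(10), 3, n_models=4)
per_model = per_model_subsamples(range(100), 5, n_models=3)
```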
