## Subsampling variables in a PAIRWISE query using BQL

__To run a `PAIRWISE` query on a subset of variables or rows:__

1. Create a table containing a random subset of variable names (for `DEPENDENCE PROBABILITY`) or rowids (for `SIMILARITY`).
2. Run the pairwise query, using a `WHERE` clause to keep only the pairs whose entities are both in the subsample.
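The same two-step pattern can be sketched in plain SQL. The example below uses Python's built-in `sqlite3` module as a stand-in: the `variables` table and its placeholder names `var_000` to `var_299` are hypothetical, and a self cross join plays the role of `PAIRWISE VARIABLES`. It illustrates only the filtering logic, not how bayeslite evaluates `DEPENDENCE PROBABILITY`.

```python
import sqlite3

# Hypothetical stand-in for the population: a table of 300 placeholder names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variables (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO variables (name) VALUES (?)",
    [("var_%03d" % i,) for i in range(300)],
)

# Step 1: draw a random subset of 50 names into a temporary table.
conn.execute(
    "CREATE TEMP TABLE subsampled_variables AS "
    "SELECT name FROM variables ORDER BY random() LIMIT 50"
)

# Step 2: enumerate (name0, name1) pairs via a self cross join, keeping
# only pairs whose endpoints are both in the subsample.
pairs = conn.execute(
    """
    SELECT v0.name AS name0, v1.name AS name1
    FROM variables AS v0, variables AS v1
    WHERE v0.name IN (SELECT name FROM subsampled_variables)
      AND v1.name IN (SELECT name FROM subsampled_variables)
    """
).fetchall()

print(len(pairs))  # 2500 ordered pairs (50 * 50), including name0 == name1
```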

This notebook illustrates the workflow, which is useful for running `PAIRWISE` queries on large populations: the number of pairs grows quadratically with the number of variables (or rows), so the full query can be very slow.
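A quick count shows why subsampling helps. Assuming the `PAIRWISE` query enumerates every ordered pair `(name0, name1)`, including each variable paired with itself, the number of pairs is the square of the number of variables (the helper name below is hypothetical):

```python
def pairwise_rows(n_variables):
    """Pairs produced for n variables, assuming every ordered pair
    (name0, name1) is enumerated, including name0 == name1."""
    return n_variables * n_variables


full = pairwise_rows(300)  # every variable in the population
sub = pairwise_rows(50)    # the 50-variable subsample
print(full, sub, full // sub)  # 90000 2500 36
```

Shrinking the variable set by a factor of 6 cuts the number of pairs by about 6 squared, i.e. 36; counting only unordered pairs gives a similar ratio.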

In [8]:
%load_ext iventure.magics
%vizgpm inline


In [9]:
!rm -f resources/gapminder.depprob.bdb
%bayesdb -j resources/gapminder.depprob.bdb

u'Loaded: resources/gapminder.depprob.bdb'

In [10]:
%bql CREATE TABLE gapminder_t FROM 'resources/gapminder.csv'
%bql .nullify gapminder_t ''

Nullified 31876 cells


In [11]:
%bql CREATE POPULATION gapminder FOR gapminder_t WITH SCHEMA(GUESS STATTYPES OF (*));
%bql CREATE GENERATOR gapminder_m FOR gapminder;
%bql INITIALIZE 4 MODELS FOR gapminder_m;

#### Create a table of 50 variables sampled at random from the 300 variables in the population.

In [12]:
%%bql
CREATE TEMP TABLE subsampled_variables AS
    ESTIMATE name FROM VARIABLES OF gapminder ORDER BY random() LIMIT 50;
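`ORDER BY random() LIMIT 50` shuffles the variables and keeps the first 50, which gives a uniform sample without replacement. Because `random()` produces fresh values on each run, re-running the cell generally yields a different subsample. If reproducibility matters, one option is to draw the sample in Python with a fixed seed, as in this minimal sketch (the placeholder names and the seed value are hypothetical):

```python
import random

# Hypothetical placeholders standing in for the population's 300 variable names.
all_variables = ["var_%03d" % i for i in range(300)]

# Mirror ORDER BY random() LIMIT 50: shuffle a copy, then keep the first 50.
rng = random.Random(2017)  # hypothetical fixed seed for a reproducible draw
shuffled = list(all_variables)
rng.shuffle(shuffled)
subsample = shuffled[:50]

print(len(subsample), len(set(subsample)))  # 50 50
```

The resulting names could then populate `subsampled_variables` in place of the `ORDER BY random()` draw.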

#### Run the `DEPENDENCE PROBABILITY` query on the subsample of variables.

In [13]:
%%bql
.interactive_heatmap 
ESTIMATE
    DEPENDENCE PROBABILITY
FROM PAIRWISE VARIABLES OF gapminder
WHERE
    name0 IN (SELECT name FROM subsampled_variables)
    AND name1 IN (SELECT name FROM subsampled_variables)

[Interactive heatmap of pairwise dependence probabilities for the 50 subsampled variables]