# BayesDB analysis of fish bouts in relation to paramecia characteristics

## 1. Ingesting the data

### a. Launching BayesDB

In [2]:
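# invert_all_bouts is only used by the commented-out call at the end of this cell.
from csv_analysis import invert_all_bouts
import pandas as pd

# Directory that holds the hunt-bout CSV (and, later, the .bdb file).
#drct = '042318_6/'
drct = 'wik_bdb/'
csv_file = drct + 'all_huntbouts_marr_nob0_nostrike.csv'

# Load the CSV into pandas so it can be inspected alongside the BayesDB table.
data = pd.read_csv(csv_file)
#invert_all_bouts(data, drct)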


In [3]:
%load_ext jupyter_probcomp.magics

session_id: jovyan@nightcrawler2-notebook_2019-05-08T13:20:50.037069_4


In [4]:
%matplotlib inline
%vizgpm inline


In [5]:
!rm -f wik_bdb/bdb_hunts_marr.bdb
%bayesdb -j wik_bdb/bdb_hunts_marr.bdb



u'Loaded: wik_bdb/bdb_hunts_marr.bdb'

### b. Loading and printing the data from a `.csv` file

In [6]:
%sql DROP TABLE IF EXISTS bout_table;
#%bql CREATE TABLE bout_table FROM 'wik_bdb/huntbouts_inverted.csv' 
%bql CREATE TABLE bout_table FROM 'wik_bdb/all_huntbouts_marr_nob0_nostrike.csv'

In [7]:
%sql SELECT * FROM bout_table LIMIT 5;

| | Bout Number | Para Az | Para Alt | Para Dist | Postbout Para Az | Postbout Para Alt | Postbout Para Dist |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.017831 | 0.659219 | 90.223577 | -0.122253 | 0.179276 | 86.955144 |
| 1 | 2.0 | -0.290174 | 0.157606 | 153.124859 | -0.275967 | 0.322912 | 125.122296 |
| 2 | 3.0 | -0.254112 | 0.324604 | 121.740275 | 0.053819 | 0.341071 | 92.544562 |
| 3 | 1.0 | 0.02626 | 0.380699 | 258.205861 | 0.167752 | 0.582927 | 181.386225 |
| 4 | 2.0 | 0.211294 | 0.613286 | 170.139844 | -0.13413 | 0.401721 | 109.953785 |


In [8]:
%bql .nullify bout_table ''
%bql .nullify bout_table 'nan'

Nullified 0 cells
Nullified 276 cells
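
The two `.nullify` calls above turn placeholder strings (`''` and `'nan'`) into SQL `NULL` so that BayesDB treats those cells as missing. A minimal pandas sketch of the same bookkeeping follows. It assumes that `'nan'` strings in the table correspond to `NaN` cells in a DataFrame read from the same CSV, and the frame `toy` is made-up illustration data, not the hunt-bout file.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the bout table; the real data comes from the CSV above.
toy = pd.DataFrame({
    "Para Az": [0.02, np.nan, -0.25],
    "Postbout Para Az": [-0.12, np.nan, np.nan],
})

# Count missing cells per column, then in total.
missing_per_column = toy.isna().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("total missing cells:", total_missing)
```

If the assumption holds, running the same two counting lines on `data` should give a total that can be compared with the `Nullified 276 cells` message.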


## 2. Automatically learning a CrossCat probabilistic model

### a. Defining an analysis population

In [9]:
%bql GUESS SCHEMA FOR bout_table

| | column | stattype | num_distinct | reason |
|---|---|---|---|---|
| 0 | Bout Number | nominal | 24.0 | There are fewer than 20 distinct numerical va... |
| 1 | Para Az | numerical | 1784.0 | There are at least 20 unique numerical values... |
| 2 | Para Alt | numerical | 1784.0 | There are at least 20 unique numerical values... |
| 3 | Para Dist | numerical | 1784.0 | There are at least 20 unique numerical values... |
| 4 | Postbout Para Az | numerical | 1691.0 | There are at least 20 unique numerical values... |
| 5 | Postbout Para Alt | numerical | 1691.0 | There are at least 20 unique numerical values... |
| 6 | Postbout Para Dist | numerical | 1691.0 | There are at least 20 unique numerical values... |
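
The `num_distinct` column above reports how many distinct values each column holds, and the `reason` text suggests that this count feeds the nominal/numerical guess. As a rough cross-check outside BayesDB, similar counts can be computed in pandas. This is only a sketch: the frame `toy` is made-up illustration data, and `nunique` is a pandas stand-in, not the routine BayesDB itself uses.

```python
import pandas as pd

# Made-up stand-in for the bout table (not the real data).
toy = pd.DataFrame({
    "Bout Number": [1.0, 2.0, 3.0, 1.0, 2.0],
    "Para Dist": [90.2, 153.1, 121.7, 258.2, 170.1],
})

# Distinct non-null values per column, comparable in spirit to num_distinct.
num_distinct = toy.nunique()
print(num_distinct)
```

Applied to `data`, the same call gives per-column counts to set against the `GUESS SCHEMA` output before deciding whether to override any guessed stattype.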


In [10]:
%%mml
DROP POPULATION IF EXISTS bout_population;
CREATE POPULATION bout_population FOR bout_table WITH SCHEMA (
    GUESS STATTYPES OF (*);
    IGNORE "Bout Number";
)

### b. Creating and analyzing a probabilistic model (automatically)

In [11]:
%mml CREATE GENERATOR marr_generator FOR bout_population;

In [12]:
%multiprocess on

Multiprocessing turned on from on.


In [13]:
%mml INITIALIZE 50 MODELS IF NOT EXISTS FOR marr_generator;

In [14]:
%mml ALTER GENERATOR "marr_generator" ENSURE VARIABLES * DEPENDENT;

In [None]:
%mml ANALYZE marr_generator FOR 100 ITERATIONS;
# Note: ANALYZE also accepts the (OPTIMIZED) flag, but it gave odd results on a previous run.

Completed: 100 iterations in 2385.760107 seconds.
Completed: 100 iterations in 2767.179567 seconds.
Completed: 100 iterations in 1763.489575 seconds.
Completed: 100 iterations in 1927.454419 seconds.
Completed: 100 iterations in 1264.234266 seconds.
Completed: 100 iterations in 1589.133720 seconds.
Completed: 100 iterations in 1799.229036 seconds.
Completed: 100 iterations in 1518.489631 seconds.
Completed: 100 iterations in 1549.576800 seconds.
Completed: 100 iterations in 2519.347685 seconds.
Completed: 100 iterations in 3707.605163 seconds.
Completed: 100 iterations in 2655.850948 seconds.
Completed: 100 iterations in 3071.220527 seconds.
Completed: 100 iterations in 1689.596500 seconds.
Completed: 100 iterations in 2450.125848 seconds.
Completed: 100 iterations in 1578.410756 seconds.
Completed: 100 iterations in 2254.932255 seconds.
Completed: 100 iterations in 2786.419161 seconds.
Completed: 100 iterations in 4741.244721 seconds.
Completed: 100 iterations in 2618.986815 seconds.
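
The completion lines above vary widely, from about 1264 s to about 4741 s. A small sketch for summarizing such output is shown here, assuming the lines have been pasted into a string. The variable names are illustrative, and only the first five lines of the output are included to keep the example short.

```python
import re

# First five completion lines from the ANALYZE output above, copied verbatim.
log = """\
Completed: 100 iterations in 2385.760107 seconds.
Completed: 100 iterations in 2767.179567 seconds.
Completed: 100 iterations in 1763.489575 seconds.
Completed: 100 iterations in 1927.454419 seconds.
Completed: 100 iterations in 1264.234266 seconds.
"""

# Capture the iteration count and the elapsed seconds from each line.
pattern = re.compile(r"Completed: (\d+) iterations in ([\d.]+) seconds\.")
seconds = [float(m.group(2)) for m in pattern.finditer(log)]

fastest = min(seconds)
slowest = max(seconds)
mean_minutes = sum(seconds) / len(seconds) / 60.0

print(f"{len(seconds)} reports, fastest {fastest:.0f} s, "
      f"slowest {slowest:.0f} s, mean {mean_minutes:.1f} min")
```

Pasting all twenty lines into `log` gives the same summary for the full run, which is useful when budgeting time for a longer `ANALYZE` call.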
