# BayesDB analysis of fish bouts in relation to paramecia characteristics

## 1. Ingesting the data

### a. Launching BayesDB

In [31]:
from csv_analysis import invert_all_bouts
import pandas as pd
#drct = '042318_6/'
drct = 'wik_bdb/'
csv_file = drct + 'all_huntbouts_marr_nob0.csv'
data = pd.read_csv(csv_file)
#invert_all_bouts(data, drct)


In [32]:
%load_ext jupyter_probcomp.magics

The jupyter_probcomp.magics extension is already loaded. To reload it, use:
  %reload_ext jupyter_probcomp.magics


In [33]:
%matplotlib inline
%vizgpm inline


In [34]:
!rm -f wik_bdb/bdb_hunts_marr.bdb
%bayesdb -j wik_bdb/bdb_hunts_marr.bdb



u'Loaded: wik_bdb/bdb_hunts_marr.bdb'

### b. Loading and printing the data from a `.csv` file

In [35]:
%sql DROP TABLE IF EXISTS bout_table;
#%bql CREATE TABLE bout_table FROM 'wik_bdb/huntbouts_inverted.csv' 
%bql CREATE TABLE bout_table FROM 'wik_bdb/all_huntbouts_marr_nob0.csv'

In [36]:
%sql SELECT * FROM bout_table LIMIT 5;

|   | Bout Number | Para Az | Para Alt | Para Dist | Postbout Para Az | Postbout Para Alt | Postbout Para Dist |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | -0.162832 | 0.654781 | 305.528267 | -0.158165 | 0.73665 | 286.473525 |
| 1 | 2.0 | -0.179093 | 0.818221 | 281.527199 | 0.171628 | 0.271518 | 200.86039 |
| 2 | 3.0 | 0.073963 | 0.21028 | 134.098036 | 0.012218 | 0.210299 | 95.292186 |
| 3 | 4.0 | 0.108441 | 0.299536 | 77.905007 | 0.221962 | 0.746375 | 60.931606 |
| 4 | 1.0 | 0.198364 | 0.499294 | 282.973273 | -0.067488 | 0.434303 | 188.771498 |


In [37]:
%bql .nullify bout_table ''
%bql .nullify bout_table 'nan'

Nullified 0 cells
Nullified 150 cells
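
The two `.nullify` calls mark cells holding the given sentinel values (`''` and `'nan'`) as missing, and the output reports how many cells matched each one. A rough cross-check of those counts can be done on the pandas side. The sketch below is only an illustration: the `toy` frame and the helper `count_sentinel_cells` are hypothetical, and comparing string forms is an assumption about how the sentinels appear, not a description of what BayesDB does internally.

```python
import pandas as pd


def count_sentinel_cells(frame, sentinel):
    """Count cells whose string form equals the sentinel (e.g. '' or 'nan')."""
    as_text = frame.astype(str)  # float NaN becomes the string 'nan'
    return int((as_text == sentinel).values.sum())


# Hypothetical stand-in for the bout table; values are made up.
toy = pd.DataFrame({
    "Para Az": [0.1, float("nan"), -0.2],
    "Postbout Para Az": [float("nan"), float("nan"), 0.3],
})

nan_cells = count_sentinel_cells(toy, "nan")
empty_cells = count_sentinel_cells(toy, "")
print(nan_cells, empty_cells)
```

Running the same helper on the `data` frame loaded in the first cell should give a number comparable to the `Nullified ... cells` output, provided pandas parsed the same cells as missing.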


## 2. Automatically learning a CrossCat probabilistic model

### a. Defining an analysis population

In [38]:
%bql GUESS SCHEMA FOR bout_table

|   | column | stattype | num_distinct | reason |
|---|---|---|---|---|
| 0 | Bout Number | numerical | 24.0 | There are at least 20 unique numerical values... |
| 1 | Para Az | numerical | 1164.0 | There are at least 20 unique numerical values... |
| 2 | Para Alt | numerical | 1164.0 | There are at least 20 unique numerical values... |
| 3 | Para Dist | numerical | 1164.0 | There are at least 20 unique numerical values... |
| 4 | Postbout Para Az | numerical | 1113.0 | There are at least 20 unique numerical values... |
| 5 | Postbout Para Alt | numerical | 1113.0 | There are at least 20 unique numerical values... |
| 6 | Postbout Para Dist | numerical | 1113.0 | There are at least 20 unique numerical values... |
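
The `reason` column suggests that the guess is driven by how many distinct numerical values a column holds. A simplified sketch of such a cardinality rule is shown below. It is not BayesDB's actual implementation: the threshold of 20 is taken from the message above, while the `nominal` fallback, the function name `guess_stattype`, and the `toy` data are assumptions made for illustration.

```python
import pandas as pd

# Threshold taken from the "at least 20 unique numerical values" message.
NUMERICAL_THRESHOLD = 20


def guess_stattype(column):
    """Guess 'numerical' or 'nominal' for a pandas Series (simplified rule)."""
    values = column.dropna()
    is_numeric = pd.api.types.is_numeric_dtype(values)
    if is_numeric and values.nunique() >= NUMERICAL_THRESHOLD:
        return "numerical"
    # Assumed fallback: anything else is treated as nominal.
    return "nominal"


# Hypothetical data: one wide-ranging numeric column, one small label column.
toy = pd.DataFrame({
    "Para Dist": [float(i) * 1.5 for i in range(30)],
    "Label": ["a", "b", "c"] * 10,
})

guesses = {name: guess_stattype(toy[name]) for name in toy.columns}
print(guesses)
```

Under this rule `Bout Number`, with 24 distinct values, lands just above the threshold and is guessed as numerical, which is one reason to handle it explicitly in the population schema.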


In [39]:
%%mml
DROP POPULATION IF EXISTS bout_population;
CREATE POPULATION bout_population FOR bout_table WITH SCHEMA (
    GUESS STATTYPES OF (*);
    IGNORE "Bout Number";
)

### b. Creating and analyzing a probabilistic model (automatically)

In [40]:
%mml CREATE GENERATOR marr_generator FOR bout_population;

In [41]:
%multiprocess on

Multiprocessing turned on from on.


In [42]:
%mml INITIALIZE 50 MODELS IF NOT EXISTS FOR marr_generator;

In [43]:
%mml ALTER GENERATOR "marr_generator" ENSURE VARIABLES * DEPENDENT;

In [44]:
%mml ANALYZE marr_generator FOR 100 ITERATIONS;
# Note: the (OPTIMIZED) flag can be used here, but it gave odd results in a previous run.

Completed: 100 iterations in 759.348661 seconds.
Completed: 100 iterations in 898.248162 seconds.
Completed: 100 iterations in 754.216420 seconds.
Completed: 100 iterations in 825.218060 seconds.
Completed: 100 iterations in 1037.278390 seconds.
Completed: 100 iterations in 1370.180199 seconds.
Completed: 100 iterations in 933.931300 seconds.
Completed: 100 iterations in 1121.893041 seconds.
Completed: 100 iterations in 1111.002343 seconds.
Completed: 100 iterations in 1213.503775 seconds.
Completed: 100 iterations in 1068.450452 seconds.
Completed: 100 iterations in 628.763136 seconds.
Completed: 100 iterations in 888.211892 seconds.
Completed: 100 iterations in 712.577730 seconds.
Completed: 100 iterations in 857.564495 seconds.
Completed: 100 iterations in 764.932816 seconds.
Completed: 100 iterations in 648.070188 seconds.
Completed: 100 iterations in 1094.853770 seconds.
Completed: 100 iterations in 1943.790969 seconds.
Completed: 100 iterations in 1313.207370 seconds.
Completed: 