Missing values imputation implementation #743

jsalvatier · 2015-06-07T19:51:33Z

No description provided.

The CSV backend has a couple of advantages over the Text backend. 1. The values are stored in one table per chain, not one file per variable. This is easier to inspect and work with directly if desired (e.g., with pd.read_csv). 2. Values are written to disk during sampling, not kept in memory.
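As a rough illustration of the two points above, here is a minimal sketch that streams draws into one CSV table per chain as they are produced. It is not the backend's actual code: `record_chain`, `fake_draws`, and the `chain-<n>.csv` file name are hypothetical, and the standard-library `csv` module is used only to keep the sketch free of dependencies.

```python
import csv
import os
import tempfile


def record_chain(directory, chain, draws):
    """Write each draw to chain-<n>.csv as soon as it is produced.

    Hypothetical helper: one row per draw, one column per variable.
    """
    path = os.path.join(directory, "chain-{}.csv".format(chain))
    with open(path, "w", newline="") as fh:
        writer = None
        for point in draws:
            if writer is None:
                # The first draw fixes the column order for the whole table.
                writer = csv.DictWriter(fh, fieldnames=sorted(point))
                writer.writeheader()
            writer.writerow(point)
            fh.flush()  # values reach the file during "sampling"
    return path


def fake_draws(n):
    """Stand-in for a sampler: yields one point (dict of values) at a time."""
    for i in range(n):
        yield {"mu": 0.1 * i, "sigma": 1.0 + i}


out_dir = tempfile.mkdtemp()
path = record_chain(out_dir, 0, fake_draws(3))

# Read the table back; every value comes back as plain text.
with open(path, newline="") as fh:
    rows = list(csv.DictReader(fh))
```

A file written this way should also load with `pd.read_csv(path)`, which gives one column per variable and one row per draw.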

The main reason to keep the Text backend around was that pandas was an optional dependency. Now that this is no longer the case, the original Text backend doesn't offer any advantages over the CSV backend. (Even if someone prefers not to write to files while sampling, they can sample with the NDArray backend and then use the CSV dump function.) Rename CSV to Text. This name is more appropriate because the values being stored as plain text is the important feature, not which delimiter is used.

Commit 85b53d4 changed the location of the test sqlite file from the PWD to /tmp. (I believe this was because permission issues prevented the file from being removed, and subsequent tests were failing as a result.) Use tempfile.gettempdir to be more portable.
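For reference, a minimal sketch of the portable approach: build the path from `tempfile.gettempdir()` instead of hard-coding /tmp. The helper name `scratch_db_path` and the file name are hypothetical, not taken from the test suite.

```python
import os
import sqlite3
import tempfile


def scratch_db_path(name):
    """Return a path for a scratch database in the platform's temp directory."""
    return os.path.join(tempfile.gettempdir(), name)


db_path = scratch_db_path("pymc3_backend_example.sqlite")  # hypothetical name
conn = sqlite3.connect(db_path)
try:
    conn.execute("CREATE TABLE IF NOT EXISTS draws (value REAL)")
    conn.commit()
finally:
    conn.close()
    # Remove the file so later tests start from a clean state.
    os.remove(db_path)
```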

With the introduction of missing values, it becomes important to be able to use the value of an ObservedRV. This was not possible before: ObservedRV did not subclass TensorVariable, because it could also handle multiple observed arrays and so would not always be a single tensor. This change splits the two cases into ObservedRV, for a single observed array, and MultiObservedRV, for multiple observed arrays. ObservedRV now inherits from TensorVariable.
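The shape of that split can be sketched without Theano. Everything below is a stand-in: the `TensorVariable` class is a placeholder for Theano's, `make_observed` is a hypothetical factory, and treating a dict as "multiple observed arrays" is an assumption made for illustration only.

```python
class TensorVariable:
    """Placeholder for theano.tensor.TensorVariable (no Theano required)."""

    def __init__(self, name):
        self.name = name


class ObservedRV(TensorVariable):
    """A single observed array: it is a tensor, so its value can be used."""

    def __init__(self, name, data):
        super().__init__(name)
        self.data = data


class MultiObservedRV:
    """Several observed arrays: not one tensor, so no tensor base class."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)


def make_observed(name, data):
    """Hypothetical factory: a dict is assumed to mean multiple arrays."""
    if isinstance(data, dict):
        return MultiObservedRV(name, data)
    return ObservedRV(name, data)


single = make_observed("y", [1.0, 2.0])
multi = make_observed("xy", {"x": [1.0], "y": [2.0]})
```

Only `single` can be passed where a tensor is expected. This is the property that lets missing-value handling use an observed variable's value.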

jsalvatier · 2015-06-07T20:45:11Z

I'm actually not able to replicate this issue. I have the latest Theano master.

ERROR: pymc3.tests.test_shared.test_deterministic

Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/pymc-devs/pymc3/pymc3/tests/test_shared.py", line 14, in test_deterministic
    y = pm.Normal('y', 0, 1, observed=X)
  File "/home/travis/build/pymc-devs/pymc3/pymc3/distributions/distribution.py", line 19, in __new__
    return model.Var(name, dist, data)
  File "/home/travis/build/pymc-devs/pymc3/pymc3/model.py", line 158, in Var
    var = ObservedRV(name=name, data=data, distribution=dist, model=self)
  File "/home/travis/build/pymc-devs/pymc3/pymc3/model.py", line 410, in __init__
    data = as_tensor(data, name,model,distribution.dtype)
  File "/home/travis/build/pymc-devs/pymc3/pymc3/model.py", line 369, in as_tensor
    data = pandas_to_array(data).astype(dtype)
ValueError: setting an array element with a sequence.

jsalvatier · 2015-06-07T21:29:59Z

Managed to replicate.

kyleam and others added 23 commits June 7, 2015 12:40

Add pandas as hard dependency

e507cf1

simple draft of missing values

2722583

missing vals use NoDistribution

f196191

test missing values

7959bed

fix for python 3.4

875d45f

handle missing data from pandas

0082eef

Added disaster_model_missing (not working)

92c9e01

Added lasso model with missing covariates

6111b95

make example work

7aee153

fix non numpy data

a04fec1

fix multiobservedrv

9e1cd02

fix disasters missing

d5916b6

handle gradient when no vars are passed

af4cb94

give missing variables the .model parameter

e3319d8

use the right datatype for the missing values

90b4d2c

make lasso missing work

0af9cff

add fake data for missing lasso example

0b336cf

need n='short' for auto running

d57fc13

increase samples to get summary to work

608c011

support shared variables

469ec7b

jsalvatier added a commit that referenced this pull request Jun 7, 2015

Merge pull request #743 from pymc-devs/missingmaster2

96e03fc

Missing values imputation implementation

jsalvatier merged commit 96e03fc into master Jun 7, 2015

jsalvatier deleted the missingmaster2 branch June 7, 2015 22:28

jsalvatier mentioned this pull request Jun 7, 2015

Missing values imputation implementation #742

Closed

jsalvatier mentioned this pull request Jun 7, 2015

Missing values imputation implementation #712

Closed
