Add CSV backend #723

kyleam · 2015-06-01T03:52:55Z

This has a couple of advantages over the Text backend.

1. The values are stored in a table per chain, not a file per
   variable. This is easier to inspect and work with directly if
   desired (e.g., with pd.read_csv).
2. Values are stored during sampling, not kept in memory.

This relies on Pandas, which currently isn't listed as a hard
dependency.
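
To make the two points above concrete, here is a minimal sketch of the idea, not the code in this PR. The class name `ChainCSVWriter`, the helper `load_chain`, and the `chain-<n>.csv` file naming are assumptions made for illustration. Each draw is appended to a per-chain table as soon as it is recorded, and the result is read back with `pd.read_csv`.

```python
import csv
import os

import pandas as pd


class ChainCSVWriter:
    """Hypothetical writer: one CSV table per chain, one row per draw."""

    def __init__(self, directory, chain, varnames):
        os.makedirs(directory, exist_ok=True)
        # File naming is an assumption for this sketch.
        self.path = os.path.join(directory, "chain-{}.csv".format(chain))
        self.varnames = list(varnames)
        self._fh = open(self.path, "w", newline="")
        self._writer = csv.writer(self._fh)
        # Header row: one column per variable.
        self._writer.writerow(self.varnames)

    def record(self, point):
        # Write the draw immediately instead of accumulating it in memory.
        self._writer.writerow([point[name] for name in self.varnames])
        self._fh.flush()

    def close(self):
        self._fh.close()


def load_chain(directory, chain):
    # The table can be inspected directly with pandas.
    return pd.read_csv(os.path.join(directory, "chain-{}.csv".format(chain)))
```

Calling `record` once per sampling step and `close` at the end leaves a file that `load_chain` returns as a DataFrame with one column per variable.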

twiecki · 2015-06-01T15:12:33Z

This is great. Do you think we could use odo to abstract away from the backend? https://odo.readthedocs.org/en/latest/

kyleam · 2015-06-01T23:03:23Z

I started reading up on and experimenting with blaze, odo, and friends this weekend, and they all look really nice. But I don't yet have a clear picture of how best to integrate them, and I would be glad to see any suggestions or proposals for this (but perhaps we should start a separate, dedicated issue for exploring that?).

datnamer · 2015-06-02T15:33:55Z

@kyleam, I think dask, with its array of parallel shared-memory, async, and distributed schedulers and its chunking system, would be the best bet for integration, aside from odo. More info, including an offer from the library author to help, can be found in #707. Matt can help with use case guidance as well.

jsalvatier · 2015-06-02T23:01:34Z

pymc3/tests/test_csv_backend.py

jsalvatier: Why two underscores?

kyleam: Just to make the indices visually distinct, since it's common for
variable names to have both underscores and digits. But if people
have a strong preference against it, I'm OK changing it to one
underscore.

jsalvatier: Makes sense. I think it's good to keep.
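
As an illustration of the naming point discussed here, the sketch below builds flattened column names for a multidimensional variable. The function name `flat_names` and the single underscore between indices are assumptions for this example; the thread only establishes that two underscores separate the variable name from its indices.

```python
import itertools


def flat_names(varname, shape):
    """Return one column name per element of a variable with the given shape.

    Hypothetical helper: a double underscore separates the variable name
    from the indices so that names such as "beta_1" stay readable.
    """
    if not shape:
        # Scalars keep their plain name.
        return [varname]
    index_tuples = itertools.product(*(range(n) for n in shape))
    return [
        "{}__{}".format(varname, "_".join(str(i) for i in idx))
        for idx in index_tuples
    ]
```

For a variable named `beta_1` with shape `(2, 2)`, this yields `beta_1__0_0`, `beta_1__0_1`, `beta_1__1_0`, and `beta_1__1_1`, so the boundary between name and indices stays visible even though the name itself ends in an underscore and a digit.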

jsalvatier · 2015-06-02T23:27:34Z

Is pandas ever hard to install? Is there a reason not to make it a hard dep?

Otherwise looks good to me.

kyleam · 2015-06-03T04:47:37Z

I'm for making Pandas a non-optional dependency. If we do that, I'd
be in favor of removing the Text backend completely and replacing it
with the CSV (perhaps renaming CSV to Text?).

jsalvatier · 2015-06-03T06:00:04Z

@twiecki @fonnesbeck ? Thoughts on making pandas a hard dep?

fonnesbeck · 2015-06-03T13:25:44Z

Given that Pandas is now a core library in the scientific stack, forcing users to install it is not unreasonable.

twiecki · 2015-06-03T15:10:12Z

+1 on adding pandas as a hard dependency

jsalvatier · 2015-06-05T00:24:32Z

@kyleam can you add the dependency to setup.py? Then I think we're good to go.

kyleam · 2015-06-05T17:33:34Z

> can you add the dependency to setup.py?

Will do.
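
For readers unfamiliar with the mechanism, a hard dependency is typically declared through the `install_requires` argument of `setup()`. The fragment below is a sketch only: the variable name `install_reqs` is an assumption, and the project's other requirements and any version pins are deliberately left out because they are not stated in this thread.

```python
# Fragment of a hypothetical setup.py; only the pandas entry comes from
# this discussion.
install_reqs = [
    "pandas",  # required by the CSV backend
]

# The list would then be passed to setuptools, e.g.:
#     setup(..., install_requires=install_reqs)
```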

The main reason to keep the Text backend around was that pandas was an optional dependency. Now that this is no longer the case, the original Text backend doesn't offer any advantages over the CSV backend. (Even if someone prefers not to write to files while sampling, they can sample with the NDArray backend and then use the CSV dump function.)

Rename CSV to Text. This name is more appropriate because the values being stored as plain text is the important feature, not which delimiter is used.

kyleam · 2015-06-06T01:21:16Z

Updated. Does anyone have an issue with removing Text (and renaming CSV to Text)?

jsalvatier · 2015-06-06T03:04:29Z

What additional capabilities beyond CSV does Text currently have? If it doesn't really have any, then I'm all for it.

kyleam · 2015-06-06T04:29:33Z

None (see commit message).

jsalvatier · 2015-06-06T07:09:14Z

Yeah, let's replace it then.

Add CSV backend

jsalvatier reviewed Jun 2, 2015
kyleam added 3 commits June 5, 2015 21:00

Add pandas as hard dependency

5b9093f

kyleam force-pushed the csv branch from f690fb9 to 9c084dd on June 6, 2015 01:19

jsalvatier added a commit that referenced this pull request Jun 6, 2015

Merge pull request #723 from pymc-devs/csv

b0b79f7

Add CSV backend

jsalvatier merged commit b0b79f7 into master Jun 6, 2015

twiecki deleted the csv branch June 6, 2015 09:15
