@kyleam (Contributor) commented Jun 1, 2015

This has a couple of advantages over the Text backend.

  1. The values are stored in a table per chain, not a file per
    variable. This is easier to inspect and work with directly if
    desired (e.g., with pd.read_csv).
  2. Values are stored during sampling, not kept in memory.

This relies on Pandas, which currently isn't listed as a hard
dependency.
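A minimal sketch of the two advantages described above. It assumes a hypothetical `CSVChainWriter` class and `read_chain` helper (this is not the actual PyMC3 backend API). Each chain gets one CSV file with a header row (one column per value) followed by one row per draw, and each draw is written to the open file as it is recorded instead of being collected in a Python list. Only the standard library is used, so the sketch runs without pandas.

```python
import csv
import os
import tempfile


class CSVChainWriter:
    """Hypothetical per-chain trace writer (not the PyMC3 backend API).

    One CSV file per chain: a header row with one column per value,
    then one row per draw.
    """

    def __init__(self, directory, chain, columns):
        self.path = os.path.join(directory, "chain-{}.csv".format(chain))
        self.columns = list(columns)
        self._fh = open(self.path, "w", newline="")
        self._writer = csv.writer(self._fh)
        self._writer.writerow(self.columns)  # header row

    def record(self, point):
        # Write this draw straight to the open file instead of appending
        # it to an in-memory list; `point` maps column name -> value.
        self._writer.writerow([point[c] for c in self.columns])

    def close(self):
        self._fh.close()


def read_chain(path):
    """Load one chain back as a dict of column name -> list of floats."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        trace = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name, value in row.items():
                trace[name].append(float(value))
    return trace


# Usage: record three draws of two scalar variables for chain 0.
with tempfile.TemporaryDirectory() as tmp:
    writer = CSVChainWriter(tmp, chain=0, columns=["mu", "sd"])
    for i in range(3):
        writer.record({"mu": 0.5 * i, "sd": 1.0 + i})
    writer.close()
    trace = read_chain(writer.path)
```

With pandas installed, `pd.read_csv(path)` on the same file gives a DataFrame with columns `mu` and `sd`. That is the "work with it directly" case from the description.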

@twiecki (Member) commented Jun 1, 2015

This is great. Do you think we could use odo to abstract away from the backend? https://odo.readthedocs.org/en/latest/

@kyleam (Contributor, Author) commented Jun 1, 2015 via email

@datnamer commented Jun 2, 2015

@kyleam, I think dask, with its parallel shared-memory, asynchronous, and distributed schedulers and its chunking system, would be the best bet for integration, aside from odo. More info, including an offer from the library's author to help, can be found in #707. Matt can help with use-case guidance as well.

(Member)

Why two underscores?

(Contributor, Author)

Just to make the indices visually distinct, since it's common for
variable names to have both underscores and digits. But if people
have a strong preference against it, I'm OK with changing it to one
underscore.

(Member)

Makes sense. I think it's good to keep.
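The thread doesn't show the column format itself. The sketch below assumes that columns are named like `beta_1__0_2` (variable `beta_1`, element `[0, 2]`) and uses hypothetical helpers `flat_names` and `parse_flat_name` to show why the double underscore helps. Splitting at the last `__` recovers the variable name even when it contains underscores and digits, whereas `beta_1_0_2` could mean `beta_1[0, 2]` or `beta[1, 0, 2]`. It also assumes variable names never contain `__` themselves.

```python
import itertools


def flat_names(varname, shape):
    """Hypothetical helper: one column name per element of a variable.

    Indices are joined with '_' and separated from the variable name by
    a double underscore, e.g. 'beta_1__0_2' for beta_1[0, 2].
    """
    if not shape:  # scalars keep their plain name
        return [varname]
    return [
        "{}__{}".format(varname, "_".join(map(str, idx)))
        for idx in itertools.product(*(range(n) for n in shape))
    ]


def parse_flat_name(column):
    """Hypothetical inverse: split a column into (varname, index tuple).

    Assumes variable names never contain '__' themselves.
    """
    if "__" not in column:
        return column, ()
    varname, idx = column.rsplit("__", 1)
    return varname, tuple(int(i) for i in idx.split("_"))


names = flat_names("beta_1", (2, 3))
# names[2] == 'beta_1__0_2'; with a single underscore this would be
# 'beta_1_0_2', which could be beta_1[0, 2] or beta[1, 0, 2].
```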

@jsalvatier (Member)

Is pandas ever a difficult dependency to install? Is there a reason not to make it a hard dependency?

Otherwise looks good to me.

@kyleam (Contributor, Author) commented Jun 3, 2015

I'm for making Pandas a non-optional dependency. If we do that, I'd
be in favor of removing the Text backend completely and replacing it
with the CSV backend (perhaps renaming CSV to Text?).

@jsalvatier (Member)

@twiecki @fonnesbeck, thoughts on making pandas a hard dep?

@fonnesbeck (Member)

Given that Pandas is now a core library in the scientific stack, forcing users to install it is not unreasonable.

@twiecki (Member) commented Jun 3, 2015

+1 on adding pandas as a hard dependency

@jsalvatier (Member)

@kyleam can you add the dependency to setup.py? Then I think we're good to go.

@kyleam (Contributor, Author) commented Jun 5, 2015

> can you add the dependency to setup.py?

Will do.

kyleam added 3 commits June 5, 2015 21:00

This has a couple of advantages over the Text backend.

1. The values are stored in a table per chain, not a file per
   variable.  This is easier to inspect and work with directly if
   desired (e.g., with pd.read_csv).

2. Values are stored during sampling, not kept in memory.

The main reason to keep the Text backend around was that pandas was an
optional dependency.  Now that this is no longer the case, the
original Text backend doesn't offer any advantages over the CSV
backend.  (Even if someone prefers not to write to files while
sampling, they can sample with the NDArray backend and then use the
CSV dump function.)

Rename CSV to Text.  This name is more appropriate because storing the
values as plain text is the important feature, not which delimiter is
used.
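The second commit message notes that someone who prefers not to write files during sampling can keep draws in memory (NDArray backend) and dump them to CSV afterwards. A minimal sketch of such a post-hoc dump, using a hypothetical `dump_chain` function rather than the actual PyMC3 dump function (whose name and signature aren't shown in this thread), and assuming `trace` is a dict of column name -> list of draws:

```python
import csv
import io


def dump_chain(trace, fh):
    """Hypothetical post-hoc dump: write an in-memory trace as one CSV table.

    `trace` maps column name -> list of draws (e.g. collected by an
    in-memory backend); `fh` is any text file-like object.
    """
    lengths = {len(draws) for draws in trace.values()}
    if len(lengths) > 1:
        raise ValueError("every column must have the same number of draws")
    columns = list(trace)
    writer = csv.writer(fh)
    writer.writerow(columns)  # header row
    writer.writerows(zip(*(trace[c] for c in columns)))  # one row per draw


buf = io.StringIO()
dump_chain({"mu": [0.1, 0.2], "sd": [1.0, 1.5]}, buf)
```

With pandas, `pd.DataFrame(trace).to_csv(fh, index=False)` writes an equivalent table; the standard-library version only keeps the sketch dependency-free.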

@kyleam (Contributor, Author) commented Jun 6, 2015

Updated. Does anyone have an issue with removing Text (and renaming CSV to Text)?

@jsalvatier (Member)

What additional capabilities beyond CSV does Text currently have? If it doesn't really have any, then I'm all for it.

@kyleam (Contributor, Author) commented Jun 6, 2015

None (see commit message).

@jsalvatier (Member)

Yeah, let's replace it then.

jsalvatier added a commit that referenced this pull request Jun 6, 2015
@jsalvatier merged commit b0b79f7 into master Jun 6, 2015
@twiecki deleted the csv branch June 6, 2015 09:15

5 participants