Improving the python API #53

eerolinna · 2019-10-16T13:46:28Z

Here are some potential changes to the Python API that I feel could improve it. The R API is not affected.

You can view a README with the new API here: https://github.com/eerolinna/posterior_database/blob/patch-1/python/README.md

Constructing a posterior

Old: po = Posterior(posterior_name, my_pdb)
New: po = my_pdb.posterior(posterior_name)
(the old way will still work)

The same applies for model and dataset, so we have my_pdb.model(model_name) and my_pdb.dataset(data_name)

I feel this makes it clearer that the posterior comes from the posterior database.
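
As a rough sketch of how the new methods could sit next to the old constructors: the class bodies below are simplified stand-ins I made up for illustration (not the actual implementation), and the path and posterior name are hypothetical.

```python
class Posterior:
    """Simplified stand-in for the real Posterior class."""

    def __init__(self, name, posterior_db):
        self.name = name
        self.posterior_db = posterior_db


class PosteriorDatabase:
    """Simplified stand-in for the real database class."""

    def __init__(self, path):
        self.path = path

    def posterior(self, name):
        # New style: the database hands out its own posteriors by
        # delegating to the existing constructor.
        return Posterior(name, self)


my_pdb = PosteriorDatabase("path/to/posterior_database")  # hypothetical path

old_style = Posterior("some_posterior", my_pdb)  # old way, still works
new_style = my_pdb.posterior("some_posterior")   # new way
```

Because the method only delegates to the constructor, both spellings give an equivalent object, which is why keeping the old way costs nothing.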

Accessing model code

Old (all are equivalent)

mo = po.model
mo.model_code("stan")
mo.stan_code()
po.model_code("stan")
po.stan_code()

New

mo.code("stan")
mo.stan_code()
po.code("stan")
po.stan_code() # this could also maybe be dropped, but keeping this is fine too

This drops the unnecessary model prefix. po.model_code_file_path("stan") is also shortened to po.code_file("stan").

po.code("stan") should perhaps remain po.model_code("stan"), or we could drop it entirely and use po.model.code("stan") instead. po.stan_code() could perhaps also be dropped, in which case we'd use po.model.stan_code().
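
To make the options concrete, here is a minimal sketch where the posterior shortcuts simply delegate to the model. The classes are simplified stand-ins, and the constructor taking a dict of file paths is an assumption for illustration only.

```python
import tempfile
from pathlib import Path


class Model:
    """Simplified stand-in for the real Model class."""

    def __init__(self, code_files):
        # Assumed shape: framework name (e.g. "stan") -> path to code file.
        self._code_files = code_files

    def code_file(self, framework):
        return self._code_files[framework]

    def code(self, framework):
        return Path(self.code_file(framework)).read_text()

    def stan_code(self):
        return self.code("stan")


class Posterior:
    """Simplified stand-in for the real Posterior class."""

    def __init__(self, model):
        self.model = model

    # Optional shortcuts; these are the ones that could be dropped.
    def code(self, framework):
        return self.model.code(framework)

    def stan_code(self):
        return self.model.stan_code()


with tempfile.TemporaryDirectory() as tmp:
    stan_file = Path(tmp) / "model.stan"  # hypothetical model file
    stan_file.write_text("parameters { real mu; }")
    po = Posterior(Model({"stan": str(stan_file)}))
    via_posterior = po.code("stan")
    via_model = po.model.stan_code()
```

If the shortcuts are dropped, only the two delegating methods on Posterior go away and po.model.code("stan") keeps working unchanged.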

Accessing model information

Old

mo.model_info
po.model_info

New

mo.information
po.model.information

This drops the model prefix and removes the shortened form on the posterior (po.model_info).

The result from calling these is something like

{'keywords': ['bda3_example', 'hiearchical'],
 'description': 'A centered hiearchical model for the 8 schools example of Rubin (1981)',
 'urls': ['http://www.stat.columbia.edu/~gelman/arm/examples/schools'],
 'title': 'A centered hiearchical model for 8 schools',
 'references': ['rubin1981estimation', 'gelman2013bayesian'],
 'added_by': 'Mans Magnusson',
 'added_date': '2019-08-12'}

The slot model_code is dropped from the result as this is just an implementation detail and mo.code_file("stan") already contains this information.

Accessing data file

Old

da = po.dataset
da.data()
po.data()

I'm not sure what to do about this one. I feel it's confusing to have both po.data() (the actual data, in other words the loaded JSON) and po.dataset (the PDB concept of a dataset, which has a name like "8_schools" and contains both the actual data, po.dataset.data(), and the information about the dataset, po.dataset.information).

Accessing dataset information

Old

da.data_info
po.data_info

New

da.information
po.dataset.information

These are the same changes as for model information.

The result is something like

{'keywords': ['bda3_example'],
 'description': 'A study for the Educational Testing Service to analyze the effects of\nspecial coaching programs on test scores. See Gelman et. al. (2014), Section 5.5 for details.',
 'urls': ['http://www.stat.columbia.edu/~gelman/arm/examples/schools'],
 'title': 'The 8 schools dataset of Rubin (1981)',
 'references': ['rubin1981estimation', 'gelman2013bayesian'],
 'added_by': 'Mans Magnusson',
 'added_date': '2019-08-12'}

The slot data_file is dropped from the result as this is just an implementation detail and da.data_file() already contains this information.
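
Dropping these slots could be as simple as filtering the raw dictionary before returning it. This is only a sketch: the helper name public_information and the shape of the model_code value are my assumptions, not what the code currently does.

```python
# Slots treated as implementation details (names taken from this proposal).
HIDDEN_SLOTS = ("model_code", "data_file")


def public_information(raw_info, hidden=HIDDEN_SLOTS):
    """Return a copy of raw_info without the implementation-detail slots."""
    return {key: value for key, value in raw_info.items() if key not in hidden}


raw = {
    "title": "A centered hiearchical model for 8 schools",
    "added_by": "Mans Magnusson",
    "model_code": {"stan": "path/to/model.stan"},  # assumed shape, hypothetical path
}
info = public_information(raw)
```

The raw dictionary is left untouched, so code_file and data_file can still read the paths internally.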

eerolinna · 2019-10-16T13:47:37Z

Any comments @MansMeg @paul-buerkner?

eerolinna · 2019-10-21T17:03:53Z

Perhaps this could be an improvement to the dataset API

Accessing the loaded data JSON file

po.dataset.values()

values is a fairly common Python idiom for accessing the underlying data (for example, xarray and pandas both expose it).

Get file path to the data JSON file

po.dataset.file()
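
A minimal sketch of what these two methods could look like, assuming the dataset only needs to know the path to its JSON file. The class is a simplified stand-in and the file name is hypothetical.

```python
import json
import tempfile
from pathlib import Path


class Dataset:
    """Simplified stand-in for the real Dataset class."""

    def __init__(self, name, path):
        self.name = name
        self._path = path

    def file(self):
        """Return the path to the data JSON file."""
        return self._path

    def values(self):
        """Return the loaded contents of the data JSON file."""
        with open(self._path) as f:
            return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "some_data.json"  # hypothetical data file
    data_path.write_text(json.dumps({"J": 8}))
    da = Dataset("some_data", str(data_path))
    loaded = da.values()
    file_path = da.file()
```

With this split, po.dataset is always the PDB concept, and the loaded JSON is only reachable through values(), which avoids having two different things both called "data".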

eerolinna · 2019-10-22T11:43:20Z

I would appreciate some comments. Do you agree with this change or do you have some concerns? It's also fine to say "My gut reaction is negative but I can't yet articulate why" if that's the case.

MansMeg · 2019-10-22T12:34:34Z

I will go through this later.

MansMeg · 2019-10-31T17:50:39Z

So I think this new structure would be really good. It looks better. I think this would also simplify connecting to Github (and in the future a public database). I think it looks good.

eerolinna · 2019-11-01T05:19:12Z

Great! I'll do the changes soon (probably start of next week)

MansMeg · 2019-11-02T06:26:13Z

Paul, Aki and I decided that the R package will be the reference implementation for now, since that is where we are currently working, so I am removing this issue from the prototype milestone. If this can be fixed this coming week, that would be great, because that is when we expect the database prototype to come out.

eerolinna · 2019-11-02T12:10:27Z

That's fine. I'll very likely get these changes done this coming week.

MansMeg added this to the Prototype milestone Oct 22, 2019

MansMeg removed this from the Prototype milestone Nov 2, 2019

eerolinna mentioned this issue Nov 5, 2019

Improved python api #80

Merged

MansMeg closed this as completed in #80 Nov 5, 2019
