Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving the python API #53

Closed
eerolinna opened this issue Oct 16, 2019 · 8 comments · Fixed by #80
Closed

Improving the python API #53

eerolinna opened this issue Oct 16, 2019 · 8 comments · Fixed by #80

Comments

@eerolinna
Copy link
Contributor

eerolinna commented Oct 16, 2019

Here's some potential changes for the python API that I feel could make it better. The R API is not affected.

You can view a README with the new API here: https://github.com/eerolinna/posterior_database/blob/patch-1/python/README.md

Constructing a posterior

Old: po = Posterior(posterior_name, my_pdb)
New: po = my_pdb.posterior(posterior_name)
(the old way will still work)

The same applies for model and dataset, so we have my_pdb.model(model_name) and my_pdb.dataset(data_name)

I feel this can makes it clearer that the posterior comes from the posterior database.

Accessing model code

Old (all are equivalent)

mo = po.model
mo.model_code("stan")
mo.stan_code()
po.model_code("stan")
po.stan_code()

New

mo.code("stan")
mo.stan_code()
po.code("stan")
po.stan_code() # this could also maybe be dropped, but keeping this is fine too

This drops the unnecessary model prefix. po.model_code_file_path("stan") is also shortened to po.code_file("stan")

po.code("stan") maybe should still be po.model_code, or we could just drop it and use po.model.code("stan"). po.stan_code() could perhaps also be dropped, then we'd use po.model.stan_code()

Accessing model information

Old

mo.model_info
po.model_info

New

mo.information
po.model.information

Drops the model prefix and removes the shortened form

The result from calling these is something like

{'keywords': ['bda3_example', 'hiearchical'],
 'description': 'A centered hiearchical model for the 8 schools example of Rubin (1981)',
 'urls': ['http://www.stat.columbia.edu/~gelman/arm/examples/schools'],
 'title': 'A centered hiearchical model for 8 schools',
 'references': ['rubin1981estimation', 'gelman2013bayesian'],
 'added_by': 'Mans Magnusson',
 'added_date': '2019-08-12'}

The slot model_code is dropped from the result as this is just an implementation detail and mo.code_file("stan") already contains this information.

Accessing data file

Old

da = po.dataset
da.data()
po.data()

This one I'm not sure what to do about, I feel it's confusing to have both po.data() (the actual data, in other words the loaded JSON) and po.dataset (which is the PDB concept of a dataset that has a name like "8_schools" and contains both the actual data po.dataset.data() and the information about the dataset po.dataset.information)

Accessing dataset information

Old

da.data_info
po.data_info

New

da.information
po.dataset.information

Same changes as in model information

The result is something like

{'keywords': ['bda3_example'],
 'description': 'A study for the Educational Testing Service to analyze the effects of\nspecial coaching programs on test scores. See Gelman et. al. (2014), Section 5.5 for details.',
 'urls': ['http://www.stat.columbia.edu/~gelman/arm/examples/schools'],
 'title': 'The 8 schools dataset of Rubin (1981)',
 'references': ['rubin1981estimation', 'gelman2013bayesian'],
 'added_by': 'Mans Magnusson',
 'added_date': '2019-08-12'}

The slot data_file is dropped from the result as this is just an implementation detail and da.data_file() already contains this information.

@eerolinna
Copy link
Contributor Author

Any comments @MansMeg @paul-buerkner?

@eerolinna
Copy link
Contributor Author

Perhaps this could be an improvement to the dataset API

Accessing the loaded data JSON file

po.dataset.values()

values is a decently common python idiom for accessing the underlying values (for example xarray and pandas use it)

Get file path to the data JSON file

po.dataset.file()

@eerolinna
Copy link
Contributor Author

I would appreciate some comments. Do you agree with this change or do you have some concerns? It's also fine to say "My gut reaction is negative but I can't yet articulate why" if that's the case.

@MansMeg MansMeg added this to the Prototype milestone Oct 22, 2019
@MansMeg
Copy link
Collaborator

MansMeg commented Oct 22, 2019

I will go through this later.

@MansMeg
Copy link
Collaborator

MansMeg commented Oct 31, 2019

So I think this new structure would be really good. It looks better. I think this would also simplify connecting to Github (and in the future a public database). I think it looks good.

@eerolinna
Copy link
Contributor Author

Great! I'll do the changes soon (probably start of next week)

@MansMeg
Copy link
Collaborator

MansMeg commented Nov 2, 2019

Me, Paul and Aki decided that the R package will be the reference implementation for now since this is where we are working now. so I remove this issue from the prototype milestone. If this can be fixed this coming week that would be great, because that when we expect the database prototype to come out.

@MansMeg MansMeg removed this from the Prototype milestone Nov 2, 2019
@eerolinna
Copy link
Contributor Author

That's fine. I'll very likely get these changes done this coming week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants