Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download bridgestan repo as artifact for BridgeStan.jl #70

Merged
merged 16 commits into from
Jan 27, 2023

Conversation

sethaxen
Copy link
Contributor

When installing BridgeStan.jl, this PR allows for the bridgestan repo to be downloaded at package build time using Julia's Artifacts system, which should allow BridgeStan to be registered.

It also adds a workflow trigger for incrementing the version number and registering. Note that the instructions at https://github.com/JuliaRegistries/General#registering-a-package-in-general will need to be followed to trigger the initial registration. There's a usual 3 day embargo on all new packages, and after the first version is released, then all subsequent ones can be triggered with this action.

A few notes:

The test suite doesn't pass on my machine, but it didn't pass before this PR. Still, I'm able to use StanModel:

julia> using BridgeStan, JSON, PosteriorDB, ZipFile;

julia> BridgeStan.get_bridgestan()
"/home/sethaxen/.julia/artifacts/730b3ee2b8c9fd643dcb2bee6c0ea5995db142d7/bridgestan-1.0.0"

julia> pdb = PosteriorDB.database();

julia> post = PosteriorDB.posterior(pdb, "dogs-dogs");

julia> model = PosteriorDB.model(post);

julia> stan_file = PosteriorDB.path(PosteriorDB.implementation(model, "stan"));

julia> reader = ZipFile.Reader(PosteriorDB.path(PosteriorDB.dataset(post)));

julia> data = read(only(reader.files), String);

julia> close(reader);

julia> model = StanModel(; stan_file, data);

julia> x = randn(BridgeStan.param_unc_num(model));
3-element Vector{Float64}:
  1.3503676806980218
  0.9985177063922579
 -0.5326054957782398

julia> BridgeStan.log_density_gradient(model, x);
(-3397.8117344293314, [-350.14480235250244, -4082.3566484283583, -2562.7088242578147])

We'll see what happens on CI.

@WardBrian WardBrian self-requested a review January 20, 2023 23:20
@WardBrian
Copy link
Collaborator

The asset attached to the release was manually generated by me. I cloned the repo somewhere fresh, recursively deleted the .git/ folders, and then tar-gzipped it for the release. We could probably make this automatic as well

@sethaxen
Copy link
Contributor Author

Ah, the source of the failure seems to be that the CI has a job to build the test models before running the tests, but since that's pointing to the cloned repo, whereas BridgeStan.jl is pointing to the artifact, the test throws an error when the .so file is not found.

So that the tests can pass also when not run on CI, can we build these models in the test suite itself? We could copy them to a temporary directory first.

@sethaxen
Copy link
Contributor Author

We could probably make this automatic as well

Yes agreed.

@WardBrian
Copy link
Collaborator

So that the tests can pass also when not run on CI, can we build these models in the test suite itself? We could copy them to a temporary directory first.

I would be against this since it will mean re-building the tests in a lot of places, e.g once per julia version if we ever started testing against multiple. You can already run the tests locally if you run make STAN_THREADS=true test_models first

Is there a good way to have the artifact only pulled in during actual installs so we don’t need to rewrite our CI for this?

@sethaxen
Copy link
Contributor Author

Is there a good way to have the artifact only pulled in during actual installs so we don’t need to rewrite our CI for this?

I designated that the artifact should be lazily downloaded. So e.g. if one sets the BRIDGESTAN environment variable, then the artifact is never even downloaded; otherwise it's downloaded once when one first calls get_bridgestan(). I also reverted the changes to the CI so now it runs all tests with the cloned version of bridgestan. This just won't test that the artifact is correctly downloaded. Perhaps that should be its own job?

@WardBrian
Copy link
Collaborator

Lazy artifacts actually sound ideal for this use case, since someone who does clone the repo surely doesn't need another copy

@@ -6,9 +6,8 @@ end
function get_bridgestan()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the docstring of this function to clarify the behavior with the artifact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! This function isn't currently exported.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to export it (might as well in Python/R as well, since it makes the example programs easier to write generically, as you've done here). Maybe under the name get_bridgestan_path rather than get_bridgestan?

Is there a way to issue some sort of diagnostic (warning, print statement, anything) the when the LazyArtifact is downloaded?
I might be imagining users who don't exist, but I have a concern that people will download the repository and then not realize the actual source is being used from somewhere else, so it would be nice if we could say something like

Downloading the BridgeStan source as an artifact to $download_location. If you wish to use an alternative source, use set_bridgestan_path!()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to export it (might as well in Python/R as well, since it makes the example programs easier to write generically, as you've done here). Maybe under the name get_bridgestan_path rather than get_bridgestan?

I renamed it to get_bridgestan_path, documented it, and exported it. The Python and R interfaces seem to have completely different approaches for accessing the bridgestan source (Python uses a variable, and I can't tell how R does it), so it seems like harmonizing these should be a separate PR.

Is there a way to issue some sort of diagnostic (warning, print statement, anything) the when the LazyArtifact is downloaded?

Not really. The idea behind the artifact system is that the Julia code is completely unaware of and agnostic to the download of the artifact. So we don't know whether the artifact will be downloaded until it is being downloaded, and we don't know its path until it has been downloaded. We could issue an @info message any time get_bridgestan_path() is called without BRIDGESTAN being set, but this will get annoying fast.

The artifact system does log that the artifact is being downloaded itself. Here's what it looks like after it has been downloaded. During the download process, it shows a progress bar (it just downloads once, not certain why it shows the message twice).

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am just now remembering R doesn't even have the compilation utilities yet, so there's certainly nothing to update there. Python can be done separately (we will probably want to find a way to do something similar in that package as well, if we want to ship this on PyPi)

@WardBrian
Copy link
Collaborator

At this point I'm more than happy with the Julia changes, but I don't really know enough about Julia package registration to say about the github action. Is there a definitive resource I can read more about this from?

@sethaxen
Copy link
Contributor Author

Is there a definitive resource I can read more about this from?

Sure, here are the usage docs for this workflow: https://github.com/julia-actions/RegisterAction#basic-usage. They have a section on how to register packages in a subdirectory, which is a bit manual. In an attempt to make it more automatic, I instead specified the subdirectory in the workflow, which I'm not sure will work. If it doesn't, I can revert to the manual approach in a future PR.

@WardBrian
Copy link
Collaborator

Hm, it seems like if we want to do a synchronous release of all of our packages, we probably shouldn't use that tool, since it will update only the Julia version and push a commit to our repo as a result. We can still do something automated, most likely.

There's a bit of a chicken-egg problem here with how we generate the link to the tar file if the release doesn't exist yet, but ideally a release action would:

  • Update version numbers (this will need to happen in ~4 places)
  • Update the link to the Julia artifact (this artifact won't, necessarily, exist yet, but we should be able to predict its url?)
  • Generate the tarball as described above
  • Create a commit and a tag, push
  • Create a github release, add the tarball to it
  • Add @JuliaRegistrator register as a comment on the commit, push python package to PyPi

Definitely not asking you to do this, just planning out loud. Let me play around with some actions which could do this, in the mean time if you remove that action I'd be happy to merge this PR

@sethaxen
Copy link
Contributor Author

I guess a downside to a completely synchronized release pipeline is that sometimes a fix might be needed only in a single interface (e.g. the Python one), and so a release would be made in, say, the Julia code with no changes made to its code. Not a major downside though.

I've removed the registration workflow.

@WardBrian
Copy link
Collaborator

I guess a downside to a completely synchronized release pipeline is that sometimes a fix might be needed only in a single interface (e.g. the Python one), and so a release would be made in, say, the Julia code with no changes made to its code. Not a major downside though.

Yeah, that's certainly true. I think synchronized release numbers across all of them has benefits, too

@sethaxen
Copy link
Contributor Author

Ah, one more thing. It probably makes sense to remove the Manifest.toml file. These should in general not be checked in for packages intended to be dependencies. e.g. I'm not 100% certain, but including it as it currently is may prevent the package from being installed on any version but v1.8.

@sethaxen
Copy link
Contributor Author

I think synchronized release numbers across all of them has benefits, too

This makes sense so long as the languages have compatible versioning requirements. Julia packages are required to follow a modified semver. IIUC Python is more relaxed, so you could probably keep those version numbers synchronized. Not certain about R.

@WardBrian
Copy link
Collaborator

I wasn't sure if checking them in was standard or not. We'd likely want to add it to our gitignore then

@sethaxen
Copy link
Contributor Author

I wasn't sure if checking them in was standard or not.

They're standard for getting maximum reproducibility in projects, and they're necessary when one has unregistered dependencies, but otherwise it's better to specify compat bounds and let Julia's resolver figure everything out.

We'd likely want to add it to our gitignore then

Done!

@WardBrian
Copy link
Collaborator

Thanks!

My proof-of-concept for a release action can be seen on my fork here: https://github.com/WardBrian/bridgestan/actions/runs/4028327473/jobs/6925090207

I think this is basically what we'd like (replacing the TODOs at the end with actual release logic for Python). The only problem I can see is we can't have the julia subfolder of any of our tar.gz files correctly link to "itself", if that makes sense. This isn't really a problem, since if you're using the tarball you don't need the artifact, it's just slightly unsatisfying.

@WardBrian WardBrian merged commit 4fa2b79 into roualdes:main Jan 27, 2023
@sethaxen
Copy link
Contributor Author

My proof-of-concept for a release action can be seen on my fork here: https://github.com/WardBrian/bridgestan/actions/runs/4028327473/jobs/6925090207

Nice!

The only problem I can see is we can't have the julia subfolder of any of our tar.gz files correctly link to "itself", if that makes sense. This isn't really a problem, since if you're using the tarball you don't need the artifact, it's just slightly unsatisfying.

Ah I see, yeah that is annoying. I guess an alternative way to do it would be to upload the tarball somewhere else than the release (e.g. ArtifactsUtils.jl has a utility for uploading to a gist: https://simeonschaub.github.io/ArtifactUtils.jl/stable/#ArtifactUtils.upload_to_gist) so as to not give the sense that the latest commit in the tarball matches the latest commit on the release. This might make it less clear in BridgeStan.jl which version of the source is in the artifact.

Another alternative would be tarring everything but the interfaces.

@WardBrian
Copy link
Collaborator

Tarring everything but the interfaces and then uploading separate things (e.g python wheels, whatever the Julia equivalent is) would be my choice if the setup described above proves to be problematic/confusing for end users

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants