Tutorial for running containers as a demo or for evaluation #10273

Merged
merged 13 commits into from
Feb 7, 2024

@pdurbin pdurbin commented Jan 25, 2024

What this PR does / why we need it:

People want to try running Dataverse in containers for demo or evaluation purposes. This pull request is mostly documentation, which can be previewed at https://dataverse-guide--10273.org.readthedocs.build/en/10273/container/index.html

It also includes a stripped-down Docker compose file that's more suited to demos than development.

Please note that it currently still uses the "dev" bootstrapping persona, which means that it's just as insecure as a developer's environment with open admin APIs, etc. I wrote a section on security about this.

I also stubbed out related pages under "running Dataverse in Docker" and reorganized here and there in the guide (the Container Guide).

Which issue(s) this PR closes:

Special notes for your reviewer:

Please see above and the comments I left on specific lines under "files changed".

Suggestions on how to test this:

Try the quickstart under the demo page, at least: https://dataverse-guide--10273.org.readthedocs.build/en/10273/container/running/demo.html#quickstart

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No.

Is there a release notes update needed for this change?:

Yes, included.

Differences from dev version:

- localstack and minio removed
- env vars filled in based on current .env

The goal is to have a single file to download, rather than a compose
file and an .env file.
Also update tags section under "app image" (now live).
@pdurbin pdurbin marked this pull request as ready for review January 26, 2024 21:31
@pdurbin pdurbin removed their assignment Jan 26, 2024
Comment on lines +31 to +33
Development and maintenance of the `image's code <https://github.com/IQSS/dataverse/tree/develop/src/main/docker>`_
happens there (again, by the community). Community-supported image tags are based on the two most important
upstream branches:
Member Author

I copied this from base-image.rst and left the "by the community" stuff in there but I'd say we can remove it (both here and there). These days IQSS is helping to maintain these images.

Contributor

Praise be!

Intro
-----

See :doc:`../dev-usage`.
Member Author

We could consider moving dev-usage.rst to be underneath "running" but I haven't tried yet.


Please be aware that for now, the "dev" persona is used to bootstrap Dataverse, which means that admin APIs are wide open (to allow developers to test them; see :ref:`securing-your-installation` for more on API blocking), the "create user" key is set to a default value, etc. You can inspect the dev persona `on GitHub <https://github.com/IQSS/dataverse/blob/master/modules/container-configbaker/scripts/bootstrap/dev/init.sh>`_ (look for ``--insecure``).

We plan to ship a "demo" persona but it is not ready yet. See also :ref:`configbaker-personas`.
Member Author

I've never created a persona. I hope it doesn't take much time to create one and document how to use it.
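For readers following the security note quoted above, here is a minimal sketch of what "API blocking" could look like once the demo is running. It only prints the ``curl`` calls rather than executing them. The setting names (``:BlockedApiKey``, ``:BlockedApiPolicy``, ``:BlockedApiEndpoints``) and the ``unblock-key`` policy are taken from the Installation Guide as I understand it, so please verify them against your Dataverse version. The base URL, the key value, and the function name are placeholders invented for this sketch and are not part of this PR.

```shell
# Print (do not run) the curl calls that would restrict the admin API.
# Assumptions: setting names as documented in the Dataverse Installation
# Guide; base URL and unblock key are placeholders for illustration only.
print_api_blocking_commands() {
  pab_base_url="${1:-http://localhost:8080}"
  pab_unblock_key="${2:-CHANGE_ME}"
  pab_settings="$pab_base_url/api/admin/settings"
  # Set the key and the policy first: once the endpoints are blocked,
  # further admin calls would themselves need the unblock key.
  printf 'curl -X PUT -d %s %s\n' "$pab_unblock_key" "$pab_settings/:BlockedApiKey"
  printf 'curl -X PUT -d %s %s\n' "unblock-key" "$pab_settings/:BlockedApiPolicy"
  printf 'curl -X PUT -d %s %s\n' "admin,builtin-users" "$pab_settings/:BlockedApiEndpoints"
}

print_api_blocking_commands
```

Printing instead of executing keeps the sketch safe to paste; review each printed line before running it against a real installation.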

Member Author

TODO: link to this page from admin/metadatacustomization.rst

Comment on lines 5 to 6
  dev_dataverse:
    container_name: "dev_dataverse"
Member Author

I kept the container names with dev_ prepended. Happy to change this.

DATAVERSE_DB_HOST: postgres
DATAVERSE_DB_PASSWORD: secret
DATAVERSE_DB_USER: dataverse
DATAVERSE_FEATURE_API_BEARER_AUTH: "1"
Member Author

I left the bearer auth stuff in just in case someone wants to demo the new frontend as well. Should I note this under the security section?

image: gdcc/dataverse:alpha
restart: on-failure
user: payara
environment:
Member Author

In order to only have a single file to download, I copied the values out of .env into the compose file.

Deleting the Data Directory
+++++++++++++++++++++++++++

Data related to the Dataverse containers is placed in a directory called ``docker-dev-volumes`` next to the ``compose.yml`` file. If you are finished with your demo or evaluation or you want to start fresh, simply delete this directory.
Member Author

Should we continue to call the directory "docker-dev-volumes"? For a demo something like "dataverse-data" might make more sense.

Comment on lines +122 to +127
In the compose file, try increasing the timeout in the bootstrap container by adding something like this:

.. code-block:: bash

  environment:
    - TIMEOUT=10m
Member Author

If we're worried about this, should we preemptively put this in the compose file?

Contributor

Yes, and document it prominently. I had some failed startups and it took me a while to realize that the bootstrap did not run due to the short timeout.
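To help tell a slow startup from a real failure, a small polling helper along these lines might be useful. This is only a sketch: the function name, the default timeout of 600 seconds (chosen to mirror the ``TIMEOUT=10m`` suggestion), and the assumption that the application is reachable from the host at ``http://localhost:8080`` are mine, not part of this PR. The ``/api/info/version`` path is the same one the bootstrap container checks.

```shell
# Poll a URL until it returns HTTP 200 or the timeout (in seconds) expires.
# Usage: wait_for_http_ok URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Assumption: curl is installed and the URL is reachable from the host.
wait_for_http_ok() {
  wf_url="$1"
  wf_timeout="${2:-600}"   # 600 s mirrors the TIMEOUT=10m suggestion
  wf_interval="${3:-5}"
  wf_deadline=$(( $(date +%s) + wf_timeout ))
  while [ "$(date +%s)" -lt "$wf_deadline" ]; do
    # -w '%{http_code}' prints only the status code; 000 means no response.
    wf_code="$(curl -s -o /dev/null -w '%{http_code}' "$wf_url" || true)"
    if [ "$wf_code" = "200" ]; then
      return 0
    fi
    sleep "$wf_interval"
  done
  return 1
}

# Example (hypothetical host URL):
# wait_for_http_ok http://localhost:8080/api/info/version 600 5 && echo "Dataverse is up"
```

If the helper times out while the application container is still deploying, raising the bootstrap timeout is probably the right fix. If it succeeds but bootstrapping never ran, the bootstrap container's own logs are the place to look.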

Comment on lines 121 to 122
#volumes:
# - ./docker-dev-volumes/smtp/data:/mail
Member Author

Hmm, I'm not sure why this is commented out. It's also commented out in the "dev" version. I've never worried about persisting this data.

Introduction
============

Dataverse in containers!
Contributor

We could steal my old bricks image and put it here (or a variant thereof)

Comment on lines -14 to -16
This guide is *not* about installation on technology like Docker Swarm, Kubernetes, Rancher or other
solutions to run containers in production. There is the `Dataverse on K8s project <https://k8s-docs.gdcc.io>`_ for this
purpose, as mentioned in the :doc:`/developers/containers` section of the Developer Guide.
Contributor

Maybe we should keep a variant of this to make it clearer what is out of scope.

Member Author

Ok, so maybe clarify the relationship to k8s, rancher, swarm, etc.

Helping with the Containerization Effort
----------------------------------------

In 2023 the Containerization Working Group started meeting regularly. All are welcome to join! We talk in #containers at https://chat.dataverse.org and have a regular video call. For details, please visit https://ct.gdcc.io
Contributor

Suggested change
In 2023 the Containerization Working Group started meeting regularly. All are welcome to join! We talk in #containers at https://chat.dataverse.org and have a regular video call. For details, please visit https://ct.gdcc.io
In 2023 the Containerization Working Group started meeting regularly. All are welcome to join! We talk in #containers at https://chat.dataverse.org and have a regular video call. For details, please visit https://ct.gdcc.io.

@pdurbin pdurbin self-assigned this Jan 29, 2024
@cmbz cmbz added the Size: 10 A percentage of a sprint. 7 hours. label Jan 30, 2024
Also, explain how to create a persona and some basic config.

@johannes-darms
Contributor

johannes-darms commented Jan 31, 2024

@pdurbin:

  1. Would it be possible to add a section on how to configure the log level within the Troubleshooting section?
  2. IMHO the "FAKE" DOI provider can be deprecated and replaced with a PermaLink provider. That is sufficient to set up and use Dataverse without the need to fake behavior.
  3. Remove the Docker, Kubernetes, and Containers page or move content into the new Container Guide.
  4. I would alter the ToC to hide some details. Something like:
  • Container Guide
    • Introduction
    • Use cases
      • Production
      • Demo or Evaluation
      • Development
        • Backend Development
        • Frontend Development
        • Metadatablock Development
        • GitHub Action
    • Images
      • Dataverse Application
      • Application Base
      • Config Baker

Besides those remarks/requests, it's a really good guide! Many thanks!

Using --insecure at first and then doing securing APIs, etc later
(like non --insecure does) seems like the best option for now.

It allows us to simplify the tutorial and set up an unblock key
for later use.
@pdurbin pdurbin removed their assignment Feb 1, 2024
@pdurbin
Member Author

pdurbin commented Feb 1, 2024

I had a great long discussion with @poikilotherm @johannes-darms @donsizemore and others about this pull request during the container meeting this morning, which was recorded. Thank you! The change I made afterwards was to use --insecure temporarily. See 89739bc.

@poikilotherm I know we talked about adding --container as a flag for setup-all but I think this requires rebuilding the configbaker image (where that script lives). I'm trying to keep this tutorial as simple as possible. Right now it involves downloading a compose.yml and an init.sh. I'd like to avoid further complexity.

There's other feedback I could or should address such as talking about -d (detached) and trying to move from FAKE to Permalinks and other stuff I'm forgetting but I'm putting this back into ready for review so @landreev can see what he thinks. We're hoping to try this demo tutorial for the MOC proof of concept:

@landreev landreev self-assigned this Feb 2, 2024
@Saixel
Contributor

Saixel commented Feb 6, 2024

Hey! I encountered a small issue while following the Deleting Data and Starting Over section detailed in the guide. After stopping the server, deleting the data directory as instructed, and rerunning docker-compose up, the data folder was indeed recreated. However, the server did not start correctly. The logs from the bootstrap service reported the following:

bootstrap         | 2024-02-05T12:49:33Z INF [HTTP] Checking the http://dataverse:8080/api/info/version ...
bootstrap         | 2024-02-05T12:49:33Z ERR Expectation failed error="the status code doesn't expect" actual=404 expect=200

This output suggests that the bootstrap service was trying to verify the Dataverse API version but received a 404 (Not Found) response instead of the expected 200 (OK).

To resolve this, I stopped the process and restored the original data folder, ensuring only its contents were cleared. This approach rectified the error, allowing the server to launch successfully.

It seems like there might be a step missing in the guide or an issue with the docker-compose configuration that prevents the server from starting fresh when the data directory is entirely removed. Has anyone else experienced similar behavior, or could this be a specific edge case in the setup process?
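The workaround described above (keep the data folder, clear only its contents) could be sketched roughly as follows. The function name is hypothetical, and the ``docker-dev-volumes`` path in the example assumes the directory layout described in the guide. Files created by the containers may be owned by other users, so elevated permissions might be needed.

```shell
# Empty a data directory but keep the directory itself in place.
# Usage: clear_dir_contents DIR
clear_dir_contents() {
  cdc_dir="$1"
  if [ -z "$cdc_dir" ] || [ ! -d "$cdc_dir" ]; then
    echo "not a directory: '$cdc_dir'" >&2
    return 1
  fi
  # -mindepth 1 skips the directory itself; -delete removes children
  # depth-first, including hidden files.
  find "$cdc_dir" -mindepth 1 -delete
}

# Example (run next to compose.yml, with the containers stopped):
# clear_dir_contents ./docker-dev-volumes
```

This reflects only the behavior reported in this comment. Whether removing the directory entirely should also work is the open question in this thread.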

github-actions bot commented Feb 6, 2024

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:10238-container-demo
ghcr.io/gdcc/configbaker:10238-container-demo

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@landreev
Contributor

landreev commented Feb 6, 2024

@Saixel I'm not sure what happened, but I'm wondering if what you ran into was simply the Docker instance failing to properly deploy when composed from scratch; not because there are any steps missing in the guide. There's enough going on on the inside that there may be some timing issues - something taking just a little too long to initialize, etc. - for the whole thing to fail once in a while.
Generally, there really is no difference between running compose up for the first time, and doing it again with the data directory deleted.

@landreev
Contributor

landreev commented Feb 6, 2024

I'm happy to just merge this, unless anyone objects?

@pdurbin
Member Author

pdurbin commented Feb 7, 2024

@landreev it's fine with me if you merge. I threw a release note in there.

Are you ok with how we're running containers in the foreground? I could document -d (detached) if you like.

@landreev
Contributor

landreev commented Feb 7, 2024

It's easier to stop when it's running in the foreground, and you get a better idea of what's going on. In other words, I don't think it's strictly necessary, anyone who may need -d for any practical purposes will likely figure it out on their own. But up to you.
Otherwise, I'll just merge it.

@pdurbin
Member Author

pdurbin commented Feb 7, 2024

@landreev yeah, I feel the same way. People will figure out -d if they want that. Please feel free to merge away.

Contributor

@landreev landreev left a comment

Looks great - thank you, Phil.

@landreev landreev merged commit d944773 into develop Feb 7, 2024
8 checks passed
@landreev landreev deleted the 10238-container-demo branch February 7, 2024 15:15
@pdurbin pdurbin added this to the 6.2 milestone Feb 7, 2024
@landreev
Contributor

@Saixel Sorry for dismissing your report - it appears that this may in fact be either a bug, or a behavior that we didn't document properly.

Successfully merging this pull request may close these issues.

Tutorial for running containers as a demo or for evaluation
6 participants