Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the need for installing cohort extractor #832

Merged
merged 9 commits into from
Jul 20, 2022

Conversation

StevenMaude
Copy link
Contributor

@StevenMaude StevenMaude commented Jun 29, 2022

This may not be an ideal approach, but works.

The opensafely-cohort-extractor Python package dependency is replaced with a submodule dependency. Currently, we only need the docstrings from this code: we don't run any cohort-extractor related code.

This should:

It also removes a considerable number of dependencies, for building the documentation.

If we want to test cohort-extractor snippets in future, then we might need a solution for that. It may be possible to more tightly contain those checks to a snippets directory, and run in a separate virtualenv to that used to build the docs.

See the individual commit messages for explanation of some decision making.

This may not be an ideal approach, but works.

What this doesn't cover:

  • Moving to Python 3.10 as Data Builder uses 3.9, and unsure if there are any problems there. The core problem is the two different versions; it would be easy enough to later to upgrade again.

See the individual commit messages for explanation of some decision making.

Tasks

  • Document the use of submodules here, maybe with a brief rationale/link to this PR
  • Add a means to the Justfile to clone the submodule

@render
Copy link

render bot commented Jun 29, 2022

@render
Copy link

render bot commented Jun 29, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

@StevenMaude StevenMaude force-pushed the steve/remove-need-for-installing-cohort-extractor branch from 0e4fc1f to 04dc63e Compare June 29, 2022 16:14
@render
Copy link

render bot commented Jun 29, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

python: python3.10

files: "^databuilder/"
python: python3.9
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Can we remove this specification entirely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to set a python version, it can either be done per rule or with a default. We've been moving to the default pattern everywhere of late as we've not found a need to change version per rule.

@@ -7,7 +7,7 @@ nav:
- Security: security-levels.md
- Data sources:
- data-sources/index.md
- Overview: data-sources/intro.md
- Overview: data-sources/intro.md
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: These trailing whitespaces were present before, but now picked up by pre-commit, so we have to fix them to placate it.

@StevenMaude StevenMaude force-pushed the steve/remove-need-for-installing-cohort-extractor branch from 04dc63e to 3ea3a7f Compare June 29, 2022 16:23
@render
Copy link

render bot commented Jun 29, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

@StevenMaude
Copy link
Contributor Author

If we go with this, it would also need cohort-extractor syncing to current version, before merging.

@StevenMaude StevenMaude marked this pull request as ready for review June 29, 2022 16:25
Copy link
Contributor

@ghickman ghickman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I feel about this!

I used submodules for years to manage m vim plugins and came to really dislike them. Having dependabot handle the upgrades seems like it would remove many of the issues though. Could we document how to manually bump the submodule too?

Is there any reason we need to stick to 3.9 in this change? Bumping to 3.10 would mean we can drop all the .python-version files.

python: python3.10

files: "^databuilder/"
python: python3.9
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to set a python version, it can either be done per rule or with a default. We've been moving to the default pattern everywhere of late as we've not found a need to change version per rule.

@StevenMaude
Copy link
Contributor Author

StevenMaude commented Jun 30, 2022

Using 3.10 is fine. The only reason I didn't was to initially get everything synced up, and that Data Builder itself is only tested with 3.9.

I think that would be OK to move, but anyone shout if not.

Submodules can be sticky, but:

  • I think it's a not entirely invalid use case here. All we want is a copy of the source to process the docstrings, not to actually run the code. The current mkdocstrings docs actually advocate against installation into the virtualenv.
  • The current situation has caused a lot of frustration, and it resolves it for now.

@StevenMaude StevenMaude marked this pull request as draft June 30, 2022 16:56
@StevenMaude StevenMaude marked this pull request as draft June 30, 2022 16:56
@ghickman
Copy link
Contributor

ghickman commented Jul 1, 2022

@StevenMaude – agreed on both points and my hope is that the automation will remove the vast majority of submodule pain (and a little bit of me hopes that they got even a tiny bit better in the ~6y since I used them in anger!).

So I reckon go for it 👍

@bloodearnest
Copy link
Contributor

Seagulling in to suggest a potential completely alternate approach that avoids git submodules (which scare me).

I suggest we make the default databuilder package have no (or minimal) requirements by default, and we add extras in the package tooling to install the rest of stuff as needed e.g. pip install databuilder[full], which would bring in all the actual dependencies.

This is also a horrible hack. But for my money its better than submodules.

@evansd
Copy link
Contributor

evansd commented Jul 14, 2022

@bloodearnest We were already planning to do the non-hacky version of that package split with databuilder anyway. I think this submodules approach is just about dealing with cohortextractor, and in that sense it seems like an improvement on the status quo.

@StevenMaude StevenMaude force-pushed the steve/remove-need-for-installing-cohort-extractor branch from 3ea3a7f to b6e641c Compare July 14, 2022 16:14
@render
Copy link

render bot commented Jul 14, 2022

A deploy for your Render PR Server at https://opensafely-documentation-pr-832.onrender.com just failed.

View details on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

@StevenMaude StevenMaude force-pushed the steve/remove-need-for-installing-cohort-extractor branch from b6e641c to d35264b Compare July 14, 2022 16:21
@render
Copy link

render bot commented Jul 14, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

@StevenMaude StevenMaude marked this pull request as ready for review July 14, 2022 16:28
@StevenMaude
Copy link
Contributor Author

Since Data Builder uses Python 3.9, and we do currently need to install it as a package, I've opted to stick with Python 3.9 for now, for simplicity, and to reduce the risk of any gotchas in future.

It would be much easier now to consider upgrading to 3.10 though.

@StevenMaude
Copy link
Contributor Author

StevenMaude commented Jul 14, 2022

⚠️ Depending on when this actually gets merged, it could warrant a final rebase on main to ensure we don't downgrade any requirements. This isn't a big risk, but would be slightly annoying to have to upgrade them again or have duplicated Dependabot PRs.

(The dependencies are in sync with main at time of writing.)

Copy link
Contributor

@ghickman ghickman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so glad to see this being tackled, thank you for finding a solution!


the opensafely-databuilder-mkdocs plugin is tested with 3.7. (Which we should possibly go and change.)

Not for this PR, but wasn't sure of a better place to put it! The plugin was tested with 3.7 so it was in lock-step with this repo because it needs to be installed into the virtualenv. So it should be updated to match this one.

Side note: I previously tried to see if it was plausible to move the plugin code into this repo. Plugins use setuptools entrypoints so if we can work out how to specify a local directory in the requirements spec then it should work. We'd save a whole repos worth of admin overhead then.

pyproject.toml Outdated Show resolved Hide resolved
justfile Show resolved Hide resolved
INSTALL.md Outdated Show resolved Hide resolved
INSTALL.md Outdated Show resolved Hide resolved
INSTALL.md Outdated Show resolved Hide resolved
INSTALL.md Outdated Show resolved Hide resolved
@StevenMaude
Copy link
Contributor Author

thank you for finding a solution!

It's not the entire way there. But this, along with Becky's recent changes to how the reference page JSON is generated in the Data Builder repository, should be the start of things getting a bit easier.

@StevenMaude StevenMaude force-pushed the steve/remove-need-for-installing-cohort-extractor branch from d35264b to d4d5c87 Compare July 20, 2022 14:36
@render
Copy link

render bot commented Jul 20, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

1 similar comment
@render
Copy link

render bot commented Jul 20, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

StevenMaude added a commit that referenced this pull request Jul 20, 2022
We're not building the project as such.

It was [added to try to placate
Dependabot](#832 (comment)).
@StevenMaude StevenMaude force-pushed the steve/remove-need-for-installing-cohort-extractor branch from db69d40 to c39acef Compare July 20, 2022 14:47
@render
Copy link

render bot commented Jul 20, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

1 similar comment
@render
Copy link

render bot commented Jul 20, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

This avoids installing into the same virtualenv as the documentation and
allows us to:

* more easily unify the Python version we're using for
  building the documentation and generating snippets.
* choose a newer version of Python than cohort-extractor uses

All we actually need are the docstrings right now. If we later want to
test cohort-extractor snippets, we might have to revisit this.

Note that the whitespace changes are because pre-commit is now running
on the entire repository, not just the `databuilder/` subdirectory.
Not entirely sure why the Data Builder directory was 3.10.

Data Builder is currently [tested with
3.9](https://github.com/opensafely-core/databuilder/blob/08b3895f79d155075d2c295f5d64619da14003ad/.github/workflows/main.yml#L30), and the
opensafely-databuilder-mkdocs plugin is [tested with
3.7](https://github.com/opensafely-core/mkdocs-opensafely-databuilder/blob/7a1686a5f887e3b35d8dbe5fffd4bcfa066f0599/.github/workflows/main.yml#L30). (Which we
should possibly go and change.)

In principle, we could move to Python 3.10 too. The reason for not doing
so is that Data Builder is only tested with 3.9.

If we upgrade Data Builder, then we could do the same here.
* Explain submodule usage.
* Tidy up content ordering:
  there was a jumble of cohort-extractor and Data Builder content.
We're not building the project as such.

It was [added to try to placate
Dependabot](#832 (comment)).
Using `just` instead of specifying commands.
@StevenMaude StevenMaude force-pushed the steve/remove-need-for-installing-cohort-extractor branch from 113b8a1 to 2b80798 Compare July 20, 2022 15:54
@render
Copy link

render bot commented Jul 20, 2022

Your Render PR Server at https://opensafely-documentation-pr-832.onrender.com is now live!

View it on your dashboard at https://dashboard.render.com/static/srv-cau7jmjru51tqnfohqkg.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants