Load DAGs per git-sync #177
Comments
My understanding is that #150 needs to be implemented together with this, to allow for multiple different DAG loading mechanisms, is that right? @adwk67 Or is it possible to implement the git-sync mechanism without breaking changes? Another question about my understanding: The operator would get some git links, and clone the repos when starting up, presumably into an emptyDir. Is that right? No provisioning of volumes by the user required.
Are you two in agreement now? If so I'm happy to move it on
Just throwing in the idea of including all the stuff from
I like this idea and will aim to incorporate it.
I agree
Does that mean I can finally remove the node labels (see unticked box in "acceptance criteria")?
Just a side note: If we still rely on any nodeSelectors in any test for any operator we should probably use affinity instead - which is now supported :)
Yes
I just browsed through the docs and see that the "PersistentVolumeClaim" has gone missing. Has that feature been removed?
Yes, it was problematic and was one of the reasons we wanted to have git-sync instead.
In that case though that removal needs to be documented in the changelog
The functionality is still there - we only removed the example/documentation. I've updated the changelog to reflect this in a separate PR.
Okay, then I'm not sure I understand why we removed the docs entirely?
Using PVCs for this kind of customer-facing thing is problematic because it depends on which PVC access modes (read-write-many etc.) are available on a given cloud. If RWX is not available (as is the case with Ionos, for example), node selection has to be used to make this approach work. I don't think we should give the impression that we recommend or support it. We also didn't add any functionality for this: we just documented how it could be done.
Issue #150 introduces one possibility for general management of external resources: for DAGs, the recommendation from Airflow is to use git-sync (see e.g. here for more info). This issue covers implementing git-sync in a container to regularly keep DAGs updated.
An example of this is given here. The Airflow CRD can be changed to include an optional git-sync section. In Airflow the roles are expected to have the same config, so the git-sync definition can be at top level. This might need to be revisited if git-sync turns out to be useful for other operators, such as NiFi workflows.
There are quite a number of parameters that can be set (see here and the sections that follow the link). Analogous to the SparkHistoryServerSpec, the proposal is to define mandatory fields and expose the rest via a map.
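A minimal sketch of the "mandatory fields plus a map" shape, written in Python purely for illustration and not taken from the operator's code. The field names (`repo`, `branch`, `wait`, `extra_args`) and the `to_args` helper are hypothetical; the flag names assume git-sync v3-style options and should be checked against the git-sync version actually deployed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GitSyncSpec:
    """Hypothetical git-sync settings: a few typed fields plus a free-form map."""

    repo: str                 # mandatory: URL of the repository holding the DAGs
    branch: str = "main"      # assumed default, not from the issue
    wait: int = 20            # assumed default: seconds between syncs
    extra_args: Dict[str, str] = field(default_factory=dict)  # all other options

    def to_args(self) -> List[str]:
        """Render container arguments; typed fields override map entries."""
        if not self.repo:
            raise ValueError("repo is mandatory")
        # Normalise map keys so that both "depth" and "--depth" are accepted.
        merged = {
            key if key.startswith("--") else f"--{key}": value
            for key, value in self.extra_args.items()
        }
        # Typed fields are applied last so the map cannot silently replace them.
        merged.update(
            {"--repo": self.repo, "--branch": self.branch, "--wait": str(self.wait)}
        )
        return [f"{flag}={value}" for flag, value in sorted(merged.items())]
```

Letting the typed fields win over the map is one possible design choice: it keeps the validated values authoritative while still allowing any less common option to be passed through unchanged.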
A git-sync container can only sync a single repo, so multiple repos would require a container each.
A git-sync block may occur at top-level (as would be the case for e.g. airflow), or at role level, if the product only requires external git resources for a specific component (i.e. the init-container will only be created for that role).
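The two points above (one container per repo, and a block at either top level or role level) can be sketched roughly as follows. This is an illustration in Python, not the operator's implementation: the function name, the container naming scheme and the rule that a role-level block takes precedence over a top-level one are all assumptions.

```python
from typing import Dict, List, Optional


def git_sync_containers(
    top_level: Optional[List[str]],
    role_level: Dict[str, List[str]],
    role: str,
) -> List[Dict[str, object]]:
    """Return one container description per repository for the given role."""
    # Assumption: a role-level block, if present, replaces the top-level one.
    repos = role_level.get(role) or top_level or []
    # git-sync syncs a single repo, so each repo gets its own container.
    return [
        {"name": f"git-sync-{index}", "args": [f"--repo={url}"]}
        for index, url in enumerate(repos)
    ]
```

With a top-level list of two repositories and no role-level entries, every role would get two git-sync containers; a role listed in `role_level` would get only the containers for its own repositories.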
Acceptance criteria
See stackabletech/docker-images#337