Change the 'rev-' directory name instead of creating a symlink #314
The symlink is the only way to do atomic updates, so that a reader will never see the repo missing. Do I understand correctly that you are running git-sync to pull a tag or branch one time and then copy the results into a build? Would it be easier to just call |
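Why use a symlink rather than a renamed directory? POSIX rename(2) can atomically replace an existing symlink, but it refuses to replace a non-empty directory. That means there is no single-step way to swap one checkout directory for another under the same name. The sketch below shows the general "create a temporary link, then rename it over the old one" pattern with two hypothetical checkout directories (`rev-aaa`, `rev-bbb`). It is an illustration, not git-sync's own code, and it assumes GNU coreutils for `mv -T`.

```shell
#!/bin/sh
# Minimal sketch of an atomic symlink flip (assumes GNU coreutils for `mv -T`).
set -eu
root=$(mktemp -d)

# Two hypothetical checkouts of the same repo at different commits.
mkdir "$root/rev-aaa" "$root/rev-bbb"
echo old > "$root/rev-aaa/file"
echo new > "$root/rev-bbb/file"

# Publish the first checkout under the stable name "repo".
ln -s rev-aaa "$root/repo"
cat "$root/repo/file"    # prints: old

# Flip to the second checkout: create the new link under a temporary name,
# then rename it over the old one. rename(2) replaces "repo" in one step,
# so a reader sees either rev-aaa or rev-bbb, never a missing path.
ln -s rev-bbb "$root/repo.tmp"
mv -T "$root/repo.tmp" "$root/repo"
cat "$root/repo/file"    # prints: new
```

Without `-T`, `mv` would move `repo.tmp` *into* the directory the old link points at, rather than replacing the link.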
Hi @thockin, thank you for your answer, I really appreciate it. Yes, you are right: I download the Git repo in the first step, and in the next step I want to compile the source.
- name: git-secret
  secret:
    secretName: falcon-idrsa
    defaultMode: 256 # 0400
I considered creating another initContainer that would just copy the id_rsa file and fix its permissions, but that would add another container to the Pod definition and make it more complicated, if I am not mistaken. |
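The extra init step considered above would amount to something like the following. The paths and the `stage_key` helper are hypothetical. Note that `defaultMode: 256` is decimal for octal 0400, so the Secret volume already produces that mode. A copy step like this mainly helps if the key has to end up at a different path, or owned by the user the next container runs as (`cp` creates the file as the user running it).

```shell
#!/bin/sh
# Sketch of a "copy the key and fix its permissions" init step (hypothetical paths).
set -eu

stage_key() {
  sk_src=$1
  sk_dst=$2
  mkdir -p "$(dirname "$sk_dst")"
  rm -f "$sk_dst"            # a previous 0400 copy cannot be overwritten by a non-root user
  cp "$sk_src" "$sk_dst"
  chmod 0400 "$sk_dst"       # same mode as defaultMode: 256 on the Secret volume
}

# Example (hypothetical mount points):
# stage_key /etc/git-secret-ro/id_rsa /etc/git-secret/id_rsa
```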
I don't know why git-sync would be faster - it literally runs `git clone` - if you run it with `-v 5` it will log all the commands it executed and you can see what it did.
If you just want to compile the code, cloning with `--depth 1` only
pulls the current state, no history. I still think this would be
easier for you than using git-sync.
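Here is a minimal sketch of the `--depth 1` suggestion. It runs against a throwaway local repository so it is self-contained, and `commit` is a hypothetical helper. In a real build step, the clone line would point at the actual remote and branch instead, e.g. `git clone --depth 1 --branch master <repo-url> <dir>`.

```shell
#!/bin/sh
# Sketch: a one-shot shallow clone for a build step, run against a throwaway local repo.
set -eu
work=$(mktemp -d)
commit() { git -C "$1" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$2"; }

# Hypothetical upstream with some history.
git init -q "$work/upstream"
commit "$work/upstream" first
commit "$work/upstream" second

# --depth 1 fetches only the tip commit (and implies --single-branch).
# The file:// URL matters here: git ignores --depth for plain local-path clones.
git clone -q --depth 1 "file://$work/upstream" "$work/build"
git -C "$work/build" rev-list --count HEAD    # prints: 1
```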
One thing you could try would be `git -C $root/$dest worktree add /some/other/path/`, which pins the current hash at that path.
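The sketch below tries the worktree idea against a local stand-in for `$root/$dest`. The `--detach` flag and explicit `HEAD` are our choice: a bare `worktree add <path>` would create a new branch named after the path. The caveat that follows still applies, because the worktree's metadata lives in the repository that git-sync manages under `$root`.

```shell
#!/bin/sh
# Sketch: pin the current commit of a checkout into a separate directory.
set -eu
work=$(mktemp -d)
commit() { git -C "$1" -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$2"; }

git init -q "$work/repo"            # stands in for $root/$dest
commit "$work/repo" first
first=$(git -C "$work/repo" rev-parse HEAD)

# Create a detached worktree at the current HEAD, outside the git-sync root.
git -C "$work/repo" worktree add -q --detach "$work/pinned" HEAD

# A later sync moves the original checkout; the pinned tree stays put.
commit "$work/repo" second
git -C "$work/pinned" rev-parse HEAD   # still the "first" commit
```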
I have to warn you though, that the only thing you can count on is
that $root/$dest points to the most-recent complete checkout.
Anything else inside $root is reserved and subject to change.
Can you help me understand the case that exits 0 in failure? It
should not. Can you paste logs?
On Fri, Dec 4, 2020 at 2:07 AM rat ***@***.***> wrote:
Also, BIG advantages of git-sync are speed: it can download my repo in 15s, whereas git clone takes 1m30s. That is a huge difference, even if I use git clone -b master --single-branch.
I resolved the problem with the symlink; I just added another initContainer step which changes the directory name.
But unfortunately, I discovered another issue with git-sync: when it fails (cannot download the repo, etc.) it always returns exit 0 -> EXIT_SUCCESS.
It is important that if something fails it returns exit 1 -> EXIT_FAILURE, to inform the parent process about the problem.
|
The way I've gotten around this issue is having a post hook script copy the checkout to another directory named after the git hash:
cp -R . ../$(git rev-parse HEAD)
rm -rf ../finished*
Then in my consumer of the webhook, I use the webhook header You could also have the consumer take the webhook and do any sort of action you want with it. I'm currently toying with the webhook consumer starting a Kubernetes job that has an entrypoint for the container:
# shallow clone of a single hash
git init /tmp/simple-run
cd /tmp/simple-run
git remote add origin "${GIT_REPO_URL}"
git fetch --depth 1 origin "${GIT_SHA}"
git reset --hard FETCH_HEAD
Either of these scenarios would give you an actual directory, not a symlink, to operate on. |
Thank you @thockin for your suggestion, and thank you @wimo7083 for an interesting solution; I will also look into that. About exit 0: I noticed that it happens when git-sync cannot read the SSH key.
Logs:
kubectl logs -n test myapp
INFO: detected pid 1, running init handler
ERROR: can't configure SSH: can't access SSH key: stat /etc/git-secret/id_rsaa: no such file or directory
Status of the pod:
- containerID: docker://f444e9e458cc14094c2094c84d632350105b4aa88f96e2296a15ac6c9842458e
  image: k8s.gcr.io/git-sync/git-sync:v3.2.0
  imageID: docker-pullable://k8s.gcr.io/git-sync/git-sync@sha256:873fc1bcd6048247036969dcb75f0b1f9c915167b86cb908f1fe3de0e060c562
  lastState: {}
  name: git-xcaf
  ready: false
  restartCount: 0
  started: false
  state:
    terminated:
      containerID: docker://f444e9e458cc14094c2094c84d632350105b4aa88f96e2296a15ac6c9842458e
      exitCode: 0 <- ### EXIT_SUCCESS ###
      finishedAt: "2020-12-10T07:48:14Z"
      reason: Completed
      startedAt: "2020-12-10T07:48:14Z"
myapp 0/1 Pending 0 0s
myapp 0/1 Pending 0 0s
myapp 0/1 ContainerCreating 0 0s
myapp 0/1 Completed 0 2s
Pod definition:
...
- name: git-sync
  image: k8s.gcr.io/git-sync/git-sync:v3.2.0
  env:
    - name: GIT_SYNC_SSH
      value: "true"
    - name: GIT_SYNC_REPO
      value: ssh://git@bitbucket.repo.io:7999/repo/repo.git
    - name: GIT_SYNC_BRANCH
      value: master
    - name: GIT_SSH_KEY_FILE
      value: /etc/git-secret/id_rsaa
    - name: GIT_KNOWN_HOSTS
      value: "false"
    - name: GIT_SYNC_ONE_TIME
      value: "true"
    - name: GIT_SYNC_ROOT
      value: /git
    - name: GIT_SYNC_DEST
      value: repo
...
br |
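Until the exit code is fixed, a consumer can at least refuse to continue when no checkout was published. Below is a hypothetical post-check; the `verify_sync` name is ours. It is meant to run, for example, at the start of the next init container that shares the volume, and it only checks that `$root/$dest` is a symlink resolving to a directory.

```shell
#!/bin/sh
# Hypothetical post-check for a one-time sync. It does not change git-sync's
# exit code; it just refuses to report success unless $root/$dest is a
# symlink that resolves to a directory.
set -eu

verify_sync() {
  vs_root=$1
  vs_dest=$2
  if [ -L "$vs_root/$vs_dest" ] && [ -d "$vs_root/$vs_dest" ]; then
    echo "sync ok: $vs_dest -> $(readlink "$vs_root/$vs_dest")"
  else
    echo "sync failed: $vs_root/$vs_dest is missing or dangling" >&2
    return 1
  fi
}

# Usage with the root/dest from the pod definition above:
# verify_sync "${GIT_SYNC_ROOT:-/git}" "${GIT_SYNC_DEST:-repo}" || exit 1
```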
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
/lifecycle frozen |
Just providing future readers a working example, in the context of docker compose, of @apex-omontgomery's method.
A docker compose service that accesses a private git repo and pulls the repo data into a temp dir
The exechook script
|
For the specific case of |
Would you please provide a more reproducible example? I got confused after seeing the way you handled the repository and mounted DAGs in your container. |
The script is tightly coupled to my docker compose file section. Yes, it is mounting a folder of What makes you confused? |
The current git-sync downloads content to The only way to get the git content using git-sync is to copy it out of the above folder into a folder mounted by docker compose. In my case, I directly mount the |
I don't use compose - are you saying there's no way to share a volume between 2 containers? |
It is possible to share a volume, but the volume has to contain actual files, not a symlink created by one service. Outside that service, the symlink does not work for others. |
@eugeneYWang Here it is:
version: '3.8'
services:
  airflow-worker:
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    environment:
      <<: *airflow-common-env
      # Required to handle warm shutdown of the celery workers properly
      # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
      DUMB_INIT_SETSID: "0"
    restart: always
    volumes:
      - ./git-sync-dags/project/dags:/opt/airflow/dags:ro
      - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
      - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
  git-sync:
    image: k8s.gcr.io/git-sync:v3.1.5
    volumes:
      - ./git-sync-dags:/tmp/git
      - ~/.ssh/id_ed25519:/etc/git-secret/ssh
    environment:
      - GIT_SYNC_REPO=${GIT_SYNC_DAG_REPO}
      - GIT_SYNC_BRANCH=main
      - GIT_SYNC_REV=HEAD
      - GIT_SYNC_DEST=project
      - GIT_SYNC_WAIT=300
      - GIT_SYNC_ONE_TIME=false
      - GIT_SYNC_TIMEOUT=120
      - GIT_SYNC_SSH=true
      - GIT_SYNC_SSH_KEY_FILE=/etc/git-secret/ssh
      - GIT_KNOWN_HOSTS=false
How can I resolve the issue? P.S. I am using this file in order to run Airflow. |
@mostafaghadimi I made the same mistake before: trying to replace the default git folder with a folder mounted by docker compose. In your case, you have this line: |
Also, your git-sync is way older than the one I used. If you decide to follow my solution, use at least the same version I used. |
@eugeneYWang would you please help me change the docker-compose file so that it works? |
@mostafaghadimi You already have my solution for reference and my explanation to save your time. Now it is your turn to do your homework. :) |
@eugeneYWang BTW, I think there is a problem with your file: some settings are commented out, which is a little confusing, and that is why I asked the question. For example |
I don't understand what that means - the symlink points to a relative path, so it should work.
|
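A quick way to see why a relative target should work: a relative symlink resolves against the directory that contains it. It therefore keeps working wherever the whole root volume is mounted, while an absolute link does not. This self-contained sketch simulates a different mount point with `mv`. Mounting only the link itself, without the `rev-*` directory next to it, would leave the link with nothing to point at.

```shell
#!/bin/sh
# Sketch: a relative symlink survives the volume being mounted at another path.
set -eu
base=$(mktemp -d)
mkdir -p "$base/gitroot/rev-aaa"
echo hello > "$base/gitroot/rev-aaa/README"

ln -s rev-aaa "$base/gitroot/repo"                  # relative target
ln -s "$base/gitroot/rev-aaa" "$base/gitroot/abs"   # absolute target, for contrast

# Simulate another container seeing the same volume at a different mount point.
mv "$base/gitroot" "$base/elsewhere"

cat "$base/elsewhere/repo/README"    # prints: hello
if [ -e "$base/elsewhere/abs" ]; then echo "absolute link resolves"; else echo "absolute link is dangling"; fi
```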
@thockin You are right. In this case where the symlink and because, If multiple |
Yeah, it is a meaningless comment for readers. But it is commented out anyway, so I think people should be aware that it has no effect. |
@eugeneYWang How is the following scenario possible?
and
I mean that you have set the |
`/
If I remember correctly, |
@mostafaghadimi So good luck; trial and error. At least I saved you the time of reading the source code. |
There are several implicit restrictions preventing me from using this approach as well, in the context of an Airflow setup. You don't have to treat our cases as a requirement to change git-sync; after all, Airflow with docker compose is not a recommended solution for production deployment. Eventually we will use the Airflow Helm chart to work with K8s. |
Telling me there are implicit restrictions without telling me what they are doesn't help me fix them :) You said above you CAN understand the symlink, right?
The one to use is the one that |
git-sync is not designed for the context of compose and airflow. It is not worth spelling out all the details I tested, which you probably will not use. If you are curious what I have gone through, some of the details can be found in #694, which was referenced above.
|
#694 is mostly about setting the wrong env var. Buried in there you said:
That still doesn't actually tell me anything. Is it a bug in Airflow that can't handle symlinks properly? Is git-sync actually doing something wrong in some edge case? Is docker compose broken? Is it just a permissions problem? I see a lot of people using git-sync with Airflow, so I'd be glad to make it work better, but I don't know what the problem is. The symlink that git-sync publishes has a relative target. Any tool that understands POSIX filesystems should be able to treat it like a directory unless they go out of their way to do something special for a symlink. |
Airflow expects the DAGs folder to contain only actual files. Creating the symlink and rev folder inside the dags folder works as far as the symlink goes, but it leads to non-stop scanning of DAG files, as I describe in #694. I tried to mount the dags folder as the sync dest folder, but docker compose creates a mounted volume there, so the symlink cannot be created. The mounted folder cannot be created inside the sync root folder either, because git-sync needs ownership there to create and delete things. Eventually, I let git-sync do the pulling in its own space and used an exec hook to always copy data from the latest checkout over to my mounted volume. This is the only approach I found that works with docker compose, airflow, and git-sync. |
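Here is a minimal sketch of the copy-out approach described above, written as an exechook body. It makes two assumptions. First, the hook runs with the newly synced worktree as its working directory. Second, the target is a subdirectory inside the shared volume rather than the mount point itself, because a mount point cannot be renamed. The `publish_copy` helper and the target path are hypothetical.

```shell
#!/bin/sh
# Hypothetical exechook: copy the freshly synced checkout into a plain directory
# that another service mounts.
set -eu

publish_copy() {
  pc_src=$1
  pc_target=$2
  pc_tmp="$pc_target.tmp.$$"

  # Stage a full copy next to the target (same filesystem, so mv is a rename).
  rm -rf "$pc_tmp"
  mkdir -p "$pc_tmp"
  cp -R "$pc_src/." "$pc_tmp/"
  rm -rf "$pc_tmp/.git"      # a worktree's .git is a file; a clone's is a directory

  # Swap the staged copy in. Unlike a symlink flip this is NOT atomic:
  # there is a brief window in which "$pc_target" does not exist.
  rm -rf "$pc_target.old"
  if [ -e "$pc_target" ]; then mv "$pc_target" "$pc_target.old"; fi
  mv "$pc_tmp" "$pc_target"
  rm -rf "$pc_target.old"
}

# Example hook body (hypothetical path inside the shared volume):
# publish_copy "$PWD" /shared/current
```

The staged `*.tmp.*` and `*.old` directories briefly sit next to the target. Point the consuming tool at the target path itself rather than at its parent, so a recursive scanner does not pick up the staging copies.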
So the DAG "folder" is the symlink - a POSIX-compatible tool should resolve that automatically (literally it should not need to do anything) to the current hash. Does that not work?
That sounds like a nasty bug in Airflow - shouldn't we try to fix it there? Did you file a bug against airflow? I found some issues that relate to this but they were closed with PRs (so presumably fixed) a year ago.
Yeah, you don't want to do that. Is there a TRIVIAL repro? I don't know airflow. I don't know how to reproduce the error that you are seeing. Preferably, something I can run locally in a docker container without a full cluster - something easy to debug and see if I can reproduce the problem. Can you help me with that? |
Airflow's official Helm chart sets up git-sync as a sidecar, and they suggest the official Helm chart for production deployment. They use docker compose just as a learning platform and do not suggest it for production deployment. Their compose file does not include git-sync either.
I don't think they will put effort into fixing issues that occur when Airflow is used in a way they don't recommend (airflow + compose + git-sync). I have seen them push back when users want features from their Helm deployment to be available in the docker compose deployment. I understand how they think. Docker compose is not designed to have scaling capabilities comparable to K8s, and it will not be the final solution for Airflow deployment. If I had dedicated time to study K8s, I probably would not have spent time trying to integrate git-sync into compose and airflow either.
As using compose with airflow and git-sync is a hack, it is probably not worth putting effort into this. |
I tried running airflow manually. I created a git repo with a trivial DAG. I git-synced it to /tmp/gs-vol with "the_link" being the link. I set the airflow.cfg From what I can see, it works. I am happy to dig in further if you can show me what doesn't work, but I don't know it well enough to repro, obviously. |
@thockin The problem will start after updating any parameters in the DAGs repository. In that case the |
@eugeneYWang I tried the example you placed here, but I got the following error:
Would you please help me resolve it?
git-sync:
  image: k8s.gcr.io/git-sync/git-sync:v3.6.4
  user: "${AIRFLOW_UID:-50000}:0"
  volumes:
    - ./git-sync-dags:/tmp/git
    - .git_sync_hook.sh:/home/git_sync_hook.sh
    - ~/.ssh/id_ed25519:/etc/git-secret/ssh
  environment:
    - GIT_SYNC_REPO=git@github.com:mostafaghadimi/airflow_git_sync.git
    - GIT_SYNC_BRANCH=main
    - GIT_SYNC_REV=HEAD
    - GIT_SYNC_ROOT=/tmp/gitdata
    # - GIT_SYNC_DEST=project
    - GIT_SYNC_WAIT=300
    # - GIT_SYNC_ONE_TIME=false
    - GIT_SYNC_TIMEOUT=120
    - GIT_SYNC_SSH=true
    - GIT_SYNC_SSH_KEY_FILE=/etc/git-secret/ssh
    - GIT_SYNC_ONE_TIME=true
    - GIT_SYNC_DEPTH=0
    - GIT_KNOWN_HOSTS=false
    - GIT_SYNC_ADD_USER=true
    - GIT_SYNC_EXECHOOK_COMMAND=/home/git_sync_hook.sh
|
I only know how to run |
That is a permission issue. If your host system is macOS or Linux, just give the script file execute permission:
chmod +x filepath
Forgive me for being AFK. |
@eugeneYWang, the webhook command works properly, but whenever I mount
I got the same error even before adding the webhook script. I've searched the internet, and according to this link the problem is that the directory has been deleted! P.S. Everything works properly on |
Don't mount the symlink as a volume. That is hostile to updates, as you found out. Volume-mount the whole git-sync root volume, but set airflow's mount There's still a problem if airflow uses the git-sync dir as its working directory and doesn't refresh. The directory gets removed, so airflow may need to do the equivalent of If someone can help me trigger the problem (just show me which airflow commands to run!!) I can see about suggesting a fix for airflow. |
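The stale-working-directory problem can be reproduced without Airflow. In this sketch, a shell `cd`s through the link into the old revision. The link is then repointed and the old directory deleted, a simplified, non-atomic stand-in for a sync. Relative reads then fail until the process re-enters through the link path. All paths are hypothetical.

```shell
#!/bin/sh
# Sketch: a process whose working directory is an old revision goes stale after a sync.
set -eu
root=$(mktemp -d)
mkdir "$root/rev-aaa"
echo v1 > "$root/rev-aaa/dag.py"
ln -s rev-aaa "$root/link"

cd -P "$root/link"          # cwd is now the physical rev-aaa directory

# Simulate a sync: publish rev-bbb, repoint the link, delete the old revision.
# (ln -sfn is fine for this demo, but it is not an atomic flip.)
mkdir "$root/rev-bbb"
echo v2 > "$root/rev-bbb/dag.py"
ln -sfn rev-bbb "$root/link"
rm -rf "$root/rev-aaa"

cat dag.py 2>/dev/null || echo "stale cwd: dag.py is gone"

# Re-entering through the link path picks up the new revision.
cd -P "$root/link"
cat dag.py                  # prints: v2
```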
Hi,
I would like to ask whether there is any chance to change the directory name instead of creating a symlink.
I am trying to use the git-sync tool together with kaniko (a tool that builds container images from Dockerfiles). Unfortunately, the Dockerfile COPY command sees this symlink as a file: instead of copying all files from inside the directory, it copies only the symlink as a single file. Check my issue 1513.
Currently, with the flags GIT_SYNC_ROOT=/github and GIT_SYNC_DEST=repo, a symlink repo is created inside the directory /github.
I am not able to use the current name of the directory, rev-249971a4b53a2a32ca573cd512a53d6a8cf15e78, because it changes on each commit, so it's impossible to automate the build process for the Docker image. I am open to any suggestions.
br
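There are two ways to hand a build tool real files instead of the link, sketched here with hypothetical paths. The shortened `rev-249971a` name stands in for the real revision directory. The first option resolves the link once and uses the resulting directory as the build context. The second copies with `cp -RL`, which dereferences the symlink. Both are consumer-side workarounds, not git-sync features.

```shell
#!/bin/sh
# Sketch: give a build tool real files rather than git-sync's symlink.
set -eu
root=$(mktemp -d)   # stands in for GIT_SYNC_ROOT=/github
ctx=$(mktemp -d)    # hypothetical build-context directory

mkdir "$root/rev-249971a"
printf 'FROM scratch\nCOPY . /src\n' > "$root/rev-249971a/Dockerfile"
ln -s rev-249971a "$root/repo"     # what git-sync publishes for GIT_SYNC_DEST=repo

# Option 1: resolve the link once and hand the real directory to the build tool.
real=$(cd -P "$root/repo" && pwd -P)
echo "$real"

# Option 2: copy with -L so the context contains real files, not a symlink.
cp -RL "$root/repo" "$ctx/repo"
```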