Molecule seems an issue. #105

Closed
robertdebock opened this issue Jan 24, 2022 · 42 comments

Comments

@robertdebock

I noticed you are also experiencing issues with Molecule and the required collections in prepare.yml and verify.yml.

I have not found a solution for this issue... I think it's a Molecule issue really, but a workaround may help.

Let me know if you want to work together to find a solution. (mail: robert@meinit.nl)

Regards,

Robert.

@stefangweichinger
Owner

Thanks for this cross-referencing ;-)
Yes, I tried to use your latest github action for my pipeline and it failed.

I'm fiddling with it in this PR: #104

Forked your repo and patched the Dockerfile: robertdebock/docker-github-action-molecule#11

This fixed a local test against this role.

As soon as your latest docker image is available to use with github actions I will rerun my pipeline there.

The upstream issue seems to be this: ansible/molecule#2755

I don't know whether more cleanup is needed. Maybe you could drop something in light of the latest upstream development?

@stefangweichinger
Owner

@robertdebock Does 3.0.2 contain my patch already? My pipeline still fails with that release: https://github.com/stefangweichinger/ansible-rclone/actions/runs/1740650166

@robertdebock
Author

It does, but I am also getting errors. Weird: sometimes it works, sometimes it fails. I'm guessing a cache is playing up somewhere.

Maybe ":latest" is not a good plan... Give me a bit more time to understand what makes the action fail or succeed so randomly.

@stefangweichinger
Owner

@robertdebock I don't have ":latest" as far as I see ;-) no hurry, it's still in a test branch here.

@stefangweichinger
Owner

Checked https://github.com/stefangweichinger/ansible-rclone/blob/upgrade-mol-action/.github/workflows/molecule_test.yml ... it specifies "3.0.2" and not "latest". Yet if I re-run the pipeline jobs, the log says it pulls "latest". Maybe because the two are linked at Docker Hub, i.e. both tags currently point to the same image?

@robertdebock
Author

I guess "latest" is cached, making the results unpredictable. I'll re-introduce versioned images in the action again. Give me a moment.

@robertdebock
Author

Trying GitHub action version 4.0.3 that uses Docker image 4.0.3 on a GitLab role.

I've pushed many updates this morning, I guess CI will take hours to finish. (Likely; speak to you in the morning.)

@robertdebock
Author

Still struggling, have not found a solution. Tried so far:

  • Use "ansible-5.2.0" instead of "ansible-core".
  • Do and don't mention "community.general".
  • Do and don't mention "community.docker"
  • Now trying: do mention "community.general" and "community.docker" with a specific version.

What a mess. I don't understand why I'm running into this issue; other repositories seem to be "fine".

@stefangweichinger
Owner

It's a bit confusing: your Docker images have different release numbers than the GitHub action, right?

https://github.com/robertdebock/docker-github-action-molecule/releases shows 4.0.5

Does the GitHub action now also jump from 3.0.2 (in my case) to 4.0.5?

The "destroy" action fails again with 4.0.5 in my pipeline, although the message is different now:

https://github.com/stefangweichinger/ansible-rclone/runs/4947770349?check_suite_focus=true#step:4:36

I will re-test locally in the next hours.

Maybe you can get rid of community.docker when considering ansible/molecule#2755 (comment), right? I have to check again what the differences are, ansible vs. ansible-base etc

@robertdebock
Author

I know; I'm using tox. Let me focus on that.

@stefangweichinger
Owner

Ah, I see. Didn't use tox so far, might consider using it as well for my tiny role.

@robertdebock
Author

Yesterday I released version 4.0.5 for both the Docker container and the GitHub action. So yes, since yesterday they are the same version.

@stefangweichinger
Owner

> Yesterday I released version 4.0.5 for both the Docker container and the GitHub action. So yes, since yesterday they are the same version.

great, that helps

@stefangweichinger
Owner

My local test with your latest (= 4.0.5) docker container works. Maybe it's really some caching issue with GitHub?

@stefangweichinger
Owner

We could also try to show the issue to the folks in the mentioned issue in the molecule-project?

@stefangweichinger
Owner

looking at robertdebock/docker-github-action-molecule@f9ef157

I don't see how you cover the installation of the 2 galaxy-collections now (community.docker and community.general). Could you explain?

@stefangweichinger
Owner

Just as a note and test: I checked for the existence of the 2 collections in 4.0.5:

community.docker              2.1.1  
community.general             4.3.0

@robertdebock
Author

Want to chat a bit? -> https://meet.jit.si/AnsibleChatting

@stefangweichinger
Owner

> Want to chat a bit? -> https://meet.jit.si/AnsibleChatting

in ~10 minutes, ok?

@robertdebock
Author

> looking at robertdebock/docker-github-action-molecule@f9ef157
>
> I don't see how you cover the installation of the 2 galaxy-collections now (community.docker and community.general). Could you explain?

I now understand that collections are included in Ansible 4 and Ansible 5. From what I understand, the pip package ansible pulls in ansible-core as a dependency:

$ pip show ansible
Name: ansible
Version: 5.2.0
Summary: Radically simple IT automation
Home-page: https://ansible.com/
Author: Ansible, Inc.
Author-email: info@ansible.com
License: GPLv3+
Location: /opt/homebrew/lib/python3.9/site-packages
Requires: ansible-core
Required-by: 

(See the Requires: ansible-core?)

In my case the action was failing because tox.ini referred to ansible-core and ansible-base, not ansible. Tox works using this:

[tox]
minversion = 3.21.4
envlist = py{310}-ansible-{4,5}

skipsdist = true

[testenv]
deps =
    4: ansible == 4.*
    5: ansible == 5.*
    molecule[docker]
    docker == 5.*
    ansible-lint == 5.*
commands = molecule test
setenv =
    TOX_ENVNAME={envname}
    PY_COLORS=1
    ANSIBLE_FORCE_COLOR=1
    ANSIBLE_ROLES_PATH=../

passenv = namespace image tag DOCKER_HOST

@robertdebock
Author

I was able to reproduce the issue:

  TASK [Destroy molecule instance(s)] ********************************************
  changed: [localhost] => (item=haproxy-debian-latest)
  TASK [Wait for instance(s) deletion to complete] *******************************
  failed: [localhost] (item=haproxy-debian-latest) => {"ansible_job_id": "757465577215.66", "ansible_loop_var": "item", "attempts": 1, "changed": false, "finished": 1, "item": {"ansible_job_id": "757465577215.66", "ansible_loop_var": "item", "changed": true, "failed": 0, "finished": 0, "item": {"command": "/sbin/init", "image": "robertdebock/debian:latest", "name": "haproxy-debian-latest", "pre_build_image": true, "privileged": true, "volumes": ["/sys/fs/cgroup:/sys/fs/cgroup:ro"]}, "results_file": "/github/home/.ansible_async/757465577215.66", "started": 1}, "msg": "could not find job", "results_file": "/root/.ansible_async/757465577215.66", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Now let's focus on fixing.
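
One detail in the output above may be worth a closer look: the JSON carries two "results_file" values for the same job ID. A minimal sketch, assuming the log line is pasted into a shell variable (shortened here to the relevant fields; the variable names are only illustrative), that pulls both paths out for comparison:

```shell
#!/usr/bin/env bash
# Shortened copy of the failed task output above; only the two
# "results_file" fields and the message are kept, the rest is omitted.
log_line='{"item": {"results_file": "/github/home/.ansible_async/757465577215.66"}, "msg": "could not find job", "results_file": "/root/.ansible_async/757465577215.66"}'

# Extract every results_file value and de-duplicate the list.
results_files=$(printf '%s\n' "$log_line" \
  | grep -o '"results_file": "[^"]*"' \
  | cut -d'"' -f4 \
  | sort -u)

printf '%s\n' "$results_files"
```

If more than one path comes back, as with /github/home and /root here, the status check may be looking in a different directory than the one the job wrote to. That is only a hypothesis at this point.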

@stefangweichinger
Owner

observation: I disabled the first destroy and re-ran the jobs: https://github.com/stefangweichinger/ansible-rclone/runs/4949049869?check_suite_focus=true#step:4:61

fails at create now. An idea might be that the job belongs to another user than the one checking it. Found something like that in some issues. Although I don't see any become: in my case.

@robertdebock
Author

I also saw the become: yes remarks; if a job is started with become: yes, the async_status task should also run with become: yes.

But; destroy.yml does not use become.

Continuing the search.

@robertdebock
Author

Interesting; using Tox, this works. So, I'll compare the tox package list and the pip packages when not using tox.

Without tox

ansible==5.2.0
ansible-lint==5.3.2
docker==5.0.3
molecule==3.5.2
molecule-docker==1.1.0
tox==3.24.5
testinfra==6.0.0
yamllint==1.26.3

With tox

    4: ansible == 4.*
    5: ansible == 5.*
    molecule[docker]
    docker == 5.*
    ansible-lint == 5.*

Let me change the requirements.txt to be closer to what tox uses.
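
To make that comparison less error-prone, the two lists can be diffed by package name. A minimal sketch, assuming both lists are saved to temporary files (the file names and the `names` helper are illustrative, not part of the action):

```shell
#!/usr/bin/env bash
export LC_ALL=C   # keep sort and comm in the same collation order

workdir=$(mktemp -d)

# Package list installed without tox (copied from above).
cat > "$workdir/without_tox.txt" <<'EOF'
ansible==5.2.0
ansible-lint==5.3.2
docker==5.0.3
molecule==3.5.2
molecule-docker==1.1.0
tox==3.24.5
testinfra==6.0.0
yamllint==1.26.3
EOF

# deps section from tox.ini (copied from above).
cat > "$workdir/with_tox.txt" <<'EOF'
4: ansible == 4.*
5: ansible == 5.*
molecule[docker]
docker == 5.*
ansible-lint == 5.*
EOF

# Reduce each requirement line to its bare package name:
# drop a leading tox factor such as "4: ", then drop the version pin or extra.
names() {
  sed -E -e 's/^[[:space:]]*([0-9]+:[[:space:]]*)?//' \
         -e 's/[[:space:]]*(==|\[).*$//' "$1" | sort -u
}

names "$workdir/without_tox.txt" > "$workdir/a.txt"
names "$workdir/with_tox.txt" > "$workdir/b.txt"

# Packages named in the plain pip list but not in the tox deps.
only_without_tox=$(comm -23 "$workdir/a.txt" "$workdir/b.txt")
printf '%s\n' "$only_without_tox"

rm -r "$workdir"
```

Keep in mind that, as far as I know, `molecule[docker]` installs the Docker driver as an extra, so a package showing up only in the first list does not necessarily mean it is missing under tox.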

@stefangweichinger
Owner

sounds worth a try. I am a bit busy today and can't do much right now.

@stefangweichinger
Owner

Can you reproduce it locally as well?

@robertdebock
Author

robertdebock commented Jan 26, 2022

Have not tried that, what command do you use for that?

Never mind:

docker run --privileged \
  --volume $(pwd):/github/workspace/robertdebock/$(basename $(pwd)):z \
  --volume /var/run/docker.sock:/var/run/docker.sock:z \
  --tty --interactive \
  --env command="lint" \
  --env GITHUB_REPOSITORY="robertdebock/$(basename $(pwd))" \
  --env ANSIBLE_ROLES_PATH="../" \
  ${docker_hash}

@stefangweichinger
Owner

> Have not tried that, what command do you use for that?

As in your TESTING.md:

docker_hash=$(docker build . -q)

cd /home/sgw/projects/github/ansible-rclone

docker run --privileged \
  --volume $(pwd):/github/workspace/robertdebock/$(basename $(pwd)):z \
  --volume /var/run/docker.sock:/var/run/docker.sock:z \
  --tty --interactive \
  --env GITHUB_REPOSITORY="robertdebock/$(basename $(pwd))" \
  --env ANSIBLE_ROLES_PATH="../" \
  ${docker_hash}

I wonder if it has to do with some kind of sandboxing at GitHub. But maybe I'm overthinking this.

For reference:

I use Fedora 35 on my machine, docker version 20.10.12

@stefangweichinger
Owner

That run is using "--privileged" ... maybe that makes a difference?
I am afk now for around an hour or so. thanks so far.

@robertdebock
Author

If I'm correct, the Ansible job writes a file to store the job number or so. Maybe writing that file fails or is undone somehow.

I'm trying to reproduce locally.

@robertdebock
Author

Locally it works.

So, it must be related to GitHub. My guess is that the file Ansible uses is not writable, or something along those lines.

@robertdebock
Author

You can change the directory where Ansible saves the async status:

ANSIBLE_ASYNC_DIR="/tmp/some_dir"

Now figuring out how to integrate that into the action or Molecule.
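
For a quick check before touching the action or Molecule, the variable can be exported in the shell that starts the run. A minimal sketch, assuming a throwaway directory under /tmp (the path is just the example value from above) and that `molecule test` is started from the same shell afterwards:

```shell
#!/usr/bin/env bash
# Example path only; any location writable by the user running Ansible should do.
async_dir="/tmp/some_dir"

# Create the directory and export the variable only if it is writable.
mkdir -p "$async_dir"
if [ -w "$async_dir" ]; then
  export ANSIBLE_ASYNC_DIR="$async_dir"
  echo "ANSIBLE_ASYNC_DIR=$ANSIBLE_ASYNC_DIR"
else
  echo "cannot write to $async_dir" >&2
fi

# Then start the run from this shell, for example:
#   molecule test
```

The writability check is there because the suspicion so far is that the default location cannot be written to or read back on the GitHub runner.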

@robertdebock
Author

Yes, that seems to work

I've added this to molecule/default/molecule.yml:

provisioner:
  name: ansible
  env:
    ANSIBLE_ASYNC_DIR: "/tmp/ansible_async_dir"

I'm not sure if the GitHub action should set that variable by default... Let me know what you think.

@stefangweichinger
Owner

> Yes, that seems to work
>
> I've added this to molecule/default/molecule.yml:
>
> provisioner:
>   name: ansible
>   env:
>     ANSIBLE_ASYNC_DIR: "/tmp/ansible_async_dir"
>
> I'm not sure if the GitHub action should set that variable by default... Let me know what you think.

sounds great. Will test later in the afternoon when I am on the train.

@robertdebock
Author

I think this variable should be set in the action; it's only failing in the GitHub action. I'll release a new version (4.0.6) of the action a bit later.

That would mean you can remove the last change in molecule/default/molecule.yml.

@robertdebock
Author

Released, so without changes this should start to work for you, when using version 4.0.6.

@stefangweichinger
Owner

looks good! -> https://github.com/stefangweichinger/ansible-rclone/actions/runs/1750585961

on my way now, reading and understanding later! thanks a lot so far

@stefangweichinger
Owner

@robertdebock So where did the change come from? With older releases of your container it worked. It's not very important to me but would be interesting, right?

@stefangweichinger
Owner

enabled the first destroy action also, works as well:

https://github.com/stefangweichinger/ansible-rclone/actions/runs/1750585961

The log says it pulls 4.0.5, while I define 4.0.6 in my workflow.

@stefangweichinger
Owner

So we had 2 issues here:

  • missing community.general : at first added by a "RUN" statement, then included in new ansible
  • async-tmpdir-behavior

Right?

I will rerun my pipeline and check why the version is reported wrong (or if that is gone now).
Sooner or later I might consider merging that setup into main/master here.

I consider using tox as well, just curious, maybe overkill. Definitely a new issue then ;-)

@robertdebock
Author

The two issues:

  • ansible-core/ansible-base do not have collections.
  • async_dir is weird in GitHub runners.

The nice thing about tox: you can test multiple versions of Ansible. (Or Python, but my strategy is to prove that Ansible is working, not that Python is working...)

Thanks a lot for your input, it helped me stay motivated and pick up new leads!

Ready to close?

@stefangweichinger
Owner

Nearly ;-)

I reran the pipelines and it pulls 4.0.5 again while I define 4.0.6. Any ideas? A tagging issue maybe?

But the pipeline works now. I will keep it in an extra branch for now, until I get home and feel sure about my stuff ;-)

Sure, I can close this one asap.
I might do tox-tests sooner or later.

You're welcome, it was nice to chat to you.

2 participants