
inspec fails with kitchen verify -c N #119

Closed
pudge opened this issue Nov 16, 2016 · 20 comments
Labels
Type: Bug Doesn't work as expected.

Comments

@pudge

pudge commented Nov 16, 2016

Executing multiple simultaneous kitchen verify runs using concurrency (-c 5, for example) fails, apparently using the same configuration for all nodes (note that the target in the attached log has the same port for each run, even though the boxes are all running on different ports).

When running serially instead of concurrently, it works fine.

inspec-concurrent-kitchen.log.txt

@otakup0pe

Confirming that we are seeing this as well.

@baurmatt

baurmatt commented Dec 1, 2016

We're also seeing this problem.

@stefanandres

Can I do anything to help resolve this issue? I don't really know the code, but this problem occurs in each of our CI runs. :/

@quulah

quulah commented Aug 2, 2017

Is inspec/inspec#1598 related?

@ghost

ghost commented Sep 10, 2017

This has been a problem for almost a year now.

@nhudacin

That's the workaround, running kitchen verify serially?

@pudge
Author

pudge commented Sep 11, 2017

That's the workaround, running kitchen verify serially?

It worked for me, @nhudacin.
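
For anyone who wants that workaround in script form, here is a minimal sketch (assuming a POSIX shell; kitchen verify without -c runs instances one at a time, and kitchen list -b prints bare instance names, one per line):

# Verify each instance one at a time, stopping at the first failure
for instance in $(kitchen list -b); do
  kitchen verify "$instance" || exit 1
done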

@ghost

ghost commented Sep 13, 2017

@adamleff do you know of anyone that can help?

@adamleff
Contributor

@pantocrator27 Unfortunately, no one is actively working on fixing this. As an open source project, we absolutely welcome community members contributing fixes and providing reproduction steps for those willing to help.

I just did a quick test with the latest test-kitchen and kitchen-inspec and could not reproduce this, so for someone to engage on this, whether or not they work for Chef, we'll need more concrete steps we can use to reproduce the issue.

Thank you.

@ghost

ghost commented Sep 13, 2017

Thanks @adamleff, I will work on providing reproduction steps. For the time being, what I am personally finding is that the error appears when concurrency is set higher than the number of instances (platforms × test suites).
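
Building on that observation, one way to avoid over-subscribing is to derive the concurrency from the actual instance count instead of hard-coding it. A sketch, assuming a POSIX shell and that every instance should be verified:

# Cap concurrency at the number of instances (platforms x suites)
instances=$(kitchen list -b | wc -l)
kitchen verify -c "$instances"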

@adamleff added the bug label Sep 20, 2017
@dragon788

Seeing this as well. The weird thing is that ONLY the verify portion fails when concurrency is greater than -c 2; everything else is fine. It appears to have something to do with the thor/busser/busser-serverspec transfers to the instance (tar errors, checksum errors, and file-not-found errors have all cropped up in the logs). Oddly, -c 2 works, but -c 3 or higher fails consistently if the instances involved in the -c N run share the same base AMI/image.

Below is what we run on our CI system. Yes, we could just run kitchen verify or kitchen test and it would perform all the previous steps, but we want to fail fast and know explicitly if there is a timeout or permissions issue and where it happens. We've also found that kitchen test is flaky at times, while kitchen converge && kitchen verify works 98% of the time (when not using -c 3).

Our CI build script:

export LC_CTYPE=en_US.UTF-8
chef exec cookstyle --parallel --color
chef exec foodcritic -f correctness .
chef exec rspec --color --tty
chef exec kitchen create --color -c4
chef exec kitchen converge --color -c4
chef exec kitchen setup --color -c4
chef exec kitchen verify --color -c2
chef exec kitchen destroy --color -c4

Excerpt of .kitchen.yml:

driver:
  name: ec2
  aws_ssh_key_id: secret-key
  security_group_ids: ["sg-groups"]
  region: us-west-2
  subnet_id: subnet-our-ids
  iam_profile_name: iam-role-chef-test-kitchen
  instance_type: t2.medium
  interface: private
  tags:
    Name: test-kitchen-patching-wrapper

provisioner:
  name: chef_zero
  require_chef_omnibus: 13.6.4

transport:
  username: ubuntu
  ssh_key: ~/.ssh/our-chef-key.pem
  connection_timeout: 10
  connection_retries: 5

platforms:
  - name: ubuntu-14.04
    driver:
      image_id: lightly-modified-1404-ami
      user_data: test/user_data.sh
      block_device_mappings:
        - device_name: /dev/sda1
          ebs:
            volume_type: gp2
            volume_size: 50
            delete_on_termination: true
        - device_name: /dev/sdb
          ebs:
            volume_type: gp2
            volume_size: 100
            delete_on_termination: true
    transport:
      name: sftp
  - name: windows-2012r2
    transport:
      username: administrator
      connection_retry_sleep: 15
      connection_retries: 60

suites:
  - name: tuesday_patch
    run_list:
      - recipe[patching_wrapper::default]
    includes: ["ubuntu-14.04"]
  - name: thursday_patch
    run_list:
      - recipe[patching_wrapper::default]
    attributes:
        jenkins_role: 'master'
    includes: ["ubuntu-14.04"]
  - name: windows_chef_upgrade
    run_list:
      - recipe[patching_wrapper::default]
    includes: ["windows-2012r2"]

@dragon788

So the crazy thing is we can run kitchen verify -c4 against AWS from our local workstations pretty consistently without any errors, but it always fails on our CI system (which actually lives in AWS) above -c2.

It could also be that our Windows instance is dodging a bullet by virtue of the WinRM transport being slow for file transfers, but oddly it always seems to be thursday_patch that loses the race (condition) at -c3 or above.

See this gist for an example of failed run output: https://gist.github.com/dragon788/03c77c7aac1c27efb826387e87b892ef

@jurajseffer

When using --parallel or -c I usually get:

>>>>>>     Failed to complete #verify action: [no implicit conversion of nil into String] on default-consul3
>>>>>>     Failed to complete #verify action: [no implicit conversion of nil into String] on default-consul1

but I also got

>>>>>>     Failed to complete #verify action: [Client error, can't connect to 'ssh' backend: Train::Transports::SSH does not implement #connect()] on default-consul2

It's not consistent: sometimes it works, but most of the time no parallel runs succeed. This seems quite broken.

@ricoli

ricoli commented Feb 22, 2018

Any progress on this? Also seeing it when using CentOS 6 on AWS...

@ghost

ghost commented Mar 8, 2018

While monitoring this, I noticed that if I run kitchen list during a run, Last Action says Set Up but Last Error says Type Error.

@slve

slve commented Mar 9, 2018

I also wanted to prefix the output and came to this solution using GNU parallel,
where $BOX is the pattern I originally passed to kitchen test and -j3 means 3 concurrent runs.

kl=$(kitchen list | cut -d' ' -f1 | sed 1d | grep "$BOX")
parallel -j3 --tag kitchen test {} ::: $kl

@ghost

ghost commented Mar 21, 2018

I have switched to using parallel as well, per @slve's recommendation, with better success.

@gionn

gionn commented May 4, 2018

To avoid breaking on deprecation notices while test-kitchen starts up, and to stop kitchen.log from being truncated by every process that starts up (though the output would be garbled in any case):

kl=$(kitchen list -b -l fatal| cut -d' ' -f1 | sed 1d | grep "$BOX")
parallel -j3 --tag kitchen test --no-log-overwrite {} ::: $kl

@mike10010100

We ran into the same issue when running kitchen test -c while using kitchen-dokken. We found a solution by specifying the same number of distinct volumes as there are platforms running simultaneously.

For example, if you have 5 platforms, under driver: you need to add:

volumes: [
    '/var/lib/docker', '/var/lib/docker-one', '/var/lib/docker-two', '/var/lib/docker-three', '/var/lib/docker-four'
  ]

After this addition, we've been able to run concurrent kitchen test runs without this error.
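
For reference, a minimal sketch of how that volumes list might sit in a .kitchen.yml driver block (name: dokken comes from kitchen-dokken; the five-volume list mirrors the five-platform example above):

driver:
  name: dokken
  # one distinct Docker volume per concurrently running platform
  volumes: [
    '/var/lib/docker', '/var/lib/docker-one', '/var/lib/docker-two',
    '/var/lib/docker-three', '/var/lib/docker-four'
  ]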

@tas50 added the Type: Bug (Doesn't work as expected.) label and removed the bug label Jan 14, 2019
@kekaichinose

What’s happening? Why was this issue closed?
This issue was closed during a much-needed review of legacy issues and issues that were opened against older versions of InSpec, i.e. < v3.

Why do I care?
You would care if you are still seeing this issue and/or feel it needs to be addressed in the current version of InSpec.

What do I need to do?
If this issue is no longer important, no further action is necessary. However, if you think this is something that should be addressed, please open a new issue and refer to the original issue in the description.
