🐛 Fix bug to run Kubernetes jobs with large files #368

Merged
merged 4 commits into from May 7, 2020

haizi-zh
Contributor

@haizi-zh haizi-zh commented May 2, 2020

When Snakemake creates a Kubernetes job, the workflow source files are uploaded as a Kubernetes secret. Secrets, however, have a size limit of 1MB. See: kubernetes/kubernetes#19781

The KubernetesExecutor.run() method already takes this into account by checking secret file sizes. However, register_secret() is called at initialization, before run() is ever invoked, and it does not check file sizes. When a source file is too large, the upload fails and an OSError: (32, 'EPIPE') may be thrown.

This PR solves the problem by checking source file sizes in register_secret() and skipping files that are too large, with a warning message for each skipped file.
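The idea can be sketched in isolation as follows. This is a minimal illustration, not the code merged in this PR: the helper name `collect_secret_data`, the constant `MAX_SECRET_FILE_SIZE`, the `f<index>` key scheme, and the exact byte value used for the 1MB limit are all assumptions made for the example. It checks each file individually and does not account for the combined size of the secret or for base64 overhead.

```python
import base64
import logging
import os
import tempfile

logger = logging.getLogger(__name__)

# Assumption: the 1MB limit mentioned above, taken here as 1,000,000 bytes.
MAX_SECRET_FILE_SIZE = 1_000_000


def collect_secret_data(paths, max_size=MAX_SECRET_FILE_SIZE):
    """Return a dict of base64-encoded file contents for files within max_size.

    Files larger than max_size are skipped and a warning is logged.
    The "f<index>" keys are a hypothetical naming scheme for this sketch.
    """
    data = {}
    for index, path in enumerate(paths):
        size = os.path.getsize(path)
        if size > max_size:
            logger.warning(
                "Skipping source file %s (%d bytes): larger than the %d byte limit.",
                path,
                size,
                max_size,
            )
            continue
        with open(path, "rb") as handle:
            encoded = base64.b64encode(handle.read()).decode("ascii")
        data["f{}".format(index)] = encoded
    return data


if __name__ == "__main__":
    # Small demonstration with a lowered limit so no large file is needed.
    with tempfile.TemporaryDirectory() as tmp:
        small = os.path.join(tmp, "Snakefile")
        large = os.path.join(tmp, "reference.bin")
        with open(small, "wb") as handle:
            handle.write(b"rule all:\n")
        with open(large, "wb") as handle:
            handle.write(b"x" * 2000)
        kept = collect_secret_data([small, large], max_size=1000)
        print(sorted(kept))
```

The resulting dict would then be used as the secret's data when the secret is created; a file that is skipped here is simply not available inside the job, which is why each skip is reported as a warning rather than silently ignored.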


Bug description

Version: snakemake v5.15.0

Scenario: run snakemake with the --kubernetes option from a git directory that contains files larger than 1MB. For example:

snakemake --kubernetes --container-image zephyre/cfdna_pipeline:v0.1.14 -j 100 --default-remote-provider S3 --default-remote-prefix pandisease.epifluidlab.cchmc.org -s pandisease.smk -p

Output:

Traceback (most recent call last):
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 340, in _send_until_done
    return self.connection.send(data)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1757, in send
    self._raise_ssl_error(self._ssl, result)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1663, in _raise_ssl_error
    raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (32, 'EPIPE')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 1065, in _send_output
    self.send(chunk)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 987, in send
    self.sock.sendall(data)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 352, in sendall
    data[total_sent : total_sent + SSL_WRITE_BLOCKSIZE]
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 346, in _send_until_done
    raise SocketError(str(e))
OSError: (32, 'EPIPE')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/snakemake/__init__.py", line 654, in snakemake
    keepincomplete=keep_incomplete,
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/snakemake/workflow.py", line 842, in execute
    keepincomplete=keepincomplete,
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/snakemake/scheduler.py", line 229, in __init__
    keepincomplete=keepincomplete,
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/snakemake/executors.py", line 1395, in __init__
    self.register_secret()
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/snakemake/executors.py", line 1425, in register_secret
    self.kubeapi.create_namespaced_secret(self.namespace, secret)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 6819, in create_namespaced_secret
    (data) = self.create_namespaced_secret_with_http_info(namespace, body, **kwargs)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 6910, in create_namespaced_secret_with_http_info
    collection_formats=collection_formats)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 344, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 178, in __call_api
    _request_timeout=_request_timeout)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 387, in request
    body=body)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/kubernetes/client/rest.py", line 266, in POST
    body=body)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/kubernetes/client/rest.py", line 166, in request
    headers=headers)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/request.py", line 80, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/request.py", line 171, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/poolmanager.py", line 330, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 1065, in _send_output
    self.send(chunk)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/http/client.py", line 987, in send
    self.sock.sendall(data)
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 352, in sendall
    data[total_sent : total_sent + SSL_WRITE_BLOCKSIZE]
  File "/Users/haizi/miniconda3/envs/comp-bio/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 346, in _send_until_done
    raise SocketError(str(e))
urllib3.exceptions.ProtocolError: ('Connection aborted.', OSError("(32, 'EPIPE')"))

Contributor

@johanneskoester johanneskoester left a comment

Good catch, thanks a lot!

sonarcloud bot commented May 7, 2020

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A); Security Hotspots to review: 0
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 0.3%

@johanneskoester johanneskoester merged commit 8347bdd into snakemake:master May 7, 2020
@haizi-zh haizi-zh deleted the fix-k8s-large-file branch November 17, 2020 03:30