
Need to catch boto throttling #84

Closed
isichei opened this issue Nov 23, 2018 · 2 comments

Comments


isichei commented Nov 23, 2018

See the traceback below:

[2018-11-23 16:55:18,326] {models.py:1428} INFO - Executing <Task(PythonOperator): count> on 2018-11-23 16:47:34.247276
[2018-11-23 16:55:18,326] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run crest_preprocess_tables_etl count 2018-11-23T16:47:34.247276 --job_id 3263 --raw -sd DAGS_FOLDER/crest_preprocess_tables_etl.py']
[2018-11-23 16:55:20,522] {base_task_runner.py:98} INFO - Subtask: [2018-11-23 16:55:20,521] {__init__.py:45} INFO - Using executor LocalExecutor
[2018-11-23 16:55:20,825] {base_task_runner.py:98} INFO - Subtask: [2018-11-23 16:55:20,824] {models.py:189} INFO - Filling up the DagBag from /Users/karik/airflow/dags/crest_preprocess_tables_etl.py
[2018-11-23 16:55:21,097] {base_task_runner.py:98} INFO - Subtask: [2018-11-23 16:55:21,097] {credentials.py:1032} INFO - Found credentials in shared credentials file: ~/.aws/credentials
[2018-11-23 16:55:23,282] {cli.py:374} INFO - Running on host Karik.local
[2018-11-23 16:55:23,338] {logging_mixin.py:84} INFO - Starting job "airflow_crest_count"...

[2018-11-23 16:57:25,607] {models.py:1595} ERROR - An error occurred (ThrottlingException) when calling the GetJobRun operation (reached max retries: 4): Rate exceeded
Traceback (most recent call last):
  File "/anaconda/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/anaconda/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 89, in execute
    return_value = self.execute_callable()
  File "/anaconda/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 94, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/Users/karik/airflow/dags/crest_preprocess_tables_etl.py", line 90, in run_glue_job_as_airflow_task
    job.wait_for_completion()
  File "/anaconda/lib/python3.6/site-packages/etl_manager/etl.py", line 404, in wait_for_completion
    status = self.job_status
  File "/anaconda/lib/python3.6/site-packages/etl_manager/etl.py", line 379, in job_status
    return _glue_client.get_job_run(JobName=self.job_name, RunId=self.job_run_id)
  File "/anaconda/lib/python3.6/site-packages/botocore/client.py", line 320, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/anaconda/lib/python3.6/site-packages/botocore/client.py", line 623, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the GetJobRun operation (reached max retries: 4): Rate exceeded
[2018-11-23 16:57:25,609] {models.py:1624} INFO - Marking task as FAILED.
[2018-11-23 16:57:25,631] {models.py:1644} ERROR - An error occurred (ThrottlingException) when calling the GetJobRun operation (reached max retries: 4): Rate exceeded
[2018-11-23 16:57:25,633] {base_task_runner.py:98} INFO - Subtask: Traceback (most recent call last):
[2018-11-23 16:57:25,633] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/bin/airflow", line 27, in <module>
[2018-11-23 16:57:25,633] {base_task_runner.py:98} INFO - Subtask:     args.func(args)
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/airflow/bin/cli.py", line 392, in run
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:     pool=args.pool,
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:     result = func(*args, **kwargs)
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:     result = task_copy.execute(context=context)
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 89, in execute
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:     return_value = self.execute_callable()
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/airflow/operators/python_operator.py", line 94, in execute_callable
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:     return self.python_callable(*self.op_args, **self.op_kwargs)
[2018-11-23 16:57:25,634] {base_task_runner.py:98} INFO - Subtask:   File "/Users/karik/airflow/dags/crest_preprocess_tables_etl.py", line 90, in run_glue_job_as_airflow_task
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:     job.wait_for_completion()
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/etl_manager/etl.py", line 404, in wait_for_completion
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:     status = self.job_status
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/etl_manager/etl.py", line 379, in job_status
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:     return _glue_client.get_job_run(JobName=self.job_name, RunId=self.job_run_id)
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/botocore/client.py", line 320, in _api_call
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:     return self._make_api_call(operation_name, kwargs)
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:   File "/anaconda/lib/python3.6/site-packages/botocore/client.py", line 623, in _make_api_call
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask:     raise error_class(parsed_response, operation_name)
[2018-11-23 16:57:25,635] {base_task_runner.py:98} INFO - Subtask: botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the GetJobRun operation (reached max retries: 4): Rate exceeded

I'm running 8 PythonOperators in parallel. I wonder whether boto3 is throwing this error because they all run in parallel from the same Python kernel. I don't seem to get this issue with the dockerised versions run through the Kubernetes pod operator, but those don't run as much in parallel, so the dockerised version still needs proper testing.


isichei commented Nov 23, 2018

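I think it makes the most sense to add exponential backoff to the `wait_for_completion()` method, so that a `ThrottlingException` from the `GetJobRun` status poll is retried after a growing delay instead of failing the task.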
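A rough sketch of what that backoff could look like, not the actual etl_manager implementation. The helper names `is_throttling_error` and `call_with_backoff`, and the default attempt counts and delays, are made up for illustration. The error-code check assumes the exception carries a botocore-style `.response["Error"]["Code"]`, which is how `ClientError` exposes it; the check is duck-typed so the snippet runs without botocore installed.

```python
import random
import time


def is_throttling_error(exc):
    """Return True if exc looks like a botocore ClientError caused by throttling."""
    # botocore's ClientError stores the parsed response on `.response`.
    # Duck-typing keeps this sketch importable without botocore.
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    code = response.get("Error", {}).get("Code", "")
    return code in ("ThrottlingException", "Throttling")


def call_with_backoff(func, max_attempts=8, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call func(), retrying throttling errors with exponential backoff.

    All parameter names and defaults are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Re-raise anything that is not throttling, and the final failure.
            if not is_throttling_error(exc) or attempt == max_attempts - 1:
                raise
            # The delay doubles on each attempt (capped at max_delay), with
            # jitter so parallel tasks do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

Going by the traceback, the call to wrap would be the one in `job_status`, for example `call_with_backoff(lambda: _glue_client.get_job_run(JobName=self.job_name, RunId=self.job_run_id))`. The "reached max retries: 4" in the error suggests botocore's own retries had already run out, so raising them through `botocore.config.Config(retries={"max_attempts": ...})` on the Glue client might be a complementary option; that has not been tested here.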
