This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Is dataflow job submission still broken? #220

Closed
cisaacstern opened this issue Jan 17, 2023 · 6 comments

Comments

@cisaacstern
Member

cisaacstern commented Jan 17, 2023

Today I re-deployed the backend service via #218, which should have reverted changes last week that caused Dataflow job submission problems.

While certain issues may have been resolved, some issue seems to be persisting, as pangeo-forge/staged-recipes#247 (comment) did not succeed in deploying a job. Here are the backend logs following that comment:

Jan 17 12:51:22  2023-01-17 20:51:21,991 INFO - orchestrator - Calling run with args, kws: ('https://github.com/thenaomig/staged-recipes', '34a4f3f9f38499ddbcd118a2b59bf4108d62d42a', RecipeRun(recipe_id='cesm2_r11i1p1f1_ssp370', bakery_id=1, head_sha='34a4f3f9f38499ddbcd118a2b59bf4108d62d42a', started_at=datetime.datetime(2023, 1, 13, 22, 49, 7), conclusion=None, is_test=True, dataset_public_url=None, id=1486, feedstock_id=1, version='', completed_at=None, status='queued', dataset_type='zarr', message=None), 'pangeo-forge/staged-recipes'), {'feedstock_subdir': 'recipes/cmip6-wrf-wus'}
Jan 17 12:51:22  2023-01-17 20:51:22,223 DEBUG - orchestrator - Dumping bakery config to json: {'Bake': {'bakery_class': 'pangeo_forge_runner.bakery.dataflow.DataflowBakery', 'job_name': 'a6170692e70616e67656f2d666f7267652e6f7267251486', 'container_image': 'gcr.io/pangeo-forge-4967/pangeo/forge:5e51a29'}, 'TargetStorage': {'fsspec_class': 's3fs.S3FileSystem', 'fsspec_args': {'client_kwargs': {'endpoint_url': 'https://ncsa.osn.xsede.org'}, 'default_cache_type': 'none', 'default_fill_cache': False, 'use_listings_cache': False, 'key': SecretStr('**********'), 'secret': SecretStr('**********')}, 'root_path': 'Pangeo/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1486/cesm2_r11i1p1f1_ssp370.zarr', 'public_url': 'https://ncsa.osn.xsede.org/{root_path}'}, 'InputCacheStorage': {'fsspec_class': 'gcsfs.GCSFileSystem', 'fsspec_args': {'bucket': 'pangeo-forge-prod-cache'}, 'root_path': 'pangeo-forge-prod-cache'}, 'MetadataCacheStorage': {'fsspec_class': 'gcsfs.GCSFileSystem', 'fsspec_args': {}, 'root_path': 'pangeo-forge-prod-cache/metadata/pangeo-forge/test/pangeo-forge/staged-recipes/recipe-run-1486/cesm2_r11i1p1f1_ssp370.zarr'}, 'DataflowBakery': {'temp_gcs_location': 'gs://pangeo-forge-prod-dataflow/temp'}}
Jan 17 12:51:22  2023-01-17 20:51:22,229 DEBUG - orchestrator - Running command: ['pangeo-forge-runner', 'bake', '--repo=https://github.com/thenaomig/staged-recipes', '--ref=34a4f3f9f38499ddbcd118a2b59bf4108d62d42a', '--json', '--prune', '--Bake.recipe_id=cesm2_r11i1p1f1_ssp370', '-f=/tmp/tmphq0uxi87.json', '--feedstock-subdir=recipes/cmip6-wrf-wus']
Jan 17 12:55:13    error: subprocess-exited-with-error
Jan 17 12:55:13    × python setup.py egg_info did not run successfully.
Jan 17 12:55:13    │ exit code: 1
Jan 17 12:55:13    ╰─> [28 lines of output]
Jan 17 12:55:13        Traceback (most recent call last):
Jan 17 12:55:13          File "<string>", line 2, in <module>
Jan 17 12:55:13          File "<pip-setuptools-caller>", line 34, in <module>
Jan 17 12:55:13          File "/tmp/pip-download-nryxb09_/apache-beam_5dcd438c30a6474c87fb9f55352c445a/setup.py", line 110, in <module>
Jan 17 12:55:13            _CYTHON_VERSION = get_distribution('cython').version
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 514, in get_distribution
Jan 17 12:55:13            dist = get_provider(dist)
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 386, in get_provider
Jan 17 12:55:13            return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 956, in require
Jan 17 12:55:13            needed = self.resolve(parse_requirements(requirements))
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 815, in resolve
Jan 17 12:55:13            dist = self._resolve_dist(
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 844, in _resolve_dist
Jan 17 12:55:13            env = Environment(self.entries)
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 1044, in __init__
Jan 17 12:55:13            self.scan(search_path)
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 1077, in scan
Jan 17 12:55:13            self.add(dist)
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 1096, in add
Jan 17 12:55:13            dists.sort(key=operator.attrgetter('hashcmp'), reverse=True)
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 2631, in hashcmp
Jan 17 12:55:13            self.parsed_version,
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 2678, in parsed_version
Jan 17 12:55:13            self._parsed_version = parse_version(self.version)
Jan 17 12:55:13          File "/usr/local/lib/python3.9/dist-packages/pkg_resources/_vendor/packaging/version.py", line 266, in __init__
Jan 17 12:55:13            raise InvalidVersion(f"Invalid version: '{version}'")
Jan 17 12:55:13        pkg_resources.extern.packaging.version.InvalidVersion: Invalid version: '1.1build1'
Jan 17 12:55:13        [end of output]
Jan 17 12:55:13    note: This error originates from a subprocess, and is likely not a problem with pip.
Jan 17 12:55:13  error: metadata-generation-failed
Jan 17 12:55:13  × Encountered error while generating package metadata.
Jan 17 12:55:13  ╰─> See above for output.
Jan 17 12:55:13  note: This is an issue with the package mentioned above, not pip.
Jan 17 12:55:13  hint: See above for details.
Jan 17 12:55:19  [2023-01-17 20:55:18 +0000] [62] [CRITICAL] WORKER TIMEOUT (pid:63)
Jan 17 12:55:19  [2023-01-17 20:55:18 +0000] [62] [WARNING] Worker with pid 63 was terminated due to signal 6
Jan 17 12:55:19  [2023-01-17 20:55:19 +0000] [160] [INFO] Booting worker with pid: 160
Jan 17 12:55:22  [2023-01-17 20:55:21 +0000] [160] [INFO] Started server process [160]
Jan 17 12:55:22  [2023-01-17 20:55:21 +0000] [160] [INFO] Waiting for application startup.
Jan 17 12:55:22  [2023-01-17 20:55:21 +0000] [160] [INFO] Application startup complete.

Looks like something having to do with the version of beam?
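One detail in the traceback above: the `InvalidVersion` is raised while `pkg_resources` scans the installed distributions, so the offending version string (`'1.1build1'`) may belong to some other installed package rather than to apache-beam itself. Below is a minimal sketch for listing installed distributions whose version strings do not look PEP 440 compliant. It uses only the standard library. The regular expression is adapted from the one in PEP 440's appendix (an assumption of this sketch; `packaging.version.Version` is the authoritative parser), and the helper names `is_pep440` and `find_invalid_versions` are hypothetical, not part of any pangeo-forge tooling.

```python
import re
from importlib import metadata

# Adapted from the version pattern in PEP 440's appendix. This is an
# approximation for diagnostics, not a replacement for `packaging`.
_PEP440 = re.compile(
    r"""
    \s*v?
    (?:(?P<epoch>[0-9]+)!)?                                        # epoch
    (?P<release>[0-9]+(?:\.[0-9]+)*)                               # release segment
    (?P<pre>[-_\.]?(?:a|b|c|rc|alpha|beta|pre|preview)[-_\.]?[0-9]*)?  # pre-release
    (?P<post>(?:-[0-9]+)|(?:[-_\.]?(?:post|rev|r)[-_\.]?[0-9]*))?  # post-release
    (?P<dev>[-_\.]?dev[-_\.]?[0-9]*)?                              # dev release
    (?:\+(?P<local>[a-z0-9]+(?:[-_\.][a-z0-9]+)*))?                # local version
    \s*
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_pep440(version: str) -> bool:
    """Return True if `version` matches the PEP 440 pattern above."""
    return _PEP440.fullmatch(version) is not None


def find_invalid_versions():
    """List (name, version) for installed distributions with odd versions."""
    bad = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "<unknown>")
        version = dist.metadata.get("Version", "")
        if not is_pep440(version):
            bad.append((name, version))
    return sorted(bad)


if __name__ == "__main__":
    for name, version in find_invalid_versions():
        print(f"{name}: {version!r}")
```

Running this inside the production container (rather than locally) would show which installed packages, if any, carry version strings like the one in the traceback.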

@cisaacstern cisaacstern changed the title Is the dataflow job submission still broken> Is the dataflow job submission still broken? Jan 17, 2023
@cisaacstern
Member Author

Members of the pangeo-forge Heroku team can get a shell on the running production container with:

heroku run /bin/bash -a pangeo-forge-api-prod

I've just done this and confirmed that there is an importable installation of apache-beam==2.42.0 there.

@cisaacstern
Member Author

This type of issue, i.e. "Why did a particular job submission fail?", is a common problem in our backend service. As is the case here, with our current logging infrastructure, it's typically not immediately clear if the failure was due to some general problem in the backend service, or if the problem is specific to this particular recipe. Typically I have been re-running these pangeo-forge-runner calls manually from my local machine, to see if they succeed or not. Perhaps there would be a way to do better stream capture of stdout logging from subprocesses, but with #204, recipe parsing will move to a separate (Cloud Run) service. At that point, there may be a way to more easily recover the stdout streams from a particular parsing event (directly from Cloud Run).
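As a rough illustration of the stream-capture idea mentioned above, here is a minimal sketch that runs a command, captures its stdout and stderr, and logs each line tagged with an identifier for the run. The function name `run_and_capture`, the `run_id` parameter, and the logger name are assumptions for illustration; this is not how the backend currently invokes `pangeo-forge-runner`. Note that `subprocess.run` buffers output until the process exits, so this captures rather than streams.

```python
import logging
import subprocess
import sys

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("orchestrator")


def run_and_capture(cmd, run_id):
    """Run `cmd`, then log its stdout/stderr line by line, tagged with `run_id`."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    for stream_name, text in (("stdout", proc.stdout), ("stderr", proc.stderr)):
        for line in text.splitlines():
            logger.debug("[run %s] %s: %s", run_id, stream_name, line)
    if proc.returncode != 0:
        logger.error("[run %s] command exited with status %d", run_id, proc.returncode)
    return proc


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Stand-in command; in practice this would be the pangeo-forge-runner call.
    run_and_capture([sys.executable, "-c", "print('hello')"], run_id=1486)
```

Tagging every line with the run identifier would make it possible to filter the logs for a single submission, which is the piece that is hard to do with interleaved output.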

@cisaacstern
Member Author

> Typically I have been re-running these pangeo-forge-runner calls manually from my local machine, to see if they succeed or not.

I've just done this for this job and I was able to submit the job from my local machine, using the same command as was run in production. So this would appear to be a problem with the production environment after all.

@cisaacstern
Member Author

Based on these logs following the test run triggered by pangeo-forge/staged-recipes#247 (comment)

Jan 20 13:23:43 [pangeo-forge-api-prod] [heroku/web.1] Process running mem=627M(122.6%)
Jan 20 13:23:43 [pangeo-forge-api-prod] [heroku/web.1] Error R14 (Memory quota exceeded)
Jan 20 13:23:48 [pangeo-forge-api-prod] [app/heroku-postgres] source=DATABASE addon=postgresql-fitted-67704 sample#current_transaction=3024 sample#db_size=9433647bytes sample#tables=4 sample#active-connections=12 sample#waiting-connections=0 sample#index-cache-hit-rate=0.99996 sample#table-cache-hit-rate=0.99999 sample#load-avg-1m=0.015 sample#load-avg-5m=0.025 sample#load-avg-15m=0.005 sample#read-iops=0 sample#write-iops=0.61404 sample#tmp-disk-used=543600640 sample#tmp-disk-available=72435191808 sample#memory-total=8038316kB sample#memory-free=2039480kB sample#memory-cached=5320944kB sample#memory-postgres=25028kB sample#wal-percentage-used=0.06477745905179602
Jan 20 13:24:00 [pangeo-forge-api-prod] [app/web.1] [2023-01-20 21:24:00 +0000] [61] [CRITICAL] WORKER TIMEOUT (pid:62)
Jan 20 13:24:00 [pangeo-forge-api-prod] [app/web.1] [2023-01-20 21:24:00 +0000] [61] [WARNING] Worker with pid 62 was terminated due to signal 6
Jan 20 13:24:00 [pangeo-forge-api-prod] [app/web.1] [2023-01-20 21:24:00 +0000] [203] [INFO] Booting worker with pid: 203
Jan 20 13:24:02 [pangeo-forge-api-prod] [app/web.1] [2023-01-20 21:24:02 +0000] [203] [INFO] Started server process [203]
Jan 20 13:24:02 [pangeo-forge-api-prod] [app/web.1] [2023-01-20 21:24:02 +0000] [203] [INFO] Waiting for application startup.
Jan 20 13:24:02 [pangeo-forge-api-prod] [app/web.1] [2023-01-20 21:24:02 +0000] [203] [INFO] Application startup complete.
Jan 20 13:24:03 [pangeo-forge-api-prod] [heroku/web.1] source=web.1 dyno=heroku.247104119.22c216b8-1f2c-4c3a-9c3d-7401527c41c9 sample#load_avg_1m=0.96 sample#load_avg_5m=0.57 sample#load_avg_15m=0.27
Jan 20 13:24:03 [pangeo-forge-api-prod] [heroku/web.1] source=web.1 dyno=heroku.247104119.22c216b8-1f2c-4c3a-9c3d-7401527c41c9 sample#memory_total=818.12MB sample#memory_rss=511.82MB sample#memory_cache=0.00MB sample#memory_swap=306.31MB sample#memory_pgpgin=1416593pages sample#memory_pgpgout=1328492pages sample#memory_quota=512.00MB
Jan 20 13:24:03 [pangeo-forge-api-prod] Process running mem=818M(159.8%)
Jan 20 13:24:03 [pangeo-forge-api-prod] [heroku/web.1] Error R14 (Memory quota exceeded)

I believe this particular recipe is causing a worker OOM before it can finish submitting the job to Dataflow. I'm going to hypothesize, then, that this is not some more general bug in the production deployment, but rather something specific to this recipe.

Will do a bit more digging before closing this issue.
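For reference, the percentage in the R14 lines above can be recomputed from the `sample#memory_total` and `sample#memory_quota` fields of the dyno metrics line. A minimal sketch, assuming the `sample#key=valueMB` format seen in these logs; the function name `memory_usage_percent` is hypothetical:

```python
import re

# Matches only the megabyte-valued samples, e.g. "sample#memory_total=818.12MB".
_SAMPLE = re.compile(r"sample#(\w+)=([0-9.]+)MB")


def memory_usage_percent(log_line):
    """Return memory_total as a percentage of memory_quota, or None if absent."""
    samples = {key: float(value) for key, value in _SAMPLE.findall(log_line)}
    if "memory_total" not in samples or "memory_quota" not in samples:
        return None
    return round(100 * samples["memory_total"] / samples["memory_quota"], 1)


line = (
    "source=web.1 sample#memory_total=818.12MB sample#memory_rss=511.82MB "
    "sample#memory_swap=306.31MB sample#memory_quota=512.00MB"
)
print(memory_usage_percent(line))  # 159.8
```

The result agrees with the `mem=818M(159.8%)` figure reported alongside the R14 error, i.e. the dyno was well over its 512 MB quota and relying on swap.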

@cisaacstern cisaacstern changed the title Is the dataflow job submission still broken? Is dataflow job submission still broken? Jan 23, 2023
@cisaacstern
Member Author

When I left off with this last week, I'd thought that the one feedstock where this behavior was observed was an outlier due to OOM issues. Perhaps that is the case, but following the failed test run deployment from pangeo-forge/staged-recipes#245 (comment), it seems clear that there is some more general issue at play. Here's the server trace following that slash command:

2023-01-23T19:09:25.342210+00:00 app[web.1]: 2023-01-23 19:09:25,342 DEBUG - orchestrator - Running command: ['pangeo-forge-runner', 'bake', '--repo=https://github.com/lzampier/staged-recipes', '--ref=f833ad583a76722355b309d6005cb7cf8e6d4d0b', '--json', '--prune', '--Bake.recipe_id=OSI-SAF-450-430-a_rg025', '-f=/tmp/tmp1e25o9am.json', '--feedstock-subdir=recipes/OSI-SAF-450-430-a_rg025']
2023-01-23T19:09:31.442443+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.4cace7a7-02f0-49b0-9cee-e3974011462a sample#load_avg_1m=0.00 sample#load_avg_5m=0.00 sample#load_avg_15m=0.00
2023-01-23T19:09:31.470446+00:00 heroku[web.1]: source=web.1 dyno=heroku.247104119.4cace7a7-02f0-49b0-9cee-e3974011462a sample#memory_total=511.89MB sample#memory_rss=363.31MB sample#memory_cache=148.58MB sample#memory_swap=0.00MB sample#memory_pgpgin=180577pages sample#memory_pgpgout=52598pages sample#memory_quota=512.00MB
2023-01-23T19:09:42.930642+00:00 app[web.1]: error: subprocess-exited-with-error
2023-01-23T19:09:42.930657+00:00 app[web.1]: 
2023-01-23T19:09:42.930657+00:00 app[web.1]: × python setup.py egg_info did not run successfully.
2023-01-23T19:09:42.930658+00:00 app[web.1]: │ exit code: 1
2023-01-23T19:09:42.930658+00:00 app[web.1]: ╰─> [26 lines of output]
2023-01-23T19:09:42.930659+00:00 app[web.1]: Traceback (most recent call last):
2023-01-23T19:09:42.930659+00:00 app[web.1]: File "<string>", line 2, in <module>
2023-01-23T19:09:42.930660+00:00 app[web.1]: File "<pip-setuptools-caller>", line 34, in <module>
2023-01-23T19:09:42.930660+00:00 app[web.1]: File "/tmp/pip-download-wt0_y6kb/apache-beam_5304fa4b0e1143aeb814a5af1e30c8a6/setup.py", line 110, in <module>
2023-01-23T19:09:42.930660+00:00 app[web.1]: _CYTHON_VERSION = get_distribution('cython').version
2023-01-23T19:09:42.930661+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 514, in get_distribution
2023-01-23T19:09:42.930661+00:00 app[web.1]: dist = get_provider(dist)
2023-01-23T19:09:42.930662+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 386, in get_provider
2023-01-23T19:09:42.930662+00:00 app[web.1]: return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
2023-01-23T19:09:42.930662+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 956, in require
2023-01-23T19:09:42.930663+00:00 app[web.1]: needed = self.resolve(parse_requirements(requirements))
2023-01-23T19:09:42.930663+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 815, in resolve
2023-01-23T19:09:42.930663+00:00 app[web.1]: dist = self._resolve_dist(
2023-01-23T19:09:42.930664+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 844, in _resolve_dist
2023-01-23T19:09:42.930664+00:00 app[web.1]: env = Environment(self.entries)
2023-01-23T19:09:42.930664+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 1044, in __init__
2023-01-23T19:09:42.930665+00:00 app[web.1]: self.scan(search_path)
2023-01-23T19:09:42.930665+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 1077, in scan
2023-01-23T19:09:42.930665+00:00 app[web.1]: self.add(dist)
2023-01-23T19:09:42.930666+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 1096, in add
2023-01-23T19:09:42.930666+00:00 app[web.1]: dists.sort(key=operator.attrgetter('hashcmp'), reverse=True)
2023-01-23T19:09:42.930666+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 2631, in hashcmp
2023-01-23T19:09:42.930667+00:00 app[web.1]: self.parsed_version,
2023-01-23T19:09:42.930667+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py", line 2685, in parsed_version
2023-01-23T19:09:42.930667+00:00 app[web.1]: raise packaging.version.InvalidVersion(f"{str(ex)} {info}") from None
2023-01-23T19:09:42.930668+00:00 app[web.1]: pkg_resources.extern.packaging.version.InvalidVersion: Invalid version: '1.1build1' (package: distro-info)
2023-01-23T19:09:42.930668+00:00 app[web.1]: [end of output]
2023-01-23T19:09:42.930668+00:00 app[web.1]: 
2023-01-23T19:09:42.930669+00:00 app[web.1]: note: This error originates from a subprocess, and is likely not a problem with pip.
2023-01-23T19:09:42.932950+00:00 app[web.1]: error: metadata-generation-failed
2023-01-23T19:09:42.932951+00:00 app[web.1]: 
2023-01-23T19:09:42.932951+00:00 app[web.1]: × Encountered error while generating package metadata.
2023-01-23T19:09:42.932952+00:00 app[web.1]: ╰─> See above for output.
2023-01-23T19:09:42.932952+00:00 app[web.1]: 
2023-01-23T19:09:42.932953+00:00 app[web.1]: note: This is an issue with the package mentioned above, not pip.
2023-01-23T19:09:42.932953+00:00 app[web.1]: hint: See above for details.
2023-01-23T19:09:44.100776+00:00 app[web.1]: 2023-01-23 19:09:44,100 ERROR - orchestrator - Recipe run  failed with: Traceback (most recent call last):
2023-01-23T19:09:44.100800+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/utils/processes.py", line 89, in check_output
2023-01-23T19:09:44.100801+00:00 app[web.1]: out = subprocess.check_output(*args, **kwargs)
2023-01-23T19:09:44.100802+00:00 app[web.1]: File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
2023-01-23T19:09:44.100802+00:00 app[web.1]: return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2023-01-23T19:09:44.100802+00:00 app[web.1]: File "/usr/lib/python3.9/subprocess.py", line 528, in run
2023-01-23T19:09:44.100803+00:00 app[web.1]: raise CalledProcessError(retcode, process.args,
2023-01-23T19:09:44.100805+00:00 app[web.1]: subprocess.CalledProcessError: Command '['/usr/bin/python3.9', '-m', 'pip', 'download', '--dest', '/tmp/tmpb3mp5hec', 'apache-beam==2.42.0', '--no-deps', '--no-binary', ':all:']' returned non-zero exit status 1.
2023-01-23T19:09:44.100805+00:00 app[web.1]: 
2023-01-23T19:09:44.100806+00:00 app[web.1]: During handling of the above exception, another exception occurred:
2023-01-23T19:09:44.100806+00:00 app[web.1]: 
2023-01-23T19:09:44.100806+00:00 app[web.1]: Traceback (most recent call last):
2023-01-23T19:09:44.100806+00:00 app[web.1]: File "/usr/local/bin/pangeo-forge-runner", line 8, in <module>
2023-01-23T19:09:44.100807+00:00 app[web.1]: sys.exit(main())
2023-01-23T19:09:44.100807+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_runner/cli.py", line 28, in main
2023-01-23T19:09:44.100808+00:00 app[web.1]: app.start()
2023-01-23T19:09:44.100808+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_runner/cli.py", line 23, in start
2023-01-23T19:09:44.100808+00:00 app[web.1]: super().start()
2023-01-23T19:09:44.100808+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/traitlets/config/application.py", line 473, in start
2023-01-23T19:09:44.100809+00:00 app[web.1]: return self.subapp.start()
2023-01-23T19:09:44.100809+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_runner/commands/bake.py", line 176, in start
2023-01-23T19:09:44.100810+00:00 app[web.1]: result = pipeline.run()
2023-01-23T19:09:44.100810+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/pipeline.py", line 547, in run
2023-01-23T19:09:44.100810+00:00 app[web.1]: return Pipeline.from_runner_api(
2023-01-23T19:09:44.100810+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/pipeline.py", line 574, in run
2023-01-23T19:09:44.100810+00:00 app[web.1]: return self.runner.run_pipeline(self, self._options)
2023-01-23T19:09:44.100811+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 475, in run_pipeline
2023-01-23T19:09:44.100811+00:00 app[web.1]: artifacts = environments.python_sdk_dependencies(options)
2023-01-23T19:09:44.100811+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/transforms/environments.py", line 799, in python_sdk_dependencies
2023-01-23T19:09:44.100811+00:00 app[web.1]: return stager.Stager.create_job_resources(
2023-01-23T19:09:44.100811+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/runners/portability/stager.py", line 310, in create_job_resources
2023-01-23T19:09:44.100812+00:00 app[web.1]: Stager._create_beam_sdk(sdk_remote_location, temp_dir))
2023-01-23T19:09:44.100812+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/runners/portability/stager.py", line 822, in _create_beam_sdk
2023-01-23T19:09:44.100812+00:00 app[web.1]: sdk_local_file = Stager._download_pypi_sdk_package(temp_dir)
2023-01-23T19:09:44.100813+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/runners/portability/stager.py", line 936, in _download_pypi_sdk_package
2023-01-23T19:09:44.100814+00:00 app[web.1]: processes.check_output(cmd_args)
2023-01-23T19:09:44.100814+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/utils/processes.py", line 94, in check_output
2023-01-23T19:09:44.100814+00:00 app[web.1]: raise RuntimeError( \
2023-01-23T19:09:44.100814+00:00 app[web.1]: RuntimeError: Full traceback: Traceback (most recent call last):
2023-01-23T19:09:44.100814+00:00 app[web.1]: File "/usr/local/lib/python3.9/dist-packages/apache_beam/utils/processes.py", line 89, in check_output
2023-01-23T19:09:44.100814+00:00 app[web.1]: out = subprocess.check_output(*args, **kwargs)
2023-01-23T19:09:44.100814+00:00 app[web.1]: File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
2023-01-23T19:09:44.100815+00:00 app[web.1]: return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2023-01-23T19:09:44.100815+00:00 app[web.1]: File "/usr/lib/python3.9/subprocess.py", line 528, in run
2023-01-23T19:09:44.100815+00:00 app[web.1]: raise CalledProcessError(retcode, process.args,
2023-01-23T19:09:44.100815+00:00 app[web.1]: subprocess.CalledProcessError: Command '['/usr/bin/python3.9', '-m', 'pip', 'download', '--dest', '/tmp/tmpb3mp5hec', 'apache-beam==2.42.0', '--no-deps', '--no-binary', ':all:']' returned non-zero exit status 1.
2023-01-23T19:09:44.100815+00:00 app[web.1]: 
2023-01-23T19:09:44.100815+00:00 app[web.1]: Pip install failed for package: apache-beam==2.42.0
2023-01-23T19:09:44.100819+00:00 app[web.1]: Output from execution of subprocess: b"Collecting apache-beam==2.42.0\n  Downloading apache-beam-2.42.0.zip (2.9 MB)\n     \xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81\xe2\x94\x81 2.9/2.9 MB 30.1 MB/s eta 0:00:00\n  Preparing metadata (setup.py): started\n  Preparing metadata (setup.py): finished with status 'error'\n"

I do not yet have a working theory of what this issue is. I believe we've reached the point where manually fixing these issues, without true integration tests that can reproduce them, is yielding diminishing returns for the project. I am going to open a separate issue on that, and then proceed in that direction.
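As a starting point for reproducing this in isolation, the failing step in the trace above is Beam's `pip download` of its own source distribution. A minimal sketch that rebuilds the same command (arguments copied from the `CalledProcessError` message) and runs it in a temporary directory follows. The function names are hypothetical, and running `reproduce()` requires network access and should be done in the same environment as production for the result to be meaningful.

```python
import subprocess
import sys
import tempfile


def sdk_download_cmd(version, dest, python=sys.executable):
    """Build the pip command that appears in the CalledProcessError above."""
    return [
        python, "-m", "pip", "download",
        "--dest", dest,
        f"apache-beam=={version}",
        "--no-deps", "--no-binary", ":all:",
    ]


def reproduce(version="2.42.0"):
    """Run the download in a temp dir; return (exit status, combined output)."""
    with tempfile.TemporaryDirectory() as dest:
        proc = subprocess.run(
            sdk_download_cmd(version, dest), capture_output=True, text=True
        )
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    status, output = reproduce()
    print(f"exit status: {status}")
    print(output)
```

A non-zero exit status here, independent of any recipe, would suggest the problem lies in the environment rather than in a particular feedstock.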

@cisaacstern
Member Author

cisaacstern commented Jan 23, 2023

As discussed in #225, I'm going to go in a different direction with this, so I'm closing this issue. I meant to close #223 with this comment.
