Support configuration key to toggle using IPs or hostnames#7
astahlman merged 1 commit into data-platform
Conversation
Background

We're patching the socket library in order to use IP addresses instead of hostnames, because Lyft hostnames are special and don't actually resolve. (See https://github.com/lyft/etl/pull/5560 for context)

`service_venv` sets the $PATH such that `/usr/local/lib/service_venv/bin/` takes precedence over `/srv/service/current/bin/`, so it's not enough to simply put a patched version of `airflow` in the ETL repository. We're ensuring that our patched version of `airflow` takes precedence by overwriting `/usr/local/lib/service_venv/bin/airflow` with a script that just calls our patched version at `/srv/service/current/bin/airflow` (from the ETL repository). See this salt-state [1].

That doesn't work anymore, because with the rollout of frozen_venvs we can't count on `airflow` being at a hardcoded path. https://github.com/lyft/etl/pull/6073 was one attempt to solve this by finding the service_venv root dynamically. That approach doesn't work, because it depends on `service_venv` being available by the time the Jinja template is rendered, but `service_venv` won't be available until it's created in the lyft-python state (which in turn requires that the template has already been rendered).

Solution

We can avoid this chicken-and-egg problem if we simply patch the Airflow code directly, rather than trying to overwrite it on deployment. We apply our patch iff the `prefer_ip_over_hostname` key under the `lyft` configuration section is set.

Rollout

On the sharedairflowworker ASG (which is only running test DAGs so far), we will stop overwriting the `airflow` command with our patched version and set the `prefer_ip_over_hostname` key. If this works, then we can take the same approach for the other ASGs. Until then, we will continue overwriting the `airflow` command under service_venv/ with our patched version from ETL.

cc @lyft/data-platform

[1] https://github.com/lyft/etl/blob/master/ops/config/states/etl/init.sls#L50-L59
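The config-gated patch described above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: the `prefer_ip_over_hostname` accessor and the injected `get_private_ip` callable are stand-ins, and the real patch lives inside Airflow's startup path.

```python
import socket


def prefer_ip_over_hostname(conf):
    # Hypothetical accessor: in the real patch this key lives under the
    # [lyft] section of the Airflow configuration.
    return conf.get("lyft", {}).get("prefer_ip_over_hostname", False)


def apply_ip_patch(conf, get_private_ip):
    """Monkey-patch socket.getfqdn to return the instance's private IP
    instead of the (unresolvable) Lyft hostname.

    The patch is applied iff the prefer_ip_over_hostname key is set,
    mirroring the behavior described in the PR.
    """
    if not prefer_ip_over_hostname(conf):
        return False
    socket.getfqdn = lambda name='': get_private_ip(name)
    return True
```

Anything that calls `socket.getfqdn()` after the patch (as much of Airflow does when registering workers) then sees the IP rather than the hostname.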
bhuwanchopra
left a comment
Overall makes sense to me. Minor suggestion
```python
from airflow.bin.cli import CLIFactory


def get_private_ip(name=''):
    r = requests.get("http://169.254.169.254/latest/meta-data/local-ipv4")
```
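Filled out, the helper in the diff might look like the following sketch. The metadata URL and the use of `requests` come from the diff; the injectable `fetch` parameter and the timeout are assumptions added here so the function fails fast off-EC2 and can be exercised in tests.

```python
METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def get_private_ip(name='', fetch=None):
    """Return this instance's private IPv4 from the EC2 metadata service.

    `fetch` is injectable for testing; by default it hits the static
    metadata endpoint with a short timeout so a non-EC2 host errors out
    quickly instead of hanging.
    """
    if fetch is None:
        import requests  # as in the PR's snippet
        fetch = lambda url: requests.get(url, timeout=2).text
    return fetch(METADATA_URL)
```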
Should "http://169.254.169.254/latest/meta-data/local-ipv4" also be part of the configuration?
We could, but I see no reason to make it configurable unless it could change. This is the static address from which EC2 instances vend metadata - see the AWS docs for details.
Wish boto3 had a nice wrapper around this, but boto/boto3#313 has been open for a while. There is `boto.utils.get_instance_metadata()`, though:

```python
>>> import boto.utils
>>> instance_metadata = boto.utils.get_instance_metadata()
>>> print instance_metadata['local-ipv4']
10.0.24.67
```

But that adds a dependency on another library (which may already be available, though).
Overall, it looks like the hostname the node returns:

```
PRODUCTION: amalakar@etl-production-iad-d532cf46:~$ hostname
etl-production-iad-d532cf46
```

doesn't have a DNS entry or an entry in /etc/hosts:

```
PRODUCTION: amalakar@etl-production-iad-d532cf46:~$ host etl-production-iad-d532cf46
Host etl-production-iad-d532cf46 not found: 3(NXDOMAIN)
```
I have seen a lot of systems that depend on the hostname returned by gethostname resolving correctly via a DNS lookup. For example, the Hadoop CLI fails when this isn't satisfied. I think any instance that gets provisioned should have these pieces working; it's an assumption a lot of systems out there make.
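That assumption is straightforward to check programmatically. A minimal sketch (the function name is illustrative, not from the PR):

```python
import socket


def hostname_resolves(hostname):
    """Return True if `hostname` resolves via DNS or /etc/hosts --
    the assumption tools like the Hadoop CLI make, and which the
    Lyft-style hostnames above violate (NXDOMAIN).
    """
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False
```

Running `hostname_resolves(socket.gethostname())` on an etl-production host would surface the problem directly.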
@astahlman should we create a ticket against #provisioning to take care of this? Otherwise we may end up patching/hacking the startup scripts of many other systems.
Yeah, provisioning is aware of it, and unfortunately it seems that this is by design. Lyft uses the hostname to encode some information about the host that's used for monitoring purposes; see the previous discussions in these two Slack threads:
👍 lgtm

💨