Support configuration key to toggle using IPs or hostnames#7
astahlman merged 1 commit into data-platform
Conversation
Background

We're patching the socket library in order to use IP addresses instead of hostnames, because Lyft hostnames are special and don't actually resolve. (See https://github.com/lyft/etl/pull/5560 for context)

`service_venv` sets the $PATH such that `/usr/local/lib/service_venv/bin/` takes precedence over `/srv/service/current/bin/`, so it's not enough to simply put a patched version of `airflow` in the ETL repository. We're ensuring that our patched version of `airflow` takes precedence by overwriting `/usr/local/lib/service_venv/bin/airflow` with a script that just calls our patched version at `/srv/service/current/bin/airflow` (from the ETL repository). See this salt-state [1].

That doesn't work anymore, because with the rollout of frozen_venvs we can't count on `airflow` being at a hardcoded path. https://github.com/lyft/etl/pull/6073 was one attempt to solve this by finding the service_venv root dynamically. That approach doesn't work, because it depends on `service_venv` being available by the time the Jinja template is rendered, but `service_venv` won't be available until it's created in the lyft-python state (which in turn requires that the template has already been rendered).

Solution

We can avoid this chicken-and-egg problem if we simply patch the Airflow code directly, rather than trying to overwrite it on deployment. We apply our patch iff the `prefer_ip_over_hostname` key under the `lyft` configuration section is set.

Rollout

On the sharedairflowworker ASG (which is only running test DAGs so far), we will stop overwriting the `airflow` command with our patched version and set the `prefer_ip_over_hostname` key. If this works, then we can take the same approach for the other ASGs. Until then, we will continue overwriting the `airflow` command under service_venv/ with our patched version from ETL.

cc @lyft/data-platform

[1] https://github.com/lyft/etl/blob/master/ops/config/states/etl/init.sls#L50-L59
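The config-gated patch described above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: the `prefer_ip_over_hostname` accessor and the injected `get_private_ip` callable are stand-ins, and the real patch lives inside Airflow's startup path.

```python
import socket


def prefer_ip_over_hostname(conf):
    # Hypothetical accessor: in the real patch this key lives under the
    # [lyft] section of the Airflow configuration.
    return conf.get("lyft", {}).get("prefer_ip_over_hostname", False)


def apply_ip_patch(conf, get_private_ip):
    """Monkey-patch socket.getfqdn to return the instance's private IP
    instead of the (unresolvable) Lyft hostname.

    The patch is applied iff the prefer_ip_over_hostname key is set,
    mirroring the behavior described in the PR.
    """
    if not prefer_ip_over_hostname(conf):
        return False
    socket.getfqdn = lambda name='': get_private_ip(name)
    return True
```

Anything that calls `socket.getfqdn()` after the patch (as much of Airflow does when registering workers) then sees the IP rather than the hostname.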
bhuwanchopra
left a comment
Overall makes sense to me. Minor suggestion
```python
from airflow.bin.cli import CLIFactory


def get_private_ip(name=''):
    r = requests.get("http://169.254.169.254/latest/meta-data/local-ipv4")
```
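Filled out, the helper in the diff might look like the following sketch. The metadata URL and the use of `requests` come from the diff; the injectable `fetch` parameter and the timeout are assumptions added here so the function fails fast off-EC2 and can be exercised in tests.

```python
METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def get_private_ip(name='', fetch=None):
    """Return this instance's private IPv4 from the EC2 metadata service.

    `fetch` is injectable for testing; by default it hits the static
    metadata endpoint with a short timeout so a non-EC2 host errors out
    quickly instead of hanging.
    """
    if fetch is None:
        import requests  # as in the PR's snippet
        fetch = lambda url: requests.get(url, timeout=2).text
    return fetch(METADATA_URL)
```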
Should "http://169.254.169.254/latest/meta-data/local-ipv4" also be part of the configuration?
We could, but I see no reason to make it configurable unless it could change. This is the static address from which EC2 instances vend metadata - see the AWS docs for details.
Wish boto3 had a nice wrapper around this, but boto/boto3#313 has been open for a while. There is `boto.utils.get_instance_metadata()`, though:

```python
>>> import boto.utils
>>> instance_metadata = boto.utils.get_instance_metadata()
>>> print instance_metadata['local-ipv4']
10.0.24.67
```

But that adds a dependency on another library (which may already be available, though).
Overall, it looks like the hostname the node returns:

```
PRODUCTION: amalakar@etl-production-iad-d532cf46:~$ hostname
etl-production-iad-d532cf46
```

doesn't have a DNS entry or an entry in /etc/hosts:

```
PRODUCTION: amalakar@etl-production-iad-d532cf46:~$ host etl-production-iad-d532cf46
Host etl-production-iad-d532cf46 not found: 3(NXDOMAIN)
```
I have seen a lot of systems that depend on the hostname returned by gethostname resolving correctly via a DNS lookup. For example, the Hadoop CLI fails when this isn't satisfied. I think any instance that gets provisioned should have these pieces working; it's an assumption a lot of systems out there make.
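That assumption is straightforward to check programmatically. A minimal sketch (the function name is illustrative, not from the PR):

```python
import socket


def hostname_resolves(hostname):
    """Return True if `hostname` resolves via DNS or /etc/hosts --
    the assumption tools like the Hadoop CLI make, and which the
    Lyft-style hostnames above violate (NXDOMAIN).
    """
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False
```

Running `hostname_resolves(socket.gethostname())` on an etl-production host would surface the problem directly.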
@astahlman should we create a ticket against #provisioning to take care of this? Otherwise we may end up patching/hacking the startup scripts of many other systems.
Yeah, provisioning is aware of it, and unfortunately it seems that this is by design. Lyft uses the hostname to encode some information about the host that's used for monitoring purposes; see the previous discussions in these two Slack threads:
👍 lgtm

💨