$PATH does not contain paths defined on /etc/environment #3461

Closed
2 of 7 tasks
marcuslopes opened this issue Jul 14, 2021 · 46 comments
@marcuslopes

marcuslopes commented Jul 14, 2021

Agent Version and Platform

Version of your agent? 2.187.2, 2.188.3

OS of the machine running the agent? Ubuntu 18.04, Ubuntu 20.04

Azure DevOps Type and Version

dev.azure.com (Cloud)

What's not working?

This issue was created at the request of the contributors to the actions/virtual-environments repository (see
actions/runner-images#3695), who said it is probably related to the agent and not to the images themselves.

Description

When a task runs on a self-hosted Ubuntu build agent (in either a manual or an automatic scale set), echo $PATH returns a very limited number of paths.
Connecting to the agent via SSH and running echo $PATH directly, however, returns the full range of paths in the PATH variable.

Virtual environments affected

  • Ubuntu 16.04
  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11
  • Windows Server 2016
  • Windows Server 2019

Image version and build link

(using the https://github.com/actions/virtual-environments repository)

Ubuntu 18
Image version: releases/ubuntu18/20210606 commit 58b026cedf2363aee66fcdde3981b09704d5bd79
Agent version: 2.187.2

Ubuntu 20
Image version: releases/ubuntu20/20210606 commit a26b241d4791b9af60f069b29c6e993595d75349
Agent version: 2.188.3

Packer: 1.6.2, 1.7.2

Expected behavior

Running echo $PATH in a shell script task on a self-hosted Ubuntu agent should return all of the paths defined in the /etc/environment file:

PATH=/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:$HOME/.local/bin:/opt/pipx_bin:/usr/share/rust/.cargo/bin:$HOME/.config/composer/vendor/bin:/usr/local/.ghcup/bin:$HOME/.dotnet/tools:/snap/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

Actual behavior

Running echo $PATH in a shell script task on a self-hosted Ubuntu agent returns a very limited number of paths:

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
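To see exactly which entries are missing inside a pipeline step, the PATH line of /etc/environment can be compared with the $PATH the step actually received. A minimal bash sketch, assuming the file holds a `PATH=...` line (optionally double-quoted) and that a literal `$HOME` in it should be read as the current user's home directory; `env_file_path` and `missing_path_entries` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Print the PATH value from an /etc/environment-style file
# (last PATH= line wins, optional surrounding double quotes removed).
env_file_path() {
  sed -n -E 's/^[[:space:]]*PATH=[[:space:]]*"?([^"]*)"?[[:space:]]*$/\1/p' "$1" | tail -n 1
}

# Print each entry declared in the file that is absent from the current $PATH.
missing_path_entries() {
  local file="${1:-/etc/environment}" entry
  local -a entries=()
  IFS=':' read -r -a entries <<< "$(env_file_path "$file")"
  for entry in "${entries[@]}"; do
    entry=${entry//\$HOME/$HOME}   # assumption: expand a literal $HOME
    [[ -n "$entry" ]] || continue
    case ":$PATH:" in
      *":$entry:"*) ;;                 # already on PATH
      *) printf '%s\n' "$entry" ;;     # declared in the file but missing
    esac
  done
}

# Example, in a script step: missing_path_entries /etc/environment
```

Run in a failing pipeline step and again over SSH, the two outputs should show which directories the agent never picked up.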

If I perform the following steps, however, $PATH is filled as expected:

  1. Connect to the virtual machine
  2. Open the agent folder /opt/vsts/a1
  3. Run the following commands:
./env.sh
sudo ./svc.sh stop
sudo ./svc.sh start

Repro steps

  1. Build a VM image using the Ubuntu image versions provided
  2. Create a virtual machine on a scaleset using the newly generated image (either manual or automatic scaleset via Azure DevOps)
  3. (Manual scale set only) Install a VSTS agent on the virtual machine using a release from the VSTS agent repo (https://github.com/microsoft/azure-pipelines-agent)
  4. Create a new pipeline (e.g. YAML) and run the script echo $PATH

Agent and Worker's Diagnostic Logs

Agent_20210714-172015-utc.log
Worker_20210714-155257-utc.log

@marcuslopes
Author

marcuslopes commented Jul 21, 2021

Hi @vladislav-ryzhov !

Thanks for your help with this issue. Please let me know if I can help with any further information.

@ChristopheLav

Any news on this issue?

We are continuing to investigate, but since PATH is provisioned during VM image generation, the /etc/environment file is already in the correct state when the agent is automatically installed at first startup by the VMSS extension created by ADO itself (we use the native Scale Set agents feature).

Here is my understanding:

  • This script is used by the VMSS extension to install the agent (link)
  • The config.sh script is used, which calls the env.sh script
  • The env.sh script creates the .path file with incorrect values from $PATH

This is strange, because if we create a custom script extension that runs a script to read the value of $PATH, we get the right values. So the extension seems able to run a script in a properly initialized context, but the ADO VMSS extension apparently does not. Maybe the issue is there?

Also, the Microsoft-hosted pools, which use the same base images as we do, don't have this issue. Using some commands, I checked the contents of the agent directory and saw some differences from the official agent. One difference: no .path or .env files seem to be created in the root directory.

What are the differences? How can we reproduce the same result?

As a workaround, we recommend that our users use the ##vso[task.prependpath]local directory path command in a script task to add the missing values when they are affected, but it's not optimal.
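The prependpath workaround can be driven from /etc/environment instead of a hand-maintained list. A sketch in bash, assuming a `PATH=...` line (optionally double-quoted) and assuming each logging command puts its directory in front of those emitted earlier, so the list is emitted in reverse to keep the file's order. `emit_prependpath` is a hypothetical name; `##vso[task.prependpath]` only affects the steps that run after the one emitting it:

```bash
#!/usr/bin/env bash
# Emit one ##vso[task.prependpath] logging command per PATH entry declared
# in an /etc/environment-style file. The agent applies them to later steps.
emit_prependpath() {
  local file="${1:-/etc/environment}" value entry i
  local -a entries=()
  # Last PATH= line wins; optional surrounding double quotes are stripped.
  value=$(sed -n -E 's/^[[:space:]]*PATH=[[:space:]]*"?([^"]*)"?[[:space:]]*$/\1/p' "$file" | tail -n 1)
  IFS=':' read -r -a entries <<< "$value"
  # Walk backwards: each prepend goes to the front, so the file's first
  # entry is emitted last (assumption about the ordering semantics).
  for (( i = ${#entries[@]} - 1; i >= 0; i-- )); do
    entry=${entries[i]//\$HOME/$HOME}   # assumption: expand a literal $HOME
    if [[ -n "$entry" ]]; then
      echo "##vso[task.prependpath]$entry"
    fi
  done
  return 0
}

# Usage in an early script step of the job:
#   emit_prependpath /etc/environment
```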

Thanks!

@kuleshovilya kuleshovilya self-assigned this Sep 23, 2021
@kuleshovilya
Contributor

Hello @marcuslopes @ChristopheLav
So, as far as we have tested, it loads /etc/environment just fine, assuming the agent was started after the environment was provisioned. I would assume there is something incorrect with the provisioning on your system. Have you considered that:

  1. /etc/environment is loaded into the PATH variable at login, and that might affect the visibility
  2. You could use source /etc/environment to read it
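For the `source /etc/environment` suggestion, sourcing with `set -a` also exports every variable the file assigns, so tools launched from the same script inherit them. A minimal sketch; it assumes the file is valid shell syntax, which is true for simple `KEY=value` lines, whereas /etc/environment is normally read by pam_env rather than by a shell, so unusual lines may be interpreted differently. The effect lasts only for the current shell, i.e. one script step. `reload_env_file` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Re-read an /etc/environment-style file into the current shell and export
# everything it assigns (set -a), so child processes see the new values.
reload_env_file() {
  local file="${1:-/etc/environment}"
  if [[ ! -r "$file" ]]; then
    echo "cannot read $file" >&2
    return 1
  fi
  set -a
  # shellcheck disable=SC1090
  . "$file"
  set +a
}

# Usage at the top of a script step:
#   reload_env_file /etc/environment && echo "$PATH"
```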

Could you please describe the deployment process in greater detail, so we can understand where the issue is?
Also, you can ping me at v-ikuleshov@microsoft.com

@AlexrDev

AlexrDev commented Sep 27, 2021

Hi,

We've been having a similar issue on our VMSS agents since approximately last Thursday.

Our setup is (roughly):

  • VMSS with a custom script extension which adds VSTS_AGENT_INPUT_WORK and PATH to /etc/environment
  • Devops pipeline agent installation
  • MMAExtension installation

The VSTS_AGENT_INPUT_WORK variable is being read correctly (agent sets this to an attached disk), it's just the PATH that's not being set correctly anymore (it's being set to the same value as secure_path in /etc/sudoers iirc)

@kuleshovilya
Contributor

@AlexrDev I'm pretty sure that's a different issue from the one @marcuslopes and @ChristopheLav have, though it doesn't seem to be an agent issue if setting the variable is the problem.

@Jean-FrancoisBeaudet

Jean-FrancoisBeaudet commented Sep 28, 2021

Hi @kuleshovilya,

We use the ‘Azure Virtual Machine Scale Set’ agent pool type. There is nothing custom on our part in the VMSS extensions. The extension in place is entirely configured by Azure DevOps itself (extension name: Microsoft.Azure.DevOps.Pipelines.Agent).
Just like @AlexrDev mentioned, we also noticed that the path provisioned in the /agent/.path file is the same as the value of secure_path in /etc/sudoers.
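One way to confirm this on an affected instance is to compare the agent's `.path` with sudo's `secure_path`. A bash sketch; the file locations are parameters (`/etc/sudoers` and `/agent/.path` are the ones mentioned in this thread), only a top-level `Defaults secure_path=...` line is recognised, and reading `/etc/sudoers` normally requires root. `secure_path_of` and `path_matches_secure_path` are hypothetical names:

```bash
#!/usr/bin/env bash
# Print the value of the last "Defaults secure_path=..." line (quotes removed).
secure_path_of() {
  sed -n -E 's/^[[:space:]]*Defaults[[:space:]]+secure_path[[:space:]]*=[[:space:]]*"?([^"]*)"?[[:space:]]*$/\1/p' "$1" | tail -n 1
}

# Return 0 if the captured PATH file equals secure_path, 1 otherwise.
path_matches_secure_path() {
  local sudoers="$1" path_file="$2" secure captured
  secure=$(secure_path_of "$sudoers")
  captured=$(tr -d '\n' < "$path_file")
  if [[ -n "$secure" && "$captured" == "$secure" ]]; then
    echo ".path matches secure_path: $secure"
  else
    echo ".path differs from secure_path"
    return 1
  fi
}

# Usage (as root): path_matches_secure_path /etc/sudoers /agent/.path
```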

In our process to create the generalized image used in our VMSS, we first build a generalized image with Packer and push it to an image gallery.

From there,

  • Create a specialized VM
  • Run the post-generation scripts (as recommended in the Microsoft documentation) via a custom script extension installed on the VM.
  • Proceed with the generalization and versioning:
      • Run the command sudo waagent -deprovision
      • Shut down the VM
      • Generalize the VM
      • Capture the VM
  • Use this image as a reference in our VMSS

We finally found a way to fix the current issue (as a workaround)

  • Create the AzDevOps user beforehand, using parts of the enableagent scripts as inspiration (this additional step is done on the specialized VM BEFORE running the post-generation scripts).
  • Modify the sudoers file so that secure_path equals the PATH value in /etc/environment (this additional step is done on the specialized VM AFTER running the post-generation scripts).

Here is the custom script run by the extension on the specialized VM:

#!/bin/bash

sudo useradd -m AzDevOps
sudo usermod -a -G docker AzDevOps
sudo usermod -a -G adm AzDevOps
sudo usermod -a -G sudo AzDevOps

sudo chmod -R +r /home
sudo setfacl -Rdm "u:AzDevOps:rwX" /home
sudo setfacl -Rb /home/AzDevOps
sudo echo 'AzDevOps ALL=NOPASSWD: ALL' >> /etc/sudoers

sudo su AzDevOps

# Run the post-generation scripts
sudo find /opt/post-generation -mindepth 1 -maxdepth 1 -type f -name '*.sh' -exec bash {} \;

# Remove the secure_path line
sed -i.bak '/secure_path/d' /etc/sudoers

# Add the secure_path with the /etc/environment path
pathFromEnv=$(cut -d= -f2 /etc/environment | tail -1)
echo "Defaults secure_path=\"$pathFromEnv\"" >> /etc/sudoers
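The `cut -d= -f2 /etc/environment | tail -1` line assumes PATH is the last line of the file and that its value contains no further `=`. A slightly more defensive extraction, sketched in bash under the assumption that the PATH line looks like `PATH=...` with optional double quotes (`path_from_env_file` is a hypothetical name):

```bash
#!/usr/bin/env bash
# Extract the PATH value from an /etc/environment-style file wherever the
# line sits, keeping any '=' inside the value and dropping optional quotes.
path_from_env_file() {
  sed -n -E 's/^[[:space:]]*PATH=[[:space:]]*"?([^"]*)"?[[:space:]]*$/\1/p' \
    "${1:-/etc/environment}" | tail -n 1
}

# Drop-in for the script above:
#   pathFromEnv=$(path_from_env_file /etc/environment)
#   echo "Defaults secure_path=\"$pathFromEnv\"" >> /etc/sudoers
```

For example, if another variable such as VSTS_AGENT_INPUT_WORK is appended after PATH, the original pipeline returns that variable's value instead of the PATH. Running `visudo -c` after editing is a cheap way to catch a malformed sudoers file.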

We tried various ways of creating files in /etc/sudoers.d/ and modifying env_keep and env_reset, disabling secure_path, etc. Although the behavior seemed OK when connecting to the specialized VM, and secure_path was either removed or replaced in the sudoers.d files, the Azure extension always wrote the default secure_path into .path on the resulting VMSS.

Forcing the right path in the secure_path is the only thing that worked, which points to a weird behavior when using azure extensions on a vm/vmss.

When we use this new image version for the vmss, the problem is fixed.

*This is a workaround and not something we would consider an official fix, since it modifies the sudoers file directly.

We also noticed a difference on the machines from the Azure Pipelines pools. The configuration for the vsts user (vsts ALL=(ALL) NOPASSWD:ALL) is not in the sudoers file but in a file named vsts in the /etc/sudoers.d/ directory. The Azure DevOps extension adds this configuration for the AzDevOps user directly to the sudoers file. There seem to be slight differences between the way the vsts user and the AzDevOps user are configured. Maybe this is part of the reason we have this issue.

Thank you for your support on this issue.

By the way I work in the same team as @marcuslopes and @ChristopheLav

@kuleshovilya
Contributor

@Jean-FrancoisBeaudet @marcuslopes @ChristopheLav As I said above, that doesn't seem to be agent related; it has more to do with the way your setup works. What you do with the custom script is just a way to force /etc/environment through the sudoers file.
It goes like:

  1. Read /etc/environment on login and load it into PATH variable
  2. Read PATH variable on agent start
  3. So, echo $PATH will display the variable that you had at the start of the vm, all the changes after it won't be read.
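Step 2 can be seen with a small experiment: a process gets PATH from whatever starts it, and nothing re-reads /etc/environment unless a PAM session (such as an interactive login) or an explicit `source` does. A bash sketch that uses `env -i` to start a child with an empty environment plus a chosen PATH (`show_inherited_path` is just an illustrative name):

```bash
#!/usr/bin/env bash
# Start a child bash with an empty environment except for the given PATH
# and print what the child sees. /etc/environment is not consulted.
show_inherited_path() {
  env -i PATH="$1" bash -c 'printf "%s\n" "$PATH"'
}

# Prints the secure_path-style value back unchanged.
show_inherited_path "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"
```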

But, you can make changes to PATH afterwards in your pipelines. Re-reading /etc/environment is possible with something like source or bash-script.
Sorry, but I struggle to understand the issue there from the explanation, maybe you could rephrase it? Otherwise it just looks like an issue with bash's PATH reading priority.

@ChristopheLav

Thank you for your reply @kuleshovilya

@Jean-FrancoisBeaudet describes what we do for building the master image that we use in an Azure Virtual Machine Scale Set. As indicated by @marcuslopes, we use the official master image sources in https://github.com/actions/virtual-environments.

To manage the Scale Set, we use the native Azure DevOps Scale Sets feature. As you can read in the Lifecycle of a Scale Set Agent section, the installation of the agent is managed directly by Microsoft with its own extension (automatically injected into the Scale Set by ADO). The documentation shows the script used (which is not customizable). We currently have no custom script, because we prefer to do the customization while building the master image to keep good startup performance, as we want to use the tear-down-after-each-use feature at some point.

Maybe we can try adding a Custom Script Extension with what you suggest (source). But the issue still occurs with the master image, so I think the Microsoft script/logic should be updated to work correctly in this case, if it is related to how the agent is installed by the Microsoft script.

I hope it's clearer now.

@kuleshovilya
Contributor

Hmm, yes, a bit clearer, but sadly I don't think that would be changed for the images, since this behaviour is expected from Bash. So I would recommend just adding a step with source to your pipelines; I've tested it and it seems to work fine for your case.

@ChristopheLav

ChristopheLav commented Sep 29, 2021

Putting this command into our pipelines to work around a global agent installation issue (maybe caused by Bash) would affect all our developers' pipelines (not just a few). And it's only a workaround for now.

Because this issue affects the use of the master images (from Microsoft) with a native ADO feature (from Microsoft), I think the installation script (from Microsoft, https://vstsagenttools.blob.core.windows.net/tools/ElasticPools/Linux/6/enableagent.sh) should be updated to fix it. This is the cleanest way, I think. We can't do anything on our side for this one.

Another consideration: the native Azure Pipelines pools (managed entirely by Microsoft) don't have this issue and don't require adding the source command. So what is the magic used by Microsoft here? Is it possible for you to find out?

@kuleshovilya
Contributor

I don't think enableagent.sh requires updating either, as this case is specific to you.
Once again, since we are talking about the PATH variable in Bash being rewritten and read in real time, a customized approach is expected to be needed.

@ChristopheLav

ChristopheLav commented Sep 30, 2021

I don't agree with the statement "it's specific for you". We use this Microsoft documentation without any customization 🙂

For Windows agents, all is/seems good ✔

For the Linux agent, $PATH is provisioned with the values from /etc/sudoers instead of /etc/environment.

The agent installation is done on the Microsoft side (see the documentation below) and it does not work out of the box. I think there is an issue here (maybe not related to the agent itself), but @marcuslopes was redirected to this repo from actions/runner-images#3695

We have done one more test on our side. We added a Custom Script Extension on the VMSS that:

  • Reads the $PATH variable (values from /etc/sudoers are provisioned)
  • Executes the source command
  • Reads the $PATH variable again (values from /etc/sudoers are provisioned, not from /etc/environment)
  • If we connect to the VM as the user, $PATH is correctly provisioned.

This test shows the potential issue with the extension on the VMSS (note: Microsoft uses an extension to install the agent). As I indicated at the beginning of my reply, we use the Microsoft documentation without any customization. So the feature (Elastic Pools) does not work out of the box with Linux images, possibly because of a specific extension issue.

Microsoft doesn't have this issue with its own hosted pools, so there are two possibilities:

  • You don't use the Elastic Pools process internally
  • You use the Elastic Pools process internally but you have customized something

Can you find the missing information on your side, or redirect the issue to the team that is responsible for the Elastic Pools feature?

Thank you!

@kuleshovilya
Contributor

kuleshovilya commented Oct 1, 2021

@ChristopheLav With statement "it's specific for you" I mean that it's something that you specifically require and it can be fixed locally, so we are not sure that global change is required.
As for the execution of commands, have you executed just source or source /etc/environment?
The reason I'm hesitant about any changes is that, although I can reproduce the issue, it's hardly unexpected, since it's the way Bash reads env variables, and it's easily fixable from the pipeline perspective.

@ChristopheLav

ChristopheLav commented Oct 1, 2021

I don't agree with that @kuleshovilya 🙂

The Ubuntu (18.04 and 20.04) images in the actions/virtual-environments repo are provisioned with values in /etc/environment that are required for the tools in the image to work correctly (no customization from us), as documented. Also, as I said previously, the Microsoft-hosted pools (Azure Pipelines) don't have this issue: the $PATH variable contains all the right values coming from /etc/environment without any command or workaround like source. You can test this yourself with a simple echo $PATH using a Linux agent from Azure Pipelines. As a result, all tools work correctly on Azure Pipelines agents. The images used by Microsoft are the same and also come from actions/virtual-environments.

Also, my last test clearly shows the issue:

  • I connect to the VM and run echo $PATH - values are good ✅
  • I run the same command in a VMSS Custom Script Extension - values are not loaded ❌

With the Elastic Pools feature, the agent is automatically installed with a VMSS extension (it's not customizable) and the issue occurs. As a result, some tools in the Microsoft images that require the right values in $PATH do not work as expected (PIPX, for example).

Since Microsoft doesn't have this issue with its own pools, Microsoft either doesn't use the same Elastic Pools feature or has a custom undocumented step. Can you request some help from the team responsible for the Elastic Pools feature if this isn't directly related to the agent?

Maybe @MaksimZhukov can join the discussion and help you better understand (he redirected us to this repo from the original issue actions/runner-images#3695)

@catthehacker

Also, my last test clearly shows the issue:

  • I connect to the VM and run echo $PATH - values are good ✅
  • I run the same command in a VMSS Custom Script Extension - values are not loaded ❌

Sounds like PAM doing its work during logon. You could try verifying whether $PATH is available after a logon.

@ChristopheLav

ChristopheLav commented Oct 1, 2021

If I manually connect to the VM instance that previously ran the VMSS Custom Script Extension, yes, the values in $PATH are correct.

The agent is injected automatically by MS with an extension (not customizable), and the $PATH values are only read during installation (so in the context of the extension, where the values do not seem to be loaded).

@kuleshovilya
Contributor

  • I connect to the VM and run echo $PATH - values are good ✅
  • I run the same command in a VMSS Custom Script Extension - values are not loaded ❌

That's exactly what I said in my first reply in this thread: /etc/environment is loaded into PATH at logon, which is why it works, and the source command is able to load it without a logon. Once again, those tests don't show the issue.

@ChristopheLav

That's exactly what I said in my first reply in this thread: /etc/environment is loaded into PATH at logon, which is why it works, and the source command is able to load it without a logon. Once again, those tests don't show the issue.

And my answer is: we use the official Elastic Pools ADO feature that is documented here. The documentation says to build and use the images from actions/virtual-environments; the Virtual Machine Scale Set is then automatically managed by the ADO feature (so by Microsoft), which automatically injects an extension to install the agent when a VM instance is created.

If I connect manually to the VM and see $PATH correctly provisioned and the image tools working correctly, I can conclude that the image generation succeeded (I think).

My test demonstrates that the issue is related to the lifecycle of $PATH (I'm potentially OK with the explanation), but the Elastic Pools feature does not work out of the box because of this (with Linux images only). The $PATH variable needs to be provisioned correctly, and the Azure Pipelines pools don't have this issue. So, what is the undocumented thing that Microsoft uses to get $PATH correctly provisioned in their VM instances?

The Elastic Pools feature doesn't have the same behaviour out of the box, and that is the issue. I expect the feature to work correctly out of the box, like the Azure Pipelines agents! In particular because, with the Elastic Pools feature, the installation is managed automatically by Microsoft (with no possibility of customization).

Having to put a source command in each pipeline is not a valid solution (only a temporary workaround), because that is not how the agents work in the Microsoft-hosted pools or when I install an agent manually on a VM. The issue only occurs with the Elastic Pools feature and the automatic management that Microsoft has implemented.

@catthehacker

Sorry, I meant if you try sudo login <user> or sudo login -f <user>

@ChristopheLav

ChristopheLav commented Oct 1, 2021

Sorry, I meant if you try sudo login <user> or sudo login -f <user>

Yes if I'm logon on the VM instance manually.
No if I use a VMSS Custom Script Extension.

These are my tests and the associated results:

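| From | Commands | Result |
|---|---|---|
| VMSS Custom Script Extension | echo $PATH | (image not preserved) |
| VMSS Custom Script Extension | sudo login -f AzDevOps, then echo $PATH | (image not preserved) |
| Manual login to the VM instance | echo $PATH | (image not preserved) |
| Manual login to the VM instance | sudo login -f AzDevOps, then echo $PATH | (image not preserved) |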

This is the script used for the tests:

# Create our user account (same as MS)
echo creating AzDevOps account
sudo useradd -m AzDevOps
sudo usermod -a -G docker AzDevOps
sudo usermod -a -G adm AzDevOps
sudo usermod -a -G sudo AzDevOps

echo "Giving AzDevOps user access to the '/home' directory"
sudo chmod -R +r /home
setfacl -Rdm "u:AzDevOps:rwX" /home
setfacl -Rb /home/AzDevOps
echo 'AzDevOps ALL=NOPASSWD: ALL' >> /etc/sudoers

# Diagnostics PATH issue (custom part)
mkdir -p "/opt/dbg"
{ whoami; echo $PATH; } > /opt/dbg/1.txt
sudo login -f AzDevOps
{ whoami; echo $PATH; } > /opt/dbg/2.txt

The AzDevOps account is created like what Microsoft does in enableagent.sh

I noticed that in a VMSS Custom Script Extension, the command sudo login -f user does nothing: the logged-in user remains root. So I decided to use the same method Microsoft uses to install the agent (see the previous link):

sudo runuser AzDevOps -c "{ echo \$PATH; } > /opt/dbg/3.txt"

These are the results:

| From | Result |
|---|---|
| VMSS Custom Script Extension | (image not preserved) |
| Manual login to the VM instance | (image not preserved) |

This time I got the same results whether I log on manually to the VM instance or the script is executed in a VMSS Custom Script Extension.

I modified the command slightly to include the source command, as you suggested @kuleshovilya:

sudo runuser AzDevOps -c "{ source /etc/environment; echo \$PATH; } > /opt/dbg/4.txt"

These are the results:

| From | Result |
|---|---|
| VMSS Custom Script Extension | (image not preserved) |
| Manual login to the VM instance | (image not preserved) |

As I understand from this conversation and some research: the configuration in /etc/sudoers is used to apply some policies when sudo and runuser are used. When secure_path is defined, the $PATH variable is not loaded from the machine's global environment but is set to the specific values from secure_path. Explicitly loading the variables with the source command is required.

Going back to how the agent installation is done:

  • Call the config.sh script
  • The script calls env.sh, which reads the values from the $PATH variable and puts them into a .path file

Remember that the Elastic Pools feature automatically installs the agent with a VMSS extension, and this is not customizable by users: https://vstsagenttools.blob.core.windows.net/tools/ElasticPools/Linux/6/enableagent.sh

For me, the issue comes down to a couple of things:

  • During installation, the agent assumes that the $PATH variable is always up to date and reads it without any way to force a reload from /etc/environment
  • The Elastic Pools feature installs the agent without calling source /etc/environment to make sure it gets the proper values
  • The Elastic Pools feature installs the agent with the sudo command sudo runuser AzDevOps -c "......"
  • The team on @actions/virtual-environments has provisioned required values in /etc/environment for some months, and the Elastic Pools feature refers to this repo to get the master images, but the installation process is not (currently) compatible with that

It seems like a deadlock, because I can't customize anything with the Elastic Pools feature, which does not work correctly out of the box due to decisions by different teams.

I think some changes are required:

  • Update env.sh to include the command source /etc/environment (as an option?)
  • Update enableagent.sh to include the command source /etc/environment before starting the agent installation
  • Or a mix of the two previous changes
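What the proposed changes amount to can be sketched as a hypothetical helper that regenerates an agent's `.path` from /etc/environment, sourcing it in a subshell so a literal `$HOME` expands the way `source` would expand it. This is not part of env.sh or enableagent.sh; the agent directory is a parameter (`/agent` is the location reported in this thread), and if the file sets no PATH, the caller's current PATH is written unchanged:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write <agent_dir>/.path from the PATH that an
# /etc/environment-style file defines, without touching the caller's shell.
write_agent_path() {
  local agent_dir="$1" env_file="${2:-/etc/environment}" value
  [[ -d "$agent_dir" && -r "$env_file" ]] || return 1
  value=$(
    # Subshell: the sourced assignments do not leak into the caller.
    set -a
    # shellcheck disable=SC1090
    . "$env_file"
    printf '%s' "$PATH"
  )
  [[ -n "$value" ]] || return 1
  printf '%s\n' "$value" > "$agent_dir/.path"
}

# Usage (sketch, as the agent user, before the agent starts):
#   write_agent_path /agent /etc/environment
```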

Let me know if you see something else.

@kuleshovilya
Contributor

While we can't talk for the team that manages enableagent.sh, we won't be adding source command to env.sh for sure.
Once again, please check your /etc/environment setup and usage; the way you describe it working is fine and is how it's supposed to work. So far, we can only recommend adding source to your pipelines or something else that customizes the behaviour.

@ChristopheLav

Once again, please check your /etc/environment setup and usage; the way you describe it working is fine and is how it's supposed to work. So far, we can only recommend adding source to your pipelines or something else that customizes the behaviour.

It's not a valid workaround.

It's not required with Azure Pipelines, so why is it required with Elastic Pools? Because MS uses the same master images, there must be a custom undocumented configuration step here. I would like to know this step.

If no one adapts something, Elastic Pools won't work out of the box. The documentation needs to be updated to indicate this MAJOR point!

Even if it's working as Linux expects, if the Elastic Pools feature doesn't work, there is something to fix on your (MS) end. I don't understand why you don't want to push the issue to the right team if it's not related to you. I really don't understand that point.

@MaksimZhukov the design of the master images that use /etc/environment is not working with the Elastic Pools.

@kuleshovilya
Contributor

kuleshovilya commented Oct 4, 2021

@ChristopheLav This is not unexpected or incorrect behaviour, so we are not sure which department we should push this to. The issue is not in the setup or anything like that; the way you are trying to use /etc/environment itself is not the way it's supposed to be used, so you will need to customize your pipelines.
Regarding the differences between hosted agents and self-hosted ones, we recommend opening a ticket on the AzDO portal here

@ChristopheLav

@ChristopheLav This is not unexpected or incorrect behaviour, so we are not sure which department we should push this to. The issue is not in the setup or anything like that; the way you are trying to use /etc/environment itself is not the way it's supposed to be used, so you will need to customize your pipelines.

Like I said previously, I don't want to use it this way! It's not my choice: it's used by the master images themselves, which are maintained by Microsoft! We opened a ticket on their repo and @MaksimZhukov redirected us here (see the link in the first post).

You can post an update on the original issue. This can help 😉

Regarding the differences between hosted agents and self-hosted ones, we recommend opening a ticket on the AzDO portal here

We speak about the Elastic Pools feature... not manually self hosted agents. This is not the same!

@kuleshovilya
Contributor

kuleshovilya commented Oct 4, 2021

@ChristopheLav Well yeah, not self-hosted, my bad. But still, this is not something we manage, or that is managed through GitHub, so opening a ticket on AzDO would be the way to learn the difference.
Also, since we do not maintain the master images, this would require creating an issue on the AzDO portal too.

@ChristopheLav

@ChristopheLav Well yeah, not self-hosted, my bad. But still, this is not something we manage, or managed through GitHub, so opening a ticket on AzDO would be the way to learn the difference.

Ok

Also, since we do not maintain the master images, this would require creating an issue on the AzDO portal too.

We posted an issue here because MS told us to in this parent issue:

actions/runner-images#3695

Can you post a message on the parent issue to indicate that the usage of /etc/environment does not seem to be the right way to do things?

@kuleshovilya
Contributor

@ChristopheLav Sure, left a comment in the discussion

@ChristopheLav

@kuleshovilya Thank you!

@kuleshovilya
Contributor

I'll close the issue for now, will reopen if something arises. Feel free to ping me here or at v-ikuleshov@microsoft.com

@cavemandaveman

Why was this closed? This is still a major issue with self-hosted agents. The PATH is read from /etc/sudoers and not from /etc/environment.

You can't expect us to edit /etc/sudoers in our custom image just for ADO. And adding a step to every pipeline to source /etc/environment is not practical.

The problem lies in the ADO agent extension - and should be fixed there.

@teck-kcheema

This is still the case and should be fixed!

@tonyskidmore

I have been working on a three-part blog series on Azure DevOps Self-Hosted VMSS Agents. While working through it, I looked at this issue again and came up with a workaround that seems to work for me, based on various comments in the history of this issue. Specifically, in the PATH issue section of Part 2, I describe what I did to work around this. I would be interested to get any feedback.

@HoLengZai

HoLengZai commented Sep 17, 2022

Thanks @ChristopheLav, @marcuslopes, @Jean-FrancoisBeaudet
Your investigation saved me a bunch of time.

I think I found a workaround (using the chattr command in the custom script). I know this issue is closed (even though I don't think it should be) and a bit old.
It would be great, and easier, if the Microsoft dev team published the git repo of this extension. It would be much easier to figure out the issue and report it directly to the correct team, as we don't know which dev team developed this VM extension that installs the pipeline agent.
I tried everything to trick the VM extension (especially the file enableagent.sh, which calls config.sh, which "sources" .env, which generates the .path file).

@tonyskidmore, I looked at your workaround, but I agree with @ChristopheLav: we should not change the "sudoers" file.
As mentioned on the official Microsoft repo: https://github.com/actions/runner-images#about
(aka: https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/scale-set-agents?view=azure-devops#where-can-i-find-the-images-used-for-microsoft-hosted-agents)

I run plenty of pipelines on the Microsoft-hosted "Azure Pipelines" pool, and indeed the two images (MS-hosted and the one built with Packer from the official runner-images repo) are the same; however, the way the pipeline agent is deployed is totally different.
Example:
MS-hosted create the user account vsts and it put the agent in /home/vsts/agents/<x.y.z>/
VM pipeline extension create the user AzDevOps and put the agent in /agent/

MS-hosted then runs the pipeline agent as a systemd service.
That's not the case with the VM pipeline extension. I don't really know how it runs, but I don't see any vsts service in /etc/systemd/system/*vsts*

So here is my workaround; I hope it will help others:
I use the custom script extension to create /agent/.path (the file that causes the issue) before the VM pipeline agent extension runs.
I know that the VM pipeline agent extension would otherwise overwrite the /agent/.path file, since the extension runs through the AzDevOps account (in theory), which has superuser rights.
As mentioned in the Microsoft docs, the VM pipeline agent extension always runs last (or at least after the custom script extension).

<-->
By the way, I still don't understand why I get:
$PATH = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
I played with the /etc/sudoers.d folder, the runuser config, .bashrc, .profile and .bash_profile (non-interactive, non-login shell, login shell, etc.); I tried every combination to get the regular $PATH (= /etc/environment).
I always got the secure_path PATH from sudoers.
<-->

Anyway, here is the content of the custom script:

I create the AzDevOps user myself instead of letting the VM pipeline extension do it, because 'enableagent.sh' checks whether the account already exists. The main reason is to make sure the scripts in '/opt/post-generation' run after the AzDevOps user is created, so that /etc/environment contains AzDevOps instead of the default user name or $HOME.
(@tonyskidmore, that's one thing your workaround does not do, so your /etc/environment contains the default user name instead of AzDevOps. I don't know whether that causes any issue, but on the MS-hosted agent the PATH contains the vsts user, not the default user name.)

# copied from enableagent.sh script
sudo useradd -m AzDevOps
sudo usermod -a -G docker AzDevOps
sudo usermod -a -G adm AzDevOps
sudo usermod -a -G sudo AzDevOps

sudo chmod -R +r /home
setfacl -Rdm "u:AzDevOps:rwX" /home
setfacl -Rb /home/AzDevOps

# I do not want to modify the sudoers file as on the MS-hosted agent, so I create the config in sudoers.d
# As recommended in the sudoers man page, I chmod the files to 0440
# NOPASSWD:ALL should be enough too; SETENV can be removed
sudo su -c "echo 'AzDevOps  ALL=(ALL) NOPASSWD:SETENV: ALL' | sudo tee /etc/sudoers.d/02_keep_env_for_AzDevOps && chmod 0440 /etc/sudoers.d/02_keep_env_for_AzDevOps"
# I think this one is useless: the idea was to disable env_reset for this particular user (AzDevOps), but it doesn't work; I always get the secure_path from the sudoers file
sudo su -c "echo 'Defaults:AzDevOps    !env_reset' | sudo tee /etc/sudoers.d/01_keep_env_for_AzDevOps && chmod 0440 /etc/sudoers.d/01_keep_env_for_AzDevOps"

# copied from "https://github.com/actions/runner-images/blob/main/docs/create-image-and-azure-resources.md#ubuntu"
sudo su -c "find /opt/post-generation -mindepth 1 -maxdepth 1 -type f -name '*.sh' -exec bash {} \;"

# I believed putting something in profile.d would help, but it doesn't change anything, so this line can be removed too
sudo su -c "echo 'source /etc/environment' > /etc/profile.d/agent_env_vars.sh && chmod 755 /etc/profile.d/agent_env_vars.sh"

# The 3 lines below are useless: they don't change anything for the AzDevOps PATH when I run a pipeline
echo "source /etc/environment" >> /home/AzDevOps/.bashrc
echo "source /etc/environment" >> /home/AzDevOps/.profile
echo "source /etc/environment" >> /home/AzDevOps/.bash_profile

# Thanks @tonyskidmore,  I reuse your line :)
pathFromEnv=$(cut -d= -f2 /etc/environment | tail -1)

# I create the /agent folder with the same permissions the VM pipeline agent extension will set after this custom script
mkdir /agent && chmod 775 /agent
# I put the proper PATH in the .path file and set the same permissions the VM pipeline agent extension will set after this custom script
echo "$pathFromEnv" > /agent/.path && chmod 444 /agent/.path
# I do the same for the /agent/.env file, since the VM pipeline agent extension appends to that file instead of overwriting it (unlike /agent/.path)
echo "PATH=$pathFromEnv" > /agent/.env && chmod 644 /agent/.env
# Change the ownership and group owner for the whole /agent folder (so recursively with -R)
sudo -E su -c 'chown -R AzDevOps:AzDevOps /agent'

# THE WORKAROUND!!! : make the /agent/.path immutable, so the VM pipeline agent extension won't be able to overwrite it even though the extension runs as root
# https://www.golinuxcloud.com/restrict-root-directory-extended-attributes/
chattr +i /agent/.path

Then I used Bastion to check that it's really immutable.

Then I ran a pipeline that prints my PATH and runs ansible --version, and it works!

@HoLengZai

HoLengZai commented Sep 17, 2022

Here is my final script, without comments and with the unused commands removed:

sudo useradd -m AzDevOps
sudo usermod -a -G docker AzDevOps
sudo usermod -a -G adm AzDevOps
sudo usermod -a -G sudo AzDevOps

sudo chmod -R +r /home
setfacl -Rdm "u:AzDevOps:rwX" /home
setfacl -Rb /home/AzDevOps

sudo su -c "echo 'AzDevOps  ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/01_AzDevOps && chmod 0440 /etc/sudoers.d/01_AzDevOps"

# Must be done after AzDevOps user creation
sudo su -c "find /opt/post-generation -mindepth 1 -maxdepth 1 -type f -name '*.sh' -exec bash {} \;"

pathFromEnv=$(cut -d= -f2 /etc/environment | tail -1)

mkdir /agent && chmod 775 /agent
echo "$pathFromEnv" > /agent/.path && chmod 444 /agent/.path
echo "PATH=$pathFromEnv" > /agent/.env && chmod 644 /agent/.env
chown -R AzDevOps:AzDevOps /agent

chattr +i /agent/.path
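The `pathFromEnv=$(cut -d= -f2 /etc/environment | tail -1)` line assumes PATH is the last line of /etc/environment and that its value contains no `=`; if the value is wrapped in double quotes, `cut` also keeps the quotes. A slightly more defensive variant could look like the sketch below. The function name `path_from_env_file` is hypothetical, and it assumes plain `KEY=value` or `KEY="value"` lines with no variable expansion.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the PATH value from an /etc/environment-style file.
# Assumes KEY=value or KEY="value" lines; no variable expansion is performed.
path_from_env_file() {
  local file="${1:-/etc/environment}"
  local value
  # Select only PATH= lines, take the last one, keep everything after the first '='
  value=$(grep -E '^PATH=' "$file" | tail -n 1 | cut -d= -f2-)
  # Strip one pair of surrounding double quotes, if present
  value=${value#\"}
  value=${value%\"}
  printf '%s\n' "$value"
}

# Usage in the custom script (instead of the cut | tail line):
# pathFromEnv=$(path_from_env_file /etc/environment)
```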

This is how I push my custom script from my Terraform VMSS module so that it runs before the VM pipeline agent extension; the extension block looks like this:

  extension = {
    name                 = var.extension["name"]
    publisher            = var.extension["publisher"]
    type                 = var.extension["type"]
    type_handler_version = var.extension["type_handler_version"]
    settings = jsonencode({
      "script" = base64encode(data.local_file.sh.content)
    })
  }

  # -
  # - Custom Scripts
  # -
  data "local_file" "sh" {
    #filename = "${path.module}/files/${var.vmss_linux_script_filename}"
    filename = join("/", [path.module, "files", var.vmss_linux_script_filename])
  }

@HoLengZai

Additional investigation:
In all cases, as mentioned in my previous comment, the main difference between the VM pipeline agent extension and the Microsoft-hosted agent is that the former does not run the agent as a service.

  • The VM pipeline agent extension uses './run.sh' and puts the process in the background
  • Microsoft-hosted uses './runsvc.sh' (in other words, the agent is installed as a systemd service)

I ran multiple pipelines on Microsoft-hosted agents and didn't find any .env or .path file in their image (I ran find / -name .env from a pipeline script), since the way they deploy the pipeline agent is different.

So the issue is clearly how the VM pipeline agent extension calls the 'config.sh' and 'run.sh' files.

I also tried playing with the runuser config file (/etc/default/runuser):

#pathFromEnv=$(cut -d= -f2 /etc/environment | tail -1)
#echo "ALWAYS_SET_PATH=yes" >> /etc/default/runuser
#echo "ENV_PATH=$pathFromEnv" >> /etc/default/runuser
#echo "ENV_ROOTPATH=$pathFromEnv" >> /etc/default/runuser

But it didn't work: I got error 100 from the VM pipeline agent extension, and when I tried to run config.sh manually through Bastion on that VM instance, I got multiple errors about a missing lib....so

So the dev team that developed enableagent.sh (VM pipeline agent extension) should change the way it calls the agent config/run scripts (e.g. sudo -E runuser -l AzDevOps -c "/bin/bash $dir/config.sh .......
But we cannot modify enableagent.sh from the VM pipeline agent extension, since this extension is forced to run after the Custom Script extension for Linux.
That's why the only way I found to stop the VM pipeline agent extension from overwriting the '.path' file created earlier by the Custom Script extension is chattr: even the root user cannot modify the file without first removing the immutable attribute.

I also found this post very useful: #3494 (.env file and PATH variable).
So, based on @KonstantinTyukalov's comment, it might be enough to create only the '.env' file in the custom script, containing
PATH=<content_of_/etc/environment> (which I also do in my custom script: echo "PATH=$pathFromEnv" > /agent/.env && chmod 644 /agent/.env).
In that case chattr would not be needed either, since .env takes priority over the .path file. And since the env.sh script appends to an existing .env file (unlike .path, which it overwrites), this should work too.
I didn't try that option on its own, as I set PATH in both files (.path and .env) just in case.
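The .env-only idea could look roughly like the following sketch. It has not been tested against the agent itself: `set_env_path` is a hypothetical helper name, and it assumes, as described in #3494, that the agent picks up a `PATH=...` line from `/agent/.env`. Because env.sh appends to an existing .env, the helper replaces any existing PATH= line rather than adding a second one.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): set PATH=<value> in an agent .env file,
# keeping all other lines and leaving exactly one PATH= entry.
set_env_path() {
  local env_file="$1" value="$2" tmp
  tmp=$(mktemp)
  if [ -f "$env_file" ]; then
    # Copy every existing line except PATH= entries (grep exits 1 if nothing is left)
    grep -v '^PATH=' "$env_file" > "$tmp" || true
  fi
  printf 'PATH=%s\n' "$value" >> "$tmp"
  mv "$tmp" "$env_file"
  chmod 644 "$env_file"   # same mode as used for /agent/.env in the custom script
}

# Usage in the custom script, assuming pathFromEnv was read from /etc/environment:
# set_env_path /agent/.env "$pathFromEnv"
```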


@mortenlerudjordet

mortenlerudjordet commented May 27, 2023

@HoLengZai: Thanks for the script, it saved me a lot of headache after I fell face down into troubleshooting this issue.
I also don't understand why it is so difficult to get MS to change the install script that the VMSS extension downloads and runs during agent onboarding.

I can confirm your approach correctly populates PATH when running logic through the pipeline.
I also use the MS runner-images repo to create the custom VMSS image, and had some frustration figuring out why I could not call many of the tools directly in the pipeline.

@chrisdecker1201

@HoLengZai You made my day. Thank you very much for your work.

@ohlrogge

We also ran into this problem with ubuntu2204. Even though I like the workaround from @HoLengZai, it's still a workaround. We would prefer a permanent solution from Microsoft.

@DARB-CCM-S-20

@kuleshovilya This problem is not specific to @ChristopheLav's use case. Short of the workarounds suggested in this thread, there is no way to create a reusable scale set image with custom software for the agents in an agent pool. In our case, we need ansible on the agent; it is installed via pipx, so the /opt/pipx_bin directory needs to be appended to the path. When this is installed at image build time, the PATH variable is updated, but it is then overwritten when ADO installs the agent software, resulting in the incorrect $PATH in the .path file.

Please reopen this issue, as it is a problem created by Microsoft's approach to custom image use rather than a user-specific scenario.

@mzarglis

mzarglis commented Aug 31, 2023

I've been struggling with this issue for a week and appreciate the workaround. @kuleshovilya, I'm not sure why this is closed; it is very clearly an issue many people are encountering while generating images for self-hosted build agents.

enescakir added a commit to ubicloud/ubicloud that referenced this issue Sep 12, 2023
While testing our self-hosted GitHub runners on a Rust repository, the
job failed with:

    line 1: cargo: command not found

I found out it was related to the $PATH environment variable, and that
revealed additional issues.

First, when I ran `echo $PATH`, it printed:

    PATH=$HOME/.local/bin:/opt/pipx_bin:$HOME/.cargo/bin:....:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

Also, `echo $HOME` printed "/home/runner", so at first I couldn't
understand why the cargo command was not found. Then I learned that in
Unix-like operating systems the `PATH` environment variable is just a
string that holds a list of directories separated by colons (`:`).
Variable substitution occurs only at the moment of assignment, so a
literal `$HOME` stored in PATH is never expanded during command lookup.

I found similar issues
(actions/runner-images#3695 (comment))

Some configuration, such as $PATH entries related to the user's home
directory, needs to be changed. We need to run the post-generation
scripts after first boot to configure it.
https://github.com/actions/runner-images/blob/main/docs/create-image-and-azure-resources.md#post-generation-scripts

Post-generation scripts use the latest record in /etc/passwd as the
default user.

We need to reconnect to the VM to reload environment variables, so we
invalidate the SSH cache.

This change alone was not enough. I noticed $PATH inside a workflow job
was printed as:

    /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

After some research I found another issue:
microsoft/azure-pipelines-agent#3461

The runner script doesn't use the global $PATH variable by default. It
gets the path from secure_path in /etc/sudoers. Changing sudoers files
isn't a good thing to do. However, the script also loads the .env file,
so we are able to override the runner script's default path value with $PATH.

After solving these two puzzling issues, the runner script is able to
load the correct $PATH value.

    /home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:...:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
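The point that PATH is a plain string can be checked with a small self-contained sketch (not taken from the commit). It creates a stub `cargo` under a temporary directory standing in for /home/runner, then looks it up once with a literal `$HOME` entry, as left unexpanded in /etc/environment, and once with the already-expanded directory. `find_cargo` is a hypothetical name used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: a literal "$HOME" inside PATH is never expanded when commands are looked up.
fake_home=$(mktemp -d)                     # stand-in for /home/runner
trap 'rm -rf "$fake_home"' EXIT
mkdir -p "$fake_home/.cargo/bin"
printf '#!/bin/sh\necho cargo-stub\n' > "$fake_home/.cargo/bin/cargo"
chmod +x "$fake_home/.cargo/bin/cargo"

find_cargo() {
  # Subshell, so the HOME and PATH changes do not leak into the caller
  ( HOME="$fake_home"; PATH="$1"; command -v cargo ) || echo "cargo: not found"
}

# Single quotes keep '$HOME' literal, like an unexpanded /etc/environment entry
find_cargo '$HOME/.cargo/bin'            # prints: cargo: not found
# Expanded when the string is built, so the directory really exists
find_cargo "$fake_home/.cargo/bin"       # prints the stub's full path
```

This is why the fix in the commit is to have the post-generation scripts write real paths (e.g. /home/runner/.cargo/bin) rather than relying on `$HOME` being resolved later.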
enescakir added a commit to ubicloud/ubicloud that referenced this issue Sep 13, 2023
@jappenzesr

jappenzesr commented Oct 12, 2023

@ChristopheLav deserves the GitHub medal of honor for his patience. We have run into the same issue on our agent pools built from runner-images, and I certainly echo the astonishment that this issue is closed. Can this be re-evaluated, please?

@RelicCornhusk

This problem is clearly still afflicting users and should not be closed. Thanks for the workaround @HoLengZai!

@RobinDink

We just encountered this issue as well, with the same use case (self-hosted build agents with Packer images).
It's great that someone found a workaround, but this should be fixed at the source of the problem. @kuleshovilya, please reopen this issue!

@patrickmoore-nc

I have just run into this issue - pip3 packages warn they are being installed to a location outside the PATH.

Works absolutely fine using Microsoft-hosted ubuntu-latest agents. But broken for self-hosted. How can this issue be closed while something as fundamental as the path remains divergent between Azure DevOps agent types?

@pbcahill

pbcahill commented Apr 2, 2024

Encountering this issue on Ubuntu 22 images/agents as well. Please reopen. All those custom scripts and workarounds should not be needed for something as simple as the agent reading the PATH defined in /etc/environment, especially given that this works fine on the Windows images/agents when the system PATH is customized.

@pbcahill

pbcahill commented Apr 3, 2024

If I manually SSH to one of the Ubuntu scale set agents, go to the agent directory, and run the ./env.sh script, it properly updates the .path file in the agent directory with the PATH values we customized in /etc/environment. But the agent would need to be restarted for that change to take effect, which we can't do in this scenario.

So it seems to me that, as part of the unattended agent installation process, the ./env.sh script should be run before the agent is started. Looking at the enableagent.sh script, this could be resolved by simply adding a line that runs ./env.sh right before run.sh is executed.

Any thoughts/feedback on that idea?
