Attempting to upgrade a resource server from 4.1.12 to 4.2.X fails in a composite tree #4612

Closed
2 tasks done
bh9 opened this issue Dec 2, 2019 · 17 comments

Comments


bh9 commented Dec 2, 2019

  • master
  • 4-2-stable

Bug Report

iRODS Version, OS and Version

4.1.12 -> 4.2.6, Ubuntu Xenial

What did you try to do?

Install irods-server on a host that has irods-resource=4.1.12 installed and has composite resources.

Expected behavior

irods-resource's preremove.sh is not run, as the irods-server package replaces it

Observed behavior (including steps to reproduce, if applicable)

irods-resource's preremove.sh is run, so iRODS attempts to delete the resources. When that inevitably fails, the script returns an error to dpkg, which aborts the install, so irods-server is never installed.

ubuntu@terraformed-ires-green-1:~$ sudo apt-get install irods-server
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  grub-pc-bin
Use 'sudo apt autoremove' to remove it.
The following additional packages will be installed:
  irods-externals-avro1.7.7-0 irods-externals-boost1.60.0-0
  irods-externals-clang-runtime3.8-0 irods-externals-jansson2.7-0
  irods-externals-libarchive3.3.2-0 irods-externals-zeromq4-14.1.3-0
  irods-icommands irods-runtime
The following packages will be REMOVED:
  irods-resource
The following NEW packages will be installed:
  irods-externals-avro1.7.7-0 irods-externals-boost1.60.0-0
  irods-externals-clang-runtime3.8-0 irods-externals-jansson2.7-0
  irods-externals-libarchive3.3.2-0 irods-externals-zeromq4-14.1.3-0
  irods-icommands irods-runtime irods-server
0 upgraded, 9 newly installed, 1 to remove and 275 not upgraded.
Need to get 0 B/28.7 MB of archives.
After this operation, 100 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
(Reading database ... 119138 files and directories currently installed.)
Removing irods-resource (4.1.12) ...
Testing Safe Resource Removal
ERROR :: Unable To Remove a Locally Resident Resource.  Aborting.
      :: Please run 'iadmin rmresc --dryrun RESOURCE_NAME' on local resources for more information.
dpkg: error processing package irods-resource (--remove):
 installed irods-resource package pre-removal script subprocess returned error exit status 1
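
To see which local resources trip this check, the dry run that the error message suggests can be looped over every resource hosted on this machine. The following is a minimal sketch modelled on the 4.1 preremove.sh logic quoted later in this thread; `list_blocked_resources` is a hypothetical helper name, and it assumes it is run as the iRODS service account with `iadmin` on the PATH.

```shell
#!/bin/sh
# Sketch only: report local resources whose dry-run removal does not succeed.
# list_blocked_resources is a hypothetical helper, not part of iRODS.

# list_blocked_resources HOST [FQDN]
# Prints each resource whose resc_net equals HOST or FQDN and for which
# 'iadmin rmresc --dryrun' does not print SUCCESS.
list_blocked_resources() {
    lbr_hn="$1"
    lbr_fhn="${2:-$1}"
    for lbr_resc in $(iadmin lr); do
        # Same location lookup the packaged script uses.
        lbr_loc=$(iadmin lr "$lbr_resc" | grep resc_net | cut -d' ' -f2)
        if [ "$lbr_loc" = "$lbr_hn" ] || [ "$lbr_loc" = "$lbr_fhn" ]; then
            if ! iadmin rmresc --dryrun "$lbr_resc" | grep -q SUCCESS; then
                echo "$lbr_resc"
            fi
        fi
    done
}

# Example call (commented out so the file can be sourced safely):
#   list_blocked_resources "$(hostname)" "$(hostname).$(hostname -d)"
```

Any name it prints is a resource the packaged script would refuse to remove, which is what aborts the package removal above.
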
Contributor

mcv21 commented Dec 2, 2019

AIUI irods-server from 4.2 is replacing irods-resource (and irods-icat, but I'll stick to irods-resource here WLOG); indeed it Replaces: and Breaks: them. But the prerm needs to be more careful about what it's doing.

In particular, currently it deletes resources on any action other than "upgrade"; instead it needs to handle at least deconfigure in-favour and remove in-favour - in both cases, it should not be deleting resources.

More detail on the packaging flows is available here:
https://www.debian.org/doc/debian-policy/ch-maintainerscripts.html#summary-of-ways-maintainer-scripts-are-called (and in subsequent sections)
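
As a rough illustration of the point above, a prerm can dispatch on the arguments dpkg passes it. Debian Policy lists the call forms `prerm remove`, `prerm upgrade new-version`, `prerm failed-upgrade old-version`, `prerm remove in-favour package new-version`, and `prerm deconfigure in-favour package-being-installed version ...`. The sketch below is not the iRODS packaging code: `prerm_plan` and the step names it prints are hypothetical, and the choice of steps is an assumption about what a safer script might do.

```shell
#!/bin/sh
# Sketch only: decide what a prerm should do for each way dpkg calls it.
# prerm_plan and the step names it prints are hypothetical.

# prerm_plan ACTION [ARGS...]
# Prints the steps this sketch would take. No branch removes iRODS resources.
prerm_plan() {
    case "${1:-}" in
        upgrade|failed-upgrade|deconfigure)
            # Another version or a replacing package is taking over:
            # only stop the local server.
            echo "stop-server"
            ;;
        remove)
            if [ "${2:-}" = "in-favour" ]; then
                # Removed because a conflicting package (such as
                # irods-server) is being installed: only stop the server.
                echo "stop-server"
            else
                # Plain removal: also drop the init-script registration.
                echo "stop-server deregister-service"
            fi
            ;;
        *)
            echo "unknown"
            ;;
    esac
}

# In a real maintainer script this would be driven by: prerm_plan "$@"
```

Keeping the decision in one `case` makes it easy to check that the in-favour paths never reach anything destructive.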

Contributor

mcv21 commented Dec 2, 2019

Also, what is actually needed here is Replaces: and Conflicts: (see section 7.6.2 of https://www.debian.org/doc/debian-policy/ch-relationships.html#s-replaces ) rather than Breaks:, but my comments on what prerm needs to handle remain roughly the same.
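
One way to check which of these relationships a package actually declares is to filter its control fields. This is a small sketch; `relationship_fields` is a hypothetical helper that reads control text on standard input and assumes each field sits on a single line.

```shell
#!/bin/sh
# Sketch only: show the Replaces/Conflicts/Breaks fields of a package.
# relationship_fields is a hypothetical helper name.

# Reads control-file text on stdin and prints only the relationship fields.
relationship_fields() {
    grep -E '^(Replaces|Conflicts|Breaks):'
}

# Example calls (commented out so the file can be sourced safely):
#   apt-cache show irods-server | relationship_fields
#   dpkg-deb -f ./some-package.deb | relationship_fields
```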

Contributor

alanking commented Dec 9, 2019

We have reproduced this issue on Ubuntu 14.04.

Note: Best practice is to stop running iRODS servers before upgrading which will work in this scenario with no code changes. However...

When upgrading an operational server (icat or resource) from 4.1.x to 4.2.x on Ubuntu:

  1. Replace the contents of /var/lib/irods/packaging/preremove.sh with the following lines:
         #!/bin/bash
         service irods stop
  2. Upgrade the iRODS server:
         $ apt install irods-server
  3. Start iRODS 4.2.x server:
         $ service irods start

This works regardless of the presence of resource hierarchies.
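
The first step can be scripted so that the original script is kept for reference. This is a minimal sketch, to be run as root; `neutralize_preremove` is a hypothetical helper name and the `.orig` backup suffix is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: replace the 4.1 preremove.sh with a stop-only script,
# keeping a copy of the original. neutralize_preremove is a hypothetical name.

# neutralize_preremove [PATH]
# PATH defaults to the location given in the steps above.
neutralize_preremove() {
    np_script="${1:-/var/lib/irods/packaging/preremove.sh}"
    if [ ! -f "$np_script" ]; then
        echo "not found: $np_script" >&2
        return 1
    fi
    # Keep the packaged script next to the replacement.
    cp -p "$np_script" "$np_script.orig"
    # Write the two-line replacement from step 1.
    printf '%s\n' '#!/bin/bash' 'service irods stop' > "$np_script"
    chmod 755 "$np_script"
}

# Example sequence (commented out so the file can be sourced safely):
#   neutralize_preremove && apt install irods-server && service irods start
```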

Author

bh9 commented Dec 10, 2019

Upgrading a stopped irods-resource fails in the same way as upgrading a running irods-resource. I think you'd have to stop the IES (provider) for it to not delete the resources (but I don't know if the script would then succeed).
Replacing the whole script works, but so does simply stopping

if [ "$PACKAGEUPGRADE" == "false" ] ; then
    # =-=-=-=-=-=-=-
    # determine if we can remove the resource from the zone
    if [ "$SERVER_TYPE" == "resource" ] ; then
        hn=`hostname`
        dn=`hostname -d`
        fhn="$hn.$dn"
        echo "Testing Safe Resource Removal"
        do_not_remove="FALSE"
        # =-=-=-=-=-=-=-
        # do a dryrun on the resource removal to determine if this resource server can
        # be safely removed without harming any data
        resources_to_remove=()
        for resc in `su -c "iadmin lr" $IRODS_SERVICE_ACCOUNT_NAME`
        do
            # =-=-=-=-=-=-=-
            # for each resource determine its location. if it is this server then dryrun
            loc=$( su -c "iadmin lr $resc | grep resc_net | cut -d' ' -f2" $IRODS_SERVICE_ACCOUNT_NAME )
            if [[ $loc == $hn || $loc == $fhn ]]; then
                rem=$( su -c "iadmin rmresc --dryrun $resc | grep SUCCESS" $IRODS_SERVICE_ACCOUNT_NAME )
                if [[ "x$rem" == "x" ]]; then
                    # =-=-=-=-=-=-=-
                    # dryrun for a local resource was a failure, set a flag
                    do_not_remove="TRUE"
                else
                    # =-=-=-=-=-=-=-
                    # dryrun for a local resource was a success, add resc to array for removal
                    echo " Adding [$resc] for removal..."
                    resources_to_remove+=($resc)
                fi
            fi
        done
        if [[ "$do_not_remove" == "TRUE" ]]; then
            # hard stop
            echo "ERROR :: Unable To Remove a Locally Resident Resource. Aborting."
            echo " :: Please run 'iadmin rmresc --dryrun RESOURCE_NAME' on local resources for more information."
            exit 1
        else
            # loop through and remove the resources
            for delresc in ${resources_to_remove[*]}
            do
                echo " Removing Resource [$delresc]"
                su -c "iadmin rmresc $delresc" $IRODS_SERVICE_ACCOUNT_NAME
                if [ $? != 0 ] ; then
                    exit 1
                fi
            done
        fi
    fi
    # =-=-=-=-=-=-=-
    # stop any running iRODS Processes
    echo "Stopping iRODS :: $IRODS_HOME/irodsctl stop"
    cd $IRODS_HOME
    su --shell=/bin/bash -c "$IRODS_HOME/irodsctl stop" $IRODS_SERVICE_ACCOUNT_NAME
    cd /tmp
    # =-=-=-=-=-=-=-
    # detect operating system
    DETECTEDOS=`$IRODS_HOME_DIR/packaging/find_os.sh`
    # =-=-=-=-=-=-=-
    # report that we are not deleting some things
    echo "NOTE :: The Local System Administrator should delete these if necessary."
    # =-=-=-=-=-=-=-
    # report that we are not deleting the account(s)
    echo " :: Leaving $IRODS_SERVICE_ACCOUNT_NAME Service Account in place."
    if [ "$DETECTEDOS" == "RedHatCompatible" ]; then # CentOS and RHEL and Fedora
        echo " :: try:"
        echo " :: sudo /usr/sbin/userdel $IRODS_SERVICE_ACCOUNT_NAME"
    elif [ "$DETECTEDOS" == "SuSE" ]; then # SuSE
        echo " :: try:"
        echo " :: sudo /usr/sbin/userdel $IRODS_SERVICE_ACCOUNT_NAME"
        echo " :: sudo /usr/sbin/groupdel $IRODS_SERVICE_GROUP_NAME"
    elif [ "$DETECTEDOS" == "Ubuntu" ]; then # Ubuntu
        echo " :: try:"
        echo " :: sudo userdel $IRODS_SERVICE_ACCOUNT_NAME"
        # groupdel is not necessary on Ubuntu, apparently...
    fi
    # =-=-=-=-=-=-=-
    # remove runlevels and aliases (use os-specific tools)
    if [ "$DETECTEDOS" == "Ubuntu" ] ; then
        update-rc.d -f irods remove
    elif [ "$DETECTEDOS" == "RedHatCompatible" ] ; then
        /sbin/chkconfig --del irods
    elif [ "$DETECTEDOS" == "SuSE" ] ; then
        /sbin/chkconfig --del irods
    fi
fi
from running, e.g. by changing the if to always be false. There is an argument that this block should only be run if you're running purge, not remove (see man dpkg for the difference between --remove and --purge)

Contributor

mcv21 commented Dec 10, 2019

--purge is for removing configuration files. I'm not sure package removal should be removing resources at all, to be honest.

Member

trel commented Dec 10, 2019

I don't understand your claim...

"Upgrading a stopped irods-resource fails in the same way as upgrading a running irods-resource."

Is the service account irods_environment.json file on the resource server pointed to itself? If so, then the iadmin commands that attempt to remove anything should fail since its own server is down. If the environment file is pointed to a different server, then why?

@mcv21 Agreed - which is why the 4.2 scripts don't attempt any resource removal.

Author

bh9 commented Dec 10, 2019

The irods_host was pointed to the icat, because irods_host is only used when acting as a client (per https://docs.irods.org/4.2.6/system_overview/configuration/), and clients should talk to an icat, not directly to a resource. If this should always be localhost, could that be documented please? Setting it to localhost does make the stop start approach work.
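
Before upgrading, it may help to check where the service account's environment file points. The sketch below extracts `irods_host` with `sed`, assuming the key and its value sit on one line; `irods_env_host` and `is_own_host` are hypothetical helper names, and the default path assumes the file lives under the service account's home directory.

```shell
#!/bin/sh
# Sketch only: report whether irods_environment.json points at this machine.
# irods_env_host and is_own_host are hypothetical helper names.

# irods_env_host [FILE]
# Prints the value of "irods_host" from FILE.
irods_env_host() {
    ieh_file="${1:-$HOME/.irods/irods_environment.json}"
    sed -n 's/.*"irods_host"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$ieh_file"
}

# is_own_host HOST
# Succeeds when HOST is localhost or matches this machine's hostname.
is_own_host() {
    [ "$1" = "localhost" ] ||
        [ "$1" = "$(hostname 2>/dev/null)" ] ||
        [ "$1" = "$(hostname -f 2>/dev/null)" ]
}

# Example check (commented out so the file can be sourced safely):
#   host=$(irods_env_host /var/lib/irods/.irods/irods_environment.json)
#   is_own_host "$host" || echo "irods_host points elsewhere: $host"
```

If the file points at another server, the `iadmin` calls in the 4.1 preremove.sh still reach the catalog even when the local server is stopped.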

Member

trel commented Dec 10, 2019

Generally, and by default, the service account on a resource server is wired to connect to its own server. This was the way we reproduced the work above.

Yes, clients can connect to any iRODS server, and they'll redirect their auth over to the catalog provider.

Regardless, replacing the preremove.sh script will provide certainty as the script won't do anything but stop the local server and then get out of the way.


kript commented Dec 11, 2019

Note: Best practice is to stop running iRODS servers before upgrading which will work in this scenario with no code changes. However...

This is, to my knowledge, undocumented, at least in the iRODS docs.

Member

trel commented Dec 11, 2019

Agreed - will rectify. Marking this as a documentation issue for 4.2.x.

@trel trel self-assigned this Dec 11, 2019
@trel trel added this to the 4.2.7 milestone Dec 11, 2019
Contributor

mcv21 commented Dec 11, 2019

Note: Best practice is to stop running iRODS servers before upgrading which will work in this scenario with no code changes. However...

This is, to my knowledge, undocumented, at least in the iRODS docs.

...and is one of the things packaging systems are meant to take care of!

trel added a commit to irods/irods_docs that referenced this issue Dec 11, 2019
@trel trel closed this as completed Dec 11, 2019
Contributor

mcv21 commented Dec 11, 2019

This really isn't just a documentation issue; could it be reopened in the mean time?

Author

bh9 commented Dec 11, 2019

Also, could the irods_host assumption be documented please?

Member

trel commented Dec 11, 2019

Okay - will reopen and put in the 4.2.8 milestone.

  1. We'll strip the attempt to remove any iRODS resources on package removal.
  2. We'll document the assumption/best practice that service accounts should point to their own local server via irods_host in .irods/irods_environment.json.

Note: The 4.1 icat_host string in server_config.json has turned into a 4.2 catalog_provider_hosts array (this is unrelated to item 2).

@trel trel reopened this Dec 11, 2019
@trel trel modified the milestones: 4.2.7, 4.2.8 Dec 11, 2019
@trel trel added the bug label Dec 11, 2019
Member

trel commented Dec 12, 2019

i thought this seemed familiar... #4307

thoughts on that approach? a script to manipulate the hosts themselves?


kript commented Dec 13, 2019

Perhaps better to move that discussion to that issue, but for me, I can see this breaking the principle of least surprise for an administrator.

I'm also not sure how the scripts get on without installing the package, which will trigger the uninstall of the 4.1.x packages, and so still alarming for the admin, unless I have misunderstood something?

@trel trel modified the milestones: 4.2.8, 4.2 Backlog Mar 3, 2020
Member

trel commented Apr 4, 2020

Oops, moving this back to 4.2.8.

@trel trel modified the milestones: 4.2 Backlog, 4.2.8 Apr 4, 2020
trel added a commit to trel/irods_docs that referenced this issue Apr 4, 2020
trel added a commit to irods/irods_docs that referenced this issue Apr 4, 2020
trel added a commit to trel/irods that referenced this issue Apr 4, 2020
The script was attempting to cleanup the iRODS catalog of any
iRODS resources resident on this to-be-removed catalog consumer server.

Any cleanup should be performed by a rodsadmin at a later time.

Removing this code in service of the 'Principle of Least Surprise'.
trel added a commit that referenced this issue Apr 4, 2020
The script was attempting to cleanup the iRODS catalog of any
iRODS resources resident on this to-be-removed catalog consumer server.

Any cleanup should be performed by a rodsadmin at a later time.

Removing this code in service of the 'Principle of Least Surprise'.
@trel trel closed this as completed Apr 4, 2020