/etc/udev/rules.d/70-persistent-net.rules #770

Closed
gdha opened this issue Feb 12, 2016 · 19 comments
Labels: bug (The code does not do what it is meant to do), enhancement (Adaptions and new features)


@gdha
Member

gdha commented Feb 12, 2016

I was testing a VM restore (cloning) of a RHEL6 system and noticed that the eth0 device was renamed to eth2, and eth1 to eth3. After the recovery the system came up, but with no IP address configured on eth2.

In the recover log we find the following:

2016-02-11 05:34:29 Including finalize/GNU/Linux/30_create_mac_mapping.sh
cat: /sys/class/net/eth1/address: No such file or directory
2016-02-11 05:34:29 Including finalize/GNU/Linux/41_migrate_udev_rules.sh
2016-02-11 05:34:29 Updating udev configuration (70-persistent-net.rules)

However, this is easy to fix by:

  • deleting the /etc/udev/rules.d/70-persistent-net.rules file and rebooting, or
  • editing /etc/udev/rules.d/70-persistent-net.rules to delete the old entries and fix the NAME entries (change eth2 to eth0, and eth3 to eth1), then rebooting
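
The second option could be scripted roughly as follows. This is only a sketch under assumptions: `fix_persistent_net_rules` is a hypothetical helper (not part of ReaR), it assumes GNU sed (`-i`), and it assumes the stale rules are exactly the ones naming eth0 and eth1 while the new NICs were appended as eth2 and eth3, as in the case described here.

```shell
# Sketch only: fix_persistent_net_rules is a hypothetical helper, not ReaR code.
# Assumes GNU sed and the eth0/eth1 -> eth2/eth3 renaming described above.
fix_persistent_net_rules() {
    local rules_file="$1"
    # Keep a backup of the file before touching it.
    cp -p "$rules_file" "$rules_file.bak" || return 1
    # 1) Drop the stale rules that still claim eth0/eth1 for the old MACs.
    # 2) Let the rules appended for the new NICs take over those names.
    sed -i \
        -e '/NAME="eth[01]"/d' \
        -e 's/NAME="eth2"/NAME="eth0"/' \
        -e 's/NAME="eth3"/NAME="eth1"/' \
        "$rules_file"
}
```

It would be called as `fix_persistent_net_rules /etc/udev/rules.d/70-persistent-net.rules` before rebooting; the `.bak` copy allows reverting if the assumptions do not hold.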

This makes me think: wouldn't it be better to remove the restored /etc/udev/rules.d/70-persistent-net.rules file during the recover process? After all, this file gets recreated during the reboot.
Any ideas or suggestions?

@schlomo
Member

schlomo commented Feb 12, 2016

Isn't that a bug earlier up in the rescue part? E.g. when we boot the system and decide how to map the old interfaces of the source system to the new interfaces of the recovery system?

The logic in 41_migrate_udev_rules.sh suggests that all decisions have been made much earlier and not here.

@jsmeix
Member

jsmeix commented Feb 12, 2016

On my SLES12 system /etc/udev/rules.d/70-persistent-net.rules contains:

# This file was automatically generated by the /usr/lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x8086:0x10de (e1000e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:26:b9:82:21:7a", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
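
To see at a glance which MAC address a rules file binds to which interface name, something like the following could be used. It is a minimal sketch: `list_net_rule_mappings` is a hypothetical helper name, and it assumes every rule carries both an `ATTR{address}` and a `NAME` key on one line, as in the rule above.

```shell
# Sketch only: list_net_rule_mappings is a hypothetical helper, not ReaR code.
# Prints "<mac> <name>" for each non-comment rule that has both keys.
list_net_rule_mappings() {
    local rules_file="$1"
    sed -n \
        -e '/^[[:space:]]*#/d' \
        -e 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 \2/p' \
        "$rules_file"
}
```

Comparing this output with the MAC addresses actually present on the replacement hardware shows whether the restored file still matches.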

I think it is a generic issue what to do during backup restore with files that are in the backup but should not be restored because they are automatically created and maintained by the system.

The generic traditional example of such a file is /etc/mtab. As long as it was a regular file, it must not be restored with outdated content from a backup. Nowadays it is a symbolic link to /proc/self/mounts, which should probably be restored to ensure that the link is available (I don't know if that link is automatically created by the system).

I think in general the backup should contain all files to be complete and it is the task of the admin to explicitly exclude files from his backup restore that he does not want to restore via EXCLUDE_RESTORE.

I think in general rear should handle the backup and restore task as "completely separated third party stuff" and not try to mess around with it.

Nevertheless rear could have in usr/share/rear/conf/default.conf a well documented predefined EXCLUDE_RESTORE list of well-known files that usually never make sense to be restored.

@schlomo
Member

schlomo commented Feb 12, 2016

Guys, I think we have a bug somewhere earlier in our process. Why did this suddenly pop up after working flawlessly for years?

@jsmeix
Member

jsmeix commented Feb 12, 2016

@schlomo
As far as I understand, what @gdha wrote ("After the recover the system came up") means that the issue was not in the rear recovery system: "rear recover" had worked, and the issue only appeared afterwards when booting the recovered system.

I am not at all a networking configuration expert but I would assume that rear does not need to change the networking configuration after the backup restore.

I think that by default the new hardware where "rear recover" is run has to be sufficiently compatible with the old hardware where "rear mkbackup" was run so that the networking configuration files from the backup would still work on the new hardware.

I think only configuration files regarding storage (partitioning, file system, mount points - e.g. /etc/fstab) and configuration files regarding the bootloader need to be adapted by rear after the backup restore.
But I could be wrong (in particular because I am not at all a networking configuration expert).

@schlomo
Member

schlomo commented Feb 12, 2016

Yes, exactly my point. If the recovered system comes up and does not work properly then this is a bug in ReaR. The buggy code was executed in the environment of the rescue system. Either in the boot phase or in the rear recover phase.

It worked before, even with RHEL6. So there must be a bug somewhere. Either RHEL did an update or we have a change or both.

@jsmeix
Member

jsmeix commented Feb 12, 2016

In general, whenever "they" change basic system stuff, "suddenly" new interesting issues pop up in rear. In particular here it seems to be systemd/udev related.

@jsmeix
Member

jsmeix commented Feb 12, 2016

@gdha
Can you exclude /etc/udev/rules.d/70-persistent-net.rules from the backup restore via EXCLUDE_RESTORE and try again?

If it works then, it proves (from my point of view) that this issue is just one more instance of the generic issue that files that are automatically created and maintained by the system should be excluded from the backup restore.

@gdha
Member Author

gdha commented Feb 12, 2016

@jsmeix Doing the test again must be requested, but perhaps we could do it with another system.
@schlomo A bug? Maybe. The MAC addresses should be remapped; I need to dig deeper to be sure.

@gdha
Member Author

gdha commented Feb 13, 2016

@schlomo I think it has to do with device renaming and as

2016-02-11 05:34:29 Including finalize/GNU/Linux/30_create_mac_mapping.sh
cat: /sys/class/net/eth1/address: No such file or directory

suggests, it did not find the device. udevd had already kicked off, read the original file (with unmodified MAC addresses) and created 2 new entries. I've seen this many times already (and noticed the same in several reported issues). When you use DHCP it goes by unnoticed.
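
The `cat: /sys/class/net/eth1/address: No such file or directory` line is what a bare `cat` prints when the interface has been renamed. A more defensive lookup could look like the sketch below. `read_mac_or_warn` is a hypothetical helper, not the code in 30_create_mac_mapping.sh, and the second parameter (the sysfs directory) exists only so the function can be tried against a test directory.

```shell
# Sketch only: read_mac_or_warn is a hypothetical helper, not ReaR code.
# Prints the MAC address of an interface, or warns if the interface is absent.
read_mac_or_warn() {
    local dev="$1"
    local sysfs_net="${2:-/sys/class/net}"
    if [ -r "$sysfs_net/$dev/address" ]; then
        cat "$sysfs_net/$dev/address"
    else
        # Typical cause in this issue: udev renamed the device (eth1 -> eth3).
        echo "interface $dev not present (renamed by udev?)" >&2
        return 1
    fi
}
```

A caller can then react to the non-zero return code instead of silently continuing with an empty MAC address.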

Personally, I think adding this file to EXCLUDE_RESTORE could prevent this. I have not yet tested this theory.

@schlomo
Member

schlomo commented Feb 13, 2016

I think that we need to find a generic solution that works regardless of the restore options. EXCLUDE_RESTORE would only cover NETFS but not the other backup methods. Imagine ReaR failing because TSM restored that file.

@jsmeix
Member

jsmeix commented Feb 15, 2016

@schlomo I have a generic question:
Is rear meant to automatically exclude files from the backup restore that should not be restored, or is this something that the admin must do?

@schlomo
Member

schlomo commented Feb 15, 2016

IMHO for ReaR there are three independent concerns:

  1. bare metal recovery (block devices, boot loaders ...)
  2. restoring files (either through builtin backup or via external backup)
  3. system reconfiguration (network card mapping, disk mapping, drivers matching ...)

So far the part of restoring files is very independent from the other parts. Hence nothing magic here.

I would also advise against a magic integration and instead always assume that the backup solution restored all files. If we need to remove some as part of the system reconfiguration then this is OK.

@jsmeix
Member

jsmeix commented Feb 16, 2016

@schlomo many thanks for your explanation - I always appreciate your valuable background information - it helps a lot!

Now (at least for me) it is perfectly clear.

If needed rear should remove restored files as part of the system reconfiguration.

Therefore I created #779
"Move away restored files that should not have been restored."

If you agree I will implement it - but probably after the 1.18 release.

@jsmeix
Member

jsmeix commented Feb 16, 2016

If you like, I have just implemented something for the 1.18 release, see #780

@gdha gdha added this to the Rear v1.18 milestone Feb 16, 2016
@gdha gdha self-assigned this Feb 16, 2016
@gdha gdha added enhancement Adaptions and new features fixed / solved / done and removed discuss / RFC labels Feb 18, 2016
@gdha
Member Author

gdha commented Feb 18, 2016

Can be closed as the #779 pull request has been checked in.

@gdha gdha closed this as completed Feb 18, 2016
@jsmeix
Member

jsmeix commented Feb 19, 2016

Unfortunately I think #779 "Move away restored files that should not have been restored" may not help regarding what @gdha wrote here in his initial comment #770 (comment)

Reason (as far as I see):
Via #779 files are removed immediately after backup restore. Accordingly with

BACKUP_RESTORE_MOVE_AWAY_FILES=( /etc/udev/rules.d/70-persistent-net.rules )

/etc/udev/rules.d/70-persistent-net.rules gets removed immediately after backup restore but later usr/share/rear/finalize/GNU/Linux/41_migrate_udev_rules.sh is run that creates /etc/udev/rules.d/70-persistent-net.rules anew.

When @gdha needs to remove /etc/udev/rules.d/70-persistent-net.rules in the rear recovery system after "rear recover" finished and before rebooting into the recreated system, then #779 does not help and the real fix is probably (as @schlomo suggested) in the rear recovery code that deals with udev rules.

FYI why it works in my particular case with SLE12-SP1:

In my case finalize/GNU/Linux/41_migrate_udev_rules.sh does the following (excerpt from my "rear -d -D recover" log):

+ source /usr/share/rear/restore/default/99_move_away_restored_files.sh
++ pushd /mnt/local
...
++ cp -a --parents etc/udev/rules.d/70-persistent-net.rules var/lib/rear/moved_away_after_backup_restore/
...
++ rm -rf etc/udev/rules.d/70-persistent-net.rules
...
++ popd
.
.
.
+ source /usr/share/rear/finalize/GNU/Linux/41_migrate_udev_rules.sh
...
++ echo -e 'Updating udev configuration (70-persistent-net.rules)'
++ cp /mnt/local//etc/udev/rules.d/70-persistent-net.rules /mnt/local/root/rear-70-persistent-net.rules.old
cp: cannot stat '/mnt/local//etc/udev/rules.d/70-persistent-net.rules': No such file or directory
++ cp /etc/udev/rules.d/70-persistent-net.rules /mnt/local//etc/udev/rules.d/70-persistent-net.rules

In my case 70-persistent-net.rules in the rear recovery system and the one in the original system are identical:

# diff -wups /var/lib/rear/moved_away_after_backup_restore/etc/udev/rules.d/70-persistent-net.rules /etc/udev/rules.d/70-persistent-net.rules  
Files /var/lib/rear/moved_away_after_backup_restore/etc/udev/rules.d/70-persistent-net.rules and /etc/udev/rules.d/70-persistent-net.rules are identical

so that all that deletion and re-creation does not actually change anything in my particular case.
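
The move-away step visible in the log excerpt above (change into the restored tree, `cp -a --parents`, then `rm -rf`) can be sketched as follows. This is not the actual 99_move_away_restored_files.sh; `move_away_restored_files` is a hypothetical function, the save directory is taken from the log, and GNU cp is assumed for `--parents`.

```shell
# Sketch only: move_away_restored_files is hypothetical, modeled on the
# commands seen in the recover log; it is not the ReaR script itself.
# Usage: move_away_restored_files TARGET_ROOT FILE...
move_away_restored_files() {
    local target_root="$1"
    shift
    (
        # Work relative to the restored system root (e.g. /mnt/local).
        cd "$target_root" || exit 1
        save_dir="var/lib/rear/moved_away_after_backup_restore"
        mkdir -p "$save_dir" || exit 1
        for file in "$@"; do
            rel="${file#/}"              # strip the leading slash
            [ -e "$rel" ] || continue    # not restored, nothing to move away
            # Keep a copy with its full path, then remove the original.
            cp -a --parents "$rel" "$save_dir/" && rm -rf "$rel" || exit 1
        done
    )
}
```

Keeping the copy under the save directory is what makes the later `diff` against the re-created file possible.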

@jsmeix jsmeix added bug The code does not do what it is meant to do and removed fixed / solved / done labels Feb 19, 2016
@jsmeix jsmeix reopened this Feb 19, 2016
@jsmeix
Member

jsmeix commented Feb 19, 2016

I think there is perhaps a bug in 41_migrate_udev_rules.sh, but I am not enough of a systemd/udev expert to really help here.

I will now blindly test what happens on my system when I change 41_migrate_udev_rules.sh so that it does not do anything with 70-persistent-net.rules ...

@jsmeix
Member

jsmeix commented Feb 19, 2016

I use two SLES12-SP1 test systems: Two KVM/QEMU virtual machines with one NIC and DHCP on each where one is the "original system" where I run "rear mkbackup" and the other one is the "replacement system" where I run "rear recover".

I changed 41_migrate_udev_rules.sh to do nothing at all by adding at its beginning

# do nothing at all:
return 0

For me the final result is the same:

99_move_away_restored_files.sh removes /etc/udev/rules.d/70-persistent-net.rules

That is all that is done with it during "rear recover" (which ends on 2016-02-19 11:49:38.597966929 - I assume the "rear recover" time is UTC).

After reboot into the recovered system 70-persistent-net.rules gets re-created identical to what it was before

# ls -l /etc/udev/rules.d/70-persistent-net.rules
-rw-r--r-- 1 root root 439 Feb 19 12:56 /etc/udev/rules.d/70-persistent-net.rules
# diff -wups /var/lib/rear/moved_away_after_backup_restore/etc/udev/rules.d/70-persistent-net.rules /etc/udev/rules.d/70-persistent-net.rules
Files /var/lib/rear/moved_away_after_backup_restore/etc/udev/rules.d/70-persistent-net.rules and /etc/udev/rules.d/70-persistent-net.rules are identical

As expected my DHCP networking still "just works".

gdha added a commit that referenced this issue Mar 2, 2016
@gdha
Member Author

gdha commented Mar 2, 2016

@jsmeix @schlomo Yesterday we were able to retest this on the same hardware where we noticed the behavior described in this issue with the latest snapshot of 20160222.
In the /etc/rear/local.conf file we added BACKUP_RESTORE_MOVE_AWAY_FILES=( /etc/udev/rules.d/70-persistent-net.rules ) and ran a recover process where we noticed the following:

2016-03-01 05:50:18 Including finalize/GNU/Linux/41_migrate_udev_rules.sh
diff: /mnt/local//etc/udev/rules.d/70-persistent-net.rules: No such file or directory
2016-03-01 05:50:18 Updating udev configuration (70-persistent-net.rules)
cp: cannot stat `/mnt/local//etc/udev/rules.d/70-persistent-net.rules`: No such file or directory
2016-03-01 05:50:18 Including finalize/GNU/Linux/42_migrate_network_configuration_files.sh
2016-03-01 05:50:18 SED_SCRIPT: ';s/00:50:56:99:1b:49/00:50:56:99:24:97/g;s/00:50:56:99:1B:49/00:50:56:99:24:97/g;s/00:50:56:99:f7:e9/00:50:56:99:b1:25/g;s/00:50:56:99:F7:E9/00:50:56:99:B1:25/g'

Indeed, /mnt/local//etc/udev/rules.d/70-persistent-net.rules was successfully removed and then recovered from the rescue image by script 42_migrate_network_configuration_files.sh (in the meantime I improved the verbosity of this script).
The final /mnt/local//etc/udev/rules.d/70-persistent-net.rules file looked good before we rebooted the system (and was still OK after the reboot as well).
This concludes my quest for this issue. If no one objects, can we close this one?
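
For reference, the SED_SCRIPT line in the log above replaces each old MAC address with its new one in both lowercase and uppercase spelling. A sketch of how such a script string could be assembled is shown below. `build_mac_sed_script` is a hypothetical helper, not the code in 42_migrate_network_configuration_files.sh, and it assumes the MAC pairs are passed as alternating old/new arguments.

```shell
# Sketch only: build_mac_sed_script is a hypothetical helper, not ReaR code.
# Usage: build_mac_sed_script OLD_MAC NEW_MAC [OLD_MAC NEW_MAC ...]
build_mac_sed_script() {
    local script="" old_lc new_lc old_uc new_uc
    while [ "$#" -ge 2 ]; do
        # Cover both spellings, because config files may use either case.
        old_lc=$(printf '%s' "$1" | tr 'A-F' 'a-f')
        new_lc=$(printf '%s' "$2" | tr 'A-F' 'a-f')
        old_uc=$(printf '%s' "$1" | tr 'a-f' 'A-F')
        new_uc=$(printf '%s' "$2" | tr 'a-f' 'A-F')
        script="$script;s/$old_lc/$new_lc/g;s/$old_uc/$new_uc/g"
        shift 2
    done
    printf '%s\n' "$script"
}
```

The resulting string can be passed to `sed -e` (GNU sed accepts the leading `;`) to rewrite the MAC addresses in network configuration files.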

3 participants