First Boot Wizard crashes when radio without hardware exists in UCI #1225

@Pablomonte

Problem Description

The First Boot Wizard (FBW) crashes during network scanning when there's a radio configured in UCI (/etc/config/wireless) but the corresponding physical hardware (phy) doesn't exist in the system.

Symptoms

  • FBW starts scanning and detects mesh networks successfully
  • Process crashes silently during config download phase
  • Frontend shows "Connection attempt not yet started" indefinitely
  • /tmp/scanning still contains true (no cleanup happens after the crash)
  • No config files downloaded to /tmp/fbw/

Steps to Reproduce

  1. Have a router with a stale UCI radio configuration (e.g., radio2) pointing to non-existent PCI hardware
  2. Start First Boot Wizard scan via lime-app
  3. FBW detects networks but crashes when trying to process them

Environment

Hardware: Router with 2 physical radios (phy0, phy1)
UCI Config: 3 radios configured (radio0, radio1, radio2)

# Physical radios
root@LiMe-1d2ae2:~# ls /sys/class/ieee80211/
phy0  phy1

# UCI radios
root@LiMe-1d2ae2:~# uci show wireless | grep "^wireless.radio"
wireless.radio0=wifi-device
wireless.radio0.path='platform/ahb/18100000.wmac'
wireless.radio1=wifi-device
wireless.radio1.path='pci0000:00/0000:00:00.0'
wireless.radio2=wifi-device
wireless.radio2.path='pci0000:01/0000:01:00.0'  # <-- Hardware doesn't exist

Error Log

root@LiMe-1d2ae2:~# /bin/firstbootwizard
[FBW] Scanning...
/usr/bin/lua: /usr/lib/lua/lime/wireless.lua:19: wireless.get_phy_mac(..) failed reading: /sys/class/ieee80211/phy2/macaddress
stack traceback:
	[C]: in function 'assert'
	/usr/lib/lua/lime/wireless.lua:19: in function 'get_phy_mac'
	/usr/lib/lua/firstbootwizard.lua:110: in function 'func'
	/usr/lib/lua/firstbootwizard/functools.lua:63: in function </usr/lib/lua/firstbootwizard/functools.lua:60>
	(tail call): ?
	/usr/lib/lua/firstbootwizard.lua:127: in function 'cb'
	/usr/lib/lua/firstbootwizard/functools.lua:127: in function 'reduce'
	/usr/lib/lua/firstbootwizard.lua:430: in function 'get_all_networks'
	/bin/firstbootwizard:7: in main chunk
	[C]: ?

Root Cause Analysis

The bug occurs in this call chain:

  1. firstbootwizard.lua:110 - fbw.get_own_macs() iterates over all 5GHz radios
  2. firstbootwizard/utils.lua:78 - extract_phys_from_radios("radio2") returns "phy2"
    function utils.extract_phys_from_radios(radio)
        return "phy"..radio.sub(radio, -1)  -- Assumes radioN = phyN
    end
  3. firstbootwizard.lua:110 calls wireless.get_phy_mac("phy2")
  4. wireless.lua:19 - assert() crashes when file doesn't exist:
    function wireless.get_phy_mac(phy)
        local path = "/sys/class/ieee80211/"..phy.."/macaddress"
        local mac = assert(fs.readfile(path), "wireless.get_phy_mac(..) failed reading: "..path):gsub("\n","")
        return utils.split(mac, ":")
    end

Why the incorrect mapping happens

The code incorrectly assumes that radioN always corresponds to phyN:

  • radio0 → phy0
  • radio1 → phy1
  • radio2 → phy2 ❌ (phy2 doesn't exist)

This is fragile because:

  • Radio names are UCI configuration names (can be arbitrary)
  • Phy names are kernel-assigned based on hardware detection order
  • A radio can be removed/disabled in hardware but remain in UCI config
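
To check a device against the points above, the two mappings can be compared from the shell. The sketch below is only illustrative: assumed_phy mirrors the last-character rule in extract_phys_from_radios, and phy_for_path looks under /sys/devices/<path>/ieee80211/ for the phy the kernel actually registered. Both helper names are hypothetical, and the optional second argument (an alternative sysfs root) is only there so the check can be tried off-device.

```shell
#!/bin/sh
# Hypothetical helpers, not part of lime-packages.

# Mirror of the radioN -> phyN assumption: "phy" plus the last character
# of the radio name.
assumed_phy() {
    radio_name="$1"
    last_char="${radio_name#"${radio_name%?}"}"
    printf 'phy%s\n' "$last_char"
}

# Print the phy registered under a UCI radio path; return 1 if none exists.
# $1 = value of the radio's 'path' option
# $2 = sysfs devices root (optional, defaults to /sys/devices)
phy_for_path() {
    uci_path="$1"
    sysfs_root="${2:-/sys/devices}"
    for phy_dir in "$sysfs_root/$uci_path"/ieee80211/phy*; do
        # An unmatched glob stays a literal pattern, so -d filters it out
        if [ -d "$phy_dir" ]; then
            basename "$phy_dir"
            return 0
        fi
    done
    return 1
}
```

For example, assumed_phy radio2 prints phy2 and assumed_phy radio10 prints phy0 (only the last character is used). On the router described under Environment, phy_for_path "$(uci get wireless.radio2.path)" should fail, because no phy is registered under that PCI path.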

Proposed Solutions

Solution 1: Graceful error handling (Quick fix)

Modify wireless.get_phy_mac() to return nil instead of crashing:

function wireless.get_phy_mac(phy)
	local path = "/sys/class/ieee80211/"..phy.."/macaddress"
	-- Check if phy exists before trying to read MAC
	if not fs.stat(path) then
		utils.log("wireless.get_phy_mac: phy "..phy.." does not exist, skipping")
		return nil
	end
	local mac = assert(fs.readfile(path), "wireless.get_phy_mac(..) failed reading: "..path):gsub("\n","")
	return utils.split(mac, ":")
end

Then update fbw.get_own_macs() to filter out nil results:

function fbw.get_own_macs()
    local radios = ft.map(utils.extract_prop(".name"), wireless.scandevices())
    local radios_5ghz = ft.filter(wireless.is5Ghz, radios)
    local phys = ft.map(utils.extract_phys_from_radios, radios_5ghz)
    -- Collect MACs in a plain loop so that missing phys never put nil
    -- entries (holes) into the table, whatever ft.map does with them
    local macs = {}
    for _, phy in pairs(phys) do
        local mac = wireless.get_phy_mac(phy)
        if mac then
            table.insert(macs, table.concat(mac, ":"))
        end
    end
    return macs
end

Solution 2: Correct radio→phy mapping (Proper fix)

Don't assume radioN = phyN. Instead, derive the phy from the radio's device path:

function wireless.get_phy_from_radio(radio_name)
    local uci = config.get_uci_cursor()
    local path = uci:get("wireless", radio_name, "path")
    if not path then
        utils.log("wireless.get_phy_from_radio: no path for radio "..radio_name)
        return nil
    end

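    -- Find phy by matching device path. The "device" entry is a relative
    -- symlink, so resolve it to an absolute /sys/devices/... path first
    -- (assumes fs is nixio.fs, which provides realpath)
    for phy_dir in fs.dir("/sys/class/ieee80211/") do
        if phy_dir ~= "." and phy_dir ~= ".." then
            local device_link = fs.realpath("/sys/class/ieee80211/"..phy_dir.."/device")
            if device_link and device_link:find(path, 1, true) then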
                return phy_dir
            end
        end
    end

    utils.log("wireless.get_phy_from_radio: phy not found for radio "..radio_name.." with path "..path)
    return nil
end
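
One detail worth checking for the path-matching approach: on Linux, /sys/class/ieee80211/<phy>/device is typically a relative symlink (for example ../../../0000:00:00.0), so the raw link target may not contain the full UCI path. Below is a minimal shell sketch of the same match using the resolved path. phy_matches_path is a hypothetical helper, and the class-directory argument is only there so the check can be tried against a test tree.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the phy's resolved device path contains
# the given UCI radio path.
# $1 = phy name, $2 = value of the radio's 'path' option
# $3 = ieee80211 class directory (optional, defaults to /sys/class/ieee80211)
phy_matches_path() {
    phy_name="$1"
    uci_path="$2"
    class_dir="${3:-/sys/class/ieee80211}"
    # readlink -f follows the symlink chain and prints an absolute path
    resolved=$(readlink -f "$class_dir/$phy_name/device") || return 1
    case "$resolved" in
        *"$uci_path"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the router described under Environment, phy_matches_path phy1 "$(uci get wireless.radio1.path)" would be expected to succeed, while no existing phy should match the path configured for radio2.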

Solution 3: Filter radios during scandevices (Most robust)

Modify wireless.scandevices() to only return radios that have corresponding hardware:

function wireless.scandevices()
    local devices = {}
    local uci = config.get_uci_cursor()

    uci:foreach("wireless", "wifi-device", function(dev)
        -- Check if hardware exists for this radio
        local path = dev.path
        if path then
            local phy_exists = false
            for phy_dir in fs.dir("/sys/class/ieee80211/") do
                if phy_dir ~= "." and phy_dir ~= ".." then
                    -- realpath resolves the relative "device" symlink (assumes fs is nixio.fs)
                    local device_link = fs.realpath("/sys/class/ieee80211/"..phy_dir.."/device")
                    if device_link and device_link:find(path, 1, true) then
                        phy_exists = true
                        break
                    end
                end
            end

            if phy_exists then
                devices[dev[".name"]] = dev
            else
                utils.log("wireless.scandevices: skipping radio "..dev[".name"].." - hardware not found")
            end
        else
            utils.log("wireless.scandevices: skipping radio "..dev[".name"].." - no path defined")
        end
    end)

    -- ... rest of the function
end

Workaround

Users can work around this by cleaning up stale radio configurations:

# Identify and delete stale radios (changes are only saved by the final 'uci commit')
for radio in $(uci show wireless | grep "=wifi-device" | cut -d. -f2 | cut -d= -f1); do
    path=$(uci get wireless.$radio.path 2>/dev/null)
    if [ -n "$path" ]; then
        # Check if hardware exists
        if ! ls -d /sys/devices/$path/ieee80211/phy* >/dev/null 2>&1; then
            echo "Stale radio: $radio (path: $path)"
            uci delete wireless.$radio
        fi
    fi
done
uci commit wireless

Impact

  • Severity: High - FBW completely breaks on affected systems
  • Frequency: Medium - affects routers with hardware changes or stale configs
  • User Experience: Critical - prevents initial network setup

Additional Context

This bug was discovered while debugging why FBW was stuck showing "Connection attempt not yet started" in lime-app. The silent crash leaves the system in an inconsistent state with /tmp/scanning locked to true, preventing subsequent scan attempts until manual cleanup.
