This repository has been archived by the owner on May 17, 2022. It is now read-only.

g4-common: wrild: fixes & introducing watchdog
global:

- added developer debug option (everything but no ril restarts)
- fix: SIM-less mode in rild and watchdog
- complete re-work of dontaudit (macro implemented)

rild related fixes/enhancements:

- faster detection of ril issues
- full support for encrypted devices
- more reliable boot process detection
- fix: issue when READY but no operator
- better logging in some places
- RILRESTART moved to a function
- fix: do not count when on PIN screen
- fix: counter wasn't reset in all cases

watchdog related:

before: wrild went to the background forever.
now: wrild still goes to the background but starts a watchdog
which runs forever.

This introduces a (hopefully) intelligent watchdog for rild.
Main reason was high CPU load in very rare cases.
The watchdog is fully integrated into wrild and so makes
use of e.g. the new RILRESTART function when triggering
a restart is needed.

watchdog main features:

- actions of this watchdog begin with "woof" (adb logcat -s WRILD -e woof)
- configurable: DEBUGLOG on/off, log directory DOGLOGS (default: /sdcard/Download/wdlog)
- configurable: CPU threshold
- configurable: max duration of breaking the CPU threshold
- configurable: watchdog frequency
- configurable: 3rd party apps which provide in-app-calls
(when in foreground: pausing, when in background and active call: pausing)
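
As a rough sketch of how the threshold settings interact: in the script they are named TSCPU, TSTIME and WDFREQ, and the number of consecutive over-threshold checks before a restart is triggered is TSTIME / WDFREQ (integer division). The helper name wd_checks below is hypothetical and only illustrates that arithmetic; it is not part of wrild.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of wrild.sh): number of watchdog checks
# that fit into the allowed time frame, using integer division.
wd_checks() {
    tstime=$1   # seconds rild may stay above the CPU threshold (TSTIME)
    wdfreq=$2   # seconds between two watchdog checks (WDFREQ)
    echo $((tstime / wdfreq))
}

wd_checks 60 20   # script defaults: prints 3
wd_checks 60 5    # prints 12
```

A smaller WDFREQ reacts faster but wakes the watchdog more often; the time frame itself stays at TSTIME seconds either way.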

watchdog process:

The watchdog checks (interval: WDFREQ) whether rild exceeds the CPU threshold
for the defined duration.
If that's the case it checks for the following and does NOTHING when any of these
applies:

- active call (dialer) in foreground or background
- ringing (dialer)
- 3rd party call app (CALLAPPS) in foreground
- 3rd party call app (CALLAPPS) in background AND in-call

If NONE of these apply, 3 log files get created in DOGLOGS (when DEBUGLOG is enabled):

- YYYY-MM-DD_<rild-PID>_logcat.txt
- YYYY-MM-DD_<rild-PID>_dmesg.txt
- YYYY-MM-DD_<rild-PID>_ps.txt

Afterwards rild gets kicked off the field and restarted properly.
That means we also check whether the cell service comes back as it should.
If not: we restart rild again, as we do on first boot (when necessary).
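
The skip conditions above can be sketched roughly as follows. This is a simplified illustration, not the script's code: the function name wd_may_restart and its three arguments are hypothetical, the app list is shortened, and the real script reads these states from dumpsys and additionally requires a "voip" or "call" marker for the background case. The return codes mirror the ones used in F_BITEDOG.

```shell
#!/bin/sh
# Shortened, hypothetical app list; the script's CALLAPPS is longer.
CALLAPPS="dialer|whatsapp|telegram|skype"

# Returns 0 when a restart would be allowed, non-zero when it must be skipped.
# $1: telephony call state (0 idle, 1 ringing, 2 active call)
# $2: package name of the focused window (may be empty)
# $3: description of a background app that is in a call (may be empty)
wd_may_restart() {
    case "$1" in
        1) return 3 ;;  # ringing
        2) return 4 ;;  # active call
    esac
    if [ -n "$2" ] && echo "$2" | grep -Eiq "$CALLAPPS"; then
        return 5        # calling app in foreground
    fi
    if [ -n "$3" ] && echo "$3" | grep -Eiq "$CALLAPPS"; then
        return 6        # calling app in background with an active call
    fi
    return 0
}

wd_may_restart 0 "com.android.settings" "" && echo "restart allowed"
wd_may_restart 0 "org.telegram.messenger" "" || echo "skipped with code $?"
```

Because the match is a case-insensitive substring match, a short fragment such as "telegram" is enough to cover the full package name.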

grab debug logs:

Newer adb versions support pulling a complete directory, so if you left DOGLOGS
at its default this will do:

adb pull /sdcard/Download/wdlog
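
To find the files belonging to one restart after pulling, the naming scheme YYYY-MM-DD_<rild-PID>_<kind>.txt can be reproduced as in this minimal sketch. The helper name wd_log_names, the local directory name wdlog and the PID 1234 are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the three file names the watchdog would use
# for a given rild PID and day, following the scheme from the commit message.
wd_log_names() {
    dir=$1   # directory holding the pulled logs
    pid=$2   # PID of the rild process that was restarted
    day=$3   # date in YYYY-MM-DD form (what `date +%F` prints)
    for kind in logcat dmesg ps; do
        echo "${dir}/${day}_${pid}_${kind}.txt"
    done
}

# Example with a made-up PID and today's date
wd_log_names wdlog 1234 "$(date +%F)"
```

Since the script appends to these files, several restarts of the same rild PID on the same day end up in the same three files.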
steadfasterX committed Aug 15, 2019
1 parent 6094a17 commit 3a833ef
Showing 7 changed files with 349 additions and 101 deletions.
270 changes: 227 additions & 43 deletions rootdir/bin/wrild.sh
@@ -2,85 +2,269 @@
###################################################################################################
#
# workaround for randomly no SIM on boot (https://github.com/Suicide-Squirrel/issues_oreo/issues/6)
# plus a watchdog for rild high-cpu load in rare cases
#
###################################################################################################
x=1

# rild
PRTRIGGER=0
REQRESTART=99
MAXRET=10

# watchdog ("woof:" in logcat)
DEBUGLOG=1 # 0: disable debug logging, 1: enable
DOGLOGS=/sdcard/Download/wdlog # log directory when DEBUGLOG=1, path must be owned and r/w for root
TSCPU=90 # max allowed cpu usage threshold
TSTIME=60 # how many secs rild is allowed to consume TSCPU before a restart of rild is triggered
WDFREQ=20 # check frequency of the watchdog in secs
# The total check amount(!) will be calculated as: TSTIME / WDFREQ
# Examples:
# 60 / 5 = 12 checks over the 60 sec time frame
# 60 / 20 = 3 checks over the 60 sec time frame

# a "|" delimited list of apps/package names (case insensitive) which are able to accept/make calls
# if one of these apps is in the foreground(!) the watchdog will skip any actions
# a package name is something like com.microsoft.office.lync15 but office.lync or lync is enough as well
CALLAPPS="dialer|call|whatsapp|thoughtcrime.securesms|telegram|challegram|viber|threema|slack|facebook.orca|facebook.mlite|skype|office.lync|microsoft.teams|imoim"


# internal watchdog debug mode. never touch this!
WDDEBUG=0
#####################################################################################


# logging func
F_LOG(){
# d: DEBUG e: ERROR f: FATAL i: INFO v: VERBOSE w: WARN s: SILENT
log -t WRILD -p "$1" "${0}: $2"
log -t WRILD -p "$1" "${0##*/}: $2"
}

# check for the current RIL state
# check for the current RIL and device state
F_RILCHK(){
CURSTATE=$(getprop gsm.sim.state)
CUROPER=$(getprop gsm.sim.operator.numeric)
DBOOTED=$(getprop sys.boot_completed)
CURANIM=$(getprop init.svc.bootanim)
ENC=$(getprop ro.crypto.state)
ENCSTATE=$(getprop init.svc.uncrypt)
PROPSIM=$(getprop wrild.sim.count)
if [ -z "$PROPSIM" ];then
SIMCOUNT=$(logcat -b all -d |egrep "insertedSimCount.*[01]" | egrep -o "[01]" | tail -n1)
setprop wrild.sim.count $SIMCOUNT
else
F_LOG i "using previous detected sim count.."
SIMCOUNT=$PROPSIM
fi

F_LOG "i" "sys.boot_completed >$DBOOTED<"
F_LOG "i" "gsm.sim.state >$CURSTATE<"
F_LOG "i" "gsm.sim.operator.numeric >$CUROPER<"
F_LOG "i" "init.svc.bootanim >$CURANIM<"
F_LOG "i" "enc, encstate: >$ENC<, >$ENCSTATE<"
F_LOG "i" "sim count: >$SIMCOUNT<"

if [ "$CURSTATE" == "READY" ]; then
if [ "$CURSTATE" == "READY" ] && [ ! -z "$CUROPER" ]; then
echo 0
elif [ "$DBOOTED" != "1" ];then
echo 7
elif [ "$CURSTATE" == "PIN_REQUIRED" ]; then
elif [ "$CURANIM" != "stopped" ];then
echo 7
elif [ "$ENC" == "encrypted" ] && [ "$ENCSTATE" != "stopped" ];then
echo 7
elif [ "$CURSTATE" == "PIN_REQUIRED" ]; then
echo 9
elif [ "$CURSTATE" == "LOADED" ] && [ -z "$CUROPER" ];then
sleep 20
F_LOG i "LOADED but no operator yet .. sleeping 25s"
sleep 25
echo 1
elif [ "$CURSTATE" == "LOADED" ] && [ ! -z "$CUROPER" ];then
echo 0
elif [ "$SIMCOUNT" == "0" ];then
echo 42
else
sleep 20
F_LOG i "No condition met (yet) .. sleeping 10s"
sleep 10
echo 1
fi
}

while [ "$REQRESTART" -ne 0 ];do
REQRESTART=$(F_RILCHK)

# PIN_REQUIRED means usually the user get prompted - unfortunately
# sometimes there is no prompt.
# this will restart RIL not on the first but every second run only (which should be safe) and
# let the user enough time to enter the PIN if the prompt appears
if [ "$REQRESTART" -eq 9 ]&&[ $PRTRIGGER -eq 0 ];then
F_LOG i "PIN_REQUIRED detected. waiting 40s for user input.." && sleep 40
PRTRIGGER=1
elif [ "$REQRESTART" -eq 7 ];then
F_LOG i "Boot (still) in progress ... hanging around for 20s ..." && sleep 20
x=1
elif [ "$REQRESTART" -eq 1 ];then
F_LOG w "RIL restart - try $x of $MAXRET"
stop real-ril-daemon
sleep 1
start real-ril-daemon
F_LOG w "restarted RIL daemon as REQRESTART was set to >$REQRESTART<"
sleep 40
PRTRIGGER=0
elif [ "$REQRESTART" -eq 9 ] ;then
F_LOG i "PIN_REQUIRED detected. waiting another minute for user input.." && sleep 60
elif [ "$REQRESTART" -eq 0 ] ;then
F_LOG i "no restart required. RILD seems to work properly already." && break
F_RILRESTART(){
x=$1
while [ "$REQRESTART" -ne 0 ];do
REQRESTART=$(F_RILCHK)

# PIN_REQUIRED means usually the user get prompted - unfortunately
# sometimes there is no prompt.
# this will restart RIL not on the first but every second run only (which should be safe) and
# let the user enough time to enter the PIN if the prompt appears
if [ "$REQRESTART" -eq 9 ]&&[ $PRTRIGGER -eq 0 ];then
F_LOG i "PIN_REQUIRED detected. waiting 40s for user input.." && sleep 40
PRTRIGGER=1
elif [ "$REQRESTART" -eq 7 ];then
F_LOG i "Boot (still) in progress ... hanging around for 10s ..." && sleep 10
x=1
elif [ "$REQRESTART" -eq 1 ];then
[ $WDDEBUG == 1 ] && F_LOG e "!!!! DEBUG MODE DEBUG MODE - NO ACTION TAKEN !!!!"
if [ -d /sdcard/Download/ ];then
F_LOG w "RIL restart - try $x of $MAXRET"
[ $WDDEBUG == 0 ] && stop real-ril-daemon
sleep 1
[ $WDDEBUG == 0 ] && start real-ril-daemon
F_LOG w "restarted RIL daemon as REQRESTART was set to >$REQRESTART<"
[ $WDDEBUG == 0 ] && sleep 40
PRTRIGGER=0
x=$((x + 1))
else
F_LOG w "Skipping restart as /data isn't mounted yet ..." && sleep 5
fi
elif [ "$REQRESTART" -eq 9 ] ;then
F_LOG i "PIN_REQUIRED detected. waiting another minute for user input.." && sleep 60
x=1
elif [ "$REQRESTART" -eq 0 ] ;then
F_LOG i "no restart required. RILD seems to work properly already." && break
elif [ "$REQRESTART" -eq 42 ];then
F_LOG e "No SIM detected!" && return $REQRESTART
else
F_LOG e "unusual state detected . waiting 20s before another try .." && sleep 20
fi
if [[ $x -eq $MAXRET ]];then
F_LOG e "auto restart RIL daemon aborted.. too many tries!"
return 99
fi
done
}

# endless sleep
F_DOZE(){
if [ $WDDEBUG == 0 ];then
while true; do F_LOG i "Watchdog has been disabled :(" && sleep 86400 ;done
else
F_LOG e "unusual state detected . waiting 20s and will try again .." && sleep 20
fi
x=$((x + 1))
if [[ $x -eq $MAXRET ]];then
F_LOG e "auto restart RIL daemon aborted.. too many tries!"
break
F_LOG i "woof: DEBUG MODE !!!! Watchdog would have been disabled but as we debug..."
fi
done
}

# restart RIL in defined conditions.
F_RILRESTART 1
RRET=$?

MYPID=$(ps -opid,cmd|grep wrild.sh| egrep -o "[0-9]+")
F_LOG i "RIL should be fine now. Going to background with pid >$MYPID<"
F_LOG i "RIL handling finished. Going to background with pid >$MYPID<"

# when too many retries (returncode:99) the watchdog does not need to be started
# but we keep wrild running as we are a service
[ $RRET -eq 99 -o $RRET -eq 42 ] && F_DOZE

###############################################################################################
#
# watchdog area
#
###############################################################################################

F_LOG i "woof: watchdog is starting...."

# debug logs - if enabled
[ -d "$DOGLOGS" ] && rm -rf $DOGLOGS
F_LOGRIL(){
if [ "$DEBUGLOG" -ne 0 ];then
mkdir $DOGLOGS
F_LOG i "woof: debug logging started"
LOGPID=$1
#TIMESTMP="$(date +%F-%H-%M-%S)"
TIMESTMP="$(date +%F)"
logcat -b all -d -D >> $DOGLOGS/${TIMESTMP}_${LOGPID}_logcat.txt \
&& F_LOG w "woof: debug log written: $DOGLOGS/${TIMESTMP}_${LOGPID}_logcat.txt"
[ $WDDEBUG == 0 ] && logcat -b all -c && F_LOG w "woof: CLEARED LOGCAT"
echo -e "\n\n$(date):\n\n $(dmesg -c)" >> $DOGLOGS/${TIMESTMP}_${LOGPID}_dmesg.txt && F_LOG w "woof: debug log written: $DOGLOGS/${TIMESTMP}_${LOGPID}_dmesg.txt"
echo -e "\n\n$(date):\n\n $(ps -A)" >> $DOGLOGS/${TIMESTMP}_${LOGPID}_ps.txt && F_LOG w "woof: debug log written: $DOGLOGS/${TIMESTMP}_${LOGPID}_ps.txt"
fi
}

# watch the dog
F_WOOF(){
DOG="$1"
F_LOG d "woof: sniffing for $DOG"
for dog in $(ps -A -opid:1,cmd:4,pcpu:0 | grep -v wrild | grep " $DOG"| tr " " "," | cut -d "," -f 1,3);do
#dpid=$(echo "${dog/,*/}"| egrep -o '[0-9]+')
dpid="${dog/,*/}"
dcpu=$(printf "%.0f" "${dog/*,/}")
# if we found a dog which breaks the threshold immediately inform the watch proc & catch logs
if [ "$dcpu" -ge "$TSCPU" ];then
F_LOG w "woof: $dog - current cpu usage: $dcpu %, pid: $dpid"
echo $dpid && return 3
fi
done
F_LOG d "woof: $DOG is a good doggie (normal CPU usage) ..."
echo 0 && return 0
}

# bite the dog - but BEWARE OF THE DRAGONS!
F_BITEDOG(){
DPID=$1

# dragon: regular phone call
INACALL=$(dumpsys telephony.registry | egrep -o 'mCallState=.*')
case $INACALL in
mCallState=0) # idle
;;
mCallState=1) # ringing
F_LOG w "woof: will not bite the dog because it barks (ringing)"
return 3
;;
mCallState=2) # active call
F_LOG w "woof: will not bite the dog because he would bite back (active call)"
return 4
;;
esac

# run forever
while true; do sleep 86400; done
# dragon: 3rd party app with calling support - foreground
unset CAPP
CAPP=$(dumpsys window windows | grep mCurrentFocus | egrep -oi "$CALLAPPS" | head -n 1)
[ ! -z $CAPP ] && F_LOG w "woof: will not bite the dog because he has supercow powers (found a calling app in foreground: $CAPP)" && return 5

# dragon: 3rd party app within a call - in background
unset CAPP
CAPP=$(dumpsys window windows |grep topApp | egrep -i "$CALLAPPS" | egrep -i "voip|call" | head -n 1 | tr -d " ")
[ ! -z $CAPP ] && F_LOG w "woof: will not bite the dog because he is HIGH (active call in background) ..." && F_LOG w "woof: calling app in background: $CAPP" && return 6

F_LOG w "woof: woof! saying goodbye to rild? We will see.."
F_LOGRIL $DPID
# reset operator id to ensure RIL gets restarted
[ $WDDEBUG == 0 ] && setprop gsm.sim.operator.numeric ""
REQRESTART=1 F_RILRESTART 2
[ $? -eq 99 -o $? -eq 42 ] && F_DOZE
}

# delay the very first watchdog run
[ $WDDEBUG == 0 ] && F_LOG i "woof: yawn ... I think .. I will sleep a bit before actually starting my work (2 min)" && sleep 120
[ $WDDEBUG == 1 ] && F_LOG e "woof: !!! DEBUG MODE DEBUG MODE !!! SLEEP DISABLED FOR FIRST WD RUN!"

# run forever and watch out for dogs
WCNT=$(($TSTIME/WDFREQ))
[ $WDDEBUG == 1 ] && WCNT=2 && TSCPU=0
while true; do
DOGPID=$(F_WOOF rild)
WOOFRET=$?
if [ $WOOFRET -eq 0 ];then
F_LOG d "woof: all fine ... shhhh... don't wake sleeping dogs ..."
WCNT=$(($TSTIME/WDFREQ))
else
WCNT=$((WCNT - 1))
if [ "$WCNT" -gt 0 ];then
[ $WDDEBUG == 1 ] && F_LOG e "woof: !!!! DEBUG MODE DEBUG MODE !!!!"
F_LOG i "woof: rild ($DOGPID) eats more CPU than is good for us - over ${TSCPU}% ... (countdown: $WCNT)"
else
[ $WDDEBUG == 1 ] && F_LOG e "woof: !!!! DEBUG MODE DEBUG MODE !!!!"
# trigger and give it time to come back
F_LOG w "woof: the hunt is open! run rild RUN ... (restarting $DOGPID)"
F_BITEDOG $DOGPID
[ $WDDEBUG == 0 ] && sleep 30
WCNT=$(($TSTIME/WDFREQ))
[ $WDDEBUG == 1 ] && WCNT=2
fi
fi
[ $WDDEBUG == 1 ] && WDFREQ=5
sleep $WDFREQ
done

F_LOG f "killed?!"
#############################################################################################
F_LOG f "wtf!? This should never happen!"
1 change: 1 addition & 0 deletions sepolicy/logd.te
@@ -0,0 +1 @@
allow logd unlabeled:dir search;
1 change: 1 addition & 0 deletions sepolicy/property.te
@@ -1 +1,2 @@
type qti_init_prop, property_type;
type wrild_prop, property_type;
1 change: 1 addition & 0 deletions sepolicy/property_contexts
@@ -3,3 +3,4 @@ ro.qualcomm.bluetooth. u:object_r:qti_init_prop:s0
ro.bluetooth. u:object_r:qti_init_prop:s0
persist.data.rear. u:object_r:camera_prop:s0
persist.data.front. u:object_r:camera_prop:s0
wrild. u:object_r:wrild_prop:s0
5 changes: 5 additions & 0 deletions sepolicy/servicemanager.te
@@ -16,3 +16,8 @@ allow servicemanager mm-pp-daemon:process getattr;
allow servicemanager vendor_per_mgr:dir search;
allow servicemanager vendor_per_mgr:file { open read };
allow servicemanager vendor_per_mgr:process getattr;

# watchdog
allow servicemanager wrild:dir search;
allow servicemanager wrild:file { read open };
allow servicemanager wrild:process getattr;
4 changes: 4 additions & 0 deletions sepolicy/system_server.te
@@ -26,3 +26,7 @@ allow system_server qmuxd_socket:dir { add_name write search };
allow system_server qmuxd_socket:sock_file { create setattr write };

allow system_server vfat:dir { open read write };

# watchdog
allow system_server wrild:fifo_file write;
allow system_server wrild:fd use;
