Add logic to retry the ssh command that accesses the database.

The sqlite db can throw errors on db contention. Because this command runs once the whole file has completed, it is a likely place for contention under some circumstances. When transferring an image to multiple destinations on a single VMM, the transfers are likely to all complete at nearly the same time. This patch adds logic to retry the command with backoff.

A second change, to AsyncNotification.py, limits the size of the error text.
1 parent caf60f8 commit b28da0f61fe7850b929cfff0593260b380ccace8 @buzztroll buzztroll committed
Showing with 24 additions and 12 deletions.
  1. +19 −11 control/bin/ltclient.sh
  2. +5 −1 control/src/python/workspacecontrol/defaults/AsyncNotification.py
30 control/bin/ltclient.sh
@@ -62,9 +62,7 @@ do
message=`echo $out | awk -F , '{ print $3 }'`
echo $out
if [ "X$done" == "XTrue" ]; then
- if [ $rc -ne 0 ]; then
- exit $rc
- fi
+ exit $rc
fi
# once it succeds we can reset the error counter
ssh_error_cnt=0
@@ -79,15 +77,25 @@ do
done
echo "$localpath exists"
-if [ "X$done" == "XFalse" ]; then
- echo "running a blocking query"
- # if we get here the file exists but we have not yet received word of
- # suceess from the head node. run a blocking query
+echo "running a blocking query"
+done=0
+ssh_error_cnt=0
+# if we get here the file exists but we have not yet received word of
+# suceess from the head node. run a blocking query
+while [ $done -eq 0 ];
+do
ssh -p $port $userhost "$remoteexe" --reattach "$rid"
rc=$?
-else
- echo "already cleared done flag"
- rc=0
-fi
+ if [ $rc -ne 0 ]; then
+ ssh_error_cnt=`expr $ssh_error_cnt + 1`
+ if [ $ssh_error_cnt -gt 3 ]; then
+ done=1
+ else
+ sleep 0.$RANDOM
+ fi
+ else
+ done=1
+ fi
+done
echo "exiting with $rc"
exit $rc
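
For readers who want the control flow of the new loop in isolation, here is a minimal Python restatement of it. It is a sketch under stated assumptions, not code from this commit: `run_with_retry`, `run_once`, `max_errors`, `sleep`, and `rand` are hypothetical names, and `run_once` stands in for the `ssh ... --reattach` call by returning an exit code. As in the shell loop, the pause is a random sub-second delay rather than a growing one.

```python
import random
import time


def run_with_retry(run_once, max_errors=3, sleep=time.sleep, rand=random.random):
    """Call run_once() until it returns 0 or fails more than max_errors times.

    run_once is a hypothetical callable that returns an exit code
    (0 means success). The last exit code is returned either way.
    """
    errors = 0
    while True:
        rc = run_once()
        if rc == 0:
            # Success: stop retrying.
            return rc
        errors += 1
        if errors > max_errors:
            # Too many failures: give up and report the last exit code.
            return rc
        # Random pause in [0, 1) seconds so that clients finishing at the
        # same moment do not all hit the database again simultaneously.
        sleep(rand())
```

With the default `max_errors=3` the command is attempted at most four times, which matches the `-gt 3` test in the shell loop. The `sleep` and `rand` parameters exist only so the delay can be replaced in tests.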
6 control/src/python/workspacecontrol/defaults/AsyncNotification.py
@@ -98,7 +98,11 @@ def notify(self, name, actiondone, code, error):
else:
raise ProgrammingError("unknown actiondone for notification")
-
+
+ max_error_str_len = 256
+ if len(errtxt) > max_error_str_len:
+ self.c.log.warn("error message %s is being truncated" % (errtxt))
+ errtxt = errtxt[:max_error_str_len - 3] + "..."
errtxt = self._bashEscape(errtxt)
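
The truncation rule in this hunk can be checked on its own with a small standalone function. This is only an illustrative sketch: `truncate_error_text` is a hypothetical helper name, not part of AsyncNotification.py, and the logging call from the patch is left out.

```python
def truncate_error_text(errtxt, max_len=256):
    """Return errtxt cut to at most max_len characters.

    Text longer than max_len keeps its first max_len - 3 characters
    and gets "..." appended, so the result is exactly max_len long.
    """
    if len(errtxt) > max_len:
        return errtxt[:max_len - 3] + "..."
    return errtxt
```

Text of exactly 256 characters passes through unchanged, since the comparison is strictly greater-than. In the patch, truncation happens before `_bashEscape` is applied, so the escaped string may still end up longer than 256 characters.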
