Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usage message_mutex in motorOms (in drvMAXv.cc observed) and then in motordrvCom.cc #172

Closed
hjunkes opened this issue Dec 4, 2020 · 12 comments
Milestone

Comments

@hjunkes
Copy link

hjunkes commented Dec 4, 2020

Besides small problems with VME access and devLib2 with RTEMS5,
I came across the following problem: A mutex is created in drvMAXv.cc:
in motor_init a mutex is created:

    pvtdata = (struct MAXvController *) malloc(sizeof(struct MAXvController));
    pvtdata->message_mutex = epicsMutexMustCreate();
    pmotorState->DevicePrivate = pvtdata;

This mutex is used during configuration of the card in the IOC-Init-Thread/Task and therefore the mutex "belongs to" (is owned by) the init task.
But after the initialization of the board the "motor_task" is generated:

epicsThreadCreate((char *) "MAXv_motor", epicsThreadPriorityMedium,
                  epicsThreadGetStackSize(epicsThreadStackMedium),
                  (EPICSTHREADFUNC) motor_task, (void *) &targs);

Subsequent commands to the MAXv card will be triggered by this epicsThread, but the mutex is not owned by this task and this leads to error messages (not owner with RECURSIVE pthread_mutex'es) or to supending the task with the "old" phtread mutex'es.
as a small workaround I have changed the following:

@@ -1391,6 +1395,8 @@ static int motor_init()
                 /* Is this needed??? */
                 send_recv_mess(card_index, DONE_QUERY, MAXv_axis[motor_index], axis_pos, 1);
             }
+            Debug(1, "motor_init: destroy mutex\n");
+            epicsMutexDestroy(pvtdata->message_mutex);

             Debug(2, "motor_init: Init Address=%p\n", localaddr);
             Debug(3, "motor_init: Total encoders = %d\n", total_encoders);

and consequently also in motordrvCom.cc

@@ -188,6 +200,7 @@ epicsShareFunc int motor_task(struct thread_args *args)
             for (itera = 0; itera < *tabptr->cardcnt_ptr; itera++)
             {
                 struct controller *brdptr = (*tabptr->card_array)[itera];
+                ((struct MAXvController*)(brdptr->DevicePrivate))->message_mutex = epicsMutexMustCreate();
                 if (brdptr != NULL && brdptr->motor_in_motion)
                     stale_data_delay = query_axis(itera, tabptr, previous_time, stale_data_max_delay);
             }

Before I start a pull request I wanted to discuss this first. Maybe I have missed something completely.

@tboegi
Copy link
Contributor

tboegi commented Dec 4, 2020

@hjunkes : Could that be elaborated a little bit?
"Usage message_..": What should this mean ?
It is used, but shouldn't ?

grep -R message_mutex *
modules/motorOms/omsApp/src/drvMAXv.cc:#define MUTEX(card) ((struct MAXvController *)(motor_state[card]->DevicePrivate))->message_mutex
modules/motorOms/omsApp/src/drvMAXv.cc: pvtdata->message_mutex = epicsMutexMustCreate();
modules/motorOms/omsApp/src/drvMAXv.h: epicsMutexId message_mutex;
tb@susi:/nfs/home/tb/NoBackup/projects/epics/motor$ grep -R MUTEX *
modules/motorOms/omsApp/src/drvMAXv.cc:#define MUTEX(card) ((struct MAXvController *)(motor_state[card]->DevicePrivate))->message_mutex
modules/motorOms/omsApp/src/drvMAXv.cc: if (!epicsMutexTryLock(MUTEX(card))) {
modules/motorOms/omsApp/src/drvMAXv.cc: epicsMutexLock(MUTEX(card));
modules/motorOms/omsApp/src/drvMAXv.cc: epicsMutexUnlock(MUTEX(card));
modules/motorOms/omsApp/src/drvMAXv.cc: if (!epicsMutexTryLock(MUTEX(card))) {
modules/motorOms/omsApp/src/drvMAXv.cc: epicsMutexLock(MUTEX(card));
modules/motorOms/omsApp/src/drvMAXv.cc: epicsMutexUnlock(MUTEX(card));
modules/motorOms/omsApp/src/drvMAXv.cc: if (!epicsMutexTryLock(MUTEX(card))) {
modules/motorOms/omsApp/src/drvMAXv.cc: epicsMutexLock(MUTEX(card));
modules/motorOms/omsApp/src/drvMAXv.cc: epicsMutexUnlock(MUTEX(card));
modules/motorOms/omsApp/src/drvMAXv.cc: epicsMutexUnlock(MUTEX(card));
modules/motorOms/omsApp/src/drvMAXv.cc: epicsMutexUnlock(MUTEX(card));

@hjunkes
Copy link
Author

hjunkes commented Dec 4, 2020

Hallo Torsten, Du warst viel schneller als ich alles eingeben konnte ;-)

@MarkRivers
Copy link
Member

I guess I don't understand the issue. Nearly all of my drivers create an epicsMutex in a constructor that is called by whatever thread is running iocsh, and then they are used by either asyn port threads or threads created with epicsThreadCreate in the constructor. I have never heard of an issue relating to what task "owns" a mutex. The whole point is that mutexes must be shared among multiple tasks (threads).

I use the drvMAXv.cc driver a lot on vxWorks and have never seen the errors you describe.

Is this on RTEMS?

@hjunkes
Copy link
Author

hjunkes commented Dec 4, 2020

Yes, this is in RTEMS5 and we now use the posix-mutex'es.
And in modules/libcom/src/osi/os/posix/osdMutex.c one find

void epicsMutexOsdUnlock(struct epicsMutexOSD * pmutex)
{
int status;

status = mutexLock(&pmutex->lock);
checkStatus(status, "pthread_mutex_lock epicsMutexOsdUnlock");
if(status)
    return;

if ((pmutex->count <= 0) || (pmutex->ownerTid != pthread_self())) {
    pthread_mutex_unlock(&pmutex->lock);
    checkStatus(status, "pthread_mutex_unlock epicsMutexOsdUnlock");
    errlogPrintf("epicsMutexOsdUnlock but caller is not owner\n");
    cantProceed("epicsMutexOsdUnlock but caller is not owner");
    return;
}

and (https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/apis/users_65.htm)

The pthread_mutex_unlock() function unlocks the mutex specified. If the calling thread does not currently hold the mutex (via a previous call to pthread_mutex_lock(), pthread_mutex_trylock(), or pthread_mutex_timedlock_np()) the unlock request fails with the EPERM error.

And I have the feeling that on creation of the mutex the tid is crucial. That should be noticeable with Linux systems?

@MarkRivers
Copy link
Member

Is the problem that unlock is being called but lock has not been called by that thread?

@hjunkes
Copy link
Author

hjunkes commented Dec 4, 2020

Yes, that's what I suspected at first. But I could not verify it for sure. A problem could be that I have no motors connected to the MAXv at the moment. Possibly it is an unfavorable timing. Maybe it's enough to make sure that no more messages are pending before generating the motor-task? I don't know if it is generally a good idea to create a lock in a task that can be used for other tasks after the init task is finished?

@hjunkes
Copy link
Author

hjunkes commented Jan 11, 2021

I have now looked at the whole thing again.

In drvMAXv.cc (and only there, with other oms boards mutexLock/Unlock is not used) the mutex is created in motor_init, which is then used in recv/send_mess().

motor_init is called by MAXvConfig() in st.cmd (i.e. in the init - task).

At the end of motor_init the epicsThread "MAXv_motor" is created and in it the mutex previously created by init-task (with a different task-ID) is further used.

That's why one get these messages:
epicsMutex pthread_mutex_unlock epicsMutexOsdUnlock failed: error Not owner

@anjohnson
Copy link
Member

I now understand the problem @hjunkes is bringing up.

The drvMAXv.cc code uses this construct to take its mutexes:

    if (!epicsMutexTryLock(MUTEX(card))) {
        Debug(1, "Waiting for mutex\n");
        epicsMutexLock(MUTEX(card));
    }

That looks like it should be reasonable, but unfortunately it's actually very badly broken, because the return status value from epicsMutexTryLock() is epicsMutexLockOK=0 (false) when it has successfully taken the lock.

The first time the above code gets called there is no contention for the lock so epicsMutexTryLock() succeeds and returns 0. The if (!0) block thus gets run, calling epicsMutexLock() which takes the mutex a second time. Since there is only one call to epicsMutexUnlock() afterwards the mutex is still owned by the thread that first took it. Now another thread wants to access the device and calls the above code again; this time the epicsMutexTryLock() fails and returns epicsMutexLockTimeout=1 but the if (!1) block gets skipped and the thread carries on its merry way and does its I/O anyway. Finally it releases the mutex by calling epicsMutexUnlock(), but since the mutex ownership is wrong that generates the error that Heinz was seeing.

I can understand how this coding error might have happened: The C++ class epicsMutex wrapper has a tryLock() method which returns true if it succeeds, whereas the epicsMutexTryLock() routine returns 0 (false) if it succeeds. That's a bad API design from EPICS Core, for which I apologize but probably can't do much to change.

I recommend replacing all instances of if (!epicsMutexTryLock(...)) with if (epicsMutexTryLock(...) != epicsMutexLockOK). I would also check the other modules in synApps for this usage in case the same construct is used elsewhere.

@MarkRivers
Copy link
Member

I just searched for epicsMutexTryLock in all .c* files in my synApps tree, including areaDetector. This is what I find:

corvette:~/devel>find . -type f -name '*.c*' -exec grep -U3 -H epicsMutexTryLock {} \;
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-static int send_recv_mess(int card, const char * command, const char *axis, char *buf, int nMessages) {
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    int retval;
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        Debug(1, "send_recv_mess: waiting for mutex\n");
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        epicsMutexLock(MUTEX(card));
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    }
--
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    pmotor = (struct MAXv_motor *) motor_state[card]->localaddr;
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    Debug(9, "send_mess: pmotor = %p\n", pmotor);
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        Debug(1, "send_mess: waiting for mutex\n");
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        epicsMutexLock(MUTEX(card));
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    }
--
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    pmotor = (struct MAXv_motor *) motor_state[card]->localaddr;
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc:    if (!epicsMutexTryLock(MUTEX(card))) {
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        Debug(1, "recv_mess: waiting for mutex\n");
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-        epicsMutexLock(MUTEX(card));
./motor-7-0/modules/motorOms/omsApp/src/drvMAXv.cc-    }
./asyn-4-41/asyn/asynDriver/asynManager.c-    if(details>=1) {
./asyn-4-41/asyn/asynDriver/asynManager.c-        epicsMutexLockStatus mgrStatus;
./asyn-4-41/asyn/asynDriver/asynManager.c-        epicsMutexLockStatus syncStatus;
./asyn-4-41/asyn/asynDriver/asynManager.c:        syncStatus = epicsMutexTryLock(pport->synchronousLock);
./asyn-4-41/asyn/asynDriver/asynManager.c-        if(syncStatus==epicsMutexLockOK)
./asyn-4-41/asyn/asynDriver/asynManager.c-             epicsMutexUnlock(pport->synchronousLock);
./asyn-4-41/asyn/asynDriver/asynManager.c:        mgrStatus = epicsMutexTryLock(pport->asynManagerLock);
./asyn-4-41/asyn/asynDriver/asynManager.c-        if(mgrStatus==epicsMutexLockOK)
./asyn-4-41/asyn/asynDriver/asynManager.c-             epicsMutexUnlock(pport->asynManagerLock);
./asyn-4-41/asyn/asynDriver/asynManager.c-        fprintf(fp,"    enabled:%s connected:%s numberConnects %lu\n",
corvette:~/devel>

The only place I see a problem is in drvMAXv.cc.

Note that I tried to use "git submodules" with "git grep" in motor, but that returns an error:

corvette:~/devel/motor>git submodule foreach git grep epicsMutexTryLock
Entering 'modules/motorAMCI'
Stopping at 'modules/motorAMCI'; script returned non-zero status.

Why doesn't that work?

@tboegi
Copy link
Contributor

tboegi commented Jan 11, 2021 via email

@anjohnson
Copy link
Member

The only place I see a problem is in drvMAXv.cc.

Good, thanks for checking that.

I tried to use "git submodules" with "git grep" in motor, but that returns an error:
Why doesn't that work?

git grep has a --recurse-submodules option which is probably faster. Your first inner 'git grep' command returned a non-zero status because it didn't get any matches from the motorAMCI module, which stopped the git submodule foreach from looping (it thought it got an error). You could have used this instead which ignores errors:

$ git submodule foreach 'git grep epicsMutexTryLock || true'

@hjunkes
Copy link
Author

hjunkes commented Jan 12, 2021

will create a pull request for this patch:

diff --git a/omsApp/src/devOms.dbd b/omsApp/src/devOms.dbd
index 569232b..9c0401e 100644
--- a/omsApp/src/devOms.dbd
+++ b/omsApp/src/devOms.dbd
@@ -11,6 +11,7 @@ registrar(oms58Registrar)
 device(motor,VME_IO,devMAXv,"OMS MAXv")
 driver(drvMAXv)
 registrar(OmsMAXvRegister)
+variable(drvMAXvdebug)
 
 # Oregon Micro Systems PC68/78 RS-232 serial driver support.
 device(motor,VME_IO,devOmsPC68,"OMS PC68/78")
diff --git a/omsApp/src/drvMAXv.cc b/omsApp/src/drvMAXv.cc
index 0c4a3e8..8eff6b9 100644
--- a/omsApp/src/drvMAXv.cc
+++ b/omsApp/src/drvMAXv.cc
@@ -609,7 +609,7 @@ errorexit:      errMessage(-1, "Invalid device directive");
 static int send_recv_mess(int card, const char * command, const char *axis, char *buf, int nMessages) {
     int retval;
 
-    if (!epicsMutexTryLock(MUTEX(card))) {
+    if (epicsMutexTryLock(MUTEX(card)) != epicsMutexLockOK) {
         Debug(1, "send_recv_mess: waiting for mutex\n");
         epicsMutexLock(MUTEX(card));
     }
@@ -647,7 +647,7 @@ static RTN_STATUS send_mess(int card, const char *com, const char *name)
     pmotor = (struct MAXv_motor *) motor_state[card]->localaddr;
     Debug(9, "send_mess: pmotor = %p\n", pmotor);
 
-    if (!epicsMutexTryLock(MUTEX(card))) {
+    if (epicsMutexTryLock(MUTEX(card)) != epicsMutexLockOK) {
         Debug(1, "send_mess: waiting for mutex\n");
         epicsMutexLock(MUTEX(card));
     }
@@ -759,7 +759,7 @@ static int recv_mess(int card, char *com, int amount)
 
     pmotor = (struct MAXv_motor *) motor_state[card]->localaddr;
 
-    if (!epicsMutexTryLock(MUTEX(card))) {
+    if (epicsMutexTryLock(MUTEX(card)) != epicsMutexLockOK) {
         Debug(1, "recv_mess: waiting for mutex\n");
         epicsMutexLock(MUTEX(card));
     }

@hjunkes hjunkes closed this as completed Jan 12, 2021
kmpeters added a commit to epics-motor/motorOms that referenced this issue Jan 12, 2021
Fix epicsMutexTryLock issue #172

Reference: epics-modules/motor#172
kmpeters added a commit that referenced this issue Feb 2, 2021
kmpeters added a commit that referenced this issue Feb 2, 2021
…utexMustLock

Fixes the problem described here:
#172 (comment)
(cherry picked from commit ac73654)
@kmpeters kmpeters added this to the R7-3 milestone May 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants