Copyright (c) 2002-2004 MontaVista Software, Inc.
Copyright (c) 2006, 2009 Red Hat, Inc.
All rights reserved.
This software licensed under BSD license, the text of which follows:
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
- Neither the name of the MontaVista Software, Inc. nor the names of its
contributors may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.
-------------------------------------------------------------------------------
This file provides a map for developers to understand how to contribute
to the corosync project. The purpose of this document is to prepare a
developer to write a service for corosync, or understand the architecture
of corosync.
The following is described in this document:
* all files, purpose, and dependencies
* architecture of corosync
* taking advantage of virtual synchrony
* adding libraries
* adding services
-------------------------------------------------------------------------------
all files, purpose, and dependencies.
-------------------------------------------------------------------------------
*----------------*
*- AIS INCLUDES -*
*----------------*
include/saAmf.h
-----------------
Definitions for AMF interface.
include/saCkpt.h
------------------
Definitions for CKPT interface.
include/saClm.h
-----------------
Definitions for CLM interface.
include/saEvt.h
-----------------
Definitions for the EVT interface.
include/saLck.h
-----------------
Definitions for the LCK interface.
include/cfg.h
Definitions for the CFG interface.
include/cpg.h
Definitions for the CPG interface.
include/evs.h
Definitions for the EVS interface.
include/ipc_amf.h
IPC interface between client and server for AMF service.
include/ipc_cfg.h
IPC interface between client and server for CFG service.
include/ipc_ckpt.h
IPC interface between client and server for CKPT service.
include/ipc_clm.h
IPC interface between client and server for CLM service.
include/ipc_cpg.h
IPC interface between client and server for CPG service.
include/ipc_evs.h
IPC interface between client and server for EVS service.
include/ipc_evt.h
IPC interface between client and server for EVT service.
include/ipc_gen.h
IPC interface for generic operations.
include/ipc_lck.h
IPC interface between client and server for LCK service.
include/ipc_msg.h
IPC interface between client and server for MSG service.
include/hdb.h
Handle database implementation.
include/list.h
Linked list implementation.
include/swab.h
Byte swapping implementation.
include/queue.h
FIFO queue implementation.
include/sq.h
Sort queue where items are ordered according to a sequence number. Avoids
sorting; hence, insertion of a new element is O(1). Inline implementation.
Depends on list.
*---------------*
* AIS LIBRARIES *
*---------------*
lib/amf.c
---------
AMF user library linked into user application.
lib/cfg.c
---------
CFG user library linked into user application.
lib/ckpt.c
---------
CKPT user library linked into user application.
lib/clm.c
---------
CLM user library linked into user application.
lib/cpg.c
---------
CPG user library linked into user application.
lib/evs.c
---------
EVS user library linked into user application.
lib/evt.c
---------
EVT user library linked into user application.
lib/lck.c
---------
LCK user library linked into user application.
lib/msg.c
---------
MSG user library linked into user application.
lib/util.c
----------
Utility functions used by all libraries.
*-----------------*
*- AIS EXECUTIVE -*
*-----------------*
exec/aisparser.{h|c}
Parser plugin for default configuration file format.
exec/aispoll.{h|c}
Poll abstraction interface.
exec/amfapp.c
AMF application handling.
exec/amfcluster.c
AMF cluster handling.
exec/amfcomp.c
AMF component level handling.
exec/amf.h
Defines all AMF symbol names.
exec/amfnode.c
AMF node level handling.
exec/amfsg.c
AMF service group handling.
exec/amfsi.c
AMF Service instance handling.
exec/amfsu.c
AMF service unit handling.
exec/amfutil.c
AMF utility functions.
exec/cfg.c
Server side implementation of CFG service which is used to display
redundant ring status and reenabling redundant rings.
exec/ckpt.c
Server side implementation of Checkpointing (CKPT API).
exec/clm.c
Server side implementation of Cluster Membership (CLM API).
exec/cpg.c
Server side implementation of closed process groups (CPG API).
exec/crypto.{c|h}
Cryptography functions used by corosync.
exec/evs.c
Server side implementation of extended virtual synchrony passthrough
(EVS API).
exec/evt.c
Server side implementation of Event Service (EVT API).
exec/ipc.{c|h}
All IPC operations used by corosync.
exec/jhash.h
A hash routine.
exec/keygen.c
Secret key generator used by corosync encryption tools.
exec/lck.c
Server side implementation of the distributed lock service (LCK API).
exec/main.{c|h}
Main function which connects all components together.
exec/mainconfig.{c|h}
Reads main configuration that is set in the configuration parser.
exec/mempool.{c|h}
Currently unused.
exec/msg.c
Server side implementation of message service (MSG API).
exec/objdb.{c|h}
Object database used to configure services.
exec/corosync-instantiate.c
instantiates a component by forking and exec'ing it and writing its
pid to a pid file.
exec/print.{c|h}
Non-blocking thread-based logging service with overflow protection.
exec/service.{c|h}
Service handling routines including the default service handler
description.
exec/sync.{c|h}
The synchronization service implementation.
exec/timer.{c|h}
Thread-based timer service.
exec/tlist.h
Timer list used to expire timers.
exec/totemconfig.{c|h}
Builds the totem configuration from the data that aisparser parsed from
the configuration file.
exec/totem.h
General definitions for the totem protocol used by the totem stack.
exec/totemip.{c|h}
IP handling functions for totem - lowest on stack.
exec/totemmrp.{c|h}
The totem multi ring protocol, currently unimplemented. Between
totemsrp and totempg.
exec/totemnet.{c|h}
Network handling functions for totem - between totemip and totemrrp.
exec/totempg.{c|h}
Process groups interface which is used by all applications - highest on
stack.
exec/totemrrp.{c|h}
Redundant ring functions for totem - between totemnet and totemsrp.
exec/util.{c|h}
Utility functions used by corosync executive.
exec/version.h
Defines build version.
exec/vsf.h
Virtual Synchrony plugin API.
exec/vsf_ykd.c
Virtual Synchrony YKD Dynamic Linear Voting algorithm.
exec/wthread.{c|h}
Worker threads API.
loc
---
Counts the lines of code in the AIS implementation.
-------------------------------------------------------------------------------
architecture of corosync
-------------------------------------------------------------------------------
The corosync standards based cluster framework is a generic cluster plugin
architecture used to create cluster APIs and services. Usually there are
libraries which implement APIs and are linked into the end user application.
The libraries request services from the aisexec process, called the AIS
executive. The AIS executive uses the Totem protocol stack to communicate
within the cluster and execute operations on behalf of the user. Finally the
response of the API is delivered once the operation has completed.
--------------------------------------------------
|        AMF and more services libraries         |
--------------------------------------------------
|                    IPC API                     |
--------------------------------------------------
|               corosync Executive               |
|                                                |
|      +----------+ +--------+ +---------+       |
|      | Object   | |  AIS   | | Service |       |
|      | Database | | Config | | Handler |       |
|      | Service  | | Parser | | Manager |       |
|      +----------+ +--------+ +---------+       |
|              +-------+ +-------+               |
|              |  AMF  | | more  |               |
|              |Service| |svcs...|               |
|              +-------+ +-------+               |
|                  +---------+                   |
|                  |  Sync   |                   |
|                  | Service |                   |
|                  +---------+                   |
|                  +---------+                   |
|                  |   VSF   |                   |
|                  | Service |                   |
|                  +---------+                   |
|  +--------------------------------+ +--------+ |
|  |             Totem              | | Timers | |
|  |             Stack              | |  API   | |
|  +--------------------------------+ +--------+ |
|                 +-----------+                  |
|                 |   Poll    |                  |
|                 | Interface |                  |
|                 +-----------+                  |
|                                                |
--------------------------------------------------
Figure 1: corosync Architecture
Every application that intends to use corosync links with the libais library.
This library uses IPC, or more specifically BSD unix sockets, to communicate
with the executive. The library is a small program responsible only for
packaging the request into a message. This message is sent, using IPC, to
the executive which then processes it. The library then waits for a response.
The library itself contains very little intelligence. Some utility services
are provided:
* create a connection to the executive
* send messages to the executive
* retrieve messages from the executive
* poll on a file descriptor
* create a handle instance
* destroy a handle instance
* get a reference to a handle instance
* release a reference to a handle instance
When a library connects, it sends the service type in a message. The
service type is stored and used later to look up the message handlers,
both the library message handlers and the executive message handlers.
Every message sent contains an integer identifier, which is used to index
into an array of message handlers to determine the correct message handler
to execute for the library. Hence a message is uniquely identified by the
message handler ID number and the service handler ID number.
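The two-level lookup described above can be sketched as follows. This is a
minimal illustration, not the executive's actual tables: the handler type,
the table layout, the handler functions, and the dispatch function are all
hypothetical names chosen for the example. Only the idea (service type selects
a table, message id indexes into it) comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type; the real handlers take different arguments. */
typedef int (*msg_handler_fn)(const void *msg);

struct service_table {
    const msg_handler_fn *handlers;   /* indexed by message id */
    size_t handler_count;
};

/* Dummy handlers that return a distinct value so dispatch is observable. */
static int clm_track_start(const void *msg) { (void)msg; return 10; }
static int clm_track_stop(const void *msg)  { (void)msg; return 11; }
static int evs_join(const void *msg)        { (void)msg; return 20; }

static const msg_handler_fn evs_handlers[] = { evs_join };
static const msg_handler_fn clm_handlers[] = { clm_track_start, clm_track_stop };

/* Indexed by service type; 0 and 1 are used for EVS and CLM here. */
static const struct service_table services[] = {
    { evs_handlers, sizeof evs_handlers / sizeof evs_handlers[0] },
    { clm_handlers, sizeof clm_handlers / sizeof clm_handlers[0] },
};

/* Returns the handler's result, or -1 if either identifier is out of range. */
static int dispatch(unsigned int service, unsigned int id, const void *msg)
{
    if (service >= sizeof services / sizeof services[0])
        return -1;
    if (id >= services[service].handler_count)
        return -1;
    return services[service].handlers[id](msg);
}
```

Range checks on both identifiers matter because they arrive from another
process and must not be trusted as array indices.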
When a library sends a message via IPC, the delivery of the message occurs
to the proper library message handler. The library message handler is
responsible for sending the message via the totem process groups API to all
nodes in the system.
This simplifies the library handler significantly. The main purpose of the
library handler should be to package the library request into a message that
can be sent to all nodes.
The totem process groups API sends the message according to the extended
virtual synchrony model. The group messaging interface also delivers the
message according to the extended virtual synchrony model. This has several
advantages which are described in the virtual synchrony section. One
advantage that must be described now is that messages are self-delivered;
if a node sends a message, that same message is delivered back to that
node.
When the executive message is delivered, it is processed by the executive
message handler. The executive message handler contains the brains of
AIS and is responsible for making all decisions relating to the request
from the libais library user.
-------------------------------------------------------------------------------
taking advantage of virtual synchrony
-------------------------------------------------------------------------------
definitions:
processor: a system responsible for executing the virtual synchrony model
configuration: the list of processors under which messages are delivered
partition: one or more processors leave the configuration
merge: one or more processors join the configuration
group messaging: sending a message from one sender to many receivers
Virtual synchrony is a model for group messaging. This is often confused
with particular implementations of virtual synchrony. Try to focus on
what virtual synchrony provides, not how it provides it, unless interested
in working on the group messaging interface of corosync.
Virtual synchrony provides several advantages:
* integrated membership
* strong membership guarantees
* agreed ordering of delivered messages
* same delivery of configuration changes and messages on every node
* self-delivery
* reliable communication in the face of unreliable networks
* recovery of messages sent within a configuration where possible
* use of network multicast using standard UDP/IP
Integrated membership allows the group messaging interface to give
configuration change events to the API services. This is obviously beneficial
to the cluster membership service (and its respective API), but is helpful
to other services as described later.
Strong membership guarantees allow a distributed application to make decisions
based upon the configuration (membership). Every service in corosync registers
a configuration change function. This function is called whenever a
configuration change occurs. The information passed is the current processors,
the processors that have left the configuration, and the processors that have
joined the configuration. This information is then used to make decisions
within a distributed state machine. One example usage is that an AMF component
running on a specific processor has left the configuration, so failover actions
must now be taken with the new configuration (and known components).
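A configuration change function that drives such a failover decision might look
like the sketch below. It is a simplified, hypothetical example: the callback
omits the configuration type and ring id, and active_node, lowest_member, and
the "lowest node id takes over" policy are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-service state: the node currently hosting a component. */
static unsigned int active_node = 0;

/* Pick the lowest node id. Every processor is given the same member list,
 * so every processor makes the same choice without exchanging messages. */
static unsigned int lowest_member(const unsigned int *list, size_t entries)
{
    unsigned int lowest = list[0];
    for (size_t i = 1; i < entries; i++)
        if (list[i] < lowest)
            lowest = list[i];
    return lowest;
}

/* Simplified configuration change function. */
static void example_confchg_fn(
    const unsigned int *member_list, size_t member_list_entries,
    const unsigned int *left_list, size_t left_list_entries,
    const unsigned int *joined_list, size_t joined_list_entries)
{
    (void)joined_list;
    (void)joined_list_entries;
    for (size_t i = 0; i < left_list_entries; i++) {
        /* The hosting node left: fail over to a surviving member. */
        if (left_list[i] == active_node && member_list_entries > 0) {
            active_node = lowest_member(member_list, member_list_entries);
            return;
        }
    }
}
```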
Virtual synchrony requires that messages be delivered in agreed order.
FIFO order indicates that one sender and one receiver agree on the order of
messages sent. Agreed ordering takes this requirement to groups, requiring that
one sender and all receivers agree on the order of messages sent.
Consider a lock service. The service is responsible for arbitrating locks
between multiple processors in the system. With FIFO ordering, this is very
difficult because a request at about the same time for a lock from two separate
processors may arrive at all the receivers in different order. Agreed ordering
ensures that all the processors are delivered the message in the same order.
In this case the first lock message will always be from processor X, while the
second lock message will always be from processor Y. Hence the first request
is always honored by all processors, and the second request is rejected (since
the lock is taken). This is how race conditions are avoided in distributed
systems.
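The lock example can be sketched as a small replicated state machine. The
structures and function names below are hypothetical and exist only for this
illustration. Each processor applies the delivered requests in order; the
outcome depends only on that order, which is why every processor must see the
same one.

```c
#include <assert.h>
#include <stddef.h>

#define NO_OWNER 0u

/* One lock request as delivered by the group messaging layer. */
struct lock_request {
    unsigned int nodeid;   /* requesting processor */
};

/* Replicated state: who owns the lock (NO_OWNER if it is free). */
struct lock_state {
    unsigned int owner;
};

/* Delivery handler: returns 1 if the request is granted, 0 if rejected. */
static int lock_deliver(struct lock_state *state, const struct lock_request *req)
{
    if (state->owner != NO_OWNER)
        return 0;
    state->owner = req->nodeid;
    return 1;
}

/* Apply a delivered stream in order and report the final owner. */
static unsigned int lock_replay(const struct lock_request *stream, size_t count)
{
    struct lock_state state = { NO_OWNER };
    for (size_t i = 0; i < count; i++)
        (void)lock_deliver(&state, &stream[i]);
    return state.owner;
}
```

Replaying the same two requests in the opposite order yields a different
owner, which is the disagreement that agreed ordering rules out.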
Every processor is delivered a configuration change and messages within a
configuration in the same order. This ensures that any distributed state
machine will make the same decisions on every processor within the
configuration. This also allows the configuration and the messages to be
considered when making decisions.
Virtual synchrony requires that every node is delivered messages that it
sends. This enables the logic to be placed in one location (the handler
for the delivery of the group message) instead of two separate places. This
also allows messages that are sent to be ordered in the stream of other
messages within the configuration.
Certain guarantees are required by virtual synchrony. If a message is sent,
it must be delivered by every processor unless that processor fails. If a
particular processor fails, a configuration change occurs creating a new
configuration under which a new set of decisions may be made. This implies
that even unreliable networks must reliably deliver messages. The
implementation in corosync works on unreliable as well as reliable networks.
Every message sent must be delivered, unless a configuration change occurs.
In the case of a configuration change, every message that can be recovered
must be recovered before the new configuration is installed. Some systems
during partition won't continue to recover messages within the old
configuration even though those messages can be recovered. Virtual synchrony
makes that impossible, except for those members that are no longer part
of a configuration.
Finally, virtual synchrony takes advantage of hardware multicast to avoid
duplicated packets and scale to large transmit rates. On a 100 Mbit network,
corosync can approach wire speeds depending on the number of messages queued
for a particular processor.
What does all of this mean for the developer?
* messages are delivered reliably
* messages are delivered in the same order to all nodes
* configuration and messages can both be used to make decisions
-------------------------------------------------------------------------------
adding libraries
-------------------------------------------------------------------------------
The first stage in adding a library to the system is to develop the library.
Library code should follow these guidelines:
* use SA Forum coding style for SA Forum APIs to aid in debugging
* use corosync coding guidelines for APIs that are not SA Forum that
are to be merged into the corosync tree.
* implement all library code within one file named after the api.
examples are ckpt.c, clm.c, amf.c.
* use parallel structure as much as possible between different APIs
* make use of utility services provided by util.c.
* if something is needed that is generic and useful by all services,
submit patches for other libraries to use these services.
* use the reference counting handle manager for handle management.
------------------
Version checking
------------------
struct saVersionDatabase {
    int versionCount;
    SaVersionT *versionsSupported;
};
The versionCount number describes how many entries are in the version database.
The versionsSupported member is an array of SaVersionT describing the acceptable
versions this API supports.
An API developer specifies versions supported by adding the following C
code to the library file:
/*
 * Versions supported
 */
static SaVersionT clmVersionsSupported[] = {
    { 'B', 1, 1 },
    { 'b', 1, 1 }
};
static struct saVersionDatabase clmVersionDatabase = {
    sizeof (clmVersionsSupported) / sizeof (SaVersionT),
    clmVersionsSupported
};
After this is specified, the following API is used to check versions:
SaErrorT
saVersionVerify (
    struct saVersionDatabase *versionDatabase,
    const SaVersionT *version);
An example usage of this is:
SaErrorT error;
error = saVersionVerify (&clmVersionDatabase, version);
where version is a pointer to an SaVersionT passed into the API.
error is set to SA_OK if the version is valid as specified in the
version database.
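To make the lookup concrete, here is a self-contained sketch of a version check
against such a database. The ex_ types are stand-ins with an assumed layout,
not the real SaVersionT or struct saVersionDatabase, and ex_version_verify is
hypothetical. It tests for an exact match only; the real saVersionVerify may
apply different matching rules.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for SaVersionT and struct saVersionDatabase (assumed layout). */
struct ex_version {
    char release_code;
    unsigned char major;
    unsigned char minor;
};

struct ex_version_database {
    int version_count;
    const struct ex_version *versions_supported;
};

static const struct ex_version ex_versions_supported[] = {
    { 'B', 1, 1 },
    { 'b', 1, 1 }
};

static const struct ex_version_database ex_version_db = {
    sizeof (ex_versions_supported) / sizeof (ex_versions_supported[0]),
    ex_versions_supported
};

/* Returns 0 if the requested version is listed exactly, -1 otherwise. */
static int ex_version_verify(const struct ex_version_database *db,
                             const struct ex_version *version)
{
    if (version == NULL)
        return -1;
    for (int i = 0; i < db->version_count; i++) {
        const struct ex_version *v = &db->versions_supported[i];
        if (v->release_code == version->release_code &&
            v->major == version->major &&
            v->minor == version->minor)
            return 0;
    }
    return -1;
}
```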
------------------
Handle Instances
------------------
Every handle instance is stored in a handle database. The handle database
stores instance information for every handle used by libraries. The system
includes reference counting and is safe for use in threaded applications.
The handle database structure is:
struct saHandleDatabase {
    unsigned int handleCount;
    struct saHandle *handles;
    pthread_mutex_t mutex;
    void (*handleInstanceDestructor) (void *);
};
handleCount is the number of handles
handles is an array of handles
mutex is a pthread mutex used to mutually exclude access to the handle db
handleInstanceDestructor is a callback that is called when the handle
should be freed because its reference count has dropped to zero.
The handle database is defined in a library as follows:
static void clmHandleInstanceDestructor (void *);
static struct saHandleDatabase clmHandleDatabase = {
    .handleCount = 0,
    .handles = 0,
    .mutex = PTHREAD_MUTEX_INITIALIZER,
    .handleInstanceDestructor = clmHandleInstanceDestructor
};
There are several APIs to access the handle database:
SaErrorT
saHandleCreate (
    struct saHandleDatabase *handleDatabase,
    int instanceSize,
    int *handleOut);
Creates an instance of size instanceSize in the handleDatabase parameter,
returning the handle number in handleOut. The handle instance reference
count starts at the value 1.
SaErrorT
saHandleDestroy (
    struct saHandleDatabase *handleDatabase,
    unsigned int handle);
Prevents further access to the handle. The handle instance reference count is
decremented by 1. Once the reference count drops to zero, the database
destructor is called for the handle.
SaErrorT
saHandleInstanceGet (
    struct saHandleDatabase *handleDatabase,
    unsigned int handle,
    void **instance);
Gets the instance for the specified handle from handleDatabase and returns
it in the instance parameter. If the handle is valid, SA_OK is returned;
otherwise an error is returned. This is used to ensure a handle is
valid. Every get call increases the reference count on a handle instance
by one.
SaErrorT
saHandleInstancePut (
    struct saHandleDatabase *handleDatabase,
    unsigned int handle);
Decrements the reference count by 1. If the reference count indicates
the handle has been destroyed, it will then be removed from the database
and the destructor called on the instance data. The put call takes care
of freeing the handle instance data.
Create a data structure for the instance, and use it within the libraries
to store state information about the instance. This information can be
the handle, a mutex for protecting I/O, a queue for queueing async messages
or whatever is needed by the API.
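The reference counting rules above (create starts at 1, get adds 1, put and
destroy each drop 1, the destructor runs at zero) can be sketched as below.
This is an illustrative stand-in, not the library's handle database: all ex_
names are hypothetical, the table is a fixed size to keep the example short,
and the destructor is invoked while the mutex is held, which a real
implementation may prefer to avoid.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define EX_MAX_HANDLES 16   /* fixed table keeps the sketch short */

struct ex_handle {
    int in_use;       /* slot holds a live instance */
    int destroyed;    /* destroy was called; no new references allowed */
    int ref_count;
    void *instance;
};

struct ex_handle_db {
    struct ex_handle handles[EX_MAX_HANDLES];
    pthread_mutex_t mutex;
    void (*destructor)(void *instance);
};

static int ex_destructor_calls;   /* makes the destructor observable */
static void ex_count_destructor(void *instance) { (void)instance; ex_destructor_calls++; }

/* Caller holds the mutex. Returns the live slot for handle, or NULL. */
static struct ex_handle *ex_lookup(struct ex_handle_db *db, unsigned int handle)
{
    if (handle >= EX_MAX_HANDLES || !db->handles[handle].in_use)
        return NULL;
    return &db->handles[handle];
}

/* Caller holds the mutex. Frees the instance on the last reference. */
static void ex_unref(struct ex_handle_db *db, struct ex_handle *h)
{
    if (--h->ref_count > 0)
        return;
    if (db->destructor != NULL)
        db->destructor(h->instance);
    free(h->instance);
    h->in_use = 0;
}

static int ex_handle_create(struct ex_handle_db *db, size_t size, unsigned int *out)
{
    int res = -1;
    pthread_mutex_lock(&db->mutex);
    for (unsigned int i = 0; i < EX_MAX_HANDLES && res != 0; i++) {
        struct ex_handle *h = &db->handles[i];
        if (h->in_use || (h->instance = calloc(1, size)) == NULL)
            continue;
        h->in_use = 1;
        h->destroyed = 0;
        h->ref_count = 1;   /* the creation reference */
        *out = i;
        res = 0;
    }
    pthread_mutex_unlock(&db->mutex);
    return res;
}

static int ex_handle_get(struct ex_handle_db *db, unsigned int handle, void **instance)
{
    pthread_mutex_lock(&db->mutex);
    struct ex_handle *h = ex_lookup(db, handle);
    int ok = (h != NULL && !h->destroyed);
    if (ok) {
        h->ref_count++;
        *instance = h->instance;
    }
    pthread_mutex_unlock(&db->mutex);
    return ok ? 0 : -1;
}

static int ex_handle_put(struct ex_handle_db *db, unsigned int handle)
{
    pthread_mutex_lock(&db->mutex);
    struct ex_handle *h = ex_lookup(db, handle);
    if (h != NULL)
        ex_unref(db, h);
    pthread_mutex_unlock(&db->mutex);
    return h != NULL ? 0 : -1;
}

static int ex_handle_destroy(struct ex_handle_db *db, unsigned int handle)
{
    pthread_mutex_lock(&db->mutex);
    struct ex_handle *h = ex_lookup(db, handle);
    int ok = (h != NULL && !h->destroyed);
    if (ok) {
        h->destroyed = 1;
        ex_unref(db, h);    /* drop the creation reference */
    }
    pthread_mutex_unlock(&db->mutex);
    return ok ? 0 : -1;
}
```

A thread that still holds a reference from a get call keeps the instance alive
after destroy; the instance is freed only when that thread calls put.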
-----------------------------------
communicating with the executive
-----------------------------------
A service connection is created with the following API:
SaErrorT
saServiceConnect (
    int *responseOut,
    int *callbackOut,
    enum service_types service);
The responseOut parameter specifies the file descriptor where response messages
will be delivered. The callbackOut parameter describes the file descriptor
where callback messages are delivered.
The service parameter specifies the service to use.
Messages are sent and received from the executive with the following functions:
SaAisErrorT saSendMsgRetry (
    int s,
    struct iovec *iov,
    unsigned int iov_len);
s is the socket to use, as returned by saServiceConnect.
iov is the I/O vector used to send the message.
iov_len is the number of elements in iov.
This sends an I/O-vectorized message.
SaErrorT
saSendRetry (
    int s,
    const void *msg,
    size_t len,
    int flags);
s is the socket to use, as returned by saServiceConnect.
msg is a pointer to the message to send to the service.
len is the length of the message to send.
flags is the flags to use with the sendmsg system call.
This sends a data blob to the executive.
A message is received from the executive with the function:
SaErrorT
saRecvRetry (
    int s,
    void *msg,
    size_t len,
    int flags);
s is the socket to use, as returned by saServiceConnect.
msg is a pointer to the buffer that receives the message from the service.
len is the length of the message to receive.
flags is the flags to use with the recvmsg system call.
A message may be sent and a reply waited for with the following function:
SaAisErrorT saSendMsgReceiveReply (
    int s,
    struct iovec *iov,
    unsigned int iov_len,
    void *responseMessage,
    int responseLen);
s is the socket to send and receive the response.
iov is the I/O vector to send.
iov_len is the number of elements in iov.
responseMessage is the data block used to store the response.
responseLen is the length of the data block that is expected to be received.
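The send-then-wait pattern can be sketched with standard socket calls. This is
not the library's implementation: ex_send_msg_receive_reply is a hypothetical
name, it returns 0 or -1 rather than an SA error code, and it assumes a
blocking stream socket on which sendmsg writes the whole message (short writes
are not handled).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send all iov elements as one message, then block for a fixed-size reply.
 * Returns 0 on success, -1 on error or if the peer closes early. */
static int ex_send_msg_receive_reply(int s, struct iovec *iov,
                                     unsigned int iov_len,
                                     void *response, size_t response_len)
{
    struct msghdr msg;
    ssize_t res;
    size_t received = 0;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = iov;
    msg.msg_iovlen = iov_len;

    /* Retry the send if a signal interrupts it. */
    do {
        res = sendmsg(s, &msg, 0);
    } while (res == -1 && errno == EINTR);
    if (res == -1)
        return -1;

    /* Keep reading until the whole reply has arrived. */
    while (received < response_len) {
        res = recv(s, (char *)response + received, response_len - received, 0);
        if (res == -1 && errno == EINTR)
            continue;
        if (res <= 0)
            return -1;
        received += (size_t)res;
    }
    return 0;
}
```

Using an I/O vector lets the caller send a header and a separately allocated
payload in one call without copying them into a single buffer first.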
Waiting for a file descriptor using the poll system call is done with the API:
SaErrorT
saPollRetry (
    struct pollfd *ufds,
    unsigned int nfds,
    int timeout);
where the parameters are the standard poll parameters.
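A retrying poll wrapper of this kind usually amounts to restarting the call
when a signal interrupts it. The sketch below shows that idea with a
hypothetical ex_poll_retry that returns poll's own result instead of an SA
error code. Note that each retry restarts the full timeout.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Retry poll() when a signal interrupts it. Otherwise returns what poll()
 * returns: the ready descriptor count, 0 on timeout, or -1 on error. */
static int ex_poll_retry(struct pollfd *ufds, nfds_t nfds, int timeout)
{
    int res;

    do {
        res = poll(ufds, nfds, timeout);
    } while (res == -1 && errno == EINTR);
    return res;
}
```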
Messages can be received out of order searching for a specific message id with:
----------
messages
----------
Please follow the style of the messages. It makes debugging much easier
if parallel style is used.
A service should be added to the service_types enumeration in
include/ipc_gen.h or, in the case of an external project, a number should be
registered with the project.
enum service_types {
    EVS_SERVICE = 0,
    CLM_SERVICE = 1,
    AMF_SERVICE = 2,
    CKPT_SERVICE = 3,
    EVT_SERVICE = 4,
    LCK_SERVICE = 5,
    MSG_SERVICE = 6,
    CFG_SERVICE = 7,
    CPG_SERVICE = 8
};
Each library should have an ipc_APINAME.h file in include. It should define
request types and response types.
These are the request CLM message identifiers:
enum req_clm_types {
    MESSAGE_REQ_CLM_TRACKSTART = 0,
    MESSAGE_REQ_CLM_TRACKSTOP = 1,
    MESSAGE_REQ_CLM_NODEGET = 2,
    MESSAGE_REQ_CLM_NODEGETASYNC = 3
};
These are the response CLM message identifiers:
enum res_clm_types {
    MESSAGE_RES_CLM_TRACKCALLBACK = 0,
    MESSAGE_RES_CLM_TRACKSTART = 1,
    MESSAGE_RES_CLM_TRACKSTOP = 2,
    MESSAGE_RES_CLM_NODEGET = 3,
    MESSAGE_RES_CLM_NODEGETASYNC = 4,
    MESSAGE_RES_CLM_NODEGETCALLBACK = 5
};
A request header should be placed at the front of every message sent by
the library.
typedef struct {
    int size __attribute__((aligned(8)));
    int id __attribute__((aligned(8)));
} mar_req_header_t __attribute__((aligned(8)));
There is also a response message header which should start every response
message:
typedef struct {
    int size __attribute__((aligned(8)));
    int id __attribute__((aligned(8)));
    SaAisErrorT error __attribute__((aligned(8)));
} mar_res_header_t __attribute__((aligned(8)));
The error member is used to pass errors from the executive to the library,
including SA_ERR_TRY_AGAIN for flow control, which is described later.
This is described later:
typedef struct {
    mar_uint32_t nodeid __attribute__((aligned(8)));
    void *conn __attribute__((aligned(8)));
} mar_message_source_t __attribute__((aligned(8)));
This is the message for the MESSAGE_REQ_CLM_TRACKSTART message id above:
struct req_clm_trackstart {
    mar_req_header_t header;
    SaUint8T trackFlags;
    SaClmClusterNotificationT *notificationBufferAddress;
    SaUint32T numberOfItems;
};
The saClmClusterTrackStart api should create this message and send it to the
executive.
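Creating such a message mostly means filling in the header before the body.
The sketch below uses stand-in types (ex_req_header_t mirrors the header
layout shown earlier; struct ex_req_trackstart is a hypothetical, reduced
body) and assumes that the size field carries the total size of the message
including the header.

```c
#include <assert.h>
#include <string.h>

/* Stand-in mirroring the request header layout shown earlier. */
typedef struct {
    int size __attribute__((aligned(8)));
    int id __attribute__((aligned(8)));
} ex_req_header_t __attribute__((aligned(8)));

enum ex_req_types { EX_REQ_TRACKSTART = 0, EX_REQ_TRACKSTOP = 1 };

/* Hypothetical request body; the real message carries more fields. */
struct ex_req_trackstart {
    ex_req_header_t header;
    unsigned char track_flags;
};

/* Fill in a request so that it is ready to be written to the socket. */
static void ex_build_trackstart(struct ex_req_trackstart *req, unsigned char flags)
{
    memset(req, 0, sizeof *req);          /* do not send stale padding bytes */
    req->header.size = (int)sizeof *req;  /* assumed: total message size */
    req->header.id = EX_REQ_TRACKSTART;
    req->track_flags = flags;
}
```

The aligned(8) attributes give every member the same offset on 32-bit and
64-bit builds, so a library and an executive compiled for different word
sizes still agree on the layout.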
Responses should be of type:
struct res_clm_trackstart
------------
some notes
------------
* Avoid doing anything tricky in the library itself. Let the executive
handler do all of the work of the system. minimize what the API does.
* Once an API is developed, it must be added to the makefile. Just add
a line for the file to the EXECOBJS build line.
* protect I/O send/recv with a mutex.
* always look at other libraries when there is a question about how to
do something. It has likely been thought out in another library.
-------------------------------------------------------------------------------
adding services
-------------------------------------------------------------------------------
Services are defined by service handlers and messages described in
include/ipc_SERVICE.h. These two pieces of information are used by the
executive to dispatch the correct messages to the correct recipients.
-------------------------------
the service handler structure
-------------------------------
A service is added by defining a structure defined in exec/service.h. The
structure is a little daunting:
struct libais_handler {
    int (*libais_handler_fn) (void *conn, void *msg);
    int response_size;
    int response_id;
    enum corosync_flow_control flow_control;
};
The response_size, response_id, and flow_control for a library handler are
used for flow control. A response message will be sent to the library of the
size response_size, with the header id of response_id if the totem message
queue is full. Some library APIs may not need to block in this condition
(because they don't have to use totem), so they should specify
COROSYNC_FLOW_CONTROL_NOT_REQUIRED in the flow control field.
The libais_handler_fn is a function to be called when the library handler is
requested to be executed.
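A handler table for a service might be laid out as in the following sketch.
The enum values, the res_example message, the MESSAGE_RES_EXAMPLE_* ids, and
both handler functions are hypothetical and exist only to make the example
self-contained; only the libais_handler layout is taken from the text above.

```c
#include <stddef.h>

/* Stand-in for the enum in the corosync headers (values are assumptions). */
enum corosync_flow_control {
	COROSYNC_FLOW_CONTROL_REQUIRED = 1,
	COROSYNC_FLOW_CONTROL_NOT_REQUIRED = 2
};

struct libais_handler {
	int (*libais_handler_fn) (void *conn, void *msg);
	int response_size;
	int response_id;
	enum corosync_flow_control flow_control;
};

/* Hypothetical response message and response ids for an example service. */
struct res_example { int size; int id; int error; };
enum { MESSAGE_RES_EXAMPLE_SEND = 0, MESSAGE_RES_EXAMPLE_LOCALGET = 1 };

static int example_lib_send(void *conn, void *msg)
{
	(void)conn; (void)msg;	/* would multicast the request over totem */
	return 0;
}

static int example_lib_localget(void *conn, void *msg)
{
	(void)conn; (void)msg;	/* would answer from local state only */
	return 0;
}

static struct libais_handler example_lib_handlers[] = {
	{ /* 0: uses totem, so a full queue must produce a try-again response */
		.libais_handler_fn = example_lib_send,
		.response_size = sizeof(struct res_example),
		.response_id = MESSAGE_RES_EXAMPLE_SEND,
		.flow_control = COROSYNC_FLOW_CONTROL_REQUIRED
	},
	{ /* 1: never touches totem, so it does not need to block */
		.libais_handler_fn = example_lib_localget,
		.response_size = sizeof(struct res_example),
		.response_id = MESSAGE_RES_EXAMPLE_LOCALGET,
		.flow_control = COROSYNC_FLOW_CONTROL_NOT_REQUIRED
	}
};
```

The index of each entry is the request id the library puts in its message
header, so the table order has to match the request enum in the ipc header.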
struct corosync_exec_handler {
void (*exec_handler_fn) (void *msg, unsigned int nodeid);
void (*exec_endian_convert_fn) (void *msg);
};
The exec_handler_fn is a function to be called when the executive handler is
requested to execute.
The exec_endian_convert_fn is a function to be called to convert the endianness
of the executive message. Note that messages are not converted to a fixed big
or little endian format before transmission. Instead they are transmitted in
the byte order of the transmitter and converted to the host machine order on
receipt of the message.
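An endian conversion function simply byte-swaps every multi-byte field of the
message in place. The sketch below assumes a hypothetical executive message,
req_exec_example, and defines its own swap helpers so that it stands alone;
an actual service would use whatever swap helpers the tree already provides.

```c
#include <stdint.h>

/* Hypothetical executive message used only for this example. */
struct req_exec_example {
	uint32_t node_count;
	uint16_t flags;
};

/* Local byte-swap helpers (illustrative, not the project's own macros). */
static uint32_t example_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

static uint16_t example_swab16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Matches the exec_endian_convert_fn signature: swap each field in place. */
static void exec_example_endian_convert(void *msg)
{
	struct req_exec_example *req = msg;

	req->node_count = example_swab32(req->node_count);
	req->flags = example_swab16(req->flags);
}
```

Single-byte fields and byte arrays need no conversion. Every other field has
to be listed, so the function must be updated whenever the message changes.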
struct corosync_service_handler {
unsigned char *name;
unsigned short id;
unsigned int private_data_size;
int (*lib_init_fn) (void *conn);
int (*lib_exit_fn) (void *conn);
struct corosync_lib_handler *lib_service;
int lib_service_count;
struct corosync_exec_handler *exec_service;
int (*exec_init_fn) (struct objdb_iface_ver0 *);
int (*config_init_fn) (struct objdb_iface_ver0 *);
void (*exec_dump_fn) (void);
int exec_service_count;
void (*confchg_fn) (
enum totem_configuration_type configuration_type,
const unsigned int *member_list, size_t member_list_entries,
const unsigned int *left_list, size_t left_list_entries,
const unsigned int *joined_list, size_t joined_list_entries,
const struct memb_ring_id *ring_id);
void (*sync_init) (void);
int (*sync_process) (void);
void (*sync_activate) (void);
void (*sync_abort) (void);
};
name is the name of the service.
id is the identifier of the service.
private_data_size is the size of the private data used by the connection
which the library and executive handlers can reference.
lib_init_fn is the function executed when a library connection is made to
the service handler.
lib_exit_fn is the function executed when a library connection is exited
either because the application closed the file descriptor, or the OS
closed the file descriptor.
lib_service is an array of corosync_lib_handler data structures which define
the library service handler.
lib_service_count is the number of elements in lib_service.
exec_service is an array of corosync_exec_handler data structures which define
the executive service handler.
exec_init_fn is a function used to initialize the executive service. This
is only called once.
config_init_fn is called to parse config files and populate the object
database.
exec_dump_fn is called when SIGUSR2 is sent to the executive to dump the
current state of the service.
exec_service_count is the number of entries in the exec_service array.
confchg_fn is called every time a configuration change occurs.
sync_init is called when the service should begin synchronization.
sync_process is called to process synchronization messages.
sync_activate is called to activate the current service synchronization.
sync_abort is called to abort the current service synchronization.
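As an illustration of one of these callbacks, a confchg_fn could record the
new membership size and count departures, as in the sketch below. The enum and
the opaque memb_ring_id declaration are stand-ins so the example compiles on
its own, and the example_* names are hypothetical.

```c
#include <stddef.h>

/* Stand-ins for types declared in the corosync headers (assumptions). */
enum totem_configuration_type {
	TOTEM_CONFIGURATION_REGULAR,
	TOTEM_CONFIGURATION_TRANSITIONAL
};
struct memb_ring_id;	/* opaque in this example */

static size_t example_member_count;	/* size of the latest membership */
static size_t example_departures_seen;	/* nodes that have left so far */

/* Same signature as the confchg_fn member of corosync_service_handler. */
static void example_confchg_fn(
	enum totem_configuration_type configuration_type,
	const unsigned int *member_list, size_t member_list_entries,
	const unsigned int *left_list, size_t left_list_entries,
	const unsigned int *joined_list, size_t joined_list_entries,
	const struct memb_ring_id *ring_id)
{
	(void)configuration_type;
	(void)member_list;
	(void)left_list;
	(void)joined_list;
	(void)joined_list_entries;
	(void)ring_id;

	example_member_count = member_list_entries;
	example_departures_seen += left_list_entries;
}
```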
--------------
flow control
--------------
The totem protocol includes flow control so that it doesn't send too many
messages when the network is completely full. But the library can
still send messages to the executive much faster than the executive can send
them over totem. So the library relies on the group messaging flow control to
control flow of messages sent from the library. If the totem queues are full,
no more messages may be sent, so the executive in ipc.c automatically detects
this scenario and returns an SA_ERR_TRY_AGAIN error.
When a library gets SA_ERR_TRY_AGAIN, the library may either retry, or return
this error to the user if the error is allowed by the API definitions. The
other information (response_size and response_id) is critical to ensuring that
the library reads the correct message and size of message. Make sure the
libais_handler matches the messages used in the handler function.
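If a library chooses to retry, a bounded loop is one way to do it. The sketch
below is only an outline: the SaErrorT values are stand-ins with assumed
numbers, and retry_on_try_again and example_busy_then_ok are hypothetical
helpers, the latter simulating an executive whose totem queue is full for a
few calls.

```c
/* Stand-in error codes; the real ones come from the SA Forum headers. */
typedef enum {
	SA_OK = 1,		/* assumed value */
	SA_ERR_TRY_AGAIN = 6	/* assumed value */
} SaErrorT;

typedef SaErrorT (*example_call_fn) (void *arg);

/*
 * Call 'call' until it stops returning SA_ERR_TRY_AGAIN or max_attempts
 * is reached. A real library would probably pause between attempts.
 */
static SaErrorT retry_on_try_again(example_call_fn call, void *arg,
	int max_attempts)
{
	SaErrorT err = SA_ERR_TRY_AGAIN;
	int i;

	for (i = 0; i < max_attempts && err == SA_ERR_TRY_AGAIN; i++) {
		err = call(arg);
	}
	return err;
}

/* Simulated request: reports a full queue while *arg is above zero. */
static SaErrorT example_busy_then_ok(void *arg)
{
	int *remaining_busy = arg;

	if (*remaining_busy > 0) {
		(*remaining_busy)--;
		return SA_ERR_TRY_AGAIN;
	}
	return SA_OK;
}
```

If the attempts run out, the caller still sees SA_ERR_TRY_AGAIN and can pass
it on to the user where the API definition allows that error.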
------------------------------------------------
dynamically linking the service handler plugin
------------------------------------------------
The service handler needs some special magic to dynamically be linked into