kwangiit/matrix_v2

MATRIX: a MAny-Task computing execution fabRIc at eXascale

=============================================================================== Section 1: MATRIX overview

MATRIX is a fully distributed task execution framework for data-intensive Many-Task Computing (MTC) applications. MATRIX runs a scheduler on each compute node to manage local resources and schedule tasks. It uses the ZHT distributed key-value store to manage task metadata, and it implements a data-aware work stealing technique to optimize both load balancing and data locality. MATRIX aims to deliver high throughput, high efficiency, and low latency for data-intensive applications in both scientific computing and cloud domains.

===============================================================================

=============================================================================== Section 2: MATRIX Dependencies

[1] Google Protocol Buffers C binding

https://github.com/protobuf-c/protobuf-c

[2] Google Protocol Buffers C++ binding

https://github.com/google/protobuf

[3] ZHT distributed key-value store (no separate download is needed; the ZHT codebase is included in MATRIX)

https://github.com/mierl/ZHT

===============================================================================

=============================================================================== Section 3: MATRIX compile and install (single node)

[step 1] download the MATRIX code to your local machine; in the steps below, $dir is the directory you download MATRIX into:

	git clone https://github.com/kwangiit/matrix_v2.git

[step 2] install Google Protocol Buffers for both the C and C++ bindings; for instructions, refer to:

	https://github.com/mierl/ZHT
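
Before compiling, it can help to confirm that the build tools are on your PATH. The following is a minimal sketch, not part of MATRIX: `missing_tools` is a hypothetical helper name, and the tool names passed to it (`git`, `make`, `protoc`, `protoc-c`) are the commands this README uses elsewhere.

```shell
# Hypothetical helper (not part of MATRIX): print each named tool that
# cannot be found on PATH, one per line. Prints nothing when every tool
# is present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools git make protoc protoc-c` prints nothing when all four commands are installed.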

[step 3] go to the matrix directory to compile ZHT:

(1) cd $dir/matrix_v2/ZHT/src

(2) make

(3) check whether ZHT is compiled correctly:

	run "./zhtserver -z zht.conf -n neighbor.conf"; if the server starts and keeps running, then ZHT is compiled correctly

[step 4] go to the matrix directory to compile MATRIX:

(1) cd $dir/matrix_v2/matrix/src

(2) make
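
The two compile steps can be chained so that MATRIX is only built once ZHT has built successfully. This is a sketch under assumptions: `build_matrix` is a hypothetical name, and its single argument is the directory that contains the matrix_v2 checkout (the $dir of this README).

```shell
# Sketch: build ZHT first, then MATRIX, stopping at the first failure.
# The function body runs in a subshell, so the caller's working
# directory is left unchanged.
build_matrix() (
    # Resolve $1 to an absolute path so the second cd still works.
    dir=$(cd "$1" && pwd) || exit 1
    for sub in ZHT/src matrix/src; do
        cd "$dir/matrix_v2/$sub" || exit 1
        make || exit 1
    done
)
```

Running `build_matrix "$dir"` returns a non-zero status if either directory is missing or either `make` fails.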

===============================================================================

=============================================================================== Section 4: Running MATRIX (single node), assuming the directory to download MATRIX is $dir

[step 1] specify the MATRIX membership list of all the schedulers (typically node ip):

    1. cd $dir/matrix_v2/matrix/src

    2. rm -rf memlist
        
    3. echo "your_ip_address" >> memlist (e.g. echo "138.32.143.100" >> memlist)

[step 2] specify the ZHT membership list of all the servers:

        1. cd $dir/matrix_v2/ZHT/src

        2. rm -rf neighbor.conf

        3. echo "your_ip_address port_number" >> neighbor.conf (e.g. echo "138.32.143.100 50000" >> neighbor.conf)
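
Steps 1 and 2 can be combined for the single-node case. The sketch below is illustrative only: `write_memlists` is a hypothetical helper name, and its arguments (node IP, ZHT port, and the $dir that holds the checkout) are assumptions about how you might want to call it.

```shell
# Sketch of steps 1 and 2 for one node.
#   $1 = node IP address, $2 = ZHT port, $3 = the README's $dir
write_memlists() {
    ip="$1"; port="$2"; dir="$3"
    # '>' truncates the file, which has the same effect as the README's
    # "rm -rf" followed by "echo ... >>".
    printf '%s\n' "$ip" > "$dir/matrix_v2/matrix/src/memlist" || return 1
    printf '%s %s\n' "$ip" "$port" > "$dir/matrix_v2/ZHT/src/neighbor.conf"
}
```

For example, `write_memlists 138.32.143.100 50000 "$dir"` writes the same two files as the example commands above.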

[step 3] specify the tasks that the client is going to submit:

        1. the tasks are put in a workload file with each line specifying a task

        2. a task has the format: user directory command (e.g. kewang /bin/ hostname)

        3. assuming that you have a workload file: $dir/matrix_v2/matrix/src/workload
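
A workload file in this format can be generated rather than typed by hand. The following is a minimal sketch: `gen_workload` is a hypothetical helper, and it simply repeats one task line a given number of times. This README does not say how the number of lines must relate to NumAllTask, so the count is left to the caller.

```shell
# Sketch: write COUNT identical task lines in the format
# "user directory command".
#   $1 = user, $2 = directory, $3 = command, $4 = count, $5 = output file
gen_workload() {
    user="$1"; directory="$2"; cmd="$3"; count="$4"; out="$5"
    : > "$out" || return 1          # create or truncate the output file
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s %s\n' "$user" "$directory" "$cmd" >> "$out"
        i=$((i + 1))
    done
}
```

For example, `gen_workload kewang /bin/ hostname 1000 "$dir/matrix_v2/matrix/src/workload"` writes 1000 copies of the example task shown above.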

[step 4] specify the configuration file of MATRIX:

        1. the example configuration file is located at: $dir/matrix_v2/matrix/src/config

        2. the parameters of the configuration file are as follows:

	(1)	NumTaskPerClient        1000 // the number of tasks per client; usually there is one client, but you can have more

	(2)	NumAllTask      1000 // the total number of tasks (NumTaskPerClient * number of clients)

	(3)	NumMapTask      0 // for a MapReduce job, the number of map tasks; otherwise set it to 0

	(4)	NumReduceTask   0 // for a MapReduce job, the number of reduce tasks; otherwise set it to 0

	(5)	NumCorePerExecutor      1 // the number of threads that execute tasks, usually equal to the number of physical cores of the machine

	(6)	MaxTaskPerPkg   100 // the maximum number of tasks to send in one package; this limits the send and receive buffer sizes used for network communication

	(7)	MonitorInterval 10000 // the monitoring interval of the client, in microseconds

	(8)	SchedulerPortNo 60000 // the scheduler port number

	(9)	SleepLength     10000 // the sleep length, in microseconds, used wherever the code needs to sleep

	(10)	WorkStealOn     1 // turn work stealing on or off (1 means on, 0 means off)

	(11)	WorkStealPollIntervalStart      1000 // the initial polling interval of work stealing, in microseconds

	(12)	WorkStealPollIntervalUpperBound 1000000 // the upper bound of the work stealing polling interval, in microseconds

	(13)	DataSizeThreshold       1000 // the data size threshold, in bytes, that determines whether a task can be migrated

	(14)	Policy  MDL // the scheduling policy (MDL, MLB, RLDS, FLDS)

	(15)	EstimatedTimeThreshold  20 // used by the FLDS policy to decide when to move tasks from the local ready queue to the work stealing ready queue

	(16)	SchedulerMemlistFile    /home/kwang/Documents/work_kwang/cppprogram/matrix/matrix_v2/matrix/src/memlist // the scheduler membership list

	(17)	NetworkProtocol TCP // the network communication protocol (TCP, UDP)

	(18)	DagType Pipeline // the workload DAG pattern (BOT, FanIn, FanOut, Pipeline)

	(19)	DagArgument     10 // the parameter of the workload DAG

	(20)	HostIdentityType        ip // the identity used to identify a scheduler (ip, hostname)

	(21)	SubmissionMode  bestcase // the workload submission mode used by the client, either bestcase or worstcase. bestcase submits the tasks in a load-balanced way, while worstcase submits all the tasks to one randomly chosen scheduler

	(22)	WorkloadFile    /home/kwang/Documents/work_kwang/cppprogram/matrix/matrix_v2/matrix/src/workload // the workload file

	(23)	SchedulerWorkloadPath   /home/kwang/Documents/work_kwang/cppprogram/matrix/matrix_v2/matrix/src/workload_dag // used for complicated workload DAGs specified in a workload file. For simplicity, just set it to $dir/matrix_v2/matrix/src/workload_dag

	(24)	ClientLog       1 // whether to do client-side logging (1 means do, 0 means don't)

	(25)	TaskLog 1 // whether to do per-task logging (1 means do, 0 means don't)

	(26)	SystemLog       1 // whether to do system logging (1 means do, 0 means don't)

	(27)	SchedulerLog    1 // whether to do scheduler logging (1 means do, 0 means don't)

	(28)	ZhtMemlistFile  /home/kwang/Documents/work_kwang/cppprogram/matrix/matrix_v2/ZHT/src/neighbor.conf // the ZHT membership list file

	(29)	ZhtConfigFile   /home/kwang/Documents/work_kwang/cppprogram/matrix/matrix_v2/ZHT/src/zht.conf // the ZHT configuration file

[step 5] run MATRIX:

        1. run ZHT server:

            (1) go to the ZHT src directory: cd $dir/matrix_v2/ZHT/src

            (2) run command: ./zhtserver -z zht.conf -n neighbor.conf

        2. run MATRIX server:

            (1) go to the MATRIX src directory: cd $dir/matrix_v2/matrix/src

            (2) run MATRIX scheduler: ./scheduler the_matrix_config_file

            (3) run MATRIX client: ./client the_matrix_config_file
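
The three commands in step 5 can be launched from one shell instead of three terminals. The sketch below is one possible arrangement, not the project's own launcher. The function name `run_matrix`, the `STARTUP_PAUSE` variable, the log file names, and the choice to stop both servers after the client returns are all assumptions; this README does not say whether the scheduler or the ZHT server exits on its own.

```shell
# Sketch: start the ZHT server and the scheduler in the background,
# then run the client in the foreground.
#   $1 = the README's $dir, $2 = absolute path to the MATRIX config file
# The body runs in a subshell so the caller's directory is unchanged.
run_matrix() (
    dir=$(cd "$1" && pwd) || exit 1
    config="$2"
    pause="${STARTUP_PAUSE:-2}"     # assumed seconds to let a server start

    cd "$dir/matrix_v2/ZHT/src" || exit 1
    ./zhtserver -z zht.conf -n neighbor.conf > zhtserver.log 2>&1 &
    zht_pid=$!
    sleep "$pause"

    cd "$dir/matrix_v2/matrix/src" || exit 1
    ./scheduler "$config" > scheduler.log 2>&1 &
    sched_pid=$!
    sleep "$pause"

    ./client "$config"
    status=$?

    # Assumption: the servers are no longer needed once the client returns.
    kill "$sched_pid" "$zht_pid" 2>/dev/null
    exit "$status"
)
```

For example, `run_matrix "$dir" "$dir/matrix_v2/matrix/src/config"` returns the client's exit status and leaves the server output in zhtserver.log and scheduler.log next to the respective binaries.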

===============================================================================

=============================================================================== Section 5: Potential Problems

[1] Google Protocol Buffers version is not compatible

solution: recompile the .proto files in the source folders of both MATRIX and ZHT:

(1) go to the ZHT src folder: cd $dir/matrix_v2/ZHT/src

(2) compile c++ bindings: protoc --cpp_out=. *.proto

(3) compile c bindings: protoc-c --c_out=. *.proto

(4) go to the matrix src folder: cd $dir/matrix_v2/matrix/src

(5) repeat (2) and (3) in the matrix src folder
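
The regeneration steps above can be run over both folders in one loop. This is a sketch assuming `protoc` and `protoc-c` are on your PATH; `regen_protos` is a hypothetical name, and its argument is the $dir of this README.

```shell
# Sketch: regenerate the C++ and C protobuf bindings in both source
# folders, stopping at the first failure. The body runs in a subshell,
# so the caller's working directory is left unchanged.
regen_protos() (
    dir=$(cd "$1" && pwd) || exit 1
    for sub in ZHT/src matrix/src; do
        cd "$dir/matrix_v2/$sub" || exit 1
        protoc --cpp_out=. *.proto || exit 1     # C++ bindings
        protoc-c --c_out=. *.proto || exit 1     # C bindings
    done
)
```

After `regen_protos "$dir"` succeeds, rebuild ZHT and MATRIX with make as in Section 3.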

===============================================================================