An application for processing image files in a distributed, synchronous manner.
- Running
- Environment
- Machines
- Commands
- Nodes
- Server
- Client
- Scheduling Policy
- Data
- Tests
Pre-build:
Distributed-Image-Processing
|-- proj_dir
| |-- input_dir
| |-- src
| | |-- client
| | |-- node
| | |-- server
| | |-- utils
| |-- build.xml
| |-- commands.txt
| |-- config.txt
| |-- env.txt
| |-- machine.txt
| |-- pa1.thrift
|-- tests
|-- DesignSpecifications.md
|-- README.md
|-- ssh_cleanup.txt
|-- ssh_commands.txt
After running any ant target, the following new directories (marked with arrows) will appear within proj_dir:
proj_dir
|-- build <--
|-- gen-java <--
|-- input_dir
|-- log <--
|-- output_dir <--
|-- src
|-- build.xml
|-- commands.txt
|-- config.txt
|-- env.txt
|-- machine.txt
|-- pa1.thrift
This description is for running this app on machines with a shared memory space. If you are not using a shared memory space, setting up all entities in the system will be complicated, and we do not detail that process.
To run this application you must fulfill the mandatory OpenCV and Thrift dependencies.
For the application to resolve them, set these environment variables:
THRIFT_LIB_PATH  /<path-to-thrift-libs>
OPENCV_LIB_PATH  /<path-to-opencv-jar>
PROJ_PATH        /<path>/proj_dir
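For example, in bash (the install locations below are illustrative assumptions; substitute your own):

```bash
# Illustrative only -- point these at your actual Thrift/OpenCV installs.
export THRIFT_LIB_PATH=/usr/local/lib           # directory holding the Thrift Java libraries
export OPENCV_LIB_PATH=/usr/local/share/java    # directory holding the OpenCV jar
export PROJ_PATH="$HOME/Distributed-Image-Processing/proj_dir"
```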
Because the application is a distributed system, we have a machine.txt file. It contains the addresses for the nodes, the server, and the client, and the application uses it to locate the address at which each entity is operating.
Feel free to change machines; however, do not disturb the identifiers at the beginning of each line, as they are required by the application.
node_0 csel-kh4250-20.cselabs.umn.edu
node_1 csel-kh4250-21.cselabs.umn.edu
node_2 csel-kh4250-22.cselabs.umn.edu
node_3 csel-kh4250-23.cselabs.umn.edu
server csel-kh4250-24.cselabs.umn.edu
client csel-kh4250-25.cselabs.umn.edu
- Follow the Environment steps above.
- Place the `grader.sh` file within the `proj_dir`.
- I had to explicitly set the `USERID` variable within the autograder in order for it to work.
- Verify that the input_dir has the correct images in it. Duplicates can be found in `tests/test01`.
- Run the autograder.
- Output can be found in the logs in the `log` directory.
- If you need to change ports, use `config.txt`. Details are in the Configuration Details section below.
- If you run our tests via the autograder, note that output is directed to the test directory and will not be picked up by the autograder.
- Open up six terminals and run the commands found in `ssh_commands.txt` in order (see the sketch after this list). Be sure the machines correspond to the machines in `machine.txt` and that you wait for the server and nodes to start prior to running the client.
- View output in the `log` directory.
- Run `source ssh_cleanup.sh` to close all processes on the machines. Once again, be sure the machines correspond to the same machines in `machine.txt`/`ssh_commands.txt`.
- To change the run configuration, simply alter the `config.txt` file or replace it with a `config.txt` file from one of our tests. Run the `ssh_cleanup.sh` file and then restart all machines in your six terminals as outlined in the first bullet.
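A minimal sketch of that sequence (bash), assuming the default `machine.txt` assignments shown above and that `PROJ_PATH` is set on each lab machine; the authoritative commands live in `ssh_commands.txt`:

```bash
# One command per terminal, in order; wait for nodes and server before the client.
ssh csel-kh4250-20.cselabs.umn.edu 'cd "$PROJ_PATH" && ant node_zero'
ssh csel-kh4250-21.cselabs.umn.edu 'cd "$PROJ_PATH" && ant node_one'
ssh csel-kh4250-22.cselabs.umn.edu 'cd "$PROJ_PATH" && ant node_two'
ssh csel-kh4250-23.cselabs.umn.edu 'cd "$PROJ_PATH" && ant node_three'
ssh csel-kh4250-24.cselabs.umn.edu 'cd "$PROJ_PATH" && ant server'
ssh csel-kh4250-25.cselabs.umn.edu 'cd "$PROJ_PATH" && ant client'    # run last
```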
The config.txt file contains configuration details for each entity in the program. This is where the nodes get their load percentages and port numbers, and the server gets its port number. The load injection policy and the data path are also set here; read more about the data attribute in the Data section below. If you change the config file, you must restart all entities for the change to take effect.
node_0 0.8 8125
node_1 0.6 8126
node_2 0.5 8127
node_3 0.2 8128
server 8129
policy random
data
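For reference, the line formats implied by the example above (an empty data field means proj_dir/input_dir is used; see the Data section below):

```
node_<num> <load probability> <port>
server <port>
policy <random|balancing>
data [<path-to-data-dir>]
```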
commands.txt contains commands for starting app entities.
If you change a machine address in machine.txt, be sure to adjust the associated command in commands.txt to reflect the change.
Nodes depend on a scheduling policy that is outlined in the Scheduling Policy section below.
On the correct host machine (as declared in machine.txt), navigate to Distributed-Image-Processing/proj_dir and run one of these four commands:

- `ant node_zero`
- `ant node_one`
- `ant node_two`
- `ant node_three`
You can also run the node commands that reside in ssh_commands.txt. This will automatically ssh into a kh4250 lab machine and start up the specific node you select.
Be sure the machine.txt file reflects the proper address of the machine you are running the specific node from.
The machine.txt entry syntax for nodes looks like this:
node_<num> <machine address>
Example:
node_0 csel-kh4250-08.cselabs.umn.edu
Remember, the load probability simulates the probability of injecting a delay regardless of the scheduling policy. It also simulates the probability of rejection when using the balancing scheduling policy. More details are in DesignSpecifications.md. Also note that machine.txt names nodes like 'node_2', while the corresponding ant target is 'node_two'.
On the correct host machine (as declared in machine.txt) navigate to: Distributed-Image-Processing/proj_dir and run:
ant server
You can also run the server command that resides in ssh_commands.txt. This will automatically ssh into a kh4250 lab machine and start up the server.
Be sure the machine.txt file reflects which machine you are going to run the server on.
The machine.txt entry syntax for the server looks like this:
server <machine address>
Example:
server csel-kh1260-12.cselabs.umn.edu
On the correct host machine (as declared in machine.txt) navigate to: Distributed-Image-Processing/proj_dir and run:
ant client
You can also run the client command that resides in ssh_commands.txt. This will automatically ssh into a kh4250 lab machine and start up the client.
Be sure the machine.txt file reflects which machine the client is going to run on.
The machine.txt entry syntax for the client looks like this:
client <machine address>
Example:
client csel-kh4250-01.cselabs.umn.edu
We have inserted print statements throughout the code to help bring the process to life. Instead of printing to the terminal, the server, the client, and each node gets its own specific log.txt file to record its output. These are titled server_log.txt, client_log.txt, and node_<num>_log.txt, where <num> is the number of the node it represents. We can look at the statements printed to these files to see in detail what each machine is doing, for example messages printed when a task is rejected or a delay is injected (a three-second sleep). The final JobReceipt that the client receives is also recorded in its client_log.txt file.
Currently two scheduling policies are implemented:
- `random` - nodes must accept randomly assigned tasks from the server
- `balancing` - nodes may reject randomly assigned tasks from the server (details are given in `DesignSpecifications.md`)
The scheduling policy is set in the config.txt file on the line starting with 'policy'.
policy <policy type>
Example:
policy random
In the config.txt file, users may set the directory from which images are read and to which output is written. This is intended for test cases. If the data you want to process is in proj_dir/input_dir, the field should be left empty.
Remember, if you explicitly set a data directory (not proj_dir/input_dir), the directory must contain an input_dir and an output_dir, as shown below.
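For example, a custom data directory must be laid out like this:

```
<data-dir>
|-- input_dir    <-- source images
|-- output_dir   <-- transformed images
```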
Testing configuration (custom data path):
node_0 0.8 8115
node_1 0.6 8116
node_2 0.5 8117
node_3 0.2 8118
server 8119
policy random
data ../tests/test03/data
Normal configuration (default data path):
node_0 0.8 8115
node_1 0.6 8116
node_2 0.5 8117
node_3 0.2 8118
server 8119
policy random
data
To run the tests, complete the following steps:
- Navigate to the respective test directory, for instance `test02`:

```
Distributed-Image-Processing
|-- proj_dir
|-- tests
|   |-- test01
|   |-- test02   <-- expand
|   |-- ...
```

- Within each test directory you will find the same structure (unless it is an error case). The difference is in the `input_dir` and the contents of the `config.txt` file.

```
|-- test02
|   |-- data
|   |-- config.txt
```

- Copy (don't cut) the test's `config.txt` file and replace the `config.txt` located here (see the sketch after this list):

```
Distributed-Image-Processing
|-- proj_dir
|   |-- input_dir
|   |-- src
|   |-- ...
|   |-- config.txt   <-- replace me
|   |-- ...
```

- Run the commands for the nodes, followed by the server, then finally the client.
- See the transformed images in the test's `output_dir` and the logs in `Distributed-Image-Processing/proj_dir/log`:

```
Distributed-Image-Processing
|-- proj_dir
|   |-- input_dir
|   |-- src
|   |-- log   <-- logs
|   |-- ...
|-- tests
|   |-- test01
|   |-- test02
|   |   |-- data
|   |   |   |-- input_dir
|   |   |   |-- output_dir   <-- output images
|   |   |-- config.txt
|   |-- ...
```
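For the copy step, a minimal sketch (bash), run from the repository root and using test02 as the example:

```bash
# Copy (don't cut) the test's config over the active one in proj_dir.
cp tests/test02/config.txt proj_dir/config.txt
```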
Test a spread of probabilities with the random scheduling policy.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.8 | 0.6 | 0.5 | 0.2 |
client:
Job Receipt:
Job: /project/kivix019/Distributed-Image-Processing/proj_dir/../tests/test01/data
Time: 3045
Status: SUCCESS
Msg: All tasks completed successfully.
Test a spread of probabilities with the balancing scheduling policy.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.8 | 0.6 | 0.5 | 0.2 |
Job Receipt:
Job: /project/kivix019/Distributed-Image-Processing/proj_dir/../tests/test02/data
Time: 3145
Status: SUCCESS
Msg: All tasks completed successfully.
Test that if input_dir is empty we get a success status and a report that the directory was empty.
Job Receipt:
Job: /project/kivix019/Distributed-Image-Processing/proj_dir/../tests/test03/data
Time: 41
Status: SUCCESS
Msg: Job held a directory with an empty input_dir
If the data directory is incorrectly laid out the app should return a failure and a useful message about why it failed.
Job Receipt:
Job: /project/kivix019/Distributed-Image-Processing/proj_dir/../tests/test04/data
Time: 23
Status: FAILURE
Msg: Data directory must contain an input_dir and an output_dir
Two nodes are 100% full and two are completely open at the time of each access. This guarantees a delay for every task assigned to a full node (about half of the tasks on average).
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 1.0 | 1.0 | 0.0 | 0.0 |
Job Receipt:
Job: /project/kivix019/Distributed-Image-Processing/proj_dir/../tests/test05/data
Time: 3247
Status: SUCCESS
Msg: All tasks completed successfully.
Two nodes are 100% full and two are completely open at the time of each access. This guarantees a rejection for every task first offered to a full node (about half of the tasks). There are no explicitly imposed delays like with the random policy; the only delay is caused by the server retrying a different node.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 1.0 | 1.0 | 0.0 | 0.0 |
Job Receipt:
Job: /project/kivix019/Distributed-Image-Processing/proj_dir/../tests/test06/data
Time: 328
Status: SUCCESS
Msg: All tasks completed successfully.
Three nodes have a 100% load injection probability and the last has a 0% probability. As long as one of those three nodes is randomly chosen, a three-second delay is injected, so we expect the total time to be greater than 3000 ms. This will take more time than the balancing-policy version of this test, since under balancing those three nodes simply reject their tasks and the last node always accepts and never injects a delay (its probability is 0%).
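As a rough sanity check on that expectation (assuming, on our part, that the random policy picks among the four nodes uniformly and independently per task), the chance that a job of $n$ tasks sees no delay at all is

$$P(\text{no delay}) = \left(\frac{1}{4}\right)^{n},$$

which for a six-task job (the size test16's failure message reports) is roughly 0.02%, so a total time above 3000 ms is all but guaranteed.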
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 1.0 | 1.0 | 1.0 | 0.0 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test07/data
Time: 3318
Status: SUCCESS
Msg: All tasks completed successfully.
Three nodes have a 100% load injection probability and the last has a 0% probability. Therefore those three nodes will reject every task assigned to them, and the last node will accept every task. Since the only node accepting tasks has a 0% probability of load injection and task rejection, there won't be a three-second delay, so this should take much less time than its random-policy counterpart, where every node has to accept.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 1.0 | 1.0 | 1.0 | 0.0 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test08/data
Time: 382
Status: SUCCESS
Msg: All tasks completed successfully.
One node has a 100% load injection probability; the other three have a 0% probability. We expect the time to be greater than 3000 ms, since there will be a three-second delay whenever Node Three is chosen for a task.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.0 | 0.0 | 0.0 | 1.0 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test09/data
Time: 3690
Status: SUCCESS
Msg: All tasks completed successfully.
One node has a 100% load injection probability; the other three have a 0% probability. Node Three will always reject a task, and the other nodes have a 0% injection probability, so a load injection delay should never occur in this test.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.0 | 0.0 | 0.0 | 1.0 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test10/data
Time: 346
Status: SUCCESS
Msg: All tasks completed successfully.
All nodes have an 80% load injection probability, so most of the tasks will have a three-second delay injected. However, the tasks are run as threads, so they don't have to wait for each other to finish, and most of the delays will overlap during execution. No tasks should be rejected, since we are using the random policy. Also used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.8 | 0.8 | 0.8 | 0.8 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test11/data
Time: 3306
Status: SUCCESS
Msg: All tasks completed successfully.
All nodes have an 80% load injection probability, so most of the tasks will actually be rejected, since we are using the balancing policy. If a task gets accepted, there is a strong chance a delay is injected as well. Also used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.8 | 0.8 | 0.8 | 0.8 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test12/data
Time: 3359
Status: SUCCESS
Msg: All tasks completed successfully.
All nodes have a 20% load injection probability, so most of the tasks will not have a load injected. The random policy is used, so no tasks can be rejected. Also used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.2 | 0.2 | 0.2 | 0.2 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test13/data
Time: 3210
Status: SUCCESS
Msg: All tasks completed successfully.
All nodes have a 20% load injection probability, so most of the tasks will not have a load injected. Since it's the balancing policy, tasks can be rejected, but there is still a fairly low chance of that happening. Compared to test 13, we can expect this one to take a bit longer since tasks can be rejected. Also used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.2 | 0.2 | 0.2 | 0.2 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test14/data
Time: 3284
Status: SUCCESS
Msg: All tasks completed successfully.
All nodes have a 100% load injection rate, so every single task will have a three-second delay injected. However, these tasks run as separate threads, so the delay of one task shouldn't delay another task from starting. Also used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 1.0 | 1.0 | 1.0 | 1.0 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test15/data
Time: 3339
Status: SUCCESS
Msg: All tasks completed successfully.
All nodes have a 100% load injection rate, but since it is the balancing policy, the nodes also have a 100% rejection rate. Therefore the tasks will never finish. The server will see that every task is being rejected, detect a node clog, and send back a FAILURE status to the client. Also used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 1.0 | 1.0 | 1.0 | 1.0 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test16/data
Time: 937
Status: FAILURE
Msg: 6/6 tasks failed [java] /project/droeg022/Distributed-Image-Processing/proj_dir/../tests
18 images, used to make sure the number of images doesn't affect the distributed system's integrity.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.8 | 0.6 | 0.5 | 0.2 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test17/data
Time: 3308
Status: SUCCESS
Msg: All tasks completed successfully.
18 images, used to make sure the number of images doesn't affect the distributed system's integrity.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.8 | 0.6 | 0.5 | 0.2 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test18/data
Time: 3267
Status: SUCCESS
Msg: All tasks completed successfully.
1 image, used to make sure the number of images doesn't affect the distributed system's integrity. A second run, where the node happened to inject a delay, is included for comparison.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.8 | 0.6 | 0.5 | 0.2 |
Run 1
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test19/data
Time: 292
Status: SUCCESS
Msg: All tasks completed successfully.
Run 2
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test19/data
Time: 3307
Status: SUCCESS
Msg: All tasks completed successfully.
1 image, used to make sure the number of images doesn't affect the distributed system's integrity. A second run, where the node didn't inject a delay, is included for comparison.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.8 | 0.6 | 0.5 | 0.2 |
Run 1
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test20/data
Time: 3403
Status: SUCCESS
Msg: All tasks completed successfully.
Run 2
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test20/data
Time: 460
Status: SUCCESS
Msg: All tasks completed successfully.
Nodes have a 40% chance of injecting a three-second delay. Used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.4 | 0.4 | 0.4 | 0.4 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test21/data
Time: 3278
Status: SUCCESS
Msg: All tasks completed successfully.
Nodes have a 40% chance of rejecting tasks and a 40% chance of injecting a three-second delay. Used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.4 | 0.4 | 0.4 | 0.4 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test22/data
Time: 3319
Status: SUCCESS
Msg: All tasks completed successfully.
Nodes have a 60% chance of injecting a load delay and always accept the task. Used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.6 | 0.6 | 0.6 | 0.6 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test23/data
Time: 3268
Status: SUCCESS
Msg: All tasks completed successfully.
Nodes have a 60% chance of rejection and a 60% chance of injecting a load delay. Used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.6 | 0.6 | 0.6 | 0.6 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test24/data
Time: 3399
Status: SUCCESS
Msg: All tasks completed successfully.
Nodes won't inject any delay, so the time should be low. Used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.0 | 0.0 | 0.0 | 0.0 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test25/data
Time: 633
Status: SUCCESS
Msg: All tasks completed successfully.
Nodes will accept every time and there is no injected delay so time should be low. Used to compare against other consistent node values.
|             | Node Zero | Node One | Node Two | Node Three |
|---|---|---|---|---|
| Probability | 0.0 | 0.0 | 0.0 | 0.0 |
Job Receipt:
Job: /project/droeg022/Distributed-Image-Processing/proj_dir/../tests/test26/data
Time: 271
Status: SUCCESS
Msg: All tasks completed successfully.