Implementation Progress
Matheus C. Santos edited this page Jan 17, 2015 · 3 revisions
- Simple ( char, int, long int, float and double ).
- Pointer ( char *, int *, long int *, float * and double * ) (WILL BE DISCONTINUED IN THE FUTURE).
- Containers ( std::vector, std::string ).
- Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container).
- Grouped (a group of two or three datasets).
- Map - transform a data item into an item of any other type ( 1 to 1 ).
- Reduce - reduce all elements into one ( 2 to 1 ).
- FlatMap - generate a new set of data ( 1 to n ).
- Bulk Map and FlatMap - performance-efficient functions that enable sub-iteration implementations.
- MapByKey - transform all items of an indexed dataset that share the same key ( n to 1 ).
- FlatMapByKey - export a new set of data from entries grouped by keys.
- UpdateByKey - a function to modify a dataset's content.
- FDD creation from local memory ( through constructor ).
- Distributed read from file through constructor - each process reads from a global file offset.
- collect - get a local copy of the dataset ( send the distributed data to the driver process ).
- countByKey - just like a histogram ( counts the occurrences of every key and sends them to the driver process ).
- groupByKey - groups a dataset's data by key; data with the same key migrates to a single machine.
- printInfo - Prints runtime information of all tasks
- printHeader - Prints the header of the runtime information
- updateInfo - Prints runtime information for all tasks called after last updateInfo (useful for program status update).
- Global variables - Global variables that can be modified by the driver process transparently.
- Memory leak plug.
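As a rough illustration of the cardinalities listed above ( 1 to 1, 2 to 1, 1 to n ) and of the histogram-like key count, here is a single-process sketch that uses only the C++ standard library. The function names (`squareAll`, `sumAll`, `splitChars`, `countKeys`) are hypothetical and are not part of the framework's API; the real operations work on distributed FDDs, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Map ( 1 to 1 ): exactly one output item per input item.
std::vector<int> squareAll(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    for (int x : in) out.push_back(x * x);
    return out;
}

// Reduce ( 2 to 1 ): items are combined pairwise until one value remains.
int sumAll(const std::vector<int>& in) {
    return std::accumulate(in.begin(), in.end(), 0);
}

// FlatMap ( 1 to n ): each input item may yield any number of output items.
std::vector<char> splitChars(const std::vector<std::string>& in) {
    std::vector<char> out;
    for (const std::string& s : in) out.insert(out.end(), s.begin(), s.end());
    return out;
}

// Key count: a histogram of how often each key occurs in an indexed dataset.
std::map<std::string, std::size_t> countKeys(
        const std::vector<std::pair<std::string, int>>& in) {
    std::map<std::string, std::size_t> counts;
    for (const auto& kv : in) ++counts[kv.first];
    return counts;
}
```

In the distributed case each process would apply the same user function to its own partition, and only actions such as collect or the key count send results back to the driver process.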
Examples:
- Pagerank - (w/ and wo/ bulk) http://en.wikipedia.org/wiki/PageRank
- Latency test - tests framework latency with O(1) functions.
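For readers unfamiliar with the algorithm behind the Pagerank example, the sketch below shows the standard iterative rank update on a single process. It is not the framework's example code: the function name `pageRank`, the adjacency-list layout, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// links[i] holds the indices of the pages that page i links to.
std::vector<double> pageRank(const std::vector<std::vector<std::size_t>>& links,
                             double damping = 0.85, int iterations = 20) {
    const std::size_t n = links.size();
    if (n == 0) return {};
    std::vector<double> rank(n, 1.0 / n);  // start from a uniform distribution
    for (int it = 0; it < iterations; ++it) {
        // Every page receives the "random jump" share first.
        std::vector<double> next(n, (1.0 - damping) / n);
        for (std::size_t i = 0; i < n; ++i) {
            if (links[i].empty()) {
                // Dangling page: spread its rank evenly over all pages.
                for (std::size_t j = 0; j < n; ++j)
                    next[j] += damping * rank[i] / n;
            } else {
                // Split this page's rank equally among its outgoing links.
                for (std::size_t j : links[i])
                    next[j] += damping * rank[i] / links[i].size();
            }
        }
        rank.swap(next);
    }
    return rank;
}
```

In a map/reduce formulation, splitting a page's rank among its links corresponds to a FlatMap ( 1 to n ), and summing the contributions that arrive at each page corresponds to a per-key reduction.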
(in order of priority)
- Cogroup Optimization.
- Load Redistribution/Tune
- Additional function arguments - arguments passed along with the custom function pointer ( ex.: myFdd->map(&mapFunc, arg1, arg2); ).
- Distributed directory read - Each process reads a local file from a directory (simulate a DFS)
- HDFS support
- Fault Tolerance
- Dataset data replication
- Process restart/replacement
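The planned "additional function arguments" item could, in principle, be built on variadic templates. The sketch below is only one possible single-process design, not the framework's implementation: `mapWithArgs` and `scaleAndShift` are hypothetical names, and the sketch ignores how the extra arguments would be shipped to remote processes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: applies `func` to every item and passes the
// extra arguments unchanged after the item itself.
template <typename T, typename R, typename... Args, typename... Passed>
std::vector<R> mapWithArgs(const std::vector<T>& data,
                           R (*func)(const T&, Args...),
                           const Passed&... extra) {
    std::vector<R> out;
    out.reserve(data.size());
    for (const T& item : data) out.push_back(func(item, extra...));
    return out;
}

// Example user function: scale a value, then add an offset.
int scaleAndShift(const int& x, int factor, int offset) {
    return x * factor + offset;
}
```

A call such as `mapWithArgs(values, &scaleAndShift, 2, 1)` mirrors the `myFdd->map(&mapFunc, arg1, arg2);` form from the list above. A distributed version would additionally have to serialize `arg1` and `arg2` and send them to every worker process.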