shard is a lightweight, easy-to-use MapReduce framework for Go. It provides a
simple and flexible way to write and run distributed computations on a cluster
of machines.
- Simple API:
shardprovides a simple and intuitive API for writing MapReduce programs. - Pluggable Components:
shardallows you to bring your ownMapper,Reducer,Combiner,Partitioner, andFilesystemimplementations. - Master-Worker Architecture:
sharduses a master-worker architecture to distribute and manage tasks. - gRPC for Communication:
sharduses gRPC for efficient and reliable communication between the master and worker nodes.
To install shard, use go get:
go get github.com/prxssh/shardshard can be configured using environment variables or through the Config
struct.
| Environment Variable | Config Field |
Description | Default |
|---|---|---|---|
SHARD_MODE |
- | The mode to run in (master or worker). |
master |
SHARD_MASTER_ADDR |
MasterAddress |
The address of the master node. | localhost:6969 |
| - | InputPath |
The path to the input file or directory. | - |
| - | OutputDir |
The path to the output directory. | ./shard |
| - | NumReducers |
The number of reduce tasks. | 16 |
| - | ChunkSize |
The size of each input split. | 64MB |
| - | MaxConcurrency |
The maximum number of concurrent tasks. | runtime.NumCPU() * 2 |
Check the config.go for complete configuration.
Warning
This project is written just for learning purposes and breaking changes are to be expected.
Here is an example of how to use shard to implement a word count program:
package main
import (
"fmt"
"strconv"
"strings"
"github.com/prxssh/shard"
"github.com/prxssh/shard/api"
"github.com/prxssh/shard/pkg/filesystem"
)
func main() {
// Create a new shard config.
cfg, err := shard.NewConfig(
shard.WithInputPath("input.txt"),
shard.WithMapper(Map),
shard.WithReducer(Reduce),
shard.WithFilesystem(filesystem.NewLocal()),
)
if err != nil {
panic(err)
}
// Run the shard job.
if err := shard.Run(cfg); err != nil {
panic(err)
}
}
// Map is a mapper that emits a count for each word.
func Map(key, value string, emit api.Emitter) error {
words := strings.Fields(value)
for _, word := range words {
if err := emit(word, "1"); err != nil {
return err
}
}
return nil
}
// Reduce is a reducer that sums the counts for each word.
func Reduce(key string, values api.Iterator, emit api.Emitter) error {
count := 0
for {
_, ok := values.Next()
if !ok {
break
}
count++
}
return emit(key, strconv.Itoa(count))
}Information for developers, including how to run tests and generate protobuf files.
To run the tests, use the following command:
make testTo generate the protobuf files, use the following command:
make gen-proto FILE=path/to/file.protoThis project is licensed under the MIT License. See the LICENSE file for details.