BigData

Learn Big Data through the Spark Python API (PySpark) by running the Jupyter notebooks, which show how to read, process, and write data.
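
As a taste of what the notebooks cover, here is a minimal PySpark sketch of that read/process/write cycle; the file paths and column names are placeholders, not data shipped with the repository:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Local session for illustration; the notebooks run against the Docker cluster.
spark = SparkSession.builder.appName("read-process-write").getOrCreate()

# Read: load a CSV file into a DataFrame (path and columns are placeholders).
df = spark.read.csv("data/sales.csv", header=True, inferSchema=True)

# Process: keep positive amounts and aggregate per country.
summary = (
    df.filter(F.col("amount") > 0)
      .groupBy("country")
      .agg(F.sum("amount").alias("total_amount"))
)

# Write: persist the result as Parquet.
summary.write.mode("overwrite").parquet("output/sales_by_country")

spark.stop()
```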

Contents

  • Quick Start
  • Tech
  • References

Quick Start

Cluster overview

Application       URL
Hadoop            localhost:9870
MapReduce         localhost:8089
HUE               localhost:8088
Mongo Cluster     localhost:27017
Kafka Manager     localhost:9000
JupyterLab        localhost:8888
Spark Master      localhost:8080

Prerequisites

  • Docker and Docker Compose installed (the cluster is built and run with docker-compose)

Build from Docker Hub

  1. Download the source code or clone the repository.
  2. Build the cluster (a quick service check is sketched after this list):
     docker-compose up -d
     ./config.sh
  3. Remove the cluster by typing:
     docker-compose down
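
Once the containers are up, the web UIs listed under Cluster overview should respond. The following is only a sketch of a reachability check that can be run from a JupyterLab notebook, assuming the localhost ports from the table above; the service list here is illustrative and not taken from the repository's scripts:

```python
# Quick reachability check for the cluster's web UIs.
# Ports are the ones listed in "Cluster overview"; adjust if your
# docker-compose port mappings differ.
from urllib.request import urlopen

SERVICES = {
    "Hadoop": 9870,
    "HUE": 8088,
    "Kafka Manager": 9000,
    "JupyterLab": 8888,
    "Spark Master": 8080,
}

for name, port in SERVICES.items():
    try:
        with urlopen(f"http://localhost:{port}", timeout=5) as resp:
            print(f"{name}: OK (HTTP {resp.status})")
    except OSError as exc:
        print(f"{name}: unreachable ({exc})")
```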

Tech

Hadoop

Apache Spark Standalone Cluster
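
The notebooks can attach to the standalone cluster instead of running Spark locally. A minimal sketch, assuming the master's compose service is named spark-master and listens on the default standalone port 7077 (the localhost:8080 entry in the table above is only the master's web UI); check docker-compose.yml for the actual hostname:

```python
from pyspark.sql import SparkSession

# spark://spark-master:7077 is an assumed service name plus the default
# standalone-master port; the web UI on localhost:8080 is not the master URL.
spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .appName("standalone-example")
    .getOrCreate()
)

print(spark.sparkContext.master)  # confirm which master the session is bound to
spark.stop()
```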

Mongo Sharded Cluster

WARNING (Windows & OS X)

The default Docker setup on Windows and OS X uses a VirtualBox VM to host the Docker daemon. Unfortunately, the mechanism VirtualBox uses to share folders between the host system and the Docker container is not compatible with the memory mapped files used by MongoDB (see vbox bug, docs.mongodb.org and related jira.mongodb.org bug). This means that it is not possible to run a MongoDB container with the data directory mapped to the host.

– Docker Hub (source here or here)

Mongo Components
  • Config Server (3-member replica set): configsvr01, configsvr02, configsvr03
  • 3 Shards (each a 3-member PSS, i.e. Primary-Secondary-Secondary, replica set):
    • shard01-a, shard01-b, shard01-c
    • shard02-a, shard02-b, shard02-c
    • shard03-a, shard03-b, shard03-c
  • 2 Routers (mongos): router01, router02
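
A minimal PyMongo sketch of talking to the sharded cluster through a mongos router. It assumes the routers are published on localhost:27017, as in the table above; the database and collection names are made up for illustration:

```python
from pymongo import MongoClient

# Connect through a mongos router; 27017 matches the "Mongo Cluster" entry
# in the table above.
client = MongoClient("mongodb://localhost:27017/")

# Hypothetical database and collection, purely for illustration.
db = client["demo"]
db["events"].insert_one({"user": "alice", "action": "login"})

# Ask the router which shards back the cluster (admin command on mongos).
print(client.admin.command("listShards"))
```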

References