kwyoke/DDBS-2020

DDBS Project 2020

This project demonstrates how to set up and manage a distributed database system using MongoDB. We are given 10 GB of structured data in three JSON files (user.dat, article.dat, read.dat), along with unstructured data in the form of jpg, flv and txt files associated with each article. We are required to populate three more collections (db.read, db.beread, db.popRank) by aggregating the given raw data files. Each collection is then distributed across two DBMS sites, dbms1shard and dbms2shard, which are simulated by Docker containers. The unstructured multimedia data are distributed across two further GridFS servers, grid1shard and grid2shard. All sharded collections are accessible through the mongos server, which acts as the data centre: queries, inserts and updates on each collection can be issued there even though the collections reside on other servers. The status of the shard clusters can be monitored using MongoDB's utilities.
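To make the sharding step more concrete, here is a minimal sketch of how the admin commands could be built and sent to mongos with pymongo. The database name `ddbs`, the field names `uid` and `aid`, the hashed shard keys and the mongos URI are all assumptions made for illustration; the keys actually used in this project are not stated here. `enableSharding` and `shardCollection` are standard MongoDB admin commands.

```python
from typing import Any, Dict, List


def sharding_commands(db_name: str,
                      shard_keys: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build the admin commands that enable sharding on a database
    and then shard each listed collection on its key."""
    commands: List[Dict[str, Any]] = [{"enableSharding": db_name}]
    for collection, key in shard_keys.items():
        # The command name must be the first key of the document.
        commands.append({"shardCollection": f"{db_name}.{collection}", "key": key})
    return commands


def apply_commands(mongos_uri: str, commands: List[Dict[str, Any]]) -> None:
    """Send each command to the admin database through mongos."""
    # Imported lazily so the builder above can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(mongos_uri)
    for command in commands:
        client.admin.command(command)


if __name__ == "__main__":
    # Hypothetical database name and shard keys, for illustration only.
    cmds = sharding_commands("ddbs", {"user": {"uid": "hashed"},
                                      "article": {"aid": "hashed"}})
    for cmd in cmds:
        print(cmd)
    # apply_commands("mongodb://localhost:27017", cmds)  # assumed mongos address
```

Keeping the command construction separate from the network call makes it easy to inspect what will be sent before touching a running cluster.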

Prerequisites to install

I implemented this project on an Ubuntu 18.04 machine, following a YouTube tutorial series on MongoDB to install MongoDB and Docker. The series is also a good introduction to the basics of MongoDB.

  1. Follow the official MongoDB documentation to install MongoDB for your machine type. You will also need to install mongofiles if it does not come with your MongoDB installation.

  2. Follow the official Docker documentation to install Docker for your machine type. You will also need to install docker-compose if it does not come with the Docker installation.

  3. Install pymongo, as we use Python scripts to interact with the MongoDB collections.

  4. (Optional) I used the command line throughout this project, but MongoDB also offers GUIs such as MongoDB Compass that make it more user friendly, so you can install one if you wish.

Demo

Watch the demo video to see the DBMS in action.

Documentation organisation

See executive_program/ for quick setup

Inside executive_program/, we created a Makefile that collects all the bash commands required to set up the Docker containers, populate and shard the collections, and store the multimedia data. More details on how to run the Makefile can be found inside the directory.

See mongoshell_tutorial/ for full tutorial and explanation

Inside mongoshell_tutorial/, we provide detailed instructions and explanations on how to set up the Docker containers and shard clusters, how to populate and shard the collections, how to automatically refresh data when related collections are modified, and how to store and view multimedia data using GridFS.
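As a rough illustration of the GridFS storage step, the sketch below builds a `mongofiles put` command from Python and, optionally, runs it. The host, port, database name and file name are hypothetical placeholders, and the sketch does not attempt to reproduce how this project decides which file goes to grid1shard or grid2shard.

```python
import subprocess
from typing import List


def mongofiles_put_argv(host: str, port: int, db_name: str, path: str) -> List[str]:
    """Build the argument list for uploading one local file into GridFS."""
    return ["mongofiles", "--host", host, "--port", str(port),
            "--db", db_name, "put", path]


def store_file(host: str, port: int, db_name: str, path: str) -> None:
    """Run mongofiles and raise CalledProcessError if the upload fails."""
    subprocess.run(mongofiles_put_argv(host, port, db_name, path), check=True)


if __name__ == "__main__":
    # Hypothetical GridFS server address, database and file name.
    argv = mongofiles_put_argv("localhost", 27017, "ddbs", "image_a0_0.jpg")
    print(" ".join(argv))
    # store_file("localhost", 27017, "ddbs", "image_a0_0.jpg")
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.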
