This repository has been archived by the owner on Nov 3, 2022. It is now read-only.

Archiving and Querying MongoDB collection data from S3 #132

Closed
pcinnusamy opened this issue Apr 30, 2021 · 0 comments

pcinnusamy commented Apr 30, 2021

@stefanprodan
Please help guide me on the archival process below, and on how we could achieve it with your approach.

Description

Background:

MongoDB is being deprecated in favour of DocumentDB, so Product Engineering will be moving data to DocumentDB. After the move, the corresponding collections in Mongo will be archived for safekeeping.
We are on Redshift, but planning to migrate to Snowflake in the future.

Problem Statement:

The current archival process is to run mongodump on a collection, which produces a bson.gz file for each partition.
However, this is not very access-friendly when the data needs to be read back:
We have to run mongorestore on all required bson.gz files into a Mongo / DocumentDB cluster before the data can be queried.
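
To illustrate why the dump format is awkward to read directly, here is a minimal sketch of walking a bson.gz file without a cluster. It relies only on the fact that a mongodump .bson file is a sequence of concatenated BSON documents, each starting with a 4-byte little-endian length. The function names `iter_bson_documents` and `count_documents` are hypothetical, and the sketch only splits the file into raw documents; decoding the fields would still need a BSON library.

```python
import gzip
import struct
from typing import BinaryIO, Iterator


def iter_bson_documents(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw BSON document from a stream of concatenated documents."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        # The first 4 bytes hold the total document size, including themselves.
        (length,) = struct.unpack("<i", header)
        if length < 5:
            raise ValueError(f"invalid BSON document length: {length}")
        body = stream.read(length - 4)
        if len(body) != length - 4:
            raise ValueError("truncated BSON document")
        yield header + body


def count_documents(path: str) -> int:
    """Count the documents in a gzip-compressed dump file (hypothetical helper)."""
    with gzip.open(path, "rb") as stream:
        return sum(1 for _ in iter_bson_documents(stream))
```

Even counting documents requires decompressing and scanning the whole file, which is the access problem described above.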

Questions to Resolve:

1. What is the right process to archive data? When should it be archived? How frequently? By whom?
2. mongodump is currently run on an EC2 instance. How should it be run without SSH-ing into a server, and in a way that all teams planning to archive their data can use?
3. What is the right way to store archived data?
4. How should the dump (if any) be partitioned?
5. What is the right output format to improve readability? What tool (query engine) can we read it with?
6. How should a user read the data back when it is needed?
7. The data organisation structure should assume multiple pods, applications, tables / collections, etc.
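
As a starting point for questions 4 and 7, one possible layout is sketched below. It is only an assumption for discussion: the `key=value` path segments, the daily partitioning, the `json.gz` extension, and the function name `archive_key` are all hypothetical, not something we have in place today.

```python
from datetime import date


def archive_key(
    pod: str,
    application: str,
    collection: str,
    day: date,
    part: int,
    extension: str = "json.gz",  # assumed output format, still an open question
) -> str:
    """Build an object key that partitions an archive by pod, application,
    collection, and calendar day (hypothetical layout)."""
    return (
        f"pod={pod}/application={application}/collection={collection}/"
        f"year={day:%Y}/month={day:%m}/day={day:%d}/"
        f"part-{part:05d}.{extension}"
    )


print(archive_key("payments", "billing", "invoices", date(2021, 4, 30), 0))
```

With a layout like this, a reader who only needs one collection for one day can list a single prefix instead of restoring every file.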

Other Contexts:

We are on Redshift, but planning to migrate to Snowflake. Do these support querying the archived data? If yes, how?
