Archiving and Querying MongoDB collection data from S3 #132

pcinnusamy · 2021-04-30T09:05:34Z

@stefanprodan
Could you please advise how the archival process described below could be achieved with your approach?

Description

Background:

MongoDB is being deprecated in favour of DocumentDB, so Product Engineering will be moving data to DocumentDB. After the move, the corresponding collections in Mongo will be archived for safekeeping.
We are on Redshift, but planning to migrate to Snowflake in the future.

Problem Statement:

The current archival process is to run mongodump on a collection, which produces a bson.gz file for a given partition.
However, this is not very access-friendly when data needs to be read back:
all required bson.gz files have to be loaded with mongorestore into a Mongo / DocumentDB cluster before the data can be queried.
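To make the format concrete: a .bson file written by mongodump is a plain concatenation of BSON documents, and each document begins with a 4-byte little-endian length that counts the whole document. The sketch below is only an illustration under that assumption, not part of the current process. It walks a bson.gz file with the Python standard library and yields the raw bytes of each document. The function names are hypothetical, and decoding the fields would still need a BSON library.

```python
import gzip
import struct
from typing import BinaryIO, Iterator


def iter_raw_bson_documents(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each BSON document in the stream as undecoded bytes."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        # The int32 length includes the 4 header bytes themselves.
        (size,) = struct.unpack("<i", header)
        if size < 5:
            raise ValueError(f"invalid document size: {size}")
        body = stream.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated document body")
        yield header + body


def count_documents(path: str) -> int:
    """Count the documents in a gzip-compressed dump (hypothetical helper)."""
    with gzip.open(path, "rb") as stream:
        return sum(1 for _ in iter_raw_bson_documents(stream))
```

This only shows that the file can be scanned without a running cluster. Filtering or projecting fields still requires decoding every document, which is why a query-friendly output format is one of the questions below.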

Questions to Resolve:

1. What should be the right process to archive data? When to archive? How frequently? By whom?
2. mongodump is being run on an EC2 instance. How should this be run without SSH-ing into a server, and be runnable by all teams planning to archive their data?
3. What should be the right way to store archived data?
4. How should the dump (if any) be partitioned?
5. What is the right output format to improve readability? What tool (query engine) can we read it with?
6. How should a user read the data back when this is needed?
7. The data organisation structure should assume working with multiple pods, applications, tables / collections, etc.
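As a way of framing questions 4 and 7, one possible layout is a Hive-style `key=value` prefix per pod, application, collection and date, since many query engines can treat such prefixes as partitions. The sketch below is an assumption for discussion only: the `archive/` root, the partition names, the `part-NNNNN` file naming and the `archive_key` function are all hypothetical and do not describe an existing convention.

```python
from datetime import date


def archive_key(
    pod: str,
    application: str,
    collection: str,
    day: date,
    part: int = 0,
    extension: str = "bson.gz",
) -> str:
    """Build a hypothetical S3 object key for one archived partition file."""
    for name, value in (
        ("pod", pod),
        ("application", application),
        ("collection", collection),
    ):
        # Reject values that would break the key=value path structure.
        if not value or "/" in value or "=" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return (
        f"archive/pod={pod}/application={application}/collection={collection}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"part-{part:05d}.{extension}"
    )


# Example: the key for the first file of one day's dump.
print(archive_key("pod-a", "billing", "invoices", date(2021, 4, 30)))
```

Whether date is the right partition column, and at what granularity, depends on how the data is expected to be read back, which is exactly what questions 4 and 6 are asking.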

Other Contexts:

We are on Redshift, but planning to migrate to Snowflake. Do these support querying the archived data and, if so, how?

stefanprodan closed this as completed Nov 3, 2022
