@stefanprodan
Please help guide me on the archival process described below and how we can achieve it with your approach.
Description
Background:
MongoDB is being deprecated in favour of DocumentDB, so Product Engineering will be moving data to DocumentDB. After the move, the corresponding collections in Mongo will be archived for safekeeping.
We are on Redshift, but planning to migrate to Snowflake in future.
Problem Statement:
The current archival process is to run mongodump on a collection, which produces a bson.gz file for a particular partition.
However, this is not very access-friendly when the data needs to be read back:
We have to run mongorestore on all required bson.gz files into a Mongo / DocumentDB cluster before the data can be queried.
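For concreteness, a minimal sketch of that flow, assuming a hypothetical `appdb.orders` collection partitioned by month (hosts, database/collection names, paths and the query filter are all placeholders, not our real setup):

```python
import subprocess

# Hypothetical sketch of the current flow. Host names, database, collection,
# and the date-range "partition" below are placeholders.

# 1. Archive one partition of a collection: mongodump with --gzip writes
#    dump/2022-01/appdb/orders.bson.gz (plus a metadata file).
subprocess.run(
    [
        "mongodump",
        "--host", "mongo-host", "--port", "27017",
        "--db", "appdb",
        "--collection", "orders",
        "--query", '{"createdAt": {"$gte": {"$date": "2022-01-01T00:00:00Z"},'
                   ' "$lt": {"$date": "2022-02-01T00:00:00Z"}}}',
        "--gzip",
        "--out", "dump/2022-01",
    ],
    check=True,
)

# 2. To read the data back, every required bson.gz has to be restored into a
#    running Mongo / DocumentDB cluster first; only then can it be queried.
subprocess.run(
    [
        "mongorestore",
        "--host", "restore-host", "--port", "27017",
        "--gzip",
        "--dir", "dump/2022-01",
    ],
    check=True,
)
```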
Questions to Resolve:
1. What should the right process to archive data be? When should we archive? How frequently? By whom?
2. mongodump is currently run on an EC2 instance; how should it be run without SSH-ing into a server, in a way that is runnable by all teams planning to archive their data?
3. What is the right way to store archived data?
4. How should the dump (if any) be partitioned?
5. What is the right output format to improve readability? What tool (query engine) can we read it with?
6. How should a user read the data back when this is needed?
7. The data organisation structure should assume working with multiple pods, applications, tables / collections, etc.
Other Contexts:
We are on Redshift, but planning to migrate to Snowflake (do these support querying the archived data, and if so, how?)