Script to export database #36
Some extra code can be removed after a few days. It's just there so that all previously sent notices don't get re-sent.
Hey @iakshat Thank you very much for working on this crucial issue - we really needed this migration to be done and the deadline is close. I also liked how there are so many debugging prints in the script to make sure we know right away when things go wrong.
PS. I'll do an in-depth review once we have generalized the migration script.
@thealphadollar Now the migration script is independent of the main file and accepts two raw inputs for OLD_URI and NEW_URI. I couldn't make them script arguments because Python treats the URIs as directory paths and throws an error.
Hey Akshat, I've left a few more comments on the code structure. There are three major updates that would be required:
- Having a CLI workflow; for instance, an argument --try-defaulters would ask only for a new DB URI, while running without it would require both. You may implement this however you see fit; this is just an example.
- Adding instructions to the README.md about what the script does, the various arguments, how to use the script, and anything else you see fit.
- Letting us know that you have tested the script with two MongoDB instances, and providing any test results or anomalies that you noticed and that we should be aware of.
Sorry, I am not sure why we need a script; MongoDB has native helpers for doing this: mongoexport and mongoimport. I don't think we should use a hand-made script, as there are several corner cases that won't be handled (e.g., a connection error forcing us to clean the destination DB and restart the script).
As long as a host can connect to both databases, we can simply dump the original DB and clone it into the new DB. ✅
In general, changing master DBs is done with downtime, and the following process is followed:
- Stop the service (downtime starts)
- Export the original DB using mongoexport
- Import into the new master DB using mongoimport
- Update the environment variable MONGODB_URI (say) to point to the new DB's host
- Restart the service (downtime ends)
It's completely fine to have a few minutes of downtime during this upgrade; that is the safest way to ensure we don't mess anything up.
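For reference, the downtime process described above could be sketched with the stock tools. The URIs, collection name, and service name below are all placeholders, not values from this repo:

```shell
# 1. Stop the service (downtime starts) - e.g. systemctl stop mftp

# 2. Export the collection from the original DB (placeholder URIs)
mongoexport --uri "mongodb://old-host/olddb" \
            --collection notices --out notices.json

# 3. Import it into the new master DB
mongoimport --uri "mongodb+srv://new-host/newdb" \
            --collection notices --file notices.json

# 4. Point the application at the new DB and restart (downtime ends)
export MONGODB_URI="mongodb+srv://new-host/newdb"
```

mongodump/mongorestore are the binary-dump equivalents if a full-database copy is preferred over a per-collection JSON export.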
In this PR, there are some changes to update.py; I am not sure what they are or why they are in this PR 🙇♂️ Master DB changes should not require any application changes 😌
@icyflame I am really not aware of the process usually carried out at these times, so I felt this was a good way since it ensured zero downtime, as people generally depend on the script completely. But if this method is prone to errors, then it's really better to stop the script for some time and keep the data safe. @thealphadollar Please guide me on whether I should move ahead with the changes you suggested or scrap this PR and work out the mongoexport approach. As far as I know, @mukul-mehta looked into this; he'll be able to give better insight. Regarding the script:
I'm not sure why we'd need to clean the destination DB. The script I have written handles such problems: if the connection is somehow lost, we can just restart the script with no risk of duplication.
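The resume-without-duplicates idea can be sketched in a few lines. This is not the PR's actual code; the names are hypothetical, and pymongo collections are modeled as plain _id-to-document dicts so the logic stands alone:

```python
# Sketch of a resumable copy keyed by _id (hypothetical names; the real
# script would operate on pymongo collections instead of dicts).

def copy_missing(source, dest):
    """Copy every document from source to dest, keyed by _id.

    Documents already present in dest are skipped, so the function is
    idempotent: re-running it after a dropped connection cannot create
    duplicates, and the original _id values are preserved.
    """
    copied = 0
    for _id, doc in source.items():
        if _id not in dest:      # skip docs copied on a previous run
            dest[_id] = doc      # insert with the ORIGINAL _id preserved
            copied += 1
    return copied

# Example: a first run that "crashed" halfway, then a restart.
source = {1: {"title": "notice-1"}, 2: {"title": "notice-2"}, 3: {"title": "notice-3"}}
dest = {1: {"title": "notice-1"}}   # partial copy from the interrupted run

print(copy_missing(source, dest))   # → 2 (only the two missing docs)
print(copy_missing(source, dest))   # → 0 (second run is a no-op)
```

With real MongoDB collections the same effect falls out of inserting with the source _id and ignoring duplicate-key errors on re-runs.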
Sorry about this, but the changes showing up in update.py are already in the main branch of this repo. They really should not show up here 😕
Discussed with @thealphadollar, I will change my review status to Comment
🙆♂️
@iakshat Thanks for working on this! In the future, please do search for existing solutions that are native or near-native and built or associated with the project before deciding to roll your own. These native solutions are well tested (unit tests), maintained by the community, well documented, and used by a lot of people in production environments. This ensures that there are no corner cases one would run into when running them in production, which is extremely important as database-related operations are high risk. 😅 In general, running a well-tested tool used by several other people is safer than running a script that was written by hand and tested a few times in controlled test environments.
@iakshat Please go ahead with the changes I had requested - @icyflame and I have discussed, and it is definitely not a good idea to reinvent the wheel when an easier method already exists. That said, I see you've taken considerable caution and even implemented error handling and resumption in case of issues - that is great, and for this purpose your script seems to be doing fine. It will also help us avoid installing mongo and the associated import/export tools, which is an added plus ➕ Let's finish this script with the required changes, thorough testing, and all instructions added to the readme, so that we can do the migration by the end of the week. PS. I'd like to understand your thought process as to what made you write the script if you were aware of an existing solution. This will help us guide each other better in the future.
@iakshat Thanks for the PR! +1 to using their recommended migration tool instead, as @thealphadollar had linked previously https://docs.mlab.com/how-to-migrate-sandbox-databases-to-atlas/ Even if we have some code already, and even if we've reviewed it, it's impossible to write bug-free code, and the safer option is to use a well-tested migration tool written by people with more expertise and time than you and me :) After transferring data to the destination db, and before switching to it, it would be good to carry out some sanity checks on the source and destination databases:
Also, storage for Atlas is capped at 512MB on their free tier - what's our current usage?
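The suggested sanity checks aren't spelled out above, but one plausible shape for them is comparing document counts and contents between the two databases. This is a hypothetical sketch (real checks would query pymongo collections, modeled here as _id-to-document dicts):

```python
# Hypothetical post-migration sanity check (illustrative names only).

def sanity_check(source, dest):
    """Return True when dest is an exact copy of source.

    Compares document counts first (cheap), then every document by _id.
    For a small single-collection DB a full comparison is feasible; a
    larger DB might only spot-check a random sample of documents.
    """
    if len(source) != len(dest):
        return False
    return all(dest.get(_id) == doc for _id, doc in source.items())

src = {i: {"notice": f"notice-{i}"} for i in range(5)}
dst = dict(src)
print(sanity_check(src, dst))    # → True: counts and contents match

dst[3] = {"notice": "corrupted"}
print(sanity_check(src, dst))    # → False: one document differs
```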
@iakshat I too have talked to @thealphadollar 😆 My comment is not a blocker for this PR, please go ahead (with appropriate caution)!
@icyflame @thealphadollar @amrav Thank you for all your insights. I am really learning a lot of new things working on these issues! I too feel that this should have been done with a less error-prone tool that has already been properly tested. To be honest, I was a bit lazy about properly checking out each step of the suggested export procedure (which had way too many steps 🤕), and I felt it would be easier to write this script since the database has only one collection. I think we can try this way, and if anything goes wrong, we can use the mLab-prescribed procedure mentioned by @amrav.
Cool! Thanks for making the changes - below are some queries I have.
- Should we include a requirements.txt file? It would help install all the required dependencies.
- Do you think specifying the Python version would be helpful in the README.md?
- I propose that we create a new folder with the script, the requirements file, and a README.md specific to the script. In the root README.md we would just mention that, for migration, one should look into that folder. This way we can have all the related code in one place.
Let me know what you think of the above suggestions. Let's aim for a migration by the end of the week (8th November, Sunday).
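On the first point: assuming the migration script's only third-party import is the MongoDB driver (not confirmed from this thread), a minimal requirements.txt could be as small as:

```text
pymongo[srv]
```

The [srv] extra pulls in dnspython, which pymongo needs to resolve mongodb+srv:// connection strings of the kind Atlas hands out. Adjust the file to whatever the script actually imports.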
Co-authored-by: Shivam Kumar Jha <shivam.cs.iit.kgp@gmail.com>
Included the suggested changes: the migration script is reorganised into its own directory, along with all the other minor changes suggested. If possible, please add the requirements.txt yourself, because my environment adds some unused libraries to fresh envs due to some setup issues.
Hmm, I have some more doubts - and I think we might have forgotten to update the readme.
Ah! This is beautiful <3 I'm now trying to use this to migrate MFTP.
@iakshat Can we please update the README.md with the latest environment variable details and the fact that we have migrated from mLab? The document is outdated at the moment.
This PR intends to solve Issue #33 by exporting the database from mLab. I've included a simple script that collects each notice from the original database and sends it one by one to the target database.
Proposed Step by Step Process:
If there are problems while running the export_db function, we can run it multiple times; the error handling will work out what to do.
The insertion of documents includes the _id from the original mLab database, so we'll actually be replicating the same database in Atlas.
Once the database is ready in Atlas, we can test it with a local mftp script (without the export-db script) and see if there are any problems.
PS: I think we won't need to run the reprocessing-defaulters part, as there's very little chance of defaulting since almost all lines of code are API calls to various databases.
PS: Please ignore the changes made to update.py; I forgot to take a pull before pushing my changes 😰