
[RT-5] Add migration script to examples #23

Merged
merged 2 commits on Oct 8, 2021

Conversation

@susodapop
Collaborator
commented Apr 17, 2020

Usage

See DEVELOPING THIS SCRIPT in the readme.

Type of PR

  • Add a script

Description

This effort adds a revised version of this gist that uses the redash_toolbelt client, supports recent versions of Redash and migrates all of the content (the gist was skipping alerts and favorites).

Steps

  • Convert to Python3 with 2to3
  • Refactor to use client.py
  • Add support for migrating alerts
  • Add support for migrating favorites
  • Add a "dry run" / "read only" option
  • Add script entry point to pyproject.toml
  • Publish documentation / examples

Expanded scope

  • Create stub data sources on destination instance
  • Create stub alert destinations on destination instance
  • Don't require manual entry of destination admin credentials (currently working on this 7 Oct 21)

Related Tickets & Documents

#5

    return get_paginated_resource(path, api_key)


def import_users():
Member

This might fail due to the rate limit (50/hr, 200/day). Worth at least documenting this, if not adding a backoff.

Collaborator Author

I added a note to client.py about this. In the official instructions I'll recommend disabling rate limits for the migration.

Member

Is there a way to disable this? It looks like it's hardcoded here.

As a side note, could it be useful to log the exception to help debug this issue?

        try:
            response = dest_client._post(f"api/users?no_invite=1", json=data)
        except Exception as ex:
            print(
                f"Could not create user: {ex}"
            )
            continue

Collaborator Author

Is there a way to disable this? It looks like it's hardcoded here.

You can disable rate limits with this setting.

log the exception to help debug this issue?

This is a good idea. Will implement it.
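
For anyone who does hit the limit, a minimal retry-with-backoff sketch around the user-creation call could look like the following. The post_with_backoff helper is hypothetical (not part of this PR), and the REDASH_RATELIMIT_ENABLED variable name mentioned in the comment is an assumption about Redash's settings rather than something confirmed here:

    import time

    import requests


    def post_with_backoff(dest_client, path, payload, retries=5):
        """Retry a POST that fails with HTTP 429 (rate limited), backing off
        exponentially. Sketch only; the simpler route is to disable rate
        limiting on the destination for the duration of the migration
        (e.g. REDASH_RATELIMIT_ENABLED=false, variable name assumed)."""
        for attempt in range(retries):
            try:
                return dest_client._post(path, json=payload)
            except requests.HTTPError as ex:
                # redash_toolbelt surfaces HTTP errors via raise_for_status(),
                # so the response (and its status code) should be attached.
                if ex.response is None or ex.response.status_code != 429:
                    raise
                if attempt == retries - 1:
                    raise
                time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...

Calling post_with_backoff(dest_client, "api/users?no_invite=1", data) would then slot into the snippet quoted above.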

Comment on lines 281 to 283
for p in options.get('parameters', []):
    if 'queryId' in p:
        p['queryId'] = meta['queries'].get(str(p['queryId']))
Member

This might fail if the query has no values. Worth triggering a refresh, waiting for it to complete (with some timeout?), and then continuing.
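
A rough sketch of that refresh-and-wait idea, for illustration only; the endpoint paths and the job status codes (3 = success, 4 = failure) are assumed from Redash's public API and the helper is not part of this PR:

    import time


    def refresh_and_wait(client, query_id, timeout=120):
        """Trigger a refresh, then poll the resulting job until it finishes
        or the timeout expires (sketch; verify endpoints and status codes
        against your Redash version)."""
        job = client._post(f"api/queries/{query_id}/refresh").json()["job"]
        deadline = time.time() + timeout
        while time.time() < deadline:
            job = client._get(f"api/jobs/{job['id']}").json()["job"]
            if job["status"] in (3, 4):  # finished, whether success or failure
                return job
            time.sleep(2)
        return None  # timed out; the caller can log this and carry on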

Collaborator Author

I hesitate to kick this off synchronously since we a) don't want to hammer databases with queries and b) don't want to delay the rest of the migration if these queries are long-running.

@JSpenced

@arikfr I had a go at testing some of this and there are a large number of issues that came up when trying to migrate. There were some small formatting issues and incorrect calls to get/post, but the biggest one was that it didn't seem to recognize pending user invitations as real users. So when you try to import queries after moving all the users over, the following part doesn't work correctly:

    user_api_key = get_api_key(orig_client, query["user"]["id"])
    user_client = Redash(DESTINATION, user_api_key)
    response = user_client._post("/api/queries", json=data)

It gives a 400 error, I think because technically that user doesn't exist yet. Is there any way around that, or to force those users to be real users rather than pending?

@susodapop
Collaborator Author

Hi @JSpenced, thanks for your efforts! This PR is still WIP so it definitely doesn't work in its current form. We'll post both here and on our user forum (discuss.redash.io) once it's ready for testing (sometime in August, probably).

@susodapop
Collaborator Author

Force pushed changes from local dev.

@ozgenbaris1

Hi, first of all thanks for your efforts!

  1. Will you add a feature for importing disabled users? It seems the queries that have been created by disabled users cannot be migrated.
  2. Please consider importing all the content (queries/users) in reverse created_at order, because currently the most recently created query/user in the origin instance appears on the last page of the destination instance.

Regards

@susodapop changed the title from "Add migration script to examples" to "[RT-5] Add migration script to examples" on Aug 25, 2021
@bmtKIA6
Contributor

bmtKIA6 commented Aug 27, 2021

Missing an extra step after redash-migrate queries and before redash-migrate dashboards

  • redash-migrate visualizations

@susodapop
Collaborator Author

Missing an extra step after redash-migrate queries and before redash-migrate dashboards

  • redash-migrate visualizations

Fixed! Thanks :)

@susodapop
Collaborator Author

Hi, first of all thanks for your efforts!

  1. Will you add a feature for importing disabled users? It seems the queries that have been created by disabled users cannot be migrated.
  2. Please consider importing all the content (queries/users) in reverse created_at order, because currently the most recently created query/user in the origin instance appears on the last page of the destination instance.

Regards

I incorporated the chronological import for queries in 8a65452. Can you see how well this works for your needs?

Yes, I think we can import disabled users. Just requires a separate request to the UserListResource. Should we disable them again in the new instance?
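
For illustration, the chronological import boils down to sorting the fetched resources by created_at before replaying them. A minimal sketch of the idea (not the literal commit), where import_one_query stands in for whatever per-item import logic the script uses:

    def import_queries_chronologically(queries, import_one_query):
        # Redash returns ISO-8601 created_at timestamps, so sorting the raw
        # strings in ascending order replays resources oldest-first, which
        # preserves their relative order on the destination instance.
        for query in sorted(queries, key=lambda q: q["created_at"]):
            import_one_query(query)  # hypothetical per-item import helper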

@bmtKIA6
Contributor

bmtKIA6 commented Aug 28, 2021

Yes, I think we can import disabled users. Just requires a separate request to the UserListResource. Should we disable them again in the new instance?

I think if the goal of the migration script is to get the destination instance as close as possible to the original state, then I would expect the disabled users to be migrated too.

At the moment, if the user is disabled, the queries that are owned by that user are not migrated either.

@ozgenbaris1

I incorporated the chronological import for queries in 8a65452. Can you see how well this works for your needs?

This is exactly what we need, but in my opinion, applying this to all migrated content (queries, users, dashboards, etc.) would be even better.

@susodapop
Collaborator Author

This is exactly what we need, but in my opinion, applying this to all migrated content (queries, users, dashboards, etc.) would be even better.

Done! See 623323d

@susodapop
Collaborator Author

I think if the goal of the migration script is to get the destination instance as close as possible to the original state, then I would expect the disabled users to be migrated too.

Agreed. Disabled user import (and a command to disable them again) is part of f8a6f6e

@susodapop
Collaborator Author

I'd like to merge these changes and put out documentation within the next week, in advance of the V10 OSS Redash release. If you have more feedback, it's best to provide it sooner rather than later 👍

redash_toolbelt/client.py: outdated review comment (resolved)
@smaraf

smaraf commented Sep 27, 2021

I completed migrating data from the hosted version to a self-hosted OSS version (running on v8.0). Using the migration scripts was easy and I did not run into any errors. I noticed a few things that did not get migrated in the process or were incomplete:

  • Alerts
    • Alert destinations - We have different Slack channels configured as a destination for all alerts and they did not get migrated when running redash-migrate alerts
    • Alert configuration - This is a mixed bag and I did not find a pattern. For all alerts, the operator is not set in the destination. For some, the value column, reference value, rearm seconds, or a mix is not set in the destination.
  • Visualizations
    • Visualization configuration - Any customizations to a visualization have not been migrated. In a table visualization, such customizations include column names, the "display as" type and format, column order, and whether each column should be displayed.
  • Queries
    • For queries using a Date Range parameter with a default value of 30/90 days, the default value is lost in the migration. 30/90 days is not an option for a date range parameter default in the destination version, v8.0.

@ChloeBellm

ChloeBellm commented Sep 28, 2021

Really grateful to have this script! We've completed migrating from hosted (version: 7b026a19) to AWS hosted using the AMI (v8.0.0). This script worked well, with 2 issues we've spotted so far:

  1. 5 of our queries failed. We noticed they all had parameters defined; not sure if there were issues with how these were defined, possibly due to params that were set to come from another query? (Resolved by copying manually)

  2. Descriptions in visualizations have not been copied over. Beyond this, we've noticed a difference between the two environments that means we can't even add the descriptions manually. Looks like the hosted version is using v9, but the latest AMI is v8 :(

Thanks all!

@susodapop
Collaborator Author

I completed migrating data from the hosted version to a self-hosted OSS version (running on v8.0). Using the migration scripts was easy and I did not run into any errors. I noticed a few things that did not get migrated in the process or were incomplete:

  • Alerts

    • Alert destinations - We have different Slack channels configured as a destination for all alerts and they did not get migrated when running redash-migrate alerts
    • Alert configuration - This is a mixed bag and I did not find a pattern. For all alerts, the operator is not set in the destination. For some, the value column, reference value, rearm seconds, or a mix is not set in the destination.
  • Visualizations

    • Visualization configuration - Any customizations to a visualization have not been migrated. In a table visualization, such customizations include column names, the "display as" type and format, column order, and whether each column should be displayed.
  • Queries

    • For queries using a Date Range parameter with a default value of 30/90 days, the default value is lost in the migration. 30/90 days is not an option for a date range parameter default in the destination version, v8.0.

@smaraf Thanks for your feedback. Important note: this migration script is only intended for migrating to OSS V10 (currently in beta). Would love to hear feedback from you if you've tried to move to V10. Your viz customisations should migrate just fine. Ditto the date-range parameters.

I'm wrapping up the alert destination migration code right now.

@susodapop
Collaborator Author

Really grateful to have this script! We've completed migrating from hosted (version: 7b026a19) to AWS hosted using the AMI (v8.0.0). This script worked well, with 2 issues we've spotted so far:

  1. 5 of our queries failed. We noticed they all had parameters defined; not sure if there were issues with how these were defined, possibly due to params that were set to come from another query? (Resolved by copying manually)
  2. Descriptions in visualizations have not been copied over. Beyond this, we've noticed a difference between the two environments that means we can't even add the descriptions manually. Looks like the hosted version is using v9, but the latest AMI is v8 :(

Thanks all!

@ChloeConnor Thanks for your feedback. This migration script is only targeted at OSS V10. I would expect different things to break if trying to move to an older API.

The AMIs will be updated to V10 once we release it (soon!).

@justinclift
Member

justinclift commented Sep 30, 2021

to AWS hosted using the AMI (v8.0.0).

As a data point with that, make sure you have a security group set on the VM to only allow incoming traffic on the right ports, e.g. SSH and HTTPS.

Docker can muck around with the firewall rules on the host it's running on, so having an AWS Security Group providing a safety layer is a good idea. 😄

@justinclift
Member

@susodapop Would it be feasible to enable the wiki for this repo?

That'd make for a fairly easy way to document the migration process, and let anyone update it as the migration script is improved.

@smaraf

smaraf commented Oct 4, 2021

Important note: this migration script is only intended for migrating to OSS V10 (currently in beta). Would love to hear feedback from you if you've tried to move to V10. Your viz customisations should migrate just fine. Ditto the date-range parameters.

@susodapop Thank you for the feedback! I've updated to v10 (which I was excited to see came out of beta 🥳 ) and re-ran the migrations. I can confirm that the date-range params are no longer an issue and they've been migrated correctly. Unfortunately, I'm still seeing the same results on the Visualization Configurations. Column names, their format, and whether they should be displayed have not been migrated to v10.0.0.

@susodapop
Collaborator Author

Unfortunately, I'm still seeing the same results on the Visualization Configurations. Column names, their format, and whether they should be displayed have not been migrated to v10.0.0.

@smaraf Thank you for the valuable feedback.

😦 Can you provide more details about the exact steps you followed? Can you share an example visualisation configuration that was not moved? I'm going to see if I can reproduce. Would like to publish the migration script this week.
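
For context, the table customisations in question live in each visualization's options blob, so a faithful migration is essentially a matter of carrying that blob over when the visualization is recreated on the destination. A minimal sketch under that assumption (field names taken from Redash's visualization API, not from this PR's code):

    def copy_visualization(dest_client, viz, dest_query_id):
        """Recreate a visualization on the destination query, carrying over
        its full options blob, which is where table column names, formats,
        ordering and visibility live (sketch only)."""
        payload = {
            "query_id": dest_query_id,
            "type": viz["type"],
            "name": viz["name"],
            "description": viz.get("description") or "",
            "options": viz["options"],
        }
        return dest_client._post("api/visualizations", json=payload).json()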

@susodapop
Collaborator Author

The WIP alert destinations script is now up. I've also added a README file for this script.

Now that the V10 release is up I'm using that as my target for testing. There are a few issues with destinations to sort out yet, which I will be working through this afternoon (CST). Targeting Friday this week for the final release of redash-migrate.

I still have not managed to replicate @smaraf's issue with visualisation definitions not moving. Would appreciate help if anyone knows reproduction steps.

@susodapop
Collaborator Author

susodapop commented Oct 6, 2021

@susodapop Would it be feasible to enable the wiki for this repo?

That'd make for a fairly easy way to document the migration process, and let anyone update it as the migration script is improved.

@justinclift There is now a README specific to this script. Contributors are welcome to pull request changes to it. We don't use the wiki feature on any of our repositories but this could be a good idea in the future. Will add it to the slate of decisions to be made by the community over the next few months.

@justinclift
Member

justinclift commented Oct 6, 2021

@susodapop No worries. The wiki idea is mostly because it's a fairly well known/understood approach for letting Community members document things, and the Redash Community has the right kind of constructive attitude to make it work.

Also, I've personally been taking notes as I've been using the migrate.py script to transfer customers' data from the hosted Redash servers to our Newdash infrastructure. Having a wiki to put the useful info in would be a lot easier (from my perspective) than having to make things neat and create PRs. Much faster iteration time that way.

@susodapop
Collaborator Author

@susodapop No worries. The wiki idea is mostly because it's a fairly well known/understood approach for letting Community members document things, and the Redash Community has the right kind of constructive attitude to make it work.

Also, I've personally been taking notes as I've been using the migrate.py script to transfer customers' data from the hosted Redash servers to our Newdash infrastructure. Having a wiki to put the useful info in would be a lot easier (from my perspective) than having to make things neat and create PRs. Much faster iteration time that way.

Good points! After the Hosted Redash EOL we'll need to consider a wiki more fully. I'd be happy to review your notes and make sure we've covered any relevant portions in our docs, if you want to share them either as an issue on this repo or by email.

If the context helps: we have a huge makeover in the works for our documentation as well 😃

Member

@arikfr left a comment

🐥

@justinclift
Member

justinclift commented Oct 7, 2021

No worries @susodapop. 😄

Probably the main thing I've found a bit weird so far is having to manually create data sources in the destination, then a mapping for them in meta.json, then ditto for the admin user.

To me, those should all be automatically done by the script. It should pretty much assume the destination is a new Redash, with just the initial admin user created. Then take it from there.

As a workaround, I've been manually pulling the info, then plugging it into meta.json, e.g.:

$ curl -H "Authorization: api_key_here" http://app.redash.io/slug/api/users | jq
$ curl -H "Authorization: api_key_here" http://app.redash.io/slug/api/data_sources | jq

So far, I haven't yet seen a way of extracting the complete data source info (including passwords) via the API. So, it doesn't seem possible to create a proper backup of an install this way.

That being said, it should still be possible to have the script pull over stub details for each source (along with queries, etc), and have the admin user fill them out after the migration script has completed.

With the general caveats about making sure data sources don't actually try connecting to things over a network until there's a secure encryption layer in place (ssl, ssh, vpn, whatever).

@susodapop
Collaborator Author

No worries @susodapop. 😄

Probably the main thing I've found a bit weird so far is having to manually create data sources in the destination, then a mapping for them in meta.json, then ditto for the admin user.

To me, those should all be automatically done by the script. It should pretty much assume the destination is a new Redash, with just the initial admin user created. Then take it from there.

As a workaround, I've been manually pulling the info, then plugging it in, e.g.:

$ curl -H "Authorization: api_key_here" http://app.redash.io/slug/api/users

$ curl -H "Authorization: api_key_here" http://app.redash.io/slug/api/data_sources

So far, I haven't yet seen a way of extracting the complete data source info (including passwords) via the API. So, it doesn't seem possible to create a proper backup of an install this way.

That being said, it should still be possible to have the script pull over stub details for each source (along with queries, etc), and have the admin user fill them out after the migration script has completed.

With the general caveats about making sure data sources don't actually try connecting to things over a network until there's a secure encryption layer in place (ssl, ssh, vpn, whatever).

All good feedback, @justinclift! If you look at the latest commits to this script (from the past couple of weeks), you'll see we're now creating stub data sources.

The API purposefully doesn't reveal passwords. No way around that one.
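
For anyone wondering what the stub data sources look like in practice: the idea is to create a placeholder with the right name and type and leave the connection details for an admin to fill in afterwards, since the API never returns secrets. A minimal sketch, with the caveat that some data source types may need placeholder option values to satisfy their configuration schema:

    def create_stub_data_source(dest_client, ds):
        """Create a placeholder data source on the destination (sketch only).
        Secrets (passwords, keys) are never exposed by the API, so options is
        left empty and must be completed manually after the migration."""
        payload = {
            "name": ds["name"],
            "type": ds["type"],
            "options": {},  # some types may require placeholder values here
        }
        return dest_client._post("api/data_sources", json=payload).json()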

@justinclift
Member

The API purposefully doesn't reveal passwords. No way around that one.

Yeah. Sounds like a safety measure, to stop exfiltration if something goes wrong (or along similar lines).

That being said, it stops the possibility of having a true backup/restore tool. At least through the API in its current state.

Still, being able to grab most stuff is 90% of the way there, and that's good enough for our purposes. 😄

@susodapop
Collaborator Author

@justinclift

Yeah. Sounds like a safety measure, to stop exfiltration if something goes wrong (or along similar lines).

Correct. This will not change.

That being said, it stops the possibility of having a true backup/restore tool.

This is only true for hosted Redash accounts, which won't exist in a couple months. For the vast majority of Redash admins, the backup/restore process is painless: just dump the database to a file. After 30 November 2021 there's no reason to maintain the migration script other than to demonstrate what's possible with redash-toolbelt generally 👍

@susodapop
Collaborator Author

I just pushed the addition to the alerts command which migrates subscriptions from origin to destination. If anyone has a chance to test this, that would be marvellous. It works well in my test environment at least.
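
For reference, recreating subscriptions amounts to listing them on the origin alert and POSTing them against the matching destination alert. A rough sketch, where the endpoint shape is assumed from Redash's alerts API and destinations_map (origin destination id to new id) is a hypothetical mapping built earlier in the migration:

    def copy_alert_subscriptions(orig_client, dest_client, orig_alert_id,
                                 dest_alert_id, destinations_map):
        """Recreate an alert's destination subscriptions (sketch only)."""
        subs = orig_client._get(f"api/alerts/{orig_alert_id}/subscriptions").json()
        for sub in subs:
            if not sub.get("destination"):
                # personal (email) subscriptions would need each user's API key
                continue
            mapped = destinations_map.get(str(sub["destination"]["id"]))
            if mapped is None:
                continue  # the alert destination wasn't migrated; skip it
            dest_client._post(
                f"api/alerts/{dest_alert_id}/subscriptions",
                json={"destination_id": mapped},
            )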

@smaraf

smaraf commented Oct 11, 2021

@susodapop Sorry for the delayed response here. Thank you for the improvements for migrating alerts and for releasing the migration toolkit. This has been a huge help for me.

I re-ran migrations this morning and made a simple example to showcase the visualization configurations that did not get moved over to my destination. In the screenshot below, see the source on the left and destination on the right. In this example, I customized the column name and description.

[Screenshot: Screen Shot 2021-10-11 at 10 36 05; source on the left, destination on the right]

@susodapop
Collaborator Author

Thanks @smaraf! Working on a fix for this. I opened #69 to track the effort. Since the beta is now closed, further feedback should be shared as an issue.

@getredash locked as resolved and limited conversation to collaborators on Oct 11, 2021