# Migration and sync of assets between prod and staging

## Summary

The production API currently holds the most recent data, as updated by the WRI team. 
This notebook copies assets from `production` to `staging` while keeping a record of how the IDs in both environments match. Optionally, assets can also be copied back from `staging` to `production`. 

### Steps:
1. Upload or update assets in `production`.
2. Copy the assets from `production` to `staging` using this script.
3. Synchronise the IDs of the assets.


## Instructions

1. Run the cells in `Functions`.
2. Create a list with the IDs of the assets to copy.
3. Follow the steps in `Processing` to carry out the migration. 

## Functions
These are the functions needed to create and synchronise assets between `production` and `staging`.

In [32]:
import getpass
import requests as re
import json
from datetime import datetime
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

In [66]:
staging_server = "https://staging-api.globalforestwatch.org"
prod_server = "https://api.resourcewatch.org"
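
Several functions in this notebook rebuild the same environment-to-URL dictionary. A minimal sketch of a shared lookup is shown here; the helper name `serverFor` and the `SERVERS` constant are hypothetical and not part of the notebook:

```python
# Same base URLs as the two variables defined in the cell above.
SERVERS = {
    'prod': 'https://api.resourcewatch.org',
    'staging': 'https://staging-api.globalforestwatch.org',
}

def serverFor(env):
    """Hypothetical helper: return the base URL for 'prod' or 'staging'."""
    try:
        return SERVERS[env]
    except KeyError:
        # Fail with a readable message instead of a bare KeyError.
        raise ValueError(f'unknown env {env!r}; expected one of {sorted(SERVERS)}') from None
```

With a helper like this, a typo such as `serverFor('stagin')` fails early with an explicit message rather than deep inside a request call.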

In [4]:
class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

In [5]:
def auth(env='prod'):
    serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
    print(f'You are login into {bcolors.HEADER}{bcolors.BOLD}{env}{bcolors.ENDC}')
    with re.Session() as s:
        headers = {'Content-Type': 'application/json'}
        payload = json.dumps({ 'email': f'{input(f"Email: ")}',
                               'password': f'{getpass.getpass(prompt="Password: ")}'})
        response = s.post(f'{serverUrl[env]}/auth/login',  headers = headers,  data = payload)
        response.raise_for_status()
        print(f'{bcolors.OKGREEN}Successfully logged into {env}{bcolors.ENDC}')
    return response.json().get('data').get('token')

In [9]:
token = {
    'staging': auth('staging'),
    'prod':auth('prod')
}

You are login into staging


Email:  greta.carrete@vizzuality.com
Password:  ···············


Successfully logged into staging
You are login into prod


Email:  greta.carrete@vizzuality.com
Password:  ···············


Successfully logged into prod


In [14]:
# @TODO 
# * Migrate one day the body payloads to data model classes and refactor to classes following inheritance and recursive property copies
# * Type function with Mypy
# * Add proper method descriptions
# * Refactor methods to reuse more code
#from typing import List
#from pydantic import BaseModel, parse_obj_as
# class DatasetModel(BaseModel):

# class LayerModel(BaseModel):

# class widgetModel(BaseModel):

# class metadataModel(BaseModel):
     
# class vocabularyModel(BaseModel):
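
The TODO above mentions moving the request bodies into data model classes. As a rough sketch of that idea, the example below uses the standard library `dataclasses` module instead of pydantic so that it runs without extra dependencies. The class name, the `fromAsset` method, and the sample asset are all hypothetical; the two fields mirror the body that `recreateVocabulary` sends.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, List

@dataclass
class VocabularyModel:
    """Hypothetical model for the body posted by recreateVocabulary."""
    application: Any = None
    tags: List[str] = field(default_factory=list)

    @classmethod
    def fromAsset(cls, asset: Dict[str, Any]) -> 'VocabularyModel':
        # Copy only the attributes the model declares and ignore the rest.
        attributes = asset.get('attributes', {})
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in attributes.items() if key in known})

# Made-up asset shaped like the entries under dataset['attributes']['vocabulary'].
vocabulary = {
    'type': 'vocabulary',
    'attributes': {'name': 'example_vocabulary', 'application': 'rw', 'tags': ['forest']},
}
body = asdict(VocabularyModel.fromAsset(vocabulary))
```

Here `body` holds only `application` and `tags`, so extra attributes returned by the API (such as `name`) are not echoed back in the request.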


In [15]:
def setTokenHeader(env, token=token):
    '''
    set up the token
    '''
    return {'Authorization':f'Bearer {token[env]}', 
            'Content-Type': 'application/json'}

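def logResponseErrors(status_code, response = None, url = None, body = None):
    '''
    log errors in http calls
    '''
    # Treat any 2xx status as success; only other codes are logged as errors.
    if not 200 <= status_code < 300:
        logging.error(response)
        # Compare against None: a requests.Response is falsy when its status is 4xx/5xx.
        if response is not None:
            logging.error(response.text)
        if url:
            logging.error(url)
        if body:
            logging.error(json.dumps(body))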
    

def getAssets(url, payload):
    '''
    Get asset operation
    '''
    response = re.get(url, payload)
    
    logResponseErrors(response.status_code, response, url, payload)
    
    response.raise_for_status()
    
    return response.json()

def deleteAssets(url, headers):
    '''
    delete asset operation
    '''
    response = re.delete(url, headers = headers)
    
    logResponseErrors(response.status_code, response, url)
    
    response.raise_for_status()
    
    return response.status_code

def postAssets(url, body, headers, payloads = None):
    '''
    create asset operation
    '''    
    response = re.post(url, params = payloads, data=json.dumps(body), headers = headers)
    
    logResponseErrors(response.status_code, response, url, body)
    
    response.raise_for_status()
    
    return response.json()

def updateAssets(url, body, headers):
    '''
    patch asset operation
    '''
    response = re.patch(url, data=json.dumps(body), headers = headers)
    
    logResponseErrors(response.status_code, response, url, body)
    
    response.raise_for_status()
    
    return response.json()

def upsert(condition = False):
    '''
    Return an update/post operation based on a condition
    '''
    if condition:
        return updateAssets
    else:
        return postAssets
    
def recreateDataset(dataset, toEnv = 'prod', destinationDatasetId = None):
    '''
    Copy the dataset from one env to the other
    '''
    
    serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
    if dataset.get('type')!='dataset':
        return None
    
    url = f'{serverUrl[toEnv]}/v1/dataset'
    
    if destinationDatasetId:
        url = f'{url}/{destinationDatasetId}' 
        
    body = {'dataset':{
        'application': dataset['attributes'].get('application'),
        'name': dataset['attributes'].get('name'),
        'connectorType': dataset['attributes'].get('connectorType'),
        'provider': dataset['attributes'].get('provider'),
        'published': dataset['attributes'].get('published'),
        'overwrite': dataset['attributes'].get('overwrite'),
        'env': dataset['attributes'].get('env'),
        'geoInfo': dataset['attributes'].get('geoInfo'),
        'protected': dataset['attributes'].get('protected'),
        'legend': dataset['attributes'].get('legend'),
        'widgetRelevantProps': dataset['attributes'].get('widgetRelevantProps'),
        'layerRelevantProps': dataset['attributes'].get('layerRelevantProps')
        }
    }
    headers = setTokenHeader(toEnv)
    
    if dataset['attributes'].get('provider') == 'cartodb':
        body['dataset']['connectorUrl'] = dataset['attributes'].get('connectorUrl')
    
    if dataset['attributes'].get('provider') == 'gee':
        body['dataset']['tableName'] = dataset['attributes'].get('tableName')
    if dataset['attributes'].get('mainDateField'):
        body['dataset']['mainDateField'] = dataset['attributes'].get('mainDateField')
    
    
    response = upsert(destinationDatasetId)
    
    logger.debug(response)
    if destinationDatasetId:       
        return response(url, body['dataset'], headers)
    else:
        return response(url, body, headers)
    
    
    

def recreateLayer(datasetId, layer, toEnv = 'prod', destinationLayerId = None):
    '''
    Copy the layer from one env to the other
    '''
    
    serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
    if layer.get('type')!='layer':
        return None
    
    headers = setTokenHeader(toEnv)
    url = f'{serverUrl[toEnv]}/v1/dataset/{datasetId}/layer'
    
    if destinationLayerId:
        url = f'{url}/{destinationLayerId}'

    body = {
        'application': layer['attributes'].get('application'),
        'name': layer['attributes'].get('name'),
        'iso': layer['attributes'].get('iso'),
        'provider': layer['attributes'].get('provider'),
        'default': layer['attributes'].get('default'),
        'protected': layer['attributes'].get('protected'),
        'published': layer['attributes'].get('published'),
        'env': layer['attributes'].get('env'),
        'description': layer['attributes'].get('description'),
        'layerConfig': layer['attributes'].get('layerConfig'),
        'legendConfig': layer['attributes'].get('legendConfig'),
        'interactionConfig': layer['attributes'].get('interactionConfig'),
        'applicationConfig': layer['attributes'].get('applicationConfig'),
        'staticImageConfig': layer['attributes'].get('staticImageConfig')
    }
    
    response = upsert(destinationLayerId)
    return response(url, body, headers)

def recreateWidget(datasetId, widget, toEnv = 'prod', destinationWidgetId = None):
    '''
    Copy the widget from one env to the other
    '''
    
    serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
    
    if widget.get('type')!='widget':
        return None
    
    headers = setTokenHeader(toEnv)
    url = f'{serverUrl[toEnv]}/v1/dataset/{datasetId}/widget'
    
    if destinationWidgetId:
        url = f'{url}/{destinationWidgetId}'
    
    body = {
        'application': widget['attributes'].get('application'),
        'name': widget['attributes'].get('name'),
        'description': widget['attributes'].get('description'),
        'verified': widget['attributes'].get('verified'),
        'default': widget['attributes'].get('default'),
        'protected': widget['attributes'].get('protected'),
        'defaultEditableWidget': widget['attributes'].get('defaultEditableWidget'),
        'published': widget['attributes'].get('published'),
        'freeze': widget['attributes'].get('freeze'),
        'env': widget['attributes'].get('env'),
        'queryUrl': widget['attributes'].get('queryUrl'),
        'widgetConfig': widget['attributes'].get('widgetConfig'),
        'template': widget['attributes'].get('template'),
        'layerId': widget['attributes'].get('layerId')
    }
    
    response = upsert(destinationWidgetId)
    return response(url, body, headers)

def recreateMetadata(datasetId, metadata, layerId=None, widgetId=None, toEnv = 'prod'):
    '''
    Copy the metadata from one env to the other
    '''
    
    serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
    headers = setTokenHeader(toEnv)
    
    if metadata.get('type')!='metadata':
        return None
    if layerId and widgetId:
        raise Exception("layerId and widgetId not allowed at the same time")
    elif layerId:
        url = f'{serverUrl[toEnv]}/v1/dataset/{datasetId}/layer/{layerId}/metadata'
    elif widgetId:
        url = f'{serverUrl[toEnv]}/v1/dataset/{datasetId}/widget/{widgetId}/metadata'
    else:
        url = f'{serverUrl[toEnv]}/v1/dataset/{datasetId}/metadata'
    
    body = {
        'application': metadata['attributes'].get('application'),
        'language': metadata['attributes'].get('language'),
        'description': metadata['attributes'].get('description'),
        'source': metadata['attributes'].get('source'),
        'info': metadata['attributes'].get('info'),
    }
    if metadata['attributes'].get('name'):
        body['name'] = metadata['attributes'].get('name')
    
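    try:
        # Try to create the metadata first.
        return postAssets(url, body, headers)
    except re.exceptions.HTTPError:
        # Fall back to a PATCH when the POST is rejected (presumably because the metadata already exists).
        return updateAssets(url, body, headers)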

def recreateVocabulary(datasetId, vocabulary, toEnv = 'prod'):
    '''
    Copy the vocabulary from one env to the other
    '''
    
    serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
    
    if vocabulary.get('type')!='vocabulary':
        return None
    
    headers = setTokenHeader(toEnv)
    
    url = f"{serverUrl[toEnv]}/v1/dataset/{datasetId}/vocabulary/{vocabulary['attributes']['name']}"
    body = {
        'application': vocabulary['attributes'].get('application'),
        'tags': vocabulary['attributes'].get('tags')
    }
    
    response = postAssets(url, body, headers)
    return response

def getAssetList(fromEnv = 'prod', datasetList=None):
    '''
    Gets a list of assets from the selected env or from the constrained dataset list
    '''
    serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
    headers = setTokenHeader(fromEnv)
    url = f'{serverUrl[fromEnv]}/v1/dataset'
    payload={
        'application':'rw',
        'status':'saved',
        'published':'false',
        'includes':'widget,layer,vocabulary,metadata',
        'page[size]':1613982331640
    }
    if datasetList:
        url = f'{serverUrl[fromEnv]}/v1/dataset/find-by-ids'
        body = {
            'ids': datasetList
        }
        return postAssets(url, body, headers, payload)
    else:
        return getAssets(url, payload)
    
def backupAssets(env = 'prod', datasetList = None):
    '''
    save a backup of production data just in case we need to recreate it again
    '''
    data = getAssetList(env, datasetList)
    

    # Use the `env` argument; `Env` was undefined and raised a NameError.
    with open(f'RW_{env}_backup_{datetime.now().strftime("%Y%m%d-%H%M%S")}.json', 'w') as outfile:
        json.dump(data, outfile)

def deleteDataFrom(env='staging', datasetList = None):
    '''
    Deletes all assets from an env.
    '''
    serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
    # Build the prompt from adjacent literals so no stray indentation ends up in the message.
    userConfirmation = input(
        f'{bcolors.WARNING}Are you sure you want to delete '
        f'{str(datasetList) if datasetList else "everything"} in {env}:{bcolors.ENDC} '
        'Y/n ') or "N"
    if userConfirmation == 'Y':
        headers = setTokenHeader(env)
        data = getAssetList(env, datasetList)
        
        for dataset in data['data']:
            #@TODO: this needs to be reworked a bit
            try:
                logger.info(f"deleting {serverUrl[env]}/v1/dataset/{dataset['id']}... ")
                status = deleteAssets(f"{serverUrl[env]}/v1/dataset/{dataset['id']}", headers)
                    
            except re.exceptions.HTTPError as err:
                logger.error(err)
                pass
    else:
        print('nothing was deleted')

def assetIdToBeSync(sync, syncList, assetToSync, fromEnv, toEnv):
    '''
    controls the asset id to be sync
    '''
    if sync:
        return next((asset.get(f'{toEnv}Id') for asset in syncList \
                     if (asset.get('type') == assetToSync.get('type') \
                         and asset.get(f'{fromEnv}Id') == assetToSync.get('id'))), 
                    False)
    else:
        return None
    
def copyAssets(assetList, sync=False, fromEnv='prod', toEnv='staging'):
    '''
    Creates a new copy or syncs the assets that we set up in the fromEnv into the destination Env 
    '''
    if fromEnv == toEnv:
        raise NameError(f'fromEnv:{fromEnv} and toEnv:{toEnv} cannot be the same')
        
    if not assetList:
        raise IndexError('asset list is empty or not defined')
        
    
    dataAssets = []    
    
    if sync:
        newDatasetList = [asset[f'{fromEnv}Id'] for asset in assetList if asset['type'] == 'dataset']
        dataAssets = getAssetList(fromEnv, newDatasetList)

    else:   
        dataAssets = getAssetList(fromEnv, assetList)
    
    try:
        logger.info(f'{bcolors.OKBLUE}Preparing to {"sync" if sync else "copy"} from {fromEnv} to {toEnv}...{bcolors.ENDC}')
        resources = []
        
        # @TODO:
        # Improve loop performance with multiprocessing
        # move loops into reusable function based on type
        # For sync only path updated data
        
        for dataset in dataAssets['data']:
            
            toDatasetId = assetIdToBeSync(sync, assetList, dataset, fromEnv, toEnv)
            
            newDataset = recreateDataset(dataset, toEnv, toDatasetId)

            resources.append({
                'type': 'dataset',
                f'{fromEnv}Id':dataset.get('id'),
                f'{toEnv}Id': newDataset['data'].get('id')
            })

            for vocabulary in dataset['attributes'].get('vocabulary'):
                newVocabulary = recreateVocabulary(newDataset['data'].get('id'), vocabulary, toEnv)
                
                resources.append({
                'type': 'vocabulary',
                f'{fromEnv}Id':vocabulary.get('id'),
                f'{toEnv}Id': newVocabulary['data']
            })

            for layer in dataset['attributes'].get('layer'):
                
                toLayerId = assetIdToBeSync(sync, assetList, layer, fromEnv, toEnv)
                
                newLayer = recreateLayer(newDataset['data'].get('id'), layer, toEnv, toLayerId)
                
                resources.append({
                'type': 'layer',
                f'{fromEnv}Id':layer.get('id'),
                f'{toEnv}Id': newLayer['data'].get('id')
            })

            for widget in dataset['attributes'].get('widget'):
                
                toWidgetId = assetIdToBeSync(sync, assetList, widget, fromEnv, toEnv)
                
                newWidget = recreateWidget(newDataset['data'].get('id'), widget, toEnv, toWidgetId)
                
                resources.append({
                'type': 'widget',
                f'{fromEnv}Id':widget.get('id'),
                f'{toEnv}Id': newWidget['data'].get('id')
            })

            for metadata in dataset['attributes'].get('metadata'):
                
                newMetadata = recreateMetadata(newDataset['data'].get('id'), metadata, toEnv=toEnv)
                
                resources.append({
                'type': 'metadata',
                f'{fromEnv}Id':metadata.get('id'),
                f'{toEnv}Id': newMetadata['data']
            })
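    except (NameError, IndexError) as e:
        logger.error(e)
        raise e
    except Exception as e:
        # Log the error instead of discarding it silently, so a partial copy is visible.
        logger.error(e)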
    
    if not sync:
        filename = f'RW_prod_staging_match_{datetime.now().strftime("%Y%m%d-%H%M%S")}.json'
        logger.info(f'creating sync file with name: {filename}')
        with open(filename, 'w') as outfile:
            json.dump(resources, outfile)
    
    logger.info(f'{bcolors.OKGREEN}{"sync" if sync else "copy"} process finished{bcolors.ENDC}')
        
def syncAssets(syncList, fromEnv='prod', toEnv='staging'):
    '''
    Allows sync of Assets
    '''
    
    return copyAssets(syncList, True, fromEnv, toEnv)
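
To make the order of operations in `copyAssets` easier to follow, the sketch below walks a dataset payload offline and lists the assets in the order they would be recreated: the dataset first, then its vocabularies, layers, widgets, and metadata. The function name `iterAssets` and the sample payload are hypothetical; no request is made.

```python
def iterAssets(dataset):
    """Yield (type, id) pairs in the order copyAssets recreates them."""
    yield dataset.get('type'), dataset.get('id')
    attributes = dataset.get('attributes', {})
    for key in ('vocabulary', 'layer', 'widget', 'metadata'):
        # `or []` guards against an include that is missing or null.
        for child in attributes.get(key) or []:
            yield child.get('type'), child.get('id')

# Made-up payload shaped like one entry of getAssetList(...)['data'].
sample = {
    'id': 'dataset-1',
    'type': 'dataset',
    'attributes': {
        'vocabulary': [],
        'layer': [{'id': 'layer-1', 'type': 'layer'}],
        'metadata': [{'id': 'metadata-1', 'type': 'metadata'}],
    },
}
order = list(iterAssets(sample))
```

Running a walk like this before a real copy gives a quick count of how many assets of each type a dataset carries.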

## Processing
### Get list of assets that we want to modify or sync

#### List of assets:

* `datasetsProd` will contain the IDs of the assets in production that need to be migrated to `staging`. Make sure this list stays in sync with the shared document that lists the assets.
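
Because the list is maintained by hand, a quick check before copying can catch pasted duplicates or malformed entries. The sketch below assumes that dataset IDs are UUID strings, as in the outputs shown in this notebook; the helper name `checkDatasetIds` is hypothetical.

```python
import uuid

def checkDatasetIds(ids):
    """Hypothetical pre-flight check: report duplicates and non-UUID entries."""
    seen = set()
    problems = []
    for value in ids:
        try:
            uuid.UUID(value)
        except (ValueError, AttributeError, TypeError):
            # Raised for malformed strings and for non-string values.
            problems.append((value, 'not a UUID'))
            continue
        if value in seen:
            problems.append((value, 'duplicate'))
        seen.add(value)
    return problems
```

An empty result means every entry parsed as a UUID and appeared only once; it does not confirm that the dataset exists in `production`.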

### For testing purposes
Dummy assets used to create `datasetsProd`.

In [68]:
# Dummy data to test the notebook: creation of a dummy dataset with a layer in production.
toEnv = 'prod'
serverUrl = {
        'prod': prod_server,
        'staging': staging_server
    }
headers = setTokenHeader(toEnv)
urlDataset = f'{serverUrl[toEnv]}/v1/dataset'
bodyDataset = {'dataset':{
    'application': ['rw'],
    'name': 'This is a test',
    'connectorType': 'rest',
    'provider': 'cartodb',
    'published': False,
    'overwrite': False,
    'protected':False,
    'env': 'production',
    'connectorUrl': "https://wri-rw.carto.com/api/v2/sql?q=select * from air_temo_anomalies"
    }
}

responseDataset = postAssets(urlDataset, bodyDataset, headers)
responseDataset

{'data': {'attributes': {'application': ['rw'],
                         'attributesPath': None,
                         'clonedHost': {},
                         'connectorType': 'rest',
                         'connectorUrl': 'https://wri-rw.carto.com/api/v2/sql?q=select '
                                         '* from air_temo_anomalies',
                         'createdAt': '2021-05-20T16:09:57.922Z',
                         'dataLastUpdated': None,
                         'dataPath': None,
                         'env': 'production',
                         'errorMessage': None,
                         'geoInfo': False,
                         'layerRelevantProps': [],
                         'legend': {'binary': [],
                                    'boolean': [],
                                    'byte': [],
                                    'country': [],
                                    'date': [],
                                    'double': [],
       

In [69]:
urlLayer = f'{urlDataset}/{responseDataset["data"].get("id")}/layer'
bodyLayer = {
        'application': ['rw'],
        'name': 'test-121',
        'provider': 'cartodb',
        'default': True,
        'published': False,
        'env': 'production',
        'layerConfig': {
            "body": {}
            },
        'legendConfig': {},
        'interactionConfig': {},
        'applicationConfig': {}
    }
responseLayer = postAssets(urlLayer, bodyLayer, headers)
responseLayer

{'data': {'attributes': {'application': ['rw'],
                         'applicationConfig': {},
                         'createdAt': '2021-05-20T16:10:03.362Z',
                         'dataset': 'e4409726-1a7a-4267-9385-e4f8c80ced71',
                         'default': True,
                         'env': 'production',
                         'interactionConfig': {},
                         'iso': [],
                         'layerConfig': {'body': {}},
                         'legendConfig': {},
                         'name': 'test-121',
                         'protected': False,
                         'provider': 'cartodb',
                         'published': False,
                         'slug': 'test-121',
                         'staticImageConfig': {},
                         'updatedAt': '2021-05-20T16:10:03.362Z',
                         'userId': '59db4eace9c1380001d6e4c3'},
          'id': '308cf5f3-781b-4ce7-9174-c1eb1bd63d3e',
          'type': 'laye

In [70]:
# In the future this listing could be automated from the shared doc, using the Google Sheets API for both reading and writing.
# providing a sample of the list by printing it
datasetsProd = [responseDataset['data']['id']]

In [103]:
datasetsProd

['e4409726-1a7a-4267-9385-e4f8c80ced71']

### Backup Data in both environments

In [None]:
#backupAssets('prod')
#backupAssets('staging')
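
A backup is only useful if it can be read back. The sketch below mirrors the file naming used by `backupAssets` and adds a round trip through a temporary directory. The helper names (`backupFilename`, `writeBackup`, `readBackup`) and the sample payload are hypothetical, and no API call is made.

```python
import json
import os
import tempfile
from datetime import datetime

def backupFilename(env, now=None):
    """Build a name such as RW_prod_backup_20210520-172552.json."""
    now = now or datetime.now()
    return f'RW_{env}_backup_{now.strftime("%Y%m%d-%H%M%S")}.json'

def writeBackup(data, env, directory='.'):
    """Write `data` as JSON and return the path of the new file."""
    path = os.path.join(directory, backupFilename(env))
    with open(path, 'w') as outfile:
        json.dump(data, outfile)
    return path

def readBackup(path):
    """Load a backup written by writeBackup."""
    with open(path) as infile:
        return json.load(infile)

# Round trip with a made-up payload, written to a throwaway directory.
payload = {'data': [{'id': 'example-id', 'type': 'dataset'}]}
with tempfile.TemporaryDirectory() as tmp:
    restored = readBackup(writeBackup(payload, 'staging', tmp))
```

The restored dictionary has the same `data` list that `copyAssets` iterates over, so a backup could in principle serve as the source for recreating assets.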

### Only do this if you want to clean data in staging. 
* You will need to be logged in

In [None]:
#deleteDataFrom()

### Copy resources from production to staging. 
The running time will depend on the size of the asset.   
Running this cell is only needed to create new assets in `staging` from `production`.
A JSON file with a unique name is created locally. For each asset, the file contains:
- type: this can be a "layer", a "dataset", a "widget", "vocabulary", "metadata"
- prodId: the id of the item in `production`
- stagingId: the id of the item in `staging`
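
The record layout described above can be illustrated with a small lookup that resolves the `staging` ID for a given `production` asset, which is essentially what `assetIdToBeSync` does during a sync. The function name `toIdFor` and the placeholder IDs are hypothetical.

```python
def toIdFor(syncList, assetType, fromId, fromEnv='prod', toEnv='staging'):
    """Return the matching ID in toEnv, or None when there is no record."""
    for record in syncList:
        if record.get('type') == assetType and record.get(f'{fromEnv}Id') == fromId:
            return record.get(f'{toEnv}Id')
    return None

# Placeholder records in the same shape as the generated match file.
syncList = [
    {'type': 'dataset', 'prodId': 'prod-dataset-id', 'stagingId': 'staging-dataset-id'},
    {'type': 'layer', 'prodId': 'prod-layer-id', 'stagingId': 'staging-layer-id'},
]
stagingLayerId = toIdFor(syncList, 'layer', 'prod-layer-id')
```

Swapping `fromEnv` and `toEnv` performs the reverse lookup, which is what a copy back from `staging` to `production` would rely on.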

In [37]:
copyAssets(datasetsProd)

INFO:root:Preparing to copy from prod to staging...
INFO:root:creating sync file with name: RW_prod_staging_match_20210520-172552.json
INFO:root:copy process finished


### Open sync list of assets, match items with list and update them.

In [None]:
# use the JSON filename printed by the previous cell
with open('RW_prod_staging_match_20210520-172552.json') as json_file:
    syncList = json.load(json_file)

syncAssets(syncList, fromEnv='prod', toEnv='staging')
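
Instead of pasting the filename by hand, the most recent match file could be located automatically. The sketch below relies on the `%Y%m%d-%H%M%S` timestamp in the name, which sorts chronologically when sorted as text. The helper name `latestMatchFile` is hypothetical, and the second timestamp in the demo is made up.

```python
import glob
import os
import tempfile

def latestMatchFile(directory='.'):
    """Return the newest RW_prod_staging_match_*.json by name, or None."""
    pattern = os.path.join(directory, 'RW_prod_staging_match_*.json')
    matches = sorted(glob.glob(pattern))
    return matches[-1] if matches else None

# Demo in a throwaway directory with two empty match files.
with tempfile.TemporaryDirectory() as tmp:
    for stamp in ('20210520-172552', '20210521-090000'):
        open(os.path.join(tmp, f'RW_prod_staging_match_{stamp}.json'), 'w').close()
    latest = os.path.basename(latestMatchFile(tmp))
```

Picking the newest file is only a convenience; when several copies were run for different asset lists, the file still has to be the one that matches the assets being synced.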

In [104]:
# delete testing datasets from both envs after testing:
# deleteDataFrom('prod', [responseDataset['data']['id']])
# deleteDataFrom('staging', [syncList[0]['stagingId']])

Are you sure you want to delete ['e4409726-1a7a-4267-9385-e4f8c80ced71'] in prod: Y/n Y


INFO:root:deleting https://api.resourcewatch.org/v1/dataset/e4409726-1a7a-4267-9385-e4f8c80ced71... 


Are you sure you want to delete ['196588b3-7663-4f9c-8209-d037db8bd1bd'] in staging: Y/n Y
