
Merge pull request #2 from ocadotechnology/simplify_setup
Simplify setup
radkomateusz committed Jun 19, 2018
2 parents 8bf0f58 + 11cbac6 commit ff611fe
Showing 2 changed files with 67 additions and 25 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -141,7 +141,7 @@ It's worth underlining that:
## How to list already created backups?
To find where backup __Y__ for table __X__ is stored:
1. In Cloud Console visit [Datastore](https://console.cloud.google.com/datastore),
1. Find __Key literal__ for table _X_:
* Select __Table__ kind,
* Filter entities equal to _X.project_id_, _X.dataset_id_, _X.table_id_ or _X.partition_id_,
* Find table _X_ from results and copy _Key literal_,
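If you prefer querying over manual filtering, the Datastore console also accepts GQL; a rough sketch, assuming the entity kind and property names match the filter fields above:
```sql
SELECT * FROM Table
WHERE project_id = 'x-source-project' AND dataset_id = 'x_dataset' AND table_id = 'x_table'
```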
@@ -159,11 +159,11 @@ There are several options to restore data, available from _\<your-project-id>_._
* __Restore whole dataset__ (_\<your-project-id>.appspot.com_/__ui/restoreDataset__). Parameters:
* Source project id: id of the project where the dataset is originally placed,
* Source dataset id: original dataset id,
* Target dataset id (optional): id of a temporary dataset that will be used (and created if it does not exist) as a container for the restored tables. Remember that this will be a temporary dataset with an expiration time set to 7 days. __Note that the passed dataset may already exist - it should be in the same location as the backup__.
If _target dataset id_ is not passed, then the _source dataset id_ value will be used as the target dataset id in the restoration project,
* Max partition days (optional): number of days back for which partitions of partitioned tables will be restored (e.g. 30 means that partitions from the last 30 days will be restored),
* __Restore tables from list of backups__ (_\<your-project-id>.appspot.com_/__ui/restoreList__). Parameters:
* Target dataset id (optional): id of a temporary dataset that will be used (and created if it does not exist) as a container for the restored tables. Remember that this will be a temporary dataset with an expiration time set to 7 days. __Note that the passed dataset may already exist - it should be in the same location as the backup__.
If _target dataset id_ is not passed, then the source dataset id value of each backup will be used as the target dataset id in the restoration project.
When restoring backups from different datasets, multiple target datasets will be created.
* Backup list: a set of backups in __JSON__ format, each designated by the URL-safe key of its backup entity, available from [Datastore](https://console.cloud.google.com/datastore). Example:
86 changes: 64 additions & 22 deletions SETUP.md
@@ -2,7 +2,7 @@
Ownership of the GCP project with assigned billing (backups will be stored in that project).
* see [creating a project in GCP](https://support.google.com/cloud/answer/6251787?hl=en#) doc
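If you prefer the gcloud CLI over the Cloud Console, a rough sketch; the project id, name and billing account id below are placeholders:
```bash
# Placeholders: choose your own project id and use your real billing account id.
gcloud projects create my-bbq-backups --name="BBQ backups"
gcloud beta billing projects link my-bbq-backups --billing-account=0X0X0X-0X0X0X-0X0X0X
```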

### Installation steps

The easiest way is to use Google Cloud Shell: click the button below to open Cloud Shell and clone the repository.

Expand All @@ -14,78 +14,120 @@ The easiest way is to use Google Cloud Shell - click button below. It opens GCSh
* Note: It is also possible to do this from a local environment, but it requires installing the Google Cloud SDK for Python (see [installing the Cloud SDK for Python](https://cloud.google.com/appengine/docs/standard/python/download))

Then follow the steps below:
1. Export your project id:
```bash
export PROJECT_ID="<your-project-id>"
```

1. Change all **\<your-project-id\>** to your previously created project id in the [./config/prd/config.yaml](./config/prd/config.yaml) config file:
```bash
sed -i -e "s/<your-project-id>/${PROJECT_ID}/g" config/prd/config.yaml
```

1. Install dependency requirements
```bash
pip install -t lib -r requirements.txt
```
1. Deploy App Engine application
```bash
gcloud app deploy --project ${PROJECT_ID} app.yaml config/cron.yaml config/prd/queue.yaml config/index.yaml
```

Note: If this is your first App Engine deployment, App Engine needs to be initialised and you will need to choose a [region/location](https://cloud.google.com/appengine/docs/locations).
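If the App Engine application does not exist yet, it can also be created up front; a minimal sketch, with the region as an example only:
```bash
# The region is only an example; pick one from the locations page linked above.
gcloud app create --project ${PROJECT_ID} --region europe-west
```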
1. Grant the IAM role **BigQuery Data Viewer** to the App Engine default service account (*\<your-project-id\>@appspot.gserviceaccount.com*) in each project which should be backed up, e.g.:
```bash
gcloud projects add-iam-policy-binding <project-id-to-be-backed-up> --member='serviceAccount:'${PROJECT_ID}'@appspot.gserviceaccount.com' --role='roles/bigquery.dataViewer'
```
* You can also grant this permission for the whole folder or organisation. It will be inherited by all of the projects underneath.
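A sketch of the folder-level and organisation-level grants; the folder and organisation ids are placeholders:
```bash
# Placeholders: substitute your real folder id / organisation id.
gcloud resource-manager folders add-iam-policy-binding <folder-id> --member='serviceAccount:'${PROJECT_ID}'@appspot.gserviceaccount.com' --role='roles/bigquery.dataViewer'
gcloud organizations add-iam-policy-binding <organisation-id> --member='serviceAccount:'${PROJECT_ID}'@appspot.gserviceaccount.com' --role='roles/bigquery.dataViewer'
```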

1. Congratulations! BBQ is now running. The backup process will start at the time defined in the [cron.yaml](./config/cron.yaml) file.
You can also trigger it manually; for more details, see the [Usage section](README.md#usage).
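A sketch of a manual trigger, assuming the production handler uses the same /cron/backup path as the local instance described later in this document:
```bash
# Assumption: the cron handler path matches the local /cron/backup path used below.
# Open this URL in a browser while signed in as a project owner/admin.
echo "https://${PROJECT_ID}.appspot.com/cron/backup"
```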

### Advanced setup
It is possible to manage which projects will be backed up using project-level IAM and the config.yaml file (a sketch of the relevant keys follows the list below).
* **custom_project_list** - list of projects to back up. If empty, BBQ will back up everything it has read (**BigQuery Data Viewer**) access to. If the list is provided, you still need to grant the **BigQuery Data Viewer** role to the BBQ service account for each listed project.
* **projects_to_skip** - list of projects to skip (it's recommended to skip the BBQ project itself). It is useful when you grant **BigQuery Data Viewer** to the BBQ service account for a whole organisation or folder and want to exclude some of the projects.
* **backup_project_id** - project id where backups will be stored (it can also be the same project on which BBQ runs)
* **restoration_project_id** - project into which data will be restored by default (you can also define the restoration destination directly when executing a restoration)
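A hypothetical sketch of how these keys might look, assuming a flat layout; the real structure in [./config/prd/config.yaml](./config/prd/config.yaml) is authoritative:
```yaml
# Hypothetical example only -- check config/prd/config.yaml for the actual structure.
backup_project_id: 'my-bbq-project'
restoration_project_id: 'my-restore-project'
custom_project_list:
  - 'project-to-backup-1'
  - 'project-to-backup-2'
projects_to_skip:
  - 'my-bbq-project'   # recommended: skip the BBQ project itself
```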



### Local environment setup

Note: The App Engine SDK has a useful feature which allows you to run an App Engine application on your local computer.
Unfortunately, it does not provide any emulator for BigQuery, so it is not possible to have BigQuery locally.
Therefore, in order to have the application working locally, we need a GCP project with BigQuery enabled.
All backups invoked from the local application will end up in this project.

#### Steps

1. BBQ requires Python 2.7.x. Make sure the correct version is available on your PATH
```bash
python -V
```

1. Install the Google Cloud SDK (see [installing the Cloud SDK for Python](https://cloud.google.com/appengine/docs/standard/python/download))
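If the Cloud SDK itself is already installed, the App Engine Python component may be the only missing piece (an assumption; the linked page is authoritative):
```bash
# Adds dev_appserver.py and the App Engine Python libraries to an existing SDK install.
gcloud components install app-engine-python
```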

1. Run `gcloud init` to set up your account which will be used by BBQ
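Depending on how the local instance picks up credentials (an assumption, not stated in this guide), application-default credentials may also be needed:
```bash
# Stores application-default credentials for client libraries running locally.
gcloud auth application-default login
```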

1. Grant yourself the BigQuery Data Viewer role in each project that you will back up, and the Editor role on the main backup project where backups will be stored.
```bash
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member='user:<name.surname@example.com>' --role='roles/editor'
gcloud projects add-iam-policy-binding <project-id-to-be-backed-up> --member='user:<name.surname@example.com>' --role='roles/bigquery.dataViewer'
```

1. Clone the repository to the location of your choice and change directory to bbq
```bash
git clone https://github.com/ocadotechnology/bbq
cd bbq
```

1. Export your project id
```bash
export PROJECT_ID="<your-project-id>"
```

1. Change all **\<your-project-id\>** to your previously created project id in [./config/local/config.yaml](./config/local/config.yaml).
```bash
sed -i -e "s/<your-project-id>/${PROJECT_ID}/g" config/local/config.yaml
```

1. Install dependency requirements
```bash
pip install -t lib -r requirements.txt
```

1. Link the config files to the main application folder (dev_appserver.py does not allow passing a full path to them)
```bash
ln -s config/local/queue.yaml queue.yaml
ln -s config/cron.yaml cron.yaml
ln -s config/index.yaml index.yaml
```

1. Run the command
```bash
dev_appserver.py app.yaml
```

1. A local instance of the App Engine application (with its own queues and datastore) should now be running at http://localhost:8080. You can also view the admin server at http://localhost:8000
1. To run the backup process, go to http://localhost:8080/cron/backup and sign in as an administrator

#### Running unit tests

1. Install the Google Cloud SDK (see [installing the Cloud SDK for Python](https://cloud.google.com/appengine/docs/standard/python/download))

1. Clone the repository to the location of your choice and change directory to bbq
```bash
git clone https://github.com/ocadotechnology/bbq
cd bbq
```

1. Install dependency requirements
```bash
pip install -t lib -r requirements.txt
pip install -r requirements_tests.txt
```

1. Run the following Python command (you might need to adjust the Google Cloud SDK path)
```bash
python test_runner.py --test-path tests/ -v --test-pattern "test*.py" ./google-cloud-sdk
```
