Pull Request Test Coverage Report for Build 330
💛 - Coveralls
```python
request = {
    'project_id': app_id,
    'output_url_prefix': output_url_prefix,
    'entity_filter': entity_filter
```
`entity_filter` could be inlined.
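A minimal sketch of the inlining; `kinds` as the variable holding the kind list is an assumption:

```python
# Sketch: build the filter inline instead of assigning entity_filter first.
request = {
    'project_id': app_id,
    'output_url_prefix': output_url_prefix,
    'entity_filter': {'kinds': kinds}  # 'kinds' is assumed to exist nearby
}
```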
```python
dataset_id, date = source_gcs_bucket.split("//")[1].split("/")
return {
    "projectId": configuration.backup_project_id,
    "location": "EU",
```
As discussed, the dataset and job should be created in the same region as the GAE app.
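A sketch of deriving the location from the app instead of hard-coding "EU"; `apps().get()` is the App Engine Admin API, and the prefix check is a simplifying assumption standing in for the mapping in src/appinfo.py:

```python
from googleapiclient import discovery

def bq_location_for_app(project_id):
    # Ask the App Engine Admin API where the app actually runs.
    appengine = discovery.build('appengine', 'v1')
    location_id = appengine.apps().get(appsId=project_id).execute()['locationId']
    # Map the app location onto a BigQuery location (simplified assumption).
    return 'EU' if location_id.startswith('europe') else 'US'
```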
```python
def get(self):
    logging.info("Scheduling export of Datastore entities to GCS ...")
    output_url = ExportDatastoreToGCSService\
        .invoke(self.request, self.response)\
```
The handler shouldn't pass request/response objects.
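A sketch of the suggested split, where the handler unpacks plain values and the service never sees webapp2 objects; the parameter names are assumptions:

```python
import json

import webapp2

class ExportDatastoreHandler(webapp2.RequestHandler):
    def get(self):
        # Unpack plain values here; the service stays webapp2-free.
        kinds = self.request.get_all('kind')
        output_url = ExportDatastoreToGCSService.invoke(kinds=kinds)
        self.response.out.write(json.dumps({'outputUrl': output_url}))
```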
logging.info("Scheduling export of Datastore entities to GCS ...") | ||
output_url = ExportDatastoreToGCSService\ | ||
.invoke(self.request, self.response)\ | ||
.wait_till_done(timeout=600) |
Could we simply raise an alert via Error Reporting if the whole request took more than 10 minutes?
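A sketch, assuming `wait_till_done()` raises on timeout (the exception name here is hypothetical); on GAE, `logging.exception()` entries are picked up by (Stackdriver) Error Reporting, so the timeout surfaces as an alertable error:

```python
import logging

try:
    output_url = ExportDatastoreToGCSService.invoke(kinds=kinds) \
        .wait_till_done(timeout=600)
except ExportTimeoutError:  # hypothetical exception raised by wait_till_done
    # logging.exception() on GAE feeds Error Reporting, which can alert.
    logging.exception("Datastore export did not finish within 10 minutes")
    raise
```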
```python
app_id = configuration.backup_project_id
url = 'https://datastore.googleapis.com/v1/projects/%s:export' % app_id

output_url_prefix = cls.get_output_url_prefix(request)
```
We could use the conventional staging.project_id.appspot.com bucket, where lifecycle management is enabled. No need for configuration then.
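A sketch of the convention; every GAE app gets a staging.&lt;app-id&gt;.appspot.com bucket whose objects are lifecycle-managed, so nothing needs configuring (`gcs_folder_name` as the parameter is an assumption):

```python
from google.appengine.api import app_identity

def get_output_url_prefix(gcs_folder_name):
    # staging.<app-id>.appspot.com exists for every app and auto-expires objects.
    bucket = 'staging.%s.appspot.com' % app_identity.get_application_id()
    return 'gs://%s/%s' % (bucket, gcs_folder_name)
```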
```python
class ExportDatastoreToGCSService(webapp2.RequestHandler):
    @classmethod
    def invoke(cls, request, response):
        access_token, _ = app_identity.get_access_token(
```
Consider using google-api-python-client for consistency with the other APIs.
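A sketch with google-api-python-client; `projects().export()` is the Datastore v1 method behind the URL built by hand above, and on GAE the client picks up the app's service-account credentials, so no manual token handling is needed:

```python
from googleapiclient import discovery

def export_entities(app_id, output_url_prefix, kinds):
    datastore = discovery.build('datastore', 'v1')
    body = {
        'outputUrlPrefix': output_url_prefix,
        'entityFilter': {'kinds': kinds},
    }
    # Same call as POST .../v1/projects/<app_id>:export, without manual tokens.
    return datastore.projects().export(projectId=app_id, body=body).execute()
```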
```python
finish_time = time.time() + timeout
while time.time() < finish_time:
    logging.info(
        "Export from GCS to BQ - "
```
I would log `load_job_id` here as well.
src/appinfo.py
```python
return httplib2.Http(timeout=60)

@staticmethod
def __map(location_id):
```
Could we just check prefixes, not every zone? They change quite often now.
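A sketch of the prefix approach, as a drop-in for the `__map` shown above; the australia/asia handling is an assumption based on the BigQuery dataset-locations docs:

```python
@staticmethod
def __map(location_id):
    # Match on continent prefixes so new zones don't require code changes.
    if location_id.startswith("europe"):
        return "EU"
    if location_id.startswith("us"):
        return "US"
    if location_id.startswith(("asia", "australia")):
        return location_id  # regional BQ locations, e.g. asia-northeast1
    return "UNKNOWN"
```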
src/appinfo.py
Outdated
return "EU" | ||
if location_id in ASIA_LOCATIONS: | ||
return "Asia" | ||
return "UNKNOWN" |
What about Australia?
```python
ExportDatastoreBackupsToGCSService().export(gcs_output_uri, kinds)

logging.info("Loading Datastore backups from GCS to Big Query")
LoadDatastoreBackupsToBigQueryService(now_date)\
```
What happens here in case of a timeout, after the error report is sent?
```python
@staticmethod
def __create_gcs_output_url(gcs_folder_name):
    app_id = configuration.backup_project_id
```
This should be taken from the current GAE app; the backup project id could be different, and one we don't have access to.
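A minimal sketch, taking the id from the runtime instead of configuration:

```python
from google.appengine.api import app_identity

# The current GAE app id, rather than configuration.backup_project_id.
app_id = app_identity.get_application_id()
```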
src/appinfo.py
```python
if location_id in EU_LOCATIONS:
    return "EU"
if location_id in ASIA_LOCATIONS:
    return "Asia"
```
In Asia there is only asia-northeast1:
https://cloud.google.com/bigquery/docs/dataset-locations
```python
def __create_job_body(self, source_uri, kind):
    return {
        "projectId": configuration.backup_project_id,
        "location": "EU",
```
EU? It should be the same as the dataset's location, I think.
```python
self.response.headers['Content-Type'] = 'application/json'
self.response.set_status(200)
self.response.out.write(json.dumps({'status': 'success'}))
```
It's not a success in case of a timeout.
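A sketch of a timeout-aware response; `finished`, standing in for whatever `wait_till_done()` actually returns, is an assumption:

```python
self.response.headers['Content-Type'] = 'application/json'
if finished:  # 'finished' is assumed to come from wait_till_done()
    self.response.set_status(200)
    self.response.out.write(json.dumps({'status': 'success'}))
else:
    # Report the timeout instead of claiming success.
    self.response.set_status(500)
    self.response.out.write(json.dumps({'status': 'timeout'}))
```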
"Loading Datastore backups from GCS to BQ (jobId: %s) - " | ||
"waiting %d seconds for request to end...", load_job_id, PERIOD | ||
) | ||
time.sleep(PERIOD) |
Please move the sleep after the result-check logic, to avoid waiting one minute before checking for the first time, as discussed.
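A sketch of the check-first loop; `is_done(load_job_id)` is a hypothetical helper wrapping the job-status call:

```python
finish_time = time.time() + timeout
while True:
    if is_done(load_job_id):  # hypothetical status check, runs immediately
        break
    if time.time() >= finish_time:
        raise RuntimeError("Load job %s timed out" % load_job_id)
    logging.info(
        "Loading Datastore backups from GCS to BQ (jobId: %s) - "
        "waiting %d seconds for request to end...", load_job_id, PERIOD)
    time.sleep(PERIOD)  # sleep only between checks, never before the first one
```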