This repository has been archived by the owner on Dec 31, 2019. It is now read-only.

Bug 1470942 - Support maven on S3 #163

Merged
merged 34 commits on Aug 13, 2018
d9df485
Bug 1470942 - Support maven on S3
JohanLorenzo Jun 25, 2018
e1b61ec
Use scriptworker's get_and_check_single_upstream_artifact_full_path
JohanLorenzo Jul 9, 2018
76aa949
Allow tasks without locales in upstream artifact
JohanLorenzo Jul 9, 2018
7b5901d
WIP
JohanLorenzo Jul 10, 2018
262e9b2
WIP 2. Still missing tests in test_zip. Still missing integration wit…
JohanLorenzo Jul 10, 2018
7943a29
Cover zip.py with
JohanLorenzo Jul 20, 2018
2747d97
Cover zip.py with 100% test coverage
JohanLorenzo Jul 20, 2018
f8ef32d
WIP3
JohanLorenzo Jul 23, 2018
0a4c86b
Test get_upstream_artifacts_with_zip_extract_param
JohanLorenzo Jul 23, 2018
5c292ba
All tests back to green
JohanLorenzo Jul 23, 2018
9f55e9c
wire artifacts_to_beetmove to move_beets
JohanLorenzo Jul 23, 2018
3f850cc
Fix wire
JohanLorenzo Jul 23, 2018
b638c20
Fix breakages found in end to end test (part 1)
JohanLorenzo Jul 23, 2018
a05f3c2
Fix breakages found in end to end test (part 2)
JohanLorenzo Jul 24, 2018
a58f7a4
Let maven give the full path to target.maven.zip
JohanLorenzo Jul 24, 2018
381d845
Fix bad max zip file
JohanLorenzo Jul 24, 2018
41382a7
pass right data structure to move_beets
JohanLorenzo Jul 25, 2018
af1f061
Log out expected files
JohanLorenzo Jul 25, 2018
abbabd3
Fix bad version number taken from payload
JohanLorenzo Jul 25, 2018
215ddb7
Path base name to move_beets
JohanLorenzo Jul 25, 2018
23b85d8
Log out what files were extracted
JohanLorenzo Jul 25, 2018
d9ed157
Fill empty raw_balrog_manifest
JohanLorenzo Jul 25, 2018
54b5368
Catch up missing coverage
JohanLorenzo Jul 25, 2018
39294f9
Rename constant ZIP_MAX_FILE_SIZE_IN_MB into DEFAULT_{}
JohanLorenzo Jul 30, 2018
0985fe8
Document check_and_extract_zip_archives()
JohanLorenzo Jul 30, 2018
570e4e5
Fix nit in code comment
JohanLorenzo Jul 30, 2018
e461f6a
Fix inconsistency regarding files_in_archive
JohanLorenzo Jul 30, 2018
54a6fde
Avoid using "file" keyword
JohanLorenzo Jul 30, 2018
32ecbe3
Rename files_in_archive into relative_paths_in_archive
JohanLorenzo Jul 30, 2018
d69ae80
Fix doc in is_maven_action()
JohanLorenzo Jul 30, 2018
2b66aa8
indent product in maven_geckoview.yml
JohanLorenzo Jul 30, 2018
44a526a
Rename maven.py into maven_utils.py
JohanLorenzo Jul 30, 2018
8c88f83
Fix bad config_example.json
JohanLorenzo Jul 30, 2018
f98517f
split push_to_maven() into 2 functions
JohanLorenzo Jul 31, 2018
16 changes: 16 additions & 0 deletions beetmoverscript/constants.py
@@ -92,6 +92,12 @@
    'partner': [
        '',  # all legal
    ],
    'maven': [
        'maven2/',
    ],
    'maven-staging': [
        'maven2/',
    ],
}

# actions that imply actual releases, hence the need of `build_number` and
@@ -108,6 +114,10 @@
    'push-to-partner',
)

MAVEN_ACTIONS = (
    'push-to-maven',
)

# XXX this is a fairly clunky way of specifying which files to copy from
# candidates to releases -- let's find a nicer way of doing this.
# XXX if we keep this, let's make it configurable? overridable in config?
@@ -164,3 +174,9 @@
    'target.dmg',
    'target.apk',
)

# Zip archives can theoretically have a much better compression ratio, e.g. when a file
# contains a large amount of redundancy (such as files full of zeros). Let beetmover only
# deal with regular cases; edge cases are considered too suspicious, so we bail out on them.
DEFAULT_ZIP_MAX_FILE_SIZE_IN_MB = 100
ZIP_MAX_COMPRESSION_RATIO = 10

Contributor:
++ on the comment. Any reason why we're not moving these two ZIP_*-related constants within the script_config?

Contributor (author):
Oh, good call. I had forgotten about script_config. I added the max file size in mozilla-releng/build-puppet@bcef5fc.

I don't foresee a need to change the compression ratio as often as the max file size, so I haven't added it. Moreover, I think increasing the ratio is a bigger risk than increasing the file size: if one day we need to deal with zip files that are compressed more than 10 times, we probably want to revisit the ratio check within beetmover. That's why I think we shouldn't expose that value in script_config.

Contributor:
That makes sense, thanks for clarifying.
7 changes: 5 additions & 2 deletions beetmoverscript/data/beetmover_task_schema.json
@@ -56,7 +56,7 @@
            "type" : "string"
          }
        },
        "required" : ["appName", "buildid", "appVersion", "hashType", "platform", "branch"]
        "required" : ["appName", "buildid", "appVersion", "branch"]
      },
      "upstreamArtifacts": {
        "type": "array",
@@ -79,9 +79,12 @@
          "items": {
            "type": "string"
          }
        },
        "zipExtract": {
          "type": "boolean"
        }
      },
"required": ["taskId", "taskType", "paths", "locale"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume you remove 'locale' to fit the maven jobs right? The old beetmover jobs are still having the locale. Maybe it'd be worth adding it as optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's what this change is about. locale is now optional

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

      "required": ["taskId", "taskType", "paths"]
    },
    "minItems": 1,
    "uniqueItems": true
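With locale now optional and zipExtract added, a maven-style upstreamArtifacts entry can omit the locale entirely. A hedged sketch of the two shapes side by side (the taskIds and paths are made up for illustration):

```python
# A classic beetmover entry: locale is still present and meaningful.
classic_entry = {
    'taskId': 'abc123',
    'taskType': 'build',
    'paths': ['public/build/target.apk'],
    'locale': 'en-US',
}

# A maven/geckoview entry: no locale, and zipExtract asks beetmover
# to extract the archive instead of moving it as-is.
maven_entry = {
    'taskId': 'def456',
    'taskType': 'build',
    'paths': ['public/build/target.maven.zip'],
    'zipExtract': True,
}

# 'locale' is no longer in the schema's required list.
REQUIRED_FIELDS = ('taskId', 'taskType', 'paths')


def missing_required_fields(entry):
    return [field for field in REQUIRED_FIELDS if field not in entry]
```

Both entries pass the relaxed required-field check; only the old schema would have rejected `maven_entry` for its missing locale.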
5 changes: 4 additions & 1 deletion beetmoverscript/data/release_beetmover_task_schema.json
@@ -46,9 +46,12 @@
          "items": {
            "type": "string"
          }
        },
Contributor:
Note to self: in my backlog I have a task to add integration tests for beetmover, so that we have a beetmover schema for all possible beetmover jobs plus actual tests against them, like we currently have in bouncerscript or shipitscript. Right now we are reusing these two possible tasks, but it's not enough.

        "zipExtract": {
          "type": "boolean"
        }
      },
      "required": ["taskId", "taskType", "paths", "locale"]
Contributor:
Why trim locale here?

Contributor (author):
geckoview tasks don't specify a locale to beetmove. That's why I removed it from the required fields. I'm fine putting it back and creating a different schema. What do you think?

Contributor:
This is my fault from the very beginnings of beetmoverscript. In shipitscript and bouncerscript we have task schemas for each of the possible tasks and full integration tests, as we should! For beetmover, unfortunately, it is what it is. So, to answer your question: ideally we'd add a new schema so that we don't touch existing ones (like trimming locale, hashType, or platform to fit the new maven tasks). Old beetmover jobs still have those fields, so they'd keep being tested correctly.

Again, this is not blocking, so we can definitely follow up in a separate PR. I have a backlog task (hopefully within declarative artifacts) to rewrite all the task schemas and add integration tests for all possible beetmover jobs, so this work will be undertaken at some point this quarter anyway; I'm not sure it's worth investing time now. If it's easy for you to make the tasks fit a new schema without touching the existing ones, that'd be ideal, but again, we're going to do that sooner or later for all tasks. Up to you, I'm fine either way 👍

Contributor (author):
Got it! I'd prefer to make it a follow-up. I think we're safe for now because beetmoverscript relies on locale being defined, so if a task definition misses one, a KeyError will be raised somewhere.

Contributor:
👍

      "required": ["taskId", "taskType", "paths"]
    },
    "minItems": 0,
    "uniqueItems": true
54 changes: 54 additions & 0 deletions beetmoverscript/maven_utils.py
@@ -0,0 +1,54 @@
import os

_MAVEN_ZIP_NAME = 'target.maven.zip'


def get_maven_expected_files_per_archive_per_task_id(upstream_artifacts_per_task_id, mapping_manifest):
    task_id, maven_zip_full_path = _get_task_id_and_full_path_of_maven_archive(upstream_artifacts_per_task_id)

    return {
        task_id: {
            maven_zip_full_path: _get_maven_expected_files_in_archive(mapping_manifest)
        }
    }


def _get_task_id_and_full_path_of_maven_archive(upstream_artifacts_per_task_id):
    candidate_task_id = ''
    candidate_path = ''

    for task_id, upstream_definitions in upstream_artifacts_per_task_id.items():
        for upstream_definition in upstream_definitions:
            for path in upstream_definition['paths']:
                if path.endswith(_MAVEN_ZIP_NAME):
                    if candidate_task_id:
                        raise ValueError(
                            'Too many upstream artifacts ending with "{}" found: ({}, {}) and ({}, {})'.format(
                                _MAVEN_ZIP_NAME, candidate_task_id, candidate_path, task_id, path
                            )
                        )

                    candidate_task_id = task_id
                    candidate_path = path

    if not candidate_task_id:
        raise ValueError('No upstream artifact ending with "{}" found. Given: {}'.format(
            _MAVEN_ZIP_NAME, upstream_artifacts_per_task_id)
        )

    return candidate_task_id, candidate_path


def _get_maven_expected_files_in_archive(mapping_manifest):
    file_names = mapping_manifest['mapping']['en-US'].keys()
    return [
        os.path.join(
            _remove_first_directory_from_bucket(mapping_manifest['s3_bucket_path']),
            file_name
        ) for file_name in file_names
    ]


def _remove_first_directory_from_bucket(s3_bucket_path):
    # Remove 'maven2' because it's not in the archive, but it exists on the maven server
    return '/'.join(s3_bucket_path.split('/')[1:])
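To illustrate the expected-files computation above, here is a condensed, self-contained restatement of the same path logic with sample inputs (the manifest values are made up; the real manifest comes from generate_beetmover_manifest):

```python
import os


def expected_files_in_archive(mapping_manifest):
    # Mirrors _get_maven_expected_files_in_archive: strip the leading
    # 'maven2/' bucket directory, which exists on the server but not in the archive.
    bucket_path = '/'.join(mapping_manifest['s3_bucket_path'].split('/')[1:])
    return [
        os.path.join(bucket_path, name)
        for name in mapping_manifest['mapping']['en-US']
    ]


manifest = {
    's3_bucket_path': 'maven2/org/mozilla/geckoview/1.0/',
    'mapping': {'en-US': {'geckoview-1.0.aar': {}, 'geckoview-1.0.pom': {}}},
}
print(sorted(expected_files_in_archive(manifest)))
# → ['org/mozilla/geckoview/1.0/geckoview-1.0.aar', 'org/mozilla/geckoview/1.0/geckoview-1.0.pom']
```

So the archive is expected to contain the artifact paths relative to `org/mozilla/…`, matching what the zip checker later verifies against the extracted contents.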
48 changes: 47 additions & 1 deletion beetmoverscript/script.py
@@ -17,11 +17,13 @@
from scriptworker.exceptions import ScriptWorkerTaskException, ScriptWorkerRetryException
from scriptworker.utils import retry_async, raise_future_exceptions

from beetmoverscript import task, zip, maven_utils

from beetmoverscript.constants import (
    MIME_MAP, RELEASE_BRANCHES, CACHE_CONTROL_MAXAGE, RELEASE_EXCLUDE,
    NORMALIZED_BALROG_PLATFORMS, PARTNER_REPACK_PUBLIC_PREFIX_TMPL,
    PARTNER_REPACK_PRIVATE_REGEXES, PARTNER_REPACK_PUBLIC_REGEXES, BUILDHUB_ARTIFACT,
    INSTALLER_ARTIFACTS
    INSTALLER_ARTIFACTS, DEFAULT_ZIP_MAX_FILE_SIZE_IN_MB
)
from beetmoverscript.task import (
    validate_task_schema, add_balrog_manifest_to_artifacts,
@@ -154,6 +156,49 @@ async def push_to_releases(context):
    copy_beets(context, candidates_keys_checksums, releases_keys_checksums)


async def push_to_maven(context):
    """Push artifacts to locations expected by maven clients (like mvn or gradle)."""
    artifacts_to_beetmove = task.get_upstream_artifacts_with_zip_extract_param(context)
    context.release_props = get_release_props(context)
    context.checksums = dict()  # Needed by downstream calls
    context.raw_balrog_manifest = dict()  # Needed by downstream calls

    mapping_manifest = generate_beetmover_manifest(context)
    validate_bucket_paths(context.bucket, mapping_manifest['s3_bucket_path'])

    context.artifacts_to_beetmove = _extract_and_check_maven_artifacts_to_beetmove(
        artifacts_to_beetmove,
        mapping_manifest,
        context.config.get('zip_max_file_size_in_mb', DEFAULT_ZIP_MAX_FILE_SIZE_IN_MB)
    )

    await move_beets(context, context.artifacts_to_beetmove, mapping_manifest)


def _extract_and_check_maven_artifacts_to_beetmove(artifacts, mapping_manifest, zip_max_file_size_in_mb):
    expected_files = maven_utils.get_maven_expected_files_per_archive_per_task_id(
        artifacts, mapping_manifest
    )

    extracted_paths_per_archive = zip.check_and_extract_zip_archives(
        artifacts, expected_files, zip_max_file_size_in_mb
    )

    number_of_extracted_archives = len(extracted_paths_per_archive)
    if number_of_extracted_archives == 0:
        raise ScriptWorkerTaskException('No archive extracted')
    elif number_of_extracted_archives > 1:
        raise NotImplementedError('More than 1 archive extracted. Only 1 is supported at once')
    extracted_paths_per_relative_path = list(extracted_paths_per_archive.values())[0]

    return {
        'en-US': {
            os.path.basename(path_in_archive): full_path
            for path_in_archive, full_path in extracted_paths_per_relative_path.items()
        }
    }


# copy_beets {{{1
def copy_beets(context, from_keys_checksums, to_keys_checksums):
    creds = get_creds(context)
@@ -217,6 +262,7 @@ def list_bucket_objects(context, s3_resource, prefix):
    # push to candidates is at this point identical to push_to_nightly
    'push-to-candidates': push_to_nightly,
    'push-to-releases': push_to_releases,
    'push-to-maven': push_to_maven,
}


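_extract_and_check_maven_artifacts_to_beetmove keys its result by base name under a hard-coded 'en-US' locale. A small illustration of that last step, with made-up extraction paths:

```python
import os

# Relative path inside the archive -> absolute path on disk after extraction.
extracted_paths = {
    'org/mozilla/geckoview/1.0/geckoview-1.0.aar': '/work/geckoview-1.0.aar',
    'org/mozilla/geckoview/1.0/geckoview-1.0.pom': '/work/geckoview-1.0.pom',
}

artifacts_to_beetmove = {
    'en-US': {
        os.path.basename(path_in_archive): full_path
        for path_in_archive, full_path in extracted_paths.items()
    }
}
print(artifacts_to_beetmove['en-US']['geckoview-1.0.aar'])
# → /work/geckoview-1.0.aar
```

The 'en-US' key exists only because move_beets expects a locale-keyed dict; geckoview itself has no locales, as the template comment below notes.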
39 changes: 28 additions & 11 deletions beetmoverscript/task.py
@@ -14,6 +14,7 @@
    RESTRICTED_BUCKET_PATHS,
    CHECKSUMS_CUSTOM_FILE_NAMING
)
from scriptworker import artifacts as scriptworker_artifacts
from scriptworker.exceptions import ScriptWorkerTaskException

log = logging.getLogger(__name__)
@@ -109,29 +110,45 @@ def add_balrog_manifest_to_artifacts(context):
    utils.write_json(abs_file_path, context.balrog_manifest)


def get_upstream_artifact(context, taskid, path):
Contributor:
Hah, nice cleanup, I didn't know of scriptworker_artifacts.get_and_check_single_upstream_artifact_full_path 👍

Contributor (author):
I just realized I implemented it a while ago in mozilla-releng/scriptworker#95. I knew beetmover had its own logic, but I didn't cc you on it. Sorry about this.

Contributor:
No, absolutely no worries, this is super nice! 👍

    abs_path = os.path.abspath(os.path.join(context.config['work_dir'], 'cot', taskid, path))
    if not os.path.exists(abs_path):
        raise ScriptWorkerTaskException(
            "upstream artifact with path: {}, does not exist".format(abs_path)
        )
    return abs_path


def get_upstream_artifacts(context, preserve_full_paths=False):
    artifacts = {}
    for artifact_dict in context.task['payload']['upstreamArtifacts']:
        locale = artifact_dict['locale']
        artifacts[locale] = artifacts.get(locale, {})
        for path in artifact_dict['paths']:
            abs_path = get_upstream_artifact(context, artifact_dict['taskId'], path)
            abs_path = scriptworker_artifacts.get_and_check_single_upstream_artifact_full_path(
                context, artifact_dict['taskId'], path
            )
            if preserve_full_paths:
                artifacts[locale][path] = abs_path
            else:
                artifacts[locale][os.path.basename(abs_path)] = abs_path
    return artifacts


def get_upstream_artifacts_with_zip_extract_param(context):
    # XXX A dict comprehension isn't used because upstream_definition would be erased if the same
    # taskId is present twice in upstreamArtifacts
    upstream_artifacts_per_task_id = {}

    for artifact_definition in context.task['payload']['upstreamArtifacts']:
        task_id = artifact_definition['taskId']
        upstream_definitions = upstream_artifacts_per_task_id.get(task_id, [])

        new_upstream_definition = {
            'paths': [
                scriptworker_artifacts.get_and_check_single_upstream_artifact_full_path(context, task_id, path)
                for path in artifact_definition['paths']
            ],
            'zip_extract': artifact_definition.get('zipExtract', False),
        }

        upstream_definitions.append(new_upstream_definition)
        upstream_artifacts_per_task_id[task_id] = upstream_definitions

    return upstream_artifacts_per_task_id


def get_release_props(context, platform_mapping=STAGE_PLATFORM_MAP):
    """Determine release props by parsing the Nightly build job's payload, and
    expand them with the props beetmover knows about."""
@@ -151,7 +168,7 @@ def update_props(context, props, platform_mapping):
    `stage_platform` as we need both in the beetmover template manifests."""
    props = deepcopy(props)

    stage_platform = props["platform"]
    stage_platform = props.get('platform', '')
    # for some products/platforms this mapping is not needed, hence the default
    props["platform"] = platform_mapping.get(stage_platform, stage_platform)
    props["stage_platform"] = stage_platform
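The XXX comment in get_upstream_artifacts_with_zip_extract_param explains why a dict comprehension was avoided: a payload can list the same taskId twice, and comprehension keys would silently overwrite. A self-contained restatement of that grouping logic (without the scriptworker path resolution, which the real code performs):

```python
def group_upstream_artifacts(upstream_artifacts):
    # Entries sharing a taskId are appended rather than overwritten, which is
    # why the real code avoids a dict comprehension keyed on taskId.
    grouped = {}
    for definition in upstream_artifacts:
        grouped.setdefault(definition['taskId'], []).append({
            'paths': definition['paths'],  # the real code resolves these to full paths via scriptworker
            'zip_extract': definition.get('zipExtract', False),
        })
    return grouped


payload = [
    {'taskId': 'abc', 'paths': ['public/build/target.maven.zip'], 'zipExtract': True},
    {'taskId': 'abc', 'paths': ['public/logs/live.log']},
]
grouped = group_upstream_artifacts(payload)
```

Both entries for taskId 'abc' survive, one flagged for extraction and one not.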
62 changes: 62 additions & 0 deletions beetmoverscript/templates/maven_geckoview.yml
@@ -0,0 +1,62 @@
---
metadata:
  name: "Maven repository"
  description: "Maps artifacts to spec'd maven location"
  owner: "release@mozilla.com"

s3_bucket_path: maven2/org/mozilla/{{ artifact_id }}/{{ version }}/  # Maven groupId is org.mozilla

mapping:
{% for locale in ['en-US'] %}
  "{{ locale }}":  # Locale is not needed for geckoview; it's used by move_beets, though
Contributor:
Yeah, I feel you. However, this is going to be washed away when declarative artifacts is done.

  {% for product in ['geckoview'] %}
    "{{ artifact_id }}-{{ version }}.aar":
      s3_key: {{ artifact_id }}-{{ version }}.aar
      destinations:
        - {{ artifact_id }}-{{ version }}.aar
    "{{ artifact_id }}-{{ version }}.aar.md5":
      s3_key: {{ artifact_id }}-{{ version }}.aar.md5
      destinations:
        - {{ artifact_id }}-{{ version }}.aar.md5
    "{{ artifact_id }}-{{ version }}.aar.sha1":
      s3_key: {{ artifact_id }}-{{ version }}.aar.sha1
      destinations:
        - {{ artifact_id }}-{{ version }}.aar.sha1
    "{{ artifact_id }}-{{ version }}.pom":
      s3_key: {{ artifact_id }}-{{ version }}.pom
      destinations:
        - {{ artifact_id }}-{{ version }}.pom
    "{{ artifact_id }}-{{ version }}.pom.md5":
      s3_key: {{ artifact_id }}-{{ version }}.pom.md5
      destinations:
        - {{ artifact_id }}-{{ version }}.pom.md5
    "{{ artifact_id }}-{{ version }}.pom.sha1":
      s3_key: {{ artifact_id }}-{{ version }}.pom.sha1
      destinations:
        - {{ artifact_id }}-{{ version }}.pom.sha1
    "{{ artifact_id }}-{{ version }}-javadoc.jar":
      s3_key: {{ artifact_id }}-{{ version }}-javadoc.jar
      destinations:
        - {{ artifact_id }}-{{ version }}-javadoc.jar
    "{{ artifact_id }}-{{ version }}-javadoc.jar.md5":
      s3_key: {{ artifact_id }}-{{ version }}-javadoc.jar.md5
      destinations:
        - {{ artifact_id }}-{{ version }}-javadoc.jar.md5
    "{{ artifact_id }}-{{ version }}-javadoc.jar.sha1":
      s3_key: {{ artifact_id }}-{{ version }}-javadoc.jar.sha1
      destinations:
        - {{ artifact_id }}-{{ version }}-javadoc.jar.sha1
    "{{ artifact_id }}-{{ version }}-sources.jar":
      s3_key: {{ artifact_id }}-{{ version }}-sources.jar
      destinations:
        - {{ artifact_id }}-{{ version }}-sources.jar
    "{{ artifact_id }}-{{ version }}-sources.jar.md5":
      s3_key: {{ artifact_id }}-{{ version }}-sources.jar.md5
      destinations:
        - {{ artifact_id }}-{{ version }}-sources.jar.md5
    "{{ artifact_id }}-{{ version }}-sources.jar.sha1":
      s3_key: {{ artifact_id }}-{{ version }}-sources.jar.sha1
      destinations:
        - {{ artifact_id }}-{{ version }}-sources.jar.sha1
  {% endfor %}
{% endfor %}
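The mapping repeats one pattern per artifact: the file itself plus its .md5 and .sha1 checksums, for the .aar, .pom, -javadoc.jar, and -sources.jar variants. The full list the template expands to can be sketched as follows (a restatement for illustration, not code from the repo):

```python
def maven_file_names(artifact_id, version):
    # One entry per artifact variant, plus its .md5 and .sha1 checksum files.
    suffixes = ['.aar', '.pom', '-javadoc.jar', '-sources.jar']
    names = []
    for suffix in suffixes:
        base = '{}-{}{}'.format(artifact_id, version, suffix)
        names.extend([base, base + '.md5', base + '.sha1'])
    return names


print(len(maven_file_names('geckoview', '63.0.20180830100125')))
# → 12
```

Four variants times three files each yields the twelve entries spelled out in the YAML above.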