
Azure storage #118

Merged
merged 23 commits into jumaffre:main from azure_storage
Apr 12, 2021

Conversation

achamayou
Collaborator

@achamayou achamayou commented Apr 5, 2021

Spawned off of #103, the idea is to:

  1. stop using a custom branch in the repo for storage
  2. curb growth by storing runs for the same PR/monitoring issue in place

I was initially worried about the implications of 2. with concurrent jobs, but it seems to me that with batching/autoCancel it is in fact a non-issue. If that turns out not to be the case, I suppose we can look at using ETags. I've added this as a side option for now, but my intention is for it to replace the branch-based storage altogether.
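For reference, the ETag fallback mentioned above amounts to compare-and-swap: read the blob's ETag, then only write if it hasn't changed. A minimal sketch of that idea, using an in-memory stand-in for the container (with the real `azure-storage-blob` SDK the same check is expressed via the `etag` and `match_condition` arguments to `upload_blob`; all names here are illustrative):

```python
import uuid


class FakeBlobStore:
    """In-memory stand-in for a blob container, to illustrate ETag CAS."""

    def __init__(self):
        self._blobs = {}  # name -> (etag, data)

    def get(self, name):
        return self._blobs.get(name, (None, None))

    def put_if_match(self, name, data, expected_etag):
        """Write only if the blob is unchanged since the caller read it."""
        current_etag, _ = self.get(name)
        if current_etag != expected_etag:
            return None  # lost the race: caller should re-read and retry
        new_etag = uuid.uuid4().hex
        self._blobs[name] = (new_etag, data)
        return new_etag


store = FakeBlobStore()
etag = store.put_if_match("pr_118.png", b"run-1", expected_etag=None)
store.put_if_match("pr_118.png", b"run-2", expected_etag=etag)
# A writer holding a stale etag is rejected instead of clobbering run-2:
assert store.put_if_match("pr_118.png", b"stale", expected_etag=etag) is None
```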

Assuming we're happy with this, the plan is to:

  1. release, enable in CCF
  2. remove the repo branch code path, which I don't think we seriously want to continue using

I don't think it's worth spending time migrating old PRs, but we could keep the branch around for, say, six months before dropping it. Also pinging @eddyashton for review/feedback.

@cimetrics
Collaborator

cimetrics commented Apr 5, 2021

`azure_storage@703` (aka 20210412.3) vs `main` EWMA over 13 builds, from 637 to 687

Click to see table
| build_id | build_number | Throughput (tx/s) ^ | Latency (ms) | Peak working set size (GB) | Accuracy (%) ^ | Error rate (%) | Memory fragmentation (%) | CPU usage (%) | New metric (U) |
|---|---|---|---|---|---|---|---|---|---|
| 637 | 20210205.1 | 48261 | 10 | 46 | 0.68 | 0.05 | 0.02 | 0.28 | 189 |
| 638 | 20210205.2 | 50232 | 11 | 51 | 0.69 | 0.04 | 0.019 | 0.32 | 101 |
| 646 | 20210205.10 | 47148 | 10 | 55 | 0.72 | 0.05 | 0.02 | 0.31 | 129 |
| 647 | 20210205.11 | 49154 | 10 | 49 | 0.68 | 0.05 | 0.022 | 0.32 | 197 |
| 652 | 20210205.16 | 47145 | 9 | 52 | 0.69 | 0.06 | 0.02 | 0.3 | 100 |
| 653 | 20210205.17 | 50877 | 10 | 55 | 0.7 | 0.04 | 0.022 | 0.28 | 100 |
| 655 | 20210326.1 | 46562 | 10 | 51 | 0.69 | 0.06 | 0.021 | 0.31 | 155 |
| 658 | 20210326.4 | 50718 | 9 | 53 | 0.68 | 0.05 | 0.021 | 0.29 | 128 |
| 660 | 20210326.6 | 46897 | 9 | 48 | 0.73 | 0.05 | 0.022 | 0.28 | 133 |
| 664 | 20210401.3 | 49393 | 11 | 53 | 0.73 | 0.05 | 0.019 | 0.29 | 188 |
| 669 | 20210401.8 | 51450 | 9 | 54 | 0.72 | 0.06 | 0.022 | 0.29 | 191 |
| 686 | 20210402.15 | 47190 | 9 | 45 | 0.69 | 0.04 | 0.02 | 0.29 | 190 |
| 687 | 20210402.16 | 51510 | 9 | 50 | 0.71 | 0.06 | 0.021 | 0.28 | 192 |

(metric comparison images)

@achamayou achamayou marked this pull request as draft April 5, 2021 17:34
```diff
@@ -51,6 +51,8 @@ steps:
   - script: python -m cimetrics.github_pr
     env:
       GITHUB_TOKEN: $(GITHUB_TOKEN)
+      AZURE_BLOB_URL: $(AZURE_BLOB_URL)
+      AZURE_WEB_URL: $(AZURE_WEB_URL)
```
Collaborator Author


It'd be really nice to derive the second one from the first one, but I haven't worked out how to do it.

@achamayou achamayou marked this pull request as ready for review April 6, 2021 19:31
@letmaik
Collaborator

letmaik commented Apr 6, 2021

Re curbing growth, there's an auto-delete feature for blobs. It has to be set up with rules etc. on the storage account, not when creating the blob, so it's not something to implement in cimetrics, but it should probably be hinted at in the readme.
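The auto-delete feature referred to here is Azure Blob Storage lifecycle management, configured as a policy on the storage account. A minimal policy sketch, assuming a hypothetical `metrics/` prefix and 180-day retention (both values are illustrative, not from this PR):

```json
{
  "rules": [
    {
      "name": "expire-old-metrics",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["metrics/"]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 180 }
          }
        }
      }
    }
  ]
}
```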

@achamayou
Collaborator Author

@letmaik yeah, I saw that, but I think at one PNG per PR (as opposed to one per push now) we may not need to worry about cleanup all that soon.

@letmaik
Collaborator

letmaik commented Apr 6, 2021

Note to self: discuss http caching when overwriting images

@letmaik
Collaborator

letmaik commented Apr 7, 2021

The GitHub-proxied cimetrics image in this PR has `cache-control: public, max-age=31536000`, and when editing the comment the URL doesn't change (same `camo.githubusercontent.com` URL). The original blob image has no `cache-control` header.

According to https://docs.github.com/en/github/authenticating-to-github/about-anonymized-image-urls#an-image-that-changed-recently-is-not-updating, it seems that if the origin server doesn't set `cache-control`, GitHub adds its own header with an effectively infinite lifetime.

I can think of two options:

  1. Set `cache_control` in `content_settings` when creating the blob, maybe to something like 1 minute or even `no-cache`.
  2. Don't reuse URLs and come up with a scheme that allows you to easily garbage collect old versions. This would probably mean listing all blobs named .../<pr_id>_.png and deleting all of them, before uploading the new one.

Without a doubt, option 1 is much easier, though it incurs a slight latency/bandwidth hit on follow-up requests. It's probably not too bad, though, and we could experiment with it in this PR.
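Option 2's garbage collection can be sketched in plain Python: before uploading a freshly named image, find every older blob for the same PR and delete it. The naming scheme and blob list below are hypothetical, for illustration only:

```python
def stale_versions(existing, pr_id, new_name):
    """Return blobs for this PR that should be deleted before uploading
    new_name. `existing` stands in for a container listing."""
    prefix = f"{pr_id}_"
    return [
        name for name in existing
        if name.startswith(prefix) and name.endswith(".png") and name != new_name
    ]


# Hypothetical container contents: two old versions for PR 118, one for PR 42.
existing = ["118_001.png", "118_002.png", "42_007.png"]
to_delete = stale_versions(existing, pr_id="118", new_name="118_003.png")
assert to_delete == ["118_001.png", "118_002.png"]  # PR 42's image is untouched
```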

@achamayou
Collaborator Author

@letmaik I'm not sure it will work on this PR because: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.models.ContentSettings?view=azure-python-previous

> `cache_control` (str)
> If the cache_control has previously been set for the blob, that value is stored.

This implies that it may not be stored if the value wasn't set to start with. Anyway we can change the prefix if this really is the case.

@letmaik
Collaborator

letmaik commented Apr 7, 2021

> @letmaik I'm not sure it will work on this PR because: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.models.ContentSettings?view=azure-python-previous
>
> > `cache_control` (str)
> > If the cache_control has previously been set for the blob, that value is stored.
>
> This implies that it may not be stored if the value wasn't set to start with. Anyway we can change the prefix if this really is the case.

I think it just means that it inherits an earlier value but you can also set a new one.

@achamayou
Collaborator Author

@letmaik it seems to work either way.

@letmaik
Collaborator

letmaik commented Apr 7, 2021

Looks good.

@achamayou achamayou merged commit 5352eac into jumaffre:main Apr 12, 2021