2019Q4 - 3.3 Create image upload service (for eventual cloud migration) #108

Open
miketaylr opened this issue Oct 1, 2019 · 11 comments

@miketaylr miketaylr commented Oct 1, 2019

☁️

@miketaylr miketaylr commented Oct 1, 2019

Stretch goal: remove duplicate and/or unused images on the server, i.e. images which are not referenced from anywhere. This saves space and potentially removes images which were never meant to be there.

@miketaylr miketaylr added this to Planned in 2019 Q4 OKRs Oct 1, 2019

@karlcow karlcow commented Oct 1, 2019

24-nuage-bleu

@karlcow karlcow commented Oct 1, 2019

  • We need to take into account that we sometimes get requests to delete things, and/or that we just need to delete things.
  • We need to take into account the legality of hosting something we didn't decide to put up there.
@karlcow karlcow self-assigned this Oct 1, 2019
@karlcow karlcow moved this from Planned to In progress in 2019 Q4 OKRs Oct 3, 2019

@karlcow karlcow commented Oct 15, 2019

I have started trying to keep a local backup of all issues and their comments. But the code I am currently using is hampered by rate limits, so I need to explore ways of improving it.
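
One possible way to cope with the rate limits, as a minimal sketch: read the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that the GitHub API returns and sleep until the window resets. The function name `seconds_until_reset` is hypothetical, and the sketch assumes the headers are passed as a plain dict with these exact key spellings.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next GitHub API call.

    Returns 0 while requests remain in the current rate-limit window.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is a UTC timestamp in epoch seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, reset_at - now)
```

The backup script could call `time.sleep(seconds_until_reset(response.headers))` after each request, so that it pauses only when the quota is exhausted.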

@karlcow karlcow commented Oct 16, 2019

We can probably do this in a couple of phases.

  • phase A: We could first move the new images to a hosting solution.
  • phase B: Remove orphan images.
  • phase C: Move images already on webcompat servers to the cloud hosting solution.
  • (option) phase D: Have a checker script which erases orphaned images once a month.

(Out of scope: images hosted on GitHub.)
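
For phases B and D, the orphan check could look roughly like the sketch below. It assumes that we already have the text of every issue and comment body locally, and that uploads follow the `/uploads/YYYY/MM/name.ext` path layout. The names `referenced_uploads` and `orphan_uploads` are hypothetical.

```python
import re

# Matches upload paths such as /uploads/2019/11/foo.jpeg inside issue text.
# A trailing sentence period is not captured as part of the file name.
UPLOAD_PATH = re.compile(r"/uploads/\d{4}/\d{1,2}/[\w-]+(?:\.[\w-]+)*")


def referenced_uploads(bodies):
    """Collect every upload path mentioned in the given issue/comment bodies."""
    found = set()
    for body in bodies:
        found.update(UPLOAD_PATH.findall(body))
    return found


def orphan_uploads(paths_on_disk, bodies):
    """Return the upload paths present on disk but referenced nowhere."""
    return sorted(set(paths_on_disk) - referenced_uploads(bodies))
```

Note that this treats thumbnails independently: a `foo-thumb.jpeg` whose full-size image is referenced, but which is never linked itself, would be reported as an orphan, so a real checker would need a rule for that.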

@karlcow karlcow commented Oct 16, 2019

static assets

This OKR, as currently phrased, does not include uploading the static assets to an external hosting solution.

Flask S3 is an unmaintained package which does that automatically. Maybe there is something to learn from the code.

https://github.com/e-dard/flask-s3

@karlcow karlcow commented Oct 29, 2019

2019-10-29

Progress is being tracked. Read the comments.
I need to talk money with @miketaylr

@karlcow karlcow commented Nov 8, 2019

2019-11-08

There might be an intermediate stage which gives us more flexibility.
Instead of migrating all the images to a cloud solution right away, we can separate webcompat.com into two services. Currently the images are saved directly on the server, but we can perfectly well imagine sending them to the server with an HTTP PUT by creating a separate instance (a microservice for data).

We can make sure that only webcompat.com is allowed to PUT to this service. The service will save the images in the exact same place. We can also store other data types (like console.log JSON files).

If/when we switch to a cloud hosting solution, it would be just a matter of changing the destination for the server.
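
To illustrate why switching later would only be a change of destination, here is a minimal sketch in which the upload code only depends on a `save` method. All the names (`LocalStorage`, `MemoryStorage`, `store_upload`) are hypothetical, and `MemoryStorage` is just a stand-in for whatever cloud backend we would pick.

```python
from pathlib import Path, PurePosixPath


def _check_relative(relative_path):
    """Reject absolute paths and parent-directory escapes."""
    path = PurePosixPath(relative_path)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"unsafe upload path: {relative_path}")


class LocalStorage:
    """Saves files under a root directory, as the server does today."""

    def __init__(self, root):
        self.root = Path(root)

    def save(self, relative_path, data):
        _check_relative(relative_path)
        target = self.root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return str(target)


class MemoryStorage:
    """Stand-in for a future cloud backend; keeps the data in a dict."""

    def __init__(self):
        self.objects = {}

    def save(self, relative_path, data):
        _check_relative(relative_path)
        self.objects[relative_path] = data
        return relative_path


def store_upload(storage, relative_path, data):
    """The calling code only knows about save(), not where the data goes."""
    return storage.save(relative_path, data)
```

With this shape, moving to a cloud solution would mean writing one more class with the same `save` method and passing it to `store_upload`.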

@karlcow karlcow commented Nov 20, 2019

HUH. The last part of this blog post.
http://www.otsukare.info/2019/11/20/saving-images-microservices

GitHub makes a local copy of every single linked image AND links to that copy once the copy is done.

@karlcow karlcow commented Nov 26, 2019

I am in the process of creating a prototype to upload the images. There will be progress this week. This may be at risk, given there are 3 weeks left and our needsdiagnosis count went up. But let's hope.

@miketaylr miketaylr changed the title from "2019Q4 - 3.3 Migrate webcompat image hosting to cloud" to "2019Q4 - 3.3 Create image upload service (for eventual cloud migration)" Nov 26, 2019

@karlcow karlcow commented Nov 28, 2019

This is an assessment of my failure to understand the problem space. The more I dig into the code, the more issues I uncover. I don't think this will be solved in 2019Q4. There are many moving pieces, and I have probably forgotten more issues in there.

Option 1: Requirements for AWS S3

webcompat.com Code

Create AWS account for S3

We will need to pay something at some point. How do we handle the testing/development phase?

Maybe we can use https://github.com/jserver/mock-s3, but it is not maintained, Python 2 only, and totally outdated.
This seems to be a better choice: https://github.com/spulec/moto

Testing

A couple of things need to be modified for testing. Currently we are mocking the file storage function.
https://github.com/webcompat/webcompat.com/blob/c06feb1a08075e3d485634ebf9e22f540497b1c5/tests/unit/test_uploads.py#L50-L52

We need to add tests for the error conditions returned by AWS S3 and for how they impact our own processing.

This probably has consequences for the UI. What happens when the user pushes the button to report an issue and the screenshot fails to upload to AWS S3 because of the tubes (Amazon down, network error, etc.)?
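
A rough sketch of how such a failure could be simulated in a unit test with `unittest.mock`, without touching AWS at all. `StorageError` and `upload_screenshot` are hypothetical names, not existing webcompat.com code, and the shape of the returned dict is only an assumption.

```python
from unittest import mock


class StorageError(Exception):
    """Hypothetical error raised when the remote storage is unreachable."""


def upload_screenshot(storage, name, data):
    """Try to store a screenshot; report the failure instead of raising."""
    try:
        url = storage.save(name, data)
    except StorageError as error:
        return {"ok": False, "error": str(error)}
    return {"ok": True, "url": url}


# Simulate the storage backend being down.
broken = mock.Mock()
broken.save.side_effect = StorageError("storage unavailable")
result = upload_screenshot(broken, "foo.jpeg", b"image-bytes")
```

Here `result` carries the error message instead of an exception, which gives the UI something to display when the upload does not go through.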

Modify the upload api code

This basically means updating the try: part to make an HTTP POST to AWS S3 instead of calling the save function.
It also means that we need to massage the information returned by AWS S3 into a JSON object that webcompat.com understands.

Assess which images are really necessary

There are probably images which do not need to be uploaded: issues or comments have been deleted, but their images are still there on webcompat.com. We need to go through the 40,000+ issues and associated comments to evaluate this.

🚨 Link to previous images

  • Either rewrite all the bodies of comments and issues, replacing the old URLs with new bucket-style URLs,
  • or create a big table in nginx mapping all the old URLs to the new ones.
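
For the first of these two options, the rewriting itself could be sketched as below. The function name `rewrite_upload_urls` and the new base URL are hypothetical; the real bucket-style URL depends on the hosting choice.

```python
import re

# Old-style image URLs as they appear in issue and comment bodies.
OLD_UPLOAD_URL = re.compile(r"https://webcompat\.com/uploads/(?P<path>[\w./-]+)")


def rewrite_upload_urls(body, new_base):
    """Point every old upload URL in body at new_base, keeping the path."""
    new_base = new_base.rstrip("/")
    return OLD_UPLOAD_URL.sub(
        lambda match: f"{new_base}/{match.group('path')}", body
    )
```

Each rewritten body would still have to be sent back through the GitHub API, so this option is subject to the same rate limits as the backup.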

Option 2: Requirements for a microservice

create a new subdomain assets.webcompat.com

Done at the DigitalOcean console level?

configure nginx

We need to create a new, dedicated configuration file for nginx.

configure certbot

Done in nginx, but needs the cron job?

configure uwsgi

new uwsgi file with logging to configure.

deploy script to staging server

This would probably require modifying the current script to target a different repo. Hmmm, more overhead. And more maintenance.

assets.webcompat.com code

code the new micro-service

This can be done in Flask or Bottle. The code is relatively easy to move in principle, but the devil is in the details.

issues with CSP and redirection from webcompat.com to assets.webcompat.com

To minimize rewriting we could imagine that

https://webcompat.com/api/upload

route is being redirected to

https://assets.webcompat.com/upload

There are probably CSP issues with that.

webcompat.com code

remove relevant tests from webcompat.com code

The modifications will break tests here and there. This can be managed.
And the tests need to be moved to the new app.

url redirection for previously uploaded images.

To avoid having to rewrite all the GitHub bodies, we would redirect the paths from

https://webcompat.com/uploads/2019/11/foo.jpeg
https://webcompat.com/uploads/2019/11/foo-thumb.jpeg

to

https://assets.webcompat.com/uploads/2019/11/foo.jpeg
https://assets.webcompat.com/uploads/2019/11/foo-thumb.jpeg
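
Since the path is kept as is and only the host changes, the redirect target is easy to compute. A minimal sketch, with the hypothetical name `assets_url`; where the redirect would actually live (nginx or the app) is still open.

```python
from urllib.parse import urlsplit, urlunsplit


def assets_url(old_url, assets_host="assets.webcompat.com"):
    """Return the assets.webcompat.com URL for an old upload URL.

    Returns None for URLs which are not webcompat.com uploads.
    """
    parts = urlsplit(old_url)
    if parts.netloc != "webcompat.com" or not parts.path.startswith("/uploads/"):
        return None
    # Only the host changes; scheme, path and query are preserved.
    return urlunsplit(parts._replace(netloc=assets_host))
```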

rewrite routes and JS code doing the HTTP POST, or handle that with a redirection?

Currently the XHR upload is done when posting the full issue.
We can modify the JS script to point to the new URL,
or we can redirect, but redirection might create CSP issues 🚨.

Maybe modifying the JS is easier.
