Migrate to Cloudflare CDN? #1317

Closed
simon04 opened this issue Nov 14, 2020 · 18 comments
Comments

@simon04
Contributor

simon04 commented Nov 14, 2020

As discussed with @ojeytonwilliams, @raisedadead, @j-f1, @jmerle on 2020-11-12, we could/should investigate whether a migration from MaxCDN to Cloudflare CDN is feasible.

https://www.cloudflare.com/en-gb/cdn/

@simon04
Contributor Author

simon04 commented Nov 21, 2020

Here's my understanding of the devdocs.io infrastructure and the processes involved:


When performing the navigation https://devdocs.io/ → https://devdocs.io/javascript/ → https://devdocs.io/javascript-date/ → https://devdocs.io/javascript/global_objects/date, the following resources are downloaded:

  1. https://devdocs.io/
  2. https://cdn.devdocs.io/assets/application-53506375bf6ce610b1c865f386df93f404e1ea2fa6ecb4bf710d85567471babf.css
  3. https://devdocs.io/assets/application-c9b94069896f3067113b3b77013bb6a2bbba61dc8ffd2a3af9bbbd8aa7f418a6.js
  4. https://cdn.devdocs.io/assets/docs-a4c7830caa1095b449b4095954dc8e323b224a582b1306ab3323be3ba2121fc0.js
  5. https://devdocs.io/service-worker.js
  6. https://devdocs.io/manifest.json
  7. https://cdn.devdocs.io/assets/sprites/docs-a2bbfe361374a7245942cd449a8970197047bbc1be711f31d5ef22473398e92d.png
  8. https://img.shields.io/github/stars/freeCodeCamp/devdocs.svg?style=social
  9. https://devdocs.io/
  10. https://devdocs.io/favicon.ico
  11. https://devdocs.io/manifest.json
  12. https://devdocs.io/assets/application-c9b94069896f3067113b3b77013bb6a2bbba61dc8ffd2a3af9bbbd8aa7f418a6.js
  13. https://cdn.devdocs.io/assets/application-53506375bf6ce610b1c865f386df93f404e1ea2fa6ecb4bf710d85567471babf.css
  14. https://cdn.devdocs.io/assets/sprites/docs-a2bbfe361374a7245942cd449a8970197047bbc1be711f31d5ef22473398e92d.png
  15. https://cdn.devdocs.io/assets/sprites/docs@2x-dbe637a327d0509c75bbc2cc5e21151e4b4f2ea065143d66286b8a759271ba9e.png
  16. https://cdn.devdocs.io/assets/docs-a4c7830caa1095b449b4095954dc8e323b224a582b1306ab3323be3ba2121fc0.js
  17. https://devdocs.io/docs/css/index.json?1605370323
  18. https://devdocs.io/docs/dom/index.json?1543157862
  19. https://devdocs.io/docs/dom_events/index.json?1543099589
  20. https://devdocs.io/docs/html/index.json?1605379887
  21. https://devdocs.io/docs/http/index.json?1605738379
  22. https://devdocs.io/docs/javascript/index.json?1605367875
  23. https://devdocs.io/images/webapp-icon-192.png
  24. https://docs.devdocs.io/javascript/index.html?1605367875
  25. https://cdn.devdocs.io/assets/sprites/docs-a2bbfe361374a7245942cd449a8970197047bbc1be711f31d5ef22473398e92d.png
  26. https://cdn.devdocs.io/favicon.ico
  27. https://docs.devdocs.io/javascript/global_objects/date.html?1605367875
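To see how the requests above split across hosts, the URLs can be grouped by hostname. The following is a minimal sketch using only a subset of the URLs from the list; the names `SAMPLE_URLS` and `requests_per_host` are illustrative and not part of DevDocs.

```python
from collections import Counter
from urllib.parse import urlsplit

# A subset of the URLs listed above (not the full trace).
SAMPLE_URLS = [
    "https://devdocs.io/",
    "https://devdocs.io/service-worker.js",
    "https://devdocs.io/manifest.json",
    "https://cdn.devdocs.io/favicon.ico",
    "https://devdocs.io/docs/javascript/index.json?1605367875",
    "https://docs.devdocs.io/javascript/index.html?1605367875",
    "https://docs.devdocs.io/javascript/global_objects/date.html?1605367875",
    "https://img.shields.io/github/stars/freeCodeCamp/devdocs.svg?style=social",
]


def requests_per_host(urls):
    """Count how many of the given URLs are served by each hostname."""
    return Counter(urlsplit(url).hostname for url in urls)


if __name__ == "__main__":
    for host, count in requests_per_host(SAMPLE_URLS).most_common():
        print(f"{host}: {count}")
```

Run against the full list, the same function would show which hosts (devdocs.io, cdn.devdocs.io, docs.devdocs.io) carry most of the traffic for a single navigation.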

Since both devdocs.io and cdn.devdocs.io are served via a CDN, could we get rid of cdn.devdocs.io without any infrastructural change, i.e. https://cdn.devdocs.io/assets/docs-a4c7830caa1095b449b4095954dc8e323b224a582b1306ab3323be3ba2121fc0.js → https://devdocs.io/assets/docs-a4c7830caa1095b449b4095954dc8e323b224a582b1306ab3323be3ba2121fc0.js?
The latter URL is transferred using brotli compression, thus resulting in fewer bytes (JavaScript size 93.90 KB, transferred gzip 26.01 KB → transferred brotli 19.29 KB).
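The sizes quoted above can be turned into ratios to make the difference concrete. This is a small arithmetic sketch using only the three numbers from the comment; the helper name `percent_of` is illustrative.

```python
def percent_of(part_kb, whole_kb):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part_kb / whole_kb, 1)


# Sizes quoted in the comment above, in KB.
RAW_KB = 93.90
GZIP_KB = 26.01
BROTLI_KB = 19.29

# Transferred size relative to the uncompressed JavaScript.
gzip_ratio = percent_of(GZIP_KB, RAW_KB)
brotli_ratio = percent_of(BROTLI_KB, RAW_KB)

# How much smaller the brotli transfer is compared with the gzip transfer.
brotli_saving_vs_gzip = percent_of(GZIP_KB - BROTLI_KB, GZIP_KB)

if __name__ == "__main__":
    print(f"gzip transfers {gzip_ratio}% of the raw size")
    print(f"brotli transfers {brotli_ratio}% of the raw size")
    print(f"brotli saves {brotli_saving_vs_gzip}% over gzip")
```

For this one file, gzip transfers roughly 27.7% of the raw size and brotli roughly 20.5%, so the brotli transfer is about a quarter smaller than the gzip one.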

Are the actual docs (currently served via docs.devdocs.io) not available from devdocs.io?

@Mrugesh @raisedadead, @ojeytonwilliams, @jmerle, please correct me, if I'm wrong.

@Thibaut
Member

Thibaut commented Nov 22, 2020

MaxCDN is discontinuing their sponsorship program, which since 2013 has provided DevDocs with free CDN. freeCodeCamp already uses Cloudflare extensively, so we should look to migrate soon.

cdn.devdocs.io fronts devdocs.io, with a number of rules so that the app doesn't get exposed in a broken state through that domain, and only asset-related requests go through. Here they are:

[Screenshot of the MaxCDN rules for cdn.devdocs.io: Screen Shot 2020-11-22 at 15 42 53]

Rule 104173 adds the Access-Control-Allow-Origin "*" header to the /manifest.json path. Without it the app would break as it would fail to load the manifest with XHR due to the same-origin policy.
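The effect of that rule can be sketched as a function that post-processes response headers. This is only an illustration of the behaviour described above, not the MaxCDN or Cloudflare configuration; the function name `apply_edge_rules` and the choice to ignore the query string when matching are assumptions.

```python
def apply_edge_rules(path, headers):
    """Return a copy of the response headers with the manifest CORS rule applied."""
    result = dict(headers)
    # Assumption: the rule matches on the path only, ignoring any query string.
    if path.split("?", 1)[0] == "/manifest.json":
        # Let any origin read the manifest, so an XHR from devdocs.io
        # to the CDN domain is not blocked by the same-origin policy.
        result["Access-Control-Allow-Origin"] = "*"
    return result


if __name__ == "__main__":
    print(apply_edge_rules("/manifest.json", {"Content-Type": "application/json"}))
    print(apply_edge_rules("/favicon.ico", {"Content-Type": "image/x-icon"}))
```

Whatever replaces MaxCDN would need an equivalent rule for the same path.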

docs.devdocs.io fronts devdocs-assets.s3.amazonaws.com. This is an S3 bucket already operated by freeCodeCamp. There is also a rule to enable CORS in such a way that the browser doesn't constantly make OPTIONS requests:

[Screenshot of the MaxCDN CORS rule for docs.devdocs.io: Screen Shot 2020-11-22 at 15 48 44]

Lastly, the doc packages downloaded by thor docs:download are stored in a storage zone in MaxCDN similar to S3, exposed at dl.devdocs.io. The main reason for using MaxCDN was free bandwidth (S3 can be pricey). We use on the order of 100-200 GB per month (and 1.5 GB of storage). This has no special rules.

I recommend migrating by setting up new subdomains on devdocs.io for each of these three things (e.g. assets.devdocs.io as a replacement for cdn.devdocs.io), updating the app to use them one by one. This way we can easily revert.
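One way to picture the "one by one, easy to revert" approach is a per-service host table with a set of services that have already been switched. This is a hedged sketch, not how the DevDocs app is configured: only assets.devdocs.io is suggested in the thread, the other two new host names are hypothetical placeholders, and `host_for` and `asset_url` are illustrative helpers.

```python
# Hosts currently in use, as described in the thread.
OLD_HOSTS = {
    "assets": "cdn.devdocs.io",
    "docs": "docs.devdocs.io",
    "downloads": "dl.devdocs.io",
}

# Replacement hosts. Only assets.devdocs.io is suggested in the thread;
# the other two names are hypothetical placeholders.
NEW_HOSTS = {
    "assets": "assets.devdocs.io",
    "docs": "new-docs.devdocs.io",
    "downloads": "new-dl.devdocs.io",
}


def host_for(service, migrated):
    """Pick the host for a service; `migrated` holds the services already switched."""
    return NEW_HOSTS[service] if service in migrated else OLD_HOSTS[service]


def asset_url(path, migrated=frozenset()):
    """Build an asset URL on whichever asset host is currently active."""
    return f"https://{host_for('assets', migrated)}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(asset_url("/favicon.ico"))                # still on the old host
    print(asset_url("/favicon.ico", {"assets"}))    # after switching assets only
```

Reverting a single service then means removing it from the migrated set, while the other services stay where they are.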

@ojeytonwilliams
Contributor

Thanks @Thibaut and @simon04, that info helps a ton. I'll get to work figuring out how to translate this into something Cloudflare understands.

@simon04
Contributor Author

simon04 commented Nov 25, 2020

@Thibaut, thanks for the insights in the current infrastructure setup. A few questions:

  • Where are the rules configured which are shown in your screenshots?
  • "with a number of rules so that the app doesn't get exposed in a broken state" → Those rules aren't shown in the screenshot?

For a migration away from MaxCDN we would need to…

  1. …get rid of cdn.devdocs.io, possibly simply by changing all links from cdn.devdocs.io to devdocs.io?
  2. …get rid of the 1.5 GB MaxCDN storage, possibly simply by moving the files to a static file hosting platform and putting Cloudflare in front?

Here's my updated understanding of the infrastructure:
[Image: Screenshot_2020-11-25 Mermaid live editor, rendering of the following graph]

graph TB
devdocs_user(user on devdocs.io)
thor_up(thor docs:upload)
Travis(Travis CI)
thor_down(thor docs:download)
subgraph domains
  devdocs.io
  cdn.devdocs.io
  devdocs.devdocs.netdna-cdn.com
  dl.devdocs.io
  docs.devdocs.io
  devdocs-assets.s3.amazonaws.com
  devdocs-dl.devdocs.netdna-cdn.com
end
subgraph servers
  Heroku
  S3[(S3)]
  MaxCDN[(MaxCDN)]
end
devdocs_user --> docs.devdocs.io --"+CORS headers"--> devdocs-assets.s3.amazonaws.com --> S3
devdocs_user --> cdn.devdocs.io --> devdocs.devdocs.netdna-cdn.com --"only assets"--> devdocs.io 
devdocs_user --> devdocs.io --"CloudFlare"--> Heroku
dl.devdocs.io --> devdocs-dl.devdocs.netdna-cdn.com --> MaxCDN
Travis -.-> thor_down --> dl.devdocs.io
Travis -.-> Heroku
thor_up -.-> S3
thor_up -.-> MaxCDN

@Thibaut
Member

Thibaut commented Dec 6, 2020

Where are the rules configured which are shown in your screenshots?

In the MaxCDN control panel. The freeCodeCamp team should have access.

Those rules aren't shown in the screenshot?

All those rules are in the screenshots. I meant that for example if we didn't re-implement the rule that adds the Access-Control-Allow-Origin "*" response header on manifest.json, the app would fail to initialize.

…get rid of cdn.devdocs.io, possibly simply by changing all links from cdn.devdocs.io to devdocs.io?

I would implement an alternative CDN subdomain. devdocs.io points to Heroku. As much as possible we should shield Heroku from traffic, so freeCodeCamp incurs minimal hosting costs (CDN traffic is much cheaper than Heroku). It's also much faster going through a CDN than Heroku.

…get rid of the 1.5 GB MaxCDN storage, possibly simply by moving the files to a static file hosting platform and putting Cloudflare in front?

Yep that'd work; S3 for example. Although we wouldn't even need a CDN in front unless that happens to reduce costs. This bit only powers the thor docs:download command, which doesn't need maximum speed. I used MaxCDN's file storage at the time instead of S3 because the former was free.

Here's my updated understanding of the infrastructure:

Looks accurate.

@ojeytonwilliams
Contributor

Thanks again for looking into this @simon04 and @Thibaut for confirming.

My current plan is just S3 to store everything with Cloudflare in front and some extra caching (if necessary) to minimise the transfers from S3. That and some proxies to make sure all the rules get applied.

This bit only powers the thor docs:download command, which doesn't need maximum speed

Am I right in thinking this is only used in local development? Or is there another use?

@simon04
Contributor Author

simon04 commented Dec 7, 2020

Am I right in thinking this is only used in local development? Or is there another use?

It's also used in Travis CI for deploying to Heroku as sanity check. See https://travis-ci.com/github/freeCodeCamp/devdocs/builds/201642794#L1107 for an example where the check (and thus the deployment) failed because I forgot to thor docs:upload before merging a PR.

@ojeytonwilliams
Contributor

Ah, good to know. I searched for docs:download, but wasn't sure if I'd missed something, so thanks!

It sounds like it would be better to put this behind the CDN, given that it's every doc and happens on every merge to master.

@simon04
Contributor Author

simon04 commented Dec 23, 2020

@ojeytonwilliams, out of curiosity: How're you doing in the CDN migration process? Do you need any help?

@ojeytonwilliams
Contributor

It's going reasonably well, I just had to focus on a few other things. Those are more or less under control now, so I should be able to dedicate myself to this again.

Thanks for the offer of help. I need to change a few bits of the code (various hard-coded links to the existing infrastructure, mainly), so I'll definitely reach out if I get horribly confused. Also, how's your Nginx knowledge - could I run things by you if it's being weird?

@simon04
Contributor Author

simon04 commented Dec 23, 2020

I do operate a few Nginx servers with a bunch of more or less complex directives (proxy_redirect, add_header, redirects via return).

@ojeytonwilliams
Contributor

I do operate a few Nginx servers with a bunch of more or less complex directives (proxy_redirect, add_header, redirects via return).

Awesome. Well, I've not run into huge problems yet, but I'll reach out if I get stuck. Either way, a quick review of the configs before we put them into use would be great.

@ojeytonwilliams
Contributor

Hey, sorry for the long delay. To be transparent, the fCC team has been making a big push to get freecodecamp.org ready for translation and I've been heavily involved with that. My part in that is largely over, though, so now I can rededicate myself to this.

I've set up a staging site at devdocs.in which replaces the cdn.devdocs links with just devdocs. The code change for this is on my fork, here. I've tested to make sure that a change in the code propagates through to the site, which it does: the popup asking for a reload appears as expected. I've also verified that it's hitting CF's cache, so it's looking good.

Part of the delay was that I tried to maintain the existing structure whereby the assets were proxied to assets.devdocs, but I could not find a way to do that. Everything I tried resulted in Nginx either redirecting from assets.devdocs to devdocs or, if I tried to make it transparent, it would get into an infinite redirect loop. @simon04 I don't think we need to do this any more, but do you happen to know if it's even possible?

Finally, the proxying to docs.devdocs and dl.devdocs isn't 100% done, but I've been making good progress and that's my next task.

@simon04
Contributor Author

simon04 commented Jan 29, 2021

Wow, this sounds very promising. As expected, I couldn't find any obvious problems when clicking around on https://devdocs.in/.

The docs are fetched from documents.devdocs.in now. Is this the replacement for docs.devdocs.io, but being served via Cloudflare?

Your question regarding nginx redirects does not ring a bell. I'd probably have to call to mind the infrastructure from #1317 (comment) again. Since you've already resolved the issue, this is no longer required...

@ojeytonwilliams
Contributor

This should be resolved now. Thanks for bearing with me while I figured out how to do this.

Since it's live, I'm going to close this. If anyone notices any problems because of the migration, we can track them in new issues.

@simon04
Contributor Author

simon04 commented Feb 18, 2021

Awesome, @ojeytonwilliams! Thank you for finishing this monster task. 🚀

@ojeytonwilliams
Contributor

You're very welcome, Simon, and thanks for putting together the infrastructure charts. They really helped me understand how things came together.

It was a fun challenge and I'm glad that it seems to have been a smooth transition.

@Thibaut
Member

Thibaut commented Feb 19, 2021

Impressive! I can confirm that traffic on MaxCDN has completely stopped. This is a big step towards making DevDocs a sustainable long-term project. Thank you so much!

rchl added a commit to rchl/alfred-devdocs that referenced this issue Mar 13, 2021
The workflow was no longer working due to upstream CDN changes.
See freeCodeCamp/devdocs#1317
yannickglt pushed a commit to yannickglt/alfred-devdocs that referenced this issue Mar 17, 2021
* Make it work after upstream changes

The workflow was no longer working due to upstream CDN changes.
See freeCodeCamp/devdocs#1317