Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monorepo #8

Merged
merged 32 commits into from
Mar 26, 2022
Merged

Monorepo #8

merged 32 commits into from
Mar 26, 2022

Conversation

berekuk
Copy link
Collaborator

@berekuk berekuk commented Mar 21, 2022

PR for #4.

Due to the nature of this PR it wasn't really possible to come up with a clear diff, sorry.

I tried to keep the changes to the minimum whenever possible, but sometimes it wasn't practical, so I opted for documenting everything I changed instead.

Code layout:

  • Code is in src/ to avoid too many root-level dirs
  • Some files are necessary in root, e.g. tailwind & postcss configs, but the number is minimal, so it shouldn't be a problem (I have some previous experience with a similar setup)
  • Backend in src/backend/, frontend in src/web/; I'd like to refactor this further later

JS/TS setup:

  • I settled on npm instead of yarn (both were used), there are some potential issues with using both, and I think npm is good enough these days unless proven otherwise
  • I added some basic and very forgivable typescript configuration; mostly due to "type":"module" issues in package.json which made the frontend & backend code not very compatible in the monorepo setup
    • This could be fixed in other ways but ESM mess is so annoying that I didn't want to deal with it twice (not my area of expertise, tbh)
    • I converted some other files to ts/tsx, mostly to help with minor refactorings (I'll convert the rest in a separate PR later)
  • Backend CLI can now be invoked with ts-node src/backend/index.js, or with npm run cli which wraps the same command

Basic coding style:

  • I prettier-ized everything (because I edited some code and my vscode kept reformatting it and because this PR is a mess anyway)

Configuration:

  • As far as I could figure out, some conf were in .env and some were in input/secrets.json; I believe .env is enough and I'm moving in that direction, but I haven't reviewed all the code that uses getSecret yet
  • Next.js recommends using .env (and other files like .env.production) for env and secrets too
  • For Netlify we can set up all secrets in its web admin (I'll check this later, see "Stuff that's not ready yet" below)
  • There's an issue with CLI scripts (entry point is now in src/backend/index.js) that they don't pull .env vars by default; I set up dotenv for this and imported it in src/backend/index.js and src/backend/manual/manualDownload.js but there are probably other one-time CLI scripts which need it too

New:

  • I added POSTGRES_NO_SSL=1 option for local postgres (it's used in src/backend/database/pg-wrapper.js)

API:

  • I reintegrated API code to src/pages/api, as we discussed
  • Most methods are intact, some are renamed since you said API is not stable yet & it's not a big deal; I consider it all temporary anyway since we'll move in GraphQL direction soon
  • I removed /api/forecasts-by-id and getDashboardForecasts, I believe it was superseded by /api/dashboard-by-id (correct me if got this wrong, please)
  • I dropped duplicate code from api (pg-wrapper, mongo-wrapper), these were somewhat different from their backend versions.
    • Might be important! I haven't read the code carefully but assumed that backend version is correct/good enough, please let me know if I'm wrong here.
  • I haven't enabled CORS to API endpoints. I can add it if necessary, but api in this branch on /api on the same domain, so it works anyway.

New frontpage setup:

  • So, Netlify is serverless, and json files won't work there; so I had to create a new DB table latest.frontpage which just stores frontpage_full and frontpage_sliced columns
  • Also, Netlify doesn't have cron; /api/cron/update-frontpage is now responsible for populating the DB, and it can be called from github actions (see: https://vercel.com/docs/concepts/solutions/cron-jobs; this doc is from Vercel but should be identical on Netlify)
  • Github actions are not set up properly yet, but otherwise it works fine
    • TODO: add auth token for /api/cron/update-frontpage so that we don't get DDOSed (will do this before merge if we decide to go with this approach)
  • BTW, I'm not even sure if all this is needed; if I guessed correctly that frontpage_sliced.json is just a performance optimization, then similar effect can be achieved with getStaticProps and incremental static regeneration
    • But there was a log of all frontpage changes in a file so I assumed you considered it useful for something, so I repeated it as is - the new latest.frontpage table is append-only and I don't remove old data

Cleanups:

  • I removed lib/data dir from the frontend (it wasn't used anywhere, please let me know if I shouldn't have done this)

Minor fixes:

  • Initialized documentContents to [] in mongoRead (it failed on an empty DB otherwise)
  • Added a call to pgInitializeScaffolding in pg-wrapper
  • Sometimes in the process of converting files from js to ts I stumbled on bugs such as passing useless className props to custom React components; I just fixed those as I went

Stuff that's not ready yet:

  1. Env configuration; I'm in the process of setting up a separate netlify deployment to check that everything works properly, and will finish getSecret refactoring in further commits.
  2. New README; old README files are stale and I'll have to rewrite it significantly before we merge this.
  3. Cron setup; I have a plan on how netlify + DO database + github actions for cron would be enough for everything, it might be too radical for you, though (no CLI! no local json files!), not sure yet, we can settle on other setups if you don't like this one :) will describe it in more details soon.

@berekuk berekuk mentioned this pull request Mar 21, 2022
@NunoSempere
Copy link
Collaborator

NunoSempere commented Mar 21, 2022

This seems good!

Some comments here:

I dropped duplicate code from api (pg-wrapper, mongo-wrapper), these were somewhat different from their backend versions.

  • Might be important! I haven't read the code carefully but assumed that backend version is correct/good enough, please let me know if I'm wrong here.

Yeah, I think these were a bit different; the frontend version had a bit more error checking to ensure that the webpage doesn't break down completely in case of a database error. Should be relatively easy to redo.

I can imagine CORS not being disabled being a problem if someone wants to query metaforecast endpoints from a different domain.

The frontpage pages were partly for optimization reasons, but mainly for debugging the crontab which wasn't working. Seems fine to delete them.

lib/data was not used, iirc

Re:cron, You are right that I would have used something less radical, but I'm ok with you using Github actions and other black magic beyond my comfort zone :)

Also, I think there are some conflicts on metaculus-fetch function; they were throttling the requests. This shouldn't be too hard to fix.

@NunoSempere
Copy link
Collaborator

PS: If I merge this into master, the heroku repo in charge of the repo will break. This is not that much of a problem, since I can continue updating the pg database using my local branch for a few days.

@berekuk
Copy link
Collaborator Author

berekuk commented Mar 22, 2022

Yeah, I think these were a bit different; the frontend version had a bit more error checking to ensure that the webpage doesn't break down completely in case of a database error. Should be relatively easy to redo.

Ok, I'll re-read that code more carefully then.
(Though I don't like the current approach to error checking, but that's a different topic which we can discuss later)

The frontpage pages were partly for optimization reasons, but mainly for debugging the crontab which wasn't working. > Seems fine to delete them.

Cool, I'll look into whether I can rewrite it without a caching table (which seems like an overkill, tbh).

Re:cron, You are right that I would have used something less radical, but I'm ok with you using Github actions and other black magic beyond my comfort zone :)

The key question here is whether we want CLI and files for one-shot tasks.

I think both approaches (Netlify & manually managed VPS) are viable, but there are tradeoffs, I'll write them down tomorrow.

Also, I think there are some conflicts on metaculus-fetch function; they were throttling the requests. This shouldn't be too hard to fix.

Oops, I tried to rebase on fresh master but messed up, will fix.

PS: If I merge this into master, the heroku repo in charge of the repo will break. This is not that much of a problem, since I can continue updating the pg database using my local branch for a few days.

Yeah, don't merge it yet! I plan to push further changes to this branch for one or two days until it's ready.

@berekuk
Copy link
Collaborator Author

berekuk commented Mar 22, 2022

Yeah, I think these were a bit different; the frontend version had a bit more error checking to ensure that the webpage doesn't break down completely in case of a database error. Should be relatively easy to redo.

Hmm. The only bit I managed to find is extra checks in pgGetByIds, but in case of an error (empty ids list) it returns a nonsensical string which probably triggers an exception down the road anyway, so that's inconsequential.

(Btw, I'm pretty sure pgGetByIds's code is vulnerable to SQL injections...)

@berekuk
Copy link
Collaborator Author

berekuk commented Mar 25, 2022

Ok, here goes, another set of commits.

Env configuration; I'm in the process of setting up a separate netlify deployment to check that everything works properly, and will finish getSecret refactoring in further commits.

I removed getSecret code entirely and documented env configuration in https://github.com/berekuk/metaforecast-backend/blob/monorepo/docs/configuration.md.

If you had the local secrets.json file, you'll have to replace it with .env file.

We'll need to make sure all of env variables are set on Heroku & Netlify (the set of required variables is technically not identical, but I think it'll be easier to just push the same set to both services). I think most of vars are already set on Heroku, and none are set on Netlify. Also, NEXT_PUBLIC_SITE_URL is new and should be set on Netlify too.

New README; old README files are stale and I'll have to rewrite it significantly before we merge this.

Yep, https://github.com/berekuk/metaforecast-backend/blob/monorepo/README.md

Cron setup

As we discussed in #5, we don't actually want to go with Github Actions and Netlify background functions, so we don't really need to change anything.


Minor improvements and changes:

  • npm run dbshell now connects to postgres
  • added netlify-cli to package.json (check out netlify env for easier configuration)
  • cleaned up almost all of data/ and input/ files
  • cleanup up some old commented code

About the new frontpage code.

My first set of commits moved frontpage data from json files to latest.frontpage; today I rewrote all of that to direct SQL just-in-time queries (see https://github.com/berekuk/metaforecast-backend/blob/monorepo/src/backend/frontpage.ts).

To avoid the performance degradation of the front page (ORDER BY RANDOM() is kinda slow), I spent a few hours refactoring the commonDisplay and a few related files.

The changes are significant; @NunoSempere, you might want to do a code review just to get acquainted with the new version and not be surprised in the future.

The main change is that / and /capture now use getStaticProps with revalidate option; this speeds up the initial page load significantly:

azrael@azrael:~$ time curl -s 'https://shimmering-bavarois-62a68f.netlify.app/' >/dev/null

real	0m0.029s
user	0m0.020s
sys	0m0.003s
azrael@azrael:~$ time curl -s 'https://metaforecast.org' >/dev/null

real	0m2.030s
user	0m0.015s
sys	0m0.011s

(https://shimmering-bavarois-62a68f.netlify.app/ is my test instance on netlify; it works on top of incomplete database, but it shouldn't matter)

This means that all search requests except for static frontpage data are performed on client side.

I also did a few cleanups:


Final thoughts on merging this branch.

Here's what has to be done for proper deployment:

  • configure env vars on Heroku and Netlify
  • update heroku scheduler script path (and we'll have to either build typescript code to /out dir with tsc, or install ts-node in production)
  • merge & check

It might be easier/safer if I did the final deployment, I believe I have all necessary credentials for this, let me know if you're ok with this.

@NunoSempere
Copy link
Collaborator

NunoSempere commented Mar 25, 2022

The main change is that / and /capture now use getStaticProps with revalidate option; this speeds up the initial page load significantly

Not sure I like these new changes.

First, curl is just fetching the html, but it's not loading the javascript. So I'm not sure it's measuring what we care about (time until website shows useful things to the user?)

If you test https://metaforecast.org/?query=test vs https://shimmering-bavarois-62a68f.netlify.app/?query=test, the second is much more annoying; there are three react loads between initial page load and results being shown.

Similarly, because you're using staticprops instead of serverside props, the frontpage is also fetched after initial page load, and there is some initial flashing (??).

All in all, the previous approach seems like it was faster, because:

  • When loading the frontpage, I was fetching a json, rather than using the really expensive ORDER BY RANDOM call
  • When loading from a query saved in the url, the server side took as long as the frontend does now, but returned a page which was already ~rendered/didn't have to refresh.

I think that to save the previous approach with the current structure, we could:

  • have a latest.frontpage_active with just 50 elements, instead of a json, or instead of doing the really expensive random query. Its contents could be changed every day/few hours by a cron job.
  • go back to using getServerSideProps to avoid initial flashing, even if that means that the initial page load as measured by curl is slower.

Thoughts here? I personally really dislike multiple reloads until content is shown, but this is not necessarily universal.

@NunoSempere
Copy link
Collaborator

NunoSempere commented Mar 25, 2022

Excepting the above details with regards to the frontpage, I'm really liking how this is going. I'm going through the code, and it looks like a different, less amateurish and better organized beast.

@berekuk
Copy link
Collaborator Author

berekuk commented Mar 25, 2022

there is some initial flashing (??).

This can (and should) be fixed, flashing is due to scrollbar shrinking the page, I planned to look into that anyway, should be fixable with better CSS (the same flashing issue exists on the current website).

When loading the frontpage, I was fetching a json, rather than using the really expensive ORDER BY RANDOM call

No, my version doesn't pay ORDER BY RANDOM costs at all! getStaticProps always prepares props in the background, even when revalidation is enabled.

Overall, for the frontpage without search query, the costs in the new version are:

  • fetching static HTML
  • react hydration of the initial page without data (since we can't prerender frontpage content on the server side)
  • time to render the full version

While in the old version the costs are:

  • fetching non-static HTML (slower than static, since Next.js has to render JSX on every call)
  • (but you don't have to wait for the hydration on the frontend to see the data)

And for the version without search query, new version is faster on time-to-first-byte, but slower on fetching the data since it requests algolia only after hydration.

https://pagespeed.web.dev/report?url=https%3A%2F%2Fmetaforecast.org%2F vs https://pagespeed.web.dev/report?url=https%3A%2F%2Fshimmering-bavarois-62a68f.netlify.app%2F are inconclusive: sometimes old version is faster, sometimes the new one.

(I still like that static version shows the shell with top menu much faster, though.)

Overall, I agree that the gains are not clear, so I'll roll it back, but I'd like to think more about the alternatives later.

I have an idea of splitting search into /search page, then maybe we could have a fast static fully SSR frontpage and slightly slower /search page... but unfortunately frontpage should be filterable on stars and platforms, and if we serve all filters from /search, then we'd still need to pull frontpage.json data into /search, which means we'd still need to cache it in the DB for the optimal performance. This would speed up the default / page, at least, and so might be worthwhile (static fully prerendered pages are much better than anything else, obviously, it's just that the query gets in the way).

There might also be a solution in the direction of "let's statically render everything on demand" (Next.js is capable of doing that with getStaticPaths + fallback=blocking), but it would require different URL schemas, like https://metaforecast.org/search/stars/2/forecasts/100/display/20/platforms/Foretold,PolyMarket; URL schemas could be hidden by rewrites, though... overall this approach seems too complicated to me. But I'll think about it some more and maybe figure out a better way (later; don't want to spend too much time on this and delay merging this branch).

I personally really dislike multiple reloads until content is shown, but this is not necessarily universal.

In my experience, static app shells (initial half-empty pages, sometimes with gray boxes instead of content) can be annoying, but can also be good and invisible. Reload shouldn't be noticeable to the end user, and when it's not noticeable, there's not much difference between the classic old-school "browser rendered the first half of html" and app shells from cache.

PS: I just noticed that I missed Next.js 12.1 a month ago, turns out streaming SSR has arrived, this is going to be great for performance and possibly enough even without going full static.

- rename more files to .ts/.tsx
- update tailwind config to use .tsx files from all dirs
- merge css files into a single main.css to avoid import order
  dependencies
- minor cleanups due to ts complaints
@berekuk
Copy link
Collaborator Author

berekuk commented Mar 26, 2022

I'll roll it back

Done.

"Rebuild database" job is now #20 in CLI (not included in doEverything), we can put it on Heroku as a separate job.

flashing is due to scrollbar shrinking the page

I fixed this with permanent vertical scrollbar. It's not a perfect solution but still better than jumping content, and all approaches I could find have some downsides.


Some minor refactorings:

  • I placed some files in web/search/ (anySearchPage.ts with common getServerSideProps code for index and capture, to fix copy-paste, and I moved commonDisplay.tsx there also since it made sense to locate these together)
  • this slightly broke styles due to too-specific tailwind config
  • so I gave all React-related files .tsx extensions and changed the config to ["./src/web/**/*.tsx", "./src/pages/**/*.tsx"]
  • which required some more cleanups due to typescript having stricter requirements for ts files, but mostly minor.

Overall I think we're ready to deploy this in prod, let me know if it's ok if I do it tomorrow!

https://metaforecast.org/layout is not a real page, so it shouldn't be
in src/pages.
@NunoSempere
Copy link
Collaborator

NunoSempere commented Mar 26, 2022

Hey, I'm still getting the flashing in your test site (though it's possible that you didn't rebuild it). I've made an issue to discuss that in particular: #14

@netlify
Copy link

netlify bot commented Mar 26, 2022

Deploy Preview for metaforecast ready!

Name Link
🔨 Latest commit 118495c
🔍 Latest deploy log https://app.netlify.com/sites/metaforecast/deploys/623f8ec55ae4fc00089a858d
😎 Deploy Preview https://deploy-preview-8--metaforecast.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@berekuk
Copy link
Collaborator Author

berekuk commented Mar 26, 2022

Netlify logs:

Your environment variables exceed the 4KB limit imposed by AWS Lambda. Please consider reducing them.

Ugh. This is annoying and it means that I'll have to figure out a way to inject a .env file to Netlify on the build step, instead of just configuring env variables through Netlify.

@berekuk berekuk merged commit e26f5aa into quantified-uncertainty:master Mar 26, 2022
@berekuk
Copy link
Collaborator Author

berekuk commented Mar 26, 2022

Merged and deployed!

I set up the frontpage cron job to run every hour and ran both cron jobs by hand, everything seems fine.

The issue with env limit is still important, for now I worked around it by trimming cookies and removing some unused ones (e.g. CSETFORETELL_COOKIE) so that the entire config fits in 3kb, but this won't work in the long run.

I'll file a separate issue on this and continue from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants