This project will scrape a
notion.so page and its links, rewrite your
firebase.json file with custom links, then deploy all that as static content to Firebase hosting.
- Static content on Firebase hosting -
- Source content on Notion -
Super nice personal website builder
Super nice means: WYSIWYG editor, beautiful without any work, trivial to restructure
I liked using
notion.so a lot, and wanted to have my person website / blog hosted on it. At first I just used an iframe. It was awesome: a beautiful WYSIWYG website builder that lets you focus on the content. Also you start treating your personal website like a scrapbook instead of a curated representation of you: it made me want to publish unpolished ideas. And I could trivially change the structure of the website (number of pages and how they link to each other).
But there were two things I really didn't like about using an iframe: individual pages in my scrapbook did not have custom URLs, and Google could not index any of the content on my website. Horrid.
Then I found out that putting Notion pages in iframes would soon be deprecated for some security reason: and decided to build this project.
This project is a script you can run to turn your Notion page (and pages it links to, recursively) into static content which gets deployed on Firebase (with custom short links).
Install Docker for your OS: https://docs.docker.com/engine/getstarted/step_one
Set up a Firebase hosting project: https://firebase.google.com/docs/hosting/quickstart
Install selenium from pip:
pip install selenium
Install pickledb from pip:
pip install pickledb
The entry point for running the code is
python run.py d065149ff38a4e7a9b908aeb262b0f4f ../firebase
d065149ff38a4e7a9b908aeb262b0f4f would be the Notion page ID which the spider will start off from.
../firebase is the directory in which you set up your Firebase hosting project.
A headless browser is needed to scrape Notion because it relies significantly on React to render the page.
This module cleanly gets a Selenium webdriver handle.
- Pulls the
- Runs that image
- Connects to the container with a remote Selenium webdriver
- Takes care of clean up with
This module scrapes and cleans up a single Notion page.
- Loads the URL and waits for some seconds
- Throws an exception if the page requires authentication or has no content
- Fixes resource links, deletes scripts, and removes manifest
- Adds mouse handlers to bring back pretty animations on clickable divs
- Keeps track of all other Notion pages linked from this page
- Inserts script element for analytics
- Returns the source for a page, and page IDs linked from that page
This module scrapes a Notion page and all other Notion pages linked from that page, recursively. Breadth first. It also takes care of replacing
https://www.notion.so/<page_id> URLs with custom short links.
- Breadth first search, starting at specified root page
- Skips over any pages which throw an exception
- Pages are all stored in memory while the spider is running
dump_resultsto dump the spidering results to disk
postprocessto go through pages on disk and replace Notion links with custom short links
- Asks user for custom short link to use for each page, then stores those using
generate_rewriteswhich will be used later to modify the rewrite routes in
This module does everything together so you can just run it to update Firebase hosting with the newest version of your Notion.
- Runs a spider
- Dumps the results to a Firebase public folder
firebase.jsonwith updated routes
to Simon for helping me out <3