Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add example to parse any block text #416

Merged
merged 7 commits into from
Jul 18, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions examples/parse-text-from-any-block-type/.env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
NOTION_KEY=<your-notion-api-key>
NOTION_PAGE_ID=<notion-page-id>
8 changes: 8 additions & 0 deletions examples/parse-text-from-any-block-type/.prettierrc
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"arrowParens": "avoid",
"tabWidth": 2,
"semi": false,
"trailingComma": "es5",
"endOfLine": "lf",
"singleQuote": false
}
63 changes: 63 additions & 0 deletions examples/parse-text-from-any-block-type/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
# Sample integration: Parse text from any block type

Parse plain text from any type of block (i.e. page content), including headers, lists, media, etc.

## About the integration

This integration will retrieve Notion [page content](https://developers.notion.com/docs/working-with-page-content) and parse any plain text from the block. (Note: page content is represented by [blocks](https://developers.notion.com/docs/working-with-page-content#modeling-content-as-blocks).) The plain text is printed to the command line in this example, but can be used in your Notion projects as needed.

## About page content

When [retrieving block children](https://developers.notion.com/reference/get-block-children) (i.e. page content) with the public API, the structure of the blocks returned will vary depending on the block type. For example, a [paragraph block](https://developers.notion.com/reference/block#paragraph) and an [image block](https://developers.notion.com/reference/block#image) are modeled differently.

This example demonstrates how to parse any available text for any type of block. In many cases, [rich text](https://developers.notion.com/reference/rich-text) will be available and the `plain_text` value will be used.

Note: Not all blocks contain text to display (e.g. dividers). Additionally, not all block types are currently supported by the public API.

## Running Locally

### 1. Setup your local project

```zsh
# Clone this repository locally
git clone https://github.com/makenotion/notion-sdk-js.git

# Switch into this project
cd notion-sdk-js/examples/parse-text-from-any-block-type

# Install the dependencies
npm install
```

### 2. Set your environment variables in a `.env` file

A `.env.example` file has been included and can be renamed `.env`. Update the environment variables below:

```zsh
NOTION_API_KEY=<your-notion-api-key>
NOTION_PAGE_ID=<notion-page-id>
```

`NOTION_API_KEY`: Create a new integration in the [integrations dashboard](https://www.notion.com/my-integrations) and retrieve the API key from the integration's `Secrets` page.

`NOTION_PAGE_ID`: Use the ID of any Notion page with content. A page with a variety of block types is recommended.

The page ID is the 32 character string at the end of any page URL.
![A Notion page URL with the ID highlighted](./assets/page_id.png)

### 3. Give the integration access to your page

Your Notion integration will need permission to retrieve the block children from the Notion page being used. To provide access, do the following:

1. Go to the page in your workspace.
2. Click the `•••` (more menu) on the top-right corner of the page.
3. Scroll to the bottom of the menu and click `Add connections`.
4. Search for and select your integration in the `Search for connections...` menu.

Once selected, your integration will have permission to read content from the page.

### 4. Run code

```zsh
node index.js
```
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
151 changes: 151 additions & 0 deletions examples/parse-text-from-any-block-type/index.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
import { Client, iteratePaginatedAPI } from "@notionhq/client"
import { config } from "dotenv"

config()

const pageId = process.env.NOTION_PAGE_ID
const apiKey = process.env.NOTION_API_KEY

const notion = new Client({ auth: apiKey })

/*
---------------------------------------------------------------------------
*/

// Take rich text array from a block child that supports rich text and return the plain text.
// Note: All rich text objects include a plain_text field.
const getPlainTextFromRichText = richText => {
return richText.map(t => t.plain_text).join("")
// Note: A page mention will return "Undefined" as the page name if the page has not been shared with the integration. See: https://developers.notion.com/reference/block#mention
}

// Use the source URL and optional caption from media blocks (file, video, etc.)
const getMediaSourceText = block => {
let source, caption

if (block[block.type].external) {
source = block[block.type].external.url
} else if (block[block.type].file) {
source = block[block.type].file.url
} else if (block[block.type].url) {
source = block[block.type].url
} else {
source = "[Missing case for media block types]: " + block.type
}
// If there's a caption, return it with the source
if (block[block.type].caption.length) {
caption = getPlainTextFromRichText(block[block.type].caption)
return caption + ": " + source
}
// If no caption, just return the source URL
return source
}

// Get the plain text from any block type supported by the public API.
const getTextFromBlock = block => {
let text

// Get rich text from blocks that support it
if (block[block.type].rich_text) {
// This will be an empty string if it's an empty line.
text = getPlainTextFromRichText(block[block.type].rich_text)
}
// Get text for block types that don't have rich text
else {
switch (block.type) {
case "unsupported":
// The public API does not support all block types yet
text = "[Unsupported block type]"
break
case "bookmark":
text = block.bookmark.url
break
case "child_database":
text = block.child_database.title
// Use "Query a database" endpoint to get db rows: https://developers.notion.com/reference/post-database-query
// Use "Retrieve a database" endpoint to get additional properties: https://developers.notion.com/reference/retrieve-a-database
break
case "child_page":
text = block.child_page.title
break
case "embed":
case "video":
case "file":
case "image":
case "pdf":
text = getMediaSourceText(block)
break
case "equation":
text = block.equation.expression
break
case "link_preview":
text = block.link_preview.url
break
case "synced_block":
// Provides ID for block it's synced with.
text = block.synced_block.synced_from
? "This block is synced with a block with the following ID: " +
block.synced_block.synced_from[block.synced_block.synced_from.type]
: "Source sync block that another blocked is synced with."
break
case "table":
// Only contains table properties.
// Fetch children blocks for more details.
text = "Table width: " + block.table.table_width
break
case "table_of_contents":
// Does not include text from ToC; just the color
text = "ToC color: " + block.table_of_contents.color
break
case "breadcrumb":
case "column_list":
case "divider":
text = "No text available"
break
default:
text = "[Needs case added]"
break
}
}
// Blocks with the has_children property will require fetching the child blocks. (Not included in this example.)
// e.g. nested bulleted lists
if (block.has_children) {
// For now, we'll just flag there are children blocks.
text = text + " (Has children)"
}
// Includes block type for readability. Update formatting as needed.
return block.type + ": " + text
}

async function retrieveBlockChildren(id) {
console.log("Retrieving blocks (async)...")
const blocks = []

// Use iteratePaginatedAPI helper function to get all blocks first-level blocks on the page
for await (const block of iteratePaginatedAPI(notion.blocks.children.list, {
block_id: id, // A page ID can be passed as a block ID: https://developers.notion.com/docs/working-with-page-content#modeling-content-as-blocks
})) {
blocks.push(block)
}

return blocks
}

const printBlockText = blocks => {
console.log("Displaying blocks:")

for (let i = 0; i < blocks.length; i++) {
const text = getTextFromBlock(blocks[i])
// Print plain text for each block.
console.log(text)
}
}

async function main() {
// Make API call to retrieve all block children from the page provided in .env
const blocks = await retrieveBlockChildren(pageId)
// Get and print plain text for each block.
printBlockText(blocks)
}

main()
23 changes: 23 additions & 0 deletions examples/parse-text-from-any-block-type/package.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"name": "parse-text-from-any-block-type",
"version": "1.0.0",
"description": "Retrieve blocks from a Notion page and parse text from any block type.",
"main": "index.js",
"type": "module",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1",
"start": ""
},
"repository": {
"type": "git",
"url": "git+https://github.com/makenotion/notion-sdk-js.git"
},
"author": "Jess Mitchell",
"license": "MIT",
"bugs": {
"url": "https://github.com/makenotion/notion-sdk-js/issues"
},
"dependencies": {
"@notionhq/client": "file:../.."
}
}
Loading