Converts a repo that uses the Dataview inline format to YAML, to take advantage of the new Obsidian properties (see changelog).
To clip articles to Obsidian, you might have used the Obsidian WebClipper offered by Steph Ango, the CEO of Obsidian, or a derivative work.
In that case, you might have ended up with files that have the following structure:
author:: XXX
source:: URL link
clipped:: DateOfClipping
published:: DateOfPublication
#clippings
But now you want something more like:
---
category: "[[Clippings]]"
author: XXX scraped from the META tags of the page at URL
title: Title scraped from the META tags of the page at URL
source: URL link
clipped: DateOfClipping
description: Description scraped from the META tags of the page at URL
summary: "" (Some space to put the summary later)
tags:
- AI
- other tags taken from the META keyword tag
publish: false
---
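The conversion from the inline format to front matter can be sketched as follows. This is a minimal illustration, not the code from index.js: the function name `inlineToFrontMatter` is hypothetical, and it assumes that fields appear one per line as `key:: value` and that tags sit on a line of their own. It does not fetch anything from the page's META tags.

```javascript
// Hypothetical helper, not taken from the repository's index.js.
// Turns `key:: value` lines and standalone #tag lines into YAML front matter.
function inlineToFrontMatter(markdown) {
  const fields = {};
  const tags = [];
  const body = [];

  for (const line of markdown.split(/\r?\n/)) {
    const field = line.match(/^([A-Za-z][\w-]*)::\s*(.*)$/);
    const isTagLine = /^(#[\w/-]+\s*)+$/.test(line);
    if (field) {
      fields[field[1]] = field[2].trim();
    } else if (isTagLine) {
      // "#clippings #ai" becomes ["clippings", "ai"]
      tags.push(...line.trim().split(/\s+/).map((t) => t.slice(1)));
    } else {
      body.push(line);
    }
  }

  const yaml = ['---'];
  for (const [key, value] of Object.entries(fields)) {
    // JSON.stringify yields a double-quoted string, which is also valid YAML
    yaml.push(`${key}: ${JSON.stringify(value)}`);
  }
  if (tags.length > 0) {
    yaml.push('tags:');
    for (const tag of tags) yaml.push(`  - ${tag}`);
  }
  yaml.push('---');
  return yaml.join('\n') + '\n' + body.join('\n');
}
```

Quoting every value is a deliberately conservative choice: URLs and dates contain colons, which unquoted YAML can misread.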
Note that I have also updated the official WebClipper to give a consistent result. Here is my JS version: WebClipper
Clone the repository, go into it, then install the packages:
npm install
Before running the script, remember to back up your vault.
- Place all the Markdown files you want to process in a Ressources subdirectory of your vault. Then create a symbolic link to this subdirectory from within the project repo:
  cd ObsidianRepoUpdate
  ln -s -v PATH_TO_YOUR_RESSOURCEDIR ./Ressources
- Run the script:
  node index.js
- Processed files will be moved to the Ressources/Processed directory.
- New Markdown files with the fetched article content will be generated in the Ressources/Result directory.
- Files that could not be processed will be moved to the Ressources/ToProcessManually directory.
- Check the error.log file for any errors that may have occurred during the process.
- Expect to process around 10 to 20% of the files manually.
I have noticed that the WebClipper sometimes produces image links that are not very clean. fixMarkdown.js is an attempt to fix this.
The index.js script performs the following tasks:
- Reads Markdown files from the Ressources directory.
- Extracts the URL that follows "source", "src" or "url" in each Markdown file.
- Re-implements a version of the WebClipper to generate a new Markdown file.
- Writes the newly generated Markdown content into a Result sub-directory within Ressources.
- Moves processed files into a Processed sub-directory within Ressources.
- If a URL is not found, moves the original file into a ToProcessManually sub-directory within Ressources.
- Logs any further errors that occur to error.log.
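The URL extraction and the routing decision described above might look roughly like this. It is a sketch under assumptions, not the actual index.js: the names `extractSourceUrl` and `planFile` are hypothetical, the URL is assumed to be a bare http(s) link on the same line as the key, and no file is actually read, fetched, or moved.

```javascript
// Hypothetical sketch; the names below are not taken from index.js.
// Matches "source: ...", "source:: ...", "src: ..." or "url: ..." followed by an http(s) URL.
const SOURCE_LINE = /^\s*(?:source|src|url)\s*::?\s*(https?:\/\/\S+)/im;

// Returns the first URL found after one of the keys, or null when there is none.
function extractSourceUrl(markdown) {
  const match = markdown.match(SOURCE_LINE);
  return match ? match[1] : null;
}

// Decides where a file should go, using the directory names from the list above.
function planFile(fileName, markdown) {
  const url = extractSourceUrl(markdown);
  if (url === null) {
    // No URL: the original file needs manual processing
    return { url: null, moveTo: `Ressources/ToProcessManually/${fileName}` };
  }
  return {
    url,
    moveTo: `Ressources/Processed/${fileName}`,
    writeTo: `Ressources/Result/${fileName}`,
  };
}
```

Keeping the decision in a pure function like this makes it easy to check against a few sample clippings before any file in the vault is touched.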
This project is licensed under the MIT License. However, other pieces of code are subject to their own licenses:
- Readability.js by Mozilla (Licensed under Apache License Version 2.0)
- Turndown by Dom Christie (Licensed under MIT License)