HTML to Markdown tool, designed to migrate content from Google Sites to GitLab Pages.
Tested with fiquipedia.es (Work in progress)
In 2012, a one-man army started building fiquipedia.es, hosted on Google Sites. The site quickly became a reference in the Spanish Physics and Chemistry educational landscape.
Unfortunately, Google decided to discontinue classic Google Sites as of September 1st, 2021. So our lonely hero decided to migrate the huge amount of content to GitLab Pages. The existing HTML to Markdown migration tools did not produce clean Markdown, so I decided to help.
Something is infinitely greater than zero
E.G.S.
(fiquipedia man)
If you have a Physics or Chemistry problem, if no one else can help, and if you can browse to fiquipedia.es and find the solution... maybe you can buy fiquipedia.es a coffee.
Convert an HTML file, or a folder and its contents, to Markdown
Execution:
python HTML2mdCLI.py -s <input_file_or_folder> -d <destination_path>
where:
-h, --help: Print this help
-s, --source <source_path>: (Mandatory) source file or folder
-d, --dest <dest_path>: (Mandatory) destination file or folder
-r, --replace: (Optional) Flag: Replace Google Drive links with local links. It WON'T download the content by default; use it together with --download to force the download.
-D, --download: (Optional) Flag: Download Google Drive content to the local drive. This option only takes effect when used together with --replace; otherwise it is ignored.
-u, --url: (Optional) Use the page title, the level 1 header or the last section of the URL as the URL description (only when the URL link and its description are the same). NOTE: This option can be slow.
-t, --timeout <seconds>: (Optional) Timeout, in seconds, for link validation connections. Fractional values are accepted, e.g. "0.750" or "2". By default there is no timeout.
-m, --multiline: (Optional) Support multiline content in table cells. (Warning: Google Sites may use internal tables in the HTML which may not look like tables to the user. Use at your own risk!)
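The options above could be declared roughly as follows. This is only a sketch using the standard `argparse` module, not the actual parsing code in `HTML2mdCLI.py`; the helper names `build_parser` and `parse_options` are hypothetical. It illustrates two documented behaviours: `--timeout` accepts fractional seconds and defaults to no limit, and `--download` is ignored unless `--replace` is also given.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="HTML2mdCLI.py")
    # -h/--help is added automatically by argparse.
    parser.add_argument("-s", "--source", required=True, help="source file or folder")
    parser.add_argument("-d", "--dest", required=True, help="destination file or folder")
    parser.add_argument("-r", "--replace", action="store_true",
                        help="replace Google Drive links with local links")
    parser.add_argument("-D", "--download", action="store_true",
                        help="download Google Drive content (needs --replace)")
    parser.add_argument("-u", "--url", action="store_true",
                        help="derive a description for bare URL links")
    parser.add_argument("-t", "--timeout", type=float, default=None,
                        help="link validation timeout in seconds (default: unlimited)")
    parser.add_argument("-m", "--multiline", action="store_true",
                        help="support multiline content in table cells")
    return parser


def parse_options(argv):
    """Parse argv and apply the documented --download/--replace rule."""
    args = build_parser().parse_args(argv)
    # --download only takes effect together with --replace.
    args.download = args.download and args.replace
    return args


if __name__ == "__main__":
    print(parse_options(["-s", "site", "-d", "out", "-r", "-D", "-t", "0.750"]))
```

Note that `-d` and `-D` are distinct options because short flags are case-sensitive.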
These are some recommended readings for setting up a local environment using PyCharm:
The standard in Python projects is to create a file called requirements.txt and list the packages you want in there.
PyCharm will automatically ask if you want to install those packages as soon as you type them in. Go ahead and let it.
beautifulsoup4
google-api-python-client
google-auth-oauthlib
To get started integrating with the Google Drive UI, you need to enable the Drive API within your app's Cloud Platform project and provide configuration details.
Please see Enable the Google Drive API
NOTE: Google has changed the Google Drive API. The app hasn't been tested with the new API.
Update to some Google Drive file links, admin decision recommended before July 23, 2021. Developers: Items that have a Drive API permission with type=domain or type=anyone, where withLink=true (v2) or allowFileDiscovery=false (v3), will be affected. In addition to the item ID, your application may now also need a resource key to access these items. Use our Developer resource to learn more about how this update will impact your projects.
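To illustrate what "item ID plus resource key" can mean in practice, here is a minimal sketch that pulls both values out of a Drive link. It assumes links shaped like `https://drive.google.com/file/d/<id>/view?resourcekey=<key>` or `https://drive.google.com/open?id=<id>`; those shapes, and the function name `split_drive_link`, are assumptions for illustration and not part of this tool.

```python
from urllib.parse import urlparse, parse_qs


def split_drive_link(url):
    """Return (file_id, resource_key); either value may be None if absent."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    parts = [p for p in parsed.path.split("/") if p]

    file_id = None
    if "d" in parts and parts.index("d") + 1 < len(parts):
        # Path style: /file/d/<id>/view
        file_id = parts[parts.index("d") + 1]
    elif "id" in query:
        # Query style: /open?id=<id>
        file_id = query["id"][0]

    # The resource key, when present, is assumed to be a query parameter.
    resource_key = query.get("resourcekey", [None])[0]
    return file_id, resource_key


if __name__ == "__main__":
    print(split_drive_link(
        "https://drive.google.com/file/d/FILE_ID/view?usp=sharing&resourcekey=KEY"
    ))
```

A link without a `resourcekey` parameter yields `None` for the key, which matches older links that are not affected by the update.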
In order to execute the unit test that downloads content from Google Drive, you must have access to the Google Drive account where the content is stored.
This application needs a local copy of a website (www.fiquipedia.es) to use as input. The source HTML will be converted to Markdown.
$ apt-get install wget
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
$ brew install wget
wget --content-disposition --recursive -p http://www.fiquipedia.es
If the server is kind, it might be sticking a Content-Disposition header on the download, advising your client of the correct filename. Telling wget to listen to that header for the final filename is as simple as:
wget --content-disposition
Otherwise, you need to run this script, passing the download folder as its first argument, to remove the URL parameters that wget appended to the file names:
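#!/bin/bash
# Strip the URL query string ("?..." suffix) from the names of files saved by wget.
# $1 is the folder to process. Paths are NUL-separated so spaces are handled safely.
find "$1" -type f -print0 | while IFS= read -r -d '' i
do
    # Keep everything before the first "?"
    output="${i%%\?*}"
    if [ "$i" != "$output" ]
    then
        mv -- "$i" "$output"
    else
        echo "Skipping $i"
    fi
done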
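Since the rest of the project is Python, the same cleanup can also be sketched with `pathlib`. This is an illustrative alternative to the shell script above, not part of the tool; the function name `strip_query_from_names` is hypothetical. Unlike the shell version, it only looks at the file name (not the whole path), and it overwrites an existing file that already has the cleaned name.

```python
from pathlib import Path


def strip_query_from_names(root):
    """Rename files under root, dropping the first '?' and everything after it."""
    renamed = []
    # Materialise the listing first so renames do not disturb the traversal.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        clean_name = path.name.split("?", 1)[0]
        if clean_name == path.name or not clean_name:
            # Nothing to strip, or stripping would leave an empty name.
            continue
        target = path.with_name(clean_name)
        path.replace(target)  # overwrites target if it already exists
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    import sys

    for new_path in strip_query_from_names(sys.argv[1]):
        print(f"Renamed to {new_path}")
```

Run it with the folder created by wget as the only argument.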