The package honcoder
is a small collection of Python functions and scripts for
analysing the HONcode medical website certification database.
To use honcoder
you need a working Python3 installation. For both Mac and
Windows, the easiest way to do this is with a package manager.
First install X-code and the Mac OSX developer tools. You can download this
through the regular Mac App Store. Then you’ll need the package manager
homebrew
. Follow these instructions to install homebrew
and python3
, then
run the following command in the terminal somewhere where you would like to save the xgoogle program:
git clone https://github.com/johngarg/xgoogle.git
cd xgoogle
pip3 install -r requirements.txt
python3 setup.py build
python3 setup.py install
to install the dependencies needed for the code.
Using a shell like Powershell, you need to install the package manager
Chocolatey
from here. Then install python3
with these instructions.
If it’s not there already, you’ll need to download the data. Place the
honcoder
directory somewhere on your machine. Within the folder, create the
data directory and run the pickler.py
file with
mkdir data
python3 pickler.py
This will download the HONcode data (as a .txt
file with a French name) and
convert it to a python-friendly format for quick lookup. This data file is saved
as honcode.dat
in the data
directory. Run the search with
python3 honcoder.py --search 'lung cancer' --n 50 --lang 'en' > output.csv
where the string of characters in quotes following --search
is the search
term, --n
is the number of google results you would like to query and --lang
is the language of the Google search. output.csv
is the name of the output
file (spreadsheet). You should be able to double click on this file to open it
in Pages on a Mac. You can import csv files into Excel as well. You can find
information about the languages available here.
The google search works by pages so sometimes you’ll get a few more URLs than
you asked for. Also, sometimes you’ll get an HTTP Error 503: Service
Unavailable
error. This comes from Google and is meant to prevent you spamming
them with the constant queries. Most of the time this isn’t an issue; if you get
the error, wait a few minutes and run the code again. It’s possible another
interface to google would do better than this.
The HONcode database keeps only the base URLs. For example,
https://gov.wales/topics/health/nhswales/plans/heart_plan/?lang=en
would be in
the list as just gov.wales/
. Check some of the processed URLs against the
whole thing to see that the code is processing them correctly. Also, be sure to
check some of them with the browser to see if they are working.