ℹ️ This repository contains the code for the following work: Hüffer, P., Degbelo, A. and Risse, B. (2025) 'Geovicla: automated classification of interactive web-based geovisualizations', in 13th International Conference on Geographic Information Science (GIScience 2025), Christchurch, New Zealand.
This work is about classifying and harvesting interactive (geo)visualisations on the web.
All files are in the 'database' folder, including scripts to harvest webpages, process the data, and run the machine learning classifiers.
credentials.py stores the credentials needed for the Google search. To use the search yourself, you need to create a CSE (Custom Search Engine) and obtain an API key. Then create a file called credentials.py in the database folder and insert the IDs in the format shown below.
cse_id = 'insert_your_cse_id_here'
api_key = 'insert_your_api_key_here'

Executing this file starts an interactive console for classifying the websites in a MongoDB database websites.*dbName*. A database entry that does not yet have a value for the field "real_type" is opened in the browser. The console then asks for a classification and saves the input to the database. After that, the next webpage from the database is opened.
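The labelling loop described above could look roughly like the following. This is a minimal sketch, not the repository's actual code: the function names `label_next` and `label_all` and the field name "url" are assumptions made for illustration. The collection argument is any object offering `find_one` and `update_one` in the style of a PyMongo collection.

```python
import webbrowser


def label_next(collection, ask=input, open_url=webbrowser.open):
    """Label one unlabelled entry; return False when none are left."""
    # Find one document that has no "real_type" field yet.
    doc = collection.find_one({"real_type": {"$exists": False}})
    if doc is None:
        return False
    # "url" is an assumed field name; adapt it to the actual schema.
    open_url(doc["url"])
    label = ask(f"Classification for {doc['url']}: ").strip()
    # Store the answer so the entry is skipped on the next pass.
    collection.update_one({"_id": doc["_id"]}, {"$set": {"real_type": label}})
    return True


def label_all(collection, ask=input, open_url=webbrowser.open):
    """Keep labelling until every entry has a "real_type"; return the count."""
    count = 0
    while label_next(collection, ask, open_url):
        count += 1
    return count
```

With PyMongo installed, the collection would presumably be obtained with something like `MongoClient()["websites"]["dbName"]`. Passing `ask` and `open_url` as parameters keeps the loop easy to try out without a browser or a running database.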
This file builds the search queries used in automatedSearch.py.
Provides a function that conducts an automated search for a given query and returns the search results.
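As an illustration of such a search function, the sketch below queries the Google Custom Search JSON API using only the standard library. It is an assumption about how the search might be implemented, not the code in automatedSearch.py: the names `build_request_url`, `extract_links` and `search` are hypothetical, while `cse_id` and `api_key` correspond to the values stored in credentials.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(query, cse_id, api_key, start=1):
    """Build the request URL for one page of results."""
    # "start" is the 1-based index of the first result on the page.
    params = {"key": api_key, "cx": cse_id, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_links(response):
    """Return the result URLs from a decoded JSON response."""
    # The "items" key is absent when a query has no results.
    return [item["link"] for item in response.get("items", [])]


def search(query, cse_id, api_key, start=1):
    """Run one search request and return the result URLs."""
    with urlopen(build_request_url(query, cse_id, api_key, start)) as reply:
        return extract_links(json.load(reply))
```

A call would then look like `search("interactive map", cse_id, api_key)`, with both credentials imported from credentials.py. Keeping URL construction and response parsing in separate functions allows both to be checked without network access.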
This file contains the logic for the machine learning classifiers. The files mlp.py, svm.py, nb.py and rf.py use this module.
Everything related to MongoDB is shown in green.
File names are displayed in grey.