
BioTaxGeo is a user-supervised data cleaning tool with a light and intuitive web interface that makes it easier to identify and correct geospatial and taxonomic errors in species occurrence spreadsheets.


marcos-de-sousa/BioTaxGeo


BioTaxGeo

BioTaxGeo is a web-based application for assessing the geospatial and taxonomic quality of primary biodiversity data. It helps identify and correct errors in field collection spreadsheets (.xls or .csv). With a light and intuitive web interface, BioTaxGeo aims to provide a good user experience for biologists and researchers who need high-quality biodiversity data.

Summary

Functionalities

Taxonomy Check

The user submits a spreadsheet so that the taxonomy fields of every record of the collected species can be checked. BioTaxGeo checks whether the taxonomy is valid and, if there are problems in the taxonomic data, suggests corrections, as described below. Taxonomic checking is done through the Species API of the Global Biodiversity Information Facility (GBIF).
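The GBIF lookup can be sketched against the public name-matching endpoint of the GBIF Species API (`https://api.gbif.org/v1/species/match`). This is a minimal, standard-library-only sketch, not BioTaxGeo's own code: the helper names `build_match_url` and `match_name` are hypothetical, and which parameters BioTaxGeo actually sends is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# GBIF Species API name-matching endpoint (v1)
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def build_match_url(name, kingdom=None, strict=False):
    """Build a GBIF name-match request URL for one scientific name (hypothetical helper)."""
    params = {"name": name, "strict": "true" if strict else "false"}
    if kingdom:
        # Optional higher-taxon hint that helps GBIF tell homonyms apart
        params["kingdom"] = kingdom
    return GBIF_MATCH_URL + "?" + urlencode(params)


def match_name(name, kingdom=None, timeout=10):
    """Query GBIF and return the decoded JSON match result (needs network access)."""
    with urlopen(build_match_url(name, kingdom), timeout=timeout) as response:
        return json.load(response)
```

Calling `match_name("Puma concolor", kingdom="Animalia")` returns a dict whose `matchType` field (for example `EXACT`, `FUZZY`, `HIGHERRANK` or `NONE`) indicates whether and how the name was recognized.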

On this screen, the user must select the columns of their spreadsheet that hold the taxonomy, so that the application can validate them.

The taxonomic data in the spreadsheet is then checked. If any field was filled in incorrectly, the application points out the flaws and suggests corrections. If desired, the user can save the changes to the spreadsheet itself.
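Turning a GBIF match into suggested corrections could look like the sketch below. It assumes each spreadsheet row has already been mapped, using the columns selected on the screen above, to a dict keyed by Darwin Core-style names such as `scientificName`, `kingdom` and `family`; that mapping and the helper `suggest_corrections` are hypothetical.

```python
# Spreadsheet rank fields compared against the GBIF match fields of the same name
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def suggest_corrections(record, match):
    """Return {field: (current_value, suggested_value)} for fields that disagree with GBIF.

    record: rank names mapped to the values typed in the spreadsheet (assumed layout)
    match:  a GBIF species/match JSON object, already decoded into a dict
    """
    if match.get("matchType", "NONE") == "NONE":
        return {}  # GBIF recognized nothing, so there is nothing to suggest
    suggestions = {}
    for rank in RANKS:
        current = (record.get(rank) or "").strip()
        suggested = match.get(rank)
        # Higher ranks are compared case-insensitively
        if suggested and current.lower() != suggested.lower():
            suggestions[rank] = (current, suggested)
    # The binomial is compared with GBIF's canonical name (no authorship)
    current_name = (record.get("scientificName") or "").strip()
    canonical = match.get("canonicalName")
    if canonical and current_name != canonical:
        suggestions["scientificName"] = (current_name, canonical)
    return suggestions
```

With a `FUZZY` match for the misspelled name "Puma concolr", the function would propose "Puma concolor" as the corrected scientific name, leaving the user to accept or reject it.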

Geospatial check

The user selects the longitude and latitude columns of the spreadsheet and must also indicate the location of the collection site by placing markers on a map; the markers delimit the area as a polygon. BioTaxGeo checks whether the filled-in geographic coordinates contain inconsistencies and whether the coordinates of the collected specimens fall within the mapped area. Coordinate verification is done through Google's Geocoding API.
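One way to check that a coordinate matches the informed city, state and country is to reverse-geocode it and compare the returned address components. The sketch below assumes `results` is the list returned by the `googlemaps` client's `Client.reverse_geocode((lat, lng))`, which follows the Geocoding API response format. The mapping of city and state to component types (`locality` or `administrative_area_level_2`, and `administrative_area_level_1`) is an assumption, as is the helper name `location_mismatches`.

```python
import unicodedata

# Geocoding API address-component types accepted for each informed level (assumed mapping)
LEVELS = {
    "country": ("country",),
    "state": ("administrative_area_level_1",),
    "city": ("locality", "administrative_area_level_2"),
}


def normalize(name):
    """Case- and accent-insensitive form of a place name."""
    decomposed = unicodedata.normalize("NFKD", name or "")
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def location_mismatches(results, country, state, city):
    """Return the levels whose informed name appears in none of the geocoding results."""
    found = {level: set() for level in LEVELS}
    for result in results:
        for component in result.get("address_components", []):
            types = component.get("types", [])
            for level, accepted in LEVELS.items():
                if any(t in types for t in accepted):
                    # Accept either the long form ("Amazonas") or the short one ("AM")
                    found[level].add(normalize(component.get("long_name")))
                    found[level].add(normalize(component.get("short_name")))
    informed = {"country": country, "state": state, "city": city}
    return [level for level in LEVELS
            if informed[level] and normalize(informed[level]) not in found[level]]
```

An empty list means the coordinate agrees with every informed level; otherwise the returned level names (for example `["state"]`) point to the field worth reviewing.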

You are then redirected to a map where you plot markers to outline the area in which the data was collected. The coordinates of each marker can be set by clicking directly on the map or by typing values into the longitude and latitude fields.
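Deciding whether a record lies inside the area outlined by the markers is a point-in-polygon test. The sketch below uses the classic even-odd (ray casting) rule on raw longitude/latitude values; it is a stand-in, since the README does not say which test BioTaxGeo uses, and the helper names are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd (ray casting) test; polygon is a list of (lon, lat) marker vertices."""
    inside = False
    j = len(polygon) - 1  # previous vertex, so edge (j -> i) closes the ring
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through the point can cross the ray
        if (yi > lat) != (yj > lat):
            # Longitude at which this edge crosses that horizontal line
            x_cross = xj + (lat - yj) * (xi - xj) / (yi - yj)
            if lon < x_cross:  # crossing lies to the east of the point
                inside = not inside
        j = i
    return inside


def records_outside(records, polygon):
    """Indices of (lon, lat) records that fall outside the polygon."""
    return [i for i, (lon, lat) in enumerate(records)
            if not point_in_polygon(lon, lat, polygon)]
```

Treating degrees as planar coordinates is reasonable for small collection areas, but this sketch does not handle polygons that cross the antimeridian, and points lying exactly on an edge may be classified either way.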

After the user saves the coordinates and clicks on check areas, BioTaxGeo checks whether the coordinate fields contain formatting errors, whether the coordinates match the informed location (city, state and country), and whether every species occurrence record lies within the defined area. Records that fall outside the area can be corrected.
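A formatting check for coordinate cells might look like this minimal sketch, assuming coordinates are stored in decimal degrees (degree-minute-second strings would be reported as errors). `parse_coordinate` and `check_pair` are hypothetical helpers, and the swapped-columns heuristic is an illustration rather than a documented BioTaxGeo feature.

```python
def parse_coordinate(raw, kind):
    """Parse one decimal-degree cell; return (value, None) or (None, error message)."""
    limit = 90.0 if kind == "latitude" else 180.0
    text = str(raw).strip().replace(",", ".")  # accept a decimal comma, e.g. "12,5"
    try:
        value = float(text)
    except ValueError:
        return None, f"{kind} {raw!r} is not a decimal number"
    if not -limit <= value <= limit:
        return None, f"{kind} {value} is outside [-{limit:g}, {limit:g}]"
    return value, None


def check_pair(raw_lat, raw_lon):
    """List every formatting problem found in one record's coordinate pair."""
    lat, lat_err = parse_coordinate(raw_lat, "latitude")
    lon, lon_err = parse_coordinate(raw_lon, "longitude")
    errors = [e for e in (lat_err, lon_err) if e]
    if lat is None and lon is not None:
        # Heuristic: if both values become valid once the columns are swapped,
        # the two columns were probably filled in the wrong order.
        _, swap_lat_err = parse_coordinate(raw_lon, "latitude")
        _, swap_lon_err = parse_coordinate(raw_lat, "longitude")
        if swap_lat_err is None and swap_lon_err is None:
            errors.append("latitude and longitude may be swapped")
    return errors
```

For example, `check_pair("-120.5", "45.2")` reports the out-of-range latitude and also flags the pair as possibly swapped.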

Compare spreadsheets

This section asks you to identify the matching columns in two spreadsheet files.

Upload the file that will be used as the reference.

Identify the columns in both files.

Once the columns are identified, the entries of the two files are compared.
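The comparison step can be sketched as a keyed join: rows are matched on an identifier column that may have a different name in each file, then the paired columns are compared cell by cell. This is an assumed design, not BioTaxGeo's documented behavior; the function names, the single key column and the exact-match comparison are all illustrative. Rows are plain dicts, such as those produced by `csv.DictReader` or `pandas.DataFrame.to_dict("records")`.

```python
def index_rows(rows, key_col):
    """Map each row's stripped key-column value to the row (the last duplicate wins)."""
    return {str(row[key_col]).strip(): row for row in rows}


def compare_sheets(ref_rows, other_rows, key_cols, value_cols):
    """Compare two tables whose matching columns may have different names.

    ref_rows, other_rows: lists of dicts, one per spreadsheet row
    key_cols:   (reference key column, other key column) identifying a record
    value_cols: {reference column: other column} pairs compared cell by cell
    """
    ref = index_rows(ref_rows, key_cols[0])
    other = index_rows(other_rows, key_cols[1])
    report = {
        "only_in_reference": sorted(ref.keys() - other.keys()),
        "only_in_other": sorted(other.keys() - ref.keys()),
        "differences": [],  # (key, reference column, reference value, other value)
    }
    for key in sorted(ref.keys() & other.keys()):
        for ref_col, other_col in value_cols.items():
            a = str(ref[key][ref_col]).strip()
            b = str(other[key][other_col]).strip()
            if a != b:
                report["differences"].append((key, ref_col, a, b))
    return report
```

Because duplicate identifiers collapse to the last row, a fuller implementation would flag duplicates before comparing.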

Installation Guide

Requirements

  • Google Maps API key
  • Python 3.6 or later
  • virtualenv

Google Maps API

How to get an API key: https://developers.google.com/maps/documentation/javascript/get-api-key

Enable the Geocoding API and the Maps JavaScript API for your key.

At the root of the project, open the file googlemaps_api_key.txt, paste your API key, then save and close the file.
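Reading the key back from that file can be sketched as follows. The helper `load_api_key` is hypothetical (the README does not show how BioTaxGeo reads the file); stripping whitespace guards against a trailing newline left by the editor, and an empty file fails with an explicit message instead of a confusing API error later.

```python
from pathlib import Path


def load_api_key(path="googlemaps_api_key.txt"):
    """Read the Google Maps API key from the project-root text file (hypothetical helper)."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty: paste your Google Maps API key into it")
    return key
```

The result can then be passed to the `googlemaps` package, for example `googlemaps.Client(key=load_api_key())`.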

Virtualenv

How to install and run virtualenv: https://virtualenv.pypa.io/en/latest/installation.html

How to use: https://virtualenv.pypa.io/en/latest/user_guide.html

Step 1: Download project

Download the project or clone it into a folder on your PC.

Step 2: Init virtualenv

Inside the project folder, create a virtualenv with Python 3.6 or later. Open a terminal (on Windows, run it as administrator) and activate the virtualenv.
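If installing virtualenv is not an option, Python's built-in `venv` module (a different tool from virtualenv, swapped in here) can create an equivalent environment. The sketch below assumes the folder name `venv`; `create_env` is a hypothetical helper that also enforces the Python 3.6 minimum listed in the requirements.

```python
import sys
import venv
from pathlib import Path

MIN_PYTHON = (3, 6)


def create_env(env_dir="venv", with_pip=True):
    """Create a virtual environment with the running interpreter; return its pyvenv.cfg path."""
    if sys.version_info < MIN_PYTHON:
        raise RuntimeError("BioTaxGeo needs Python %d.%d or later" % MIN_PYTHON)
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"
```

After creating it, activate the environment with `source venv/bin/activate` on Linux/macOS or `venv\Scripts\activate` on Windows.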

Step 3: Install packages

Use this command to install:

pip install (package name)==(version number)

Install the following packages:

  • flask (version 1.1.1)
  • fuzzywuzzy (version 0.18.0)
  • python-Levenshtein-wheels (version 0.13.1)
  • requests (version 2.23.0)
  • xlrd (version 1.2.0)
  • xlutils (version 2.0.0)
  • xlwt (version 1.3.0)
  • googlemaps (version 4.2.0)
  • pandas (version 1.0.4)
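To confirm that the pinned versions above are the ones actually installed, a small check with `importlib.metadata` can help. This is a sketch, not part of BioTaxGeo: `version_problems` is a hypothetical helper, `importlib.metadata` requires Python 3.8+ (older interpreters need the `importlib_metadata` backport), and distribution-name matching (case, `-` versus `_`) varies between Python versions, so a "not installed" report is worth double-checking with `pip show`.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.6/3.7: requires the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Versions pinned in the installation guide
PINNED = {
    "flask": "1.1.1",
    "fuzzywuzzy": "0.18.0",
    "python-Levenshtein-wheels": "0.13.1",
    "requests": "2.23.0",
    "xlrd": "1.2.0",
    "xlutils": "2.0.0",
    "xlwt": "1.3.0",
    "googlemaps": "4.2.0",
    "pandas": "1.0.4",
}


def version_problems(pins=PINNED, lookup=version):
    """Return {package: message} for packages that are missing or at another version."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems
```

Running `print(version_problems())` inside the activated virtualenv should print an empty dict when everything matches.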

Step 4: Starting the software

Now execute the code:

python run.py

The server will start at http://localhost:8080.

Before running the program, make sure that the virtualenv is active, that all packages are installed correctly, and that your Google Maps API key has been pasted into the googlemaps_api_key.txt file.

Authors

License

  • MIT License. See LICENSE for more information
