CSVToCSVW

Generates JSON-LD metadata for various types of CSV files. It adopts the CSVW vocabulary provided by the W3C to describe the structure and information within a file, and uses the QUDT units ontology to look up and describe units. It can segment complex CSV files that contain multiple tables and annotations without further input. It also has an option to output the complete serialized content of the CSV in the CSVW standard output format through the RDF API endpoint.

restrictions

Situations in which the annotation will fail!

  • If numbers are used as column names
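
The restriction above can be checked before submitting a file. The following is a minimal sketch, not part of CSVToCSVW: the function name `numeric_column_names` is hypothetical, and it assumes the header is the first row of the file, which may not hold for the multi-table files the tool can segment.

```python
import csv
import io


def numeric_column_names(csv_text, delimiter=","):
    """Return the header cells that parse as plain numbers.

    Hypothetical helper: assumes the first row is the header row.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])
    flagged = []
    for name in header:
        try:
            # float() also accepts strings such as "nan" and "inf",
            # so those would be flagged as well.
            float(name.strip())
        except ValueError:
            continue
        flagged.append(name)
    return flagged


sample = "time,1,2.5,load\n0,1,2,3\n"
print(numeric_column_names(sample))  # ['1', '2.5']
```

If the returned list is not empty, renaming those columns (for example `1` to `channel_1`) before annotation avoids the failure case.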

how to use

Create a .env file with the following variables, replacing each placeholder in angle brackets with your own value:

APP_PORT=<80>
ADMIN_MAIL=<email_of_admin>
SSL_VERIFY=<True or False> #default is True
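
To illustrate what these variables mean, here is a rough sketch of how a Python service could read them. How CSVToCSVW itself parses them is not described here, so the function `load_settings`, the fallback port of 80 and the rule that only the string "false" disables verification are all assumptions; only the default of True for SSL_VERIFY comes from the listing above.

```python
import os


def load_settings(environ=os.environ):
    """Read APP_PORT, ADMIN_MAIL and SSL_VERIFY from a mapping.

    Hypothetical helper, not the application's actual loader.
    """
    raw_verify = environ.get("SSL_VERIFY", "True")
    return {
        # Assumed fallback: port 80, taken from the placeholder above.
        "app_port": int(environ.get("APP_PORT", "80")),
        "admin_mail": environ.get("ADMIN_MAIL", ""),
        # Assumed rule: anything other than "false" keeps verification on.
        "ssl_verify": raw_verify.strip().lower() != "false",
    }


print(load_settings({"APP_PORT": "8080", "SSL_VERIFY": "False"}))
```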

docker

Pull the Docker image from the GitHub Container Registry:

docker pull ghcr.io/mat-o-lab/csvtocsvw:latest

docker-compose

Clone the repo with

git clone https://github.com/Mat-O-Lab/CSVToCSVW

Change into the cloned folder

cd CSVToCSVW

Build and start the container.

docker-compose up

A simple UI can be found at the index page '/'. The API documentation is at 'api/docs'.

jupyter notebook

  1. Open the notebook in Colab or any other Jupyter instance.
  2. Run the first cell of the notebook. It will install the necessary Python packages and definitions.
  3. Run the second cell.
  4. Upload a CSV file, or paste a URL pointing at one, in the provided widgets.
  5. Click the process button. It will try to determine the encoding and column separator automatically. If that fails, choose appropriate values from the drop-downs in the widgets and press the process button again.
  6. If successful, the JSON-LD created will be printed to the cell as output. Click the download button to download it under the proper filename according to https://www.w3.org/ns/csvw.
  7. Place the file in the same folder as the CSV it describes.
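
Steps 5 to 7 can be sketched outside the notebook as follows. This is an illustration under stated assumptions, not the notebook's own code: `guess_delimiter` uses Python's standard `csv.Sniffer` as a stand-in for whatever detection the tool performs, and `metadata_path` follows the W3C CSVW default of appending `-metadata.json` to the CSV file name; the name offered by the download button may differ. Both function names are hypothetical.

```python
import csv
from pathlib import Path


def guess_delimiter(sample, candidates=",;\t|", fallback=","):
    """Guess the column separator from a text sample (hypothetical helper)."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffing failed: fall back, as one would pick a value by hand.
        return fallback


def metadata_path(csv_path):
    """Return the sibling metadata file path for a CSV file.

    Assumes the CSVW default location '<csv file name>-metadata.json',
    placed in the same folder as the CSV it describes.
    """
    p = Path(csv_path)
    return p.with_name(p.name + "-metadata.json")


print(repr(guess_delimiter("a;b;c\n1;2;3\n4;5;6\n")))
print(metadata_path("data/example.csv"))
```

Keeping the metadata file next to the CSV lets CSVW-aware tools find the description from the CSV location alone.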

Acknowledgments

The authors would like to thank the Federal Government and the Heads of Government of the Länder for their funding and support within the framework of the Platform Material Digital consortium. Funded by the German Federal Ministry of Education and Research (BMBF) through the MaterialDigital Call in Project KupferDigital - project id 13XP5119.