Aurum: Discovering Data in Lakes, Clouds and Databases

Webpage version of this documentation: http://mitdbg.github.io/aurum-datadiscovery/

Aurum helps users identify relevant content among multiple data sources that may consist of tabular files, such as CSV files, and relational tables. These sources may be stored in relational database management systems (RDBMS) or file systems, and they may live in cloud services, data lakes, or other on-premise repositories.

Aurum helps you find data through different interfaces. The most flexible one is an API of primitives that can be composed into queries that describe the data of interest. For example, you can write a query that says "find tables that contain a column named 'ID' and have at least one column that looks like an input column". You can also use very simple primitives, such as "find columns that contain the keyword 'caffeine'", or write more complex queries, such as finding which tables join with a table of interest. The API is meant to be flexible enough to support a wide range of use cases and to work over all the data you feed to the system, regardless of where it lives.
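To make the idea of composable primitives concrete, here is a minimal sketch over a toy in-memory catalog. It is not Aurum's API: the `Column` class, the `CATALOG` data, and the functions `columns_with_keyword`, `columns_named`, `tables_of`, and `joinable_tables` are all hypothetical names invented for illustration, and the join check is a crude value-containment test rather than whatever Aurum actually does.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a data catalog; NOT Aurum's real data model.
@dataclass(frozen=True)
class Column:
    table: str
    name: str
    values: tuple


# Made-up example data, used only to exercise the primitives below.
CATALOG = [
    Column("drugs", "ID", (1, 2, 3)),
    Column("drugs", "compound", ("caffeine", "aspirin", "ibuprofen")),
    Column("trials", "drug_id", (1, 1, 3)),
    Column("trials", "outcome", ("ok", "ok", "fail")),
    Column("sites", "city", ("Boston", "Austin", "Boston")),
]


def columns_with_keyword(keyword):
    """Columns with at least one value containing the keyword (case-insensitive)."""
    kw = keyword.lower()
    return {c for c in CATALOG if any(kw in str(v).lower() for v in c.values)}


def columns_named(name):
    """Columns whose name matches exactly, ignoring case."""
    return {c for c in CATALOG if c.name.lower() == name.lower()}


def tables_of(columns):
    """Lift a set of columns to the set of tables that contain them."""
    return {c.table for c in columns}


def joinable_tables(table):
    """Other tables with a column whose values all appear in some column of `table`."""
    own = [set(c.values) for c in CATALOG if c.table == table]
    return {
        c.table
        for c in CATALOG
        if c.table != table and any(set(c.values) <= vals for vals in own)
    }


# A simple primitive: "find columns that contain the keyword 'caffeine'".
caffeine_columns = columns_with_keyword("caffeine")

# A composed query: tables that have a column named 'ID' AND mention 'caffeine'.
id_and_caffeine = tables_of(columns_named("ID")) & tables_of(caffeine_columns)

# A more complex query: which tables could join with the table of interest?
join_candidates = joinable_tables("drugs")
```

Because each primitive returns a plain set, queries compose with ordinary set operations. That composability is the property the paragraph above describes, independent of how the real system implements it.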

Why do I need Aurum? Scenarios in which Aurum has proven useful.
Design Rationale: A brief explanation of the system architecture and design rationale.
Quick Start: A guide to set up Aurum and start running discovery queries.
Tutorial: A walkthrough of the different aspects of Aurum: how to write queries using the discovery API, how to create new connectors to read data from different data sources, and how to store data in different stores.
FAQ: A collection of frequently asked questions.

Aurum is a work in progress; we expect to release its first open-source version in the 4th quarter of 2018. We are happy to accept contributions from the community. If you are interested in contributing, take a look at the CONTRIBUTING file and feel free to email raulcf@csail.mit.edu. We also have a code of conduct:

Code of Conduct

Check the code of conduct for Aurum here:

https://github.com/mitdbg/aurum-datadiscovery/blob/master/CODE_OF_CONDUCT.md

Please report violations of the code of conduct by sending an email to raulcf@csail.mit.edu.

