Aurum: Discovering Data in Lakes, Clouds and Databases

Webpage version of this documentation: http://mitdbg.github.io/aurum-datadiscovery/

Aurum helps users identify relevant content among multiple data sources consisting of tabular files, such as CSV, and relational tables. These sources may be stored in relational database management systems (RDBMS) or file systems, and they may live in cloud services, data lakes, or on-premise repositories.

Aurum helps you find data through different interfaces. The most flexible one is an API of primitives that can be composed to build queries that describe the data of interest. For example, you can write a query that says "find tables that contain a column with name 'ID' and have at least one column that looks like an input column". Queries can be as simple as "find columns that contain the keyword 'caffeine'", or more complex, such as figuring out which tables join with a table of interest. The idea is that the API is flexible enough to support a wide range of use cases, and that it works over all the data you feed to the system, regardless of where that data lives.
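To make the idea of composable primitives concrete, here is a minimal, self-contained sketch. It does not use Aurum's actual API: the `Column` class, the `CATALOG` list, and the functions `columns_named`, `columns_with_keyword`, `tables_of`, and `joinable_tables` are all hypothetical names invented for illustration, and the value-overlap test for joinability is a simple stand-in technique, not necessarily what Aurum does.

```python
from dataclasses import dataclass

# Toy model only: every name below is hypothetical and NOT part of Aurum's API.

@dataclass(frozen=True)
class Column:
    table: str
    name: str
    values: frozenset

# A tiny in-memory "catalog" standing in for the profiled data sources.
CATALOG = [
    Column("drugs", "ID", frozenset({"1", "2", "3"})),
    Column("drugs", "compound", frozenset({"caffeine", "aspirin"})),
    Column("assays", "drug_id", frozenset({"1", "2"})),
    Column("assays", "result", frozenset({"0.4", "0.9"})),
    Column("staff", "name", frozenset({"ana", "bo"})),
]

def columns_named(name):
    """Primitive: columns whose name matches exactly (case-insensitive)."""
    return {c for c in CATALOG if c.name.lower() == name.lower()}

def columns_with_keyword(keyword):
    """Primitive: columns that contain the keyword as one of their values."""
    return {c for c in CATALOG if keyword.lower() in {v.lower() for v in c.values}}

def tables_of(columns):
    """Lift a set of columns to the set of tables they belong to."""
    return {c.table for c in columns}

def joinable_tables(table, min_overlap=0.5):
    """Tables having a column whose values overlap some column of `table`.

    Overlap is measured relative to the smaller of the two value sets
    (an assumption made for this sketch).
    """
    own = [c for c in CATALOG if c.table == table]
    hits = set()
    for other in CATALOG:
        if other.table == table:
            continue
        for mine in own:
            smaller = min(len(mine.values), len(other.values))
            if smaller and len(mine.values & other.values) / smaller >= min_overlap:
                hits.add(other.table)
    return hits

# Compose primitives: tables with a column named 'ID' AND a column containing 'caffeine'.
with_id = tables_of(columns_named("ID"))
with_caffeine = tables_of(columns_with_keyword("caffeine"))
print(sorted(with_id & with_caffeine))

# A more complex query: which tables join with the table of interest?
print(sorted(joinable_tables("drugs")))
```

Because each primitive returns a plain set, queries compose with ordinary set operations such as intersection and union, which is the kind of flexibility the paragraph above describes.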

  • Why do I need Aurum? We show you various scenarios in which Aurum has proven useful.

  • Design Rationale: A brief explanation of the system architecture and design rationale.

  • Quick Start: A guide to set up Aurum and start running some discovery queries.

  • Tutorial: A walkthrough of the different aspects of Aurum: how to write queries using the discovery API, how to create new connectors to read data from different data sources, and how to store data in different stores.

  • FAQ: A collection of frequently asked questions.

Aurum is a work in progress; we expect to release its first open-source version in the fourth quarter of 2018. We are happy to accept contributions from the community. If you are interested in contributing, take a look at the CONTRIBUTING guide and feel free to email raulcf@csail.mit.edu. We also have a code of conduct:

Code of Conduct

Check the code of conduct for Aurum here:

https://github.com/mitdbg/aurum-datadiscovery/blob/master/CODE_OF_CONDUCT.md

Please report violations of the code of conduct by sending an email to raulcf@csail.mit.edu.