Generalized Data Analytics
The purpose of this suite is to provide advanced database functionality such as searching, processing, and publishing, on on unperfect domains of data in a trivialized manner for fast prototyping and ease of use by non-technical people.
- manual: by hand (on your phone), aka: keeping a todo list - browser upload: select a formatted file to load records from - you could be a grad researcher, not a software developer - data mine: enter a url, enter xpath queries or regex’s to parse websites - scheduled: add a schedule for polling the url - programmatically: through our logging api from your code base
- rely on our user management - load data through our web site to our databases - rely on our site, but use your database - rely on your own clone of our site and your own database
- prototyping: - the data is of questionable quality - dont deal with mysql’s pickness until you are ready - lossless: - accept and record everything from the loader - skip debugging LOAD statement errors due to schema issues - more like a nosql such as mongo db in this regard - revisioned: - see how each set of data (think: a table in mysql) evolves - don’t lose old records if they are accidentally deleted in updates
- standard statistical information displays - is a field generally unique? - the price of a product - out of 2,000 products, there are 1773 unique prices - is the field a flag/enumerated value field? - the category of a product - 10 distinct values across 2000 products - the distribution of records across this flag
- create filters and indexes on a data set - keys: - define the field and value ranges you care about - link data sets together using keys: - similar to myql join - using selected fields - conditional statements - example: price < 50 - everything is mapped to a url so you can share and re-visit - meant to be useable by a non-technical user
- data set revision validation - statistical proofing based on rules to ensure data quality before release - publish reports to clients under your name - easily - privately - speed optimization: - from your Keys and Views - publish an optimized rails application - with a schema and views defined explicitly for your data domain