MacroBase provides a range of functionality, including exploratory data analysis and specialized operators for data types such as time-series and streaming data. The easiest way to interact with the system is via the exploratory GUI, described below. Most of the advanced functionality has not yet made its way into the GUI, so the best way to access it is to email us or poke around the repo.
To build and start the GUI server
Clone the repo:
git clone https://github.com/stanford-futuredata/macrobase.git
cd macrobase; mvn package
- Note: after the first build, mvn compile should be sufficient.
- Note: if you update pom.xml, you can also run mvn dependency:copy-dependencies instead of
- Note: your IDE should have an option to import an existing Maven project. Point the import tool at
Load some data. Set up a Postgres server on localhost with a database named postgres. Then run
- Note: to manually inspect the data, run psql postgres and then run:
SELECT * FROM sensor_data_demo LIMIT 10;
Start the MacroBase server:
- Connect to the GUI: open http://localhost:8080/ in your browser.
- If MacroBase cannot log in to Postgres: edit conf/macrobase.yaml and set the parameters macrobase.loader.db.user and macrobase.loader.db.password.
- If you run into data loading issues, try loading the data from a CSV file instead. See https://github.com/stanford-futuredata/macrobase/wiki/Running-MacroBase-Queries
MacroBase Exploratory GUI
Configure the data source. MacroBase pulls data from a view defined by the 'Base Query' field. For our demo data, set 'Base Query' to
SELECT * FROM sensor_data_demo;
and press
- Perform analysis. Click the 'analyze' button. In a few seconds, you should see your results below, like this:
In this example, MacroBase found 1012 records with high power drain readings. The list below the summary shows attribute-value combinations that were highly correlated with high power drain readings; in this scenario, it found only one group: (device_id = 2040, model = M204, state = AR, firmware_version = 0.3.1).
What do the statistics mean?
- Support is the proportion of records marked as outliers that contain this attribute combination. The minimum is 0 (no outliers had this pattern) and the maximum is 1 (all outlier records matched).
- Ratio Out/In compares how often this attribute combination appears among outliers versus inliers: it is the support in outliers divided by the support in inliers. A ratio of 1 means that this pattern appeared equally frequently in inliers and outliers. A ratio of infinity means this pattern was not present in the inliers.
- Records is the actual number of outlier records matching this pattern (i.e., support * number of outliers).
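The three statistics can also be written as formulas. The notation here is introduced only for illustration: O is the set of outlier records, I is the set of inlier records, and P is an attribute-value combination. The GUI's Support column corresponds to support_O(P).

```latex
\[
\mathrm{support}_O(P) = \frac{\lvert \{ r \in O : r \text{ matches } P \} \rvert}{\lvert O \rvert},
\qquad
\mathrm{support}_I(P) = \frac{\lvert \{ r \in I : r \text{ matches } P \} \rvert}{\lvert I \rvert}
\]
\[
\text{Ratio Out/In}(P) = \frac{\mathrm{support}_O(P)}{\mathrm{support}_I(P)},
\qquad
\text{Records}(P) = \mathrm{support}_O(P) \cdot \lvert O \rvert
\]
```

When a pattern never occurs among the inliers, support_I(P) = 0 and the ratio is reported as infinity. As a purely hypothetical example, a pattern matching 500 of 1,000 outliers and 20 of 10,000 inliers has support 0.5, Ratio Out/In 0.5 / 0.002 = 250, and Records 0.5 × 1,000 = 500.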
Note: you can explore the records matching each combination by clicking the corresponding 'Explore' link.
Can you tell why these records were marked as outliers?
- Note: in the future, we'd like to add histograms to the summary screen to make it clearer why records were marked as outliers and how they differ from the inlier distribution.
Try it yourself. Repeat the analysis to find records with low temperature readings.
- Note: you can also try analyzing multiple metrics of interest at once.
MacroBase Command Line
- Running batch analysis: bin/batch.sh repeats the above analysis, but uses a pre-configured set of tables and columns. The configuration is stored in conf/batch.yaml. You likely need to edit batch.yaml to work for the
- Note: you can change the configuration file! batch.sh is just a thin wrapper for
macrobase.MacroBase batch <CONFIG_FILE>
- Running streaming analysis: bin/streaming.sh performs a similar task using a one-pass streaming algorithm. There are considerably more parameters to explore here.
- Note: again, you can change the configuration file:
macrobase.MacroBase streaming <CONFIG_FILE>
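Since both scripts wrap macrobase.MacroBase with a mode (batch or streaming) and a config file, running your own configuration amounts to invoking that class yourself. The sketch below is a hypothetical helper: the function name run_macrobase and the MB_DRY_RUN variable are made up for illustration, and the classpath is an assumption (compiled classes in target/classes, dependencies in target/dependency, the default output directory of mvn dependency:copy-dependencies). Check the scripts in bin/ for the classpath and JVM options they actually use.

```shell
# Hypothetical helper: invoke macrobase.MacroBase directly with any config file,
# the way bin/batch.sh and bin/streaming.sh wrap it.
# ASSUMPTION: classes are in target/classes and dependencies were copied to
# target/dependency (Maven's default for mvn dependency:copy-dependencies).
# Check the scripts in bin/ for the real classpath and JVM options.

# Usage: run_macrobase batch|streaming CONFIG_FILE
# Set MB_DRY_RUN=1 to print the java command instead of running it.
run_macrobase() {
  mb_mode="$1"
  mb_config="$2"

  # Accept only the two modes described in this README.
  case "$mb_mode" in
    batch|streaming) ;;
    *)
      echo "usage: run_macrobase batch|streaming CONFIG_FILE" >&2
      return 2
      ;;
  esac

  # A config file argument is required.
  if [ -z "$mb_config" ]; then
    echo "usage: run_macrobase batch|streaming CONFIG_FILE" >&2
    return 2
  fi

  # Quoted so the shell does not glob it; java expands dir/* entries itself.
  mb_classpath="target/classes:target/dependency/*"

  if [ "${MB_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "java -cp $mb_classpath macrobase.MacroBase $mb_mode $mb_config"
  else
    java -cp "$mb_classpath" macrobase.MacroBase "$mb_mode" "$mb_config"
  fi
}

# Example (conf/my_batch.yaml is a hypothetical edited copy of conf/batch.yaml):
#   run_macrobase batch conf/my_batch.yaml
```

The dry-run switch lets you compare the command against what bin/batch.sh or bin/streaming.sh runs before launching a long analysis.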