Pulsar is an intelligent data processing system focused on unstructured data. It extends SQL to handle the entire life cycle of data processing: collection, extraction, analysis, storage, BI, etc.
- X-SQL: eXtended SQL for all data jobs: collection, extraction, preparation, processing, storage, BI, etc.
- Web spider: browser rendering, Ajax, scheduling, page scoring, monitoring, distributed, high performance, indexing with Solr/Elasticsearch
- BI integration: turn Web sites into tables and charts with a single SQL statement
- Big data: large scale, with multiple storage backends such as HBase and MongoDB
For more information, check out platonic.fun.
Extract data from a single page:
SELECT
    DOM_TEXT(DOM) AS TITLE,
    DOM_ABS_HREF(DOM) AS LINK
FROM LOAD_AND_SELECT('https://en.wikipedia.org/wiki/Topology', '.references a.external');
The SQL above downloads a Web page from Wikipedia, finds the references section, and extracts all external reference links.
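To make the result shape concrete, here is a rough Python sketch of what the query computes: one row per `a.external` anchor inside a `.references` element, with the anchor text as TITLE and the absolute URL as LINK. This is not how Pulsar implements it. The class name `ExternalReferenceLinks`, the helper `select_external_references`, and the inline sample HTML are all hypothetical, and the sketch uses only the Python standard library with no network access or browser rendering.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ExternalReferenceLinks(HTMLParser):
    """Collects (text, absolute href) for <a class="external"> inside .references."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []          # result rows: (TITLE, LINK)
        self._ref_tag = None    # tag name of the enclosing .references element
        self._ref_depth = 0     # nesting depth of that tag name
        self._href = None       # absolute href of the anchor being read
        self._text = []         # text fragments of the anchor being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._ref_tag is None:
            # Outside the references section: only look for its start.
            if "references" in classes:
                self._ref_tag, self._ref_depth = tag, 1
            return
        if tag == self._ref_tag:
            self._ref_depth += 1
        if tag == "a" and "external" in classes and attrs.get("href"):
            # Resolve relative links against the page URL, like an absolute href.
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._ref_tag is None:
            return
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None
        if tag == self._ref_tag:
            self._ref_depth -= 1
            if self._ref_depth == 0:
                self._ref_tag = None


def select_external_references(html, base_url):
    parser = ExternalReferenceLinks(base_url)
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page, standing in for the downloaded document.
SAMPLE_HTML = """
<p><a class="external" href="https://example.org/outside">not in references</a></p>
<ol class="references">
  <li><a class="external" href="/docs/paper.pdf">A paper</a></li>
  <li><a href="#cite-1">internal jump</a></li>
  <li><a class="external" href="https://example.com/book">A book</a></li>
</ol>
"""

rows = select_external_references(SAMPLE_HTML, "https://en.wikipedia.org/wiki/Topology")
for title, link in rows:
    print(title, link)
```

Only the two external anchors inside the references list become rows; the external link outside the list and the internal jump link are skipped.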
Extract data from a batch of pages, and turn them into a table:
SELECT
    DOM_BASE_URI(DOM) AS BaseUri,
    DOM_FIRST_TEXT(DOM, '.brand') AS Title,
    DOM_FIRST_TEXT(DOM, '.titlecon') AS Memo,
    DOM_FIRST_TEXT(DOM, '.pbox_price') AS Price,
    DOM_FIRST_TEXT(DOM, '#wrap_con') AS Parameters
FROM LOAD_OUT_PAGES_IGNORE_URL_QUERY(
    'https://www.mia.com/formulas.html',
    '*:expr(width>=250 && width<=260 && height>=360 && height<=370 && sibling>30 ) a',
    1, 20);
The SQL above visits an index page on mia.com, downloads the detail pages it links to, and then extracts data from them.
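The name LOAD_OUT_PAGES_IGNORE_URL_QUERY suggests that query strings are ignored when the links on the index page are compared, so the same detail page is not loaded twice. That reading is an interpretation of the name, not documented behavior. The Python sketch below shows the idea under that assumption; `strip_query`, `unique_out_links`, the `limit` parameter, and the example URLs are all hypothetical, and the meaning of the trailing arguments `1, 20` in the query is deliberately left uninterpreted.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def unique_out_links(links, limit):
    """Deduplicate links by their query-less form, keeping first-seen order."""
    seen = set()
    result = []
    for link in links:
        if len(result) >= limit:
            break
        key = strip_query(link)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Hypothetical links collected from an index page.
links = [
    "https://www.example.com/item-1.html?from=index",
    "https://www.example.com/item-1.html?from=banner",
    "https://www.example.com/item-2.html",
    "https://www.example.com/item-3.html?ref=a#top",
]
print(unique_out_links(links, limit=2))
```

The first two links differ only in their query string, so they collapse into a single detail page to fetch.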
You can clone the Pulsar source code and run the queries yourself, or run them from our online demo.
Use the customized Metabase to write X-SQL queries and turn Web sites into tables and charts immediately. Everyone in your company can now ask questions and learn from Web data, for the first time.
Build & Run
Install MongoDB. You can skip this step, but in that case all data will be lost when Pulsar shuts down. On Ubuntu/Debian:
sudo apt-get install mongodb
Build from source
git clone https://github.com/platonai/pulsar.git
cd pulsar && mvn -Pthird -Pplugins
Start pulsar server
Once the server is running, the Web console at http://localhost:8082 opens in your browser. Enjoy playing with X-SQL.
Metabase is the easy, open source way for everyone in your company to ask questions and learn from data. With X-SQL support, everyone can organize knowledge not just from the company's internal data, but also from the WWW.
git clone https://github.com/platonai/metabase.git
cd metabase
bin/build && bin/start
Pulsar Enterprise Edition supports Auto Web Mining: it uses unsupervised machine learning, requires no rules or training, and turns Web sites into tables automatically. Here are some examples: Auto Web Mining Examples