Miner is a tool for processing and analyzing data stored in various formats. It aims to be a powerful, extensible, and user-friendly tool.
The initial idea behind miner was to develop a convenient and efficient framework for log data analysis. Over time, support for many data formats was added and miner gained the ability to connect to relational databases (NoSQL databases are planned for the near future), but the goal remained the same: to be a simple yet powerful tool.
Miner can be run either in interactive console mode, with context-based TAB completion, or in batch mode to execute scripts.
The main source of help on miner commands is the built-in HELP command or the context-aware F1 key.
Is miner good for me?
If you need to analyze data and want to do it in an easy and efficient way, then miner may be a tool for you. If you have various data stored in different formats and/or databases, then miner is probably the tool for you.
But I want to mention a few things:
- Miner is not a database. It can read data, export data, and query other databases, but it doesn't manage data by itself.
- Miner is pretty good at log file analysis; it even has its own map-reduce implementation. But it doesn't pretend to be a big data engine (yet). For small and medium data sets (less than 100GB), miner can do a pretty good job for you. For larger data sets, consider other options.
- Miner is written in Python. You can start using it with little or no Python knowledge, but to get the most out of miner you'll probably need to learn some basics.
How does miner work?
The miner engine performs pipeline data processing (see Miner Query Language). It reads data from one of the supported data sources, applies a chain of data mining commands such as SELECT, SORT, AGGREGATE, GROUP BY, etc., and then dumps the output to a data target. The output target can be another data source (e.g. a database) or a report target (e.g. an Excel file).
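To make the source → commands → target flow concrete, here is a minimal sketch of the same pipeline idea in plain Python using generators. It is not miner's API or Miner Query Language syntax: the stage functions (`read_csv`, `select`, `group_by_sum`, `write_csv`) and the sample data are hypothetical, chosen only to show how each stage consumes the previous one's rows.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical in-memory data standing in for a log file or database table.
RAW = """user,action,bytes
alice,GET,120
bob,POST,300
alice,GET,80
carol,GET,50
bob,GET,20
"""

def read_csv(text):
    """Source stage: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))

def select(rows, predicate):
    """Filter stage, similar in spirit to a SELECT ... WHERE step."""
    return (row for row in rows if predicate(row))

def group_by_sum(rows, key, field):
    """Aggregate stage: sort by key, then sum an integer field per group."""
    ordered = sorted(rows, key=itemgetter(key))
    for k, group in groupby(ordered, key=itemgetter(key)):
        yield {key: k, field: sum(int(r[field]) for r in group)}

def write_csv(rows, out):
    """Target stage: dump rows as CSV to any writable text stream."""
    rows = list(rows)
    if rows:
        writer = csv.DictWriter(out, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return rows

# Chain the stages: source -> filter -> aggregate -> target.
out = io.StringIO()
result = write_csv(
    group_by_sum(select(read_csv(RAW), lambda r: r["action"] == "GET"), "user", "bytes"),
    out,
)
print(out.getvalue())
```

Because every stage only consumes an iterable of rows, swapping the source or the target (say, a database cursor instead of a CSV string) does not affect the stages in between, which is what makes moving data between sources straightforward in a pipeline model.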
IMHO, pipeline processing is more intuitive than building relational queries and more powerful than accessing NoSQL data stores. Using miner's processing model, you can easily transfer data between supported data sources, and adding new data sources is easy and straightforward.
Under the hood, miner compiles queries into Python code, which is executed in the current environment. This means that you can use external data and modules in your queries.
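The following is a minimal sketch of the general technique of compiling a query into Python and running it against the caller's environment. It does not reproduce miner's compiler: the filter-expression "query" format and the `compile_filter` helper are hypothetical. It uses the built-in `compile()` and `exec()`, passing the caller's globals so that names defined outside the query (a module such as `math`, a variable such as `THRESHOLD`) resolve inside it.

```python
import math

def compile_filter(expression, env):
    """Compile a hypothetical filter expression into a Python function.

    The expression is embedded in generated Python source. `env` supplies the
    globals, so modules and variables defined by the caller are visible.
    """
    source = (
        "def _query(rows):\n"
        f"    return [row for row in rows if {expression}]\n"
    )
    code = compile(source, "<query>", "exec")
    namespace = dict(env)  # copy, so the caller's globals are not modified
    exec(code, namespace)
    return namespace["_query"]

THRESHOLD = 10  # external data defined outside the query

rows = [
    {"name": "a", "value": 4},
    {"name": "b", "value": 25},
    {"name": "c", "value": 144},
]
query = compile_filter("math.sqrt(row['value']) > 3 and row['value'] < THRESHOLD * 10", globals())
matches = query(rows)
print(matches)
```

Since the query runs as ordinary Python, this approach is only appropriate for trusted input; the payoff is that any module or value available in the session can be used directly inside a query.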