Make Matomo work for Big Data (1 billion hits per month or more) #7526
Comments
Maybe I'm reading the docs wrong, but Presto looks like a way to connect different data sources rather than a data source itself. I.e., it says it connects to MySQL, Hadoop, Cassandra, etc.
Yes, for Very Big Data the data could be stored in HDFS (which scales) and Presto would read from HDFS (for example).
I see, this looks like a potentially easy way to support different backends, then.
I think it's less about supporting different backends, but rather that you only need SQL skills to use it, and that it is fast, actually scalable, and battle-tested.
Hello, has it been tested already? I am using Presto; it's a little hard to set up, but once everything is set up it's as easy to use as traditional SQL. The use case I see for Piwik is: 1. Use HTTP server access logs instead of the Piwik JavaScript tracking. The main issue is that Presto only has a JDBC driver built in ;-(
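To illustrate the JDBC-only point above, here is a minimal Java sketch of querying Presto over its JDBC driver, assuming a coordinator on localhost:8080 exposing a Hive catalog backed by HDFS. The catalog, schema, and table names (hive/web/access_log) are illustrative assumptions, not anything Matomo provides.

```java
// A minimal sketch, assuming a Presto coordinator on localhost:8080 with a "hive"
// catalog over HDFS; the "web" schema and "access_log" table are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PrestoJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Presto ships a JDBC driver (com.facebook.presto:presto-jdbc);
        // the URL format is jdbc:presto://host:port/catalog/schema.
        String url = "jdbc:presto://localhost:8080/hive/web";
        try (Connection conn = DriverManager.getConnection(url, "piwik", null);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT url, count(*) AS hits " +
                 "FROM access_log " +
                 "GROUP BY url ORDER BY hits DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("url") + "\t" + rs.getLong("hits"));
            }
        }
    }
}
```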
At Piwik we haven't tried it yet, but it would be awesome if someone tried to get it working and shared the knowledge gained. It would be interesting to see whether it's possible to use with Piwik, and if so we could do some performance tests.
I will try! With Apache Impala also. I'll keep you posted.
Awesome, looking forward to your results if it works :)
Is there any progress on this issue?
Another option would be to support ClickHouse. It also has SQL support and is used by Yandex as the main storage backend for their web analytics, so most of the features needed for this task are already implemented (and it has far fewer moving parts than a Hadoop stack with Hive!)
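As a rough sketch of what a ClickHouse raw-hits table could look like, the snippet below creates a hypothetical MergeTree table over JDBC. The table layout and column names are assumptions for illustration only, not Matomo's actual schema.

```java
// A minimal sketch: a hypothetical ClickHouse table for raw tracking hits using
// the MergeTree engine. Table and column names are illustrative, not Matomo's.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ClickHouseSketch {
    public static void main(String[] args) throws Exception {
        // Uses the ClickHouse JDBC driver (com.clickhouse:clickhouse-jdbc), default HTTP port 8123.
        try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://localhost:8123/default");
             Statement stmt = conn.createStatement()) {
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS hits_raw (" +
                "  idsite UInt32," +
                "  visitor_id FixedString(16)," +
                "  server_time DateTime," +
                "  url String," +
                "  referer String" +
                ") ENGINE = MergeTree()" +
                "  PARTITION BY toYYYYMM(server_time)" +
                "  ORDER BY (idsite, server_time)");
        }
    }
}
```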
Another option is Amazon Athena (https://aws.amazon.com/athena/): export data from the database to CSV, put it in S3, and use Athena to query it (it also has SQL support).
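A hedged sketch of that Athena route, using the AWS SDK for Java v2: it assumes the CSV exports are already registered as an Athena table (here called log_visit_csv in a matomo_export database); the region and S3 result location are placeholders.

```java
// A minimal sketch using the AWS SDK for Java v2; the database (matomo_export),
// table (log_visit_csv), region, and S3 locations are illustrative assumptions.
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.athena.AthenaClient;
import software.amazon.awssdk.services.athena.model.QueryExecutionContext;
import software.amazon.awssdk.services.athena.model.ResultConfiguration;
import software.amazon.awssdk.services.athena.model.StartQueryExecutionRequest;

public class AthenaSketch {
    public static void main(String[] args) {
        try (AthenaClient athena = AthenaClient.builder().region(Region.EU_WEST_1).build()) {
            // Athena queries the CSV files previously exported to S3 and writes
            // query results to another S3 prefix.
            String queryId = athena.startQueryExecution(StartQueryExecutionRequest.builder()
                    .queryString("SELECT idsite, count(*) AS visits FROM log_visit_csv GROUP BY idsite")
                    .queryExecutionContext(QueryExecutionContext.builder().database("matomo_export").build())
                    .resultConfiguration(ResultConfiguration.builder()
                            .outputLocation("s3://my-athena-results/matomo/").build())
                    .build())
                    .queryExecutionId();
            System.out.println("Started Athena query: " + queryId);
        }
    }
}
```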
Here are more suggestions: #2592. When researching, it would be good to look into the details, e.g.
If anyone has tried MyRocks, we'd be interested to learn more.
There's also https://www.rondb.com/
Hey team and @mattab 👋. We would love to help with this issue if it's a fit. I've opened the original issue here. TL;DR: we'd like to contribute an initial version of a CloudQuery source plugin, if you can help us maintain it. It should solve this issue and enable your users to sync Matomo data to any database/data lake among the growing number of CQ destinations. See the similar thing we did with the Plausible API, Plausible docs, and discussion.
Presto is a very interesting big data technology that may be a good candidate if/when we need to handle Very Big Data with Piwik.
https://prestodb.io/
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.
Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that in total scan over a petabyte each per day.
Related to #2592, #4902, #1999