Development roadmap? #111

Open
ranchoiver opened this Issue Oct 19, 2018 · 13 comments


ranchoiver commented Oct 19, 2018

I assume this is a prototype: many things still have to be implemented (and/or researched), so the system will change over time.
Do you have plans to make noria production-ready?
Is there a publicly accessible development roadmap or feature plan people could follow?

Collaborator

ms705 commented Oct 19, 2018

You're right that the current version of Noria is a research prototype. However, it's definitely ready to try out: we've managed to run some real web applications on Noria with minimal modification.

The best approximation of a development roadmap is probably the GitHub issues. Our research going forward will primarily focus on further improved distributed operation in the short term, although we're also exploring stronger consistency models and some offshoot ideas related to web application security.

For production use, Noria might need:

  1. Improvements to return more helpful errors when Noria doesn't support a query yet (#98, nom-sql, #36).
  2. Better fault-tolerance and high-availability support: client failover (#105) and rebuilding only failed shards (rather than entire operators).
  3. Better resharding/shuffles (#95), so that it can support upqueries across shuffles in the data-flow.

We're actively working on 2. and 3. as part of our scalability work, and hope to fix 1. as well.

We also plan to keep the versions released to crates.io stable, and will use semantic versioning when we make breaking changes.

Noria primarily remains a research project, but we are keen to support people who want to use it for real applications. If you have a use case that you'd like us to consider, do let us know!

ranchoiver commented Oct 22, 2018

> focus on further improved distributed operation in the short term, although we're also exploring stronger consistency models and some offshoot ideas related to web application security.

Sounds exciting!

Our use case is aggregating over semi-large amounts of data (10 to 20 million rows in a table) in MySQL and getting the last value from each group where the timestamp is earlier than some cutoff (like midnight of the current day). Around 1,000 to 10,000 rows are added per hour.

An incrementally updated materialized view over this aggregation query would work for us, I guess.
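To make the request concrete, here is a minimal sketch of such a view in plain Rust. It is not Noria's API: `LatestPerGroup`, its methods, and the column types are all hypothetical, and it assumes one value per (group, timestamp) pair. Each group keeps its rows ordered by timestamp, so a row can arrive in any order and a "last value before the cutoff" read is a single ordered lookup instead of a scan over millions of rows.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical incrementally maintained view (not part of Noria).
/// For every group it keeps the observed (timestamp -> value) pairs,
/// ordered by timestamp.
#[derive(Default)]
pub struct LatestPerGroup {
    groups: HashMap<String, BTreeMap<u64, f64>>,
}

impl LatestPerGroup {
    pub fn new() -> Self {
        Self::default()
    }

    /// Apply one new row. Rows may arrive in any timestamp order.
    /// A second row with the same group and timestamp overwrites the first.
    pub fn insert(&mut self, group: &str, timestamp: u64, value: f64) {
        self.groups
            .entry(group.to_string())
            .or_default()
            .insert(timestamp, value);
    }

    /// Last (timestamp, value) in `group` whose timestamp is strictly
    /// below `cutoff`, or `None` if the group has no such row.
    pub fn latest_before(&self, group: &str, cutoff: u64) -> Option<(u64, f64)> {
        self.groups
            .get(group)?
            .range(..cutoff)
            .next_back()
            .map(|(ts, v)| (*ts, *v))
    }
}
```

Because the cutoff is passed in at read time, this sketch sidesteps the question of a view whose filter changes as the clock advances; it only shows the incremental write path.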

MySQL materialized views make this really difficult to achieve. Flexviews could be a solution, but we decided against it.

The incoming data may be out of order, and sometimes we need to take historical data into account, so it's difficult to use time windows for grouping. We also do several joins with other tables. These are some of the reasons why we didn't choose Spark, Kafka Streams, or another streaming framework. Operational complexity and cost are another reason.

Contributor

jonhoo commented Oct 22, 2018

@ranchoiver I think that sounds like an excellent use-case for Noria! The one thing that we don't quite support yet is "rolling" time windows, which it sounds like you need. Specifically, you need a query with a filter that has a time-variant parameter. This would require the materialized view to change even if there are no writes to it, which is not something we currently support. It is definitely on our radar though, because it's also something that many other applications need!
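A small sketch of why a time-variant filter is awkward for a write-driven view. Everything here is hypothetical (`Row`, `FilteredView`, `recompute`) and far simpler than Noria's data-flow; it only illustrates the point that a result maintained purely in response to writes goes stale once the cutoff moves.

```rust
/// A row with an event timestamp (the unit is arbitrary in this sketch).
#[derive(Clone, Debug, PartialEq)]
pub struct Row {
    pub timestamp: u64,
    pub value: i64,
}

/// Hypothetical view materializing `WHERE timestamp < cutoff`. The cutoff is
/// fixed when the view is built, and only writes ever update the result.
pub struct FilteredView {
    cutoff: u64,
    rows: Vec<Row>,
}

impl FilteredView {
    pub fn new(cutoff: u64) -> Self {
        FilteredView { cutoff, rows: Vec::new() }
    }

    /// Write path: rows that fail the filter are dropped and never stored.
    pub fn on_write(&mut self, row: Row) {
        if row.timestamp < self.cutoff {
            self.rows.push(row);
        }
    }

    /// Read path: returns whatever the writes left behind.
    pub fn read(&self) -> &[Row] {
        &self.rows
    }
}

/// Reference result: evaluate the filter from scratch against the base table.
pub fn recompute(base: &[Row], cutoff: u64) -> Vec<Row> {
    base.iter().filter(|r| r.timestamp < cutoff).cloned().collect()
}
```

With base rows at timestamps 10 and 20 and a view built with cutoff 15, the view holds one row. If the cutoff should now be 25, `recompute` returns two rows while `read` still returns one, even though nothing was written in between.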

ranchoiver commented Oct 22, 2018

> This would require the materialized view to change even if there are no writes to it, which is not something we currently support. It is definitely on our radar though, because it's also something that many other applications need!

Yes, exactly.
Gonna follow the news

Contributor

jonhoo commented Nov 23, 2018

As an aside, noria-server probably won't be on crates.io until rust-lang/cargo#1565 is solved (which may be a while).

mjjansen commented Nov 26, 2018

@jonhoo a couple other features I'd be curious about:

  • push notifications (when my view changes)
  • UDF
  • retrieve results in apache arrow (have this be the result of my materialized view request)

Contributor

jonhoo commented Nov 26, 2018

@mjjansen

  • Push notifications (basically, pushing parts of the data-flow to the client) is something that's definitely on our radar, and was actually one of the motivations for using data-flow in the first place. Data-flow is so amenable to distribution that in theory this should just be a matter of moving some of the data-flow nodes to a client machine. In practice it gets a little more tricky though. We don't have an implementation of it currently, and it's not at the top of our roadmap, but it is a feature we'd love to see!
  • The code is very much built around the idea that eventually we'll have UDFs. We have to narrow down a bit more what exact contract operators need to abide by first though. If you look at the Ingredient trait, that is basically what's required to implement your own operator, but there's a lot of subtlety in it at the moment that we'd want to resolve before exposing it to end-users.
  • That's a neat idea! I hadn't seen Apache Arrow before, but seems like a good candidate for a data egress format (cc @ms705)!
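On the UDF point, a rough sketch of what "a contract operators need to abide by" could look like. This is explicitly not Noria's `Ingredient` trait, whose real requirements are more subtle; `Delta`, `Operator`, and `CountByKey` are hypothetical names, and the sketch assumes every row has at least one column. The idea it illustrates is that an operator receives positive and negative row changes and must emit the matching changes to its own output.

```rust
use std::collections::HashMap;

/// A change to one row: positive when it is added, negative when it is removed.
#[derive(Clone, Debug, PartialEq)]
pub enum Delta {
    Positive(Vec<i64>),
    Negative(Vec<i64>),
}

/// Hypothetical, heavily simplified operator contract.
/// This is NOT Noria's `Ingredient` trait.
pub trait Operator {
    /// Consume a batch of input changes and return the output changes.
    fn on_input(&mut self, input: Vec<Delta>) -> Vec<Delta>;
}

/// Example user-defined operator: a per-key row count, keyed on column 0.
/// Output rows have the shape `[key, count]`.
#[derive(Default)]
pub struct CountByKey {
    counts: HashMap<i64, i64>,
}

impl Operator for CountByKey {
    fn on_input(&mut self, input: Vec<Delta>) -> Vec<Delta> {
        let mut out = Vec::new();
        for delta in input {
            let (row, change) = match delta {
                Delta::Positive(row) => (row, 1),
                Delta::Negative(row) => (row, -1),
            };
            // Assumption: rows are non-empty, so column 0 exists.
            let key = row[0];
            let count = self.counts.entry(key).or_insert(0);
            // Retract the previous output row for this key, if there was one.
            if *count != 0 {
                out.push(Delta::Negative(vec![key, *count]));
            }
            *count += change;
            // Emit the new output row unless the group is now empty.
            if *count != 0 {
                out.push(Delta::Positive(vec![key, *count]));
            }
        }
        out
    }
}
```

Even this toy version hints at the subtlety mentioned above: the operator has to retract its old output before emitting the new one, or downstream state ends up with both.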

mjjansen commented Nov 28, 2018

@jonhoo One more question: did you consider https://github.com/andygrove/sqlparser-rs vs. https://github.com/ms705/nom-sql? I wonder if the efforts could be combined.

Contributor

jonhoo commented Nov 28, 2018

That crate didn't exist when we first started building Noria :) Combining efforts is probably not a bad idea though! (cc @ms705)

mjjansen commented Nov 28, 2018

got it. thank you!

3noch commented Jan 4, 2019

👍 x 100 for push notifications (subscribing to queries). This would make noria not just a faster database than alternatives, but perfectly ideal for many applications that currently have to get this behavior manually with lots of error-prone work.
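For illustration, "subscribing to a query" could look roughly like the sketch below from the application's side. None of this exists in Noria; `SubscribableView`, `Change`, and the channel-based delivery are assumptions, and it runs in a single process, whereas the real feature would involve moving parts of the data-flow to the client.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// A change pushed to subscribers when the view's contents change.
#[derive(Clone, Debug, PartialEq)]
pub enum Change {
    Upserted { key: String, value: i64 },
    Removed { key: String },
}

/// Hypothetical view that pushes every change to its subscribers.
#[derive(Default)]
pub struct SubscribableView {
    rows: HashMap<String, i64>,
    subscribers: Vec<Sender<Change>>,
}

impl SubscribableView {
    /// Register a subscriber and hand back the receiving end of its channel.
    pub fn subscribe(&mut self) -> Receiver<Change> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    pub fn upsert(&mut self, key: &str, value: i64) {
        // Skip the notification when the stored value does not change.
        if self.rows.insert(key.to_string(), value) == Some(value) {
            return;
        }
        self.notify(Change::Upserted { key: key.to_string(), value });
    }

    pub fn remove(&mut self, key: &str) {
        if self.rows.remove(key).is_some() {
            self.notify(Change::Removed { key: key.to_string() });
        }
    }

    fn notify(&mut self, change: Change) {
        // Drop subscribers whose receiving end has gone away.
        self.subscribers.retain(|tx| tx.send(change.clone()).is_ok());
    }
}
```

A subscriber calls `subscribe()` once and then reads `Change` values from the returned `Receiver` instead of polling the view.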

3noch commented Jan 4, 2019

Also having a Postgres adapter would be pretty amazing.

Contributor

jonhoo commented Jan 5, 2019

Hehe, yes, a Postgres adapter would be great, it just requires implementing the Postgres binary protocol in Rust similar to msql-srv. That's the bulk of the work. Once that's in place, the Noria SQL shim would just need to be able to run in both modes.
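To give a feel for the work involved, here is a minimal sketch of decoding a single message type from the Postgres frontend/backend protocol: the simple `Query` message, which is a `'Q'` tag byte, a big-endian 32-bit length that counts itself but not the tag, and a NUL-terminated query string. The function and type names are hypothetical, this is not how msql-srv or any existing crate is structured, and the sketch assumes the query text is UTF-8. A real adapter would also need the startup handshake, authentication, the extended query protocol, and result encoding.

```rust
/// A decoded simple-query message.
#[derive(Debug, PartialEq)]
pub struct Query {
    pub sql: String,
}

/// Try to decode one `Query` ('Q') message from the front of `buf`.
///
/// Returns `Ok(None)` if more bytes are needed, `Ok(Some((query, consumed)))`
/// on success, and `Err` if the bytes are not a well-formed Query message.
pub fn parse_query(buf: &[u8]) -> Result<Option<(Query, usize)>, String> {
    // One tag byte plus the 4-byte length field.
    if buf.len() < 5 {
        return Ok(None);
    }
    if buf[0] != b'Q' {
        return Err(format!("unexpected message tag {:#04x}", buf[0]));
    }
    let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    // The length covers itself (4 bytes) and at least the NUL terminator.
    if len < 5 {
        return Err(format!("invalid length {}", len));
    }
    let total = 1 + len;
    if buf.len() < total {
        return Ok(None);
    }
    let body = &buf[5..total];
    let (text, terminator) = body.split_at(body.len() - 1);
    if terminator[0] != 0 {
        return Err("query string is not NUL-terminated".to_string());
    }
    // Assumption: the client encoding is UTF-8.
    let sql = std::str::from_utf8(text)
        .map_err(|e| format!("query is not valid UTF-8: {}", e))?
        .to_string();
    Ok(Some((Query { sql }, total)))
}

/// Encode a query the way a client would; used here to exercise the parser.
pub fn encode_query(sql: &str) -> Vec<u8> {
    let len = (4 + sql.len() + 1) as u32;
    let mut out = vec![b'Q'];
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(sql.as_bytes());
    out.push(0);
    out
}
```

Returning `Ok(None)` on a short buffer lets the caller keep reading from the socket and retry, which is the usual shape for a framed protocol parser.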
