Looking forward to full description of fst format #3
Comments
Hi @xiaodaigh, thanks! Yes, I definitely need to spend time on documenting the format and, perhaps more importantly, the It's not complicated, but the API will grow as computational features are added (which will run in parallel with the file IO). Providing for methods that can only be run on the master thread (such as Just a question, why would you prefer a native implementation in Is there an example package in Julia which could be used to model a native binding, a package using a simple
Anyway, the first thing I would do is use Cxx.jl to call into fstlib. But I might experiment with a pure Julia implementation at some point, given that the fst format is stable. Julia offers some low-level control as well, but its multi-threading is not good at the moment. I think fstlib is good for scripting languages like Julia, R, and Python, so it would be nice to actually write it in a scripting language as well. Given that the format is stable, a pure Julia implementation would allow Julia programmers to contribute, not just those with C++ knowledge. But overall it's better to have all resources contribute to one library, in this case the C++ one, fstlib; I wish I knew enough C++ to contribute. Learning... Once the multi-threading story is better in Julia and there is better interop between R and Julia and between Python and Julia, you may be tempted to switch to Julia as well, as the syntax is nice and simple and it can be as fast as C/C++ in many cases.
Hi @xiaodaigh, thanks, I think using I would be very interested in trying to set up a
In general, is there a chance that fstlib might expose a pure C API rather than a C++ one? That would make integration in other languages a lot easier. E.g. for Julia, Cxx.jl is great, but at this point installation is so tricky that it is really not an option for a widely used package. On the other hand, if fstlib just exposed a C API, one could integrate it super easily into Julia.
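To make the request concrete, here is a minimal sketch of the usual way a C++ library exposes a C API: an opaque handle plus `extern "C"` free functions that catch exceptions at the boundary. Everything in it (`Table`, `demo_table`, and the `demo_table_*` functions) is hypothetical and for illustration only; none of it is fstlib's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical C++ class standing in for a library type (NOT part of fstlib).
class Table {
public:
    explicit Table(std::string name) : name_(std::move(name)) {}
    void add_column(std::vector<int32_t> values) { columns_.push_back(std::move(values)); }
    int64_t column_count() const { return static_cast<int64_t>(columns_.size()); }

private:
    std::string name_;
    std::vector<std::vector<int32_t>> columns_;
};

extern "C" {

// Opaque handle: callers only ever see a pointer to an incomplete type.
typedef struct demo_table demo_table;

}  // extern "C"

// Internal helper: convert a handle that was already null-checked.
static Table* unwrap(demo_table* t) {
    assert(t != nullptr);
    return reinterpret_cast<Table*>(t);
}

extern "C" {

// Returns nullptr on failure; no exception may cross the C boundary.
demo_table* demo_table_create(const char* name) {
    if (name == nullptr) return nullptr;
    try {
        return reinterpret_cast<demo_table*>(new Table(name));
    } catch (...) {
        return nullptr;
    }
}

// Copies n values into a new column. Returns 0 on success, -1 on error.
int demo_table_add_column(demo_table* t, const int32_t* data, int64_t n) {
    if (t == nullptr || n < 0 || (data == nullptr && n > 0)) return -1;
    try {
        unwrap(t)->add_column(std::vector<int32_t>(data, data + n));
        return 0;
    } catch (...) {
        return -1;
    }
}

// Returns the number of columns, or -1 for a null handle.
int64_t demo_table_column_count(const demo_table* t) {
    return t ? reinterpret_cast<const Table*>(t)->column_count() : -1;
}

// Releases the table; passing nullptr is allowed.
void demo_table_destroy(demo_table* t) {
    delete reinterpret_cast<Table*>(t);
}

}  // extern "C"
```

Because the signatures contain only C types, functions like these can be called from Julia with `ccall` once they are compiled into a shared library, without Cxx.jl.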
Hi @davidanthoff, thanks for your question. Basically, the For a full implementation of
These are all abstract classes which would need an implementation based on the The reason for that is that Perhaps when you have a basic setup, I could assist you in implementing the abstract classes for
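As a generic illustration of the pattern described here (the library codes against abstract classes, and each language binding supplies its own implementation), a minimal sketch might look like the following. The names `ITableView`, `VectorTable`, and `total_cells` are hypothetical and are not fstlib's actual interfaces; a real binding would back the implementation with the host language's own column storage rather than `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract view of a table (NOT an fstlib interface):
// the library only ever talks to this interface.
class ITableView {
public:
    virtual ~ITableView() = default;
    virtual int64_t row_count() const = 0;
    virtual int column_count() const = 0;
    virtual std::string column_name(int col) const = 0;
    virtual const int32_t* int_column(int col) const = 0;
};

// Binding-side implementation, here backed by std::vector for illustration.
class VectorTable : public ITableView {
public:
    void add(std::string name, std::vector<int32_t> values) {
        // All columns must have the same number of rows.
        assert(cols_.empty() || values.size() == cols_.front().size());
        names_.push_back(std::move(name));
        cols_.push_back(std::move(values));
    }

    int64_t row_count() const override {
        return cols_.empty() ? 0 : static_cast<int64_t>(cols_.front().size());
    }
    int column_count() const override { return static_cast<int>(cols_.size()); }
    std::string column_name(int col) const override {
        return names_.at(static_cast<std::size_t>(col));
    }
    const int32_t* int_column(int col) const override {
        return cols_.at(static_cast<std::size_t>(col)).data();
    }

private:
    std::vector<std::string> names_;
    std::vector<std::vector<int32_t>> cols_;
};

// Library-side code written purely against the interface.
int64_t total_cells(const ITableView& table) {
    return table.row_count() * table.column_count();
}
```

The benefit of this split is that the library never needs to know how the host language stores its columns; it only needs the virtual methods to be implemented.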
For a little bit of context: we have shown via benchmarking that fst has the fastest read/write speed in the Julia/R/Python-verse. Parquet and R's serialization are the only other major formats we haven't tested. So I would be extremely keen to be able to use fst in Julia.
To be fair, you didn’t measure Feather performance with the R or Python packages; those might be faster than the Julia implementation (or not, who knows).
Hi @xiaodaigh and @davidanthoff, that's great to hear. It would be nice to compare the various serialization options with a wide range of parameters. For example, for
Testing many systems is very labor intensive, but it would be very interesting to set up a benchmark that uses generated samples with various characteristics:
That way we could really learn about the strong and weak points of different serializers and how they relate to each other. Are your benchmarks published somewhere (or do you have plans for that)? Thanks!
Obviously that is going to be a lot of work. I think ultimately we can set up a website where people can submit benchmarks from their own systems by running some Julia and/or R code. For now I am slowly adding benchmarking code to the DataBench.jl repo.
Hi @davidanthoff, on your question about a
After milestone 2, we know that we can call the
Would that be doable? If any special code is necessary to accommodate the
Milestone 1 can be easily achieved; see https://github.com/JuliaInterop/CxxWrap.jl. I don't know anything about C++, and that's the issue. I want to help here, but I traced the code to What would help is someone familiar with C++ to do this, but if it's me, I need some specific directions on how to compile fstlib into a

```cpp
#include <string>
#include "jlcxx/jlcxx.hpp"

// Placeholder function to expose to Julia (not part of fstlib).
std::string greet() { return "hello, world"; }

JLCXX_MODULE define_julia_module(jlcxx::Module& mod)
{
  mod.method("greet", &greet);
}
```
I know it's going to be a bit of work, but a full description of the fst format will help build connectors to it from Julia, Python, and any other programming language. The potential is huge for such an awesome on-disk data manipulation framework!
I will try to help when I know enough C++. I secretly hope that once the format is well known, there can be independent implementations in Julia and Rust (at the risk of running out of sync with the C++ one); native implementations would be fun. But calling into C++ is also a good option.