Data science has been the killer application for languages such as Python and R for some time now. A comprehensive data analysis library that is idiomatic and uses Raku's unique features would contribute substantially to the language's popularity and usability.
Currently there are some partial statistics libraries, as well as built-in facilities like map/reduce, including parallel versions. The goal is to take this as close to the functionality of Pandas as possible.
- Use of specific data structures such as data frames for processing.
- Reading from a wide spectrum of formats, from CSV and JSON to highly specific statistics file formats.
- Handling missing data automatically in data sets.
- Powerful matrix and data frame processing and transformation, merging and joining.
- Processing of time series.
- Version 0.1 with limited functionality released to the ecosystem.
- Competitive speed compared with Pandas or other implementations.
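To make the feature list above more concrete, here is a minimal sketch of three of the listed operations: reading CSV, filling missing values, and joining two data sets. It is written in plain Python (standard library only) purely for illustration; it is not Raku and not the Pandas API. The function names `read_rows`, `fill_missing`, and `inner_join`, as well as the sample data, are hypothetical, and filling gaps with the column mean is just one assumed strategy among many.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; an empty field stands for a missing value.
CSV_TEXT = """city,temp
Granada,30
Madrid,
Bilbao,20
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, mapping empty fields to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()}
            for row in reader]


def fill_missing(rows, column):
    """Convert a column to float, replacing missing values with the column mean."""
    present = [float(r[column]) for r in rows if r[column] is not None]
    fallback = mean(present)
    return [{**r, column: float(r[column]) if r[column] is not None else fallback}
            for r in rows]


def inner_join(left, right, key):
    """Join two row lists on a shared key, keeping only keys found in both.

    Assumes the key is unique in `right`.
    """
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


rows = fill_missing(read_rows(CSV_TEXT), "temp")
regions = [
    {"city": "Granada", "region": "Andalusia"},
    {"city": "Bilbao", "region": "Basque Country"},
]
joined = inner_join(rows, regions, "city")
```

A real data frame library would store columns rather than rows, track types per column, and offer several join and fill strategies; the sketch only shows the kind of behavior the project is expected to cover.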
Required or preferred skills the student should have to be able to tackle this project.
- Some experience with Raku is appreciated, but not strictly needed. Willingness to learn is required.
- Experience with C, in order to use NativeCall to interface with C libraries.
Medium.
- JJ Merelo (jjmerelo@gmail.com, GitHub), jmerelo on Freenode.