NOTE: Due to issues with parallelism in the Python API, this script eats a lot of RAM and is not production-ready (hopefully it will be soon).
Rank-1 Dictionary Learning in PyFlink, featured in "Implementing dictionary learning in Apache Flink, Or: How I learned to relax and love iterations".
If you like Java or you want something that is probably more stable and doesn't use the incomplete Flink Python API, an implementation is available at quinngroup/flink-r1dl.
Use with the new-iterations-with-multiops branch of GEOFBOT/flink (some binaries available here), which has the bulk iterations implementation and tweaks needed to run this script.
Run .../pyflink2.sh R1DL_Flink.py without extra arguments for usage information.
Input files are made up of rows of numbers, with the columns in each row separated by whitespace.
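As a rough illustration of that input format, here is a minimal sketch in plain Python that writes and reads such a file. The helper names `write_matrix` and `read_matrix` are hypothetical and are not part of R1DL_Flink.py. The sketch assumes one row per line and skips blank lines, and it is not how the script itself loads data.

```python
import os
import tempfile


def write_matrix(path, rows):
    """Write rows of numbers, one row per line, columns separated by a space.

    Hypothetical helper for illustration only; not part of R1DL_Flink.py.
    """
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(repr(float(x)) for x in row) + "\n")


def read_matrix(path):
    """Parse a whitespace-separated numeric file into a list of float rows.

    Hypothetical helper for illustration only; not part of R1DL_Flink.py.
    """
    matrix = []
    with open(path) as f:
        for line in f:
            fields = line.split()  # splits on any run of spaces or tabs
            if not fields:
                continue  # assumption: blank lines carry no data
            matrix.append([float(x) for x in fields])
    return matrix


if __name__ == "__main__":
    # Round-trip a small 2x3 matrix through a temporary file.
    path = os.path.join(tempfile.mkdtemp(), "example_input.txt")
    write_matrix(path, [[1, 0.5, 2], [3, 4, 5.25]])
    print(read_matrix(path))
```

Because `str.split()` with no arguments treats any run of spaces or tabs as one separator, the reader accepts files with mixed or irregular column spacing.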