Extending F# Slicing Syntax for NDArrays (like Numpy) #701
This proposal is based on the assumption that slicing of multidimensional arrays (NDArrays) is an important part of being productive with numerics and that Python's Numpy is a good standard to draw from. The need for a more extensive slicing syntax becomes more acute when working with NDArrays. This proposal is for additional notation that will be useful in the creation of a future NDArray library.
Example 1: Remove border, rotate image, sub-sample, select specific color channels, and add a 'batch' axis.
In Python with Numpy:
rgb_small = bgra[np.newaxis,10:-10:-2,10:-10:-2,[2,1,0]]
In F# today, assuming a future NDArray library:
rgb_small = bgra[10..-10, 10..-10,*] |> NDArray.rev(axis=0) |> NDArray.rev(axis=1) |> NDArray.subsample([2,2,1]) |> NDArray.gather(axis=2,indexes=[2,1,0]) |> NDArray.newAxis(0)
NOTE: This assumes the NDArray library already supports negative indexing as per issue #358
Problem: Step Sizes in indices (see also #518)
rgb_small = bgra[10..-2..-10, 10..-2..-10,*] |> NDArray.gather(axis=2,indexes=[2,1,0]) |> NDArray.newAxis(0)
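For reference, Python itself resolves a `start:stop:step` triple, including negative indexes, through the built-in `slice.indices` method. The sketch below uses a plain list instead of Numpy so that it has no dependencies; the helper name `resolve` is made up for illustration and is not part of any library. Note that Python's upper bound is exclusive and that a negative step walks from `start` down toward `stop`.

```python
def resolve(start, stop, step, length):
    """Expand a Python-style start:stop:step slice into concrete indexes."""
    # slice.indices clamps the bounds and maps a negative index to length + index.
    return list(range(*slice(start, stop, step).indices(length)))


row = list(range(30))

# Drop a 10-element border on each side and keep every second element.
print(resolve(10, -10, 2, len(row)))   # [10, 12, 14, 16, 18]

# With a negative step the walk runs from start down to stop,
# so reversing the same region means swapping the bounds.
print(resolve(-11, 9, -2, len(row)))   # [19, 17, 15, 13, 11]
```

Whether an F# `start..step..stop` slice should follow these rules or keep F#'s inclusive upper bound is a design choice the proposal would need to settle.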
Problem: Getting slices along an axis by a list of indexes
rgb_small = bgra[10..-2..-10, 10..-2..-10,[2,1,0]] |> NDArray.newAxis(0)
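The meaning of an index list along one axis can be sketched without any array library. The following Python illustration works on nested lists; `gather_last_axis` is a hypothetical helper written for this example, roughly corresponding to Numpy's `bgra[..., [2, 1, 0]]`.

```python
def gather_last_axis(pixels, indexes):
    """Select channels by index for every pixel of a rows x cols x channels image."""
    return [[[pixel[i] for i in indexes] for pixel in row] for row in pixels]


# A 1 x 2 image with channels in B, G, R, A order.
bgra = [[[0, 1, 2, 255], [10, 11, 12, 255]]]

# Reorder to R, G, B and drop the alpha channel.
rgb = gather_last_axis(bgra, [2, 1, 0])
print(rgb)  # [[[2, 1, 0], [12, 11, 10]]]
```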
Problem: Using slicing syntax to insert a new axis
rgb_small = bgra[NewAxis,10..-2..-10, 10..-2..-10,[2,1,0]]
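In Numpy, `np.newaxis` inserts an axis of length 1 at the position where it appears. A minimal Python sketch of that effect on nested lists, using two made-up helpers (`shape` and `new_axis`) that only handle the leading-axis case:

```python
def shape(nested):
    """Shape of a rectangular, non-empty nested list."""
    dims = []
    while isinstance(nested, list) and nested:
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def new_axis(nested):
    """Add a leading axis of length 1, e.g. a 'batch' axis."""
    return [nested]


# 2 rows x 4 columns x 3 channels.
image = [[[0, 0, 0] for _ in range(4)] for _ in range(2)]
print(shape(image))            # (2, 4, 3)
print(shape(new_axis(image)))  # (1, 2, 4, 3)
```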
There is a case to be made for supporting Advanced Indexing, which would need to take an NDArray of integers or an NDArray of booleans in place of Option. Given that F# has better support for higher-level functions than Python, the need for Advanced Indexing may not be as strong. Feedback from an informal poll of numerous F# users was that this is not important to them. I believe it's something to consider alongside the previous suggestion of List.
There is also a case to be made for supporting the Numpy syntax ':' and '::' instead of the F# '*', '..' and '.. ..'. Again, feedback from an informal poll of F# users found that this was not important to them either. I think the Numpy syntax would be friendlier to non-F# users. I believe this is something to consider.
Pros and Cons
The Pros: Slicing is a common operation in Data Science, and with the growth of the Data Science field it's reasonable to expect it will become increasingly common. Numpy is a de facto standard, and I believe having F# slicing reflect Numpy's capability and syntax will go a long way toward making F# more familiar to Numpy users.
The Cons: It increases the complexity of the language. Hopefully the added complexity would be restricted to a narrow domain.
Estimated cost (XS, S, M, L, XL, XXL): L
Related suggestions: (put links to related suggestions here)
Something to mention is that C# will eventually implement slices and ranges that work with the CoreFX types System.Index and System.Range. These will not have the same upper bound rules as F# does, so there will be work we'll do (regardless of this issue) to ensure we can interoperate.
I think it's worth keeping in mind that we may need to offset things by one (depending on direction) for interop purposes. Since ML.NET is only going to increase in importance for .NET, it will be critical to get this sort of interop correct.
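To make the off-by-one concrete: an F# slice such as `arr.[2..5]` includes the upper bound, while a C# range `2..5` (like a Python slice) excludes it. A minimal Python sketch of the conversion, assuming non-negative indexes and a forward direction; the helper name is made up for illustration.

```python
def inclusive_to_exclusive(start, last):
    """Convert an inclusive range start..last to a half-open (start, stop) pair."""
    return start, last + 1


data = list(range(10))

# F#-style 2..5 (inclusive) expressed as a half-open slice.
start, stop = inclusive_to_exclusive(2, 5)
print(data[start:stop])  # [2, 3, 4, 5]
```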
@cartermp Thanks for letting me know. Looking at the CoreFX project issues it seems as if they are already aware of the importance of Slicing and Data Frames. For me Data Frames are not as important, but I understand that they are important to others.
So it's great, and I'm happy that MS is looking hard at this. My concern is that if F# needs to wait for C# and ML.NET to solidify these features, we may be waiting a very long time. But I'll gladly take what I can get.
Yep, this is more informational than anything else. We don't quite know what the shape of things will be yet, but we do know about these constraints and their eventuality, so we'll just have to make sure that what is designed doesn't become too burdensome later.
I would love this feature and I would love a full-fledged DataFrame library. The lack of a DataFrame with the associated slicing features is what keeps me using R for some of my analysis work. The
Since this behavior is going to be different than the current slicing index of
I encourage people to look at the design of