Extending F# Slicing Syntax for NDArrays (like Numpy) #701

Open
moloneymb opened this Issue Oct 3, 2018 · 4 comments

moloneymb commented Oct 3, 2018

Extending F# Slicing Syntax for NDArrays (like Numpy)

This proposal assumes that slicing of multidimensional arrays (NDArrays) is an important part of being productive with numerics, and that Python's Numpy is a good standard to draw from. The need for a more extensive slicing syntax becomes more acute when working with NDArrays. This proposal is for additional notation that will be useful in the creation of a future NDArray library.

Example 1: Remove border, rotate image, sub-sample, select specific color channels, and add a 'batch' axis.

Numpy

rgb_small = bgra[np.newaxis,10:-10:-2,10:-10:-2,[2,1,0]]

FSharp

rgb_small =
    bgra[10..-10, 10..-10,*]
    |> NDArray.rev(axis=0)
    |> NDArray.rev(axis=1)
    |> NDArray.subsample([2,2,1])
    |> NDArray.gather(axis=2,indexes=[2,1,0])
    |> NDArray.newAxis(0)

NOTE: This assumes the NDArray library already supports negative indexing as per issue #358
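
A note on reading the Numpy line in Example 1 literally: in Python's slice semantics a negative step walks from `start` down towards `stop`, so `10:-10:-2` selects nothing on any axis longer than 20. The sketch below uses a plain list as a stand-in for one image axis (an assumption, to keep it dependency-free) and shows a single slice that appears to give the intended "remove border, reverse, sub-sample" result.

```python
# One axis of length 30 stands in for an image dimension.
axis = list(range(30))

# Written as in Example 1: start=10, stop=-10 (i.e. index 20), step=-2.
# Walking downwards from 10 can never reach 20, so nothing is selected.
literal = axis[10:-10:-2]

# The same three steps spelled out one at a time:
# remove the border, reverse, then take every second element.
stepwise = axis[10:-10][::-1][::2]

# A single slice with the same effect: start at the last interior
# element (-11) and walk down to, but not including, index 9.
combined = axis[-11:9:-2]

print(literal)   # []
print(stepwise)  # [19, 17, 15, 13, 11]
print(combined)  # [19, 17, 15, 13, 11]
```

Whether an F# `start..step..finish` slice should follow this rule or treat both bounds as inclusive is one of the design questions the proposal leaves open.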

Problem: Step Sizes in indices (see also #518)
Proposed Solution: Seconding Prash's proposal of using the seq generator syntax of start .. step .. finish.

rgb_small =
    bgra[10..-2..-10, 10..-2..-10,*]
    |> NDArray.gather(axis=2,indexes=[2,1,0])
    |> NDArray.newAxis(0)

Problem: Getting slices along an axis by a list of indexes
Proposed Solution: Possibly add List overloads alongside Option. In Python the use of a list of indexes is a subset of Advanced Indexing.

    rgb_small =
        bgra[10..-2..-10, 10..-2..-10,[2,1,0]]
        |> NDArray.newAxis(0)
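
As a rough illustration of what indexing an axis with a list means, here is a minimal Python sketch of a gather along the last axis. Nested lists stand in for a rows x columns x channels NDArray, and `gather_last_axis` is a hypothetical helper, not part of any existing library.

```python
def gather_last_axis(image, indexes):
    """Select entries along the last axis, in the order given by `indexes`.

    `image` is a nested list shaped rows x columns x channels.
    """
    return [[[pixel[i] for i in indexes] for pixel in row] for row in image]


# A 1 x 2 image, each pixel stored as (B, G, R, A).
bgra = [[[10, 20, 30, 255], [11, 21, 31, 255]]]

# [2, 1, 0] picks R, G, B and drops the alpha channel.
rgb = gather_last_axis(bgra, [2, 1, 0])
print(rgb)  # [[[30, 20, 10], [31, 21, 11]]]
```

The list both reorders and selects, which is why a plain range cannot express it.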

Problem: Using slicing syntax to insert a new axis
Proposed Solution: Perhaps a single case class called NewAxis

    rgb_small = bgra[NewAxis,10..-2..-10, 10..-2..-10,[2,1,0]]
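
To make the combined effect of ranges, index lists and a new-axis marker concrete, the following Python sketch computes the shape an indexing expression would produce. It is only an illustration under assumptions: `NewAxis`, `NEW_AXIS` and `result_shape` are hypothetical names, and Python's half-open slice rules are used for the ranges. (In Numpy itself, `np.newaxis` is simply an alias for `None`.)

```python
class NewAxis:
    """Hypothetical marker, analogous to the proposed single-case NewAxis."""


NEW_AXIS = NewAxis()


def result_shape(shape, index):
    """Shape obtained by applying `index` to an array of the given `shape`.

    Each item of `index` is a slice, a list of indexes, an int,
    or the NEW_AXIS marker.
    """
    out = []
    axis = 0
    for item in index:
        if isinstance(item, NewAxis):
            out.append(1)  # inserts a length-1 axis, consumes no input axis
            continue
        n = shape[axis]
        axis += 1
        if isinstance(item, slice):
            # slice.indices resolves negative and missing bounds against n
            out.append(len(range(*item.indices(n))))
        elif isinstance(item, list):
            out.append(len(item))  # one entry per selected index
        elif isinstance(item, int):
            pass  # a single index drops the axis
        else:
            raise TypeError(f"unsupported index item: {item!r}")
    out.extend(shape[axis:])  # axes not mentioned are kept whole
    return tuple(out)


# A 100 x 80 BGRA image: batch axis added, 10-pixel border removed,
# both spatial axes reversed and sub-sampled by 2, channels reordered.
print(result_shape((100, 80, 4),
                   (NEW_AXIS, slice(-11, 9, -2), slice(-11, 9, -2), [2, 1, 0])))
# (1, 40, 30, 3)
```

Dispatching on the type of each index item is the part that would correspond to the Option, List and NewAxis overloads discussed above.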

There is a case to be made for supporting Advanced Indexing, which would need to take an NDArray in place of Option. Given that F# has better support for higher-level functions than Python, the need for Advanced Indexing may not be as strong. Feedback from an informal poll of numerous F# users was that this is not important to them. I believe it's something to consider alongside the previous suggestion of List.

There is also a case to be made for supporting the Numpy syntax ':' and '::' instead of the FSharp '*', '..' and '.. ..'. Again, feedback from an informal poll of F# users was that this is not important to them either. I think the Numpy syntax would be friendlier to non-F# users, and I believe it is something to consider.

Pros and Cons

The Pros: Slicing is a common operation in Data Science, and with the growth of the field it's reasonable to expect it to become increasingly common. Numpy is a de facto standard, and I believe having F# slicing reflect Numpy's capability and syntax will go a long way towards making F# more familiar to Numpy users.

The Cons: It is an increase in complexity to the language. Hopefully this should be restricted to a narrow domain.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): L

Related suggestions: More "python-like" functionality for List/Array slice syntax

Affidavit (please submit!)

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • This is not a breaking change to the F# language design*

  • I or my company would be willing to help implement and/or test this

  • Depending on implementation. I presume any implementation would be additive.

Member

cartermp commented Oct 4, 2018

Something to mention is that C# will eventually be implementing slices and ranges that work with the CoreFX types System.Index and System.Range. These will not have the same upper bound rules as F# does, and so there will be work we'll do (regardless of this issue) to ensure we can interoperate.

I think it's worth keeping in mind that we may need to offset things by one (depending on direction) for interop purposes. Since ML.NET is only going to increase in importance for .NET, it will be critical to get this sort of interop correct.
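
The offset-by-one concern can be sketched as follows. F# slices include their upper bound, whereas a half-open convention excludes it; Python's `slice` is used here as a stand-in for the half-open side, which is an assumption, as are the helper name `inclusive_to_slice` and the restriction to non-negative indexes.

```python
def inclusive_to_slice(start, finish, step=1):
    """Convert an inclusive start..step..finish range to a half-open slice.

    Sketch only: handles non-negative indexes, in either direction.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    if start < 0 or finish < 0:
        raise ValueError("this sketch handles non-negative indexes only")
    # Move the bound one past `finish` in the direction of travel.
    stop = finish + 1 if step > 0 else finish - 1
    if stop < 0:
        # Walking down to index 0 inclusive: -1 would mean "last element"
        # in Python, so the bound has to be left open instead.
        stop = None
    return slice(start, stop, step)


xs = list(range(10))
print(xs[inclusive_to_slice(2, 5)])      # [2, 3, 4, 5]
print(xs[inclusive_to_slice(5, 2, -1)])  # [5, 4, 3, 2]
print(xs[inclusive_to_slice(4, 0, -2)])  # [4, 2, 0]
```

The special case for a descending range that ends at index 0 shows why the adjustment depends on direction rather than being a uniform "+1".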

moloneymb commented Oct 4, 2018

@cartermp Thanks for letting me know. Looking at the CoreFX project issues it seems as if they are already aware of the importance of Slicing and Data Frames. For me Data Frames are not as important, but I understand that they are important to others.

So it's great, and I'm happy that MS is looking hard at this. My concern is that if F# needs to wait for C# and ML.NET to solidify these features, we may be waiting a very long time. But I'll gladly take what I can get.

Member

cartermp commented Oct 4, 2018

Yep, this is more informational than anything else. We don't quite know what the shape of things will be yet, but we do know about these constraints and their eventuality, so we'll just have to make sure that what is designed doesn't become too burdensome later.

matthewcrews commented Oct 5, 2018

I would love this feature and I would love a full-fledged DataFrame library. The lack of a DataFrame with the associated slicing features is what keeps me using R for some of my analysis work. The data.table library for R by Matt Dowle has no real competitor. Pandas and dplyr may have wider usage, but if you are dealing with huge data sets and performance is critical, you are not going to beat data.table.

Since this behavior is going to be different from the current slicing index of .. and .. .., why not use : and :: per the proposal? This will alert people to the fact that it does not behave the same way and will free us from backwards compatibility concerns. The terseness this provides for expressing data analysis is incredibly valuable and saves tremendous amounts of time.

I encourage people to look at the design of data.table and Pandas. I believe adopting industry-standard indexing syntax could ease people's transition from other languages. These are just my thoughts, though, as a long-time R user and now an F# developer. I am sure there are many details I am glossing over.
