
Add data frame/data table structure and associated module-capabilities (similar to Deedle) to the core data structures. #704

Closed
5 tasks done
siavash-babaei opened this issue Oct 29, 2018 · 4 comments

Comments

@siavash-babaei

Add data frame/data table and associated module-capabilities (similar to Deedle) to the core data structures.

I propose adding data frame/data table structures optimised for big data, along with associated module capabilities (similar to Deedle), to F#'s core data structures. Data frames are the prevalent structure for holding input and output data in data science, and they are natural and convenient to work with.

The existing way of approaching this problem in F# is through Deedle. For example, I experienced a very strange problem with a Deedle data frame and conversion to float in an expression involving Array.Parallel.map. Oddly, the code ran perfectly well as an interactive script but failed to compile, with a cryptic error, when placed in a source file.

Pros and Cons

The advantage of making this adjustment is that it would make F# a more capable and well-rounded tool for data science tasks. I believe F# has excellent potential and a unique natural suitability for this hot and ever-growing field, and this change would be a good push in the right direction.

I see no disadvantages in making this adjustment to F#.

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I or my company would be willing to help implement and/or test this
@cartermp
Member

cartermp commented Nov 8, 2018

@sashababaei Could you provide some hypothetical examples (with code samples) of how this might look? I'm mostly interested in the programming model you have in mind.

@siavash-babaei
Author

siavash-babaei commented Dec 6, 2018

@cartermp : So sorry about such a long delay (illness, unfortunately).

I think the following page provides comprehensive information on the data frame concept in F# via the Deedle package:
https://fslab.org/Deedle/frame.html
Their time series structure is also a very good idea to implement:
https://fslab.org/Deedle/series.html
Frankly, I think the entire Deedle ecosystem should be optimised for big data and incorporated into the core language. The package is not that large. The Deedle team had an idea for BigDeedle, but I think they abandoned it a long time ago:
https://github.com/fslaborg/Deedle.BigDemo

@dsyme
Collaborator

dsyme commented Feb 26, 2019

@sashababaei To save confusion I edited the links in your comment to point to the new home of Deedle, https://fslab.org/Deedle/

@siavash-babaei
Author

It seems Microsoft itself is going to introduce the DataFrame concept, so I guess that is that!

https://devblogs.microsoft.com/dotnet/an-introduction-to-dataframe/
https://github.com/dotnet/corefxlab/tree/master/src/Microsoft.Data.Analysis
