Clarify and enhance descriptive statistics (and more) #29663

Open
kcrisman opened this issue May 8, 2020 · 6 comments

@kcrisman
Member

kcrisman commented May 8, 2020

We have some functionality for descriptive statistics in sage.stats. Unfortunately, it is really basic.

This ticket is for clarifying the relationship of that material to the Sage probability distributions, histogram, SciPy, GSL, and other libraries, perhaps including pandas, though this is not (yet) standard in Sage.

  • Ideally there would be interfaces to the best native Python functionality rather than something specific to Sage (though that may not be possible).
  • There may be a tutorial page in the (reference manual) documentation for demonstrating best practices.
  • There could be a more education-oriented tutorial elsewhere, along the lines of the PREP Quickstart but more comprehensive.
  • As noted at Deprecate sage.stats.basic_stats #29662, Python 3 has a statistics module, though presumably that module can't handle (say) the mean of several Integers or even stranger objects as-is.
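
To make the last point concrete, here is a minimal sketch of one way to feed integer-like values to Python's statistics module: convert them to Fraction first. The helper name mean_via_builtin is made up for illustration, and the sketch assumes every value supports int(), which I believe Sage's Integer does. It has only been checked with plain Python ints, not inside Sage.

```python
from fractions import Fraction
import statistics


def mean_via_builtin(values):
    """Exact mean of integer-like values using the standard library.

    Assumption: every value supports int(); anything else raises here.
    """
    # Convert to Fraction so the mean stays exact instead of becoming a float.
    exact = [Fraction(int(v)) for v in values]
    return statistics.mean(exact)


print(mean_via_builtin([1, 2, 4]))
```

The result is a Fraction, so it would still need converting back if a Sage Rational is wanted. Whether this kind of conversion is acceptable for the "stranger objects" is exactly the open question.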

If all of those generate interest, this ticket would be converted to a metaticket to keep track of them.

Depends on #29662

CC: @NathanDunfield

Component: statistics

Issue created by migration from https://trac.sagemath.org/ticket/29663

@dimpase
Member

dimpase commented May 8, 2020

Dependencies: #29662

@NathanDunfield
Contributor

comment:3

I use pandas pretty heavily from within Sage (Python 2.7 version). The only problem I encounter has to do with pandas not recognizing Sage's Integer as an integer. Assuming one has the standard preparser on, you have to do things like:

dataframe.loc[int(100)]
dataframe.apply(some_function, axis=int(1))

to keep it happy.
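
A small helper can take some of the tedium out of those casts. The sketch below is only an illustration: as_python_int and FakeInteger are hypothetical names, FakeInteger stands in for an integer-like type such as Sage's Integer, and the sketch assumes such a type implements __index__.

```python
import operator


class FakeInteger:
    """Hypothetical stand-in for an integer-like type such as Sage's Integer."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Lets operator.index() convert this object to a built-in int.
        return self._value


def as_python_int(key):
    """Return a built-in int for integer-like keys; leave other keys unchanged."""
    try:
        return operator.index(key)
    except TypeError:
        # Not integer-like (e.g. a string label or a float), so pass it through.
        return key


print(type(as_python_int(FakeInteger(100))))
print(as_python_int("label"))
```

With that in place, the calls above would become dataframe.loc[as_python_int(100)] and so on. Another option inside Sage is the preparser's raw-literal suffix (100r), which gives a Python int directly.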

@kcrisman
Member Author

kcrisman commented May 8, 2020

comment:4

I use pandas pretty heavily from within Sage (Python 2.7 version).

Hmm, yeah that is exactly the kind of problem I expected (brian had some similar issues iirc). I assume you pip install it, not included in our Python from the get-go, right?

@NathanDunfield
Contributor

comment:5

Replying to @kcrisman:

I assume you pip install it, not included in our Python from the get-go, right?

Yes, I just use pip install which has always worked smoothly (though it takes a bit of time to compile). The main dependency is just a reasonably recent version of numpy which of course Sage has.

@sheerluck
Contributor

comment:6

Replying to @NathanDunfield:

pandas not recognizing Sage's Integer as an integer.

I added

from sage.rings.integer import Integer
if type(key) is Integer:
    ...

to pandas/core/indexes/{base,range}.py
