Re-worked cutoff_error parameter for automatic linestrength cutoff #646
base: develop
Conversation
I had accidentally changed the docstring in my previous commit; changed it back to a raw string.
Seems good to me, but I would like @erwanp's feedback. We are also waiting for Travis to be back for the automatic tests.
Rebased on the latest develop version (with fixed tests), and triggering the test suite to test this PR.
Codecov Report

Attention: Patch coverage is

@@            Coverage Diff             @@
##           develop      #646      +/-   ##
===========================================
- Coverage    73.16%    72.95%    -0.22%
===========================================
  Files          137       148       +11
  Lines        18864     21230     +2366
===========================================
+ Hits         13802     15488     +1686
- Misses        5062      5742      +680
I've tried fixing the code so that it currently supports Pandas DataFrames. What would be the best approach for supporting Vaex, as it doesn't seem to have a
If we convert to Numpy we will lose all the performance benefit of Vaex. We use Vaex when arrays cannot be loaded in RAM, so there is no straightforward solution. You could try to benchmark a few things on the side; for instance, create a function in radis.misc.arrays to find a cumsum threshold on a >1e10-element Vaex array. Make sure that the array doesn't fit in memory, so that you are dealing with the worst-case scenario. Then it is probably possible to use Vaex to sort and do a binning, and use Pandas cumsum() on the binned array (which will clearly reduce the memory usage).
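To make the cumsum-threshold idea above concrete, here is a minimal sketch of finding a linestrength cutoff from a cumulative sum. The function name `find_cutoff` and its exact semantics (drop all lines strictly weaker than the returned value, discarding at most a `cutoff_error` fraction of the total linestrength) are my assumptions for illustration, not the actual RADIS implementation:

```python
import numpy as np

def find_cutoff(linestrengths, cutoff_error):
    """Hypothetical helper: largest cutoff such that dropping all lines with
    linestrength strictly below it removes at most a `cutoff_error` fraction
    of the total integrated linestrength."""
    s = np.sort(np.asarray(linestrengths, dtype=float))  # weakest lines first
    cum = np.cumsum(s)                                   # strength discarded so far
    total = cum[-1]
    # number of weakest lines whose combined strength stays within the budget
    n_drop = np.searchsorted(cum, cutoff_error * total, side="right")
    if n_drop >= len(s):
        return np.inf          # degenerate: everything may be discarded
    # drop lines strictly weaker than the first line we must keep
    return s[n_drop]
```

With linestrengths [1, 2, 3, 4, 90] and a 5% budget this returns 3 (dropping the lines of strength 1 and 2 removes 3% of the total). A Vaex version would replace the sort/cumsum with the binned approach described above.
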
I chose 100,000 as the limit on the length of a DataFrame to be processed in memory, which I tested on a dataset with >3e9 rows, but it might be worth testing further.
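One way to sketch the in-memory limit discussed here: compute the cumulative sum in one shot for small arrays, and chunk by chunk for large ones so only a bounded slice is materialized at a time. The constant name `IN_MEMORY_LIMIT` and the function below are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

# Hypothetical constant mirroring the 100,000-row limit discussed above.
IN_MEMORY_LIMIT = 100_000

def cumsum_threshold_index(values, threshold, chunk_size=IN_MEMORY_LIMIT):
    """Number of leading elements whose running sum stays <= threshold.

    Short arrays: one vectorized cumsum.  Long arrays: accumulate chunk by
    chunk, so at most `chunk_size` elements are held at once."""
    values = np.asarray(values, dtype=float)
    if len(values) <= chunk_size:
        return int(np.searchsorted(np.cumsum(values), threshold, side="right"))
    running = 0.0   # total of all chunks processed so far
    count = 0       # elements counted under the threshold so far
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        cum = running + np.cumsum(chunk)
        idx = int(np.searchsorted(cum, threshold, side="right"))
        count += idx
        if idx < len(chunk):          # threshold crossed inside this chunk
            return count
        running = cum[-1]
    return count                      # threshold never crossed
```

Both paths give the same answer; the chunked path trades a small Python-loop overhead for bounded memory, which is the trade-off being benchmarked here.
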
Seems good. Could you provide an example that raises a warning? |
For an example that both adjusts the cutoff and raises the warning:
or set |
Description
Addresses issue #268, building on PR #451.
I adjusted some of the docstrings and printed messages to be a bit more informative by clarifying what the function does with the input arguments, i.e. that the user-input cutoff is being adjusted to conform to the user-input cutoff_error. I chose to stick to pandas methods for simplicity rather than using numpy. But is compatibility with vaex dataframes also sought in this issue?
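The cutoff-adjustment behaviour described above can be sketched with pandas only. The function name `apply_cutoff`, the column name "S", and the adjustment rule (lower the cutoff until the discarded linestrength fits within the error budget) are assumptions for illustration, not the PR's exact code:

```python
import pandas as pd

def apply_cutoff(df, cutoff, cutoff_error):
    """Hypothetical sketch: drop lines with linestrength S below `cutoff`,
    lowering the cutoff first if it would discard more than a `cutoff_error`
    fraction of the total linestrength."""
    total = df["S"].sum()
    budget = cutoff_error * total
    s_sorted = df["S"].sort_values()   # weakest lines first (pandas methods only)
    cum = s_sorted.cumsum()            # strength discarded so far
    dropped = cum[s_sorted < cutoff]
    if len(dropped) and dropped.iloc[-1] > budget:
        # largest cutoff whose discarded strength stays within the budget
        ok = cum[cum <= budget]
        if len(ok) < len(s_sorted):
            cutoff = s_sorted.iloc[len(ok)]
        print(f"Input cutoff adjusted to {cutoff:.3g} to keep the "
              f"linestrength error within {cutoff_error:.1%}")
    return df[df["S"] >= cutoff]
```

For example, with linestrengths [1, 2, 3, 4, 90], a requested cutoff of 5, and a 5% error budget, the requested cutoff would discard 10% of the total, so it is lowered to 3 and only the two weakest lines are dropped.
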