Closed
Description
Describe the bug
I am using sklearn.linear_model.QuantileRegressor for a dataset with ~2.9 million datapoints.
When I use it as follows, I get a memory error.
MemoryError: Unable to allocate 61.6 TiB for an array with shape (2909376, 2909376) and data type float64
When I fit the same model with the statsmodels library, I don't encounter any issues.
Is this a known limitation of the sklearn implementation? What is the largest dataset it can handle? Or is it a bug?
I chose sklearn over statsmodels because the sklearn API supports regularization, which statsmodels does not provide for this model.
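For reference, the 61.6 TiB figure in the error message can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch of the arithmetic; the variable names are illustrative and the sample count is taken from the array shape in the error.

```python
import numpy as np

# Number of samples, taken from the shape reported in the MemoryError.
n_samples = 2_909_376

# A dense float64 identity matrix stores n_samples * n_samples entries,
# each taking 8 bytes.
bytes_per_entry = np.dtype(np.float64).itemsize
bytes_needed = n_samples * n_samples * bytes_per_entry

# Convert bytes to tebibytes (1 TiB = 2**40 bytes).
tib_needed = bytes_needed / 2**40
print(f"{tib_needed:.1f} TiB")
```

The memory grows quadratically with the number of samples, so the dense identity block alone rules out datasets of this size regardless of how many features X has.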
Steps/Code to Reproduce
from sklearn.linear_model import QuantileRegressor
# X and y hold the ~2.9 million-row dataset described above
reg = QuantileRegressor(quantile=0.8).fit(X, y)
Expected Results
The data is fit without errors.
Actual Results
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
/tmp/ipykernel_53907/1165686720.py in <module>
1 from sklearn.linear_model import QuantileRegressor
2
----> 3 reg = QuantileRegressor(quantile=0.8).fit(X, y)
4 print_analysis(reg, X, y, flyte_type)
/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/_quantile.py in fit(self, X, y, sample_weight)
239 -np.ones((n_mask, 1)),
240 -X[mask],
--> 241 np.eye(n_mask),
242 -np.eye(n_mask),
243 ],
/anaconda3/lib/python3.7/site-packages/numpy/lib/twodim_base.py in eye(N, M, k, dtype, order)
197 if M is None:
198 M = N
--> 199 m = zeros((N, M), dtype=dtype, order=order)
200 if k >= M:
201 return m
MemoryError: Unable to allocate 61.6 TiB for an array with shape (2909376, 2909376) and data type float64
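The traceback points at the dense `np.eye(n_mask)` blocks in the constraint matrix. As a rough illustration of why a sparse representation would avoid the allocation, here is a minimal sketch that assembles a matrix with the same block layout using `scipy.sparse`. It is not the sklearn implementation: the block order `[1, X, -1, -X, I, -I]` is an assumption inferred from the traceback, and `build_constraints_sparse` is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse


def build_constraints_sparse(X):
    """Hypothetical helper: assemble [1, X, -1, -X, I, -I] as a sparse matrix."""
    n_samples = X.shape[0]
    ones = sparse.csr_matrix(np.ones((n_samples, 1)))
    X_sp = sparse.csr_matrix(X)
    # A sparse identity stores only n_samples nonzeros instead of n_samples**2.
    eye = sparse.eye(n_samples, format="csr")
    return sparse.hstack([ones, X_sp, -ones, -X_sp, eye, -eye], format="csr")


# Small synthetic example so that the dense equivalent fits in memory.
rng = np.random.default_rng(0)
X_small = rng.normal(size=(5, 2))
n_small = X_small.shape[0]

A_sparse = build_constraints_sparse(X_small)

# Dense construction with the same assumed block layout, for comparison.
A_dense = np.concatenate(
    [
        np.ones((n_small, 1)),
        X_small,
        -np.ones((n_small, 1)),
        -X_small,
        np.eye(n_small),
        -np.eye(n_small),
    ],
    axis=1,
)
```

Whether this helps in practice depends on the linear programming solver accepting sparse constraint matrices, which this sketch does not check.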
Versions
System:
python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
executable: /anaconda3/bin/python
machine: Linux-4.14.68-4.14.3-amd64-3adf3675665129fa-x86_64-with-debian-stretch-sid
Python dependencies:
pip: 21.2.2
setuptools: 58.0.4
sklearn: 1.0.1
numpy: 1.19.2
scipy: 1.5.2
Cython: 0.29.25
pandas: 1.2.4
matplotlib: 3.5.0
joblib: 1.1.0
threadpoolctl: 2.2.0
Built with OpenMP: True