read_parquet failed with Internal Error #6855

Closed
anmyachev opened this issue Jan 12, 2024 · 0 comments · Fixed by #6874
Labels
bug 🦗 (Something isn't working), P1 (Important tasks that we should complete soon), pandas.io

Comments

@anmyachev
Collaborator

Modin version: 45ad9de

Reproducer:

from time import time
import modin.pandas as pd
from modin.utils import execute
import numpy as np


df = pd.DataFrame(np.random.rand(10**5, 10**2))


df.to_parquet("test_parquet_usual.parquet")
df = pd.read_parquet("test_parquet_usual.parquet")

Traceback:

Traceback (most recent call last):
  File "C:\projects\modin\test_parquet_glob.py", line 11, in <module>
    df = pd.read_parquet("test_parquet_usual.parquet")
  File "C:\projects\modin\modin\utils.py", line 484, in wrapped
    return func(*params.args, **params.kwargs)
  File "C:\projects\modin\modin\logging\logger_decorator.py", line 129, in run_and_log
    return obj(*args, **kwargs)
  File "C:\projects\modin\modin\pandas\io.py", line 325, in read_parquet
    query_compiler=FactoryDispatcher.read_parquet(
  File "C:\projects\modin\modin\core\execution\dispatching\factories\dispatcher.py", line 187, in read_parquet
    return cls.get_factory()._read_parquet(**kwargs)
  File "C:\projects\modin\modin\core\execution\dispatching\factories\factories.py", line 212, in _read_parquet
    return cls.io_cls.read_parquet(**kwargs)
  File "C:\projects\modin\modin\logging\logger_decorator.py", line 129, in run_and_log
    return obj(*args, **kwargs)
  File "C:\projects\modin\modin\core\io\file_dispatcher.py", line 164, in read
    if not AsyncReadMode.get() and hasattr(query_compiler, "dtypes"):
  File "C:\projects\modin\modin\core\storage_formats\pandas\query_compiler.py", line 321, in dtypes
    return self._modin_frame.dtypes
  File "C:\projects\modin\modin\core\dataframe\pandas\dataframe\dataframe.py", line 383, in dtypes
    dtypes = self._dtypes.get()
  File "C:\projects\modin\modin\core\dataframe\pandas\metadata\dtypes.py", line 830, in get
    self._value = self._value.to_series()
  File "C:\projects\modin\modin\core\dataframe\pandas\metadata\dtypes.py", line 436, in to_series
    self.materialize()
  File "C:\projects\modin\modin\core\dataframe\pandas\metadata\dtypes.py", line 402, in materialize
    self._materialize_cols_with_unknown_dtypes()
  File "C:\projects\modin\modin\core\dataframe\pandas\metadata\dtypes.py", line 388, in _materialize_cols_with_unknown_dtypes
    self._known_dtypes.update(self._parent_df._compute_dtypes(subset))
  File "C:\projects\modin\modin\logging\logger_decorator.py", line 129, in run_and_log
    return obj(*args, **kwargs)
  File "C:\projects\modin\modin\core\dataframe\pandas\dataframe\dataframe.py", line 436, in _compute_dtypes
    obj.tree_reduce(0, lambda df: df.dtypes, dtype_builder)
  File "C:\projects\modin\modin\logging\logger_decorator.py", line 129, in run_and_log
    return obj(*args, **kwargs)
  File "C:\projects\modin\modin\core\dataframe\pandas\dataframe\utils.py", line 501, in run_f_on_minimally_updated_metadata
    result = f(self, *args, **kwargs)
  File "C:\projects\modin\modin\core\dataframe\pandas\dataframe\dataframe.py", line 4062, in to_pandas
    ErrorMessage.catch_bugs_and_request_email(
  File "C:\projects\modin\modin\error_message.py", line 81, in catch_bugs_and_request_email
    raise Exception(
Exception: Internal Error. Please visit https://github.com/modin-project/modin/issues to file an issue with the traceback and the command that caused this error. If you can't file a GitHub issue, please email bug_reports@modin.org.
Internal and external indices on axis 1 do not match.
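
For readers unfamiliar with the final line of the traceback: axis 1 is the column axis, and the message says that the column labels recorded for the frame differ from the labels carried by the underlying data. The sketch below illustrates that kind of consistency check in plain Python. It is not Modin's implementation; `check_axis_labels` and its parameters are hypothetical names, and the integer-versus-string mismatch in the example is only an illustration, not a statement about what differed in this report.

```python
from itertools import chain


def check_axis_labels(external_labels, partition_labels, axis=1):
    """Hypothetical helper: compare frame-level labels with partition labels.

    external_labels: labels recorded in the frame's metadata.
    partition_labels: one list of labels per partition along the axis.
    """
    # Flatten the per-partition labels into a single ordered list.
    internal_labels = list(chain.from_iterable(partition_labels))
    if list(external_labels) != internal_labels:
        raise ValueError(
            f"Internal and external indices on axis {axis} do not match."
        )


# Matching labels pass silently.
check_axis_labels(["a", "b", "c"], [["a", "b"], ["c"]])

# Labels that differ (here only by type, purely as an illustration) fail.
try:
    check_axis_labels([0, 1, 2], [["0", "1"], ["2"]])
except ValueError as err:
    message = str(err)
    print(message)
```

A check of this shape fails as soon as the two label lists disagree in length, order, value, or type, which is why it surfaces as an internal error rather than as a problem with the user's call.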
anmyachev added the bug 🦗, pandas.io, and P1 labels on Jan 12, 2024
anmyachev added a commit to anmyachev/modin that referenced this issue Jan 22, 2024
…umns for pyarrow engine

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
YarShev pushed a commit that referenced this issue Jan 23, 2024
…row engine (#6874)

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>