
ENH: Support Python 3.11 #46680

Closed
EwoutH opened this issue Apr 7, 2022 · 11 comments · Fixed by MacPython/pandas-wheels#192
Labels
Blocker Blocking issue or pull request for an upcoming release Compat pandas objects compatability with Numpy or Python functions Master Tracker High level tracker for similar issues Python 3.11

@EwoutH
Contributor

EwoutH commented Apr 7, 2022

Is your feature request related to a problem?

Currently Python 3.11 isn't fully supported.

Describe the solution you'd like

Full support for Python 3.11, including testing in CI and wheels published to PyPI.

Additional context

Python 3.11 is expected to be released as stable in October 2022, with many new features including:

  • PEP 657 -- Include Fine-Grained Error Locations in Tracebacks
  • PEP 654 -- Exception Groups and except*
  • PEP 673 -- Self Type
  • PEP 646 -- Variadic Generics
  • PEP 680 -- tomllib: Support for Parsing TOML in the Standard Library
  • PEP 675 -- Arbitrary Literal String Type
  • PEP 655 -- Marking individual TypedDict items as required or potentially missing
  • bpo-46752 -- Introduce task groups to asyncio
  • The Faster CPython project is already yielding some exciting results: CPython 3.11 is ~19% faster than 3.10.0 on the geometric mean of the PyPerformance benchmarks.

Pandas is one of the most used packages and one that many other packages depend on, so early support will help speed up Python 3.11 adoption. The last alpha release in the 3.11 series, Python 3.11.0a7, was released earlier this week, and in early May the first beta will be published, after which no new features will be added.

I think it would be an amazing feat to have full Python 3.11 support in pandas by the time Python 3.11.0 beta 1 is released (expected Friday, May 6th, 2022). This includes testing in CI and publishing wheels to PyPI.

@EwoutH EwoutH added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 7, 2022
@lithomas1 lithomas1 added Compat pandas objects compatability with Numpy or Python functions Python 3.11 Master Tracker High level tracker for similar issues and removed Compat pandas objects compatability with Numpy or Python functions Needs Triage Issue that has not been reviewed by a pandas team member Enhancement labels Apr 7, 2022
@lithomas1
Member

Hi @EwoutH,
Thanks for bringing this issue up. While we do try to have full compat and wheels up by the official release date (we had Linux wheels up on release day for Python 3.10), it is unlikely that this will happen by beta 1. This is because there is still some work to do for our dependencies (namely numpy and Cython) to support and test on Python 3.11.

The good news, though, is that since the release of Python 3.10, the underlying infrastructure for testing and building wheels on new Python versions has been much improved, so adding testing/wheels should be easier this time around.

Do note, though, that wheels will only be uploaded to PyPI after at least Python 3.11 release candidate 1. This is because the ABI is only frozen after the RC, and we don't want already-uploaded wheels to break because Python decided to change something during the beta period.

In the meantime, I've created a new Python 3.11 label and marked this issue as a Master Tracker issue to centralize comments/discussion here for now. Feel free to ping me if you have any more questions.

@EwoutH
Contributor Author

EwoutH commented Apr 8, 2022

Thanks for the extensive response, Thomas! Great to hear that the wheel infrastructure is in great shape.

To keep track of the dependencies:

@EwoutH
Contributor Author

EwoutH commented May 23, 2022

I took a look at the Python 3.11 CI run in #47032, and the following errors, failures and warnings were listed:

Python 3.11 CI Errors
==================================== ERRORS ====================================
_______________ ERROR collecting pandas/tests/scalar/test_nat.py _______________
pandas/tests/scalar/test_nat.py:320: in <module>
    _get_overlap_public_nat_methods(Timestamp, True)
pandas/tests/scalar/test_nat.py:255: in _get_overlap_public_nat_methods
    overlap.sort()
E   TypeError: '<' not supported between instances of 'type' and 'type'
_______________ ERROR collecting pandas/tests/scalar/test_nat.py _______________
pandas/tests/scalar/test_nat.py:320: in <module>
    _get_overlap_public_nat_methods(Timestamp, True)
pandas/tests/scalar/test_nat.py:255: in _get_overlap_public_nat_methods
    overlap.sort()
E   TypeError: '<' not supported between instances of 'type' and 'type'
Python 3.11 CI Failures
=================================== FAILURES ===================================
_____________________ TestCategoricalAPI.test_set_ordered ______________________
[gw1] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python

self = <pandas.tests.arrays.categorical.test_api.TestCategoricalAPI object at 0x7f728648a950>

    def test_set_ordered(self):
    
        cat = Categorical(["a", "b", "c", "a"], ordered=True)
        cat2 = cat.as_unordered()
        assert not cat2.ordered
        cat2 = cat.as_ordered()
        assert cat2.ordered
        cat2.as_unordered(inplace=True)
        assert not cat2.ordered
        cat2.as_ordered(inplace=True)
        assert cat2.ordered
    
        assert cat2.set_ordered(True).ordered
        assert not cat2.set_ordered(False).ordered
        cat2.set_ordered(True, inplace=True)
        assert cat2.ordered
        cat2.set_ordered(False, inplace=True)
        assert not cat2.ordered
    
        # removed in 0.19.0
        msg = "can't set attribute"
        with pytest.raises(AttributeError, match=msg):
>           cat.ordered = True
E           AttributeError: property 'ordered' of 'Categorical' object has no setter

pandas/tests/arrays/categorical/test_api.py:58: AttributeError

During handling of the above exception, another exception occurred:

self = <pandas.tests.arrays.categorical.test_api.TestCategoricalAPI object at 0x7f728648a950>

    def test_set_ordered(self):
    
        cat = Categorical(["a", "b", "c", "a"], ordered=True)
        cat2 = cat.as_unordered()
        assert not cat2.ordered
        cat2 = cat.as_ordered()
        assert cat2.ordered
        cat2.as_unordered(inplace=True)
        assert not cat2.ordered
        cat2.as_ordered(inplace=True)
        assert cat2.ordered
    
        assert cat2.set_ordered(True).ordered
        assert not cat2.set_ordered(False).ordered
        cat2.set_ordered(True, inplace=True)
        assert cat2.ordered
        cat2.set_ordered(False, inplace=True)
        assert not cat2.ordered
    
        # removed in 0.19.0
        msg = "can't set attribute"
>       with pytest.raises(AttributeError, match=msg):
E       AssertionError: Regex pattern "can't set attribute" does not match "property 'ordered' of 'Categorical' object has no setter".

pandas/tests/arrays/categorical/test_api.py:57: AssertionError
________________ TestPrivateCategoricalAPI.test_codes_immutable ________________
[gw1] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python

self = <pandas.tests.arrays.categorical.test_api.TestPrivateCategoricalAPI object at 0x7f728659ed10>

    def test_codes_immutable(self):
    
        # Codes should be read only
        c = Categorical(["a", "b", "c", "a", np.nan])
        exp = np.array([0, 1, 2, 0, -1], dtype="int8")
        tm.assert_numpy_array_equal(c.codes, exp)
    
        # Assignments to codes should raise
        with pytest.raises(AttributeError, match="can't set attribute"):
>           c.codes = np.array([0, 1, 2, 0, 1], dtype="int8")
E           AttributeError: property 'codes' of 'Categorical' object has no setter

pandas/tests/arrays/categorical/test_api.py:511: AttributeError

During handling of the above exception, another exception occurred:

self = <pandas.tests.arrays.categorical.test_api.TestPrivateCategoricalAPI object at 0x7f728659ed10>

    def test_codes_immutable(self):
    
        # Codes should be read only
        c = Categorical(["a", "b", "c", "a", np.nan])
        exp = np.array([0, 1, 2, 0, -1], dtype="int8")
        tm.assert_numpy_array_equal(c.codes, exp)
    
        # Assignments to codes should raise
>       with pytest.raises(AttributeError, match="can't set attribute"):
E       AssertionError: Regex pattern "can't set attribute" does not match "property 'codes' of 'Categorical' object has no setter".

pandas/tests/arrays/categorical/test_api.py:510: AssertionError
________________________ test_set_levels_codes_directly ________________________
[gw0] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python

idx = MultiIndex([('foo', 'one'),
            ('foo', 'two'),
            ('bar', 'one'),
            ('baz', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

    def test_set_levels_codes_directly(idx):
        # setting levels/codes directly raises AttributeError
    
        levels = idx.levels
        new_levels = [[lev + "a" for lev in level] for level in levels]
    
        codes = idx.codes
        major_codes, minor_codes = codes
        major_codes = [(x + 1) % 3 for x in major_codes]
        minor_codes = [(x + 1) % 1 for x in minor_codes]
        new_codes = [major_codes, minor_codes]
    
        msg = "[Cc]an't set attribute"
        with pytest.raises(AttributeError, match=msg):
            idx.levels = new_levels
        with pytest.raises(AttributeError, match=msg):
>           idx.codes = new_codes
E           AttributeError: property 'codes' of 'MultiIndex' object has no setter

pandas/tests/indexes/multi/test_get_set.py:146: AttributeError

During handling of the above exception, another exception occurred:

idx = MultiIndex([('foo', 'one'),
            ('foo', 'two'),
            ('bar', 'one'),
            ('baz', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

    def test_set_levels_codes_directly(idx):
        # setting levels/codes directly raises AttributeError
    
        levels = idx.levels
        new_levels = [[lev + "a" for lev in level] for level in levels]
    
        codes = idx.codes
        major_codes, minor_codes = codes
        major_codes = [(x + 1) % 3 for x in major_codes]
        minor_codes = [(x + 1) % 1 for x in minor_codes]
        new_codes = [major_codes, minor_codes]
    
        msg = "[Cc]an't set attribute"
        with pytest.raises(AttributeError, match=msg):
            idx.levels = new_levels
>       with pytest.raises(AttributeError, match=msg):
E       AssertionError: Regex pattern "[Cc]an't set attribute" does not match "property 'codes' of 'MultiIndex' object has no setter".

pandas/tests/indexes/multi/test_get_set.py:145: AssertionError
_____________________ TestFreq.test_freq_setter_deprecated _____________________
[gw1] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python

self = <pandas.tests.indexes.period.test_freq_attr.TestFreq object at 0x7f7278849750>

    def test_freq_setter_deprecated(self):
        # GH#20678
        idx = period_range("2018Q1", periods=4, freq="Q")
    
        # no warning for getter
        with tm.assert_produces_warning(None):
            idx.freq
    
        # warning for setter
        with pytest.raises(AttributeError, match="can't set attribute"):
>           idx.freq = offsets.Day()

pandas/tests/indexes/period/test_freq_attr.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = PeriodIndex(['2018Q1', '2018Q2', '2018Q3', '2018Q4'], dtype='period[Q-DEC]')
value = <Day>

    def fset(self, value):
>       setattr(self._data, name, value)
E       AttributeError: property 'freq' of 'PeriodArray' object has no setter

pandas/core/indexes/extension.py:78: AttributeError

During handling of the above exception, another exception occurred:

self = <pandas.tests.indexes.period.test_freq_attr.TestFreq object at 0x7f7278849750>

    def test_freq_setter_deprecated(self):
        # GH#20678
        idx = period_range("2018Q1", periods=4, freq="Q")
    
        # no warning for getter
        with tm.assert_produces_warning(None):
            idx.freq
    
        # warning for setter
>       with pytest.raises(AttributeError, match="can't set attribute"):
E       AssertionError: Regex pattern "can't set attribute" does not match "property 'freq' of 'PeriodArray' object has no setter".

pandas/tests/indexes/period/test_freq_attr.py:20: AssertionError
_______________________ test_null_quote_char[python--0] ________________________
[gw1] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python

all_parsers = <pandas.tests.io.parser.conftest.PythonParser object at 0x7f7263c52010>
quoting = 0, quote_char = ''

    @pytest.mark.parametrize("quoting", [csv.QUOTE_MINIMAL, csv.QUOTE_NONE])
    @pytest.mark.parametrize("quote_char", ["", None])
    def test_null_quote_char(all_parsers, quoting, quote_char):
        kwargs = {"quotechar": quote_char, "quoting": quoting}
        data = "a,b,c\n1,2,3"
        parser = all_parsers
    
        if quoting != csv.QUOTE_NONE:
            # Sanity checking.
            msg = "quotechar must be set if quoting enabled"
    
            with pytest.raises(TypeError, match=msg):
>               parser.read_csv(StringIO(data), **kwargs)

pandas/tests/io/parser/test_quoting.py:86: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.tests.io.parser.conftest.PythonParser object at 0x7f7263c52010>
args = (<_io.StringIO object at 0x7f7263ad97e0>,)
kwargs = {'engine': 'python', 'low_memory': True, 'quotechar': '', 'quoting': 0}

    def read_csv(self, *args, **kwargs):
        kwargs = self.update_kwargs(kwargs)
>       return read_csv(*args, **kwargs)

pandas/tests/io/parser/conftest.py:29: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<_io.StringIO object at 0x7f7263ad97e0>,)
kwargs = {'engine': 'python', 'low_memory': True, 'quotechar': '', 'quoting': 0}
arguments = " except for the argument 'filepath_or_buffer'"

    @wraps(func)
    def wrapper(*args, **kwargs):
        arguments = _format_argument_list(allow_args)
        if len(args) > num_allow_args:
            warnings.warn(
                msg.format(arguments=arguments),
                FutureWarning,
                stacklevel=stacklevel,
            )
>       return func(*args, **kwargs)

pandas/util/_decorators.py:317: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

filepath_or_buffer = <_io.StringIO object at 0x7f7263ad97e0>, sep = <no_default>
delimiter = None, header = 'infer', names = <no_default>, index_col = None
usecols = None, squeeze = None, prefix = <no_default>, mangle_dupe_cols = True
dtype = None, engine = 'python', converters = None, true_values = None
false_values = None, skipinitialspace = False, skiprows = None, skipfooter = 0
nrows = None, na_values = None, keep_default_na = True, na_filter = True
verbose = False, skip_blank_lines = True, parse_dates = None
infer_datetime_format = False, keep_date_col = False, date_parser = None
dayfirst = False, cache_dates = True, iterator = False, chunksize = None
compression = 'infer', thousands = None, decimal = '.', lineterminator = None
quotechar = '', quoting = 0, doublequote = True, escapechar = None
comment = None, encoding = None, encoding_errors = 'strict', dialect = None
error_bad_lines = None, warn_bad_lines = None, on_bad_lines = None
delim_whitespace = False, low_memory = True, memory_map = False
float_precision = None, storage_options = None

    @deprecate_nonkeyword_arguments(
        version=None, allowed_args=["filepath_or_buffer"], stacklevel=3
    )
    @Appender(
        _doc_read_csv_and_table.format(
            func_name="read_csv",
            summary="Read a comma-separated values (csv) file into DataFrame.",
            _default_sep="','",
            storage_options=_shared_docs["storage_options"],
            decompression_options=_shared_docs["decompression_options"],
        )
    )
    def read_csv(
        filepath_or_buffer: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str],
        sep: str | None | lib.NoDefault = lib.no_default,
        delimiter: str | None | lib.NoDefault = None,
        # Column and Index Locations and Names
        header: int | Sequence[int] | None | Literal["infer"] = "infer",
        names=lib.no_default,
        index_col=None,
        usecols=None,
        squeeze: bool | None = None,
        prefix: str | lib.NoDefault = lib.no_default,
        mangle_dupe_cols: bool = True,
        # General Parsing Configuration
        dtype: DtypeArg | None = None,
        engine: CSVEngine | None = None,
        converters=None,
        true_values=None,
        false_values=None,
        skipinitialspace: bool = False,
        skiprows=None,
        skipfooter: int = 0,
        nrows: int | None = None,
        # NA and Missing Data Handling
        na_values=None,
        keep_default_na: bool = True,
        na_filter: bool = True,
        verbose: bool = False,
        skip_blank_lines: bool = True,
        # Datetime Handling
        parse_dates=None,
        infer_datetime_format: bool = False,
        keep_date_col: bool = False,
        date_parser=None,
        dayfirst: bool = False,
        cache_dates: bool = True,
        # Iteration
        iterator: bool = False,
        chunksize: int | None = None,
        # Quoting, Compression, and File Format
        compression: CompressionOptions = "infer",
        thousands: str | None = None,
        decimal: str = ".",
        lineterminator: str | None = None,
        quotechar: str = '"',
        quoting: int = csv.QUOTE_MINIMAL,
        doublequote: bool = True,
        escapechar: str | None = None,
        comment: str | None = None,
        encoding: str | None = None,
        encoding_errors: str | None = "strict",
        dialect=None,
        # Error Handling
        error_bad_lines: bool | None = None,
        warn_bad_lines: bool | None = None,
        # TODO(2.0): set on_bad_lines to "error".
        # See _refine_defaults_read comment for why we do this.
        on_bad_lines=None,
        # Internal
        delim_whitespace: bool = False,
        low_memory=_c_parser_defaults["low_memory"],
        memory_map: bool = False,
        float_precision: Literal["high", "legacy"] | None = None,
        storage_options: StorageOptions = None,
    ) -> DataFrame | TextFileReader:
        # locals() should never be modified
        kwds = locals().copy()
        del kwds["filepath_or_buffer"]
        del kwds["sep"]
    
        kwds_defaults = _refine_defaults_read(
            dialect,
            delimiter,
            delim_whitespace,
            engine,
            sep,
            error_bad_lines,
            warn_bad_lines,
            on_bad_lines,
            names,
            prefix,
            defaults={"delimiter": ","},
        )
        kwds.update(kwds_defaults)
    
>       return _read(filepath_or_buffer, kwds)

pandas/io/parsers/readers.py:927: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

filepath_or_buffer = <_io.StringIO object at 0x7f7263ad97e0>
kwds = {'cache_dates': True, 'chunksize': None, 'comment': None, 'compression': 'infer', ...}

    def _read(
        filepath_or_buffer: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str], kwds
    ) -> DataFrame | TextFileReader:
        """Generic reader of line files."""
        # if we pass a date_parser and parse_dates=False, we should not parse the
        # dates GH#44366
        if kwds.get("parse_dates", None) is None:
            if kwds.get("date_parser", None) is None:
                kwds["parse_dates"] = False
            else:
                kwds["parse_dates"] = True
    
        # Extract some of the arguments (pass chunksize on).
        iterator = kwds.get("iterator", False)
        chunksize = kwds.get("chunksize", None)
        if kwds.get("engine") == "pyarrow":
            if iterator:
                raise ValueError(
                    "The 'iterator' option is not supported with the 'pyarrow' engine"
                )
    
            if chunksize is not None:
                raise ValueError(
                    "The 'chunksize' option is not supported with the 'pyarrow' engine"
                )
        else:
            chunksize = validate_integer("chunksize", chunksize, 1)
    
        nrows = kwds.get("nrows", None)
    
        # Check for duplicates in names.
        _validate_names(kwds.get("names", None))
    
        # Create the parser.
>       parser = TextFileReader(filepath_or_buffer, **kwds)

pandas/io/parsers/readers.py:582: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.io.parsers.readers.TextFileReader object at 0x7f7275a7b390>
f = <_io.StringIO object at 0x7f7263ad97e0>, engine = 'python'
kwds = {'cache_dates': True, 'chunksize': None, 'comment': None, 'compression': 'infer', ...}
engine_specified = True, dialect = None
options = {'cache_dates': True, 'comment': None, 'compression': 'infer', 'converters': None, ...}

    def __init__(
        self,
        f: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str] | list,
        engine: CSVEngine | None = None,
        **kwds,
    ) -> None:
        if engine is not None:
            engine_specified = True
        else:
            engine = "python"
            engine_specified = False
        self.engine = engine
        self._engine_specified = kwds.get("engine_specified", engine_specified)
    
        _validate_skipfooter(kwds)
    
        dialect = _extract_dialect(kwds)
        if dialect is not None:
            if engine == "pyarrow":
                raise ValueError(
                    "The 'dialect' option is not supported with the 'pyarrow' engine"
                )
            kwds = _merge_with_dialect_properties(dialect, kwds)
    
        if kwds.get("header", "infer") == "infer":
            kwds["header"] = 0 if kwds.get("names") is None else None
    
        self.orig_options = kwds
    
        # miscellanea
        self._currow = 0
    
        options = self._get_options_with_defaults(engine)
        options["storage_options"] = kwds.get("storage_options", None)
    
        self.chunksize = options.pop("chunksize", None)
        self.nrows = options.pop("nrows", None)
    
        self._check_file_or_buffer(f, engine)
        self.options, self.engine = self._clean_options(options, engine)
    
        self.squeeze = self.options.pop("squeeze", False)
    
        if "has_index_names" in kwds:
            self.options["has_index_names"] = kwds["has_index_names"]
    
        self.handles: IOHandles | None = None
>       self._engine = self._make_engine(f, self.engine)

pandas/io/parsers/readers.py:1421: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.io.parsers.readers.TextFileReader object at 0x7f7275a7b390>
f = <_io.StringIO object at 0x7f7263ad97e0>, engine = 'python'

    def _make_engine(
        self,
        f: FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str] | list | IO,
        engine: CSVEngine = "c",
    ) -> ParserBase:
        mapping: dict[str, type[ParserBase]] = {
            "c": CParserWrapper,
            "python": PythonParser,
            "pyarrow": ArrowParserWrapper,
            "python-fwf": FixedWidthFieldParser,
        }
        if engine not in mapping:
            raise ValueError(
                f"Unknown engine: {engine} (valid options are {mapping.keys()})"
            )
        if not isinstance(f, list):
            # open file here
            is_text = True
            mode = "r"
            if engine == "pyarrow":
                is_text = False
                mode = "rb"
            # error: No overload variant of "get_handle" matches argument types
            # "Union[str, PathLike[str], ReadCsvBuffer[bytes], ReadCsvBuffer[str]]"
            # , "str", "bool", "Any", "Any", "Any", "Any", "Any"
            self.handles = get_handle(  # type: ignore[call-overload]
                f,
                mode,
                encoding=self.options.get("encoding", None),
                compression=self.options.get("compression", None),
                memory_map=self.options.get("memory_map", False),
                is_text=is_text,
                errors=self.options.get("encoding_errors", "strict"),
                storage_options=self.options.get("storage_options", None),
            )
            assert self.handles is not None
            f = self.handles.handle
    
        elif engine != "python":
            msg = f"Invalid file path or buffer object type: {type(f)}"
            raise ValueError(msg)
    
        try:
>           return mapping[engine](f, **self.options)

pandas/io/parsers/readers.py:1725: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.io.parsers.python_parser.PythonParser object at 0x7f7263c0bd10>
f = <_io.StringIO object at 0x7f7263ad97e0>
kwds = {'comment': None, 'compression': 'infer', 'converters': {}, 'decimal': '.', ...}

    def __init__(self, f: ReadCsvBuffer[str] | list, **kwds) -> None:
        """
        Workhorse function for processing nested list into DataFrame
        """
        super().__init__(kwds)
    
        self.data: Iterator[str] | None = None
        self.buf: list = []
        self.pos = 0
        self.line_pos = 0
    
        self.skiprows = kwds["skiprows"]
    
        if callable(self.skiprows):
            self.skipfunc = self.skiprows
        else:
            self.skipfunc = lambda x: x in self.skiprows
    
        self.skipfooter = _validate_skipfooter_arg(kwds["skipfooter"])
        self.delimiter = kwds["delimiter"]
    
        self.quotechar = kwds["quotechar"]
        if isinstance(self.quotechar, str):
            self.quotechar = str(self.quotechar)
    
        self.escapechar = kwds["escapechar"]
        self.doublequote = kwds["doublequote"]
        self.skipinitialspace = kwds["skipinitialspace"]
        self.lineterminator = kwds["lineterminator"]
        self.quoting = kwds["quoting"]
        self.skip_blank_lines = kwds["skip_blank_lines"]
    
        self.names_passed = kwds["names"] or None
    
        self.has_index_names = False
        if "has_index_names" in kwds:
            self.has_index_names = kwds["has_index_names"]
    
        self.verbose = kwds["verbose"]
    
        self.thousands = kwds["thousands"]
        self.decimal = kwds["decimal"]
    
        self.comment = kwds["comment"]
    
        # Set self.data to something that can read lines.
        if isinstance(f, list):
            # read_excel: f is a list
            self.data = cast(Iterator[str], f)
        else:
            assert hasattr(f, "readline")
>           self._make_reader(f)

pandas/io/parsers/python_parser.py:110: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.io.parsers.python_parser.PythonParser object at 0x7f7263c0bd10>
f = <_io.StringIO object at 0x7f7263ad97e0>

    def _make_reader(self, f: IO[str] | ReadCsvBuffer[str]) -> None:
        sep = self.delimiter
    
        if sep is None or len(sep) == 1:
            if self.lineterminator:
                raise ValueError(
                    "Custom line terminators not supported in python parser (yet)"
                )
    
            class MyDialect(csv.Dialect):
                delimiter = self.delimiter
                quotechar = self.quotechar
                escapechar = self.escapechar
                doublequote = self.doublequote
                skipinitialspace = self.skipinitialspace
                quoting = self.quoting
                lineterminator = "\n"
    
            dia = MyDialect
    
            if sep is not None:
                dia.delimiter = sep
            else:
                # attempt to sniff the delimiter from the first valid line,
                # i.e. no comment line and not in skiprows
                line = f.readline()
                lines = self._check_comments([[line]])[0]
                while self.skipfunc(self.pos) or not lines:
                    self.pos += 1
                    line = f.readline()
                    lines = self._check_comments([[line]])[0]
                lines_str = cast(List[str], lines)
    
                # since `line` was a string, lines will be a list containing
                # only a single string
                line = lines_str[0]
    
                self.pos += 1
                self.line_pos += 1
                sniffed = csv.Sniffer().sniff(line)
                dia.delimiter = sniffed.delimiter
    
                # Note: encoding is irrelevant here
                line_rdr = csv.reader(StringIO(line), dialect=dia)
                self.buf.extend(list(line_rdr))
    
            # Note: encoding is irrelevant here
>           reader = csv.reader(f, dialect=dia, strict=True)
E           TypeError: "quotechar" must be a 1-character string

pandas/io/parsers/python_parser.py:222: TypeError

During handling of the above exception, another exception occurred:

all_parsers = <pandas.tests.io.parser.conftest.PythonParser object at 0x7f7263c52010>
quoting = 0, quote_char = ''

    @pytest.mark.parametrize("quoting", [csv.QUOTE_MINIMAL, csv.QUOTE_NONE])
    @pytest.mark.parametrize("quote_char", ["", None])
    def test_null_quote_char(all_parsers, quoting, quote_char):
        kwargs = {"quotechar": quote_char, "quoting": quoting}
        data = "a,b,c\n1,2,3"
        parser = all_parsers
    
        if quoting != csv.QUOTE_NONE:
            # Sanity checking.
            msg = "quotechar must be set if quoting enabled"
    
>           with pytest.raises(TypeError, match=msg):
E           AssertionError: Regex pattern 'quotechar must be set if quoting enabled' does not match '"quotechar" must be a 1-character string'.

pandas/tests/io/parser/test_quoting.py:85: AssertionError
_______________________ test_null_quote_char[python--3] ________________________
[gw1] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python

all_parsers = <pandas.tests.io.parser.conftest.PythonParser object at 0x7f7263190e90>
quoting = 3, quote_char = ''

    @pytest.mark.parametrize("quoting", [csv.QUOTE_MINIMAL, csv.QUOTE_NONE])
    @pytest.mark.parametrize("quote_char", ["", None])
    def test_null_quote_char(all_parsers, quoting, quote_char):
        kwargs = {"quotechar": quote_char, "quoting": quoting}
        data = "a,b,c\n1,2,3"
        parser = all_parsers
    
        if quoting != csv.QUOTE_NONE:
            # Sanity checking.
            msg = "quotechar must be set if quoting enabled"
    
            with pytest.raises(TypeError, match=msg):
                parser.read_csv(StringIO(data), **kwargs)
        else:
            expected = DataFrame([[1, 2, 3]], columns=["a", "b", "c"])
>           result = parser.read_csv(StringIO(data), **kwargs)

pandas/tests/io/parser/test_quoting.py:89: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/tests/io/parser/conftest.py:29: in read_csv
    return read_csv(*args, **kwargs)
pandas/util/_decorators.py:317: in wrapper
    return func(*args, **kwargs)
pandas/io/parsers/readers.py:927: in read_csv
    return _read(filepath_or_buffer, kwds)
pandas/io/parsers/readers.py:582: in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
pandas/io/parsers/readers.py:1421: in __init__
    self._engine = self._make_engine(f, self.engine)
pandas/io/parsers/readers.py:1725: in _make_engine
    return mapping[engine](f, **self.options)
pandas/io/parsers/python_parser.py:110: in __init__
    self._make_reader(f)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pandas.io.parsers.python_parser.PythonParser object at 0x7f7263260590>
f = <_io.StringIO object at 0x7f7263ad83a0>

    def _make_reader(self, f: IO[str] | ReadCsvBuffer[str]) -> None:
        sep = self.delimiter
    
        if sep is None or len(sep) == 1:
            if self.lineterminator:
                raise ValueError(
                    "Custom line terminators not supported in python parser (yet)"
                )
    
            class MyDialect(csv.Dialect):
                delimiter = self.delimiter
                quotechar = self.quotechar
                escapechar = self.escapechar
                doublequote = self.doublequote
                skipinitialspace = self.skipinitialspace
                quoting = self.quoting
                lineterminator = "\n"
    
            dia = MyDialect
    
            if sep is not None:
                dia.delimiter = sep
            else:
                # attempt to sniff the delimiter from the first valid line,
                # i.e. no comment line and not in skiprows
                line = f.readline()
                lines = self._check_comments([[line]])[0]
                while self.skipfunc(self.pos) or not lines:
                    self.pos += 1
                    line = f.readline()
                    lines = self._check_comments([[line]])[0]
                lines_str = cast(List[str], lines)
    
                # since `line` was a string, lines will be a list containing
                # only a single string
                line = lines_str[0]
    
                self.pos += 1
                self.line_pos += 1
                sniffed = csv.Sniffer().sniff(line)
                dia.delimiter = sniffed.delimiter
    
                # Note: encoding is irrelevant here
                line_rdr = csv.reader(StringIO(line), dialect=dia)
                self.buf.extend(list(line_rdr))
    
            # Note: encoding is irrelevant here
>           reader = csv.reader(f, dialect=dia, strict=True)
E           TypeError: "quotechar" must be a 1-character string

pandas/io/parsers/python_parser.py:222: TypeError
_________________________ test_null_byte_char[python] __________________________
[gw1] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python

all_parsers = <pandas.tests.io.parser.conftest.PythonParser object at 0x7f7263750190>

    def test_null_byte_char(all_parsers):
        # see gh-2741
        data = "\x00,foo"
        names = ["a", "b"]
        parser = all_parsers
    
        if parser.engine == "c":
            expected = DataFrame([[np.nan, "foo"]], columns=names)
            out = parser.read_csv(StringIO(data), names=names)
            tm.assert_frame_equal(out, expected)
        else:
            msg = "NULL byte detected"
>           with pytest.raises(ParserError, match=msg):
E           Failed: DID NOT RAISE <class 'pandas.errors.ParserError'>

pandas/tests/io/parser/common/test_read_errors.py:239: Failed
______________________________ test_show_versions ______________________________
[gw0] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python
[XPASS(strict)] _distutils not in python3.10/distutils/core.py
_______________________ test_show_versions_console_json ________________________
[gw0] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python
[XPASS(strict)] _distutils not in python3.10/distutils/core.py
__________________________ test_show_versions_console __________________________
[gw0] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python
[XPASS(strict)] _distutils not in python3.10/distutils/core.py
____________________________ test_json_output_match ____________________________
[gw0] linux -- Python 3.11.0 /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/bin/python
[XPASS(strict)] _distutils not in python3.10/distutils/core.py
Python 3.11 CI Warnings
=============================== warnings summary ===============================
pandas/tests/exchange/test_impl.py::test_categorical_dtype[data0]
pandas/tests/exchange/test_impl.py::test_categorical_dtype[data1]
  /home/runner/work/pandas/pandas/pandas/tests/exchange/test_impl.py:68: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
    tm.assert_frame_equal(df, from_dataframe(df.__dataframe__()))

pandas/tests/exchange/test_impl.py::test_dataframe[data0]
pandas/tests/exchange/test_impl.py::test_dataframe[data1]
pandas/tests/exchange/test_impl.py::test_dataframe[data2]
pandas/tests/exchange/test_impl.py::test_dataframe[data3]
pandas/tests/exchange/test_impl.py::test_dataframe[data4]
  /home/runner/work/pandas/pandas/pandas/tests/exchange/test_impl.py:88: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
    from_dataframe(df2.select_columns(indices)),

pandas/tests/exchange/test_impl.py::test_dataframe[data0]
pandas/tests/exchange/test_impl.py::test_dataframe[data1]
pandas/tests/exchange/test_impl.py::test_dataframe[data2]
pandas/tests/exchange/test_impl.py::test_dataframe[data3]
pandas/tests/exchange/test_impl.py::test_dataframe[data4]
  /home/runner/work/pandas/pandas/pandas/tests/exchange/test_impl.py:89: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
    from_dataframe(df2.select_columns_by_name(names)),

pandas/tests/extension/test_floating.py::Test2DCompat::test_reductions_2d_axis_none[Float32Dtype-prod]
pandas/tests/extension/test_floating.py::Test2DCompat::test_reductions_2d_axis1[Float32Dtype-prod]
  /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/lib/python3.11/site-packages/numpy/core/fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

pandas/tests/util/test_show_versions.py::test_show_versions
  /opt/hostedtoolcache/Python/3.11.0-beta.1/x64/lib/python3.11/site-packages/pkg_resources/_vendor/pyparsing.py:87: DeprecationWarning: module 'sre_constants' is deprecated
    import sre_constants

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
------ generated xml file: /home/runner/work/pandas/pandas/test-data.xml -------
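A side note on the failures above: Python 3.11 reworded the AttributeError raised when assigning to a read-only property, from "can't set attribute" to "property 'x' of 'C' object has no setter", which is why the hard-coded regexes in these tests no longer match. A minimal, version-agnostic sketch of a regex that matches both wordings (illustrative only, not the actual pandas fix):

```python
import pytest


class ReadOnly:
    @property
    def value(self):
        return 1


# Matches both the pre-3.11 wording and the 3.11+ wording.
msg = "can't set attribute|property 'value' of 'ReadOnly' object has no setter"

with pytest.raises(AttributeError, match=msg):
    ReadOnly().value = 2
```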

@EwoutH
Contributor Author

EwoutH commented May 23, 2022

The TypeError: '<' not supported between instances of 'type' and 'type' above might be similar to sqlalchemy/sqlalchemy#7591.
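For reference, that TypeError appears whenever a sort ends up comparing class objects directly; a minimal illustration (not the pandas code) of the failure and a common key-based workaround:

```python
# Types define no ordering, so sorting a list of class objects fails.
items = [int, str, float]
try:
    items.sort()
except TypeError as err:
    print(err)  # '<' not supported between instances of 'type' and 'type'

# Sorting on a derived key sidesteps the comparison entirely.
items.sort(key=lambda t: t.__name__)
print([t.__name__ for t in items])  # ['float', 'int', 'str']
```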

@lithomas1
Member

Hi all, sorry for the long silence here. Long story short, I got bogged down trying to fix the tests, and then ran out of free time to work on this.

At this point, we can probably just xfail the remaining failing tests and get preliminary CI for 3.11 merged in. Wheels will probably follow shortly once the Python 3.11 RC makes its way to GHA/Azure Pipelines.
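For anyone curious what that could look like, here is a hedged pytest sketch (the test name, reason string, and marker placement are illustrative, not the actual pandas changes):

```python
import sys

import pytest


@pytest.mark.xfail(
    sys.version_info >= (3, 11),
    reason="AttributeError wording changed in Python 3.11",
    strict=False,
)
def test_old_attribute_error_wording():
    class ReadOnly:
        @property
        def value(self):
            return 1

    # Passes on <=3.10; xfails on 3.11, where the message is
    # "property 'value' of 'ReadOnly' object has no setter".
    with pytest.raises(AttributeError, match="can't set attribute"):
        ReadOnly().value = 2
```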

@lithomas1 lithomas1 added this to the 1.5 milestone Aug 17, 2022
@lithomas1
Member

Update: Python 3.11 testing is in.
Manylinux wheels are going to be released for the 1.5 release. I don't know if they will be retroactively added for the RC, though.
I'm working on enabling Windows/macOS wheels too, but it's slightly trickier.

That's all for this week. I'll make an update next week if I have any.

@EwoutH
Contributor Author

EwoutH commented Aug 25, 2022

Thanks for the update, the hard work is appreciated!

@lithomas1 lithomas1 added the Blocker Blocking issue or pull request for an upcoming release label Aug 27, 2022
@lithomas1
Member

Update: Wheels for all platforms (manylinux, Windows, macOS) have landed and will be included in the next nightlies (and in 1.5).

@dss010101

Curious: if pyarrow does not yet support 3.11, does that mean we are unable to use engine="pyarrow" with Python 3.11?

@jaymegordo

@msingh00 we can still use pyarrow, but when you try to install it, it will be built from source, as there is no prebuilt binary available on pypi.org yet. I've gotten it to work with pyarrow=9.0.0, but it does cause some headaches making sure the compiler tooling is installed.
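If the source build succeeds, the engine works as usual; a minimal sanity check, assuming a pyarrow install that imports cleanly on the 3.11 interpreter:

```python
import io

import pandas as pd

# Requires a working pyarrow (built from source on Python 3.11 at the time,
# since no prebuilt wheel was available on PyPI yet).
df = pd.read_csv(io.StringIO("a,b,c\n1,2,3"), engine="pyarrow")
print(df)
```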

@EwoutH
Contributor Author

EwoutH commented Jun 14, 2023

Created a tracking issue for 3.12: #53665

If you want to contribute towards 3.12 support, please do so and communicate it there!
