DEPS: Sync environment.yml with CI dep files #47287

Merged · 20 commits · Jun 22, 2022
29 changes: 29 additions & 0 deletions .github/workflows/code-checks.yml
@@ -157,3 +157,32 @@ jobs:

      - name: Build image
        run: docker build --pull --no-cache --tag pandas-dev-env .

  requirements-dev-text-installable:
    name: Test install requirements-dev.txt
    runs-on: ubuntu-latest

    concurrency:
      # https://github.community/t/concurrecy-not-work-for-push/183068/7
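      # On push events the group key is the unique run number, so pushes never
      # cancel each other; on pull_request events it is the ref, so a newer
      # push to the same PR cancels the in-progress run.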
      group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-requirements-dev-text-installable
      cancel-in-progress: true

    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Setup Python
        id: setup_python
        uses: actions/setup-python@v3
        with:
          python-version: '3.8'
          cache: 'pip'
          cache-dependency-path: 'requirements-dev.txt'
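          # the pip cache key includes a hash of requirements-dev.txt, so the
          # cache is invalidated whenever the pinned dev dependencies change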

      - name: Install requirements-dev.txt
        run: pip install -r requirements-dev.txt

      - name: Check Pip Cache Hit
        run: echo ${{ steps.setup_python.outputs.cache-hit }}
@@ -1,4 +1,4 @@
-name: Posix
+name: Ubuntu

on:
push:
@@ -145,7 +145,7 @@ jobs:

      - name: Extra installs
        # xsel for clipboard tests
-        run: sudo apt-get update && sudo apt-get install -y libc6-dev-i386 xsel ${{ env.EXTRA_APT }}
+        run: sudo apt-get update && sudo apt-get install -y xsel ${{ env.EXTRA_APT }}

      - name: Set up Conda
        uses: ./.github/actions/setup-conda
3 changes: 1 addition & 2 deletions ci/deps/actions-310.yaml
@@ -31,8 +31,7 @@ dependencies:
- jinja2
- lxml
- matplotlib
-# TODO: uncomment after numba supports py310
-#- numba
+- numba
Member:
This could cause an older numpy version to be installed, which could(?) explain most of the errors (but not the pyqt stuff).

Member:
As there is one dedicated CI run for numpy-dev, it would make sense to use the latest numpy compatible with numba (even for typing). Reverting #45244 would probably fix most of the numpy errors.

Member Author:
This file should only affect our specific Python 3.10 build, which just runs the unit tests. The typing checks should run in an environment that is set up from environment.yml.

Member:
Sorry, I meant that since we run the unit tests with NumPy-dev in a separate workflow anyway, prioritizing the latest numba version over the latest (released) NumPy version (in environment.yml) could be fine. Either way, it would be good to limit the numba version or the numpy version in environment.yml.
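
For illustration, a pin along these lines in environment.yml would stop the solver from pairing numba with a numpy release it does not support. A minimal sketch; the exact upper bound is an assumption reflecting numba's support at the time, not part of this PR:

```yaml
dependencies:
  # illustrative pins: keep numpy within the range the newest numba supports,
  # instead of letting the solver pick freely
  - numba>=0.53.1
  - numpy>=1.19.5,<1.23
```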

Member Author:
> As there is one dedicated CI run for numpy-dev, it would make sense to use the latest numpy compatible with numba (even for typing). Reverting #45244 would probably fix most of the numpy errors.

The problem I have with this is that new contributors, when setting up an environment, will get the latest numpy and have mypy errors by default.

We should make the contributor experience pain-free, so (imo) we should use environment.yml for the typing validation to match the local dev env.

Otherwise, this just makes it difficult for people to contribute to the typing issues.

Now, numba is included in environment.yml, so I'm not sure why, when I set up a clean dev env locally, I get numpy 1.23.1 while on CI we get 1.22.4 (maybe there is some caching on CI?).

My comments here are from looking into this a couple of weeks ago, so they may be out of date. Will look again soon.
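
One way to see which versions the solver actually picked, locally and on CI, is a debugging step along these lines (hypothetical, not part of this PR; the same `conda list` command works in a local env):

```yaml
      - name: Show resolved numpy/numba versions
        run: conda list | grep -E "^(numpy|numba) "
```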

Member:
> Now, numba is included in environment.yml, so I'm not sure why, when I set up a clean dev env locally, I get numpy 1.23.1 while on CI we get 1.22.4 (maybe there is some caching on CI?).

I must admit that I don't use the official way to set up a pandas-dev env, but it would be great to ensure that the officially documented pandas-dev env does not cause mypy errors.

Maybe numba has different numpy constraints on conda-forge (or conda installs incompatible versions)? When I ask poetry to install numba = ">=0.53.1" (as in environment.yml) and numpy = ">=1.23.0", it is unable to find a solution (at least not on Linux with Python 3.10).
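
The failing constraint set can be written out directly. A minimal sketch of an environment file reproducing it, assuming conda-forge's numba packaging carries the same numpy upper bounds as the PyPI releases (the incompatible local install described above suggests it may not):

```yaml
# No released numba supported numpy 1.23 at the time, so a solver that
# honors numba's bounds finds no solution for this file.
name: numba-numpy-conflict
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numba>=0.53.1
  - numpy>=1.23.0
```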

Member:
> I must admit that I don't use the official way to set up a pandas-dev env, but it would be great to ensure that the officially documented pandas-dev env does not cause mypy errors.

Yes, I need to double-check that this is still true.
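
A minimal sketch of such a check, assuming mamba is available and calling mypy directly (the pandas-dev env name comes from environment.yml; pandas' own CI invokes mypy through its check scripts rather than like this):

```yaml
      - name: Build env from environment.yml and run mypy
        run: |
          mamba env create -f environment.yml
          conda run -n pandas-dev python -m mypy pandas
```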

- numexpr
- openpyxl
- odfpy
152 changes: 77 additions & 75 deletions environment.yml
@@ -1,21 +1,85 @@
# Local development dependencies including docs building, website upload, ASV benchmark
name: pandas-dev
channels:
- conda-forge
dependencies:
# required
- numpy>=1.19.5
- python=3.8
- python-dateutil>=2.8.1

# test dependencies
- cython=0.29.30
- pytest>=6.0
- pytest-cov
- pytest-xdist>=1.31
- psutil
- pytest-asyncio>=0.17
- boto3

# required dependencies
- python-dateutil
- numpy
- pytz

# optional dependencies
- beautifulsoup4
- blosc
- brotlipy
- bottleneck
- fastparquet
- fsspec
- html5lib
- hypothesis
- gcsfs
- jinja2
- lxml
- matplotlib
- numba>=0.53.1
- numexpr>=2.8.0 # pin for "Run checks on imported code" job
- openpyxl
- odfpy
- pandas-gbq
- psycopg2
- pyarrow
- pymysql
- pyreadstat
- pytables
- python-snappy
- pyxlsb
- s3fs
- scipy
- sqlalchemy
- tabulate
- xarray
- xlrd
- xlsxwriter
- xlwt
- zstandard

# downstream packages
- aiobotocore<2.0.0 # GH#44311 pinned to fix docbuild
- botocore
- cftime
- dask
- ipython
- geopandas-base
- seaborn
- scikit-learn
- statsmodels
- coverage
- pandas-datareader
- pyyaml
- py
- pytorch

# local testing dependencies
- moto
- flask

# benchmarks
- asv

# building
# The compiler packages are meta-packages and install the correct compiler (activation) packages on the respective platforms.
- c-compiler
- cxx-compiler
- cython>=0.29.30

# code checks
- black=22.3.0
@@ -32,10 +96,11 @@ dependencies:
# documentation
- gitpython # obtain contributors from git for whatsnew
- gitdb
- natsort # DataFrame.sort_values doctest
- numpydoc
- pandas-dev-flaker=0.5.0
- pydata-sphinx-theme=0.8.0
-- pytest-cython
+- pytest-cython # doctest
- sphinx
- sphinx-panels
- types-python-dateutil
@@ -47,77 +112,14 @@ dependencies:
- nbconvert>=6.4.5
- nbsphinx
- pandoc

# Dask and its dependencies (that don't install with dask)
- dask-core
- toolz>=0.7.3
- partd>=0.3.10
- cloudpickle>=0.2.1

# web (jinja2 is also needed, but it's also an optional pandas dependency)
- markdown
- feedparser
- pyyaml
- requests

# testing
- boto3
- botocore>=1.11
- hypothesis>=5.5.3
- moto # mock S3
- flask
- pytest>=6.0
- pytest-cov
- pytest-xdist>=1.31
- pytest-asyncio>=0.17
- pytest-instafail

# downstream tests
- seaborn
- statsmodels

# unused (required indirectly may be?)
- ipywidgets
- nbformat
- notebook>=6.0.3

# optional
- blosc
- bottleneck>=1.3.1
- ipykernel
- ipython>=7.11.1
- jinja2 # pandas.Styler
- matplotlib>=3.3.2 # pandas.plotting, Series.plot, DataFrame.plot
- numexpr>=2.7.1
- scipy>=1.4.1
- numba>=0.50.1

# optional for io
# ---------------
# pd.read_html
- beautifulsoup4>=4.8.2
- html5lib
- lxml

# pd.read_excel, DataFrame.to_excel, pd.ExcelWriter, pd.ExcelFile
- openpyxl
- xlrd
- xlsxwriter
- xlwt
- odfpy

- fastparquet>=0.4.0 # pandas.read_parquet, DataFrame.to_parquet
- pyarrow>2.0.1 # pandas.read_parquet, DataFrame.to_parquet, pandas.read_feather, DataFrame.to_feather
- python-snappy # required by pyarrow

- pytables>=3.6.1 # pandas.read_hdf, DataFrame.to_hdf
- s3fs>=0.4.0 # file IO when using 's3://...' path
- aiobotocore<2.0.0 # GH#44311 pinned to fix docbuild
- fsspec>=0.7.4 # for generic remote file operations
- gcsfs>=0.6.0 # file IO when using 'gcs://...' path
- sqlalchemy # pandas.read_sql, DataFrame.to_sql
- xarray # DataFrame.to_xarray
- cftime # Needed for downstream xarray.CFTimeIndex test
- pyreadstat # pandas.read_spss
- tabulate>=0.8.3 # DataFrame.to_markdown
- natsort # DataFrame.sort_values
# web
- jinja2 # in optional dependencies, but documented here as needed
- markdown
- feedparser
- pyyaml
- requests
7 changes: 1 addition & 6 deletions pandas/core/algorithms.py
@@ -1064,12 +1064,7 @@ def checked_add_with_arr(
    elif arr_mask is not None:
        not_nan = np.logical_not(arr_mask)
    elif b_mask is not None:
-        # Argument 1 to "__call__" of "_UFunc_Nin1_Nout1" has incompatible type
-        # "Optional[ndarray[Any, dtype[bool_]]]"; expected
-        # "Union[_SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[An
-        # y]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool,
-        # int, float, complex, str, bytes]]]" [arg-type]
-        not_nan = np.logical_not(b2_mask)  # type: ignore[arg-type]
+        not_nan = np.logical_not(b2_mask)
    else:
        not_nan = np.empty(arr.shape, dtype=bool)
        not_nan.fill(True)
10 changes: 2 additions & 8 deletions pandas/core/array_algos/quantile.py
@@ -143,10 +143,7 @@ def _nanpercentile_1d(
    return np.percentile(
        values,
        qs,
-        # error: No overload variant of "percentile" matches argument types
-        # "ndarray[Any, Any]", "ndarray[Any, dtype[floating[_64Bit]]]",
-        # "int", "Dict[str, str]"
-        **{np_percentile_argname: interpolation},  # type: ignore[call-overload]
+        **{np_percentile_argname: interpolation},
    )


@@ -215,8 +212,5 @@ def _nanpercentile(
        values,
        qs,
        axis=1,
-        # error: No overload variant of "percentile" matches argument types
-        # "ndarray[Any, Any]", "ndarray[Any, dtype[floating[_64Bit]]]",
-        # "int", "Dict[str, str]"
-        **{np_percentile_argname: interpolation},  # type: ignore[call-overload]
+        **{np_percentile_argname: interpolation},
    )
6 changes: 5 additions & 1 deletion pandas/core/arraylike.py
@@ -265,7 +265,11 @@ def array_ufunc(self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any)
        return result

    # Determine if we should defer.
-    no_defer = (np.ndarray.__array_ufunc__, cls.__array_ufunc__)
+    # error: "Type[ndarray[Any, Any]]" has no attribute "__array_ufunc__"
+    no_defer = (
+        np.ndarray.__array_ufunc__,  # type: ignore[attr-defined]
+        cls.__array_ufunc__,
+    )

    for item in inputs:
        higher_priority = (
2 changes: 1 addition & 1 deletion pandas/core/arrays/arrow/array.py
@@ -496,7 +496,7 @@ def _indexing_key_to_indices(
        if isinstance(key, slice):
            indices = np.arange(n)[key]
        elif is_integer(key):
-            indices = np.arange(n)[[key]]  # type: ignore[index]
+            indices = np.arange(n)[[key]]
        elif is_bool_dtype(key):
            key = np.asarray(key)
            if len(key) != n:
5 changes: 4 additions & 1 deletion pandas/core/arrays/datetimes.py
@@ -487,7 +487,10 @@ def _generate_range(
                np.linspace(0, end.value - start.value, periods, dtype="int64")
                + start.value
            )
-            if i8values.dtype != "i8":
+            # error: Non-overlapping equality check
+            # (left operand type: "dtype[signedinteger[Any]]",
+            # right operand type: "Literal['i8']")
+            if i8values.dtype != "i8":  # type: ignore[comparison-overlap]
                # 2022-01-09 I (brock) am not sure if it is possible for this
                # to overflow and cast to e.g. f8, but if it does we need to cast
                i8values = i8values.astype("i8")
24 changes: 16 additions & 8 deletions pandas/core/arrays/interval.py
@@ -687,7 +687,21 @@ def __getitem__(
            if is_scalar(left) and isna(left):
                return self._fill_value
            return Interval(left, right, inclusive=self.inclusive)
-        if np.ndim(left) > 1:
+        # error: Argument 1 to "ndim" has incompatible type
+        # "Union[ndarray[Any, Any], ExtensionArray]"; expected
+        # "Union[Sequence[Sequence[Sequence[Sequence[Sequence[Any]]]]],
+        # Union[Union[_SupportsArray[dtype[Any]],
+        # Sequence[_SupportsArray[dtype[Any]]],
+        # Sequence[Sequence[_SupportsArray[dtype[Any]]]],
+        # Sequence[Sequence[Sequence[_SupportsArray[dtype[Any]]]]],
+        # Sequence[Sequence[Sequence[Sequence[_SupportsArray[dtype[Any]]]]]]],
+        # Union[bool, int, float, complex, str, bytes,
+        # Sequence[Union[bool, int, float, complex, str, bytes]],
+        # Sequence[Sequence[Union[bool, int, float, complex, str, bytes]]],
+        # Sequence[Sequence[Sequence[Union[bool, int, float, complex, str, bytes]]]],
+        # Sequence[Sequence[Sequence[Sequence[Union[bool, int, float,
+        # complex, str, bytes]]]]]]]]"
+        if np.ndim(left) > 1:  # type: ignore[arg-type]
            # GH#30588 multi-dimensional indexer disallowed
            raise ValueError("multi-dimensional indexing not allowed")
        return self._shallow_copy(left, right)
@@ -1665,13 +1679,7 @@ def isin(self, values) -> np.ndarray:
            # complex128 ndarray is much more performant.
            left = self._combined.view("complex128")
            right = values._combined.view("complex128")
-            # Argument 1 to "in1d" has incompatible type "Union[ExtensionArray,
-            # ndarray[Any, Any], ndarray[Any, dtype[Any]]]"; expected
-            # "Union[_SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[
-            # dtype[Any]]], bool, int, float, complex, str, bytes,
-            # _NestedSequence[Union[bool, int, float, complex, str, bytes]]]"
-            # [arg-type]
-            return np.in1d(left, right)  # type: ignore[arg-type]
+            return np.in1d(left, right)

        elif needs_i8_conversion(self.left.dtype) ^ needs_i8_conversion(
            values.left.dtype
20 changes: 9 additions & 11 deletions pandas/core/arrays/masked.py
@@ -110,7 +110,13 @@ def __init__(
        self, values: np.ndarray, mask: npt.NDArray[np.bool_], copy: bool = False
    ) -> None:
        # values is supposed to already be validated in the subclass
-        if not (isinstance(mask, np.ndarray) and mask.dtype == np.bool_):
+        if not (
+            isinstance(mask, np.ndarray)
+            and
+            # error: Non-overlapping equality check
+            # (left operand type: "dtype[bool_]", right operand type: "Type[bool_]")
+            mask.dtype == np.bool_  # type: ignore[comparison-overlap]
+        ):
            raise TypeError(
                "mask should be boolean numpy array. Use "
                "the 'pd.array' function instead"
@@ -1151,11 +1157,7 @@ def any(self, *, skipna: bool = True, **kwargs):
        nv.validate_any((), kwargs)

        values = self._data.copy()
-        # Argument 3 to "putmask" has incompatible type "object"; expected
-        # "Union[_SupportsArray[dtype[Any]], _NestedSequence[
-        # _SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _Nested
-        # Sequence[Union[bool, int, float, complex, str, bytes]]]" [arg-type]
-        np.putmask(values, self._mask, self._falsey_value)  # type: ignore[arg-type]
+        np.putmask(values, self._mask, self._falsey_value)
        result = values.any()
        if skipna:
            return result
@@ -1231,11 +1233,7 @@ def all(self, *, skipna: bool = True, **kwargs):
        nv.validate_all((), kwargs)

        values = self._data.copy()
-        # Argument 3 to "putmask" has incompatible type "object"; expected
-        # "Union[_SupportsArray[dtype[Any]], _NestedSequence[
-        # _SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _Neste
-        # dSequence[Union[bool, int, float, complex, str, bytes]]]" [arg-type]
-        np.putmask(values, self._mask, self._truthy_value)  # type: ignore[arg-type]
+        np.putmask(values, self._mask, self._truthy_value)
        result = values.all()

        if skipna: