99## Abstract
1010
1111The suggestion is that setitem-like operations would
12- not change a `` Series `` dtype (nor that of a `` DataFrame `` 's column).
12+ not change a `` Series `` ' dtype (nor that of a `` DataFrame `` 's column).
1313
1414Current behaviour:
1515``` python
@@ -51,65 +51,71 @@ In[11]: ser[2] = "2000-01-04x" # typo - but pandas does not error, it upcasts t
5151```
5252
5353The scope of this PDEP is limited to setitem-like operations on Series (and DataFrame columns).
54- For example, starting with
54+ For example, starting with:
5555``` python
5656df = DataFrame({"a": [1, 2, np.nan], "b": [4, 5, 6]})
5757ser = df["a"].copy()
5858```
5959then the following would all raise:
6060
61- - setitem-like operations:
62- - `` ser.fillna('foo', inplace=True) `` ;
63- - `` ser.where(ser.isna(), 'foo', inplace=True) ``
64- - `` ser.fillna('foo', inplace=False) `` ;
65- - `` ser.where(ser.isna(), 'foo', inplace=False) ``
66- - setitem indexing operations (where `` indexer `` could be a slice, a mask,
61+ * setitem-like operations:
62+
63+ - `` ser.fillna('foo', inplace=True) ``
64+ - `` ser.where(ser.isna(), 'foo', inplace=True) ``
65+ - `` ser.fillna('foo', inplace=False) ``
66+ - `` ser.where(ser.isna(), 'foo', inplace=False) ``
67+
68+ * setitem indexing operations (where `` indexer `` could be a slice, a mask,
6769 a single value, a list or array of values, or any other allowed indexer):
68- - `` ser.iloc[indexer] = 'foo' ``
69- - `` ser.loc[indexer] = 'foo' ``
70- - `` df.iloc[indexer, 0] = 'foo' ``
71- - `` df.loc[indexer, 'a'] = 'foo' ``
72- - `` ser[indexer] = 'foo' ``
70+
71+ - `` ser.iloc[indexer] = 'foo' ``
72+ - `` ser.loc[indexer] = 'foo' ``
73+ - `` df.iloc[indexer, 0] = 'foo' ``
74+ - `` df.loc[indexer, 'a'] = 'foo' ``
75+ - `` ser[indexer] = 'foo' ``
7376
7477It may be desirable to expand the top list to `` Series.replace `` and `` Series.update `` ,
7578but to keep the scope of the PDEP down, they are excluded for now.
7679
7780Examples of operations which would not raise are:
78- - `` ser.diff() `` ;
79- - `` pd.concat([ser, ser.astype(object)]) `` ;
80- - `` ser.mean() `` ;
81- - `` ser[0] = 3 `` ; # same dtype
82- - `` ser[0] = 3. `` ; # 3.0 is a 'round' float and so compatible with 'int64' dtype
83- - `` df['a'] = pd.date_range(datetime(2020, 1, 1), periods=3) `` ;
84- - `` df.index.intersection(ser.index) `` .
81+
82+ - `` ser.diff() ``
83+ - `` pd.concat([ser, ser.astype(object)]) ``
84+ - `` ser.mean() ``
85+ - `` ser[0] = 3 `` # same dtype
86+ - `` ser[0] = 3. `` # 3.0 is a 'round' float and so compatible with 'int64' dtype
87+ - `` df['a'] = pd.date_range(datetime(2020, 1, 1), periods=3) ``
88+ - `` df.index.intersection(ser.index) ``
8589
8690## Detailed description
8791
8892Concretely, the suggestion is:
89- - if a `` Series `` is of a given dtype, then a `` setitem `` -like operation should not change its dtype;
90- - if a `` setitem `` -like operation would previously have changed a `` Series `` ' dtype, it would now raise.
93+
94+ - If a `` Series `` is of a given dtype, then a `` setitem `` -like operation should not change its dtype.
95+ - If a `` setitem `` -like operation would previously have changed a `` Series `` ' dtype, it would now raise.
9196
9297For a start, this would involve:
9398
94- 1 . changing `` Block.setitem `` such that it does not have an `` except `` block in
99+ 1 . changing `` Block.setitem `` such that it does not have an `` except `` block in:
100+
101+ <!-- language: python -->
95102
96- ``` python
97-     value = extract_array(value, extract_numpy=True)
98-     try:
99-         casted = np_can_hold_element(values.dtype, value)
100-     except LossySetitemError:
101-         # current dtype cannot store value, coerce to common dtype
102-         nb = self.coerce_to_target_dtype(value)
103-         return nb.setitem(indexer, value)
104-     else:
105- ```
103+     value = extract_array(value, extract_numpy=True)
104+     try:
105+         casted = np_can_hold_element(values.dtype, value)
106+     except LossySetitemError:
107+         # current dtype cannot store value, coerce to common dtype
108+         nb = self.coerce_to_target_dtype(value)
109+         return nb.setitem(indexer, value)
110+     else:
106111
1071122 . making a similar change in:
108- - `` Block.where `` ;
109- - `` Block.putmask `` ;
110- - `` EABackedBlock.setitem `` ;
111- - `` EABackedBlock.where `` ;
112- - `` EABackedBlock.putmask `` ;
113+
114+ - `` Block.where ``
115+ - `` Block.putmask ``
116+ - `` EABackedBlock.setitem ``
117+ - `` EABackedBlock.where ``
118+ - `` EABackedBlock.putmask ``
113119
114120The above would already require several hundreds of tests to be adjusted. Note that once
115121implementation starts, the list of locations to change may turn out to be slightly
@@ -147,11 +153,13 @@ numeric (without much regard for ``int`` vs ``float``) - ``'int64'`` is just wha
147153when constructing it.
148154
149155Possible options could be:
150- 1 . only accept round floats (e.g. `` 1.0 `` ) and raise on anything else (e.g. `` 1.01 `` );
151- 2 . convert the float value to `` int `` before setting it (i.e. silently round all float values);
152- 3 . limit "banning upcasting" to when the upcasted dtype is `` object `` (i.e. preserve current behavior of upcasting the int64 Series to float64) .
156+
157+ 1 . Only accept round floats (e.g. `` 1.0 `` ) and raise on anything else (e.g. `` 1.01 `` ).
158+ 2 . Convert the float value to `` int `` before setting it (i.e. silently round all float values).
159+ 3 . Limit "banning upcasting" to when the upcasted dtype is `` object `` (i.e. preserve current behavior of upcasting the int64 Series to float64).
153160
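As a rough illustration of option 1, the check could be sketched in plain Python as below. The helper name `to_int_if_lossless` is hypothetical and not part of pandas; it only shows the rule "accept round floats, raise on anything else" for a single scalar.

```python
from __future__ import annotations


def to_int_if_lossless(value: int | float) -> int:
    """Hypothetical helper (not a pandas API) illustrating option 1."""
    if isinstance(value, int):
        # Already an integer: nothing to convert.
        return value
    if isinstance(value, float) and value.is_integer():
        # A 'round' float such as 1.0 converts to int without losing information.
        return int(value)
    # Anything else (e.g. 1.01, nan, inf) would require a dtype change, so raise.
    raise TypeError(f"cannot set {value!r} without changing an integer dtype")


print(to_int_if_lossless(1.0))  # 1
try:
    to_int_if_lossless(1.01)
except TypeError as err:
    print(err)
```

Under option 2 the final branch would instead return `int(value)`, silently truncating `1.01` to `1`.
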
154161Let us compare with what other libraries do:
162+
155163- `` numpy `` : option 2
156164- `` cudf `` : option 2
157165- `` polars `` : option 2
@@ -165,12 +173,13 @@ if the objective of this PDEP is to prevent bugs, then this is also not desirabl
165173someone might set `` 1.5 `` and later be surprised to learn that they actually set `` 1 `` .
166174
167175There are several downsides to option `` 3 `` :
168- - it would be inconsistent with the nullable dtypes' behaviour;
169- - it would also add complexity to the codebase and to tests;
170- - it would be hard to teach, as instead of being able to teach a simple rule,
171- there would be a rule with exceptions;
172- - there would be a risk of loss of precision and or overflow;
173- - it opens the door to other exceptions, such as not upcasting `` 'int8' `` to `` 'int16' `` .
176+
177+ - It would be inconsistent with the nullable dtypes' behaviour.
178+ - It would also add complexity to the codebase and to tests.
179+ - It would be hard to teach, as instead of being able to teach a simple rule,
180+   there would be a rule with exceptions.
181+ - There would be a risk of loss of precision and/or overflow.
182+ - It opens the door to other exceptions, such as not upcasting `` 'int8' `` to `` 'int16' `` .
174183
175184Option `` 1 `` is the maximally safe one in terms of protecting users from bugs, being
176185consistent with the current behaviour of nullable dtypes, and in being simple to teach.
@@ -208,22 +217,25 @@ at all. To keep this proposal focused, it is intentionally excluded from the sco
208217** A** : The current behavior would be to upcast to `` int32 `` . So under this PDEP,
209218 it would instead raise.
210219
211- ** Q: What happens in setting `` 16.000000000000001 `` in an `int8`` Series?**
220+ ** Q: What happens in setting `` 16.000000000000001 `` in an `` int8 `` Series?**
212221
213222** A** : As far as Python is concerned, `` 16.000000000000001 `` and `` 16.0 `` are the
214223 same number. So, it would be inserted as `` 16 `` and the dtype would not change
215224 (just like what happens now, there would be no change here).
216225
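The claim in the answer above can be checked directly in plain Python, without pandas: the literal is closer to `16.0` than to any other representable double, so it is parsed as exactly `16.0`.

```python
value = 16.000000000000001

# The literal is rounded to the nearest double, which is exactly 16.0.
print(value == 16.0)       # True
print(value.is_integer())  # True
print(int(value))          # 16
```
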
217- ** Q: What if I want `` 1.0000000001 `` to be inserted as `` 1.0 `` in an ` 'int8' ` Series?**
226+ ** Q: What if I want `` 1.0000000001 `` to be inserted as `` 1.0 `` in an `` int8 `` Series?**
227+
228+ ** A** : You may want to define your own helper function, such as:
229+
230+ ``` python
231+ def maybe_convert_to_int(x: int | float, tolerance: float):
232+     if np.abs(x - round(x)) < tolerance:
233+         return round(x)
234+     return x
235+ ```
236+
237+ which you could adapt according to your needs.
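
A possible usage sketch follows. It restates the helper with the built-in `abs` in place of `np.abs` so that it runs without NumPy (an assumption made only for this illustration), and the tolerance `1e-6` is an arbitrary example value.

```python
from __future__ import annotations


def maybe_convert_to_int(x: int | float, tolerance: float):
    # Same logic as the helper above, using built-in abs instead of np.abs.
    if abs(x - round(x)) < tolerance:
        return round(x)
    return x


# Within tolerance of an integer: converted, so setting it keeps an int dtype.
print(maybe_convert_to_int(1.0000000001, tolerance=1e-6))  # 1

# Not close to an integer: returned unchanged, so setting it would raise.
print(maybe_convert_to_int(1.5, tolerance=1e-6))  # 1.5
```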
218238
219- ** A** : You may want to define your own helper function, such as
220- ``` python
221- >>> def maybe_convert_to_int(x: int | float, tolerance: float):
222-         if np.abs(x - round(x)) < tolerance:
223-             return round(x)
224-         return x
225- ```
226- which you could adapt according to your needs.
227239
228240## Timeline
229241