REGR?: read_sql no longer supports duplicate column names

Probably caused by https://github.com/pandas-dev/pandas/pull/50048, and probably related to https://github.com/pandas-dev/pandas/issues/52437 (another change in behaviour that might have been caused by the same PR). In essence the change is from using `DataFrame.from_records` to processing the records into a list of column arrays and then using `DataFrame(dict(zip(columns, arrays)))`. That has slightly different behaviour.

Dummy reproducer for the `read_sql` change:

```python
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.1, 0.2, 0.3]})

from sqlalchemy import create_engine
eng = create_engine("sqlite://")
df.to_sql("test_table", eng, index=False)

pd.read_sql("SELECT a, b, a +1 as a FROM test_table;", eng)
```

With pandas 1.5 this returns

```
   a    b  a
0  1  0.1  2
1  2  0.2  3
2  3  0.3  4
```

with pandas 2.0 this returns

```
   a    b
0  2  0.1
1  3  0.2
2  4  0.3
```

I don't know how _much_ we want to support duplicate column names in read_sql, but it _is_ a change in behaviour, and the new behaviour of just silently ignoring it / dropping some data also isn't ideal IMO.

cc @phofl 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

REGR?: read_sql no longer supports duplicate column names #53117

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

REGR?: read_sql no longer supports duplicate column names #53117

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions