Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't handle duplicate column names in sql read #3487

Closed
darindillon opened this issue Apr 29, 2013 · 9 comments
Closed

Can't handle duplicate column names in sql read #3487

darindillon opened this issue Apr 29, 2013 · 9 comments
Labels
Milestone

Comments

@darindillon
Copy link

Possibly related to #3468

Using pandas 0.10.1.
It's legal for sql (at least Sql Server) to have queries that return two columns with the same name. And it's legal for pandas to have dataframes that have two columns with the same name. But pandas gives a very misleading error message when you try it with read_frame:

Query that has 2 columns with the same name:

sql = "select 1 as whatever, 2 as name, 3 as name where 1=0"
pos = pandas.io.sql.read_frame(sql, dbconn)

That gives very misleading error message

@jreback
Copy link
Contributor

jreback commented May 2, 2013

@tavistmorph we just pushed updates for dup_columns. Can you try this wil latest master (I am not sure this is the issue, but let's see)

@ghost
Copy link

ghost commented May 10, 2013

Can't repro with 0.10.1 or 0.11.0 or master

In [1]: import sqlite3
   ...: 
   ...: import pandas.io.sql as psql
   ...: con = sqlite3.connect("a2.db") 
   ...: c = con.cursor()
   ...: try:
   ...:     c.execute('''CREATE TABLE tab (a,b,c);''')
   ...: except Exception as e:
   ...:     pass
   ...: try:
   ...:     c.execute("INSERT INTO tab VALUES (1,2,3);")
   ...:     c.execute("INSERT INTO tab VALUES (4,5,6);")
   ...:     c.commit()
   ...: except Exception as e:
   ...:     pass
   ...: #    print e
   ...: sql = "select a,b as a from tab where a=1"
   ...: #c.execute(sql).fetchall()
   ...: pandas.io.sql.read_frame(sql, con)
Out[1]: 
   a  a
0  1  2

In [3]: print pd.__version__
0.10.1

@darindillon
Copy link
Author

Your code above works fine for me too. But I was running a different case than you. Try it this way and it gives an error for me on 0.11. (BTW -- not sure if it matters, but my database is sql server)

import pandas
import pyodbc
import pandas.io.sql
conn = pyodbc.connect("your dsn here")
sql = "select 1 as whatever, 2 as name, 3 as name where 1 = 0"
pos = pandas.io.sql.read_frame(sql, conn)

Result:
Traceback (most recent call last):
File "D:\workspaces\PythonOneOffs\client.py", line 7, in
pos = pandas.io.sql.read_frame(sql, conn)
File "C:\Python27\lib\site-packages\pandas\io\sql.py", line 162, in read_frame
coerce_float=coerce_float)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1115, in from_records
columns)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5338, in _arrays_to_mgr
return create_block_manager_from_arrays(arrays, arr_names, axes)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 1821, in create_block_manager_from_arrays
mgr = BlockManager(blocks, axes)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 896, in init
self._verify_integrity()
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 990, in _verify_integrity
raise AssertionError('Number of manager items must equal union of '
AssertionError: Number of manager items must equal union of block items

@ghost
Copy link

ghost commented May 10, 2013

I can reproduce that error, but only when the select query returns 0 rows.

import sqlite3

import pandas.io.sql as psql
con = sqlite3.connect("a9.db") 
c = con.cursor()
try:
    c.execute('''CREATE TABLE tab (a,b,c)''')
except:
    pass
try:
    c.execute("INSERT INTO tab VALUES (1,2,3)")
    c.execute("INSERT INTO tab VALUES (4,5,6)")
    c.commit()
except:
    pass

sql = 'select a as a2  , b , c as b from tab where a="meow"'
pandas.io.sql.read_frame(sql, con)

/home/user1/src/pandas/pandas/core/internals.py in _verify_integrity(self)
   1126 
   1127         if len(self.items) != tot_items:
-> 1128             raise AssertionError('Number of manager items must equal union of '
   1129                                  'block items')
   1130 

AssertionError: Number of manager items must equal union of block items

@ghost
Copy link

ghost commented May 10, 2013

Works with pyodbc as well, unless the select query returns no rows.

In [45]: import sqlite3
    ...: import pyodbc
    ...: 
    ...: import pandas.io.sql as psql
    ...: con = sqlite3.connect("a.db") 
    ...: c = con.cursor()
    ...: try:
    ...:     c.execute('''CREATE TABLE tab (a,b,c)''')
    ...: except:
    ...:     pass
    ...: try:
    ...:     c.execute("INSERT INTO tab VALUES (1,2,3)")
    ...:     c.execute("INSERT INTO tab VALUES (4,5,6)")
    ...:     c.commit()
    ...: except:
    ...:     pass
    ...: con.close()
    ...: 
    ...: con = pyodbc.connect('DRIVER={SQLite3};DATABASE=a.db;')
    ...: sql = 'select a as a2  , b , c as b from tab'
    ...: pandas.io.sql.read_frame(sql, con)
Out[45]: 
   a2  b  b
0   1  2  3
1   4  5  6

@jreback
Copy link
Contributor

jreback commented May 10, 2013

@y-p what is actually passed in the 0-case (to arrays_to_mgr)?

@ghost
Copy link

ghost commented May 10, 2013

Interesting.
This works:

DataFrame.from_records([], columns=['a','b','c'])

Empty DataFrame
Columns: [a, b, c]
Index: []

But this fails with the obscure message:

DataFrame.from_records([], columns=['a','b','b'])

AssertionError: Number of manager items must equal union of block items

The main ctor doesn't let you construct an empty df with non-zero dims at all:

DataFrame([], columns=['a','b','c'])

ValueError: Shape of passed values is (0, 0), indices imply (3, 0)

but allows it if there are no dims (I remember opening that issue, actually):

DataFrame([])

Empty DataFrame
Columns: []
Index: []

In any case, the original issue probably interpreted the behaviour on an empty queryset
with a bug in the duplicate column handling.

@ghost
Copy link

ghost commented May 10, 2013

I think we can close this in deference to #3562.

@ghost
Copy link

ghost commented May 10, 2013

ok, #3564 closed #3562 and this as well.

@ghost ghost closed this as completed May 10, 2013
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants