
Columns are typed "REAL" if they are integers with some NaN/blanks #5

Closed
simonw opened this issue Nov 17, 2017 · 4 comments


@simonw
Owner

simonw commented Nov 17, 2017

This is bad: if a column contains only integers plus some blanks, it should be typed INTEGER.

Example: this CSV https://github.com/openelections/openelections-data-ca/blob/master/2016/20161108__ca__general__yolo__precinct.csv produces this SQL table:

CREATE TABLE "2016/20161108__ca__general__yolo__precinct" (
"county" TEXT,
  "precinct" INTEGER,
  "office" INTEGER,
  "district" REAL,
  "party" REAL,
  "candidate" INTEGER,
  "votes" INTEGER
,
FOREIGN KEY (county) REFERENCES county(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id))
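SQLite's INTEGER columns accept NULL, so blank cells are no reason to fall back to REAL. (The REAL typing most likely comes from pandas, which cannot hold NaN in its default int64 dtype and so reads such columns as float64; `to_sql` then maps float64 to REAL.) A minimal standard-library check, using a hypothetical table name, that an INTEGER column stores integers and NULLs side by side:

```python
import sqlite3

# In-memory database; "precinct_results" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE precinct_results ("district" INTEGER)')

# A blank CSV cell maps naturally to None, which SQLite stores as NULL.
conn.executemany(
    'INSERT INTO precinct_results ("district") VALUES (?)',
    [(1,), (None,), (3,)],
)

# typeof() reports the storage class SQLite actually used for each value.
storage_classes = [
    row[0]
    for row in conn.execute(
        'SELECT typeof("district") FROM precinct_results ORDER BY rowid'
    )
]
print(storage_classes)  # ['integer', 'null', 'integer']
```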
@simonw
Owner Author

simonw commented Nov 17, 2017

@rgieseke

I ran into the same problem. In my reader, https://github.com/rgieseke/pandas-datapackage-reader, I convert columns with missing integers to dtype object, but that doesn't help when writing to SQL.

The Tableschema library (https://github.com/frictionlessdata/tableschema-py) might be helpful here for inferring data types:

test.csv

a,b
1,0.1
2,0.2
,0.3
4,
from tableschema import infer

infer("test.csv")
{
  'fields': [
    {'format': 'default', 'name': 'a', 'type': 'integer'},
    {'format': 'default', 'name': 'b', 'type': 'number'}
  ],
  'missingValues': ['']
}
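For comparison, here is a dependency-free sketch of the same idea: infer each column's type from the non-blank values only, treating `''` as missing (mirroring the `'missingValues': ['']` above). `infer_sqlite_type` is a hypothetical helper, not part of tableschema or of this project, and it only distinguishes INTEGER, REAL and TEXT:

```python
import csv
import io


def infer_sqlite_type(values, missing=("",)):
    """Hypothetical helper: choose an SQLite type, ignoring missing values."""
    present = [v for v in values if v not in missing]
    if not present:
        return "TEXT"  # nothing to go on; TEXT is the safe default

    def all_parse(cast):
        try:
            for v in present:
                cast(v)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "INTEGER"
    if all_parse(float):
        return "REAL"
    return "TEXT"


# Same contents as test.csv above.
TEST_CSV = "a,b\n1,0.1\n2,0.2\n,0.3\n4,\n"

reader = csv.DictReader(io.StringIO(TEST_CSV))
rows = list(reader)
column_types = {
    name: infer_sqlite_type([row[name] for row in rows])
    for name in reader.fieldnames
}
print(column_types)  # {'a': 'INTEGER', 'b': 'REAL'}

columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in column_types.items())
print(f'CREATE TABLE "test" ({columns})')
# CREATE TABLE "test" ("a" INTEGER, "b" REAL)
```

Because blanks are filtered out before the int/float checks, column `a` stays INTEGER even though one cell is empty; the blanks can then be inserted as NULL.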

@simonw
Owner Author

simonw commented Nov 19, 2017

Fixed in a8ab524 and 0997b7b

@simonw simonw closed this as completed Nov 19, 2017
@simonw
Owner Author

simonw commented Nov 19, 2017

@rgieseke your pandas-datapackage-reader tool looks fascinating, I'll definitely have a play with that.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants