
read_stata ignores index parameter #16342

Closed
TomAugspurger opened this issue May 12, 2017 · 15 comments · Fixed by #17328
Labels
IO Stata read_stata, to_stata

Comments

@TomAugspurger
Contributor

TomAugspurger commented May 12, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
import pandas.util.testing as tm

In [19]: df = tm.makeDataFrame()

In [20]: df.to_stata('foo.stata')

In [21]: pd.read_stata('foo.stata', index='index')
Out[21]:
         index         A         B         C         D
0   2wIkgwAbVm  0.310317 -0.753652 -0.509405 -0.309492
1   59guDFHo0l  0.404983 -0.098995 -0.330786 -0.548584
2   5sX3JbPQjk  0.520377  1.379943  2.375164  1.475831
3   UsF3izOnQy -1.413272  0.668192  0.859581 -0.646165
...

It should behave like pd.read_stata('foo.stata').set_index('index').
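For reference, here is a minimal sketch of the expected behaviour. It assumes a pandas release in which the fix has landed and the keyword is spelled index_col (the rename discussed further down). The small frame and the file name are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame with an unnamed string index; to_stata writes an
# unnamed index out as a column called "index" (write_index=True is the default).
df = pd.DataFrame({"A": [0.5, -1.25, 2.0]}, index=["a", "b", "c"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.dta")
    df.to_stata(path)

    # What the issue asks for: the named column becomes the row index...
    result = pd.read_stata(path, index_col="index")
    # ...exactly as if set_index had been called after a plain read.
    expected = pd.read_stata(path).set_index("index")

print(result)
```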

Side note: I think elsewhere (e.g. read_csv) we use index_col instead of index.
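For comparison, read_csv already spells this option index_col. The sketch below reads the same small CSV text (made up for illustration) with and without the keyword.

```python
import io

import pandas as pd

# Inline CSV text standing in for a file on disk.
csv_text = "index,A\na,0.5\nb,-1.25\n"

# With index_col, the "index" column becomes the row labels.
with_index = pd.read_csv(io.StringIO(csv_text), index_col="index")
# Without it, "index" is just an ordinary column and a RangeIndex is used.
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.index.tolist())
print(without_index.columns.tolist())
```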

Output of pd.show_versions()

master

@TomAugspurger TomAugspurger added the IO Stata read_stata, to_stata label May 12, 2017
@TomAugspurger TomAugspurger added this to the 0.20.2 milestone May 12, 2017
@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.20.2 May 12, 2017
@VincentLa
Contributor

What is tm.makeDataFrame()?

@TomAugspurger
Contributor Author

@VincentLa Sorry, it's import pandas.util.testing as tm. I'll fix it.

@stanleyguan

I'll take a stab at this issue.

@TomAugspurger
Contributor Author

FYI, I think @mogillies is working on this. Is that right?

@VincentLa
Contributor

Awesome. Is anyone at the PyCon sprints working on this issue right now?

@stanleyguan

Looking at the Gitter discussion, I think @mogillies is working on it. I'll look for something else :)

@VincentLa
Contributor

@mogillies let me know if you're at PyCon right now and would be up for pairing!

@jorisvandenbossche
Member

@TomAugspurger if the index param is actually not working right now, shouldn't we change it to index_col at the same time?

@TomAugspurger
Contributor Author

@jorisvandenbossche oh yeah, I guess we won't be breaking backwards compat :D

@mogillies

Should we change it to Index_col?

@mogillies

Sure. I'm in the same room as you :)

@TomAugspurger
Contributor Author

TomAugspurger commented May 22, 2017

@mogillies Yeah, if you would, rename it to index_col (lowercase i).

@mogillies

Sure.

@mogillies

Just to clarify: when you say the index param is not working, which index do you mean, the first one or the second one?

  1. index = index_cols
  2. index_cols = index_cols

if index is not None:
    data.set_index(index_cols)

Or (I don't think this is the case) do you mean using the function index_cols() in class Table?

if index is not None:
    data.index_cols()
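A general pandas note that may help when reading snippets like the set_index one above: DataFrame.set_index returns a new DataFrame and leaves the original untouched unless inplace=True is passed, so the result has to be assigned back. This is not a claim about what caused this particular bug; the frame below is made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({"index": ["a", "b"], "A": [1.0, 2.0]})

# The return value is discarded, so `data` still has its RangeIndex
# and "index" is still an ordinary column.
data.set_index("index")
unchanged_columns = data.columns.tolist()

# Assigning the result is what actually applies the new index.
data = data.set_index("index")
print(data.index.tolist())
```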

@jorisvandenbossche
Member

Currently the keyword argument in read_stata that specifies which column to set as the index is called index, and there are two issues:

  • the fact that this parameter is not working
  • the fact that it is a bit inconsistent with other read functions where we typically use index_col

You are trying to solve the first one. What we suggest is to also fix the second one by renaming the index keyword argument to index_col (keeping index as an alias for now). Because the old keyword never worked, this change is easier to make: nobody could have relied on its behaviour.
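One common way to rename a keyword while keeping the old name as an alias is to accept both and emit a FutureWarning when the old one is used. The sketch below shows that pattern on a hypothetical reader, read_rows, that works on a list of dicts; the function, its error message, and its behaviour when both keywords are passed are assumptions for illustration, not the actual pandas implementation.

```python
import warnings


def read_rows(rows, index_col=None, index=None):
    """Hypothetical reader returning (index_labels, remaining_rows).

    `index` is accepted only as a deprecated alias for `index_col`.
    """
    if index is not None:
        if index_col is not None:
            # Assumed policy: refuse ambiguous calls that pass both names.
            raise ValueError("pass only one of 'index_col' and 'index'")
        warnings.warn(
            "the 'index' keyword is deprecated, use 'index_col' instead",
            FutureWarning,
            stacklevel=2,
        )
        index_col = index

    if index_col is None:
        # No index requested: fall back to positional labels 0..n-1.
        return list(range(len(rows))), [dict(r) for r in rows]

    # Pull the requested column out of every row and use it as the labels.
    labels = [r[index_col] for r in rows]
    remaining = [{k: v for k, v in r.items() if k != index_col} for r in rows]
    return labels, remaining


rows = [{"index": "a", "A": 0.5}, {"index": "b", "A": -1.25}]
labels, remaining = read_rows(rows, index_col="index")
print(labels, remaining)
```

Calling read_rows(rows, index="index") returns the same result but also emits a FutureWarning, which gives users a release cycle to switch before the alias is removed.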

bashtage added a commit to bashtage/pandas that referenced this issue Aug 24, 2017
Ensures index is set when requested when reading Stata dta file

closes pandas-dev#16342
bashtage added a commit to bashtage/pandas that referenced this issue Aug 28, 2017
Ensures index is set when requested during reading of a Stata dta file
Rename index to index_col for API consistency

closes pandas-dev#16342
@jreback jreback modified the milestones: 0.21.0, Next Major Release Aug 29, 2017
bashtage added a commit to bashtage/pandas that referenced this issue Aug 29, 2017
Ensures index is set when requested during reading of a Stata dta file
Deprecates and renames index to index_col for API consistency

closes pandas-dev#16342
TomAugspurger pushed a commit that referenced this issue Sep 16, 2017
Ensures index is set when requested during reading of a Stata dta file
Deprecates and renames index to index_col for API consistency

closes #16342
alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017
Ensures index is set when requested during reading of a Stata dta file
Deprecates and renames index to index_col for API consistency

closes pandas-dev#16342
No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
Ensures index is set when requested during reading of a Stata dta file
Deprecates and renames index to index_col for API consistency

closes pandas-dev#16342

6 participants