Dataframe extension: order of columns not preserved #69

Closed
ha62791 opened this issue Feb 2, 2017 · 1 comment

ha62791 commented Feb 2, 2017

My dataframe prints as follows:

df
Out[149]: 
       user gender        age  \
0     Peter      F  23.000000   
1   M.A.R.Y      F  27.333333   
2  The King      M  28.000000   
3   Tim Tom      M  28.000000   
4      Mary      F  29.000000   

                                          self_intro  
0  Hello, my name is Peter. I am graduated from t...  
1  I am Mary. I am from Maryland. I was born from...  
2                                I am King. The end.  
3  Hi, I am Tim Tom. I love eating snakes. I am v...  
4                                                ...  

I use the code below to test:

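import pandas as pd
from hdfs.ext.dataframe import read_dataframe, write_dataframe

# Round-trip the dataframe through Avro on HDFS, then compare with the original.
write_dataframe(hdfs_client, df_avro_filepath, df, overwrite=True)
_df = read_dataframe(hdfs_client, df_avro_filepath)
print(_df)
pd.util.testing.assert_frame_equal(df, _df)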

The output has its columns sorted alphabetically:

         age gender                                         self_intro  \
0  29.000000      F                                                ...   
1  27.333334      F  I am Mary. I am from Maryland. I was born from...   
2  23.000000      F  Hello, my name is Peter. I am graduated from t...   
3  28.000000      M  Hi, I am Tim Tom. I love eating snakes. I am v...   
4  28.000000      M                                I am King. The end.   

       user  
0      Mary  
1   M.A.R.Y  
2     Peter  
3   Tim Tom  
4  The King 
...
AssertionError: DataFrame.columns are different

DataFrame.columns values are different (75.0 %)
[left]:  Index(['user', 'gender', 'age', 'self_intro'], dtype='object')
[right]: Index(['age', 'gender', 'self_intro', 'user'], dtype='object')

mtth (Owner) commented Feb 4, 2017

Just released a new version, 2.0.16, that should address this issue: any dataframes you upload then download should retain their original column order.

mtth closed this as completed Feb 4, 2017