Reindex broken #23104

@boldt

MWE

from __future__ import print_function

import pandas as pd
import numpy as np

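print("Pandas version:", pd.__version__)
print("+++++++++++++++++++++++++++++++++++")
# show_versions() prints its report itself and returns None,
# so it is called directly rather than wrapped in print().
pd.show_versions()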
print("+++++++++++++++++++++++++++++++++++")

####################################################
# Config
####################################################

pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

####################################################
# Read data
####################################################

# Prefer a local copy of the dataset if present; otherwise download it.
file = "/tmp/california_housing_train.csv"
if np.DataSource().exists(file):
    dataset = file
else:
    dataset = "https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv"

sep = ","
california_housing_dataframe = pd.read_csv(dataset, sep=sep)

####################################################
# Reorder
####################################################

newOrder = np.random.permutation(california_housing_dataframe.index)
california_housing_dataframe_reordered = california_housing_dataframe.reindex(newOrder)

####################################################
# Merge and show diff of the heads
####################################################

# Let's take the heads of both datasets and compare them.
# They should differ in (almost) all rows.

head1 = california_housing_dataframe.head(10)
head2 = california_housing_dataframe_reordered.head(10)

# @see https://stackoverflow.com/a/36893675/605890
merged = head1.merge(head2, indicator=True, how='outer')
print(merged)

Run on colab

I created a Colab notebook for the MWE; it runs pandas 0.22.0:

https://colab.research.google.com/drive/19uDE_H4AtpLaEL6INrRrDMXkdANsNr69#scrollTo=CzxuGppV26Rt

If you run it, the output shows the following (unless a row happens to land in both heads by chance):

  • 10x left_only
  • 10x right_only
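For readers unfamiliar with the indicator column, here is a minimal sketch of how `merge(..., indicator=True)` produces these labels. The two tiny frames are made up for illustration and are not part of the housing data.

```python
import pandas as pd

# Hypothetical toy frames: they share the values 2 and 3 only.
left = pd.DataFrame({"x": [1, 2, 3]})
right = pd.DataFrame({"x": [2, 3, 4]})

# With no `on` argument, merge joins on all common columns (here: "x").
# indicator=True adds a "_merge" column telling where each row came from.
merged = left.merge(right, how="outer", indicator=True)

# Count how many rows fall into each category.
counts = merged["_merge"].astype(str).value_counts().to_dict()
print(counts)
```

Rows found only in `left` are tagged `left_only`, rows found only in `right` are tagged `right_only`, and rows present in both are tagged `both`. Note that the comparison is on column values, not on index labels.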

Run with docker containers

Now run the same MWE (located at /tmp/tf/Bug.py) in two different Docker containers, both of which use pandas 0.23.4.

Both return:

  • 10x both

This means both heads are identical, so reindex appears to have no effect.
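One way to narrow this down, offered as a sketch rather than a confirmed diagnosis, is to check the two steps separately: whether the permutation actually differs from the original index, and whether reindex returns rows in the requested order. The small synthetic frame and all names below are assumptions for illustration. Passing `.index.values` to the permutation is a choice made here so that NumPy shuffles a plain array instead of a pandas Index object.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the housing data (hypothetical values).
# Column "a" mirrors the index, which makes the row order easy to verify.
df = pd.DataFrame({"a": np.arange(100), "b": np.arange(100) * 2.0})

# Permute a plain ndarray of the index labels.
new_order = np.random.permutation(df.index.values)

# Step 1: did the permutation itself change the order?
permutation_changed = not np.array_equal(new_order, df.index.values)

# Step 2: did reindex return the rows in the requested order?
reordered = df.reindex(new_order)
reindex_honored = np.array_equal(reordered.index.values, new_order)

print("permutation changed order:", permutation_changed)
print("reindex honored order:", reindex_honored)
```

If the first check fails, the problem lies in the shuffle step, not in reindex. If only the second fails, reindex is the place to look.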

Python docker container (python 3.6.6)

docker run --rm -it -v /tmp/tf/:/tmp/ python:3.6.6 /bin/bash -c "pip install pandas && python /tmp/Bug.py"

tensorflow docker container (tensorflow 1.11.0)

docker run --rm -it -v /tmp/tf/:/tmp/ tensorflow/tensorflow:1.11.0-py3 python /tmp/Bug.py 

TLDR

The following code does not have any effect in pandas 0.23.4:

california_housing_dataframe_reordered = california_housing_dataframe.reindex(newOrder)
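As a possible workaround while this is investigated, rows can also be shuffled with `DataFrame.sample(frac=1)`, which does not go through `np.random.permutation` on the index. This is a sketch on a made-up frame, not a fix for the behavior reported above. The `random_state` value is arbitrary and only makes the run repeatable.

```python
import pandas as pd

# Hypothetical stand-in for california_housing_dataframe.
df = pd.DataFrame({"a": range(100)})

# frac=1 samples every row exactly once, i.e. returns a shuffled copy.
shuffled = df.sample(frac=1, random_state=0)

# Compare index order directly instead of merging the heads.
same_order = shuffled.index.equals(df.index)
print("order unchanged:", same_order)
```

Comparing the index directly also avoids the outer merge, which compares column values and would miss a reordering if the data contained duplicate rows.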
