- Title: Understand Index in pandas
- Slug: python-pandas-index
- Date: 2020-08-10 11:01:40
- Category: Computer Science
- Tags: programming, Python, pandas, Index, DataFrame, Series
- Author: Ben Du
- Modified: 2020-08-10 11:01:40


## Comments 

1. There are multiple ways to update the index of a DataFrame or Series. 
    First,
    you can assign a new `Series` or `Index` object to the index of a DataFrame or Series.
    Or you can use methods such as `DataFrame.set_index` or `DataFrame.reset_index`.
    `DataFrame.reset_index` (or `Series.reset_index`) resets the index
    to an integer index starting from 0.
    The old index is kept as a column by default but can be dropped using the option `drop=True`.
    `DataFrame.set_index` sets the index of a DataFrame to the specified column
    and, by default, removes the column from the DataFrame
    (pass `drop=False` to keep it).
    This can also be achieved by directly assigning the column to the index of the DataFrame 
    and then manually removing the column from the DataFrame.
    Note that by default `DataFrame.set_index`, `DataFrame.reset_index` and `Series.reset_index` return new objects.
    The option `inplace=True` can be specified to make the update in-place. 
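
The two routes described above can be compared side by side. The following is a minimal sketch on a small made-up DataFrame (the variable names are illustrative only); it assumes the default `drop=True` behavior of `set_index`.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [3, 2, 1]},
    index=["r1", "r2", "r3"],
)

# Option 1: set_index returns a new DataFrame and drops column "x" by default.
by_method = df.set_index("x")

# Option 2: assign the column to the index, then remove the column manually.
by_assignment = df.copy()
by_assignment.index = by_assignment["x"]
by_assignment = by_assignment.drop(columns="x")

# Both routes should give the same result, and df itself is left untouched.
same_result = by_method.equals(by_assignment)
original_index = list(df.index)
```

The copy in option 2 is only there so that `df` stays available for comparison; assigning to `df.index` directly would modify `df` itself.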

## [set_index](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html)


In [1]:
import pandas as pd

df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        'y': [5, 4, 3, 2, 1]
    }, index=['r1', 'r2', 'r3', 'r4', 'r5']
)

df

Unnamed: 0,x,y
r1,1,5
r2,2,4
r3,3,3
r4,4,2
r5,5,1


In [2]:
df.set_index("x")

x,y
1,5
2,4
3,3
4,2
5,1


## reindex

`DataFrame.reindex` does NOT relabel the existing rows. 
It returns a new DataFrame whose rows are looked up by the specified labels,
so labels that are not in the original index produce rows of missing values.
If you want to change the index but keep the original order of rows,
just assign new values to the index of the DataFrame
or call the method `reset_index(drop=True)`.
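
As a compact illustration of the difference, here is a small sketch on a made-up three-row DataFrame (names are illustrative only). It contrasts looking rows up by label with `reindex` against relabeling rows by assigning to `.index`.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])

# reindex looks rows up by label, so permuting the labels reorders the rows.
reordered = df.reindex(["r3", "r1", "r2"])

# Labels missing from the original index produce rows filled with NaN.
missing = df.reindex([0, 1, 2])

# Assigning to .index relabels the rows and keeps their order.
relabeled = df.copy()
relabeled.index = [0, 1, 2]
```

Neither `reindex` call modifies `df`; both return new DataFrames.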

In [1]:
import pandas as pd

df = pd.DataFrame(
    {
        'x': [1, 2, 3, 4, 5],
        'y': [5, 4, 3, 2, 1]
    }, index=['r1', 'r2', 'r3', 'r4', 'r5']
)

df.head()

Unnamed: 0,x,y
r1,1,5
r2,2,4
r3,3,3
r4,4,2
r5,5,1


In [11]:
df.reindex(index=range(0, df.shape[0]))

Unnamed: 0,x,y
0,,
1,,
2,,
3,,
4,,


In [12]:
df.reindex(index=['r1', 'r3', 'r5', 'r2', 'r4'])

Unnamed: 0,x,y
r1,1,5
r3,3,3
r5,5,1
r2,2,4
r4,4,2


In [4]:
x = df.copy()
print(x)
x.index = range(1, 6)
x

    x  y
r1  1  5
r2  2  4
r3  3  3
r4  4  2
r5  5  1


Unnamed: 0,x,y
1,1,5
2,2,4
3,3,3
4,4,2
5,5,1


In [22]:
x = df.copy()
x.reset_index()

Unnamed: 0,index,x,y
0,r1,1,5
1,r2,2,4
2,r3,3,3
3,r4,4,2
4,r5,5,1


In [3]:
x = df.copy()
x.reset_index(drop=True, inplace=True)
x

Unnamed: 0,x,y
0,1,5
1,2,4
2,3,3
3,4,2
4,5,1


## [reset_index](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html)

By default `reset_index` returns a new object rather than modifying the original DataFrame. 
You can specify `inplace=True` to override this behavior.
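
The difference between the two modes can be sketched as follows, using a small made-up DataFrame (variable names are illustrative only).

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])

# Without inplace=True a new DataFrame is returned and df keeps its index.
result = df.reset_index(drop=True)
index_after_default_call = list(df.index)

# With inplace=True the DataFrame itself is modified and None is returned.
modified = df.copy()
returned = modified.reset_index(drop=True, inplace=True)
```

Because the in-place call returns `None`, writing `df = df.reset_index(inplace=True)` would replace `df` with `None`.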

### Series

1. If you drop the original index, 
    you still have a Series. 
    However, 
    if you reset the index of a Series without dropping the original index, 
    you get a DataFrame.


In [5]:
s = pd.Series([1, 2, 3, 4], index=['r1', 'r2', 'r3', 'r4'])
s

r1    1
r2    2
r3    3
r4    4
dtype: int64

In [8]:
df = s.reset_index()
df

Unnamed: 0,index,0
0,r1,1
1,r2,2
2,r3,3
3,r4,4


In [10]:
df = s.reset_index(drop=True)
df

0    1
1    2
2    3
3    4
dtype: int64

### DataFrame

In [15]:
import pandas as pd

df = pd.DataFrame(
    {
        'x': [1, 2, 3, 4, 5],
        'y': [5, 4, 3, 2, 1]
    }, index=['r1', 'r2', 'r3', 'r4', 'r5']
)

df.head()

Unnamed: 0,x,y
r1,1,5
r2,2,4
r3,3,3
r4,4,2
r5,5,1


In [29]:
# keep the original index as a new column and create a new index
df.reset_index()

Unnamed: 0,index,x,y
0,r1,1,5
1,r2,2,4
2,r3,3,3
3,r4,4,2
4,r5,5,1


In [30]:
# drop the original index and create a new index
df.reset_index(drop=True)

Unnamed: 0,x,y
0,1,5
1,2,4
2,3,3
3,4,2
4,5,1


#### Multi-index

In [31]:
import pandas as pd

df = pd.DataFrame(
    {
        'x': [1, 2, 3, 4, 5],
        'y': [5, 4, 3, 2, 1]
    },
    index=pd.MultiIndex.from_tuples(
        [('r1', 0), ('r2', 1), ('r3', 2), ('r4', 3), ('r5', 4)]
    )
)

df.head()

Unnamed: 0,Unnamed: 1,x,y
r1,0,1,5
r2,1,2,4
r3,2,3,3
r4,3,4,2
r5,4,5,1


In [32]:
df.reset_index()

Unnamed: 0,level_0,level_1,x,y
0,r1,0,1,5
1,r2,1,2,4
2,r3,2,3,3
3,r4,3,4,2
4,r5,4,5,1


In [33]:
df.reset_index(drop=True)

Unnamed: 0,x,y
0,1,5
1,2,4
2,3,3
3,4,2
4,5,1


In [38]:
# drop the 2nd index level and keep the 1st level as the index
df.reset_index(level=1, drop=True)

Unnamed: 0,x,y
r1,1,5
r2,2,4
r3,3,3
r4,4,2
r5,5,1


## Assign Index

In [1]:
import pandas as pd

df = pd.DataFrame(
    {
        'x': [1, 2, 3, 4, 5],
        'y': [5, 4, 3, 2, 1]
    }, index=['r1', 'r2', 'r3', 'r4', 'r5']
)

df.head()

Unnamed: 0,x,y
r1,1,5
r2,2,4
r3,3,3
r4,4,2
r5,5,1


In [2]:
df.index = df.y
df

y,x,y
5,1,5
4,2,4
3,3,3
2,4,2
1,5,1


## Index to Series
An index can be converted to a Series object,
which lets it benefit from the rich set of Series methods.

In [49]:
df.columns.to_series().loc[lambda s: s == 'x']

x    x
dtype: object

## Multi-Index

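In [None]:
# `jj` is assumed to be a DataFrame or Series whose index has a name
pd.MultiIndex.from_product([[jj.index.name], jj.index.values])

In [None]:
# `tuples` is assumed to be a list of 2-element tuples, one per row
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])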

In [None]:
pd.MultiIndex.from_tuples([(jj.index.name, v) for v in jj.index.values])
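
The snippets above rely on objects (`jj`, `tuples`) that are not defined in this post. The following self-contained sketch uses a hypothetical stand-in for `jj` (a small DataFrame with a named index, invented here for illustration) to show how the constructors relate to each other.

```python
import pandas as pd

# Hypothetical stand-in for `jj`: a small DataFrame whose index has a name.
jj = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.Index(["a", "b", "c"], name="key"),
)

# Cartesian product of [index name] and the index values.
from_product = pd.MultiIndex.from_product([[jj.index.name], jj.index.values])

# The same pairs built explicitly as tuples.
from_tuples = pd.MultiIndex.from_tuples(
    [(jj.index.name, v) for v in jj.index.values]
)

# Level names can be attached when the MultiIndex is constructed.
named = pd.MultiIndex.from_tuples(list(from_tuples), names=["first", "second"])
```

Since the first list passed to `from_product` has a single element, the product contains exactly one tuple per index value, which is why both constructions should yield the same pairs.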

## References

https://www.youtube.com/watch?v=tcRGa2soc-c

https://stackoverflow.com/questions/38542419/could-pandas-use-column-as-index

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html

http://www.legendu.net/misc/blog/python-pandas-set_index/

http://www.legendu.net/misc/blog/python-pandas-reset_index/

http://www.legendu.net/misc/blog/python-pandas-reindex/

http://www.legendu.net/misc/blog/python-pandas-multiindex/
