# Add row numbers to a DataFrame

Use [`np.arange()`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) to number rows by position rather than by index label.

Source: [Generate row number in pandas python](https://www.datasciencemadesimple.com/generate-row-number-in-pandas-python-2/) (DataScience Made Simple)

In [1]:
# Setup

import numpy as np
import pandas as pd

In [2]:
# Create some toy data

df = pd.DataFrame(
    [
        [np.nan, 2, np.nan, 0],
        [3, 4, np.nan, 1],
        [np.nan, np.nan, np.nan, 5],
        [np.nan, 3, np.nan, 4]
    ],
    columns=list("ABCD")
)

df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | NaN | 2.0 | NaN | 0 |
| 1 | 3.0 | 4.0 | NaN | 1 |
| 2 | NaN | NaN | NaN | 5 |
| 3 | NaN | 3.0 | NaN | 4 |


You could just create a row number from the index.

In [3]:
# Add a row number column based on the index

df.assign(row_number=df.index)

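|   | A | B | C | D | row_number |
|---|---|---|---|---|---|
| 0 | NaN | 2.0 | NaN | 0 | 0 |
| 1 | 3.0 | 4.0 | NaN | 1 | 1 |
| 2 | NaN | NaN | NaN | 5 | 2 |
| 3 | NaN | 3.0 | NaN | 4 | 3 |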


However, if you've subset your data and haven't reset the index, or if you have a non-default index, this method copies the index labels rather than the row positions, so the row numbers may have gaps or may not be numbers at all.

In [4]:
# Subset the original data

df_subset = df[~pd.isna(df["B"])]

df_subset

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | NaN | 2.0 | NaN | 0 |
| 1 | 3.0 | 4.0 | NaN | 1 |
| 3 | NaN | 3.0 | NaN | 4 |


In [5]:
# Show how using the index to create a row number might give unexpected
# results on subset data.

df_subset.assign(row_number=df_subset.index)

|   | A | B | C | D | row_number |
|---|---|---|---|---|---|
| 0 | NaN | 2.0 | NaN | 0 | 0 |
| 1 | 3.0 | 4.0 | NaN | 1 | 1 |
| 3 | NaN | 3.0 | NaN | 4 | 3 |
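
One way to get consecutive numbers from the index is to reset it first. The following is a minimal sketch that rebuilds the toy data so it runs on its own; `df_reset` is a name introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the toy data and the subset so this sketch is self-contained
df = pd.DataFrame(
    [
        [np.nan, 2, np.nan, 0],
        [3, 4, np.nan, 1],
        [np.nan, np.nan, np.nan, 5],
        [np.nan, 3, np.nan, 4],
    ],
    columns=list("ABCD"),
)
df_subset = df[~pd.isna(df["B"])]

# Resetting the index replaces the labels 0, 1, 3 with 0, 1, 2.
# drop=True discards the old labels instead of keeping them as a column.
df_reset = df_subset.reset_index(drop=True)

# The index now matches the row order, so it can serve as a row number
df_reset = df_reset.assign(row_number=df_reset.index)
```

Note that `drop=True` throws away the original labels, which may not be what you want if they identify rows in the source data.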


Using `np.arange()` creates row numbers based on the row order, even when data has been subset.

In [6]:
# Use np.arange() to add a row number column

df.assign(row_number=np.arange(df.shape[0]))

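|   | A | B | C | D | row_number |
|---|---|---|---|---|---|
| 0 | NaN | 2.0 | NaN | 0 | 0 |
| 1 | 3.0 | 4.0 | NaN | 1 | 1 |
| 2 | NaN | NaN | NaN | 5 | 2 |
| 3 | NaN | 3.0 | NaN | 4 | 3 |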


In [7]:
# Use np.arange() to add a row number column on data that has been subset
# and therefore doesn't have an index that reflects the order of the rows.

df_subset.assign(row_number=np.arange(df_subset.shape[0]))

|   | A | B | C | D | row_number |
|---|---|---|---|---|---|
| 0 | NaN | 2.0 | NaN | 0 | 0 |
| 1 | 3.0 | 4.0 | NaN | 1 | 1 |
| 3 | NaN | 3.0 | NaN | 4 | 2 |
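
The same applies to a non-default index. As a rough sketch, assuming a hypothetical DataFrame `df_labeled` with string labels (not part of the original example), copying the index gives the labels, while `np.arange()` still gives positions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with string labels as the index, for illustration only
df_labeled = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=["x", "y", "z"],
)

# Copying the index yields the labels, not row positions
from_index = df_labeled.assign(row_number=df_labeled.index)

# np.arange() counts rows by position, whatever the index holds
from_arange = df_labeled.assign(row_number=np.arange(len(df_labeled)))
```

Here `len(df_labeled)` is used in place of `df_labeled.shape[0]`; both return the number of rows.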


In the examples above, I've used `pd.DataFrame.assign()` to create the columns without altering the source data. If this method is unfamiliar to you, you can also assign the column directly, which modifies the DataFrame in place.

In [8]:
# Show adding the row number column using assignment instead of
# `pd.DataFrame.assign()`

df["row_number"] = np.arange(df.shape[0])

df

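|   | A | B | C | D | row_number |
|---|---|---|---|---|---|
| 0 | NaN | 2.0 | NaN | 0 | 0 |
| 1 | 3.0 | 4.0 | NaN | 1 | 1 |
| 2 | NaN | NaN | NaN | 5 | 2 |
| 3 | NaN | 3.0 | NaN | 4 | 3 |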
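
To make the difference between the two approaches concrete, here is a small sketch using a hypothetical two-row DataFrame `df_demo` (not the toy data above). It checks whether the original DataFrame gains the column in each case.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame, used only for this illustration
df_demo = pd.DataFrame({"A": [1, 2]})

# assign() returns a new DataFrame and leaves df_demo unchanged
df_new = df_demo.assign(row_number=np.arange(len(df_demo)))
assign_changed_original = "row_number" in df_demo.columns

# Direct assignment adds the column to df_demo itself
df_demo["row_number"] = np.arange(len(df_demo))
direct_changed_original = "row_number" in df_demo.columns
```

After running this, `assign_changed_original` is `False` and `direct_changed_original` is `True`, so `assign()` is the safer choice when the source data should stay as it is.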
