# `read_csv()` - `skiprows` parameter

In [1]:
import pandas as pd

Normal read

In [6]:
df_student_score = pd.read_csv("./datasets/students_score_no_index.csv")

In [7]:
df_student_score

Unnamed: 0,Name,Score,Grades
0,Paul,98,AA
1,Aaron,89,AB
2,Krista,99,AA
3,Veronica,87,AB
4,Paxton,90,AC
5,Madison,83,BA
6,Aurora,82,BB


## 1. Skipping rows from the first (0th) position

Specify how many lines we want to skip, counted from the top of the file

In [8]:
df_student_score = pd.read_csv("./datasets/students_score_no_index.csv", skiprows=2)
# skips the first 2 lines of the file: the header line and the first data row

In [9]:
df_student_score

Unnamed: 0,Aaron,89,AB
0,Krista,99,AA
1,Veronica,87,AB
2,Paxton,90,AC
3,Madison,83,BA
4,Aurora,82,BB


See the difference?

<b> * Keep in mind that a single integer passed to the `skiprows` parameter counts lines from the top of the file, so the header line is skipped too and the first remaining line ("Aaron,89,AB" here) becomes the new header</b>
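The effect on the header can be checked directly. The sketch below is a minimal illustration, not part of the original notebook: it assumes the file contents are as reconstructed in `CSV_TEXT` (inferred from the outputs shown above) and reads them from memory with `StringIO` instead of from disk. It also shows one possible way to keep sensible column labels, by passing them through `names=`.

```python
from io import StringIO

import pandas as pd

# Assumed file contents, reconstructed from the outputs shown in this notebook
CSV_TEXT = """Name,Score,Grades
Paul,98,AA
Aaron,89,AB
Krista,99,AA
Veronica,87,AB
Paxton,90,AC
Madison,83,BA
Aurora,82,BB
"""

# skiprows=2 drops file lines 0 and 1, so "Aaron,89,AB" is read as the header
df_int = pd.read_csv(StringIO(CSV_TEXT), skiprows=2)
print(list(df_int.columns))

# With names= we supply the labels ourselves, so no remaining line is used as a header
df_named = pd.read_csv(
    StringIO(CSV_TEXT), skiprows=2, names=["Name", "Score", "Grades"]
)
print(df_named)
```

With `names=` supplied, Aaron's row stays in the data instead of being consumed as column labels.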

## 2. Skipping rows from a specific/custom position

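<b> * Note that when we pass a list to the `skiprows` parameter, only the listed line numbers are skipped. Lines are counted from 0 and the header is line 0, so the header is kept as long as 0 is not in the list</b>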

In [10]:
df_student_score = pd.read_csv("./datasets/students_score_no_index.csv", skiprows=[2, 4, 6])
# skips file lines 2, 4, and 6 (the header is line 0), i.e. the second, fourth, and sixth data rows

In [11]:
df_student_score

Unnamed: 0,Name,Score,Grades
0,Paul,98,AA
1,Krista,99,AA
2,Paxton,90,AC
3,Aurora,82,BB
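To make the line numbering concrete, the sketch below compares a list that leaves line 0 alone with one that includes it. It is only an illustration: `CSV_TEXT` is an assumed reconstruction of the file based on the outputs above, read from memory via `StringIO`.

```python
from io import StringIO

import pandas as pd

# Assumed file contents, reconstructed from the outputs shown in this notebook
CSV_TEXT = """Name,Score,Grades
Paul,98,AA
Aaron,89,AB
Krista,99,AA
Veronica,87,AB
Paxton,90,AC
Madison,83,BA
Aurora,82,BB
"""

# Line 0 (the header) is not listed, so the original column names survive
df_keep = pd.read_csv(StringIO(CSV_TEXT), skiprows=[2, 4, 6])
print(list(df_keep.columns))

# Line 0 is listed, so the header is dropped and "Paul,98,AA" takes its place
df_drop = pd.read_csv(StringIO(CSV_TEXT), skiprows=[0, 2])
print(list(df_drop.columns))
```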


## 3. Skipping rows using list comprehension

In [12]:
df_student_score = pd.read_csv("./datasets/students_score_no_index.csv", skiprows=[i for i in range(1, 3)])
# range(1, 3) yields 1 and 2 (start inclusive, stop exclusive),
# so we skip file lines 1 and 2: the first and second data rows

In [13]:
df_student_score

Unnamed: 0,Name,Score,Grades
0,Krista,99,AA
1,Veronica,87,AB
2,Paxton,90,AC
3,Madison,83,BA
4,Aurora,82,BB


In [14]:
# skipping the second, fourth, and sixth data rows (file lines 2, 4, and 6) looks like:
df_student_score = pd.read_csv("./datasets/students_score_no_index.csv", skiprows=[i for i in [2, 4, 6]])
# the comprehension simply rebuilds the list [2, 4, 6]

In [15]:
df_student_score

Unnamed: 0,Name,Score,Grades
0,Paul,98,AA
1,Krista,99,AA
2,Paxton,90,AC
3,Aurora,82,BB
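Both comprehensions in this section only build plain lists, so the same result can be had without them. The sketch below checks that, under the assumption that the file looks like the reconstructed `CSV_TEXT` (inferred from the outputs above, read from memory rather than from disk).

```python
from io import StringIO

import pandas as pd

# Assumed file contents, reconstructed from the outputs shown in this notebook
CSV_TEXT = """Name,Score,Grades
Paul,98,AA
Aaron,89,AB
Krista,99,AA
Veronica,87,AB
Paxton,90,AC
Madison,83,BA
Aurora,82,BB
"""

# The comprehension and list(range(...)) both produce [1, 2]
df_comprehension = pd.read_csv(StringIO(CSV_TEXT), skiprows=[i for i in range(1, 3)])
df_plain_list = pd.read_csv(StringIO(CSV_TEXT), skiprows=list(range(1, 3)))

# Identical frames: the header is kept, Paul and Aaron are skipped
print(df_comprehension.equals(df_plain_list))
print(df_plain_list)
```

A comprehension earns its place once it filters or transforms, for example `[i for i in range(1, 8) if i % 2 == 0]` to skip the even-numbered lines.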


## 4. Skipping rows using conditions

Say we want to use a function that returns `True` if a number is divisible by 3 and `False` otherwise

In [17]:
def div_three(x):
    # True when x is divisible by 3, otherwise False
    return x % 3 == 0

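Now we pass the function as the value of the `skiprows` parameter (wrapped in a lambda here, although passing `div_three` directly works as well). pandas calls it with each line number, starting from 0, and skips every line for which it returns `True`. In other words, we skip lines 0, 3, and 6. Because line 0 is the header, the header is skipped too and "Paul,98,AA" becomes the new header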

In [18]:
df_student_score = pd.read_csv("./datasets/students_score_no_index.csv", skiprows=lambda x: div_three(x))

In [19]:
df_student_score

Unnamed: 0,Paul,98,AA
0,Aaron,89,AB
1,Veronica,87,AB
2,Paxton,90,AC
3,Aurora,82,BB
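If the header should survive, the condition can exclude line 0. The sketch below is one possible way to do that. `skip_every_third_data_line` is a hypothetical helper name introduced for illustration, and `CSV_TEXT` is an assumed reconstruction of the file based on the outputs above.

```python
from io import StringIO

import pandas as pd

# Assumed file contents, reconstructed from the outputs shown in this notebook
CSV_TEXT = """Name,Score,Grades
Paul,98,AA
Aaron,89,AB
Krista,99,AA
Veronica,87,AB
Paxton,90,AC
Madison,83,BA
Aurora,82,BB
"""


def skip_every_third_data_line(line_index):
    """Hypothetical helper: skip lines 3, 6, 9, ... but never the header (line 0)."""
    return line_index > 0 and line_index % 3 == 0


# Plain divisibility test: line 0 qualifies, so the header is lost
df_header_lost = pd.read_csv(StringIO(CSV_TEXT), skiprows=lambda x: x % 3 == 0)
print(list(df_header_lost.columns))

# Guarded test: only lines 3 (Krista) and 6 (Madison) are skipped
df_header_kept = pd.read_csv(StringIO(CSV_TEXT), skiprows=skip_every_third_data_line)
print(df_header_kept)
```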


## 5. Using `skipfooter` parameter

Skip a given number of rows counted from the end of the dataset

In [20]:
df_student_score = pd.read_csv("./datasets/students_score_no_index.csv", skipfooter=3)

This call emits a `ParserWarning`: the default C engine does not support `skipfooter`, so pandas falls back to the Python engine.


In [21]:
df_student_score

Unnamed: 0,Name,Score,Grades
0,Paul,98,AA
1,Aaron,89,AB
2,Krista,99,AA
3,Veronica,87,AB


The last three rows (Paxton, Madison, and Aurora) are skipped

In [22]:
# Specify engine='python' explicitly to avoid the warning
df_student_score = pd.read_csv("./datasets/students_score_no_index.csv", skipfooter=3, engine='python')

In [23]:
df_student_score

Unnamed: 0,Name,Score,Grades
0,Paul,98,AA
1,Aaron,89,AB
2,Krista,99,AA
3,Veronica,87,AB
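The two calls in this section can be compared side by side. The sketch below records any warnings raised by the default call and then repeats the read with `engine="python"`. It assumes the file matches the reconstructed `CSV_TEXT` (inferred from the outputs above), and the exact warning text may differ between pandas versions.

```python
import warnings
from io import StringIO

import pandas as pd

# Assumed file contents, reconstructed from the outputs shown in this notebook
CSV_TEXT = """Name,Score,Grades
Paul,98,AA
Aaron,89,AB
Krista,99,AA
Veronica,87,AB
Paxton,90,AC
Madison,83,BA
Aurora,82,BB
"""

# Default engine: record whatever warnings pandas raises during the read
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df_default = pd.read_csv(StringIO(CSV_TEXT), skipfooter=3)
print([w.category.__name__ for w in caught])

# Explicit Python engine: same rows, no engine fallback needed
df_python = pd.read_csv(StringIO(CSV_TEXT), skipfooter=3, engine="python")
print(df_python)
```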
