## __Pandas DataFrame__

A pandas DataFrame is a two-dimensional, tabular data structure with labeled rows and columns.

## Step 1: Import Pandas and Create an Empty DataFrame

- Import the pandas library to create a DataFrame:


In [None]:
import pandas as pd

To create a DataFrame, we call the DataFrame() constructor from the pandas library.

Let's create a DataFrame **df**.

In [None]:
df = pd.DataFrame()

Now, let's find the type of the DataFrame **df**.

In [None]:
type(df)

pandas.core.frame.DataFrame

**Observation**

The type of the DataFrame is **pandas.core.frame.DataFrame**.
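
An empty DataFrame is rarely the end goal. As a minimal sketch, the snippet below checks that `pd.DataFrame()` is empty and then builds a small DataFrame from a dictionary. The sample records mirror the ones printed later in this lesson, and the names `empty_df` and `people` are illustrative only.

```python
import pandas as pd

# An empty DataFrame has no rows and no columns.
empty_df = pd.DataFrame()
print(empty_df.empty)   # True
print(empty_df.shape)   # (0, 0)

# Build a DataFrame from a dictionary: keys become column names,
# and each list supplies the values for that column.
people = pd.DataFrame({
    "Name": ["Nithin", "Manoj", "Shivashankar", "Swathi", "Pareekshith"],
    "Age": [24, 30, 44, 18, 28],
    "Gender": ["Male", "Male", "Male", "Female", "Male"],
})
print(people.shape)     # (5, 3)
```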

## Step 2: Read a CSV File

We can load the data into a DataFrame from various data files. Here, we will load the data from a CSV file.

- Read a CSV file using the pd.read_csv() function:


In [None]:
df = pd.read_csv('PandasExample.csv')

Now, let's print the DataFrame **df** to see the data.

We can also print only the first or last five rows of a DataFrame using the head() and tail() methods.

In [None]:
df

| | Name | Age | Gender |
|---|---|---|---|
| 0 | Nithin | 24 | Male |
| 1 | Manoj | 30 | Male |
| 2 | Shivashankar | 44 | Male |
| 3 | Swathi | 18 | Female |
| 4 | Pareekshith | 28 | Male |


**Observation**

- The result is in a tabular format, which has rows and columns.

- The row indices (0 to 4) are generated automatically.

- There are 3 columns: **Name**, **Age** and **Gender**.
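
The contents of PandasExample.csv are not shown in this lesson, so the sketch below assumes them from the printed output and reads them from an in-memory string instead of a file. This keeps the example reproducible without the file on disk; `csv_text` is a hypothetical stand-in for the real file.

```python
import io
import pandas as pd

# Assumed file contents, inferred from the output shown above.
csv_text = """Name,Age,Gender
Nithin,24,Male
Manoj,30,Male
Shivashankar,44,Male
Swathi,18,Female
Pareekshith,28,Male
"""

# pd.read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (5, 3): five rows, three columns
print(list(df.columns))  # ['Name', 'Age', 'Gender']
print(df.dtypes)         # data type inferred for each column
```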



## Step 3: Display the First and Last Rows

Display the first and last rows of the DataFrame using the head() and tail() methods:

- head() returns the first 5 rows of a DataFrame by default.

- tail() returns the last 5 rows of a DataFrame by default.

In [None]:

df.head()

| | Name | Age | Gender |
|---|---|---|---|
| 0 | Nithin | 24 | Male |
| 1 | Manoj | 30 | Male |
| 2 | Shivashankar | 44 | Male |
| 3 | Swathi | 18 | Female |
| 4 | Pareekshith | 28 | Male |


In [None]:
df.tail()

| | Name | Age | Gender |
|---|---|---|---|
| 0 | Nithin | 24 | Male |
| 1 | Manoj | 30 | Male |
| 2 | Shivashankar | 44 | Male |
| 3 | Swathi | 18 | Female |
| 4 | Pareekshith | 28 | Male |


We can also specify the number of rows we want to display by passing it to the head() or tail() method.

- Print the first 2 rows of the DataFrame **df**:


In [None]:
df.head(2)

| | Name | Age | Gender |
|---|---|---|---|
| 0 | Nithin | 24 | Male |
| 1 | Manoj | 30 | Male |


Print the last two rows by passing 2 as an argument:

In [None]:
df.tail(2)

| | Name | Age | Gender |
|---|---|---|---|
| 3 | Swathi | 18 | Female |
| 4 | Pareekshith | 28 | Male |
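
The row-count argument has a few edge cases worth knowing. The sketch below rebuilds the sample data inline (an assumption, so it runs without the CSV file) and shows what head() and tail() return for a negative count and for a count larger than the DataFrame.

```python
import pandas as pd

# Sample data assumed to match PandasExample.csv.
df = pd.DataFrame({
    "Name": ["Nithin", "Manoj", "Shivashankar", "Swathi", "Pareekshith"],
    "Age": [24, 30, 44, 18, 28],
    "Gender": ["Male", "Male", "Male", "Female", "Male"],
})

first_two = df.head(2)       # rows 0 and 1
last_two = df.tail(2)        # rows 3 and 4
all_but_last = df.head(-1)   # negative n: every row except the last one
oversized = df.head(10)      # n larger than the DataFrame: all rows

print(list(first_two.index))  # [0, 1]
print(list(last_two.index))   # [3, 4]
print(len(all_but_last))      # 4
print(len(oversized))         # 5
```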


## Step 4: Index-Based Accessing

We can access elements by position using the .iloc indexer, which takes integer-based indices. Valid positions run from 0 to n-1, where n is the total number of rows or columns.

- Access rows in a DataFrame using .iloc:


In [None]:
df.iloc[0]

Name      Nithin
Age           24
Gender      Male
Name: 0, dtype: object

**Observation**

The result is the data in the first row.

The result also shows the data type. Each row is returned as a Series, and its data type is **object** because the row mixes strings (**Name**, **Gender**) with an integer (**Age**).

Now, let's print the data in the 4<sup>th</sup> row.

In [None]:
df.iloc[3]

Name      Swathi
Age           18
Gender    Female
Name: 3, dtype: object

**Observation**

This is the data from the 4<sup>th</sup> row.
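
Beyond single rows, .iloc also accepts slices and a row and column position together. The following is a small sketch on inline sample data (assumed to match the CSV file); the variable names are illustrative.

```python
import pandas as pd

# Sample data assumed to match PandasExample.csv.
df = pd.DataFrame({
    "Name": ["Nithin", "Manoj", "Shivashankar", "Swathi", "Pareekshith"],
    "Age": [24, 30, 44, 18, 28],
    "Gender": ["Male", "Male", "Male", "Female", "Male"],
})

middle_rows = df.iloc[1:3]   # positions 1 and 2; the stop position is excluded
first_age = df.iloc[0, 1]    # row 0, column 1 (Age)
last_row = df.iloc[-1]       # negative positions count from the end
names = df.iloc[:, 0]        # every row of column 0 (Name)

print(list(middle_rows["Name"]))  # ['Manoj', 'Shivashankar']
print(first_age)                  # 24
print(last_row["Name"])           # Pareekshith
print(len(names))                 # 5
```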

## Step 5: Access DataFrame Values as a NumPy Array

We can also retrieve only the values from a DataFrame, without the row and column labels.

- Access the DataFrame values as a NumPy array using the .values attribute:


In [None]:
df.values

array([['Nithin', 24, 'Male'],
       ['Manoj', 30, 'Male'],
       ['Shivashankar', 44, 'Male'],
       ['Swathi', 18, 'Female'],
       ['Pareekshith', 28, 'Male']], dtype=object)
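
The pandas documentation recommends the to_numpy() method over the .values attribute for new code. As a hedged sketch on inline sample data (assumed to match the CSV file), the snippet below shows that a DataFrame with mixed column types converts to an object array, while a single numeric column keeps a numeric type.

```python
import pandas as pd

# Sample data assumed to match PandasExample.csv.
df = pd.DataFrame({
    "Name": ["Nithin", "Manoj", "Shivashankar", "Swathi", "Pareekshith"],
    "Age": [24, 30, 44, 18, 28],
    "Gender": ["Male", "Male", "Male", "Female", "Male"],
})

# Mixed strings and integers force a generic object array.
values = df.to_numpy()
print(values.shape)  # (5, 3)
print(values.dtype)  # object

# A single numeric column converts to an integer array,
# which supports fast numeric operations.
ages = df["Age"].to_numpy()
print(ages.dtype.kind)  # 'i' (integer)
print(ages.mean())      # 28.8
```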

## Step 6: Read a CSV File in Chunks

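We can also read the file in chunks, which is useful when a file is too large to load at once.

To read the CSV file in chunks, we pass the **chunksize** parameter to the pd.read_csv() function.

Instead of a DataFrame, the call then returns an iterable that yields one DataFrame per chunk. We need to iterate through it.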


- Read a CSV file in chunks:

In [None]:
chunks = pd.read_csv('PandasExample.csv', chunksize=2)

Now, let's print each chunk.

In [None]:
for chunk in chunks:
    print(chunk)

     Name  Age Gender
0  Nithin   24   Male
1   Manoj   30   Male
           Name  Age  Gender
2  Shivashankar   44    Male
3        Swathi   18  Female
          Name  Age Gender
4  Pareekshith   28   Male


**Observation**

We can see that three separate DataFrames are created with a chunk size of 2. The last chunk holds only the one remaining row.
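
Chunked reading is typically used to compute a result without holding the whole file in memory. The sketch below assumes the file contents from the output above and reads them from an in-memory string; it accumulates a running total per chunk to get the average age.

```python
import io
import pandas as pd

# Assumed file contents, inferred from the printed chunks.
csv_text = """Name,Age,Gender
Nithin,24,Male
Manoj,30,Male
Shivashankar,44,Male
Swathi,18,Female
Pareekshith,28,Male
"""

total_age = 0
row_count = 0
chunk_sizes = []

# Each chunk is an ordinary DataFrame with at most 2 rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_age += chunk["Age"].sum()
    row_count += len(chunk)
    chunk_sizes.append(len(chunk))

print(chunk_sizes)            # [2, 2, 1]
print(total_age / row_count)  # 28.8
```

Note that the iterable can be consumed only once; to loop again, call pd.read_csv() again.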

## Step 7: Filter the DataFrame Based on a Condition

We can also filter the data using conditions in pandas. For example, we can keep only the records where the age is above 25.

- Read PandasExample.csv into DataFrame **df**
- Access the **Age** column from **df**, and check df['Age'] > 25, which gives True or False for each row
- Extract only the rows from **df** where the check is True
- Assign the result back to **df**


Let's read the CSV file.

In [None]:
df = pd.read_csv('PandasExample.csv')

Now, let's see how to extract the data where **Age** is greater than 25.

In [None]:
df = df[df['Age']>25]

Print **df**:

In [None]:
df

| | Name | Age | Gender |
|---|---|---|---|
| 1 | Manoj | 30 | Male |
| 2 | Shivashankar | 44 | Male |
| 4 | Pareekshith | 28 | Male |


**Observation**

The DataFrame now shows records where **Age** is greater than 25.
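
Conditions can also be combined. As a minimal sketch on inline sample data (assumed to match the CSV file), the snippet below uses `&` for "and" and `|` for "or". Each condition needs its own parentheses, and assigning the result to a new name leaves the original DataFrame unchanged.

```python
import pandas as pd

# Sample data assumed to match PandasExample.csv.
df = pd.DataFrame({
    "Name": ["Nithin", "Manoj", "Shivashankar", "Swathi", "Pareekshith"],
    "Age": [24, 30, 44, 18, 28],
    "Gender": ["Male", "Male", "Male", "Female", "Male"],
})

# The comparison produces a boolean Series, one value per row.
mask = df["Age"] > 25
print(mask.tolist())  # [False, True, True, False, True]

# Combine conditions with & (and) and | (or).
between = df[(df["Age"] > 25) & (df["Age"] < 40)]
extremes = df[(df["Age"] < 20) | (df["Age"] > 40)]

print(list(between["Name"]))   # ['Manoj', 'Pareekshith']
print(list(extremes["Name"]))  # ['Shivashankar', 'Swathi']
print(len(df))                 # 5: the original DataFrame is unchanged
```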