Pandas: DataFrame and Series

Pandas is a Python library for data manipulation, widely used for data analysis and data cleaning.
It provides two primary data structures: Series and DataFrame.
A Series is a one-dimensional, labeled, array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [None]:
import pandas as pd

### Series

A Series in Pandas is a one-dimensional array capable of holding data of any type (integers, strings, floats, Python objects, etc.). Each element is identified by a label, and together these labels form the index.

In [3]:
import pandas as pd

data = [1,2,3,4,5]
series = pd.Series(data)
print("Series \n", series)

Series 
 0    1
1    2
2    3
3    4
4    5
dtype: int64


In [4]:
# Create a Series from a dictionary (the keys become the index labels)

data = {'a': 1, 'b': 2, 'c': 3}
series_dict = pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [5]:
data = [10, 20, 30]
index = ['a', 'b', 'c']
pd.Series(data, index = index)

a    10
b    20
c    30
dtype: int64
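
Because every element carries a label, a Series can be read either by label or by integer position. The sketch below reuses the data and labels from the cell above; the variable names are only illustrative.

```python
import pandas as pd

# Same data and labels as the cell above
series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = series['b']          # look up by index label
by_position = series.iloc[1]    # look up by integer position
subset = series[['a', 'c']]     # select several labels at once

print(by_label, by_position)
print(subset)
```

Both lookups return the same element here because label 'b' happens to sit at position 1.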

### DataFrame

A DataFrame is a 2D data structure in Pandas where data is arranged in rows and columns (like an Excel spreadsheet or SQL table).

In [6]:
# Create a DataFrame from a dictionary of lists

data = {
  'Name' : ['Suk', 'Waani', 'Shristi'],
  'Age' : [19, 19, 24],
  'City' : ['Odisha', 'New York', 'Bangalore']
  }
df = pd.DataFrame(data)
print(df)
print(type(df))

      Name  Age       City
0      Suk   19     Odisha
1    Waani   19   New York
2  Shristi   24  Bangalore
<class 'pandas.core.frame.DataFrame'>


In [7]:
import numpy as np
np.array(df)

array([['Suk', 19, 'Odisha'],
       ['Waani', 19, 'New York'],
       ['Shristi', 24, 'Bangalore']], dtype=object)
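
The array above has `dtype=object` because the columns mix strings and integers, so NumPy falls back to a generic Python-object array. As a small sketch, `DataFrame.to_numpy()` performs the same conversion, and converting a single numeric column keeps a numeric dtype. The frame is rebuilt here so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Suk', 'Waani', 'Shristi'],
    'Age': [19, 19, 24],
    'City': ['Odisha', 'New York', 'Bangalore'],
})

mixed = df.to_numpy()          # mixed column types -> object array
ages = df['Age'].to_numpy()    # a single numeric column keeps an integer dtype

print(mixed.dtype, mixed.shape)
print(ages.dtype, ages)
```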

In [8]:
# Create a DataFrame from a list of Dictionaries

data = [
  {'Name' :'Suk', 'Age' : 19,'City' :'Odisha'},
  {'Name' :'Waani', 'Age' : 19,'City' :'Pune'},
  {'Name' :'Sourya', 'Age' : 18,'City' :'Odisha'},
  {'Name' :'Shristi', 'Age' : 22,'City' :'Bangalore'},
]

df = pd.DataFrame(data)
print(df)
print(type(df))

      Name  Age       City
0      Suk   19     Odisha
1    Waani   19       Pune
2   Sourya   18     Odisha
3  Shristi   22  Bangalore
<class 'pandas.core.frame.DataFrame'>


In [15]:
df = pd.read_csv('sales_data.csv')
df.head(5)

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Price_per_Unit,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,2,56190,112380
1,ORD1002,2025-01-02,Vikram,Tablet,5,79651,398255
2,ORD1003,2025-01-03,Sneha,Tablet,3,21249,63747
3,ORD1004,2025-01-04,Karan,Laptop,5,10603,53015
4,ORD1005,2025-01-05,Riya,Headphones,5,45618,228090
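
`sales_data.csv` itself is not included here, so the following is only a sketch of the same call on an in-memory stand-in. It assumes the file has a header row with the column names shown above; the three data rows are copied from the `head()` output. The leading unnamed column in the output is the row index, not a column of the file.

```python
import io
import pandas as pd

# Hypothetical stand-in for sales_data.csv: the first three rows shown above
csv_text = """Order_ID,Date,Customer_Name,Product,Quantity,Price_per_Unit,Total_Sales
ORD1001,2025-01-01,Riya,Headphones,2,56190,112380
ORD1002,2025-01-02,Vikram,Tablet,5,79651,398255
ORD1003,2025-01-03,Sneha,Tablet,3,21249,63747
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))   # first two rows
print(sample.shape)     # (number of rows, number of columns)
```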


In [16]:
df.tail(5)

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Price_per_Unit,Total_Sales
15,ORD1016,2025-01-16,Karan,Headphones,5,70546,352730
16,ORD1017,2025-01-17,Karan,Headphones,1,61199,61199
17,ORD1018,2025-01-18,Vikram,Laptop,5,72582,362910
18,ORD1019,2025-01-19,Riya,Headphones,1,79076,79076
19,ORD1020,2025-01-20,Amit,Headphones,3,28931,86793


In [17]:
# Accessing Data From DataFrame

df

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Price_per_Unit,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,2,56190,112380
1,ORD1002,2025-01-02,Vikram,Tablet,5,79651,398255
2,ORD1003,2025-01-03,Sneha,Tablet,3,21249,63747
3,ORD1004,2025-01-04,Karan,Laptop,5,10603,53015
4,ORD1005,2025-01-05,Riya,Headphones,5,45618,228090
5,ORD1006,2025-01-06,Karan,Laptop,3,31519,94557
6,ORD1007,2025-01-07,Amit,Headphones,3,69390,208170
7,ORD1008,2025-01-08,Riya,Headphones,4,48472,193888
8,ORD1009,2025-01-09,Tina,Smartwatch,4,60723,242892
9,ORD1010,2025-01-10,Sneha,Smartwatch,1,48305,48305


In [22]:
df.loc[0] # loc (label-based indexing) is used to access rows and columns by their labels (names)

Order_ID             ORD1001
Date              2025-01-01
Customer_Name           Riya
Product           Headphones
Quantity                   2
Price_per_Unit         56190
Total_Sales           112380
Name: 0, dtype: object

In [23]:
df.iloc[0] # iloc (integer-location based indexing) is used to access rows and columns by their integer positions (numbers)

Order_ID             ORD1001
Date              2025-01-01
Customer_Name           Riya
Product           Headphones
Quantity                   2
Price_per_Unit         56190
Total_Sales           112380
Name: 0, dtype: object
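
`loc[0]` and `iloc[0]` return the same row above only because the default index labels (0, 1, 2, ...) coincide with the positions. A minimal sketch with a made-up index shows where they diverge; the labels 10, 20, 30 are hypothetical and chosen only to differ from the positions.

```python
import pandas as pd

df = pd.DataFrame(
    {'Customer_Name': ['Riya', 'Vikram', 'Sneha'],
     'Product': ['Headphones', 'Tablet', 'Tablet']},
    index=[10, 20, 30],   # hypothetical labels that differ from positions
)

first_by_position = df.iloc[0]   # first row, whatever its label is
row_by_label = df.loc[20]        # the row whose label is 20

# Slices differ too: loc includes the end label, iloc excludes the end position
loc_slice = df.loc[10:20]
iloc_slice = df.iloc[0:1]

print(first_by_position['Customer_Name'], row_by_label['Customer_Name'])
print(len(loc_slice), len(iloc_slice))
```

With this index, `df.loc[0]` would raise a `KeyError`, since no row carries the label 0.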

In [24]:
df.loc[0, 'Order_ID'] # select row label 0 and column 'Order_ID' in one loc call instead of chaining [0][0]

'ORD1001'

In [25]:
df.iloc[0, 2] # first row, third column (Customer_Name), both given as integer positions

'Riya'

In [28]:
df['Customer_Name']

0       Riya
1     Vikram
2      Sneha
3      Karan
4       Riya
5      Karan
6       Amit
7       Riya
8       Tina
9      Sneha
10     Sneha
11      Tina
12      Amit
13    Vikram
14      Amit
15     Karan
16     Karan
17    Vikram
18      Riya
19      Amit
Name: Customer_Name, dtype: object

In [30]:
# Accessing a single element by row label and column label using at

df.at[1, 'Product']

'Tablet'

In [31]:
df.at[1, 'Customer_Name']

'Vikram'

In [32]:
# Accessing a specified element using iat

df.iat[2,2]

'Sneha'
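
`at` and `iat` are the scalar counterparts of `loc` and `iloc`: the first takes labels, the second takes integer positions. The sketch below uses a three-row excerpt of the table above and also shows that `at` can assign a single cell; the assigned value is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Order_ID': ['ORD1001', 'ORD1002', 'ORD1003'],
    'Customer_Name': ['Riya', 'Vikram', 'Sneha'],
    'Product': ['Headphones', 'Tablet', 'Tablet'],
})

by_label = df.at[1, 'Product']   # row label 1, column label 'Product'
by_position = df.iat[1, 2]       # second row, third column

# at can also assign a single cell (hypothetical change, for illustration only)
df.at[2, 'Product'] = 'Laptop'

print(by_label, by_position, df.at[2, 'Product'])
```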

In [36]:
df

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Price_per_Unit,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,2,56190,112380
1,ORD1002,2025-01-02,Vikram,Tablet,5,79651,398255
2,ORD1003,2025-01-03,Sneha,Tablet,3,21249,63747
3,ORD1004,2025-01-04,Karan,Laptop,5,10603,53015
4,ORD1005,2025-01-05,Riya,Headphones,5,45618,228090
5,ORD1006,2025-01-06,Karan,Laptop,3,31519,94557
6,ORD1007,2025-01-07,Amit,Headphones,3,69390,208170
7,ORD1008,2025-01-08,Riya,Headphones,4,48472,193888
8,ORD1009,2025-01-09,Tina,Smartwatch,4,60723,242892
9,ORD1010,2025-01-10,Sneha,Smartwatch,1,48305,48305


In [37]:
# Data Manipulation with DataFrame
# Overwrite the Price_per_Unit column; the list must contain one value per row (20 here)

df['Price_per_Unit'] = [50000, 60000, 70000, 80000, 40000, 50000, 60000, 70000, 80000, 40000, 50000, 60000, 70000, 80000, 40000, 50000, 60000, 70000, 80000, 40000]
df

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Price_per_Unit,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,2,50000,112380
1,ORD1002,2025-01-02,Vikram,Tablet,5,60000,398255
2,ORD1003,2025-01-03,Sneha,Tablet,3,70000,63747
3,ORD1004,2025-01-04,Karan,Laptop,5,80000,53015
4,ORD1005,2025-01-05,Riya,Headphones,5,40000,228090
5,ORD1006,2025-01-06,Karan,Laptop,3,50000,94557
6,ORD1007,2025-01-07,Amit,Headphones,3,60000,208170
7,ORD1008,2025-01-08,Riya,Headphones,4,70000,193888
8,ORD1009,2025-01-09,Tina,Smartwatch,4,80000,242892
9,ORD1010,2025-01-10,Sneha,Smartwatch,1,40000,48305


In [41]:
# Remove a column (axis = 1 means columns); this returns a new DataFrame and leaves df unchanged

df.drop('Price_per_Unit', axis = 1)

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,2,112380
1,ORD1002,2025-01-02,Vikram,Tablet,5,398255
2,ORD1003,2025-01-03,Sneha,Tablet,3,63747
3,ORD1004,2025-01-04,Karan,Laptop,5,53015
4,ORD1005,2025-01-05,Riya,Headphones,5,228090
5,ORD1006,2025-01-06,Karan,Laptop,3,94557
6,ORD1007,2025-01-07,Amit,Headphones,3,208170
7,ORD1008,2025-01-08,Riya,Headphones,4,193888
8,ORD1009,2025-01-09,Tina,Smartwatch,4,242892
9,ORD1010,2025-01-10,Sneha,Smartwatch,1,48305


In [42]:
df

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Price_per_Unit,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,2,50000,112380
1,ORD1002,2025-01-02,Vikram,Tablet,5,60000,398255
2,ORD1003,2025-01-03,Sneha,Tablet,3,70000,63747
3,ORD1004,2025-01-04,Karan,Laptop,5,80000,53015
4,ORD1005,2025-01-05,Riya,Headphones,5,40000,228090
5,ORD1006,2025-01-06,Karan,Laptop,3,50000,94557
6,ORD1007,2025-01-07,Amit,Headphones,3,60000,208170
7,ORD1008,2025-01-08,Riya,Headphones,4,70000,193888
8,ORD1009,2025-01-09,Tina,Smartwatch,4,80000,242892
9,ORD1010,2025-01-10,Sneha,Smartwatch,1,40000,48305


In [43]:
# inplace = True modifies df directly instead of returning a new DataFrame
df.drop('Price_per_Unit', axis = 1, inplace = True)

In [44]:
df

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,2,112380
1,ORD1002,2025-01-02,Vikram,Tablet,5,398255
2,ORD1003,2025-01-03,Sneha,Tablet,3,63747
3,ORD1004,2025-01-04,Karan,Laptop,5,53015
4,ORD1005,2025-01-05,Riya,Headphones,5,228090
5,ORD1006,2025-01-06,Karan,Laptop,3,94557
6,ORD1007,2025-01-07,Amit,Headphones,3,208170
7,ORD1008,2025-01-08,Riya,Headphones,4,193888
8,ORD1009,2025-01-09,Tina,Smartwatch,4,242892
9,ORD1010,2025-01-10,Sneha,Smartwatch,1,48305
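
The two `drop` calls above behave differently: without `inplace`, `drop` returns a new DataFrame and the original keeps its columns; with `inplace = True`, the original is changed and the call returns `None`. A small sketch on a two-row excerpt, using the equivalent `columns=` keyword:

```python
import pandas as pd

df = pd.DataFrame({
    'Order_ID': ['ORD1001', 'ORD1002'],
    'Quantity': [2, 5],
    'Price_per_Unit': [56190, 79651],
})

without_price = df.drop(columns='Price_per_Unit')   # new DataFrame; df unchanged
print(list(df.columns))
print(list(without_price.columns))

result = df.drop(columns='Quantity', inplace=True)  # modifies df, returns None
print(result, list(df.columns))
```

Assigning the returned frame back (`df = df.drop(...)`) is a common alternative to `inplace = True`.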


In [47]:
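# Add 1 to every value in the Quantity column.
# Re-running this cell adds 1 again: the output below is 3 higher than the original
# values, which is consistent with the cell having been run three times.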
df['Quantity'] = df['Quantity'] + 1
df

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,5,112380
1,ORD1002,2025-01-02,Vikram,Tablet,8,398255
2,ORD1003,2025-01-03,Sneha,Tablet,6,63747
3,ORD1004,2025-01-04,Karan,Laptop,8,53015
4,ORD1005,2025-01-05,Riya,Headphones,8,228090
5,ORD1006,2025-01-06,Karan,Laptop,6,94557
6,ORD1007,2025-01-07,Amit,Headphones,6,208170
7,ORD1008,2025-01-08,Riya,Headphones,7,193888
8,ORD1009,2025-01-09,Tina,Smartwatch,7,242892
9,ORD1010,2025-01-10,Sneha,Smartwatch,4,48305
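
Arithmetic on a column is applied element-wise, so no explicit loop is needed. In the first rows of the original data, `Total_Sales` equals `Quantity` times `Price_per_Unit`, which the sketch below reproduces on a three-row excerpt. The column name `Quantity_plus_one` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Quantity': [2, 5, 3],
    'Price_per_Unit': [56190, 79651, 21249],
})

# Element-wise multiplication of two columns
df['Total_Sales'] = df['Quantity'] * df['Price_per_Unit']

# Adding a scalar applies it to every row; writing to a new column
# keeps the original values intact if the cell is run again
df['Quantity_plus_one'] = df['Quantity'] + 1   # hypothetical column name

print(df)
```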


In [48]:
# Remove a row by its index label (axis = 0, the default, means rows)
df.drop(0, inplace = True)

In [49]:
df

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Total_Sales
1,ORD1002,2025-01-02,Vikram,Tablet,8,398255
2,ORD1003,2025-01-03,Sneha,Tablet,6,63747
3,ORD1004,2025-01-04,Karan,Laptop,8,53015
4,ORD1005,2025-01-05,Riya,Headphones,8,228090
5,ORD1006,2025-01-06,Karan,Laptop,6,94557
6,ORD1007,2025-01-07,Amit,Headphones,6,208170
7,ORD1008,2025-01-08,Riya,Headphones,7,193888
8,ORD1009,2025-01-09,Tina,Smartwatch,7,242892
9,ORD1010,2025-01-10,Sneha,Smartwatch,4,48305
10,ORD1011,2025-01-11,Sneha,Tablet,4,51873
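
After the row labelled 0 is dropped, the remaining rows keep their original labels, which is why the output above starts at 1. If consecutive labels are wanted again, `reset_index(drop=True)` is one option. A sketch on a three-row excerpt:

```python
import pandas as pd

df = pd.DataFrame({
    'Order_ID': ['ORD1001', 'ORD1002', 'ORD1003'],
    'Product': ['Headphones', 'Tablet', 'Tablet'],
})

dropped = df.drop(0)                          # remove the row labelled 0
renumbered = dropped.reset_index(drop=True)   # relabel the rows 0..n-1

print(list(dropped.index))
print(list(renumbered.index))
```

`drop=True` discards the old labels; without it, they would be kept as a new column named `index`.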


In [50]:
df = pd.read_csv('sales_data.csv')
df.head(5)

Unnamed: 0,Order_ID,Date,Customer_Name,Product,Quantity,Price_per_Unit,Total_Sales
0,ORD1001,2025-01-01,Riya,Headphones,2,56190,112380
1,ORD1002,2025-01-02,Vikram,Tablet,5,79651,398255
2,ORD1003,2025-01-03,Sneha,Tablet,3,21249,63747
3,ORD1004,2025-01-04,Karan,Laptop,5,10603,53015


In [52]:
# Display the data types of each column
print("Data types: \n", df.dtypes)

# Describe the DataFrame
print("Statistical summary: \n", df.describe())

Data types: 
 Order_ID          object
Date              object
Customer_Name     object
Product           object
Quantity           int64
Price_per_Unit     int64
Total_Sales        int64
dtype: object
Statistical summary: 
        Quantity  Price_per_Unit    Total_Sales
count      4.00        4.000000       4.000000
mean       3.75    41923.250000  156849.250000
std        1.50    31808.311014  162996.759433
min        2.00    10603.000000   53015.000000
25%        2.75    18587.500000   61064.000000
50%        4.00    38719.500000   88063.500000
75%        5.00    62055.250000  183848.750000
max        5.00    79651.000000  398255.000000


In [53]:
df.describe()

Unnamed: 0,Quantity,Price_per_Unit,Total_Sales
count,4.0,4.0,4.0
mean,3.75,41923.25,156849.25
std,1.5,31808.311014,162996.759433
min,2.0,10603.0,53015.0
25%,2.75,18587.5,61064.0
50%,4.0,38719.5,88063.5
75%,5.0,62055.25,183848.75
max,5.0,79651.0,398255.0
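
Two details of the output above are worth a sketch. `Date` is listed as `object`, meaning it was read as plain strings, and `describe()` by default summarizes only the numeric columns. The example below, on a four-row excerpt of the same data, converts the dates with `pd.to_datetime` and calls `describe()` on a text column, which reports counts and the most frequent value instead of a mean.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04'],
    'Product': ['Headphones', 'Tablet', 'Tablet', 'Laptop'],
    'Quantity': [2, 5, 3, 5],
})

df['Date'] = pd.to_datetime(df['Date'])   # strings -> datetime64

quantity_summary = df['Quantity'].describe()   # count, mean, std, min, quartiles, max
product_summary = df['Product'].describe()     # count, unique, top, freq

print(df.dtypes)
print(quantity_summary)
print(product_summary)
```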
