#### Pandas Usage 

##### pandas has two main data structures
1. Series - a one-dimensional labeled array that can hold any data type
2. DataFrame - a two-dimensional labeled table with rows and columns, where each column can have its own data type

`pd.Series` creates a Series object

In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.Series([1,3,4,5,6,67,5,np.nan,10])

0     1.0
1     3.0
2     4.0
3     5.0
4     6.0
5    67.0
6     5.0
7     NaN
8    10.0
dtype: float64
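The `float64` dtype in the output above comes from the `np.nan` entry: NaN is a floating-point value, so pandas upcasts the integers around it. A minimal sketch comparing the two cases (the variable names `s_int` and `s_nan` are illustrative, not part of the notebook):

```python
import numpy as np
import pandas as pd

# All-integer input keeps an integer dtype.
s_int = pd.Series([1, 3, 4])

# A single np.nan forces the whole Series to a float dtype.
s_nan = pd.Series([1, 3, np.nan])

print(s_int.dtype)
print(s_nan.dtype)
print(s_nan.isna().sum())  # number of missing values in s_nan
```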

`pd.DataFrame` creates a DataFrame object

In [3]:
pd.DataFrame([[1,2,3,4,5,6,67,5,np.nan,10],[1,2,3,np.nan,5,None,67,5,np.nan,10]])

   0  1  2    3  4    5   6  7   8   9
0  1  2  3  4.0  5  6.0  67  5 NaN  10
1  1  2  3  NaN  5  NaN  67  5 NaN  10


In [4]:
df_ex = pd.DataFrame({
    'A':[1.0,3,4,5],
    'B':[5,6,8,9],
    'C': [4,5,7,2],
})
df_ex

     A  B  C
0  1.0  5  4
1  3.0  6  5
2  4.0  8  7
3  5.0  9  2


`pd.date_range(start, periods=n)` returns a `DatetimeIndex` of `n` dates beginning at `start`. Here the start date is written in **yyyymmdd** format, and the default frequency is one day

In [5]:
pd.date_range('20260101',periods=24)

DatetimeIndex(['2026-01-01', '2026-01-02', '2026-01-03', '2026-01-04',
               '2026-01-05', '2026-01-06', '2026-01-07', '2026-01-08',
               '2026-01-09', '2026-01-10', '2026-01-11', '2026-01-12',
               '2026-01-13', '2026-01-14', '2026-01-15', '2026-01-16',
               '2026-01-17', '2026-01-18', '2026-01-19', '2026-01-20',
               '2026-01-21', '2026-01-22', '2026-01-23', '2026-01-24'],
              dtype='datetime64[ns]', freq='D')
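As a small sketch of how such a range might be used, the block below changes the step with the `freq` parameter and uses a date range as the row index of a dataframe. The names `days`, `weekly`, and `df_dates` and the `value` column are made up for illustration.

```python
import pandas as pd

# Three consecutive days starting on 2026-01-01 (daily frequency is the default).
days = pd.date_range('20260101', periods=3)

# The same start date, stepping seven days at a time.
weekly = pd.date_range('20260101', periods=3, freq='7D')

# A DatetimeIndex can serve as the row index of a DataFrame.
df_dates = pd.DataFrame({'value': [10, 20, 30]}, index=days)

print(weekly)
print(df_dates)
```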

The columns in a dataframe can have different data types; for example, one column can hold floats and another integers. We can check this using `df.dtypes`

In [6]:
df_ex.dtypes

A    float64
B      int64
C      int64
dtype: object

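We can also access a column with attribute (dot) access, for example `df_ex.A`, as long as the column name is a valid Python identifier (it must not contain spaces, for instance). Bracket access such as `df_ex['A']` works for any column name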
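A short sketch of the two ways to select a column. The frame `df_demo` is a hypothetical stand-in built for this example; its `Order ID` column only mimics a name that contains a space.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'A': [1.0, 3, 4, 5],
    'Order ID': ['x1', 'x2', 'x3', 'x4'],  # hypothetical column with a space in its name
})

# Dot notation works when the name is a valid Python identifier.
col_attr = df_demo.A

# Bracket notation works for any column name, including ones with spaces.
col_item = df_demo['A']
order_ids = df_demo['Order ID']

print(col_attr.equals(col_item))
print(order_ids.tolist())
```

Bracket notation is the safer default, since dot notation also fails when a column name collides with an existing DataFrame attribute or method.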

We can read a CSV file using `pd.read_csv(filepath)`, which returns a dataframe built directly from the CSV data

Here we use the superstore data

In [7]:
df_store = pd.read_csv('../datasets/superstore_excel.csv')
df_store

Unnamed: 0,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Region,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,CA-198797,2024-10-06,2025-05-01,Second Class,CUST-4170,Ruben Ramirez,Home Office,South,Technology,Copiers,Virtual didactic synergy,2467.49,9,0.29,617.24
1,CA-233440,2023-06-12,2025-05-05,First Class,CUST-1196,Jesse Ortega,Home Office,Central,Furniture,Furnishings,Operative reciprocal projection,2751.66,1,0.09,922.14
2,CA-691533,2024-04-18,2025-04-24,Standard Class,CUST-2985,Kathryn Castro,Corporate,East,Furniture,Chairs,Multi-lateral incremental knowledge user,1199.66,9,0.03,220.44
3,CA-277218,2023-11-02,2025-05-07,First Class,CUST-2412,Sharon Taylor,Consumer,South,Office Supplies,Labels,Integrated bifurcated groupware,2335.57,6,0.30,620.16
4,CA-311967,2023-07-15,2025-04-14,First Class,CUST-3728,David Murillo,Corporate,South,Office Supplies,Binders,Re-contextualized human-resource benchmark,2540.88,5,0.03,258.06
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,CA-780853,2025-03-04,2025-04-13,Second Class,CUST-8307,Adam Scott,Home Office,West,Office Supplies,Art,Polarized 3rdgeneration frame,3322.63,4,0.37,-14.44
4996,CA-489090,2023-06-20,2025-04-28,Same Day,CUST-1311,Kathryn Miller,Home Office,South,Furniture,Chairs,Enhanced contextually-based focus group,4518.10,1,0.37,561.98
4997,CA-203624,2024-03-11,2025-04-23,Standard Class,CUST-6014,Tiffany Moreno,Corporate,East,Technology,Accessories,Fundamental human-resource capacity,4631.15,4,0.47,356.98
4998,CA-562734,2024-11-04,2025-04-20,Same Day,CUST-1274,Yolanda Conner,Corporate,West,Furniture,Chairs,Optimized holistic approach,1691.66,3,0.35,462.50
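`pd.read_csv` also accepts a file-like object, which makes it easy to try out without the dataset file. The sketch below reads a tiny in-memory CSV (two rows and four columns copied from the output above) and keeps only some columns with `usecols`; the names `csv_text` and `df_small` are illustrative.

```python
import io
import pandas as pd

csv_text = """Order ID,Order Date,Sales,Quantity
CA-198797,2024-10-06,2467.49,9
CA-233440,2023-06-12,2751.66,1
"""

# io.StringIO stands in for a file on disk here.
df_small = pd.read_csv(io.StringIO(csv_text), usecols=['Order ID', 'Sales', 'Quantity'])

print(df_small)
print(df_small.dtypes)
```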


To view the data we have `df.head()` and `df.tail()`, which take a number `n` and show the first or last `n` rows (5 by default)

In [8]:
df_store.head(8)

Unnamed: 0,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Region,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,CA-198797,2024-10-06,2025-05-01,Second Class,CUST-4170,Ruben Ramirez,Home Office,South,Technology,Copiers,Virtual didactic synergy,2467.49,9,0.29,617.24
1,CA-233440,2023-06-12,2025-05-05,First Class,CUST-1196,Jesse Ortega,Home Office,Central,Furniture,Furnishings,Operative reciprocal projection,2751.66,1,0.09,922.14
2,CA-691533,2024-04-18,2025-04-24,Standard Class,CUST-2985,Kathryn Castro,Corporate,East,Furniture,Chairs,Multi-lateral incremental knowledge user,1199.66,9,0.03,220.44
3,CA-277218,2023-11-02,2025-05-07,First Class,CUST-2412,Sharon Taylor,Consumer,South,Office Supplies,Labels,Integrated bifurcated groupware,2335.57,6,0.3,620.16
4,CA-311967,2023-07-15,2025-04-14,First Class,CUST-3728,David Murillo,Corporate,South,Office Supplies,Binders,Re-contextualized human-resource benchmark,2540.88,5,0.03,258.06
5,CA-990541,2023-09-20,2025-04-17,Same Day,CUST-6746,Gabriel Donaldson,Home Office,East,Office Supplies,Storage,Programmable grid-enabled benchmark,1454.47,4,0.42,272.28
6,CA-500069,2024-11-03,2025-05-08,Standard Class,CUST-6658,Colleen Rose,Corporate,West,Office Supplies,Storage,Switchable secondary database,4381.97,2,0.37,749.22
7,CA-701472,2023-08-12,2025-04-23,First Class,CUST-4421,Grant Garcia,Home Office,East,Office Supplies,Storage,Public-key human-resource installation,1152.75,3,0.46,426.2


In [9]:
df_store.tail(7)

Unnamed: 0,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Region,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
4993,CA-743555,2023-11-11,2025-04-26,Second Class,CUST-3878,Mary Ramirez,Consumer,West,Technology,Copiers,Digitized attitude-oriented throughput,153.24,10,0.39,632.05
4994,CA-450931,2024-12-23,2025-04-25,Second Class,CUST-5025,Lisa Moore,Corporate,South,Technology,Accessories,Cross-platform local leverage,2515.67,1,0.19,63.2
4995,CA-780853,2025-03-04,2025-04-13,Second Class,CUST-8307,Adam Scott,Home Office,West,Office Supplies,Art,Polarized 3rdgeneration frame,3322.63,4,0.37,-14.44
4996,CA-489090,2023-06-20,2025-04-28,Same Day,CUST-1311,Kathryn Miller,Home Office,South,Furniture,Chairs,Enhanced contextually-based focus group,4518.1,1,0.37,561.98
4997,CA-203624,2024-03-11,2025-04-23,Standard Class,CUST-6014,Tiffany Moreno,Corporate,East,Technology,Accessories,Fundamental human-resource capacity,4631.15,4,0.47,356.98
4998,CA-562734,2024-11-04,2025-04-20,Same Day,CUST-1274,Yolanda Conner,Corporate,West,Furniture,Chairs,Optimized holistic approach,1691.66,3,0.35,462.5
4999,CA-852804,2024-03-22,2025-04-24,First Class,CUST-6238,Michael Giles,Home Office,East,Furniture,Chairs,Team-oriented 4thgeneration task-force,4450.05,8,0.0,-4.91
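A minimal sketch of how the argument to `head` and `tail` behaves, using a made-up ten-row frame `df_nums`:

```python
import pandas as pd

df_nums = pd.DataFrame({'n': range(10)})

first_default = df_nums.head()        # no argument: first 5 rows
last_three = df_nums.tail(3)          # last 3 rows
all_but_last_two = df_nums.head(-2)   # negative n: everything except the last 2 rows

print(first_default['n'].tolist())
print(last_three['n'].tolist())
print(len(all_but_last_two))
```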


To display the row index and the column labels, use `df.index` and `df.columns`

In [10]:
df_store.index

RangeIndex(start=0, stop=5000, step=1)

In [11]:
df_store.columns

Index(['Order ID', 'Order Date', 'Ship Date', 'Ship Mode', 'Customer ID',
       'Customer Name', 'Segment', 'Region', 'Category', 'Sub-Category',
       'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'],
      dtype='object')

We can also convert the dataframe into a NumPy array using `df.to_numpy()`. Because the columns here have mixed types, the resulting array has `dtype=object`

In [12]:
df_store.to_numpy()

array([['CA-198797', '2024-10-06', '2025-05-01', ..., 9, 0.29, 617.24],
       ['CA-233440', '2023-06-12', '2025-05-05', ..., 1, 0.09, 922.14],
       ['CA-691533', '2024-04-18', '2025-04-24', ..., 9, 0.03, 220.44],
       ...,
       ['CA-203624', '2024-03-11', '2025-04-23', ..., 4, 0.47, 356.98],
       ['CA-562734', '2024-11-04', '2025-04-20', ..., 3, 0.35, 462.5],
       ['CA-852804', '2024-03-22', '2025-04-24', ..., 8, 0.0, -4.91]],
      dtype=object)
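The `dtype=object` in the output above is a consequence of mixing strings and numbers in one array. A small sketch, assuming a made-up frame `df_mixed`, shows that selecting only the numeric columns first yields a numeric array instead:

```python
import pandas as pd

df_mixed = pd.DataFrame({
    'name': ['a', 'b'],
    'qty': [9, 1],
    'sales': [2467.49, 2751.66],
})

# Strings and numbers together: NumPy falls back to the generic object dtype.
arr_all = df_mixed.to_numpy()

# Integers and floats only: the common dtype is a float.
arr_num = df_mixed[['qty', 'sales']].to_numpy()

print(arr_all.dtype)
print(arr_num.dtype)
```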

In [24]:
df_store.dtypes

Order ID          object
Order Date        object
Ship Date         object
Ship Mode         object
Customer ID       object
Customer Name     object
Segment           object
Region            object
Category          object
Sub-Category      object
Product Name      object
Sales            float64
Quantity           int64
Discount         float64
Profit           float64
dtype: object
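In the dtypes above, `Order Date` and `Ship Date` are loaded as `object` (plain strings). One way to turn them into real dates is `pd.to_datetime`; the sketch below applies it to a two-row stand-in frame (`df_orders`, with values copied from the first two rows shown earlier) and derives a hypothetical `Days to Ship` column.

```python
import pandas as pd

df_orders = pd.DataFrame({
    'Order Date': ['2024-10-06', '2023-06-12'],
    'Ship Date': ['2025-05-01', '2025-05-05'],
})

# Convert the string columns to datetime64 columns.
for col in ['Order Date', 'Ship Date']:
    df_orders[col] = pd.to_datetime(df_orders[col])

# Datetime columns support arithmetic, e.g. days between order and shipment.
df_orders['Days to Ship'] = (df_orders['Ship Date'] - df_orders['Order Date']).dt.days

print(df_orders.dtypes)
print(df_orders)
```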

`df.describe()` gives summary statistics (count, mean, standard deviation, min, quartiles, and max) for the numerical columns

In [13]:
df_store.describe()

Unnamed: 0,Sales,Quantity,Discount,Profit
count,5000.0,5000.0,5000.0,5000.0
mean,2501.363102,5.5014,0.250084,447.942226
std,1424.757879,2.897737,0.144133,318.285948
min,7.37,1.0,0.0,-99.96
25%,1275.3475,3.0,0.13,174.525
50%,2482.83,6.0,0.25,444.955
75%,3733.71,8.0,0.38,725.1575
max,4997.57,10.0,0.5,999.77
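By default `describe` skips the text columns. As a sketch, passing `include='all'` also summarizes them with `unique`, `top`, and `freq` rows; the frame `df_demo` below is a made-up three-row example.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Category': ['Furniture', 'Technology', 'Furniture'],
    'Sales': [1199.66, 2467.49, 2751.66],
})

num_summary = df_demo.describe()               # numeric columns only
all_summary = df_demo.describe(include='all')  # numeric and text columns

print(num_summary)
print(all_summary)
```

Statistics that do not apply to a column (for example `mean` for `Category`) appear as NaN in the combined summary.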


`df.info()` gives a summary of the dataframe: the number of rows, each column's dtype and non-null count, and the memory usage

In [14]:
df_store.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 15 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Order ID       5000 non-null   object 
 1   Order Date     5000 non-null   object 
 2   Ship Date      5000 non-null   object 
 3   Ship Mode      5000 non-null   object 
 4   Customer ID    5000 non-null   object 
 5   Customer Name  5000 non-null   object 
 6   Segment        5000 non-null   object 
 7   Region         5000 non-null   object 
 8   Category       5000 non-null   object 
 9   Sub-Category   5000 non-null   object 
 10  Product Name   5000 non-null   object 
 11  Sales          5000 non-null   float64
 12  Quantity       5000 non-null   int64  
 13  Discount       5000 non-null   float64
 14  Profit         5000 non-null   float64
dtypes: float64(3), int64(1), object(11)
memory usage: 586.1+ KB
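`df.info()` prints its report rather than returning it. When the null counts are needed as data, one option is `df.isna().sum()`. The sketch below uses a made-up frame `df_gaps` with one missing value per column.

```python
import numpy as np
import pandas as pd

df_gaps = pd.DataFrame({
    'Sales': [2467.49, np.nan, 1199.66],
    'Region': ['South', 'Central', None],
})

missing_per_column = df_gaps.isna().sum()   # count of missing values per column
non_null_per_column = df_gaps.count()       # count of non-null values per column

print(missing_per_column)
print(non_null_per_column)
```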


`df.shape` gives the shape of the dataframe as a `(rows, columns)` tuple

In [15]:
df_store.shape

(5000, 15)

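We can use two different sort methods on a dataframe.
`df.sort_index(axis, ascending, inplace)` sorts a dataframe by its labels: row labels with `axis=0` (the default) or column labels with `axis=1`, as in the example below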

In [16]:
df_store.sort_index(axis=1,ascending=True)

Unnamed: 0,Category,Customer ID,Customer Name,Discount,Order Date,Order ID,Product Name,Profit,Quantity,Region,Sales,Segment,Ship Date,Ship Mode,Sub-Category
0,Technology,CUST-4170,Ruben Ramirez,0.29,2024-10-06,CA-198797,Virtual didactic synergy,617.24,9,South,2467.49,Home Office,2025-05-01,Second Class,Copiers
1,Furniture,CUST-1196,Jesse Ortega,0.09,2023-06-12,CA-233440,Operative reciprocal projection,922.14,1,Central,2751.66,Home Office,2025-05-05,First Class,Furnishings
2,Furniture,CUST-2985,Kathryn Castro,0.03,2024-04-18,CA-691533,Multi-lateral incremental knowledge user,220.44,9,East,1199.66,Corporate,2025-04-24,Standard Class,Chairs
3,Office Supplies,CUST-2412,Sharon Taylor,0.30,2023-11-02,CA-277218,Integrated bifurcated groupware,620.16,6,South,2335.57,Consumer,2025-05-07,First Class,Labels
4,Office Supplies,CUST-3728,David Murillo,0.03,2023-07-15,CA-311967,Re-contextualized human-resource benchmark,258.06,5,South,2540.88,Corporate,2025-04-14,First Class,Binders
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,Office Supplies,CUST-8307,Adam Scott,0.37,2025-03-04,CA-780853,Polarized 3rdgeneration frame,-14.44,4,West,3322.63,Home Office,2025-04-13,Second Class,Art
4996,Furniture,CUST-1311,Kathryn Miller,0.37,2023-06-20,CA-489090,Enhanced contextually-based focus group,561.98,1,South,4518.10,Home Office,2025-04-28,Same Day,Chairs
4997,Technology,CUST-6014,Tiffany Moreno,0.47,2024-03-11,CA-203624,Fundamental human-resource capacity,356.98,4,East,4631.15,Corporate,2025-04-23,Standard Class,Accessories
4998,Furniture,CUST-1274,Yolanda Conner,0.35,2024-11-04,CA-562734,Optimized holistic approach,462.50,3,West,1691.66,Corporate,2025-04-20,Same Day,Chairs
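A small sketch of both axes of `sort_index`, on a made-up frame `df_demo` whose rows and columns start out unordered:

```python
import pandas as pd

df_demo = pd.DataFrame({'B': [5, 6, 8], 'A': [1.0, 3, 4]}, index=[2, 0, 1])

# axis=1 reorders the columns by label: A, B.
by_columns = df_demo.sort_index(axis=1)

# axis=0 reorders the rows by label; ascending=False gives 2, 1, 0.
by_rows_desc = df_demo.sort_index(axis=0, ascending=False)

print(by_columns)
print(by_rows_desc)
```

Both calls return a new dataframe and leave `df_demo` unchanged, because `inplace` defaults to `False`.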


`df.sort_values(by)` sorts the rows by their values, where `by` takes a single column name or a list of column names to sort on

In [17]:
df_store.sort_values('Category')

Unnamed: 0,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Region,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
4999,CA-852804,2024-03-22,2025-04-24,First Class,CUST-6238,Michael Giles,Home Office,East,Furniture,Chairs,Team-oriented 4thgeneration task-force,4450.05,8,0.00,-4.91
1228,CA-600198,2023-05-05,2025-04-09,First Class,CUST-2982,Jared Long,Consumer,East,Furniture,Bookcases,Innovative human-resource collaboration,1924.52,10,0.18,674.07
2905,CA-226163,2025-02-03,2025-04-17,Standard Class,CUST-1558,Kevin Landry,Home Office,South,Furniture,Tables,Optional encompassing task-force,740.75,6,0.11,694.33
1226,CA-177980,2024-06-07,2025-04-27,First Class,CUST-3411,Larry Joseph,Home Office,West,Furniture,Furnishings,Universal composite knowledge user,2934.57,3,0.24,614.01
1224,CA-197875,2024-08-30,2025-04-23,Standard Class,CUST-8835,Laura French,Corporate,West,Furniture,Chairs,Team-oriented dynamic encoding,1207.10,4,0.23,586.56
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3484,CA-666809,2024-11-23,2025-04-21,Second Class,CUST-3882,Erin Reyes,Consumer,East,Technology,Accessories,Digitized bifurcated projection,4843.24,1,0.04,336.18
1486,CA-698553,2023-11-13,2025-04-27,Standard Class,CUST-2329,Daniel Brown,Consumer,South,Technology,Accessories,Focused client-server encryption,2024.43,6,0.36,267.96
3480,CA-195246,2023-10-09,2025-04-19,Second Class,CUST-2644,Sharon Lewis,Consumer,Central,Technology,Copiers,Versatile local ability,2012.00,7,0.31,893.16
1473,CA-599683,2023-08-24,2025-04-22,First Class,CUST-1869,Jacob Strickland,Home Office,West,Technology,Accessories,Ameliorated secondary website,1166.02,9,0.22,958.02
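When `by` is a list, `ascending` can also be a list with one flag per column. A minimal sketch on a made-up four-row frame `df_demo`, sorting by `Category` ascending and then by `Sales` descending within each category:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Category': ['Technology', 'Furniture', 'Furniture', 'Technology'],
    'Sales': [2467.49, 1199.66, 2751.66, 153.24],
})

# Category A-to-Z first, then the largest Sales first within each category.
sorted_df = df_demo.sort_values(by=['Category', 'Sales'], ascending=[True, False])

print(sorted_df)
```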
