# What is Pandas?

**Pandas** is a powerful open-source Python library used for data manipulation and analysis. It provides flexible data structures like `DataFrame` and `Series` that make it easy to clean, explore, and analyze data.

## Common Use Cases

Pandas is commonly used for:

- Reading data from CSV, Excel, SQL, and other formats
- Cleaning and transforming data (handling missing values, filtering, grouping)
- Statistical analysis and aggregation
- Data visualization (in combination with libraries like Matplotlib)
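The read, clean, transform and aggregate steps in this list can be sketched end to end. This is a minimal example only. The CSV and its columns `product`, `category`, `price` and `units` are hypothetical and made up for illustration. `io.StringIO` stands in for a file path so the sketch runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file; one price is missing.
csv_text = """product,category,price,units
Pen,Stationery,1.5,10
Notebook,Stationery,,4
Mug,Kitchen,8.0,2
Kettle,Kitchen,25.0,1
"""

# Read: pd.read_csv accepts any file-like object, not only paths.
df = pd.read_csv(io.StringIO(csv_text))

# Clean: fill the missing price with the median of the known prices.
df["price"] = df["price"].fillna(df["price"].median())

# Transform: derive a revenue column (vectorized, no Python loop).
df["revenue"] = df["price"] * df["units"]

# Filter + aggregate: total revenue per category, only rows with more than 1 unit.
summary = df[df["units"] > 1].groupby("category")["revenue"].sum()
print(summary)
```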



In [7]:
## Series
## A pandas Series is a one-dimensional labeled array that can hold any data type.

import pandas as pd
data = [1,2,3,4,5]
series1 = pd.Series(data)
print("series:", series1)

series: 0    1
1    2
2    3
3    4
4    5
dtype: int64
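Besides printing, a Series exposes its index and its values separately, and arithmetic on it is applied element-wise. Here is a short sketch using the same `[1, 2, 3, 4, 5]` data:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4, 5])

# The default index is 0..n-1; the values are backed by a NumPy array.
print(list(series1.index))   # [0, 1, 2, 3, 4]
print(series1.to_numpy())    # [1 2 3 4 5]

# Arithmetic is vectorized: applied to every element without a Python loop.
doubled = series1 * 2
print(doubled.tolist())      # [2, 4, 6, 8, 10]

# Reductions collapse the Series to a single value.
print(series1.sum(), series1.mean())   # 15 3.0
```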


In [12]:
## creating a Series from a dict

data_dict = {
    'name': "kirtan",
    'gpa': "5",
    'package': "10LPA"
}


series_dict = pd.Series(data_dict)
print("series:\n", series_dict)  ## the dict keys become the index labels
print(type(series_dict))

series:
 name       kirtan
gpa             5
package     10LPA
dtype: object
<class 'pandas.core.series.Series'>
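When `index=` is passed together with a dict, pandas uses it to pick and order the keys. Any label without a matching key becomes `NaN`. The following sketch reuses the dict from the cell above. The extra label `"city"` is hypothetical and is added only to show the missing-key case.

```python
import pandas as pd

data_dict = {"name": "kirtan", "gpa": "5", "package": "10LPA"}

# index= selects and reorders keys; "city" is not in the dict, so it becomes NaN.
s = pd.Series(data_dict, index=["package", "name", "city"])
print(s)
```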


In [13]:
# customize index

data = [10,20,30,40,50]
index = ['a','b','c','d','e']

series_custom_index = pd.Series(data, index=index)
print("Series with custom index:\n", series_custom_index)

Series with custom index:
 a    10
b    20
c    30
d    40
e    50
dtype: int64
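A custom index lets you look values up by label, while `.iloc` still works by position. The two styles also slice differently, as this short sketch on the same data shows:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

print(s["c"])        # label-based lookup -> 30
print(s.loc["c"])    # same, with the explicit label indexer -> 30
print(s.iloc[2])     # position-based lookup (third element) -> 30

print(s.loc["b":"d"].tolist())   # label slices INCLUDE the end label -> [20, 30, 40]
print(s.iloc[1:3].tolist())      # position slices EXCLUDE the end -> [20, 30]
```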


In [56]:
## DataFrame
## create a DataFrame from a dictionary of lists (each key becomes a column)
data_dict = {
    'name': ["kirtan", "kunal", "vishal"],
    'gpa': ["5", "4", "3"],
    'age':[21,22,24],
    'package': ["10LPA", "8LPA", "6LPA"],
    'city':['karachi','lahore','islamabad']

}

df = pd.DataFrame(data_dict)
print(df)
print(type(df))



     name gpa  age package       city
0  kirtan   5   21   10LPA    karachi
1   kunal   4   22    8LPA     lahore
2  vishal   3   24    6LPA  islamabad
<class 'pandas.core.frame.DataFrame'>
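In the cell above, `gpa` is stored as strings (`"5"`, `"4"`, `"3"`), so numeric operations on it will not behave as expected. One way to inspect and fix this is sketched below on a subset of the same columns. It uses `pd.to_numeric`. `astype(int)` would also work for clean data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["kirtan", "kunal", "vishal"],
    "gpa": ["5", "4", "3"],
    "age": [21, 22, 24],
})

print(df.shape)    # (3, 3): (rows, columns)
print(df.dtypes)   # gpa holds strings (object dtype in pandas 2.x); age is int64

# Convert the string column to numbers so arithmetic works on it.
df["gpa"] = pd.to_numeric(df["gpa"])
print(df["gpa"].mean())   # 4.0
```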


In [21]:
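import numpy as np
np.array(df)  ## mixed column types are coerced to one common dtype (object); df.to_numpy() gives the same result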

array([['kirtan', '5', 21, '10LPA', 'karachi'],
       ['kunal', '4', 22, '8LPA', 'lahore'],
       ['vishal', '3', 24, '6LPA', 'islamabad']], dtype=object)
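The `dtype=object` above comes from mixing strings and integers in one array. If you only need the numbers, you can select the numeric columns first to keep a numeric array. In this sketch, the `Salary` values are borrowed from a later cell in this notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["kirtan", "kunal", "vishal"],
    "age": [21, 22, 24],
    "Salary": [50000, 60000, 80000],
})

# Mixed column types force a single common dtype: object.
print(df.to_numpy().dtype)   # object

# Keeping only numeric columns preserves an integer array.
numeric = df.select_dtypes(include="number").to_numpy()
print(numeric.dtype)         # int64
print(numeric)
```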

In [22]:
## create a dataframe from list of dictionaries
data = [
    {'name': "kirtan", 'gpa': "5", 'age': 21, 'package': "10LPA", 'city': 'karachi'},
    {'name': "kunal", 'gpa': "4", 'age': 22, 'package': "8LPA", 'city': 'lahore'},
    {'name': "vishal", 'gpa': "3", 'age': 24, 'package': "6LPA", 'city': 'islamabad'}
]

df = pd.DataFrame(data)
print(df)
print(type(df))


     name gpa  age package       city
0  kirtan   5   21   10LPA    karachi
1   kunal   4   22    8LPA     lahore
2  vishal   3   24    6LPA  islamabad
<class 'pandas.core.frame.DataFrame'>
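With a list of dictionaries, the records do not all need the same keys. The columns become the union of the keys, and any gaps are filled with `NaN`. Here is a minimal sketch with two hypothetical records:

```python
import pandas as pd

records = [
    {"name": "kirtan", "age": 21, "city": "karachi"},
    {"name": "kunal", "age": 22},   # no "city" key in this record
]

df = pd.DataFrame(records)
print(df)
print(df["city"].isna().tolist())   # [False, True]
```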


In [23]:
## reading from a CSV file
sales_data = pd.read_csv('sales_data.csv')
sales_data.head()  ## first 5 rows

| | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | North America | Credit Card |
| 1 | 10002 | 2024-01-02 | Home Appliances | Dyson V11 Vacuum | 1 | 499.99 | 499.99 | Europe | PayPal |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Asia | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | North America | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | Europe | PayPal |


In [24]:
sales_data.tail()  # last 5 rows

| | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 235 | 10236 | 2024-08-23 | Home Appliances | Nespresso Vertuo Next Coffee and Espresso Maker | 1 | 159.99 | 159.99 | Europe | PayPal |
| 236 | 10237 | 2024-08-24 | Clothing | Nike Air Force 1 Sneakers | 3 | 90.0 | 270.0 | Asia | Debit Card |
| 237 | 10238 | 2024-08-25 | Books | The Handmaid's Tale by Margaret Atwood | 3 | 10.99 | 32.97 | North America | Credit Card |
| 238 | 10239 | 2024-08-26 | Beauty Products | Sunday Riley Luna Sleeping Night Oil | 1 | 55.0 | 55.0 | Europe | PayPal |
| 239 | 10240 | 2024-08-27 | Sports | Yeti Rambler 20 oz Tumbler | 2 | 29.99 | 59.98 | Asia | Credit Card |
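`read_csv` takes options that are often useful for a file like this one. `index_col` turns a column into the row index, and `parse_dates` turns date strings into real datetimes. The sketch below inlines three rows copied from the output above, with some columns omitted, so it runs without `sales_data.csv`. Which options suit the real file is an assumption here.

```python
import io

import pandas as pd

# Three rows copied from the head() output above, inlined so no file is needed.
csv_text = """Transaction ID,Date,Product Category,Units Sold,Unit Price,Total Revenue
10001,2024-01-01,Electronics,2,999.99,1999.98
10002,2024-01-02,Home Appliances,1,499.99,499.99
10003,2024-01-03,Clothing,3,69.99,209.97
"""

sales = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Transaction ID",   # use the ID column as the row index
    parse_dates=["Date"],         # parse Date as datetimes instead of strings
)

print(sales.head(2))   # head(n) / tail(n) take a row count; the default is 5
print(sales["Date"].dtype)
```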


In [27]:
### Accessing Data From a DataFrame

df

| | name | gpa | age | package | city |
|---|---|---|---|---|---|
| 0 | kirtan | 5 | 21 | 10LPA | karachi |
| 1 | kunal | 4 | 22 | 8LPA | lahore |
| 2 | vishal | 3 | 24 | 6LPA | islamabad |


In [None]:
print(df['name'])  # accessing the column
print()
print(type(df['name']))  ## a single column is returned as a Series

0    kirtan
1     kunal
2    vishal
Name: name, dtype: object

<class 'pandas.core.series.Series'>
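The bracket form decides the return type. A single label gives a Series, as shown above, while a list of labels gives a DataFrame, even when the list has only one label. Here is a small sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["kirtan", "kunal", "vishal"],
    "age": [21, 22, 24],
})

one = df["name"]             # single label -> Series
many = df[["name", "age"]]   # list of labels -> DataFrame
print(type(one).__name__, type(many).__name__)   # Series DataFrame
```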


In [32]:
df.loc[0]  ## the row whose index label is 0 (.loc selects by label, .iloc by position)

name        kirtan
gpa              5
age             21
package      10LPA
city       karachi
Name: 0, dtype: object

In [34]:
## the chained form df.loc[0][0] emits a FutureWarning in recent pandas; use .iloc for positions
df.loc[0].iloc[0]  ## first element of the row labelled 0

'kirtan'

In [35]:
df.iloc[0, 1]  ## row 0, column 1 by position: the second element of the first row

'5'
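With the default `0..n-1` index, `.loc` and `.iloc` look interchangeable. The difference only becomes visible once the row labels are something else. The following sketch uses hypothetical row labels `10, 20, 30`:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["kirtan", "kunal", "vishal"], "gpa": ["5", "4", "3"]},
    index=[10, 20, 30],   # hypothetical non-default row labels
)

print(df.loc[20, "name"])   # by label: the row labelled 20 -> 'kunal'
print(df.iloc[1, 0])        # by position: 2nd row, 1st column -> 'kunal'
print(df.iloc[0, 1])        # 1st row, 2nd column -> '5'

# No row is labelled 0 here, so a label lookup for 0 fails.
try:
    df.loc[0]
except KeyError:
    print("no row labelled 0")
```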

In [39]:
## Accessing a single element with .at (row label, column label)

df.at[0,'name']

'kirtan'

In [40]:
df.at[1,'package']

'8LPA'

In [42]:
## .iat: same idea, but by integer position (row 1, column 3)
df.iat[1,3]

'8LPA'
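`.at` and `.iat` work for writing as well as reading, but they handle exactly one cell at a time. `.loc` and `.iloc` also accept slices and lists. The sketch below uses a hypothetical new value, `"9LPA"`:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["kirtan", "kunal", "vishal"],
    "package": ["10LPA", "8LPA", "6LPA"],
})

df.at[1, "package"] = "9LPA"   # set one cell by label
print(df.iat[1, 1])            # read it back by position -> '9LPA'

# .loc label slices include the end label: rows 0 and 1.
print(df.loc[0:1, "package"].tolist())   # ['10LPA', '9LPA']
```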

In [43]:
## data manipulation with DataFrame

df

| | name | gpa | age | package | city |
|---|---|---|---|---|---|
| 0 | kirtan | 5 | 21 | 10LPA | karachi |
| 1 | kunal | 4 | 22 | 8LPA | lahore |
| 2 | vishal | 3 | 24 | 6LPA | islamabad |


In [None]:
# add a new column 'Salary' (the list needs one value per row)

df['Salary']=[50000,60000,80000]
df


| | name | gpa | age | package | city | Salary |
|---|---|---|---|---|---|---|
| 0 | kirtan | 5 | 21 | 10LPA | karachi | 50000 |
| 1 | kunal | 4 | 22 | 8LPA | lahore | 60000 |
| 2 | vishal | 3 | 24 | 6LPA | islamabad | 80000 |
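A new column can also be computed from existing ones. `assign()` is an alternative that returns a new DataFrame and leaves the original unchanged. The columns `Monthly` and `Bonus` in this sketch are hypothetical and serve only as illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["kirtan", "kunal", "vishal"], "age": [21, 22, 24]})

# A plain list must have exactly one value per row, otherwise pandas raises ValueError.
df["Salary"] = [50000, 60000, 80000]

# Derived column, computed element-wise from an existing one.
df["Monthly"] = df["Salary"] / 12

# assign() returns a NEW DataFrame; df itself does not get a Bonus column.
df2 = df.assign(Bonus=df["Salary"] // 10)
print(df2.columns.tolist())   # ['name', 'age', 'Salary', 'Monthly', 'Bonus']
print("Bonus" in df.columns)  # False
```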


In [None]:
## remove the package column
df.drop('package')   ## fails: drop() defaults to axis=0 (row labels), and no row is labelled 'package'

KeyError: "['package'] not found in axis"

In [51]:
df.drop('package', axis=1) ## removing the 'package' column with axis=1

| | name | gpa | age | city | Salary |
|---|---|---|---|---|---|
| 0 | kirtan | 5 | 21 | karachi | 50000 |
| 1 | kunal | 4 | 22 | lahore | 60000 |
| 2 | vishal | 3 | 24 | islamabad | 80000 |
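`drop` also accepts the keyword arguments `columns=` and `index=`, so you don't have to remember which axis number means what. A small sketch on a subset of the same columns:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["kirtan", "kunal", "vishal"],
    "package": ["10LPA", "8LPA", "6LPA"],
    "city": ["karachi", "lahore", "islamabad"],
})

no_package = df.drop(columns="package")   # same as df.drop("package", axis=1)
no_first_row = df.drop(index=0)           # same as df.drop(0), since axis=0 is the default

print(no_package.columns.tolist())   # ['name', 'city']
print(no_first_row.index.tolist())   # [1, 2]
```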


In [None]:
df  ## 'package' is still here: drop() returned a modified copy and left df unchanged (not in place)

| | name | gpa | age | package | city | Salary |
|---|---|---|---|---|---|---|
| 0 | kirtan | 5 | 21 | 10LPA | karachi | 50000 |
| 1 | kunal | 4 | 22 | 8LPA | lahore | 60000 |
| 2 | vishal | 3 | 24 | 6LPA | islamabad | 80000 |


In [58]:
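df.drop('package', axis=1, inplace=True)  ## now the change is permanent

## tip: with inplace=True, drop() modifies df and returns None, so the cell shows no output;
## without it, drop() returns a new DataFrame (which Jupyter displays) and df is unchanged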
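The "no output means permanent" tip works because `inplace=True` makes `drop` return `None`. The more common style is to reassign the returned copy instead. The following sketch shows both:

```python
import pandas as pd

df = pd.DataFrame({"name": ["kirtan", "kunal"], "package": ["10LPA", "8LPA"]})

result = df.drop("package", axis=1, inplace=True)
print(result)                 # None: nothing for Jupyter to display
print(df.columns.tolist())    # ['name']

# Equivalent effect by reassigning the copy that drop() returns.
df2 = pd.DataFrame({"name": ["kirtan", "kunal"], "package": ["10LPA", "8LPA"]})
df2 = df2.drop(columns="package")
print(df2.columns.tolist())   # ['name']
```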

In [54]:
print(df)

     name gpa  age       city  Salary
0  kirtan   5   21    karachi   50000
1   kunal   4   22     lahore   60000
2  vishal   3   24  islamabad   80000


In [63]:
## add 1 year to every age

df['age'] = df['age'] + 1  # overwrites the column, so it persists; each re-run adds another year (the ages below are 3 higher than in the previous output)
df

| | name | gpa | age | city | Salary |
|---|---|---|---|---|---|
| 0 | kirtan | 5 | 24 | karachi | 50000 |
| 1 | kunal | 4 | 25 | lahore | 60000 |
| 2 | vishal | 3 | 27 | islamabad | 80000 |
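Overwriting a column with a value derived from itself isn't safe to re-run, which is why the ages above drifted. One rerun-safe pattern, sketched below, writes the result to a new column. The column name `age_next_year` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["kirtan", "kunal", "vishal"], "age": [21, 22, 24]})

# "age" itself never changes, so running this line again gives the same values.
df["age_next_year"] = df["age"] + 1
df["age_next_year"] = df["age"] + 1   # simulated second run: same result

print(df["age_next_year"].tolist())   # [22, 23, 25]
```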


In [65]:
df.drop(0)  ## drop the row labelled 0 (returns a copy; df itself is unchanged)

| | name | gpa | age | city | Salary |
|---|---|---|---|---|---|
| 1 | kunal | 4 | 25 | lahore | 60000 |
| 2 | vishal | 3 | 27 | islamabad | 80000 |
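`drop(0)` removes rows by label. To remove rows by a condition, the usual tool is a boolean mask. Either way the surviving rows keep their original labels, and `reset_index(drop=True)` renumbers them. Here is a small sketch using the ages shown above:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["kirtan", "kunal", "vishal"],
    "age": [24, 25, 27],
})

# Keep only rows matching a condition (here: age above 24).
older = df[df["age"] > 24]
print(older["name"].tolist())   # ['kunal', 'vishal']

# The original labels 1 and 2 are kept; reset_index renumbers from 0.
print(older.index.tolist())                          # [1, 2]
print(older.reset_index(drop=True).index.tolist())   # [0, 1]
```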


In [67]:
df = pd.read_csv('sales_data.csv')
df.head()

| | Transaction ID | Date | Product Category | Product Name | Units Sold | Unit Price | Total Revenue | Region | Payment Method |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 2024-01-01 | Electronics | iPhone 14 Pro | 2 | 999.99 | 1999.98 | North America | Credit Card |
| 1 | 10002 | 2024-01-02 | Home Appliances | Dyson V11 Vacuum | 1 | 499.99 | 499.99 | Europe | PayPal |
| 2 | 10003 | 2024-01-03 | Clothing | Levi's 501 Jeans | 3 | 69.99 | 209.97 | Asia | Debit Card |
| 3 | 10004 | 2024-01-04 | Books | The Da Vinci Code | 4 | 15.99 | 63.96 | North America | Credit Card |
| 4 | 10005 | 2024-01-05 | Beauty Products | Neutrogena Skincare Set | 1 | 89.99 | 89.99 | Europe | PayPal |


In [71]:
## Display the data type of each column
print("Datatypes: \n", df.dtypes)

# Summary statistics (count, mean, std, quartiles) of the numeric columns
print("\nDescription: \n", df.describe())

## Group by a column and aggregate: mean 'Total Revenue' per 'Product Category'
grouped = df.groupby('Product Category')['Total Revenue'].mean()
print("\nMean Total Revenue by 'Product Category': \n", grouped)


Datatypes: 
 Transaction ID        int64
Date                 object
Product Category     object
Product Name         object
Units Sold            int64
Unit Price          float64
Total Revenue       float64
Region               object
Payment Method       object
dtype: object

Description: 
        Transaction ID  Units Sold   Unit Price  Total Revenue
count       240.00000  240.000000   240.000000     240.000000
mean      10120.50000    2.158333   236.395583     335.699375
std          69.42622    1.322454   429.446695     485.804469
min       10001.00000    1.000000     6.500000       6.500000
25%       10060.75000    1.000000    29.500000      62.965000
50%       10120.50000    2.000000    89.990000     179.970000
75%       10180.25000    3.000000   249.990000     399.225000
max       10240.00000   10.000000  3899.990000    3899.990000

Mean Total Revenue by 'Product Category': 
 Product Category
Beauty Products     65.54750
Books               46.54825
Clothing           203.2232