# Pandas 

Pandas is a powerful open-source data analysis and manipulation library built on top of NumPy. It provides high-level data structures like Series (1D) and DataFrame (2D table), which make it easy to load, clean, explore, analyze, and visualize structured data.


Pandas is widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame. A Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

A Series in pandas is a one-dimensional labeled array capable of holding any data type: integers, floats, strings, objects, and so on.

It is like a single column in a spreadsheet or in a DataFrame, but it can also be used as a standalone object.

In [5]:
import pandas as pd 

data = [1, 2, 3, 4, 5]
series = pd.Series(data)
# sep="" stops print() from inserting a space after the newline
print("Series:\n", series, sep="")
print(type(series))

Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>
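
The earlier point that a Series can hold any data type can be checked through its `dtype` attribute. The following is a minimal sketch; the variable names (`ints`, `floats`, `mixed`) are illustrative and not part of the original notebook.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])          # integer values
floats = pd.Series([1.5, 2.5])       # floating-point values
mixed = pd.Series([1, 'two', 3.0])   # mixed Python objects

# Each Series reports the data type pandas chose for its values
print(ints.dtype)
print(floats.dtype)   # float64
print(mixed.dtype)    # object
```

A Series of mixed values falls back to the generic `object` dtype, which is flexible but slower than a numeric dtype.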


# Create a Series from a dictionary


In [None]:
data = {'a': 1, 'b': 2, 'c': 3}
# When a dictionary is converted to a Series, its keys become the index labels and its values become the data
series_dict = pd.Series(data)
print(series_dict)

a    1
b    2
c    3
dtype: int64


In [13]:
my_data = [10, 20, 30]
index = ['a', 'b', 'c']

# Combine the values and the custom labels into one Series
pd.Series(my_data, index=index)

a    10
b    20
c    30
dtype: int64
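
Once a Series has custom labels, its values can be read either by label or by position. The sketch below reuses the same values and labels as the cell above; the variable name `s` is illustrative.

```python
import pandas as pd

# Same values and labels as in the cell above
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s['b'])             # access by label -> 20
print(s.iloc[0])          # access by position -> 10
print(s.index.tolist())   # ['a', 'b', 'c']
print(s.tolist())         # [10, 20, 30]
```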

# DataFrame

A DataFrame is a two-dimensional, labeled, tabular data structure in pandas, similar to a spreadsheet or a SQL table.

Each column in a DataFrame is a Series, and the whole DataFrame is a collection of Series objects with shared row indexes.

In [None]:
# Create a DataFrame from a dictionary whose values are lists (one list per column)

datax = {
    'name': ['Krish','John','Jack'],
    'age':[25,30,45],
    'city':['Bangalore','New York','Florida']
}

df = pd.DataFrame(datax)
print(df)
print(type(df))

    name  age       city
0  Krish   25  Bangalore
1   John   30   New York
2   Jack   45    Florida
<class 'pandas.core.frame.DataFrame'>
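
The statement that each column of a DataFrame is a Series sharing the same row index can be verified directly. This is a small sketch using two of the columns from the cell above; the names `people` and `age` are illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Krish', 'John', 'Jack'],
    'age': [25, 30, 45],
})

age = people['age']   # selecting one column returns a Series
print(type(age))

# The column carries the same row index as the DataFrame it came from
print(age.index.equals(people.index))   # True
```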


In [11]:
# Create a DataFrame from a list of dictionaries (one dictionary per row)

datay = [
    {'Name':'Samad','Age':20,'City':'Chicago'},
    {'Name':'Ilaf','Age':20,'City':'Florida'},
    {'Name':'Usaid','Age':20,'City':'New york'},
    {'Name':'Amin','Age':20,'City':'California'},
]

df = pd.DataFrame(datay)
print(df)

    Name  Age        City
0  Samad   20     Chicago
1   Ilaf   20     Florida
2  Usaid   20    New york
3   Amin   20  California


# Loading a CSV dataset

In [17]:
df = pd.read_csv("data.csv")
df.head()

         Date Category  Value   Product  Sales Region
0  2023-01-01        A   28.0  Product1  754.0   East
1  2023-01-02        B   39.0  Product3  110.0  North
2  2023-01-03        C   32.0  Product2  398.0   East
3  2023-01-04        B    8.0  Product1  522.0   East
4  2023-01-05        B   26.0  Product3  869.0  North


In [18]:
df.tail()

          Date Category  Value   Product  Sales Region
45  2023-02-15        B   99.0  Product2  599.0   West
46  2023-02-16        B    6.0  Product1  938.0  South
47  2023-02-17        B   69.0  Product3  143.0   West
48  2023-02-18        C   65.0  Product3  182.0  North
49  2023-02-19        C   11.0  Product3  708.0  North
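
The file `data.csv` itself is not included here, so the sketch below substitutes a small in-memory CSV built from the first three rows shown above. The use of `io.StringIO` and the name `sample` are assumptions for illustration only; with a real file you would pass its path to `pd.read_csv` as in the cell above.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv, using the first three rows shown above
csv_text = """Date,Category,Value,Product,Sales,Region
2023-01-01,A,28.0,Product1,754.0,East
2023-01-02,B,39.0,Product3,110.0,North
2023-01-03,C,32.0,Product2,398.0,East
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)     # (3, 6): three rows, six columns
print(sample.head(2))   # head(n) returns the first n rows (default 5)
print(sample.tail(1))   # tail(n) returns the last n rows (default 5)
```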


# Accessing data from a DataFrame

In [20]:
stu_data = [
    {'Name':'Samad','Age':20,'City':'Chicago'},
    {'Name':'Ilaf','Age':20,'City':'Florida'},
    {'Name':'Usaid','Age':20,'City':'New york'},
    {'Name':'Amin','Age':20,'City':'California'},
]

df2 = pd.DataFrame(stu_data)
df2

    Name  Age        City
0  Samad   20     Chicago
1   Ilaf   20     Florida
2  Usaid   20    New york
3   Amin   20  California


In [None]:
# Accessing a column
df2['Name']

0    Samad
1     Ilaf
2    Usaid
3     Amin
Name: Name, dtype: object

In [None]:
# .loc[0] means "give me the row with label 0"
df2.loc[0]  # returns the row with label 0 as a Series

Name      Samad
Age          20
City    Chicago
Name: 0, dtype: object

In [None]:
# .loc[row_label, column_label]
# .loc[2, 'City'] means: from the row with label 2, get the value in the 'City' column
df2.loc[2, 'City']  # returns 'New york'

'New york'

In [29]:
# Select the rows with labels 1 through 3 (label 3 is included)
df2.loc[1:3]

    Name  Age        City
1   Ilaf   20     Florida
2  Usaid   20    New york
3   Amin   20  California


In [None]:
# [0, 2] → select the rows with labels 0 and 2
# ['Name', 'City'] → select only the columns 'Name' and 'City'
df2.loc[[0, 2], ['Name', 'City']]
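
The cell above shows no output, so here is a self-contained sketch of the same selection. It rebuilds `df2` from the records used earlier; the name `subset` is illustrative.

```python
import pandas as pd

df2 = pd.DataFrame([
    {'Name': 'Samad', 'Age': 20, 'City': 'Chicago'},
    {'Name': 'Ilaf', 'Age': 20, 'City': 'Florida'},
    {'Name': 'Usaid', 'Age': 20, 'City': 'New york'},
    {'Name': 'Amin', 'Age': 20, 'City': 'California'},
])

# Rows labeled 0 and 2, restricted to the 'Name' and 'City' columns
subset = df2.loc[[0, 2], ['Name', 'City']]
print(subset)
print(subset.shape)   # (2, 2)
```

Passing lists for both the rows and the columns returns a smaller DataFrame rather than a single value.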

# iloc
.iloc selects rows by integer position (0, 1, 2, …), based purely on the order of the rows.

You can always use integers with .iloc, even when the index labels are something else, such as 'a', 'b', 'c'. The labels are ignored: in the DataFrame below, .iloc treats the rows labeled a, b, c, d as positions 0, 1, 2, 3.

        Name  Age        City
    a  Samad   20     Chicago
    b   Ilaf   20     Florida
    c  Usaid   20    New york
    d   Amin   20  California
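
As a sketch of position-based access, the following rebuilds the same records with the custom labels 'a' to 'd' and selects by position. The variable name `people` is hypothetical.

```python
import pandas as pd

people = pd.DataFrame(
    {
        'Name': ['Samad', 'Ilaf', 'Usaid', 'Amin'],
        'Age': [20, 20, 20, 20],
        'City': ['Chicago', 'Florida', 'New york', 'California'],
    },
    index=['a', 'b', 'c', 'd'],   # custom row labels
)

print(people.iloc[0])       # first row, whose label happens to be 'a'
print(people.iloc[2, 2])    # third row, third column -> 'New york'
print(people.iloc[[0, 2]])  # first and third rows
```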

# loc

.loc selects rows by label, so you can only use it if you know the actual index labels (such as 'a', 'b').

If you know the labels, use loc[].

If you are not sure about the labels, use iloc[] and go by the order of the rows.

The reason is that with a custom index, loc[] raises a KeyError when you pass a label that does not exist, whereas iloc[] still works for any position that is within range.
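
The difference in failure behavior can be sketched as follows. The DataFrame `people` and the flag `loc_failed` are hypothetical names used only for this illustration.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Samad', 'Ilaf']}, index=['a', 'b'])

# A correct label works with loc
print(people.loc['a', 'Name'])   # Samad

# 0 is not one of the labels ('a', 'b'), so loc raises a KeyError
loc_failed = False
try:
    people.loc[0]
except KeyError as err:
    loc_failed = True
    print("KeyError:", err)

# iloc ignores the labels and uses the row order instead
print(people.iloc[0]['Name'])    # Samad
```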



# Major difference between loc[] and iloc[]

🔹 loc[1:3] →
Slices by label and includes the end label, so it returns the rows labeled 1, 2, and 3.

🔹 iloc[1:3] →
Slices by position and excludes the end position, so it returns the rows at positions 1 and 2.
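
A short sketch of the slicing difference, rebuilding `df2` with its default integer index; the names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    'Name': ['Samad', 'Ilaf', 'Usaid', 'Amin'],
    'Age': [20, 20, 20, 20],
    'City': ['Chicago', 'Florida', 'New york', 'California'],
})

by_label = df2.loc[1:3]       # label slice: end label 3 is included
by_position = df2.iloc[1:3]   # position slice: end position 3 is excluded

print(list(by_label.index))      # [1, 2, 3]
print(list(by_position.index))   # [1, 2]
```

With the default index the labels and positions coincide, which is why the two slices differ only in whether the last row is kept.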