# What is Pandas?

Pandas is an open-source Python library for data manipulation and analysis. It provides
fast, flexible, and expressive data structures designed for working with structured (tabular) data.

It is built on top of NumPy, and its name comes from "panel data" (an econometrics term for multidimensional data).

# Why Pandas? What problem does it solve?

### Before Pandas:

People used raw Python lists, dictionaries, or NumPy arrays.

These structures lacked row and column labels, label-based indexing, grouping, join/merge operations, and built-in handling of missing data.

Real-world datasets (CSV, JSON, Excel, SQL) are structured in rows and columns, not bare arrays.

So Pandas was designed to work like:

📊 Excel + 🧮 NumPy + 🗂️ SQL + 🧰 ETL tools, all inside Python
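To make the contrast concrete, here is a minimal sketch that computes per-city totals twice: once with plain Python dictionaries and once with pandas. The records and column names are made up for illustration and are not part of the Titanic dataset used later.

```python
import pandas as pd

# Hypothetical sample records; names and values are invented for illustration.
records = [
    {"city": "Hubli", "sales": 120.0},
    {"city": "Mysore", "sales": None},   # a missing value
    {"city": "Hubli", "sales": 80.0},
    {"city": "Mysore", "sales": 50.0},
]

# Plain Python: group and sum by hand, skipping missing values explicitly.
totals = {}
for row in records:
    if row["sales"] is not None:
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["sales"]

# pandas: the same aggregation in one line; missing values are skipped by default.
df = pd.DataFrame(records)
totals_pd = df.groupby("city")["sales"].sum()

print(totals)
print(totals_pd)
```

Both versions should give the same totals; the pandas version also keeps the city labels attached to the result.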

# ⚙️ Where is Pandas used?

| Field              | How Pandas Helps                                        |
| ------------------ | ------------------------------------------------------- |
| Data Science       | Load, clean, explore, transform, and model data         |
| Machine Learning   | Prepare features, preprocess, manipulate large datasets |
| Finance            | Time series, stock prices, trade records                |
| Web/Data Analytics | ETL pipelines, server logs, product usage stats         |
| Healthcare         | Patient records, diagnostics, hospital statistics       |

# 🧱 What does Pandas offer at its core?

| Core Object          | Description                                   |
| -------------------- | --------------------------------------------- |
| `Series`             | 1D labeled array, like a single column        |
| `DataFrame`          | 2D table, like an Excel sheet or SQL table    |
| `Panel` (removed)    | 3D data container (superseded by MultiIndex)  |
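The three rows of the table can be sketched as follows. The ticker symbols, days, and numbers are hypothetical; the point is only to show how a third dimension can live in a `MultiIndex` on an ordinary `DataFrame`.

```python
import pandas as pd

# A Series: one labeled column of values.
prices = pd.Series([101.5, 102.0], index=["Mon", "Tue"], name="price")

# A DataFrame: several columns sharing one row index.
quotes = pd.DataFrame(
    {"price": [101.5, 102.0], "volume": [3000, 2500]},
    index=["Mon", "Tue"],
)

# A third dimension (here a hypothetical ticker symbol) goes into a
# MultiIndex on the rows instead of a separate 3D container.
index = pd.MultiIndex.from_product(
    [["AAA", "BBB"], ["Mon", "Tue"]], names=["ticker", "day"]
)
panel_like = pd.DataFrame(
    {"price": [101.5, 102.0, 55.0, 54.5], "volume": [3000, 2500, 900, 1100]},
    index=index,
)

# Selecting one ticker returns an ordinary 2D DataFrame indexed by day.
aaa = panel_like.loc["AAA"]
print(aaa)
```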

# 🎁 And tons of utilities:

- Reading/Writing (CSV, JSON, Excel, SQL, Parquet)

- Powerful slicing/indexing/merging

- GroupBy, pivoting, reshaping

- Time series and date handling

- Missing data handling

- Statistical functions and window operations

- Built-in plotting (with matplotlib backend)

Think of Pandas as a spreadsheet that you can program, automate, and scale, with the power of Python and NumPy underneath.
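A few of the utilities listed above can be tried on a tiny in-memory CSV. This is only a sketch: the sensor readings are invented, and filling gaps with the per-sensor mean is just one possible way to handle missing data.

```python
import io
import pandas as pd

# Hypothetical daily readings; values are invented for illustration.
csv_text = """day,sensor,reading
2024-01-01,a,1.0
2024-01-02,a,
2024-01-03,a,3.0
2024-01-01,b,10.0
2024-01-02,b,20.0
2024-01-03,b,30.0
"""

# Reading: parse the text as CSV and convert the day column to datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"])

# Missing data: count the gaps, then fill them with each sensor's mean.
n_missing = int(df["reading"].isna().sum())
df["reading"] = df["reading"].fillna(
    df.groupby("sensor")["reading"].transform("mean")
)

# GroupBy: one aggregate per sensor.
mean_by_sensor = df.groupby("sensor")["reading"].mean()

# Window operation: 2-day rolling mean for sensor b.
b = df[df["sensor"] == "b"].set_index("day")["reading"]
rolling_b = b.rolling(window=2).mean()

# Pivoting: days as rows, sensors as columns.
wide = df.pivot(index="day", columns="sensor", values="reading")
print(wide)
```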

In [34]:
import pandas as pd
import numpy as np

In [35]:
dataset_url="https://raw.githubusercontent.com/datasciencedojo/datasets/refs/heads/master/titanic.csv"

In [36]:
df=pd.read_csv(dataset_url)

In [37]:
type(df)

pandas.core.frame.DataFrame

In [38]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [39]:
df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [40]:
df.head(10)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C


In [41]:
df['Name']

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object

In [42]:
df[['Name','Age']]

Unnamed: 0,Name,Age
0,"Braund, Mr. Owen Harris",22.0
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0
2,"Heikkinen, Miss. Laina",26.0
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0
4,"Allen, Mr. William Henry",35.0
...,...,...
886,"Montvila, Rev. Juozas",27.0
887,"Graham, Miss. Margaret Edith",19.0
888,"Johnston, Miss. Catherine Helen ""Carrie""",
889,"Behr, Mr. Karl Howell",26.0


In [43]:
df.shape

(891, 12)
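Alongside `head()`, `tail()`, and `shape`, a first look at a dataset often includes column types and missing-value counts. The sketch below inlines three rows from the output above (with fewer columns) so it runs without a download; the name `sample` is just for illustration.

```python
import io
import pandas as pd

# A few rows in the shape of the Titanic CSV, inlined so no download is needed.
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22.0,7.25
2,1,1,female,38.0,71.2833
6,0,3,male,,8.4583
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)         # (rows, columns)
print(sample.dtypes)        # inferred type of each column
print(sample.isna().sum())  # missing values per column
print(sample.describe())    # summary statistics for numeric columns
```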

In [44]:
dir(df)

['Age',
 'Cabin',
 'Embarked',
 'Fare',
 'Name',
 'Parch',
 'PassengerId',
 'Pclass',
 'Sex',
 'SibSp',
 'Survived',
 'T',
 'Ticket',
 '_AXIS_LEN',
 '_AXIS_ORDERS',
 '_AXIS_TO_AXIS_NUMBER',
 '_HANDLED_TYPES',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_ufunc__',
 '__arrow_c_stream__',
 '__bool__',
 '__class__',
 '__contains__',
 '__copy__',
 '__dataframe__',
 '__dataframe_consortium_standard__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__firstlineno__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__matmul__',
 '__mo

In [45]:
data=[1,2,3,4,5]
s1=pd.Series(data)
s1

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [46]:
s2=pd.Series(data=data,index=['a','b','c','d','e'],name='Numbers')
s2

a    1
b    2
c    3
d    4
e    5
Name: Numbers, dtype: int64

In [47]:
data={"a":1,"b":2,"c":3}
s3=pd.Series(data=data)
s3

a    1
b    2
c    3
dtype: int64

In [50]:
data=np.array([10,20,30])
s4=pd.Series(data=data,index=["A","B","C"])
s4

A    10
B    20
C    30
dtype: int64

In [52]:
s5=pd.Series(data=[10,20,30,40,50],index=["A","B","C","E","F"],name="Score")
s5

A    10
B    20
C    30
E    40
F    50
Name: Score, dtype: int64

In [60]:
print(s5)
print(s5.index)
print(s5.values)
print(s5.dtype)
print(s5.name)
print(s5.shape)
type(s5.shape)

A    10
B    20
C    30
E    40
F    50
Name: Score, dtype: int64
Index(['A', 'B', 'C', 'E', 'F'], dtype='object')
[10 20 30 40 50]
int64
Score
(5,)


tuple

#### Series Indexing & Access

In [66]:
print(s5.values[0])
print(s5.values[3])

10
40


In [69]:
print(s5.iloc[0])
print(s5.iloc[3])

10
40


In [70]:
print(s5.iloc[0:3])

A    10
B    20
C    30
Name: Score, dtype: int64


In [72]:
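print(s5[0])  # positional access on a label index; deprecated in recent pandas (FutureWarning), prefer s5.iloc[0]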

10




In [75]:
print(s5['A'])

10


In [76]:
print(s5['B'])

20


In [79]:
print(s5[['A','B']])

A    10
B    20
Name: Score, dtype: int64


In [81]:
print(s5.loc['A'])

10


In [82]:
print(s5.loc[["A","B"]])

A    10
B    20
Name: Score, dtype: int64


In [87]:
s3=pd.Series([10,20,30,40,50])
s3.iloc[[1,2]]

1    20
2    30
dtype: int64

In [88]:
s3.iloc[1:3]

1    20
2    30
dtype: int64

In [89]:
s3.loc[[1,2]]

1    20
2    30
dtype: int64
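The cells above give the same result for `loc` and `iloc` because the default index labels happen to equal the positions. A short sketch of where the two differ, reusing the `Score` values from earlier (the second Series `t` is a made-up example):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "E", "F"], name="Score")

# iloc slices by position and excludes the stop position, like a Python list.
by_position = s.iloc[0:2]      # rows at positions 0 and 1

# loc slices by label and includes the stop label.
by_label = s.loc["A":"C"]      # rows labeled A, B and C

# A boolean mask keeps only the rows where the condition is True.
above_25 = s[s > 25]

# With a non-default integer index, labels and positions differ.
t = pd.Series([10, 20, 30], index=[2, 1, 0])
first_by_position = t.iloc[0]  # the value stored first
first_by_label = t.loc[0]      # the value whose label is 0

print(by_position, by_label, above_25, first_by_position, first_by_label, sep="\n")
```

Using `iloc` for positions and `loc` for labels keeps the intent explicit, which matters most when the index itself contains integers.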

# 2. What is a DataFrame?

A DataFrame is a 2D labeled data structure with columns of potentially different types, just like a spreadsheet or SQL table.

Think of it like:

- A dict of Series

- Or a table with rows and columns
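The "dict of Series" view can be sketched directly. Here two Series with partly different labels are combined; the names and values are borrowed from the examples further down, and the choice of which student appears in which Series is arbitrary.

```python
import pandas as pd

# Two Series with partly different labels.
usn = pd.Series({"Zaheer": 30, "Sanju": 46})
city = pd.Series({"Sanju": "Hubli", "Nandini": "Bengaluru"})

# Each dict key becomes a column; rows are the union of the Series labels.
df = pd.DataFrame({"usn": usn, "city": city})
print(df)

# Pulling a column back out returns a Series again.
print(type(df["usn"]))
```

Labels that appear in only one Series get a missing value (`NaN`) in the other column.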

It is the core structure of pandas, used for reading, cleaning, transforming, analyzing, and exporting data.


### Why DataFrame?

- Real-world data is usually tabular (e.g., CSV, Excel, SQL).

- Rows are records and columns are features.

- A DataFrame supports label-based indexing, automatic alignment, missing data handling, grouping, aggregation, and more.
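Automatic alignment is easiest to see with arithmetic on two Series whose labels are in a different order. The marks below are hypothetical, and `fill_value=0` is only one way to treat labels that appear on a single side.

```python
import pandas as pd

# Hypothetical marks for two tests; the label order differs and one
# student is missing from each Series.
test1 = pd.Series({"Zaheer": 40, "Sanju": 35, "Nandini": 45})
test2 = pd.Series({"Nandini": 50, "Zaheer": 30, "Shashank": 42})

# Arithmetic matches rows by label, not by position.
total = test1 + test2

# Labels present on only one side give NaN; fill_value treats them as 0.
total_filled = test1.add(test2, fill_value=0)

print(total)
print(total_filled)
```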


| Concept     | Real-Life Analogy             |
| ----------- | ----------------------------- |
| `DataFrame` | Excel sheet or database table |
| `Series`    | One column of that sheet      |
| Index       | Row number or ID              |
| Columns     | Named headers                 |
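The pieces in the table map onto attributes of a DataFrame. A small sketch, using three of the students from the example below and assuming the `usn` values are unique so they can serve as an ID-style index:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Zaheer", "Sanju", "Shashank"],
    "usn": [30, 46, 44],
    "city": ["Gulbarga", "Hubli", "Mysore"],
})

print(df.columns.tolist())   # the named headers
print(df.index.tolist())     # default row numbers 0, 1, 2
print(type(df["city"]))      # one column is a Series

# Use the usn column as an ID-style index (assumes usn values are unique).
by_usn = df.set_index("usn")
print(by_usn.loc[46, "city"])
```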


In [90]:
data={
    "name":["Zaheer","Sanju","Shashank","Nandini","Nisarga"],
    "usn":[30,46,44,26,28],
    "city":["Gulbarga","Hubli","Mysore","Bengaluru","Hyderabad"]
}

In [91]:
df=pd.DataFrame(data=data)

In [94]:
print(df)

       name  usn       city
0    Zaheer   30   Gulbarga
1     Sanju   46      Hubli
2  Shashank   44     Mysore
3   Nandini   26  Bengaluru
4   Nisarga   28  Hyderabad


In [95]:
df

Unnamed: 0,name,usn,city
0,Zaheer,30,Gulbarga
1,Sanju,46,Hubli
2,Shashank,44,Mysore
3,Nandini,26,Bengaluru
4,Nisarga,28,Hyderabad


In [99]:
data=[
    {"name":"zaheer","usn":30,"city":"Gulbarga"},
    {"name":"sanju","usn":46,"city":"Hubli"},
    {"name":"Shashank","usn":44,"city":"Aurad"},
    {"name":"Nandini","usn":26,"city":"Mysore"},
    {"name":"Nisarga","usn":28,"city":"Hyderabad"}
]
df=pd.DataFrame(data)
df

Unnamed: 0,name,usn,city
0,zaheer,30,Gulbarga
1,sanju,46,Hubli
2,Shashank,44,Aurad
3,Nandini,26,Mysore
4,Nisarga,28,Hyderabad


In [101]:
array=np.array([[1,2],[3,4],[5,6]])
df=pd.DataFrame(array,columns=["A","B"])
df

Unnamed: 0,A,B
0,1,2
1,3,4
2,5,6


In [103]:
data={
    "name":["Zaheer","Sanju","Shashank","Nandini","Nisarga"],
    "usn":[30,46,44,26,28],
    "city":["Gulbarga","Hubli","Mysore","Bengaluru","Hyderabad"]
}
df=pd.DataFrame(data)
df

Unnamed: 0,name,usn,city
0,Zaheer,30,Gulbarga
1,Sanju,46,Hubli
2,Shashank,44,Mysore
3,Nandini,26,Bengaluru
4,Nisarga,28,Hyderabad


In [111]:
print(df['name'])
print("=============")
print(df[['name','usn']])
print("=============")
print(df.iloc[0])
print("=============")
print(df.loc[1])

0      Zaheer
1       Sanju
2    Shashank
3     Nandini
4     Nisarga
Name: name, dtype: object
=============
       name  usn
0    Zaheer   30
1     Sanju   46
2  Shashank   44
3   Nandini   26
4   Nisarga   28
=============
name      Zaheer
usn           30
city    Gulbarga
Name: 0, dtype: object
=============
name    Sanju
usn        46
city    Hubli
Name: 1, dtype: object
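The same `loc` and `iloc` accessors also take a row selector and a column selector together. A minimal sketch on the DataFrame from the cell above, with a boolean filter added; the threshold of 40 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Zaheer", "Sanju", "Shashank", "Nandini", "Nisarga"],
    "usn": [30, 46, 44, 26, 28],
    "city": ["Gulbarga", "Hubli", "Mysore", "Bengaluru", "Hyderabad"],
})

# One cell: row label 1, column "city".
cell = df.loc[1, "city"]

# Rows labeled 0 through 2 (label slice, stop included) and two columns.
block = df.loc[0:2, ["name", "city"]]

# The same rows by position (stop excluded) and columns by position.
block_pos = df.iloc[0:3, [0, 2]]

# Boolean filter: rows where usn is greater than 40.
high_usn = df[df["usn"] > 40]

print(cell)
print(block)
print(high_usn)
```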
