<a href="https://colab.research.google.com/github/garimaakashyap/pandas/blob/main/g_pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PANDAS

In [None]:
# garima learning pandas

Pandas is a Python library for data manipulation and analysis.
It’s built on top of NumPy and allows you to:

- Work with tables (rows & columns)
- Clean, analyze, and transform datasets
- Import/export data (CSV, Excel, SQL, etc.)

| Step | Topic                       | Description                             |
| ---- | --------------------------- | --------------------------------------- |
| 1    | Introduction & Installation | What is Pandas and how to install it    |
| 2    | Series                      | 1D data structure (like one column)     |
| 3    | DataFrame                   | 2D table (like Excel sheet)             |
| 4    | Importing & Exporting Data  | CSV, Excel, JSON                        |
| 5    | Selecting & Filtering Data  | Accessing rows, columns, and conditions |
| 6    | Data Cleaning               | Handle nulls, duplicates, rename, etc.  |
| 7    | Aggregations & GroupBy      | Summarize and group data                |
| 8    | Merging & Joining           | Combine multiple tables                 |
| 9    | Visualization               | Quick data plots with Pandas            |


In [None]:
# First, install the pandas library (in a notebook cell, prefix the command with "!")
# pip install pandas

In [None]:
# importing it
import pandas as pd
import numpy as np

## Pandas Series

A Series is a one-dimensional labeled array, like a single column of a table.

In [None]:
# From a list
s = pd.Series([10, 20, 30, 40])
print(s)
print()
print(type(s))

0    10
1    20
2    30
3    40
dtype: int64

<class 'pandas.core.series.Series'>
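
A Series pairs its values with an index, and arithmetic applies to every element at once. The short sketch below illustrates this; the variable names (`prices`, `doubled`) are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical example values; any list of numbers behaves the same way.
prices = pd.Series([10, 20, 30, 40])

print(prices.index)    # default integer labels 0..3
print(prices.values)   # the underlying NumPy array
print(prices.dtype)    # data type of the values

# Arithmetic is applied element by element (vectorized).
doubled = prices * 2
print(doubled.tolist())   # [20, 40, 60, 80]
print(prices.sum())       # 100
print(prices.mean())      # 25.0
```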


We can also give a Series a custom index and then access values by label.

In [None]:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])   # Access using label
print()
print(s)


20

a    10
b    20
c    30
dtype: int64
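
With a custom index, the same element can be reached by label or by position. A minimal sketch, reusing the labels from the cell above:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.loc['b'])    # 20, looked up by label
print(s.iloc[1])     # 20, looked up by position (second element)

# Label slices include the end label; position slices exclude the end position.
print(s.loc['a':'b'].tolist())   # [10, 20]
print(s.iloc[0:1].tolist())      # [10]

# Membership tests check the index labels, not the values.
print('b' in s)   # True
print(20 in s)    # False
```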


A Series can also be built from a dictionary; the keys become the index labels.

In [None]:
data = {'a': 10, 'b': 20, 'c': 30}
s = pd.Series(data)
print(s)
print(type(s))


a    10
b    20
c    30
dtype: int64
<class 'pandas.core.series.Series'>
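
When building from a dictionary, an explicit `index` argument selects and orders the keys, and a label missing from the dictionary becomes `NaN`. The label `'d'` below is a hypothetical missing key added only for illustration.

```python
import pandas as pd

data = {'a': 10, 'b': 20, 'c': 30}

# Ask for 'c' and 'a' (in that order) plus 'd', which is not in the dictionary.
s = pd.Series(data, index=['c', 'a', 'd'])
print(s)

print(s.isna().tolist())     # [False, False, True]
# The values become floats because NaN is a float.
print(s.dropna().tolist())   # [30.0, 10.0]
```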


## Pandas DataFrame

A DataFrame is a two-dimensional table of rows and columns, like an Excel sheet.

In [None]:
data = {
    'Name': ['Guria', 'Garima', 'guru'],
    'Age': [22, 21, 24],
    'City': ['Delhi', 'Lucknow', 'Pune']
}

df = pd.DataFrame(data)
print(df)


     Name  Age     City
0   Guria   22    Delhi
1  Garima   21  Lucknow
2    guru   24     Pune
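
Once a DataFrame exists, a few attributes describe its structure, and new columns can be derived from existing ones. A small sketch using the same data; the column name `AgeNextYear` is a hypothetical addition for illustration.

```python
import pandas as pd

data = {
    'Name': ['Guria', 'Garima', 'guru'],
    'Age': [22, 21, 24],
    'City': ['Delhi', 'Lucknow', 'Pune']
}
df = pd.DataFrame(data)

print(df.shape)           # (3, 3): three rows, three columns
print(list(df.columns))   # ['Name', 'Age', 'City']
print(df.dtypes)          # one data type per column

# Add a new column computed from an existing one (hypothetical column name).
df['AgeNextYear'] = df['Age'] + 1
print(df['AgeNextYear'].tolist())   # [23, 22, 25]
```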


In [None]:
print(df['Name'])
print(df[['Name', 'City']])


0     Guria
1    Garima
2      guru
Name: Name, dtype: object
     Name     City
0   Guria    Delhi
1  Garima  Lucknow
2    guru     Pune
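
The number of brackets decides what comes back: a single column name returns a Series, while a list of names returns a DataFrame, even if the list has only one entry. A minimal sketch on the same data (the names `one_column` and `as_frame` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Guria', 'Garima', 'guru'],
    'Age': [22, 21, 24],
    'City': ['Delhi', 'Lucknow', 'Pune']
})

one_column = df['Name']     # single brackets -> Series
as_frame = df[['Name']]     # double brackets -> DataFrame with one column

print(type(one_column).__name__)   # Series
print(type(as_frame).__name__)     # DataFrame
print(one_column.shape)            # (3,)
print(as_frame.shape)              # (3, 1)
```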


In [None]:
print(df.loc[1])    # By index label (the default labels here are 0, 1, 2)
print()
print(df.iloc[0])   # By integer position (first row)


Name     Garima
Age          21
City    Lucknow
Name: 1, dtype: object

Name    Guria
Age        22
City    Delhi
Name: 0, dtype: object
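
With the default index, labels and positions coincide, so `loc` and `iloc` return the same row for the same number. They diverge once the rows are reordered. A small sketch, assuming the same data sorted by `Age` (the name `by_age` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Guria', 'Garima', 'guru'],
    'Age': [22, 21, 24],
    'City': ['Delhi', 'Lucknow', 'Pune']
})

# Sorting keeps the original index labels, so their order becomes 1, 0, 2.
by_age = df.sort_values('Age')
print(list(by_age.index))        # [1, 0, 2]

print(by_age.loc[0, 'Name'])     # Guria  (the row whose label is 0)
print(by_age.iloc[0]['Name'])    # Garima (the first row after sorting)
```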


In [None]:
df = pd.DataFrame(np.arange(0, 20).reshape(5, 4),
                  index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                  columns=['Column1', 'Column2', 'Column3', 'Column4'])
print(df)

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [None]:
df.head()  # returns the first five rows

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


In [None]:
df.tail()  # returns the last five rows (this frame has only five, so the output matches head())

      Column1  Column2  Column3  Column4
Row1        0        1        2        3
Row2        4        5        6        7
Row3        8        9       10       11
Row4       12       13       14       15
Row5       16       17       18       19


## Accessing the elements

In [None]:
df.loc['Row1']

         Row1
Column1     0
Column2     1
Column3     2
Column4     3


In [None]:
# checking the type
print(type(df.loc['Row1']))

<class 'pandas.core.series.Series'>


In [None]:
print(df.loc['Row2','Column2'])

5
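
`loc` also accepts lists of labels on both axes, and `iloc` offers the same lookups by position. The sketch below rebuilds a 5x4 frame of the same shape so that it runs on its own; the name `subset` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
    columns=['Column1', 'Column2', 'Column3', 'Column4'],
)

# A single cell, by label and by position (row 1, column 1, counting from 0).
print(df.loc['Row2', 'Column2'])   # 5
print(df.iloc[1, 1])               # 5

# Several rows and columns at once: pass lists of labels.
subset = df.loc[['Row1', 'Row3'], ['Column1', 'Column3']]
print(subset.values.tolist())      # [[0, 2], [8, 10]]

# A whole column as a Series.
print(df['Column2'].tolist())      # [1, 5, 9, 13, 17]
```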


## Difference between loc and iloc

| Feature         | `.loc[]`                                | `.iloc[]`                                                  |
| --------------- | --------------------------------------- | ---------------------------------------------------------- |
| Access type     | **Label-based** (uses row/column names) | **Position-based** (uses integer positions, starting at 0) |
| Syntax          | `df.loc[row_label, column_label]`       | `df.iloc[row_position, column_position]`                   |
| Slice end point | Included                                | Excluded (like normal Python slicing)                      |
| Use when        | You know the **labels (names)**         | You know the **numeric positions**                         |
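
The slicing difference is the easiest one to trip over, so here is a minimal sketch of it. It rebuilds a 5x4 frame so that it runs on its own; the names `by_label` and `by_position` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
    columns=['Column1', 'Column2', 'Column3', 'Column4'],
)

# loc keeps the end labels: Row3 and Column2 are part of the result.
by_label = df.loc['Row1':'Row3', 'Column1':'Column2']
# iloc stops before the end positions: rows 0, 1, 2 and columns 0, 1.
by_position = df.iloc[0:3, 0:2]

print(by_label.shape)                   # (3, 2)
print(by_position.shape)                # (3, 2)
print(by_label.equals(by_position))     # True

# The end label is kept, the end position is not.
print(list(df.loc['Row1':'Row3'].index))   # ['Row1', 'Row2', 'Row3']
print(list(df.iloc[0:2].index))            # ['Row1', 'Row2']
```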
