# What is the pandas library?
- pandas is an open-source Python library for data manipulation and analysis.
- It provides two primary data structures:
  1. Series - A one-dimensional labeled array
  2. DataFrame - A two-dimensional labeled table, similar to an Excel spreadsheet or SQL table.
- With pandas, you can:
  1. Load data from CSV, Excel, SQL, and JSON sources.
  2. Handle missing data.
  3. Filter and transform datasets.
  4. Perform group-by operations and aggregation.
  5. Merge and join datasets.
  6. Summarize and plot basic statistics.
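
As a minimal sketch of the two data structures described above, the following builds a small Series and a small DataFrame. The labels `a`, `b`, `c` and the names `X`, `Y`, `Z` are placeholders chosen for illustration only.

```python
import pandas as pd

# A Series: one-dimensional values, each attached to an index label.
scores = pd.Series([99, 98, 70], index=["a", "b", "c"], name="Score")

# A DataFrame: a table whose columns are Series that share one index.
table = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [99, 98, 70]})

print(scores["b"])           # look up a value by its label
print(type(table["Score"]))  # selecting a single column returns a Series
print(table.shape)           # (number of rows, number of columns)
```

Selecting one column of a DataFrame gives back a Series, which is why most Series operations also work column by column on a table.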

In [1]:
import pandas as pd

1. Create and Explore a DataFrame
- Create a DataFrame with the following columns: Name, Age, City, Score.

In [2]:
data = {'Name': ['Sakshi Yadav', 'Santosh Shah', 'Diya Bansal', 'Riya Bansal', 'Abhishek Sharma', 'Ram Rai', 'Shyam Gupta', 'Satyam Mishera', 'Rahul Rai', 'Roshani Kumar'],
     'Age': [20, 21, 19, 19, 22, 18, 22, 21, 19, 18],
     'City': ['Indirapuram', 'Nodia', 'Vaishali', 'Delhi', 'New Delhi', 'Preet vihar', 'GTB Nagar', 'Laxmi Nagar', 'Tagore Garden', 'Ghazibad'],
     'Score': [99, 98, 70, 50, 66, 30, 10, 55, 40, 87]}

In [3]:
df = pd.DataFrame(data)
df

,Name,Age,City,Score
0,Sakshi Yadav,20,Indirapuram,99
1,Santosh Shah,21,Nodia,98
2,Diya Bansal,19,Vaishali,70
3,Riya Bansal,19,Delhi,50
4,Abhishek Sharma,22,New Delhi,66
5,Ram Rai,18,Preet vihar,30
6,Shyam Gupta,22,GTB Nagar,10
7,Satyam Mishera,21,Laxmi Nagar,55
8,Rahul Rai,19,Tagore Garden,40
9,Roshani Kumar,18,Ghazibad,87


- Print the first 5 rows using .head().

In [4]:
df.head(5)

,Name,Age,City,Score
0,Sakshi Yadav,20,Indirapuram,99
1,Santosh Shah,21,Nodia,98
2,Diya Bansal,19,Vaishali,70
3,Riya Bansal,19,Delhi,50
4,Abhishek Sharma,22,New Delhi,66


- Use .info() and .describe() to understand the dataset.

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    10 non-null     object
 1   Age     10 non-null     int64 
 2   City    10 non-null     object
 3   Score   10 non-null     int64 
dtypes: int64(2), object(2)
memory usage: 452.0+ bytes


In [6]:
df.describe()

,Age,Score
count,10.0,10.0
mean,19.9,60.5
std,1.523884,29.349427
min,18.0,10.0
25%,19.0,42.5
50%,19.5,60.5
75%,21.0,82.75
max,22.0,99.0
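
To make the `.describe()` figures less opaque, the sketch below recomputes the Score statistics one at a time from the same ten scores. It assumes pandas defaults: `std()` is the sample standard deviation (`ddof=1`) and `quantile()` uses linear interpolation.

```python
import pandas as pd

# The Score column from the DataFrame created earlier.
scores = pd.Series([99, 98, 70, 50, 66, 30, 10, 55, 40, 87], name="Score")

mean = scores.mean()
std = scores.std()  # sample standard deviation (ddof=1), the same default describe() uses
q1, median, q3 = scores.quantile([0.25, 0.5, 0.75])  # linear interpolation by default

print(mean, round(std, 6), q1, median, q3)
```

These values should line up with the mean, std, 25%, 50%, and 75% rows of the Score column in the table above.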


2. Select Columns and Rows
- Select the Name and Score columns.
- Select rows where Score > 80.

In [7]:
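# Select the Name and Score columns (not displayed: a cell shows only its last expression)
df[['Name', 'Score']]
# Select rows where Score > 80
df[df['Score'] > 80]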

,Name,Age,City,Score
0,Sakshi Yadav,20,Indirapuram,99
1,Santosh Shah,21,Nodia,98
9,Roshani Kumar,18,Ghazibad,87
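
The two selections can also be combined into one step. This is a sketch using `.loc`, which takes a row condition and a column list together; it uses a three-row subset of the data above, and the variable name `high` is illustrative.

```python
import pandas as pd

# Three rows taken from the DataFrame created earlier.
df = pd.DataFrame({
    "Name": ["Sakshi Yadav", "Diya Bansal", "Roshani Kumar"],
    "Age": [20, 19, 18],
    "Score": [99, 70, 87],
})

# Row condition first, list of columns second.
high = df.loc[df["Score"] > 80, ["Name", "Score"]]
print(high)
```

The result keeps the original index labels of the matching rows, so they can still be traced back to the full table.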


3. Filter Rows by Conditions
- Find people from 'Ghazibad' whose score is above 70.
- List the people who are over 18 years old.

In [8]:
df[(df['City'] == 'Ghazibad') & (df['Score'] > 70)]

,Name,Age,City,Score
9,Roshani Kumar,18,Ghazibad,87


In [9]:
df[(df['Age'] > 18)]

,Name,Age,City,Score
0,Sakshi Yadav,20,Indirapuram,99
1,Santosh Shah,21,Nodia,98
2,Diya Bansal,19,Vaishali,70
3,Riya Bansal,19,Delhi,50
4,Abhishek Sharma,22,New Delhi,66
6,Shyam Gupta,22,GTB Nagar,10
7,Satyam Mishera,21,Laxmi Nagar,55
8,Rahul Rai,19,Tagore Garden,40
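
To count matching rows rather than list them, one option is to sum the boolean mask, since each `True` counts as 1. A minimal sketch using the Age values from the DataFrame above:

```python
import pandas as pd

# The Age column from the DataFrame created earlier.
ages = pd.Series([20, 21, 19, 19, 22, 18, 22, 21, 19, 18], name="Age")

over_18 = (ages > 18).sum()  # each True in the mask counts as 1
over_30 = (ages > 30).sum()  # nobody in this dataset is older than 30

print(over_18, over_30)
```

Taking `len()` of the filtered DataFrame gives the same count, but summing the mask avoids building the intermediate table.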


4. Add and Modify Columns
- Add a new column Passed which is True if Score >= 50, else False.
- Add a new column Score_Updated that holds each score increased by 5 points.

In [10]:
df['Passed'] = df['Score'] >= 50
df['Score_Updated'] = df['Score'] + 5

In [11]:
df

,Name,Age,City,Score,Passed,Score_Updated
0,Sakshi Yadav,20,Indirapuram,99,True,104
1,Santosh Shah,21,Nodia,98,True,103
2,Diya Bansal,19,Vaishali,70,True,75
3,Riya Bansal,19,Delhi,50,True,55
4,Abhishek Sharma,22,New Delhi,66,True,71
5,Ram Rai,18,Preet vihar,30,False,35
6,Shyam Gupta,22,GTB Nagar,10,False,15
7,Satyam Mishera,21,Laxmi Nagar,55,True,60
8,Rahul Rai,19,Tagore Garden,40,False,45
9,Roshani Kumar,18,Ghazibad,87,True,92
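
The cell above keeps the original Score and stores the increased values in a separate column. If the goal were instead to overwrite Score itself, a sketch along these lines might work; it uses three rows from the data above and recomputes Passed after the update so the flag reflects the new scores.

```python
import pandas as pd

# Three rows taken from the DataFrame created earlier.
df = pd.DataFrame({
    "Name": ["Ram Rai", "Riya Bansal", "Rahul Rai"],
    "Score": [30, 50, 40],
})

df["Score"] += 5                  # overwrite the column in place
df["Passed"] = df["Score"] >= 50  # derive the flag from the updated scores

print(df)
```

Overwriting loses the original values, so keeping a separate column, as the notebook does, is the safer choice while exploring.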


5. Read Data from CSV
- Read a CSV file using pd.read_csv().
- Display the shape, column names, and data types.

In [12]:
df = pd.read_csv("c:/users/sakshi yadav/Downloads/tested.csv")

In [13]:
df.shape

(418, 12)

In [14]:
df.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

In [15]:
df.dtypes

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
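
The same inspection steps can be tried without a file on disk. The sketch below reads a small in-memory CSV through `io.StringIO`; the column names and values are made up for illustration and are not taken from `tested.csv`.

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for a file on disk.
csv_text = """PassengerId,Name,Age,Fare
1,A,34.5,7.25
2,B,,8.0
3,C,62.0,9.5
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (rows, columns)
print(list(sample.columns))  # column names
print(sample.dtypes)         # one dtype per column
print(sample.isna().sum())   # missing values per column
```

The empty Age field is read as NaN, which is why that column is parsed as a float rather than an integer.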