# Pandas
Pandas is a popular open-source Python library for data analysis and manipulation. It is widely used in Machine Learning and Data Science to load, clean and transform tabular datasets.

### Why use Pandas?
* Helps in reading, cleaning and analyzing data
* Makes working with structured data (CSV, Excel, SQL, etc.) easy

### Installation:
Pandas is not part of the Python standard library, so install it with pip:
`pip install pandas`

## DataFrame
A Pandas DataFrame is a 2D data structure like a table with rows and columns. It is used to store and manipulate structured data efficiently.

### Creating a DataFrame in Pandas:

In [36]:
import pandas as pd

student_data = {
    "StudentID": [101,102,103,104],
    "Name": ["Alice","Bob","Charlie","David"],
    "Age": [20,21,19,22],
    "Grade": ["A","B","B","C"]
}

student_df = pd.DataFrame(student_data)
print(student_df)

   StudentID     Name  Age Grade
0        101    Alice   20     A
1        102      Bob   21     B
2        103  Charlie   19     B
3        104    David   22     C
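Once a DataFrame exists, it usually helps to check its size and column types before doing anything else. A minimal sketch, reusing the same student data as above and only standard DataFrame attributes (`shape`, `columns`, `dtypes`) plus `head`:

```python
import pandas as pd

student_data = {
    "StudentID": [101, 102, 103, 104],
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [20, 21, 19, 22],
    "Grade": ["A", "B", "B", "C"],
}
student_df = pd.DataFrame(student_data)

# shape is a (rows, columns) tuple
n_rows, n_cols = student_df.shape
column_names = list(student_df.columns)

print(n_rows, n_cols)
print(column_names)
print(student_df.dtypes)    # one dtype per column
print(student_df.head(2))   # first two rows only
```

Each dictionary key becomes a column, and every list must have the same length, otherwise `pd.DataFrame` raises a `ValueError`.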


## Basic DataFrame Operations
You can access, add, delete and rename columns in a Pandas DataFrame with a few short operations.

In [41]:
# Accessing columns
print(student_df[['Name']])            # single column (returned as a DataFrame)
print(student_df[['Age','Grade']])     # multiple columns

      Name
0    Alice
1      Bob
2  Charlie
3    David
   Age Grade
0   20     A
1   21     B
2   19     B
3   22     C
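The number of brackets matters when selecting columns. The sketch below, on a cut-down copy of the student data, shows that single brackets return a Series while a list of names (double brackets) returns a DataFrame:

```python
import pandas as pd

student_df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [20, 21, 19, 22],
})

name_series = student_df["Name"]     # single brackets -> Series
name_frame = student_df[["Name"]]    # list of names   -> one-column DataFrame

print(type(name_series).__name__)
print(type(name_frame).__name__)
print(name_series.tolist())
```

A Series is convenient for computations on one column; the DataFrame form keeps the table layout, which is why the printed output above includes a column header.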


In [43]:
# Adding a new column
student_df['NewColumn'] = [1,2,3,4]
print(student_df)

   StudentID     Name  Age Grade  NewColumn
0        101    Alice   20     A          1
1        102      Bob   21     B          2
2        103  Charlie   19     B          3
3        104    David   22     C          4
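A new column does not have to be a hand-written list; it can also be computed from existing columns. A small sketch, where `AgeNextYear` and `IsAdult` are hypothetical column names chosen only for illustration:

```python
import pandas as pd

student_df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [20, 21, 19, 22],
})

# Arithmetic on a column is applied element by element
student_df["AgeNextYear"] = student_df["Age"] + 1

# Comparisons produce a boolean column (21 is an arbitrary threshold here)
student_df["IsAdult"] = student_df["Age"] >= 21

print(student_df)
```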


In [45]:
# Deleting a column
student_df.drop(columns=['NewColumn'], inplace=True)
print(student_df)

   StudentID     Name  Age Grade
0        101    Alice   20     A
1        102      Bob   21     B
2        103  Charlie   19     B
3        104    David   22     C


In [47]:
# Renaming a column
student_df.rename(columns={'Grade': 'FinalGrade'}, inplace=True)
print(student_df)

   StudentID     Name  Age FinalGrade
0        101    Alice   20          A
1        102      Bob   21          B
2        103  Charlie   19          B
3        104    David   22          C
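The `drop` and `rename` calls above use `inplace=True`, which modifies `student_df` directly. Without it, both methods return a new DataFrame and leave the original untouched. A minimal sketch of that alternative, where `renamed_df` and `trimmed_df` are illustrative names:

```python
import pandas as pd

student_df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [20, 21],
    "Grade": ["A", "B"],
})

# Without inplace=True, each call returns a modified copy
renamed_df = student_df.rename(columns={"Grade": "FinalGrade"})
trimmed_df = renamed_df.drop(columns=["Age"])

print(list(student_df.columns))   # original is unchanged
print(list(trimmed_df.columns))
```

Reassigning the result keeps the original data available, which can make a notebook easier to re-run cell by cell.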


## Reading a CSV File in Pandas
CSV (comma-separated values) is a common plain-text format for tabular datasets in Machine Learning and Data Science. `pd.read_csv` reads such a file into a DataFrame.

In [55]:
import pandas as pd

# Reading a CSV file
df = pd.read_csv('data.csv')

# Displaying the DataFrame
print(df)

     Duration  Pulse  Maxpulse  Calories              date
0          60  110.0       130     409.1  12/31/2023 10:46
1          60  117.0       145     479.0     1/1/2024 0:00
2          60    NaN       135     340.0    1/2/2024 10:46
3          45  109.0       175     282.4    1/3/2024 10:46
4          45  117.0       148     406.0    1/4/2024 10:46
..        ...    ...       ...       ...               ...
164        60  105.0       140     290.8   6/12/2024 10:45
165        60  110.0       145     300.0   6/13/2024 10:45
166        60  115.0       145     310.2   6/14/2024 10:45
167        75  120.0       150     320.4   6/15/2024 10:45
168        75  125.0       150     330.4   6/16/2024 10:45

[169 rows x 5 columns]


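>`pd.read_csv` loads the whole file into a DataFrame. When a frame has more rows than the display limit, `print` shows only the first and last five rows and reports the full size at the bottom.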
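To try `read_csv` without the actual file, the sketch below reads a few rows from an in-memory string instead. The rows are copied from the output above, but the exact layout of `data.csv` is an assumption, as is the date format string passed to `pd.to_datetime`:

```python
import io
import pandas as pd

# Assumed layout of data.csv, using the first four rows printed above
csv_text = """Duration,Pulse,Maxpulse,Calories,date
60,110.0,130,409.1,12/31/2023 10:46
60,117.0,145,479.0,1/1/2024 0:00
60,,135,340.0,1/2/2024 10:46
45,109.0,175,282.4,1/3/2024 10:46
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Empty fields are read as NaN
print(df.isna().sum())

# The date column is read as text; convert it to datetimes explicitly
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y %H:%M")
print(df.dtypes)
```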

## Statistical Analysis in Pandas
Pandas provides built-in methods for summary statistics, value frequencies and correlations.

In [60]:
import pandas as pd

In [64]:
# Load dataset
df = pd.read_csv('data.csv')

In [66]:
# Statistical Summary
print(df.describe()) # count, mean, std, min, quartiles and max of each numeric column

         Duration       Pulse    Maxpulse     Calories
count  169.000000  168.000000  169.000000   164.000000
mean    63.846154  107.488095  134.047337   375.790244
std     42.299949   14.549518   16.450434   266.379919
min     15.000000   80.000000  100.000000    50.300000
25%     45.000000  100.000000  124.000000   250.925000
50%     60.000000  105.000000  131.000000   318.600000
75%     60.000000  111.000000  141.000000   387.600000
max    300.000000  159.000000  184.000000  1860.400000
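In the summary above, `count` is 168 for Pulse and 164 for Calories even though the frame has 169 rows. That is because `describe()` counts only non-missing values. A small sketch with made-up rows shows the same effect:

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "Pulse": [110.0, 117.0, nan, 109.0],       # one missing value
    "Calories": [409.1, 479.0, 340.0, 282.4],  # no missing values
})

summary = df.describe()
print(summary.loc["count"])   # non-missing values per column
print(df.isna().sum())        # missing values per column
print(len(df))                # total number of rows
```

Comparing `count` with `len(df)` is a quick way to spot columns that contain missing data.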


In [68]:
# Unique values in a column
print(df['Pulse'].unique())

[110. 117.  nan 109. 102. 104.  98. 103. 100. 106.  90.  97. 108. 130.
 105.  92. 101.  93. 107. 114. 111.  99. 123. 118. 136. 121. 115. 153.
 159. 149. 151. 129.  83.  80. 150.  95. 152. 137. 124. 116. 112. 119.
 113. 141. 122.  85. 120. 125.]


In [70]:
# Value counts (frequency)
print(df['Pulse'].value_counts())

Pulse
100.0    19
90.0     12
109.0     9
107.0     8
103.0     8
108.0     7
97.0      7
110.0     7
106.0     6
111.0     6
98.0      6
105.0     6
102.0     6
104.0     4
114.0     4
95.0      3
115.0     3
117.0     3
118.0     3
136.0     3
93.0      3
92.0      3
99.0      2
151.0     2
112.0     2
123.0     2
80.0      2
150.0     2
101.0     2
149.0     1
116.0     1
120.0     1
85.0      1
122.0     1
141.0     1
113.0     1
119.0     1
124.0     1
159.0     1
137.0     1
152.0     1
130.0     1
121.0     1
153.0     1
83.0      1
129.0     1
125.0     1
Name: count, dtype: int64
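Note that `unique()` lists `nan` while `value_counts()` leaves it out by default. The sketch below, on a short made-up Series, shows the `dropna` and `normalize` options of `value_counts` and the related `nunique` method:

```python
import pandas as pd

pulse = pd.Series(
    [110.0, 117.0, float("nan"), 109.0, 117.0, 110.0, 117.0],
    name="Pulse",
)

counts = pulse.value_counts()                     # NaN excluded, most frequent first
counts_with_nan = pulse.value_counts(dropna=False)  # NaN counted as its own value
shares = pulse.value_counts(normalize=True)       # fractions instead of counts
n_unique = pulse.nunique()                        # distinct non-missing values

print(counts)
print(counts_with_nan)
print(shares)
print(n_unique, len(pulse.unique()))
```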


In [80]:
df

     Duration  Pulse  Maxpulse  Calories
0          60  110.0       130     409.1
1          60  117.0       145     479.0
2          60    NaN       135     340.0
3          45  109.0       175     282.4
4          45  117.0       148     406.0
..        ...    ...       ...       ...
164        60  105.0       140     290.8
165        60  110.0       145     300.0
166        60  115.0       145     310.2
167        75  120.0       150     320.4


In [84]:
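# Correlation between numerical features
# numeric_only=True skips text columns such as 'date' (pandas 2.x raises an error on them otherwise)
print(df.corr(numeric_only=True))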

          Duration     Pulse  Maxpulse  Calories
Duration  1.000000 -0.155623  0.009403  0.922717
Pulse    -0.155623  1.000000  0.786872  0.024865
Maxpulse  0.009403  0.786872  1.000000  0.203813
Calories  0.922717  0.024865  0.203813  1.000000
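By default `corr()` computes the Pearson correlation, which ranges from -1 to 1. In the matrix above, Duration and Calories (0.92) rise together strongly, while Pulse and Calories (0.02) show almost no linear relationship. A sketch on made-up values, chosen so that the results are easy to check by eye:

```python
import pandas as pd

# Illustrative data only, not taken from data.csv
df = pd.DataFrame({
    "Duration": [30, 45, 60, 75],
    "Calories": [200.0, 300.0, 400.0, 500.0],   # exactly linear in Duration
    "Pulse": [120.0, 110.0, 105.0, 100.0],      # falls as Duration rises
    "date": ["1/1/2024", "1/2/2024", "1/3/2024", "1/4/2024"],  # text column
})

# numeric_only=True leaves the text column out of the matrix
corr = df.corr(numeric_only=True)
print(corr)

# Correlation between two specific columns
pair = df["Duration"].corr(df["Calories"])
print(pair)
```

The diagonal is always 1 because every column is perfectly correlated with itself, and the matrix is symmetric.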


In [92]:
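# Other statistical functions (numeric_only=True skips text columns such as 'date')
print("Mean:\n", df.mean(numeric_only=True))       # mean
print("\nMedian:\n", df.median(numeric_only=True)) # median
print("\nMode:\n", df.mode(numeric_only=True))     # mode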

Mean:
 Duration     63.846154
Pulse       107.488095
Maxpulse    134.047337
Calories    375.790244
dtype: float64

Median:
 Duration     60.0
Pulse       105.0
Maxpulse    131.0
Calories    318.6
dtype: float64

Mode:
    Duration  Pulse  Maxpulse  Calories
0        60  100.0       120     300.0
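Unlike `mean()` and `median()`, `mode()` returns a table rather than one number per column, because several values can tie for most frequent. A minimal sketch on a made-up Series with a two-way tie:

```python
import pandas as pd

durations = pd.Series([60, 60, 45, 45, 30])

mean_value = durations.mean()       # arithmetic average
median_value = durations.median()   # middle value after sorting
modes = durations.mode()            # all most-frequent values, sorted

print(mean_value)
print(median_value)
print(modes.tolist())
```

In the output above every column has a single mode, so the result has just one row (index 0).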
