# Data Manipulation and Analysis with Pandas:

Data manipulation and analysis are key tasks in any data science or data analysis project. Pandas provides a wide range of functions that make it easier to clean, transform, and extract insights from data. In this lesson, we cover several common techniques: handling missing values, renaming columns, changing data types, applying functions to columns, and grouping and aggregating data.

In [None]:
import pandas as pd
df = pd.read_csv('sales1_data.csv')
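The contents of `sales1_data.csv` aren't shown here, so the sketch below builds a tiny hypothetical stand-in in memory with the same four columns. It assumes you want the `Date` column parsed as real dates. `parse_dates` does that at read time, and an empty field is read as `NaN`, which is what the missing-value checks later in the lesson look for.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as sales1_data.csv
# (Date, Product, Sales, Region); the empty Sales field is read as NaN.
csv_text = """Date,Product,Sales,Region
2023-01-01,Product3,738.0,West
2023-01-02,Product2,,North
2023-01-03,Product2,554.0,West
"""

# io.StringIO lets read_csv treat the string like a file on disk.
# parse_dates converts the Date column to datetime64 instead of plain strings.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(sample.dtypes)
print(sample)
```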

In [None]:
df.head()

|   | Date | Product | Sales | Region |
|---|------|---------|-------|--------|
| 0 | 2023-01-01 | Product3 | 738.0 | West |
| 1 | 2023-01-02 | Product2 | 868.0 | North |
| 2 | 2023-01-03 | Product2 | 554.0 | West |
| 3 | 2023-01-04 | Product1 | 618.0 | South |
| 4 | 2023-01-05 | Product3 | 501.0 | East |


## Handling Missing Values

In [None]:
df.isnull().any()

| Column | Any missing? |
|--------|--------------|
| Date | False |
| Product | False |
| Sales | True |
| Region | False |

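`isnull().any()` reduces column by column. Passing `axis=1` reduces row by row instead, which gives a boolean mask you can use to pull out the incomplete rows. The sketch below uses a hypothetical three-row frame rather than the lesson's file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Sales value
toy = pd.DataFrame({
    'Product': ['Product1', 'Product2', 'Product3'],
    'Sales': [618.0, np.nan, 501.0],
})

# Column-wise (default axis=0): True if a column holds at least one NaN
missing_by_column = toy.isnull().any()

# Row-wise (axis=1): True for rows with at least one NaN; used here as a boolean mask
rows_with_missing = toy[toy.isnull().any(axis=1)]

print(missing_by_column)
print(rows_with_missing)
```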

In [None]:
df.isnull().sum()

| Column | Missing count |
|--------|---------------|
| Date | 0 |
| Product | 0 |
| Sales | 6 |
| Region | 0 |

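`isnull().sum()` works as a counter because summing booleans treats `True` as 1. By the same logic, the mean of the boolean frame is the share of missing values per column. The toy column below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with 2 missing values out of 4
toy = pd.DataFrame({'Sales': [738.0, np.nan, 554.0, np.nan]})

# isnull() yields booleans; summing treats True as 1, so this counts NaNs
missing_count = toy.isnull().sum()

# The mean of the same booleans is the fraction of missing values per column
missing_share = toy.isnull().mean()

print(missing_count)
print(missing_share)
```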

### Filling Missing Values with the Column Mean

In [None]:
df['Sales'] = df['Sales'].fillna(df['Sales'].mean())

In [None]:
df.isnull().sum()

| Column | Missing count |
|--------|---------------|
| Date | 0 |
| Product | 0 |
| Sales | 0 |
| Region | 0 |

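Mean imputation works because `mean()` skips `NaN` by default (`skipna=True`), so the fill value comes only from observed data. The sketch below contrasts it with two common alternatives: filling with the median, which is less sensitive to extreme values, and dropping incomplete rows. The numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: observed values are 600, 700, 1400
sales = pd.Series([600.0, np.nan, 700.0, 1400.0])

# mean() ignores NaN, so the fill value is (600 + 700 + 1400) / 3 = 900
filled_mean = sales.fillna(sales.mean())

# median() is less affected by the large 1400 value; here it is 700
filled_median = sales.fillna(sales.median())

# Alternative: discard incomplete entries instead of imputing them
dropped = sales.dropna()

print(filled_mean.tolist())
print(filled_median.tolist())
print(dropped.tolist())
```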

### Renaming Columns:

In [None]:
df = df.rename(columns={'Date':'Sales Date'})
df

|   | Sales Date | Product | Sales | Region |
|---|------------|---------|-------|--------|
| 0 | 2023-01-01 | Product3 | 738.0 | West |
| 1 | 2023-01-02 | Product2 | 868.0 | North |
| 2 | 2023-01-03 | Product2 | 554.0 | West |
| 3 | 2023-01-04 | Product1 | 618.0 | South |
| 4 | 2023-01-05 | Product3 | 501.0 | East |
| 5 | 2023-01-06 | Product1 | 554.0 | West |
| 6 | 2023-01-07 | Product3 | 339.0 | South |
| 7 | 2023-01-08 | Product3 | 280.0 | South |
| 8 | 2023-01-09 | Product2 | 806.0 | North |
| 9 | 2023-01-10 | Product2 | 816.0 | South |

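`rename` takes a mapping of old to new names and returns a new DataFrame, which is why the lesson reassigns the result to `df`. The sketch below, on a hypothetical one-row frame, renames two columns at once. It also shows that, by default, a key matching no column is silently ignored.

```python
import pandas as pd

# Hypothetical toy frame using the same column names as the lesson
toy = pd.DataFrame({'Date': ['2023-01-01'], 'Product': ['Product3'], 'Sales': [738.0]})

# Several columns can be renamed in one call; the result is a new DataFrame
renamed = toy.rename(columns={'Date': 'Sales Date', 'Sales': 'Sales Amount'})

# The original keeps its names because the result was not assigned back to it
print(list(toy.columns))

# Keys that match no column are ignored by default (errors='ignore')
unchanged = toy.rename(columns={'Missing Column': 'Whatever'})

print(list(renamed.columns))
```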

### Changing Data Types:

In [None]:
# Store an integer copy of Sales (float64 -> int) in a new column
df['Value_new'] = df['Sales'].astype(int)
df

|   | Sales Date | Product | Sales | Region | Value_new |
|---|------------|---------|-------|--------|-----------|
| 0 | 2023-01-01 | Product3 | 738.0 | West | 738 |
| 1 | 2023-01-02 | Product2 | 868.0 | North | 868 |
| 2 | 2023-01-03 | Product2 | 554.0 | West | 554 |
| 3 | 2023-01-04 | Product1 | 618.0 | South | 618 |
| 4 | 2023-01-05 | Product3 | 501.0 | East | 501 |
| 5 | 2023-01-06 | Product1 | 554.0 | West | 554 |
| 6 | 2023-01-07 | Product3 | 339.0 | South | 339 |
| 7 | 2023-01-08 | Product3 | 280.0 | South | 280 |
| 8 | 2023-01-09 | Product2 | 806.0 | North | 806 |
| 9 | 2023-01-10 | Product2 | 816.0 | South | 816 |

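A few details of `astype(int)` matter here. It drops the fractional part rather than rounding, which affects any mean-filled values. It raises an error if the column still contains `NaN`, which is one reason missing values were handled first. Date strings are usually converted with `pd.to_datetime` instead of `astype`. The sketch below uses hypothetical values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 550.6 stands in for a mean-filled value with a fractional part
toy = pd.DataFrame({'Sales Date': ['2023-01-01', '2023-01-02'],
                    'Sales': [738.0, 550.6]})

# astype(int) drops the fractional part (550.6 -> 550); it does not round
truncated = toy['Sales'].astype(int)

# Round first if the nearest integer is what you want (550.6 -> 551)
rounded = toy['Sales'].round().astype(int)

# Convert date strings to datetime64 so date arithmetic and .dt accessors work
toy['Sales Date'] = pd.to_datetime(toy['Sales Date'])

# Casting a column that still contains NaN to int raises a ValueError
try:
    pd.Series([1.0, np.nan]).astype(int)
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True

print(truncated.tolist(), rounded.tolist(), toy.dtypes.to_dict(), nan_cast_failed)
```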

In [None]:
# Square each value in 'Sales' (x ** 2 squares the value; it does not multiply by 2)
df['Sales_new'] = df['Sales'].apply(lambda x: x ** 2)
df

|   | Sales Date | Product | Sales | Region | Value_new | Sales_new |
|---|------------|---------|-------|--------|-----------|-----------|
| 0 | 2023-01-01 | Product3 | 738.0 | West | 738 | 544644.0 |
| 1 | 2023-01-02 | Product2 | 868.0 | North | 868 | 753424.0 |
| 2 | 2023-01-03 | Product2 | 554.0 | West | 554 | 306916.0 |
| 3 | 2023-01-04 | Product1 | 618.0 | South | 618 | 381924.0 |
| 4 | 2023-01-05 | Product3 | 501.0 | East | 501 | 251001.0 |
| 5 | 2023-01-06 | Product1 | 554.0 | West | 554 | 306916.0 |
| 6 | 2023-01-07 | Product3 | 339.0 | South | 339 | 114921.0 |
| 7 | 2023-01-08 | Product3 | 280.0 | South | 280 | 78400.0 |
| 8 | 2023-01-09 | Product2 | 806.0 | North | 806 | 649636.0 |
| 9 | 2023-01-10 | Product2 | 816.0 | South | 816 | 665856.0 |

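`apply` calls a Python function once per element. For simple arithmetic like this, the vectorized operator on the whole column gives the same result and is typically faster on large data. The sketch below, on hypothetical values taken from the first rows above, checks that both forms agree. It also shows doubling (`* 2`) as a separate operation for comparison.

```python
import pandas as pd

# Hypothetical toy Series using the first three Sales values from the table above
sales = pd.Series([738.0, 868.0, 554.0])

# apply: the lambda runs once per element
squared_apply = sales.apply(lambda x: x ** 2)

# Vectorized equivalent: the operator works on the whole column at once
squared_vectorized = sales ** 2

# Multiplying by 2 is a different operation from squaring
doubled = sales * 2

print(squared_apply.tolist())
print(doubled.tolist())
```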

## Data Aggregation and Grouping

In [None]:
# Mean Sales value for each product:
grouped_mean = df.groupby('Product')['Sales'].mean()

In [None]:
grouped_mean

| Product | Sales |
|---------|-------|
| Product1 | 481.166667 |
| Product2 | 661.625 |
| Product3 | 515.454545 |

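`groupby` follows a split-apply-combine pattern. It splits the rows by the key, applies the aggregation to each group's `Sales`, and combines the results into a Series indexed by the key. The sketch below reproduces that by hand on hypothetical data to show what the one-liner computes. It then uses `reset_index` to turn the result back into an ordinary DataFrame.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Product': ['Product1', 'Product2', 'Product1', 'Product2'],
    'Sales': [600.0, 800.0, 400.0, 1000.0],
})

# Split by Product, take the mean of each group's Sales, combine into one Series
grouped_mean = toy.groupby('Product')['Sales'].mean()

# The same computation by hand: filter each group, then average it
manual = {p: toy.loc[toy['Product'] == p, 'Sales'].mean()
          for p in toy['Product'].unique()}

# reset_index moves the Product index back into a regular column
grouped_mean_df = grouped_mean.reset_index()

print(grouped_mean_df)
```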

In [None]:
# Total Sales for each (Product, Region) pair:
grouped_sum = df.groupby(['Product', 'Region'])['Sales'].sum()
grouped_sum

| Product | Region | Sales |
|---------|--------|-------|
| Product1 | South | 2333.0 |
| Product1 | West | 554.0 |
| Product2 | North | 2632.0 |
| Product2 | South | 816.0 |
| Product2 | West | 1845.0 |
| Product3 | East | 2816.0 |
| Product3 | South | 619.0 |
| Product3 | West | 2235.0 |

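Grouping by two keys produces a Series with a two-level (Product, Region) index, as in the table above. When a product-by-region grid is easier to read, `unstack` pivots the inner level into columns. Combinations with no rows would otherwise show up as `NaN`; `fill_value=0` replaces them. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; Product1 has no North rows and Product2 has no West rows
toy = pd.DataFrame({
    'Product': ['Product1', 'Product1', 'Product2', 'Product2', 'Product1'],
    'Region': ['South', 'West', 'North', 'South', 'South'],
    'Sales': [100.0, 200.0, 300.0, 400.0, 50.0],
})

# Two grouping keys -> a Series indexed by (Product, Region)
grouped_sum = toy.groupby(['Product', 'Region'])['Sales'].sum()

# Pivot Region into columns; missing combinations become 0 instead of NaN
wide = grouped_sum.unstack(fill_value=0)

print(wide)
```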

### Aggregating with Multiple Functions:

In [None]:
# Compute the mean, sum, and count of Sales for each product in one call
grouped_agg = df.groupby('Product')['Sales'].agg(['mean', 'sum', 'count'])
grouped_agg

| Product | mean | sum | count |
|---------|------|-----|-------|
| Product1 | 481.166667 | 2887.0 | 6 |
| Product2 | 661.625 | 5293.0 | 8 |
| Product3 | 515.454545 | 5670.0 | 11 |

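With a list of functions, the output columns take the functions' names (`mean`, `sum`, `count`). Pandas also supports "named aggregation", where each keyword argument becomes an output column built from a `(column, function)` pair. This makes it possible to choose descriptive column names directly. The sketch below uses hypothetical data, and the output column names (`mean_sales`, `total_sales`, `row_count`) are illustrative only.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Product': ['Product1', 'Product2', 'Product1'],
    'Sales': [600.0, 800.0, 400.0],
})

# Named aggregation: keyword = (source column, aggregation function)
summary = toy.groupby('Product').agg(
    mean_sales=('Sales', 'mean'),
    total_sales=('Sales', 'sum'),
    row_count=('Sales', 'count'),  # count excludes NaN values
)

print(summary)
```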