### Scenario:
### You have a dataset of customer car purchases with the following issues:
### Missing values in the "Sale Price" column.

### Inconsistent data types (prices may be stored as strings with a currency symbol, e.g. "$238").
### Inconsistent product names, such as car makes and models (typos or capitalization).

### Goal: Clean and transform the data using pandas functions.
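The third issue, inconsistent product names, can be handled with the `.str` accessor. The sketch below is minimal and rests on assumptions: the `Car Make` values are made up, and both the misspelling "Nisan" and the `typo_fixes` mapping are hypothetical, not taken from `car_sales.csv`.

```python
import pandas as pd

# Toy data with hypothetical inconsistencies (not taken from car_sales.csv)
df = pd.DataFrame({"Car Make": [" nissan", "NISSAN", "Nisan", "Ford ", "honda"]})

# Hypothetical mapping of known misspellings to canonical names
typo_fixes = {"Nisan": "Nissan"}

df["Car Make"] = (
    df["Car Make"]
    .str.strip()           # drop leading/trailing whitespace
    .str.title()           # unify capitalization: "NISSAN" -> "Nissan"
    .replace(typo_fixes)   # exact-match replacement of known typos
)

print(df["Car Make"].unique().tolist())  # ['Nissan', 'Ford', 'Honda']
```

`str.title()` is a blunt tool. It would turn an acronym such as "BMW" into "Bmw", so an explicit mapping to canonical names is safer when the set of makes is known in advance.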


In [65]:
# Install pandas if needed, then import it
# pip install pandas
import pandas as pd

In [66]:
# Load the dataset from a CSV file
df = pd.read_csv('car_sales.csv')

In [67]:
# Dimensions of the dataset: (rows, columns)
df.shape

(20, 7)

In [68]:
# top 5 rows
df.head()

| | Date | Salesperson | Customer Name | Car Make | Car Model | Car Year | Sale Price |
|---|---|---|---|---|---|---|---|
| 0 | 01-08-2022 | Monica Moore MD | Mary Butler | Nissan | Altima | 2018 | 15983 |
| 1 | 15-03-2023 | Roberto Rose | Richard Pierce | Nissan | F-150 | 2016 | $238 |
| 2 | 29-04-2023 | Ashley Ramos | Sandra Moore | Ford | Civic | 2016 | NaN |
| 3 | 04-09-2022 | Patrick Harris | Johnny Scott | Ford | Altima | 2013 | 41937 |
| 4 | 16-06-2022 | Eric Lopez | Vanessa Jones | Honda | Silverado | 2022 | 20256 |
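The `Date` column holds day-first strings (e.g. `15-03-2023`) and is stored as `object`. The minimal sketch below parses such values into real dates. It assumes every entry follows the `dd-mm-yyyy` layout visible in the first rows.

```python
import pandas as pd

# Sample dates in the day-month-year layout seen in the "Date" column
dates = pd.Series(["01-08-2022", "15-03-2023", "29-04-2023"])

# An explicit format avoids ambiguity between 01-08 (1 Aug) and 08-01 (8 Jan)
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

print(parsed.dt.month.tolist())  # [8, 3, 4]
```

In the notebook this would be `df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')`. A value that does not match the format raises an error instead of being silently misread.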


In [69]:
# Summary of columns, non-null counts and dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Date           20 non-null     object
 1   Salesperson    20 non-null     object
 2   Customer Name  20 non-null     object
 3   Car Make       20 non-null     object
 4   Car Model      20 non-null     object
 5   Car Year       20 non-null     int64 
 6   Sale Price     15 non-null     object
dtypes: int64(1), object(6)
memory usage: 1.2+ KB
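`Sale Price` is `object` rather than numeric because at least one entry, such as `$238`, is not a plain number. The sketch below shows one way to list the offending values. It uses a few sample strings copied from the `head()` output.

```python
import numpy as np
import pandas as pd

# A few values in the style of the "Sale Price" column: digits, a "$" prefix, a gap
prices = pd.Series(["15983", "$238", np.nan, "147.69"], dtype="object")

# errors="coerce" turns anything that cannot be parsed into NaN
as_numbers = pd.to_numeric(prices, errors="coerce")

# Entries that were present but failed to parse are the ones needing cleanup
bad = prices[as_numbers.isna() & prices.notna()]
print(bad.tolist())  # ['$238']
```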


In [71]:
# count null values
df.isnull().sum()

Date             0
Salesperson      0
Customer Name    0
Car Make         0
Car Model        0
Car Year         0
Sale Price       5
dtype: int64
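The counts show that 5 of the 20 prices are missing (25%). To see which rows are affected, filter with the `isna()` mask. The sketch below uses a small illustrative frame rather than the full CSV.

```python
import numpy as np
import pandas as pd

# Small illustrative frame shaped like car_sales.csv (first three rows only)
df = pd.DataFrame({
    "Customer Name": ["Mary Butler", "Richard Pierce", "Sandra Moore"],
    "Sale Price": ["15983", "$238", np.nan],
})

# Boolean mask keeps only the rows whose price is missing
missing = df[df["Sale Price"].isna()]
print(missing)

# Share of rows with a missing price (mean of a boolean Series)
print(df["Sale Price"].isna().mean())
```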

In [72]:
# Strip the "$" sign and thousands separators; the values remain strings (object dtype)
df['Sale Price'] = df['Sale Price'].replace({r'\$': '', ',': ''}, regex=True)
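Removing the symbols only solves half of the type problem, because the column stays `object` until it is converted. The minimal sketch below runs both steps on sample values. The `$1,250` entry is hypothetical and is included only to exercise the comma rule.

```python
import numpy as np
import pandas as pd

# Sample values; "$1,250" is hypothetical, added to show comma removal
prices = pd.Series(["15983", "$238", np.nan, "147.69", "$1,250"], dtype="object")

# Raw string r"\$" avoids Python's invalid-escape warning; "$" must be escaped in a regex
cleaned = prices.replace({r"\$": "", ",": ""}, regex=True)

# After stripping symbols the values are still strings; convert them to floats
numeric = pd.to_numeric(cleaned, errors="coerce")
print(numeric.dtype)     # float64
print(numeric.tolist())  # [15983.0, 238.0, nan, 147.69, 1250.0]
```

In the notebook, the conversion step would be `df['Sale Price'] = pd.to_numeric(df['Sale Price'], errors='coerce')`. With `errors='coerce'`, any leftover unparseable value becomes `NaN` instead of raising an error, so it is worth re-running `isnull().sum()` afterwards.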

In [73]:
df

| | Date | Salesperson | Customer Name | Car Make | Car Model | Car Year | Sale Price |
|---|---|---|---|---|---|---|---|
| 0 | 01-08-2022 | Monica Moore MD | Mary Butler | Nissan | Altima | 2018 | 15983 |
| 1 | 15-03-2023 | Roberto Rose | Richard Pierce | Nissan | F-150 | 2016 | 238 |
| 2 | 29-04-2023 | Ashley Ramos | Sandra Moore | Ford | Civic | 2016 | NaN |
| 3 | 04-09-2022 | Patrick Harris | Johnny Scott | Ford | Altima | 2013 | 41937 |
| 4 | 16-06-2022 | Eric Lopez | Vanessa Jones | Honda | Silverado | 2022 | 20256 |
| 5 | 18-12-2022 | Terry Perkins MD | John Olsen | Ford | Altima | 2015 | 147.69 |
| 6 | 12-06-2022 | Ashley Brown | Tyler Lawson | Honda | F-150 | 2013 | 41397 |
| 7 | 20-06-2022 | Norma Watkins | Michael Bond | Ford | Altima | 2015 | NaN |
| 8 | 02-09-2022 | Scott Parker | Stephanie Smith | Ford | Corolla | 2021 | NaN |
| 9 | 06-04-2023 | Andrew Smith | Ashley Moreno DDS | Ford | Civic | 2018 | 16309 |


In [81]:
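# Fill missing prices with 0; fillna returns a new Series, so assign it back
df['Sale Price'] = df['Sale Price'].fillna(0)
df['Sale Price']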

0      15983
1        238
2          0
3      41937
4      20256
5     147.69
6      41397
7          0
8          0
9      16309
10     41259
11     48224
12     36409
13     29628
14         0
15     36607
16     39916
17     18482
18     17158
19         0
Name: Sale Price, dtype: object
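The result still has `dtype: object`, a mix of strings and the integer `0`. A price of 0 also reads as a free car, which drags averages down. The sketch below shows one alternative, assuming median imputation suits the analysis: convert to numbers first, then fill the gaps with the median of the observed prices. The sample values come from the output above.

```python
import numpy as np
import pandas as pd

# Sample values from the cleaned column (symbols already stripped)
prices = pd.Series(["15983", "238", np.nan, "41937", "147.69"], dtype="object")

# Convert to float first so the statistics are computed on numbers, not strings
numeric = pd.to_numeric(prices, errors="coerce")

# Median of the observed prices (NaN is skipped by default)
filled = numeric.fillna(numeric.median())
print(filled.tolist())  # [15983.0, 238.0, 8110.5, 41937.0, 147.69]
```

Whether to impute at all is an analysis decision. Keeping `NaN` and letting pandas skip it in `mean()` or `sum()` is often the safer default.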