Python Data Analysis: Sales and Revenue Insights
This project demonstrates a basic data analysis workflow using Python and pandas. The goal is to clean, transform, and explore a sales dataset to answer specific business questions: total revenue, units sold per product, and the highest unit price per product.



In [None]:
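import pandas as pd

# Load the orders dataset
df = pd.read_csv('kiwilytics_orders.csv')
print('Summary of the dataset: \n')
print(df.head())
print('Information of the dataset: \n')
# info() prints its report itself and returns None, so it is not wrapped in print()
df.info()
print('Description of the dataset: \n')
print(df.describe())
print('Missing values in the dataset: \n')
print(df.isna().sum())
print('Unique values in the dataset: \n')
print(df.nunique())
# Count orders per product; value_counts() on the whole frame would only count
# identical rows, and every row is unique because of order_id
print('Value counts per product: \n')
print(df['product'].value_counts())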



Summary of the dataset: 

   order_id customer_name     product  quantity  unit_price  order_date
0         1         Maria  Kiwi Chips         2         3.0  2024-01-29
1         2       Richard  Kiwi Chips         1         3.0  2024-01-08
2         3      Nicholas  Kiwi Candy         1         2.5  2024-01-25
3         4       Raymond  Kiwi Candy         4         NaN  2024-01-04
4         5         David  Kiwi Juice         1         4.5  2024-02-25
Information of the dataset: 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   order_id       100 non-null    int64  
 1   customer_name  100 non-null    object 
 2   product        100 non-null    object 
 3   quantity       100 non-null    int64  
 4   unit_price     92 non-null     float64
 5   order_date     100 non-null    object 
dtypes: float64(1), int64(2), object(3)
memory usage: 4.8

In [9]:
# Fill null values in unit_price with the mean unit_price of the same product
df['unit_price'] = df['unit_price'].fillna(df.groupby('product')['unit_price'].transform('mean'))
print('Missing values in the dataset: \n')
print(df.isna().sum())

Missing values in the dataset: 

order_id         0
customer_name    0
product          0
quantity         0
unit_price       0
order_date       0
dtype: int64
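
To make the imputation step easier to follow, here is a minimal sketch of the same `groupby(...).transform('mean')` pattern on a small frame. The `toy` data is hypothetical and is not taken from `kiwilytics_orders.csv`; only the column names match the dataset.

```python
import pandas as pd

# Hypothetical toy data (not from kiwilytics_orders.csv)
toy = pd.DataFrame({
    "product": ["Kiwi Chips", "Kiwi Chips", "Kiwi Candy", "Kiwi Candy"],
    "unit_price": [3.0, None, 2.5, None],
})

# transform('mean') returns one value per row, aligned to the original index,
# so every row carries the mean price of its own product
group_mean = toy.groupby("product")["unit_price"].transform("mean")

# fillna only replaces the missing entries; known prices are left untouched
toy["unit_price"] = toy["unit_price"].fillna(group_mean)
print(toy)
```

If every price of a product were missing, its group mean would itself be NaN and those rows would stay empty, so rechecking `isna().sum()` after the fill, as the cell above does, is a sensible safeguard.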


In [13]:
# Setting order date to datetime
df['order_date'] = pd.to_datetime(df['order_date'])
print('Order date is now a datetime: \n')
print(df['order_date'].head())


Order date is now a datetime: 

0   2024-01-29
1   2024-01-08
2   2024-01-25
3   2024-01-04
4   2024-02-25
Name: order_date, dtype: datetime64[ns]
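
The conversion above works because every `order_date` value is a well-formed date. As a hedged sketch, the snippet below shows one way to detect malformed dates during conversion. The `raw` values are hypothetical, and the `%Y-%m-%d` format is an assumption based on the dates printed above.

```python
import pandas as pd

# Hypothetical values; the last one is deliberately malformed
raw = pd.Series(["2024-01-29", "2024-01-08", "not a date"])

# errors="coerce" turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Count the entries that failed to parse
n_bad = int(parsed.isna().sum())
print(parsed)
print("Unparseable dates:", n_bad)
```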


In [None]:
# Calculate the total price of each order, then sum it to get the overall revenue
df['total_price'] = df['quantity'] * df['unit_price']
print('Total price of the order: \n')
print(df['total_price'].sum())



Total price of the order: 

1167.5
Revenue: 

1167.5
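
As a small worked example of the revenue arithmetic, the sketch below recomputes `total_price` for the first three rows shown in the `df.head()` output and also splits the revenue by product. The names `sample` and `revenue_per_product` are illustrative and not part of the original notebook.

```python
import pandas as pd

# First three rows of the df.head() output (product, quantity, unit_price)
sample = pd.DataFrame({
    "product": ["Kiwi Chips", "Kiwi Chips", "Kiwi Candy"],
    "quantity": [2, 1, 1],
    "unit_price": [3.0, 3.0, 2.5],
})

# Row-wise order value: 2 * 3.0, 1 * 3.0, 1 * 2.5
sample["total_price"] = sample["quantity"] * sample["unit_price"]

# Overall revenue for these three orders
revenue = sample["total_price"].sum()

# Revenue split by product
revenue_per_product = sample.groupby("product")["total_price"].sum()
print(revenue)
print(revenue_per_product)
```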


In [None]:
# Get total quantity sold per product
products_sellings = df.groupby('product')['quantity'].sum()
print('Products sellings: \n')
print(products_sellings)

# Get highest unit price recorded for each product
highest_product_price_per_product = df.groupby('product')['unit_price'].max()
print('Highest product price per product: \n')
print(highest_product_price_per_product)


Products sellings: 

product
Kiwi Candy       70
Kiwi Chips       73
Kiwi Jam         41
Kiwi Juice       61
Kiwi Smoothie    46
Name: quantity, dtype: int64
Highest product price per product: 

product
Kiwi Candy       2.5
Kiwi Chips       3.0
Kiwi Jam         6.0
Kiwi Juice       4.5
Kiwi Smoothie    5.5
Name: unit_price, dtype: float64
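
The grouped totals can also be ranked to name the best seller directly. The sketch below rebuilds the quantity series from the printed output above rather than from the CSV; `units_sold`, `best_seller`, and `ranked` are illustrative names.

```python
import pandas as pd

# Quantities copied from the grouped output printed above
units_sold = pd.Series(
    {
        "Kiwi Candy": 70,
        "Kiwi Chips": 73,
        "Kiwi Jam": 41,
        "Kiwi Juice": 61,
        "Kiwi Smoothie": 46,
    },
    name="quantity",
)

# idxmax returns the index label (the product) with the largest total
best_seller = units_sold.idxmax()

# Full ranking from most to least units sold
ranked = units_sold.sort_values(ascending=False)
print(best_seller)
print(ranked)
```

On the real data, the same two calls could be applied to `products_sellings` from the cell above.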


