## PROJECT DESCRIPTION

Sales analytics is the practice of generating insights from sales data, trends, and metrics to set targets and forecast future sales performance.

In this analytics project, I explore the data to evaluate the performance of the sales team against specified goals.
The project provides insights into the top-performing and underperforming products and services, selling problems and market opportunities, sales forecasting, and the sales activities that generate revenue.

### PROJECT QUESTIONS AND ANALYSIS

The analysis answers the following questions, each with an accompanying visualization.

1: What was the best year for sales? How much was earned that year?

2: Which city had the highest number of sales?

3: What was the best month for sales? How much was earned that month?

4: What products are most often sold together?

5: What product sold the most? Why do you think it sold the most?

6: What time should we display advertisements to maximize the likelihood of customers buying a product?

7: What is the probability that people will order USB-C Charging Cable?

8: What is the probability that people will order iPhone?

9: What is the probability that people will order Google Phone?

10: What is the probability that people will order Wired Headphones?
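
One simple way to read questions 7 to 10 is as an empirical probability: the share of order lines that contain the product. The sketch below assumes that reading. `product_probability` is a hypothetical helper, and the sample frame is made up for illustration rather than taken from the sales files.

```python
import pandas as pd


def product_probability(df: pd.DataFrame, product: str) -> float:
    """Return the share of rows whose Product column equals `product`."""
    if len(df) == 0:
        return 0.0
    return float((df["Product"] == product).mean())


# Illustrative data only, not taken from the sales files.
sample = pd.DataFrame(
    {"Product": ["iPhone", "Wired Headphones", "iPhone", "Google Phone"]}
)
p_iphone = product_probability(sample, "iPhone")
```

Other readings are possible, for example the share of distinct orders (rather than order lines) that include the product.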

### IMPORT LIBRARIES

In [117]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
from matplotlib import pyplot as plt

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline

### IMPORT DATASETS

In [50]:
jan_data = pd.read_csv("Sales_January_2019.csv")
feb_data = pd.read_csv("Sales_February_2019.csv")
mar_data = pd.read_csv("Sales_March_2019.csv")
apr_data = pd.read_csv("Sales_April_2019.csv")
may_data = pd.read_csv("Sales_May_2019.csv")
jun_data = pd.read_csv("Sales_June_2019.csv")
jul_data = pd.read_csv("Sales_July_2019.csv")
aug_data = pd.read_csv("Sales_August_2019.csv")
sep_data = pd.read_csv("Sales_September_2019.csv")
oct_data = pd.read_csv("Sales_October_2019.csv")
nov_data = pd.read_csv("Sales_November_2019.csv")
dec_data = pd.read_csv("Sales_December_2019.csv")

## Analyze details of each of the datasets

### January product sales

In [21]:
# Checking the head of data

jan_data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
0,141234,iPhone,1,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215"
1,141235,Lightning Charging Cable,1,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035"
2,141236,Wired Headphones,2,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016"
3,141237,27in FHD Monitor,1,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001"
4,141238,Wired Headphones,1,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301"


In [70]:
# Checking the shape of the data

jan_data.shape

(9671, 6)

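The shape is 9671 rows by 6 columns. This is lower than the 9723 entries that `info()` reports below, which suggests this cell was last run after the cleaning steps.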

In [22]:
# Checking basic info of data

jan_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9723 entries, 0 to 9722
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Order ID          9697 non-null   object
 1   Product           9697 non-null   object
 2   Quantity Ordered  9697 non-null   object
 3   Price Each        9697 non-null   object
 4   Order Date        9697 non-null   object
 5   Purchase Address  9697 non-null   object
dtypes: object(6)
memory usage: 455.9+ KB


In [31]:
# Checking for null values

jan_data.isna().sum()

Order ID            26
Product             26
Quantity Ordered    26
Price Each          26
Order Date          26
Purchase Address    26
dtype: int64

Each column has 26 missing values, which suggests 26 rows that are empty in every column. These rows will be dropped.

In [114]:
# Drop NaN values

jan_data.dropna(how="all", axis=0, inplace=True)
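
The `how="all"` argument matters here: it removes only rows in which every column is missing, while `how="any"` would also remove partially filled rows. The sketch below shows the difference on a small made-up frame (the values are illustrative, not taken from the sales files).

```python
import numpy as np
import pandas as pd

# Illustrative data: one fully empty row and one partially filled row.
sample = pd.DataFrame(
    {
        "Order ID": ["141234", np.nan, "141235"],
        "Product": ["iPhone", np.nan, np.nan],
    }
)

# Number of rows in which every column is missing.
empty_rows = int(sample.isna().all(axis=1).sum())

dropped_all = sample.dropna(how="all")  # removes only the fully empty row
dropped_any = sample.dropna(how="any")  # also removes the partially filled row
```

Counting `isna().all(axis=1)` before dropping is a quick way to confirm that the missing values really do come from fully empty rows.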

#### Checking for duplicated entries

In [49]:
jan_data.duplicated().any()

True

In [50]:
# Checking the number of duplicates

jan_data.duplicated().sum()

25

In [51]:
# Investigating the occurrence of duplicates

jan_data[jan_data.duplicated()]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
875,142071,AA Batteries (4-pack),1,3.84,01/17/19 23:02,"131 2nd St, Boston, MA 02215"
1102,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1194,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1897,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2463,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3115,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3247,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3612,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3623,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
4126,145143,Lightning Charging Cable,1,14.95,01/06/19 03:01,"182 Jefferson St, San Francisco, CA 94016"


Some rows contain the literal string "Order ID" in the "Order ID" column: they are copies of the header row repeated inside the file. Drop these rows from the dataset.

In [52]:
# Drop rows that have Order ID as values in the "Order ID" column

jan_data = jan_data[jan_data["Order ID"] != "Order ID"].copy()
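
The comparison above targets one specific bad value. A more general variant, sketched here as an alternative rather than as the notebook's method, keeps only rows whose Order ID parses as a number. The sample frame is illustrative.

```python
import pandas as pd

# Illustrative data: the middle row is a repeated header line.
sample = pd.DataFrame(
    {
        "Order ID": ["141234", "Order ID", "141235"],
        "Product": ["iPhone", "Product", "Wired Headphones"],
    }
)

# errors="coerce" turns values that cannot be parsed as numbers into NaN.
numeric_id = pd.to_numeric(sample["Order ID"], errors="coerce")

# Keep only the rows with a numeric Order ID; copy to get an independent frame.
cleaned = sample[numeric_id.notna()].copy()
```

This variant would also drop any other non-numeric Order ID, which may or may not be wanted.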

In [53]:
jan_data[jan_data.duplicated()]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
875,142071,AA Batteries (4-pack),1,3.84,01/17/19 23:02,"131 2nd St, Boston, MA 02215"
4126,145143,Lightning Charging Cable,1,14.95,01/06/19 03:01,"182 Jefferson St, San Francisco, CA 94016"
5811,146765,Google Phone,1,600.0,01/21/19 11:23,"918 Highland St, New York City, NY 10001"
6807,147707,Wired Headphones,1,11.99,01/04/19 16:50,"883 4th St, Dallas, TX 75001"
8134,148984,USB-C Charging Cable,1,11.95,01/08/19 17:36,"562 14th St, Boston, MA 02215"
8309,149149,Lightning Charging Cable,1,14.95,01/12/19 12:30,"180 1st St, Boston, MA 02215"
8470,149308,Apple Airpods Headphones,1,150.0,01/02/19 23:07,"351 Madison St, New York City, NY 10001"
8690,149515,USB-C Charging Cable,1,11.95,01/14/19 21:19,"913 10th St, Los Angeles, CA 90001"
8923,149738,USB-C Charging Cable,1,11.95,01/11/19 11:22,"612 West St, New York City, NY 10001"
9427,150216,Wired Headphones,1,11.99,01/21/19 09:20,"691 Pine St, San Francisco, CA 94016"


#### Investigate the issue of duplicates

In [57]:
# Order ID --- 142071

jan_data[jan_data["Order ID"] == "142071"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
874,142071,AA Batteries (4-pack),1,3.84,01/17/19 23:02,"131 2nd St, Boston, MA 02215"
875,142071,AA Batteries (4-pack),1,3.84,01/17/19 23:02,"131 2nd St, Boston, MA 02215"


In [58]:
# Order ID --- 145143

jan_data[jan_data["Order ID"] == "145143"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
4125,145143,Lightning Charging Cable,1,14.95,01/06/19 03:01,"182 Jefferson St, San Francisco, CA 94016"
4126,145143,Lightning Charging Cable,1,14.95,01/06/19 03:01,"182 Jefferson St, San Francisco, CA 94016"


In [59]:
# Order ID --- 146765

jan_data[jan_data["Order ID"] == "146765"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
5810,146765,Google Phone,1,600,01/21/19 11:23,"918 Highland St, New York City, NY 10001"
5811,146765,Google Phone,1,600,01/21/19 11:23,"918 Highland St, New York City, NY 10001"


In [60]:
# Order ID --- 147707

jan_data[jan_data["Order ID"] == "147707"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
6806,147707,Wired Headphones,1,11.99,01/04/19 16:50,"883 4th St, Dallas, TX 75001"
6807,147707,Wired Headphones,1,11.99,01/04/19 16:50,"883 4th St, Dallas, TX 75001"


In [61]:
# Order ID --- 148984

jan_data[jan_data["Order ID"] == "148984"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8133,148984,USB-C Charging Cable,1,11.95,01/08/19 17:36,"562 14th St, Boston, MA 02215"
8134,148984,USB-C Charging Cable,1,11.95,01/08/19 17:36,"562 14th St, Boston, MA 02215"


In [62]:
# Order ID --- 149149

jan_data[jan_data["Order ID"] == "149149"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8308,149149,Lightning Charging Cable,1,14.95,01/12/19 12:30,"180 1st St, Boston, MA 02215"
8309,149149,Lightning Charging Cable,1,14.95,01/12/19 12:30,"180 1st St, Boston, MA 02215"


In [63]:
# Order ID --- 149308

jan_data[jan_data["Order ID"] == "149308"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8469,149308,Apple Airpods Headphones,1,150,01/02/19 23:07,"351 Madison St, New York City, NY 10001"
8470,149308,Apple Airpods Headphones,1,150,01/02/19 23:07,"351 Madison St, New York City, NY 10001"


In [64]:
# Order ID --- 149515

jan_data[jan_data["Order ID"] == "149515"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8689,149515,USB-C Charging Cable,1,11.95,01/14/19 21:19,"913 10th St, Los Angeles, CA 90001"
8690,149515,USB-C Charging Cable,1,11.95,01/14/19 21:19,"913 10th St, Los Angeles, CA 90001"


In [65]:
# Order ID --- 149738

jan_data[jan_data["Order ID"] == "149738"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8922,149738,USB-C Charging Cable,1,11.95,01/11/19 11:22,"612 West St, New York City, NY 10001"
8923,149738,USB-C Charging Cable,1,11.95,01/11/19 11:22,"612 West St, New York City, NY 10001"


In [66]:
# Order ID --- 150216

jan_data[jan_data["Order ID"] == "150216"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
9426,150216,Wired Headphones,1,11.99,01/21/19 09:20,"691 Pine St, San Francisco, CA 94016"
9427,150216,Wired Headphones,1,11.99,01/21/19 09:20,"691 Pine St, San Francisco, CA 94016"


The checks above show that each duplicated entry is an exact copy of another row. Only the first instance of each is kept; the copies are dropped.
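
The per-order lookups above can also be condensed into one step. The sketch below uses `duplicated(keep=False)`, which flags every member of a duplicate group instead of only the later copies. The order IDs `A1` and `B2` are made up for illustration.

```python
import pandas as pd

# Illustrative data: order A1 is an exact duplicate, order B2 holds two different products.
sample = pd.DataFrame(
    {
        "Order ID": ["A1", "A1", "B2", "B2"],
        "Product": ["AA Batteries (4-pack)", "AA Batteries (4-pack)", "iPhone", "Wired Headphones"],
    }
)

# keep=False marks all rows that have at least one exact copy.
all_copies = sample[sample.duplicated(keep=False)].sort_values("Order ID")

# Number of identical rows in each duplicate group.
copies_per_row = all_copies.groupby(list(sample.columns)).size()
```

Every group in `copies_per_row` lists rows that are identical in all columns, so a separate query per order ID is not needed.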

In [115]:
# Drop duplicated entries

jan_data.drop_duplicates(subset=None, keep="first", inplace=True)
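
The same three steps (empty rows, repeated header rows, exact duplicates) are applied to every month, so they could be wrapped in one function. This is a sketch under that assumption; `clean_month` is a hypothetical helper and the sample frame is illustrative. Unlike the in-place calls above, it returns a new frame and resets the index.

```python
import numpy as np
import pandas as pd


def clean_month(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning steps and return a new DataFrame."""
    out = df.dropna(how="all")                 # 1. rows empty in every column
    out = out[out["Order ID"] != "Order ID"]   # 2. repeated header rows
    out = out.drop_duplicates(keep="first")    # 3. exact duplicate rows
    return out.reset_index(drop=True)


# Illustrative data covering all three problems.
sample = pd.DataFrame(
    {
        "Order ID": ["141234", np.nan, "Order ID", "141234", "141235"],
        "Product": ["iPhone", np.nan, "Product", "iPhone", "Wired Headphones"],
    }
)
cleaned = clean_month(sample)
```

Because the function does not modify its input, the raw frame stays available for comparison.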

### February product sales

In [71]:
# Checking the head of the data

feb_data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
0,150502,iPhone,1,700.0,02/18/19 01:35,"866 Spruce St, Portland, ME 04101"
1,150503,AA Batteries (4-pack),1,3.84,02/13/19 07:24,"18 13th St, San Francisco, CA 94016"
2,150504,27in 4K Gaming Monitor,1,389.99,02/18/19 09:46,"52 6th St, New York City, NY 10001"
3,150505,Lightning Charging Cable,1,14.95,02/02/19 16:47,"129 Cherry St, Atlanta, GA 30301"
4,150506,AA Batteries (4-pack),2,3.84,02/28/19 20:32,"548 Lincoln St, Seattle, WA 98101"


In [72]:
# Checking the shape of the data

feb_data.shape

(12036, 6)

There are 12036 rows and 6 columns in the data.

In [73]:
# Checking the basic info of the data

feb_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12036 entries, 0 to 12035
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Order ID          12004 non-null  object
 1   Product           12004 non-null  object
 2   Quantity Ordered  12004 non-null  object
 3   Price Each        12004 non-null  object
 4   Order Date        12004 non-null  object
 5   Purchase Address  12004 non-null  object
dtypes: object(6)
memory usage: 564.3+ KB


In [74]:
# Checking for the null values in the data

feb_data.isna().sum()

Order ID            32
Product             32
Quantity Ordered    32
Price Each          32
Order Date          32
Purchase Address    32
dtype: int64

Each column has 32 missing values, which suggests 32 rows that are empty in every column.

In [75]:
# Drop NaN values from the data

feb_data.dropna(how="all", axis=0, inplace=True)

In [77]:
# Checking for duplicates in the data

feb_data.duplicated().any()

True

In [78]:
# Checking the number of duplicates

feb_data.duplicated().sum()

35

`duplicated()` flags 35 rows. As the output below shows, some of these are repeated header rows rather than real orders.

In [79]:
# Investigating the occurrence of duplicates

feb_data[feb_data.duplicated()]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
432,150917,Lightning Charging Cable,1,14.95,02/06/19 16:07,"111 10th St, Austin, TX 73301"
442,150925,iPhone,1,700,02/07/19 17:43,"784 Elm St, Boston, MA 02215"
461,150943,USB-C Charging Cable,1,11.95,02/06/19 19:13,"759 1st St, Austin, TX 73301"
548,151024,Wired Headphones,1,11.99,02/19/19 08:39,"35 Pine St, Portland, OR 97035"
1164,151616,USB-C Charging Cable,1,11.95,02/25/19 19:29,"666 Meadow St, Boston, MA 02215"
1224,151673,Wired Headphones,1,11.99,02/10/19 21:52,"504 Center St, Dallas, TX 75001"
1417,151856,USB-C Charging Cable,1,11.95,02/06/19 12:11,"475 Jackson St, San Francisco, CA 94016"
1904,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1918,152330,Bose SoundSport Headphones,1,99.99,02/25/19 18:53,"827 Dogwood St, Los Angeles, CA 90001"
2050,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address


Some rows contain the literal string "Order ID" in the "Order ID" column: they are copies of the header row repeated inside the file. Drop these rows from the dataset.

In [81]:
# Drop rows that have Order ID as values in the "Order ID" column

feb_data = feb_data[feb_data["Order ID"] != "Order ID"].copy()

In [82]:
feb_data[feb_data.duplicated()]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
432,150917,Lightning Charging Cable,1,14.95,02/06/19 16:07,"111 10th St, Austin, TX 73301"
442,150925,iPhone,1,700.0,02/07/19 17:43,"784 Elm St, Boston, MA 02215"
461,150943,USB-C Charging Cable,1,11.95,02/06/19 19:13,"759 1st St, Austin, TX 73301"
548,151024,Wired Headphones,1,11.99,02/19/19 08:39,"35 Pine St, Portland, OR 97035"
1164,151616,USB-C Charging Cable,1,11.95,02/25/19 19:29,"666 Meadow St, Boston, MA 02215"
1224,151673,Wired Headphones,1,11.99,02/10/19 21:52,"504 Center St, Dallas, TX 75001"
1417,151856,USB-C Charging Cable,1,11.95,02/06/19 12:11,"475 Jackson St, San Francisco, CA 94016"
1918,152330,Bose SoundSport Headphones,1,99.99,02/25/19 18:53,"827 Dogwood St, Los Angeles, CA 90001"
2937,153304,Wired Headphones,1,11.99,02/12/19 20:08,"74 Meadow St, Austin, TX 73301"
2949,153315,Wired Headphones,1,11.99,02/13/19 14:47,"953 Jefferson St, Atlanta, GA 30301"


#### Investigating the issue of duplicated entries

In [4]:
# Order ID --- 150917

feb_data[feb_data["Order ID"] == "150917"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
431,150917,Lightning Charging Cable,1,14.95,02/06/19 16:07,"111 10th St, Austin, TX 73301"
432,150917,Lightning Charging Cable,1,14.95,02/06/19 16:07,"111 10th St, Austin, TX 73301"


In [5]:
# Order ID --- 150925

feb_data[feb_data["Order ID"] == "150925"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
440,150925,iPhone,1,700.0,02/07/19 17:43,"784 Elm St, Boston, MA 02215"
441,150925,Lightning Charging Cable,1,14.95,02/07/19 17:43,"784 Elm St, Boston, MA 02215"
442,150925,iPhone,1,700.0,02/07/19 17:43,"784 Elm St, Boston, MA 02215"


In [6]:
# Order ID --- 150943

feb_data[feb_data["Order ID"] == "150943"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
460,150943,USB-C Charging Cable,1,11.95,02/06/19 19:13,"759 1st St, Austin, TX 73301"
461,150943,USB-C Charging Cable,1,11.95,02/06/19 19:13,"759 1st St, Austin, TX 73301"


In [7]:
# Order ID --- 151024

feb_data[feb_data["Order ID"] == "151024"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
547,151024,Wired Headphones,1,11.99,02/19/19 08:39,"35 Pine St, Portland, OR 97035"
548,151024,Wired Headphones,1,11.99,02/19/19 08:39,"35 Pine St, Portland, OR 97035"


In [8]:
# Order ID --- 151616

feb_data[feb_data["Order ID"] == "151616"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1163,151616,USB-C Charging Cable,1,11.95,02/25/19 19:29,"666 Meadow St, Boston, MA 02215"
1164,151616,USB-C Charging Cable,1,11.95,02/25/19 19:29,"666 Meadow St, Boston, MA 02215"


In [9]:
# Order ID --- 151673

feb_data[feb_data["Order ID"] == "151673"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1223,151673,Wired Headphones,1,11.99,02/10/19 21:52,"504 Center St, Dallas, TX 75001"
1224,151673,Wired Headphones,1,11.99,02/10/19 21:52,"504 Center St, Dallas, TX 75001"


In [10]:
# Order ID --- 151856

feb_data[feb_data["Order ID"] == "151856"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1416,151856,USB-C Charging Cable,1,11.95,02/06/19 12:11,"475 Jackson St, San Francisco, CA 94016"
1417,151856,USB-C Charging Cable,1,11.95,02/06/19 12:11,"475 Jackson St, San Francisco, CA 94016"


In [11]:
# Order ID --- 152330

feb_data[feb_data["Order ID"] == "152330"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1917,152330,Bose SoundSport Headphones,1,99.99,02/25/19 18:53,"827 Dogwood St, Los Angeles, CA 90001"
1918,152330,Bose SoundSport Headphones,1,99.99,02/25/19 18:53,"827 Dogwood St, Los Angeles, CA 90001"


In [12]:
# Order ID --- 153304

feb_data[feb_data["Order ID"] == "153304"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2936,153304,Wired Headphones,1,11.99,02/12/19 20:08,"74 Meadow St, Austin, TX 73301"
2937,153304,Wired Headphones,1,11.99,02/12/19 20:08,"74 Meadow St, Austin, TX 73301"


In [13]:
# Order ID --- 153315

feb_data[feb_data["Order ID"] == "153315"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2948,153315,Wired Headphones,1,11.99,02/13/19 14:47,"953 Jefferson St, Atlanta, GA 30301"
2949,153315,Wired Headphones,1,11.99,02/13/19 14:47,"953 Jefferson St, Atlanta, GA 30301"


In [14]:
# Order ID --- 154747

feb_data[feb_data["Order ID"] == "154747"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
4456,154747,27in 4K Gaming Monitor,1,389.99,02/01/19 22:46,"367 Cedar St, Austin, TX 73301"
4457,154747,27in 4K Gaming Monitor,1,389.99,02/01/19 22:46,"367 Cedar St, Austin, TX 73301"


In [15]:
# Order ID --- 155697

feb_data[feb_data["Order ID"] == "155697"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
5449,155697,AA Batteries (4-pack),1,3.84,02/13/19 15:17,"961 Spruce St, Boston, MA 02215"
5450,155697,AA Batteries (4-pack),1,3.84,02/13/19 15:17,"961 Spruce St, Boston, MA 02215"


In [16]:
# Order ID --- 156109

feb_data[feb_data["Order ID"] == "156109"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
5879,156109,Bose SoundSport Headphones,1,99.99,02/18/19 09:18,"450 Jackson St, Boston, MA 02215"
5880,156109,Bose SoundSport Headphones,1,99.99,02/18/19 09:18,"450 Jackson St, Boston, MA 02215"


In [17]:
# Order ID --- 156247

feb_data[feb_data["Order ID"] == "156247"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
6024,156247,AAA Batteries (4-pack),1,2.99,02/09/19 07:29,"511 Dogwood St, Los Angeles, CA 90001"
6025,156247,AAA Batteries (4-pack),1,2.99,02/09/19 07:29,"511 Dogwood St, Los Angeles, CA 90001"


In [18]:
# Order ID --- 158236

feb_data[feb_data["Order ID"] == "158236"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8110,158236,AA Batteries (4-pack),1,3.84,02/19/19 09:49,"319 West St, San Francisco, CA 94016"
8111,158236,AA Batteries (4-pack),1,3.84,02/19/19 09:49,"319 West St, San Francisco, CA 94016"


In [19]:
# Order ID --- 158841

feb_data[feb_data["Order ID"] == "158841"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8733,158841,34in Ultrawide Monitor,1,379.99,02/01/19 23:16,"786 Willow St, Boston, MA 02215"
8734,158841,34in Ultrawide Monitor,1,379.99,02/01/19 23:16,"786 Willow St, Boston, MA 02215"


In [20]:
# Order ID --- 161567

feb_data[feb_data["Order ID"] == "161567"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
11573,161567,Apple Airpods Headphones,1,150,02/10/19 11:42,"413 Walnut St, San Francisco, CA 94016"
11574,161567,Apple Airpods Headphones,1,150,02/10/19 11:42,"413 Walnut St, San Francisco, CA 94016"


The checks above show that each duplicated entry is an exact copy of another row. Only the first instance of each is kept; the copies are dropped.

In [112]:
# Drop duplicated entries

feb_data.drop_duplicates(subset=None, keep="first", inplace=True)
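
With `subset=None`, `drop_duplicates` compares every column, so an order that contains several different products keeps all of them. The sketch below reproduces this on rows modeled on order 150925 from the output above; only three of the six columns are included to keep it short.

```python
import pandas as pd

# Rows modeled on order 150925: the iPhone line appears twice.
order = pd.DataFrame(
    {
        "Order ID": ["150925", "150925", "150925"],
        "Product": ["iPhone", "Lightning Charging Cable", "iPhone"],
        "Quantity Ordered": ["1", "1", "1"],
    }
)

# Compares all columns: removes only the repeated iPhone line.
full_row = order.drop_duplicates(keep="first")

# Compares Order ID only: would also discard the Lightning Charging Cable line.
by_id = order.drop_duplicates(subset=["Order ID"], keep="first")
```

Deduplicating on `Order ID` alone would therefore lose legitimate items from multi-product orders.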

### March Product Sales

In [23]:
# Checking the head of the March Product Sales data

mar_data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
0,162009,iPhone,1,700.0,03/28/19 20:59,"942 Church St, Austin, TX 73301"
1,162009,Lightning Charging Cable,1,14.95,03/28/19 20:59,"942 Church St, Austin, TX 73301"
2,162009,Wired Headphones,2,11.99,03/28/19 20:59,"942 Church St, Austin, TX 73301"
3,162010,Bose SoundSport Headphones,1,99.99,03/17/19 05:39,"261 10th St, San Francisco, CA 94016"
4,162011,34in Ultrawide Monitor,1,379.99,03/10/19 00:01,"764 13th St, San Francisco, CA 94016"


In [25]:
# Checking the shape of the March sales data

mar_data.shape

(15226, 6)

The March data has 15226 rows and 6 columns.

In [26]:
# Checking the basic info of the March sales data

mar_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15226 entries, 0 to 15225
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Order ID          15189 non-null  object
 1   Product           15189 non-null  object
 2   Quantity Ordered  15189 non-null  object
 3   Price Each        15189 non-null  object
 4   Order Date        15189 non-null  object
 5   Purchase Address  15189 non-null  object
dtypes: object(6)
memory usage: 713.8+ KB


The summary shows that the data contains null values and that every column, including Order Date, is stored with the `object` dtype.
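
Columns stored as `object` need to be converted before any arithmetic or date grouping. The sketch below shows one way to do that, assuming the dates follow the month/day/two-digit-year pattern seen in the printed rows. The sample frame copies a few values from the output above; the conversion step itself is not part of this section of the notebook.

```python
import pandas as pd

# Values copied from the March head() output, stored as strings.
sample = pd.DataFrame(
    {
        "Quantity Ordered": ["1", "2"],
        "Price Each": ["700.0", "11.99"],
        "Order Date": ["03/28/19 20:59", "03/17/19 05:39"],
    }
)

typed = sample.copy()
typed["Quantity Ordered"] = pd.to_numeric(typed["Quantity Ordered"])
typed["Price Each"] = pd.to_numeric(typed["Price Each"])
# Assumed format: month/day/two-digit year, then 24-hour time.
typed["Order Date"] = pd.to_datetime(typed["Order Date"], format="%m/%d/%y %H:%M")
```

The conversion should run after the repeated header rows are removed, because strings such as "Quantity Ordered" cannot be parsed as numbers.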

In [27]:
# Checking the sum of null values in the data

mar_data.isnull().sum()

Order ID            37
Product             37
Quantity Ordered    37
Price Each          37
Order Date          37
Purchase Address    37
dtype: int64

Each column has 37 null values. They are dropped because they make up a negligible share of the 15226 rows.

In [51]:
# Drop null values from the data

mar_data.dropna(how="all", axis=0, inplace=True)

In [30]:
# Check for duplicates in the dataset

mar_data.duplicated().any()

True

In [31]:
# Check the sum of duplicated entries

mar_data.duplicated().sum()

59

`duplicated()` flags 59 rows. As the output below shows, some of these are repeated header rows rather than real orders.

In [52]:
# Check the instances of the duplicates

mar_data[mar_data.duplicated()]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
341,162332,Flatscreen TV,1,300,03/20/19 14:23,"925 10th St, Atlanta, GA 30301"
864,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
930,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1066,163018,AAA Batteries (4-pack),1,2.99,03/17/19 14:10,"694 Cedar St, Seattle, WA 98101"
1979,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2032,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2107,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2142,164046,Bose SoundSport Headphones,1,99.99,03/17/19 20:44,"837 Dogwood St, San Francisco, CA 94016"
2485,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2728,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address


In [53]:
# Remove "Order ID" as values in the Order ID column

mar_data = mar_data[mar_data["Order ID"] != "Order ID"].copy()

In [54]:
# Re-checking the instances of duplicates

mar_data[mar_data.duplicated()]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
341,162332,Flatscreen TV,1,300.0,03/20/19 14:23,"925 10th St, Atlanta, GA 30301"
1066,163018,AAA Batteries (4-pack),1,2.99,03/17/19 14:10,"694 Cedar St, Seattle, WA 98101"
2142,164046,Bose SoundSport Headphones,1,99.99,03/17/19 20:44,"837 Dogwood St, San Francisco, CA 94016"
2966,164825,Lightning Charging Cable,1,14.95,03/23/19 18:51,"34 Pine St, San Francisco, CA 94016"
3338,165180,Lightning Charging Cable,1,14.95,03/24/19 12:57,"597 5th St, Seattle, WA 98101"
3651,165481,Apple Airpods Headphones,1,150.0,03/19/19 18:55,"422 4th St, Los Angeles, CA 90001"
3848,165668,34in Ultrawide Monitor,1,379.99,03/27/19 11:28,"386 Jackson St, San Francisco, CA 94016"
4130,165934,USB-C Charging Cable,1,11.95,03/24/19 08:25,"521 Forest St, Seattle, WA 98101"
5215,166981,AAA Batteries (4-pack),1,2.99,03/31/19 01:40,"557 Wilson St, Dallas, TX 75001"
5685,167429,Lightning Charging Cable,1,14.95,03/27/19 05:05,"430 Lake St, San Francisco, CA 94016"


#### Investigate the issue of duplicated values

In [56]:
# Order ID --- 162332

mar_data[mar_data["Order ID"] == "162332"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
340,162332,Flatscreen TV,1,300,03/20/19 14:23,"925 10th St, Atlanta, GA 30301"
341,162332,Flatscreen TV,1,300,03/20/19 14:23,"925 10th St, Atlanta, GA 30301"


In [57]:
# Order ID --- 163018

mar_data[mar_data["Order ID"] == "163018"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1065,163018,AAA Batteries (4-pack),1,2.99,03/17/19 14:10,"694 Cedar St, Seattle, WA 98101"
1066,163018,AAA Batteries (4-pack),1,2.99,03/17/19 14:10,"694 Cedar St, Seattle, WA 98101"


In [58]:
# Order ID --- 164046

mar_data[mar_data["Order ID"] == "164046"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2141,164046,Bose SoundSport Headphones,1,99.99,03/17/19 20:44,"837 Dogwood St, San Francisco, CA 94016"
2142,164046,Bose SoundSport Headphones,1,99.99,03/17/19 20:44,"837 Dogwood St, San Francisco, CA 94016"


In [61]:
# Order ID --- 164825

mar_data[mar_data["Order ID"] == "164825"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2965,164825,Lightning Charging Cable,1,14.95,03/23/19 18:51,"34 Pine St, San Francisco, CA 94016"
2966,164825,Lightning Charging Cable,1,14.95,03/23/19 18:51,"34 Pine St, San Francisco, CA 94016"


In [62]:
# Order ID --- 165180

mar_data[mar_data["Order ID"] == "165180"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3337,165180,Lightning Charging Cable,1,14.95,03/24/19 12:57,"597 5th St, Seattle, WA 98101"
3338,165180,Lightning Charging Cable,1,14.95,03/24/19 12:57,"597 5th St, Seattle, WA 98101"


In [63]:
# Order ID --- 165481

mar_data[mar_data["Order ID"] == "165481"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3650,165481,Apple Airpods Headphones,1,150,03/19/19 18:55,"422 4th St, Los Angeles, CA 90001"
3651,165481,Apple Airpods Headphones,1,150,03/19/19 18:55,"422 4th St, Los Angeles, CA 90001"


In [64]:
# Order ID --- 165668

mar_data[mar_data["Order ID"] == "165668"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3847,165668,34in Ultrawide Monitor,1,379.99,03/27/19 11:28,"386 Jackson St, San Francisco, CA 94016"
3848,165668,34in Ultrawide Monitor,1,379.99,03/27/19 11:28,"386 Jackson St, San Francisco, CA 94016"


In [65]:
# Order ID --- 165934

mar_data[mar_data["Order ID"] == "165934"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
4129,165934,USB-C Charging Cable,1,11.95,03/24/19 08:25,"521 Forest St, Seattle, WA 98101"
4130,165934,USB-C Charging Cable,1,11.95,03/24/19 08:25,"521 Forest St, Seattle, WA 98101"


In [66]:
# Order ID --- 166981

mar_data[mar_data["Order ID"] == "166981"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
5214,166981,AAA Batteries (4-pack),1,2.99,03/31/19 01:40,"557 Wilson St, Dallas, TX 75001"
5215,166981,AAA Batteries (4-pack),1,2.99,03/31/19 01:40,"557 Wilson St, Dallas, TX 75001"


In [67]:
# Order ID --- 167429

mar_data[mar_data["Order ID"] == "167429"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
5684,167429,Lightning Charging Cable,1,14.95,03/27/19 05:05,"430 Lake St, San Francisco, CA 94016"
5685,167429,Lightning Charging Cable,1,14.95,03/27/19 05:05,"430 Lake St, San Francisco, CA 94016"


In [68]:
# Order ID --- 167654

mar_data[mar_data["Order ID"] == "167654"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
5922,167654,27in FHD Monitor,1,149.99,03/29/19 15:10,"654 5th St, Portland, OR 97035"
5923,167654,27in FHD Monitor,1,149.99,03/29/19 15:10,"654 5th St, Portland, OR 97035"


In [69]:
# Order ID --- 168724

mar_data[mar_data["Order ID"] == "168724"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
7032,168724,Apple Airpods Headphones,1,150,03/13/19 11:25,"552 Park St, Los Angeles, CA 90001"
7033,168724,Apple Airpods Headphones,1,150,03/13/19 11:25,"552 Park St, Los Angeles, CA 90001"


In [70]:
# Order ID --- 168777

mar_data[mar_data["Order ID"] == "168777"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
7086,168777,iPhone,1,700.0,03/07/19 14:55,"247 Pine St, San Francisco, CA 94016"
7087,168777,Lightning Charging Cable,1,14.95,03/07/19 14:55,"247 Pine St, San Francisco, CA 94016"
7088,168777,Lightning Charging Cable,1,14.95,03/07/19 14:55,"247 Pine St, San Francisco, CA 94016"


In [71]:
# Order ID --- 168888

mar_data[mar_data["Order ID"] == "168888"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
7202,168888,AA Batteries (4-pack),1,3.84,03/18/19 14:26,"815 Hill St, Los Angeles, CA 90001"
7203,168888,AA Batteries (4-pack),1,3.84,03/18/19 14:26,"815 Hill St, Los Angeles, CA 90001"


In [72]:
# Order ID --- 169600

mar_data[mar_data["Order ID"] == "169600"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
7949,169600,Wired Headphones,1,11.99,03/10/19 11:12,"839 Cedar St, New York City, NY 10001"
7950,169600,Wired Headphones,1,11.99,03/10/19 11:12,"839 Cedar St, New York City, NY 10001"


In [73]:
# Order ID --- 170109

mar_data[mar_data["Order ID"] == "170109"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8486,170109,Apple Airpods Headphones,1,150,03/16/19 13:35,"462 Meadow St, Seattle, WA 98101"
8487,170109,Apple Airpods Headphones,1,150,03/16/19 13:35,"462 Meadow St, Seattle, WA 98101"


In [74]:
# Order ID --- 171322

mar_data[mar_data["Order ID"] == "171322"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
9748,171322,20in Monitor,1,109.99,03/15/19 13:45,"357 Meadow St, Portland, ME 04101"
9749,171322,20in Monitor,1,109.99,03/15/19 13:45,"357 Meadow St, Portland, ME 04101"


In [75]:
# Order ID --- 172155

mar_data[mar_data["Order ID"] == "172155"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
10612,172155,USB-C Charging Cable,1,11.95,03/11/19 23:08,"712 1st St, New York City, NY 10001"
10613,172155,USB-C Charging Cable,1,11.95,03/11/19 23:08,"712 1st St, New York City, NY 10001"


In [77]:
# Order ID --- 173388

mar_data[mar_data["Order ID"] == "173388"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
11916,173388,Bose SoundSport Headphones,1,99.99,03/21/19 15:31,"96 Cherry St, San Francisco, CA 94016"
11917,173388,Bose SoundSport Headphones,1,99.99,03/21/19 15:31,"96 Cherry St, San Francisco, CA 94016"


In [78]:
# Order ID --- 174691

mar_data[mar_data["Order ID"] == "174691"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
13282,174691,Apple Airpods Headphones,1,150,03/17/19 13:22,"672 Highland St, Seattle, WA 98101"
13283,174691,Apple Airpods Headphones,1,150,03/17/19 13:22,"672 Highland St, Seattle, WA 98101"


In [79]:
# Order ID --- 174972

mar_data[mar_data["Order ID"] == "174972"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
13575,174972,USB-C Charging Cable,1,11.95,03/26/19 23:02,"389 10th St, New York City, NY 10001"
13576,174972,USB-C Charging Cable,1,11.95,03/26/19 23:02,"389 10th St, New York City, NY 10001"


In [80]:
# Order ID --- 176537

mar_data[mar_data["Order ID"] == "176537"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
15203,176537,Apple Airpods Headphones,1,150,03/12/19 07:33,"80 Church St, Austin, TX 73301"
15204,176537,Apple Airpods Headphones,1,150,03/12/19 07:33,"80 Church St, Austin, TX 73301"


The duplicate investigation shows that each duplicated entry is an exact copy of another row. Only the first instance of each duplicated entry will be kept; the repeats will be dropped.

In [82]:
# Drop duplicated entries

mar_data.drop_duplicates(subset=None, keep="first", inplace=True)
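
The order-by-order checks above can also be done in one pass. The sketch below is only an illustration on a small made-up frame (`demo` is hypothetical, not the March data): passing `keep=False` to `duplicated()` flags every copy of a repeated row, so both members of each pair can be inspected together.

```python
import pandas as pd

# Hypothetical miniature of a monthly sales frame, for illustration only.
demo = pd.DataFrame({
    "Order ID": ["169600", "169600", "170109", "170110"],
    "Product": ["Wired Headphones", "Wired Headphones",
                "Apple Airpods Headphones", "20in Monitor"],
    "Quantity Ordered": ["1", "1", "1", "1"],
})

# keep=False marks every copy of a duplicated row, not only the repeats.
all_copies = demo[demo.duplicated(keep=False)]

# Count how many times each fully identical row occurs.
copies_per_row = all_copies.groupby(list(demo.columns)).size()

print(all_copies)
print(copies_per_row)
```

If every group has a size of 2 or more and the rows match in all columns, dropping the repeats with `keep="first"` loses no information.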

### April Product Sales

In [83]:
# Checking the head of the April data

apr_data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
0,176558.0,USB-C Charging Cable,2.0,11.95,04/19/19 08:46,"917 1st St, Dallas, TX 75001"
1,,,,,,
2,176559.0,Bose SoundSport Headphones,1.0,99.99,04/07/19 22:30,"682 Chestnut St, Boston, MA 02215"
3,176560.0,Google Phone,1.0,600.0,04/12/19 14:38,"669 Spruce St, Los Angeles, CA 90001"
4,176560.0,Wired Headphones,1.0,11.99,04/12/19 14:38,"669 Spruce St, Los Angeles, CA 90001"


In [84]:
# Checking the shape of the data

apr_data.shape

(18383, 6)

The data has 18383 rows and 6 columns

In [85]:
# Checking the basic info of the data

apr_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18383 entries, 0 to 18382
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Order ID          18324 non-null  object
 1   Product           18324 non-null  object
 2   Quantity Ordered  18324 non-null  object
 3   Price Each        18324 non-null  object
 4   Order Date        18324 non-null  object
 5   Purchase Address  18324 non-null  object
dtypes: object(6)
memory usage: 861.8+ KB


There are null values in the data. The date is also in object format.
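
Since every column is read in as `object`, the values need converting before any arithmetic or date-based grouping. The following is a minimal sketch on two rows copied from the April head shown above; the frame name `sample` and the choice of `errors="coerce"` are assumptions made for illustration, not a step taken from this notebook.

```python
import pandas as pd

# Two rows copied from the April head; every value starts out as a string.
sample = pd.DataFrame({
    "Quantity Ordered": ["2", "1"],
    "Price Each": ["11.95", "99.99"],
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30"],
})

converted = sample.copy()

# errors="coerce" turns unparseable values (such as stray header rows) into NaN/NaT.
converted["Quantity Ordered"] = pd.to_numeric(converted["Quantity Ordered"], errors="coerce")
converted["Price Each"] = pd.to_numeric(converted["Price Each"], errors="coerce")

# The dates follow month/day/two-digit-year hour:minute.
converted["Order Date"] = pd.to_datetime(
    converted["Order Date"], format="%m/%d/%y %H:%M", errors="coerce"
)

print(converted.dtypes)
```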

In [86]:
# Checking the sum of the null values in the data

apr_data.isnull().sum()

Order ID            59
Product             59
Quantity Ordered    59
Price Each          59
Order Date          59
Purchase Address    59
dtype: int64

There are 59 null values in each column of the data.

In [88]:
# Drop rows in which every value is null

apr_data.dropna(how="all", axis=0, inplace=True)
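
Because `how="all"` removes only rows that are empty in every column, it is worth confirming that the nulls really are whole empty rows. The sketch below shows one way to check this on a small made-up frame (`demo` is hypothetical): if the count of fully empty rows equals the count of rows with any null, no partially filled rows exist.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one completely empty row, like row 1 of the April head.
demo = pd.DataFrame({
    "Order ID": ["176558", np.nan, "176559"],
    "Product": ["USB-C Charging Cable", np.nan, "Bose SoundSport Headphones"],
    "Quantity Ordered": ["2", np.nan, "1"],
})

fully_empty = int(demo.isnull().all(axis=1).sum())   # rows null in every column
partly_empty = int(demo.isnull().any(axis=1).sum())  # rows null in at least one column

cleaned = demo.dropna(how="all")
remaining_nulls = int(cleaned.isnull().sum().sum())

print(fully_empty, partly_empty, remaining_nulls)
```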

In [90]:
# Check for duplicates in the data

apr_data.duplicated().any()

True

In [91]:
# Check the sum of duplicated values in the data

apr_data.duplicated().sum()

56

There are 56 duplicated rows in the data.

In [95]:
# Display the duplicated rows

apr_data[apr_data.duplicated()]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
31,176585,Bose SoundSport Headphones,1,99.99,04/07/19 11:31,"823 Highland St, Boston, MA 02215"
1149,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1155,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1302,177795,Apple Airpods Headphones,1,150,04/27/19 19:45,"740 14th St, Seattle, WA 98101"
1684,178158,USB-C Charging Cable,1,11.95,04/28/19 21:13,"197 Center St, San Francisco, CA 94016"
2878,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
2893,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3036,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3209,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3618,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address


In [99]:
# Remove repeated header rows (rows whose Order ID value is the string "Order ID")

apr_data = apr_data[apr_data["Order ID"] != "Order ID"]
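
The explicit filter matters because `duplicated()` never flags the first occurrence of a row, so one stray header row would survive a plain `drop_duplicates()`. The sketch below illustrates this on a made-up frame (`demo` and the variable names are hypothetical) by separating header rows from genuine duplicate orders.

```python
import pandas as pd

# Hypothetical frame: one duplicated order, three stray header rows, one unique order.
demo = pd.DataFrame({
    "Order ID": ["176585", "176585", "Order ID", "Order ID", "Order ID", "177795"],
    "Product": ["Bose SoundSport Headphones", "Bose SoundSport Headphones",
                "Product", "Product", "Product", "Apple Airpods Headphones"],
})

is_header = demo["Order ID"] == "Order ID"
is_dup = demo.duplicated()  # default keep="first": the first copy is not flagged

header_rows = int(is_header.sum())
flagged_headers = int((is_header & is_dup).sum())
true_duplicates = int((~is_header & is_dup).sum())

# Filtering on the header text removes all header rows, including the unflagged one.
filtered = demo[~is_header].drop_duplicates()

print(header_rows, flagged_headers, true_duplicates, len(filtered))
```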

In [100]:
# Investigate the remaining duplicated rows

apr_data[apr_data.duplicated()]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
31,176585,Bose SoundSport Headphones,1,99.99,04/07/19 11:31,"823 Highland St, Boston, MA 02215"
1302,177795,Apple Airpods Headphones,1,150.0,04/27/19 19:45,"740 14th St, Seattle, WA 98101"
1684,178158,USB-C Charging Cable,1,11.95,04/28/19 21:13,"197 Center St, San Francisco, CA 94016"
3805,180207,Apple Airpods Headphones,1,150.0,04/13/19 01:46,"196 7th St, Los Angeles, CA 90001"
4196,180576,Lightning Charging Cable,1,14.95,04/18/19 17:23,"431 Park St, Dallas, TX 75001"
4376,180746,AAA Batteries (4-pack),1,2.99,04/30/19 12:05,"398 West St, New York City, NY 10001"
4450,180817,Lightning Charging Cable,1,14.95,04/13/19 18:33,"563 13th St, Los Angeles, CA 90001"
4734,181084,Flatscreen TV,1,300.0,04/07/19 21:36,"1 Walnut St, Boston, MA 02215"
4906,181246,Wired Headphones,1,11.99,04/09/19 13:58,"342 Hickory St, San Francisco, CA 94016"
5773,182077,AAA Batteries (4-pack),1,2.99,04/13/19 22:08,"730 4th St, New York City, NY 10001"


In [102]:
# Order ID --- 176585

apr_data[apr_data["Order ID"] == "176585"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
30,176585,Bose SoundSport Headphones,1,99.99,04/07/19 11:31,"823 Highland St, Boston, MA 02215"
31,176585,Bose SoundSport Headphones,1,99.99,04/07/19 11:31,"823 Highland St, Boston, MA 02215"


In [103]:
# Order ID --- 177795

apr_data[apr_data["Order ID"] == "177795"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1301,177795,Apple Airpods Headphones,1,150,04/27/19 19:45,"740 14th St, Seattle, WA 98101"
1302,177795,Apple Airpods Headphones,1,150,04/27/19 19:45,"740 14th St, Seattle, WA 98101"


In [104]:
# Order ID --- 178158

apr_data[apr_data["Order ID"] == "178158"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1681,178158,Google Phone,1,600.0,04/28/19 21:13,"197 Center St, San Francisco, CA 94016"
1682,178158,USB-C Charging Cable,1,11.95,04/28/19 21:13,"197 Center St, San Francisco, CA 94016"
1683,178158,Wired Headphones,1,11.99,04/28/19 21:13,"197 Center St, San Francisco, CA 94016"
1684,178158,USB-C Charging Cable,1,11.95,04/28/19 21:13,"197 Center St, San Francisco, CA 94016"


In [105]:
# Order ID --- 180207

apr_data[apr_data["Order ID"] == "180207"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
3804,180207,Apple Airpods Headphones,1,150,04/13/19 01:46,"196 7th St, Los Angeles, CA 90001"
3805,180207,Apple Airpods Headphones,1,150,04/13/19 01:46,"196 7th St, Los Angeles, CA 90001"


In [106]:
# Order ID --- 182077

apr_data[apr_data["Order ID"] == "182077"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
5772,182077,AAA Batteries (4-pack),1,2.99,04/13/19 22:08,"730 4th St, New York City, NY 10001"
5773,182077,AAA Batteries (4-pack),1,2.99,04/13/19 22:08,"730 4th St, New York City, NY 10001"


In [107]:
# Order ID --- 184717

apr_data[apr_data["Order ID"] == "184717"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
8539,184717,USB-C Charging Cable,1,11.95,04/04/19 10:17,"439 Forest St, Atlanta, GA 30301"
8540,184717,USB-C Charging Cable,1,11.95,04/04/19 10:17,"439 Forest St, Atlanta, GA 30301"


In [108]:
# Order ID --- 190553

apr_data[apr_data["Order ID"] == "190553"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
14676,190553,Lightning Charging Cable,1,14.95,04/10/19 17:38,"548 Madison St, New York City, NY 10001"
14677,190553,Lightning Charging Cable,1,14.95,04/10/19 17:38,"548 Madison St, New York City, NY 10001"


In [109]:
# Order ID --- 192939

apr_data[apr_data["Order ID"] == "192939"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
17163,192939,34in Ultrawide Monitor,1,379.99,04/29/19 21:07,"519 Adams St, Seattle, WA 98101"
17164,192939,34in Ultrawide Monitor,1,379.99,04/29/19 21:07,"519 Adams St, Seattle, WA 98101"


In [110]:
# Order ID --- 193916

apr_data[apr_data["Order ID"] == "193916"]

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
18193,193916,20in Monitor,1,109.99,04/18/19 12:59,"653 Cherry St, Dallas, TX 75001"
18194,193916,20in Monitor,1,109.99,04/18/19 12:59,"653 Cherry St, Dallas, TX 75001"


The duplicate investigation shows that each duplicated entry is an exact copy of another row. Only the first instance of each duplicated entry will be kept; the repeats will be dropped.

In [118]:
# Drop duplicated entries from data

apr_data.drop_duplicates(subset=None, keep="first", inplace=True)
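
The same three cleaning steps are applied to each month, so they could be collected into one helper. This is a sketch under assumptions, not the notebook's own code: `clean_month` is a hypothetical name, and it returns a new frame instead of modifying the input in place.

```python
import numpy as np
import pandas as pd

def clean_month(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the monthly cleaning steps and return a new frame (hypothetical helper)."""
    cleaned = df.dropna(how="all")                        # drop fully empty rows
    cleaned = cleaned[cleaned["Order ID"] != "Order ID"]  # drop repeated header rows
    cleaned = cleaned.drop_duplicates(keep="first")       # keep one copy of each row
    return cleaned

# Hypothetical input containing each kind of problem row.
raw = pd.DataFrame({
    "Order ID": ["176558", np.nan, "Order ID", "176585", "176585", "176559"],
    "Product": ["USB-C Charging Cable", np.nan, "Product",
                "Bose SoundSport Headphones", "Bose SoundSport Headphones",
                "Bose SoundSport Headphones"],
})

result = clean_month(raw)
print(result)
```

Returning a copy leaves the raw monthly frame untouched, which makes it easy to compare row counts before and after cleaning.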

### May Product Sales

In [119]:
# Checking a sample of the May product sales

may_data.sample(5, random_state=0)

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
10846,204397,USB-C Charging Cable,1,11.95,05/13/19 18:41,"997 8th St, Dallas, TX 75001"
11863,205375,Lightning Charging Cable,1,14.95,05/15/19 13:10,"72 10th St, Dallas, TX 75001"
8503,202176,Lightning Charging Cable,1,14.95,05/17/19 22:04,"244 Park St, New York City, NY 10001"
9021,202664,AA Batteries (4-pack),1,3.84,05/24/19 13:29,"27 11th St, Atlanta, GA 30301"
11810,205323,Apple Airpods Headphones,1,150.0,05/07/19 14:17,"250 Maple St, Boston, MA 02215"


In [120]:
# Checking the shape of the data

may_data.shape

(16635, 6)

The May Product data has 16635 rows and 6 columns

In [121]:
# Checking the basic info of the data

may_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16635 entries, 0 to 16634
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Order ID          16587 non-null  object
 1   Product           16587 non-null  object
 2   Quantity Ordered  16587 non-null  object
 3   Price Each        16587 non-null  object
 4   Order Date        16587 non-null  object
 5   Purchase Address  16587 non-null  object
dtypes: object(6)
memory usage: 779.9+ KB
