<a href="https://colab.research.google.com/github/rahul-tc/Data-Analysis-Project/blob/main/Rahul_Yellow_Mart_Sales_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Yellow Mart Sales Data

This dataset contains information about sales orders, including the order number, region, state, city, salesperson names, order dates, product categories, subcategories, costs, sales amounts, and profits. The following data dictionary describes each column:

| Column Name | Description                                                           | Example         |
|-------------|-----------------------------------------------------------------------|-----------------|
| OrderID     | Unique identifier for each sales order.                               | OD00125         |
| Region      | Geographic region where the order was placed.                         | West            |
| State       | State within the region where the order was placed.                   | Gujarat         |
| City        | City where the order was placed.                                      | Ahmedabad       |
| FirstName   | First name of the salesperson.                                        | chiranjit       |
| LastName    | Last name of the salesperson.                                         | Ghosh           |
| OrderDate   | Date when the order was placed (M/D/YYYY).                            | 11/8/2020       |
| Category    | General category of the product (e.g., Electronics, Clothing).        | Electronics     |
| SubCategory | Specific subcategory of the product (e.g., Gaming Consoles, Dresses). | Gaming Consoles |
| Cost        | Cost incurred for the product.                                        | 12000           |
| Sales       | Total sales amount generated from the order.                          | 15000           |
| Profit      | Profit generated from the order (Sales - Cost).                       | 3000            |



To create the SQLite database and load the CSV data into it, run the code below:

In [1]:
import pandas as pd
import sqlite3
import requests

# URL of the CSV file on GitHub
url = 'https://raw.githubusercontent.com/Invact-Abhay/DOE/main/Yellow%20Mart%20Sales%20Analysis.csv'

# Download the CSV file; stop on an HTTP error instead of saving an error page
response = requests.get(url)
response.raise_for_status()
with open('ylwmart.csv', 'wb') as file:
    file.write(response.content)

# Load the CSV file into a pandas DataFrame
data = pd.read_csv('ylwmart.csv')

# Create a SQLite database (or connect to an existing one)
conn = sqlite3.connect('ylwmart.db')

# Load the DataFrame into the SQLite database
data.to_sql('ylwmart', conn, if_exists='replace', index=False)


196
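The number printed above appears to be the row count returned by `to_sql`. Before writing queries, it can also help to confirm which column names ended up in the table, since the queries in the tasks refer to `OrderID` and `OrderDate`. The following is a minimal sketch of that check, assuming an in-memory database and a hand-made two-row table; `demo_conn` and the sample rows are illustrative stand-ins, not the real data:

```python
import sqlite3

# Illustrative in-memory table standing in for the real ylwmart table
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE ylwmart (OrderID TEXT, OrderDate TEXT, Sales INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO ylwmart VALUES (?, ?, ?)",
    [("OD00125", "11/8/2020", 15000), ("OD00126", "11/8/2020", 5000)],
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [row[1] for row in demo_conn.execute("PRAGMA table_info(ylwmart)")]
row_count = demo_conn.execute("SELECT COUNT(*) FROM ylwmart").fetchone()[0]

print(columns)
print(row_count)
```

The same two statements can be run against `conn` in the notebook to inspect the real table.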

**Task 1:**

Retrieve the sales data from ylwmart using a SQL query.

In [2]:
pd.read_sql_query('SELECT * FROM ylwmart', conn)

Unnamed: 0,OrderID,Region,State,City,FirstName,LastName,OrderDate,Category,SubCategory,Cost,Sales,Profit
0,OD00125,West,Gujarat,Ahmedabad,chiranjit_,Ghosh,11/8/2020,Electronics,Gaming Consoles,12000,15000,3000
1,OD00126,Central,Madhya Pradesh,Jabalpur,nandini,sharma,11/8/2020,Clothing,Dresses,3000,5000,2000
2,OD0012,South,Kerala,Kochi,Kartik,Soren,6/12/2020,Furniture,Bookcases,5600,8000,2400
3,OD00128,North,Rajasthan,Udaipur,Divya_,Mahto,10/11/2018,Office Supplies,Stationery,1800,2000,200
4,OD00129,East,Bihar,Patna,Indrajit,Sharma,10/11/2018,Appliances,Kitchen Appliances,2400,3000,600
...,...,...,...,...,...,...,...,...,...,...,...,...
191,OD00316,South,Telangana,Ramagundam,Xenia,Chakrabarti,10/12/2018,Appliances,Kitchen Appliances,15300,18000,2700
192,OD00317,South,Telangana,Ramagundam,Oishi,DeY_,10/12/2018,Electronics,Cameras,10500,14000,3500
193,OD0031,South,Karnataka,Belagavi,Firoz,Hussain,10/12/2018,Fitness,Exercise Equipment,7700,11000,3300
194,OD00319,South,Tamil Nadu,Tirunelveli,Trisha,Murmu,10/31/2018,Home Decor,Wall Decor,15750,21000,5250


**Task 2:**

Retrieve the full name of the salesperson in proper case and their respective company email ID.

*   Your output should include the following headers: 'fullname' and 'emailID' respectively.
*   Remove any '_' characters from the first name and last name (replace '_' with '').

*   To create the email ID, lowercase the first name and last name, join them with '.', and append '@yellowmart.com'.

*   Email ID format: lowercase(firstname.lastname)@yellowmart.com
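The steps above (strip underscores, apply proper case, build the email) can be sketched on a tiny table. This is one possible formulation, not the only one: it cleans the names once in a subquery so that `REPLACE` is not repeated in every expression. The `people` table, the `f` and `l` aliases, and the sample rows are assumptions made for illustration:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE people (FirstName TEXT, LastName TEXT)")
demo_conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("chiranjit_", "Ghosh"), ("Oishi", "DeY_")],
)

query = """
SELECT
    -- Proper case: upper-case the first letter, lower-case the rest
    UPPER(SUBSTR(f, 1, 1)) || LOWER(SUBSTR(f, 2)) || ' ' ||
    UPPER(SUBSTR(l, 1, 1)) || LOWER(SUBSTR(l, 2)) AS fullname,
    -- Email: lowercase(firstname.lastname)@yellowmart.com
    LOWER(f) || '.' || LOWER(l) || '@yellowmart.com' AS emailID
FROM (
    -- Strip underscores once, then reuse the cleaned values
    SELECT REPLACE(FirstName, '_', '') AS f,
           REPLACE(LastName, '_', '') AS l
    FROM people
)
"""
rows = sorted(demo_conn.execute(query).fetchall())
for fullname, email in rows:
    print(fullname, email)
```

Lower-casing the remainder of each name matters for inputs such as `DeY_`, which would otherwise keep its trailing capital letter.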








      

In [10]:
pd.read_sql_query("""SELECT
                     -- Proper case: upper-case the first letter, lower-case the rest
                     UPPER(SUBSTR(REPLACE(FirstName, '_', ''), 1, 1)) || LOWER(SUBSTR(REPLACE(FirstName, '_', ''), 2)) || ' ' ||
                     UPPER(SUBSTR(REPLACE(LastName, '_', ''), 1, 1)) || LOWER(SUBSTR(REPLACE(LastName, '_', ''), 2)) AS fullname,
                     LOWER(REPLACE(FirstName, '_', '')) || '.' || LOWER(REPLACE(LastName, '_', '')) || '@yellowmart.com' AS emailID
                      FROM ylwmart;
                      """, conn)

Unnamed: 0,fullname,emailID
0,Chiranjit Ghosh,chiranjit.ghosh@yellowmart.com
1,Nandini Sharma,nandini.sharma@yellowmart.com
2,Kartik Soren,kartik.soren@yellowmart.com
3,Divya Mahto,divya.mahto@yellowmart.com
4,Indrajit Sharma,indrajit.sharma@yellowmart.com
...,...,...
191,Xenia Chakrabarti,xenia.chakrabarti@yellowmart.com
192,Oishi Dey,oishi.dey@yellowmart.com
193,Firoz Hussain,firoz.hussain@yellowmart.com
194,Trisha Murmu,trisha.murmu@yellowmart.com


**Task 3:**

Retrieve the full name of each salesperson in proper case, along with the region, order year, sales, and profit from ylwmart for the order year 2020.

Note:

*   Remove any '_' characters from the first name and last name, then concatenate them with a single space in between. The full name in your output should be in proper case (e.g., Vishal Sharma).
*   Remove extra spaces, if any, in the Region column.
*   Extract the order year from the order date using the SUBSTR function and name this column Order Year.
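Because the order dates are stored as M/D/YYYY text, the year is always the last four characters, and SQLite's `SUBSTR` accepts a negative start position that counts from the right. A small sketch of that behaviour, using a few date strings taken from the sample rows (the `demo_conn` name is illustrative):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
dates = ["11/8/2020", "6/12/2020", "10/11/2018"]

# SUBSTR(text, -4) returns the last four characters as TEXT
years = [
    demo_conn.execute("SELECT SUBSTR(?, -4)", (d,)).fetchone()[0]
    for d in dates
]

# CAST turns the extracted text into an integer when numeric comparison is wanted
year_as_int = demo_conn.execute(
    "SELECT CAST(SUBSTR(?, -4) AS INTEGER)", (dates[0],)
).fetchone()[0]

print(years)
print(year_as_int)
```

Since the extracted year is text, filters should compare it with a quoted value such as `'2020'`, or cast it first.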




In [21]:
pd.read_sql_query("""
SELECT
    -- Format full name: remove underscores and apply proper casing
    UPPER(SUBSTR(REPLACE(FirstName, '_', ''), 1, 1)) ||
    LOWER(SUBSTR(REPLACE(FirstName, '_', ''), 2)) || ' ' ||
    UPPER(SUBSTR(REPLACE(LastName, '_', ''), 1, 1)) ||
    LOWER(SUBSTR(REPLACE(LastName, '_', ''), 2)) AS fullname,

    -- Trim extra spaces from Region
    TRIM(Region) AS Region,

    -- Extract order year using SUBSTR
    SUBSTR(OrderDate, -4) AS "Order Year",

    -- Select sales and profit
    Sales,
    Profit

FROM ylwmart
WHERE SUBSTR(OrderDate, -4) = '2020';
""", conn)

Unnamed: 0,fullname,Region,Order Year,Sales,Profit
0,Chiranjit Ghosh,West,2020,15000,3000
1,Nandini Sharma,Central,2020,5000,2000
2,Kartik Soren,South,2020,8000,2400
3,Sampurna Das,North,2020,36000,9000
4,Ranjit Barman,West,2020,25000,6250
...,...,...,...,...,...
63,Xitij Borah,East,2020,39000,5850
64,Nandita Roy,East,2020,10000,3000
65,Anirudh Tripathi,East,2020,14000,4900
66,Yashika Das,North,2020,6000,900


**Task 4:**

Retrieve the order ID and the length of the order ID from ylwmart using a SQL query. Name the length column len.


In [23]:
pd.read_sql_query("""
SELECT OrderID, LENGTH(OrderID) AS len
FROM ylwmart;
""", conn)

Unnamed: 0,OrderID,len
0,OD00125,7
1,OD00126,7
2,OD0012,6
3,OD00128,7
4,OD00129,7
...,...,...
191,OD00316,7
192,OD00317,7
193,OD0031,6
194,OD00319,7


**Task 5:**

Prepare a list of order IDs whose length is 6. The list should include OrderID, the length of OrderID, and SubCategory. Name the length column len.


In [25]:
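pd.read_sql_query("""
SELECT OrderID, LENGTH(OrderID) AS len, SubCategory
FROM ylwmart
WHERE LENGTH(OrderID) = 6;
""", conn)

Unnamed: 0,OrderID,len,SubCategory
0,OD0012,6,Bookcases
1,OD0038,6,Coffee Machines
2,OD0014,6,Kitchen Appliances
3,OD0151,6,Wearable Technology
4,OD0017,6,Fasteners
5,OD0019,6,Camping Gear
6,OD0022,6,Desks
7,OD0025,6,Fasteners
8,OD0028,6,Jackets
9,OD0030,6,Clocks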


**Task 6:**

Retrieve the SubCategory, Category, Sales, and Profit for the orders whose order ID length is 6.

Note: Remove extra spaces in Category, if any, using the TRIM function and name the column Category.

In [26]:
pd.read_sql_query("""
SELECT
    TRIM(SubCategory) AS SubCategory,
    TRIM(Category) AS Category,
    Sales,
    Profit
FROM ylwmart
WHERE LENGTH(OrderID) = 6;
""", conn)

Unnamed: 0,SubCategory,Category,Sales,Profit
0,Bookcases,Furniture,8000,2400
1,Coffee Machines,Appliances,36000,9000
2,Kitchen Appliances,Appliances,7000,2450
3,Wearable Technology,Electronics,26000,3900
4,Fasteners,Office Supplies,4000,600
5,Camping Gear,Outdoor,7000,2450
6,Desks,Furniture,30000,4500
7,Fasteners,Office Supplies,28000,8400
8,Jackets,Clothing,26000,6500
9,Clocks,Home Decor,35000,5250


**Task 7:**

Retrieve the list of subcategories that begin with the letter 'O', together with their respective categories.

Note: Remove any extra spaces in category column using trim function and name the column as Category.
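One thing to watch with prefix matching: if a value carries a leading space, `LIKE 'O%'` on the raw column will not match it, while matching on the trimmed value will. The sketch below illustrates the difference on a made-up `items` table; whether the real SubCategory column contains such values is not known from the outputs shown here:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE items (SubCategory TEXT)")
demo_conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("Optics",), (" Outerwear",), ("Desks",)],  # note the leading space
)

# Matching on the raw column misses ' Outerwear'
raw_matches = demo_conn.execute(
    "SELECT COUNT(*) FROM items WHERE SubCategory LIKE 'O%'"
).fetchone()[0]

# Matching on the trimmed value finds both 'O' subcategories
trimmed_matches = demo_conn.execute(
    "SELECT COUNT(*) FROM items WHERE TRIM(SubCategory) LIKE 'O%'"
).fetchone()[0]

print(raw_matches, trimmed_matches)
```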


In [27]:
pd.read_sql_query("""
SELECT
    TRIM(SubCategory) AS SubCategory,
    TRIM(Category) AS Category
FROM ylwmart
WHERE SubCategory LIKE 'O%';
""", conn)

Unnamed: 0,SubCategory,Category
0,Ottomans,Furniture
1,Optics,Outdoor
2,Optics,Outdoor
3,Outerwear,Clothing
4,Optics,Outdoor
5,Outerwear,Clothing
6,Optics,Outdoor
7,Outerwear,Clothing
8,Outerwear,Clothing
9,Optics,Outdoor


**Task 8:**

Retrieve each category along with its total cost, total sales, and total profit. Name the columns Category, TotalCost, TotalSales, and TotalProfit respectively.

Note:
*   Remove extra spaces, if any, in the category values using the TRIM function and name the column Category.



In [28]:
pd.read_sql_query("""
SELECT
    TRIM(Category) AS Category,
    SUM(Cost) AS TotalCost,
    SUM(Sales) AS TotalSales,
    SUM(Profit) AS TotalProfit
FROM ylwmart
GROUP BY Category;
""", conn)

Unnamed: 0,Category,TotalCost,TotalSales,TotalProfit
0,Appliances,7700,11000,3300
1,Books,9000,12000,3000
2,Clothing,8500,10000,1500
3,Home Decor,16800,24000,7200
4,Office Supplies,17500,25000,7500
5,Accessories,32300,38000,5700
6,Electronics,12000,15000,3000
7,Office Supplies,1800,2000,200
8,Furniture,23400,36000,12600
9,Home Decor,32750,44000,11250
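The repeated names in the output above (Office Supplies and Home Decor each appear twice) suggest that `GROUP BY Category` is grouping on the raw column rather than on the trimmed alias of the same name, so values that differ only in surrounding spaces stay in separate groups. Grouping on the expression itself avoids that ambiguity. A minimal sketch on a made-up `sales` table (table name, rows, and `demo_conn` are illustrative assumptions):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE sales (Category TEXT, Sales INTEGER)")
demo_conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Books", 100), (" Books", 50), ("Toys", 70)],  # ' Books' has a leading space
)

# Alias shares its name with the column: the grouping key is ambiguous
by_name = demo_conn.execute(
    "SELECT TRIM(Category) AS Category, SUM(Sales) "
    "FROM sales GROUP BY Category"
).fetchall()

# Grouping on the trimmed expression merges the whitespace variants
by_expr = demo_conn.execute(
    "SELECT TRIM(Category) AS Category, SUM(Sales) "
    "FROM sales GROUP BY TRIM(Category) ORDER BY 1"
).fetchall()

print(len(by_name))
print(by_expr)
```

The same consideration would apply to any query that groups by a name shared between a table column and a result alias, such as Region.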


**Task 9:**

Retrieve the order year, category, total sales, and total profit for each category in each year, including only years after 2018. Name the columns OrderYear, Category, TotalSales, and TotalProfit respectively.

Note:

*   Retrieve the order year from the order date using the SUBSTR function.
*   Remove extra spaces from the category, if any, using the TRIM function.




In [29]:
pd.read_sql_query("""
SELECT SUBSTR(OrderDate, -4) AS OrderYear,
       TRIM(Category) AS Category,
       SUM(Sales) AS TotalSales,
       SUM(Profit) AS TotalProfit
FROM ylwmart
WHERE SUBSTR(OrderDate, -4) > '2018'
GROUP BY OrderYear, Category;
""", conn)

Unnamed: 0,OrderYear,Category,TotalSales,TotalProfit
0,2019,Furniture,36000,12600
1,2019,Home Decor,5000,1500
2,2019,Accessories,113000,20100
3,2019,Appliances,92000,23050
4,2019,Books,53000,17350
5,2019,Clothing,58000,17200
6,2019,Electronics,195000,44350
7,2019,Fitness,58000,17000
8,2019,Furniture,43000,12550
9,2019,Home Decor,91000,21000


**Task 10:**

Retrieve the maximum sales amount for each year. Name the columns OrderYear and MaxSale respectively. Order the list by OrderYear in ascending order.

Note:

*   Retrieve the order year from order date by using SUBSTR function.








In [30]:
pd.read_sql_query("""
SELECT SUBSTR(OrderDate, -4) AS OrderYear,
       MAX(Sales) AS MaxSale
FROM ylwmart
GROUP BY OrderYear
ORDER BY OrderYear ASC;
""", conn)

Unnamed: 0,OrderYear,MaxSale
0,2017,40000
1,2018,40000
2,2019,39000
3,2020,39000


**Task 11:**

Retrieve each category and the number of orders it received in the year 2020. Name the columns UniqueCategory and OrderCount respectively.

Note:

*   Remove extra spaces, if any, in the category using the TRIM function.
*   Each category should appear only once.
*   Retrieve the order year from the order date using the SUBSTR function.







In [31]:
pd.read_sql_query("""
SELECT
    TRIM(Category) AS UniqueCategory,
    COUNT(OrderID) AS OrderCount
FROM ylwmart
WHERE SUBSTR(OrderDate, -4) = '2020'
GROUP BY UniqueCategory;
""", conn)

Unnamed: 0,UniqueCategory,OrderCount
0,Accessories,6
1,Appliances,12
2,Clothing,7
3,Electronics,11
4,Fitness,4
5,Furniture,6
6,Home Decor,6
7,Office Supplies,11
8,Outdoor,5


**Task 12:**

Retrieve the total sales for each year for the East and West regions. Your output should include the headers Region, OrderYear, and TotalSales. Sort the list by Region and OrderYear in ascending order.

Note:

*   Remove extra spaces in the region, if any, using the TRIM function.
*   Retrieve the order year from the order date using the SUBSTR function.




In [32]:
pd.read_sql_query("""
SELECT
    TRIM(Region) AS Region,
    SUBSTR(OrderDate, -4) AS OrderYear,
    SUM(Sales) AS TotalSales
FROM ylwmart
WHERE TRIM(Region) IN ('East', 'West')
GROUP BY Region, OrderYear
ORDER BY Region, OrderYear;
""", conn)

Unnamed: 0,Region,OrderYear,TotalSales
0,East,2017,26000
1,East,2017,73000
2,East,2018,356000
3,East,2018,3000
4,East,2019,185000
5,East,2020,37000
6,East,2020,196000
7,West,2017,137000
8,West,2018,153000
9,West,2019,13000


**Task 13:**

Count the number of order IDs whose length is 6.

In [33]:
pd.read_sql_query("""
SELECT COUNT(OrderID) AS OrderCount
FROM ylwmart
WHERE LENGTH(OrderID) = 6;
""", conn)

Unnamed: 0,OrderCount
0,11


**Task 14:**

Count the number of orders received in each year. Your output should include OrderYear and OrderCount respectively.

In [34]:
pd.read_sql_query("""
SELECT SUBSTR(OrderDate, -4) AS OrderYear,
       COUNT(OrderID) AS OrderCount
FROM ylwmart
GROUP BY OrderYear;
""", conn)

Unnamed: 0,OrderYear,OrderCount
0,2017,30
1,2018,53
2,2019,45
3,2020,68
