
# SQL Basics for Data Science: Data Science Dojo

## Practice Project Deliverable 


Shoprity is a retail company with operations in several African countries. The company provides retail services to small businesses as well as individual consumers. Management is now looking at expanding B2B sales and has tasked you, as a Data Scientist, with gathering insights from the data the company has collected and advising on how the company can increase B2B sales.


To get started, work through the following questions, then provide your insights in an effort to answer the research question.
* Which branch had the highest gross income?
* Which branch was the top-rated?
* Which branch was the lowest-rated?
* Should the company spend more on advertising to normal clients or to clients who are members?
* What type of products should the company focus on to increase sales?
* For what type of products should the company focus on reducing marketing costs?
* Should the company invest in its own payment systems if it is outsourcing all payment methods?
* Who should the company target most in advertisements?


## Dataset Information 


You are required to select all the records from the retail dataset. Log into the MySQL database using the following URL and credentials.


URL = http://159.89.167.145/phpmyadmin/

username = learner 

password = E*3b8km$dpmRLLuf1Rs$

### Context

The number of retail supermarkets in the most populated cities is growing, and market competition is high. The dataset contains historical sales of a supermarket company, recorded in 3 different branches over 3 months. Predictive data analytics methods are easy to apply to this dataset.


Attribute information
* Invoice id: Computer generated sales slip invoice identification number
* Date: Date of purchase (Record available from January 2019 to March 2019)
* Time: Purchase time (10 am to 9 pm)
* Branch: Branch of supercenter (3 branches are available identified by a, b and c).
* City: Location of supercenters
* Customer type: Type of customer, recorded as Member for customers using a member card and Normal for those without one.
* Gender: Gender type of customer
* Product line: General item categorization groups - Electronic accessories, Fashion accessories, Food and beverages, Health and beauty, Home and Lifestyle, Sports and travel
* Unit price: Price of each product in $
* Quantity: Number of products purchased by the customer
* Tax: 5% tax fee for customer buying
* Total: Total price including tax
* Payment: Payment used by the customer for the purchase (3 methods are available – Cash, Debit Card and Mobile money)
* COGS: Cost of goods sold (USD)
* Gross margin percentage: Gross margin percentage
* Gross income: Gross income (USD)
* Rating: Customer satisfaction rating on their overall shopping experience (On a scale of 1 to 10)


Project Source: [Link](https://www.kaggle.com/mahmoudeletrby/supermarket?)

## Step 1. Importing Required Libraries

In [None]:
# Importing pandas
import pandas as pd

# Loading SQL extension
%load_ext sql

# Connecting sqlite database
%sql sqlite://

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


'Connected: @None'

## Step 2. Importing our Dataset

In [None]:
# Example 
# ---
# We load our dataset from a csv file as shown and afterwards, 
# we then store the dataset in our in-memory sqlite database.
# We first read our dataset from its source and store it in a dataframe called retail.
# From there we then proceed to perform our analysis with sql.
# ---
#
retail = pd.read_csv('TABLE_1.csv') 

# Then store it in an SQL table of our in memory sqlite database 
# --- 
# 

# We then delete the table if it exists in our database
# ---
#

%sql DROP TABLE if EXISTS retail;

# And finally store our dataframe in a table named retail within our database.
# The persist command will create a table in the database to which we are connected, 
# the table name will be the same as dataframe variable.
# ---
#
%sql PERSIST retail;

 * sqlite://
Done.
 * sqlite://


'Persisted retail'
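
Outside a notebook, the same load step can be sketched with only the standard library. This is a minimal illustration, not the notebook's method: the helper name `load_csv_into_sqlite` is hypothetical, and the three CSV rows are copied from the sample output shown in this notebook rather than read from `TABLE_1.csv`.

```python
import csv
import io
import sqlite3

# Three sample rows copied from the notebook output; a real run would read TABLE_1.csv instead.
SAMPLE_CSV = """date,time,invoice_id,branch,city,cust_type,gender,type,unit_price,quantity,payment,cost,gross income,rating
2019-03-13,19:44,101-17-6199,a,nairobi,normal,male,food and beverages,45.79,7,debit card,320.53,16.0265,7.0
2019-01-17,12:36,101-81-4070,c,lagos,member,female,health and beauty,62.82,2,mobile money,125.64,6.282,4.9
2019-03-20,17:52,102-06-2002,c,lagos,member,male,sports and travel,25.25,5,cash,126.25,6.3125,6.1
"""


def load_csv_into_sqlite(csv_text, table_name, conn):
    """Hypothetical helper: recreate table_name from CSV text (columns are left untyped)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote every column name so that names with spaces, such as "gross income", are valid.
    columns = ", ".join('"{}"'.format(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('DROP TABLE IF EXISTS "{}"'.format(table_name))
    conn.execute('CREATE TABLE "{}" ({})'.format(table_name, columns))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table_name, placeholders), reader
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(SAMPLE_CSV, "retail", conn)
row_count = conn.execute("SELECT COUNT(*) FROM retail").fetchone()[0]
first_branch = conn.execute("SELECT branch FROM retail LIMIT 1").fetchone()[0]
print(row_count, first_branch)
```

Because the sketch creates untyped columns, every value is stored as text; numeric columns would need a `CAST` (or typed column definitions) before aggregating.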

In [None]:
%%sql
-- We can then check the records in our dataset by
-- using the following command. This returns the first five records.
SELECT * FROM retail LIMIT 5;

 * sqlite://
Done.


index,date,time,invoice_id,branch,city,cust_type,gender,type,unit_price,quantity,payment,cost,gross income,rating
0,2019-03-13,19:44,101-17-6199,a,nairobi,normal,male,food and beverages,45.79,7,debit card,320.53,16.0265,7.0
1,2019-01-17,12:36,101-81-4070,c,lagos,member,female,health and beauty,62.82,2,mobile money,125.64,6.282,4.9
2,2019-03-20,17:52,102-06-2002,c,lagos,member,male,sports and travel,25.25,5,cash,126.25,6.3125,6.1
3,2019-03-05,18:02,102-77-2261,c,lagos,member,male,health and beauty,65.31,7,debit card,457.17,22.8585,4.2
4,2019-02-27,12:22,105-10-6182,a,nairobi,member,male,fashion accessories,21.48,2,mobile money,42.96,2.148,6.6
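
The five rows above suggest how the numeric columns relate: `cost` looks like `unit_price * quantity`, and `gross income` looks like 5% of `cost`. The sketch below checks that reading against the sample rows only; the 5% rate is an assumption inferred from these rows, and `row_is_consistent` is a hypothetical helper.

```python
import math

# (unit_price, quantity, cost, gross income) copied from the five rows shown above.
sample_rows = [
    (45.79, 7, 320.53, 16.0265),
    (62.82, 2, 125.64, 6.282),
    (25.25, 5, 126.25, 6.3125),
    (65.31, 7, 457.17, 22.8585),
    (21.48, 2, 42.96, 2.148),
]
GROSS_RATE = 0.05  # assumption inferred from the sample rows


def row_is_consistent(unit_price, quantity, cost, gross_income):
    """Return True if cost and gross income match the assumed relationships."""
    cost_ok = math.isclose(unit_price * quantity, cost, abs_tol=1e-6)
    income_ok = math.isclose(cost * GROSS_RATE, gross_income, abs_tol=1e-6)
    return cost_ok and income_ok


all_consistent = all(row_is_consistent(*row) for row in sample_rows)
print(all_consistent)
```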


## Step 3. Questions

**1. Which branch had the highest gross income?**

In [None]:
%%sql
-- We use the sum() function and AS to rename the aggregated column.
-- We also wrap the column name gross income in double quotes since it contains a space.
select branch, sum("gross income") as "gross income"
from retail
group by branch
order by "gross income" desc
limit 5;

 * sqlite://
Done.


branch,gross income
c,5265.1765000000005
a,5057.160500000004
b,5057.032000000009


`Branch C` had the highest gross income of `$ 5,265.18`, followed by `Branch A` and `Branch B` with gross incomes of `$ 5,057.16` and `$ 5,057.03` respectively.
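
The long decimal tails in the sums are floating-point artefacts. One way to tidy them is to round inside the query, sketched here with Python's built-in `sqlite3` module on the five sample rows shown earlier (not the full dataset, so the totals differ from the output above). The alias `gross_income` is chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE retail (branch TEXT, "gross income" REAL)')
# (branch, gross income) pairs copied from the five sample rows shown earlier.
conn.executemany(
    "INSERT INTO retail VALUES (?, ?)",
    [("a", 16.0265), ("c", 6.282), ("c", 6.3125), ("c", 22.8585), ("a", 2.148)],
)

# ROUND(..., 2) trims the sum to two decimal places before it is returned.
query = """
SELECT branch, ROUND(SUM("gross income"), 2) AS gross_income
FROM retail
GROUP BY branch
ORDER BY gross_income DESC;
"""
branch_totals = conn.execute(query).fetchall()
for branch, total in branch_totals:
    print(branch, total)
```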

**2. Which branch was the highest-rated?**

In [None]:
%%sql
SELECT branch,AVG(rating) as average FROM retail
GROUP BY branch
ORDER BY average DESC;

 * sqlite://
Done.


branch,average
c,7.072865853658538
a,7.027058823529416
b,6.8180722891566266


Branch C was the highest rated with a rating of 7.07, while Branch A and B had a rating of 7.03 and 6.82 respectively.

**3. Which branch was the lowest-rated?**

In [None]:
%%sql
SELECT branch,AVG(rating) as average FROM retail
GROUP BY branch
ORDER BY average ASC;

 * sqlite://
Done.


branch,average
b,6.8180722891566266
a,7.027058823529416
c,7.072865853658538


Branch B was the lowest rated with an average rating of 6.82, while Branch A and C had average ratings of 7.03 and 7.07 respectively.
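
When ranking groups by an aggregate, the sort key should be the aggregate (or its alias), not the raw column: after `GROUP BY`, a bare `rating` no longer refers to one well-defined value per branch. The sketch below uses made-up toy ratings, chosen so the ordering is easy to verify by hand, and reads both the highest- and lowest-rated branch from a single sorted result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retail (branch TEXT, rating REAL)")
# Made-up toy ratings (not from the dataset): averages are a=7.0, b=6.0, c=8.0.
conn.executemany(
    "INSERT INTO retail VALUES (?, ?)",
    [("a", 9.0), ("a", 5.0), ("b", 6.0), ("b", 6.0), ("c", 8.0), ("c", 8.0)],
)

# Sort on the AVG alias so the groups are ordered by their average rating.
ranked = conn.execute(
    """
    SELECT branch, AVG(rating) AS average
    FROM retail
    GROUP BY branch
    ORDER BY average DESC;
    """
).fetchall()
highest_rated = ranked[0]
lowest_rated = ranked[-1]
print(highest_rated, lowest_rated)
```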

**4. Should the company spend more on advertising to normal clients or to clients who are members?**

We can compare the gross income from both types of customers.

In [None]:
%%sql
SELECT SUM("gross income"),cust_type
FROM retail
GROUP BY cust_type;

 * sqlite://
Done.


"SUM(""gross income"")",cust_type
7820.164000000002,member
7559.205000000009,normal


From the result, gross income from members (about 7,820.16) is higher than gross income from normal customers (about 7,559.21), so the company could target members more, provided advertisement costs and other factors are held constant.
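
To put the gap between the two customer types in perspective, the totals from the output above can be compared directly. This is plain arithmetic on the two reported sums; the variable names are chosen for this illustration.

```python
# Totals copied from the query output above.
member_income = 7820.164000000002
normal_income = 7559.205000000009

difference = member_income - normal_income
# Gap expressed relative to the normal-customer total.
relative_gap_pct = 100 * difference / normal_income
# Members' share of the combined gross income.
member_share_pct = 100 * member_income / (member_income + normal_income)
print(round(difference, 2), round(relative_gap_pct, 2), round(member_share_pct, 2))
```

Members bring in roughly 3.5% more gross income than normal customers, so the two groups are close to an even split and the difference on its own is a weak signal.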

**5. What type of products should the company focus on to increase sales?**

We can answer this question by looking at the gross income of the different types of products.

In [None]:
%%sql
-- We wrap the column name type in double quotes since type is a keyword in some SQL dialects.
SELECT SUM("gross income") as income,"type"
FROM retail
GROUP BY "type"
ORDER BY income DESC;

 * sqlite://
Done.


income,type
2673.5640000000008,food and beverages
2624.8964999999994,sports and travel
2587.5015000000003,electronic accessories
2585.994999999999,fashion accessories
2564.853000000001,home and lifestyle
2342.559,health and beauty


The company should focus on increasing sales of food and beverages, since they contribute the most to gross income.

**6. For what type of products should the company focus on reducing marketing costs?**

We investigate the types of products which have the lowest gross income.

In [None]:
%%sql
-- We wrap the column name type in double quotes since type is a keyword in some SQL dialects.
SELECT SUM("gross income") as income,"type"
FROM retail
GROUP BY "type"
ORDER BY income ASC;

 * sqlite://
Done.


income,type
2342.559,health and beauty
2564.853000000001,home and lifestyle
2585.994999999999,fashion accessories
2587.5015000000003,electronic accessories
2624.8964999999994,sports and travel
2673.5640000000008,food and beverages


Because health and beauty contributes the least, we can look at the possibility of reducing marketing costs for those products. However, that would not be a firm recommendation, because those products still contribute significantly to the gross income of the retailer.
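
How significant each product line's contribution is can be made concrete by turning the sums above into shares of total gross income. A minimal sketch using only the reported figures; the names `income_by_type` and `share_pct` are chosen for this illustration.

```python
# Gross income per product line, copied from the query output above.
income_by_type = {
    "food and beverages": 2673.5640000000008,
    "sports and travel": 2624.8964999999994,
    "electronic accessories": 2587.5015000000003,
    "fashion accessories": 2585.994999999999,
    "home and lifestyle": 2564.853000000001,
    "health and beauty": 2342.559,
}

total_income = sum(income_by_type.values())
# Each product line's percentage share of the total gross income.
share_pct = {
    name: 100 * income / total_income for name, income in income_by_type.items()
}
for name, pct in sorted(share_pct.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Health and beauty still accounts for roughly 15% of gross income, against roughly 17% for food and beverages, which supports treating a cut in its marketing budget with caution.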

## Step 4. Recommendations

The company's management is looking at the prospect of expanding the B2B sales and would like to be informed on the best strategy to increase B2B sales. 

Based on the analysis done, we were able to arrive at the following recommendations:

* **Recommendation 1:** The company should consider optimizing the sales process for `Branch C`, as it had the highest gross income of `$ 5,265.18`, followed by `Branch A` and `Branch B` with gross incomes of `$ 5,057.16` and `$ 5,057.03` respectively. 

* **Recommendation 2:**
, **Recommendation 3**, etc. 
