# **<font color = '#3498eb'>SECTOR & COMPANY</font>**

Extreme Sports House is a distributor of mountain and adventure sports products.

The outdoor apparel and gear sector, which covers mountain and adventure clothing and equipment, has grown significantly in recent years. According to the Outdoor Industry Association, U.S. consumer spending on outdoor recreation totaled $887 billion in 2017. This growth has been fueled by an increasing consumer focus on health and well-being, as well as a surge in adventure tourism. Trends such as the rising popularity of sustainable and eco-friendly products are also shaping the industry.

# **<font color = '#3498eb'>OBJECTIVES</font>**

The company has enlisted me to:

· Extract maximum value from their existing data.

· Identify the most profitable channels.

· Determine who their most valuable clients are.

· Develop strategies to increase profit margins.

· Review the product portfolio to cut costs.

· Optimize customer management processes.

· Build a SQL-based recommendation system to increase cross-selling opportunities.

# **<font color = '#3498eb'>LIBRARIES</font>**

In [1]:
import pandas as pd
import pymysql
import tabulate 

# **<font color = '#3498eb'>DATA LOADING AND UNDERSTANDING</font>**

## GENERAL VIEW

In [2]:
eshdb_conn = pymysql.connect(host='localhost', user='root', password='b8SO0ibY5claaghlgiehgld', database='extremesportshouse')

eshdb_cursor = eshdb_conn.cursor()

eshdb_cursor.execute("SELECT DATABASE();")
print("Currently using the database:", eshdb_cursor.fetchone()[0])

Currently using the database: extremesportshouse


I first list the tables in the database and then inspect each one.

In [13]:
eshdb_cursor.execute("show tables from extremesportshouse;")
print(tabulate.tabulate([list(eachtuple) for eachtuple in eshdb_cursor.fetchall()]))

--------
channels
products
sales
stores
--------


In [29]:
eshdb_cursor.execute("select * from channels;")
print(tabulate.tabulate([list(eachtuple) for eachtuple in eshdb_cursor.fetchall()]))

--  -----------
 1  Fax
 2  Telephone
 3  Mail
 4  E-mail
 5  Web
 6  Sales visit
 7  Special
 8  Other
 9  Other
10  Other
11  Other
12  Other
--  -----------


In [19]:
eshdb_cursor.execute("select * from products;")
print(tabulate.tabulate([list(row) for row in eshdb_cursor.fetchall()[:10]], headers=[desc[0] for desc in eshdb_cursor.description]))

  product_id  line               type          product                    brand      color          cost    price
------------  -----------------  ------------  -------------------------  ---------  -----------  ------  -------
        1110  Camping Equipment  Cooking Gear  TrailChef Water Bag        TrailChef  Clear          2.77     6.59
        2110  Camping Equipment  Cooking Gear  TrailChef Canteen          TrailChef  Brown          6.92    12.92
        3110  Camping Equipment  Cooking Gear  TrailChef Kitchen Kit      TrailChef  Unspecified   15.78    23.8
        4110  Camping Equipment  Cooking Gear  TrailChef Cup              TrailChef  Silver         0.85     3.66
        5110  Camping Equipment  Cooking Gear  TrailChef Cook Set         TrailChef  Silver        34.41    54.93
        6110  Camping Equipment  Cooking Gear  TrailChef Deluxe Cook Set  TrailChef  Silver        78.72   129.72
        7110  Camping Equipment  Cooking Gear  TrailChef Single Flame     TrailChef  Silv

In [3]:
eshdb_cursor.execute("select * from sales;")
print(tabulate.tabulate([list(row) for row in eshdb_cursor.fetchall()[:10]], headers=[desc[0] for desc in eshdb_cursor.description]))

  store_id    product_id    channel_id  date          quantity    official_price    offer_price
----------  ------------  ------------  ----------  ----------  ----------------  -------------
      1201        109110             4  12/01/2015         648             76.86          71.48
      1201        112110             4  12/01/2015         799             10.64          10.21
      1201        115110             4  12/01/2015         755             10.71          10.28
      1205         70240             3  12/01/2015          70            122.7          114.11
      1205         71110             3  12/01/2015          28             95.62          92.75
      1215         73110             2  12/01/2015        3992             12.78          11.89
      1215         83110             2  12/01/2015         156             96.44          89.69
      1215         86110             2  12/01/2015        2615              6              5.58
      1215         93110             2  

In [31]:
eshdb_cursor.execute("select * from stores;")
print(tabulate.tabulate([list(row) for row in eshdb_cursor.fetchall()[:10]], headers=[desc[0] for desc in eshdb_cursor.description]))

  store_id  store_name            type                    country
----------  --------------------  ----------------------  -----------
      1101  ActiForme             Equipment Rental Store  France
      1115  SportsClub            Golf Shop               France
      1123  Anapurna              Direct Marketing        France
      1132  Cordages Discount     Warehouse Store         France
      1133  Altitudes extrêmes    Outdoors Shop           France
      1134  Optique et Lentilles  Eyewear Store           France
      1135  Camping Sauvage       Outdoors Shop           France
      1137  Grand choix           Department Store        Switzerland
      1144  Die Fitness-Experten  Direct Marketing        Germany
      1147  Der Fitness-Doktor    Sports Store            Germany


## SALES

The company asked me to carry out the following tasks.

### <u>Review the data types</u>

In [4]:
# Let's see data types

eshdb_cursor.execute("describe sales;")
print(tabulate.tabulate([list(row) for row in eshdb_cursor.fetchall()], headers=[desc[0] for desc in eshdb_cursor.description]))

Field           Type    Null    Key    Default    Extra
--------------  ------  ------  -----  ---------  -------
store_id        int     YES     MUL
product_id      int     YES     MUL
channel_id      int     YES     MUL
date            text    YES
quantity        int     YES
official_price  double  YES
offer_price     double  YES


In [28]:
# Convert the date column to a date type in European format (dd/mm/yyyy)

eshdb_cursor.execute("UPDATE sales SET date=DATE_FORMAT(date, '%d/%m/%Y');")
print(tabulate.tabulate([list(row) for row in eshdb_cursor.fetchall()], headers=[desc[0] for desc in eshdb_cursor.description]))


OperationalError: (1292, "Incorrect datetime value: '12/01/2015'")
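
The error arises because `DATE_FORMAT` expects a date or datetime value, whereas `date` is stored as text in `dd/mm/yyyy` form, which MySQL cannot parse implicitly. An `UPDATE` also returns no result set, so there would be nothing to fetch or tabulate afterwards. One possible fix, given here as a sketch that has not been run against this database, is to parse the text with `STR_TO_DATE` and then change the column type. The helper name `convert_sales_dates` is hypothetical, and the sketch assumes every row follows the day/month/year pattern seen in the sample rows.

```python
from datetime import datetime

# Assumption: every value in sales.date is text in day/month/year form,
# as in the rows shown above (e.g. '22/12/2016').
PARSE_DATES_SQL = "UPDATE sales SET `date` = STR_TO_DATE(`date`, '%d/%m/%Y');"
CHANGE_TYPE_SQL = "ALTER TABLE sales MODIFY COLUMN `date` DATE;"


def convert_sales_dates(conn):
    """Hypothetical helper: parse the text dates, then retype the column as DATE."""
    with conn.cursor() as cursor:
        cursor.execute(PARSE_DATES_SQL)  # '12/01/2015' is rewritten as '2015-01-12'
        cursor.execute(CHANGE_TYPE_SQL)  # text column becomes a DATE column
    conn.commit()


# The same format string works in Python, which gives a quick check of the pattern
parsed = datetime.strptime("12/01/2015", "%d/%m/%Y").date()
print(parsed.isoformat())  # 2015-01-12
```

With the notebook's connection this would be called as `convert_sales_dates(eshdb_conn)`. A MySQL `DATE` column is stored and displayed as `YYYY-MM-DD`, so the European format would be applied at query time, for example with `DATE_FORMAT(date, '%d/%m/%Y')` in the `SELECT`.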

### <u>Review the granularity of sales table</u>

In [16]:
# The company asked me to check whether the table is at this level of granularity: store-product-channel-date

eshdb_cursor.execute("""
    select count(*) as repeated_reg
    from sales
    group by store_id, product_id, channel_id, date
    having repeated_reg > 1
    order by repeated_reg desc
    limit 20;
    """)

for each_tuple in eshdb_cursor:
    print(each_tuple[0])

5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
4


In [17]:
eshdb_cursor.execute("""
    select *, count(*) as repeated_reg
    from sales
    group by store_id, product_id, channel_id, date
    having repeated_reg > 1
    order by store_id, product_id, channel_id, date
    limit 20;
    """)

print(tabulate.tabulate([list(row) for row in eshdb_cursor.fetchall()], headers=[desc[0] for desc in eshdb_cursor.description]))

  store_id    product_id    channel_id  date          quantity    official_price    offer_price    repeated_reg
----------  ------------  ------------  ----------  ----------  ----------------  -------------  --------------
      1115        127110             5  22/12/2016         203             20.15          20.15               2
      1115        127130             5  13/07/2018         133             21.25          21.25               2
      1115        127130             5  22/12/2016         205             20.15          20.15               2
      1115        129130             5  22/12/2016          19            220            220                  2
      1115        130110             5  22/12/2016          24            167.2          167.2                2
      1115        130130             2  16/01/2015           4            172            172                  2
      1115        132120             2  18/10/2015           3             80             80            

In [21]:
# An example of a duplicated record

eshdb_cursor.execute("""
    select *
    from sales
    where store_id=1115 and product_id=127110 and channel_id=5 and date='22/12/2016';
    """)

print(tabulate.tabulate([list(row) for row in eshdb_cursor.fetchall()], headers=[desc[0] for desc in eshdb_cursor.description]))

  store_id    product_id    channel_id  date          quantity    official_price    offer_price
----------  ------------  ------------  ----------  ----------  ----------------  -------------
      1115        127110             5  22/12/2016         203             20.15          20.15
      1115        127110             5  22/12/2016         271             20.15          20.15


The two records share the same store, product, channel, date and prices, but their quantities differ.

To bring the table to the required level, I'll aggregate it by store, product, channel and date and apply an aggregation function to the remaining variables: the sum for quantity and the average for official price and offer price.
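
The aggregation described above could look like the following sketch. An in-memory `sqlite3` table stands in for the MySQL `sales` table so that the snippet runs on its own; that stand-in and the names `AGGREGATE_SQL` and `demo` are assumptions for illustration. The `SELECT` uses only standard SQL, so it should also run through `eshdb_cursor`.

```python
import sqlite3

# Assumption: sqlite3 replaces the MySQL connection only to keep the sketch self-contained.
AGGREGATE_SQL = """
    select store_id, product_id, channel_id, date,
           sum(quantity)       as quantity,
           avg(official_price) as official_price,
           avg(offer_price)    as offer_price
    from sales
    group by store_id, product_id, channel_id, date;
"""

demo = sqlite3.connect(":memory:")
demo.execute("""
    create table sales (store_id int, product_id int, channel_id int, date text,
                        quantity int, official_price real, offer_price real)
""")
# The duplicated pair found in the granularity check
demo.executemany("insert into sales values (?, ?, ?, ?, ?, ?, ?)", [
    (1115, 127110, 5, "22/12/2016", 203, 20.15, 20.15),
    (1115, 127110, 5, "22/12/2016", 271, 20.15, 20.15),
])

aggregated = demo.execute(AGGREGATE_SQL).fetchall()
print(aggregated)  # a single row whose quantity is 203 + 271 = 474
```

A plain average treats every duplicate equally. If duplicates ever carried different prices, a quantity-weighted average might be a better choice.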

### <u>Insert a new variable</u>

The new variable holds the total revenue of each sale.
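
A minimal sketch of such a variable follows. The notebook does not define the formula, so it assumes revenue is `quantity * offer_price`, taking `offer_price` as the price actually charged. As before, an in-memory `sqlite3` table is an assumed stand-in for the MySQL `sales` table, and `REVENUE_SQL` is a hypothetical name.

```python
import sqlite3

# Assumption: revenue = quantity * offer_price (formula not stated in the notebook).
REVENUE_SQL = """
    select store_id, product_id, channel_id, date,
           quantity, official_price, offer_price,
           round(quantity * offer_price, 2) as revenue
    from sales;
"""

demo = sqlite3.connect(":memory:")
demo.execute("""
    create table sales (store_id int, product_id int, channel_id int, date text,
                        quantity int, official_price real, offer_price real)
""")
# First row of the sales preview shown earlier
demo.execute(
    "insert into sales values (?, ?, ?, ?, ?, ?, ?)",
    (1201, 109110, 4, "12/01/2015", 648, 76.86, 71.48),
)

rows = demo.execute(REVENUE_SQL).fetchall()
print(rows[0][-1])  # revenue for this row: 648 * 71.48 = 46319.04
```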


# **<font color = '#3498eb'>DATA QUALITY</font>**