# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned from another Zalando API endpoint, such as the one behind [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page number offset of the API URL to test if it's working.

In [1]:
# your code here

# Define the initial API URL
api_url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-femme&limit=84&offset=84&sort=popularity'
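
Step 4 of the recap (changing the page offset) can also be checked in code. The following is a minimal sketch, assuming the endpoint paginates through `offset` in steps of `limit`, as the URL above suggests. `build_page_url` is a hypothetical helper introduced for illustration, not part of any Zalando client.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.zalando.fr/api/catalog/articles'

def build_page_url(page, category='promo-femme', limit=84):
    """Return the API URL for a 1-based page number (hypothetical helper)."""
    # Assumption: page 1 starts at offset 0, page 2 at offset `limit`, and so on
    query = {
        'categories': category,
        'limit': limit,
        'offset': (page - 1) * limit,
        'sort': 'popularity',
    }
    return f'{BASE_URL}?{urlencode(query)}'

# Page 2 reproduces the URL defined in the cell above
print(build_page_url(2))
```

Building the query from a dictionary keeps the offset arithmetic in one place, so testing another page only means changing one number.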

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make a request to obtain the data of the first page. Flatten the data and store it in a variable.

1. Find out the total page count in the 1st page data.

1. Use a `for` loop to make requests for the additional pages, from page 2 to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
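
The steps above can be sketched independently of the network. This is a minimal sketch, assuming each response carries `pagination.page_count` and an `articles` list, as in the response printed later in this notebook. `collect_articles` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for a real request so the loop logic can be run offline.

```python
def collect_articles(fetch_page):
    """Gather the articles of every page using a page-fetching callable."""
    # Fetch page 1 first: it also tells us how many pages exist
    first = fetch_page(1)
    articles = list(first['articles'])
    page_count = first['pagination']['page_count']
    # Request the remaining pages, from page 2 up to and including page_count
    for page in range(2, page_count + 1):
        articles.extend(fetch_page(page)['articles'])
    return articles

def fake_fetch(page):
    """Stand-in for a real API call: three pages with two articles each."""
    return {
        'pagination': {'page_count': 3},
        'articles': [{'sku': f'SKU-{page}-{i}'} for i in range(2)],
    }

all_articles = collect_articles(fake_fetch)
print(len(all_articles))
```

Passing the fetching function as an argument means the same loop works with a real request function once it has been tested with fake data.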

In [2]:
# your code here

# Import libraries
import json
import requests
import pandas as pd
# json_normalize is available at the top level of pandas since version 1.0
from pandas import json_normalize

In [3]:
# Define a function to get the information of a selected page
def page_selection(k):
    # Offset of the first article on page k (page 1 starts at offset 0)
    offset = (k-1)*84
    api_url = 'https://www.zalando.fr/api/catalog/articles'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}
    query = {'categories':'promo-femme','limit': 84,'offset':offset,'sort':'popularity'}
    response = requests.get(api_url, headers = headers, params=query)
    return response.json()

In [4]:
# Content of page 1
response = page_selection(1)
response

{'total_count': 181376,
 'pagination': {'page_count': 892, 'current_page': 1, 'per_page': 84},
 'sort': 'popularity',
 'articles': [{'sku': 'NU511A019-Q11',
   'name': 'Sandales - black',
   'price': {'original': '84,99\xa0€',
    'promotional': '59,95\xa0€',
    'has_different_prices': False,
    'has_different_original_prices': False,
    'has_different_promotional_prices': False,
    'has_discount_on_selected_sizes_only': False},
   'sizes': ['37', '38', '39', '40'],
   'url_key': 'inuovo-sandales-black-nu511a019-q11',
   'media': [{'path': 'NU/51/1A/01/9Q/11/NU511A019-Q11@21.jpg',
     'role': 'DEFAULT',
     'packet_shot': False}],
   'brand_name': 'Inuovo',
   'is_premium': False,
   'family_articles': [],
   'flags': [{'key': 'discountRate',
     'value': '-29%',
     'tracking_value': 'discount rate'},
    {'key': 'sponsored', 'value': 'Sponsorisé', 'tracking_value': 'sponsored'},
    {'key': 'csr',
     'value': 'Éco-responsabilité',
     'tracking_value': 'sustainable'}],
   

In [5]:
# Flatten data
goods = json_normalize(response['articles'])
goods

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,tracking_information.metrigo_impression_urls,tracking_information.impression_beacon,tracking_information.source
0,NU511A019-Q11,Sandales - black,"[37, 38, 39, 40]",inuovo-sandales-black-nu511a019-q11,[{'path': 'NU/51/1A/01/9Q/11/NU511A019-Q11@21....,Inuovo,False,[],"[{'key': 'discountRate', 'value': '-29%', 'tra...",shoe,[],"84,99 €","59,95 €",False,False,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp
1,NU511A049-O12,Sandales - mntrl coconut ncc,"[37, 38, 39, 40, 41, 42]",inuovo-sandales-mntrl-coconut-ncc-nu511a049-o12,[{'path': 'NU/51/1A/04/9O/12/NU511A049-O12@15....,Inuovo,False,[],"[{'key': 'discountRate', 'value': '-14%', 'tra...",shoe,[],"69,99 €","59,95 €",False,False,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp
2,NU511A034-Q11,Sandales - black blk,"[36, 37, 38, 41]",inuovo-sandales-black-blk-nu511a034-q11,[{'path': 'NU/51/1A/03/4Q/11/NU511A034-Q11@14....,Inuovo,False,[],"[{'key': 'discountRate', 'value': '-13%', 'tra...",shoe,[],"79,99 €","69,95 €",False,False,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp
3,LA251H021-E11,Cabas - black warm sand,[One Size],lacoste-reversible-cabas-la251h021-e11,[{'path': 'LA/25/1H/02/1E/11/LA251H021-E11@15....,Lacoste,False,[],"[{'key': 'discountRate', 'value': '-25%', 'tra...",accessoires,[],"119,95 €","89,95 €",False,False,False,False,,,
4,GU151M00P-F11,LADIES SPORT - Montre - rosegold-coloured,[One Size],guess-montre-rosegold-coloured-gu151m00p-f11,[{'path': 'GU/15/1M/00/PF/11/GU151M00P-F11@6.1...,Guess,False,[],"[{'key': 'discountRate', 'value': '-70%', 'tra...",accessoires,[],"289,95 €","86,95 €",False,False,False,False,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,OS411A03C-A11,ONLSKYE CROC TOE CAP - Baskets basses - white,"[36, 37, 38, 39, 40]",only-shoes-onlskye-croc-toe-cap-baskets-basses...,[{'path': 'OS/41/1A/03/CA/11/OS411A03C-A11@3.j...,ONLY SHOES,False,[],"[{'key': 'discountRate', 'value': '-70%', 'tra...",shoe,[],"37,99 €","11,49 €",False,False,False,False,,,
80,M9121A1XZ-M11,RELAX - Pantalon classique - grün,"[36, 38, 44, 46]",mango-relax-pantalon-classique-m9121a1xz-m11,[{'path': 'M9/12/1A/1X/ZM/11/M9121A1XZ-M11@11....,Mango,False,[],"[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,[],"39,99 €","15,99 €",False,False,False,False,,,
81,AN611B0B2-Q11,LEATHER PUMPS - Escarpins - black,"[36, 37, 38, 39, 40, 41, 42]",anna-field-escarpins-black-an611b0b2-q11,[{'path': 'AN/61/1B/0B/2Q/11/AN611B0B2-Q11@10....,Anna Field,False,[],"[{'key': 'discountRate', 'value': '-70%', 'tra...",shoe,[],"69,99 €","20,99 €",False,False,False,False,,,
82,GU151H1J5-Q11,DIGITAL - Sac à main - black,[One Size],guess-sac-a-main-black-gu151h1j5-q11,[{'path': 'GU/15/1H/1J/5Q/11/GU151H1J5-Q11@4.j...,Guess,False,[],"[{'key': 'discountRate', 'value': '-50%', 'tra...",accessoires,[],"145,00 €","72,50 €",False,False,False,False,,,
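
The dotted column names above (`price.original`, `price.promotional`) come from flattening the nested `price` dictionary. The following pure-Python sketch illustrates that naming scheme only; it is not how pandas implements `json_normalize`, and `flatten` is a hypothetical helper.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one level with dotted keys; lists are left untouched."""
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries and prefix their keys
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# A reduced version of the first article in the response
article = {
    'sku': 'NU511A019-Q11',
    'price': {'original': '84,99\xa0€', 'promotional': '59,95\xa0€'},
    'sizes': ['37', '38', '39', '40'],
}
flat_article = flatten(article)
print(sorted(flat_article))
```

Lists such as `sizes` stay as single values, which matches the list-valued cells in the table above.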


In [6]:
# Get the total number of pages
total_pages = response['pagination']['page_count']
(f'Total pages: {total_pages}')

'Total pages: 892'

In [8]:
# Loop over every page (1 to page count) and collect the flattened data
frames = []
for page in range(1, total_pages + 1):
    # Reuse page_selection, which computes the offset for the given page
    results = page_selection(page)
    goods = json_normalize(results['articles'])
    goods.set_index('sku', inplace=True)
    frames.append(goods)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df = pd.concat(frames, sort=False)

df.head()

sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,...,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,tracking_information.metrigo_impression_urls,tracking_information.impression_beacon,tracking_information.source,outfits,amount,price.base_price,condition,condition_key
NU511A019-Q11,Sandales - black,"[37, 38, 39, 40]",inuovo-sandales-black-nu511a019-q11,[{'path': 'NU/51/1A/01/9Q/11/NU511A019-Q11@21....,Inuovo,False,[],"[{'key': 'discountRate', 'value': '-29%', 'tra...",shoe,[],...,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp,,,,,
NU511A049-O12,Sandales - mntrl coconut ncc,"[37, 38, 39, 40, 41, 42]",inuovo-sandales-mntrl-coconut-ncc-nu511a049-o12,[{'path': 'NU/51/1A/04/9O/12/NU511A049-O12@15....,Inuovo,False,[],"[{'key': 'discountRate', 'value': '-14%', 'tra...",shoe,[],...,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp,,,,,
NU511A03P-Q11,Espadrilles - black blk,"[36, 37, 38, 39, 40]",inuovo-sandales-a-plateforme-black-blk-nu511a0...,[{'path': 'NU/51/1A/03/PQ/11/NU511A03P-Q11@19....,Inuovo,False,[],"[{'key': 'discountRate', 'value': '-38%', 'tra...",shoe,[],...,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp,,,,,
LA251H021-E11,Cabas - black warm sand,[One Size],lacoste-reversible-cabas-la251h021-e11,[{'path': 'LA/25/1H/02/1E/11/LA251H021-E11@15....,Lacoste,False,[],"[{'key': 'discountRate', 'value': '-25%', 'tra...",accessoires,[],...,False,False,,,,,,,,
GU151M00P-F11,LADIES SPORT - Montre - rosegold-coloured,[One Size],guess-montre-rosegold-coloured-gu151m00p-f11,[{'path': 'GU/15/1M/00/PF/11/GU151M00P-F11@6.1...,Guess,False,[],"[{'key': 'discountRate', 'value': '-70%', 'tra...",accessoires,[],...,False,False,,,,,,,,


In [9]:
df.columns

Index(['name', 'sizes', 'url_key', 'media', 'brand_name', 'is_premium',
       'family_articles', 'flags', 'product_group', 'delivery_promises',
       'price.original', 'price.promotional', 'price.has_different_prices',
       'price.has_different_original_prices',
       'price.has_different_promotional_prices',
       'price.has_discount_on_selected_sizes_only',
       'tracking_information.metrigo_impression_urls',
       'tracking_information.impression_beacon', 'tracking_information.source',
       'outfits', 'amount', 'price.base_price', 'condition', 'condition_key'],
      dtype='object')

## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [10]:
# your code here

# Trending brand
df.brand_name.value_counts().index[0]

'usha'

In [11]:
df.dtypes

name                                            object
sizes                                           object
url_key                                         object
media                                           object
brand_name                                      object
is_premium                                        bool
family_articles                                 object
flags                                           object
product_group                                   object
delivery_promises                               object
price.original                                  object
price.promotional                               object
price.has_different_prices                        bool
price.has_different_original_prices               bool
price.has_different_promotional_prices            bool
price.has_discount_on_selected_sizes_only         bool
tracking_information.metrigo_impression_urls    object
tracking_information.impression_beacon          object
tracking_i

In [12]:
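# Change the format of the original price: remove the currency symbol and use a decimal point.
# Caution: the pattern r'\d\xa0' also removes the digit just before the non-breaking space,
# so '84,99\xa0€' becomes '84.9' rather than '84.99' (visible in the output below).
df['price.original'] = df['price.original'].str.replace(r'\d\xa0', '', regex=True).str.replace(',', '.', regex=False)
df['price.original'] = df['price.original'].str.replace('€', '', regex=False)
df['price.original']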
df['price.original']

sku
NU511A019-Q11     84.9
NU511A049-O12     69.9
NU511A03P-Q11     79.9
LA251H021-E11    119.9
GU151M00P-F11    289.9
                 ...  
IC621U00N-I11    189.9
5DE41D02K-A11     44.9
M9121C40L-Q11     39.9
1US21C06A-K11     99.9
4DR21U03E-A11    109.9
Name: price.original, Length: 74928, dtype: object
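
The prices above have lost their last digit (`84,99 €` became `84.9`) because the pattern `\d\xa0` removes the digit in front of the non-breaking space. A minimal sketch of a conversion that keeps both decimals follows. It assumes every price uses a comma as the decimal separator, as in the sample response. `parse_price` is a hypothetical helper.

```python
import re

def parse_price(text):
    """Convert a French-formatted price string to a float (hypothetical helper)."""
    # Keep only digits and the decimal comma; this drops the euro sign and any spaces
    cleaned = re.sub(r'[^\d,]', '', text)
    # Assumption: the comma is the decimal separator
    return float(cleaned.replace(',', '.'))

print(parse_price('84,99\xa0€'))
```

On the raw, unconverted column this could be applied with `df['price.original'].map(parse_price)`, which would also make the later `astype(float)` step unnecessary.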

In [16]:
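# Change the format of the promotional price (same approach as for the original price).
# Caution: the pattern r'\d\xa0' also removes the digit just before the non-breaking space.
df['price.promotional'] = df['price.promotional'].str.replace(r'\d\xa0', '', regex=True).str.replace(',', '.', regex=False)
df['price.promotional'] = df['price.promotional'].str.replace('€', '', regex=False)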
df['price.promotional']

sku
NU511A019-Q11    59.9
NU511A049-O12    59.9
NU511A03P-Q11    49.9
LA251H021-E11    89.9
GU151M00P-F11    86.9
                 ... 
IC621U00N-I11    84.1
5DE41D02K-A11    20.4
M9121C40L-Q11    19.9
1US21C06A-K11    49.9
4DR21U03E-A11    49.9
Name: price.promotional, Length: 74928, dtype: object

In [13]:
df['price.original'] = df['price.original'].astype(float)

In [17]:
df['price.promotional'] = df['price.promotional'].astype(float)

In [18]:
# Create a new column with the discount
df['discount'] = df['price.original'].astype(float) - df['price.promotional'].astype(float)
df['discount']

sku
NU511A019-Q11     25.0
NU511A049-O12     10.0
NU511A03P-Q11     30.0
LA251H021-E11     30.0
GU151M00P-F11    203.0
                 ...  
IC621U00N-I11    105.8
5DE41D02K-A11     24.5
M9121C40L-Q11     20.0
1US21C06A-K11     50.0
4DR21U03E-A11     60.0
Name: discount, Length: 74928, dtype: float64

In [19]:
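# The 10 brands with the highest total discount (grouped by brand, not by product)
# numeric_only=True restricts the sum to numeric and boolean columns
total_discount = df.groupby('brand_name').sum(numeric_only=True).nlargest(10, 'discount')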
total_discount

brand_name,is_premium,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,discount
myMo,0.0,156425.3,77487.0,963.0,0.0,963.0,0.0,78938.3
usha,0.0,170391.3,93468.7,1009.0,0.0,1009.0,0.0,76922.6
DreiMaster,0.0,141977.0,74444.7,952.0,0.0,952.0,0.0,67532.3
Guess,0.0,91916.9,51861.6,73.0,1.0,72.0,0.0,40055.3
Schmuddelwedda,0.0,79067.6,46541.0,388.0,0.0,388.0,0.0,32526.6
Silvio Tossi,0.0,45672.0,13355.1,0.0,0.0,0.0,0.0,32316.9
HUGO,232.0,42201.3,20870.8,59.0,0.0,59.0,0.0,21330.5
IVY & OAK,0.0,49154.2,28175.8,76.0,2.0,74.0,0.0,20978.4
faina,0.0,81334.2,61212.1,135.0,0.0,135.0,0.0,20122.1
MM6 Maison Margiela,103.0,37715.0,18468.3,0.0,0.0,0.0,0.0,19246.7
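
The table above is aggregated by brand, while the bonus task asks for the product(s) with the highest discount. One way to answer per product is sketched below in plain Python. It assumes "highest discount" means the largest absolute price difference, and it uses three articles from the first page with their full prices. `top_discounted` is a hypothetical helper.

```python
def top_discounted(products):
    """Return every product whose absolute discount equals the maximum."""
    best = max(p['original'] - p['promotional'] for p in products)
    # Keep all ties, since several products may share the highest discount
    return [p for p in products if p['original'] - p['promotional'] == best]

# Three articles taken from the first page of results
sample = [
    {'sku': 'NU511A019-Q11', 'original': 84.99, 'promotional': 59.95},
    {'sku': 'LA251H021-E11', 'original': 119.95, 'promotional': 89.95},
    {'sku': 'GU151M00P-F11', 'original': 289.95, 'promotional': 86.95},
]
winners = top_discounted(sample)
print([p['sku'] for p in winners])
```

With the DataFrame from this notebook, the equivalent filter would be `df[df['discount'] == df['discount'].max()]`.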


In [20]:
# Ratio of the summed promotional prices to the summed original prices.
# A value below 1 means that, overall, the goods sell below their original price.

sum_discounts = df['price.promotional'].sum() / df['price.original'].sum()
sum_discounts

0.6471238292709032
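
The value above is the share of the original price that customers still pay, so the overall discount is its complement. A short sketch of that arithmetic follows; `overall_discount` is a hypothetical helper, and the ratio is copied from the output above.

```python
def overall_discount(originals, promotionals):
    """Share of the total original price that is taken off across all goods."""
    return 1 - sum(promotionals) / sum(originals)

# Ratio printed by the cell above (promotional total / original total)
ratio = 0.6471238292709032
print(f'{1 - ratio:.1%}')  # prints 35.3%
```

In other words, goods in this category sell for about 64.7% of their original price overall, an overall discount of about 35.3%.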