# Challenge: Promotions

In this challenge, you'll develop codes to parse and analyze data returned from another API on Zalando such as [Promos homme (Men's Promotions)
](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson but you'll work with different data.

## Obtaining the link

Wrote your codes in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page number offset of the API URL to test if it's working.

In [None]:
# your code here
url = 'https://www.zalando.fr/api/catalog/articles'

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make request to obtain data of the 1st page. Flatten the data and store it in an empty object variable.

1. Find out the total page count in the 1st page data.

1. Use a FOR loop to make requests for the additional pages from 2 to page count. Append the data of each additional page to the flatterned data object.

1. Print and review the data you obtained.

In [2]:
# your code here

#Import libraries

import json
import requests
import pandas as pd

In [3]:
# Define a function to get the information for each page

def get_page_info(num):
    offset = (num-1)*84
    page_url = 'https://www.zalando.fr/api/catalog/articles'
    params = {'categories':'promo-femme', 'limit':'84', 'offset':offset, 'sort':'popularity'}
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'}
    response = requests.get(page_url, headers=headers, params=params)
    return response.json()

In [4]:
# Get the information for the first page
response = get_page_info(1)

# Save it in a DataFrame
zal = pd.json_normalize(response['articles'])

In [5]:
# Get the total number of pages
total_pages = response['pagination']['page_count']

# Loop over all the pages to find all the information
# Page 1 is already in the goods dataframe so it's skipped
for page in range(1,total_pages+1):
    response = get_page_info(page)
    zal_aux = pd.json_normalize(response['articles'])
    zal = pd.concat([zal, zal_aux])

In [6]:
zal.head()

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,...,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,tracking_information.metrigo_impression_urls,tracking_information.impression_beacon,tracking_information.source,amount,price.base_price,outfits,condition,condition_key
0,CAO21B07H-A11,JUPE TAILLE HAUTE CEINTURÉE - Jupe trapèze - c...,"[34, 36, 38, 40, 44, 46]",camaieu-jupe-taille-haute-ceinturee-jupe-trape...,[{'path': 'CA/O2/1B/07/HA/11/CAO21B07H-A11@8.1...,Camaïeu,False,[],"[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,...,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp,,,,,
1,CAO21U02C-Q11,Manteau classique - black,"[36, 40, 42, 44, 46]",camaieu-manteau-classique-black-cao21u02c-q11,[{'path': 'CA/O2/1U/02/CQ/11/CAO21U02C-Q11@12....,Camaïeu,False,[],"[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,...,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp,,,,,
2,CAO21I08X-G11,GILET MAILLE FILÉE - Gilet - rouille,"[S, M, L, XL]",camaieu-gilet-maille-filee-gilet-rouille-cao21...,[{'path': 'CA/O2/1I/08/XG/11/CAO21I08X-G11@15....,Camaïeu,False,[],"[{'key': 'discountRate', 'value': '-40%', 'tra...",clothing,...,False,False,[https://ccp-et.adtechlab.zalan.do/event/sbv?z...,https://ccp-et.adtechlab.zalan.do/event/sbv?z=...,ccp,,,,,
3,LA251H01P-Q11,NF1888PO_141 - Cabas - noir,[One Size],lacoste-cabas-black-la251h01p-q11,[{'path': 'LA/25/1H/01/PQ/11/LA251H01P-Q11@17....,Lacoste,False,[],"[{'key': 'discountRate', 'value': '-10%', 'tra...",accessoires,...,False,False,,,,,,,,
4,AD111A0YA-A11,TRAINER - Baskets basses - footwear white/glo...,"[36, 42, 36 2/3, 37 1/3, 42 2/3, 43 1/3]",adidas-originals-trainer-baskets-basses-ad111a...,[{'path': 'AD/11/1A/0Y/AA/11/AD111A0YA-A11@12....,adidas Originals,False,[],"[{'key': 'discountRate', 'value': '-70%', 'tra...",shoe,...,False,False,,,,,,,,


## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).

In [7]:
# your code here

# Trending brand = the brand with more products
zal.groupby('brand_name').count().nlargest(1,'name')

Unnamed: 0_level_0,sku,name,sizes,url_key,media,is_premium,family_articles,flags,product_group,delivery_promises,...,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,tracking_information.metrigo_impression_urls,tracking_information.impression_beacon,tracking_information.source,amount,price.base_price,outfits,condition,condition_key
brand_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
usha,1306,1306,1306,1306,1306,1306,1306,1306,1306,1306,...,1306,1306,0,0,0,0,0,1,0,0


In [8]:
# Convert the original price to float type
zal['price.original'] = zal['price.original'].replace('\u20AC','',regex=True)
zal['price.original'] = zal['price.original'].str.replace(',', '.').replace('\xa0','').astype(float)

0     25.99
1     89.99
2     25.99
3     94.95
4     99.95
      ...  
79    49.95
80    69.95
81    23.99
82    89.95
83    59.95
Name: price.original, Length: 75012, dtype: float64

In [40]:
# Convert the promotional price to float type
zal['price.promotional'] = zal['price.promotional'].replace('\u20AC','',regex=True)
zal['price.promotional'] = zal['price.promotional'].str.replace(',', '.').replace('\xa0','')

zal['price.promotional'].loc[zal['price.promotional'].str.len() >7] = zal['price.promotional'].loc[zal['price.promotional'].str.len() >7].str.replace('\xa0','')
zal['price.promotional'] = zal['price.promotional'].astype(float)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)


In [41]:
# Create a column with the difference of price
zal['PriceReduction'] = zal['price.original'] - zal['price.promotional'] 

# Find the product(s)with the highest discount:
zal.nlargest(5, 'PriceReduction')

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,...,price.has_discount_on_selected_sizes_only,tracking_information.metrigo_impression_urls,tracking_information.impression_beacon,tracking_information.source,amount,price.base_price,outfits,condition,condition_key,PriceReduction
7,HL421C03E-F11,FRINGE GOWN - Robe de cocktail - gold/combo,"[34, 36, 38]",herve-leger-fringe-gown-robe-de-cocktail-goldc...,[{'path': 'HL/42/1C/03/EF/11/HL421C03E-F11@4.j...,Hervé Léger,True,"[{'sku': 'HL421C03E-F11', 'url_key': 'herve-le...","[{'key': 'discountRate', 'value': '-70%', 'tra...",clothing,...,False,,,,,,,,,1659.0
2,HL421C03O-F11,STRAPS GOWN - Robe de cocktail - rosegold,"[34, 36, 38]",herve-leger-straps-gown-robe-de-cocktail-roseg...,[{'path': 'HL/42/1C/03/OF/11/HL421C03O-F11@9.j...,Hervé Léger,True,"[{'sku': 'HL421C03O-F11', 'url_key': 'herve-le...","[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,...,False,,,,,,,,,1269.0
36,HL421C03U-F11,Robe de soirée - rose gold,"[34, 36, 38]",herve-leger-robe-de-cocktail-rose-gold-hl421c0...,[{'path': 'HL/42/1C/03/UF/11/HL421C03U-F11@14....,Hervé Léger,True,"[{'sku': 'HL421C03U-F11', 'url_key': 'herve-le...","[{'key': 'discountRate', 'value': '-70%', 'tra...",clothing,...,False,,,,,,,,,1116.0
82,HL421C043-F11,FRINGE GOWN - Robe de soirée - rose gold,"[34, 36, 38]",herve-leger-fringe-gown-robe-de-soiree-rose-go...,[{'path': 'HL/42/1C/04/3F/11/HL421C043-F11@13....,Hervé Léger,True,"[{'sku': 'HL421C043-F11', 'url_key': 'herve-le...","[{'key': 'discountRate', 'value': '-70%', 'tra...",clothing,...,False,,,,,,,,,1064.0
1,HL421C043-F11,FRINGE GOWN - Robe de soirée - rose gold,"[34, 36, 38]",herve-leger-fringe-gown-robe-de-soiree-rose-go...,[{'path': 'HL/42/1C/04/3F/11/HL421C043-F11@13....,Hervé Léger,True,"[{'sku': 'HL421C043-F11', 'url_key': 'herve-le...","[{'key': 'discountRate', 'value': '-70%', 'tra...",clothing,...,False,,,,,,,,,1064.0


In [44]:
# sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices)
zal['price.promotional'].sum() / zal['price.original'].sum()


0.6550829511559152