# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned from another Zalando API endpoint, such as [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cells below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the `offset` parameter in the API URL to check that pagination works.
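
The last step comes down to editing one query parameter. The following is a minimal sketch, assuming the endpoint pages through results with `limit` and `offset` as the URL in the next cell suggests. `build_catalog_url` is a hypothetical helper introduced here for illustration, not part of any Zalando client.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL used in this notebook.
BASE_URL = "https://www.zalando.fr/api/catalog/articles"


def build_catalog_url(category, page, limit=84, sort="popularity"):
    """Return the URL for a 0-based page number (hypothetical helper)."""
    params = {
        "categories": category,
        "limit": limit,
        "offset": page * limit,  # assumption: offset counts items, not pages
        "sort": sort,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_catalog_url("promo-homme", 0))
print(build_catalog_url("promo-homme", 1))
```

Pasting the two printed URLs into the browser is a quick way to confirm that a different offset returns a different set of articles.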

In [1]:
# your code here

url = 'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset=84&sort=popularity'

In [2]:
# your code here
import json
import requests
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

In [3]:
headers = \
{"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"}
response = requests.get(url, headers=headers)

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make a request to obtain the data of the 1st page. Flatten the data and store it in a new variable.

1. Find out the total page count in the 1st page data.

1. Use a `for` loop to request the additional pages, from page 2 up to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
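
The loop in this workflow can be sketched without touching the network. Note that this sketch swaps in a different stopping rule: instead of reading a page count from the first response, it stops when a page comes back with fewer than `limit` articles. `fetch_page` and `fake_fetch_page` are hypothetical names, and the only response field assumed is the `articles` list used elsewhere in this notebook.

```python
def fetch_all_articles(fetch_page, limit=84, max_pages=100):
    """Collect articles page by page until a short or empty page appears.

    fetch_page is a hypothetical callable: it takes an item offset and
    returns the decoded JSON of one page as a dict with an "articles" list.
    """
    articles = []
    for page in range(max_pages):
        data = fetch_page(page * limit)
        batch = data.get("articles", [])
        articles.extend(batch)
        if len(batch) < limit:  # a short page means there is nothing left
            break
    return articles


# Offline stand-in for the API: 200 fake articles served in slices.
FAKE_CATALOG = [{"sku": f"SKU-{n}"} for n in range(200)]


def fake_fetch_page(offset, limit=84):
    return {"articles": FAKE_CATALOG[offset:offset + limit]}


all_articles = fetch_all_articles(fake_fetch_page)
print(len(all_articles))
```

Against the real endpoint, `fetch_page` could wrap `requests.get(...).json()` with the headers defined earlier. The `max_pages` cap is a safety net so that a misbehaving endpoint cannot keep the loop running indefinitely.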

In [4]:
results = response.json()

In [5]:
flattened_data = json_normalize(results)
first_page_data = json_normalize(flattened_data.articles[0])
first_page_data.count()
# This page contains 84 SKUs, the maximum number of items per page (limit=84)

sku                                          84
name                                         84
sizes                                        84
url_key                                      84
media                                        84
brand_name                                   84
is_premium                                   84
family_articles                              84
flags                                        84
product_group                                84
delivery_promises                            84
price.original                               84
price.promotional                            84
price.has_different_prices                   84
price.has_different_original_prices          84
price.has_different_promotional_prices       84
price.has_discount_on_selected_sizes_only    84
outfits                                       2
dtype: int64

In [6]:
pages = []
for i in [0, 1]:
    k = 84 * i  # item offset of page i
    url_2 = f'https://www.zalando.fr/api/catalog/articles?categories=promo-homme&limit=84&offset={k}&sort=popularity'
    response_2 = requests.get(url_2, headers=headers)
    results_2 = response_2.json()
    flattened_data_2 = json_normalize(results_2)
    pages.append(json_normalize(flattened_data_2.articles[0]))

# DataFrame.append was removed in pandas 2.0; concatenate the pages instead
df = pd.concat(pages)

df.head()

| | sku | name | sizes | url_key | media | brand_name | is_premium | family_articles | flags | product_group | delivery_promises | price.original | price.promotional | price.has_different_prices | price.has_different_original_prices | price.has_different_promotional_prices | price.has_discount_on_selected_sizes_only | outfits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R0622E00U-K11 | DAVE PANTS - Pantalon cargo - navy | [XS, S, M, XL, XXL] | redefined-rebel-pantalon-cargo-navy-r0622e00u-k11 | [{'path': 'R0/62/2E/00/UK/11/R0622E00U-K11@4.j... | Redefined Rebel | False | [] | [{'key': 'campaign', 'value': '-20% EXTRA', 't... | clothing | [] | 29,95 € | 26,95 € | False | False | False | False | |
| 1 | TOB22O01B-Q11 | ORIGINAL SLIM FIT - T-shirt à manches longues ... | [XS, S, M, L, XL, XXL] | tommy-jeans-original-t-shirt-a-manches-longues... | [{'path': 'TO/B2/2O/01/BQ/11/TOB22O01B-Q11@8.j... | Tommy Jeans | False | [] | [{'key': 'campaign', 'value': '-20% EXTRA', 't... | clothing | [] | 34,95 € | 27,95 € | False | False | False | False | |
| 2 | TOB22O04X-Q11 | BADGE TEE - T-shirt basique - black | [XS, S, M, L, XL, XXL] | tommy-jeans-badge-tee-t-shirt-basique-tob22o04... | [{'path': 'TO/B2/2O/04/XQ/11/TOB22O04X-Q11@9.j... | Tommy Jeans | False | [] | [{'key': 'campaign', 'value': '-20% EXTRA', 't... | clothing | [] | 39,95 € | 19,95 € | False | False | False | False | |
| 3 | NI122S0AT-Q11 | CLUB - Sweatshirt - black/white | [XS, M, L, XL, XXL] | nike-sportswear-club-sweatshirt-blackwhite-ni1... | [{'path': 'NI/12/2S/0A/TQ/11/NI122S0AT-Q11@12.... | Nike Sportswear | False | [] | [] | clothing | [] | 39,90 € | 38,90 € | True | False | True | False | |
| 4 | VA215B000-Q12 | OLD SKOOL - Chaussures de skate - black | [34.5, 35, 36, 36.5, 37, 38, 38.5, 39, 40, 40.... | vans-old-skool-baskets-basses-va215b000-q12 | [{'path': 'VA/21/5B/00/0Q/12/VA215B000-Q12@12.... | Vans | False | [] | [{'key': 'campaign', 'value': '-20% EXTRA', 't... | shoe | [] | 74,95 € | 67,45 € | False | False | False | False | [{'id': 'nmnnhTu7SCu', 'url_key': '/outfits/nm... |


## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The sum of discounts of all goods (sum_discounted_prices divided by sum_original_prices).
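
The three bonus questions can be worked through on a tiny sample before running them on the full dataset. This is a minimal sketch using only the standard library. The three rows are copied from the sample output above, but the flat `original` and `promotional` keys and the `parse_price` helper are simplifications introduced here. The overall discount is expressed as total savings over total original price, in percent.

```python
from collections import Counter


def parse_price(text):
    """Convert a price string such as '29,95 €' to a float."""
    return float(text.replace("€", "").replace(",", ".").strip())


# Three rows copied from the sample output; the keys are simplified.
articles = [
    {"sku": "R0622E00U-K11", "brand_name": "Redefined Rebel",
     "original": "29,95 €", "promotional": "26,95 €"},
    {"sku": "TOB22O01B-Q11", "brand_name": "Tommy Jeans",
     "original": "34,95 €", "promotional": "27,95 €"},
    {"sku": "TOB22O04X-Q11", "brand_name": "Tommy Jeans",
     "original": "39,95 €", "promotional": "19,95 €"},
]

# Trending brand: the brand with the most articles on promotion.
top_brand, top_count = Counter(a["brand_name"] for a in articles).most_common(1)[0]

# Per-article discount as a percentage of the original price.
for a in articles:
    original = parse_price(a["original"])
    promotional = parse_price(a["promotional"])
    a["discount"] = (original - promotional) / original * 100

best = max(articles, key=lambda a: a["discount"])

# Overall discount: total savings divided by total original price.
sum_original = sum(parse_price(a["original"]) for a in articles)
sum_promotional = sum(parse_price(a["promotional"]) for a in articles)
overall_discount = (sum_original - sum_promotional) / sum_original * 100

print(top_brand, top_count)
print(best["sku"], round(best["discount"], 2))
print(round(overall_discount, 2))
```

Note that `max` returns only one article, so ties for the highest discount need a separate check, for example by comparing every article's discount against `best["discount"]`.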

In [57]:
# your code here
# The most trending brand: the one with the most SKUs on promotion
df.groupby('brand_name').count().sku.nlargest(1)

brand_name
Nike Sportswear    42
Name: sku, dtype: int64

In [13]:
df[['price.original','price.promotional']].dtypes

price.original       object
price.promotional    object
dtype: object

In [34]:
# Strip the euro sign and use a decimal point before converting to float
df['price.original'] = df['price.original'].apply(lambda x: x.replace('€','').replace(',','.'))\
.astype(float)

In [None]:
df['price.promotional'] = df['price.promotional'].apply(lambda x: x.replace(',','.'))
df['price.promotional'] = df['price.promotional'].apply(lambda x: x.replace('€',''))\
                            .astype(float)
df['price.promotional']

In [39]:
# Discount as a percentage of the original price
df['discount'] = (df['price.original'] - df['price.promotional'])\
                    / df['price.original'] * 100

In [71]:
# These are the 4 SKUs with the highest discount. The top two share
# the same maximum discount
df[['sku','discount']].nlargest(4,'discount')

| | sku | discount |
|---|---|---|
| 53 | NI112O093-H11 | 50.420168 |
| 24 | NI112O08W-G11 | 50.420168 |
| 46 | A0F22O03Y-A11 | 50.108932 |
| 63 | EL922F00L-Q11 | 50.089445 |


In [73]:
# Overall discount for the whole DataFrame: total savings over total original price, in percent
avg_discount = (df['price.original'].sum() - df['price.promotional'].sum())\
                    / df['price.original'].sum() * 100

avg_discount

31.738587014987075