# Parsing APIs Example

## Intro

Now we will take a look at real data. When you parse data from the web, you will often encounter API-based web pages.

For example, [zalando.fr](https://www.zalando.fr/accueil-homme/) is an API-based web page.

In this guided lab you will learn how to obtain the links from web pages and extract the data. Read through this document, execute the cells in order, and make sure you understand the explanations.

*Note: This guided lab uses Google Chrome. Other browsers like Safari and Firefox have similar tools for developers but they work differently. To save your time in following this lab, it is strongly recommended that you install and use Google Chrome.*

## Obtaining the link

Zalando is an online store where you can buy clothes and accessories at a discount. When we open the web page, we can choose between different sections. The general process is shown first, using the [Children section](https://www.zalando.fr/accueil-enfant/) as an example.

Here we will parse data about promotions only. Therefore, the final output will be a DataFrame with all the goods on discount.

[![Image from Gyazo](https://i.gyazo.com/fa4874d8e81c7570273bbfb853d66308.png)](https://gyazo.com/fa4874d8e81c7570273bbfb853d66308)


We go to the Promos page. Right-clicking on the page shows a list of possible actions, from which we select Inspect.

<img src='https://i.gyazo.com/bccbd11d69c9040dc98758d443e32052.png' width="400">


The developer tools panel opens on the right side or at the bottom of the window. There, click on Network:


[![Image from Gyazo](https://i.gyazo.com/f7e0db81cbfee67694183d1a7640bf81.png)](https://gyazo.com/f7e0db81cbfee67694183d1a7640bf81)

Right after that, the developer tools panel changes to show the files behind the page. To see only the useful files, we select the following settings:
1. Preserve Log
2. Select XHR files.

[![Image from Gyazo](https://i.gyazo.com/9a899d4441d9d93e795f79747f1e47d5.png)](https://gyazo.com/9a899d4441d9d93e795f79747f1e47d5)

To make some files appear, we need to scroll down and go forward to the second page.

[![Image from Gyazo](https://i.gyazo.com/0956eb3d5125075a236c9a439c7749c7.png)](https://gyazo.com/0956eb3d5125075a236c9a439c7749c7)

In the Network panel you can see the following files being loaded. All the data on the web page is loaded from a JSON file, which is one of the files listed. It is important to understand which file contains what kind of information.

<a href="https://gyazo.com/cf97a655869f0b22df0ada1cb2a41c3c"><img src="https://i.gyazo.com/cf97a655869f0b22df0ada1cb2a41c3c.png" alt="Image from Gyazo" width="724.8"/></a>

Once you have found a file that seems to hold the information you need, test it. Here we need the article... file:

<a href="https://gyazo.com/78b35bf492994b3f35c0564a21da202a"><img src="https://i.gyazo.com/78b35bf492994b3f35c0564a21da202a.png" alt="Image from Gyazo" width="727.2"/></a>

When we test the link in Chrome incognito mode, we obtain the proper JSON file:


<a href="https://gyazo.com/b60453fa98454fa29771c731a5174443"><img src="https://i.gyazo.com/b60453fa98454fa29771c731a5174443.png" alt="Image from Gyazo" width="1530.4"/></a>

To move through the objects in the JSON file (a kind of pagination), change the offset (the index of the first element on the page). In fact, if you take a look at the link, its structure is easy to understand.
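
The offset logic can be sketched in a few lines. This is only an illustration, assuming the endpoint and the query parameters (`categories`, `limit`, `offset`, `sort`) keep behaving as in the URL captured in this lab; `build_page_url` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlencode

# Endpoint as captured in the Network panel in this lab; it may change over time.
BASE_URL = "https://www.zalando.fr/api/catalog/articles"


def build_page_url(page, per_page=84, category="promo-enfant", sort="popularity"):
    """Return the URL for a 1-based page number (hypothetical helper)."""
    params = {
        "categories": category,
        "limit": per_page,
        # offset is the index of the first element on the requested page
        "offset": (page - 1) * per_page,
        "sort": sort,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_page_url(1))
print(build_page_url(2))
```

With these defaults, page 2 gives `offset=84`, which matches the link used later in this lab.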

## Reading the data

Now the fun begins! Once we know how to obtain the data, collecting the whole database from the web page is not a problem.
In this lab you will collect your own database of Zalando products. You select which goods you want to track. You can define as many filters on your data as you want. Just make sure that the data reflects the filters.




In [173]:
import json
import requests
import pandas as pd
from pandas import json_normalize


In [174]:
# Paste the url you obtained for your data
#url='https://www.zalando.fr/api/catalog/articles?categories=promo-enfant&limit=84&offset=84&sort=sale'

# in my case the link came sorted by popularity
url = "https://www.zalando.fr/api/catalog/articles?categories=promo-enfant&limit=84&offset=84&sort=popularity"

#### Collect the first 84 objects of the data (1st page)

Your output should be a Pandas DataFrame of goods. Each row should contain only text or numbers, with the exception of *family_articles, flags, media* and *sizes*, which remain lists. Hint: use the headers parameter to get the data!

In [175]:
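# headers definition (taken from the notes in the Api exchange notebook)

headers = {"User-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"}

# same approach we used for the exchange-rates dictionary in the Api exchange notebook
response = requests.get(url, headers = headers)
# response.json() parses the body directly; assigning json.loads(...) to the name `json`
# overwrites the json module and makes this cell fail when it is re-run.
# The variable keeps the name `json` because the following cells use it.
json = response.json()
print(json)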

{'total_count': 27815, 'pagination': {'page_count': 332, 'current_page': 2, 'per_page': 84}, 'sort': 'popularity', 'articles': [{'sku': 'N1243K08X-Q11', 'name': 'BLOCK TAPING TRICOT BABY SET - Survêtement - black', 'price': {'original': '35,95\xa0€', 'promotional': '28,85\xa0€', 'has_different_prices': False, 'has_different_original_prices': False, 'has_different_promotional_prices': False, 'has_discount_on_selected_sizes_only': False}, 'sizes': ['12m', '18m'], 'url_key': 'nike-performance-block-taping-tricot-set-survetement-n1243k08x-q11', 'media': [{'path': 'N1/24/3K/08/XQ/11/N1243K08X-Q11@7.jpg', 'role': 'DEFAULT', 'packet_shot': False}], 'brand_name': 'Nike Sportswear', 'is_premium': False, 'family_articles': [{'sku': 'N1243K08X-Q11', 'url_key': 'nike-performance-block-taping-tricot-set-survetement-n1243k08x-q11', 'media': [{'path': 'N1/24/3K/08/XQ/11/N1243K08X-Q11@7.jpg', 'role': 'FAMILY', 'packet_shot': False}], 'name': 'BLOCK TAPING TRICOT BABY SET - Survêtement - black', 'sizes

In [176]:
# The JSON contains a nested entry called articles, which is the part we want to keep
json.keys()

dict_keys(['total_count', 'pagination', 'sort', 'articles', 'query_path', 'previous_page_path', 'next_page_path', 'query_params', 'page_gender', 'premium', 'filters', 'total_article_count', 'plusStatus', 'categoryTree', 'sortingKeys', 'breadcrumbs', 'querySemantics', 'articlesToShow', 'octopusTests', 'locale', 'isLoggedIn', 'notification', 'resetFilters', 'selectedFilters', 'ssrArticleCount', 'feedbackId', 'variants', 'contentPositions', 'prideEnabled', 'lazyArticleImages', 'hideSearchTerm', 'iconPaths', 'teaser', 'inCatalogTeaser', 'entryPointTeasers', 'upperInCatTeaser', 'collection', 'carouselTeaser', 'wishlist', 'pills', 'sizeOnboardingDialog'])

In [177]:
# Let's look at the values stored under the articles key
articulos = json.get("articles") 
print(articulos)

[{'sku': 'N1243K08X-Q11', 'name': 'BLOCK TAPING TRICOT BABY SET - Survêtement - black', 'price': {'original': '35,95\xa0€', 'promotional': '28,85\xa0€', 'has_different_prices': False, 'has_different_original_prices': False, 'has_different_promotional_prices': False, 'has_discount_on_selected_sizes_only': False}, 'sizes': ['12m', '18m'], 'url_key': 'nike-performance-block-taping-tricot-set-survetement-n1243k08x-q11', 'media': [{'path': 'N1/24/3K/08/XQ/11/N1243K08X-Q11@7.jpg', 'role': 'DEFAULT', 'packet_shot': False}], 'brand_name': 'Nike Sportswear', 'is_premium': False, 'family_articles': [{'sku': 'N1243K08X-Q11', 'url_key': 'nike-performance-block-taping-tricot-set-survetement-n1243k08x-q11', 'media': [{'path': 'N1/24/3K/08/XQ/11/N1243K08X-Q11@7.jpg', 'role': 'FAMILY', 'packet_shot': False}], 'name': 'BLOCK TAPING TRICOT BABY SET - Survêtement - black', 'sizes': ['12m', '18m'], 'price': {'original': '35,95\xa0€', 'promotional': '28,85\xa0€', 'has_different_prices': False, 'has_differe

In [178]:
import pandas as pd

In [179]:
# Since we already selected the articles entry above, there is no need to flatten at the articles level
pd.DataFrame(articulos)

Unnamed: 0,sku,name,price,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises
0,N1243K08X-Q11,BLOCK TAPING TRICOT BABY SET - Survêtement - b...,"{'original': '35,95 €', 'promotional': '28,85 ...","[12m, 18m]",nike-performance-block-taping-tricot-set-surve...,[{'path': 'N1/24/3K/08/XQ/11/N1243K08X-Q11@7.j...,Nike Sportswear,False,"[{'sku': 'N1243K08X-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[]
1,NI114D0EZ-A11,AIR MAX 2090 - Baskets basses - white/volt/val...,"{'original': '69,95 €', 'promotional': '45,45 ...","[19.5, 21, 22, 23.5, 25, 26, 27]",nike-sportswear-air-max-baskets-basses-whitebl...,[{'path': 'NI/11/4D/0E/ZA/11/NI114D0EZ-A11@2.1...,Nike Sportswear,False,"[{'sku': 'NI114D0EZ-A11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-35%', 'tra...",shoe,[]
2,HG223F0C3-J11,Robe de soirée - lila,"{'original': '34,95 €', 'promotional': '13,95 ...","[3-4a, 5-6a, 7-8a, 9-10a, 11-12a, 13-14a, 15-16a]",happy-girls-robe-de-soiree-lila-hg223f0c3-j11,[{'path': 'HG/22/3F/0C/3J/11/HG223F0C3-J11@6.j...,happy girls,False,"[{'sku': 'HG223F0C3-J11', 'url_key': 'happy-gi...","[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,[]
3,K2024G05C-K11,Polo - blueus,"{'original': '28,95 €', 'promotional': '16,05 ...","[12a, 14a]",kaporal-polo-blueus-k2024g05c-k11,[{'path': 'K2/02/4G/05/CK/11/K2024G05C-K11@9.j...,Kaporal,False,"[{'sku': 'K2024G05C-K11', 'url_key': 'kaporal-...","[{'key': 'discountRate', 'value': '-45%', 'tra...",clothing,[]
4,NI123F00J-Q11,AIR DRESS - Robe en jersey - black/white,"{'original': '44,95 €', 'promotional': '13,45 ...","[8-9a, 10-11a, 12-13a, 14a]",nike-sportswear-air-dress-robe-en-jersey-black...,[{'path': 'NI/12/3F/00/JQ/11/NI123F00J-Q11@6.j...,Nike Sportswear,False,"[{'sku': 'NI123F00J-Q11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-70%', 'tra...",clothing,[]
...,...,...,...,...,...,...,...,...,...,...,...,...
79,LA224G052-C11,LACOSTE - TEE-SHIRT ENFANT-TJ4877 - T-shirt im...,"{'original': '39,00 €', 'promotional': '27,00 ...","[2a, 3y, 4a, 5a, 6a, 8a, 10a, 12a, 14a, 16a]",lacoste-tj4877-00-502-t-shirt-imprime-la224g05...,[{'path': 'LA/22/4G/05/2C/11/LA224G052-C11@13....,Lacoste,False,"[{'sku': 'LA224G052-C11', 'url_key': 'lacoste-...","[{'key': 'discountRate', 'value': '-31%', 'tra...",clothing,[]
80,F5713A04X-K11,Babies - dark blue,"{'original': '29,95 €', 'promotional': '23,95 ...","[24, 25, 26, 27, 28, 29, 30, 31, 32]",friboo-babies-dark-blue-f5713a04x-k11,[{'path': 'F5/71/3A/04/XK/11/F5713A04X-K11@8.j...,Friboo,False,"[{'sku': 'F5713A04X-K11', 'url_key': 'friboo-b...","[{'key': 'discountRate', 'value': '-20%', 'tra...",shoe,[]
81,B5813A00R-O11,RAIN - Babies - cognac,"{'original': '69,95 €', 'promotional': '27,95 ...","[27, 30, 31, 32, 33, 34, 35, 36]",bisgaard-rain-babies-b5813a00r-o11,[{'path': 'B5/81/3A/00/RO/11/B5813A00R-O11@9.j...,Bisgaard,False,"[{'sku': 'B5813A00R-O11', 'url_key': 'bisgaard...","[{'key': 'discountRate', 'value': '-60%', 'tra...",shoe,[]
82,NA824C09X-K11,NKMSOFUS LONG - Short en jean - light blue denim,"{'original': '24,99 €', 'promotional': '17,49 ...","[6a, 7a, 8a, 9a, 10a, 11a, 14a]",name-it-nkmsofus-long-short-en-jean-light-blue...,[{'path': 'NA/82/4C/09/XK/11/NA824C09X-K11@9.j...,Name it,False,"[{'sku': 'NA824C09X-K11', 'url_key': 'name-it-...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",clothing,[]


In [180]:
# But we could still do it for the dictionaries nested inside articulos

In [181]:
from pandas import json_normalize

In [182]:
# from what I saw in the notes, the dictionary has to be flattened to un-nest it:

flattened_articulos = pd.json_normalize(articulos)
flattened_articulos



Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only
0,N1243K08X-Q11,BLOCK TAPING TRICOT BABY SET - Survêtement - b...,"[12m, 18m]",nike-performance-block-taping-tricot-set-surve...,[{'path': 'N1/24/3K/08/XQ/11/N1243K08X-Q11@7.j...,Nike Sportswear,False,"[{'sku': 'N1243K08X-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],"35,95 €","28,85 €",False,False,False,False
1,NI114D0EZ-A11,AIR MAX 2090 - Baskets basses - white/volt/val...,"[19.5, 21, 22, 23.5, 25, 26, 27]",nike-sportswear-air-max-baskets-basses-whitebl...,[{'path': 'NI/11/4D/0E/ZA/11/NI114D0EZ-A11@2.1...,Nike Sportswear,False,"[{'sku': 'NI114D0EZ-A11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-35%', 'tra...",shoe,[],"69,95 €","45,45 €",False,False,False,False
2,HG223F0C3-J11,Robe de soirée - lila,"[3-4a, 5-6a, 7-8a, 9-10a, 11-12a, 13-14a, 15-16a]",happy-girls-robe-de-soiree-lila-hg223f0c3-j11,[{'path': 'HG/22/3F/0C/3J/11/HG223F0C3-J11@6.j...,happy girls,False,"[{'sku': 'HG223F0C3-J11', 'url_key': 'happy-gi...","[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,[],"34,95 €","13,95 €",False,False,False,False
3,K2024G05C-K11,Polo - blueus,"[12a, 14a]",kaporal-polo-blueus-k2024g05c-k11,[{'path': 'K2/02/4G/05/CK/11/K2024G05C-K11@9.j...,Kaporal,False,"[{'sku': 'K2024G05C-K11', 'url_key': 'kaporal-...","[{'key': 'discountRate', 'value': '-45%', 'tra...",clothing,[],"28,95 €","16,05 €",False,False,False,False
4,NI123F00J-Q11,AIR DRESS - Robe en jersey - black/white,"[8-9a, 10-11a, 12-13a, 14a]",nike-sportswear-air-dress-robe-en-jersey-black...,[{'path': 'NI/12/3F/00/JQ/11/NI123F00J-Q11@6.j...,Nike Sportswear,False,"[{'sku': 'NI123F00J-Q11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-70%', 'tra...",clothing,[],"44,95 €","13,45 €",False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,LA224G052-C11,LACOSTE - TEE-SHIRT ENFANT-TJ4877 - T-shirt im...,"[2a, 3y, 4a, 5a, 6a, 8a, 10a, 12a, 14a, 16a]",lacoste-tj4877-00-502-t-shirt-imprime-la224g05...,[{'path': 'LA/22/4G/05/2C/11/LA224G052-C11@13....,Lacoste,False,"[{'sku': 'LA224G052-C11', 'url_key': 'lacoste-...","[{'key': 'discountRate', 'value': '-31%', 'tra...",clothing,[],"39,00 €","27,00 €",True,False,True,False
80,F5713A04X-K11,Babies - dark blue,"[24, 25, 26, 27, 28, 29, 30, 31, 32]",friboo-babies-dark-blue-f5713a04x-k11,[{'path': 'F5/71/3A/04/XK/11/F5713A04X-K11@8.j...,Friboo,False,"[{'sku': 'F5713A04X-K11', 'url_key': 'friboo-b...","[{'key': 'discountRate', 'value': '-20%', 'tra...",shoe,[],"29,95 €","23,95 €",False,False,False,False
81,B5813A00R-O11,RAIN - Babies - cognac,"[27, 30, 31, 32, 33, 34, 35, 36]",bisgaard-rain-babies-b5813a00r-o11,[{'path': 'B5/81/3A/00/RO/11/B5813A00R-O11@9.j...,Bisgaard,False,"[{'sku': 'B5813A00R-O11', 'url_key': 'bisgaard...","[{'key': 'discountRate', 'value': '-60%', 'tra...",shoe,[],"69,95 €","27,95 €",True,False,True,False
82,NA824C09X-K11,NKMSOFUS LONG - Short en jean - light blue denim,"[6a, 7a, 8a, 9a, 10a, 11a, 14a]",name-it-nkmsofus-long-short-en-jean-light-blue...,[{'path': 'NA/82/4C/09/XK/11/NA824C09X-K11@9.j...,Name it,False,"[{'sku': 'NA824C09X-K11', 'url_key': 'name-it-...","[{'key': 'discountRate', 'value': 'Jusqu’à -30...",clothing,[],"24,99 €","17,49 €",True,False,True,False


In [183]:
flattened_articulos.shape

(84, 17)

In [184]:
# It worked! :) Now I have 5 more columns
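
To make the flattening step less of a black box, here is a rough pure-Python sketch of what happens to the nested `price` dictionary. It is not how pandas implements `json_normalize`; `flatten_record` is a hypothetical helper, and the sample record is a shortened version of the first article in the output above.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into dotted keys; lists are left untouched."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dictionaries, prefixing their keys
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


article = {
    "sku": "N1243K08X-Q11",
    "brand_name": "Nike Sportswear",
    "sizes": ["12m", "18m"],
    "price": {"original": "35,95\xa0€", "promotional": "28,85\xa0€"},
}

flat_article = flatten_record(article)
print(sorted(flat_article))
```

This mirrors what the table shows: the single `price` column is replaced by several `price.*` columns, while list-valued columns such as `sizes` stay as lists.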

#### Collect all the objects from selected filters. Total number of pages can be found in the same json. Use *sku* column as index.

Your output should be a Pandas DataFrame of goods. Each row should contain only text or numbers, with the exception of family_articles, flags, media and sizes, which remain lists.

In [185]:
# I understand this is already solved with the flattened_articulos DataFrame, right?
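
Note that the DataFrame built so far holds a single page (84 rows), while the JSON above reports a `total_count` of 27815 articles over 332 pages. A minimal sketch of collecting every page follows. It assumes each response has the `pagination` and `articles` keys seen above; `collect_articles`, `fetch_page` and `fake_fetch` are hypothetical names, and the fetch function is passed in so the loop can be tried without network access.

```python
import time


def collect_articles(fetch_page, per_page=84, pause=0.0):
    """Collect the 'articles' lists from all pages into one list.

    fetch_page(offset) must return the parsed JSON dict for that offset.
    """
    first = fetch_page(0)
    page_count = first["pagination"]["page_count"]
    articles = list(first["articles"])
    for page in range(1, page_count):
        time.sleep(pause)  # optional pause between requests
        articles.extend(fetch_page(page * per_page)["articles"])
    return articles


# Stand-in for the real request: 3 fake pages with 2 items each.
def fake_fetch(offset, per_page=2):
    return {
        "pagination": {"page_count": 3, "per_page": per_page},
        "articles": [{"sku": f"SKU-{offset + i}"} for i in range(per_page)],
    }


all_articles = collect_articles(fake_fetch, per_page=2)
print(len(all_articles))
```

With the real API, `fetch_page` could wrap `requests.get(...)` with the `headers` defined earlier and return `response.json()`. The collected list could then go through `pd.json_normalize(all_articles).set_index("sku")` to use *sku* as the index.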

#### Display the trending brand in the DataFrame

In [186]:
flattened_articulos.groupby("brand_name").count().sort_values(by = "name", ascending = False)

#The trending brand is Nike Sportswear

Unnamed: 0_level_0,sku,name,sizes,url_key,media,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only
brand_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
Nike Sportswear,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13
Geox,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8
Puma,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
adidas Originals,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
Lacoste,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
Kaporal,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3
Nike Performance,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
Quiksilver,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
Name it,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
Re-Gen,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
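
A shorter way to get the same ranking, assuming "trending" simply means the brand with the most articles on the page, is `value_counts`. The sketch below uses the brand names of the first five rows shown in the tables above as a small stand-in for the full column.

```python
import pandas as pd

# Brand names of the first five rows shown in the tables above
brands = pd.Series(
    ["Nike Sportswear", "Nike Sportswear", "happy girls", "Kaporal", "Nike Sportswear"],
    name="brand_name",
)

brand_counts = brands.value_counts()  # sorted from most to least frequent
top_brand = brand_counts.idxmax()
print(top_brand, brand_counts.max())
```

On the real data the equivalent call would be `flattened_articulos["brand_name"].value_counts()`, which returns one count per brand instead of the same count repeated in every column.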


#### Display the brand with maximal total discount (sum of discounts on all goods)

In [187]:
#Our data is still text. Convert prices into numbers.
flattened_articulos.info()
# your code

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84 entries, 0 to 83
Data columns (total 17 columns):
 #   Column                                     Non-Null Count  Dtype 
---  ------                                     --------------  ----- 
 0   sku                                        84 non-null     object
 1   name                                       84 non-null     object
 2   sizes                                      84 non-null     object
 3   url_key                                    84 non-null     object
 4   media                                      84 non-null     object
 5   brand_name                                 84 non-null     object
 6   is_premium                                 84 non-null     bool  
 7   family_articles                            84 non-null     object
 8   flags                                      84 non-null     object
 9   product_group                              84 non-null     object
 10  delivery_promises                       

In [188]:
#flattened_articulos["price.original"].astype(int)

# This raises an error: the columns must be cleaned first before their content can be converted to integers

In [189]:
flattened_articulos["price.original"] = flattened_articulos["price.original"].str.replace("\xa0€","")
flattened_articulos["price.promotional"] = flattened_articulos["price.promotional"].str.replace("\xa0€","")

flattened_articulos.sample(10)

# I had to use "\xa0€" to get it clean; even so, it still does not convert to integer

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only
39,GE113G03C-J11,KARLY GIRL - Sandales - skin/light rose,"[28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 41]",geox-karly-girl-sandales-skinlight-rose-ge113g...,[{'path': 'GE/11/3G/03/CJ/11/GE113G03C-J11@6.j...,Geox,False,"[{'sku': 'GE113G03C-J11', 'url_key': 'geox-kar...","[{'key': 'discountRate', 'value': '-60%', 'tra...",shoe,[],5995,2395,True,False,True,False
30,F5723K02J-J11,BASIC LONGLINE HOODIE 2 PACK - Sweat à capuche...,"[122/128, 134/140, 146/152]",friboo-basic-longline-hoodie-2-pack-sweatshirt...,[{'path': 'F5/72/3K/02/JJ/11/F5723K02J-J11@8.1...,Friboo,False,"[{'sku': 'F5723K02J-J11', 'url_key': 'friboo-b...","[{'key': 'discountRate', 'value': '-45%', 'tra...",clothing,[],2995,1655,False,False,False,False
47,STF23K032-C11,HOODIE - Sweat à capuche - mid grey melange,"[12a, 14a, 16a]",staccato-hoodie-sweatshirt-mid-grey-melange-st...,[{'path': 'ST/F2/3K/03/2C/11/STF23K032-C11@11....,Staccato,False,"[{'sku': 'STF23K032-C11', 'url_key': 'staccato...","[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,[],2995,1195,False,False,False,False
70,AD116D0R6-A11,SAMOA - Baskets basses - footwear white/core ...,"[32, 33, 34, 35]",adidas-originals-samoa-baskets-basses-footwear...,[{'path': 'AD/11/6D/0R/6A/11/AD116D0R6-A11@10....,adidas Originals,False,"[{'sku': 'AD116D0R6-A11', 'url_key': 'adidas-o...","[{'key': 'discountRate', 'value': '-30%', 'tra...",shoe,[],4695,3305,False,False,False,False
52,JOC43N00N-G11,JUMPMAN BEANIE GLOVE SET - Gants - gym red,[One Size],jordan-jumpman-beanie-glove-set-bonnet-joc43n0...,[{'path': 'JO/C4/3N/00/NG/11/JOC43N00N-G11@12....,Jordan,False,"[{'sku': 'JOC43N00N-G11', 'url_key': 'jordan-j...","[{'key': 'discountRate', 'value': '-55%', 'tra...",accessoires,[],2995,1355,False,False,False,False
54,NA314E03M-C11,COCOON - Chaussures premiers pas - grigio,"[18, 19, 20, 21, 22, 23, 25]",naturino-cocoon-vl-chaussures-a-scratch-na314e...,[{'path': 'NA/31/4E/03/MC/11/NA314E03M-C11@1.1...,Naturino,True,"[{'sku': 'NA314E03M-C11', 'url_key': 'naturino...","[{'key': 'discountRate', 'value': '-30%', 'tra...",shoe,[],6700,4690,True,False,True,False
51,C1853E00C-Q11,LOGO BELT - Ceinture - black,"[70, 80]",calvin-klein-jeans-logo-belt-ceinture-black-c1...,[{'path': 'C1/85/3E/00/CQ/11/C1853E00C-Q11@3.j...,Calvin Klein Jeans,False,"[{'sku': 'C1853E00C-Q11', 'url_key': 'calvin-k...","[{'key': 'discountRate', 'value': '-15%', 'tra...",accessoires,[],1995,1695,False,False,False,False
44,NI153K006-A11,MOTIVATE VERBIAGE BABY SET - Cadeau de naissa...,[0-6m],nike-sportswear-motivate-verbiage-baby-set-cad...,[{'path': 'NI/15/3K/00/6A/11/NI153K006-A11@2.j...,Nike Sportswear,False,"[{'sku': 'NI153K006-A11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-35%', 'tra...",clothing,[],2195,1435,False,False,False,False
15,K2023A00P-K11,Jeans Skinny - dark blue denim,"[10a, 14a, 16a]",kaporal-jeans-skinny-dark-blue-denim-k2023a00p...,[{'path': 'K2/02/3A/00/PK/11/K2023A00P-K11@2.j...,Kaporal,False,"[{'sku': 'K2023A00P-K11', 'url_key': 'kaporal-...","[{'key': 'discountRate', 'value': '-50%', 'tra...",clothing,[],4895,2445,False,False,False,False
29,LA224G04E-C11,Polo - alpes grey chine/corrida ionian daba,"[2a, 3y, 4a, 5a, 6a, 8a, 10a, 12a, 14a]",lacoste-polo-alpes-grey-chinecorrida-ionian-da...,[{'path': 'LA/22/4G/04/EC/11/LA224G04E-C11@6.j...,Lacoste,False,"[{'sku': 'LA224G04E-C11', 'url_key': 'lacoste-...","[{'key': 'discountRate', 'value': 'Jusqu’à -41...",clothing,[],5100,3000,True,False,True,False


In [190]:
flattened_articulos.shape

(84, 17)

In [191]:
flattened_articulos["price.promotional"].astype(int)

# It tells me there is still an error, that some row contains "58,45\xa0"
# I don't understand why, because we have just cleaned that expression!

ValueError: invalid literal for int() with base 10: '28,85'
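
The error message points at the cause: `'28,85'` uses a decimal comma and is not a whole number, so `int()` cannot parse it. A minimal sketch of a conversion follows, assuming every price looks like `'28,85\xa0€'` (digits, a decimal comma, a non-breaking space and the euro sign) with no thousands separator; `parse_price` is a hypothetical helper.

```python
def parse_price(text):
    """Convert a price string such as '28,85 €' to a float."""
    # drop the non-breaking space and the currency sign
    cleaned = text.replace("\xa0", "").replace("€", "").strip()
    # French formatting uses a comma as the decimal separator
    return float(cleaned.replace(",", "."))


print(parse_price("28,85\xa0€"))
print(parse_price("35,95"))
```

On the DataFrame the same idea can be written as `flattened_articulos["price.original"].str.replace(",", ".", regex=False).astype(float)` once the currency suffix has been removed.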

In [192]:
# Every time it reports a different kind of invalid content :(

flattened_articulos["price.original"] = flattened_articulos["price.original"].str.replace("89,95\xa0","89,95")
flattened_articulos["price.promotional"] = flattened_articulos["price.promotional"].str.replace("89,95\xa0","89,95")
flattened_articulos.head()

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only
0,N1243K08X-Q11,BLOCK TAPING TRICOT BABY SET - Survêtement - b...,"[12m, 18m]",nike-performance-block-taping-tricot-set-surve...,[{'path': 'N1/24/3K/08/XQ/11/N1243K08X-Q11@7.j...,Nike Sportswear,False,"[{'sku': 'N1243K08X-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],3595,2885,False,False,False,False
1,NI114D0EZ-A11,AIR MAX 2090 - Baskets basses - white/volt/val...,"[19.5, 21, 22, 23.5, 25, 26, 27]",nike-sportswear-air-max-baskets-basses-whitebl...,[{'path': 'NI/11/4D/0E/ZA/11/NI114D0EZ-A11@2.1...,Nike Sportswear,False,"[{'sku': 'NI114D0EZ-A11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-35%', 'tra...",shoe,[],6995,4545,False,False,False,False
2,HG223F0C3-J11,Robe de soirée - lila,"[3-4a, 5-6a, 7-8a, 9-10a, 11-12a, 13-14a, 15-16a]",happy-girls-robe-de-soiree-lila-hg223f0c3-j11,[{'path': 'HG/22/3F/0C/3J/11/HG223F0C3-J11@6.j...,happy girls,False,"[{'sku': 'HG223F0C3-J11', 'url_key': 'happy-gi...","[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,[],3495,1395,False,False,False,False
3,K2024G05C-K11,Polo - blueus,"[12a, 14a]",kaporal-polo-blueus-k2024g05c-k11,[{'path': 'K2/02/4G/05/CK/11/K2024G05C-K11@9.j...,Kaporal,False,"[{'sku': 'K2024G05C-K11', 'url_key': 'kaporal-...","[{'key': 'discountRate', 'value': '-45%', 'tra...",clothing,[],2895,1605,False,False,False,False
4,NI123F00J-Q11,AIR DRESS - Robe en jersey - black/white,"[8-9a, 10-11a, 12-13a, 14a]",nike-sportswear-air-dress-robe-en-jersey-black...,[{'path': 'NI/12/3F/00/JQ/11/NI123F00J-Q11@6.j...,Nike Sportswear,False,"[{'sku': 'NI123F00J-Q11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-70%', 'tra...",clothing,[],4495,1345,False,False,False,False


In [None]:
# Since we cannot solve it with replace, we will try an extract:
import re

In [None]:
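# str.extract needs a capture group, so the whole pattern is wrapped in parentheses;
# expand=False returns a Series instead of a one-column DataFrame
flattened_articulos["price.original"] = flattened_articulos["price.original"].str.extract(r"(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2}))", expand=False)
flattened_articulos["price.promotional"] = flattened_articulos["price.promotional"].str.extract(r"(\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2}))", expand=False)

# I found this regex, I did not design it myself lol
flattened_articulos.sample(10)

# Without the capture group, this call also gave me errors...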

In [193]:
flattened_articulos["price.original"].astype(int)

ValueError: invalid literal for int() with base 10: '35,95'

In [194]:
error = (flattened_articulos["price.original"] == '35,95') | (flattened_articulos["price.promotional"] == '35,95')
error

0      True
1     False
2     False
3     False
4     False
      ...  
79    False
80    False
81    False
82    False
83    False
Length: 84, dtype: bool

In [195]:
# Select all rows that contain this string in the original price column
porque = flattened_articulos[flattened_articulos["price.original"] == '35,95']

In [196]:
porque.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 17 columns):
 #   Column                                     Non-Null Count  Dtype 
---  ------                                     --------------  ----- 
 0   sku                                        1 non-null      object
 1   name                                       1 non-null      object
 2   sizes                                      1 non-null      object
 3   url_key                                    1 non-null      object
 4   media                                      1 non-null      object
 5   brand_name                                 1 non-null      object
 6   is_premium                                 1 non-null      bool  
 7   family_articles                            1 non-null      object
 8   flags                                      1 non-null      object
 9   product_group                              1 non-null      object
 10  delivery_promises                         

In [197]:
# OK, it is in the price.original column
flattened_articulos[flattened_articulos["price.promotional"] == '35,95']

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only
6,C7424L02M-K11,KIDS RUNDALL - Veste d'hiver - navy,"[8a, 10a, 16a]",cars-jeans-kids-rundall-veste-dhiver-navy-c742...,[{'path': 'C7/42/4L/02/MK/11/C7424L02M-K11@9.j...,Cars Jeans,False,"[{'sku': 'C7424L02M-K11', 'url_key': 'cars-jea...","[{'key': 'discountRate', 'value': '-55%', 'tra...",clothing,[],7995,3595,False,False,False,False
13,LE224A08A-Q11,510 SKINNY - Jeans Skinny - black,"[3a, 4a, 5a, 6a, 8a, 10a, 12a, 14a, 16a]",levisr-510-skinny-fit-jeans-skinny-black-le224...,[{'path': 'LE/22/4A/08/AQ/11/LE224A08A-Q11@8.j...,Levi's®,False,"[{'sku': 'LE224A08A-Q11', 'url_key': 'levisr-5...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],4495,3595,True,False,True,False


In [198]:
# It says there are still rows whose values cannot be converted to integers... but I don't understand why :( I give up!
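
The task in this section can be completed once the prices are converted to floats rather than integers. The sketch below is one possible approach, not the lab's official solution. It assumes the `\xa0€` suffix has already been stripped, as in the cells above, and it uses four rows copied from the output above as a small stand-in for the full DataFrame.

```python
import pandas as pd

# Four rows copied from the output above, as a small stand-in for the full data
df = pd.DataFrame(
    {
        "brand_name": ["Nike Sportswear", "Nike Sportswear", "happy girls", "Kaporal"],
        "price.original": ["35,95", "69,95", "34,95", "28,95"],
        "price.promotional": ["28,85", "45,45", "13,95", "16,05"],
    }
)

for col in ["price.original", "price.promotional"]:
    # swap the decimal comma for a dot, then convert the text to floats
    df[col] = df[col].str.replace(",", ".", regex=False).astype(float)

df["discount"] = df["price.original"] - df["price.promotional"]
total_discount = df.groupby("brand_name")["discount"].sum()

print(total_discount.sort_values(ascending=False))
print(total_discount.idxmax())
```

`idxmax` returns the brand whose summed discount is largest. On the full data the same steps would be applied to `flattened_articulos`.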

#### Display the brands without discount at all

In [199]:
# We create a column flagging the products whose original and promotional prices are the same
flattened_articulos["no_discount"] = (flattened_articulos["price.original"] == flattened_articulos["price.promotional"])
flattened_articulos.head()

Unnamed: 0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only,no_discount
0,N1243K08X-Q11,BLOCK TAPING TRICOT BABY SET - Survêtement - b...,"[12m, 18m]",nike-performance-block-taping-tricot-set-surve...,[{'path': 'N1/24/3K/08/XQ/11/N1243K08X-Q11@7.j...,Nike Sportswear,False,"[{'sku': 'N1243K08X-Q11', 'url_key': 'nike-per...","[{'key': 'discountRate', 'value': '-20%', 'tra...",clothing,[],3595,2885,False,False,False,False,False
1,NI114D0EZ-A11,AIR MAX 2090 - Baskets basses - white/volt/val...,"[19.5, 21, 22, 23.5, 25, 26, 27]",nike-sportswear-air-max-baskets-basses-whitebl...,[{'path': 'NI/11/4D/0E/ZA/11/NI114D0EZ-A11@2.1...,Nike Sportswear,False,"[{'sku': 'NI114D0EZ-A11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-35%', 'tra...",shoe,[],6995,4545,False,False,False,False,False
2,HG223F0C3-J11,Robe de soirée - lila,"[3-4a, 5-6a, 7-8a, 9-10a, 11-12a, 13-14a, 15-16a]",happy-girls-robe-de-soiree-lila-hg223f0c3-j11,[{'path': 'HG/22/3F/0C/3J/11/HG223F0C3-J11@6.j...,happy girls,False,"[{'sku': 'HG223F0C3-J11', 'url_key': 'happy-gi...","[{'key': 'discountRate', 'value': '-60%', 'tra...",clothing,[],3495,1395,False,False,False,False,False
3,K2024G05C-K11,Polo - blueus,"[12a, 14a]",kaporal-polo-blueus-k2024g05c-k11,[{'path': 'K2/02/4G/05/CK/11/K2024G05C-K11@9.j...,Kaporal,False,"[{'sku': 'K2024G05C-K11', 'url_key': 'kaporal-...","[{'key': 'discountRate', 'value': '-45%', 'tra...",clothing,[],2895,1605,False,False,False,False,False
4,NI123F00J-Q11,AIR DRESS - Robe en jersey - black/white,"[8-9a, 10-11a, 12-13a, 14a]",nike-sportswear-air-dress-robe-en-jersey-black...,[{'path': 'NI/12/3F/00/JQ/11/NI123F00J-Q11@6.j...,Nike Sportswear,False,"[{'sku': 'NI123F00J-Q11', 'url_key': 'nike-spo...","[{'key': 'discountRate', 'value': '-70%', 'tra...",clothing,[],4495,1345,False,False,False,False,False


In [200]:
flattened_articulos.groupby("no_discount").count()

Unnamed: 0_level_0,sku,name,sizes,url_key,media,brand_name,is_premium,family_articles,flags,product_group,delivery_promises,price.original,price.promotional,price.has_different_prices,price.has_different_original_prices,price.has_different_promotional_prices,price.has_discount_on_selected_sizes_only
no_discount,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
False,84,84,84,84,84,84,84,84,84,84,84,84,84,84,84,84,84


In [201]:
# There are none: every product has a promotional price different from its original price
# Therefore there is no brand without a discount
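
The check above works per product. To answer the question per brand, the flag can be aggregated so that a brand counts as "without discount" only when none of its products is discounted. The sketch below illustrates this on made-up data: the brands and prices are hypothetical and chosen only so that one brand has no discount, which never happens in the promo page collected here.

```python
import pandas as pd

# Hypothetical mini example; "Brand B" is made up to show a brand with no discount
df = pd.DataFrame(
    {
        "brand_name": ["Brand A", "Brand A", "Brand B"],
        "price.original": [35.95, 69.95, 20.00],
        "price.promotional": [28.85, 69.95, 20.00],
    }
)

df["no_discount"] = df["price.original"] == df["price.promotional"]
# A brand has no discount at all only if every one of its products is undiscounted
brand_no_discount = df.groupby("brand_name")["no_discount"].all()
brands_without_discount = brand_no_discount[brand_no_discount].index.tolist()
print(brands_without_discount)
```

"Brand A" is excluded because one of its two products is discounted, even though the other is not.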