# Challenge: Promotions

In this challenge, you'll write code to parse and analyze data returned from another API on Zalando, such as [Promos homme (Men's Promotions)](https://www.zalando.fr/promo-homme/) or [Promos femme (Women's Promotions)](https://www.zalando.fr/promo-femme/). The workflow is almost the same as in the guided lesson, but you'll work with different data.

## Obtaining the link

Write your code in the cell below to obtain the data from the API endpoint you choose. A recap of the workflow:

1. Examine the webpages and choose one that you want to work with.

1. Use Google Chrome's DevTools to inspect the XHR network requests. Find out the API endpoint that serves data to the webpage.

1. Test the API endpoint in the browser to verify its data.

1. Change the page-number offset in the API URL to check that paging works.
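
The last step of the recap can be sketched as follows. This is a minimal example, assuming the endpoint takes its paging position as a query parameter. The URL and the parameter name `offset` are placeholders, not the real Zalando endpoint, so substitute whatever you find in DevTools. The helper `with_query_param` is also hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_query_param(url, name, value):
    """Return a copy of `url` with query parameter `name` set to `value`.

    Note: repeated query keys are collapsed to their last value.
    """
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[name] = str(value)
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical endpoint: replace it with the one shown in the Network tab.
base_url = "https://example.com/api/articles?category=promo-homme&offset=0"

# Print a few candidate URLs to paste into the browser.
for offset in (0, 84, 168):
    print(with_query_param(base_url, "offset", offset))
```

If each printed URL returns a different set of records in the browser, the offset parameter works as expected.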

In [1]:
# your code here
import requests 
import json
import pandas as pd
import urllib3

# CDMX open-data portal: search the dataset catalog for "afluencia diaria" (daily ridership)
cdmx_metro = requests.get('https://datos.cdmx.gob.mx/api/datasets/1.0/search/',
                          params={'q': 'afluencia diaria'})

In [2]:
cdmx_metro

<Response [200]>

In [3]:
cdmx_metro.json().keys()

dict_keys(['nhits', 'parameters', 'datasets'])

In [4]:
cdmx_metro.json()['datasets']

[{'datasetid': 'afluencia-preliminar-en-transporte-publico',
  'metas': {'frecuencia-de-actualizacion': 'semanal',
   'domain': 'lab-cdmx',
   'version': '1.0',
   'records_count': 8280,
   'modified': '2020-10-16T18:16:02+00:00',
   'dataset-agregado': False,
   'keyword': ['transporte público',
    'afluencia',
    'metro ',
    'metrobús',
    'tren ligero',
    'trolebus',
    'ecobici',
    'RTP',
    'covid-19',
    'covid19',
    'coronavirus '],
   'geographic_reference': ['mx_40_09'],
   'title': 'Afluencia preliminar en transporte público',
   'theme': ['Movilidad', 'Covid-19'],
   'modified_updates_on_data_change': True,
   'metadata_processed': '2020-10-16T18:16:03.032776+00:00',
   'data_processed': '2020-10-16T18:16:02+00:00',
   'territory': ['Ciudad de México'],
   'description': '<p><span style=\'color: rgb(51, 51, 51); font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 18px;\'>Esta base de datos contiene <b>los datos preliminares de la afluencia d

In [5]:
for i in cdmx_metro.json()['datasets']:
    print(i['datasetid'])

afluencia-preliminar-en-transporte-publico
afluencia-diaria-del-metro-cdmx
afluencia-diaria-de-metrobus-cdmx


In [6]:
cdmx_metro.json()['datasets'][1]

{'datasetid': 'afluencia-diaria-del-metro-cdmx',
 'metas': {'domain': 'lab-cdmx',
  'records_count': 753675,
  'modified': '2019-12-05T22:16:43+00:00',
  'dataset-agregado': False,
  'keyword': ['metro ',
   'afluencia',
   'transporte público',
   'transporte',
   'STCM',
   '#Dataton'],
  'geographic_reference': ['mx_40_09'],
  'title': 'Afluencia diaria del Metro CDMX',
  'theme': ['Movilidad'],
  'modified_updates_on_data_change': False,
  'metadata_processed': '2020-10-19T21:44:30.381277+00:00',
  'data_processed': '2020-10-19T21:44:29+00:00',
  'territory': ['Ciudad de México'],
  'description': '<p>Esta base muestra la afluencia diaria del Metro CDMX. Los datos abarcan de enero de 2010 a febrero de 2020. Esta base se actualizará mensualmente.\xa0</p><p><br/></p><p>Para ver los datos preliminares durante la emergencia covid-19 revisa\xa0<a href="https://datos.cdmx.gob.mx/explore/dataset/afluencia-preliminar-en-transporte-publico/table/" target="_blank">esta base.</a>\xa0Una vez q

In [7]:
url_metro = 'https://datos.cdmx.gob.mx/api/datasets/1.0/afluencia-diaria-del-metro-cdmx/'

## Reading the data

In the next cell, use Python to obtain data from the API endpoint you chose in the previous step. Workflow:

1. Import libraries.

1. Define the initial API endpoint URL.

1. Make a request to obtain the data of the first page. Flatten the data and store it in a new variable.

1. Find the total page count in the first-page data.

1. Use a `for` loop to request the additional pages, from page 2 up to the page count. Append the data of each additional page to the flattened data object.

1. Print and review the data you obtained.
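
The workflow above can be sketched as follows. This is a minimal example under stated assumptions: the keys `pagination`, `page_count` and `articles` are guesses about the payload, and `fetch_all_pages` is a hypothetical helper. An in-memory stand-in replaces the real API so the sketch runs offline.

```python
def fetch_all_pages(fetch_page):
    """Collect the records of every page into one list.

    `fetch_page(n)` must return the decoded JSON of page `n` as a dict.
    The keys "pagination", "page_count" and "articles" are assumptions
    about the payload; adjust them to the real response.
    """
    first = fetch_page(1)
    records = list(first["articles"])
    page_count = first["pagination"]["page_count"]
    # Pages 2..page_count inclusive.
    for page in range(2, page_count + 1):
        records.extend(fetch_page(page)["articles"])
    return records


# Stand-in for the real API (hypothetical data).
FAKE_PAGES = {
    1: {"pagination": {"page_count": 3}, "articles": [{"sku": "A1"}, {"sku": "A2"}]},
    2: {"pagination": {"page_count": 3}, "articles": [{"sku": "B1"}]},
    3: {"pagination": {"page_count": 3}, "articles": [{"sku": "C1"}]},
}

all_records = fetch_all_pages(FAKE_PAGES.__getitem__)
print(len(all_records))
```

Against a live endpoint, `fetch_page` could wrap `requests.get(...).json()` with the page number passed as a query parameter, and `pd.json_normalize(all_records)` would then flatten the collected records into a DataFrame.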

In [8]:
# your code here
import requests 
import json
import pandas as pd
import urllib3
# from pandas.io.json import json_normalize

In [26]:
resp_metro = requests.get('https://datos.cdmx.gob.mx/api/records/1.0/search/?dataset=afluencia-diaria-del-metro-cdmx&q=&rows=10000&facet=fecha&facet=ano&facet=linea&facet=estacion')

In [27]:
resp_metro

<Response [200]>

In [28]:
resp_metro_j = resp_metro.json()

In [46]:
resp_metro_j['facet_groups'][2]

{'facets': [{'count': 15460,
   'path': 'Pantitlán',
   'state': 'displayed',
   'name': 'Pantitlán'},
  {'count': 11595,
   'path': 'Chabacano',
   'state': 'displayed',
   'name': 'Chabacano'},
  {'count': 11595,
   'path': 'Tacubaya',
   'state': 'displayed',
   'name': 'Tacubaya'},
  {'count': 7730,
   'path': 'Atlalilco',
   'state': 'displayed',
   'name': 'Atlalilco'},
  {'count': 7730,
   'path': 'Balderas',
   'state': 'displayed',
   'name': 'Balderas'},
  {'count': 7730,
   'path': 'Bellas Artes',
   'state': 'displayed',
   'name': 'Bellas Artes'},
  {'count': 7730,
   'path': 'Candelaria',
   'state': 'displayed',
   'name': 'Candelaria'},
  {'count': 7730,
   'path': 'Centro Médico',
   'state': 'displayed',
   'name': 'Centro Médico'},
  {'count': 7730,
   'path': 'Consulado',
   'state': 'displayed',
   'name': 'Consulado'},
  {'count': 7730,
   'path': 'Deptvo. 18 de Marzo',
   'state': 'displayed',
   'name': 'Deptvo. 18 de Marzo'},
  {'count': 7730,
   'path': 'El Ro

In [49]:
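# The facet group at index 2 holds the per-station ('estacion') facets; each row keeps one facet as a dict
df_metro = pd.DataFrame(resp_metro_j['facet_groups'][2])
df_metro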

|     | facets | name |
|-----|--------|------|
| 0 | {'count': 15460, 'path': 'Pantitlán', 'state':... | estacion |
| 1 | {'count': 11595, 'path': 'Chabacano', 'state':... | estacion |
| 2 | {'count': 11595, 'path': 'Tacubaya', 'state': ... | estacion |
| 3 | {'count': 7730, 'path': 'Atlalilco', 'state': ... | estacion |
| 4 | {'count': 7730, 'path': 'Balderas', 'state': '... | estacion |
| ... | ... | ... |
| 95 | {'count': 3865, 'path': 'Lázaro Cárdenas', 'st... | estacion |
| 96 | {'count': 3865, 'path': 'Merced', 'state': 'di... | estacion |
| 97 | {'count': 3865, 'path': 'Mexicaltzingo', 'stat... | estacion |
| 98 | {'count': 3865, 'path': 'Miguel A. de Q.', 'st... | estacion |


In [50]:
# Expand each facet dict into its own columns (count, path, state, name)
df_metro_n = pd.json_normalize(df_metro['facets'])
df_metro_n

|     | count | path | state | name |
|-----|-------|------|-------|------|
| 0 | 15460 | Pantitlán | displayed | Pantitlán |
| 1 | 11595 | Chabacano | displayed | Chabacano |
| 2 | 11595 | Tacubaya | displayed | Tacubaya |
| 3 | 7730 | Atlalilco | displayed | Atlalilco |
| 4 | 7730 | Balderas | displayed | Balderas |
| ... | ... | ... | ... | ... |
| 95 | 3865 | Lázaro Cárdenas | displayed | Lázaro Cárdenas |
| 96 | 3865 | Merced | displayed | Merced |
| 97 | 3865 | Mexicaltzingo | displayed | Mexicaltzingo |
| 98 | 3865 | Miguel A. de Q. | displayed | Miguel A. de Q. |


## Bonus

Extract the following information from the data:

* The trending brand.

* The product(s) with the highest discount.

* The overall discount across all goods (`sum_discounted_prices` divided by `sum_original_prices`).
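
The three bonus questions can be sketched as follows. This is a minimal example on hypothetical records: the field names (`name`, `brand`, `original`, `discounted`) and the sample values are assumptions, and "trending" is interpreted here as the brand with the most listed products. The sketch uses plain Python so it runs without dependencies.

```python
from collections import Counter

# Hypothetical sample records; the field names are assumptions.
articles = [
    {"name": "Sneaker", "brand": "BrandA", "original": 100.0, "discounted": 60.0},
    {"name": "T-shirt", "brand": "BrandB", "original": 20.0, "discounted": 15.0},
    {"name": "Jacket", "brand": "BrandA", "original": 200.0, "discounted": 100.0},
]

# Trending brand: the brand that appears most often among the products.
trending_brand, n_products = Counter(a["brand"] for a in articles).most_common(1)[0]


def discount(article):
    """Discount as a fraction of the original price."""
    return 1 - article["discounted"] / article["original"]


# Product(s) with the highest discount; ties are all kept.
best = max(discount(a) for a in articles)
top_products = [a["name"] for a in articles if discount(a) == best]

# Ratio from the task statement: sum of discounted prices over sum of original prices.
sum_discounted_prices = sum(a["discounted"] for a in articles)
sum_original_prices = sum(a["original"] for a in articles)
price_ratio = sum_discounted_prices / sum_original_prices

print(trending_brand, top_products, price_ratio)
```

With the data in a DataFrame, the same logic maps onto `value_counts()` for the brand, a comparison against the column maximum for the top discount, and `sum()` for the two price totals.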

In [51]:
# your code here
df_metro_n.head()

|     | count | path | state | name |
|-----|-------|------|-------|------|
| 0 | 15460 | Pantitlán | displayed | Pantitlán |
| 1 | 11595 | Chabacano | displayed | Chabacano |
| 2 | 11595 | Tacubaya | displayed | Tacubaya |
| 3 | 7730 | Atlalilco | displayed | Atlalilco |
| 4 | 7730 | Balderas | displayed | Balderas |


In [52]:
df_metro_n.shape[0]

100

In [53]:
df_metro_n.dtypes

count     int64
path     object
state    object
name     object
dtype: object

In [57]:
# Top stations by facet count (number of records per station), highest first
df_metro_n[['name', 'count']].sort_values(by=['count'], ascending=False).head()

|     | name | count |
|-----|------|-------|
| 0 | Pantitlán | 15460 |
| 2 | Tacubaya | 11595 |
| 1 | Chabacano | 11595 |
| 15 | Inst. del Petróleo | 7730 |
| 27 | Zapata | 7730 |


In [63]:
# Stations with the lowest facet count (number of records per station), lowest first
df_metro_n[['name', 'count']].sort_values(by=['count']).head(5)

|     | name | count |
|-----|------|-------|
| 49 | Cerro de la Estrella | 3865 |
| 72 | Ferrería | 3865 |
| 71 | Eugenia | 3865 |
| 70 | Etiopía | 3865 |
| 69 | Escuadrón 201 | 3865 |


In [78]:
# Descriptive statistics
df_metro_n[['name', 'count']].describe()

|       | count |
|-------|-------|
| count | 100.0 |
| mean | 5101.8 |
| std | 2189.133 |
| min | 3865.0 |
| 25% | 3865.0 |
| 50% | 3865.0 |
| 75% | 7730.0 |
| max | 15460.0 |
