# What if a web page uses JavaScript?

## EU sanctions lists

Example: people from Afghanistan whose funds are to be frozen

https://sanctionsmap.eu/#/main/details/1/lists?search=%7B%22value%22:%22%22,%22searchType%22:%7B%7D%7D


![sanction lists devtools](sanctions1.png)

```python
from pprint import pprint
import re
import time

import pandas as pd
import requests
```

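```python
# Everything after '#' is a client-side fragment that is never sent to the server,
# so this request only returns the page's static HTML shell
resp = requests.get('https://sanctionsmap.eu/#/main/details/1/lists?search=%7B%22value%22:%22%22,%22searchType%22:%7B%7D%7D1')
```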

```python
print(resp.text)
```

```html
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
  <meta name="description" content="">
  <meta name="author" content="">
  <meta name="version" content="1.1.0">

  <link rel="apple-touch-icon" sizes="57x57" href="assets/images/icons/apple-touch-icon-57x57.png?v=LbGYE7o8gO">
  <link rel="apple-touch-icon" sizes="60x60" href="assets/images/icons/apple-touch-icon-60x60.png?v=LbGYE7o8gO">
  <link rel="apple-touch-icon" sizes="72x72" href="assets/images/icons/apple-touch-icon-72x72.png?v=LbGYE7o8gO">
  <link rel="apple-touch-icon" sizes="76x76" href="assets/images/icons/apple-touch-icon-76x76.png?v=LbGYE7o8gO">
  <link rel="apple-touch-icon" sizes="114x114" href="assets/images/icons/apple-touch-icon-114x114.png?v=LbGYE7o8gO">
  <link rel="apple-touch-icon" sizes="120x120" href="assets/images/icons/apple-touch-icon-120x120.png?v=LbGYE7o8gO">
  <link rel="apple-touch-icon" sizes="144x144"
```

```python
# The names visible in the browser are not in the downloaded HTML
re.findall('Mohammad', resp.text)
```

```
[]
```
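Why is the name missing? The part of the URL after `#` (the fragment) is never sent to the server; it is read by the page's JavaScript, which then loads the data itself. Every such URL therefore returns the same HTML shell. A standard-library sketch of what stays on the client side; reading `/main/details/1/` as "regime id 1" is our assumption about the site's routing:

```python
import json
from urllib.parse import parse_qs, urlsplit

url = ('https://sanctionsmap.eu/#/main/details/1/lists'
       '?search=%7B%22value%22:%22%22,%22searchType%22:%7B%7D%7D')

parts = urlsplit(url)
# Only scheme, host, path and query go into the HTTP request; the fragment stays in the browser
print(parts.netloc, repr(parts.path))  # sanctionsmap.eu '/'
print(parts.fragment)

# The fragment looks like a path plus a query string, interpreted by the page's JavaScript
route, _, query = parts.fragment.partition('?')
search = json.loads(parse_qs(query)['search'][0])
print(route)   # /main/details/1/lists
print(search)  # {'value': '', 'searchType': {}}

# Assumption: the number in the route is the regime id used by the site's API
regime_id = int(route.split('/')[3])
print(regime_id)  # 1
```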

# How do we get data from such a page?

## Behave like a browser

Either

1. understand what the JavaScript does, or
2. actually run the JavaScript.

## Option 1 - developer tools

[![Sanctions list JSON in developer tools](sanctions2.png)](sanctions2.png)

```python
# JSON API endpoint found in the Network tab of the developer tools
resp = requests.get('https://www.sanctionsmap.eu/api/v1/regime')
```

```python
resp.json()
```

```
{'response': [{'id': 1,
   'type': 0,
   'specification': 'Restrictive measures imposed with respect to the Taliban',
   'acronym': None,
   'notes': "The measures were initially imposed on 15 October 1999. On 17 June 2011, the United Nations' Security Council adopted resolutions 1988 (2011) and 1989 (2011) and decided that the list of individuals and entities subject to restrictive measures originally imposed by resolution 1267 (1999) would be split in two. The original resolution 1267 (1999) concerns Afganistan and the Taliban. Now the resolution is concerning ISIL (Da'esh), Al-Qaida and associated individuals, groups, undertakings and entities. The measures imposed with respect to the Taliban are described under this restrictive measures regime. The  measures imposed on ISIL (Da'esh) and Al-Qaida are described under the thematic restrictive measures section.",
   'special': 0,
   'expiration': None,
   'amendment': 1562536800,
   'programme': ['AFG'],
   'has_lists': True,
   'has_m
```

```python
# IDs of the regimes that have sanction lists attached
regime_ids = [int(r['id']) for r in resp.json()['response'] if r['has_lists']]
regime_ids[:5]
```

```
[1, 2, 7, 9, 46]
```

```python
len(regime_ids)
```

```
33
```

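```python
regime_data = []

for regime_id in regime_ids:
    time.sleep(1)        # be polite: at most about one request per second
    print('.', end='')   # simple progress indicator
    resp = requests.get(f'https://www.sanctionsmap.eu/api/v1/regime/{regime_id}', timeout=30)
    resp.raise_for_status()  # fail loudly instead of storing an error response
    regime_data.append(resp.json())
```

```
.................................
```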
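The loop above mixes three concerns: which ids to fetch, how to fetch one of them, and how long to wait between requests. A minimal sketch that separates them; `fetch_all` and its parameters are our own names, not part of any library. Passing the fetch function in makes the waiting logic testable without touching the network:

```python
import time
from typing import Callable, Iterable


def fetch_all(ids: Iterable[int], fetch: Callable[[int], dict], delay: float = 1.0) -> list:
    """Fetch one JSON document per id, pausing `delay` seconds between requests."""
    results = []
    for i, item_id in enumerate(ids):
        if i:  # no need to wait before the very first request
            time.sleep(delay)
        results.append(fetch(item_id))
    return results


# Offline usage with a fake fetch function that just echoes the id
calls = []
fake_fetch = lambda rid: calls.append(rid) or {'response': {'id': rid}}
out = fetch_all([1, 2, 7], fake_fetch, delay=0)
print(out)  # [{'response': {'id': 1}}, {'response': {'id': 2}}, {'response': {'id': 7}}]
```

With the real API, `fetch` could be something like `lambda rid: requests.get(f'https://www.sanctionsmap.eu/api/v1/regime/{rid}', timeout=30).json()`, giving `regime_data = fetch_all(regime_ids, fetch)`.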

```python
idx = 0
regime_data[idx]
```

```
{'response': {'id': 1,
  'type': 0,
  'specification': 'Restrictive measures imposed with respect to the Taliban',
  'acronym': None,
  'notes': "The measures were initially imposed on 15 October 1999. On 17 June 2011, the United Nations' Security Council adopted resolutions 1988 (2011) and 1989 (2011) and decided that the list of individuals and entities subject to restrictive measures originally imposed by resolution 1267 (1999) would be split in two. The original resolution 1267 (1999) concerns Afganistan and the Taliban. Now the resolution is concerning ISIL (Da'esh), Al-Qaida and associated individuals, groups, undertakings and entities. The measures imposed with respect to the Taliban are described under this restrictive measures regime. The  measures imposed on ISIL (Da'esh) and Al-Qaida are described under the thematic restrictive measures section.",
  'special': 0,
  'expiration': None,
  'amendment': 1562536800,
  'programme': ['AFG'],
  'has_lists': True,
  'has_members': Fa
```

```python
current_regime = regime_data[idx]['response']
current_regime['country']
```

```
{'code': 'AF', 'title': 'Afghanistan'}
```
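The same lookup over every downloaded regime gives a country overview, a natural starting point for the exercise below. We have only seen `country` on the Afghanistan regime, so this sketch assumes it may be missing or empty for regimes not tied to a single country (hence `.get`); the second sample entry, id 99, is hypothetical:

```python
def regime_countries(regime_data):
    """Return (regime id, country code, country title) for each regime that names a country."""
    rows = []
    for regime_json in regime_data:
        regime = regime_json['response']
        country = regime.get('country')  # assumption: may be absent or None
        if country:
            rows.append((regime['id'], country['code'], country['title']))
    return rows


sample = [
    {'response': {'id': 1, 'country': {'code': 'AF', 'title': 'Afghanistan'}}},
    {'response': {'id': 99, 'country': None}},  # hypothetical regime without a country
]
print(regime_countries(sample))  # [(1, 'AF', 'Afghanistan')]
```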

```python
for measure in current_regime['measures']:
    print(measure['id'], measure['description'])
    pprint(measure['type'])
    print('~~~~~~~~~~~~~~~~~~~~~~~~~~')
```

```
28 It is prohibited to export arms and related materiel to the listed individuals. Related technical advice, training and assistance is also prohibited.
{'icon': 'https://www.sanctionsmap.eu/storage/icons/hYvAur5S3ZQHtBfAo6ElESAS5xl8CMUEyPrNHyMf.svg',
 'id': 4,
 'parent_id': 2,
 'title': 'Arms export'}
~~~~~~~~~~~~~~~~~~~~~~~~~~
8 All assets of the listed persons and entities should be frozen. It is also prohibited to make any funds or assets directly or indirectly available to them.
{'icon': 'https://www.sanctionsmap.eu/storage/icons/bGumIzNvSWyfmf5BJSuJGPJXR1xeGz2xrzvane19.svg',
 'id': 6,
 'title': 'Asset freeze and prohibition to make funds available'}
~~~~~~~~~~~~~~~~~~~~~~~~~~
29 Member States shall enforce travel restrictions on persons listed in the Annex of Council Decision 2011/486/CFSP.
{'icon': 'https://www.sanctionsmap.eu/storage/icons/LUMa1Twbs3PWRgFNLhFdMcQO4ZuU9VYMHb8jrxUK.svg',
 'id': 18,
 'title': 'Restrictions on admission'}
~~~~~~~~~~~~~~~~~~~~~~~~~~
```


```python
for regime_json in regime_data:
    current_regime = regime_json['response']
    for measure in current_regime['measures']:
        print(measure['id'], measure['type']['title'], measure['type']['id'])
```

```
28 Arms export 4
8 Asset freeze and prohibition to make funds available 6
29 Restrictions on admission 18
24 Arms export 4
2 Asset freeze and prohibition to make funds available 6
50 Restrictions on admission 18
30 Restrictions on equipment used for internal repression 19
3 Asset freeze and prohibition to make funds available 6
164 Prohibition to satisfy claims 17
34 Restrictions on admission 18
35 Arms export 4
9 Asset freeze and prohibition to make funds available 6
200 Prohibition to satisfy claims 17
36 Restrictions on admission 18
263 Asset freeze and prohibition to make funds available 6
264 Restrictions on admission 18
37 Arms embargo 2
38 Arms export 4
14 Asset freeze and prohibition to make funds available 6
165 Prohibition to satisfy claims 17
39 Restrictions on admission 18
1 Asset freeze and prohibition to make funds available 6
4 Asset freeze and prohibition to make funds available 6
46 Restrictions on admission 18
15 Asset freeze and prohibition to make funds available 6
```
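Printing is fine for a first look; for counting and filtering, a table with one row per measure is handier. A minimal sketch, assuming every regime has a `measures` list whose items carry `id` and `type` (with `id` and `title`), as in the output above; `measures_frame` is our own helper name, and the sample holds the three measures of regime 1 printed earlier:

```python
import pandas as pd


def measures_frame(regime_data):
    """One row per measure, tagged with the id of the regime it belongs to."""
    rows = []
    for regime_json in regime_data:
        regime = regime_json['response']
        for measure in regime['measures']:
            rows.append({
                'regime_id': regime['id'],
                'measure_id': measure['id'],
                'type_id': measure['type']['id'],
                'type_title': measure['type']['title'],
            })
    return pd.DataFrame(rows)


# The three measures of regime 1 (Afghanistan), as printed above
sample = [{'response': {'id': 1, 'measures': [
    {'id': 28, 'type': {'id': 4, 'title': 'Arms export'}},
    {'id': 8, 'type': {'id': 6, 'title': 'Asset freeze and prohibition to make funds available'}},
    {'id': 29, 'type': {'id': 18, 'title': 'Restrictions on admission'}},
]}}]

measures = measures_frame(sample)
print(measures)
print(measures['type_title'].value_counts())
```

With the full `regime_data`, `measures_frame(regime_data).groupby('type_title')['regime_id'].nunique()` counts in how many regimes each measure type occurs.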


```python
aggregate_sanction_list = []

for regime_json in regime_data:
    current_regime = regime_json['response']
    for measure in current_regime['measures']:
        # Measure type 6 = 'Asset freeze and prohibition to make funds available'
        if measure['type']['id'] == 6:
            for sanction_list in measure['lists']:
                aggregate_sanction_list.extend(sanction_list['members'])

aggregate_sanction_list = pd.DataFrame(aggregate_sanction_list)
aggregate_sanction_list
```

| | order_by | type | FSD_ID | name | id_code | reason | creation_date | programme | suspend | suspension_end_date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | P | 505.0 | Abdul Hai Hazem Abdul Qader, Abdul Hai Hazem | | | 9.803772e+08 | AFG | | |
| 1 | 0.0 | P | 513.0 | Agha, Abdul Rahman | | | 9.803772e+08 | AFG | | |
| 2 | 0.0 | P | 515.0 | Sayed Mohammad Azim Agha, Sayed Mohammad Azim ... | | | 9.828828e+08 | AFG | | |
| 3 | 0.0 | P | 516.0 | Sayyed Ghiassouddine Agha, Sayed Ghias, Sayed ... | | | 9.808956e+08 | AFG | | |
| 4 | 0.0 | P | 517.0 | Mohammad Ahmadi | | | 9.828828e+08 | AFG | | |
| 5 | 0.0 | P | 521.0 | Haji Ahmad Jan, Ahmed Jan Akhund, Ahmed Jan Ak... | | | 9.803772e+08 | AFG | | |
| 6 | 0.0 | P | 522.0 | Mohammad Essa Akhund | | | 9.803772e+08 | AFG | | |
| 7 | 0.0 | P | 523.0 | Attiqullah Akhund | | | 9.828828e+08 | AFG | | |
| 8 | 0.0 | P | 524.0 | Akhund, Shahidwror, Allahdad, Allah Dad Matin | | | 9.808956e+08 | AFG | | |
| 9 | 0.0 | P | 525.0 | Ubaidullah Akhund Yar Mohammed Akhund, Obaidul... | | | 9.803772e+08 | AFG | | |
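The `creation_date` column (and the regime's `amendment` field) look like Unix timestamps in seconds. Under that assumption, pandas can turn them into readable dates. A sketch on two rows copied from the table above, with the values written out as integers:

```python
import pandas as pd

# Two rows from the table above; 9.803772e+08 and 9.828828e+08 written as integers
members = pd.DataFrame({'FSD_ID': [505, 515],
                        'creation_date': [980377200, 982882800]})

# Assumption: seconds since the Unix epoch, UTC
members['creation_date'] = pd.to_datetime(members['creation_date'], unit='s', utc=True)
print(members)

# The regime's 'amendment' field, read the same way
amendment = pd.to_datetime(1562536800, unit='s', utc=True)
print(amendment)  # 2019-07-07 22:00:00+00:00
```

All three values land on 23:00 or 22:00 UTC, i.e. midnight in Brussels (CET in winter, CEST in summer), which supports the seconds-since-epoch reading. On the real frame this would be `aggregate_sanction_list['creation_date'] = pd.to_datetime(aggregate_sanction_list['creation_date'], unit='s', utc=True)`.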


# Exercise

From [sanctionsmap.eu](https://sanctionsmap.eu), download

1. The list of countries against which the European Union applies sanctions ("Adopted by" is "EU" or "UN and EU").
2. The list of companies that must not be sold arms under these sanctions.
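For the second task, the funds-freeze loop above generalises to any measure type. A sketch under stated assumptions: in the printed measures "Arms export" has type id 4 (its `parent_id` is 2, and "Arms embargo" is id 2, which may also be relevant); persons are marked `'P'` in the members' `type` column, but the code used for companies is not visible in the output above, so `ENTITY_TYPE` is a hypothetical placeholder to check against real data. `members_for_measure_type` is our own helper name and the sample data is made up:

```python
ARMS_EXPORT = 4    # measure type id seen in the output above
ENTITY_TYPE = 'E'  # hypothetical: check the real 'type' values of list members


def members_for_measure_type(regime_data, type_id, member_type=None):
    """Collect list members of all measures of the given type, optionally filtered by member type."""
    members = []
    for regime_json in regime_data:
        for measure in regime_json['response']['measures']:
            if measure['type']['id'] != type_id:
                continue
            # Assumption: a measure without sanction lists may lack the 'lists' key
            for sanction_list in measure.get('lists', []):
                for member in sanction_list['members']:
                    if member_type is None or member.get('type') == member_type:
                        members.append(member)
    return members


# Made-up sample in the shape of the API responses above
sample = [{'response': {'measures': [
    {'type': {'id': ARMS_EXPORT},
     'lists': [{'members': [{'type': 'P', 'name': 'Person A'},
                            {'type': ENTITY_TYPE, 'name': 'Company B'}]}]},
    {'type': {'id': 6}, 'lists': [{'members': [{'type': 'P', 'name': 'Person C'}]}]},
]}}]

print([m['name'] for m in members_for_measure_type(sample, ARMS_EXPORT)])               # ['Person A', 'Company B']
print([m['name'] for m in members_for_measure_type(sample, ARMS_EXPORT, ENTITY_TYPE)])  # ['Company B']
```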

## Option 2 - a real browser

[Puppeteer](https://pptr.dev/) - controls Chrome via the DevTools Protocol

- Practically anything a person can do in a browser
- Screenshots, navigation, scrolling, modifying the HTML, submitting forms, resizing the window...
- Programmed in JavaScript

[Pyppeteer](https://github.com/miyakogi/pyppeteer) - a Python port of Puppeteer

- `python3 -m pip install pyppeteer && pyppeteer-install` (the second command downloads Chromium, ~100 MB)
- Uses an async API (see the next slide)

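```python
import asyncio
from pyppeteer import launch

elements = []  # text of every "Name" cell found on the page

async def get_single_sanction_list():
    browser = await launch()  # starts headless Chromium
    page = await browser.newPage()
    await page.goto('https://sanctionsmap.eu/#/main/details/1/lists?search=%7B%22value%22:%22%22,%22searchType%22:%7B%7D%7D1')
    # Right after navigation the JavaScript may not have loaded the lists yet
    await page.screenshot({'path': 'sanctions-afghanistan1.png'})
    # Wait until the JavaScript has rendered the names into the page
    await page.waitForSelector('li[data-heading="Name"] div')
    await page.screenshot({'path': 'sanctions-afghanistan2.png'})

    # Read each element's text inside the browser
    for el in await page.querySelectorAll('li[data-heading="Name"] div'):
        elements.append(await page.evaluate('(el) => el.textContent', el))

    await browser.close()


try:
    running_loop = asyncio.get_running_loop()
except RuntimeError:  # no event loop running, e.g. a plain Python script
    running_loop = None

if running_loop is not None:
    # Jupyter already runs an event loop: schedule the coroutine and return at once;
    # `elements` fills up in the background (keep a reference to the task)
    task = running_loop.create_task(get_single_sanction_list())
else:
    asyncio.run(get_single_sanction_list())
```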
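In Jupyter the cell above does not wait for the scraping to finish: the event loop is already running, so the coroutine is only scheduled and `elements` fills up in the background. That is why `elements` is inspected in a separate cell. A browser-free sketch of the same behaviour, where the hypothetical `fake_scrape` stands in for `get_single_sanction_list`:

```python
import asyncio

results = []


async def fake_scrape():
    """Stand-in for the scraping coroutine: waits a moment, then stores a result."""
    await asyncio.sleep(0.1)
    results.append('done')


async def main():
    task = asyncio.ensure_future(fake_scrape())  # scheduled, not yet started
    print(results)  # [] - nothing has run yet
    await task      # in Jupyter you would instead look again in a later cell
    print(results)  # ['done']


# A plain script has no running loop, so asyncio.run() starts one.
# In Jupyter/IPython, run `await main()` in a cell instead: asyncio.run() fails inside a running loop.
asyncio.run(main())
```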

```python
elements
```

```
['\n  Abdul Hai Hazem Abdul Qader, Abdul Hai Hazem\n',
 '\n  Agha, Abdul Rahman\n',
 '\n  Sayed Mohammad Azim Agha, Sayed Mohammad Azim Agha, Agha Saheb\n',
 '\n  Sayyed Ghiassouddine Agha, Sayed Ghias, Sayed Ghiasuddin Sayed Ghousuddin, Sayyed Ghayasudin\n',
 '\n  Mohammad Ahmadi\n',
 '\n  Haji Ahmad Jan, Ahmed Jan Akhund, Ahmed Jan Akhundzada Wazir\n',
 '\n  Mohammad Essa Akhund\n',
 '\n  Attiqullah Akhund\n',
 '\n  Akhund, Shahidwror, Allahdad, Allah Dad Matin\n',
 '\n  Ubaidullah Akhund Yar Mohammed Akhund, Obaidullah Akhund, Obaid Ullah Akhund\n',
 '\n  Mohammad Abbas Akhund\n',
 '\n  Aminullah Amin Quddus, Muhammad Yusuf, Aminullah Amin\n',
 '\n  Nazirullah Hanafi Waliullah, Nazirullah Aanafi Waliullah\n',
 '\n  Muhammad Taher Anwari, Mohammad Taher Anwari, Mohammad Tahre Anwari, Haji Mudir, Muhammad Tahir Anwari\n',
 '\n  Arefullah Aref Ghazi Mohammad, Arefullah Aref\n',
 '\n  Sayed Esmatullah Asem Abdul Quddus, Asmatullah Asem, Sayed Esmatullah Asem, Esmatullah Asem\n',
 '\n  A
```
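The scraped texts still carry the whitespace from the page's HTML. A small sketch that normalises them and compares them with names obtained from the API; both helper names are our own, and it assumes the rendered page and the API describe the same lists, which is worth checking rather than taking for granted:

```python
def clean_names(raw_texts):
    """Collapse newlines and runs of spaces around and inside each scraped name."""
    return [' '.join(text.split()) for text in raw_texts]


def compare_names(browser_names, api_names):
    """Return (names only seen in the browser, names only seen in the API), both sorted."""
    browser, api = set(browser_names), set(api_names)
    return sorted(browser - api), sorted(api - browser)


scraped = ['\n  Agha, Abdul Rahman\n', '\n  Mohammad Ahmadi\n']  # two entries of `elements`
names = clean_names(scraped)
print(names)  # ['Agha, Abdul Rahman', 'Mohammad Ahmadi']
print(compare_names(names, ['Mohammad Ahmadi', 'Agha, Abdul Rahman', 'Attiqullah Akhund']))
# ([], ['Attiqullah Akhund'])
```

On the real data this could be `compare_names(clean_names(elements), aggregate_sanction_list.loc[aggregate_sanction_list['programme'] == 'AFG', 'name'])`.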