## Exploring the requests-html capabilities

#### Official documentation of the requests-html package: https://requests-html.readthedocs.io/en/latest/

Tasks:
1. Provide a list of the top movies in a given genre
2. Compute the ratio of the rate value to the rate count

In [2]:
from requests_html import HTMLSession

In [3]:
URL = "https://www.filmweb.pl/ranking/film/genre/26"
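The URL ends in a numeric genre id. The ranking heading further down ("Najlepsze filmy wojenne", i.e. "Best war films") suggests that 26 is the war genre. Task 1 could be parameterised with a small hypothetical helper, assuming that other genres use the same URL pattern with a different id. That assumption is not checked here.

```python
# Hypothetical helper: assumes every genre ranking lives at
# https://www.filmweb.pl/ranking/film/genre/<id>, which is only confirmed here for id 26.
BASE_RANKING_URL = "https://www.filmweb.pl/ranking/film/genre/{genre_id}"


def ranking_url(genre_id: int) -> str:
    """Build the (assumed) ranking URL for a numeric Filmweb genre id."""
    if genre_id <= 0:
        raise ValueError("genre_id must be a positive integer")
    return BASE_RANKING_URL.format(genre_id=genre_id)


print(ranking_url(26))  # https://www.filmweb.pl/ranking/film/genre/26
```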

In [4]:
# Open a session and submit a GET request
session = HTMLSession()
r = session.get(URL)
r.status_code

200

In [5]:
# The parsed HTML of the response is available through the '.html' attribute
r.html

<HTML url='https://www.filmweb.pl/ranking/film/genre/26'>

In [7]:
# '.absolute_links' returns every link on the page as an absolute URL ('.links' returns them as written, possibly relative)
urls = r.html.absolute_links
urls

{'https://www.facebook.com/filmweb/',
 'https://www.filmweb.pl/',
 'https://www.filmweb.pl/advertising',
 'https://www.filmweb.pl/awards',
 'https://www.filmweb.pl/canalplus',
 'https://www.filmweb.pl/characters/search',
 'https://www.filmweb.pl/contest/5770',
 'https://www.filmweb.pl/contests',
 'https://www.filmweb.pl/contribute/filmtitle/new',
 'https://www.filmweb.pl/contribute/gametitle/new',
 'https://www.filmweb.pl/contribute/seriestitle/new',
 'https://www.filmweb.pl/editorial',
 'https://www.filmweb.pl/film/%C5%81owca+jeleni-1978-1045',
 'https://www.filmweb.pl/film/%C5%9Acie%C5%BCki+chwa%C5%82y-1957-12075',
 'https://www.filmweb.pl/film/%C5%BBycie+jest+pi%C4%99kne-1997-208',
 'https://www.filmweb.pl/film/B%C4%99karty+wojny-2009-137747',
 'https://www.filmweb.pl/film/Barbie-2023-754800/vod',
 'https://www.filmweb.pl/film/Bestie-2022-10013518',
 'https://www.filmweb.pl/film/Bez+lito%C5%9Bci+3.+Ostatni+rozdzia%C5%82-2023-10012260',
 'https://www.filmweb.pl/film/Cassandro-2023-10
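`.links` returns each `href` as written in the page, and some of these may be relative. `.absolute_links` resolves them against the page URL. Conceptually, this is the same resolution that `urllib.parse.urljoin` performs.

The movie links are also percent-encoded UTF-8 with `+` standing for spaces, and `urllib.parse.unquote_plus` turns them back into readable slugs. The stdlib sketch below demonstrates both steps. The relative hrefs are illustrative examples, not values read from the page.

```python
from urllib.parse import unquote_plus, urljoin

PAGE_URL = "https://www.filmweb.pl/ranking/film/genre/26"

# Illustrative hrefs of the kind '.links' may return (two relative, one already absolute)
hrefs = ["/film/Barbie-2023-754800/vod", "vod", "https://www.filmweb.pl/awards"]

# Resolve each href against the page URL, as '.absolute_links' does conceptually
absolute = [urljoin(PAGE_URL, href) for href in hrefs]
for url in absolute:
    print(url)

# Decode a percent-encoded movie link into a readable slug ('+' becomes a space)
encoded = "https://www.filmweb.pl/film/%C5%81owca+jeleni-1978-1045"
print(unquote_plus(encoded))  # https://www.filmweb.pl/film/Łowca jeleni-1978-1045
```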

In [43]:
# Select all elements with the class 'page__group'
r.html.find(".page__group")

[<Element 'div' class=('page__group',) data-group='g1'>,
 <Element 'div' class=('page__group',) data-group='g2'>,
 <Element 'div' class=('page__group',) data-group='g3'>,
 <Element 'div' class=('page__group',) data-group='g4'>,
 <Element 'div' class=('page__group',) data-group='g5'>]
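`find()` takes a CSS selector. `.page__group` matches every element whose `class` attribute contains the token `page__group`, and it must be that exact token, not a substring. The stdlib sketch below mimics only this class-matching rule with `html.parser`. It runs on an illustrative snippet rather than the real page, and `ClassCollector` is a hypothetical name.

Since `find()` accepts full CSS selectors, a particular group could presumably also be picked by attribute instead of by position. For example, `r.html.find('div[data-group="g3"]', first=True)` should work, but it is not tested here.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect (tag, attrs) for every start tag whose class list contains `cls`."""

    def __init__(self, cls: str):
        super().__init__()
        self.cls = cls
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # class="a b" is a whitespace-separated token list; a bare attribute yields None
        classes = (attrs.get("class") or "").split()
        if self.cls in classes:
            self.matches.append((tag, attrs))


# Illustrative markup, not copied from filmweb.pl
sample = """
<div class="page__group" data-group="g1"></div>
<div class="page__group other" data-group="g2"><span class="page__group-x"></span></div>
<div class="page__groupx" data-group="g3"></div>
"""
collector = ClassCollector("page__group")
collector.feed(sample)
print([a["data-group"] for _, a in collector.matches])  # ['g1', 'g2']
```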

#### This returns a list of elements, each aggregating the content of one section of the page
#### Let's inspect the page source

![obraz.png](attachment:f99cf895-cbfd-4301-a6bd-2afc7948da57.png)

#### The content we are interested in appears to be in the div with data-group="g3", i.e. the third element of the list (index 2)
#### Let's find out what's inside it

In [96]:
r.html.find(".page__group")[2].text

'Najlepsze filmy wojenne\n1\nhttps://fwcdn.pl/fpo/12/11/1211/7254286.2.jpg\nLista Schindlera\nSchindler\'s List 1993\n8,39 10 366 119 ocen\ngatunekDramat / Wojenny\nSteven Spielberg\n2\nhttps://fwcdn.pl/fpo/02/08/208/7520031.2.jpg\nŻycie jest piękne\nLa vita è bella 1997\n8,37 10 282 700 ocen\ngatunekDramat / Komedia / Wojenny\nRoberto Benigni\n3\nhttps://fwcdn.pl/fpo/22/25/32225/7519150.2.jpg\nPianista\nThe Pianist 2002\n8,27 10 598 455 ocen\ngatunekBiograficzny / Dramat / Wojenny\nRoman Polański\n4\nhttps://fwcdn.pl/fpo/63/24/596324/7357840.2.jpg\nKorespondent Bryan\n2010\n8,19 10 7 403 oceny\ngatunekDokumentalny / Wojenny\nEugeniusz Starky\n5\nhttps://fwcdn.pl/fpo/10/92/1092/7053688.2.jpg\nCzas Apokalipsy\nApocalypse Now 1979\n8,15 10 183 643 oceny\ngatunekDramat / Wojenny\nFrancis Ford Coppola\n6\nhttps://fwcdn.pl/fpo/88/02/658802/7754988.2.jpg\nPrzełęcz ocalonych\nHacksaw Ridge 2016\n8,13 10 228 949 ocen\ngatunekBiograficzny / Dramat / Wojenny\nMel Gibson\n7\nhttps://fwcdn.pl/fpo/

#### This is the ranking we want. Let's see whether we can target the individual entries more directly

![obraz.png](attachment:4993c663-7cd4-4e53-b8f0-ecd2921503f5.png)

In [35]:
r.html.find(".rankingType__card")[0].text

"Lista Schindlera\nSchindler's List 1993\n8,39 10 366 119 ocen\ngatunekDramat / Wojenny"

#### This is the container for a single movie card. Let's try to extract each field separately

In [36]:
# Title
r.html.find(".rankingType__titleWrapper")[0].text

'Lista Schindlera'

In [52]:
print(r.html.find(".rankingType__titleWrapper")[0].text)
print(r.html.find(".rankingType__originalTitle")[0].text)
print(r.html.find(".rankingType__rate--value")[0].text)
print(r.html.find(".rankingType__rate--count")[0].text)
print(r.html.find(".rankingType__genres")[0].text)

Lista Schindlera
Schindler's List 1993
8,39
366 119 ocen
gatunekDramat / Wojenny


In [53]:
type((r.html.find(".rankingType__genres")[0].text))

str

In [54]:
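# removeprefix drops the literal label 'gatunek' ('genre'); strip('gatunek') would remove any of those letters from both ends
(r.html.find(".rankingType__genres")[0].text).removeprefix('gatunek')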

'Dramat / Wojenny'
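`.strip('gatunek')` does not remove the prefix "gatunek" ("genre"). Instead, it removes any of the characters g, a, t, u, n, e, k from both ends of the string. That happens to work for "Dramat / Wojenny", but it would truncate a genre list that ends in, for example, "Animacja". `str.removeprefix` (Python 3.9+) removes exactly the literal prefix.

The sketch below uses a hypothetical genres string to show the difference. It also includes a hypothetical helper, `clean_genres`, which additionally splits the genres into a list.

```python
# str.strip(chars) treats its argument as a *set of characters* to trim from both ends,
# not as a prefix, so it can also eat letters from the genre names.
label = "gatunekDramat / Animacja"  # hypothetical genres string

print(label.strip("gatunek"))         # Dramat / Animacj  (trailing 'a' removed too)
print(label.removeprefix("gatunek"))  # Dramat / Animacja


def clean_genres(text: str, prefix: str = "gatunek") -> list[str]:
    """Drop the literal label prefix and split the genres on '/'."""
    return [g.strip() for g in text.removeprefix(prefix).split("/")]


print(clean_genres("gatunekDramat / Komedia / Wojenny"))  # ['Dramat', 'Komedia', 'Wojenny']
```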

#### Let's print out all the required data for a single title

In [84]:
print(r.html.find(".rankingType__titleWrapper")[0].text)
print(r.html.find(".rankingType__originalTitle")[0].text)
print(r.html.find(".rankingType__rate--value")[0].text)
print(r.html.find(".rankingType__rate--count")[0].text)
print((r.html.find(".rankingType__genres")[0].text).removeprefix('gatunek'))

Lista Schindlera
Schindler's List 1993
8,39
366 119 ocen
Dramat / Wojenny
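The *Original title* field mixes the title with the release year, and for "Korespondent Bryan" it holds only the year. If the field always ends with a four-digit year, as in every row shown here, a regular expression can separate the two parts. `split_original_title` is a hypothetical helper written for this sketch.

```python
import re
from typing import Optional, Tuple

# Optional title, optional whitespace, then a 4-digit year at the very end
ORIGINAL_TITLE_RE = re.compile(r"^(.*?)\s*(\d{4})$")


def split_original_title(text: str) -> Tuple[str, Optional[int]]:
    """Split e.g. "Schindler's List 1993" into ("Schindler's List", 1993)."""
    match = ORIGINAL_TITLE_RE.match(text.strip())
    if match is None:
        return text.strip(), None  # no trailing year found
    return match.group(1), int(match.group(2))


print(split_original_title("Schindler's List 1993"))  # ("Schindler's List", 1993)
print(split_original_title("2010"))                    # ('', 2010)
```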


#### Now we collect each field for all the movies into lists

In [97]:
titles = [l.text for l in r.html.find(".rankingType__titleWrapper")]
original_titles = [l.text for l in r.html.find(".rankingType__originalTitle")]
rate = [l.text for l in r.html.find(".rankingType__rate--value")]
rate_count = [l.text for l in r.html.find(".rankingType__rate--count")]
genres = [l.text.removeprefix('gatunek') for l in r.html.find(".rankingType__genres")]

#### Number of movies found

In [98]:
len(titles)

25

#### To make the data easy to manipulate, we load it into a pandas DataFrame.

In [101]:
# Data analysis library
import pandas as pd 

#### We build the DataFrame from a dictionary that maps each column header to its list of values.

In [100]:
data = {"Title":titles,"Original title":original_titles,"Rate":rate,"Rate count":rate_count,"Genres":genres}
df = pd.DataFrame(data=data)
df

| | Title | Original title | Rate | Rate count | Genres |
|---|---|---|---|---|---|
| 0 | Lista Schindlera | Schindler's List 1993 | 8,39 | 366 119 ocen | Dramat / Wojenny |
| 1 | Życie jest piękne | La vita è bella 1997 | 8,37 | 282 700 ocen | Dramat / Komedia / Wojenny |
| 2 | Pianista | The Pianist 2002 | 8,27 | 598 455 ocen | Biograficzny / Dramat / Wojenny |
| 3 | Korespondent Bryan | 2010 | 8,19 | 7 403 oceny | Dokumentalny / Wojenny |
| 4 | Czas Apokalipsy | Apocalypse Now 1979 | 8,15 | 183 643 oceny | Dramat / Wojenny |
| 5 | Przełęcz ocalonych | Hacksaw Ridge 2016 | 8,13 | 228 949 ocen | Biograficzny / Dramat / Wojenny |
| 6 | Pluton | Platoon 1986 | 8,12 | 130 179 ocen | Dramat / Wojenny |
| 7 | Idź i patrz | Idi i smotri 1985 | 8,12 | 17 931 ocen | Dramat / Wojenny |
| 8 | Chłopiec w pasiastej piżamie | The Boy in the Striped Pyjamas 2008 | 8,11 | 405 353 oceny | Dramat / Wojenny |
| 9 | Szeregowiec Ryan | Saving Private Ryan 1998 | 8,10 | 648 206 ocen | Dramat / Wojenny |
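The *Rate* and *Rate count* columns are still strings. "8,39" uses a decimal comma, and "366 119 ocen" mixes digits, a thousands separator and the Polish word for "ratings". Task 2 therefore needs numeric versions of both columns first.

The pandas sketch below reads task 2 as the rate divided by the rate count. It runs on a two-row sample copied from the table rather than the full `df`, and the new column names are made up for the example. Removing every non-digit character also copes with the thousands separator if it turns out to be a non-breaking space.

```python
import pandas as pd

# Two-row sample copied from the table above; in the notebook, apply the same steps to df
sample_df = pd.DataFrame({
    "Title": ["Lista Schindlera", "Korespondent Bryan"],
    "Rate": ["8,39", "8,19"],
    "Rate count": ["366 119 ocen", "7 403 oceny"],
})

# "8,39" -> 8.39: swap the decimal comma for a dot, then convert to float
sample_df["Rate num"] = sample_df["Rate"].str.replace(",", ".", regex=False).astype(float)

# "366 119 ocen" -> 366119: remove every non-digit character (spaces, letters)
sample_df["Rate count num"] = (
    sample_df["Rate count"].str.replace(r"\D", "", regex=True).astype(int)
)

# Task 2: ratio of rate value to rate count
sample_df["Rate / count"] = sample_df["Rate num"] / sample_df["Rate count num"]

print(sample_df[["Title", "Rate num", "Rate count num", "Rate / count"]])
```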
