![logo.png](attachment:logo.png)

This Jupyter Notebook brings together all parts of the project.

### Dynamic scraper
#### Goal
As the title suggests, the first section contains a data scraper. Its goal is to crawl https://www.portaldrazeb.cz and collect current data about auctions and auctioneers. It also scrapes the lists of auction attributes that we will later use to filter the auctions by location, type, etc.
#### Problem
The problem is that the webpage has dynamic content: the "static" source code returned by a plain request differs from the "dynamic" one that the browser builds while rendering the page, so the data we need cannot be extracted easily. The website also does not provide an API we could use (it actually has one, but it is neither available to us nor suited to our purposes).
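To make the difference between the two kinds of source code concrete, here is a minimal sketch using only the Python standard library. Both HTML snippets, the `/example/...` paths and the helper names `LinkCollector` and `extract_links` are made up for illustration; they are not taken from portaldrazeb.cz.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> List[str]:
    """Return all link targets contained in the given HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical "static" source: an empty container that a script fills in later.
STATIC_HTML = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical "dynamic" source: the same page after the browser rendered it.
RENDERED_HTML = (
    '<html><body><div id="app">'
    '<a href="/example/1">First</a><a href="/example/2">Second</a>'
    "</div></body></html>"
)

print(extract_links(STATIC_HTML))    # []
print(extract_links(RENDERED_HTML))  # ['/example/1', '/example/2']
```

The links only exist in the rendered version, which is why a tool that drives a real browser is needed.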
#### Solution
We need proper methods to handle the dynamic content. Our solution is to install the selenium package and set up a Google Chrome webdriver. In essence, we open the webpage, collect its source code and navigate between pages. Thanks to this package (and the webdriver, which is also included in the GitHub repository), we manage to download all the data we need. A more detailed description of the individual methods can be found in the class docstring and in the comments.
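The open-collect-navigate idea can be sketched roughly as follows. This is not the code of `DataDownloader`; the helper names `make_chrome_driver` and `collect_page_sources` and the fixed `pause` are our assumptions for illustration. A real scraper would more likely wait for a specific element to appear than sleep for a fixed time.

```python
import time
from typing import List


def make_chrome_driver(headless: bool = True):
    """Create a Chrome webdriver (needs the selenium package and Chrome installed)."""
    # Imported lazily so the helper below can be tried out without a browser.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def collect_page_sources(driver, urls: List[str], pause: float = 0.0) -> List[str]:
    """Open each URL in turn and return the rendered HTML of every page.

    `driver` only needs a `get(url)` method and a `page_source` attribute,
    which is what a selenium webdriver provides.
    """
    sources = []
    for url in urls:
        driver.get(url)
        # Crude way to let scripts finish; an assumption, not the original method.
        time.sleep(pause)
        sources.append(driver.page_source)
    return sources


# Possible usage (opens a real browser, so it is left commented out):
# driver = make_chrome_driver()
# try:
#     pages = collect_page_sources(
#         driver, ["https://www.portaldrazeb.cz/drazby/pripravovane"], pause=2.0
#     )
# finally:
#     driver.quit()
```

Closing the driver in a `finally` block makes sure the browser window does not stay open if something fails halfway through.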

In [10]:
# importing the class which will do the scraping and initialising the scraper 
from dynamic_scraper import DataDownloader
down = DataDownloader()

Downloader successfully initialized!
 

    This class crawls through dynamic content of https://www.portaldrazeb.cz and collects following things:

            1) soup object for every auctioneer
            2) link to every auction + auction category (since the category is not within the auction page itself)
            3) list of all possible values from drop-down menu (auction categories, regions and districts)
    


In [7]:
# link we will need
url_auctions='https://www.portaldrazeb.cz/drazby/pripravovane'

The next lines of code will scrape the data. Please do not interact with the Google Chrome window that will open in the background; just wait until it finishes its job and closes.

In [8]:
down.get_auction_links_and_categories(url_auctions) # takes approx. 5 minutes

100%|██████████████████████████████████████████████████████████████████████████████████| 56/56 [05:00<00:00,  5.37s/it]


Auction links and categories successfully downloaded! There are 1119 auctions right now.


In [11]:
down.get_items_from_dropdown_menu(url_auctions)

Auction categories, regions and districts successfully downloaded!
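Once the rendered source code is available, reading the values of a drop-down menu could look like the sketch below. It assumes the menus are ordinary `<select>` elements with a `name` attribute, which we have not verified for portaldrazeb.cz; the sample markup, the option labels and the names `DropdownParser` and `parse_dropdowns` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class DropdownParser(HTMLParser):
    """Collect the visible option labels of every <select>, keyed by its name."""

    def __init__(self) -> None:
        super().__init__()
        self.menus: Dict[str, List[str]] = {}
        self._current: Optional[str] = None  # name of the <select> we are inside
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._current = dict(attrs).get("name") or "unnamed"
            self.menus.setdefault(self._current, [])
        elif tag == "option" and self._current is not None:
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None
            self._in_option = False
        elif tag == "option":
            self._in_option = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_option and self._current is not None and text:
            self.menus[self._current].append(text)


def parse_dropdowns(html: str) -> Dict[str, List[str]]:
    """Return a mapping from menu name to the list of its option labels."""
    parser = DropdownParser()
    parser.feed(html)
    parser.close()
    return parser.menus


# Hypothetical markup standing in for the rendered page source.
SAMPLE_HTML = """
<select name="region">
  <option value="">All</option>
  <option value="1">Region A</option>
  <option value="2">Region B</option>
</select>
<select name="category"><option>Category X</option></select>
"""

print(parse_dropdowns(SAMPLE_HTML))
# {'region': ['All', 'Region A', 'Region B'], 'category': ['Category X']}
```

Keeping the result as a dictionary of lists makes it easy to offer the same choices later when filtering the auctions.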
