Simple-Web-mining.-Beginner-level.

Here we study how easy it is to extract data from websites. Beginner level.

First step

Here we will extract data from https://pudding.cool/: the title, author and description of each article.

We will use Pandas for data manipulation, Urllib3 to open URLs, and the Beautiful Soup package to extract data from HTML files.
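The way the three libraries fit together can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the CSS selectors (`article.story`, `.title`, `.author`, `.description`), the `SAMPLE_HTML` snippet and the helper names are assumptions made for the example, and the real markup of https://pudding.cool/ has to be inspected in the browser before choosing selectors.

```python
from typing import Optional

import pandas as pd
import urllib3
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup, used only to show the parsing logic offline.
SAMPLE_HTML = """
<article class="story">
  <h3 class="title">First story</h3>
  <p class="author">Ann Author</p>
  <p class="description">A short summary.</p>
</article>
<article class="story">
  <h3 class="title">Second story</h3>
  <p class="description">No author listed.</p>
</article>
"""


def fetch_html(url: str) -> str:
    """Download a page with urllib3 and return its body as text."""
    http = urllib3.PoolManager()
    response = http.request("GET", url)
    return response.data.decode("utf-8", errors="replace")


def text_of(card: Tag, selector: str) -> Optional[str]:
    """Return the stripped text of the first match, or None if absent."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_articles(html: str) -> pd.DataFrame:
    """Turn article cards into a table with one row per article."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # ASSUMPTION: each article sits in <article class="story">.
    for card in soup.select("article.story"):
        rows.append(
            {
                "title": text_of(card, ".title"),
                "author": text_of(card, ".author"),
                "description": text_of(card, ".description"),
            }
        )
    return pd.DataFrame(rows, columns=["title", "author", "description"])
```

Calling `parse_articles(SAMPLE_HTML)` builds the table without touching the network; for the live site one would pass `fetch_html("https://pudding.cool/")` instead, after adjusting the selectors.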

And we have this table as a result of the first step:

image

Second step

Here we will extract data from https://www.work.ua/jobs-kyiv-data+analyst/: the job title and the hiring company. To get all the results for this query, we will have to download data from several pages:

image
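Walking through several result pages could look roughly like the sketch below. It assumes that later pages are reached with a `?page=N` query parameter and that each vacancy sits in a `div.job-card` with the title in `h2 a` and the company in `.company`; both the URL scheme and the selectors are assumptions to verify against the live site. The `fetch` argument is any function that takes a URL and returns HTML (for example a urllib3 wrapper), which keeps the loop easy to try offline.

```python
from typing import Callable, Dict, List

import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://www.work.ua/jobs-kyiv-data+analyst/"


def page_url(page: int) -> str:
    """Build the URL of a result page (ASSUMPTION: ?page=N pagination)."""
    return BASE_URL if page == 1 else f"{BASE_URL}?page={page}"


def parse_jobs(html: str) -> List[Dict[str, str]]:
    """Extract job title and company from one result page."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    # ASSUMPTION: hypothetical selectors, check the real markup first.
    for card in soup.select("div.job-card"):
        title = card.select_one("h2 a")
        company = card.select_one(".company")
        if title is None:
            continue  # skip cards that are not vacancies
        jobs.append(
            {
                "title": title.get_text(strip=True),
                "company": company.get_text(strip=True) if company else "",
            }
        )
    return jobs


def scrape_all(fetch: Callable[[str], str], max_pages: int = 50) -> List[Dict[str, str]]:
    """Request page after page until one comes back without vacancies."""
    jobs: List[Dict[str, str]] = []
    for page in range(1, max_pages + 1):
        found = parse_jobs(fetch(page_url(page)))
        if not found:
            break
        jobs.extend(found)
    return jobs


def to_frame(jobs: List[Dict[str, str]]) -> pd.DataFrame:
    """Collect the scraped rows into a table ready for export."""
    return pd.DataFrame(jobs, columns=["title", "company"])
```

The resulting table can then be written out with `to_frame(jobs).to_csv("jobs.csv", index=False)`; the file name here is only an example.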

And we have a CSV file as a result of the second step:

image

Third step

What if we want to gather information from the inner part of articles?

To do this, we need to collect the links to these articles and then visit each one to gather the information inside.

Study the third step to learn how to do it.
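The two-stage idea (first collect the links, then open each one) might be sketched like this. The selector `a.article-link`, the use of `<h1>` for the title and `<time>` for the publication time, and all function names are assumptions for illustration only; the site scraped in the third step may be structured differently. As before, `fetch` is any function that returns the HTML for a URL.

```python
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_links(listing_html: str, base_url: str) -> List[str]:
    """Return absolute article URLs found on a listing page, without repeats."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links: List[str] = []
    # ASSUMPTION: article links carry the hypothetical class "article-link".
    for anchor in soup.select("a.article-link[href]"):
        url = urljoin(base_url, anchor["href"])  # resolve relative hrefs
        if url not in links:
            links.append(url)
    return links


def parse_article(html: str) -> Dict[str, Optional[str]]:
    """Read the title and time from inside a single article page."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("h1")
    time_tag = soup.select_one("time")
    when = None
    if time_tag is not None:
        # Prefer the machine-readable attribute, fall back to visible text.
        when = time_tag.get("datetime") or time_tag.get_text(strip=True)
    return {
        "title": title.get_text(strip=True) if title else None,
        "time": when,
    }


def scrape_inner(
    listing_html: str, base_url: str, fetch: Callable[[str], str]
) -> List[Dict[str, Optional[str]]]:
    """Visit every collected link and gather the inner information."""
    results = []
    for url in collect_links(listing_html, base_url):
        record = parse_article(fetch(url))
        record["url"] = url
        results.append(record)
    return results
```

Keeping link collection and article parsing in separate functions means each can be checked on a saved HTML file before any real requests are made.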

image

As a result, we have the titles and times taken from inside the articles: image