Simple-Web-mining.-Beginner-level.

This repository shows how easy it is to extract data from websites. It is aimed at beginners.

First step

In this step we scrape https://pudding.cool/ and collect the title, author and description of each article.

We use Pandas for data manipulation, urllib3 to open URLs, and the Beautiful Soup package to extract data from HTML files.
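The combination of the three libraries can be sketched roughly as follows. This is only an illustration, not the repository's actual code: the sample markup, the CSS selectors (`div.story`, `.title`, `.author`, `.description`) and the helper names `fetch_html` and `parse_articles` are all assumptions, and the real page structure of pudding.cool has to be inspected in the browser first.

```python
import pandas as pd
import urllib3
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page will look different.
SAMPLE_HTML = """
<div class="story">
  <h3 class="title">First article</h3>
  <p class="author">Jane Doe</p>
  <p class="description">A short summary.</p>
</div>
<div class="story">
  <h3 class="title">Second article</h3>
  <p class="author">John Roe</p>
  <p class="description">Another summary.</p>
</div>
"""


def fetch_html(url):
    """Download a page with urllib3 and return its body as text."""
    http = urllib3.PoolManager()
    response = http.request("GET", url)
    return response.data.decode("utf-8")


def parse_articles(html):
    """Turn article cards into a DataFrame with one row per article."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.story"):  # hypothetical selector
        rows.append({
            "title": card.select_one(".title").get_text(strip=True),
            "author": card.select_one(".author").get_text(strip=True),
            "description": card.select_one(".description").get_text(strip=True),
        })
    return pd.DataFrame(rows, columns=["title", "author", "description"])


# Parse the offline sample so the sketch runs without network access.
articles = parse_articles(SAMPLE_HTML)
print(articles)
```

To try it on the live site, something like `parse_articles(fetch_html("https://pudding.cool/"))` should work once the selectors are adjusted to the real markup.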

And we have this table as a result of the first step:

Second step

In this step we scrape https://www.work.ua/jobs-kyiv-data+analyst/ and collect the job title and the hiring company of each posting. The search results are spread over several pages, so to get all of them we have to load every page.

The result of the second step is a CSV file:
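One possible way to walk through several result pages is sketched below. It rests on assumptions that need checking against the real site: that further pages are reachable through a `?page=N` query parameter, and that each posting sits in a `div.job` element with the company in a `.company` element. The helpers `page_url`, `parse_jobs`, `scrape_all_pages` and `fake_fetch` are hypothetical names, and the fake site exists only so the sketch runs offline.

```python
import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://www.work.ua/jobs-kyiv-data+analyst/"


def page_url(page):
    """Build the URL of a results page (the ?page=N pattern is an assumption)."""
    return BASE_URL if page == 1 else f"{BASE_URL}?page={page}"


def parse_jobs(html):
    """Extract job title and company from one results page."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.select("div.job"):  # hypothetical selector
        jobs.append({
            "title": card.select_one("h2").get_text(strip=True),
            "company": card.select_one(".company").get_text(strip=True),
        })
    return jobs


def scrape_all_pages(fetch, max_pages=50):
    """Collect jobs page by page and stop at the first page without jobs."""
    all_jobs = []
    for page in range(1, max_pages + 1):
        jobs = parse_jobs(fetch(page_url(page)))
        if not jobs:
            break
        all_jobs.extend(jobs)
    return all_jobs


# Offline stand-in for a real downloader, with made-up postings.
FAKE_SITE = {
    page_url(1): '<div class="job"><h2>Data Analyst</h2>'
                 '<span class="company">Acme</span></div>',
    page_url(2): '<div class="job"><h2>Junior Data Analyst</h2>'
                 '<span class="company">Globex</span></div>',
}


def fake_fetch(url):
    """Return the fake page for a URL, or an empty page if there is none."""
    return FAKE_SITE.get(url, "<html></html>")


jobs = scrape_all_pages(fake_fetch)
# to_csv without a path returns the CSV text instead of writing a file.
csv_text = pd.DataFrame(jobs, columns=["title", "company"]).to_csv(index=False)
print(csv_text)
```

Passing the downloader in as the `fetch` argument keeps the paging logic testable; for a real run, `fake_fetch` would be replaced by a function that downloads the URL, and `to_csv` would be given a file name.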

Third step

What if we want to gather information from inside the articles themselves?

To do that, we first need to collect the links to these articles and then go through them one by one to gather the information inside.

See the code of the third step to learn how to do it.
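The two-stage idea (collect links, then visit each one) might look like the sketch below. Everything site-specific here is an assumption made for illustration: the `a.article-link` selector, the use of `h1` and `time` tags inside an article, the `example.com` URLs, and the helper names `collect_links`, `parse_article`, `scrape_articles` and `fake_fetch`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_links(listing_html, base_url):
    """Return absolute article URLs found on a listing page, without repeats."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links = []
    for anchor in soup.select("a.article-link[href]"):  # hypothetical selector
        url = urljoin(base_url, anchor["href"])  # resolve relative links
        if url not in links:
            links.append(url)
    return links


def parse_article(html):
    """Extract the title and time from a single article page."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1")
    time_tag = soup.find("time")
    return {
        "title": title.get_text(strip=True) if title else None,
        "time": time_tag.get_text(strip=True) if time_tag else None,
    }


def scrape_articles(listing_html, base_url, fetch):
    """Visit every collected link and parse the page behind it."""
    return [parse_article(fetch(url))
            for url in collect_links(listing_html, base_url)]


# Made-up listing and article pages so the sketch runs offline.
LISTING = """
<a class="article-link" href="/a/1">One</a>
<a class="article-link" href="/a/2">Two</a>
<a class="article-link" href="/a/1">One again</a>
"""
PAGES = {
    "https://example.com/a/1": "<h1>First</h1><time>2020-01-01</time>",
    "https://example.com/a/2": "<h1>Second</h1>",
}


def fake_fetch(url):
    """Offline stand-in for a real downloader."""
    return PAGES[url]


results = scrape_articles(LISTING, "https://example.com/", fake_fetch)
print(results)
```

Missing elements are returned as `None` rather than raising an error, since inner pages of a real site are rarely all laid out the same way.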

As a result we have the title and time taken from inside each article:
