Perú News

This project gathers headlines from the websites of Perú's main newspapers using GitHub Actions.

Modifying the scraper

You can set up your own headline scraper by modifying settings.json:

{
  "out_path": "data",
  "sources": [
    {
      "name": "some_name",
      "url": "https://some_url",
      "selector": "css selector here"
    }
  ]
}

out_path: the directory where the data is stored.
source.name: the name of the output file for each source (e.g. some_name.json).
source.url: the URL of the page to scrape.
source.selector: the CSS selector that matches the headlines (the SelectorGadget extension for Chrome is a nice tool for finding one).
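As a rough illustration of the fields described above, here is a minimal Python sketch that parses a settings document and checks that each source has the expected keys. The function name load_settings and the validation rules are assumptions made for this example; they are not taken from the project's own code.

```python
import json

# Keys every entry in "sources" is expected to carry (per the example above).
REQUIRED_SOURCE_KEYS = ("name", "url", "selector")


def load_settings(text):
    """Parse settings JSON text and check the fields described above.

    Hypothetical helper: the real scraper may validate differently.
    """
    settings = json.loads(text)
    if "out_path" not in settings:
        raise ValueError("settings must define 'out_path'")
    for i, source in enumerate(settings.get("sources", [])):
        missing = [key for key in REQUIRED_SOURCE_KEYS if key not in source]
        if missing:
            raise ValueError(f"source #{i} is missing: {', '.join(missing)}")
    return settings


example = """
{
  "out_path": "data",
  "sources": [
    {"name": "some_name", "url": "https://some_url", "selector": "css selector here"}
  ]
}
"""

settings = load_settings(example)
print(settings["out_path"], [s["name"] for s in settings["sources"]])
```

Checking the keys up front means a typo in settings.json fails with a clear message instead of partway through a scraping run.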

The scraper is very simple: for each element the selector matches, it tries to get the href and text values and stores them in a JSON file.
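The extraction step can be sketched in Python as follows. This is a simplified stand-in, not the project's implementation: instead of evaluating a CSS selector, it uses the standard library's html.parser to collect every a element, and the class LinkCollector, the function extract_headlines, and the "href"/"text" key names are assumptions made for this example.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects an {"href", "text"} record for every <a> element.

    Hypothetical helper: a real scraper would match a CSS selector instead.
    """

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record for the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # href is None when the attribute is absent ("try to get").
            self._current = {"href": dict(attrs).get("href"), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the headline text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.records.append(self._current)
            self._current = None


def extract_headlines(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.records


sample = '<h2><a href="/politica/1">  Titular uno </a></h2><h2><a>Sin enlace</a></h2>'
records = extract_headlines(sample)
print(json.dumps(records, ensure_ascii=False, indent=2))
```

An element without an href still produces a record, with the missing value stored as null in the JSON output.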

The output data is written to a path of the form: {out_path}/YYYYMMDD/{source.name}.json
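The path layout above can be illustrated with a short Python sketch. The function name output_path is hypothetical, and the sketch assumes that YYYYMMDD is the date of the scraping run, which the text does not state explicitly.

```python
from datetime import date
from pathlib import PurePosixPath


def output_path(out_path, source_name, day):
    """Build {out_path}/YYYYMMDD/{source.name}.json for the given day."""
    return PurePosixPath(out_path) / day.strftime("%Y%m%d") / f"{source_name}.json"


print(output_path("data", "some_name", date(2024, 1, 5)))
# data/20240105/some_name.json
```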
