This project scrapes news from the open news API of the IBGE (Brazilian Institute of Geography and Statistics).

marcos-vcs/mundo-hoje-web-scraping

mundo-hoje WEB scraping API

To access the Swagger API documentation, CLICK HERE


Technologies used in development

  • Spring Boot
  • Spring Security
  • Lombok
  • Jsoup

Motivation

  • This project was created to meet a need of the mundo-hoje mobile application, which consumes the IBGE news API. The mundo-hoje project can be accessed by CLICKING HERE
  • The open IBGE news API does not return the full text of a news article, only a link to it.
  • I therefore had to develop an application that converts the news link into JSON containing the information relevant to my needs.
  • CLICK HERE to access the official and complete documentation of the IBGE news API.

Operation

  • This API exposes a single POST endpoint (baseUrl/api/news/scraping-article).
  • The request body must contain one news item in the format returned by the IBGE news API.
  • Example of an item that must be passed in the request body:
{
     "id": 35875,
     "tipo": "Notícia",
     "titulo": "Mais de 70% das empresas industriais com 100 ou mais pessoas ocupadas inovaram em 2021",
     "introducao": "Setor químico concentra a maior proporção de indústrias inovadoras - Foto: Freepik Em 2021, a taxa de inovação no Brasil foi de 70,5%, percentual relativo às empresas industriais com 100 ou mais pessoas ocupadas que lançaram um produto ou implementaram...",
     "data_publicacao": "15/12/2022 10:00:00",
     "produto_id": 35867,
     "produtos": "35867|Pesquisa de Inovação Semestral|pesquisa-de-inovacao-semestral|3065",
     "editorias": "economicas",
     "imagens": "{\"image_intro\":\"images\\/agenciadenoticias\\/estatisticas_economicas\\/2022_12\\/pintec_THUMB_freepik.jpg\",\"float_intro\":\"\",\"image_intro_alt\":\"\",\"image_intro_caption\":\"\",\"image_fulltext\":\"images\\/agenciadenoticias\\/estatisticas_economicas\\/2022_12\\/pintec_HOME_freepik.jpg\",\"float_fulltext\":\"\",\"image_fulltext_alt\":\"\",\"image_fulltext_caption\":\"\"}",
     "produtos_relacionados": "35867",
     "destaque": true,
     "link": "http://agenciadenoticias.ibge.gov.br/agencia-noticias/2012-agencia-de-noticias/noticias/35875-mais-de-70-das-empresas-industriais-com-100-ou-mais-pessoas-ocupadas-inovaram-em-2021.html"
   }
  • The API then adds a new attribute named article to that object.
  • Structure of an article object:
export class Article {
  title: string;
  subtitle: string;
  metadata: string;
  text: string;
}
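
As a rough client-side illustration of the operation described above, the following TypeScript sketch types the request item and the enriched response and builds the POST request. The `NewsItem` fields are taken from the example item. The names `NewsItem`, `ScrapedNewsItem`, and `buildScrapingRequest` are hypothetical and not part of this project. It also assumes that the response is the submitted item plus the `article` attribute and that no extra authentication headers are needed, which may not hold given that the project uses Spring Security.

```typescript
// Shape of one item from the IBGE news API (fields copied from the example above).
interface NewsItem {
  id: number;
  tipo: string;
  titulo: string;
  introducao: string;
  data_publicacao: string; // formatted like "15/12/2022 10:00:00"
  produto_id: number;
  produtos: string;
  editorias: string;
  imagens: string; // a JSON-encoded string, not a nested object
  produtos_relacionados: string;
  destaque: boolean;
  link: string;
}

// Same fields as the Article class in this README.
interface Article {
  title: string;
  subtitle: string;
  metadata: string;
  text: string;
}

// Assumption: the endpoint returns the submitted item plus an `article` attribute.
type ScrapedNewsItem = NewsItem & { article: Article };

interface ScrapingRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and request options for the scraping endpoint.
function buildScrapingRequest(baseUrl: string, item: NewsItem): ScrapingRequest {
  // Drop trailing slashes so the path is not doubled up.
  const url = `${baseUrl.replace(/\/+$/, "")}/api/news/scraping-article`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(item),
    },
  };
}
```

The returned `url` and `init` can be passed to `fetch(url, init)`, and the parsed JSON response can then be treated as a `ScrapedNewsItem`, provided the assumptions above hold for the deployed API.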


Questions or suggestions?

Feel free to open a new issue.


Thanks for visiting this repository! :sparkling_heart:

If you liked it, please leave a star. :star2:
