
Web scraping application that extracts worldwide COVID-19 data and news using PLY (Python Lex-Yacc)


gautam9595/Web-Scrapping-Covid-Data-News


Web-Scrapping-Covid-Data

An application that crawls web pages and extracts the required information from them using suitable grammar rules. Data is extracted for the countries listed below.

Country list
  • Europe

    France | UK | Russia | Italy | Germany | Spain | Poland | Netherlands | Ukraine | Belgium

  • North America

    USA | Mexico | Canada | Cuba | Costa Rica | Panama

  • Asia

    India | Turkey | Iran | Indonesia | Philippines | Japan | Israel | Malaysia | Thailand | Vietnam | Iraq | Bangladesh | Pakistan

  • South America

    Brazil | Argentina | Colombia | Peru | Chile | Bolivia | Uruguay | Paraguay | Venezuela

  • Africa

    South Africa | Morocco | Tunisia | Ethiopia | Libya | Egypt | Kenya | Zambia | Algeria | Botswana | Nigeria | Zimbabwe

  • Oceania

    Australia | Fiji | Papua New Guinea | New Caledonia | New Zealand

Crawling worldometers

Worldometer is a website that publishes coronavirus statistics at the world, continent, and country level, such as total cases, active cases, total deaths, new deaths, total recovered, serious/critical cases, and total tests.
The application extracts yesterday's data to answer the following queries for the world, all continents, and the countries listed above.

Queries

Total cases | Active cases | Total deaths | Total recovered | Total tests | Death/million | Tests/million | New case | New death | New recovered

Extracting country-wise Covid data by date

Given a country name, a start date, and an end date, the application answers the following queries.

Queries
  • Change in active cases in %
  • Change in daily death in %
  • Change in new recovered in %
  • Change in new cases in %
  • Country with the closest change in active cases in %
  • Country with the closest change in daily deaths in %
  • Country with the closest change in new recovered in %
  • Country with the closest change in new cases in %
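
The queries above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the percentage change is computed between the values on the start and end dates, and that the "closest" country is the one whose percentage change differs least from the target's. The function names and the sample figures are hypothetical.

```python
def percent_change(start_value, end_value):
    """Percentage change from start_value to end_value (assumed definition)."""
    if start_value == 0:
        raise ValueError("percentage change is undefined for a zero start value")
    return (end_value - start_value) / start_value * 100.0


def closest_country(target, changes):
    """Return the country, other than target, whose change is nearest to target's."""
    reference = changes[target]
    others = {c: v for c, v in changes.items() if c != target}
    return min(others, key=lambda c: abs(others[c] - reference))


# Hypothetical (start, end) active-case counts, not real data.
active_cases = {
    "India": (1000, 1500),
    "Brazil": (2000, 2900),
    "Japan": (500, 400),
}
changes = {c: percent_change(s, e) for c, (s, e) in active_cases.items()}
nearest = closest_country("India", changes)
```

The same two helpers would apply unchanged to daily deaths, new recovered, and new cases, since only the input series differs.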

Web-Scrapping-Covid-News

Extracting Covid worldwide News/Response

Timeline of Covid-19 is a resource hosted on Wikipedia. It reports Covid-related news and responses, both worldwide and country-specific.
Given a start date and an end date, the application extracts all worldwide Covid-related news/responses between the two dates. It also plots a word cloud of all words present in the news.

Plotting word cloud and finding covid words

Stopwords are ignored in the operations below. Given two non-overlapping date ranges, the application:

  • Extracts all the common words and the common Covid words. The Covid words used are provided in the code file.
  • Finds the percentage of Covid words among the common words
  • Finds the top-20 common words and the top-20 common Covid words
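
A rough sketch of these three steps, using only the standard library, is shown below. The stopword set and Covid word set are small placeholders (the repository ships its own Covid word list), and ranking the top words by combined frequency across both ranges is an assumption made for illustration.

```python
import re
from collections import Counter

# Placeholder sets for illustration only.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is"}
COVID_WORDS = {"covid", "vaccine", "lockdown", "pandemic"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def compare_ranges(text_a, text_b, top_n=20):
    """Compare the news text of two date ranges."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    common = counts_a.keys() & counts_b.keys()        # words in both ranges
    covid_common = common & COVID_WORDS               # common words that are Covid words
    pct = 100.0 * len(covid_common) / len(common) if common else 0.0
    combined = counts_a + counts_b
    # Rank by combined frequency (assumed), breaking ties alphabetically.
    ranked = sorted(common, key=lambda w: (-combined[w], w))
    top_common = ranked[:top_n]
    top_covid = [w for w in ranked if w in covid_common][:top_n]
    return common, covid_common, pct, top_common, top_covid


text_a = "The vaccine rollout began in the city and lockdown eased"
text_b = "Lockdown returned to the city as vaccine supply fell"
common, covid_common, pct, top_common, top_covid = compare_ranges(text_a, text_b)
```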

Extracting Date Range

Given a country, the application extracts the start and end dates for which that country's Covid news is available.

Extracting the country specific covid news

Given a country and date range,

  • The application extracts all the Covid news related to that country between the given dates.
  • The application plots a word cloud (ignoring stopwords) of all the words in the extracted news.

Finding Top 3 closest countries according to Jaccard Similarity

Given a country and a date range,

  • The application finds the top-3 countries with the most similar words according to Jaccard similarity
  • The application finds the top-3 countries with the most similar Covid words according to Jaccard similarity
    J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of words extracted from the two countries' news within the given date range
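
The Jaccard ranking can be illustrated with a short sketch. The function names and the per-country word sets are hypothetical, and treating two empty sets as having similarity 0 is an assumption of this example.

```python
def jaccard(a, b):
    """J(A, B) = |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_similar(target, word_sets, k=3):
    """Return the k countries most similar to target as (country, score) pairs."""
    scores = {
        country: jaccard(word_sets[target], words)
        for country, words in word_sets.items()
        if country != target
    }
    # Highest score first; ties are broken alphabetically by country name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical word sets per country, not extracted data.
word_sets = {
    "India": {"vaccine", "lockdown", "cases", "travel"},
    "Brazil": {"vaccine", "cases", "variant"},
    "Japan": {"olympics", "travel", "cases"},
    "Kenya": {"curfew", "vaccine"},
    "Fiji": {"tourism"},
}
top3 = top_similar("India", word_sets)
```

For the Covid-word variant, the same functions could be applied after intersecting each country's word set with the Covid word list.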
    
