An application that crawls web pages and extracts the required information from them using suitable grammar rules. Data is extracted for the following countries.
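The project's own grammar rules are not shown here. The sketch below illustrates the idea under stated assumptions: it uses a regex-based lexer in the style of tools such as PLY, with hypothetical token rules (`TOKEN_RULES`) for table rows and cells and a tiny "row of cells" grammar over them. It uses only Python's standard library. Downloading the page (for example with `urllib.request.urlopen`) is left out, so the sketch works on any HTML string.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Hypothetical token rules: each (name, regex) pair recognises one kind of
# HTML fragment, in the spirit of lexer "grammar rules".
TOKEN_RULES = [
    ("OPEN_ROW", r"<tr\b[^>]*>"),
    ("CLOSE_ROW", r"</tr>"),
    ("OPEN_CELL", r"<td\b[^>]*>"),
    ("CLOSE_CELL", r"</td>"),
    ("OTHER_TAG", r"<[^>]+>"),  # any other tag (<a>, <th>, <table>, ...) is ignored
    ("TEXT", r"[^<]+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES),
    re.IGNORECASE,
)


def tokenize(html: str) -> Iterator[Tuple[str, str]]:
    """Yield (token_name, matched_text) pairs in document order."""
    for match in MASTER.finditer(html):
        yield match.lastgroup, match.group()


def parse_rows(html: str) -> List[List[str]]:
    """Grammar: row := OPEN_ROW cell* CLOSE_ROW; cell := OPEN_CELL TEXT* CLOSE_CELL."""
    rows: List[List[str]] = []
    row: Optional[List[str]] = None
    cell: Optional[List[str]] = None
    for kind, value in tokenize(html):
        if kind == "OPEN_ROW":
            row = []
        elif kind == "CLOSE_ROW":
            if row:  # skip header rows that contain only <th> cells
                rows.append(row)
            row = None
        elif kind == "OPEN_CELL" and row is not None:
            cell = []
        elif kind == "CLOSE_CELL" and row is not None and cell is not None:
            row.append("".join(cell).strip())
            cell = None
        elif kind == "TEXT" and cell is not None:
            cell.append(value)
    return rows
```

For example, `parse_rows("<tr><td><a href='#'>India</a></td><td> 1,234 </td></tr>")` returns `[["India", "1,234"]]`. Regex token rules are brittle when a site changes its markup, so they need revisiting whenever the crawled pages change.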
Country list
-
Europe
France | UK | Russia | Italy | Germany | Spain | Poland | Netherlands | Ukraine | Belgium
-
North America
USA | Mexico | Canada | Cuba | Costa Rica | Panama
-
Asia
India | Turkey | Iran | Indonesia | Philippines | Japan | Israel | Malaysia | Thailand | Vietnam | Iraq | Bangladesh | Pakistan
-
South America
Brazil | Argentina | Colombia | Peru | Chile | Bolivia | Uruguay | Paraguay | Venezuela
-
Africa
South Africa | Morocco | Tunisia | Ethiopia | Libya | Egypt | Kenya | Zambia | Algeria | Botswana | Nigeria | Zimbabwe
-
Oceania
Australia | Fiji | Papua New Guinea | New Caledonia | New Zealand
Worldometer is a website that reports coronavirus statistics worldwide, by continent, and by country, such as total cases, active cases, total deaths, new deaths, total recovered, serious/critical cases, and total tests performed.
The application extracts yesterday's data to answer the following queries for the world, each continent, and the countries listed above.
Queries
Total cases | Active cases | Total deaths | Total recovered | Total tests | Deaths/million | Tests/million | New cases | New deaths | New recovered
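Once a country's row from the "yesterday" table has been split into cell strings, answering these queries is a lookup by field name. The sketch below assumes a hypothetical column order (`COLUMNS`) that may not match the live Worldometer table. It converts cells such as `"1,234"` or `"+56"` into numbers and treats blank or `"N/A"` cells as missing.

```python
from typing import Dict, List, Optional

# Hypothetical column order for one country row; adjust to the real page layout.
COLUMNS = [
    "country", "total_cases", "new_cases", "total_deaths", "new_deaths",
    "total_recovered", "new_recovered", "active_cases",
    "deaths_per_million", "total_tests", "tests_per_million",
]


def to_number(cell: str) -> Optional[float]:
    """Convert '1,234', '+56' or '12.5' to a float; non-numeric cells become None."""
    cleaned = cell.replace(",", "").replace("+", "").strip()
    try:
        return float(cleaned)
    except ValueError:  # blank cells, "N/A", or other text
        return None


def row_to_record(row: List[str]) -> Dict[str, object]:
    """Map the cells of one row onto the query names in COLUMNS."""
    record: Dict[str, object] = {"country": row[0].strip()}
    for name, cell in zip(COLUMNS[1:], row[1:]):
        record[name] = to_number(cell)
    return record
```

A query such as "new cases for India" then becomes `row_to_record(india_row)["new_cases"]`. Keeping missing values as `None` rather than `0` avoids reporting a country as having zero tests when the site simply has no figure.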
Given a country name, a start date, and an end date, the application answers the following queries:
Queries
- Change in active cases in %
- Change in daily deaths in %
- Change in new recovered in %
- Change in new cases in %
- Country with the most similar change in active cases in %
- Country with the most similar change in daily deaths in %
- Country with the most similar change in new recovered in %
- Country with the most similar change in new cases in %
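The README does not define "change in %" or "most similar country" precisely. A minimal sketch, assuming the change is the relative difference between the value on the start date and the value on the end date, (end − start) / start × 100, and that the most similar country is the one whose percentage differs least in absolute terms. The input mapping `series` is hypothetical.

```python
from typing import Dict, Tuple


def percent_change(start_value: float, end_value: float) -> float:
    """Relative change from start to end, in percent (undefined for a zero start)."""
    if start_value == 0:
        raise ValueError("percent change is undefined for a zero start value")
    return (end_value - start_value) / start_value * 100.0


def closest_country(target: str, series: Dict[str, Tuple[float, float]]) -> str:
    """series maps country -> (value on start date, value on end date).

    Countries with a zero start value are skipped because their change is undefined.
    """
    changes = {c: percent_change(s, e) for c, (s, e) in series.items() if s != 0}
    if target not in changes:
        raise KeyError(f"no usable data for {target}")
    target_change = changes[target]
    others = [c for c in changes if c != target]
    return min(others, key=lambda c: abs(changes[c] - target_change))
```

With active cases of India going 100 → 150 (+50%), Italy 200 → 290 (+45%) and Peru 50 → 100 (+100%), `closest_country("India", ...)` picks Italy. The same two functions serve all four metrics; only the values fed into `series` change.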
Timeline of Covid-19 is a collection of Wikipedia articles. It reports Covid-related news and responses, both worldwide and country-specific.
Given a start date and an end date, the application extracts all worldwide Covid-related news and responses between the two dates. It also plots a word cloud of all the words present in the news.
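The timeline articles group news under day headings. A minimal sketch, assuming headings look like `"14 March"` with the year known from the page, and that the news has been collected into a hypothetical `news_by_date` dictionary. It parses headings into dates, keeps the items inside the inclusive date range, and counts words. Those counts are what a word-cloud library consumes; for instance, the third-party `wordcloud` package offers `WordCloud().generate_from_frequencies(...)`. Plotting is omitted here so the sketch needs only the standard library.

```python
import re
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, List

WORD = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")  # keeps tokens like "covid-19"


def parse_heading(heading: str, year: int) -> date:
    """Turn a day heading such as '14 March' into a date (English month names)."""
    return datetime.strptime(f"{heading.strip()} {year}", "%d %B %Y").date()


def news_between(news_by_date: Dict[date, List[str]], start: date, end: date) -> List[str]:
    """All news items dated within [start, end], in chronological order."""
    return [
        item
        for day in sorted(news_by_date)
        if start <= day <= end
        for item in news_by_date[day]
    ]


def word_frequencies(texts: Iterable[str]) -> Counter:
    """Lower-cased word counts over all texts, usable as word-cloud input."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(WORD.findall(text.lower()))
    return counts
```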
For the operations below, stopwords are ignored. Given two non-overlapping date ranges:
- The application extracts all words common to both ranges, as well as the common Covid words. The list of Covid words used is provided in the code file.
- It finds the percentage of Covid words among the common words.
- It finds the top-20 common words and the top-20 common Covid words.
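A minimal sketch of these three steps, assuming the news of each range is already available as a list of strings. `STOPWORDS` and `COVID_WORDS` below are small illustrative placeholders, not the project's real lists. Two further assumptions: the percentage is computed over distinct words, and "top-20" ranks words by their combined frequency across both ranges, breaking ties alphabetically.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Placeholder lists; the project's own stopword and Covid word lists differ.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on", "for"}
COVID_WORDS = {"covid-19", "pandemic", "lockdown", "vaccine", "quarantine", "cases"}
WORD = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def tokens(texts: Iterable[str]) -> List[str]:
    """Lower-cased words of all texts, with stopwords removed."""
    words: List[str] = []
    for text in texts:
        words.extend(WORD.findall(text.lower()))
    return [w for w in words if w not in STOPWORDS]


def compare_ranges(
    texts_a: Iterable[str], texts_b: Iterable[str], top_n: int = 20
) -> Tuple[Set[str], Set[str], float, List[str], List[str]]:
    counts_a, counts_b = Counter(tokens(texts_a)), Counter(tokens(texts_b))
    common = set(counts_a) & set(counts_b)
    covid_common = common & COVID_WORDS
    covid_pct = 100.0 * len(covid_common) / len(common) if common else 0.0

    def top(words: Set[str]) -> List[str]:
        # Highest combined frequency first; alphabetical order breaks ties.
        return sorted(words, key=lambda w: (-(counts_a[w] + counts_b[w]), w))[:top_n]

    return common, covid_common, covid_pct, top(common), top(covid_common)
```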
Given a country, the application extracts the start and end dates for which that country's Covid news is available.
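Assuming a country's timeline has already been parsed into a hypothetical mapping from date to news items, the available range is just the earliest and latest dates that carry at least one item. A minimal sketch:

```python
from datetime import date
from typing import Dict, List, Tuple


def available_range(news_by_date: Dict[date, List[str]]) -> Tuple[date, date]:
    """First and last dates with at least one news item for a country."""
    days = [day for day, items in news_by_date.items() if items]
    if not days:
        raise ValueError("no news available for this country")
    return min(days), max(days)
```

Dates with an empty item list are ignored, so a trailing heading with no entries does not extend the reported range.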
Given a country and a date range:
- The application extracts all the Covid news related to that country between the given dates.
- It plots a word cloud of all the words in the extracted news (ignoring word cloud).
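A minimal sketch of the per-country extraction, assuming a hypothetical store that maps each country to its dated news items. It returns the news inside the inclusive range and also reduces that news to a set of non-stopword words. The `STOPWORDS` set here is a small placeholder, not the project's list.

```python
import re
from datetime import date
from typing import Dict, List, Set

# Hypothetical storage: country -> date -> list of news sentences.
NewsStore = Dict[str, Dict[date, List[str]]]

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on", "for"}
WORD = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def country_news(store: NewsStore, country: str, start: date, end: date) -> List[str]:
    """News items for one country dated within [start, end], oldest first."""
    by_date = store.get(country, {})
    return [item for day in sorted(by_date) if start <= day <= end for item in by_date[day]]


def word_set(texts: List[str]) -> Set[str]:
    """Distinct lower-cased words of the texts, minus stopwords."""
    words: Set[str] = set()
    for text in texts:
        words.update(WORD.findall(text.lower()))
    return words - STOPWORDS
```

An unknown country yields an empty list rather than an error, which keeps batch comparisons across many countries simple.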
Given a country and a date range:
- The application finds the top-3 countries with the most similar words, according to Jaccard similarity.
- The application finds the top-3 countries with the most similar Covid words, according to Jaccard similarity.
J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of words extracted from the news of the two countries within the given date range.
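A minimal sketch of the ranking, assuming each country's news for the date range has already been reduced to a set of words in a hypothetical `words_by_country` mapping. It scores every other country with J(A, B) = |A ∩ B| / |A ∪ B| and keeps the best three. Treating two empty sets as similarity 0.0 is an assumption, since the formula is undefined there.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_similar(
    target: str, words_by_country: Dict[str, Set[str]], k: int = 3
) -> List[Tuple[str, float]]:
    """The k countries most similar to target, highest Jaccard score first."""
    base = words_by_country[target]
    scores = [(c, jaccard(base, w)) for c, w in words_by_country.items() if c != target]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken alphabetically
    return scores[:k]
```

For the Covid-word variant, intersect every set with the Covid word list first, e.g. `{c: w & covid_words for c, w in words_by_country.items()}`, and pass that mapping to `top_similar`.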