## Automate news article collection for model training using a web scraper

#### Import libraries for web scraping

In [6]:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

In this example, we demonstrate scraping news articles from the fact-checking website **Factly**. You can substitute any website of your choice.

In [7]:
url = "https://factly.in/category/english/"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)

In [8]:
html = webpage.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
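
The request, read, and parse steps above are repeated later for every article, so they can be wrapped in one helper. The following is a minimal sketch, not part of the original notebook: the names `fetch_soup`, `opener`, and `fake_opener` and the 10-second timeout are assumptions chosen for illustration. The `opener` parameter exists only so the helper can be exercised without network access.

```python
import io
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_soup(url, opener=urlopen, timeout=10):
    """Download `url` and return it parsed as a BeautifulSoup tree.

    `opener` and `timeout` are illustrative parameters (assumptions).
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The response is closed automatically when the with-block exits.
    with opener(req, timeout=timeout) as response:
        html = response.read().decode("utf-8")
    return BeautifulSoup(html, "html.parser")


# Offline demonstration with a stand-in opener (hypothetical, no network needed).
def fake_opener(req, timeout=None):
    return io.BytesIO(b"<html><body><h1>Hello</h1></body></html>")


demo_soup = fetch_soup("https://example.com/", opener=fake_opener)
print(demo_soup.h1.get_text())
```

With the default `opener`, calling `fetch_soup(url)` performs the same steps as the two cells above, with a timeout added so a stalled connection does not hang the notebook.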

After inspecting the page's HTML, we apply the logic below to collect the article links: each `<article>` inside the `col-8 main-content` container holds an `<h2>` whose `<a>` points to the full article.

In [9]:
articles = [
    article.find("h2").find("a").get("href")
    for article in soup.find(class_="col-8 main-content").find_all("article")
]
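
The chained `find` calls above raise `AttributeError` as soon as one article lacks an `<h2>` or a link. A more forgiving variant, sketched here under assumptions, uses a CSS selector so that incomplete entries are simply skipped. The function name `extract_article_links` and the sample HTML are hypothetical and only mimic the structure used above.

```python
from bs4 import BeautifulSoup


def extract_article_links(soup):
    """Return the href of every article headline link, skipping incomplete entries."""
    # The selector matches both classes regardless of their order in the attribute,
    # then walks down to headline anchors that actually carry an href.
    selector = ".col-8.main-content article h2 a[href]"
    return [anchor["href"] for anchor in soup.select(selector)]


# Hypothetical sample page: the second article has no link and is skipped.
sample_html = """
<div class="main-content col-8">
  <article><h2><a href="https://example.com/a">A</a></h2></article>
  <article><h2>No link here</h2></article>
  <article><h2><a href="https://example.com/b">B</a></h2></article>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_links = extract_article_links(sample_soup)
print(sample_links)
```

If the container is missing altogether, the selector matches nothing and the function returns an empty list instead of failing.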

In [10]:
articles

['https://factly.in/old-video-of-justin-trudeau-before-he-became-the-pm-of-canada-shared-as-him-living-now-without-security/',
 'https://factly.in/ram-nath-kovind-has-not-made-these-statements-praising-mahatma-gandhi-and-the-congress-party/',
 'https://factly.in/ruchi-soyas-debt-of-%e2%82%b92212-crores-was-written-off-even-before-its-acquisition-by-patanjali-ayurvedic/',
 'https://factly.in/an-edited-photo-is-falsely-shared-as-an-amazing-picture-of-a-golden-snake/',
 'https://factly.in/president-droupadi-murmu-did-not-issue-any-orders-prohibiting-non-veg-food-and-alcohol-inside-rashtrapati-bhavan/',
 'https://factly.in/old-video-of-a-pakistani-citizen-threatening-an-electricity-officer-in-karachi-shared-as-that-from-india/',
 'https://factly.in/the-man-seen-burning-the-indian-national-flag-in-these-photos-has-already-been-arrested/',
 'https://factly.in/2017-news-clipping-of-forbes-listing-india-as-asias-most-corrupt-country-is-shared-as-recent/',
 'https://factly.in/this-comparison-of

In [11]:
for article_url in articles:
    # Skip URLs marked as review, data, or explainer pieces.
    # Note: this only matches when the marker (colon included) appears literally in the URL.
    if "review:" not in article_url and "data:" not in article_url and "explainer:" not in article_url:
        # Download and parse the individual article page
        article_req = Request(article_url, headers={'User-Agent': 'Mozilla/5.0'})
        article_page = urlopen(article_req)
        article_html = article_page.read().decode("utf-8")
        article_soup = BeautifulSoup(article_html, "html.parser")
        # Full text of the article body
        article_content = article_soup.find(class_ = "post-content-right").get_text()
        # The third <strong> in the first blockquote is assumed to hold the verdict
        article_nature = article_soup.find("blockquote").find_all("strong")[2].get_text()
        print(f"Article: \n{article_content}\n")
        print(f"Nature of the article: {article_nature}")
        print("\n\n")

Article: 
A post containing a video of Canadian Prime Minister Justin Trudeau walking into a building without security is widely shared on social media. The post description reads, ‘This is the Prime Minister of Canada; he lives without security.’ Let’s fact-check the claim via this article.Claim: The Prime Minister of Canada, Justin Trudeau, lives without security.Fact: The post contains an old video of Justin Trudeau before he became the Prime Minister of Canada. Contrary to what the post claims, the Canadian Prime Minister is guarded by a special team of the Royal Canadian Mounted Police (RCMP) named Prime Minister’s Protective Detail (PMPD). Hence the claim made in the post is MISLEADING.After doing a reverse image search on the screenshots of the video, we found a longer version of the video on YouTube. CBS News uploaded the video, and its title reads. ‘Exclusive: Justin Trudeau arrives at The Peace Tower in Ottawa.’Uploaded on 21st October 2015, the video description reads, ‘The 

Article: 
A post is being shared on social media claiming that the new President Droupadi Murmu had issued orders to completely ban any kind of non-vegetarian feasts or drink inside the Rashtrapati Bhavan. Let’s verify the claim made in the post.Claim: President Droupadi Murmu ordered a complete ban on non-vegetarian food and alcohol consumption inside the Rashtrapati Bhavan.Fact: President Droupadi Murmu did not issue any such orders prohibiting non-vegetarian feasts and drinking inside the Rashtrapati Bhavan. PIB Fact-check through a tweet clarified that no such new changes have been made inside the Rashtrapati Bhavan. Hence, the claim made in the post is FALSE.When we searched to check whether President Droupadi Murmu had issued any such new instructions prohibiting non-vegetarian feasts and drinking in Rashtrapati Bhavan, we could not find any news report confirming this news. If President Droupadi Murmu had issued any such orders, the media would have prominently reported it. Acco

Article: 
A collage of a few photos is being shared widely on social media with a claim that it shows the difference between the prison cells of Savarkar on one hand, and Gandhi and Nehru on the other hand. The prison cells are being compared and it is being claimed that while Gandhi and Nehru got VIP facilities, Savarkar got none. Let’s fact-check the claim made in the post.Claim: Comparison of prison cells – while Gandhi and Nehru got VIP facilities, Savarkar got none.Fact: The posted pictures are true [Cellular Jail (Savarkar between 1911-21), Ahmednagar Fort Jail (Nehru between 1942-45), and Aga Khan Palace (Gandhi between 1942-44)]. However, the comparison of prison cells made in the post is misleading. Savarkar was moved to the Cellular jail as he attempted an escape and was sentenced to 50 years in prison (two life terms). But Nehru and Gandhi did not receive any such sentence, they were only arrested in connection with the Quit India movement. Hence the claim made in the post i
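
In the output above, sentences from neighbouring HTML elements run together (for example `article.Claim:`) because `get_text()` joins text nodes without a separator. Since the goal is training data, the articles can also be collected as records rather than printed. The sketch below is illustrative only: `parse_article` and the sample HTML are hypothetical, and it assumes the same page structure as the loop above (a `post-content-right` container and a blockquote whose third `<strong>` holds the verdict).

```python
from bs4 import BeautifulSoup


def parse_article(article_soup):
    """Return {'content', 'nature'} for one article page, or None if no body is found."""
    content_tag = article_soup.find(class_="post-content-right")
    if content_tag is None:
        return None
    # separator=" " keeps text from adjacent elements from being glued together.
    content = content_tag.get_text(separator=" ", strip=True)
    quote = article_soup.find("blockquote")
    strongs = quote.find_all("strong") if quote is not None else []
    # Fall back to None when the expected third <strong> is absent.
    nature = strongs[2].get_text(strip=True) if len(strongs) > 2 else None
    return {"content": content, "nature": nature}


# Hypothetical article page mimicking the assumed structure.
sample_article_html = """
<div class="post-content-right">
  <p>Intro sentence.</p>
  <blockquote>
    <p><strong>Claim:</strong> X.</p>
    <p><strong>Fact:</strong> Y. Hence the claim is <strong>FALSE</strong>.</p>
  </blockquote>
</div>
"""
record = parse_article(BeautifulSoup(sample_article_html, "html.parser"))
print(record["nature"])
```

Appending each non-`None` record to a list inside the loop would yield a dataset that can be written out with, for example, the standard `json` or `csv` modules.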