## Getting news articles from BBC News using NewsAPI

This notebook shows how to collect BBC News articles published on March 11 and March 12, 2021, using NewsAPI. The following information is collected for each article:

- date
- title
- full article
- author

### What is NewsAPI  

NewsAPI is a simple REST API that returns JSON metadata for relevant headlines based on a query. It retrieves articles and breaking news headlines from news sources and blogs across the web. The documentation is available at https://newsapi.org/docs.

### Endpoints  

- https://newsapi.org/docs/endpoints/everything
- https://newsapi.org/docs/endpoints/top-headlines
- https://newsapi.org/docs/endpoints/sources

### Setting up the endpoint and API Key

In [1]:
end_point = 'https://newsapi.org/v2/everything?'
api_key = "105398d6a62a4c7f918f74c03e01a192"
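A key written directly in a notebook is visible to anyone who can read the file. As a minimal sketch, the key could be read from an environment variable instead. The variable name `NEWSAPI_KEY` and the helper `load_api_key` are assumptions made for illustration; they are not part of NewsAPI.

```python
import os

def load_api_key(env=os.environ, name="NEWSAPI_KEY"):
    """Return the API key stored under `name`, or raise if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key

# A plain dict stands in for os.environ so the sketch runs without a real key
demo_key = load_api_key({"NEWSAPI_KEY": "your-key-here"})
print(demo_key)
```

In the notebook itself, `api_key = load_api_key()` would then replace the literal string.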

### Importing libraries  

The following libraries are used to request the news articles and to inspect the results: `requests` sends the HTTP requests, and `pandas` displays the articles as a table.

In [2]:
import requests
import pandas as pd

### Setting up the query parameters

In [3]:
query_11 = {
    'domains': 'bbc.co.uk',
    'language': 'en',
    'from' : '2021-03-11',
    'to': '2021-03-11',
    'pageSize': 50,
    'apiKey': api_key
}

query_12 = {
    'domains': 'bbc.co.uk',
    'language': 'en',
    'from' : '2021-03-12',
    'to': '2021-03-12',
    'pageSize': 50,
    'apiKey': api_key
}
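The two dictionaries differ only in the date. One possible way to avoid the repetition is a small helper that builds the parameters for a given day; `build_query` and the `"demo-key"` placeholder are hypothetical names used only in this sketch.

```python
def build_query(day, api_key, domains="bbc.co.uk", page_size=50):
    """Build the query parameters for a single calendar day (YYYY-MM-DD)."""
    return {
        'domains': domains,
        'language': 'en',
        'from': day,
        'to': day,
        'pageSize': page_size,
        'apiKey': api_key,
    }

# "demo-key" is a placeholder, not a working key
queries = {day: build_query(day, "demo-key") for day in ("2021-03-11", "2021-03-12")}
print(queries["2021-03-12"]["from"])  # 2021-03-12
```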

### Request BBC news articles from March 11, 2021 to March 12, 2021

In [4]:
response_11 = requests.get(end_point, params=query_11)
results_11 = response_11.json()
results_11

{'status': 'ok',
 'totalResults': 224,
 'articles': [{'source': {'id': 'bbc-news', 'name': 'BBC News'},
   'author': 'https://www.facebook.com/bbcnews',
   'title': 'De Blasio: NYC mayor calls on Cuomo to quit over harassment claims',
   'description': 'Andrew Cuomo has been accused of sexual misconduct by six women, allegations he denies.',
   'url': 'https://www.bbc.co.uk/news/world-us-canada-56367991',
   'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/12A7E/production/_117541467_mediaitem117541466.jpg',
   'publishedAt': '2021-03-11T23:21:24Z',
   'content': 'image copyrightGetty Images\r\nimage captionBill de Blasio called the allegations against Andrew Cuomo "disgusting"\r\nNew York City Mayor Bill de Blasio has urged Governor Andrew Cuomo to resign over al… [+2675 chars]'},
  {'source': {'id': 'bbc-news', 'name': 'BBC News'},
   'author': 'https://www.facebook.com/bbcnews',
   'title': 'Covid in Scotland: Call to increase wedding guest numbers',
   'description': 

In [5]:
response_12 = requests.get(end_point, params=query_12)
results_12 = response_12.json()
results_12

{'status': 'ok',
 'totalResults': 226,
 'articles': [{'source': {'id': 'bbc-news', 'name': 'BBC News'},
   'author': None,
   'title': 'Mozambique’s president sacks top generals amid surge of insecurity',
   'description': 'No official reason has been given for the sackings but there is growing concern that the military has been unable to prevent attacks by Islamist militants in Cabo Delgado province, in the north of the country.\n\nAlso in the programme: Large parts of Italy are …',
   'url': 'https://www.bbc.co.uk/programmes/w172x2z7gy3ng1q',
   'urlToImage': 'https://ichef.bbci.co.uk/images/ic/1200x675/p099k961.jpg',
   'publishedAt': '2021-03-12T23:57:00Z',
   'content': 'No official reason has been given for the sackings but there is growing concern that the military has been unable to prevent attacks by Islamist militants in Cabo Delgado province, in the north of th… [+358 chars]'},
  {'source': {'id': 'bbc-news', 'name': 'BBC News'},
   'author': 'https://www.facebook.com/bbcnew
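Both responses report `'status': 'ok'`, and `totalResults` (224 and 226) is larger than the `pageSize` of 50, so each request returns only part of the matching articles. The sketch below checks the status before using a response and estimates how many pages a full download would take. It assumes that a failed request carries `'status': 'error'` together with `code` and `message` fields; the helper names and the sample payloads are made up for illustration.

```python
import math

def get_articles(payload):
    """Return the article list from a decoded response, or raise on failure."""
    if payload.get('status') != 'ok':
        raise RuntimeError(f"{payload.get('code')}: {payload.get('message')}")
    return payload.get('articles', [])

def pages_needed(total_results, page_size):
    """Number of requests needed to cover every result at the given page size."""
    return math.ceil(total_results / page_size)

# Hand-written payload standing in for response_11.json()
sample = {'status': 'ok', 'totalResults': 224, 'articles': [{'title': 'Example'}]}
print(len(get_articles(sample)))
print(pages_needed(sample['totalResults'], 50))
```

Fetching the remaining pages would rely on the API's `page` parameter, which this notebook does not use.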

### Extracting needed information and saving to a JSON file

In [6]:
articles_11 = results_11['articles']
articles_11

[{'source': {'id': 'bbc-news', 'name': 'BBC News'},
  'author': 'https://www.facebook.com/bbcnews',
  'title': 'De Blasio: NYC mayor calls on Cuomo to quit over harassment claims',
  'description': 'Andrew Cuomo has been accused of sexual misconduct by six women, allegations he denies.',
  'url': 'https://www.bbc.co.uk/news/world-us-canada-56367991',
  'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/12A7E/production/_117541467_mediaitem117541466.jpg',
  'publishedAt': '2021-03-11T23:21:24Z',
  'content': 'image copyrightGetty Images\r\nimage captionBill de Blasio called the allegations against Andrew Cuomo "disgusting"\r\nNew York City Mayor Bill de Blasio has urged Governor Andrew Cuomo to resign over al… [+2675 chars]'},
 {'source': {'id': 'bbc-news', 'name': 'BBC News'},
  'author': 'https://www.facebook.com/bbcnews',
  'title': 'Covid in Scotland: Call to increase wedding guest numbers',
  'description': 'The wedding industry says it would be safe to allow up to 50 g

Let's put these articles in a DataFrame so we can view the data more easily.

In [7]:
articles_df_11 = pd.DataFrame(articles_11)
articles_df_11

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | {'id': 'bbc-news', 'name': 'BBC News'} | https://www.facebook.com/bbcnews | De Blasio: NYC mayor calls on Cuomo to quit ov... | Andrew Cuomo has been accused of sexual miscon... | https://www.bbc.co.uk/news/world-us-canada-563... | https://ichef.bbci.co.uk/news/1024/branded_new... | 2021-03-11T23:21:24Z | image copyrightGetty Images\r\nimage captionBi... |
| 1 | {'id': 'bbc-news', 'name': 'BBC News'} | https://www.facebook.com/bbcnews | Covid in Scotland: Call to increase wedding gu... | The wedding industry says it would be safe to ... | https://www.bbc.co.uk/news/uk-scotland-56360130 | https://ichef.bbci.co.uk/news/1024/branded_new... | 2021-03-11T23:15:16Z | image copyrightGetty Images\r\nimage captionTh... |
| 2 | {'id': 'bbc-news', 'name': 'BBC News'} | https://www.facebook.com/bbcnews | The Scottish mercenary hired to kill Pablo Esc... | How a team of British combatants travelled to ... | https://www.bbc.co.uk/news/uk-scotland-56332300 | https://ichef.bbci.co.uk/news/1024/branded_new... | 2021-03-11T23:15:11Z | By Steven BrocklehurstBBC Scotland News\r\nima... |
| 3 | {'id': 'bbc-news', 'name': 'BBC News'} | | The drone's-eye view of a bowling alley that w... | A one-shot drone tour of a Minneapolis establi... | https://www.bbc.co.uk/news/av/world-us-canada-... | https://ichef.bbci.co.uk/images/ic/400xn/p099f... | 2021-03-11T23:10:50Z | Syria: Two women, 10 years on. Video, 00:03:00... |
| 4 | {'id': 'bbc-news', 'name': 'BBC News'} | | 2021/03/11 23:00 GMT | The latest five minute news bulletin from BBC ... | https://www.bbc.co.uk/programmes/w172x5pc0dlwm09 | https://ichef.bbci.co.uk/images/ic/1200x675/p0... | 2021-03-11T23:06:00Z | The latest five minute news bulletin from BBC ... |
| 5 | {'id': 'bbc-news', 'name': 'BBC News'} | | Transfer rumours: Ronaldo, Messi, Haaland, Com... | Paris St-Germain monitor Cristiano Ronaldo's s... | https://www.bbc.co.uk/sport/56364611 | https://ichef.bbci.co.uk/live-experience/cps/6... | 2021-03-11T23:01:45Z | Paris St-Germain are monitoring Cristiano Rona... |
| 6 | {'id': 'bbc-news', 'name': 'BBC News'} | https://www.facebook.com/bbcnews | Cumbria coal mine: 'Increased controversy' pro... | Ministers were warned the proposal is damaging... | https://www.bbc.co.uk/news/uk-politics-56364306 | https://ichef.bbci.co.uk/news/1024/branded_new... | 2021-03-11T23:00:25Z | image copyrightWest Cumbria Mining Company\r\n... |
| 7 | {'id': 'bbc-news', 'name': 'BBC News'} | | Manchester United 1-1 AC Milan: Ole Gunnar Sol... | Manchester United manager Ole Gunnar Solskjaer... | https://www.bbc.co.uk/sport/av/football/56368771 | https://ichef.bbci.co.uk/live-experience/cps/6... | 2021-03-11T22:46:00Z | Manchester United manager Ole Gunnar Solskjaer... |
| 8 | {'id': 'bbc-news', 'name': 'BBC News'} | | Rangers: Steven Gerrard stresses importance of... | Steven Gerrard hopes he can keep veteran Allan... | https://www.bbc.co.uk/sport/football/56368139 | https://ichef.bbci.co.uk/live-experience/cps/6... | 2021-03-11T22:45:15Z | Allan McGregor pulled of a brilliant stop in t... |
| 9 | {'id': 'bbc-news', 'name': 'BBC News'} | https://www.facebook.com/bbcnews | Pentagon rebukes Fox host Tucker Carlson for m... | Fox News host Tucker Carlson had called female... | https://www.bbc.co.uk/news/world-us-canada-563... | https://ichef.bbci.co.uk/news/1024/branded_new... | 2021-03-11T22:44:30Z | The Pentagon has condemned a Fox News host who... |


Let's convert the 'publishedAt' timestamp string into a more readable "YYYY-MM-DD" date before appending the data to a JSON object.

In [8]:
# Convert an ISO 8601 timestamp string to "YYYY-MM-DD" format
import datetime

new_format = "%Y-%m-%d"

def convertDate(date):
    # Parse the 'publishedAt' value, e.g. "2021-03-11T23:21:24Z"
    parsed = datetime.datetime.strptime(date, "%Y-%m-%dT%H:%M:%SZ")
    # Keep only the date part
    return parsed.strftime(new_format)
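The trailing `Z` in 'publishedAt' marks the timestamp as UTC, so the date that comes out of the conversion is the UTC date. The sketch below illustrates how an article published late in the evening would fall on the next calendar day in another timezone. The function `to_local_date` and the UTC+8 offset are assumptions chosen only for this example.

```python
from datetime import datetime, timedelta, timezone

def to_local_date(published_at, utc_offset_hours=0):
    """Parse a 'publishedAt' string as UTC and return the date at the given offset."""
    utc_time = datetime.strptime(published_at, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    local_time = utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local_time.strftime("%Y-%m-%d")

print(to_local_date("2021-03-11T23:21:24Z"))     # 2021-03-11
print(to_local_date("2021-03-11T23:21:24Z", 8))  # 2021-03-12
```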

First, let's get the articles from March 11, 2021.

In [9]:
articles_info = []

for article in articles_11:
    date = convertDate(article['publishedAt'])
    title = article['title']
    full_article = article['content']
    author = article['author']

    articles_info.append({
        "date": date,
        "title": title,
        "full_article": full_article,
        "author": author
    })

articles_info

[{'date': '2021-03-11',
  'title': 'De Blasio: NYC mayor calls on Cuomo to quit over harassment claims',
  'full_article': 'image copyrightGetty Images\r\nimage captionBill de Blasio called the allegations against Andrew Cuomo "disgusting"\r\nNew York City Mayor Bill de Blasio has urged Governor Andrew Cuomo to resign over al… [+2675 chars]',
  'author': 'https://www.facebook.com/bbcnews'},
 {'date': '2021-03-11',
  'title': 'Covid in Scotland: Call to increase wedding guest numbers',
  'full_article': "image copyrightGetty Images\r\nimage captionThe rules across most of Scotland mean that only five people - including the couple - can attend a wedding ceremony\r\nScotland's wedding industry is calling f… [+4562 chars]",
  'author': 'https://www.facebook.com/bbcnews'},
 {'date': '2021-03-11',
  'title': 'The Scottish mercenary hired to kill Pablo Escobar',
  'full_article': 'By Steven BrocklehurstBBC Scotland News\r\nimage copyr
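Each 'content' value in the output ends with a marker such as `[+2675 chars]`, which suggests the field holds only the beginning of the article rather than the complete text. As a rough sketch, the marker can be parsed to see how much text was left out; the helper `hidden_chars` and its regular expression are assumptions based only on the pattern visible in the output above.

```python
import re

# Matches a trailing marker like "[+2675 chars]"
TRUNCATION = re.compile(r"\[\+(\d+) chars\]$")

def hidden_chars(content):
    """Return the number of characters the trailing marker reports as cut off."""
    if not content:
        return 0
    match = TRUNCATION.search(content.strip())
    return int(match.group(1)) if match else 0

sample = 'New York City Mayor Bill de Blasio has urged Governor Andrew Cuomo to resign over al… [+2675 chars]'
print(hidden_chars(sample))  # 2675
```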

Next, we append the articles from March 12, 2021.

In [10]:
articles_12 = results_12['articles']

for article in articles_12:
    date = convertDate(article['publishedAt'])
    title = article['title']
    full_article = article['content']
    author = article['author']

    articles_info.append({
        "date": date,
        "title": title,
        "full_article": full_article,
        "author": author
    })

articles_info

[{'date': '2021-03-11',
  'title': 'De Blasio: NYC mayor calls on Cuomo to quit over harassment claims',
  'full_article': 'image copyrightGetty Images\r\nimage captionBill de Blasio called the allegations against Andrew Cuomo "disgusting"\r\nNew York City Mayor Bill de Blasio has urged Governor Andrew Cuomo to resign over al… [+2675 chars]',
  'author': 'https://www.facebook.com/bbcnews'},
 {'date': '2021-03-11',
  'title': 'Covid in Scotland: Call to increase wedding guest numbers',
  'full_article': "image copyrightGetty Images\r\nimage captionThe rules across most of Scotland mean that only five people - including the couple - can attend a wedding ceremony\r\nScotland's wedding industry is calling f… [+4562 chars]",
  'author': 'https://www.facebook.com/bbcnews'},
 {'date': '2021-03-11',
  'title': 'The Scottish mercenary hired to kill Pablo Escobar',
  'full_article': 'By Steven BrocklehurstBBC Scotland News\r\nimage copyr
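The extraction loop is the same for both days, so it could be wrapped in one function and applied to each list. The sketch below is one possible shape for that; `extract_info`, `convert_date`, and the two sample lists are hypothetical stand-ins for the notebook's real data.

```python
from datetime import datetime

def convert_date(published_at):
    """Turn a 'publishedAt' timestamp into a YYYY-MM-DD string."""
    return datetime.strptime(published_at, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m-%d")

def extract_info(articles):
    """Keep only the date, title, content, and author of each article."""
    return [
        {
            "date": convert_date(article['publishedAt']),
            "title": article['title'],
            "full_article": article['content'],
            "author": article.get('author'),  # may be None, as in the output above
        }
        for article in articles
    ]

# Tiny made-up lists standing in for articles_11 and articles_12
sample_11 = [{'publishedAt': '2021-03-11T23:21:24Z', 'title': 'A', 'content': 'text', 'author': None}]
sample_12 = [{'publishedAt': '2021-03-12T23:57:00Z', 'title': 'B', 'content': 'text', 'author': 'x'}]
combined = extract_info(sample_11) + extract_info(sample_12)
print([row['date'] for row in combined])  # ['2021-03-11', '2021-03-12']
```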

Then we import json to write the dataset to a JSON file.

In [11]:
import json 

json_object = json.dumps(articles_info, indent = 4) 

# Writing to newsarticles_villanueva.json
with open("newsarticles_villanueva.json", "w") as outfile: 
    outfile.write(json_object) 
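Some titles contain non-ASCII characters, such as the apostrophe in "Mozambique’s". By default the `json` module writes these as `\uXXXX` escapes, which still load correctly but are harder to read in the file. A minimal sketch of an alternative is shown below, using `ensure_ascii=False` with an explicit UTF-8 encoding and then reading the file back as a check. The temporary path and the one-record list are placeholders for this example.

```python
import json
import os
import tempfile

records = [{"title": "Mozambique’s president sacks top generals amid surge of insecurity"}]

# Write to a throwaway location so the sketch does not touch the real output file
path = os.path.join(tempfile.mkdtemp(), "articles_demo.json")
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(records, outfile, indent=4, ensure_ascii=False)

# Read the file back to confirm nothing was lost
with open(path, encoding="utf-8") as infile:
    loaded = json.load(infile)

print(loaded == records)  # True
```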