# Getting data directly from a website

This notebook collects data from [CNN News](https://edition.cnn.com/) using NewsAPI: it gathers the details of news articles published on March 11 and 12, 2021, and saves them in a JSON file.

### Import necessary libraries

In [1]:
import requests
import pprint
import json

### Set URL and parameters

In [2]:
url = 'https://newsapi.org/v2/everything'  # requests appends the query string itself
api_key = '90f6bc83c6214d1ca353346dba8de01e'
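
Hard-coding the key in the notebook exposes it to anyone who can read the file. Below is a minimal sketch of one alternative, assuming the key has been stored in an environment variable beforehand. The variable name `NEWSAPI_KEY` and the helper `load_api_key` are hypothetical and not part of the original notebook.

```python
import os


def load_api_key(env_var="NEWSAPI_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, the cell above could use `api_key = load_api_key()` instead of a literal string.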

In [3]:
parameters = {
    'domains': 'cnn.com',
    'pageSize': 100,
    'apiKey': api_key,
    'from': '2021-03-11',
    'to': '2021-03-12',
    'language': 'en'
}
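# Send the GET request; the timeout keeps the cell from hanging indefinitely
response = requests.get(url, params=parameters, timeout=30)
# Decode the JSON body into a Python dictionary
response_json = response.json()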
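
Before using the result, it can help to confirm that the request succeeded. Below is a minimal sketch, assuming the response body carries a `status` field set to `"ok"` on success and, on failure, `code` and `message` fields describing the problem. The helper name `check_newsapi_payload` is hypothetical.

```python
def check_newsapi_payload(payload):
    """Return the list of articles, or raise if the payload reports an error.

    Assumes a 'status' field of 'ok' on success, with optional 'code' and
    'message' fields on failure.
    """
    if payload.get("status") != "ok":
        code = payload.get("code", "unknown")
        message = payload.get("message", "no message provided")
        raise RuntimeError(f"NewsAPI error ({code}): {message}")
    return payload.get("articles", [])
```

Calling `check_newsapi_payload(response_json)` would then either return the articles or stop with a readable error, for example when the key is invalid.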

In [4]:
pprint.pprint(response_json)

{'articles': [{'author': 'Manu Raju and Alex Rogers, CNN',
               'content': '(CNN)Senate Minority Leader Mitch McConnell is '
                          'quietly maneuvering to field a slate of GOP Senate '
                          'candidates in critical battleground states, '
                          'attempting to avoid a repeat of election cycles a '
                          'decade … [+10897 chars]',
               'description': 'Senate Minority Leader Mitch McConnell is '
                              'quietly maneuvering to field a slate of GOP '
                              'Senate candidates in critical battleground '
                              'states, attempting to avoid a repeat of '
                              'election cycles a decade ago when candidates '
                              'emerged from primaries only to implode and de…',
               'publishedAt': '2021-03-12T23:46:41Z',
               'source': {'id': 'cnn', 'name': 'CNN'},
            

### Extract needed data from the response

In [5]:
articles = response_json['articles']
articles_json = []

# Keep only the fields of interest from each article
for article in articles:
    articles_json.append({
        'link': article['url'],
        'date': article['publishedAt'],
        'title': article['title'],
        'author': article['author'],
        'content': article['content']
    })
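
Some records in the response have `None` as their `content` (and sometimes as their `author`). If only articles with text are wanted, they can be filtered out before saving. This is a minimal sketch; the helper name `drop_missing_content` is hypothetical and the original notebook keeps every record.

```python
def drop_missing_content(records):
    """Return only the records whose 'content' is a non-empty value."""
    return [record for record in records if record.get("content")]
```

For example, `articles_with_text = drop_missing_content(articles_json)` leaves the original list untouched and returns a filtered copy.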

In [6]:
pprint.pprint(articles_json)

[{'author': 'Manu Raju and Alex Rogers, CNN',
  'content': '(CNN)Senate Minority Leader Mitch McConnell is quietly '
             'maneuvering to field a slate of GOP Senate candidates in '
             'critical battleground states, attempting to avoid a repeat of '
             'election cycles a decade … [+10897 chars]',
  'date': '2021-03-12T23:46:41Z',
  'link': 'https://www.cnn.com/2021/03/12/politics/mitch-mcconnell-donald-trump-2022-senate-races/index.html',
  'title': "McConnell quietly courts Senate primary candidates 'who can win' "
           'regardless of Trump ties'},
 {'author': None,
  'content': None,
  'date': '2021-03-12T23:32:36Z',
  'link': 'https://www.cnn.com/videos/politics/2021/03/12/george-takei-anti-asian-hate-crimes-acfc-sot-vpx.cnn',
  'title': 'George Takei on how Trump exacerbated anti-Asian hate'},
 {'author': 'Sonnet Swire, CNN',
  'content': 'THE POINT -- NOW ON YOUTUBE! \r\n'
             'In each episode of his weekly YouTube show, Chris Cillizza wi

 {'author': 'Barbara Starr, Zachary Cohen and Whitney Wild, CNN',
  'content': None,
  'date': '2021-03-12T19:44:26Z',
  'link': 'https://www.cnn.com/2021/03/12/politics/austin-army-national-guard-capitol/index.html',
  'title': 'Defense Secretary overruled Army recommendation to reduce number '
           'of National Guard troops at Capitol'},
 {'author': 'Brad Parks and Carma Hassan, CNN',
  'content': None,
  'date': '2021-03-12T19:33:48Z',
  'link': 'https://www.cnn.com/2021/03/12/us/george-floyd-minneapolis-settlement/index.html',
  'title': 'Minneapolis City Council approves $27 million settlement to George '
           "Floyd's estate"},
 {'author': 'David Williams, CNN',
  'content': None,
  'date': '2021-03-12T19:30:13Z',
  'link': 'https://www.cnn.com/2021/03/12/us/texas-drive-thru-arrest-trnd/index.html',
  'title': "'Hangry' customer helps police stop alleged suspect running "
           'through a Chick-fil-A drive-thru'},
 {'author': 'By Chelsea Stone',
  'content': 'Thi

### Save as JSON file

In [7]:
# Save the extracted articles to articles.json.
# json.dump writes straight to the file, so articles_json stays a list
# instead of being rebound to a string.
with open("articles.json", "w") as outfile:
    json.dump(articles_json, outfile, indent=4)
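
To confirm the file was written correctly, it can be read back into Python. This is a minimal sketch; the helper name `load_articles` is hypothetical, and the default path matches the file name used above.

```python
import json


def load_articles(path="articles.json"):
    """Read the saved JSON file and return its contents as a list of dicts."""
    with open(path) as infile:
        return json.load(infile)
```

For example, `len(load_articles())` should equal the number of articles that were extracted.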