# Pulling News and storing it in a MongoDB database

```python
from datetime import datetime

from newsapi import NewsApiClient
from pymongo import MongoClient

# GetNews is this project's own helper class, defined in _utils.py
from _utils import GetNews
```

#### Step 1: Using NewsApi to pull news articles

Begin by reading in your personal API key for NewsApi. You can get one for free at https://newsapi.org/, but the free tier limits the number of requests you can make per day.

```python
# read the key and strip the trailing newline most editors add to text files
with open("credentials/api_key.txt") as f:
    api_key = f.read().strip()
```
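Because both this notebook's credential files (the API key here and the MongoDB connection string in Step 2) are read the same way, a small helper can do the stripping and catch an empty file early. A minimal sketch; `load_secret` is a hypothetical name, not part of `_utils`:

```python
from pathlib import Path


def load_secret(path: str) -> str:
    """Read a credential file and strip surrounding whitespace.

    A trailing newline left by a text editor can make the HTTP request
    carrying the key fail, so it is removed here.
    """
    secret = Path(path).read_text(encoding="utf-8").strip()
    if not secret:
        raise ValueError(f"credential file {path!r} is empty")
    return secret


# Usage: api_key = load_secret("credentials/api_key.txt")
```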

Now we can use our API key to pull news from NewsApi. Here we pull down every story that mentions the word "Google" over a 30-day window (1 to 31 August 2023). Inside `GetNews` the page size is set to 100, which is the maximum number of articles NewsApi returns per call.
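`GetNews` lives in `_utils` and is not shown here, so as a rough illustration of what paging through results involves, here is a hypothetical `fetch_all_articles` helper. It assumes a client that behaves like `newsapi.NewsApiClient`, whose `get_everything` method accepts `q`, `from_param`, `to`, `page_size` and `page`; the client is a parameter so the loop can be exercised without network access. The `max_pages` cap is an illustrative assumption: free developer accounts may be limited to the first 100 results, in which case `max_pages=1` is the practical setting.

```python
def fetch_all_articles(client, query, start_date, end_date, page_size=100, max_pages=5):
    """Collect raw articles from NewsApi one page at a time (hypothetical helper).

    `client` is assumed to expose get_everything() like newsapi.NewsApiClient.
    """
    articles = []
    for page in range(1, max_pages + 1):
        response = client.get_everything(
            q=query,
            from_param=start_date,
            to=end_date,
            page_size=page_size,  # 100 is the largest page size NewsApi accepts
            page=page,
        )
        batch = response.get("articles", [])
        articles.extend(batch)
        # Stop on a short page, or once we hold everything the API reported.
        if len(batch) < page_size or len(articles) >= response.get("totalResults", 0):
            break
    return articles


# Usage (needs a real key and network access):
# from newsapi import NewsApiClient
# raw = fetch_all_articles(NewsApiClient(api_key=api_key), "Google", start_date, end_date)
```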

Note that you will have to change the dates, since the free API only gives access to articles published within roughly the last month.
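Rather than editing the dates by hand each time, the window can be computed relative to today. A minimal sketch using only the standard library; `last_n_days` is a hypothetical helper that produces strings in the same `%Y-%m-%d` format as the cell below:

```python
from datetime import date, timedelta


def last_n_days(n=30, today=None):
    """Return (start_date, end_date) as YYYY-MM-DD strings covering the last n days."""
    end = date.today() if today is None else today
    start = end - timedelta(days=n)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


start_date, end_date = last_n_days(30)
```

Passing `today` explicitly makes the function easy to check: `last_n_days(30, today=date(2023, 8, 31))` gives the same window as the hard-coded dates below.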

```python
query = 'Google'
start_date = datetime(2023, 8, 1).strftime('%Y-%m-%d')
end_date = datetime(2023, 8, 31).strftime('%Y-%m-%d')

# create an instance of our custom `GetNews` class
news = GetNews(api_key, query, start_date, end_date)

# get all the downloaded articles
news_list = news.get_articles()

# get the first article
news_list[0]
```

```
{'title': 'Let Google Check Your Grammar For You',
 'media': 'Lifehacker.com',
 'date': '2023-08-07',
 'url': 'https://lifehacker.com/let-google-check-your-grammar-for-you-1850712771'}
```

Here we have pulled out only the relevant information from the JSON response: the title, the media outlet, the publication date, and the URL. We will use this information to build a database of news articles.
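NewsApi returns each article as a larger JSON object (with keys such as `source`, `author`, `title`, `description`, `url`, `publishedAt` and `content`). A minimal sketch of how such an object could be reduced to the four fields shown above; `slim_article` is hypothetical and not necessarily what `GetNews` does internally:

```python
def slim_article(raw):
    """Reduce a raw NewsApi article dict to title, media outlet, date and URL."""
    return {
        "title": raw.get("title"),
        "media": (raw.get("source") or {}).get("name"),
        # publishedAt looks like "YYYY-MM-DDTHH:MM:SSZ"; keep only the date part
        "date": (raw.get("publishedAt") or "")[:10],
        "url": raw.get("url"),
    }
```

Keeping the date as a `YYYY-MM-DD` string means lexicographic order matches chronological order, so string range queries on `date` still behave as expected once the articles are in MongoDB.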

#### Step 2: Saving to MongoDB

With our news articles downloaded, we can save them to a MongoDB database using the pymongo library. First we create a connection to the database. Note that you will have to change the database and collection names to whatever you want to call them. You will also need a running MongoDB server; instructions for installing MongoDB on your own computer are here: https://docs.mongodb.com/manual/installation/.

There are also plenty of great resources for learning MongoDB on YouTube.

```python
# load in the MongoDB connection string, stripping any trailing newline
with open("credentials/mongo_client.txt") as f:
    mongo_client = f.read().strip()

# specify the database and collection names
db_name = "stock_data"
collection_name = "news"

# add the articles to the database
news.add_to_db(mongo_client, db_name, collection_name)
```

Super easy! I have also incorporated some useful functionality to handle duplicates, so the same article is not saved twice.
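The internals of `add_to_db` are not shown here. One common way to get this behaviour, sketched below as a hypothetical `save_articles` helper, is to upsert each article keyed on its URL with `$setOnInsert`: if a document with that URL already exists nothing changes, otherwise the article is inserted. The function only relies on `update_one(filter, update, upsert=True)` and the `upserted_id` attribute of the result, both of which pymongo collections provide.

```python
def save_articles(collection, articles):
    """Insert articles into a pymongo-style collection, skipping URLs already stored.

    Returns the number of newly inserted articles.
    """
    inserted = 0
    for article in articles:
        result = collection.update_one(
            {"url": article["url"]},      # match on URL
            {"$setOnInsert": article},    # only written when no match exists
            upsert=True,
        )
        # upserted_id is None when an existing document matched
        if result.upserted_id is not None:
            inserted += 1
    return inserted


# Usage with a real server (assumes pymongo is installed and the server is reachable):
# from pymongo import MongoClient
# collection = MongoClient(mongo_client)[db_name][collection_name]
# collection.create_index("url", unique=True)  # lets MongoDB itself reject duplicate URLs
# save_articles(collection, news_list)
```

The unique index is worth adding even with upserts, since two concurrent upserts for the same new URL could otherwise both insert. For large batches, the same operations can be sent in one round trip with `collection.bulk_write([...], ordered=False)` using `pymongo.UpdateOne`.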