# Good News
A service to find good news from your usual headlines :)

- Trained on data from https://www.kaggle.com/uciml/news-aggregator-dataset.
- Followed the Microsoft tutorial here: https://notebooks.azure.com/Microsoft/libraries/samples/html/Discover%20Sentiments%20in%20Tweets.ipynb
- Also followed the NLTK tutorial here: http://www.nltk.org/howto/sentiment.html

## Data Import

### Setup

In [1]:
# Handling data
import pandas as pd

# Cleaning data
import re

# API interaction
import json
import requests

In [26]:
# API Variables
API_KEY = '88d3bf9f4bed4dfabda05fbfa5f3999e'
BASE_URL = 'https://newsapi.org/v2/top-headlines'
default_params = {'country': 'us', 'category': 'general'}

sample_url = 'https://jsonplaceholder.typicode.com/posts/1'

test_mode = False
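
The key above is written directly into the notebook, so anyone who can read the notebook can read the key. One common alternative is to read it from an environment variable. The following is a minimal sketch, not part of the original notebook: the variable name `NEWSAPI_KEY` and the helper `load_api_key` are hypothetical names chosen for illustration.

```python
import os


def load_api_key(env_var='NEWSAPI_KEY', environ=os.environ):
    '''Return the API key stored in an environment variable.

    NEWSAPI_KEY is a hypothetical name picked for this sketch;
    any variable name works as long as it is set before running.
    '''
    key = environ.get(env_var)
    if not key:
        # Fail early with a readable message instead of sending an empty key
        raise RuntimeError('Set the %s environment variable first' % env_var)
    return key
```

With this in place, `API_KEY = load_api_key()` could replace the literal string, assuming the variable has been exported in the shell that starts the notebook.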

### API Functions

In [33]:
def run_sample_json_tests(sj):
    '''Shared assertions that a payload matches post 1 from sample_url.'''
    assert sj['userId'] == 1
    assert sj['id'] == 1
    assert sj['title'] == "sunt aut facere repellat provident occaecati excepturi optio reprehenderit"
    assert sj['body'] == "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"

In [34]:
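def get_raw_json(url, api_key, api_params):
    '''API client: GET url with api_params plus the API key and return the parsed JSON.'''
    payload = dict(api_params)  # copy so the caller's dict (e.g. default_params) is not mutated
    payload['apiKey'] = api_key
    return requests.get(url, params=payload).json()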
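
A detail worth illustrating here: assigning a dict to a new name does not copy it, so adding `apiKey` through the new name also changes the dict the caller passed in (for example a shared `default_params`). The sketch below contrasts the two approaches using plain dicts and no network access; the helper names `add_key_in_place` and `add_key_to_copy` are hypothetical and exist only for this comparison.

```python
def add_key_in_place(params, api_key):
    '''Writes the key into the caller's dict (the caller sees the change).'''
    payload = params            # same object, not a copy
    payload['apiKey'] = api_key
    return payload


def add_key_to_copy(params, api_key):
    '''Writes the key into a shallow copy (the caller's dict is untouched).'''
    payload = dict(params)      # new dict with the same items
    payload['apiKey'] = api_key
    return payload


shared = {'country': 'us', 'category': 'general'}

copied = add_key_to_copy(shared, 'secret')
leaked_after_copy = 'apiKey' in shared       # False: shared is unchanged

add_key_in_place(shared, 'secret')
leaked_after_in_place = 'apiKey' in shared   # True: shared now carries the key
```

Copying costs one small allocation per call and keeps a shared parameter dict from silently accumulating the key.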

In [35]:
# Test for get_raw_json()
sample_json = get_raw_json(sample_url, None, default_params)
run_sample_json_tests(sample_json)

In [46]:
def import_to_dict(filename=None, url=BASE_URL, api_key=API_KEY, params=default_params):
    '''Import data as a dict: from a local file if test_mode is set, otherwise from the API.'''
    if test_mode:
        # Read a saved response from disk instead of calling the API
        with open(filename) as f:
            return json.load(f)
    return get_raw_json(url, api_key, params)
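
Because the function reads the global `test_mode`, its behavior depends on state set elsewhere in the notebook. A possible variation, sketched here under assumptions, passes the choice in explicitly. The names `load_json`, `use_file`, and `fetch` are hypothetical, and `fetch` stands in for whatever callable performs the real API request.

```python
import json


def load_json(filename=None, fetch=None, use_file=False):
    '''Return parsed JSON from a local file or from a fetch callable.

    use_file plays the role of the notebook's test_mode flag, but is
    passed as an argument instead of being read from a global.
    '''
    if use_file:
        if filename is None:
            raise ValueError('filename is required when use_file is True')
        with open(filename) as f:
            return json.load(f)
    if fetch is None:
        raise ValueError('fetch is required when use_file is False')
    return fetch()
```

A call such as `load_json(fetch=lambda: get_raw_json(url, api_key, params))` would then cover the API path, while `load_json('sample.json', use_file=True)` covers the file path.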

In [47]:
# Tests for import_to_dict()
previous_test_mode_status = test_mode

# test_mode ON
test_mode = True
imported_from_file = import_to_dict('sample.json')
run_sample_json_tests(imported_from_file)

# test_mode OFF
test_mode = False
imported_from_web = import_to_dict(url=sample_url)
run_sample_json_tests(imported_from_web)

# Restore previous value of test_mode
test_mode = previous_test_mode_status
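
The cell above saves `test_mode` and restores it by hand, so if one of the assertions fails in between, the restore line never runs. A context manager with `try`/`finally` is one way to guarantee the restore. This is a minimal sketch: the `settings` dict is a hypothetical stand-in for the notebook's global flag, and `override` is a name invented for this example.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the notebook's global test_mode flag
settings = {'test_mode': False}


@contextmanager
def override(mapping, key, value):
    '''Temporarily set mapping[key] to value, restoring it on exit.'''
    previous = mapping[key]
    mapping[key] = value
    try:
        yield
    finally:
        # Runs even if the body of the with-block raises
        mapping[key] = previous


with override(settings, 'test_mode', True):
    inside = settings['test_mode']   # True while the block runs

after = settings['test_mode']        # back to the previous value
```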

## Data Cleaning

We create a DataFrame from the `articles` list in our JSON data and preview the first five rows.

In [120]:
data = import_to_dict('news.json')
data = pd.DataFrame(data['articles'])
data.drop('source', axis=1, inplace=True) # Redundant data

data.head()

| | author | description | publishedAt | title | url | urlToImage |
|---|---|---|---|---|---|---|
| 0 | Paul Ziobro | Teamsters negotiators said they have tentative... | 2018-06-22T03:48:04Z | UPS, Teamsters Reach Handshake Deal on New Con... | https://www.wsj.com/articles/ups-teamsters-rea... | https://si.wsj.net/public/resources/images/OG-... |
| 1 | Joe Flint | ABC and the production company that makes ‘Ros... | 2018-06-22T03:25:41Z | ABC Plans Spinoff of 'Roseanne' Without Roseanne | https://www.wsj.com/articles/abc-plans-spinoff... | https://images.wsj.net/im-15296/social |
| 2 | Sara Germano | Puma is returning to men’s basketball shoes ha... | 2018-06-22T03:25:32Z | Puma Gives Basketball Another Shot | https://www.wsj.com/articles/puma-gives-basket... | https://images.wsj.net/im-15213/social |
| 3 | Matthew Gutierrez | The 19-year-old Bahamian played one season at ... | 2018-06-22T02:13:41Z | Phoenix Suns Select Deandre Ayton No. 1 in NBA... | https://www.wsj.com/articles/phoenix-suns-sele... | https://images.wsj.net/im-15299/social |
| 4 | Wall Street Journal | The ministers needed to complete a deal betwee... | 2018-06-22T01:02:08Z | Eurozone Agrees on Final Details of Plan to En... | https://www.wsj.com/articles/eurozone-agrees-o... | https://images.wsj.net/im-15294/social |
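
To make the DataFrame step easier to follow without calling the API, here is a small self-contained sketch. The `sample` payload is invented for illustration, and its shape (a top-level `articles` list whose items carry a nested `source` dict) is an assumption about the response, inferred from the `data['articles']` lookup and the dropped `source` column.

```python
import pandas as pd

# Invented payload; the nested 'source' dict is an assumed response shape
sample = {
    'status': 'ok',
    'articles': [
        {'source': {'id': None, 'name': 'Example News'},
         'author': 'A. Writer',
         'title': 'First headline',
         'publishedAt': '2018-06-22T03:48:04Z'},
        {'source': {'id': None, 'name': 'Example News'},
         'author': 'B. Writer',
         'title': 'Second headline',
         'publishedAt': '2018-06-22T03:25:41Z'},
    ],
}

# One row per article, one column per key
frame = pd.DataFrame(sample['articles'])

# Return a new frame without the nested column instead of using inplace=True
frame = frame.drop(columns='source')
```

Reassigning the result of `drop` gives the same outcome as `inplace=True`, and keeps each step visible as an ordinary assignment.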


Let's check the type of the `publishedAt` field:

In [124]:
type(data['publishedAt'][0])

str

Since it's a `str`, let's convert the column to pandas `Timestamp` values so we can sort chronologically later if needed:

In [127]:
data['publishedAt'] = pd.to_datetime(data['publishedAt'])  # ISO 8601 strings parse without extra hints
type(data['publishedAt'][0])

pandas._libs.tslib.Timestamp
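
Once the column holds timestamps rather than strings, chronological sorting is a single `sort_values` call. A minimal sketch with invented rows follows; the titles are placeholders, and the timestamps are in the same ISO 8601 format as the data shown earlier.

```python
import pandas as pd

# Invented rows, deliberately out of chronological order
frame = pd.DataFrame({
    'title': ['older', 'newest', 'middle'],
    'publishedAt': ['2018-06-22T01:02:08Z',
                    '2018-06-22T03:48:04Z',
                    '2018-06-22T02:13:41Z'],
})

# Parse the ISO 8601 strings into timestamps
frame['publishedAt'] = pd.to_datetime(frame['publishedAt'])

# Newest article first, with a fresh 0..n-1 index
newest_first = (frame.sort_values('publishedAt', ascending=False)
                     .reset_index(drop=True))
```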

## Prep for Model

## Model