# GoodNews

We pick up right where we left off from the [GoodNews_DataPrep](GoodNews_DataPrep.ipynb) notebook.

In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import tensorflow as tf


Below is the processed data to work with:

In [10]:
data = pd.read_hdf('data.h5', key='data', mode='r')
data.head()

ID,title,url,publisher,category,timestamp
1,"Fed official says weak data caused by weather,...",http://www.latimes.com/business/money/la-fi-mo...,Los Angeles Times,b,2014-03-10 16:52:50.698
2,Fed's Charles Plosser sees high bar for change...,http://www.livemint.com/Politics/H2EvwJSK2VE6O...,Livemint,b,2014-03-10 16:52:51.207
3,US open: Stocks fall after Fed official hints ...,http://www.ifamagazine.com/news/us-open-stocks...,IFA Magazine,b,2014-03-10 16:52:51.550
4,"Fed risks falling 'behind the curve', Charles ...",http://www.ifamagazine.com/news/fed-risks-fall...,IFA Magazine,b,2014-03-10 16:52:51.793
5,Fed's Plosser: Nasty Weather Has Curbed Job Gr...,http://www.moneynews.com/Economy/federal-reser...,Moneynews,b,2014-03-10 16:52:52.027


## Model Preparation

While a simple positive/negative/neutral classification might suffice, the ultimate goal of this algorithm is to show the most relevant news stories to the user. This consists of two components:
1. Sentiment (e.g. how positive is this article?), and
2. Relevance (e.g. how likely is the user to read this article?).

To rank these articles, we need an **unsupervised regressor model** that also takes user activity into account.

### The Cold Start Problem
Before sign-up, or when the user first creates an account, there will be no personalization, and they will simply be recommended positive articles. So, the approach will go like this:
1. Rank all articles by positivity.
2. (If logged in and past the cold start) Run a more holistic recommendation engine that takes the sorted articles into account along with other user behavior, using Weighted Alternating Least Squares (WALS) to provide recommendations.
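
The two steps above can be sketched as a single dispatch function. This is only an illustration of the control flow, not the final engine: the function name `recommend`, the `personalize` callable (a stand-in for the WALS-based engine), the `top_n` parameter, and the toy DataFrame are all assumptions made for this example.

```python
import pandas as pd


def recommend(articles, user_history=None, personalize=None, top_n=3):
    """Return the top_n articles for a user.

    articles: DataFrame with a 'compound_score' column (sentiment in [-1, 1]).
    user_history: list of article IDs the user has read, or None before sign-up.
    personalize: hypothetical callable standing in for the personalized engine.
    """
    # Step 1: rank all articles by positivity, most positive first.
    ranked = articles.sort_values('compound_score', ascending=False)

    # Cold start: no history (or no engine yet), so fall back to the sentiment ranking.
    if not user_history or personalize is None:
        return ranked.head(top_n)

    # Step 2: pass the sentiment-ranked articles and the history to the engine.
    return personalize(ranked, user_history).head(top_n)


# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {'title': ['a', 'b', 'c', 'd'],
     'compound_score': [-0.44, 0.0, 0.62, 0.31]},
    index=pd.Index([1, 2, 3, 4], name='ID'),
)

cold = recommend(toy)
print(cold.index.tolist())
```

Keeping the personalized engine behind a callable means the cold-start path works on its own while the engine is still unimplemented.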

## Model Creation

### Sentiment Analysis (static)
This portion of the recommendation pipeline is static; it will not vary with user engagement.

In [4]:
# VADER needs its lexicon; if it is missing, run nltk.download('vader_lexicon') once.
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA
analyzer = SIA()



Now we run through the `title` column of the DataFrame and assign each row a `compound` score, which ranges from -1 (most negative) to 1 (most positive).

In [11]:
# This code takes around a minute to run!
compound_scores = []

for title in data['title']:
    score_dict = analyzer.polarity_scores(title)
    compound_scores.append(score_dict['compound'])

print(compound_scores[:5])

[-0.4404, 0.0, 0.0, -0.4019, -0.25]
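
For intuition about why `compound` stays between -1 and 1: VADER sums the rule-adjusted valence of the words in a text and then squashes that sum with x / sqrt(x^2 + alpha). The sketch below reproduces only that normalization step, assuming the default alpha of 15 used in the VADER implementation. The function name `normalize_valence` and the sample sums are ours, not part of NLTK.

```python
import math


def normalize_valence(raw_sum, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)


# Hypothetical raw sums, chosen only to show the shape of the curve.
for raw in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(raw, round(normalize_valence(raw), 4))
```

A raw sum of 1.0 maps to 0.25, and the output approaches but never reaches 1 as the sum grows.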


In [17]:
data['compound_score'] = compound_scores
data.head()

ID,title,url,publisher,category,timestamp,compound_score
1,"Fed official says weak data caused by weather,...",http://www.latimes.com/business/money/la-fi-mo...,Los Angeles Times,b,2014-03-10 16:52:50.698,-0.4404
2,Fed's Charles Plosser sees high bar for change...,http://www.livemint.com/Politics/H2EvwJSK2VE6O...,Livemint,b,2014-03-10 16:52:51.207,0.0
3,US open: Stocks fall after Fed official hints ...,http://www.ifamagazine.com/news/us-open-stocks...,IFA Magazine,b,2014-03-10 16:52:51.550,0.0
4,"Fed risks falling 'behind the curve', Charles ...",http://www.ifamagazine.com/news/fed-risks-fall...,IFA Magazine,b,2014-03-10 16:52:51.793,-0.4019
5,Fed's Plosser: Nasty Weather Has Curbed Job Gr...,http://www.moneynews.com/Economy/federal-reser...,Moneynews,b,2014-03-10 16:52:52.027,-0.25
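
If a coarse positive/negative/neutral label is ever needed alongside the continuous score, the compound values can be bucketed. The sketch below assumes the ±0.05 cut-offs conventionally suggested for VADER; the function name `sentiment_label` and the `threshold` parameter are ours, and the scores are the five shown above.

```python
import pandas as pd


def sentiment_label(compound, threshold=0.05):
    """Map a compound score to a coarse label using symmetric cut-offs."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# The first five compound scores from the table above.
scores = pd.Series(
    [-0.4404, 0.0, 0.0, -0.4019, -0.25],
    index=pd.Index(range(1, 6), name='ID'),
)
labels = scores.apply(sentiment_label)
print(labels.tolist())
```

For ranking we still sort on the raw `compound_score`, since the labels discard the ordering within each bucket.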


## Recommendation Engine
This portion varies with user engagement, and will come into play with different user accounts. Instead of collaborative filtering, this engine will use content-based filtering.

https://cloud.google.com/solutions/machine-learning/recommendation-system-tensorflow-overview

The users in Firebase will have the following fields:
- uuid
- profile
    - firstname
    - lastname
    - picture
- interests
    - history (last 10)
        - article1
        ...
    - 
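
To make the content-based idea concrete, here is a minimal sketch that scores each article by how similar its title is to the titles in a user's history. It is one possible approach, not the engine itself: it assumes a binary bag-of-words representation and cosine similarity, and the helper names (`bag_of_words`, `relevance_scores`) and the toy titles are invented for illustration.

```python
import numpy as np


def bag_of_words(titles):
    """Build a binary term matrix (one row per title) and return it with the vocabulary."""
    vocab = sorted({word for title in titles for word in title.lower().split()})
    column = {word: j for j, word in enumerate(vocab)}
    matrix = np.zeros((len(titles), len(vocab)))
    for i, title in enumerate(titles):
        for word in title.lower().split():
            matrix[i, column[word]] = 1.0
    return matrix, vocab


def relevance_scores(matrix, history_rows):
    """Cosine similarity between every article and the mean vector of the user's history."""
    profile = matrix[history_rows].mean(axis=0)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(profile)
    # Avoid division by zero for empty titles or an empty profile.
    return np.divide(matrix @ profile, norms,
                     out=np.zeros(len(matrix)), where=norms > 0)


# Toy titles, invented for illustration only.
titles = [
    'fed official blames weather',
    'stocks fall after fed hints',
    'new phone released today',
]
matrix, _ = bag_of_words(titles)

# Pretend the user's history contains only the first article.
scores = relevance_scores(matrix, history_rows=[0])
print(scores.round(3))
```

With the `history (last 10)` field described above, `history_rows` would hold the row positions of at most ten recently read articles.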

In [3]:
raise NotImplementedError()

NotImplementedError: 

### Combined Model
Now we join the two into a pipeline.
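
One simple way to join the two signals is a weighted blend. The sketch below assumes a `relevance` column in [0, 1] produced by the recommendation engine, rescales `compound_score` from [-1, 1] to [0, 1], and mixes the two with a hypothetical `sentiment_weight`. The column name `relevance`, the function name `combined_rank`, and the toy values are assumptions for this example.

```python
import pandas as pd


def combined_rank(df, sentiment_weight=0.5):
    """Rank articles by a weighted blend of positivity and relevance."""
    # Rescale sentiment from [-1, 1] to [0, 1] so both signals share a scale.
    positivity = (df['compound_score'] + 1.0) / 2.0
    score = sentiment_weight * positivity + (1.0 - sentiment_weight) * df['relevance']
    return df.assign(combined=score).sort_values('combined', ascending=False)


# Toy values, invented for illustration only.
toy = pd.DataFrame(
    {'compound_score': [0.8, -0.4, 0.2],
     'relevance': [0.1, 0.9, 0.7]},
    index=pd.Index([1, 2, 3], name='ID'),
)

ranked = combined_rank(toy)
print(ranked.index.tolist())
```

Setting `sentiment_weight=1.0` reduces this to the cold-start ranking by positivity alone.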

In [4]:
raise NotImplementedError()

NotImplementedError: 

## Initial Evaluation
Let's run some metrics to determine how robust this model is.
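
The metrics are not fixed yet. As one candidate for a ranking model, precision at k measures what fraction of the top k recommended articles the user actually read. The sketch below is an assumption about what such a metric could look like here; the function name `precision_at_k` and the toy IDs are invented.

```python
def precision_at_k(recommended_ids, read_ids, k):
    """Fraction of the top-k recommended articles that the user actually read."""
    if k <= 0:
        raise ValueError('k must be a positive integer')
    top_k = recommended_ids[:k]
    hits = sum(1 for article_id in top_k if article_id in read_ids)
    return hits / k


# Toy example: article IDs in recommended order, and the IDs the user read.
recommended = [3, 4, 2, 1]
read = {4, 1}
p_at_2 = precision_at_k(recommended, read, k=2)
print(p_at_2)
```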

In [5]:
raise NotImplementedError()

NotImplementedError: 

## Hyperparameter Tuning
Given the initial results, we tune the model's hyperparameters for the highest accuracy while avoiding overfitting.

In [6]:
raise NotImplementedError()

NotImplementedError: 

## Final Evaluation
We re-run the same metrics as before, this time on the tuned model, and compare them with the initial results.

In [7]:
raise NotImplementedError()

NotImplementedError: 

## Freeze and Export
We write the model to a file so that our API can query and update it.

In [9]:
raise NotImplementedError()

NotImplementedError: 