<a id="toc"></a>

# <u>Table of Contents</u>

1.) [Setup](#setup)  
&nbsp;&nbsp;&nbsp;&nbsp; 1.1.) [Imports](#imports)   
&nbsp;&nbsp;&nbsp;&nbsp; 1.2.) [Helpers](#helpers)   
&nbsp;&nbsp;&nbsp;&nbsp; 1.3.) [Load data](#load)   
2.) [Datetime](#datetime)  
3.) [Speakers](#speakers)  
4.) [Transcript](#transcript)  
5.) [Save to CSV](#save)  

---
<a id="setup"></a>

# [^](#toc) <u>Setup</u>

<a id="imports"></a>

### [^](#toc) Standard imports

In [8]:
### Standard imports
import pandas as pd
import numpy as np
pd.options.display.max_columns = 50

### Regex and datetime
import re
import datetime

# Converts string representations of lists back into lists
import ast

### Removes warnings that occasionally show up during imports
import warnings
warnings.filterwarnings('ignore')
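`ast.literal_eval` is a safer choice than `eval` here because it only evaluates Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, `None`) and refuses anything else. Here is a minimal sketch of that difference. The sample strings are made up for illustration:

```python
import ast

# A list serialized as text, the way a list ends up in a CSV cell
cell = "['Speaker A', 'Speaker B']"
speakers = ast.literal_eval(cell)
print(type(speakers), speakers)  # <class 'list'> ['Speaker A', 'Speaker B']

# Arbitrary expressions, such as function calls, are rejected instead of executed
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", err)
```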

### Visualization imports

In [9]:
### Standard imports
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()

### Altair
import altair as alt
alt.renderers.enable('notebook')

### Plotly
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
import chart_studio.plotly as py  # plotly >= 4 moved plotly.plotly into the separate chart_studio package
from plotly import tools
init_notebook_mode(connected=True)

# WordCloud
from wordcloud import WordCloud

# Folium
import folium

<a id="helpers"></a>

### [^](#toc) Helpers

In [10]:
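def string_literal(x):
    """Convert a string representation of a Python literal (e.g. a list) back into that object.
    Values that are not valid literals are returned unchanged."""
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        return x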
    
# A shorthand way to plot most (horizontal) bar graphs
def pretty_bar(data, ax, xlabel=None, ylabel=None, title=None, int_text=False, x=None, y=None):
    
    # By default, plots a Series' values against its index labels
    if x is None:
        x = data.values
    if y is None:
        y = data.index
    
    # Plots the data (recent seaborn versions require x and y as keyword arguments)
    sns.barplot(x=x, y=y, ax=ax)
    
    # Writes each value as text at the left edge of its bar
    for i, v in enumerate(x):
        
        # Truncates to an int, or rounds floats to 3 decimals
        if int_text:
            ax.text(0, i, int(v), color='k', fontsize=14)
        else:
            ax.text(0, i, round(v, 3), color='k', fontsize=14)
     
    ### Labels plot
    if ylabel is not None:
        ax.set(ylabel=ylabel)
    if xlabel is not None:
        ax.set(xlabel=xlabel)
    if title is not None:
        ax.set(title=title)
    
### Used to style Python print statements
class color:
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'
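The annotation loop in `pretty_bar` relies on seaborn placing the i-th category of a horizontal bar plot at y = i, so `ax.text(0, i, ...)` lands on the matching bar. The following is a minimal matplotlib-only sketch of the same idea. The counts and speaker labels are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative counts; in the notebook this might be a value_counts() result
counts = pd.Series([12, 7, 3], index=["Speaker A", "Speaker B", "Speaker C"])

fig, ax = plt.subplots()
# One horizontal bar per index label; bar i sits at y = i
ax.barh(range(len(counts)), counts.values)
ax.set_yticks(range(len(counts)))
ax.set_yticklabels(counts.index)
ax.invert_yaxis()  # show the first item on top

# Same idea as pretty_bar: write each value at x = 0 on its bar's row
for i, v in enumerate(counts.values):
    ax.text(0, i, int(v), color="k", fontsize=14)

ax.set(xlabel="Count", title="Illustrative bar chart")
```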

<a id="load"></a>

### [^](#toc) Load data

In [None]:
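# pd.datetime (removed in pandas 2.0) and read_csv's date_parser are deprecated,
# so parse the Date column with an explicit format after loading
df = pd.read_csv("data/PBS-newhour-clean.csv")
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d %H:%M:%S")

# These columns store lists as strings; convert them back into Python objects
for col in ["Transcript", "Story", "Speakers"]:
    df[col] = df[col].map(string_literal)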
    
print("Shape of df:", df.shape)
df.head()
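CSV has no list type, so list-valued columns such as `Transcript`, `Story`, and `Speakers` were presumably written out as their text representation. That is why each of them is passed through `string_literal` after loading. The sketch below reproduces that round trip on a hypothetical two-row frame. The dates and speaker names are invented, and the helper is a copy of `string_literal` that catches only parse errors:

```python
import ast
import io

import pandas as pd

def string_literal(x):
    """Parse a string holding a Python literal; return x unchanged if it is not one."""
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError, TypeError):
        return x

# Hypothetical frame standing in for the cleaned transcript data
original = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-02 18:00:00", "2018-01-03 18:00:00"]),
    "Speakers": [["Speaker A", "Speaker B"], ["Speaker C"]],
})

# Writing to CSV stores each list as its str() representation
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
print(type(restored.loc[0, "Speakers"]))  # <class 'str'>

# Recover the lists, then the datetimes using an explicit format
restored["Speakers"] = restored["Speakers"].map(string_literal)
restored["Date"] = pd.to_datetime(restored["Date"], format="%Y-%m-%d %H:%M:%S")
print(restored.dtypes)
```

Parsing dates with an explicit `format` also avoids relying on pandas to infer it, which keeps the result stable across pandas versions.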

In [25]:
df.iloc[0].URL

'https://www.pbs.org/newshour/show/robert-macneil-and-jim-lehrer-and-the-watergate-hearings'