# Citation

Much of the code and many of the examples are copied or adapted from:

> *Blueprints for Text Analytics Using Python* by Jens Albrecht, Sidharth Ramachandran, and Christian Winkler (O'Reilly, 2021), ISBN 978-1-492-07408-3.

- https://github.com/blueprints-for-text-analytics-python/blueprints-text
- https://github.com/blueprints-for-text-analytics-python/blueprints-text/blob/master/ch01/First_Insights.ipynb

---

# Setup

In [None]:
%matplotlib inline

import os
from pathlib import Path
import helpsk as hlp
import numpy as np
import pandas as pd

from helpers.utilities import Timer, get_logger
from helpers.text_processing import count_tokens

def get_project_directory():
    """Return the project root, assuming the notebook runs from <root>/source/executables."""
    return os.getcwd().replace('/source/executables', '')

print(get_project_directory())

---

# Exploratory Data Analysis

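This section takes a first look at the UN General Debates dataset: summary statistics for the numeric and non-numeric columns, a check for variants of a speaker's name, and the most frequent tokens.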

In [None]:
with Timer("Loading Data"):
    path = os.path.join(get_project_directory(), 'artifacts/data/processed/un-general-debates-blueprint.pkl')
    un_debates = pd.read_pickle(path)
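
`Timer` comes from this project's `helpers.utilities` module, whose implementation is not shown here. As a rough sketch of what such a helper might do, the context manager below prints a message, runs the block, and reports the elapsed time. The name `timer` and the message format are assumptions for illustration, not the project's actual code.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(message):
    """Hypothetical stand-in for helpers.utilities.Timer (assumed behavior)."""
    print(f"Started: {message}")
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report the elapsed wall-clock time even if the block raises.
        elapsed = time.perf_counter() - start
        print(f"Finished: {message} ({elapsed:.2f} seconds)")


# Usage: time an arbitrary block of work.
with timer("Summing numbers"):
    total = sum(range(1_000_000))
```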

---

In [None]:
hlp.pandas.numeric_summary(un_debates)

In [None]:
hlp.pandas.non_numeric_summary(un_debates)
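
For readers without `helpsk` installed, plain pandas gives a roughly comparable (though less detailed) view through `DataFrame.describe`. The sketch below uses a small made-up frame; its column names and values are assumptions for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy data standing in for un_debates; values are invented.
toy = pd.DataFrame({
    "year": [1970, 1971, 1971, 2015],
    "length": [5000, 7200, 6100, 8800],
    "country": ["USA", "USA", "FRA", "USA"],
    "speaker": ["A", "B", "C", "A"],
})

# Count, mean, std, min, quartiles, and max for the numeric columns.
numeric = toy.describe(include=[np.number])

# Count, unique, top, and freq for everything that is not numeric.
non_numeric = toy.describe(exclude=[np.number])

print(numeric)
print(non_numeric)
```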

---

In [None]:
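# na=False treats missing speaker names as non-matches instead of putting NaN in the mask
un_debates[un_debates['speaker'].str.contains('Bush', na=False)]['speaker'].value_counts()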
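
The cell above checks how many different spellings of one speaker's name appear in the data. The same pattern on a small made-up frame shows what to expect. The names below are invented for illustration, and `na=False` is added as a guard so that missing names count as non-matches.

```python
import pandas as pd

# Toy data; the speaker values are invented, not taken from un_debates.
speeches = pd.DataFrame({
    "speaker": ["George W. Bush", "Mr. George W. Bush", "Bush", None, "Kofi Annan"],
})

# Boolean mask: True where the name contains the substring "Bush".
mask = speeches["speaker"].str.contains("Bush", na=False)

# Each distinct spelling and how often it occurs.
variants = speeches.loc[mask, "speaker"].value_counts()
print(variants)
```

Several distinct index values for what is clearly one person suggest the column needs normalizing before grouping by speaker.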

---

In [None]:
count_tokens(un_debates['tokens']).head(20)
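
`count_tokens` is a project helper from `helpers.text_processing` and its source is not shown here. A minimal sketch of one way such a function could work, assuming each row of the `tokens` column holds a list of strings, is a `collections.Counter` accumulated over all rows. The name `count_tokens_sketch`, the `min_freq` parameter, and the output column names are hypothetical.

```python
from collections import Counter

import pandas as pd


def count_tokens_sketch(tokens: pd.Series, min_freq: int = 1) -> pd.DataFrame:
    """Hypothetical stand-in for count_tokens: token frequencies, most frequent first."""
    counter = Counter()
    # Skip missing rows, then add each document's tokens to the running tally.
    for token_list in tokens.dropna():
        counter.update(token_list)

    freq = pd.DataFrame.from_dict(counter, orient="index", columns=["frequency"])
    freq.index.name = "token"
    # Keep tokens that occur at least min_freq times and sort by frequency.
    return freq[freq["frequency"] >= min_freq].sort_values("frequency", ascending=False)


# Usage on toy data (invented tokens, one missing row).
toy_tokens = pd.Series([["peace", "nations"], ["peace", "war"], None])
result = count_tokens_sketch(toy_tokens)
print(result.head(20))
```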

---
