# Table Scraps: An Actionable Framework for Multi-Table Data Wrangling From An Artifact Study of Computational Journalism




## Unabridged Taxonomies of Data Wrangling in Computational Journalism

Here we present complete versions of our two taxonomies of data wrangling in computational journalism: *Actions* and *Processes*. We use shortcodes to refer to the longer descriptions of open and axial codes in the paper. Each shortcode follows this naming convention:

<1>.<2>.<3>.<4>.<5>

<1>: *A* for Action and *P* for Process

<2>: The first character of the code, capitalized

<3>: Letters a-z, lowercase

<4>: Arabic numerals, 1 - 9

<5>: Roman numerals, lowercase, i - x
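The convention above can be checked mechanically. The following is a minimal sketch, assuming that segments are separated by periods and that a shortcode may stop at any depth (for example, only `<1>.<2>` for a top-level code). The function name `parse_shortcode`, the group names, and the sample shortcodes are hypothetical and do not come from the codeset.

```python
import re

# One optional, nested group per segment, so a shortcode may end at any depth.
# Assumption: shallower codes simply omit the trailing segments.
SHORTCODE_PATTERN = re.compile(
    r"(?P<taxonomy>[AP])"                      # <1>: A for Action, P for Process
    r"(?:\.(?P<code>[A-Z])"                    # <2>: first character of the code, capitalized
    r"(?:\.(?P<letter>[a-z])"                  # <3>: lowercase letter a-z
    r"(?:\.(?P<number>[1-9])"                  # <4>: Arabic numeral 1-9
    r"(?:\.(?P<roman>i|ii|iii|iv|v|vi|vii|viii|ix|x))?)?)?)?"  # <5>: roman numeral i-x
)


def parse_shortcode(shortcode):
    """Return the segments of a shortcode as a dict, or None if it is malformed."""
    match = SHORTCODE_PATTERN.fullmatch(shortcode)
    if match is None:
        return None
    # Keep only the segments that are actually present.
    parts = {key: value for key, value in match.groupdict().items() if value is not None}
    parts["depth"] = len(parts)  # number of segments found
    return parts


# Hypothetical shortcodes, used only to exercise the pattern.
full = parse_shortcode("A.C.a.1.iv")
short = parse_shortcode("P.T")
invalid = parse_shortcode("X.C")
```

Here `full` contains all five segments, `short` contains only the taxonomy and code segments, and `invalid` is `None` because `X` is neither `A` nor `P`.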

In [1]:
import pandas as pd
import numpy as np
from lib.util import displayMarkdown

# Load the codeset and replace missing values with empty strings
codeset = pd.read_csv('data/codeset.csv').fillna('')

def displayTree(codes, maxLevel=1000):
    """Format the codeset dataframe as a nested Markdown list, keeping rows with level <= maxLevel"""
    codeMarkdownTree = [
        '{}1. {}.&nbsp;&nbsp;&nbsp;**{}**: {}\n'.format(
            '\t' * c['level'],  # one tab of indentation per tree level
            c['shortcode'],
            # Prefer the alias when one is set; otherwise fall back to the name
            (c['alias'] if c['alias'] != '' else c['name']),
            c['desc'])
        for _, c in codes[codes.level <= maxLevel].iterrows()
    ]

    return displayMarkdown('\n'.join(codeMarkdownTree))
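To make the output format concrete, here is a minimal sketch of the same row-to-Markdown formatting using plain dictionaries instead of a dataframe, so it runs without `pandas` or the notebook's `displayMarkdown` helper. The function name `format_tree` and every row in `TOY_ROWS` are hypothetical placeholders, not entries from the actual codeset. The sketch assumes each row carries the keys `level`, `shortcode`, `name`, `alias`, and `desc`.

```python
def format_tree(rows, max_level=1000):
    """Build the nested Markdown list as a string, keeping rows with level <= max_level."""
    lines = []
    for row in rows:
        if row["level"] > max_level:
            continue  # skip rows deeper than the requested level
        # Prefer the alias when one is set; otherwise fall back to the name.
        label = row["alias"] if row["alias"] != "" else row["name"]
        lines.append("{}1. {}.&nbsp;&nbsp;&nbsp;**{}**: {}\n".format(
            "\t" * row["level"],  # one tab of indentation per tree level
            row["shortcode"],
            label,
            row["desc"],
        ))
    return "\n".join(lines)


# Hypothetical rows for illustration only; they are not taken from codeset.csv.
TOY_ROWS = [
    {"level": 0, "shortcode": "A.X", "name": "Example code", "alias": "",
     "desc": "Placeholder description"},
    {"level": 1, "shortcode": "A.X.a", "name": "Example subcode", "alias": "Subcode alias",
     "desc": "Placeholder description"},
    {"level": 2, "shortcode": "A.X.a.1", "name": "Example leaf", "alias": "",
     "desc": "Placeholder description"},
]

markdown = format_tree(TOY_ROWS, max_level=1)
print(markdown)
```

Every item is numbered `1.` on purpose: Markdown renderers renumber ordered lists automatically, and the tab indentation is what produces the nesting.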

In [2]:
%%html
<style>
    ol > li {
        list-style: none
    }
</style>

## Actions Taxonomy

The *Actions* taxonomy details the individual data wrangling steps performed by journalists.

In [4]:
displayTree(codeset[codeset.type=='actions'], 3)

KeyError: 'shortcode'

## Process Taxonomy

The *Processes* taxonomy consists of the paper authors' interpretations of the processes that occur during data wrangling.

In [None]:
displayTree(codeset[codeset.type=='process'], 3)