The `codebook.ipynb` notebook has three purposes. First, it parses the taxonomy of open and axial codes in `code_tree.yaml`. Second, it parses all the open codes in PDF printouts of computational notebooks located in the `notebooks/` directory. Third, it calculates the code-analysis frequency for each code: the number of unique analyses that contain at least one instance of each open and axial code. It combines all this data in the `codes` data frame and exports it to `data/codes.csv` for further analysis.

In [1]:
import re, yaml
import pandas as pd
from lib.util import getCodes, displayMarkdown

%autosave 0

pd.set_option("display.max_rows", None)  # Don't truncate rows when printing a Pandas DataFrame instance

Autosave disabled


# Parse code tree

Recursively traverse the YAML code tree to transform the data from a tree into tabular form. The node called "root" does not actually exist in the code tree.

In [2]:
with open('code_tree.yaml', 'r') as f:
    code_yaml = yaml.safe_load(f)
    
codes = []

def preTreeWalk(pNode, node, func, lvl=0):
    """ A recursive, pre-order traversal of the code groups YAML structure"""
    leaf = 'sub' not in node.keys()
    func(pNode, node, lvl, leaf)
    if not leaf:
        for child in node['sub']:
            preTreeWalk(node, child, func, lvl + 1)

parseYaml = lambda parent, child, lvl, leaf: codes.append({
    'parent': parent['name'].lower(),
    'name': child.get('alias', child['name']).lower(),
    'desc': child['desc'],
    'level': lvl,
    'is_leaf': leaf
})

for grp in code_yaml:
    preTreeWalk({'name': 'root'}, grp, parseYaml)

codes = pd.DataFrame(codes)[['parent', 'name', 'desc', 'level', 'is_leaf']]
codes.head()

| | parent | name | desc | level | is_leaf |
|---|---|---|---|---|---|
| 0 | root | actions | Codes that describe actions journalists take t... | 0 | False |
| 1 | actions | import | How raw data is introduced into the programmin... | 1 | False |
| 2 | import | fetch | Data is retrieved from some external sources t... | 2 | False |
| 3 | fetch | extract data from pdf | Use a table extraction tool, such as Tabula, t... | 3 | True |
| 4 | fetch | make an api request | Make a request to a web service | 3 | True |
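
The flattening step can be illustrated on a small, hypothetical tree. The sketch below is not the notebook's own code: `flatten` and `toy_tree` are made-up names, and it uses a generator instead of a callback. It assumes the same node shape as `code_tree.yaml` (a `name` key and an optional `sub` list). Unlike `preTreeWalk`, it also treats a node with an empty `sub` list as a leaf.

```python
def flatten(node, parent='root', level=0):
    """Yield one row per node in pre-order (a parent before its children)."""
    children = node.get('sub', [])
    yield {
        'parent': parent,
        'name': node['name'],
        'level': level,
        'is_leaf': not children,
    }
    for child in children:
        yield from flatten(child, node['name'], level + 1)


# Hypothetical three-level tree, shaped like an entry in code_tree.yaml
toy_tree = {
    'name': 'import',
    'sub': [
        {'name': 'fetch', 'sub': [{'name': 'scrape the web'}]},
        {'name': 'load'},
    ],
}

rows = list(flatten(toy_tree))
for row in rows:
    print(row)
```

Each row records its parent, so the tree can be rebuilt from the table if needed.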


## Display code tree

All open codes, their descriptions, and the corresponding axial codes are stored in the `code_tree.yaml` file. This file is the master copy for all open and axial codes, but its raw text can be difficult to read, so it helps to render the tree as Markdown.

In [3]:
maxLevel = 5  # Deepest tree level to display

In [4]:
codeMarkdownTree = [
    '{}1. **{}**: {}\n'.format('\t' * c['level'], c['name'].title(), c['desc'])
    for _, c in codes[codes.level <= maxLevel].iterrows()
]

displayMarkdown("""
### Codes\n {}
""".format('\n'.join(codeMarkdownTree)))


### Codes
 1. **Actions**: Codes that describe actions journalists take to wrangle data

	1. **Import**: How raw data is introduced into the programming/wrangling environment

		1. **Fetch**: Data is retrieved from some external sources to the programming environment

			1. **Extract Data From Pdf**: Use a table extraction tool, such as Tabula, to parse tables inside PDF documents

			1. **Make An Api Request**: Make a request to a web service

			1. **Query A Database**: Data is imported through a database connection

			1. **Scrape The Web**: Systematically parsing HTML web pages for data

		1. **Create**: Data is created inside the programming environment

			1. **Construct Table Manually**: The table is either copy-and-pasted or values are created manually

			1. **Generate Data Computationally**: Using tables with values generated programmatically

			1. **Copy Table Schema**: A table is copied with a schema but without any values

			1. **Backfill Missing Data**: Create data observations where there are missing entries.

		1. **Load**: Data resides on the local disk and is *loaded* into the environment

	1. **Recalculate**: Creating or revising table variables based on existing variables, without *integrating* other tables

		1. **Detrend**: Removing the secular effect from a variable: "filter out the secular effect in order to see what is going on specifically with the phenomenon you are investigating." (Meyer, p. 41)

			1. **Adjust For Inflation**: Removing the effect of price inflation from data

			1. **Compute Index Number**: Calculate the change in a variable over time

			1. **Adjust For Season**: Adjusting a variable to compensate for season

		1. **Formulate A Performance Metric**: A quantitative variable that facilitates fair comparisons

			1. **Standardize Variable**: Measuring deviation from "normal," such as z-scores

			1. **Figure A Rate**: Calculating a normalized rate to "provide a comparison against some easily recognized baseline" (Meyer, p. 38)

			1. **Calculate A Central Tendency**: Measuring what a typical value is in the data, such as mean, median, or mode

			1. **Calculate Change Over Time**: Such as the percentage difference over time

			1. **Calculate Spread**: Calculating the difference between two values or rates

			1. **Calculate Domian-Specific Performance Metric**: Calculate a domain specific metric

			1. **Get Extreme Values**: Calculate the highest or lowest value(s) in a variable

	1. **Modify**: *Modifying* the data constitutes changes to variables in the table, without *integrating* other tables

		1. **Encode Table Identification In Row**: Adding some table identification value as a table variable for all observations

		1. **Network-Ify The Data**: Codes related to network data

			1. **Create An Edge**: Reference another observation in the table

			1. **Define Edge Weights**: Adding a quantitative variable to the relationship between two observations

		1. **Generate Keys**: Operations that attempt to create unique identifiers for observations

			1. **Create Soft Key**: Keys not guaranteed to be unique per observation

			1. **Create A Unique Key**: Keys that are guaranteed to be unique per observation

			1. **Prep Key For Joining**: Clean existing key column prior to joining

		1. **Rank Data**: Operations impose meaning on observations by their order in the table

			1. **Assign Ranks**: Explicitly ordering observations as a variable

			1. **Sort Table**: Implicitly ordering observations by table position

		1. **Variable Molding**: Codes concerning the relationship between table columns and variables

			1. **Separate Variables In Column**: Extract a column with more than one variable into separate colvars

			1. **Combine Variables**: Combining multiple colvars into one column

		1. **Create A Flag**: Create a variable of boolean values

		1. **Pad Variable**: Adding either character prefixes or suffixes consistently to every observation in a variable

		1. **Scale Variable**: Operations that apply some mathematical operation to a quantitative variable

		1. **Consolidate Variable Levels**: Codes that map a set of unique values to a smaller set

			1. **Bin Values**: Consolidate a quantitative variable into a smaller set of ordinal data

			1. **Combine Categorial Values**: Consolidate the levels of a categorical variable into a smaller set of levels

	1. **Clean**: Operations to correct data that might be considered "dirty"

		1. **Trim The Fat**: Remove sections of data that are not relevant

			1. **Remove Variables**: Removing variables from a table by specifying which to remove or retain

			1. **Remove Observations**: Remove observations from a table by filtering on variables

				1. **Trim By Date Range**: Removing observations inside or outside a range of dates

				1. **Trim By Geographic Area**: Remove observations that are inside or outside the geographic region

				1. **Trim By Quantitative Threshold**: Remove observations that are above, below, equal to, or not equal to a quantitative value

				1. **Trim By Contains Value**: Remove observations that do or do not contain a specific value or multiple values

		1. **Remove Incomplete Data**: Drop observation if it contains incomplete values, often denoted as NA

		1. **Deduplicate**: Remove duplicate observations

		1. **Fix Values**: Individual values have errors that must be corrected

			1. **Resolve Entities**: Resolving the issue of different categorical values for the same entity

			1. **Fix Data Errors Manually**: Instances where individual row-column values are changed manually

			1. **Fix Mixed Data Types**: Casting all the values of a variable to one data type

			1. **Remove Value Characters**: When characters inside a value are removed, such as periods, commas, and dollar signs

			1. **Replace Na Values**: Replace NA values in a table section with the same value

			1. **Strip Whitespace**: Remove whitespace characters from the beginning and end of a string value

		1. **Format**: Operations that modify the appearance or style of table values

			1. **Format Values**: Operations that change value appearance, such as changing case, specifying date format, or rounding floats.

			1. **Format Schema**: Operations that modify the table schema

				1. **Canonicalize Column Names**: Operations that change column names

				1. **Change Column Data Type**: For example, changing a column of values from dates to strings

	1. **Integrate**: Combining data residing in different tables into one table

		1. **Union Tables**: Combining multiple tables "row-wise" such that it adds more observations to the table

		1. **Inner Join Tables**: Take the intersection of two tables on a shared key variable

		1. **Supplement**: The variables of one table are supplemented with the variables of another table

			1. **Outer Join Tables**: Retain observations with no corresponding match in table being joined upon

			1. **Full Join Tables**: Retain observations with no corresponding match in either table

			1. **Concat Parallel Tables**: When two tables are joined without a joining key

			1. **Use Lookup Table**: Using a table with two columns to map from one value to another

		1. **Cartesian Product**: When a new table is created by the unique pairing of each key in their respective tables

		1. **Self Join Table**: A table is joined with itself

	1. **Transform**: Operations that transform a table into an aggregated, coarser view of the original table.

		1. **Summarize**: Codes that aggregate and calculate tables to get a more coarse view of the data.

			1. **Rollup**: Rename entity to the name of its parent (for hierarchical data)

			1. **Join Aggregate**: Extend the table (columnwise) with aggregate values, hence the number of rows stays constant but columns increase

			1. **Split, Apply, Combine**: Partition a table, apply the same calculation on each partition, and union partitions

				1. **Group By Single Column**: The partition is formed by a single column

				1. **Group By Multiple Columns**: The partition is formed by multiple columns, creating hierarchy

			1. **Rolling Window Calculation**: Performs rolling-window aggregation

			1. **Create Frequency Table**: Count the frequency of non-quantitative variables within a column

			1. **Cross Tabulate**: such as with a pivot table/crosstab

		1. **Reshape**: Operations fundamentally change the table's structure, without summarizing any data

			1. **Spread Table**: Expand two columns of key value pairs into multiple columns

			1. **Gather Table**: Collapses table into key value pairs

	1. **Display Dataset**: Different ways to check in on the state of the dataset during wrangling

		1. **Format Table Display**: Operations that adjust the table display, such as how many decimals to round floats

		1. **Visualize Data**: Employing any kind of data visualization, including a table

		1. **Describe Statistically**: Generates any kind of descriptive statistics of the dataset's central tendency, dispersion and distribution shape

	1. **Check Sanity**: Operations that confirm or inspect the state of the data during wrangling

		1. **Run A Test**: Operations output a clear pass or fail value

			1. **Check For Column Mismatch**: Check the number of columns between rows or tables

			1. **Test For Equality**: Test if two data structures are exactly the same

			1. **Test Different Computations For Equality**: Test the results of a calculation against different methods/packages

			1. **Validate Data Quality With Domain-Specific Rules**: Such as if the average temperature is higher than the maximum recorded temperature

		1. **Check Results**: Operations that output some visual representation of the table

			1. **Peek At Data**: Display the first *n* rows and all columns of the table

			1. **Inspect Table Schema**: Check the data types of columns

			1. **Display Rows With Missing Values**: Show rows that contain an NA value

			1. **Check For Nas**: See if any rows have NA values

		1. **Count The Data**: Operations that count things in the table

			1. **Count Number Of Rows**: Printing out the total number of observations in a table

			1. **Count Unique Values**: Report the number of unique values in one or more variables

	1. **Export**: Export the results of data wrangling.

1. **Observations**: These codes cover observations from the coder about the wrangling processes, not actions performed by the journalist

	1. **Data Acquisition**: How the data was acquired by journalists

		1. **Collect Raw Data**: Using first-hand observations or logs as data

		1. **Use Previously Cleaned Data**: Data that originated from a colleague

		1. **Use Public Data**: Includes open-source datasets, tables on Wikipedia, etc.

		1. **Use Academic Data**: Use data collected from an academic study

		1. **Use Non-Public, Provided Data**: Use data that is not publicly available

		1. **Use Open Government Data**: Data publicly available on open data portals

		1. **Freedom Of Information Data**: Data that was obtained via FOI/FOIA requests

		1. **Use Another News Orgs Data**: A dataset previously published by another news organization

		1. **Use Data From Colleague**: A dataset was provided by another journalist

	1. **Workflow Building**: Codes pertaining to how the wrangling workflow is built.

		1. **Annotate Workflow**: Adding comments or notes in Markdown that explain what the journalist is doing.

		1. **Think Computationally**: Codes that demonstrate computational thinking on the part of the journalist.

			1. **Architect A Subroutine**: A set of instructions grouped together to be performed multiple times

			1. **Architect Repeating Process**: Instances where journalists employed a loop

		1. **Toggle Step On And Off**: Ensuring that some code segments are not always run, such as by commenting out lines of code

	1. **Wrangling Purpose**: Why does this data need to be wrangled?

		1. **Input For Downstream Applications**: Output from wrangling will be input into some other program

			1. **Wrangle Data For Graphics**: Data need to be formatted in order to be visualized in an article, including tables.

			1. **Wrangle Data For Model**: Data is being wrangled in order to create a model, whether the main point of the piece is for prediction or classification

		1. **Remove Erroneous Data**: There are errors in the data that need to be removed

		1. **Creating New Datasets**: The purpose of wrangling is to create a new dataset

			1. **Combine Drifting Datasets**: Reconcile difference in periodically published datasets that have superficially changed over time

			1. **Combine Seemingly Disparate Datasets**: When a notebook largely constitutes combining seemingly unrelated datasets

			1. **Combine Data And Geography**: Pairing data with GIS info

		1. **Get A Summary Of The Data**: Data of individual observations is aggregated in an attempt to find some meaningful structure or patterns

	1. **Analysis**: Kinds of analysis data journalists need to wrangle data to perform

		1. **Interpret Statistical/Ml Model**: Analyze features from a model such as linear regression or classification trees

		1. **Compare Different Groups Along A Common Metric**: The end analysis is just comparing different groups by a common metric.

		1. **Identify Extreme Values**: Identify values that are at the ends of the range, but not strictly outliers.

		1. **Outlier Detection**: Finding extreme cases or outliers in the data

		1. **Show Trend Over Time**: Analysis consists of showing how values change over time

		1. **Calculate A Statistic**: Calculate a single value from a dataset, such as number of records.

		1. **Answer A Question**: Analysis consists of using data to answer a specific question

		1. **Examine Relationship**: Analysis consists of examining the relationship between different phenomena

		1. **Explain Variance**: This can be done via PCA

		1. **Identify Clusters Or Lack Of Clusters**: Look for meaningful groups within the data

		1. **Find Nearest Neighbours In The Network**: (Network analysis) Find the closest neighbours for all points

		1. **Explore Dynamic Network Flow**: (Network analysis) explore the flow between different nodes in the graph, e.g. migration between cities.

	1. **Strategies**: General strategies journalists employ when wrangling data.

		1. **Tables Evolve**: Data and objects are destroyed during the wrangling process.

			1. **Value Replacement**: The output of any column calculation is reassigned to an existing column.

			1. **Temporary Joining Column**: When a key for joining two tables is created and destroyed immediately after the join.

			1. **Refine Table**: Table refinement refers to when a table is subset *in place*, a new object is not created in the environment.

		1. **Data Is Precious**: Data and objects are never actually lost in the programming environment.

			1. **Preserve Existing Values**: The output of any column calculation is assigned to a new column

			1. **Create Child Table**: A child table is a subset of the parent table declared as a new object in the environment.

		1. **Set Data Confidence Threshold**: Removes rows where a quantitative value is less than, greater than, or not equal to a numeric value.

		1. **Table Splitting**: Tables may be divided, partitioned, or otherwise split into multiple tables to accomplish a transformation goal.

			1. **Split, Compute, And Merge**: First, the journalist partitions a single data frame into multiple, separate data frames. Then, often identical computations are run on all the data frames. Finally, the multiple data frames are consolidated into one data frame again.

			1. **Split And Compute**: One table is split into two or more and identical computations are applied to each table

		1. **Tolerate Dirty Data**: Analysis continues despite clear data quality issues.

	1. **Pain Points**: Areas where journalists seem, or could be, frustrated in the wrangling process.

		1. **Fix Incorrect Calculation**: Calculations in the data are incorrect and the journalist must recalculate them

		1. **Repetitive Code**: Instances where code is repetitively copied and pasted.

		1. **Make An Incorrect Conclusion**: Instances where the journalist has made an incorrect conclusion about the data.

		1. **Post-Merge Clean Up**: Pain points that come from the result of merging two datasets together

			1. **Resort After Merge**: When a sort has to be redone because a merge ruined the pre-merge order.

			1. **Fill In Na Values After An Outer Join**: As outer joins do not drop non-matching rows, those values have NA

		1. **Encode Redundant Information**: When data that already exists in the table is recoded into the table.

		1. **Post-Aggregation Clean Up**: Pain points that come from the result of grouping a table.

			1. **Data Loss From Aggregation**: When table columns are lost because they were dropped from the resulting table due to not being relevant in aggregation.

			1. **Silently Dropping Values After Groupby**: Values other than those being grouped and calculated upon are lost in a group by operation

		1. **Data Too Large For Repo**: Raw data cannot be included in SCM because files are too large



# Parse coded notebooks

For each computational notebook and script used for wrangling data in each analysis, we created PDF printouts with a `.html.pdf` extension. This extension distinguishes them from PDFs that contributors may have checked into the repositories. All of these printouts fit the glob pattern `notebooks/**/**/*.html.pdf`. We open-coded PDF printouts using the comments feature in [Adobe Acrobat DC](https://acrobat.adobe.com/en/acrobat.html). Open codes are extracted from each PDF using some internals of the open-source [pdfannots CLI](https://github.com/0xabu/pdfannots). See the [main function in pdfannots.py](https://github.com/0xabu/pdfannots/blob/6dd8dd29a93a0f5ec55e4b47f0eb27d8088a11a0/pdfannots.py#L469) for more details.

The `codeData` data frame links open codes with the notebooks in which they appear. Warning: this cell may take a while to execute.

In [5]:
%%time
codeData = getCodes()

CPU times: user 1min, sys: 130 ms, total: 1min
Wall time: 1min 1s


## Quality Assurance

This section contains various coding QA measures.

### Matching codes between notebooks and the Code Tree

The cell below checks that every open code in the code tree appears in the PDF printouts and vice versa. More precisely, it compares the set of open codes in `code_tree.yaml` with the set of unique codes found across all PDF printouts (`notebooks/**/**/*.html.pdf`) and reports any code that appears in one set but not the other. The check passes when both differences are the empty set.

In [6]:
# Parse the code YAML for just the open codes (leaves)
leaves = []
def collectLeaves(node, repo):
    """Recursively traverse dictionary tree and collect only the leaf nodes"""
    if 'sub' in node.keys():
        for subnode in node['sub']:
            collectLeaves(subnode, repo)
    else:
        safeCode = node['name'].strip().lower()
        repo.append(safeCode)

for grp in code_yaml:
    collectLeaves(grp, leaves)

# Convert from lists to sets
leaves = set(leaves)
pdf_codes = set(codeData['code'].unique())

# Find any discrepancies
diff = lambda a, b, missing: displayMarkdown('Codes in `{}` but not in `{}`:\n{}\n'.format(a, b, '\n'.join(['* ' + c for c in missing])))

falsePositives = pdf_codes.difference(leaves)  # In the printouts but not in the code tree
falseNegatives = leaves.difference(pdf_codes)  # In the code tree but not in the printouts

if not (falsePositives or falseNegatives):
    # Both sets are empty
    displayMarkdown('<p>All codes have been grouped!</p><img src="https://media.giphy.com/media/XreQmk7ETCak0/giphy.gif"> ')
else:
    # Problems
    if falsePositives:
        diff('*.html.pdf', 'code_tree.yaml', falsePositives)
    if falseNegatives:
        diff('code_tree.yaml', '*.html.pdf', falseNegatives)

Codes in `*.html.pdf` but not in `code_tree.yaml`:
* subset columns
* prep tables for joining
* calculate central tendency
* combine entities
* create unique key


Codes in `code_tree.yaml` but not in `*.html.pdf`:
* combine categorial values
* remove variables
* calculate a central tendency
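
Several of the mismatches above look like near-duplicates, for example `calculate central tendency` and `calculate a central tendency`. As a rough aid, and not something this notebook does, fuzzy string matching from the standard library's `difflib` can propose likely pairs for manual review. The sketch below reuses a few of the code names printed above. The variable names and the `0.8` cutoff are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# A few of the codes reported above, copied by hand for illustration
only_in_pdfs = ['calculate central tendency', 'create unique key']
only_in_tree = [
    'calculate a central tendency',
    'combine categorial values',
    'remove variables',
]

# For each code found only in the printouts, propose the most similar
# code found only in the tree. An empty list means no candidate is close
# enough at this cutoff.
suggestions = {
    code: get_close_matches(code, only_in_tree, n=1, cutoff=0.8)
    for code in only_in_pdfs
}

for code, candidates in suggestions.items():
    print('{!r} -> {}'.format(code, candidates))
```

A suggestion is only a hint: a coder still has to decide whether the two names refer to the same open code before renaming either one.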


### Find codes in notebooks

If the extracted codes and the codes in `code_tree.yaml` don't match, we can locate where a given open code was applied by listing its occurrences by organization, analysis, notebook, and code.

In [7]:
needles = ['inner join tables', 'outer join tables', 'full join tables', 'concat parallel tables', 'use lookup table', 'self join table']

codeData['mark'] = '✔️'

codeData[codeData.code.isin([n.lower() for n in needles ])] \
    [['org', 'analysis', 'notebook', 'code', 'mark']] \
     .drop_duplicates() \
     .set_index(['org', 'analysis', 'notebook', 'code']) \
     .unstack(fill_value='')

| org | analysis | notebook | concat parallel tables | full join tables | inner join tables | outer join tables | self join table | use lookup table |
|---|---|---|---|---|---|---|---|---|
| TheOregonian | long-term-care-db | mung-3-25-scrape | | | | ✔️ | | ✔️ |
| baltimore-sun-data | school-star-ratings-2018 | mapping.ipynb | | | | ✔️ | | |
| buffalonews | new-york-schools-assessment | percent-proficiency.ipynb | | | | | | ✔️ |
| buzzfeednews | 2015-11-refugees-in-the-united-states | us-refugee-analysis.ipynb | | | | ✔️ | | |
| buzzfeednews | 2016-11-bellwether-counties | county-predictiveness.ipynb | | | | ✔️ | | |
| buzzfeednews | 2019-04-democratic-candidate-codonors | analyze-campaign-codonors.ipynb | | | | ✔️ | ✔️ | |
| fivethirtyeight | bechdel | bechdel | | | ✔️ | | | |
| fivethirtyeight | infrastructure-jobs | infrastructure-jobs.R | | | ✔️ | ✔️ | | ✔️ |
| la_times | california-ccscore-analysis | analysis.ipynb | | | | ✔️ | | |
| la_times | california-crop-production-wages-analysis | 02-transform.ipynb | | | ✔️ | | | |
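
The reshaping used to build this presence table can be seen in isolation on a small, made-up data frame. In the sketch below, `toy` and `presence` are hypothetical names and the rows are invented. The column names mirror those of `codeData`, and a plain `x` stands in for the check mark.

```python
import pandas as pd

# Invented occurrences: one row per time a code was applied in a notebook
toy = pd.DataFrame({
    'org': ['a', 'a', 'b', 'b'],
    'analysis': ['x', 'x', 'y', 'y'],
    'notebook': ['n1', 'n1', 'n2', 'n2'],
    'code': ['inner join tables', 'inner join tables',
             'use lookup table', 'inner join tables'],
})
toy['mark'] = 'x'

# Drop repeated occurrences, then pivot the codes from rows into columns.
# Notebook/code pairs that never occur are filled with an empty string.
presence = (
    toy.drop_duplicates()
       .set_index(['org', 'analysis', 'notebook', 'code'])['mark']
       .unstack(fill_value='')
)
print(presence)
```

Selecting the `mark` column before `unstack` yields a single level of column labels, which avoids the extra `mark` header row that appears when the whole data frame is unstacked.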


### Code group gaps

Some code groups should apply to every analysis. The check below flags any analysis that has no code from such a group.

#### Analysis Codes


In [8]:
analysisCodes = codes[codes.parent == 'analysis'].name.unique()
codedAnalyses = set(codeData[codeData.code.isin(analysisCodes)].analysis.unique())
analyses = set(codeData.analysis.unique())
codeDiff = analyses.difference(codedAnalyses)

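if len(codeDiff) > 0:
    # Build the list without leading indentation so that Markdown
    # does not treat the indented lines as a code block
    displayMarkdown('The following analyses do not have analysis codes:\n\n* {}\n'.format('\n* '.join(sorted(codeDiff))))
else:
    displayMarkdown("All analyses have analysis codes")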

All analyses have analysis codes

# Calculating code-analysis frequency

Because PDF printouts have only open codes inside, we have to do a little bit of data wrangling to figure out how many analyses contain at least one instance of each axial code.


In [9]:
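# Attach each open code (leaf) to the analyses and notebooks in which it appears
codeAnalysisTmp = pd.merge(codes[codes.is_leaf].copy()[['parent', 'name']], codeData[['code', 'analysis', 'notebook']],
                        how='left',
                        left_on='name',
                        right_on='code') \
                 .drop(['code'], axis=1) \
                 .drop_duplicates()

codeAnalysis = codeAnalysisTmp[['name', 'analysis', 'notebook']].copy()

# Walk up the tree one level per iteration. The loop ends once every parent
# is NaN, because 'root' has no row of its own in `codes`.
while codeAnalysisTmp.parent.nunique() > 0:
    # Credit each parent with the analyses and notebooks of its children
    codeAnalysisTmp = codeAnalysisTmp[['parent', 'analysis', 'notebook']] \
        .rename(columns={'parent': 'name'}) \
        .drop_duplicates()

    # Look up the parent of each promoted code for the next iteration
    codeAnalysisTmp = pd.merge(
        codeAnalysisTmp,
        codes[['parent', 'name']],
        how = 'left',
        on = 'name')

    # Accumulate the rolled-up rows
    codeAnalysis = pd.concat([codeAnalysis, codeAnalysisTmp[['name', 'analysis', 'notebook']]]) \
        .drop_duplicates()

# Add tree metadata; 'root' has no match in `codes`, so its columns stay NaN
codeAnalysis = pd.merge(
    codeAnalysis,
    codes[codes.name != 'root'][['name', 'level', 'is_leaf']],
    how='left')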
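
The idea behind this roll-up is that an axial code occurs in an analysis whenever any of its descendant open codes does. A minimal sketch of the same idea with plain dictionaries and sets, instead of the repeated merges above, is shown below. The names `parent_of`, `leaf_analyses`, and `roll_up` and the toy data are all hypothetical.

```python
# Hypothetical slice of the tree: child -> parent
parent_of = {
    'scrape the web': 'fetch',
    'fetch': 'import',
    'load': 'import',
    'import': 'root',
}

# Hypothetical analyses in which each open code (leaf) was applied
leaf_analyses = {
    'scrape the web': {'a1'},
    'load': {'a1', 'a2'},
}


def roll_up(leaf_analyses, parent_of, root='root'):
    """Propagate each leaf's analyses to every ancestor below the root."""
    totals = {}
    for leaf, analyses in leaf_analyses.items():
        node = leaf
        while node != root:
            totals.setdefault(node, set()).update(analyses)
            node = parent_of[node]
    return totals


totals = roll_up(leaf_analyses, parent_of)
freq = {name: len(analyses) for name, analyses in totals.items()}
print(freq)
```

Using sets means an analysis is counted once per code, no matter how many descendant codes or notebooks contribute it.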

In [10]:
codeAnalysisGrp = codeAnalysis.groupby('name')['analysis'].nunique().to_frame('analysis').reset_index()

displayMarkdown('The minimum analysis count for any code should be 1: {}'.format(1 == min(codeAnalysisGrp.analysis)))    

The minimum analysis count for any code should be 1: False
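
A result of `False` here means at least one code has a count other than the expected minimum. One way this can happen, given the left join above, is that a code with no matching row in `codeData` keeps a missing `analysis` value, and `nunique` ignores missing values by default, so that code gets a count of 0. The sketch below reproduces the effect on made-up data and lists the affected codes. The names `toy`, `counts`, and `uncoded` are assumptions.

```python
import numpy as np
import pandas as pd

# 'fetch' has no matching analysis, as after a left join with no match
toy = pd.DataFrame({
    'name': ['load', 'load', 'fetch'],
    'analysis': ['a1', 'a2', np.nan],
})

# nunique skips NaN by default, so 'fetch' is counted as 0 analyses
counts = toy.groupby('name')['analysis'].nunique()
uncoded = counts[counts == 0].index.tolist()

print(counts)
print('Codes with no analyses:', uncoded)
```

Listing the zero-count codes this way points directly at the names that need to be reconciled between the code tree and the printouts.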

In [11]:
priorSize = codes.shape[0]

codes = pd.merge(codes, codeAnalysisGrp, how='left', on='name')

displayMarkdown('The data frame `codes` differs by {} rows after the aggregate join'.format(priorSize - codes.shape[0]))
codes.head()

The data frame `codes` differs by 0 rows after the aggregate join

| | parent | name | desc | level | is_leaf | analysis |
|---|---|---|---|---|---|---|
| 0 | root | actions | Codes that describe actions journalists take t... | 0 | False | 49.0 |
| 1 | actions | import | How raw data is introduced into the programmin... | 1 | False | 38.0 |
| 2 | import | fetch | Data is retrieved from some external sources t... | 2 | False | 0.0 |
| 3 | fetch | extract data from pdf | Use a table extraction tool, such as Tabula, t... | 3 | True | 0.0 |
| 4 | fetch | make an api request | Make a request to a web service | 3 | True | 0.0 |


# Export results

We export two CSV files for other notebooks to use.

* `data/codes.csv` contains information on each open and axial code, such as its level in the tree and the number of analyses in which it occurs.

* `data/code-analysis-network.csv` contains the occurrence of open and axial codes in individual notebooks.

In [12]:
codes.to_csv('data/codes.csv', index=False)

codeAnalysis.to_csv('data/code-analysis-network.csv', index=False)