Skip to content

emtseng/qual-code-visualizer

Repository files navigation

Qual code-visualizer

This repo contains a visualizer for a qualitatively coded dataset. Given a codebook and a set of tagged transcripts, it will produce a simple HTML-based visualization of your dataset you can open on any browser. Transcripts are human-readable and navigable by code, and codes are counted by occurrence.

This repo is templatable, so have at it.

Last updated Jan. 26, 2021 by @emtseng

  • Updated to python3
  • Added sys checks at the top of each active script for python3

0) Setup

We'll run this tool in a python virtualenv. Ensure you have a python3 installation. Then:

virtualenv -p <path to your python3, for instance /usr/bin/python> py3
source py3/bin/activate
pip install -r requirements.txt

To deactivate the virtualenv:

deactivate

1) Prepare data

Place the codebook in your top-level directory. It should be a CSV should be formatted as:

   code1 , description1
   code2 , description2
   ...
   code3, description3

where codes are (possibly quote-surrounded) strings and descriptions are (possibly quote-delimited) strings. We will eat extra ',' appearing at end of lines.

Place your transcripts either in your top-level directory or in a directory (e.g. csvs/) at the top level. They should be CSVs formatted as:

Name, text , code1 , code2 , ...

where Name, text is a string for some text that Name has said, and each code is a string. Note that your speaker (Name) and their utterance (text) must be separated by a comma for this to work.

2) Reformat transcripts

The transcripts will need to be reformatted for use in the code extractor. To do this, run:

python reformat.py -i <input directory> -o <output directory> -c <codebook.csv>

You should then use the reformatted transcripts in your output directory for step 3.

3) Run codes

This repo contains a script (code-extract.py) that will process either a directory of transcripts or a list of transcripts. Usage is as follows:

python code-extract.py [update] <project title>  <output directory> <codebook> [master.csv] <csv1> [<csv2>] ...
python code-extract.py [update] <project title> <output directory> <codebook> [master.csv] <csv directory> ...

update is optional, and specifies that the transcripts are already processed by code-extract.py previously. This will regenerate output HTML and CSVs by updating based on the CSVs included. All CSVs should be included if one wants to update. (You can also ignore this completely and just re-run it with new codebooks each time.)

master.csv is for doing updates. It contains all the quotes from all the interviews, and any other CSV is used to update values in that master.csv. (You can also ignore this completely and just re-run it with new codebooks each time.)

The script will produce a folder of HTML in the output directory specified. Open the resulting outputdir/index.html in a browser to navigate through your codes.

Shortcuts

python reformat.py -i csvs/ -o reformatted-csvs/ -c codebook-combined-all.csv
python code-extract.py Remote-Clinic outputs/ codebook-combined-all.csv reformatted-csvs/

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published