# Using qualdocs to process Google Docs comments


This assumes you have `client_secret.json` in your working directory, per the instructions in `README.md`.

## Imports

In [1]:
!pip install qualdocs



In [2]:
import pandas as pd
# None disables column-width truncation (pandas >= 1.0); the older -1 value is deprecated
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 500)

In [3]:
import qualdocs

## Authentication

Authenticate with `get_credentials()`, which prints a link to open in your browser. Complete the authorization flow, and Google will give you a verification code to copy and paste into the input box in the output area below. If you run into trouble on Google's end, restart the notebook and try again in a private/incognito window; it can sometimes take a couple of tries. Once authentication is complete, close the browser window and return to the notebook. The credentials are stored in a .json file in `~/.credentials/`.

In [4]:
credentials = qualdocs.get_credentials()


Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?client_id=TRUNCATED_FOR_PRIVACY

Enter verification code: RANDOM_LETTERS_AND_NUMBERS
Authentication successful.
Storing credentials to /home/staeiou/.credentials/drive-api-qualdocs.json
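
Before re-running the authentication cell, it can help to check whether a credential file is already cached. The sketch below is not part of qualdocs; `has_cached_credentials` is a hypothetical helper, and the file name is assumed from the "Storing credentials to ..." message above.

```python
from pathlib import Path

# Assumed relative location, taken from the "Storing credentials to ..." message.
CREDENTIAL_FILE = Path(".credentials") / "drive-api-qualdocs.json"


def has_cached_credentials(home=None):
    """Return True if the credential file exists under the given home directory."""
    home = Path(home) if home is not None else Path.home()
    return (home / CREDENTIAL_FILE).is_file()


# Checks the current user's home directory by default.
print(has_cached_credentials())
```

If this prints `False`, expect `get_credentials()` to walk you through the browser flow again.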


In [5]:
service = qualdocs.get_service()

## Search for files to process

In [6]:
ids = qualdocs.get_file_ids(service, search="lorem")
ids

{'qualdocs-test lorem': '1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI',
 'qualdocs test lorem 2': '1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk'}
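
The result is a plain dictionary mapping document names to Google Drive file IDs, so ordinary dictionary operations apply. As a small sketch, the IDs can be turned into links for opening each document; `doc_url` is a hypothetical helper, not a qualdocs function, and the `ids` literal simply copies the output above.

```python
# Mapping shaped like the output above: document name -> file ID.
ids = {
    "qualdocs-test lorem": "1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI",
    "qualdocs test lorem 2": "1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk",
}


def doc_url(file_id):
    """Build the edit link for a Google Docs file ID."""
    return f"https://docs.google.com/document/d/{file_id}/edit"


urls = {name: doc_url(file_id) for name, file_id in ids.items()}
for name, url in urls.items():
    print(name, "->", url)
```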

## Query and parse into pandas dataframe

In [8]:
json_dict = qualdocs.get_json_dict(service, ids)

In [9]:
df = qualdocs.json_to_df(json_dict,ids)
df

code,subcode,sub_subcode,name,text,comment_id,coder,url
topcode1,,,qualdocs test lorem 2,"Sed convallis purus lorem, ut euismod lorem scelerisque at. Vestibulum est diam, convallis ac dictum id, sodales nec nisl. Maecenas malesuada neque vel enim vestibulum laoreet.",AAAAA-veBHo,R.Stuart Geiger,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAAA-veBHo
topcode1,subcode1,,qualdocs-test lorem,"Nunc lacinia luctus mauris, vel malesuada lacus fermentum ullamcorper. Proin lacinia odio non tincidunt finibus.",AAAAA-y8kfE,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-y8kfE
topcode1,subcode1,,qualdocs-test lorem,"Vestibulum ac felis eget nisi iaculis condimentum. Morbi eros ligula, posuere id enim eu, dictum scelerisque magna.",AAAAA-y8ke8,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-y8ke8
topcode1,subcode2,,qualdocs test lorem 2,Vestibulum ac felis eget nisi iaculis condimentum.,AAAACP0KnYI,James Knox,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAACP0KnYI
topcode1,subcode2,,qualdocs-test lorem,"Lorem ipsum dolor sit amet, consectetur adipiscing elit. In enim sapien, fringilla vel sodales sed, tincidunt ut odio. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.",AAAAA-rM2bI,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-rM2bI
topcode1,subcode2,,qualdocs-test lorem,"Duis a erat sodales, laoreet est vel, consequat libero. Donec nisl erat, venenatis sit amet lacinia sit amet, finibus in nulla.",AAAAA-y8kfI,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-y8kfI
topcode1,subcode2,,qualdocs-test lorem,"Nunc lacinia luctus mauris, vel malesuada lacus fermentum ullamcorper. Proin lacinia odio non tincidunt finibus.",AAAAA-y8kfE,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-y8kfE
topcode1,subcode3,,qualdocs test lorem 2,Vestibulum ac felis eget nisi iaculis condimentum.,AAAACP0KnYI,James Knox,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAACP0KnYI
topcode1,subcode3,,qualdocs test lorem 2,Lorem ipsum dolor sit amet,AAAAA-veBH0,R.Stuart Geiger,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAAA-veBH0
topcode1,subcode4,,qualdocs test lorem 2,Lorem ipsum dolor sit amet,AAAAA-veBH0,R.Stuart Geiger,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAAA-veBH0
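
To experiment with the processing steps without authenticating, a small stand-in DataFrame with the same index levels can be built by hand. This is only a sketch: the rows, document names, comment IDs, and coder names are made up, and the `url` column is left out for brevity.

```python
import pandas as pd

# Hypothetical rows: code, subcode, sub_subcode, name, text, comment_id, coder.
records = [
    ("topcode1", "", "", "doc B", "Sed convallis purus lorem", "id-001", "Coder A"),
    ("topcode1", "subcode1", "", "doc A", "Nunc lacinia luctus mauris", "id-002", "Coder A"),
    ("topcode1", "subcode2", "", "doc A", "Nunc lacinia luctus mauris", "id-002", "Coder A"),
    ("topcode1", "subcode2", "", "doc B", "Vestibulum ac felis", "id-003", "Coder B"),
    ("topcode2", "subcode3", "subsubcode1", "doc B", "Lorem ipsum dolor sit amet", "id-004", "Coder A"),
]
columns = ["code", "subcode", "sub_subcode", "name", "text", "comment_id", "coder"]

# The first four fields become the MultiIndex, as in the output above.
toy_df = (
    pd.DataFrame(records, columns=columns)
    .set_index(["code", "subcode", "sub_subcode", "name"])
    .sort_index()
)
print(toy_df)
```

Note that one comment can appear under several codes (here `id-002`), just as in the real output.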


## Processing

Get the list of codes:

In [11]:
qualdocs.get_code_list(df)

['topcode1',
 'topcode1: subcode1',
 'topcode1: subcode1',
 'topcode1: subcode2',
 'topcode1: subcode2',
 'topcode1: subcode2',
 'topcode1: subcode2',
 'topcode1: subcode3',
 'topcode1: subcode3',
 'topcode1: subcode4',
 'topcode1: subcode4',
 'topcode2',
 'topcode2',
 'topcode2: subcode1',
 'topcode2: subcode1',
 'topcode2: subcode2: subsubcode1',
 'topcode2: subcode3',
 'topcode2: subcode3: subsubcode1',
 'topcode2: subcode3: subsubcode3',
 'topcode2: subcode3: subsubcode4',
 'topcode2: subcode4',
 'topcode2: subcode4',
 'topcode3']
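
Because the list repeats a code once per coded passage, it can be tallied directly with the standard library. A minimal sketch, using a shortened, hand-typed stand-in for the list above:

```python
from collections import Counter

# Shortened stand-in for the list returned above.
code_list = [
    "topcode1",
    "topcode1: subcode1",
    "topcode1: subcode1",
    "topcode1: subcode2",
    "topcode2",
]

# Count how many times each full code string occurs.
code_counts = Counter(code_list)
for code, n in code_counts.most_common():
    print(f"{n:>2}  {code}")
```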

Get all top-level codes:

In [13]:
list(df.index.get_level_values(0).unique())

['topcode1', 'topcode2', 'topcode3']

Get all subcodes:

In [15]:
list(df.index.get_level_values(1).unique())

['', ' subcode1', ' subcode2', ' subcode3', ' subcode4']

Get all sub-subcodes:

In [17]:
list(df.index.get_level_values(2).unique())

['', ' subsubcode1', ' subsubcode3', ' subsubcode4']
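
The subcode and sub-subcode labels printed above carry a leading space, which makes exact-match lookups such as `'subcode1'` fail. One way to normalize them is to rebuild the index with stripped labels. This is a sketch on a toy DataFrame; `strip_index_labels` is a hypothetical helper, not part of qualdocs.

```python
import pandas as pd

# Toy index with the same leading-space labels seen in the output above.
index = pd.MultiIndex.from_tuples(
    [
        ("topcode1", "", ""),
        ("topcode1", " subcode1", ""),
        ("topcode2", " subcode3", " subsubcode1"),
    ],
    names=["code", "subcode", "sub_subcode"],
)
toy_df = pd.DataFrame({"comment_id": ["id-001", "id-002", "id-003"]}, index=index)


def strip_index_labels(frame):
    """Return a copy whose string index labels have surrounding whitespace removed."""
    cleaned = [
        tuple(label.strip() if isinstance(label, str) else label for label in labels)
        for labels in frame.index
    ]
    result = frame.copy()
    result.index = pd.MultiIndex.from_tuples(cleaned, names=frame.index.names)
    return result


clean_df = strip_index_labels(toy_df)
print(list(clean_df.index.get_level_values("subcode").unique()))
# prints ['', 'subcode1', 'subcode3']
```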

## Counts by code
### Ordered by index

In [19]:
df_counts = pd.DataFrame(df.groupby(level=[0, 1, 2]).count()['comment_id'])
df_counts.columns = ['count']
df_counts

code,subcode,sub_subcode,count
topcode1,,,1
topcode1,subcode1,,2
topcode1,subcode2,,4
topcode1,subcode3,,2
topcode1,subcode4,,2
topcode2,,,2
topcode2,subcode1,,2
topcode2,subcode2,subsubcode1,1
topcode2,subcode3,,1
topcode2,subcode3,subsubcode1,1
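
An equivalent way to get these counts is `groupby(...).size()`, which counts rows per group rather than non-null `comment_id` values; the two agree whenever every row has a comment ID. A sketch on made-up data, grouping by level name instead of position:

```python
import pandas as pd

# Toy data with the same three code levels as the real DataFrame.
index = pd.MultiIndex.from_tuples(
    [
        ("topcode1", "", ""),
        ("topcode1", "subcode2", ""),
        ("topcode1", "subcode2", ""),
        ("topcode2", "subcode3", "subsubcode1"),
    ],
    names=["code", "subcode", "sub_subcode"],
)
toy_df = pd.DataFrame({"comment_id": ["id-001", "id-002", "id-003", "id-004"]}, index=index)

# size() returns a Series of row counts; name it and convert to a one-column frame.
toy_counts = (
    toy_df.groupby(level=["code", "subcode", "sub_subcode"])
    .size()
    .rename("count")
    .to_frame()
)
print(toy_counts)
```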


### Ordered by count

In [21]:
df_counts.sort_values(by='count', ascending=False)

code,subcode,sub_subcode,count
topcode1,subcode2,,4
topcode1,subcode1,,2
topcode1,subcode3,,2
topcode1,subcode4,,2
topcode2,,,2
topcode2,subcode1,,2
topcode2,subcode4,,2
topcode1,,,1
topcode2,subcode2,subsubcode1,1
topcode2,subcode3,,1
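
When only the most frequent codes matter, `DataFrame.nlargest` returns the top rows without sorting the whole table. A small sketch with hand-typed counts standing in for `df_counts`:

```python
import pandas as pd

# Hand-typed stand-in for df_counts.
index = pd.MultiIndex.from_tuples(
    [
        ("topcode1", "", ""),
        ("topcode1", "subcode1", ""),
        ("topcode1", "subcode2", ""),
        ("topcode2", "", ""),
    ],
    names=["code", "subcode", "sub_subcode"],
)
toy_counts = pd.DataFrame({"count": [1, 2, 4, 2]}, index=index)

# Keep the two rows with the highest counts; ties keep the first occurrence.
top_codes = toy_counts.nlargest(2, "count")
print(top_codes)
```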


## Filtering and searching
### Filter by a top-level code:

In [23]:
df.xs('topcode1', level='code', drop_level=False)

code,subcode,sub_subcode,name,text,comment_id,coder,url
topcode1,,,qualdocs test lorem 2,"Sed convallis purus lorem, ut euismod lorem scelerisque at. Vestibulum est diam, convallis ac dictum id, sodales nec nisl. Maecenas malesuada neque vel enim vestibulum laoreet.",AAAAA-veBHo,R.Stuart Geiger,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAAA-veBHo
topcode1,subcode1,,qualdocs-test lorem,"Nunc lacinia luctus mauris, vel malesuada lacus fermentum ullamcorper. Proin lacinia odio non tincidunt finibus.",AAAAA-y8kfE,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-y8kfE
topcode1,subcode1,,qualdocs-test lorem,"Vestibulum ac felis eget nisi iaculis condimentum. Morbi eros ligula, posuere id enim eu, dictum scelerisque magna.",AAAAA-y8ke8,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-y8ke8
topcode1,subcode2,,qualdocs test lorem 2,Vestibulum ac felis eget nisi iaculis condimentum.,AAAACP0KnYI,James Knox,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAACP0KnYI
topcode1,subcode2,,qualdocs-test lorem,"Lorem ipsum dolor sit amet, consectetur adipiscing elit. In enim sapien, fringilla vel sodales sed, tincidunt ut odio. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.",AAAAA-rM2bI,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-rM2bI
topcode1,subcode2,,qualdocs-test lorem,"Duis a erat sodales, laoreet est vel, consequat libero. Donec nisl erat, venenatis sit amet lacinia sit amet, finibus in nulla.",AAAAA-y8kfI,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-y8kfI
topcode1,subcode2,,qualdocs-test lorem,"Nunc lacinia luctus mauris, vel malesuada lacus fermentum ullamcorper. Proin lacinia odio non tincidunt finibus.",AAAAA-y8kfE,R.Stuart Geiger,https://docs.google.com/document/d/1EtYEx9U9KRfAOAh9LaSsmIQyDqiJ392qZJom1Jmv5MI/edit?disco=AAAAA-y8kfE
topcode1,subcode3,,qualdocs test lorem 2,Vestibulum ac felis eget nisi iaculis condimentum.,AAAACP0KnYI,James Knox,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAACP0KnYI
topcode1,subcode3,,qualdocs test lorem 2,Lorem ipsum dolor sit amet,AAAAA-veBH0,R.Stuart Geiger,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAAA-veBH0
topcode1,subcode4,,qualdocs test lorem 2,Lorem ipsum dolor sit amet,AAAAA-veBH0,R.Stuart Geiger,https://docs.google.com/document/d/1guJL7obwENn5GYOZarifxk3xPvWrk6NJ7WEfXQHTkBk/edit?disco=AAAAA-veBH0
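
The same pattern extends to narrower filters and to searching. The sketch below, on made-up rows, selects a code and subcode together with `xs`, searches the passage text with `str.contains`, and filters by coder with a boolean mask. Labels here have no leading spaces, which is an assumption about how the index is stored.

```python
import pandas as pd

# Hypothetical rows with the same index levels and columns (url omitted).
records = [
    ("topcode1", "", "", "doc B", "Sed convallis purus lorem", "id-001", "Coder A"),
    ("topcode1", "subcode2", "", "doc A", "Nunc lacinia luctus mauris", "id-002", "Coder A"),
    ("topcode1", "subcode2", "", "doc B", "Vestibulum ac felis", "id-003", "Coder B"),
    ("topcode2", "subcode3", "", "doc B", "Lorem ipsum dolor sit amet", "id-004", "Coder A"),
]
columns = ["code", "subcode", "sub_subcode", "name", "text", "comment_id", "coder"]
toy_df = (
    pd.DataFrame(records, columns=columns)
    .set_index(["code", "subcode", "sub_subcode", "name"])
    .sort_index()
)

# Filter by a code and subcode at once, keeping all index levels.
subset = toy_df.xs(("topcode1", "subcode2"), level=["code", "subcode"], drop_level=False)

# Case-insensitive search within the coded passages.
matches = toy_df[toy_df["text"].str.contains("lorem", case=False)]

# Filter by who applied the code.
by_coder = toy_df[toy_df["coder"] == "Coder B"]

print(subset, matches, by_coder, sep="\n\n")
```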
