<a href="https://colab.research.google.com/github/lustraka/data-analyst-portfolio-project-2022/blob/main/cs01_cds_methods/20211130_Build_knowledge_base.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Case Study: CDS Methods
## Map CDS methods in Kitchin (2021b)

In [1]:
# Import dependencies
import requests
import pandas as pd
import numpy as np
import re
from datetime import date
import json
import os
import ast

In [8]:
# Import the knowledge base
kb_path = 'https://raw.githubusercontent.com/lustraka/data-analyst-portfolio-project-2022/main/data/'
kb_files = ['tao_iim.py', 'elem.txt', 'rels.csv', 'relt.csv']
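for kb_file in kb_files:
  r = requests.get(kb_path + kb_file, timeout=30)
  r.raise_for_status()  # stop early instead of saving an HTTP error page as a data file
  with open(kb_file, 'wb') as file:
    file.write(r.content)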

import tao_iim

## Update the Knowledge Base Programmatically

In [9]:
# Initialize & update dataframes
elem_df, rels_df, relt_df = tao_iim.load_elem_rel()
# Update here (applied updates below)

Shapes of loaded files:  [(111, 3), (226, 3), (4, 2)]


**2021-11-30** Updates:
```python
with open('bbb', 'r') as file:
  batches = ast.literal_eval(file.read())
for batch in batches:
  elem_df, rels_df, relt_df = tao_iim.add_batch(batch, elem_df, rels_df, relt_df)
```
Here `bbb` is a placeholder for a batch file name; the snippet was applied to `batches_20211129.txt` and `batches_20211130.txt`.
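The placeholder pattern above can be sketched as a loop over several batch files. This is a minimal, self-contained illustration: the `load_batches` helper, the demo file names, and the batch structure (a list of dicts) are assumptions, since the real layout of the `batches_*.txt` files is not shown in this notebook.

```python
import ast
import tempfile
from pathlib import Path


def load_batches(path):
    """Parse a text file that holds a single Python literal (hypothetical helper)."""
    return ast.literal_eval(Path(path).read_text(encoding="utf-8"))


# Invented demo content; the real batch files may be structured differently.
demo_content = {
    "batches_a.txt": "[{'elem': ['a']}]",
    "batches_b.txt": "[{'elem': ['b']}, {'elem': ['c']}]",
}

with tempfile.TemporaryDirectory() as tmp:
    demo_files = []
    for name, text in demo_content.items():
        path = Path(tmp) / name
        path.write_text(text, encoding="utf-8")
        demo_files.append(path)

    # Collect the batches from every file, preserving file order.
    all_batches = [batch for path in demo_files for batch in load_batches(path)]

print(len(all_batches))
```

`ast.literal_eval` only accepts Python literals (lists, dicts, strings, numbers and similar), so it cannot execute arbitrary code from the file the way `eval` would. In the notebook, each collected batch would then be passed to `tao_iim.add_batch` as in the snippet above.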


In [10]:
# Save the results
do_it = False
# do_it = True
if do_it:
  tao_iim.dump_elem_rel(elem_df, rels_df, relt_df)

## Explore the Knowledge Base

In [12]:
elem_df.loc[elem_df.title.isna()].sort_values(by='term')

| | term | title | url |
|---|---|---|---|
| 100 | critical data science method | | |
| 101 | critical data study | | |
| 102 | data (auto)ethnography method | | |
| 103 | data archaeology method | | |
| 104 | data walk method | | |
| 105 | discursive analysis method | | |
| 106 | historical analysis and genealogy method | | |
| 107 | interview, focus group and workshop method | | |
| 108 | participatory and action research method | | |
| 109 | research-creation and creative practice method | | |


In [17]:
for method in rels_df.query('elea == "critical data study"').eleb:
  print(method)
  print(rels_df.query(f'relt == "instantiates" and eleb == "{method}"').elea.to_list())

data (auto)ethnography method
['Pink et al. 2018b', 'Meng and DiSalvo 2018', 'Coletta and Kitchin 2017', 'Tanweer et al. 2016', 'Lowrie 2017', 'Evans and Kitchin 2018', 'Grommé et al. 2018', 'Pink and Fors 2017', 'Beneito-Montagut et al. 2017', 'Lehtiniemi and Ruckenstein 2019', 'Fraser 2019a', 'Kitchin 2021a', 'Kitchin et al. 2016']
interview, focus group and workshop method
['Meng and DiSalvo 2018', 'Bates et al. 2016', 'Bates 2018', 'Ruijer et al. 2017', 'Kitchin and Moore-Cherry 2020', 'Brayne 2017', 'Delaney 2019', 'Delaney and Kitchin 2020', 'Thatcher 2014', 'Gray 2019', 'Chenou and Cepeda-Másmela 2019', 'Cinnamon 2019', 'Baack 2015']
data archaeology method
['Light et al. 2016', 'Leszczynski 2017', 'Tanweer et al. 2016', 'Dodge and Kitchin 2000', 'Bates et al. 2016', 'Bates 2018', 'Dumit and Nafus 2018', 'Kitchin and McArdle 2016', 'Kitchin and Stehle 2021', 'Kitchin et al. 2016', 'Dodge and Kitchin 2005', 'Currie et al. 2016', 'Meng and DiSalvo 2018', 'Loukissas 2018', 'Iliadis
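The per-method lookup above runs one `query` per method. As an alternative sketch, the same method-to-sources mapping can be built in a single pass with `groupby`. The `rels_demo` frame below is a toy stand-in with invented rows, and the `includes` relation type is an assumption; only the column names `elea`, `relt`, `eleb` and the `instantiates` relation come from the notebook.

```python
import pandas as pd

# Toy stand-in for rels_df; all rows are invented for illustration.
rels_demo = pd.DataFrame({
    "elea": ["critical data study", "critical data study", "Source A", "Source B", "Source C"],
    "relt": ["includes", "includes", "instantiates", "instantiates", "instantiates"],
    "eleb": ["method x", "method y", "method x", "method x", "method y"],
})

# Methods linked to the parent term, as in the loop above.
methods = rels_demo.loc[rels_demo["elea"] == "critical data study", "eleb"]

# Keep only "instantiates" rows that point at one of those methods.
instances = rels_demo[(rels_demo["relt"] == "instantiates") & rels_demo["eleb"].isin(methods)]

# Group the sources by method and collect them into plain lists.
sources_by_method = instances.groupby("eleb")["elea"].apply(list).to_dict()

for method_name, sources in sources_by_method.items():
    print(method_name)
    print(sources)
```

Note that `groupby` sorts the methods alphabetically by default, so the output order may differ from the row order in `rels_df`.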

In [25]:
# LEFT JOIN rels_df WITH elem_df ON rels_df.elea = elem_df.term
df = rels_df.merge(elem_df[['term', 'title']], how='left', left_on='elea', right_on='term')

for method in df.query('elea == "critical data study"').eleb:
  print(f'- **{method.title()}**')
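  for _, row in df.query(f'relt == "instantiates" and eleb == "{method}"').sort_values(by='term').iterrows():
    # Access columns by name; positional row[3] / row[4] is deprecated for label-indexed Series
    print(f'\t- ({row["term"]}) {row["title"]}')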

- **Data (Auto)Ethnography Method**
	- (Beneito-Montagut et al. 2017) Beneito-Montagut, R., Begueria, A. and Cassián, N. (2017) ‘Doing digital team ethnography: Being there together and digital social data’, Qualitative Research, 17(6), 664–682.
	- (Coletta and Kitchin 2017) Coletta, C. and Kitchin, R. (2017) ‘Algorhythmic governance: Regulating the “heartbeat” of a city using the Internet of Things’, Big Data & Society, 4: 1–16.
	- (Evans and Kitchin 2018) Evans, L. and Kitchin, R. (2018) ‘A smart place to work? Big data systems, labour, control, and modern retail stores’, New Technology, Work and Employment, 33(1): 44–57.
	- (Fraser 2019a) Fraser, A. (2019a) ‘Curating digital geographies in an era of data colonialism’, Geoforum, 104: 193–200.
	- (Grommé et al. 2018) Grommé, F., Ruppert, E. and Cakici, B. (2018) ‘Data scientists: A new faction of the transnational field of statistics’, in Knox, H. and Nafus, D. (eds), Ethnography for a Data-Saturated World. Manchester University Press
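The join used above can be illustrated on toy data. This is a minimal sketch, assuming the column layout seen in the notebook outputs (`elea`, `relt`, `eleb` for relations; `term`, `title`, `url` for elements); the rows and the `includes` relation type are invented.

```python
import pandas as pd

# Toy stand-ins for rels_df and elem_df; all rows are invented.
rels_demo = pd.DataFrame({
    "elea": ["Source A", "critical data study"],
    "relt": ["instantiates", "includes"],
    "eleb": ["method x", "method x"],
})
elem_demo = pd.DataFrame({
    "term": ["Source A", "method x"],
    "title": ["Author, A. (2020) Example title.", None],
    "url": [None, None],
})

# A left join keeps every relation row, even when elea has no matching term.
joined = rels_demo.merge(
    elem_demo[["term", "title"]], how="left", left_on="elea", right_on="term"
)

# Print a markdown bullet per source, addressing columns by name.
for _, row in joined.query('relt == "instantiates"').iterrows():
    print(f"- ({row['term']}) {row['title']}")
```

Because the join is a left join, the relation whose `elea` has no entry in the element table stays in `joined` with missing `term` and `title` values instead of being dropped.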

In [24]:
# `method` still holds the last value from the loop: "critical data science method"
df.query(f'relt == "instantiates" and eleb == "{method}"')

| | elea | relt | eleb | term | title |
|---|---|---|---|---|---|
| 123 | Pickles 1995 | instantiates | critical data science method | Pickles 1995 | Pickles, J. (ed) (1995) Ground Truth: The Soci... |
| 124 | Schuurman and Pratt 2002 | instantiates | critical data science method | Schuurman and Pratt 2002 | Schuurman, N. and Pratt, G. (2002) ‘Care of th... |
| 125 | Dunn 2007 | instantiates | critical data science method | Dunn 2007 | Dunn, C.E. (2007) ‘Participatory GIS – A peopl... |
| 126 | Crampton et al. 2013 | instantiates | critical data science method | Crampton et al. 2013 | Crampton, J., Graham, M., Poorthuis, A., Shelt... |
| 127 | Williams 2020 | instantiates | critical data science method | Williams 2020 | Williams, S. (2020) Data Action: Using Data fo... |
| 128 | D’Ignazio and Klein 2020 | instantiates | critical data science method | D’Ignazio and Klein 2020 | D’Ignazio, C. and Klein, L.F. (2020) Data Femi... |
| 129 | Graham et al. 2014 | instantiates | critical data science method | Graham et al. 2014 | Graham, M., Hogan, B., Straumann, R.K. and Med... |
| 130 | Graham et al. 2015 | instantiates | critical data science method | Graham et al. 2015 | Graham, M., De Sabbata, S. and Zook, M. (2015)... |
| 131 | Robinson and Franklin 2020 | instantiates | critical data science method | Robinson and Franklin 2020 | Robinson, C. and Franklin, R.S. (2020) ‘The se... |
