<a href="https://colab.research.google.com/github/lustraka/data-analyst-portfolio-project-2022/blob/main/cs01_cds_methods/20211130_Build_knowledge_base.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Case Study: CDS Methods
## Map CDS methods in Kitchin (2021b)

In [1]:
# Import dependencies
import requests
import pandas as pd
import numpy as np
import re
from datetime import date
import json
import os
import ast

In [2]:
# Import the knowledge base
kb_path = 'https://raw.githubusercontent.com/lustraka/data-analyst-portfolio-project-2022/main/data/'
kb_files = ['tao_iim.py', 'elem.txt', 'rels.csv', 'relt.csv']
for kb_file in kb_files:
  r = requests.get(kb_path+kb_file)
  r.raise_for_status()  # stop early instead of saving an HTTP error page as data
  with open(kb_file, 'wb') as file:
    file.write(r.content)

import tao_iim

## Update the Knowledge Base Programmatically

In [36]:
# Initialize & update dataframes
elem_df, rels_df, relt_df = tao_iim.load_elem_rel()
# Update here (applied updates below)

Shapes of loaded files:  [(111, 3), (226, 3), (4, 2)]


**2021-11-30** Updates:
```python
with open('bbb', 'r') as file:
  batches = ast.literal_eval(file.read())
for batch in batches:
  elem_df, rels_df, relt_df = tao_iim.add_batch(batch, elem_df, rels_df, relt_df)
```
where `bbb` stands for 
- `batches_20211129.txt` and 
- `batches_20211130.txt`.
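
The update loop relies on `ast.literal_eval`, which parses Python literals without executing any code. The following is a minimal sketch of that parsing step only. The batch structure used here (dicts with `elements` and `relations` keys) is hypothetical, because the format expected by `tao_iim.add_batch` is not shown in this notebook.

```python
import ast

# Hypothetical file content: the real batch format consumed by
# tao_iim.add_batch is not shown in this notebook.
text = (
    "[{'elements': ['Kitchin 2021b'],"
    "  'relations': [('Kitchin 2021b', 'cites', 'Herbert 2000')]},"
    " {'elements': ['Herbert 2000'], 'relations': []}]"
)

# literal_eval accepts only literals (lists, dicts, tuples, strings, numbers, ...)
batches = ast.literal_eval(text)
print(type(batches).__name__, len(batches))

# Anything that is not a plain literal, such as a function call, is rejected
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print('call rejected:', rejected)
```

This is why a text file of batches can be read back safely, as long as it contains nothing but literals.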


## Refactor concept names 
Refactor concept names in line with this concept model:

![Ontology](https://www.plantuml.com/plantuml/png/FP2nJWCn38RtF8ML2Kxg6o3KjGDGiNIv9JcRdgMhKpcEKDyUNmWCrdy-V_fivLWjgRNR5fQa2F4aHYfay5wGPc6H2Ac2Pv1Y1ChNrGBx7oGn_c92o0y8IU1qXeIeL6iWGTZn8RrGXa-gfUdYpgPRTtgE-RdbZPTaN6IMU_uTUuxn6zbQS9Qd3xrUajBpB4M_E-GP0ej0VCclQwbMu-4GkSB-tK-BVOzNH-YM2pBzKQD5O8bzeHV4QLhOg4xJeBmRge4CrNqhZtzJxuPfl-f8WlwiFm00)

In [11]:
# Create a new dataframe with terms
renadf = elem_df[['term']].copy()

# Add new column with the most frequent prefix
renadf['new_term'] = renadf.term.apply(lambda t: 'pbl '+t)

# Set `term` as index for future transformation to dict
renadf.set_index('term', inplace=True)
renadf.tail()

term,new_term
historical analysis and genealogy method,pbl historical analysis and genealogy method
"interview, focus group and workshop method","pbl interview, focus group and workshop method"
participatory and action research method,pbl participatory and action research method
research-creation and creative practice method,pbl research-creation and creative practice me...
van Zoonen et al. 2017,pbl van Zoonen et al. 2017


In [18]:
# Adjust a new name for a web resource (there is only one)
inn_ele = elem_df.loc[elem_df.url.notna()].term.values[0]
renadf.at[inn_ele, 'new_term'] = 'inn Fraser 2019a'
renadf.at[inn_ele, 'new_term']

'inn Fraser 2019a'

**Ex-post note**: The new name above should be `url Fraser 2019a`, not `inn Fraser 2019a`. This was corrected directly in the data.

In [29]:
# Identify methods (they start with a lowercase letter; the last such element is not a method)
methods = elem_df.loc[elem_df.term.apply(lambda t: t[0].islower())]['term'].to_list()[:-1]
# Adjust prefixes for methods
for method in methods:
  print(method, end=' --> ')
  renadf.at[method, 'new_term'] = 'orw ' + method
  print(renadf.at[method, 'new_term'])

critical data science method --> orw critical data science method
critical data study --> orw critical data study
data (auto)ethnography method --> orw data (auto)ethnography method
data archaeology method --> orw data archaeology method
data walk method --> orw data walk method
discursive analysis method --> orw discursive analysis method
historical analysis and genealogy method --> orw historical analysis and genealogy method
interview, focus group and workshop method --> orw interview, focus group and workshop method
participatory and action research method --> orw participatory and action research method
research-creation and creative practice method --> orw research-creation and creative practice method


In [30]:
# Check the values visually
renadf.tail(12)

term,new_term
Wyly 2019,pbl Wyly 2019
critical data science method,orw critical data science method
critical data study,orw critical data study
data (auto)ethnography method,orw data (auto)ethnography method
data archaeology method,orw data archaeology method
data walk method,orw data walk method
discursive analysis method,orw discursive analysis method
historical analysis and genealogy method,orw historical analysis and genealogy method
"interview, focus group and workshop method","orw interview, focus group and workshop method"
participatory and action research method,orw participatory and action research method
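
The three prefixing steps above (default `pbl`, a separate prefix for the web resource, `orw` for methods) could also be expressed as one rule table. The sketch below uses `numpy.select` on a toy frame. It assumes that methods are exactly the rows without a title and without a URL, which matches the outputs in this notebook but is not guaranteed for other data. The column values and `toy_elem` itself are illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for elem_df; titles and the URL are placeholders
toy_elem = pd.DataFrame({
    'term': ['Wyly 2019', 'van Zoonen et al. 2017', 'data walk method', 'Fraser 2019a'],
    'title': ['Title W', 'Title Z', None, None],
    'url': [None, None, None, 'https://example.org/resource'],
})

# Conditions are checked in order; the first match wins
conditions = [
    toy_elem.url.notna(),    # web resource
    toy_elem.title.isna(),   # assumption: untitled rows without a URL are methods
]
prefixes = ['url', 'orw']

toy_elem['prefix'] = np.select(conditions, prefixes, default='pbl')
toy_elem['new_term'] = toy_elem.prefix + ' ' + toy_elem.term
print(toy_elem[['term', 'new_term']])
```

Compared with the lowercase-first-letter test, this rule does not need a special case for author names such as `van Zoonen`, at the cost of depending on the `title` column being filled in for every publication.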


In [37]:
# Make a dict and rename terms
renadict = renadf.to_dict()['new_term']
elem_df.term = elem_df.term.apply(lambda t: renadict[t])
rels_df.elea = rels_df.elea.apply(lambda t: renadict[t])
rels_df.eleb = rels_df.eleb.apply(lambda t: renadict[t])
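
The dictionary lookup in the cell above raises a `KeyError` as soon as a relationship mentions a term that is absent from the elements. A hedged sketch of how such gaps could be listed before renaming is shown below, on toy data. `toy_rels` and `toy_renadict` are illustrative stand-ins, not the notebook's real frames.

```python
import pandas as pd

# Toy data; names mirror the notebook but the values are illustrative
toy_rels = pd.DataFrame({
    'elea': ['Kitchin 2021b', 'Kitchin 2021b'],
    'relt': ['cites', 'cites'],
    'eleb': ['Herbert 2000', 'Crang and Cook 2007'],
})
toy_renadict = {
    'Kitchin 2021b': 'pbl Kitchin 2021b',
    'Herbert 2000': 'pbl Herbert 2000',
}

# Collect every term used in relationships that the dict does not cover
used = pd.concat([toy_rels.elea, toy_rels.eleb]).unique()
missing = sorted(set(used) - set(toy_renadict))
print('unmapped terms:', missing)

# Series.map yields NaN for unmapped keys; fillna keeps the old name instead
for col in ['elea', 'eleb']:
    toy_rels[col] = toy_rels[col].map(toy_renadict).fillna(toy_rels[col])
print(toy_rels)
```

Whether to keep the old name or to stop with an error is a design choice. Failing loudly, as the original cell does, is the safer default when every relationship is expected to point at a known element.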

In [38]:
# Check elements
elem_df.head()

,term,title,url
0,pbl Ampatzidou et al. 2015,"Ampatzidou, C., Bouw, M., van de Klundert, F.,...",
1,pbl Andrejevic 2007,"Andrejevic, M. (2007) iSpy: Surveillance and P...",
2,pbl Baack 2015,"Baack, S. (2015) ‘Datafication and empowerment...",
3,pbl Barnes and Wilson 2014,"Barnes, T.J. and Wilson, M.W. (2014) ‘Big Data...",
4,pbl Bates 2018,"Bates, J. (2018) ‘The politics of data frictio...",


In [39]:
# Check relationships
rels_df.head()

,elea,relt,eleb
0,pbl Kitchin 2021b,cites,pbl Herbert 2000
1,pbl Kitchin 2021b,cites,pbl Crang and Cook 2007
2,pbl Kitchin 2021b,cites,pbl Knox and Nafus 2018:3
3,pbl Kitchin 2021b,cites,pbl Pink et al. 2018b
4,pbl Kitchin 2021b,cites,pbl Meng and DiSalvo 2018


In [41]:
# Check relationship types
rels_df.relt.value_counts()

instantiates    117
cites            99
uses              9
embodies          1
Name: relt, dtype: int64

In [42]:
renreldict = {'instantiates' : 'employs', 'uses' : 'employs', 'cites' : 'cites', 'embodies' : 'embodies'}
rels_df.relt = rels_df.relt.apply(lambda r: renreldict[r])
# Test relationships types
rels_df.relt.value_counts()

employs     126
cites        99
embodies      1
Name: relt, dtype: int64
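
The cell above folds `instantiates` and `uses` into `employs`. A possible alternative, sketched here on a toy series, is `Series.replace`, which only needs the values that actually change and leaves all others untouched. `toy_relt` is illustrative.

```python
import pandas as pd

# Toy relationship types; counts are illustrative
toy_relt = pd.Series(
    ['instantiates', 'cites', 'uses', 'embodies', 'instantiates'],
    name='relt',
)

# Only the renamed values are listed; 'cites' and 'embodies' pass through
merged = toy_relt.replace({'instantiates': 'employs', 'uses': 'employs'})
counts = merged.value_counts()
print(counts)
```

The trade-off is that a misspelled or unexpected relationship type passes through silently, whereas the full dictionary lookup used above would raise a `KeyError`.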

## Explore the Knowledge Base

In [43]:
elem_df.loc[elem_df.title.isna()].sort_values(by='term')

,term,title,url
100,orw critical data science method,,
101,orw critical data study,,
102,orw data (auto)ethnography method,,
103,orw data archaeology method,,
104,orw data walk method,,
105,orw discursive analysis method,,
106,orw historical analysis and genealogy method,,
107,"orw interview, focus group and workshop method",,
108,orw participatory and action research method,,
109,orw research-creation and creative practice me...,,


In [46]:
for method in rels_df.query('elea == "orw critical data study"').eleb:
  print(method)
  print(rels_df.query(f'relt == "employs" and eleb == "{method}"').elea.to_list())

orw data (auto)ethnography method
['orw critical data study', 'pbl Pink et al. 2018b', 'pbl Meng and DiSalvo 2018', 'pbl Coletta and Kitchin 2017', 'pbl Tanweer et al. 2016', 'pbl Lowrie 2017', 'pbl Evans and Kitchin 2018', 'pbl Grommé et al. 2018', 'pbl Pink and Fors 2017', 'pbl Beneito-Montagut et al. 2017', 'pbl Lehtiniemi and Ruckenstein 2019', 'pbl Fraser 2019a', 'pbl Kitchin 2021a', 'pbl Kitchin et al. 2016']
orw interview, focus group and workshop method
['orw critical data study', 'pbl Meng and DiSalvo 2018', 'pbl Bates et al. 2016', 'pbl Bates 2018', 'pbl Ruijer et al. 2017', 'pbl Kitchin and Moore-Cherry 2020', 'pbl Brayne 2017', 'pbl Delaney 2019', 'pbl Delaney and Kitchin 2020', 'pbl Thatcher 2014', 'pbl Gray 2019', 'pbl Chenou and Cepeda-Másmela 2019', 'pbl Cinnamon 2019', 'pbl Baack 2015']
orw data archaeology method
['orw critical data study', 'pbl Light et al. 2016', 'pbl Leszczynski 2017', 'pbl Tanweer et al. 2016', 'pbl Dodge and Kitchin 2000', 'pbl Bates et al. 2016'

In [60]:
# Select the relevant rows (`method` keeps its last value from the loop above)
rels_df.loc[(rels_df.relt == 'employs') & (rels_df.eleb == method) & (rels_df.elea.apply(lambda t: t[:3] == 'pbl'))]

,elea,relt,eleb
123,pbl Pickles 1995,employs,orw critical data science method
124,pbl Schuurman and Pratt 2002,employs,orw critical data science method
125,pbl Dunn 2007,employs,orw critical data science method
126,pbl Crampton et al. 2013,employs,orw critical data science method
127,pbl Williams 2020,employs,orw critical data science method
128,pbl D’Ignazio and Klein 2020,employs,orw critical data science method
129,pbl Graham et al. 2014,employs,orw critical data science method
130,pbl Graham et al. 2015,employs,orw critical data science method
131,pbl Robinson and Franklin 2020,employs,orw critical data science method
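
The prefix test in the selection above slices each string with a lambda. A small sketch of the same filter using the vectorized `str.startswith` accessor follows. `toy_rels` and `target` are illustrative names, and the trailing space in `'pbl '` is a deliberate choice to avoid matching terms that merely begin with the letters `pbl`.

```python
import pandas as pd

# Toy relationships; identifiers are illustrative only
toy_rels = pd.DataFrame({
    'elea': ['orw critical data study', 'pbl A 2017', 'pbl B 2018', 'pbl A 2017'],
    'relt': ['employs', 'employs', 'cites', 'employs'],
    'eleb': ['orw data walk method', 'orw data walk method',
             'orw data walk method', 'orw data archaeology method'],
})
target = 'orw data walk method'

# Combine the three conditions into one boolean mask
mask = (
    (toy_rels.relt == 'employs')
    & (toy_rels.eleb == target)
    & toy_rels.elea.str.startswith('pbl ')
)
selected = toy_rels.loc[mask]
print(selected)
```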


In [63]:
# JOIN rels_df AND elem_df ON rels_df.elea = elem_df.term
df = rels_df.merge(elem_df[['term', 'title']], how='left', left_on='elea', right_on='term')

for method in df.query('elea == "orw critical data study"').eleb:
  print(f'- **{method[4:].title()}**')
  # Select the relevant rows
  publications = df.loc[(df.relt == 'employs') & (df.eleb == method) & (df.elea.apply(lambda t: t[:3] == 'pbl'))]
  for _,row in publications.sort_values(by='elea').iterrows():
    # Access columns by label; positional row[3]/row[4] is deprecated in recent pandas
    print(f'\t- ({row["term"]}) {row["title"]}')

- **Data (Auto)Ethnography Method**
	- (pbl Beneito-Montagut et al. 2017) Beneito-Montagut, R., Begueria, A. and Cassián, N. (2017) ‘Doing digital team ethnography: Being there together and digital social data’, Qualitative Research, 17(6), 664–682.
	- (pbl Coletta and Kitchin 2017) Coletta, C. and Kitchin, R. (2017) ‘Algorhythmic governance: Regulating the “heartbeat” of a city using the Internet of Things’, Big Data & Society, 4: 1–16.
	- (pbl Evans and Kitchin 2018) Evans, L. and Kitchin, R. (2018) ‘A smart place to work? Big data systems, labour, control, and modern retail stores’, New Technology, Work and Employment, 33(1): 44–57.
	- (pbl Fraser 2019a) Fraser, A. (2019a) ‘Curating digital geographies in an era of data colonialism’, Geoforum, 104: 193–200.
	- (pbl Grommé et al. 2018) Grommé, F., Ruppert, E. and Cakici, B. (2018) ‘Data scientists: A new faction of the transnational field of statistics’, in Knox, H. and Nafus, D. (eds), Ethnography for a Data-Saturated World. Manches
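
The left join above silently produces empty titles when a relationship refers to a term that has no row in the elements, and it duplicates rows if a term occurs twice there. The sketch below shows how `validate` and `indicator`, two existing parameters of `DataFrame.merge`, could surface both problems. The toy frames are illustrative.

```python
import pandas as pd

# Toy frames; 'pbl Z 2020' deliberately has no matching element
toy_rels = pd.DataFrame({
    'elea': ['pbl A 2017', 'pbl B 2018', 'pbl Z 2020'],
    'relt': ['employs'] * 3,
    'eleb': ['orw data walk method'] * 3,
})
toy_elem = pd.DataFrame({
    'term': ['pbl A 2017', 'pbl B 2018'],
    'title': ['Title of A', 'Title of B'],
})

joined = toy_rels.merge(
    toy_elem, how='left', left_on='elea', right_on='term',
    validate='many_to_one',  # raises MergeError if a term is duplicated in toy_elem
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Relationship sources that found no element row
unmatched = joined.loc[joined['_merge'] == 'left_only', 'elea'].tolist()
print('without element:', unmatched)
```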

In [65]:
df.query(f'relt == "employs" and eleb == "{method}"')

,elea,relt,eleb,term,title
24,orw critical data study,employs,orw critical data science method,orw critical data study,
123,pbl Pickles 1995,employs,orw critical data science method,pbl Pickles 1995,"Pickles, J. (ed) (1995) Ground Truth: The Soci..."
124,pbl Schuurman and Pratt 2002,employs,orw critical data science method,pbl Schuurman and Pratt 2002,"Schuurman, N. and Pratt, G. (2002) ‘Care of th..."
125,pbl Dunn 2007,employs,orw critical data science method,pbl Dunn 2007,"Dunn, C.E. (2007) ‘Participatory GIS – A peopl..."
126,pbl Crampton et al. 2013,employs,orw critical data science method,pbl Crampton et al. 2013,"Crampton, J., Graham, M., Poorthuis, A., Shelt..."
127,pbl Williams 2020,employs,orw critical data science method,pbl Williams 2020,"Williams, S. (2020) Data Action: Using Data fo..."
128,pbl D’Ignazio and Klein 2020,employs,orw critical data science method,pbl D’Ignazio and Klein 2020,"D’Ignazio, C. and Klein, L.F. (2020) Data Femi..."
129,pbl Graham et al. 2014,employs,orw critical data science method,pbl Graham et al. 2014,"Graham, M., Hogan, B., Straumann, R.K. and Med..."
130,pbl Graham et al. 2015,employs,orw critical data science method,pbl Graham et al. 2015,"Graham, M., De Sabbata, S. and Zook, M. (2015)..."
131,pbl Robinson and Franklin 2020,employs,orw critical data science method,pbl Robinson and Franklin 2020,"Robinson, C. and Franklin, R.S. (2020) ‘The se..."


## Save the results

In [67]:
# Save the results
# do_it = False
do_it = True
if do_it:
  tao_iim.dump_elem_rel(elem_df, rels_df, relt_df)

Don't forget to download results for further use!!
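
Before downloading, it may be worth checking that a saved file reads back unchanged. The sketch below is a generic CSV round trip in a temporary directory. It does not reproduce `tao_iim.dump_elem_rel`, whose file formats are not shown here, and the file name `rels_check.csv` is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy relationships, including a term with a comma to exercise CSV quoting
toy_rels = pd.DataFrame({
    'elea': ['orw critical data study', 'pbl A 2017'],
    'relt': ['employs', 'employs'],
    'eleb': ['orw interview, focus group and workshop method', 'orw data walk method'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'rels_check.csv'     # hypothetical file name
    toy_rels.to_csv(path, index=False)      # index=False avoids an extra unnamed column
    reloaded = pd.read_csv(path)

# Compare the cell values of the original and the reloaded frame
same_values = reloaded.values.tolist() == toy_rels.values.tolist()
print('round trip preserved values:', same_values)
```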
