# Govbase for data scientists

## What is Govbase?

## Why use Govbase?

## Importing data from Govbase (in notebook only)
- Data is imported with the [Python wrapper for the Airtable API](https://github.com/josephbestjames/airtable.py)
- You need to request access and create an account to get an API key
- We recommend saving the API key in a text file that is listed in `.gitignore` and reading it from there
- Each table is loaded into a pandas DataFrame for ease of analysis

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
from airtable import airtable

# Set Airtable access parameters
BASE_ID = 'appx3e9Przn9iprkU'
with open('api_key.txt', 'r') as f:
    API_KEY = f.readline().strip()
    
# Set Airtable table-specific parameters
FIELDS = {'Projects': ['Project name', 'Status', 'Online / offline', 'Implements structures', 
                       'Category', 'Subcategory', 'Tags', 'Project ownership type', 'Legally owned by organization', 
                       'Contributed to by organization', 'Funded by organization', 'Instances'],
         'Organizations': ['Organization name', 'Structure (observed)', 'Activities', 'Size', 
                           'Do you need permission to join?', 'To join, you need to...', 'Total effort to join', 
                           'How do members meet?', 'Process to leave', 'Instances',
                           'How open-source is your infrastructure?'],
         'Structures': ['Structure name', 'Belongs to ontology', 'Is subclass of', 'Is component of',
                        'Is property of', 'Tags', 'Is superclass of', 'Is subclass of (all)',
                        'Implemented by project', 'Used by organization in their governance',
                        'Adopted by organization']}

In [3]:
at = airtable.Airtable(BASE_ID, API_KEY)

In [4]:
def get_table_as_df(table_name):
    '''Get all records in a table and load them into a DataFrame indexed by record id'''
    # Flatten each Airtable record into one dict: its id plus its field values
    records = [{'id': r['id'], **r['fields']}
               for r in at.iterate(table_name, fields=FIELDS[table_name])]
    # Fields missing from a record become NaN in the DataFrame
    return pd.DataFrame(records).set_index('id')

In [5]:
# Load tables into DataFrames
df_projects = get_table_as_df('Projects')
df_orgs = get_table_as_df('Organizations')
df_structs = get_table_as_df('Structures')

In [6]:
print(len(df_projects.index))
print(len(df_orgs.index))
print(len(df_structs.index))

496
443
411


In [7]:
df_projects

| id | Project name | Category | Legally owned by organization | Online / offline | Contributed to by organization | Status | Project ownership type | Subcategory | Implements structures | Tags | Instances | Funded by organization |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rec0Ddf5WWSVr8qRC | Futarchy | product | [recD7IYQqmaGm3PW1] | Online communities | [recD7IYQqmaGm3PW1] | Inactive | Open-source | [software library] | | | | |
| rec0QC4Cmj2iIhWJC | Simple Machines Forum | product | | Online communities | | Active | Open-source | [application/tool] | [recz34NUJWuz5yYI4] | | | |
| rec0TtUnv3rJqorXf | Jupyter | product | [reccOsNXSYFXhYqpK] | Not community-related | [recRQ3M1BTucR3kS9] | Active | Open-source | [software framework] | | | | |
| rec0U16mgXwzoXzBf | Yearn | product | | Not community-related | | Active | | [application/tool] | | [blockchain ecosystem, DeFi] | | |
| rec0XapkrY8v3FJ5Z | Airesis | platform | | Online communities | | Active | | [social network] | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| reczdVXlxGbFpIx0m | Flux | product | [rec9hbEa8ogbnHo3l] | Not specific | [rec9hbEa8ogbnHo3l] | Active | | | [recMZpmuQWNADy7jY, recjSSjuZcs0VulLF, recTVzY... | | [rec3vdAGL069mSeWn] | |
| reczicxV98mR9Rkg0 | Google Docs | platform | | Not specific | | Active | | | [recvj08aLkF9uaU2e] | | [rec4blirx7ubTiLVw, recrAmuaC0zuAlHtK] | |
| reczijQGabIk2SAa2 | The Internet | platform | | Online communities | | Active | Open-source | | | | | |
| reczo23u6HdWQ6983 | Zodiac | product | | Online communities | | Active | Open-source | [software library] | | [DAO ecosystem, blockchain ecosystem] | | [rec5hy1aZhtNdSZiN, recCUHw90FBPT0TQL] |
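
Linked fields such as `Legally owned by organization` come back as lists of Airtable record IDs rather than names. One possible way to make them readable is to look each ID up in the index of the linked table. The sketch below uses small stand-in DataFrames with made-up IDs and names, and `resolve_links` is a hypothetical helper, not part of the Airtable wrapper.

```python
import pandas as pd

# Toy stand-ins for df_orgs and df_projects; IDs and names are hypothetical.
df_orgs_demo = pd.DataFrame(
    {"Organization name": ["Org A", "Org B"]},
    index=pd.Index(["recAAA", "recBBB"], name="id"),
)
df_projects_demo = pd.DataFrame(
    {
        "Project name": ["P1", "P2", "P3"],
        "Legally owned by organization": [
            ["recAAA"],
            ["recAAA", "recBBB"],
            float("nan"),  # empty linked field
        ],
    },
    index=pd.Index(["rec001", "rec002", "rec003"], name="id"),
)


def resolve_links(cell, lookup):
    """Replace each record ID in a linked-record cell with its name."""
    if not isinstance(cell, list):  # empty cells arrive as NaN
        return []
    # Keep the raw ID when the linked record is not in the lookup
    return [lookup.get(rec_id, rec_id) for rec_id in cell]


# Map record ID -> organization name, then apply it to every cell
org_names = df_orgs_demo["Organization name"].to_dict()
owners = df_projects_demo["Legally owned by organization"].apply(
    resolve_links, lookup=org_names
)
print(owners.tolist())
```

With the real tables, the same call would take `df_orgs` and `df_projects` in place of the stand-ins.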


## What is currently in Govbase?

### **Numbers**: of active projects/organizations/documents
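
As a rough sketch of how the project numbers could be computed, the `Status` column can be tallied directly. The stand-in DataFrame below is illustrative only and does not reflect real Govbase counts; organizations and documents would need an equivalent status field, which is an assumption here.

```python
import pandas as pd

# Toy stand-in for df_projects; the rows are hypothetical.
df_projects_demo = pd.DataFrame(
    {
        "Project name": ["P1", "P2", "P3", "P4"],
        "Status": ["Active", "Inactive", "Active", None],
    }
)

# Tally every status, keeping records with no status as their own bucket
status_counts = df_projects_demo["Status"].value_counts(dropna=False)

# Number of projects explicitly marked as active
n_active = int((df_projects_demo["Status"] == "Active").sum())

print(status_counts)
print(n_active)
```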

### **Viz**: distribution of represented project categories, structures, project ownership types
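
A minimal sketch of how these distributions might be tabulated before plotting. Single-valued columns such as `Category` can be counted directly, while list-valued columns such as `Subcategory` need to be flattened first. The stand-in rows are hypothetical; the same pattern should carry over to organization columns.

```python
import pandas as pd

# Toy stand-in for df_projects; the rows are hypothetical.
df_projects_demo = pd.DataFrame(
    {
        "Category": ["product", "product", "platform", "product"],
        "Subcategory": [
            ["software library"],
            ["application/tool"],
            float("nan"),  # no subcategory recorded
            ["software library", "application/tool"],
        ],
    }
)

# Single-valued column: count directly
category_counts = df_projects_demo["Category"].value_counts()

# List-valued column: one row per list element, then count (NaN is dropped)
subcategory_counts = df_projects_demo["Subcategory"].explode().value_counts()

print(category_counts)
print(subcategory_counts)
```

Either result is a Series, so `category_counts.plot.barh()` would draw it as a horizontal bar chart with matplotlib.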

### **Viz**: distribution of represented organization structures, activities

## What can we learn from Govbase? Some examples.

### **Viz**: Which projects are most frequently used by organizations?

### **Viz**: Which tools are built on most? (dependencies of the most projects)

### **Viz**: Distribution of owner types/funding sources for blockchain ecosystem projects
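
One way this could be approached is to keep the projects whose `Tags` list contains `blockchain ecosystem` and then count their `Project ownership type`. In the sketch below the rows are hypothetical, `has_tag` is an illustrative helper, and the `Unspecified` label for missing values is an assumption rather than a Govbase category.

```python
import pandas as pd

# Toy stand-in for df_projects; the rows are hypothetical.
df_projects_demo = pd.DataFrame(
    {
        "Project name": ["P1", "P2", "P3", "P4"],
        "Tags": [
            ["blockchain ecosystem", "DeFi"],
            float("nan"),  # untagged project
            ["DAO ecosystem", "blockchain ecosystem"],
            ["DeFi"],
        ],
        "Project ownership type": ["Open-source", "Open-source", None, "Open-source"],
    }
)


def has_tag(cell, tag):
    """True when the cell is a list of tags that includes `tag`."""
    return isinstance(cell, list) and tag in cell


# Boolean mask of projects tagged as part of the blockchain ecosystem
is_blockchain = df_projects_demo["Tags"].apply(has_tag, tag="blockchain ecosystem")

# Ownership distribution within that subset; missing values get a placeholder
ownership_counts = (
    df_projects_demo.loc[is_blockchain, "Project ownership type"]
    .fillna("Unspecified")
    .value_counts()
)
print(ownership_counts)
```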

### **Viz**: Distribution of how open-source organizations are; relationship to e.g., funding source, activity

### **Viz**: Most common combinations of structures observed in DAOs (i.e., what's most likely to be needed to implement a DAO)
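
A possible starting point is to count how often each pair of structures appears together in the same organization. The sketch assumes that `Structure (observed)` holds a list of structure identifiers per organization and that the rows have already been filtered to DAOs; both are assumptions, and the IDs below are made up.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Toy stand-in for the DAO subset of df_orgs; names and IDs are hypothetical.
df_orgs_demo = pd.DataFrame(
    {
        "Organization name": ["DAO A", "DAO B", "DAO C"],
        "Structure (observed)": [
            ["recVote", "recToken", "recForum"],
            ["recToken", "recVote"],
            float("nan"),  # no structures recorded
        ],
    }
)


def count_pairs(cells):
    """Count co-occurring pairs of items across list-valued cells."""
    pair_counts = Counter()
    for cell in cells:
        if not isinstance(cell, list):
            continue  # skip empty cells
        # Deduplicate and sort so (a, b) and (b, a) count as the same pair
        for pair in combinations(sorted(set(cell)), 2):
            pair_counts[pair] += 1
    return pair_counts


pair_counts = count_pairs(df_orgs_demo["Structure (observed)"])
print(pair_counts.most_common(3))
```

Changing the `2` in `combinations` to `3` would count triples instead, at the cost of many more distinct combinations.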

### **Viz**: Distribution of DAO joining/leaving methods, total effort to join, how do members meet? Are these related to e.g., DAO size, activities
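
To check whether a joining attribute is related to size, a cross-tabulation is one simple option. The sketch below uses the `Size` and `Do you need permission to join?` columns with hypothetical values, since the real answer options are not shown here.

```python
import pandas as pd

# Toy stand-in for df_orgs; the values are hypothetical.
df_orgs_demo = pd.DataFrame(
    {
        "Size": ["small", "small", "large", "large", "small"],
        "Do you need permission to join?": ["Yes", "No", "No", "No", "Yes"],
    }
)

# Raw counts of each answer per size bucket
join_by_size = pd.crosstab(
    df_orgs_demo["Size"], df_orgs_demo["Do you need permission to join?"]
)

# Same table as row-wise shares, so size buckets of different sizes are comparable
join_share = pd.crosstab(
    df_orgs_demo["Size"],
    df_orgs_demo["Do you need permission to join?"],
    normalize="index",
)

print(join_by_size)
print(join_share)
```

Calling `join_share.plot.bar(stacked=True)` would turn the shares into a stacked bar chart.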

## Notes on Govbase future plans/maintenance
- Expansion of Documents, Constitutional models, APIs, Entity-Decision model tables?
- Maintenance of the more fleshed-out tables?
    - Scrape fields where possible?
    - Auto-update scrapeable fields roughly once a week?
    - Call on readers to submit updates where needed?