# Taxonomy To SQLite

## Step By Step

In [1]:
import pandas as pd
import sqlite3
from tqdm.notebook import tqdm

In [2]:
pd.set_option("display.max_columns", 500)
pd.set_option('display.max_rows', 1000)

**Read in the NUCC taxonomy file (`nucc_taxonomy_210.csv`)**

In [3]:
taxonomy_codes = pd.read_csv("../data/nucc_taxonomy_210.csv")

In [4]:
display(taxonomy_codes.shape)
display(taxonomy_codes.head())

(865, 10)

Unnamed: 0,Code,Grouping,Classification,Specialization,Definition,Effective Date,Deactivation Date,Last Modified Date,Notes,Display Name
0,193200000X,Group,Multi-Specialty,,A business group of one or more individual pra...,10/1/2003,,,[7/1/2003: new],Multi-Specialty Group
1,193400000X,Group,Single Specialty,,A business group of one or more individual pra...,10/1/2003,,,[7/1/2003: new],Single Specialty Group
2,207K00000X,Allopathic & Osteopathic Physicians,Allergy & Immunology,,An allergist-immunologist is trained in evalua...,4/1/2003,,7/1/2007,"Source: American Board of Medical Specialties,...",Allergy & Immunology Physician
3,207KA0200X,Allopathic & Osteopathic Physicians,Allergy & Immunology,Allergy,Definition to come...,4/1/2003,,,,Allergy Physician
4,207KI0005X,Allopathic & Osteopathic Physicians,Allergy & Immunology,Clinical & Laboratory Immunology,Definition to come...,4/1/2003,,,,Clinical & Laboratory Immunology (Allergy & Im...
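Because the checks further down expect one row per code, it can help to confirm that `Code` has no repeats before loading anything. The following is a minimal sketch, not part of the original notebook: `duplicated_codes` is a hypothetical helper, and the small `demo` frame stands in for `taxonomy_codes`.

```python
import pandas as pd


def duplicated_codes(df, column="Code"):
    """Return the sorted list of values that occur more than once in `column`."""
    mask = df[column].duplicated(keep=False)
    return sorted(df.loc[mask, column].unique())


# Toy stand-in for taxonomy_codes with one deliberately repeated code.
demo = pd.DataFrame({"Code": ["193200000X", "193400000X", "193400000X"]})
dupes = duplicated_codes(demo)
print(dupes)
```

An empty list from `duplicated_codes(taxonomy_codes)` would mean every code is unique.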


In [5]:
# Listing currently existing tables in the database
with sqlite3.connect('../data/hcbb_group_reviews.sqlite') as db :
    query = """
    SELECT name
    FROM sqlite_master 
    WHERE type ='table' 
    AND name NOT LIKE 'sqlite_%';
    """ 
    
    test_df = pd.read_sql(query, db)

display(test_df)

Unnamed: 0,name
0,cbsa
1,npidata
2,taxonomy
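The table listing is repeated later in this notebook, so it could be wrapped in a small function. The sketch below is one possible shape, with `list_tables` as a hypothetical name and an in-memory database standing in for the real file. It uses `contextlib.closing` because `with sqlite3.connect(...)` on its own only commits or rolls back the transaction and does not close the connection.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def list_tables(conn):
    """Return the names of user tables, skipping SQLite's internal ones."""
    query = """
    SELECT name
    FROM sqlite_master
    WHERE type = 'table'
    AND name NOT LIKE 'sqlite_%'
    ORDER BY name;
    """
    return pd.read_sql(query, conn)["name"].tolist()


# In-memory database with two empty tables, used only for illustration.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE cbsa (id INTEGER);")
    conn.execute("CREATE TABLE taxonomy (taxonomy_code TEXT);")
    tables = list_tables(conn)

print(tables)
```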


**Loading Taxonomy to SQLite**

**IMPORTANT! The load into the database should only be run once.** Running it multiple times creates duplicate entries in the database. To guard against accidental re-runs, the loading code is kept as markdown rather than as an executable cell. **If you need to rebuild the database, delete the `data/hcbb_group_reviews.sqlite` file and re-run this cell as code. You will also need to re-run any other scripts that build other tables in the database.**
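The original load cell is not reproduced in this export. As a rough sketch of what a run-once load could look like, the code below skips the write when the table already exists instead of relying on the cell being disabled. This is an alternative guard, not the author's method. The names `COLUMN_MAP`, `table_exists` and `load_taxonomy` are hypothetical; only the target column names `taxonomy_code`, `grouping` and `classification` come from the queries in this notebook, and the in-memory database and `demo` frame are stand-ins for the real file and `taxonomy_codes`.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Only these three target names appear in the notebook's queries;
# any other column would keep its CSV name (an assumption).
COLUMN_MAP = {
    "Code": "taxonomy_code",
    "Grouping": "grouping",
    "Classification": "classification",
}


def table_exists(conn, table):
    """Return True if `table` is already present in the SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?;",
        (table,),
    ).fetchone()
    return row is not None


def load_taxonomy(df, conn, table="taxonomy"):
    """Write `df` to `table` once; return False and do nothing if it exists."""
    if table_exists(conn, table):
        return False
    df.rename(columns=COLUMN_MAP).to_sql(table, conn, index=False)
    return True


# Toy stand-in for taxonomy_codes, loaded into an in-memory database.
demo = pd.DataFrame(
    {
        "Code": ["193200000X", "193400000X"],
        "Grouping": ["Group", "Group"],
        "Classification": ["Multi-Specialty", "Single Specialty"],
    }
)

with closing(sqlite3.connect(":memory:")) as conn:
    first = load_taxonomy(demo, conn)   # table is created
    second = load_taxonomy(demo, conn)  # skipped, table already exists
    row_count = conn.execute("SELECT COUNT(*) FROM taxonomy;").fetchone()[0]

print(first, second, row_count)
```

With this guard a second call leaves the table untouched, so re-running the cell by accident would not duplicate rows.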

## Testing Final DB Load

In [6]:
# Listing currently existing tables in the database
with sqlite3.connect('../data/hcbb_group_reviews.sqlite') as db :
    query = """
    SELECT name
    FROM sqlite_master 
    WHERE type ='table' 
    AND name NOT LIKE 'sqlite_%';
    """ 
    
    test_df = pd.read_sql(query, db)

display(test_df)

Unnamed: 0,name
0,cbsa
1,npidata
2,taxonomy


**We should have 865 unique Codes, one per row of the CSV**

In [7]:
with sqlite3.connect('../data/hcbb_group_reviews.sqlite') as db :
    query = """
    SELECT COUNT(DISTINCT(taxonomy_code))
    FROM taxonomy;
    """ 
    
    test_df = pd.read_sql(query, db)

display(test_df)

Unnamed: 0,COUNT(DISTINCT(taxonomy_code))
0,865


**We should have 244 unique Classifications**

In [8]:
with sqlite3.connect('../data/hcbb_group_reviews.sqlite') as db :
    query = """
    SELECT COUNT(DISTINCT(classification))
    FROM taxonomy;
    """ 
    
    test_df = pd.read_sql(query, db)

display(test_df)

Unnamed: 0,COUNT(DISTINCT(classification))
0,244


**We should have 29 unique Groupings**

In [9]:
with sqlite3.connect('../data/hcbb_group_reviews.sqlite') as db :
    query = """
    SELECT COUNT(DISTINCT(grouping))
    FROM taxonomy;
    """ 
    
    test_df = pd.read_sql(query, db)

display(test_df)

Unnamed: 0,COUNT(DISTINCT(grouping))
0,29
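The three checks above could also be run as one query and compared against the DataFrame instead of against hard-coded numbers. The sketch below assumes the table uses the column names seen in the queries (`taxonomy_code`, `classification`, `grouping`). `distinct_counts` is a hypothetical helper, and the four-row `demo` frame and in-memory database are stand-ins for the real data.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def distinct_counts(conn, table, columns):
    """Return {column: number of distinct non-NULL values} for `table`."""
    select = ", ".join(f'COUNT(DISTINCT "{c}") AS "{c}"' for c in columns)
    row = pd.read_sql(f'SELECT {select} FROM "{table}";', conn).iloc[0]
    return {c: int(row[c]) for c in columns}


columns = ["taxonomy_code", "classification", "grouping"]

# Toy rows modelled on the first lines of the taxonomy file.
demo = pd.DataFrame(
    {
        "taxonomy_code": ["193200000X", "193400000X", "207K00000X", "207KA0200X"],
        "classification": [
            "Multi-Specialty",
            "Single Specialty",
            "Allergy & Immunology",
            "Allergy & Immunology",
        ],
        "grouping": [
            "Group",
            "Group",
            "Allopathic & Osteopathic Physicians",
            "Allopathic & Osteopathic Physicians",
        ],
    }
)

with closing(sqlite3.connect(":memory:")) as conn:
    demo.to_sql("taxonomy", conn, index=False)
    db_counts = distinct_counts(conn, "taxonomy", columns)

# nunique() ignores missing values, as COUNT(DISTINCT ...) ignores NULLs.
df_counts = {c: int(demo[c].nunique()) for c in columns}
print(db_counts == df_counts, db_counts)
```

Against the real database, the same comparison would use the renamed `taxonomy_codes` frame on the DataFrame side.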
