To use nppes-postgres in my environment I had to make some changes. I do not believe these changes are broadly useful enough for a pull request, so I am sharing them as a comment. Thank you @semerj for sharing the basic technique.
Modified the shell script to run against a remote Postgres instance. Note: this is specific to my environment, and it is not a good idea to load the data remotely unless necessary! You have to transfer all 2.8GB of uncompressed NPPES data, which takes some time even on a fast network.
Added some indexes. Note that the name indexes are upper() expression indexes. The npi column is indexed both as a numeric PK and as a varchar, because my use case joins to another table where NPI data is stored as a varchar (see the example queries after the index script).
#!/bin/sh
# TODO: set this up so that it just replicates the psql commands
# ./create_npi_db.sh [USERNAME] [DATABASE] [NPI-DATA] [TAXONOMY-DATA] [HOST]
# wget the NPPES full files to ./data
# wget the NUCC taxonomy full files to ./data
# clean data (per Readme.md)
# replace empty "" integer fields in the NPI CSV file
# interesting fact: this takes the NPPES file from 5.5GB to 2.8GB
sed 's/""//g' $3 > cleaned_npi.csv
# change taxonomy per Readme.md
# convert taxonomy data to UTF-8 and the delimiter to tab
# csvformat took some extra effort on OSX; had to install pip and then csvkit
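# (roughly, on my machine: install pip for your Python, then: pip install csvkit -- exact steps will vary)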
iconv -c -t utf8 "$4" | csvformat -T > taxonomy.tab
psql -U "$1" -d "$2" -h "$5" -f nppes_tables.sql
# had to change this to pipe through stdin:
# when you run COPY via remote psql, the server executes the COPY and tries to read
# the file path on its own filesystem, so a local file isn't readable
cat taxonomy.tab | psql -U "$1" -d "$2" -h "$5" -c "COPY taxonomy FROM STDIN WITH CSV HEADER DELIMITER AS E'\t';"
# doing this remotely is SLOW!
cat cleaned_npi.csv | psql -U "$1" -d "$2" -h "$5" -c "COPY npi FROM STDIN WITH CSV HEADER DELIMITER AS ',' NULL AS '';"
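# note: an alternative I did not end up using is psql's \copy meta-command, which
# runs client-side and reads the local file itself before streaming it over the
# connection, e.g. (a sketch, not tested in my setup):
# psql -U "$1" -d "$2" -h "$5" -c "\copy npi FROM 'cleaned_npi.csv' WITH CSV HEADER DELIMITER AS ',' NULL AS ''"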
psql -U "$1" -d "$2" -h "$5" -f nppes_indexes.sql
--constraints
--make npi PK
ALTER TABLE npi ADD CONSTRAINT npi_pkey PRIMARY KEY (npi);
--indexes on npi
--npi is implicit because it is a PK
--index npi as a varchar since other systems represent npis as strings
CREATE INDEX npi_npi_as_string ON npi (cast(npi as varchar));
--replacement npi
CREATE INDEX npi_replacement ON npi (replacement_npi);
--last name (upper)
CREATE INDEX npi_upper_last ON npi (upper(provider_last_name));
--first and last name (composite, upper)
CREATE INDEX npi_upper_first_last ON npi (upper(provider_last_name), upper(provider_first_name));
--entity_type
CREATE INDEX npi_entity_type ON npi (entity_type_code);
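For reference, here is a sketch of the kinds of queries these indexes are meant to serve. The claims table and its billing_npi column are hypothetical stand-ins for any system that stores NPIs as strings; the npi table columns are as defined above.
-- case-insensitive name lookup; matches the upper() expression indexes
SELECT npi, provider_first_name, provider_last_name
FROM npi
WHERE upper(provider_last_name) = upper('Smith')
  AND upper(provider_first_name) = upper('Jane');
-- join to a hypothetical table that stores NPIs as strings; the cast must match
-- the indexed expression exactly for npi_npi_as_string to be usable
SELECT c.claim_id, n.provider_last_name
FROM claims c
JOIN npi n ON cast(n.npi AS varchar) = c.billing_npi;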