## Importing and exporting data with a PostgreSQL database

Census brief: *Population Distribution and Change: 2000 to 2010*

- https://www.census.gov/prod/cen2010/briefs/c2010br-01.pdf

Data dictionary
- https://www.census.gov/prod/cen2010/doc/pl94-171.pdf



In [1]:
%load_ext sql

Connect to the empty `analysis` database created with pgAdmin:

In [2]:
%sql postgresql://postgres:eric@localhost:5432/analysis

'Connected: postgres@analysis'
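Hard-coding the password in the URL, as above, leaves it in the notebook. A minimal sketch, assuming the password has been put in the `PGPASSWORD` environment variable (the variable libpq reads), that builds the same kind of URL and percent-encodes the credentials so characters such as `@`, `:` or `/` cannot break it. `pg_url` is a hypothetical helper, not part of ipython-sql.

```python
import os
from urllib.parse import quote


def pg_url(user: str, password: str, host: str = "localhost",
           port: int = 5432, dbname: str = "analysis") -> str:
    """Return a postgresql:// URL with user and password percent-encoded."""
    # safe="" makes quote() escape '/' as well as '@', ':' and spaces
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}")


# Assumption: PGPASSWORD was set in the environment before Jupyter started
url = pg_url("postgres", os.environ.get("PGPASSWORD", ""))
```

With ipython-sql's variable substitution, `%sql $url` should then open the same connection. Because libpq itself falls back to `PGPASSWORD`, leaving the password out of the URL entirely is another option.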

### CREATE
Let's create a table with the columns we need, in the same order as the columns in the CSV file.

In [3]:
%%sql
CREATE TABLE us_counties_2010 (
    geo_name varchar(90),                    -- Name of the geography
    state_us_abbreviation varchar(2),        -- State/U.S. abbreviation
    summary_level varchar(3),                -- Summary Level
    region smallint,                         -- Region
    division smallint,                       -- Division
    state_fips varchar(2),                   -- State FIPS code
    county_fips varchar(3),                  -- County code
    area_land bigint,                        -- Area (Land) in square meters
    area_water bigint,                       -- Area (Water) in square meters
    population_count_100_percent integer,    -- Population count (100%)
    housing_unit_count_100_percent integer,  -- Housing Unit count (100%)
    internal_point_lat numeric(10,7),        -- Internal point (latitude)
    internal_point_lon numeric(10,7),        -- Internal point (longitude)

    -- This section is referred to as P1. Race:
    p0010001 integer,   -- Total population
    p0010002 integer,   -- Population of one race:
    p0010003 integer,       -- White alone
    p0010004 integer,       -- Black or African American alone
    p0010005 integer,       -- American Indian and Alaska Native alone
    p0010006 integer,       -- Asian alone
    p0010007 integer,       -- Native Hawaiian and Other Pacific Islander alone
    p0010008 integer,       -- Some Other Race alone
    p0010009 integer,   -- Population of two or more races
    p0010010 integer,   -- Population of two races:
    p0010011 integer,       -- White; Black or African American
    p0010012 integer,       -- White; American Indian and Alaska Native
    p0010013 integer,       -- White; Asian
    p0010014 integer,       -- White; Native Hawaiian and Other Pacific Islander
    p0010015 integer,       -- White; Some Other Race
    p0010016 integer,       -- Black or African American; American Indian and Alaska Native
    p0010017 integer,       -- Black or African American; Asian
    p0010018 integer,       -- Black or African American; Native Hawaiian and Other Pacific Islander
    p0010019 integer,       -- Black or African American; Some Other Race
    p0010020 integer,       -- American Indian and Alaska Native; Asian
    p0010021 integer,       -- American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander
    p0010022 integer,       -- American Indian and Alaska Native; Some Other Race
    p0010023 integer,       -- Asian; Native Hawaiian and Other Pacific Islander
    p0010024 integer,       -- Asian; Some Other Race
    p0010025 integer,       -- Native Hawaiian and Other Pacific Islander; Some Other Race
    p0010026 integer,   -- Population of three races
    p0010047 integer,   -- Population of four races
    p0010063 integer,   -- Population of five races
    p0010070 integer,   -- Population of six races

    -- This section is referred to as P2. HISPANIC OR LATINO, AND NOT HISPANIC OR LATINO BY RACE
    p0020001 integer,   -- Total
    p0020002 integer,   -- Hispanic or Latino
    p0020003 integer,   -- Not Hispanic or Latino:
    p0020004 integer,   -- Population of one race:
    p0020005 integer,       -- White alone
    p0020006 integer,       -- Black or African American alone
    p0020007 integer,       -- American Indian and Alaska Native alone
    p0020008 integer,       -- Asian alone
    p0020009 integer,       -- Native Hawaiian and Other Pacific Islander alone
    p0020010 integer,       -- Some Other Race alone
    p0020011 integer,   -- Two or More Races
    p0020012 integer,   -- Population of two races
    p0020028 integer,   -- Population of three races
    p0020049 integer,   -- Population of four races
    p0020065 integer,   -- Population of five races
    p0020072 integer,   -- Population of six races

    -- This section is referred to as P3. RACE FOR THE POPULATION 18 YEARS AND OVER
    p0030001 integer,   -- Total
    p0030002 integer,   -- Population of one race:
    p0030003 integer,       -- White alone
    p0030004 integer,       -- Black or African American alone
    p0030005 integer,       -- American Indian and Alaska Native alone
    p0030006 integer,       -- Asian alone
    p0030007 integer,       -- Native Hawaiian and Other Pacific Islander alone
    p0030008 integer,       -- Some Other Race alone
    p0030009 integer,   -- Two or More Races
    p0030010 integer,   -- Population of two races
    p0030026 integer,   -- Population of three races
    p0030047 integer,   -- Population of four races
    p0030063 integer,   -- Population of five races
    p0030070 integer,   -- Population of six races

    -- This section is referred to as P4. HISPANIC OR LATINO, AND NOT HISPANIC OR LATINO BY RACE
    -- FOR THE POPULATION 18 YEARS AND OVER
    p0040001 integer,   -- Total
    p0040002 integer,   -- Hispanic or Latino
    p0040003 integer,   -- Not Hispanic or Latino:
    p0040004 integer,   -- Population of one race:
    p0040005 integer,       -- White alone
    p0040006 integer,       -- Black or African American alone
    p0040007 integer,       -- American Indian and Alaska Native alone
    p0040008 integer,       -- Asian alone
    p0040009 integer,       -- Native Hawaiian and Other Pacific Islander alone
    p0040010 integer,       -- Some Other Race alone
    p0040011 integer,   -- Two or More Races
    p0040012 integer,   -- Population of two races
    p0040028 integer,   -- Population of three races
    p0040049 integer,   -- Population of four races
    p0040065 integer,   -- Population of five races
    p0040072 integer,   -- Population of six races

    -- This section is referred to as H1. OCCUPANCY STATUS
    h0010001 integer,   -- Total housing units
    h0010002 integer,   -- Occupied
    h0010003 integer    -- Vacant
);

 * postgresql://postgres:***@localhost:5432/analysis
Done.


[]

In [5]:
%sql select * from us_counties_2010

 * postgresql://postgres:***@localhost:5432/analysis
0 rows affected.


geo_name,state_us_abbreviation,summary_level,region,division,state_fips,county_fips,area_land,area_water,population_count_100_percent,housing_unit_count_100_percent,internal_point_lat,internal_point_lon,p0010001,p0010002,p0010003,p0010004,p0010005,p0010006,p0010007,p0010008,p0010009,p0010010,p0010011,p0010012,p0010013,p0010014,p0010015,p0010016,p0010017,p0010018,p0010019,p0010020,p0010021,p0010022,p0010023,p0010024,p0010025,p0010026,p0010047,p0010063,p0010070,p0020001,p0020002,p0020003,p0020004,p0020005,p0020006,p0020007,p0020008,p0020009,p0020010,p0020011,p0020012,p0020028,p0020049,p0020065,p0020072,p0030001,p0030002,p0030003,p0030004,p0030005,p0030006,p0030007,p0030008,p0030009,p0030010,p0030026,p0030047,p0030063,p0030070,p0040001,p0040002,p0040003,p0040004,p0040005,p0040006,p0040007,p0040008,p0040009,p0040010,p0040011,p0040012,p0040028,p0040049,p0040065,p0040072,h0010001,h0010002,h0010003
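
Load the CSV file into the table with `COPY ... FROM`. `FORMAT CSV` handles quoted fields, and `HEADER` tells PostgreSQL to skip the file's first line. The file is opened by the database server, not by the notebook, so the path must exist on the server's machine (here both run on `localhost`) and the role needs superuser rights or membership in the `pg_read_server_files` role. For a file on the client side, psql's `\copy` meta-command is the usual alternative.
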
In [7]:
%%sql
copy us_counties_2010
from 'D:\\ml_code\\data_science\\sql\\data\\us_counties_2010.csv'
WITH (FORMAT CSV, HEADER);

 * postgresql://postgres:***@localhost:5432/analysis
3143 rows affected.


[]
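With `FORMAT CSV, HEADER`, `COPY ... FROM` discards the first line without checking it, so fields are matched to columns purely by position (PostgreSQL 15 added a `HEADER MATCH` option that does check the names). A hedged sketch that compares a file's header with the table's column order before loading; `header_mismatches` is a hypothetical helper, and the expected list would be the table's columns, for example from `SELECT column_name FROM information_schema.columns WHERE table_name = 'us_counties_2010' ORDER BY ordinal_position`.

```python
import csv
from typing import Iterable, List, Tuple


def header_mismatches(lines: Iterable[str],
                      expected: List[str]) -> List[Tuple[int, str, str]]:
    """Compare a CSV header row with the expected column order.

    Returns (position, expected_name, found_name) for every position that
    differs; a missing or extra column appears as '' on the shorter side.
    """
    header = [name.strip() for name in next(csv.reader(lines), [])]
    problems = []
    for i in range(max(len(header), len(expected))):
        want = expected[i] if i < len(expected) else ""
        got = header[i] if i < len(header) else ""
        if want != got:
            problems.append((i, want, got))
    return problems


if __name__ == "__main__":
    # Hypothetical: the full column list in table order (truncated here)
    expected_columns = ["geo_name", "state_us_abbreviation", "summary_level"]
    path = r"D:\ml_code\data_science\sql\data\us_counties_2010.csv"
    with open(path, newline="") as f:
        for pos, want, got in header_mismatches(f, expected_columns):
            print(f"column {pos}: expected {want!r}, found {got!r}")
```

An empty result means the header lines up with the table, so a positional `COPY` should put every field in the intended column.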

In [8]:
%sql select * from us_counties_2010 limit 5;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


geo_name,state_us_abbreviation,summary_level,region,division,state_fips,county_fips,area_land,area_water,population_count_100_percent,housing_unit_count_100_percent,internal_point_lat,internal_point_lon,p0010001,p0010002,p0010003,p0010004,p0010005,p0010006,p0010007,p0010008,p0010009,p0010010,p0010011,p0010012,p0010013,p0010014,p0010015,p0010016,p0010017,p0010018,p0010019,p0010020,p0010021,p0010022,p0010023,p0010024,p0010025,p0010026,p0010047,p0010063,p0010070,p0020001,p0020002,p0020003,p0020004,p0020005,p0020006,p0020007,p0020008,p0020009,p0020010,p0020011,p0020012,p0020028,p0020049,p0020065,p0020072,p0030001,p0030002,p0030003,p0030004,p0030005,p0030006,p0030007,p0030008,p0030009,p0030010,p0030026,p0030047,p0030063,p0030070,p0040001,p0040002,p0040003,p0040004,p0040005,p0040006,p0040007,p0040008,p0040009,p0040010,p0040011,p0040012,p0040028,p0040049,p0040065,p0040072,h0010001,h0010002,h0010003
Autauga County,AL,50,3,6,1,1,1539582278,25775735,54571,22135,32.5363818,-86.6444901,54571,53702,42855,9643,232,474,32,466,869,814,219,262,177,11,50,32,19,9,16,0,0,5,5,8,1,49,6,0,0,54571,1310,53261,52500,42154,9595,217,467,22,45,761,719,36,6,0,0,39958,39530,31910,6767,180,346,23,304,428,404,22,2,0,0,39958,828,39130,38746,31461,6738,169,341,15,22,384,363,19,2,0,0,22135,20221,1914
Baldwin County,AL,50,3,6,1,3,4117521611,1133190229,182265,104061,30.6592183,-87.7460666,182265,179542,156153,17105,1216,1348,89,3631,2723,2583,658,1035,336,35,311,63,34,7,38,12,0,15,11,14,14,128,11,1,0,182265,7992,174273,171976,152200,16966,1146,1340,79,245,2297,2205,87,5,0,0,140367,138905,122238,12272,923,1002,70,2400,1462,1398,60,3,1,0,140367,5186,135181,133937,119671,12193,876,994,62,141,1244,1198,46,0,0,0,104061,73180,30881
Barbour County,AL,50,3,6,1,5,2291818968,50864716,27457,11829,31.8706701,-85.4054562,27457,27199,13180,12875,114,107,29,894,258,254,76,49,7,2,34,54,2,4,7,0,1,2,1,5,10,3,0,0,1,27457,1387,26070,25861,12837,12820,60,107,24,13,209,206,2,0,0,1,21442,21275,10855,9647,86,80,24,583,167,163,3,0,0,1,21442,925,20517,20382,10624,9605,47,80,21,5,135,132,2,0,0,1,11829,9820,2009
Bibb County,AL,50,3,6,1,7,1612480789,9289057,22915,8981,33.0158929,-87.1271475,22915,22712,17381,5047,64,22,13,185,203,195,50,77,16,3,16,9,13,5,4,0,0,2,0,0,0,8,0,0,0,22915,406,22509,22328,17191,5024,64,22,7,20,181,175,6,0,0,0,17714,17584,13403,3975,47,14,11,134,130,128,2,0,0,0,17714,310,17404,17284,13247,3963,47,14,5,8,120,119,1,0,0,0,8981,7953,1028
Blount County,AL,50,3,6,1,9,1669961855,15157440,57322,23887,33.9774479,-86.5672464,57322,56638,53068,761,307,117,38,2347,684,662,112,330,74,8,102,12,2,0,6,3,0,2,4,7,0,21,1,0,0,57322,4626,52696,52129,50952,724,285,115,18,35,567,547,19,1,0,0,43216,42810,40515,549,227,91,23,1405,406,394,12,0,0,0,43216,2724,40492,40141,39285,524,212,89,14,17,351,341,10,0,0,0,23887,21578,2309


Which counties have the largest land area? (`area_land` is in square meters.)

In [9]:
%%sql
select geo_name,state_us_abbreviation, area_land
from us_counties_2010
order by area_land desc
limit 3;

 * postgresql://postgres:***@localhost:5432/analysis
3 rows affected.


| geo_name | state_us_abbreviation | area_land |
|---|---|---:|
| Yukon-Koyukuk Census Area | AK | 376855656455 |
| North Slope Borough | AK | 229720054439 |
| Bethel Census Area | AK | 105075822708 |
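Square meters make these numbers hard to read. A small sketch converting them to square kilometers and square miles; the helper names are hypothetical, and the square-mile factor uses the international mile of exactly 1,609.344 m.

```python
SQ_M_PER_SQ_KM = 1_000_000
SQ_M_PER_SQ_MI = 1609.344 ** 2  # the international mile is exactly 1,609.344 m


def sq_m_to_sq_km(area_m2: float) -> float:
    """Convert square meters to square kilometers."""
    return area_m2 / SQ_M_PER_SQ_KM


def sq_m_to_sq_mi(area_m2: float) -> float:
    """Convert square meters to square miles."""
    return area_m2 / SQ_M_PER_SQ_MI


if __name__ == "__main__":
    # area_land values copied from the query result above
    largest = [
        ("Yukon-Koyukuk Census Area", 376855656455),
        ("North Slope Borough", 229720054439),
        ("Bethel Census Area", 105075822708),
    ]
    for name, area in largest:
        print(f"{name}: {sq_m_to_sq_km(area):,.0f} km², {sq_m_to_sq_mi(area):,.0f} sq mi")
```

The same conversion can be done in the query itself, e.g. `round(area_land / 1000000.0, 1) AS area_land_sq_km` in the select list.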


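Which counties lie furthest west? Longitudes west of Greenwich are negative, so sorting `internal_point_lon` in ascending order lists the westernmost counties first.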

In [11]:
%%sql
select geo_name,state_us_abbreviation, internal_point_lon
from us_counties_2010
order by internal_point_lon asc
limit 3;

 * postgresql://postgres:***@localhost:5432/analysis
3 rows affected.


| geo_name | state_us_abbreviation | internal_point_lon |
|---|---|---:|
| Nome Census Area | AK | -164.1889119 |
| Wade Hampton Census Area | AK | -163.1909497 |
| Aleutians East Borough | AK | -161.9507485 |
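A small sketch that mirrors `ORDER BY internal_point_lon ASC LIMIT 3` with `heapq.nsmallest` and prints longitudes with a hemisphere letter, to make the sign convention explicit. Both helper names are hypothetical, and the rows are a few values copied from the outputs above.

```python
import heapq
from typing import List, Tuple

Row = Tuple[str, float]  # (geo_name, internal_point_lon)


def format_lon(lon: float, places: int = 4) -> str:
    """Render a signed longitude as unsigned degrees plus 'E' or 'W'."""
    hemisphere = "W" if lon < 0 else "E"
    return f"{abs(lon):.{places}f}° {hemisphere}"


def westernmost(rows: List[Row], k: int = 3) -> List[Row]:
    """Like ORDER BY lon ASC LIMIT k: the k rows with the smallest longitude."""
    return heapq.nsmallest(k, rows, key=lambda row: row[1])


if __name__ == "__main__":
    rows = [
        ("Autauga County", -86.6444901),
        ("Baldwin County", -87.7460666),
        ("Nome Census Area", -164.1889119),
        ("Wade Hampton Census Area", -163.1909497),
        ("Aleutians East Borough", -161.9507485),
    ]
    for name, lon in westernmost(rows):
        print(name, format_lon(lon))
```

One caveat: some of the Aleutian Islands lie west of the 180° meridian, where longitudes become positive, so the smallest longitude is not always the geographically westernmost point.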


### Importing a subset of columns

If a CSV file doesn't contain every column in the table, you can still import it by telling `COPY` which columns the file supplies.

In [27]:
#%sql DROP table supervisor_salaries_temp


In [28]:
%%sql
create table supervisor_salaries (
    town varchar(30),
    county varchar(30),
    supervisor varchar(30),
    start_date date,
    salary money,
    benefits money
);

 * postgresql://postgres:***@localhost:5432/analysis
Done.


[]

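This CSV file has data for only three of the table's six columns. Without a column list, PostgreSQL raises an error because each row in the file has fewer fields than the table has columns. Name the columns the file supplies, in the file's order; the table's other columns receive their default values (NULL here).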

In [29]:
%%sql
COPY supervisor_salaries (town, supervisor, salary)
FROM 'D:\\ml_code\\data_science\\sql\\data\\supervisor_salaries.csv'
WITH (FORMAT CSV, HEADER);

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


[]
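When the file has a header row, the column list for `COPY` can be derived from it instead of typed by hand. A hedged sketch: `copy_from_statement` is a hypothetical helper that assumes the header names are the table's column names. It double-quotes each identifier (which makes them case-sensitive, so the header must match the lowercase column names exactly) and doubles single quotes in the path. It only builds SQL text; outside a notebook, psycopg2's `sql.Identifier` and `sql.Literal` compose statements more safely.

```python
import csv
from typing import Iterable


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text: str) -> str:
    """Single-quote an SQL string literal, doubling embedded single quotes.

    Assumes standard_conforming_strings = on (the default since PostgreSQL
    9.1), so backslashes, e.g. in Windows paths, need no extra escaping.
    """
    return "'" + text.replace("'", "''") + "'"


def copy_from_statement(table: str, csv_lines: Iterable[str], server_path: str) -> str:
    """Build a COPY ... FROM statement whose column list is the CSV header."""
    header = next(csv.reader(csv_lines))
    columns = ", ".join(quote_ident(name.strip()) for name in header)
    return (f"COPY {quote_ident(table)} ({columns})\n"
            f"FROM {quote_literal(server_path)}\n"
            "WITH (FORMAT CSV, HEADER);")


if __name__ == "__main__":
    path = r"D:\ml_code\data_science\sql\data\supervisor_salaries.csv"
    with open(path, newline="") as f:
        print(copy_from_statement("supervisor_salaries", f, path))
```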

In [30]:
%sql select * from supervisor_salaries;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


| town | county | supervisor | start_date | salary | benefits |
|---|---|---|---|---:|---|
| Anytown | | Jones | | £27,000.00 | |
| Bumblyburg | | Baker | | £24,999.00 | |
| Moetown | | Smith | | £32,100.00 | |
| Bigville | | Kao | | £31,500.00 | |
| New Brillig | | Carroll | | £72,690.00 | |
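
The `county`, `start_date` and `benefits` columns are empty (NULL) because the file had no data for them. The `£` sign comes from the `money` type, whose output format follows the database's `lc_monetary` setting.
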
### Using temporary tables to process an import

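Suppose the `county` column should be filled in even though the CSV file has no value for it. One approach: empty the table, load the file into a temporary table, then insert the rows into the real table while supplying the county value in the query.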


In [31]:
%sql DELETE FROM supervisor_salaries;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


[]

In [32]:
%sql select * from supervisor_salaries

 * postgresql://postgres:***@localhost:5432/analysis
0 rows affected.


town,county,supervisor,start_date,salary,benefits
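
Create a temporary table with the same columns as `supervisor_salaries` (`LIKE` copies the column definitions) and load the CSV into it. A temporary table is visible only to the current session and is dropped automatically when the session ends.
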
In [33]:
%%sql
CREATE TEMPORARY TABLE supervisor_salaries_temp (LIKE supervisor_salaries);

COPY supervisor_salaries_temp(town, supervisor, salary)
FROM 'D:\\ml_code\\data_science\\sql\\data\\supervisor_salaries.csv'
WITH (FORMAT CSV, HEADER);

 * postgresql://postgres:***@localhost:5432/analysis
Done.
5 rows affected.


[]
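
Insert the rows into `supervisor_salaries`, supplying the missing county as a constant in the `SELECT` list:
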
In [34]:
%%sql 
INSERT INTO supervisor_salaries (town, county, supervisor, salary)
SELECT town, 'Some County', supervisor, salary
FROM supervisor_salaries_temp;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


[]

In [35]:
%sql DROP TABLE supervisor_salaries_temp;

 * postgresql://postgres:***@localhost:5432/analysis
Done.


[]

In [37]:
%sql SELECT * from supervisor_salaries;

 * postgresql://postgres:***@localhost:5432/analysis
5 rows affected.


| town | county | supervisor | start_date | salary | benefits |
|---|---|---|---|---:|---|
| Anytown | Some County | Jones | | £27,000.00 | |
| Bumblyburg | Some County | Baker | | £24,999.00 | |
| Moetown | Some County | Smith | | £32,100.00 | |
| Bigville | Some County | Kao | | £31,500.00 | |
| New Brillig | Some County | Carroll | | £72,690.00 | |
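The temporary table keeps the whole transformation inside PostgreSQL and leaves the CSV untouched. An alternative, not the approach used above, is to append the constant column to the file before loading it. A sketch with Python's `csv` module; `add_constant_column` and the file names are hypothetical.

```python
import csv
from typing import Iterable, Iterator, List


def add_constant_column(rows: Iterable[List[str]], name: str,
                        value: str) -> Iterator[List[str]]:
    """Yield the header with `name` appended, then each data row with `value` appended."""
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty input: nothing to yield
        return
    yield header + [name]
    for row in it:
        yield row + [value]


if __name__ == "__main__":
    # Hypothetical file names; the input is assumed to have a header row
    with open("supervisor_salaries.csv", newline="") as src, \
         open("supervisor_salaries_with_county.csv", "w", newline="") as dst:
        csv.writer(dst).writerows(
            add_constant_column(csv.reader(src), "county", "Some County"))
```

The rewritten file could then be loaded with `COPY supervisor_salaries (town, supervisor, salary, county) FROM ...`, assuming the original file's columns are in that order.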


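### Exporting data
Use `COPY ... TO` to export a whole table. To export only some columns, list them after the table name, e.g. `COPY us_counties_2010 (geo_name, internal_point_lat, internal_point_lon) TO ...`. As with importing, the server writes the file, so the path is on the database server's machine and the role needs superuser rights or membership in `pg_write_server_files`. This export uses `|` as the delimiter.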

In [45]:
%%sql
COPY us_counties_2010
TO 'D:\\ml_code\\data_science\\sql\\data\\us_counties_export.txt'
WITH (FORMAT CSV, HEADER, DELIMITER '|');

 * postgresql://postgres:***@localhost:5432/analysis
3143 rows affected.


[]
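Because the export uses `DELIMITER '|'`, anything that reads the file back has to be told about the delimiter. A sketch with Python's `csv` module; `read_pipe_csv` is a hypothetical helper, the path is the one from the cell above, and the UTF-8 encoding is an assumption (without an `ENCODING` option, `COPY` writes in the client encoding).

```python
import csv
from typing import Iterable, List, Tuple


def read_pipe_csv(lines: Iterable[str]) -> Tuple[List[str], List[List[str]]]:
    """Parse a '|'-delimited CSV export into (header, data_rows)."""
    reader = csv.reader(lines, delimiter="|")
    header = next(reader, [])
    return header, list(reader)


if __name__ == "__main__":
    path = r"D:\ml_code\data_science\sql\data\us_counties_export.txt"
    # Assumption: the file was written in UTF-8
    with open(path, newline="", encoding="utf-8") as f:
        header, rows = read_pipe_csv(f)
    print(len(header), "columns,", len(rows), "data rows")
```

For this export it should report 3143 data rows, matching the `COPY` output.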

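Export the results of a query by putting the `SELECT` in parentheses in place of the table name. This one exports the counties whose names contain "mill" in any letter case (`ILIKE` is the case-insensitive form of `LIKE`).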

In [46]:
%%sql
COPY 
(
    SELECT geo_name, state_us_abbreviation
    FROM us_counties_2010
    WHERE geo_name ILIKE '%mill%'
)
TO 'D:\\ml_code\\data_science\\sql\\data\\us_counties_mill_export.csv'
WITH (FORMAT CSV, HEADER, DELIMITER '|')

 * postgresql://postgres:***@localhost:5432/analysis
9 rows affected.


[]
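In the pattern `'%mill%'`, each `%` stands for any run of characters, so the filter matches names that contain "mill" anywhere. A hedged sketch that translates a `LIKE` pattern into a Python regular expression to make those rules concrete; the helper names are hypothetical, and Python's `re.IGNORECASE` only approximates PostgreSQL's locale-dependent case folding.

```python
import re


def like_to_regex(pattern: str, case_insensitive: bool = False) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a regex that must match the whole string.

    '%' matches any run of characters (including none), '_' exactly one
    character, and a backslash (PostgreSQL's default LIKE escape) makes the
    next character literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            if i + 1 == len(pattern):
                # PostgreSQL rejects a pattern that ends with the escape character
                raise ValueError("LIKE pattern must not end with escape character")
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)


def ilike(value: str, pattern: str) -> bool:
    """Rough Python equivalent of `value ILIKE pattern`."""
    return like_to_regex(pattern, case_insensitive=True).fullmatch(value) is not None
```

Because the whole string must match, a pattern without the leading and trailing `%` (plain `'mill'`) would only match a name that is exactly "mill".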