### Spatial Data Management with PostgreSQL and PostGIS


**Sample dataset:**
- [nyc_data.zip](https://github.com/giswqs/postgis/raw/master/data/nyc_data.zip) (Watch this [video](https://youtu.be/fROzLrjNDrs) to load data into PostGIS)

**References**:
- [Installation (postgis.gishub.org)](https://postgis.gishub.org/chapters/installation.html)
- [Introduction to PostGIS](https://postgis.net/workshops/postgis-intro)
- [Using SQL with Geodatabases](https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/sql-and-enterprise-geodatabases.htm)
- [Relational functions for ST_Geometry](https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm)

## Connecting to the database

In [1]:
%load_ext sql


In [2]:
%sql postgresql://workshop:workshop@localhost:5432/workshop

In [3]:
%%sql

SELECT * from nyc_subway_stations LIMIT 5

gid,objectid,id,name,alt_name,cross_st,long_name,label,borough,nghbhd,routes,transfers,color,express,closed,geom
1,1.0,376.0,Cortlandt St,,Church St,"Cortlandt St (R,W) Manhattan","Cortlandt St (R,W)",Manhattan,,"R,W","R,W",YELLOW,,,010400002026690000010000000101000000371775B5C3CE2141CBD2347771315141
2,2.0,2.0,Rector St,,,Rector St (1) Manhattan,Rector St (1),Manhattan,,1,1,RED,,,010400002026690000010000000101000000CBE327F938CD21415EDBE1572D315141
3,3.0,1.0,South Ferry,,,South Ferry (1) Manhattan,South Ferry (1),Manhattan,,1,1,RED,,,010400002026690000010000000101000000C676635D10CD2141A0ECDB6975305141
4,4.0,125.0,138th St,Grand Concourse,Grand Concourse,"138th St / Grand Concourse (4,5) Bronx","138th St / Grand Concourse (4,5)",Bronx,,45,45,GREEN,,,010400002026690000010000000101000000F4CF3E3654032241B5704681A73C5141
5,5.0,126.0,149th St,Grand Concourse,Grand Concourse,149th St / Grand Concourse (4) Bronx,149th St / Grand Concourse (4),Bronx,,4,245,GREEN,express,,01040000202669000001000000010100000084DADF7AED0422410C380E6E3A3D5141


## Spatial Relationships

So far we have only used spatial functions that measure (`ST_Area`,
`ST_Length`), serialize (`ST_AsGML`) or deserialize (`ST_GeomFromText`)
geometries. What these functions have in common is that they only work
on one geometry at a time.

Spatial databases are powerful because they not only store geometry,
they also have the ability to compare *relationships between
geometries*.

Questions like "Which are the closest bike racks to a park?" or "Where
are the intersections of subway lines and streets?" can only be answered
by comparing geometries representing the bike racks, streets, and subway
lines.

The OGC standard defines the following set of methods to compare
geometries.

## ST_Equals

`ST_Equals(geometry A, geometry B)` tests the spatial equality of two geometries.

![](https://postgis.net/workshops/postgis-intro/_images/st_equals.png)

ST_Equals returns TRUE if two geometries of the same type have identical
x,y coordinate values, i.e. if the second shape is equal (identical) to
the first shape.

First, let's retrieve a representation of a point from our
`nyc_subway_stations` table. We'll take just the entry for 'Broad
St'.

In [4]:
%%sql

SELECT name, geom, ST_AsText(geom)
FROM nyc_subway_stations
WHERE name = 'Broad St';

name,geom,st_astext
Broad St,0104000020266900000100000001010000000EEBD4CF27CF2141BC17D69516315141,MULTIPOINT(583571.905921312 4506714.34119218)


Then, plug the geometry representation back into an
`ST_Equals` test. The literal below is the hex form of the single point
inside the multipoint shown above:

In [5]:
%%sql

SELECT name
FROM nyc_subway_stations
WHERE ST_Equals(geom, '0101000020266900000EEBD4CF27CF2141BC17D69516315141');

name
Broad St
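
The hex string passed to `ST_Equals` is not opaque: it follows the EWKB layout (a byte-order flag, a type code with an SRID flag, the SRID, then the coordinates as doubles). As a rough sketch using only the Python standard library, it can be decoded by hand. The helper name `decode_ewkb_point` is hypothetical and handles only little-endian points that carry an SRID.

```python
import struct

# Hex literal copied from the ST_Equals query above.
EWKB_HEX = "0101000020266900000EEBD4CF27CF2141BC17D69516315141"


def decode_ewkb_point(hex_string):
    """Decode a little-endian EWKB point with an SRID (sketch, not a full parser)."""
    raw = bytes.fromhex(hex_string)
    if raw[0] != 1:  # first byte: 1 means little-endian
        raise ValueError("this sketch only handles little-endian input")
    (type_code,) = struct.unpack_from("<I", raw, 1)
    has_srid = bool(type_code & 0x20000000)  # EWKB flag: an SRID follows
    geom_type = type_code & 0xFF             # 1 = Point
    if geom_type != 1 or not has_srid:
        raise ValueError("this sketch only handles points with an SRID")
    (srid,) = struct.unpack_from("<I", raw, 5)
    x, y = struct.unpack_from("<dd", raw, 9)  # two 8-byte doubles
    return srid, x, y


srid, x, y = decode_ewkb_point(EWKB_HEX)
print(srid, x, y)
```

The decoded SRID and coordinates should agree with the `ST_AsText` output for Broad St shown earlier.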


## ST_Intersects, ST_Disjoint, ST_Crosses and ST_Overlaps

`ST_Intersects`,
`ST_Crosses`, and
`ST_Overlaps` test whether the
interiors of the geometries intersect.

![](https://postgis.net/workshops/postgis-intro/_images/st_intersects.png)

`ST_Intersects(geometry A, geometry B)` returns t (TRUE) if the two shapes have any space in
common, i.e., if their boundaries or interiors intersect.

![](https://postgis.net/workshops/postgis-intro/_images/st_disjoint.png)

The opposite of ST_Intersects is
`ST_Disjoint(geometry A, geometry B)`. If two geometries are disjoint, they do not intersect,
and vice-versa. In fact, it is often more efficient to test "not
intersects" than to test "disjoint" because the intersects tests can
be spatially indexed, while the disjoint test cannot.
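
One way to build intuition for the indexing remark is a toy bounding-box filter. This is only an illustration in plain Python, not how PostGIS implements its index: the boxes are made-up values, and a linear scan stands in for the index lookup.

```python
def bbox_overlaps(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical bounding boxes for four stored geometries.
boxes = {
    "a": (0, 0, 2, 2),
    "b": (1, 1, 3, 3),
    "c": (10, 10, 12, 12),
    "d": (20, 0, 22, 2),
}
query = (1.5, 1.5, 2.5, 2.5)

# An "intersects" search only needs the boxes that overlap the query box,
# which is the short list a spatial index is designed to return quickly.
candidates = sorted(k for k, box in boxes.items() if bbox_overlaps(box, query))

# A "disjoint" search wants the complement: every row the index skipped.
certainly_disjoint = sorted(set(boxes) - set(candidates))

print(candidates, certainly_disjoint)
```

In a real table the complement is usually almost every row, so there is little for an index to narrow down.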

![](https://postgis.net/workshops/postgis-intro/_images/st_crosses.png)

For multipoint/polygon, multipoint/linestring, linestring/linestring,
linestring/polygon, and linestring/multipolygon comparisons,
`ST_Crosses(geometry A, geometry B)`
returns t (TRUE) if the intersection results in a geometry whose
dimension is one less than the maximum dimension of the two source
geometries and the intersection set is interior to both source
geometries.
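
For the linestring/linestring case, the "interior to both" condition can be sketched for two single segments with an orientation test. This is a simplified plain-Python illustration with hypothetical function names. It ignores collinear overlaps and multi-segment lines, which PostGIS handles.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a1, a2, b1, b2):
    """True when the segments meet at one point that is interior to both."""
    return (
        orientation(a1, a2, b1) * orientation(a1, a2, b2) < 0
        and orientation(b1, b2, a1) * orientation(b1, b2, a2) < 0
    )


# An X shape: the crossing point lies strictly inside both segments.
x_shape = segments_cross((0, 0), (2, 2), (0, 2), (2, 0))

# Two segments that only share an endpoint meet on their boundaries,
# so this is a "touches" situation rather than a crossing.
shared_endpoint = segments_cross((0, 0), (1, 1), (1, 1), (2, 0))

print(x_shape, shared_endpoint)
```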

![](https://postgis.net/workshops/postgis-intro/_images/st_overlaps.png)

`ST_Overlaps(geometry A, geometry B)`
compares two geometries of the same dimension and returns TRUE if their
intersection set results in a geometry different from both but of the
same dimension.

Let's take our Broad Street subway station and determine its
neighborhood using the `ST_Intersects`
function:

In [6]:
%%sql

SELECT name, ST_AsText(geom)
FROM nyc_subway_stations
WHERE name = 'Broad St';

name,st_astext
Broad St,MULTIPOINT(583571.905921312 4506714.34119218)


In [7]:
%%sql

SELECT name, boroughname
FROM nyc_neighborhoods
WHERE ST_Intersects(geom, ST_GeomFromText('POINT(583571 4506714)',26918));

name,boroughname
Financial District,Manhattan


In [None]:
%%sql

SELECT ST_Distance(
  ST_GeometryFromText('POINT(0 5)'),
  ST_GeometryFromText('LINESTRING(-2 2, 2 2)')
  );

SELECT name, geom
FROM nyc_streets
WHERE ST_DWithin(
        geom,
        ST_GeomFromText('POINT(583571 4506714)',26918),
        10
      );

-- Create point
SELECT ST_AsText(ST_SetSRID(ST_Point(-72, 47), 4326));

In [None]:
%%sql

-- What is the geometry value for the street named ‘Atlantic Commons’?

select st_astext(geom) from nyc_streets ns  where name = 'Atlantic Commons';

-- MULTILINESTRING((586781.701577724 4504202.15314339,586863.51964484 4504215.9881701))

-- What neighborhood and borough is Atlantic Commons in?

SELECT name, boroughname, geom
FROM nyc_neighborhoods
WHERE ST_Intersects(
  geom,
  ST_GeomFromText('LINESTRING(586782 4504202,586864 4504216)', 26918)
);

-- What streets does Atlantic Commons join with?

SELECT name, geom
FROM nyc_streets ns
WHERE ST_Crosses(geom, ST_GeomFromText('LINESTRING(586782 4504202,586864 4504216)', 26918));

-- Alternative: which streets lie within 0.1 meters of Atlantic Commons?

SELECT name
FROM nyc_streets
WHERE ST_DWithin(
  geom,
  ST_GeomFromText('LINESTRING(586782 4504202,586864 4504216)', 26918),
  0.1
);


-- Approximately how many people live on (within 50 meters of) Atlantic Commons?

select sum(popn_total) from nyc_census_blocks ncb
WHERE ST_DWithin(
  geom,
  ST_GeomFromText('LINESTRING(586782 4504202,586864 4504216)', 26918),
  50
);


In [26]:
%%sql

SELECT
  subways.name AS subway_name,
  neighborhoods.name AS neighborhood_name,
  neighborhoods.boroughname AS borough,
  subways.geom
FROM nyc_neighborhoods AS neighborhoods
JOIN nyc_subway_stations AS subways
ON ST_Contains(neighborhoods.geom, subways.geom)
WHERE subways.name = 'Broad St';


## ST_Touches

`ST_Touches` tests whether two
geometries touch at their boundaries, but do not intersect in their
interiors.

![](https://postgis.net/workshops/postgis-intro/_images/st_touches.png)

`ST_Touches(geometry A, geometry B)`
returns TRUE if either of the geometries' boundaries intersect or if
only one of the geometry's interiors intersects the other's boundary.
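
For axis-aligned rectangles the idea reduces to a pair of comparisons: the closed shapes meet, but the open interiors do not. The sketch below is plain Python with hypothetical helper names and assumes non-degenerate `(xmin, ymin, xmax, ymax)` rectangles. It is not the PostGIS implementation.

```python
def closed_intersect(a, b):
    """True if the rectangles, boundaries included, share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def interiors_intersect(a, b):
    """True if the open interiors of the rectangles share at least one point."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def touches(a, b):
    """Boundaries meet, interiors do not."""
    return closed_intersect(a, b) and not interiors_intersect(a, b)


left = (0, 0, 1, 1)
shared_edge = touches(left, (1, 0, 2, 1))      # neighbors along x = 1
shared_corner = touches(left, (1, 1, 2, 2))    # meet only at the point (1, 1)
overlapping = touches(left, (0.5, 0, 1.5, 1))  # interiors overlap
far_apart = touches(left, (3, 3, 4, 4))        # no common point at all

print(shared_edge, shared_corner, overlapping, far_apart)
```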

## ST_Within and ST_Contains

`ST_Within` and
`ST_Contains` test whether one
geometry is fully within the other.

![](https://postgis.net/workshops/postgis-intro/_images/st_within.png)

`ST_Within(geometry A, geometry B)`
returns TRUE if the first geometry is completely within the second
geometry. ST_Within is the converse of ST_Contains: `ST_Within(A, B)`
gives the same result as `ST_Contains(B, A)`.

`ST_Contains(geometry A, geometry B)`
returns TRUE if the second geometry is completely contained by the first
geometry.
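
The argument order is the only difference between the two functions, which a small point-in-polygon sketch makes concrete. This uses the classic ray-casting test in plain Python with hypothetical function names. Points exactly on the boundary are not treated the way PostGIS treats them, so the example sticks to clearly interior and exterior points.

```python
def point_in_polygon(point, ring):
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains(ring, point):
    """Analogue of ST_Contains(polygon, point)."""
    return point_in_polygon(point, ring)


def within(point, ring):
    """Analogue of ST_Within(point, polygon): same test, arguments swapped."""
    return contains(ring, point)


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(within((2, 2), square), contains(square, (2, 2)))
print(within((5, 2), square), contains(square, (5, 2)))
```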

## ST_Distance and ST_DWithin

An extremely common GIS question is "find all the stuff within distance
X of this other stuff".

`ST_Distance(geometry A, geometry B)` calculates the *shortest* distance between two
geometries and returns it as a float. This is useful for actually
reporting back the distance between objects.

In [8]:
%%sql

SELECT ST_Distance(
  ST_GeometryFromText('POINT(0 5)'),
  ST_GeometryFromText('LINESTRING(-2 2, 2 2)'));

st_distance
3.0
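
The 3.0 above can be reproduced by hand: project the point onto the segment, clamp the projection to the segment's ends, and measure to the closest point. A minimal plain-Python sketch follows. The function name is hypothetical, and it covers only the point-to-single-segment case.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Position of the projection along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Same geometries as the query: POINT(0 5) and LINESTRING(-2 2, 2 2).
distance = point_segment_distance((0, 5), (-2, 2), (2, 2))
print(distance)
```

Here the closest point on the line is (0, 2), directly below the point, so the distance is the vertical gap between them.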


For testing whether two objects are within a distance of one another,
the `ST_DWithin` function provides an
index-accelerated true/false test. This is useful for questions like
"how many trees are within a 500 meter buffer of the road?". You
don't have to calculate an actual buffer, you just have to test the
distance relationship.

![](https://postgis.net/workshops/postgis-intro/_images/st_dwithin.png)

Using our Broad Street subway station again, we can find the streets
nearby (within 10 meters of) the subway stop:

In [9]:
%%sql

SELECT name
FROM nyc_streets
WHERE ST_DWithin(
        geom,
        ST_GeomFromText('POINT(583571 4506714)',26918),
        10
      );

name
Wall St
Broad St
Nassau St
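
Conceptually, a within-distance test can reject most candidates with a cheap box comparison and compute an exact distance only for the rest, without ever building a buffer polygon. The sketch below shows that two-step idea for points only, in plain Python. The function name and the nearby coordinates are made up for illustration, and this is not PostGIS's actual code path.

```python
import math


def dwithin_points(p, q, radius):
    """True if points p and q are no more than radius apart (planar units)."""
    # Cheap rejection: q must fall inside p's box expanded by radius.
    if abs(p[0] - q[0]) > radius or abs(p[1] - q[1]) > radius:
        return False
    # Exact check, needed because the box corners are farther than radius.
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= radius


station = (583571, 4506714)  # the Broad St point used in the query above

near = dwithin_points(station, (583574, 4506718), 10)       # 5 units away
box_only = dwithin_points(station, (583579, 4506722), 10)   # in the box, ~11.3 away
far = dwithin_points(station, (583700, 4506714), 10)        # rejected by the box test

print(near, box_only, far)
```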


In [None]:
%%sql

-- What is the population and racial make-up of the neighborhoods of Manhattan?

select ncb.boroname from nyc_census_blocks ncb group by ncb.boroname;

select boroughname from nyc_neighborhoods nn group by boroughname;

select neighborhoods.name as neighborhood_name,
sum(census.popn_total) as population,
round(cast(100 * SUM(census.popn_white) / sum(census.popn_total) as numeric), 2) as white_pct,
round(cast(100 * SUM(census.popn_black) / sum(census.popn_total) as numeric), 2) as black_pct
from nyc_census_blocks as census
join nyc_neighborhoods as neighborhoods
on st_intersects(census.geom, neighborhoods.geom)
WHERE neighborhoods.boroughname = 'Manhattan'
GROUP BY neighborhoods.name
order by White_pct desc;

-- NY racial baseline make-up of the city
SELECT
  100.0 * Sum(popn_white) / Sum(popn_total) AS white_pct,
  100.0 * Sum(popn_black) / Sum(popn_total) AS black_pct,
  Sum(popn_total) AS popn_total
FROM nyc_census_blocks;

SELECT DISTINCT routes FROM nyc_subway_stations as subways
WHERE strpos(subways.routes,'A') > 0;

-- Let’s summarize the racial make-up within 200 meters of the A-train stations
SELECT
round(cast(100 * SUM(census.popn_white) / sum(census.popn_total) as numeric), 2) as white_pct,
round(cast(100 * SUM(census.popn_black) / sum(census.popn_total) as numeric), 2) as black_pct,
SUM(popn_total) as popn_total
FROM nyc_census_blocks as census
JOIN nyc_subway_stations as subways
on ST_DWithin(census.geom, subways.geom, 200)
WHERE strpos(subways.routes,'A') > 0;

-- 13.2 Advanced Join

CREATE TABLE subway_lines ( route char(1) );
INSERT INTO subway_lines (route) VALUES
  ('A'),('B'),('C'),('D'),('E'),('F'),('G'),
  ('J'),('L'),('M'),('N'),('Q'),('R'),('S'),
  ('Z'),('1'),('2'),('3'),('4'),('5'),('6'),
  ('7');


select
lines.route,
round(cast(100 * SUM(census.popn_white) / sum(census.popn_total) as numeric), 2) as white_pct,
round(cast(100 * SUM(census.popn_black) / sum(census.popn_total) as numeric), 2) as black_pct,
SUM(popn_total) as popn_total
FROM nyc_census_blocks as census
JOIN nyc_subway_stations as subways
on ST_DWithin(census.geom, subways.geom, 200)
join subway_lines as lines
on strpos(subways.routes, lines.route) > 0
group by lines.route
order  by black_pct DESC;

We can verify the `ST_DWithin` result for Broad St on a map: the station is
at the intersection of Wall, Broad and Nassau Streets.

![image](https://postgis.net/workshops/postgis-intro/_images/broad_st.jpg)

## Function List

[ST_Contains(geometry A, geometry
B)](http://postgis.net/docs/ST_Contains.html): Returns true if and only
if no points of B lie in the exterior of A, and at least one point of
the interior of B lies in the interior of A.

[ST_Crosses(geometry A, geometry
B)](http://postgis.net/docs/ST_Crosses.html): Returns TRUE if the
supplied geometries have some, but not all, interior points in common.

[ST_Disjoint(geometry A, geometry
B)](http://postgis.net/docs/ST_Disjoint.html): Returns TRUE if the
Geometries do not "spatially intersect" - if they do not share any
space together.

[ST_Distance(geometry A, geometry
B)](http://postgis.net/docs/ST_Distance.html): Returns the 2-dimensional
cartesian minimum distance (based on spatial ref) between two geometries
in projected units.

[ST_DWithin(geometry A, geometry B,
radius)](http://postgis.net/docs/ST_DWithin.html): Returns true if the
geometries are within the specified distance (radius) of one another.

[ST_Equals(geometry A, geometry
B)](http://postgis.net/docs/ST_Equals.html): Returns true if the given
geometries represent the same geometry. Directionality is ignored.

[ST_Intersects(geometry A, geometry
B)](http://postgis.net/docs/ST_Intersects.html): Returns TRUE if the
Geometries/Geography "spatially intersect" - (share any portion of
space) and FALSE if they don't (they are Disjoint).

[ST_Overlaps(geometry A, geometry
B)](http://postgis.net/docs/ST_Overlaps.html): Returns TRUE if the
Geometries share space, are of the same dimension, but are not
completely contained by each other.

[ST_Touches(geometry A, geometry
B)](http://postgis.net/docs/ST_Touches.html): Returns TRUE if the
geometries have at least one point in common, but their interiors do not
intersect.

[ST_Within(geometry A, geometry
B)](http://postgis.net/docs/ST_Within.html): Returns true if the
geometry A is completely inside geometry B.
