# SQL for accessing spatial data in PostgreSQL

Database Systems lecture notes  
version 0.0.1   
authors: H. Chenan & N. Tsutsumida  

Copyright (c) 2023 Narumasa Tsutsumida  
Released under the MIT license  
https://opensource.org/licenses/mit-license.php  

## Task

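For every railway station in Saitama Prefecture, compute the population in April 2020 (holidays, daytime), sort the stations in descending order of population, and show the first 10.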

## Prerequisites

In [1]:
import os
from sqlalchemy import create_engine
import pandas as pd
pd.set_option('display.max_columns', 20)


In [2]:
def query_pandas(sql, db):
    """
    Executes a SQL query on a PostgreSQL database and returns the result as a Pandas DataFrame.

    Args:
        sql (str): The SQL query to execute.
        db (str): The name of the PostgreSQL database to connect to.

    Returns:
        pandas.DataFrame: The result of the SQL query as a Pandas DataFrame.
    """

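    # User, password, host, and port are hard-coded; only the database name varies.
    database_url = f'postgresql://postgres:postgres@postgis_container:5432/{db}'
    engine = create_engine(database_url)

    try:
        df = pd.read_sql(sql=sql, con=engine)
    finally:
        # A new engine is created on every call, so release its connections afterwards.
        engine.dispose()

    return df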
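
The connection URL in `query_pandas` embeds the credentials in the source. As a minimal sketch of one alternative, the helper below builds the same URL from environment variables and falls back to the values used above. The function name `build_database_url` is hypothetical, and reading `PGUSER`, `PGPASSWORD`, `PGHOST`, and `PGPORT` is an assumption of this sketch, not something the course setup is known to define.

```python
import os
from urllib.parse import quote_plus


def build_database_url(db, env=None):
    """Build a PostgreSQL URL, preferring environment variables over defaults.

    `build_database_url` and the choice of variable names are illustrative
    assumptions; the defaults mirror the hard-coded URL in query_pandas.
    """
    env = os.environ if env is None else env
    user = env.get("PGUSER", "postgres")
    password = env.get("PGPASSWORD", "postgres")
    host = env.get("PGHOST", "postgis_container")
    port = env.get("PGPORT", "5432")
    # quote_plus escapes characters such as '@' that would otherwise break the URL.
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"


# With no variables set, the result matches the URL used in query_pandas.
print(build_database_url("gisdb", env={}))
```

The returned string could be passed to `create_engine` in place of the hard-coded URL.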

## Define a SQL command

In [3]:
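sql = """
    -- Step 1: buffer each railway station in Saitama by 300 units in EPSG:3857.
    -- distinct on (pt.name) keeps one row per station name; without an ORDER BY,
    -- which of several same-named points is kept is not deterministic.
    with station_buffer as (
        select distinct on (pt.name)
            poly.name_1 as pref_name,
            pt.name as station_name,
            st_buffer(st_transform(pt.way, 3857), 300) as buffer_geom
        from planet_osm_point pt
        inner join adm2 poly
            on st_within(st_transform(pt.way, 3857), st_transform(poly.geom, 3857))
        where pt.railway = 'station' and poly.name_1 = 'Saitama'
    ),
    -- Step 2: mesh populations for April 2020. Following the task statement,
    -- dayflag = '0' selects holidays and timezone = '0' selects daytime.
    pop_filtered as (
        select p.name as mesh_name, d.population, st_transform(p.geom, 3857) as geom
        from pop as d
        inner join pop_mesh as p on p.name = d.mesh1kmid
        where d.year = '2020' and d.month = '04' and d.dayflag = '0' and d.timezone = '0'
    )
    -- Step 3: for each station, sum the population of every mesh cell that
    -- intersects its buffer, then keep the 10 largest totals.
    select s.pref_name, s.station_name, sum(p.population) as sum_population
    from station_buffer s
    inner join pop_filtered p on st_intersects(p.geom, s.buffer_geom)
    group by s.pref_name, s.station_name
    order by sum_population desc
    limit 10;
"""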


## Outputs

In [4]:
out = query_pandas(sql,'gisdb')
print(out)


  pref_name station_name  sum_population
0   Saitama           川口         66310.0
1   Saitama           大宮         65388.0
2   Saitama           与野         61951.0
3   Saitama          新越谷         58661.0
4   Saitama          上福岡         53701.0
5   Saitama           上尾         53365.0
6   Saitama  獨協大学前〈草加松原〉         49743.0
7   Saitama          北朝霞         49734.0
8   Saitama          朝霞台         49734.0
9   Saitama          小手指         46422.0
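
Because the query joins on `st_intersects`, each total is the sum of the full population of every mesh cell that a buffer touches, not the population inside the buffer itself. This would also explain why 北朝霞 and 朝霞台 report the same value (49734.0): presumably both buffers touch the same set of cells. The pure-Python sketch below reproduces that behaviour on made-up data; the station names, coordinates, mesh cells, and populations are all hypothetical, and the circle and square test stands in for `st_intersects`.

```python
def circle_intersects_square(cx, cy, r, xmin, ymin, size):
    """Return True if a circle (centre cx, cy; radius r) touches an axis-aligned square."""
    # Nearest point of the square to the circle centre.
    nx = min(max(cx, xmin), xmin + size)
    ny = min(max(cy, ymin), ymin + size)
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r ** 2


# Hypothetical 1000-unit mesh cells: name -> (xmin, ymin, population).
meshes = {
    "m00": (0, 0, 100.0),
    "m10": (1000, 0, 200.0),
    "m01": (0, 1000, 350.0),
    "m11": (1000, 1000, 400.0),
}

# Hypothetical stations: A and B sit on either side of the m00/m10 boundary.
stations = {"A": (900, 500), "B": (1100, 500), "C": (500, 1500)}

BUFFER = 300  # same buffer distance as in the query
MESH_SIZE = 1000

totals = {}
for name, (sx, sy) in stations.items():
    # Sum the whole population of every cell the buffer touches.
    totals[name] = sum(
        pop
        for (xmin, ymin, pop) in meshes.values()
        if circle_intersects_square(sx, sy, BUFFER, xmin, ymin, MESH_SIZE)
    )

# Equivalent of "order by sum_population desc"; names break ties for a stable result.
ranking = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)
```

Stations A and B touch the same two cells and therefore receive identical totals, even though they are distinct points.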
