# Get a PostgreSQL instance

This tutorial will show you how to get a PostgreSQL instance up and running locally to test JupySQL. You can run this in a Jupyter notebook.

## Pre-requisites

To run this tutorial, you need to install the following Python packages:

In [None]:
%pip install jupysql pandas pyarrow --quiet

You also need a PostgreSQL connector. Here's a list of [supported connectors](https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#dialect-postgresql). We recommend using `psycopg2`. The easiest way to install it is via:

In [None]:
%pip install psycopg2-binary --quiet

```{tip}
If you have issues, check out our [installation guide](postgres-install) or [message us on Slack](https://ploomber.io/community).
```

You also need Docker installed and running to start the PostgreSQL instance.

## Start PostgreSQL instance

The following command fetches the official image and starts a container with a new database and user (this will take 1-2 minutes):

In [None]:
%%bash
docker run --name postgres -e POSTGRES_DB=db \
  -e POSTGRES_USER=user \
  -e POSTGRES_PASSWORD=password \
  -p 5432:5432 -d postgres
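Because `docker run -d` returns as soon as the container starts, the server inside may still be initializing when the next cell runs. Here is a minimal sketch of a readiness check, assuming the container name, user, and database from the command above. The helpers `pg_is_ready` and `wait_until` are hypothetical names introduced for illustration and are not part of JupySQL. The check relies on `pg_isready`, a client utility shipped with PostgreSQL that should be available inside the official image.

```python
import subprocess
import time


def pg_is_ready(container="postgres", user="user", db="db"):
    """Run pg_isready inside the container; True if it exits with status 0."""
    cmd = [
        "docker", "exec", container,
        "pg_isready", "-h", "localhost", "-U", user, "-d", db,
    ]
    try:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    except FileNotFoundError:
        # The docker CLI is not on PATH
        return False


def wait_until(check, timeout=60.0, interval=1.0):
    """Poll check() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until(pg_is_ready)` in a cell blocks until the server reports that it accepts connections, or returns `False` after the timeout.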

Our database is running; let's load some data!

## Load sample data

Now, let's fetch some sample data. We'll be using the [NYC taxi dataset](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page):

In [None]:
import pandas as pd

df = pd.read_parquet(
    "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet"
)
df.shape

As you can see, this chunk of data contains ~1.4M rows. Loading it into PostgreSQL will take about a minute:

In [None]:
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/db")
df.to_sql(name="taxi", con=engine, chunksize=100_000)
engine.dispose()
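The connection string above packs the user, password, host, and database name into a single URL. Since no port is given, the driver falls back to the PostgreSQL default (5432), which matches the `-p 5432:5432` mapping. If your own credentials contain characters such as `@` or `/`, they need to be percent-encoded before going into the URL. Here is a minimal sketch using only the standard library; `postgres_url` is a hypothetical helper introduced for illustration, not a JupySQL or SQLAlchemy function.

```python
from urllib.parse import quote


def postgres_url(user, password, host="localhost", port=5432, database="db"):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    user = quote(user, safe="")
    password = quote(password, safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# With the tutorial's credentials nothing needs escaping
print(postgres_url("user", "password"))
# postgresql://user:password@localhost:5432/db
```

The resulting string can be passed to `create_engine` or to the `%sql` magic in place of the hard-coded URL.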

## Query

Now, let's load JupySQL, connect to the database, and start querying the data!

In [None]:
%load_ext sql

In [None]:
%sql postgresql://user:password@localhost/db

```{important}
If the cell above fails, you might have some missing packages. Message us on [Slack](https://ploomber.io/community) and we'll help you!
```

List the tables in the database:

In [None]:
%sqlcmd tables

List columns in the taxi table:

In [None]:
%sqlcmd columns --table taxi

Query our data:

In [None]:
%%sql
SELECT COUNT(*) FROM taxi

In [None]:
%%sql
SELECT * FROM taxi
LIMIT 3

## Clean up

To stop and remove the container, look up its ID, then stop and remove it:

In [None]:
! docker container ls

In [None]:
%%capture out
! docker container ls --filter ancestor=postgres --quiet

In [None]:
container_id = out.stdout.strip()
print(f"Container id: {container_id}")

In [None]:
! docker container stop {container_id}

In [None]:
! docker container rm {container_id}

In [None]:
! docker container ls
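The `ancestor=postgres` filter returns every running container started from the `postgres` image, so it can yield more than one ID if you have other PostgreSQL containers running. Since the container in this tutorial was started with `--name postgres`, it can also be addressed by name. Here is a minimal sketch of that alternative; `cleanup_commands` and `cleanup` are hypothetical helpers introduced for illustration.

```python
import subprocess


def cleanup_commands(name="postgres"):
    """Return the docker commands that stop and then remove a container."""
    return [
        ["docker", "container", "stop", name],
        ["docker", "container", "rm", name],
    ]


def cleanup(name="postgres", run=subprocess.run):
    """Run each cleanup command in order, raising if one of them fails."""
    for cmd in cleanup_commands(name):
        run(cmd, check=True)
```

Calling `cleanup()` stops and removes the container named `postgres`. The `run` parameter exists only so the function can be exercised without Docker.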

## PostgreSQL features

For reference, `psql`-style "backslash" [meta-commands](https://www.postgresql.org/docs/9.6/static/app-psql.html#APP-PSQL-META-COMMANDS) (``\d``, ``\dt``, etc.)
are provided by [PGSpecial](https://pypi.python.org/pypi/pgspecial). Example:

In [None]:
%sql \d