# Q1. Understanding Docker images

Run Docker with the `python:3.13` image, using `bash` as the entrypoint to interact with the container.

What's the version of `pip` in the image?

```bash
docker run -it --rm --entrypoint bash python:3.13
pip --version
```

Answer: `25.3`

# Q2. Understanding Docker networking and docker-compose

Given the following `docker-compose.yaml`, what is the `hostname` and `port` that pgadmin should use to connect to the postgres database?

```yaml
services:
  db:
    container_name: postgres
    image: postgres:17-alpine
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: 'postgres'
      POSTGRES_DB: 'ny_taxi'
    ports:
      - '5433:5432'
    volumes:
      - vol-pgdata:/var/lib/postgresql/data

  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    environment:
      PGADMIN_DEFAULT_EMAIL: "pgadmin@pgadmin.com"
      PGADMIN_DEFAULT_PASSWORD: "pgadmin"
    ports:
      - "8080:80"
    volumes:
      - vol-pgadmin_data:/var/lib/pgadmin

volumes:
  vol-pgdata:
    name: vol-pgdata
  vol-pgadmin_data:
    name: vol-pgadmin_data
```

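Answer:
* hostname: `db`
* port: `5432`

Both containers sit on the default Compose network, so pgadmin reaches the database by its service name (`db`; the container name `postgres` also resolves) on the container port `5432`. The `5433` in `'5433:5432'` is the port published on the host machine and only applies to connections made from outside the Compose network.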
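
To make the host-versus-container distinction concrete, here is a minimal sketch that builds the connection URL for each vantage point. The helper `postgres_url` and its `inside_compose` flag are hypothetical names introduced for illustration; the credentials, database name, and ports come from the compose file above.

```python
def postgres_url(inside_compose: bool) -> str:
    """Build a connection URL for the `db` service (hypothetical helper)."""
    user, password, database = "postgres", "postgres", "ny_taxi"
    if inside_compose:
        # Another container on the Compose network (e.g. pgadmin):
        # service name plus the container port.
        host, port = "db", 5432
    else:
        # A client on the host machine: published port from '5433:5432'.
        host, port = "localhost", 5433
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


print(postgres_url(inside_compose=True))
print(postgres_url(inside_compose=False))
```

The second URL matches the one passed to `create_engine` in Q3, where the notebook runs on the host rather than inside the Compose network.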

# Q3. Counting short trips

For trips in November 2025 (`lpep_pickup_datetime` from '2025-11-01' inclusive to '2025-12-01' exclusive), how many had a `trip_distance` of less than or equal to 1 mile?

```bash
wget -P data/ https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2025-11.parquet
wget -P data/ https://github.com/DataTalksClub/nyc-tlc-data/releases/download/misc/taxi_zone_lookup.csv
```

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:postgres@localhost:5433/ny_taxi')

df_green = pd.read_parquet("data/green_tripdata_2025-11.parquet")
df_green.to_sql(name='green_taxi_data', con=engine, if_exists='replace', index=False)

df_zone = pd.read_csv("data/taxi_zone_lookup.csv")
df_zone.to_sql(name='zone', con=engine, if_exists='replace', index=False)
# Notebook output: 265
```

```sql
SELECT COUNT(*)
FROM green_taxi_data
WHERE trip_distance <= 1
    AND lpep_pickup_datetime >= '2025-11-01'
    AND lpep_pickup_datetime < '2025-12-01';
```

Answer: `8007`
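
The half-open date range (`>=` lower bound, `<` upper bound) is what keeps the last second of November in and midnight on December 1st out. The sketch below runs the same filter against an in-memory SQLite table so it works without the Postgres container. The rows are made-up toy data, not values from the real dataset, and SQLite compares the timestamps as text, which is an assumption that happens to give the same boundary behavior here.

```python
import sqlite3

# Toy rows, invented for illustration: (pickup timestamp, distance in miles).
rows = [
    ("2025-10-31 23:59:59", 0.5),  # before the range
    ("2025-11-01 00:00:00", 1.0),  # lower bound is inclusive
    ("2025-11-15 12:00:00", 0.8),
    ("2025-11-15 13:00:00", 2.5),  # inside the range but longer than 1 mile
    ("2025-11-30 23:59:59", 0.3),  # last second of November
    ("2025-12-01 00:00:00", 0.9),  # upper bound is exclusive
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE green_taxi_data (lpep_pickup_datetime TEXT, trip_distance REAL)"
)
conn.executemany("INSERT INTO green_taxi_data VALUES (?, ?)", rows)

query = """
SELECT COUNT(*)
FROM green_taxi_data
WHERE trip_distance <= 1
    AND lpep_pickup_datetime >= '2025-11-01'
    AND lpep_pickup_datetime < '2025-12-01'
"""
short_trips = conn.execute(query).fetchone()[0]
print(short_trips)  # 3 of the toy rows satisfy both conditions
conn.close()
```

Writing the upper bound as `< '2025-12-01'` avoids having to spell out the last representable timestamp of the month, which a `BETWEEN` clause would require.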

# Q4. Longest trip for each day

Which was the pickup day with the longest trip distance? Only consider trips with `trip_distance` less than 100 miles (to exclude data errors).

Use the pickup time for your calculations.

```sql
SELECT 
	CAST(lpep_pickup_datetime AS DATE) AS pickup_day,
	trip_distance
FROM green_taxi_data
WHERE trip_distance < 100
ORDER BY trip_distance DESC
LIMIT 1;
```

Answer: `2025-11-14`
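
The query's logic (drop implausible distances, take the single longest remaining trip, then truncate its pickup timestamp to a date) can be sketched in plain Python. The trips below are invented toy values, not rows from the dataset, so the resulting day is not the homework answer.

```python
from datetime import datetime

# Toy trips, invented for illustration: (pickup timestamp, distance in miles).
trips = [
    ("2025-11-03 08:15:00", 12.4),
    ("2025-11-09 22:40:00", 250.0),  # treated as a data error (not < 100 miles)
    ("2025-11-20 17:05:00", 41.7),
    ("2025-11-21 06:30:00", 3.2),
]

# Same filter as `WHERE trip_distance < 100`.
valid = [(ts, dist) for ts, dist in trips if dist < 100]

# Same effect as `ORDER BY trip_distance DESC LIMIT 1`.
longest_ts, longest_distance = max(valid, key=lambda trip: trip[1])

# Same effect as `CAST(lpep_pickup_datetime AS DATE)`.
pickup_day = datetime.strptime(longest_ts, "%Y-%m-%d %H:%M:%S").date()
print(pickup_day, longest_distance)
```

Filtering before taking the maximum matters: without the `< 100` condition the 250-mile toy row would win.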


# Q5. Biggest pickup zone

Which was the pickup zone with the largest `total_amount` (sum of all trips) on November 18th, 2025?

```sql
SELECT
    z."Zone",
    SUM(g.total_amount) AS total_sum
FROM
    green_taxi_data g
JOIN
    zone z ON g."PULocationID" = z."LocationID"
WHERE
    CAST(g.lpep_pickup_datetime AS DATE) = '2025-11-18'
GROUP BY
    z."Zone"
ORDER BY
    total_sum DESC
LIMIT 1;
```

Answer: `East Harlem North`

# Q6. Largest tip

For the passengers picked up in the zone named "East Harlem North" in November 2025, which was the drop off zone that had the largest tip?

Note: it's `tip`, not `trip`. We need the name of the zone, not the ID.

```sql
SELECT 
    zdo."Zone" AS dropoff_zone, 
    g.tip_amount
FROM 
    green_taxi_data g
JOIN 
    zone zpu ON g."PULocationID" = zpu."LocationID"
JOIN 
    zone zdo ON g."DOLocationID" = zdo."LocationID"
WHERE
    zpu."Zone" = 'East Harlem North'
    AND g.lpep_pickup_datetime >= '2025-11-01'
    AND g.lpep_pickup_datetime < '2025-12-01'
ORDER BY 
    g.tip_amount DESC
LIMIT 1;
```

Answer: `Yorkville West`

# Q7. Terraform Workflow

Which of the following sequences describes, respectively, the workflow for:
1. Downloading the provider plugins and setting up the backend,
2. Generating proposed changes and auto-executing the plan,
3. Removing all resources managed by Terraform?

Answer: `terraform init, terraform apply -auto-approve, terraform destroy`
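
As a rough sketch, the three commands can be chained from a script. The helper `run_workflow` and its `dry_run` flag are hypothetical names for illustration; the sketch assumes the `terraform` binary is on `PATH` and that `workdir` contains a Terraform configuration. It defaults to a dry run, so nothing is executed unless requested.

```python
import subprocess

# The three commands from the answer, in workflow order.
TERRAFORM_WORKFLOW = [
    ["terraform", "init"],                    # download providers, set up backend
    ["terraform", "apply", "-auto-approve"],  # plan and apply without prompting
    ["terraform", "destroy"],                 # remove all managed resources
]


def run_workflow(workdir: str, dry_run: bool = True) -> list[str]:
    """Return the commands in order; execute them only when dry_run is False."""
    executed = []
    for cmd in TERRAFORM_WORKFLOW:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a step fails.
            subprocess.run(cmd, cwd=workdir, check=True)
    return executed


print(run_workflow(".", dry_run=True))
```

Note that `terraform destroy` without `-auto-approve` still asks for interactive confirmation, so a real run of the last step pauses for input.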