# Q1. Understanding Docker first run

Run Docker with the `python:3.12.8` image in interactive mode, using the entrypoint `bash`.

What's the version of `pip` in the image?

In [1]:
!docker run -it --rm --entrypoint pip python:3.12.8 --version

pip 24.3.1 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)


# Q2. Understanding Docker networking and docker-compose

Given the following `docker-compose.yaml`, what are the `hostname` and `port` that **pgadmin** should use to connect to the postgres database?

```yaml
services:
  db:
    container_name: postgres
    image: postgres:17-alpine
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: 'postgres'
      POSTGRES_DB: 'ny_taxi'
    ports:
      - '5433:5432'
    volumes:
      - vol-pgdata:/var/lib/postgresql/data

  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    environment:
      PGADMIN_DEFAULT_EMAIL: "pgadmin@pgadmin.com"
      PGADMIN_DEFAULT_PASSWORD: "pgadmin"
    ports:
      - "8080:80"
    volumes:
      - vol-pgadmin_data:/var/lib/pgadmin  

volumes:
  vol-pgdata:
    name: vol-pgdata
  vol-pgadmin_data:
    name: vol-pgadmin_data
```

Answer:
* hostname: `db`
* port: `5432`
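
The reasoning: containers on the same Compose network reach each other by service name and by the container-side port. Docker's embedded DNS should also resolve the `container_name`, so `postgres` is expected to work as a hostname too. The published host port `5433` only matters for clients running on the host, such as the `localhost:5433` connection string used in Q3. A minimal sketch of that distinction follows; `split_port_mapping` is a hypothetical helper written for illustration, not part of any Docker tooling.

```python
def split_port_mapping(mapping: str) -> tuple[int, int]:
    """Split a Compose 'HOST:CONTAINER' port string into (host_port, container_port)."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


# The mapping declared for the `db` service in the compose file above.
host_port, container_port = split_port_mapping("5433:5432")

# From another container on the same Compose network (e.g. pgadmin):
# service name plus the container-side port.
internal_target = ("db", container_port)

# From the host machine (e.g. a notebook running outside Docker):
# localhost plus the published host-side port.
external_target = ("localhost", host_port)

print(internal_target)
print(external_target)
```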

# Q3. Trip Segmentation Count

Between October 1st 2019 (inclusive) and November 1st 2019 (exclusive), how many trips, **respectively**, happened:
1. Up to 1 mile
2. Between 1 (exclusive) and 3 miles (inclusive)
3. Between 3 (exclusive) and 7 miles (inclusive)
4. Between 7 (exclusive) and 10 miles (inclusive)
5. Over 10 miles

In [None]:
!wget -P data/ https://github.com/DataTalksClub/nyc-tlc-data/releases/download/green/green_tripdata_2019-10.csv.gz 
!wget -P data/ https://github.com/DataTalksClub/nyc-tlc-data/releases/download/misc/taxi_zone_lookup.csv

In [12]:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:postgres@localhost:5433/ny_taxi')

dtype = {
    "VendorID": "Int64",
    "passenger_count": "Int64",
    "trip_distance": "float64",
    "RatecodeID": "Int64",
    "store_and_fwd_flag": "string",
    "PULocationID": "Int64",
    "DOLocationID": "Int64",
    "payment_type": "Int64",
    "fare_amount": "float64",
    "extra": "float64",
    "mta_tax": "float64",
    "tip_amount": "float64",
    "tolls_amount": "float64",
    "improvement_surcharge": "float64",
    "total_amount": "float64",
    "congestion_surcharge": "float64"
}

parse_dates = [
    "lpep_pickup_datetime",
    "lpep_dropoff_datetime"
]

# wget saved the gzipped file; pandas infers gzip compression from the .gz extension
df_green = pd.read_csv("data/green_tripdata_2019-10.csv.gz", dtype=dtype, parse_dates=parse_dates)
df_green.to_sql(name='green_taxi_data', con=engine, if_exists='replace', index=False)

df_zone = pd.read_csv("data/taxi_zone_lookup.csv")
df_zone.to_sql(name='zone', con=engine, if_exists='replace', index=False)

265

```sql
SELECT 
    CASE 
        WHEN trip_distance <= 1 THEN '1. Up to 1 mile'
        WHEN trip_distance > 1 AND trip_distance <= 3 THEN '2. In between 1 (exclusive) and 3 miles (inclusive)'
        WHEN trip_distance > 3 AND trip_distance <= 7 THEN '3. In between 3 (exclusive) and 7 miles (inclusive)'
        WHEN trip_distance > 7 AND trip_distance <= 10 THEN '4. In between 7 (exclusive) and 10 miles (inclusive)'
        ELSE '5. Over 10 miles'
    END AS distance_range,
    COUNT(1) AS total_trips
FROM 
    green_taxi_data
GROUP BY 
    1
ORDER BY 
    1;
```

Answer: `104,838; 199,013; 109,645; 27,688; 35,202`
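
The query above has no `WHERE` clause, so it buckets every row in `green_taxi_data`. As a rough cross-check, the same bucketing can be sketched in pandas with the question's date window applied explicitly. The rows below are synthetic, the short labels are made up for the example, and filtering on the pickup time alone is an assumption, since the question does not say which timestamp defines the period.

```python
import pandas as pd

# Synthetic rows for illustration only; not taken from the real dataset.
trips = pd.DataFrame(
    {
        "lpep_pickup_datetime": pd.to_datetime(
            [
                "2019-10-01 00:00:00",
                "2019-10-15 12:30:00",
                "2019-10-31 23:59:59",
                "2019-11-01 00:00:00",  # outside the window (upper bound is exclusive)
                "2019-10-20 08:00:00",
            ]
        ),
        "trip_distance": [1.0, 3.0, 7.5, 0.5, 12.0],
    }
)

# Half-open date window: October 1st inclusive, November 1st exclusive.
# Assumption: only the pickup time is used for the filter.
start, end = pd.Timestamp("2019-10-01"), pd.Timestamp("2019-11-01")
in_window = (trips["lpep_pickup_datetime"] >= start) & (trips["lpep_pickup_datetime"] < end)

# right=True makes each bin (lower, upper], matching the question's boundaries.
bins = [float("-inf"), 1, 3, 7, 10, float("inf")]
labels = ["<=1", "(1, 3]", "(3, 7]", "(7, 10]", ">10"]
buckets = pd.cut(trips.loc[in_window, "trip_distance"], bins=bins, labels=labels, right=True)

# Reindex so the counts follow the label order and empty buckets show as 0.
counts = buckets.value_counts().reindex(labels, fill_value=0)
print(counts)
```

Using `pd.cut` keeps the boundary handling in one place, whereas the `CASE` expression spells out each comparison.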

# Q4. Longest trip for each day

Which was the pick up day with the longest trip distance?
Use the pick up time for your calculations.

Tip: For every day, we only care about one single trip with the longest distance. 

```sql
SELECT 
    CAST(lpep_pickup_datetime AS DATE) AS pickup_day,
    trip_distance
FROM 
    green_taxi_data
ORDER BY 
    trip_distance DESC
LIMIT 1;
```

Answer: `2019-10-31`
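
The tip about keeping one longest trip per day can be written out literally in pandas: take the maximum distance per pickup day, then pick the day with the largest maximum. This is a sketch on synthetic rows, so the day it prints has nothing to do with the answer above.

```python
import pandas as pd

# Synthetic rows for illustration only; not taken from the real dataset.
trips = pd.DataFrame(
    {
        "lpep_pickup_datetime": pd.to_datetime(
            [
                "2019-10-11 07:15:00",
                "2019-10-11 18:40:00",
                "2019-10-24 09:05:00",
                "2019-10-31 22:10:00",
            ]
        ),
        "trip_distance": [2.4, 95.8, 38.1, 51.0],
    }
)

# Group by the calendar day of the pickup time and keep the longest trip per day.
pickup_day = trips["lpep_pickup_datetime"].dt.date
longest_per_day = trips.groupby(pickup_day)["trip_distance"].max()

# The day whose longest trip is the largest overall.
best_day = longest_per_day.idxmax()
print(best_day, longest_per_day.max())
```

The SQL above reaches the same result more directly by sorting all trips by distance and taking the first row.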


# Q5. Three biggest pickup zones

Which were the top pickup locations with over 13,000 in
`total_amount` (across all trips) for 2019-10-18?

Consider only `lpep_pickup_datetime` when filtering by date.

```sql
SELECT 
    z."Zone", 
    SUM(g.total_amount) AS total_sum
FROM 
    green_taxi_data g
JOIN 
    zone z ON g."PULocationID" = z."LocationID"
WHERE 
    CAST(g.lpep_pickup_datetime AS DATE) = '2019-10-18'
GROUP BY 
    z."Zone"
HAVING 
    SUM(g.total_amount) > 13000
ORDER BY 
    total_sum DESC;
```

Answer: `East Harlem North, East Harlem South, Morningside Heights`
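
For readers more comfortable with pandas, the join, `GROUP BY`, and `HAVING` steps map onto `merge`, `groupby`, and a boolean filter on the aggregated series. The sketch below uses synthetic rows, made-up zone names, and a small stand-in `threshold` in place of 13,000.

```python
import pandas as pd

# Synthetic rows and zone names for illustration only.
trips = pd.DataFrame(
    {
        "lpep_pickup_datetime": pd.to_datetime(
            [
                "2019-10-18 08:00:00",
                "2019-10-18 09:00:00",
                "2019-10-18 10:00:00",
                "2019-10-19 08:00:00",  # different day, must be excluded
            ]
        ),
        "PULocationID": [1, 1, 2, 1],
        "total_amount": [60.0, 50.0, 40.0, 500.0],
    }
)
zones = pd.DataFrame({"LocationID": [1, 2], "Zone": ["Zone A", "Zone B"]})

threshold = 100.0  # stand-in for the 13,000 used in the question

# WHERE: keep trips whose pickup time falls on the target day.
target_day = pd.Timestamp("2019-10-18")
on_day = trips[trips["lpep_pickup_datetime"].dt.normalize() == target_day]

# JOIN + GROUP BY + SUM: total amount per pickup zone name.
totals = (
    on_day.merge(zones, left_on="PULocationID", right_on="LocationID")
    .groupby("Zone")["total_amount"]
    .sum()
)

# HAVING + ORDER BY: filter on the aggregate, then sort descending.
top_zones = totals[totals > threshold].sort_values(ascending=False)
print(top_zones)
```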

# Q6. Largest tip

For the passengers picked up in October 2019 in the zone
named "East Harlem North" which was the drop off zone that had
the largest tip?

Note: it's `tip`, not `trip`

We need the name of the zone, not the ID.

```sql
SELECT 
    zdo."Zone" AS dropoff_zone, 
    g.tip_amount
FROM 
    green_taxi_data g
JOIN 
    zone zpu ON g."PULocationID" = zpu."LocationID"
JOIN 
    zone zdo ON g."DOLocationID" = zdo."LocationID"
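WHERE 
    zpu."Zone" = 'East Harlem North'
    -- restrict to pickups in October 2019, as the question asks
    AND g.lpep_pickup_datetime >= '2019-10-01'
    AND g.lpep_pickup_datetime < '2019-11-01'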
ORDER BY 
    g.tip_amount DESC
LIMIT 1;
```

Answer: `JFK Airport`

# Q7. Terraform Workflow

Which of the following sequences, **respectively**, describes the workflow for:
1. Downloading the provider plugins and setting up the backend
2. Generating proposed changes and auto-executing the plan
3. Removing all resources managed by Terraform

Answer: `terraform init, terraform apply -auto-approve, terraform destroy`