# Week 1 Homework

In this homework, we'll prepare the environment
and practice with Docker and SQL.

## Question 1. Knowing docker tags

Run the following command to get information on Docker:

```docker --help```

Now run the command to get help on the `docker build` command.

Which tag (option) has the following description? - *Write the image ID to the file*

- `--imageid string`
- `--iidfile string`
- `--idimage string`
- `--idfile string`

In [33]:
! docker build --help


Usage:  docker build [OPTIONS] PATH | URL | -

Build an image from a Dockerfile

Options:
      --add-host list           Add a custom host-to-IP mapping (host:ip)
      --build-arg list          Set build-time variables
      --cache-from strings      Images to consider as cache sources
      --disable-content-trust   Skip image verification (default true)
  -f, --file string             Name of the Dockerfile (Default is
                                'PATH/Dockerfile')
      --iidfile string          Write the image ID to the file
      --isolation string        Container isolation technology
      --label list              Set metadata for an image
      --network string          Set the networking mode for the RUN
                                instructions during build (default "default")
      --no-cache                Do not use cache when building the image
  -o, --output stringArray      Output destination (format:
                                type=local,dest=path)
    

**Q1 Answer:**

`--iidfile string          Write the image ID to the file`
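The lookup above can also be done programmatically. The following is a minimal sketch, assuming the help text has already been captured as a string; `HELP_EXCERPT` is a short excerpt copied from the output above and `find_option` is a hypothetical helper, not part of Docker.

```python
import re

# Excerpt of the `docker build --help` output shown above.
HELP_EXCERPT = """\
  -f, --file string             Name of the Dockerfile (Default is
                                'PATH/Dockerfile')
      --iidfile string          Write the image ID to the file
      --isolation string        Container isolation technology
      --no-cache                Do not use cache when building the image
"""


def find_option(help_text, description):
    """Return the option (name plus argument type, if any) whose help line
    contains `description`, or None when no line matches."""
    for line in help_text.splitlines():
        if description in line:
            # Option lines look like: [-x, ]--name [type]   description
            match = re.match(r"\s*(?:-\w,\s+)?(--[\w-]+(?:\s\w+)?)\s{2,}", line)
            if match:
                return match.group(1)
    return None


print(find_option(HELP_EXCERPT, "Write the image ID to the file"))  # --iidfile string
```

In a shell, piping the help output through `grep` on the description would give the same line.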

## Question 2. Understanding docker first run 

Run Docker with the `python:3.9` image in interactive mode, with `bash` as the entrypoint.
Now check which Python modules are installed (use `pip list`).
How many Python packages/modules are installed?

- 1
- 6
- 3
- 7

In [34]:
! docker run -it --entrypoint=bash python:3.9 -c 'pip list'


Package    Version
---------- -------
pip        22.0.4
setuptools 58.1.0
wheel      0.38.4
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.

**Q2 Answer:**
3
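Counting by eye works for three rows; for longer listings, a small parser is less error-prone. This is a sketch that assumes the default column format of `pip list` shown above; `count_packages` is a hypothetical helper, and `PIP_LIST_OUTPUT` is copied from the cell output.

```python
# Output of `pip list` copied from the cell above (upgrade notice omitted).
PIP_LIST_OUTPUT = """\
Package    Version
---------- -------
pip        22.0.4
setuptools 58.1.0
wheel      0.38.4
"""


def count_packages(pip_list_output):
    """Count package rows in the default column format of `pip list`."""
    lines = [line for line in pip_list_output.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        # The dashed separator marks the end of the header.
        if set(line.strip()) <= {"-", " "}:
            # A package row has exactly two fields: name and version.
            return sum(1 for row in lines[i + 1:] if len(row.split()) == 2)
    return 0


print(count_packages(PIP_LIST_OUTPUT))  # 3
```

Rows that do not have exactly two fields, such as the upgrade notice, are ignored.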

## Prepare Postgres

Run Postgres and load data as shown in the videos.
We'll use the green taxi trips from January 2019:

```wget https://github.com/DataTalksClub/nyc-tlc-data/releases/download/green/green_tripdata_2019-01.csv.gz```

You will also need the dataset with zones:

```wget https://s3.amazonaws.com/nyc-tlc/misc/taxi+_zone_lookup.csv```

Download this data and load it into Postgres (with Jupyter notebooks or with a pipeline).
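To illustrate the "pipeline" idea, here is a rough sketch of loading a gzipped CSV in fixed-size chunks so that the whole file never sits in memory. It is not the course's `ingest_data.py`: it uses only the standard library, with an in-memory SQLite database standing in for Postgres, and `load_csv_in_chunks` and the toy rows are made up for illustration. With Postgres, the connection and the placeholder syntax would differ.

```python
import csv
import gzip
import io
import sqlite3
from itertools import islice


def load_csv_in_chunks(conn, table, csv_file, chunk_size=100_000):
    """Create `table` from the CSV header and insert the rows in batches.

    Returns the number of rows inserted. Columns are left untyped, which
    SQLite allows; a Postgres loader would declare or infer column types.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    total = 0
    while True:
        chunk = list(islice(reader, chunk_size))  # at most chunk_size rows in memory
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', chunk)
        conn.commit()
        total += len(chunk)
    return total


# Toy rows (made up), compressed in memory so the sketch runs without a download.
raw = (
    "lpep_pickup_datetime,lpep_dropoff_datetime,trip_distance\n"
    "2019-01-15 08:00:00,2019-01-15 08:30:00,3.2\n"
    "2019-01-15 23:50:00,2019-01-16 00:10:00,5.1\n"
    "2019-01-18 12:00:00,2019-01-18 12:20:00,1.4\n"
)
compressed = gzip.compress(raw.encode("utf-8"))

conn = sqlite3.connect(":memory:")
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8", newline="") as csv_file:
    loaded = load_csv_in_chunks(conn, "trips", csv_file, chunk_size=2)

print(loaded)  # 3
```

Committing after each chunk keeps transactions small, at the cost of leaving a partially loaded table if the run is interrupted.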


In [None]:
! docker-compose up -d

In [None]:
%%bash

URL="https://github.com/DataTalksClub/nyc-tlc-data/releases/download/green/green_tripdata_2019-01.csv.gz"

python ingest_data.py \
  --user=root \
  --password=root \
  --host=localhost \
  --port=5432 \
  --db=ny_taxi \
  --table_name=green_taxi_data \
  --url=${URL}

In [None]:
%%bash

URL="https://s3.amazonaws.com/nyc-tlc/misc/taxi+_zone_lookup.csv"

python ingest_data.py \
  --user=root \
  --password=root \
  --host=localhost \
  --port=5432 \
  --db=ny_taxi \
  --table_name=zones \
  --url=${URL}

In [38]:
! pip install ipython-sql



In [39]:
from sqlalchemy import create_engine
create_engine('postgresql://root:root@localhost:5432/ny_taxi')



Engine(postgresql://root:***@localhost:5432/ny_taxi)

In [None]:
%load_ext sql

In [48]:
%sql postgresql://root:root@localhost:5432/ny_taxi



## Question 3. Count records 

How many taxi trips were made in total on January 15?

Tip: count only trips that both started and finished on 2019-01-15.

Remember that the `lpep_pickup_datetime` and `lpep_dropoff_datetime` columns are timestamps (date plus hour, minute, and second), not dates.

- 20689
- 20530
- 17630
- 21090

In [49]:
%%sql

    select count(*) from  green_taxi_data gtd 
    where 
        date(lpep_pickup_datetime) = '2019-01-15' and 
        date(lpep_dropoff_datetime) = '2019-01-15'

 * postgresql://root:***@localhost:5432/ny_taxi
1 rows affected.


count
20530


**Q3 Answer:**
20530
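The timestamp-versus-date point is easy to check on a few rows. The sketch below uses made-up trips (not taken from the dataset) and compares two equivalent filters: casting each timestamp to a date, as the query above does with `date(...)`, and testing against a half-open range `[day, day + 1)`. The function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Toy (pickup, dropoff) pairs, made up for illustration.
trips = [
    (datetime(2019, 1, 15, 8, 0, 0), datetime(2019, 1, 15, 8, 30, 0)),    # starts and ends on the 15th
    (datetime(2019, 1, 15, 23, 50, 0), datetime(2019, 1, 16, 0, 10, 0)),  # ends the next day
    (datetime(2019, 1, 14, 23, 55, 0), datetime(2019, 1, 15, 0, 5, 0)),   # starts the previous day
]


def count_with_date_cast(trips, day):
    """Mirror of date(pickup) = day AND date(dropoff) = day."""
    return sum(1 for pickup, dropoff in trips
               if pickup.date() == day and dropoff.date() == day)


def count_with_range(trips, day):
    """Same filter expressed as a half-open timestamp range."""
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    return sum(1 for pickup, dropoff in trips
               if start <= pickup < end and start <= dropoff < end)


target = date(2019, 1, 15)
print(count_with_date_cast(trips, target))  # 1
print(count_with_range(trips, target))      # 1
```

Comparing a timestamp such as 08:00:00 directly for equality with midnight of the same day would match nothing, which is why the cast or the range is needed.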

## Question 4. Largest trip for each day

Which day had the largest trip distance?
Use the pickup time for your calculations.

- 2019-01-18
- 2019-01-28
- 2019-01-15
- 2019-01-10

Answer: 2019-01-15

In [50]:
%%sql
    select 
        date(lpep_pickup_datetime),
        max(trip_distance) as trip_distance
    from green_taxi_data gtd 
    group by date(lpep_pickup_datetime)
    order by trip_distance  desc
    limit 3


 * postgresql://root:***@localhost:5432/ny_taxi
3 rows affected.


date,trip_distance
2019-01-15,117.99
2019-01-18,80.96
2019-01-28,64.27


**Q4 Answer:**
2019-01-15	
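The group-then-rank logic of the query can be sketched in plain Python. The three largest distances below are the ones from the query output above; the shorter trips and the pickup times are made up, and `day_with_longest_trip` is a hypothetical helper.

```python
from datetime import date, datetime

# (pickup time, trip distance). Only the three maxima come from the output above.
trips = [
    (datetime(2019, 1, 15, 9, 0, 0), 117.99),
    (datetime(2019, 1, 15, 18, 0, 0), 2.5),
    (datetime(2019, 1, 18, 7, 30, 0), 80.96),
    (datetime(2019, 1, 28, 12, 0, 0), 64.27),
    (datetime(2019, 1, 28, 22, 15, 0), 10.0),
]


def day_with_longest_trip(trips):
    """Group by pickup date, keep the maximum distance per day,
    then return the (day, distance) pair with the largest maximum."""
    longest_per_day = {}
    for pickup, distance in trips:
        day = pickup.date()
        longest_per_day[day] = max(distance, longest_per_day.get(day, 0.0))
    return max(longest_per_day.items(), key=lambda item: item[1])


print(day_with_longest_trip(trips))  # (datetime.date(2019, 1, 15), 117.99)
```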

## Question 5. The number of passengers

On 2019-01-01, how many trips had 2 passengers and how many had 3?
 
- 2: 1282 ; 3: 266
- 2: 1532 ; 3: 126
- 2: 1282 ; 3: 254
- 2: 1282 ; 3: 274

Answer: 2: 1282 ; 3: 254

In [51]:
%%sql
    select 
        passenger_count,
        count(index)
    from green_taxi_data gtd 
    where passenger_count in (2,3)
    and date(lpep_pickup_datetime) = '2019-01-01'
    group by passenger_count 

 * postgresql://root:***@localhost:5432/ny_taxi
2 rows affected.


passenger_count,count
2,1282
3,254


**Q5 Answer:**
- 2: 1282 ; 3: 254


## Question 6. Largest tip

For passengers picked up in the Astoria zone, which drop-off zone had the largest tip?
We want the name of the zone, not the ID.

Note: this is not a typo; it's `tip`, not `trip`.

- Central Park
- Jamaica
- South Ozone Park
- Long Island City/Queens Plaza

Answer: Long Island City/Queens Plaza

In [52]:
%%sql
    with res as (
        select 
            gtd."DOLocationID" as DOLocationID,
            max( gtd.tip_amount) as tip_amount
        from green_taxi_data gtd 
        inner join zones z 
            on gtd."PULocationID" = z."LocationID" 
        where z."Zone" = 'Astoria'
        group by gtd."PULocationID", gtd."DOLocationID"
        order by tip_amount desc 
        limit 1
    ) 
    select
        res.*,
        z."Zone" 
    from res
    inner join zones z 
    on res.DOLocationID = z."LocationID" 

 * postgresql://root:***@localhost:5432/ny_taxi
1 rows affected.


dolocationid,tip_amount,Zone
146,88.0,Long Island City/Queens Plaza


**Q6 Answer:**
- Long Island City/Queens Plaza
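As an alternative formulation, the same question can be answered without a CTE by joining `zones` twice, once for the pickup zone and once for the drop-off zone. The sketch below runs that query against an in-memory SQLite database as a stand-in for Postgres. The tables hold toy rows: the 88.0 tip to zone 146 comes from the output above, while the other rows and zone IDs are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zones ("LocationID" INTEGER, "Zone" TEXT);
CREATE TABLE green_taxi_data (
    "PULocationID" INTEGER,
    "DOLocationID" INTEGER,
    tip_amount REAL
);
""")

# Toy rows: only the (146, 88.0) pair is taken from the query output above.
conn.executemany("INSERT INTO zones VALUES (?, ?)", [
    (7, "Astoria"),
    (43, "Central Park"),
    (130, "Jamaica"),
    (146, "Long Island City/Queens Plaza"),
])
conn.executemany("INSERT INTO green_taxi_data VALUES (?, ?, ?)", [
    (7, 146, 88.0),    # Astoria -> Long Island City/Queens Plaza
    (7, 43, 30.0),     # Astoria -> Central Park, smaller tip
    (130, 43, 100.0),  # larger tip, but the pickup is not in Astoria
])

QUERY = """
SELECT dz."Zone", MAX(t.tip_amount) AS max_tip
FROM green_taxi_data t
JOIN zones pz ON t."PULocationID" = pz."LocationID"
JOIN zones dz ON t."DOLocationID" = dz."LocationID"
WHERE pz."Zone" = 'Astoria'
GROUP BY dz."Zone"
ORDER BY max_tip DESC
LIMIT 1
"""

row = conn.execute(QUERY).fetchone()
print(row)  # ('Long Island City/Queens Plaza', 88.0)
```

Aliasing the two joins (`pz`, `dz`) keeps the filter on the pickup zone separate from the name lookup for the drop-off zone.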