# Module 4 Homework: Analytics Engineering with dbt
In this homework, we'll use the dbt project in `04-analytics-engineering/taxi_rides_ny/` to transform NYC taxi data and answer questions by querying the models.


## Setup
- Set up your dbt project following the setup guide
- Load the Green and Yellow taxi data for 2019-2020 into your warehouse
- Run `dbt build --target prod` to create all models and run tests

Note: By default, dbt uses the `dev` target. You must use `--target prod` to build the models in the production dataset, which is required for the homework queries below.

After a successful build, you should have models like `fct_trips`, `dim_zones`, and `fct_monthly_zone_revenue` in your warehouse.

## Setup b

- I loaded the Green and Yellow taxi data for 2019-2020 into Google Cloud as `green_tripdata` and `yellow_tripdata`.
- I used dbt Cloud and ran `dbt build --target prod` to create all models and run tests.
- I now have models like `fct_trips`, `dim_zones`, and `fct_monthly_zone_revenue` in my warehouse.
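
One way to confirm that the build produced these models is to list them from the dataset's metadata. This is a minimal sketch, assuming a BigQuery warehouse and that the models live in the `dbt` dataset of `project_id`, as in the queries used for the questions in this document; adjust both names to your own setup.

```sql
-- Assumption: BigQuery, with the prod models built into `project_id.dbt`
SELECT
    table_name,
    table_type
FROM `project_id.dbt.INFORMATION_SCHEMA.TABLES`
WHERE table_name IN ('fct_trips', 'dim_zones', 'fct_monthly_zone_revenue')
ORDER BY table_name;
```

If all three names come back, the build reached the production dataset.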



## Question 1. dbt Lineage and Execution
Given a dbt project with the following structure:

    models/
    ├── staging/
    │   ├── stg_green_tripdata.sql
    │   └── stg_yellow_tripdata.sql
    └── intermediate/
        └── int_trips_unioned.sql (depends on stg_green_tripdata & stg_yellow_tripdata)

If you run `dbt run --select int_trips_unioned`, which models will be built?


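- `int_trips_unioned` only. Without graph operators, `--select` builds just the named model; `+int_trips_unioned` would also build its upstream models, and `int_trips_unioned+` its downstream ones.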
![Diagram](images/lineage.png)


## Question 2. dbt Tests
You've configured a generic test like this in your `schema.yml`:

    columns:
      - name: payment_type
        data_tests:
          - accepted_values:
              arguments:
                values: [1, 2, 3, 4, 5]
                quote: false

Your model `fct_trips` has been running successfully for months. A new value `6` now appears in the source data.

What happens when you run `dbt test --select fct_trips`?




- dbt will fail the test, returning a non-zero exit code
![Diagram](images/Picture.png)
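
To see why the test fails, it helps to look at roughly what the `accepted_values` test runs. The query below is an approximation of the compiled test, not its exact text, and the table path `project_id.dbt.fct_trips` is an assumption based on the dataset used in the other queries in this document. dbt treats every row the query returns as a failure, so a `payment_type` of 6 produces one failing row.

```sql
-- Approximation of the accepted_values test for payment_type.
-- Values are unquoted because the test is configured with quote: false.
WITH all_values AS (
    SELECT
        payment_type AS value_field,
        COUNT(*) AS n_records
    FROM `project_id.dbt.fct_trips`  -- assumed location of the model
    GROUP BY payment_type
)
SELECT *
FROM all_values
WHERE value_field NOT IN (1, 2, 3, 4, 5);
```

Running this by hand also shows how many records carry the unexpected value, which the pass/fail result alone does not.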



## Question 3. Counting Records in fct_monthly_zone_revenue
After running your dbt project, query the fct_monthly_zone_revenue model.

What is the count of records in the fct_monthly_zone_revenue model?


- 12,184

```sql
SELECT COUNT(*) AS total_records
FROM `project_id.dbt.fct_monthly_zone_revenue`;
```


## Question 4. Best Performing Zone for Green Taxis (2020)
Using the fct_monthly_zone_revenue table, find the pickup zone with the highest total revenue (revenue_monthly_total_amount) for Green taxi trips in 2020.

Which zone had the highest revenue?

- East Harlem North

```sql
SELECT
    pickup_zone,
    SUM(revenue_monthly_total_amount) AS total_revenue
FROM `project_id.dbt.fct_monthly_zone_revenue`
WHERE service_type = 'Green'
  AND EXTRACT(YEAR FROM revenue_month) = 2020
GROUP BY pickup_zone
ORDER BY total_revenue DESC
LIMIT 1;
```


## Question 5. Green Taxi Trip Counts (October 2019)
Using the fct_monthly_zone_revenue table, what is the total number of trips (total_monthly_trips) for Green taxis in October 2019?


- 384,624
```sql
SELECT
    SUM(total_monthly_trips) AS total_trips
FROM `project_id.dbt.fct_monthly_zone_revenue`
WHERE service_type = 'Green'
  AND EXTRACT(YEAR FROM revenue_month) = 2019
  AND EXTRACT(MONTH FROM revenue_month) = 10;
```


## Question 6. Build a Staging Model for FHV Data
Create a staging model for the For-Hire Vehicle (FHV) trip data for 2019.

- Load the FHV trip data for 2019 into your data warehouse
- Create a staging model `stg_fhv_tripdata` with these requirements:
  - Filter out records where `dispatching_base_num` IS NULL
  - Rename fields to match your project's naming conventions (e.g., `PUlocationID` → `pickup_location_id`)

What is the count of records in `stg_fhv_tripdata`?


- 43,244,693
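```sql
-- Count FHV records with a non-null dispatching_base_num,
-- the same filter that stg_fhv_tripdata applies
SELECT COUNT(*) AS total_records
FROM `project_id.dataset_id.fhv_tripdata`
WHERE dispatching_base_num IS NOT NULL;
```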
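
A staging model that meets the requirements for this question might look like the sketch below. The file path, the source name `raw`, and every raw column other than `dispatching_base_num` and `PUlocationID` are assumptions about how the FHV data was loaded; check them against your own table and source definitions before using it.

```sql
-- models/staging/stg_fhv_tripdata.sql (hypothetical path)
-- Assumption: a dbt source named 'raw' with a table 'fhv_tripdata' is declared in the project
with source as (

    select * from {{ source('raw', 'fhv_tripdata') }}

),

renamed as (

    select
        dispatching_base_num,
        pickup_datetime,
        dropOff_datetime as dropoff_datetime,          -- assumed raw column name
        PUlocationID as pickup_location_id,
        DOlocationID as dropoff_location_id,           -- assumed raw column name
        SR_Flag as sr_flag,                            -- assumed raw column name
        Affiliated_base_number as affiliated_base_number  -- assumed raw column name
    from source
    -- Requirement: drop records without a dispatching base number
    where dispatching_base_num is not null

)

select * from renamed
```

Once the model is built, a `COUNT(*)` on `stg_fhv_tripdata` should match the count above, since the only row-level filter is the null check on `dispatching_base_num`.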