# DATA OFFICE TECHNICAL SKILLS ASSESSMENT
This test is designed to give us a better understanding of how our candidates leverage their technical skills to answer business questions and communicate findings.
To complete the assignment, you will work with the following BigQuery public dataset:
Chicago Taxi Trips dataset:
`bigquery-public-data.chicago_taxi_trips.taxi_trips`
## PART I
### 1. Please submit your SQL code showing how you approached the data to answer the following two questions.
Note that a query that returns a lot of data and must be exported or manipulated in Excel is not preferred. A query that returns only the requested answer and no other data is preferred.
#### a. Which three distinct taxi companies had the largest month-over-month increase in trips, and what were those months and trip amounts?
#### b. Which three distinct taxi companies had the largest month-over-month decrease in fare-per-mile, and what were those months and fare-per-mile values?
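Both questions follow the same pattern: aggregate to one row per company and month, compare each month with the previous one using `LAG`, keep each company's single largest change so the three results are distinct companies, and return only those rows. Below is a minimal sketch of that pattern for question 1a. It assumes a hypothetical `trips` table with one row per trip and a precomputed `trip_month` column, and it runs against made-up toy rows in an in-memory SQLite database purely so it executes locally. Against BigQuery, the `monthly` CTE would instead read from `bigquery-public-data.chicago_taxi_trips.taxi_trips` and derive the month with `DATE_TRUNC`. The names `TOY_TRIPS`, `MOM_TRIPS_SQL`, and `top_mom_trip_increases` are illustrative only.

```python
import sqlite3
from contextlib import closing

# Hypothetical toy data (not real results): one (company, trip_month) row per trip.
TOY_TRIPS = (
    [("Acme Cab", "2023-01-01")] * 10
    + [("Acme Cab", "2023-02-01")] * 25
    + [("Blue Line", "2023-01-01")] * 40
    + [("Blue Line", "2023-02-01")] * 30
    + [("City Taxi", "2023-01-01")] * 5
    + [("City Taxi", "2023-02-01")] * 12
    + [("City Taxi", "2023-03-01")] * 30
    + [("Delta Cabs", "2023-01-01")] * 20
    + [("Delta Cabs", "2023-02-01")] * 21
)

# Monthly trip counts -> MoM change via LAG -> best month per company -> top 3.
MOM_TRIPS_SQL = """
WITH monthly AS (
    SELECT company, trip_month, COUNT(*) AS trips
    FROM trips
    GROUP BY company, trip_month
),
changes AS (
    SELECT
        company,
        trip_month,
        trips,
        trips - LAG(trips) OVER (
            PARTITION BY company ORDER BY trip_month
        ) AS mom_change
    FROM monthly
),
ranked AS (
    SELECT
        company,
        trip_month,
        trips,
        mom_change,
        ROW_NUMBER() OVER (
            PARTITION BY company ORDER BY mom_change DESC
        ) AS rn
    FROM changes
    WHERE mom_change > 0  -- only genuine increases
)
SELECT company, trip_month, trips, mom_change
FROM ranked
WHERE rn = 1             -- one row per company keeps the top 3 distinct
ORDER BY mom_change DESC
LIMIT 3
"""


def top_mom_trip_increases(rows):
    """Return (company, month, trips, mom_change) for the three companies
    with the largest single month-over-month increase in trips."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE trips (company TEXT, trip_month TEXT)")
        conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
        return conn.execute(MOM_TRIPS_SQL).fetchall()


for row in top_mom_trip_increases(TOY_TRIPS):
    print(row)
```

`LAG` compares each month with the company's previous *recorded* month, so a company with a missing month is compared across the gap; if that matters, a check that the two months are adjacent could be added before ranking.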
### 2. Submit an executive summary/report of your findings, clearly answering the questions above.
## PART II
This portion of the assignment gives our candidates creative freedom to look at this data set in any way they want.
There are no tricks here. This is simply meant to allow us to understand their ability to unearth insights and leverage visualizations to tell a story.
### 3. Considering the context of the questions from Part I, conduct an additional analysis using the same dataset and design a report that provides at least one additional insight, trend, or other relevant detail that piques your interest.
This report should:
#### a. Clearly explain the value or potential use of that observation for someone who is interested in the answers to the questions above.
#### b. Include at least one visualization.
If you have any questions regarding the assignment, please contact Noam Berns (noam.berns@ourbranch.com), Austin McCleary (austin.mccleary@ourbranch.com), and Carson Wilshire (carson.wilshire@ourbranch.com).
Please email your final submission to your Branch recruiter and cc the above three managers.

# Get My BQ Credentials to Access the Dataset

## Load Directory Locations

In [2]:
import json
import os

# Check if the file exists and load the JSON file into a dictionary
file_path = r'C:\Users\mike\Develop\Projects\Code Notebook\Credentials\locations_conf.json'
if os.path.exists(file_path):
    with open(file_path, 'r') as f:
        locations_data = json.load(f)
    for key, value in locations_data.items():
        print(f"{key}: {value}")
else:
    print(f"File not found: {file_path}")

Common_Funcs_Dir: /Users/mike/Develop/Projects/Code Notebook/Common/Functions
Credentials_Dir: /Users/mike/Develop/Projects/Code Notebook/Credentials
Rel_Pickes_Dir: ../.pickles
Pub_Data_Dir: '/Users/mike/Data/Public
BQ_Service_Key: /Users/mike/Develop/Conf/GCP Service Keys/mikecancell-development-0bcca41f8486.json
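If the config file is missing, the cell above prints a message and carries on, and the credentials cell then fails with a `NameError` on `locations_data`. A minimal alternative that stops immediately is sketched below, assuming the same JSON layout shown in the output; `load_locations` is a hypothetical helper name, not part of any library.

```python
import json
from pathlib import Path


def load_locations(config_path):
    """Load the locations config as a dict, raising if it is missing or incomplete."""
    path = Path(config_path).expanduser()
    if not path.is_file():
        # Fail here instead of letting later cells hit a NameError
        raise FileNotFoundError(f"Locations config not found: {path}")
    with path.open("r", encoding="utf-8") as f:
        locations = json.load(f)
    # The credentials cell reads this key, so check for it up front
    if "BQ_Service_Key" not in locations:
        raise KeyError("BQ_Service_Key missing from locations config")
    return locations
```

Usage would be `locations_data = load_locations(file_path)` in place of the `if os.path.exists(...)` block.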


# Connect to Google Cloud

In [None]:
from google.cloud import bigquery
from google.oauth2 import service_account

# Resolve the key path from the locations data
key_path = locations_data.get('BQ_Service_Key', 'default_key_path.json')

# Create credentials using the key file
credentials = service_account.Credentials.from_service_account_file(key_path)

In [14]:
import warnings

import pandas_gbq
from pandas_gbq.exceptions import LargeResultsWarning

# Suppress the LargeResultsWarning that pandas-gbq emits for large result sets
warnings.simplefilter('ignore', category=LargeResultsWarning)

# Define the SQL query.
# GROUP BY already returns one row per key combination, so DISTINCT is not needed.
query = """
SELECT
    taxi_id,
    CAST(DATE_TRUNC(trip_start_timestamp, MONTH) AS DATE) AS trip_start_month,
    CAST(DATE_TRUNC(trip_end_timestamp, MONTH) AS DATE)   AS trip_end_month,
    SUM(trip_seconds) AS total_trip_seconds,
    SUM(trip_miles)   AS total_trip_miles,
    pickup_census_tract,
    dropoff_census_tract,
    pickup_community_area,
    dropoff_community_area,
    SUM(fare)       AS total_fare,
    SUM(tips)       AS total_tip,
    SUM(tolls)      AS total_toll,
    SUM(extras)     AS total_extras,
    SUM(trip_total) AS total_trip_total,
    payment_type,
    company,
    pickup_latitude,
    pickup_longitude,
    dropoff_latitude,
    dropoff_longitude
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
GROUP BY
    taxi_id,
    trip_start_month,
    trip_end_month,
    pickup_census_tract,
    dropoff_census_tract,
    pickup_community_area,
    dropoff_community_area,
    payment_type,
    company,
    pickup_latitude,
    pickup_longitude,
    dropoff_latitude,
    dropoff_longitude
"""

# Read the data from BigQuery into a pandas DataFrame
taxi_data = pandas_gbq.read_gbq(query, project_id=credentials.project_id, credentials=credentials)

# Display the first few rows of the dataframe
print(taxi_data.head())

KeyboardInterrupt: 
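The query above groups by per-trip attributes such as coordinates, census tracts, and payment type, so its result stays very large, which is the kind of output the assignment asks candidates to avoid, and the download was interrupted at 0%. Aggregating to company and month instead keeps the result to a few rows. Below is a minimal sketch for question 1b, assuming fare per mile is defined as total fare divided by total miles for each company-month (a ratio of sums, so near-zero-mile trips cannot dominate; this definition is an assumption, not stated in the assignment). Made-up aggregates in a hypothetical `monthly` table stand in for the real data, and SQLite's `NULLIF` guards against zero miles; in BigQuery, `SAFE_DIVIDE(SUM(fare), SUM(trip_miles))` would play that role. `TOY_MONTHLY`, `MOM_FARE_PER_MILE_SQL`, and `top_mom_fare_per_mile_decreases` are illustrative names.

```python
import sqlite3
from contextlib import closing

# Hypothetical toy aggregates (not real results):
# (company, trip_month, total_fare, total_miles)
TOY_MONTHLY = [
    ("Acme Cab", "2023-01-01", 500.0, 100.0),
    ("Acme Cab", "2023-02-01", 300.0, 100.0),
    ("Blue Line", "2023-01-01", 400.0, 100.0),
    ("Blue Line", "2023-02-01", 450.0, 100.0),
    ("City Taxi", "2023-01-01", 900.0, 100.0),
    ("City Taxi", "2023-02-01", 600.0, 100.0),
    ("City Taxi", "2023-03-01", 800.0, 100.0),
    ("Delta Cabs", "2023-01-01", 600.0, 100.0),
    ("Delta Cabs", "2023-02-01", 550.0, 100.0),
    ("Echo Cars", "2023-01-01", 100.0, 0.0),  # zero miles: fare per mile is NULL
]

MOM_FARE_PER_MILE_SQL = """
WITH per_mile AS (
    SELECT
        company,
        trip_month,
        total_fare / NULLIF(total_miles, 0) AS fare_per_mile
    FROM monthly
),
changes AS (
    SELECT
        company,
        trip_month,
        fare_per_mile,
        fare_per_mile - LAG(fare_per_mile) OVER (
            PARTITION BY company ORDER BY trip_month
        ) AS mom_change
    FROM per_mile
),
ranked AS (
    SELECT
        company,
        trip_month,
        fare_per_mile,
        mom_change,
        ROW_NUMBER() OVER (
            PARTITION BY company ORDER BY mom_change ASC
        ) AS rn
    FROM changes
    WHERE mom_change < 0  -- only genuine decreases (also drops NULLs)
)
SELECT company, trip_month, fare_per_mile, mom_change
FROM ranked
WHERE rn = 1              -- each company's steepest drop only
ORDER BY mom_change ASC
LIMIT 3
"""


def top_mom_fare_per_mile_decreases(rows):
    """Return (company, month, fare_per_mile, mom_change) for the three
    companies with the largest single month-over-month drop."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(
            "CREATE TABLE monthly ("
            "company TEXT, trip_month TEXT, total_fare REAL, total_miles REAL)"
        )
        conn.executemany("INSERT INTO monthly VALUES (?, ?, ?, ?)", rows)
        return conn.execute(MOM_FARE_PER_MILE_SQL).fetchall()


for row in top_mom_fare_per_mile_decreases(TOY_MONTHLY):
    print(row)
```

Filtering on `mom_change < 0` before ranking means a company whose fare per mile never fell cannot appear, so the query may return fewer than three rows if few companies qualify.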