# DATA OFFICE TECHNICAL SKILLS ASSESSMENT:
This test is designed to give us a better understanding of how our candidates leverage their technical skills to answer business questions and communicate findings.
To complete the assignment, you will work with the following BigQuery public dataset:
Chicago Taxi Trips dataset:
`bigquery-public-data.chicago_taxi_trips.taxi_trips`

## PART I

### 1. DONE - Please submit your SQL code ([URL](https://github.com/mcancell/Py-ETL-Notebook/blob/main/Datasets/Chicago%20Taxi%20Trips/Chi_Taxi_Trip_Insights_Branch_Part_I.bqsql))
Please submit SQL code showing how you approached the data to answer the following two questions. A query that returns only the requested answer is preferred over one that returns a large result set that must be exported and manipulated in Excel.

See Code Here: [Datasets/Chicago Taxi Trips/Chi_Taxi_Trip_Insights_Branch_Part_I.bqsql](https://github.com/mcancell/Py-ETL-Notebook/blob/main/Datasets/Chicago%20Taxi%20Trips/Chi_Taxi_Trip_Insights_Branch_Part_I.bqsql)


#### a. Distinct Company with Largest MoM Trip Increase
Which three distinct taxi companies had the largest month-over-month increase in trips, and what were those months and trip amounts?
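As a rough illustration of the logic this question asks for, the sketch below works on made-up monthly trip counts in plain Python. The company names, the counts, and the helpers `largest_increase` and `top_increases` are hypothetical and are not taken from the submitted SQL. It also assumes every company has a row for each consecutive month, which real data may not guarantee.

```python
# Hypothetical monthly trip counts per company (illustrative values, not real data).
monthly_trips = {
    "Company A": {"2016-01": 100, "2016-02": 150, "2016-03": 400},
    "Company B": {"2016-01": 500, "2016-02": 480, "2016-03": 700},
    "Company C": {"2016-01": 50, "2016-02": 60, "2016-03": 55},
    "Company D": {"2016-01": 10, "2016-02": 300, "2016-03": 310},
}


def largest_increase(by_month):
    """Return (month, trips, delta) for the largest rise over the prior month."""
    months = sorted(by_month)  # 'YYYY-MM' strings sort chronologically
    best = None
    for prev, cur in zip(months, months[1:]):
        delta = by_month[cur] - by_month[prev]
        if best is None or delta > best[2]:
            best = (cur, by_month[cur], delta)
    return best


def top_increases(data, n=3):
    """Keep one row per company (its best month), then rank companies by delta."""
    rows = [
        (company, *largest_increase(by_month))
        for company, by_month in data.items()
        if len(by_month) >= 2  # a single month has no prior month to compare with
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)[:n]


top3 = top_increases(monthly_trips)
for company, month, trips, delta in top3:
    print(f"{company}: {month} trips={trips} increase={delta}")
```

Reducing each company to its single best month before ranking is what keeps the three results distinct; ranking all company-months directly could return the same company more than once.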

##### Get My BQ Credentials to Access the Dataset

##### Load Directory Locations

In [1]:
import json
import os

# Check if the file exists and load the JSON file into a dictionary
file_path = r'C:\Users\mike\Develop\Projects\Code Notebook\Credentials\locations_conf.json'
if os.path.exists(file_path):
    with open(file_path, 'r') as f:
        locations_data = json.load(f)
    for key, value in locations_data.items():
        print(f"{key}: {value}")
else:
    print(f"File not found: {file_path}")

Common_Funcs_Dir: /Users/mike/Develop/Projects/Code Notebook/Common/Functions
Credentials_Dir: /Users/mike/Develop/Projects/Code Notebook/Credentials
Rel_Pickes_Dir: ../.pickles
Pub_Data_Dir: '/Users/mike/Data/Public
BQ_Service_Key: /Users/mike/Develop/Conf/GCP Service Keys/mikecancell-development-0bcca41f8486.json


##### Connect to Google Cloud
from google.cloud import bigquery

In [2]:
from google.oauth2 import service_account

# Resolve the key path from the locations data
key_path = locations_data.get('BQ_Service_Key', 'default_key_path.json')

# Create credentials using the key file
credentials = service_account.Credentials.from_service_account_file(key_path)

##### Load the Query into a Variable

In [3]:
# Build the absolute path to the SQL file (the directory is hard-coded for this machine)
sql_file_path = os.path.abspath(os.path.join('C:\\Users\\mike\\Develop\\Projects\\Code Notebook\\Datasets\\Chicago Taxi Trips', 'Chi_Taxi_Trip_Insights_Branch_Part_I.bqsql'))

# Check if the file exists and load the SQL code into a variable
if os.path.exists(sql_file_path):
    with open(sql_file_path, 'r') as sql_file:
        sql_code = sql_file.read()
    print("SQL code loaded successfully.")
else:
    print(f"File not found: {sql_file_path}")

SQL code loaded successfully.


##### Show the Query Code
Note: the header documentation and the comments in the query are AI-generated.

In [4]:
import sqlparse

# Beautify the SQL code
formatted_sql_code = sqlparse.format(sql_code, reindent=True, keyword_case='upper')

# Display the beautified SQL code
print(formatted_sql_code)

/*
    AI Generated doc:
    This query analyzes Chicago Taxi Trips data to identify:
    1. The largest month-over-month increases in trips for each taxi company.
    2. The largest month-over-month decreases in fare per mile for each taxi company.
    3. The overall company averages (`Metric_Gross_Avg`) for trips and fare per mile.
    4. The difference from the average (`Metric_Diff_From_Avg`) and percentage difference from the average (`Percent_Diff_From_Avg`).

    Key Features:
    - Standardizes company names to handle variations (e.g., punctuation, case).
    - Calculates month-over-month metrics (trip count and fare per mile).
    - Includes overall company averages for trips and fare per mile.
    - Includes metrics for difference from average and percentage difference from average.
    - Ranks companies based on the largest increases and decreases.
    - Extracts insights for the top 3 companies for each metric.

    Filters Applied:
    - Exclude trips with no company infor

##### Execute the Query
from google.cloud import bigquery

In [5]:
import warnings

import pandas_gbq
from pandas_gbq.exceptions import LargeResultsWarning

# Suppress the LargeResultsWarning
warnings.simplefilter('ignore', category=LargeResultsWarning)

# Define the SQL query
query = sql_code

# Read the data from BigQuery into a pandas DataFrame
Insights_Part_I = pandas_gbq.read_gbq(query, project_id=credentials.project_id, credentials=credentials)

# Display the first few rows of the dataframe
print(Insights_Part_I.head())


Downloading: 100%|██████████|
                                  Metric_Description  \
0     I.a-Largest Month-Over-Month Increase in Trips   
1     I.a-Largest Month-Over-Month Increase in Trips   
2     I.a-Largest Month-Over-Month Increase in Trips   
3  I.b-Largest Month-Over-Month Decrease in Fare ...   
4  I.b-Largest Month-Over-Month Decrease in Fare ...   

                        Taxi_Company Trip_Month  Mon_Metric_Val  \
0          Chicago Carriage Cab Corp    2016-07   166663.000000   
1                          Flash Cab    2016-01   347793.000000   
2          Taxi Affiliation Services    2014-03   858829.000000   
3                  Metro Jet Taxi A.    2021-08        2.605708   
4  Blue Ribbon Taxi Association Inc.    2020-08       26.148880   

    Metric_Delta  Mon_Prior_Metric_Val  Metric_Gross_Avg  \
0  131272.000000          35391.000000         154555.06   
1  276654.000000          71139.000000         188727.70   
2  115543.000000         743286.000000   

##### Read the Largest Month Over Month Increase in Trips Measurement Data

In [6]:
largest_trip_increase = Insights_Part_I[Insights_Part_I['Metric_Description'] == 'I.a-Largest Month-Over-Month Increase in Trips']
print(largest_trip_increase)

                               Metric_Description               Taxi_Company  \
0  I.a-Largest Month-Over-Month Increase in Trips  Chicago Carriage Cab Corp   
1  I.a-Largest Month-Over-Month Increase in Trips                  Flash Cab   
2  I.a-Largest Month-Over-Month Increase in Trips  Taxi Affiliation Services   

  Trip_Month  Mon_Metric_Val  Metric_Delta  Mon_Prior_Metric_Val  \
0    2016-07        166663.0      131272.0               35391.0   
1    2016-01        347793.0      276654.0               71139.0   
2    2014-03        858829.0      115543.0              743286.0   

   Metric_Gross_Avg  Metric_Diff_From_Avg  Percent_Diff_From_Avg  \
0         154555.06              12107.94                   7.83   
1         188727.70             159065.30                  84.28   
2         334605.25             524223.75                 156.67   

   Metric_Mon_Pct_Chg Metric_Mon_Pct_Chg_Str  \
0                3.71                 370.9%   
1                3.89                
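The derived columns in the output above can be cross-checked by hand. The sketch below recomputes them from the printed `Mon_Metric_Val`, `Mon_Prior_Metric_Val`, and `Metric_Gross_Avg` values. The formulas are inferred from the printed numbers rather than read from the SQL, so treat them as an assumption about how the query defines each column; the `derive` helper is hypothetical.

```python
# Values copied from the printed output above:
# (company, month, Mon_Metric_Val, Mon_Prior_Metric_Val, Metric_Gross_Avg)
rows = [
    ("Chicago Carriage Cab Corp", "2016-07", 166663.0, 35391.0, 154555.06),
    ("Flash Cab", "2016-01", 347793.0, 71139.0, 188727.70),
    ("Taxi Affiliation Services", "2014-03", 858829.0, 743286.0, 334605.25),
]


def derive(current, prior, gross_avg):
    """Recompute the derived columns; formulas are inferred, not taken from the SQL."""
    delta = current - prior
    diff_from_avg = current - gross_avg
    return {
        "Metric_Delta": delta,
        "Metric_Diff_From_Avg": round(diff_from_avg, 2),
        "Percent_Diff_From_Avg": round(100 * diff_from_avg / gross_avg, 2),
        "Metric_Mon_Pct_Chg": round(delta / prior, 2),  # a ratio, not a percentage
        "Metric_Mon_Pct_Chg_Str": f"{100 * delta / prior:.1f}%",
    }


derived = [derive(cur, prior, avg) for _, _, cur, prior, avg in rows]
for (company, month, *_), values in zip(rows, derived):
    print(company, month, values)
```

Under these inferred formulas, `Metric_Mon_Pct_Chg` is a ratio (3.71 means the month gained 3.71 times the prior month's trips), while `Metric_Mon_Pct_Chg_Str` expresses the same change as a percentage.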

##### Show the Largest Month Over Month Increase in Trips Measurement Data

In [82]:
from plotly import graph_objects as go  # Import the required module

# Check if the required column exists in the DataFrame
if 'Metric_Description' not in Insights_Part_I.columns:
    raise KeyError("The 'Metric_Description' column is missing from the Insights_Part_I DataFrame. Please verify the column name.")

# Filter the data to include only rows with 'Increase' in the 'Metric_Description' column
filtered_data = Insights_Part_I[Insights_Part_I['Metric_Description'].str.contains('Increase', na=False)].copy()  # Create a deep copy to avoid SettingWithCopyWarning

# Ensure the 'Metric_Insight' column exists in the DataFrame
if 'Metric_Insight' not in filtered_data.columns:
    filtered_data['Metric_Insight'] = ''  # Create an empty 'Metric_Insight' column if it doesn't exist

# Replace '-' with '•' and add HTML <br> tags for new lines
filtered_data['Metric_Insight'] = filtered_data['Metric_Insight'].str.replace('-', '•').str.replace('•', '<br>•')

# Add a break after each sentence ends, retaining the period, but avoid breaking within numbers like "123.123"
filtered_data['Metric_Insight'] = filtered_data['Metric_Insight'].str.replace(r'(?<!\d)\.(?!\d)', '.<br>', regex=True)

# Remove leading white space prior to each new line in the 'Metric_Insight' column
filtered_data['Metric_Insight'] = filtered_data['Metric_Insight'].str.replace(r'<br>\s+', '<br>', regex=True)

# Replace multiple spaces with a single space
filtered_data['Metric_Insight'] = filtered_data['Metric_Insight'].str.replace(r'\s{2,}', ' ', regex=True)

# Create an interactive table using Plotly
fig = go.Figure(data=[go.Table(
    header=dict(
        values=['<b>Metric_Description</b>', '<b>Trip_Month</b>', '<b>Taxi_Company</b>', '<b>Metric_Insight</b>'],
        fill_color='grey',
        align='left',
        font=dict(size=12, color='black')
    ),
    cells=dict(
        values=[
            filtered_data['Metric_Description'], 
            filtered_data['Trip_Month'], 
            filtered_data['Taxi_Company'], 
            filtered_data['Metric_Insight']
        ],
        fill_color=[['white', 'lightgrey'] * (len(filtered_data) // 2 + 1)],
        align='left',
        font=dict(size=11),
        height=None,  # Allow cell height to adjust dynamically
        line=dict(color='black')
    )
)],
layout=dict(
    autosize=True,  # Allow the table to adjust size automatically
    width=1500,  # Set a fixed width to accommodate all columns
    height=None,  # Allow height to adjust dynamically
    margin=dict(l=10, r=10, t=10, b=10)  # Adjust margins for better spacing
))

# Set column widths dynamically
fig.data[0].columnwidth = [2, 1, 1.5, 5]  # Adjust column widths proportionally: Metric_Description, Trip_Month, Taxi_Company, Metric_Insight

# Display the interactive table
fig.show()
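To see what the four `Metric_Insight` clean-up steps in the cell above do to a single string, the sketch below applies the same substitutions with the standard `re` module instead of pandas. The `format_insight` helper and the sample sentences are hypothetical; the patterns are copied from the cell.

```python
import re


def format_insight(text: str) -> str:
    """Apply the same four substitutions as the cell above to one string."""
    # 1. Turn every '-' into a bullet that starts on a new HTML line.
    text = text.replace("-", "•").replace("•", "<br>•")
    # 2. Break after a period unless a digit sits directly before or after it.
    text = re.sub(r"(?<!\d)\.(?!\d)", ".<br>", text)
    # 3. Drop whitespace that directly follows a line break.
    text = re.sub(r"<br>\s+", "<br>", text)
    # 4. Collapse runs of whitespace into a single space.
    text = re.sub(r"\s{2,}", " ", text)
    return text


print(format_insight("Trips rose 3.71x.  - Peak month"))
print(format_insight("In 2016-07 trips rose."))
print(format_insight("Delta was 131272."))
```

Two side effects are worth knowing when reading the rendered table: a hyphen inside a value such as `2016-07` is also converted to a bullet, and a sentence that ends in a digit does not get a line break because the lookbehind rejects the preceding digit.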

#### b. Distinct Company with Largest MoM Decrease in Fare-per-Mile 
Which three distinct taxi companies had the largest month-over-month decrease in fare-per-mile, and what were those months and fare-per-mile values?
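A minimal sketch of the fare-per-mile comparison, again on made-up data. It assumes fare-per-mile means total fare divided by total miles for a company in a month; averaging per-trip ratios is another reasonable definition and would give different numbers. The company names, totals, and helper functions are hypothetical and do not come from the submitted SQL.

```python
# Hypothetical monthly totals per company: month -> (total fare in USD, total miles).
monthly_totals = {
    "Company A": {"2020-07": (1000.0, 200.0), "2020-08": (900.0, 300.0)},
    "Company B": {"2020-07": (800.0, 100.0), "2020-08": (700.0, 100.0)},
    "Company C": {"2020-07": (600.0, 100.0), "2020-08": (150.0, 50.0)},
    "Company D": {"2020-07": (500.0, 100.0), "2020-08": (550.0, 100.0)},
}


def fare_per_mile(fare, miles):
    """Return fare / miles, or None when no miles were recorded."""
    return fare / miles if miles > 0 else None


def largest_decrease(by_month):
    """Return (month, fare_per_mile, delta) for the steepest drop, or None."""
    months = sorted(by_month)  # 'YYYY-MM' strings sort chronologically
    worst = None
    for prev, cur in zip(months, months[1:]):
        before = fare_per_mile(*by_month[prev])
        after = fare_per_mile(*by_month[cur])
        if before is None or after is None:
            continue  # skip months where the ratio is undefined
        delta = after - before
        if delta < 0 and (worst is None or delta < worst[2]):
            worst = (cur, after, delta)
    return worst


def top_decreases(data, n=3):
    """Keep one row per company (its steepest drop), then rank by delta."""
    rows = []
    for company, by_month in data.items():
        result = largest_decrease(by_month)
        if result is not None:  # companies with no decrease are left out
            rows.append((company, *result))
    return sorted(rows, key=lambda row: row[3])[:n]


top3_decreases = top_decreases(monthly_totals)
for company, month, value, delta in top3_decreases:
    print(f"{company}: {month} fare_per_mile={value:.2f} change={delta:.2f}")
```

Guarding against zero miles matters for this metric, because trips recorded with no distance would otherwise make the ratio undefined or wildly inflated.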

### 2. Executive Summary/Report of Findings
Submit an executive summary/report of your findings, clearly answering the questions above.

## PART II
This portion of the assignment gives our candidates creative freedom to look at this data set in any way they want.
There are no tricks here. This is simply meant to allow us to understand their ability to unearth insights and leverage visualizations to tell a story.

### 3. Additional Analysis Same Dataset
Considering the context of the questions from part I, conduct an additional analysis using the same dataset and design a report that provides at least one additional insight, a trend or any other relevant detail that piques your interest.

This report should:

#### a. Clearly Explain Value or Potential Use of Observation 
Clearly explain the value or potential use of that observation for someone who is interested in the answers to the questions above.

#### b. Include at least one visualization.

If you have any questions regarding the assignment, please contact Noam Berns (noam.berns@ourbranch.com), Austin McCleary (austin.mccleary@ourbranch.com), and Carson Wilshire (carson.wilshire@ourbranch.com).

Please email your final submission to your Branch recruiter and cc the above three managers.