# Applied Data Science (MAST30034) Tutorial 2

`pyspark` (15-45 minutes):
- _Under the hood - lazy evaluation_
- Basic transformation functions
- Spark SQL (Optional)

`geopandas` (30 minutes):
- Installation
- Shapefiles
- Spark User Defined Functions (UDF)

New Visualizations and Basic Analysis (15 minutes):
- `folium`
- Choropleths
- A revision of Pearson Correlation
- Feature Engineering

You may use any visualization taught in previous subjects for your Project (assumed knowledge).
_________________

In [None]:
from pyspark.sql import SparkSession, functions as F

# Create a spark session (which will run spark jobs)
spark = (
    SparkSession.builder.appName("MAST30034 Tutorial 2")
    .config("spark.sql.repl.eagerEval.enabled", True) 
    .config("spark.sql.parquet.cacheMetadata", "true")
    .getOrCreate()
)

In [None]:
sdf = spark.read.parquet('../../data/tlc_data/')

## Transformations and Immutability
Transformations in PySpark take a Spark DataFrame and return a new DataFrame without altering the original data. This means that Spark DataFrames are **immutable** (i.e. there is no `inplace=True` argument like some `pandas` methods).

Because every operation returns a transformed result rather than mutating the original, it is quite common to see the result assigned back to the same variable:
```python
sdf = sdf.withColumn(
    'int_col',
    F.col('str_numerical_col').cast('INT')
)
```


## Lazy Evaluation at the High Level
Spark operations are also evaluated lazily. Under-the-hood, Spark looks to optimize your operations and make them more efficient before anything is run. This means that your data does not "move" until called upon. Let's explore how this works at a high level.

### DAG (Directed Acyclic Graph)
For those who have not taken 2nd year algorithms, a DAG is a graph where there are "no loops". That is, one can only traverse forward and never backwards. This is especially useful (in this context) for representing a job or process. For example:

![dag](https://external-content.duckduckgo.com/iu/?u=http%3A%2F%2Fmichal.karzynski.pl%2Fimages%2Fillustrations%2F2017-03-19%2Fairflow-example-dag.png&f=1&nofb=1)

Here, we have a DAG generated by Apache Airflow which manages a certain job. From left to right, things are run in order.
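To make "run in order" concrete, here is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+). The task names are hypothetical and are not taken from the Airflow figure; the point is only that a DAG always admits an order in which every task comes after its dependencies, while a graph with a loop does not.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "plot": {"aggregate"},
    "report": {"aggregate", "plot"},
}

# A valid execution order places every task after its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)

# Adding a "loop" makes the graph cyclic, so no valid order exists.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```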

### Spark, DAGs, and Lazy Evaluation 
Specifically for Spark, transformations are added to a DAG similar to the above example. Whenever the Spark driver requests data (e.g. writing to disk, using `.collect()`, etc.), the DAG gets executed. This means that until we need to **physicalize** the data, none of the transformations are actually run.

Why is this good? The major advantage is that Spark can make several optimizations under-the-hood by looking at the whole DAG rather than one step at a time. For any in-built transformation or function, Spark will improve efficiency by reordering the steps or taking shortcuts that simplify several steps. This is not possible if the data is physicalized after every step, as it is in `pandas`.
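The deferral described above can be sketched with a toy class. This is not Spark's API: `LazyPipeline` and its methods are hypothetical names for illustration. Each method returns a new pipeline with one more recorded step, and only `collect()` touches the data. The sketch shows deferral and immutability only; it does not attempt any plan optimization.

```python
class LazyPipeline:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical, not Spark)."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)  # the recorded "plan"

    def map(self, fn):
        # Return a NEW pipeline; the original is left untouched.
        return LazyPipeline(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyPipeline(self._source, self._steps + (("filter", pred),))

    def plan(self):
        return [kind for kind, _ in self._steps]

    def collect(self):
        # Only now is the data touched: stream each row through every step.
        out = []
        for row in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    row = fn(row)
                elif not fn(row):
                    keep = False
                    break
            if keep:
                out.append(row)
        return out


calls = []

def double(x):
    calls.append(x)  # record when the function actually runs
    return x * 2

base = LazyPipeline(range(5))
doubled = base.map(double).filter(lambda x: x > 4)

calls_before = list(calls)   # snapshot taken before anything is collected
result = doubled.collect()

print(doubled.plan())   # ['map', 'filter']
print(calls_before)     # [] -> nothing ran while the plan was being built
print(result)           # [6, 8]
print(base.plan())      # [] -> the original pipeline is unchanged
```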


Example:
- Suppose you execute every transformation eagerly, like `pandas`.
- This means you must physicalise every intermediate step into memory. Whilst this can be fast, you are limited by the memory (RAM) available.
- When you take into account the cost of RAM over time (especially if this is an ETL pipeline that you run daily), this is not efficient, because you are rarely interested in the intermediate transformation steps when running a production pipeline.
- As such, your job when writing Spark is to tell Spark the overall process and let it figure out how best to run it.

General Rules:
- Always use built-in functions and methods where possible. This is because Spark is designed to optimize code that it understands.
- When applying a custom function, known as a **User Defined Function** (UDF), Spark does not understand the function code. This means it is a black box that Spark cannot optimize, so it will eat up resources and time.
- Avoid `.collect()` where possible. Aim to plan your code out and only physicalize your data when required.
- Save "checkpoints" for your dataset. If you have applied a few transformations and are happy with this intermediate step, save it. That way, you don't need to keep rerunning your code every time you need the data.

### Lazy Evaluation in applications
Lazy evaluation is also extensively used in many other applications. Python natively has generators (`yield`) as a way to implement lazy evaluation. TensorFlow, a popular tensor computing library, also implements lazy evaluation via [computation graphs](https://d3lm.medium.com/understand-tensorflow-by-mimicking-its-api-from-scratch-faa55787170d) (DAG).

<img src="../../media/tensorflow_lazyeval.png" alt-text="computation graph" width=800px>

Here are some examples of lazy evaluation.

Use Cases:
- As a Data Scientist, this really only occurs under-the-hood with the `ML` libraries you import.
- As a Data Engineer, this is very common with `node` related Graph Databases such as `Neo4J`.

Essentially, this happens under-the-hood for a lot of `python` libraries that you use, notably, Spark.

In [None]:
import time

In [None]:
def create_object(n=4):
    """
    A function to create `n` number of objects
    using a naive method.
    :param n: number of elements (default 4)
    """
    obj = []
    for i in range(n):
        obj.append(time.sleep(i))
    return obj


def create_object_lazy(n=4):
    """
    A function to create `n` number of objects
    using lazy evaluation.
    :param n: number of elements (default 4)
    """
    for i in range(n):
        yield time.sleep(i)
        

def execute_action(obj, n=3):
    """
    A function that consumes the first `n` elements of an iterable.
    :param obj: any iterable (e.g. a list or a generator)
    :param n: number of iterations (default 3)
    """
    obj = iter(obj)
    for i in range(n):
        next(obj)
        # Do some action
        pass

In [None]:
# This object takes 6 seconds to create
%time eager = create_object()
# Executed immediately, total time: 6 sec
%time execute_action(eager)

# This object is created immediately
%time lazy = create_object_lazy()
# Takes 3 sec to execute, total time: 3 sec
%time execute_action(lazy)
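The same contrast can be checked without waiting on `time.sleep`. The sketch below is an assumed variant rather than part of the timing cells: `expensive` is a hypothetical function that logs each element it builds, so the log shows exactly which elements were computed in the eager and lazy cases.

```python
from itertools import islice

computed = []

def expensive(i):
    computed.append(i)  # log every element that is actually built
    return i * i

eager = [expensive(i) for i in range(4)]   # builds all 4 elements now
eager_log = list(computed)

computed.clear()
lazy = (expensive(i) for i in range(4))    # builds nothing yet
lazy_log_before = list(computed)
first_three = list(islice(lazy, 3))        # builds only what is consumed
lazy_log_after = list(computed)

print(eager_log)        # [0, 1, 2, 3]
print(lazy_log_before)  # []
print(first_three)      # [0, 1, 4]
print(lazy_log_after)   # [0, 1, 2]
```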

## Renaming Fields and Data Type Conversions
Functions:
```python
sdf.withColumnRenamed(
    'column_from',
    'column_to'
)

# example 1 for converting data types
sdf.withColumn(
    'column_to',
    F.col('column_from').cast('data type')
)

# example 2 for applying UDFs - more later this tute
sdf.withColumn(
    'column_to',
    some_udf(F.col('column_from'))
)
```

In [None]:
sdf.show(1, vertical=True, truncate=100)

In [None]:
# example of renaming a column (we won't save it)
sdf.withColumnRenamed(
    'VendorID',
    'vendor_id'
).printSchema()

In [None]:
# converting a couple columns to integers and saving it
for field in ('PU', 'DO'):
    field = f'{field}LocationID'
    sdf = sdf.withColumn(
        field,
        F.col(field).cast('INT')
    )
    
sdf.printSchema()

See here for the list of accepted data types: https://spark.apache.org/docs/latest/sql-ref-datatypes.html

Let's try some more advanced conversions. For example, if we look at the `store_and_fwd_flag`, it actually represents a boolean condition. According to the Data Dictionary though, we currently have `N` and `Y` representing `No` and `Yes` respectively.

In pandas, we would have done something like this:
```python
df['store_and_fwd_flag'] = (df['store_and_fwd_flag'] == 'Y').astype(bool)
```


In [None]:
sdf = sdf.withColumn(
    'store_and_fwd_flag',
    (F.col("store_and_fwd_flag") == 'Y').cast('BOOLEAN')
)

sdf.show(1, vertical=True, truncate=100)

Lastly, you can express `if`/`else` logic with the built-in `F.when()` and `.otherwise()`. Let's say we want a boolean field to determine if the record is valid.

In [None]:
sdf = sdf.withColumn(
    'is_valid_record',
    # when we have a positive distance/passenger/total amount then True
    # else False
    F.when(
        (F.col('trip_distance') > 0)
        & (F.col('passenger_count') > 0)
        & (F.col('total_amount') > 0),
        True
    ).otherwise(False)
)

In [None]:
sdf.show(1, vertical=True, truncate=100)

Make sure to read through the data dictionary carefully to determine which records are valid or invalid. As long as you justify your logic (and it adheres to the data dictionary), then you will get marks.
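When justifying a validity rule, it can help to write it as a plain function and try it on edge cases first. The sketch below mirrors the three conditions used for `is_valid_record`; the function name and the rows are made up for illustration. Missing values are treated as invalid, which matches what the Spark version does: a comparison against `NULL` evaluates to `NULL`, `F.when()` treats that as not matched, and the row falls through to `.otherwise(False)`.

```python
def is_valid_record(trip_distance, passenger_count, total_amount):
    """Plain-Python mirror of the rule: every field must be present and positive."""
    fields = (trip_distance, passenger_count, total_amount)
    # A missing value (None) can never satisfy "> 0", so the record is invalid.
    return all(value is not None and value > 0 for value in fields)


# Made-up rows: (trip_distance, passenger_count, total_amount)
rows = [
    (2.5, 1, 14.30),    # ordinary trip
    (0.0, 1, 3.80),     # zero distance
    (1.2, None, 9.00),  # passenger count missing
    (3.1, 2, -12.50),   # negative total
]
flags = [is_valid_record(*row) for row in rows]
print(flags)  # [True, False, False, False]
```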

Be especially careful with `total_amount`: it is essentially the sum of several other fields, which makes it a poor feature when conducting analysis or using it in a regression model alongside those fields.

## Spark SQL
For those who have taken database systems or prefer using SQL, you can use Spark SQL to run queries.

Whilst there are plenty of options (creating tables, views, temp tables, etc), we'll stick with views. If you are unsure what a view is, think of it as some kind of layer that sits on top of the dataset.

In [None]:
# create a temporary SQL view for the DataFrame
sdf.createOrReplaceTempView('taxi')

sql_query = spark.sql("""
SELECT 
    PULocationID,
    COUNT(tpep_pickup_datetime) AS number_of_trips,
    ROUND(AVG(trip_distance), 4) AS average_distance_miles,
    ROUND(AVG(fare_amount), 4) AS average_fare_amount_usd
FROM 
    taxi
WHERE
    passenger_count == 5
    AND trip_distance > 0
GROUP BY 
    PULocationID
ORDER BY 
    average_fare_amount_usd DESC
""")

sql_query.limit(10)

As a class, discuss what the query does. This is good revision of SQL and as hard as it gets for a technical interview for an entry level position that requires basic SQL understanding.
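If you want to test your reading of the query without a Spark session, the same logic can be run on a handful of rows with Python's built-in `sqlite3`. This is only a sketch: the five rows are made up, and SQLite is a stand-in for Spark SQL, so treat it as a way to check the filtering and grouping rather than as a reproduction of the real output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE taxi (
        PULocationID INTEGER,
        tpep_pickup_datetime TEXT,
        passenger_count INTEGER,
        trip_distance REAL,
        fare_amount REAL
    )
""")
# Made-up rows purely for illustration
con.executemany(
    "INSERT INTO taxi VALUES (?, ?, ?, ?, ?)",
    [
        (132, "2022-01-01 08:00:00", 5, 10.0, 40.0),
        (132, "2022-01-01 09:00:00", 5, 20.0, 60.0),
        (132, "2022-01-01 10:00:00", 1, 15.0, 52.0),  # dropped: passenger_count is not 5
        (161, "2022-01-01 11:00:00", 5, 1.0, 8.0),
        (161, "2022-01-01 12:00:00", 5, 0.0, 3.0),    # dropped: trip_distance is 0
    ],
)

rows = con.execute("""
    SELECT
        PULocationID,
        COUNT(tpep_pickup_datetime) AS number_of_trips,
        ROUND(AVG(trip_distance), 4) AS average_distance_miles,
        ROUND(AVG(fare_amount), 4) AS average_fare_amount_usd
    FROM taxi
    WHERE passenger_count = 5
      AND trip_distance > 0
    GROUP BY PULocationID
    ORDER BY average_fare_amount_usd DESC
""").fetchall()
con.close()

for row in rows:
    print(row)
```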

## GeoPandas
- **NOTE: This only applies to the more recent datasets that use zones over coordinates**

Requirements:
- `geopandas`

Shapefile Links:
- https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip
- https://s3.amazonaws.com/nyc-tlc/misc/taxi+_zone_lookup.csv

**Installation (MacOS Intel chip, Linux, WSL/WSL2):**
- MacOS and Linux users use `pip3 install geopandas` or equivalent.

**Installation (MacOS M1/M2 chip):**
```bash
brew install geos
export DYLD_LIBRARY_PATH=/opt/homebrew/opt/geos/lib/
pip3 install shapely
brew install gdal
pip3 install fiona
brew install proj
pip3 install pyproj
pip3 install pygeos
pip3 install geopandas
```
If you run into any errors related to `pyproj` please restart your terminal. It should work after.

Why do I need to do this for MacOS M1/M2 chip? `geopandas` itself is written in Python so there are no issues, but it depends on other libraries that are written in `C/C++` and need to be compiled. The Intel and M1/M2 architectures are different, hence this roundabout method of installing.

**Installation (Windows 10/11):**
1. Visit https://www.lfd.uci.edu/~gohlke/pythonlibs/
2. You will need 2 different `.whl` (wheel) files. These are `GDAL` and `fiona`.
    - `GDAL`: https://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal
    - `fiona`: https://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona
3. Download the version **corresponding to your device OS and Python**. For example:
    - `Fiona-1.8.21-cp311-cp311-win_amd64.whl` represents the `fiona v1.8.21` wheel built for `CPython 3.11` on `windows` running a `64bit` architecture (`amd64`).
4. Once both packages are downloaded, you will need to open up command prompt and `cd` into the directory. 
5. Install the dependencies **in this specific order**. 
    - `GDAL`  (wheel you downloaded)
    - `fiona` (wheel you downloaded)
    - `geopandas` (`pip` package)
    
Example for Windows 11 (64 bit) running `Python 3.9.X`:
```bash
# cd into directory containing files
cd geopandas_dependencies
pip3 install GDAL-3.4.3-cp39-cp39-win_amd64.whl
pip3 install Fiona-1.8.21-cp39-cp39-win_amd64.whl
pip3 install geopandas
```
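To check which wheel matches your machine, you can compare the tags in the filename against the interpreter you are running. The sketch below is an assumption-laden helper, not an official tool: `parse_wheel_name` and `local_python_tag` are hypothetical names, the parser assumes the common five-part filename with no optional build tag, and the tag helper assumes CPython.

```python
import platform
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into its dash-separated tags.

    Assumes the common five-part form (no optional build tag):
    name-version-pythontag-abitag-platformtag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file")
    name, version, python_tag, abi_tag, platform_tag = filename[:-4].split("-")
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def local_python_tag():
    """CPython-style tag for the running interpreter, e.g. 'cp39' for Python 3.9."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


info = parse_wheel_name("Fiona-1.8.21-cp39-cp39-win_amd64.whl")
print(info)
print("This interpreter:", local_python_tag(), platform.machine())
print("Python tag matches:", info["python_tag"] == local_python_tag())
```

Note that the filename must use plain ASCII hyphens; names copied from a web page sometimes contain look-alike characters that `pip` will not recognise.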

Why do I need to do this for Windows? Like many other useful Data Science and Engineering packages, `geopandas` and its dependencies are designed to be native to Linux and bash. Windows is less well supported, hence our recommendation to install WSL/WSL2.

In [None]:
import pandas as pd
import geopandas as gpd

Depending on your OS and Python version, you may receive a warning about `pygeos`. As long as it works, it is fine.

In [None]:
# sf stands for shape file
sf = gpd.read_file("../../data/taxi_zones/taxi_zones.shp")
zones = pd.read_csv("../../data/taxi_zones/taxi+_zone_lookup.csv")

sf.head()

Shapefiles are one way of storing geometric objects. When working with these, you will usually need to convert them into a more human-understandable coordinate system such as latitude/longitude. See https://www.earthdatascience.org/courses/earth-analytics/spatial-data-r/understand-epsg-wkt-and-other-crs-definition-file-types/ for details on conversion.

In [None]:
# Convert the geometry shape to latitude and longitude
# Please attribute this if you are using it
sf['geometry'] = sf['geometry'].to_crs("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
sf.head()

Note how the `geometry` attribute now looks a bit more familiar and understandable.

In [None]:
zones.head()

Which attribute should we join on? What kind of join will we do (e.g. left, right, inner)?

In [None]:
gdf = gpd.GeoDataFrame(
    pd.merge(zones, sf, on='LocationID', how='inner')
)

gdf.head()
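One way to sanity-check the choice of join is to run an outer join with `indicator=True` on a small example and see which rows exist in only one table. The two frames below are made-up stand-ins for the lookup table and the shapefile, not the real data.

```python
import pandas as pd

# Made-up stand-ins for the lookup table and the shapefile
zones_demo = pd.DataFrame({
    "LocationID": [1, 2, 3],
    "Zone": ["Zone A", "Zone B", "Zone C"],
})
shapes_demo = pd.DataFrame({
    "LocationID": [2, 3, 4],
    "geometry": ["polygon_2", "polygon_3", "polygon_4"],
})

# An outer join with indicator=True labels where each row came from
audit = pd.merge(zones_demo, shapes_demo, on="LocationID", how="outer", indicator=True)
print(audit[["LocationID", "_merge"]])

# The inner join keeps only IDs present in both tables
inner = pd.merge(zones_demo, shapes_demo, on="LocationID", how="inner")
print(inner["LocationID"].tolist())  # [2, 3]
```

Rows marked `left_only` or `right_only` are the ones an inner join silently drops, so they are worth inspecting before deciding.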

Awesome! We can now combine `geopandas` with `folium` to plot nice zones. Requirements:
- A `folium` map object with a central coordinate of interest
- A `GeoJSON` that can be parsed by `folium`

The `GeoJSON` should be a `JSON` object representing a specific zone (`LocationID`) with its geometry.

In [None]:
# create a JSON 
geoJSON = gdf[['LocationID', 'geometry']].drop_duplicates('LocationID').to_json()

# print the first 300 chars of the json
print(geoJSON[:300])
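The printed string is easier to read once you know the nesting. The sketch below builds a hand-written `FeatureCollection` containing one made-up square zone, using only the standard-library `json` module. It shows roughly the shape of the output; the real string may carry extra keys.

```python
import json

# Hand-written FeatureCollection with one made-up square "zone"
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"LocationID": 1},
            "geometry": {
                "type": "Polygon",
                # GeoJSON orders coordinates as [longitude, latitude]
                "coordinates": [[
                    [-74.0, 40.7], [-73.9, 40.7],
                    [-73.9, 40.8], [-74.0, 40.8],
                    [-74.0, 40.7],
                ]],
            },
        }
    ],
}

text = json.dumps(feature_collection)  # the string form
parsed = json.loads(text)

first = parsed["features"][0]
print(first["properties"]["LocationID"])       # 1
print(first["geometry"]["type"])               # Polygon
print(first["geometry"]["coordinates"][0][0])  # [-74.0, 40.7]
```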

Now, let's add it to the `folium` map object. We'll use what we call a **Choropleth Map** to show zones.

Folium Docs: http://python-visualization.github.io/folium/modules.html?highlight=choropleth#folium.features.Choropleth

In [None]:
import folium

In [None]:
_map = folium.Map(location=[40.66, -73.94], tiles="Stamen Terrain", zoom_start=10)

# refer to the folium documentation on how to plot aggregated data.
_map.add_child(folium.Choropleth(
    geo_data=geoJSON,
    name='choropleth',
))

_map.save('../../plots/foliumChoroplethMap.html')
_map

## Personal Checklist for Visualisations and Dashboards:
1. Your visualisation needs to tell a story.
2. It should be interpretable without being overly verbose.
3. The scale and axis need to make sense (and you can assume the reader knows the difference between a normal scale vs log scale).
4. The choice of visualisation needs to make sense:
    - Line plot vs Bar chart with non-numerical categories
    - Map plot with points vs clusters for each location
    - Scatterplot vs Histogram plot to see distribution
    - etc
5. Choice of colour scheme / alpha / size need to be easy on the eyes.

At the end of the day, even if you think your visualisation is "pretty" or "beautiful", if a reader cannot understand it, then it is not a good visualisation.

Let's go through an example on **pickup locations**.

In [None]:
# first off, join the geometries with the dataset
df = pd.read_parquet('../../data/tute_data/sample_data.parquet')
df.head()

In [None]:
df = df \
    .merge(gdf[['LocationID', 'geometry']], left_on='PULocationID', right_on='LocationID') \
    .drop('LocationID', axis=1)

df.head()

To start off, many students may sum each zone (location) and plot the total earnings to gain a rough idea of income. Whilst this can be okay for a first analysis, it may be more useful to look at proportion by dividing by frequency.

Why is this a more suitable idea?


In [None]:
proportions = df[['PULocationID', 'total_amount']] \
                .groupby('PULocationID') \
                .agg(
                    {
                        'total_amount': 'sum', # sum over total amount earned
                        'PULocationID': 'count' # count number of instances from sample
                    }
                ) \
                .rename({'PULocationID': 'total_trips'}, axis=1)

proportions.head()

In [None]:
proportions['avg_trip_amount'] = proportions['total_amount'] / proportions['total_trips']
proportions.head()

Remember, this is **only a random sample of 5% of the true population**.

In [None]:
m = folium.Map(location=[40.73, -73.74], tiles="Stamen Terrain", zoom_start=10)

# refer to the folium documentation for more information on how to plot aggregated data.
c = folium.Choropleth(
    geo_data=geoJSON, # geoJSON 
    name='choropleth', # name of plot
    data=proportions.reset_index(), # aggregated data source (one row per zone)
    columns=['PULocationID', 'avg_trip_amount'], # the key column and the value to colour by
    key_on='properties.LocationID', # this is from the geoJSON's properties
    fill_color='YlOrRd', # color scheme
    nan_fill_color='black',
    legend_name='Average Trip Earnings USD$'
)

c.add_to(m)

m

- What do the black coloured zones represent?
- Which area seems most profitable so far?

Let's add some markers for the airports to differentiate them. I'll show you a few methods:

In [None]:
gdf.loc[gdf['Zone'].str.contains('Airport')]

In [None]:
import re



However, the `folium` [Marker object](https://python-visualization.github.io/folium/modules.html?highlight=marker#folium.map.Marker) requires coordinates. 

We can derive zone centroids using `geopandas`'s `.centroid` attribute of a `geometry`. This will return a `Point` object (not what we want just yet) which can be converted into `latitude` (`y`) and `longitude` (`x`) coordinates by accessing the `.x` and `.y` attribute of the `Point`.
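For intuition about what a polygon centroid is, here is a minimal planar sketch using the shoelace formula. It is an illustration under simplifying assumptions (a simple polygon with non-zero area, coordinates treated as flat x/y), not a description of how `geopandas` computes `.centroid` internally. The function name and the rectangle are made up.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    `vertices` is a list of (x, y) pairs; the ring may be open or closed.
    """
    if vertices[0] != vertices[-1]:
        vertices = vertices + [vertices[0]]  # close the ring

    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    # A = twice_area / 2, and the centroid sums are divided by 6A = 3 * twice_area
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Made-up rectangle in (longitude, latitude) order
rectangle = [(-74.0, 40.0), (-73.0, 40.0), (-73.0, 42.0), (-74.0, 42.0)]
x, y = polygon_centroid(rectangle)
print((y, x))  # (41.0, -73.5), i.e. (lat, long)
```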

In [None]:
# (y, x) since we want (lat, long)
gdf['centroid'] = gdf['geometry'].apply(lambda x: (x.centroid.y, x.centroid.x))
gdf[['Zone', 'LocationID', 'centroid']].head()

In [None]:
for zone_name, coord in gdf.loc[gdf['Zone'].str.contains('Airport'), ['Zone', 'centroid']].values:
    m.add_child(
        folium.Marker(location=coord, popup=zone_name)
    )
m

## Spark User Defined Functions (UDF)
So far, all the functions covered have been about simple aggregations, filtering rows, or changing data types. Sometimes though, you will require more advanced preprocessing techniques that are not built-in with Spark.

The best way to create a UDF is to use the `@F.udf` decorator. Here's the general syntax:
```python
from pyspark.sql import functions as F
from pyspark.sql.types import SomeDataType

# decorator method for creating a UDF
# You must always specify the expected return type (i.e. string, array, int, etc.)
@F.udf(SomeDataType())
def some_udf(col):
    ...
    return ...
       
sdf = sdf.withColumn(
    'transformed_col',
    some_udf(F.col('raw_col'))
)
```

Let's grab the centroids using a Spark UDF. We cover it here as this is an example of something that Spark can't do with built-in methods.

First off, we will need to convert `gdf` into a spark dataframe:
1. Convert the `geometry` into `wkt` (Well Known Text). This is essentially a way of encoding geometry objects into text because Spark does not handle custom data types. We'll use `shapely` (a `geopandas` dependency) to do this.
2. Create a Spark dataframe from the `gdf`.
3. Create UDF and apply.

You should now be able to join your Spark dataset with this Spark `GeoDataFrame`.

In [None]:
# attempt at creating spark df without wkt conversion results in an error
spark.createDataFrame(
    gdf[['Zone', 'LocationID', 'geometry']]
)

In [None]:
gdf['wkt'] = gdf['geometry'].to_wkt()
gdf[['Zone', 'LocationID', 'geometry', 'wkt']].head()

In [None]:
spark_gdf = spark.createDataFrame(
    gdf[['Zone', 'LocationID', 'wkt']]
)

spark_gdf.show(1, vertical=True, truncate=100)

In [None]:
from shapely import wkt
from pyspark.sql.types import ArrayType, FloatType

@F.udf(ArrayType(FloatType()))
def get_centroids(wkt_geo):
    centroid = wkt.loads(wkt_geo).centroid
    return centroid.y, centroid.x

In [None]:
spark_gdf = spark_gdf.withColumn(
    'geometry',
    get_centroids(F.col('wkt'))
)

spark_gdf.show(1, vertical=True, truncate=100)

In [None]:
spark_gdf.printSchema()

_________________


### Other Visualizations
We recommend that you plot and look at these attributes in your own time using `matplotlib` and `seaborn`.

Scatterplot of `fare_amount` vs `tip_amount`:  
- What's the relationship look like? 
- Why are there many values around 0?
    
    
Histogram and distribution plot of `fare_amount`, `tip_amount`, `trip_distance`:  
- Is the distribution skewed? 
- Does a log transformation make the distribution nicer? 
- What outliers do we have?
- What business rules should I be taking into account?
    
    
Correlation Heatmap between attributes of relevance:  
- Which attributes should we choose? Remember, Pearson's correlation only applies to numerical features and assumes a linear relationship.
- Does correlation imply causality?
    
You may also apply relevant transformations where suitable i.e `log`. Just make sure you **state it clearly** in your figure caption or legend.

A revision of skewness (in case you have forgotten and that's okay):

<img src=https://mammothmemory.net/images/user/base/Maths/Statistics%20and%20probability/Standard%20deviation/skewed-distribution-graphs.c97bc76.jpg alt-text="skew" width=800px>

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# numeric_only=True (pandas >= 1.5) skips non-numeric columns such as `geometry`
sns.heatmap(df.corr(numeric_only=True))
# wow that's easy...

plt.title('Pearson Correlation Metric')
plt.show()
# ... but is it really that easy? read below!

Things to take note of:
- `trip_distance` highly correlates with high tips, tolls and overall trip amount
- `payment_type` seems to have some form of negative correlation with `tip_amount`. **Be careful as this is a discrete category.**
- Having `VendorID`, `PULocationID`, `DOLocationID`, etc as features **is misleading**, why??? 

**Important:** Only include numerical and ordinal features when computing the Pearson Correlation metric. You cannot meaningfully compute the correlation between a categorical and a numerical feature (e.g. `VendorID` or `payment_type` vs `trip_distance`).

How about Locations? Does correlation work for it?
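A small worked example can help with this question. The sketch below computes Pearson's correlation from its definition (covariance divided by the product of the standard deviations) on made-up numbers, then shows what happens when the same zones are given different, equally arbitrary ID numbers. The data and the relabelling are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data: fare grows linearly with distance
distance = [1.0, 2.0, 3.0, 4.0]
fare = [5.0, 7.5, 10.0, 12.5]
r_distance = pearson_r(distance, fare)
print(round(r_distance, 4))  # 1.0

# Made-up location IDs for the same four trips
location_id = [1, 2, 3, 4]
relabelled = [3, 1, 4, 2]  # same zones, arbitrary new ID numbers
r_original = pearson_r(location_id, fare)
r_relabelled = pearson_r(relabelled, fare)
print(round(r_original, 4))    # 1.0
print(round(r_relabelled, 4))  # 0.0
```

Nothing about the trips changed between the last two lines, only the labels did, yet the coefficient moved from 1 to 0. That is the sense in which a correlation involving an ID column is misleading.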

In [None]:
CORR_COLS = [
    "passenger_count", "trip_distance", "fare_amount", "extra", 
    "mta_tax", "tip_amount", "tolls_amount", "improvement_surcharge", 
    "total_amount", "airport_fee"
]

sns.heatmap(df[CORR_COLS].corr())

plt.title('Pearson Correlation Metric')
plt.show()

- If you're interested in calculating correlation between nominal and continuous data, here's a [great explanation](https://stats.stackexchange.com/questions/119835/correlation-between-a-nominal-iv-and-a-continuous-dv-variable/124618#124618).   
- Remember, you need to refer back to the data dictionary as well as the fare page: https://www1.nyc.gov/site/tlc/passengers/taxi-fare.page

- You should especially take note of the fare page if you're looking to see how `RatecodeID` plays a role on the fare.

If you would like to use `pyspark` for the full dataset, you can compute the Pearson correlation between any two features. We don't expect students to use the full distribution for each field or use Spearman for ordinal features, though you are more than welcome to do so if you would like.

In [None]:
from pyspark.ml.stat import Correlation

sdf \
    .where(
        F.col('payment_type') == 1
    ) \
    .corr('trip_distance', 'tip_amount')

We'll cover this in a bit more detail in the next tutorial, though, just be aware that `pyspark.ml` is out of scope for this subject due to time constraints.

`VectorAssembler` is a transformer used to merge multiple columns into a single vector column. This is required because `Correlation.corr` operates on a single vector column. Just note that Spark's correlation can't handle `NULL` values, so make sure to drop or impute them (with justification) for your own project.

The process for this mirrors `sklearn` i.e `model.fit()` or `data.transform()` so this shouldn't be too unfamiliar.

In [None]:
import matplotlib.pyplot as plt
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.stat import Correlation

features = "correlation_features"
assembler = VectorAssembler(
    inputCols=CORR_COLS, # input names (can be list of fields)
    outputCol=features # output name (single vector output)
)

# transform the features -> this is similar to sklearn's .fit() or .transform()
feature_vector = assembler \
                .transform(
                    sdf.dropna('any')
                ) \
                .select(features)

corr_matrix_dense = Correlation.corr(feature_vector, features)
corr_matrix_dense

As you can see in this intermediate step, the output is a DataFrame holding a single dense matrix. This is one exception where using `.collect()` is required. We'll grab the dense matrix and create a `pandas` dataframe. There's no need for `pyspark` since the results are computed already.

In [None]:
corr_matrix_dense.collect()

In [None]:
corr_matrix = corr_matrix_dense.collect()[0][0].toArray().tolist()

df_corr = pd.DataFrame(corr_matrix, index=CORR_COLS, columns=CORR_COLS)

In [None]:
df_corr

Whilst the precision is excellent (and this is why using Spark is great over pandas), when presenting a report there should be at most 4 decimal places. 

We'll round it to 4 decimal places using a `pandas` option.

In [None]:
pd.options.display.float_format = '{:,.4f}'.format # thousands separator with 4 decimal places

In [None]:
df_corr
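The format string passed to the display option is an ordinary Python format specification, so its effect can be checked on single numbers. A small sketch with made-up values, which also shows that formatting changes only what is displayed and not the stored value:

```python
value = 0.123456789
formatter = '{:,.4f}'.format

shown = formatter(value)    # what the display option would print
large = formatter(1234.5)   # the comma adds a thousands separator

print(shown)                # 0.1235
print(large)                # 1,234.5000
print(value)                # 0.123456789 -> the underlying value is untouched
```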

We can compare the differences in correlation between the full distribution and 5% random sample.

What we do below **is not a valid comparison**. We are merely using it to show you that the LHS is not the same as the RHS.

In [None]:
df[CORR_COLS].corr()

In [None]:
abs(df_corr.abs() - df[CORR_COLS].corr().abs())

Look at this... there is a significant difference in `correlation(total_amount, tolls_amount)` between the two correlation matrices. **Be careful**. 

## Feature Engineering?
- We want to see if the profitability of zones remains consistent with respect to hour of day, day of week and pickup location. The distribution of profitable zones should be similar across all years.

- How is a zone profitable? Frequency of trips? Duration of trips? Best "earners"? We've had creative metrics over the past few years that students invented. 

- For example, you could create your own feature and scale it accordingly. Perhaps the expected dollar per minute + possible tolls scaled by the expected frequency of trips might be a good start.

- Just remember that trip frequency $\approx$ taxi demand in a zone (you don't know the true number of taxis in a zone at the time).

- Additionally, variable rate fares exist: _"50 cents per 1/5 mile when travelling above 12mph OR 50 cents per 60 seconds in slow traffic or when the vehicle is stopped."_ 

- This means profit rates may require you to state the assumption that you are assuming constant velocity throughout the trip. We have had students in the past approximate this by finding speed limits of zones in NYC.
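As a starting point for a "dollars per minute" feature, the quoted rule can be turned into a small function under the constant-velocity assumption. This is a sketch only: the function name is hypothetical, the whole trip is metered by either distance or time depending on its average speed, and the initial charge, surcharges, tolls and tips are deliberately left out.

```python
def variable_fare_usd(distance_miles, duration_minutes):
    """Variable fare component under a constant-velocity assumption.

    Uses only the quoted rule: $0.50 per 1/5 mile above 12 mph,
    otherwise $0.50 per 60 seconds.
    """
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    average_mph = distance_miles / (duration_minutes / 60)
    if average_mph > 12:
        units = distance_miles * 5   # distance-metered: five units per mile
    else:
        units = duration_minutes     # time-metered: one unit per 60 seconds
    return 0.50 * units


fast = variable_fare_usd(distance_miles=5.0, duration_minutes=15)  # 20 mph average
slow = variable_fare_usd(distance_miles=1.0, duration_minutes=10)  # 6 mph average

print(fast, fast / 15)  # fare and dollars per minute for the fast trip
print(slow, slow / 10)  # fare and dollars per minute for the slow trip
```

Real trips mix both regimes, so any rate derived this way should be reported together with the assumption behind it.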