[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jillxoreilly/StatsCourse/blob/main/Car%20Parking%20Exercise%201.ipynb#scrollTo=0d1bf928)

# Exercise: Car Park planning 1

In this exercise, you will plan car parking at a ferry terminal and inside the ferry itself. 

You will be given data about the lengths of vehicles in a <tt>.csv</tt> file. By plotting the data and calculating descriptive statistics, you will produce a short report recommending the size and number of parking spots required.

<div style = "    padding-top: 10px;
    padding-bottom: 10px;
    padding-left: 10px;
    padding-right: 10px;
    box-shadow: 0px 8px 16px 0px rgba(0,0,0,0.2);
    vertical-align: middle;">
    
<h2>The brief:</h2> 

The SpeedyFerry Company are planning a new terminal. Vehicles will arrive at the terminal in advance of their sailing time and be parked in a car park to await boarding.

SpeedyFerry would like to know how to mark out the car park. They want to fit as many parking spaces into their land as possible, whilst still making sure that the vehicles fit in the spaces.
    <ul>
<li> How long and wide should the parking spots be?
<li> Should different vehicle types be separated in different sections of the car park?
<li> If so, what ratio of long vehicle places to short vehicle places is needed?
        </ul>
    
<b>Your task is to produce a report answering these questions, justifying your answers with plots and descriptive statistics based on the sample data provided by SpeedyFerry, introduced below.</b>
</div>

<img src="images/carsBanner.png" width=100% alt="Picture of some cars" >

## Set up Python libraries

As usual, run the code cell below to import the relevant Python libraries

In [None]:
import numpy
import pandas
import seaborn
import matplotlib.pyplot as plt
plt.rcdefaults();
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Load and view the data

To make our plan for car parking, we need some information about the vehicles to be accommodated.

SpeedyFerry have provided a data file with a complete list of the vehicles parked at a vehicle-ferry terminal at 1pm on Sunday 24th April 2022, which they regard as a representative sample.

Let's load the data file "vehicles.csv" and have a look at what information we have in the dataset.

In [None]:
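# read the raw CSV file (the github.com/.../blob/... address serves a web page, not the data)
pandas.read_csv('https://raw.githubusercontent.com/jillxoreilly/StatsCourse/main/data/vehicles.csv')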

That was a long list of vehicles!

* What information do we have about each vehicle?

<p style="color:blue"><i> Double click to type your answer here </i><br>
Delete the contents of the cell and type your answer instead <br>
Press <b>shift</b> and <b>enter</b> when you have finished</p>

## Plotting the data

The data table has over 1000 rows, so we cannot get much sense of the data just from looking at the numbers.

At this stage it would be useful to make some plots to see what we are working with.

First, let's load the data and give it a name, <tt>vehicles</tt>.

In [None]:
vehicles = pandas.read_csv('data/vehicles.csv')

Now we are going to use a plotting library called <tt>seaborn</tt> to make some plots of the data.

You can see some examples of the beautiful and professional-looking plots we could make with <tt>seaborn</tt> <a href="https://seaborn.pydata.org/examples/index.html">here</a>.

### Distribution of lengths

Let's start by plotting a histogram of the lengths of the vehicles:

In [None]:
seaborn.histplot(data=vehicles, x="length", bins=numpy.arange(3,17,2))

The bins are rather coarse. Each bin is 2m wide. Can you work out which bit of the plotting command determined where to put the bin edges, and edit it so the bins are 0.25 metres wide?

In [None]:
seaborn.histplot(data=vehicles, x="length", bins=numpy.arange(3,17,2))
# Hint: 
# the bottom of the lowest bin is 3
# the bin edges stop before 17 (numpy.arange leaves out the stop value, so the top edge is 15)
# the bin width is 2

# Try changing each of these numbers in the command numpy.arange(3,17,2) 
# and see what happens to the plot
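If it is not obvious how the three numbers turn into bin edges, it can help to print the edges on their own, without making a plot. The sketch below does this for the original command and for one variation; the step of 1 is just an illustration, not the answer to the exercise.

```python
import numpy

# numpy.arange(start, stop, step) counts up from start in increments of step
# and stops before it reaches stop (the stop value itself is left out)
coarse_edges = numpy.arange(3, 17, 2)
print(coarse_edges)

# a smaller step gives more, narrower bins over the same range
finer_edges = numpy.arange(3, 17, 1)
print(finer_edges)
print(len(coarse_edges), len(finer_edges))
```

Each pair of neighbouring edges defines one bin, so an array of 7 edges gives 6 bins.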

Hopefully, by editing the bin edges command, you have made a histogram that looks a bit like this:

<img src="images/CarParkingHistXKCD.png" width=50% alt="histogram of vehicle lengths with narrower bins" >

You can see that the bulk of the vehicles have lengths in the range 2.5 to about 6 metres. However, there are also some much longer vehicles.

### Vehicle types

At this stage, it may be helpful to look at how the vehicles break down by type. 

We'll use the function <tt>catplot</tt> with <tt>kind="count"</tt>, which plots the number (or <tt>count</tt>) of data points in each category.

In [None]:
seaborn.catplot(data=vehicles, x="type", kind="count")
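A count plot shows the relative sizes of the categories, but for a report you may also want the numbers themselves. One possible way to get them is <tt>value_counts</tt>, sketched here on a tiny made-up table (the name <tt>toy</tt> and its contents are invented for illustration and are not SpeedyFerry's data).

```python
import pandas

# made-up miniature stand-in for the vehicles table (not the real data)
toy = pandas.DataFrame({"type": ["car", "car", "car", "truck", "car", "truck"]})

# number of rows in each category
counts = toy["type"].value_counts()
print(counts)

# the same information expressed as a fraction of all rows
proportions = toy["type"].value_counts(normalize=True)
print(proportions)
```

The proportions are a natural starting point for the ratio of long to short parking places asked for in the brief.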

Maybe the fact that we have different categories of vehicles could explain the distribution of lengths?

To explore this idea, we can plot "stacked" histograms for each vehicle type:

In [None]:
seaborn.histplot(data=vehicles, x="length", hue="type", 
                 multiple="stack", bins=numpy.arange(3,17,0.25))

OK, that is helpful - we can see that cars and trucks form distinct distributions of lengths, with the car+tow category (containing cars towing caravans and trailers) a bit more spread out.

* Would it make sense to split the car park into sections for different vehicle types?

### Length vs width

To mark out parking spaces, we need to decide the width as well as the length. Typically a parking space should be 1m wider than the vehicle for a car, and 2m wider for a truck or tow, to allow for manoeuvring into the space and for the doors to be opened.
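As a rough sketch of how this clearance rule could be applied in code, the example below adds the extra width to a few made-up vehicles. The table <tt>toy</tt>, the column name <tt>space_width</tt> and the assumption that cars are labelled <tt>car</tt> are all hypothetical choices for illustration.

```python
import pandas

# made-up example vehicles (widths in metres); not taken from vehicles.csv
toy = pandas.DataFrame({
    "type":  ["car", "truck", "car"],
    "width": [1.8, 2.5, 2.0],
})

# clearance rule from the text: 1 m extra for cars, 2 m extra for anything else
# (assumes cars are labelled "car" in the type column)
clearance = toy["type"].map(lambda t: 1.0 if t == "car" else 2.0)

# width of the parking space needed for each vehicle
toy["space_width"] = toy["width"] + clearance
print(toy)
```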

How does the width (and height) of vehicles vary with their length? We can plot width against length using the <tt>scatterplot</tt> function in <tt>seaborn</tt>, and then visualise the relationships between all pairs of variables at once using the <tt>pairplot</tt> function.

In [None]:
seaborn.scatterplot(data=vehicles, x="length", y="width", hue="type")

In [None]:
seaborn.pairplot(data=vehicles, hue="type")

## Descriptive statistics

If you are deciding how long to make the car parking spaces, what are the most useful descriptive statistics?
* Mean, median, or some other centile of the length distribution?
* Standard deviation or inter quartile range, or some other measure of spread?

I think you probably need to know something like the 90th and 99th centiles of the length distribution - to determine what length of car parking space will fit most, or almost all, vehicles.

Luckily, <tt>numpy</tt> can find these centiles for you.

In [None]:
numpy.percentile(vehicles.length, 99)

That was the 99th centile of vehicle lengths. Can you work out how to change the code to get:
* the 95th centile of length
* the median length (remember - the median is the 50th centile)
* the 90th centile of width?
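To get a feel for what <tt>numpy.percentile</tt> returns, it can help to try it on a handful of numbers small enough to check by eye. The five lengths below are made up for illustration. The sketch also shows that a list of centiles can be requested in one call.

```python
import numpy

# five made-up lengths in metres, chosen so the arithmetic is easy to follow
lengths = numpy.array([3.9, 4.1, 4.3, 4.5, 12.0])

# several centiles can be requested at once by passing a list
median, p90 = numpy.percentile(lengths, [50, 90])

print(median)  # the middle value of the sorted data
print(p90)     # falls between the two largest values (numpy interpolates by default)
```

Notice how a single long vehicle pulls the 90th centile well above the median, which is the same effect the trucks have on the real data.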

### Add a line to the graph

Let's add a line to the histogram to indicate the 99th centile:

In [None]:
seaborn.histplot(vehicles, x="length", hue="type", 
                 multiple="stack", bins=numpy.arange(3,17,0.25))

plt.axvline(numpy.percentile(vehicles.length, 99), color='black')
# plt.axvline(numpy.mean(vehicles.length))
# plt.axvline(numpy.median(vehicles.length))

Hmmmm, well if we make our parking spots 15.72m long, that will accommodate 99% of vehicles, but also waste a lot of space, as the spaces will be about 10m longer than most of the vehicles parked in them (the cars).

### Separating vehicle categories

How about we designate separate areas for cars and long vehicles?

Let's get the 99th centile for each type separately. 

In the box below, we have the code we used above to get the 99th centile of all vehicles' lengths, and three variations, breaking the data down by group. Try <i>uncommenting</i> one line at a time, by deleting the <tt>#</tt> symbol on the line you want to use and adding a <tt>#</tt> at the start of the lines you no longer need, to get the 99th centile for each vehicle type.

In [None]:
numpy.percentile(vehicles.length, 99)

# numpy.percentile(vehicles.length[vehicles.type=='car'], 99)
# numpy.percentile(vehicles.length[vehicles.type=='towing'], 99)
# numpy.percentile(vehicles.length[vehicles.type=='truck'], 99)

OK, let's add the <a href="https://www.google.com/search?client=firefox-b-d&q=disaggregated">disaggregated</a> 99th centile lines to our histogram:

In [None]:
seaborn.histplot(vehicles, x="length", hue="type", 
                 multiple="stack", bins=numpy.arange(3,17,0.25))

plt.axvline(numpy.percentile(vehicles.length[vehicles.type=='car'], 99), color='blue')
plt.axvline(numpy.percentile(vehicles.length[vehicles.type=='towing'], 99), color='orange')
plt.axvline(numpy.percentile(vehicles.length[vehicles.type=='truck'], 99), color='green')
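Filtering one type at a time works, but it means repeating the same line for every category. A possible alternative is to let <tt>pandas</tt> group the rows by type and compute the centile for every group in one go. The sketch below uses a made-up table called <tt>toy</tt> rather than the real data.

```python
import numpy
import pandas

# made-up miniature table (lengths in metres); not the real vehicles data
toy = pandas.DataFrame({
    "type":   ["car", "car", "car", "truck", "truck", "truck"],
    "length": [4.0, 4.2, 4.4, 10.0, 12.0, 14.0],
})

# quantile takes a fraction (0.99) where numpy.percentile takes a percentage (99)
p99_by_type = toy.groupby("type")["length"].quantile(0.99)
print(p99_by_type)

# the one-type-at-a-time approach gives the same value for cars
p99_cars = numpy.percentile(toy.length[toy.type == 'car'], 99)
print(p99_cars)
```

The result is a table with one row per vehicle type, which is convenient when the number of categories grows.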

# Your report for SpeedyFerry

<div style = "    padding-top: 10px;
    padding-bottom: 10px;
    padding-left: 10px;
    padding-right: 10px;
    box-shadow: 0px 8px 16px 0px rgba(0,0,0,0.2);
    vertical-align: middle;">
    
This is a <tt>stub</tt> for your report to SpeedyFerry. 

The text in each markdown cell is given to guide you. You will replace this with your own text.

Similarly, you will edit the code in each code cell to produce the necessary plots and statistics.

This stub is quite structured to guide you through the process. Later in the course, you will develop your reports with less structured guidance.
    
</div>

## Description of vehicle types and sizes

Based on the sample data recorded at 1pm on Sunday 24th April 2022, the vehicles to be accommodated fall into XXX categories:
* cars
* xxx
* xxx

The majority of vehicles are cars.

In [None]:
vehicles = pandas.read_csv('data/vehicles.csv')
seaborn.......
# find some code above to produce a plot 
# that illustrates that the majority of vehicles are cars

The length and width of vehicles differ substantially between classes, therefore we would recommend .....[your comment on how to segregate the parking areas for vehicle classes]......:

In [None]:
# find some code above to produce one plot for lengths and another for widths, 
# that illustrates the different distributions of dimensions between vehicle classes
# add vertical lines for your recommended parking spot lengths as per the text below

## Size and number of parking spaces in each zone

We recommend that the parking spaces in each zone should be sized to fit the XXth centile of each vehicle class. As can be seen from the plot above [XXth centile marked by vertical lines], the XXth centile is not excessively large compared to the bulk of vehicle lengths/widths and minimizes cases in which a vehicle will not fit.

The exact lengths are:

In [None]:
# edit this code to give numerical values for the length 
# of the parking spots in each of your zones and using your chosen percentile

print('Zone 1: cars -- ', numpy.percentile(vehicles.length[vehicles.type=='car'], 80))
print('Zone 2: spaceships -- ', numpy.percentile(vehicles.length[vehicles.type=='car'], 80))

Given the observed frequencies in each vehicle class, we recommend the following minimum number of spaces in each zone:

In [None]:
# edit this code to give the minimum number of parking spots in each of your zones,
# based on the observed number of vehicles of each type

vehicles.type.value_counts()