# Exploratory Data Analysis

**Goal:** You work for a company that sells sculptures acquired from artists around the world. Your task is to predict the cost of shipping these sculptures to customers from the information provided in the dataset.

## Brainstorm

### Without looking at the data, what factors are likely to affect shipping costs?

1. Distance
    - the farther apart the source and destination are, the higher the shipping cost
2. Weight
    - the heavier the package, the higher the shipping cost
3. Dimensions
    - According to USPS [1], "Dimensional Weight Pricing charges more for large packages that weigh very little."
    - Larger packages take up more space in a vehicle, so they tend to cost more to ship even when they are light
    - This is one reason major companies such as Apple [2] work to reduce packaging sizes.
4. Fragility
    - Fragile packages must be handled carefully, which takes more time and care from the ground staff and raises handling costs
5. Price of item
    - The more valuable the item, the more it costs to insure
6. Mode of transport
    - air transport is usually the most expensive mode; water and land are generally cheaper, and which of the two costs less depends on the route and distance
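
The dimensional-weight idea in factor 3 can be sketched in a few lines: a carrier computes a volume-based weight and bills on whichever is larger, the actual weight or the dimensional weight. This is a minimal illustration, not the USPS formula itself; the divisor of 166 cubic inches per pound and the helper names `dimensional_weight` and `billable_weight` are assumptions chosen for the example.

```python
# Minimal sketch of dimensional ("DIM") weight pricing.
# Assumption: a divisor of 166 cubic inches per pound; real carriers publish
# their own divisors and thresholds, so check current rules before relying on it.
DIM_DIVISOR = 166


def dimensional_weight(length_in: float, width_in: float, height_in: float) -> float:
    """Return the volume-based weight in pounds for a box measured in inches."""
    return (length_in * width_in * height_in) / DIM_DIVISOR


def billable_weight(
    actual_lb: float, length_in: float, width_in: float, height_in: float
) -> float:
    """Bill on whichever is larger: the actual weight or the dimensional weight."""
    return max(actual_lb, dimensional_weight(length_in, width_in, height_in))


# A light but bulky box: 2 lb actual weight, 20 x 20 x 20 in.
print(round(billable_weight(2, 20, 20, 20), 2))  # → 48.19
```

With these assumed numbers, a 2 lb box measuring 20 x 20 x 20 in is billed as roughly 48 lb, which is why bulky but light sculptures could cost more to ship than their weight alone suggests.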

In [1]:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Auto-format each notebook cell with Black (requires the nb_black package)
%load_ext nb_black
```

In [2]:
```python
# Parse the two date columns as datetime64 so they support date arithmetic
train_df = pd.read_csv(
    "data/train.csv", parse_dates=["Scheduled Date", "Delivery Date"]
)
```
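
Because `parse_dates` stores the two date columns as `datetime64` values rather than strings, they can be subtracted directly. The sketch below is a self-contained illustration: it rebuilds only the date columns of the five rows shown by `train_df.head()` and computes the gap in days between the scheduled and actual delivery dates. The name `delivery_lag` is hypothetical, not a column in the dataset.

```python
import io

import pandas as pd

# Date columns copied from the first five rows of train_df.head().
csv_text = """Scheduled Date,Delivery Date
2015-06-07,2015-06-03
2017-03-06,2017-03-05
2015-03-09,2015-03-08
2015-05-24,2015-05-20
2016-12-18,2016-12-14
"""

# parse_dates converts the listed columns to datetime64 instead of plain strings.
dates = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["Scheduled Date", "Delivery Date"]
)

# Subtracting two datetime columns gives timedeltas; .dt.days extracts whole days.
delivery_lag = (dates["Delivery Date"] - dates["Scheduled Date"]).dt.days
print(delivery_lag.tolist())  # → [-4, -1, -1, -4, -4]
```

In these five rows every delivery date falls before the scheduled date, so a feature like this may be worth checking across the full training set before it is used.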

In [3]:
```python
train_df.head()
```

| | Customer Id | Artist Name | Artist Reputation | Height | Width | Weight | Material | Price Of Sculpture | Base Shipping Price | International | Express Shipment | Installation Included | Transport | Fragile | Customer Information | Remote Location | Scheduled Date | Delivery Date | Customer Location | Cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fffe3900350033003300 | Billy Jenkins | 0.26 | 17.0 | 6.0 | 4128.0 | Brass | 13.91 | 16.27 | Yes | Yes | No | Airways | No | Working Class | No | 2015-06-07 | 2015-06-03 | New Michelle, OH 50777 | -283.29 |
| 1 | fffe3800330031003900 | Jean Bryant | 0.28 | 3.0 | 3.0 | 61.0 | Brass | 6.83 | 15.0 | No | No | No | Roadways | No | Working Class | No | 2017-03-06 | 2017-03-05 | New Michaelport, WY 12072 | -159.96 |
| 2 | fffe3600370035003100 | Laura Miller | 0.07 | 8.0 | 5.0 | 237.0 | Clay | 4.96 | 21.18 | No | No | No | Roadways | Yes | Working Class | Yes | 2015-03-09 | 2015-03-08 | Bowmanshire, WA 19241 | -154.29 |
| 3 | fffe350031003300 | Robert Chaires | 0.12 | 9.0 | NaN | NaN | Aluminium | 5.81 | 16.31 | No | No | No | NaN | No | Wealthy | Yes | 2015-05-24 | 2015-05-20 | East Robyn, KY 86375 | -161.16 |
| 4 | fffe3900320038003400 | Rosalyn Krol | 0.15 | 17.0 | 6.0 | 324.0 | Aluminium | 3.18 | 11.94 | Yes | Yes | Yes | Airways | No | Working Class | No | 2016-12-18 | 2016-12-14 | Aprilside, PA 52793 | -159.23 |
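
The dataset has no explicit distance column (factor 1 in the brainstorm), but in the rows above `Customer Location` follows a `City, ST 12345` pattern. Below is a minimal sketch of splitting it into city, state, and ZIP code with pandas' `Series.str.extract`; the regular expression is an assumption based only on these five rows, and any location that does not match it comes back as `NaN`.

```python
import pandas as pd

# Customer Location values copied from the train_df.head() output.
locations = pd.Series(
    [
        "New Michelle, OH 50777",
        "New Michaelport, WY 12072",
        "Bowmanshire, WA 19241",
        "East Robyn, KY 86375",
        "Aprilside, PA 52793",
    ]
)

# Assumed format: "<city>, <two-letter state> <five-digit ZIP>".
# Named groups become the column names of the returned DataFrame.
parts = locations.str.extract(
    r"^(?P<city>.+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})$"
)
print(parts["state"].tolist())  # → ['OH', 'WY', 'WA', 'KY', 'PA']
```

State or ZIP code could then serve as a rough proxy for destination, although the columns shown here do not include an origin location.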


In [4]:
```python
train_df.describe()
```

| | Artist Reputation | Height | Width | Weight | Price Of Sculpture | Base Shipping Price | Cost |
|---|---|---|---|---|---|---|---|
| count | 5750.0 | 6125.0 | 5916.0 | 5913.0 | 6500.0 | 6500.0 | 6500.0 |
| mean | 0.46185 | 21.766204 | 9.617647 | 400694.8 | 1192.42009 | 37.407174 | 17139.2 |
| std | 0.265781 | 11.968192 | 5.417 | 2678081.0 | 8819.61675 | 26.873519 | 240657.9 |
| min | 0.0 | 3.0 | 2.0 | 3.0 | 3.0 | 10.0 | -880172.7 |
| 25% | 0.24 | 12.0 | 6.0 | 503.0 | 5.23 | 16.7 | 188.44 |
| 50% | 0.45 | 20.0 | 8.0 | 3102.0 | 8.025 | 23.505 | 382.065 |
| 75% | 0.68 | 30.0 | 12.0 | 36456.0 | 89.47 | 57.905 | 1156.115 |
| max | 1.0 | 73.0 | 50.0 | 117927900.0 | 382385.67 | 99.98 | 11143430.0 |
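
The `count` row of the `describe()` output already shows which numeric columns have missing values. The arithmetic below turns those counts into missing-value totals and percentages; it copies the counts by hand and assumes the training set has 6500 rows, the largest count in the table.

```python
import pandas as pd

# Non-null counts copied from the train_df.describe() output.
counts = pd.Series(
    {
        "Artist Reputation": 5750,
        "Height": 6125,
        "Width": 5916,
        "Weight": 5913,
        "Price Of Sculpture": 6500,
        "Base Shipping Price": 6500,
        "Cost": 6500,
    }
)

# Assumption: the training set has 6500 rows, the largest count shown.
n_rows = counts.max()
missing = n_rows - counts
missing_pct = (missing / n_rows * 100).round(2)
print(pd.DataFrame({"missing": missing, "missing %": missing_pct}))
```

This gives 750 missing values (about 11.5%) for `Artist Reputation`, 375 (about 5.8%) for `Height`, 584 (about 9.0%) for `Width`, and 587 (about 9.0%) for `Weight`. In the notebook itself, `train_df.isna().sum()` reports the same information for every column, including non-numeric ones such as `Transport` that `describe()` leaves out by default.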


## References

1. [All About USPS Dimensional Weight Pricing](https://stamps.custhelp.com/app/answers/detail/a_id/6114/~/all-about-usps-dimensional-weight-pricing)
2. [Apple’s Paper and Packaging Strategy](https://www.apple.com/environment/pdf/Packaging_and_Forestry_September_2017.pdf)