To answer the question:  
What is the average percentage of delays that are already created before departure?

I actually want to answer three questions here:  
* What percentage of delays are caused before departure?
* What is the ratio of delays before departure to delays after?
* What percentage of a given arrival delay was caused before departure?

In [1]:
import numpy as np
import pandas as pd
import pgaccess as pg

In [3]:
df = pg.execute_query('''
SELECT
    dep_delay,
    arr_delay,
    CASE
        WHEN arr_delay <> 0 THEN dep_delay / arr_delay
        ELSE NULL
    END AS dep_delay_ratio
FROM flights
WHERE
    arr_delay IS NOT NULL AND
    dep_delay IS NOT NULL
''')
df

Unnamed: 0,dep_delay,arr_delay,dep_delay_ratio
0,8.0,12.0,0.666667
1,-1.0,-7.0,0.142857
2,1.0,-16.0,-0.062500
3,30.0,7.0,4.285714
4,1.0,-23.0,-0.043478
...,...,...,...
15611147,1.0,-18.0,-0.055556
15611148,2.0,-4.0,-0.500000
15611149,11.0,-6.0,-1.833333
15611150,8.0,-8.0,-1.000000


In [13]:
# What is the ratio of late departures to late arrivals?
# Summing a boolean mask counts its True values
delayedDep = (df['dep_delay'] > 0).sum()
delayedArr = (df['arr_delay'] > 0).sum()
delayedDep / delayedArr

0.9709981585233552

In [18]:
# What percentage of delays are caused before departure?
# Measured as the share of late arrivals that had also departed late
fracDelayed = (df.loc[df['arr_delay'] > 0, 'dep_delay'] > 0).sum()
fracDelayed / delayedArr

0.7125838743199954
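
The two count-based numbers above are easy to conflate, so here is a minimal sketch on the five sample rows shown in the query output. It is a toy frame, not the full `flights` table, and the names `sample`, `count_ratio` and `share_late_dep` are illustrative only.

```python
import pandas as pd

# Toy data: the first five rows displayed in the query output above.
sample = pd.DataFrame({
    'dep_delay': [8.0, -1.0, 1.0, 30.0, 1.0],
    'arr_delay': [12.0, -7.0, -16.0, 7.0, -23.0],
})

late_dep = sample['dep_delay'] > 0   # flights that left late
late_arr = sample['arr_delay'] > 0   # flights that arrived late

# Ratio of counts: late departures per late arrival. The two groups are
# not nested, so this value can exceed 1.
count_ratio = late_dep.sum() / late_arr.sum()

# Share of late arrivals that had also departed late. Both counts come
# from the same group, so this is always between 0 and 1.
share_late_dep = (late_dep & late_arr).sum() / late_arr.sum()

print(count_ratio, share_late_dep)
```

On these five rows the count ratio is 2.0 while the share is 1.0, which is why the first number should not be read as a percentage.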

In [19]:
# What percentage of a given arrival delay was caused before departure?
# Each row's dep_delay_ratio column already holds this value,
# so take the mean over all flights here
df['dep_delay_ratio'].mean()

0.467627087937476

It occurs to me now that this isn't accurate when the departure or arrival is early.

When the flight arrives early, expressing the departure delay as a percentage of the arrival delay doesn't make much sense, because there is no arrival delay to attribute.  
When the flight arrives late but left early, the ratio can be left as a negative percentage. Those flights pull the average down, which reflects delays being caused during the flight rather than before it.
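
To make these sign cases concrete, the sketch below recomputes the ratio for the five sample rows from the query output, plus one hypothetical row for a flight that left early and arrived late. The frame `sample` and the variable names are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    # First five rows are from the query output; the last row is hypothetical.
    'dep_delay': [8.0, -1.0, 1.0, 30.0, 1.0, -5.0],
    'arr_delay': [12.0, -7.0, -16.0, 7.0, -23.0, 10.0],
})

# Same definition as the SQL CASE expression: NaN where arr_delay is 0.
sample['dep_delay_ratio'] = (
    sample['dep_delay'] / sample['arr_delay'].replace(0, np.nan)
)

late_arr = sample['arr_delay'] > 0

# Row 1 left early and arrived early, yet its ratio is positive.
# Rows 2 and 4 left late and arrived early, so their ratios are negative.
# Neither says anything about how an arrival delay came about.
mean_all = sample['dep_delay_ratio'].mean()

# Restricting to late arrivals keeps only rows where the ratio has a
# meaning; the hypothetical early departure stays in as a negative value.
mean_late = sample.loc[late_arr, 'dep_delay_ratio'].mean()

print(sample)
print(mean_all, mean_late)
```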

In [20]:
# Mean ratio restricted to flights that actually arrived late
df[df['arr_delay'] > 0]['dep_delay_ratio'].mean()

0.7866119952047244
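
A mean of per-flight ratios gives a flight with a small arrival delay the same weight as one with a large delay, so a row such as 30 minutes late out and 7 minutes late in (ratio 4.29) can dominate it. One possible alternative, which this notebook does not compute, is to divide the total departure delay by the total arrival delay over the late arrivals. The sketch below compares the two on toy data; `sample` and the variable names are assumptions for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    # First two rows appear in the query output; the last two are hypothetical.
    'dep_delay': [8.0, 30.0, 0.0, 45.0],
    'arr_delay': [12.0, 7.0, 60.0, 50.0],
})

# Keep only flights that arrived late, as in the cell above.
late = sample[sample['arr_delay'] > 0]

# Mean of per-flight ratios: every flight counts equally.
mean_of_ratios = (late['dep_delay'] / late['arr_delay']).mean()

# Ratio of totals: every delayed minute counts equally.
ratio_of_totals = late['dep_delay'].sum() / late['arr_delay'].sum()

print(mean_of_ratios, ratio_of_totals)
```

Which of the two is the better answer depends on whether the question is about a typical flight or about delay minutes overall.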