In [1]:
import numpy as np
import pandas as pd

#### Loading the data and removing unwanted columns

In [2]:
df1 = pd.read_csv("OrderBook_10.csv").drop("Unnamed: 0", axis=1)
df2 = pd.read_csv("KlineData_10.csv").drop("Unnamed: 0", axis=1)

Example showcasing a single negative jump of more than 10 percent in `BestAsk` at index 12872

In [3]:
df1.loc[[12871, 12872, 12873, 12874], :]

Unnamed: 0,Timestamp,BestBid,BestAsk,MidPrice,AskVol,BidVol
12871,1681134000000.0,28230.1,28230.1,28230.1,12.665,0.0
12872,1681134000000.0,28230.0,24167.9,26198.95,0.0,316.217
12873,1681134000000.0,28228.3,28228.4,28228.35,12.924,0.056
12874,1681134000000.0,28228.3,28143.5,28185.9,0.0,56.129


Example showcasing several consecutive jumps of `BestBid` starting at 160382

In [4]:
df1.loc[[160381, 160382, 160383, 160384, 160385, 160386, 160387, 160388], :]

Unnamed: 0,Timestamp,BestBid,BestAsk,MidPrice,AskVol,BidVol
160381,1681171000000.0,29623.1,29629.7,29626.4,0.0,0.0
160382,1681171000000.0,5000.0,29662.9,17331.45,0.0,0.0
160383,1681171000000.0,29623.2,29623.2,29623.2,3.509,0.0
160384,1681171000000.0,5000.0,29645.0,17322.5,0.0,0.0
160385,1681171000000.0,29623.2,29623.2,29623.2,3.497,0.0
160386,1681171000000.0,5000.0,29627.5,17313.75,0.0,0.0
160387,1681171000000.0,29623.1,29623.1,29623.1,0.0,5.643
160388,1681171000000.0,29637.3,29030.6,29333.95,0.0,263.623


#### Any instantaneous change in price > 10 percent is considered a jump

We calculate the average price from the Kline data set and take the first difference of the `BestBid` and `BestAsk` columns in the Order Book data set. A first difference larger in magnitude than 10 percent of that average price is flagged.

In [5]:
# Flag sharp moves (> 10 percent of the average Kline price) in the bid and ask prices
avgP = df2["Price"].mean()
dummy = df1[["BestBid", "BestAsk"]].diff()
index_b = df1.index[dummy["BestBid"].abs() > 0.1 * avgP]
index_a = df1.index[dummy["BestAsk"].abs() > 0.1 * avgP]

We record the indices where the first difference is greater than 10 percent. If the first-difference dataframe shows two consecutive changes of more than ten percent, i.e. if the indices we have recorded come in consecutive pairs, then that is a jump; otherwise it is a real change. For example, say the price goes up from 30k to 40k and then goes back to 30k at the next timestamp. The first differences will be 10k and -10k, and this will be interpreted as a jump. If the price had gone up to 40k and stayed there for a while, the first difference would be 10k but the next one would be pretty small in magnitude.

In [6]:
for i in index_b:
    if i+1 in index_b:
        df1.loc[i, "BestBid"] = df1.loc[i-1, "BestBid"]

for i in index_a:
    if i+1 in index_a:
        df1.loc[i, "BestAsk"] = df1.loc[i-1, "BestAsk"]

So I smooth the jump by filling in the previous value. This is done in a loop because sometimes there are several consecutive jumps.
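
The pairing rule can be tried on a toy series. This is only a sketch: `smooth_spikes` is a hypothetical helper, the threshold and prices are made up, and it assumes a default integer index like the one `df1` has at this point.

```python
import pandas as pd

def smooth_spikes(prices: pd.Series, threshold: float) -> pd.Series:
    """Replace a large move that is immediately followed by another large move
    with the previous value (hypothetical helper, integer index assumed)."""
    out = prices.copy()
    big = out.diff().abs() > threshold      # flags computed once, on the raw series
    flagged = set(big[big].index)
    for i in sorted(flagged):
        if i + 1 in flagged:                # consecutive pair of large moves: a jump
            out.loc[i] = out.loc[i - 1]
    return out

# 30k -> 40k -> 30k is a spike; the later move to 40k persists, so it is kept
toy = pd.Series([30000.0, 40000.0, 30000.0, 30010.0, 40000.0, 40005.0])
smoothed = smooth_spikes(toy, threshold=3000.0)
print(smoothed.tolist())
```

Only the value at position 1 is overwritten; the move at position 4 has no large move right after it, so it is treated as a real change.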

Next, we will handle the NaN values. While recording the Order Book data, it would sometimes give me an empty list of bids and asks, which I have recorded as NaN values. Since this happens because there were no bids or asks at that point, we will simply forward fill the NaN values.

In [7]:
# Forward-filling NaN values in the Order Book data
df1[["BestBid", "BestAsk"]] = df1[["BestBid", "BestAsk"]].ffill()

Now let's check for NaN values:

In [8]:
np.where(df1.isnull())

(array([ 46035,  46036, 160377, 160379, 274582, 274587, 274593, 274594],
       dtype=int64),
 array([3, 3, 3, 3, 3, 3, 3, 3], dtype=int64))

Hmm, it seems some are still left. Why is that? Let's have a look at them:

In [9]:
df1.iloc[list(np.where(df1.isnull())[0]), :]

Unnamed: 0,Timestamp,BestBid,BestAsk,MidPrice,AskVol,BidVol
46035,1681142000000.0,28364.4,28430.1,,0.0,0.0
46036,1681142000000.0,28364.4,28430.1,,0.0,0.0
160377,1681171000000.0,29623.1,29623.1,,0.0,0.0
160379,1681171000000.0,29623.1,29623.2,,0.0,0.0
274582,1681200000000.0,30095.7,30095.8,,0.0,0.0
274587,1681200000000.0,30095.7,30095.8,,0.0,0.0
274593,1681200000000.0,30092.3,30099.7,,0.0,0.0
274594,1681200000000.0,30092.3,30099.7,,0.0,0.0


Ah, that's the midprice! It was calculated while storing the data, and it is NaN because `BestAsk` and `BestBid` were NaN at that time. No need to worry, we'll recalculate the midprice later.
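
As a small illustration of why the stored midprice stays NaN after the forward fill, and why recomputing it fixes that, here is a sketch on made-up data (the `book` frame and its values are assumptions, not rows from the data set):

```python
import numpy as np
import pandas as pd

book = pd.DataFrame({
    "BestBid": [100.0, np.nan, 100.2],
    "BestAsk": [100.1, np.nan, 100.4],
})
# Midprice stored at recording time: NaN wherever bid/ask were missing
book["MidPrice"] = (book["BestBid"] + book["BestAsk"]) / 2

# Forward-filling only bid and ask leaves the stored midprice untouched
book[["BestBid", "BestAsk"]] = book[["BestBid", "BestAsk"]].ffill()
stale_nans = int(book["MidPrice"].isna().sum())

# Recomputing from the filled columns removes the remaining NaN
book["MidPrice"] = (book["BestBid"] + book["BestAsk"]) / 2
print(stale_nans, int(book["MidPrice"].isna().sum()))
```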

In [10]:
df2["Turnover"] = df2["Volume"]*df2["Price"] #Turnover = Total Value Traded = Value of a Contract * No. of Contracts traded = Price * Vol

In [11]:
df2

Unnamed: 0,Timestamp,Price,Volume,NumberOfTrades,Turnover
0,1681088646449,28349.4,22.471,215,6.370394e+05
1,1681088646781,28349.3,22.477,217,6.372072e+05
2,1681088647074,28349.4,24.050,225,6.818031e+05
3,1681088647646,28349.3,25.091,230,7.113123e+05
4,1681088647949,28349.4,26.476,248,7.505787e+05
...,...,...,...,...,...
376938,1681226449469,30168.0,315.034,2422,9.503946e+06
376939,1681226449747,30167.9,315.653,2430,9.522588e+06
376940,1681226450078,30167.9,316.633,2440,9.552153e+06
376941,1681226450378,30167.8,323.802,2488,9.768394e+06


Now we match the timestamps:

In [12]:
index = df2.index[abs(df2["Timestamp"] - df1["Timestamp"][0]) == min(abs(df2["Timestamp"] - df1["Timestamp"][0]))].to_list()
index

[91228]

We see that the Order Book data starts well after the Kline data, and the first timestamp of the Order Book data is most closely matched by index 91228 in the Kline data set. Let's see what the difference in the timestamps is:

In [13]:
diff = df2["Timestamp"][index[0]] - df1["Timestamp"][0]
diff

-62.0

The timestamp at index 91228 of the Kline data is 62 ms earlier than the first timestamp of the Order Book data.

In [14]:
df3 = df2.drop(df2.index[0:91228]).reset_index(drop=True)
df3

Unnamed: 0,Timestamp,Price,Volume,NumberOfTrades,Turnover
0,1681130777625,28278.8,52.844,345,1.494365e+06
1,1681130777875,28278.7,54.765,358,1.548683e+06
2,1681130778254,28278.7,54.831,359,1.550549e+06
3,1681130778747,28278.7,55.211,362,1.561295e+06
4,1681130779058,28278.7,55.257,363,1.562596e+06
...,...,...,...,...,...
285710,1681226449469,30168.0,315.034,2422,9.503946e+06
285711,1681226449747,30167.9,315.653,2430,9.522588e+06
285712,1681226450078,30167.9,316.633,2440,9.552153e+06
285713,1681226450378,30167.8,323.802,2488,9.768394e+06


I drop the rows before the matched index and then subtract the difference. Since the Kline timestamp is 62 ms earlier, subtracting `diff` (= -62 ms) shifts the Kline timestamps forward by 62 ms, so the first timestamps of both data sets match.

In [15]:
df3["Timestamp"] = df3["Timestamp"] - (diff)
df3

Unnamed: 0,Timestamp,Price,Volume,NumberOfTrades,Turnover
0,1.681131e+12,28278.8,52.844,345,1.494365e+06
1,1.681131e+12,28278.7,54.765,358,1.548683e+06
2,1.681131e+12,28278.7,54.831,359,1.550549e+06
3,1.681131e+12,28278.7,55.211,362,1.561295e+06
4,1.681131e+12,28278.7,55.257,363,1.562596e+06
...,...,...,...,...,...
285710,1.681226e+12,30168.0,315.034,2422,9.503946e+06
285711,1.681226e+12,30167.9,315.653,2430,9.522588e+06
285712,1.681226e+12,30167.9,316.633,2440,9.552153e+06
285713,1.681226e+12,30167.8,323.802,2488,9.768394e+06
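
The alignment steps above (find the nearest Kline timestamp, drop the earlier rows, shift by the offset) can be sketched on toy numbers. The timestamps below are made up, and `idxmin` is used here as an alternative to the boolean comparison in the notebook; it assumes a default integer index.

```python
import pandas as pd

kline_ts = pd.Series([1000, 1250, 1500, 1750, 2000])  # toy Kline timestamps (ms)
book_start = 1562                                     # toy first Order Book timestamp (ms)

# Position of the Kline row closest in time to the first Order Book row
nearest = int((kline_ts - book_start).abs().idxmin())

# Negative offset: the matched Kline row is earlier than the Order Book start
offset = int(kline_ts[nearest] - book_start)

# Drop earlier rows, then subtract the offset so both series start together
aligned = kline_ts.iloc[nearest:].reset_index(drop=True) - offset
print(nearest, offset, aligned.tolist())
```

After the shift, the first aligned Kline timestamp equals `book_start`.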


Now we drop the duplicate rows in case there are any

In [16]:
df3 = df3[~df3.duplicated('Timestamp', keep='first')]
df1 = df1[~df1.duplicated('Timestamp', keep='first')]

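Changing the timestamps from UNIX milliseconds to a time index (note that `pd.to_timedelta` gives a `TimedeltaIndex` counted from the UNIX epoch, not a `DatetimeIndex`)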

In [17]:
df3.index = pd.to_timedelta(df3["Timestamp"].rename("Time"), "ms")
# A non-inplace drop avoids the SettingWithCopyWarning raised on the filtered frame
df3 = df3.drop("Timestamp", axis=1)


In [18]:
df1.index = pd.to_timedelta(df1["Timestamp"].rename("Time"), "ms")
df1 = df1.drop("Timestamp", axis=1)

Now let's resample both data frames to 250ms

In [23]:
df1 = df1.resample("250ms").ffill()
df1

Time,BestBid,BestAsk,MidPrice,AskVol,BidVol
19457 days 12:46:17.687000,28278.7,28278.8,28278.75,9.301,19.070
19457 days 12:46:17.937000,28278.7,28278.8,28278.75,9.568,19.094
19457 days 12:46:18.187000,28278.7,28278.8,28278.75,9.568,19.094
19457 days 12:46:18.437000,28278.7,28278.8,28278.75,9.597,19.093
19457 days 12:46:18.687000,28278.7,28278.4,28278.55,0.000,18.840
...,...,...,...,...,...
19458 days 15:20:20.687000,30177.4,30177.5,30177.45,8.235,5.362
19458 days 15:20:20.937000,30178.6,30177.5,30178.05,0.000,30.638
19458 days 15:20:21.187000,30179.9,30178.5,30179.20,0.000,60.970
19458 days 15:20:21.437000,30183.2,30178.6,30180.90,0.000,10.976


In [24]:
df3 = df3.resample("250ms").ffill()
df3

Time,Price,Volume,NumberOfTrades,Turnover
19457 days 12:46:17.687000,28278.8,52.844,345,1.494365e+06
19457 days 12:46:17.937000,28278.7,54.765,358,1.548683e+06
19457 days 12:46:18.187000,28278.7,54.765,358,1.548683e+06
19457 days 12:46:18.437000,28278.7,54.831,359,1.550549e+06
19457 days 12:46:18.687000,28278.7,54.831,359,1.550549e+06
...,...,...,...,...
19458 days 15:20:49.687000,30168.0,315.034,2422,9.503946e+06
19458 days 15:20:49.937000,30167.9,315.653,2430,9.522588e+06
19458 days 15:20:50.187000,30167.9,316.633,2440,9.552153e+06
19458 days 15:20:50.437000,30167.9,316.633,2440,9.552153e+06
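
To make the effect of `resample("250ms").ffill()` concrete, here is a minimal sketch on an irregular toy series. The timestamps and prices are made up; the index starts at 0 ms so the 250 ms grid is easy to read.

```python
import pandas as pd

# Irregularly spaced observations at 0, 100, 600 and 1100 ms
idx = pd.to_timedelta([0, 100, 600, 1100], unit="ms")
toy = pd.DataFrame({"Price": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Regular 250 ms grid; each grid point takes the last observation at or before it
grid = toy.resample("250ms").ffill()
print(grid)
```

The observation at 100 ms is carried to the 250 ms and 500 ms grid points, and the one at 600 ms to 750 ms and 1000 ms. The last observation (1100 ms) falls after the final grid point, so it does not appear in the resampled frame.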


Now we join the two into a single dataframe:

In [29]:
df = df1.join(df3)
df

Time,BestBid,BestAsk,MidPrice,AskVol,BidVol,Price,Volume,NumberOfTrades,Turnover
19457 days 12:46:17.687000,28278.7,28278.8,28278.75,9.301,19.070,28278.8,52.844,345,1.494365e+06
19457 days 12:46:17.937000,28278.7,28278.8,28278.75,9.568,19.094,28278.7,54.765,358,1.548683e+06
19457 days 12:46:18.187000,28278.7,28278.8,28278.75,9.568,19.094,28278.7,54.765,358,1.548683e+06
19457 days 12:46:18.437000,28278.7,28278.8,28278.75,9.597,19.093,28278.7,54.831,359,1.550549e+06
19457 days 12:46:18.687000,28278.7,28278.4,28278.55,0.000,18.840,28278.7,54.831,359,1.550549e+06
...,...,...,...,...,...,...,...,...,...
19458 days 15:20:20.687000,30177.4,30177.5,30177.45,8.235,5.362,30177.5,92.306,904,2.785564e+06
19458 days 15:20:20.937000,30178.6,30177.5,30178.05,0.000,30.638,30178.4,100.888,956,3.044638e+06
19458 days 15:20:21.187000,30179.9,30178.5,30179.20,0.000,60.970,30181.8,119.037,1055,3.592751e+06
19458 days 15:20:21.437000,30183.2,30178.6,30180.90,0.000,10.976,30181.8,119.714,1061,3.613184e+06


We define the convention for column names and the parameters `p`, `k` and `N`:

In [45]:
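convention = {
    "BestBid": "BidDiff",
    "BidVol": "BVolDiff",
    "BestAsk": "AskDiff",
    "AskVol": "AVolDiff",
    "Turnover": "TurnDiff",
    "Volume": "VolDiff",
}
p = 5   # number of lags for OIR and VOI
k = 20  # number of future steps averaged in MPC
N = 1   # constant in the MPB formula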

In [48]:
df4 = df[["BestBid", "BidVol", "BestAsk", "AskVol", "Turnover", "Volume"]].diff().rename(columns=convention)
df4[["BidVol", "AskVol", "Price"]] = df[["BidVol", "AskVol", "Price"]]

# Calculating MidPrice
df4["MidPrice"] = (df["BestAsk"] + df["BestBid"])/2

# Calculating Average of MidPrice for (t,t-1) to be used while calculating MPB
df4["AvgMP"] = (df4["MidPrice"] + df4["MidPrice"].shift(1))/2

# Dealing with an inverted market (negative spread) by assigning it a fixed spread of 100,
# while weighting a zero spread as one tenth of the tick size (0.01)
df4["Spread"] = np.where(df["BestAsk"] - df["BestBid"] > 0, df["BestAsk"] - df["BestBid"], 
                               np.where(df["BestAsk"] - df["BestBid"] == 0, 0.01, 100))
# Drop the first row (its first differences are NaN)
df4.drop(df4.index[0], inplace=True)
df4

Time,BidDiff,BVolDiff,AskDiff,AVolDiff,TurnDiff,VolDiff,BidVol,AskVol,Price,MidPrice,AvgMP,Spread
19457 days 12:46:17.937000,0.0,0.024,0.0,0.267,54318.0983,1.921,19.094,9.568,28278.7,28278.75,28278.750,0.1
19457 days 12:46:18.187000,0.0,0.000,0.0,0.000,0.0000,0.000,19.094,9.568,28278.7,28278.75,28278.750,0.1
19457 days 12:46:18.437000,0.0,-0.001,0.0,0.029,1866.3942,0.066,19.093,9.597,28278.7,28278.75,28278.750,0.1
19457 days 12:46:18.687000,0.0,-0.253,-0.4,-9.597,0.0000,0.000,18.840,0.000,28278.7,28278.55,28278.650,100.0
19457 days 12:46:18.937000,0.0,0.023,0.4,11.728,10745.9060,0.380,18.863,11.728,28278.7,28278.75,28278.650,0.1
...,...,...,...,...,...,...,...,...,...,...,...,...
19458 days 15:20:20.687000,0.0,-65.735,27.5,8.235,11778.4166,0.390,5.362,8.235,30177.5,30177.45,30170.575,0.1
19458 days 15:20:20.937000,1.2,25.276,0.0,-8.235,259074.1042,8.582,30.638,0.000,30178.4,30178.05,30177.750,100.0
19458 days 15:20:21.187000,1.3,30.332,1.0,0.000,548112.5074,18.149,60.970,0.000,30181.8,30179.20,30178.625,100.0
19458 days 15:20:21.437000,3.3,-49.994,0.1,0.000,20433.0786,0.677,10.976,0.000,30181.8,30180.90,30180.050,100.0
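
The spread rule in the cell above has three branches: a positive spread is kept, a zero spread becomes 0.01, and an inverted market becomes 100. A sketch of the same rule on made-up quotes, written with `np.select` instead of nested `np.where` (the `bid` and `ask` values are assumptions chosen to hit each branch):

```python
import numpy as np
import pandas as pd

bid = pd.Series([100.0, 100.0, 100.5])
ask = pd.Series([100.5, 100.0, 100.0])
raw = (ask - bid).to_numpy()

# Conditions are checked in order; anything left over (negative spread) gets the default
spread = np.select([raw > 0, raw == 0], [raw, 0.01], default=100.0)
print(spread.tolist())
```

Like the nested `np.where`, this sends a NaN spread to the last branch as well, because both comparisons are false for NaN.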


In [49]:
# Calculating MPC
df4["MPC"] = df4["MidPrice"].shift(-1).rolling(k).mean().shift(1-k) - df4["MidPrice"]

# Calculating MPB
df4["MPB"] = np.where(df4["VolDiff"] != 0, (df4["TurnDiff"] / df4["VolDiff"]) / N, np.nan)
df4["MPB"] = df4["MPB"].ffill()
index_mpb = df4.columns.get_loc("MPB")
index_mp = df4.columns.get_loc("MidPrice")
df4.iloc[0, index_mpb] = df4.iloc[0, index_mp]
df4["MPB"] = df4["MPB"] - df4["AvgMP"]


# Calculating VOI and OIR
df4["OIR_(t)"] = (df4["BidVol"] - df4["AskVol"])/(df4["BidVol"] + df4["AskVol"])
dBid = pd.Series(np.where(df4["BidDiff"] < 0, 0, np.where(df4["BidDiff"] == 0, df4["BVolDiff"], df4["BidVol"])), index=df4.index)
dAsk = pd.Series(np.where(df4["AskDiff"] < 0, df4["AskVol"], np.where(df4["AskDiff"] == 0, df4["AVolDiff"], 0)), index=df4.index)
df4["VOI_(t)"] = dBid - dAsk

# Calculating VOI and OIR for given lags
for i in range(1, p+1):
    df4[f"OIR_(t-{i})"] = df4["OIR_(t)"].shift(i)
    df4[f"VOI_(t-{i})"] = df4["VOI_(t)"].shift(i)

df4 = df4.drop(columns=df4.columns[:8])
df4 = df4.drop(columns=["AvgMP"])
df4.dropna()

Time,Price,MidPrice,Spread,MPC,MPB,OIR_(t),VOI_(t),OIR_(t-1),VOI_(t-1),OIR_(t-2),VOI_(t-2),OIR_(t-3),VOI_(t-3),OIR_(t-4),VOI_(t-4),OIR_(t-5),VOI_(t-5)
19457 days 12:46:19.187000,28278.7,28275.35,100.00,3.7225,1.650000,1.000000,1.946,0.233239,0.023,1.000000,-0.253,0.330986,-0.030,0.332356,0.000,0.332356,-0.243
19457 days 12:46:19.437000,28278.7,28278.75,0.10,0.3225,1.650000,0.333333,-1.601,1.000000,1.946,0.233239,0.023,1.000000,-0.253,0.330986,-0.030,0.332356,0.000
19457 days 12:46:19.687000,28278.7,28278.75,0.10,0.4000,-0.050000,0.312049,-0.905,0.333333,-1.601,1.000000,1.946,0.233239,0.023,1.000000,-0.253,0.330986,-0.030
19457 days 12:46:22.687000,28278.7,28278.75,0.10,-0.0075,-14.648956,0.121231,2.343,0.049599,-5.032,0.223125,0.059,-1.000000,-1.151,0.260881,0.685,0.229699,6.370
19457 days 12:46:22.937000,28278.8,28278.75,0.10,-0.0075,73.162987,0.123473,-0.001,0.121231,2.343,0.049599,-5.032,0.223125,0.059,-1.000000,-1.151,0.260881,0.685
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19458 days 15:20:15.687000,30178.5,30178.55,0.10,-4.3850,23.700000,-0.026396,0.000,1.000000,123.745,1.000000,-0.673,1.000000,1.662,-0.198961,1.449,-0.360810,-6.497
19458 days 15:20:15.937000,30178.5,30179.30,100.00,-5.1975,-0.425000,-1.000000,-0.475,-0.026396,0.000,1.000000,123.745,1.000000,-0.673,1.000000,1.662,-0.198961,1.449
19458 days 15:20:16.187000,30178.6,30178.50,0.01,-4.3625,33.299130,1.000000,0.000,-1.000000,-0.475,-0.026396,0.000,1.000000,123.745,1.000000,-0.673,1.000000,1.662
19458 days 15:20:16.437000,30178.6,30178.50,0.01,-4.2425,0.100000,1.000000,1.208,1.000000,0.000,-1.000000,-0.475,-0.026396,0.000,1.000000,123.745,1.000000,-0.673
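
The shift/rolling expression used for MPC above is compact, so here is a check on a toy series that it equals "mean of the next `k` midprices minus the current midprice". The series and `k = 2` are made up for the example; the `explicit` list is just a slow reference computation.

```python
import numpy as np
import pandas as pd

k = 2
mid = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Same expression as in the notebook, applied to the toy series
mpc = mid.shift(-1).rolling(k).mean().shift(1 - k) - mid

# Reference: average of mid[t+1 .. t+k] minus mid[t], NaN where the window runs off the end
explicit = [
    mid.iloc[t + 1 : t + 1 + k].mean() - mid.iloc[t] if t + k < len(mid) else np.nan
    for t in range(len(mid))
]
print(mpc.tolist())
```

The last `k` rows have no complete future window, which is why they come out as NaN and are removed by `dropna()`.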


When calculating MPB, I knew I had to optimise the constant. I had pushed that to a later date as I thought optimisation would depend on model accuracy and that requires more data. However, after looking at the formula again today, I noticed something:
   
When calculating the average Trade Price:
$$\bar{TP}_{t}=\frac{1}{N} \cdot \frac{T_{t}-T_{t-1}}{V_{t}-V_{t-1}}$$


Since we calculate $T_{t}=P_{t} \cdot V_{t}$, suppose the price does not change but the volume does. Then:

$$\bar{TP}_{t}=\frac{1}{N} \cdot \frac{P_{t} \cdot V_{t}-P_{t-1} \cdot V_{t-1}}{V_{t}-V_{t-1}}$$
and $P_{t}=P_{t-1}=P$, therefore,
$$\bar{TP}_{t}=\frac{1}{N} \cdot \frac{P \cdot (V_{t} - V_{t-1})}{V_{t}-V_{t-1}}=\frac{P}{N}$$
And hence,
$$MPB_{t}=\bar{TP}_{t}-\bar{MP}_{t}=\frac{P}{N}-\bar{MP}_{t} \approx \frac{P}{N}-\bar{P}$$

Now, you might have noticed that I used $N=1$ in the code above. This is the reason: if $N$ is anything other than 1, the range of MPB will be all over the place. For example, if $N$ is the average price of BTC, then $\frac{P}{N} \approx 1$ and $MPB_{t} \approx - \bar{P}$. Therefore, $N=1$ makes a lot more sense. This also works well with scaling, as linear models are quite susceptible to the scale of the features and overestimate features that are higher in magnitude. This is why I had scaled the data in the code but then commented it out for later. Now I don't think we need to scale the data, as none of the features are vastly different in scale.

However, consider what happens if $P_{t}=P_{t-1}+1=P+1$. Then:

$$\bar{TP}_{t}=\frac{1}{N} \cdot \frac{P \cdot (V_{t} - V_{t-1}) + V_{t}}{V_{t}-V_{t-1}}=\frac{P}{N}+\frac{V_{t}}{N(V_{t}-V_{t-1})}$$
And hence,
$$MPB_{t}=\bar{TP}_{t}-\bar{MP}_{t}=\frac{P}{N}+\frac{V_{t}}{N(V_{t}-V_{t-1})}-\bar{MP}_{t}$$

In this case it would make sense to divide by a constant. But since the volume is a cumulative sum from the start of the day and therefore time dependent, we can divide it by another time-dependent variable in order to make the MPB time independent and in scale. Also, I feel that in that case we are better off doing this:

$$MPB_{t}=\frac{1}{N} \cdot \left(\frac{T_{t}-T_{t-1}}{V_{t}-V_{t-1}}-\bar{MP}_{t}\right)$$
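
The two cases derived above can be checked numerically. The numbers below are made up (a price of 30000 and cumulative volumes of 50 and 52), with $N=1$ and turnover computed as price times cumulative volume, as in the notebook.

```python
P = 30000.0
v_prev, v_now = 50.0, 52.0   # cumulative volumes at t-1 and t (toy values)
N = 1

# Case 1: price unchanged, so the average trade price collapses to P / N
tp_same = ((P * v_now - P * v_prev) / (v_now - v_prev)) / N

# Case 2: price rises by 1, which adds the term V_t / (N * (V_t - V_{t-1}))
tp_up = (((P + 1) * v_now - P * v_prev) / (v_now - v_prev)) / N
extra = v_now / (N * (v_now - v_prev))

print(tp_same, tp_up, extra)
```

With these toy numbers, a one-unit price move shifts the average trade price by 26 rather than 1, because the extra term scales with the cumulative volume relative to the volume increment.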

I'll read the paper again to figure out a better approach.

Now let's test if our function works:

In [50]:
from BuildLinearData import linear_data
df1 = pd.read_csv("OrderBook_10.csv")
df2 = pd.read_csv("KlineData_10.csv")

In [51]:
linear_data(df1, df2)

Time,Price,MidPrice,Spread,MPC,MPB,OIR_(t),VOI_(t),OIR_(t-1),VOI_(t-1),OIR_(t-2),VOI_(t-2),OIR_(t-3),VOI_(t-3),OIR_(t-4),VOI_(t-4),OIR_(t-5),VOI_(t-5)
19457 days 12:46:19.187000,28278.7,28275.35,100.00,3.7225,1.650000,1.000000,1.946,0.233239,0.023,1.000000,-0.253,0.330986,-0.030,0.332356,0.000,0.332356,-0.243
19457 days 12:46:19.437000,28278.7,28278.75,0.10,0.3225,1.650000,0.333333,-1.601,1.000000,1.946,0.233239,0.023,1.000000,-0.253,0.330986,-0.030,0.332356,0.000
19457 days 12:46:19.687000,28278.7,28278.75,0.10,0.4000,-0.050000,0.312049,-0.905,0.333333,-1.601,1.000000,1.946,0.233239,0.023,1.000000,-0.253,0.330986,-0.030
19457 days 12:46:22.687000,28278.7,28278.75,0.10,-0.0075,-14.648956,0.121231,2.343,0.049599,-5.032,0.223125,0.059,-1.000000,-1.151,0.260881,0.685,0.229699,6.370
19457 days 12:46:22.937000,28278.8,28278.75,0.10,-0.0075,73.162987,0.123473,-0.001,0.121231,2.343,0.049599,-5.032,0.223125,0.059,-1.000000,-1.151,0.260881,0.685
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19458 days 15:20:15.687000,30178.5,30178.55,0.10,-4.3850,23.700000,-0.026396,0.000,1.000000,123.745,1.000000,-0.673,1.000000,1.662,-0.198961,1.449,-0.360810,-6.497
19458 days 15:20:15.937000,30178.5,30179.30,100.00,-5.1975,-0.425000,-1.000000,-0.475,-0.026396,0.000,1.000000,123.745,1.000000,-0.673,1.000000,1.662,-0.198961,1.449
19458 days 15:20:16.187000,30178.6,30178.50,0.01,-4.3625,33.299130,1.000000,0.000,-1.000000,-0.475,-0.026396,0.000,1.000000,123.745,1.000000,-0.673,1.000000,1.662
19458 days 15:20:16.437000,30178.6,30178.50,0.01,-4.2425,0.100000,1.000000,1.208,1.000000,0.000,-1.000000,-0.475,-0.026396,0.000,1.000000,123.745,1.000000,-0.673


Yup it works!

In [53]:
df4["MPB"].max()

114671.75000499695