## Necessary libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

## Control points

In [None]:
# Load the digitized control points from the Gen-6 chart at 75ºF and show the first 5 rows

df_gen6 = pd.read_excel('gen6_75f.xlsx', sheet_name='gen6_75f')
df_gen6.head(5)

In [None]:
# Statistical summary of the control points

df_gen6.describe()

In [None]:
# Plots of the control points

plt.figure(figsize=(18,6))

plt.subplot(121)
plt.plot(df_gen6['sal']/1000, df_gen6['rw'], '.', label="SLB Gen-6")
# plt.xlim(0,300)
# plt.ylim(0,35)
plt.legend()
plt.grid(True)
plt.xlabel("Salinity(kppm)")
plt.ylabel("Rw@75ºF(Ohm.m)")

plt.subplot(122)
plt.plot(df_gen6['sal']/1000, df_gen6['cw'], '.', label="SLB Gen-6")
# plt.xlim(0,300)
# plt.ylim(0,35)
plt.legend()
plt.grid(True)
plt.xlabel("Salinity(kppm)")
plt.ylabel("Estimated Cw@75ºF(S/m)")

plt.show()

Fig.1 - Control points from SLB Gen-6

## Input salinities and conductivities for verification

The maximum salinity of the water is around 263,080 ppm (26.3 wt%), so values well above this limit are unrealistic. For verification purposes only, a DataFrame of salinities from 1 to 300,000 ppm (step of 10) is created; the corresponding resistivity and conductivity columns are added in the sections that follow.

In [None]:
# Water salinities from 1 to 300,000 ppm in steps of 10 (conductivity columns are added later)

df_salrw = pd.DataFrame(np.arange(1,300000,10), columns = ["sal"])

In [None]:
df_salrw.describe()

## Salinity to water resistivity

Because the control points were taken only at 75ºF, all calculations with the verification data were done at the same temperature.

### By Crain (C)

In [None]:
# Resistivity by Crain (rw75c) from salinity

df_salrw ['rw75c'] = ((400000 / 75) / df_salrw ['sal']) ** 0.88

# Corresponding conductivity

df_salrw ['cw75c'] = 1/df_salrw ['rw75c']

### By Bateman-Konen (BK)

In [None]:
# Resistivity by Bateman-Konen, from salinity

df_salrw ['rw75bk'] = 0.0123 + (3647.5 / df_salrw ['sal']**0.955)

# Corresponding conductivity

df_salrw ['cw75bk'] = 1/df_salrw ['rw75bk']

### By Kennedy (K)

Kennedy proposed a quadratic equation to get conductivity (cwk) from salinity in wt% (sal/10000, with sal in ppm):

    cwk = a(sal/10000 - ws0)^2 + b(sal/10000 - ws0) + c0
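
Because the coefficient `a` used in this notebook is negative, the quadratic is a downward-opening parabola with a single maximum. The following is a minimal sketch, not part of the original workflow, that locates that maximum from the coefficient values used in the code cell of this section. The function name `cw_kennedy` and the unit S/m are assumptions made for illustration.

```python
# Kennedy coefficients as used in this notebook (salinity expressed in wt%)
A = -0.02922
B = -0.0364
WS0 = 29.46518957
C0 = 24.30854


def cw_kennedy(sal_ppm):
    """Conductivity at 75ºF from salinity in ppm, using the quadratic above.

    Hypothetical helper name; the unit is assumed to be S/m.
    """
    x = sal_ppm / 10000 - WS0  # ppm -> wt%, then shift by ws0
    return A * x**2 + B * x + C0


# Vertex of the parabola: d(cw)/dx = 2*A*x + B = 0  ->  x = -B / (2*A)
x_peak = -B / (2 * A)
ws_peak = x_peak + WS0                 # salinity (wt%) at maximum conductivity
cw_peak = cw_kennedy(ws_peak * 10000)  # conductivity at that salinity

print(f"peak at {ws_peak:.2f} wt%, cw = {cw_peak:.2f}")
```

With these coefficients the maximum falls at roughly 28.8 wt%, which is above the 26.3 wt% limit mentioned earlier, so within the realistic salinity range the Kennedy conductivity increases monotonically.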

In [None]:
# Conductivity by Kennedy (cwk), from salinity

a = - 0.02922
b = - 0.0364
ws0 = 29.46518957
c0 = 24.30854

df_salrw ['cw75k'] = a*((df_salrw ['sal']/10000) - ws0)**2 + b*((df_salrw ['sal']/10000) - ws0) + c0

# Corresponding resistivity

df_salrw ['rw75k'] = 1/df_salrw ['cw75k']

In [None]:
df_salrw.describe()

## Plots of salinity to estimated water resistivity

In [None]:
plt.figure(figsize=(18,5))

plt.subplot(121)
plt.loglog(df_gen6['sal']/1000, df_gen6['rw'], '.', label="SLB Gen-6")
plt.loglog(df_salrw['sal']/1000, df_salrw['rw75c'], label="Crain")
plt.loglog(df_salrw['sal']/1000, df_salrw['rw75bk'], label="Bateman-Konen")
plt.loglog(df_salrw['sal']/1000, df_salrw['rw75k'], label="Kennedy")
plt.legend()
plt.xlim(0.1,300)
plt.ylim(0.1,100)
plt.grid(True)
plt.xlabel("Salinity(kppm)")
plt.ylabel("Estimated Rw@75ºF(Ohm.m)")
plt.text(0.13, 60, 'A', fontsize=14, weight="bold")

plt.subplot(122)
plt.plot(df_gen6['sal']/1000, df_gen6['cw'], '.', label="SLB Gen-6")
plt.plot(df_salrw['sal']/1000, df_salrw['cw75c'], label="Crain")
plt.plot(df_salrw['sal']/1000, df_salrw['cw75bk'], label="Bateman-Konen")
plt.plot(df_salrw['sal']/1000, df_salrw['cw75k'], label="Kennedy")
plt.xlim(0,300)
plt.ylim(0,35)
plt.legend()
plt.grid(True)
plt.xlabel("Salinity(kppm)")
plt.ylabel("Estimated Cw@75ºF(S/m)")
plt.text(260, 33, 'B', fontsize=14, weight="bold")

plt.show()

Fig.2 - Salinity to estimated water resistivity (A) and conductivity (B)

## Preliminary observation

The log-log plot of salinity vs. estimated water resistivity (figure 2A) does not help much in the verification. Figure 2B, the linear plot of salinity vs. estimated conductivity, is more useful for that purpose. In that plot it is clear that the models start to diverge at salinities above 160,000 ppm. Beyond that point the Kennedy formula is the only one that follows the control points. Let's make another plot to confirm this preliminary observation.
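
The divergence between the three formulas can also be checked numerically, without the control points. The sketch below restates the three expressions from the earlier sections as scalar functions and prints the spread between the largest and smallest estimated conductivity at a few salinities. The function names, the `temp_f` parameter, and the chosen sample salinities are assumptions made for illustration.

```python
def cw_crain(sal_ppm, temp_f=75.0):
    """Conductivity from the Crain resistivity expression (temp_f is a hypothetical parameter)."""
    rw = ((400000 / temp_f) / sal_ppm) ** 0.88
    return 1 / rw


def cw_bateman_konen(sal_ppm):
    """Conductivity from the Bateman-Konen resistivity expression at 75ºF."""
    rw = 0.0123 + 3647.5 / sal_ppm**0.955
    return 1 / rw


def cw_kennedy(sal_ppm):
    """Conductivity from the Kennedy quadratic, with the notebook's coefficients."""
    x = sal_ppm / 10000 - 29.46518957  # ppm -> wt%, shifted by ws0
    return -0.02922 * x**2 - 0.0364 * x + 24.30854


def spread(sal_ppm):
    """Difference between the largest and smallest of the three estimates."""
    values = [cw_crain(sal_ppm), cw_bateman_konen(sal_ppm), cw_kennedy(sal_ppm)]
    return max(values) - min(values)


# Sample salinities picked for illustration only
for sal in (50_000, 100_000, 160_000, 250_000):
    print(f"{sal:>7} ppm  C={cw_crain(sal):6.2f}  BK={cw_bateman_konen(sal):6.2f}  "
          f"K={cw_kennedy(sal):6.2f}  spread={spread(sal):5.2f}")
```

This only shows how far the formulas drift apart from each other; deciding which one is closest to the chart still requires the control points.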

## Residuals with respect to the control points

Using np.polyfit and np.poly1d, a regression equation for the control points can be obtained. The verification salinities can then be plugged into this equation to produce a continuous version of the control points, which serves as the reference for computing the residuals.

In [None]:
# Regression equation of the control points

x = df_gen6['sal']
y = df_gen6['cw']

degree = 2
poly_coeff = np.polyfit(x,y, degree)
poly_eq = np.poly1d(poly_coeff)

# Callable polynomial used below to evaluate the regression
new_y = poly_eq
print(new_y)

In [None]:
# Continuous control points

continous_x = df_salrw ['sal']
continous_y = new_y(continous_x)

plt.figure(figsize=(8,5))
plt.plot(x/1000, y, '.',label="SLB Gen-6")
plt.plot(continous_x/1000, continous_y, label="Regression")
plt.xlabel("Salinity(kppm)")
plt.ylabel("Estimated Cw@75ºF(S/m)")
plt.grid(True)
plt.xlim(0,300)
plt.legend()
plt.show()

Fig.3 - Regression of the control data

In [None]:
plt.figure(figsize=(8,5))
plt.plot(continous_x/1000, continous_y - continous_y, label="Regression from SLB Gen-6")
plt.plot(continous_x/1000, continous_y - df_salrw['cw75c'], label="Crain")
plt.plot(continous_x/1000, continous_y - df_salrw['cw75bk'], label="Bateman-Konen")
plt.plot(continous_x/1000, continous_y - df_salrw['cw75k'], label="Kennedy")
plt.xlabel("Salinity(kppm)")
plt.ylabel("Residual")
plt.grid(True)
plt.xlim(0,300)
plt.legend()
plt.show()

Fig.4 - Residuals of the three formulas with respect to the control data

## Final observation

The residuals of a formula with respect to the control data (the regression in this case) should be distributed around zero. Figure 4 confirms what is shown in figure 2B: the Kennedy formula is the best of the three for converting salinity to conductivity. The other two formulas show strong trends away from zero, especially between 140 and 160 kppm of salinity.
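
The visual comparison of residuals could be complemented by a single number per formula, such as the root-mean-square error (RMSE) against the reference. The sketch below is one possible helper, written in plain Python so it stands alone; the name `rmse` and the toy inputs are assumptions for illustration and are not notebook data.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equally long sequences (hypothetical helper)."""
    pairs = list(zip(reference, estimate))
    if not pairs:
        raise ValueError("rmse needs at least one pair of values")
    return math.sqrt(sum((r - e) ** 2 for r, e in pairs) / len(pairs))


# Toy numbers for illustration only, not taken from the control points
identical = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # no difference at all
offset = rmse([0.0, 0.0], [3.0, 4.0])               # sqrt((9 + 16) / 2)

print(identical, offset)
```

In the notebook, a call such as `rmse(continous_y, df_salrw['cw75k'])` would give one value for the Kennedy formula, and likewise for the other two columns. The `mean_squared_error` function imported at the top returns the squared version of the same quantity.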