# Data Import

This notebook shows how to import data for use with _wtie_, the automatic seismic-to-well tie package. Installation instructions can be found in the README.md file.

To use the package with your own data, you must first implement small Python utilities that load the data from its original format and store it in _wtie_'s internal format. Example functions can be found in the file _wtie/utils/datasets/tutorial.py_. The following sections show how to do so step by step.


#### Load packages


In [None]:
import os
import sys
from pathlib import Path
from pprint import pprint

import lasio
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import segyio

project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
    sys.path.append(project_root)

from wtie import grid, viz
from wtie.processing.logs import interpolate_nans

# uncomment if your browser supports it
# %matplotlib notebook

## Dataset

In this tutorial we work with data from well 15/9-19 A of the open [Volve dataset](https://www.equinor.com/en/what-we-do/digitalisation-in-our-dna/volve-field-data-village-download.html). The data are stored in the folder **data/tutorial**.


In [None]:
# data path
folder = Path("../data/tutorial/Volve")
trajectory_path = folder / "volve_path_15_9-19_A.txt"
table_path = folder / "volve_checkshot_15_9_19A.txt"
logs_path = folder / "volve_159-19A_LFP.las"
seis_path = folder / "volve_15_9_19A_gather.sgy"

assert folder.exists()

## Data objects

The package implements a series of data objects that must be used to ensure compatibility with its methods. All classes are defined in the _grid.py_ file (`from wtie import grid`). Each class follows a similar logic and must be provided with the following information:

- the _data values_ (e.g. seismic or log amplitudes).
- the _data basis_ (e.g. from 1.2 to 1.9 s with a 0.001 s sampling rate).
- the _basis type_ (e.g. two-way time or measured depth).
  The supported basis types are listed in `grid.EXISTING_BASIS_TYPES`. The basis must use the units given there (i.e. meters, seconds and degrees).
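The three ingredients listed above can be illustrated without _wtie_ itself. The following is a minimal NumPy sketch, using made-up values rather than the Volve data, that builds the example basis from 1.2 to 1.9 s at a 0.001 s sampling rate and checks that it is regular and matches the values in length. The variable names are illustrative assumptions, not part of the package.

```python
import numpy as np

# Hypothetical example values, not taken from the Volve data.
start, stop, dt = 1.2, 1.9, 0.001  # seconds

# Derive the basis from an integer sample count to avoid floating-point drift.
n_samples = int(round((stop - start) / dt)) + 1
basis = start + dt * np.arange(n_samples)

# Placeholder amplitudes, one per basis sample.
values = np.zeros(n_samples)

# The basis type is a key such as "twt" (two-way time, in seconds).
basis_type = "twt"

# Sanity checks: same length, and a constant sampling step.
same_length = values.shape == basis.shape
is_regular = bool(np.allclose(np.diff(basis), dt))
print(n_samples, same_length, is_regular)
```

Arrays prepared this way can then be passed to the `grid` classes in the order values, basis, basis type, as the cells in the rest of this notebook do.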


In [None]:
pprint(grid.EXISTING_BASIS_TYPES)

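↑ These are the basis types (coordinate axes) commonly used in geophysics and well logging, together with their units:

- **`'angle': 'Angle [°]'`**: **angle**, in degrees (°). Typically used for prestack seismic data, where it denotes the incidence angle of the seismic wave.
- **`'md': 'MD (kb) [m]'`**: **measured depth**, in meters (m). The depth measured along the well trajectory from the wellhead reference point (usually the Kelly Bushing, kb).
- **`'tlag': 'Lag [s]'`**: **time lag**, in seconds (s). Typically the time shift between two time series (e.g. a synthetic trace and the real seismic data).
- **`'tvdkb': 'TVDKB [m]'`**: **true vertical depth from Kelly Bushing**, in meters (m). The vertical distance from a point in the well to the wellhead reference point (KB).
- **`'tvdss': 'TVDSS (MSL) [m]'`**: **true vertical depth sub-sea**, in meters (m). The vertical distance from a point in the well to mean sea level (MSL).
- **`'twt': 'TWT [s]'`**: **two-way time**, in seconds (s). The time a seismic wave takes to travel from the surface down to a subsurface reflector and back to the surface.
- **`'zlag': 'Lag [m]'`**: **depth lag**, in meters (m). The depth difference or shift between two depth series.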


### Import well logs

We use the [lasio](https://github.com/kinverarity1/lasio) library to read the LAS file, convert the data to a pandas [dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), and store it in the custom objects `grid.Log` and `grid.LogSet`.


In [None]:
las_logs = lasio.read(logs_path)
las_logs = las_logs.df()
las_logs.head(4)

In [None]:
basis = las_logs["LFP_VP"].index
vp = las_logs["LFP_VP"].values

fig, ax = plt.subplots()
ax.plot(basis, vp)  # type: ignore

The logs are recorded in _measured depth_. According to `grid.EXISTING_BASIS_TYPES`, the `basis_type` argument must therefore be set to `'md'`.


In [None]:
Vp = grid.Log(las_logs["LFP_VP"].values, las_logs["LFP_VP"].index, "md", name="Vp")
viz.plot_trace(Vp);

In [None]:
def import_logs(file_path: str) -> grid.LogSet:
    file_path = Path(file_path)  # type: ignore
    assert file_path.name == "volve_159-19A_LFP.las"  # type: ignore

    # Read file
    las_logs = lasio.read(file_path)
    las_logs = las_logs.df()

    # Select some logs. The file contains more; we only load the following.
    # The set must at least contain the keys 'Vp' for acoustic velocity
    # and 'Rho' for bulk density. 'Vs', for shear velocity, must also
    # be imported if one wishes to perform a prestack well tie.
    # Other logs are optional.
    log_dict = {}

    log_dict["Vp"] = grid.Log(las_logs["LFP_VP"].values, las_logs["LFP_VP"].index, "md", name="Vp")
    log_dict["Vs"] = grid.Log(las_logs["LFP_VS"].values, las_logs["LFP_VS"].index, "md", name="Vs")

    # Density contains some NaNs, I fill them with linear interpolation.
    log_dict["Rho"] = grid.Log(
        interpolate_nans(las_logs["LFP_RHOB"].values),  # type: ignore
        las_logs["LFP_RHOB"].index,
        "md",
        name="Rho",
    )

    log_dict["GR"] = grid.Log(interpolate_nans(las_logs["LFP_GR"].values), las_logs["LFP_GR"].index, "md")  # type: ignore
    log_dict["Cali"] = grid.Log(las_logs["LFP_CALI"].values, las_logs["LFP_CALI"].index, "md")

    return grid.LogSet(log_dict)

In [None]:
logset_md = import_logs(logs_path)  # md is for measured depth  # type: ignore
print(logset_md)

In [None]:
viz.plot_logset(logset_md);
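
The import function above fills gaps in the density and gamma-ray logs with `interpolate_nans`. As a rough illustration of what linear interpolation of NaNs amounts to, here is a minimal NumPy sketch. `fill_nans_linear` is a hypothetical helper written for this example and is not the implementation used in `wtie.processing.logs`; it interpolates over the sample index and holds the nearest valid value at the edges.

```python
import numpy as np


def fill_nans_linear(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fill NaNs by linear interpolation over sample index."""
    values = np.asarray(values, dtype=float)
    is_nan = np.isnan(values)
    if is_nan.all():
        raise ValueError("cannot interpolate an all-NaN array")

    idx = np.arange(values.size)
    filled = values.copy()
    # np.interp evaluates the valid samples at the missing positions;
    # outside the valid range it repeats the first/last valid value.
    filled[is_nan] = np.interp(idx[is_nan], idx[~is_nan], values[~is_nan])
    return filled


# Made-up density values with gaps (not from the Volve logs).
rho = np.array([2.1, np.nan, 2.3, np.nan, np.nan, 2.6])
rho_filled = fill_nans_linear(rho)
print(rho_filled)
```

Interpolating over the sample index is equivalent to interpolating over depth only when the log is regularly sampled; for irregular sampling the depth basis should be used as the x-axis instead.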

### Import Seismic

We use the package [segyio](https://github.com/equinor/segyio) to read the SEG-Y file and store the data in the `grid.Seismic` and `grid.PreStackSeismic` objects. The provided SEG-Y file is a composite angle gather extracted along the well path.


In [None]:
with segyio.open(seis_path, "r") as f:
    print(f.samples.size)  # number of time samples  # type: ignore
    print(f.ilines)
    print(f.xlines)
    print(f.offsets)  # these are actually angles, from 0 to 45 degrees

In [None]:
def import_seismic(file_path: str) -> grid.Seismic:
    file_path = Path(file_path)  # type: ignore
    assert file_path.name == "volve_15_9_19A_gather.sgy"  # type: ignore

    with segyio.open(file_path, "r") as f:
        _twt = f.samples / 1000  # two-way-time in seconds  # type: ignore
        _seis = np.squeeze(segyio.tools.cube(f))  # 2D (angles, samples)

    # stacking the first 8 angles
    _seis = np.sum(_seis[:8, :], axis=0)

    return grid.Seismic(_seis, _twt, "twt", name="Real seismic")


def import_prestack_seismic(file_path: str) -> grid.PreStackSeismic:
    """For simplicity, only angle gathers are allowed."""
    file_path = Path(file_path)  # type: ignore
    assert file_path.name == "volve_15_9_19A_gather.sgy"  # type: ignore

    with segyio.open(file_path, "r") as f:
        _twt = f.samples / 1000  # type: ignore
        _seis = np.squeeze(segyio.tools.cube(f))
        _angles = f.offsets

    seismic = []
    for i, theta in enumerate(_angles):  # type: ignore
        seismic.append(grid.Seismic(_seis[i, :], _twt, "twt", theta=theta))

    return grid.PreStackSeismic(seismic, name="Real gather")  # type: ignore

In [None]:
seismic = import_seismic(seis_path)  # type: ignore
gather = import_prestack_seismic(seis_path)  # type: ignore
viz.plot_trace(seismic)
viz.plot_prestack_trace_as_pixels(gather, figsize=(7, 9));
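
The two array operations inside `import_seismic` (converting the sample axis from milliseconds to seconds and stacking the first 8 angles) can be tried on synthetic data without a SEG-Y file. The sketch below uses a random gather with an assumed 4 ms sampling interval and 12 angles; these numbers are made up for illustration. It also shows that summing and averaging the traces differ only by a constant scale factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic gather with shape (angles, samples); sizes are arbitrary.
n_angles, n_samples = 12, 50
gather = rng.normal(size=(n_angles, n_samples))

# Sample axis in milliseconds (assumed 4 ms interval), converted to seconds.
samples_ms = np.arange(n_samples) * 4.0
twt_s = samples_ms / 1000

# Partial stack over the first 8 angles, as a sum and as a mean.
n_stack = 8
stack_sum = np.sum(gather[:n_stack, :], axis=0)
stack_mean = np.mean(gather[:n_stack, :], axis=0)

print(stack_sum.shape, twt_s[:3])
```

A mean keeps the stacked amplitudes on the same scale as the individual traces, which can make plots easier to compare; whether that matters depends on how amplitudes are used downstream.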

### Well trajectory

We store the well trajectory in the `grid.WellPath` object.


In [None]:
print(grid.WellPath.__doc__)
print(grid.WellPath.__init__.__doc__)

In [None]:
def import_well_path(file_path: str) -> grid.WellPath:
    file_path = Path(file_path)  # type: ignore
    assert file_path.name == "volve_path_15_9-19_A.txt"  # type: ignore

    _wp = pd.read_csv(file_path, header=None, delimiter=r"\s+", names=("MD (kb) [m]", "Inclination", "Azimuth"))

    kb = 25  # meters

    _tvd = grid.WellPath.get_tvdkb_from_inclination(
        _wp.loc[:, "MD (kb) [m]"].values,  # type: ignore
        _wp.loc[:, "Inclination"].values[:-1],  # type: ignore
    )
    _tvd = grid.WellPath.tvdkb_to_tvdss(_tvd, kb)

    return grid.WellPath(md=_wp.loc[:, "MD (kb) [m]"].values, tvdss=_tvd, kb=kb)  # type: ignore

In [None]:
wellpath = import_well_path(trajectory_path)  # type: ignore
viz.plot_wellpath(wellpath);
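
To make the depth conversions in `import_well_path` more concrete, the sketch below computes true vertical depth from measured depth and inclination with a simple tangential method, then shifts it from the Kelly Bushing datum to mean sea level. This is an assumption about the general idea only: `tvd_from_inclination` is a hypothetical helper, and `grid.WellPath.get_tvdkb_from_inclination` may use a different scheme. The survey values are made up.

```python
import numpy as np


def tvd_from_inclination(md, inclination_deg):
    """Hypothetical helper: integrate dMD * cos(inclination) per segment."""
    md = np.asarray(md, dtype=float)
    incl = np.radians(np.asarray(inclination_deg, dtype=float))

    # One inclination per segment: use the value at the top of each segment.
    d_tvd = np.diff(md) * np.cos(incl[:-1])

    # Assume the first station lies vertically below the reference point,
    # so that TVD equals MD there.
    return np.concatenate(([md[0]], md[0] + np.cumsum(d_tvd)))


# Made-up survey: vertical for 200 m, then deviated at 60 degrees.
md = np.array([0.0, 100.0, 200.0, 300.0])
incl = np.array([0.0, 0.0, 60.0, 60.0])

tvdkb = tvd_from_inclination(md, incl)

# With the Kelly Bushing kb meters above mean sea level,
# depth below sea level is depth below KB minus kb.
kb = 25.0
tvdss = tvdkb - kb
print(tvdkb, tvdss)
```

In the deviated segment, 100 m of measured depth adds only 50 m of vertical depth, which is why logs recorded in measured depth need the well path before they can be compared with a vertical time-depth table.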

### Time-Depth relation table

We store the time-depth relation table in a `grid.TimeDepthTable` object.


In [None]:
print(grid.TimeDepthTable.__doc__)
print(grid.TimeDepthTable.__init__.__doc__)

In [None]:
def import_time_depth_table(file_path: str) -> grid.TimeDepthTable:
    file_path = Path(file_path)  # type: ignore
    assert file_path.name == "volve_checkshot_15_9_19A.txt"  # type: ignore

    _td = pd.read_csv(
        file_path, header=None, delimiter=r"\s+", skiprows=[0], names=("Curve Name", "TVDBTDD", "TVDKB", "TVDSS", "TWT")
    )

    _twt = _td.loc[:, "TWT"].values / 1000  # seconds  # type: ignore
    _tvdss = np.abs(_td.loc[:, "TVDSS"].values)  # meters  # type: ignore

    return grid.TimeDepthTable(twt=_twt, tvdss=_tvdss)

In [None]:
td_table = import_time_depth_table(table_path)  # type: ignore
viz.plot_td_table(td_table);
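
A time-depth table is essentially a pair of monotonic arrays, so it can be explored with plain NumPy. The sketch below applies the same unit handling as `import_time_depth_table` (milliseconds to seconds, absolute value of the depths) to a made-up three-point table, then uses `np.interp` to look up the two-way time at a given depth and derives interval velocities. The negative depth convention and the helper name `depth_to_time` are assumptions for this example, not part of _wtie_.

```python
import numpy as np

# Made-up checkshot values: TWT in milliseconds, TVDSS stored as negative-down.
twt_ms = np.array([0.0, 1000.0, 2000.0])
tvdss_raw = np.array([0.0, -1000.0, -2500.0])

twt_s = twt_ms / 1000      # seconds
tvdss = np.abs(tvdss_raw)  # meters, positive downward


def depth_to_time(depth_m):
    """Hypothetical helper: linear lookup of two-way time at a TVDSS depth."""
    return np.interp(depth_m, tvdss, twt_s)


t = depth_to_time(1750.0)

# Interval velocity between table entries; the factor 2 accounts for
# the two-way travel path.
v_int = 2 * np.diff(tvdss) / np.diff(twt_s)
print(t, v_int)
```

`np.interp` requires the depth array to be increasing, which is one practical reason to convert the depths to positive values before building the table.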