# Exploring the FORCE 2020 Well Log Challenge - Part 2
## Map plots 

**Brendon Hall, Enthought**

bhall@enthought.com

Welcome back!  In the [first notebook](https://github.com/brendonhall/FORCE-2020-Lithology/blob/master/notebooks/01-Log-Plot-MPL.ipynb), I showed how to use `matplotlib` to display any well curves for any well in the data supplied for the [2020 FORCE Machine Learning Contest](https://xeek.ai/challenges/force-well-logs/overview). In this notebook, I'm going to use `plotly` to display the well locations on a map. We will create an interactive map that looks like this:

![map of wells](https://github.com/brendonhall/FORCE-2020-Lithology/blob/master/notebooks/images/map_view.png?raw=1)

This will help build an intuition for how the wells are related spatially.  Perhaps looking at the data in this way will make it easier to apply geologic constraints. Wells that are closer together might have properties that are more correlated with each other, and this could be a useful fact to exploit when building machine learning models to predict log curves and lithofacies. 

Please get in touch if you have any questions.  You can also join in the conversation on [Software Underground's slack](https://softwareunderground.org/slack) in the **#force_2020_ml_contest** channel.

Feel free to use this code, hack it, adapt it for your own needs.

The well log data is licensed under the [Norwegian License for Open Government Data (NLOD) 2.0](https://data.norge.no/nlod/en/2.0/).
The well log labels that are included are provided by FORCE 2020 Machine Learning Contest under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).

In [1]:
!git clone https://github.com/brendonhall/FORCE-2020-Lithology.git

Cloning into 'FORCE-2020-Lithology'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (33/33), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 33 (delta 9), reused 24 (delta 4), pack-reused 0
Unpacking objects: 100% (33/33), done.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os.path

import numpy as np
import pandas as pd

import plotly.express as px

pd.options.display.max_rows = 8

In [None]:
# change this to the location of the training data on your disk if
# you have already downloaded it
local_train_csv = '/content/train.csv'
train_df = pd.read_csv(local_train_csv, sep=';')
    
train_well_names = train_df['WELL'].unique()

In [None]:
train_df

Unnamed: 0,WELL,DEPTH_MD,X_LOC,Y_LOC,Z_LOC,GROUP,FORMATION,CALI,RSHA,RMED,RDEP,RHOB,GR,SGR,NPHI,PEF,DTC,SP,BS,ROP,DTS,DCAL,DRHO,MUDWEIGHT,RMIC,ROPA,RXO,FORCE_2020_LITHOFACIES_LITHOLOGY,FORCE_2020_LITHOFACIES_CONFIDENCE
0,15/9-13,494.5280,437641.96875,6470972.5,-469.501831,NORDLAND GP.,,19.480835,,1.611410,1.798681,1.884186,80.200851,,,20.915468,161.131180,24.612379,,34.636410,,,-0.574928,,,,,65000,1.0
1,15/9-13,494.6800,437641.96875,6470972.5,-469.653809,NORDLAND GP.,,19.468800,,1.618070,1.795641,1.889794,79.262886,,,19.383013,160.603470,23.895531,,34.636410,,,-0.570188,,,,,65000,1.0
2,15/9-13,494.8320,437641.96875,6470972.5,-469.805786,NORDLAND GP.,,19.468800,,1.626459,1.800733,1.896523,74.821999,,,22.591518,160.173615,23.916357,,34.779556,,,-0.574245,,,,,65000,1.0
3,15/9-13,494.9840,437641.96875,6470972.5,-469.957794,NORDLAND GP.,,19.459282,,1.621594,1.801517,1.891913,72.878922,,,32.191910,160.149429,23.793688,,39.965164,,,-0.586315,,,,,65000,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1170507,7/1-2 S,3169.4644,,,,VESTLAND GP.,Bryne Fm.,8.379244,,,,2.537613,75.363937,,,7.019858,,,8.5,28.024338,,,-0.007600,,,26.840818,,65030,2.0
1170508,7/1-2 S,3169.6164,,,,VESTLAND GP.,Bryne Fm.,8.350248,,,,2.491860,66.452843,,,9.049782,,,8.5,28.091282,,,-0.018297,,,27.007942,,65030,2.0
1170509,7/1-2 S,3169.7684,,,,VESTLAND GP.,Bryne Fm.,8.313779,,,,2.447539,55.784817,,,8.903917,,,8.5,28.019775,,,-0.011438,,,27.175179,,65030,2.0
1170510,7/1-2 S,3169.9204,,,,VESTLAND GP.,Bryne Fm.,8.294910,,,,2.430716,48.432129,,,9.150043,,,8.5,25.985943,,,-0.011398,,,27.342442,,65030,2.0


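We will also look at the test wells in this tutorial, and see where they are located relative to the training data. As above, change the path to the location of the test data on your disk.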

In [None]:
local_test_csv = '/content/test.csv'

test_df = pd.read_csv(local_test_csv, sep=';')

test_well_names = test_df['WELL'].unique()

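For this tutorial, we only need the unique well names from the two datasets. Let's combine the two arrays of names and create a dataframe with just this column. We'll need it later to merge with the well meta data.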

In [None]:
well_names = np.concatenate((train_well_names, test_well_names))
# need this array in a DataFrame for a merge operation below
well_names_df = pd.DataFrame({'WELL':well_names})

well_names_df

Unnamed: 0,WELL
0,15/9-13
1,15/9-15
2,15/9-17
3,16/1-2
...,...
104,34/3-3 A
105,34/6-1 S
106,35/6-2 S
107,35/9-8


There are 108 wells in the combined dataset (98 training wells and 10 test wells).
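The total is only the sum of the two counts if no well appears in both files. Here is a minimal sketch of that sanity check, using made-up stand-ins (`train_names`, `test_names`) rather than the real `train_well_names` and `test_well_names` arrays:

```python
import numpy as np

# Toy stand-ins for train_well_names and test_well_names (made-up subsets).
train_names = np.array(['15/9-13', '15/9-15', '16/1-2'])
test_names = np.array(['34/3-3 A', '35/9-8'])

# Wells that appear in both arrays; ideally this is empty.
overlap = np.intersect1d(train_names, test_names)

# Number of distinct names after combining the two arrays.
combined = np.concatenate((train_names, test_names))
n_unique = np.unique(combined).size

print(f'{n_unique} unique wells, {overlap.size} shared between train and test')
```

If `overlap` is not empty, the combined dataframe would contain duplicate rows for those wells, and the later merges would repeat them.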

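In order to plot the well locations, we need to know where they are. The training dataset contains some trajectory information in UTM coordinates (easting/northing). We could take the first location as the location of the well. However, not all of the wells in the dataset have valid locations. Let's look at some "meta" data from the [NPD Factpages summary page](https://factpages.npd.no/en/wellbore/tableview/exploration/all). Click "Export CSV" above the first row of the table to save the data to disk.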

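This data contains lots of information about the well itself, such as the operator, type, field and formation, as well as the location. Two UTM well head coordinates are given, along with latitude and longitude. We'll use the lat/lon and a small subset of the data to display with the wells in the map plot. Look at the columns on the NPD website and think about what they mean. There might be some useful information there that we could use in feature engineering.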
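To see why the trajectory coordinates alone are not enough, here is a rough sketch of the "take the first location" idea. It uses a toy dataframe (`toy_df`, with made-up wells `A` and `B`) in place of `train_df`, and assumes only the `WELL`, `X_LOC` and `Y_LOC` columns shown earlier:

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_df: well 'B' has no valid coordinates at all.
toy_df = pd.DataFrame({
    'WELL':  ['A', 'A', 'A', 'B', 'B'],
    'X_LOC': [np.nan, 437641.9, 437642.1, np.nan, np.nan],
    'Y_LOC': [np.nan, 6470972.5, 6470972.6, np.nan, np.nan],
})

# groupby(...).first() returns the first non-null value of each column per well
first_loc = toy_df.groupby('WELL')[['X_LOC', 'Y_LOC']].first()

# wells for which no sample has a usable easting or northing
missing = first_loc[first_loc.isna().any(axis=1)].index.tolist()

print(first_loc)
print('Wells without a valid location:', missing)
```

Note that `first()` picks the first non-null value column by column, so in messier data the easting and northing could come from different depth samples. Wells like `B` are the reason for turning to the NPD meta data.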

In [None]:
# location of the meta data csv on your machine 
well_meta_csv = '/content/numpy-private/挪威测井 相关数据.csv'

if not os.path.isfile(well_meta_csv):
    # load from s3
    print('Loading meta data from S3.')
    s3_meta_csv = 's3://zarr-depot/wells/FORCE: Machine Predicted Lithology/wellbore_exploration_all.csv'
    well_meta_df = pd.read_csv(s3_meta_csv)
    well_meta_df.to_csv(well_meta_csv, index=False)
    
else:
    # load from disk
    print('Loading meta data from disk.')
    well_meta_df = pd.read_csv(well_meta_csv)

# rename the columns so they are more readable
well_meta_df.rename(columns={'wlbWellboreName': 'WELL',
                             'wlbWell': 'WELL_HEAD',
                             'wlbNsDecDeg': 'lat',
                             'wlbEwDesDeg': 'lon',
                             'wlbDrillingOperator': 'Drilling Operator',
                             'wlbPurposePlanned': 'Purpose',
                             'wlbCompletionYear': 'Completion Year',
                             'wlbFormationAtTd': 'Formation'}, inplace=True)

# get df of WELL_HEAD and the lat long
well_locations_df = well_meta_df[['WELL_HEAD', 'lat', 'lon']].drop_duplicates(subset=['WELL_HEAD'])

# we only need a few of the columns for the map plot
well_meta_df = well_meta_df[['WELL','Drilling Operator',
                            'Purpose','Completion Year', 'Formation']]

well_locations_df

Loading meta data from disk.


Unnamed: 0,WELL_HEAD,lat,lon
0,1/2-1,56.887519,2.476583
1,1/2-2,56.992222,2.496572
2,1/3-1,56.855833,2.851389
3,1/3-2,56.936111,2.750000
...,...,...,...
1949,7325/1-1,73.913528,25.116714
1950,7325/4-1,73.649319,25.178261
1951,7335/3-1,73.997183,35.837147
1952,7435/12-1,74.071725,35.808628


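As you can see, there are a lot of wells in the meta data file (almost 2000). We need to match the wells in our dataset against these to extract the corresponding meta data. Wells are named using the scheme outlined in the [NPD guidelines for designation of wells and wellbores](https://www.npd.no/globalassets/1-npd/regelverk/tematiske-veiledninger/eng/guidelines-for-designation-of-wells-and-wellbores.pdf). Wells are identified using the following format:

[quadrant number]/[block number]-[wellbore ID] [sidetrack etc.]

For example, there are two wells in the dataset named

`34/5-1 A` and `34/5-1 S`

These wells share the same well head and represent different sidetracks. The meta file doesn't contain information for every sidetrack represented in our training data. So let's extract just the well head prefix from the well name, and use that to get the location of the well.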
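To make the naming scheme concrete, here is a small sketch that splits a name into its parts with a regular expression. The pattern and the helper `parse_well_name` are my own illustration, not part of the NPD guidelines, and they only cover the simple exploration-well form shown above:

```python
import re

# Hypothetical pattern for names such as '34/5-1 A'; it only covers the
# simple [quadrant]/[block]-[wellbore ID] [sidetrack] form described above.
WELL_NAME_RE = re.compile(
    r'^(?P<quadrant>\d+)/(?P<block>\d+)-(?P<wellbore>\S+)(?:\s+(?P<suffix>.+))?$'
)

def parse_well_name(name):
    """Split a well name into its parts, or return None if it does not match."""
    match = WELL_NAME_RE.match(name.strip())
    if match is None:
        return None
    parts = match.groupdict()
    # the well head is everything before the optional sidetrack suffix
    parts['well_head'] = f"{parts['quadrant']}/{parts['block']}-{parts['wellbore']}"
    return parts

for name in ['34/5-1 A', '34/5-1 S', '15/9-13']:
    print(name, '->', parse_well_name(name))
```

Both `34/5-1 A` and `34/5-1 S` map to the well head `34/5-1`, which is the key we need for the location lookup. For this notebook, splitting on whitespace gives the same well head with less machinery.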

In [None]:
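def base_well_name(row):
    """Return the well head name: the well name without any sidetrack suffix."""
    well_name = row['WELL']
    # '34/5-1 A' -> '34/5-1'; names without a suffix are returned unchanged
    return well_name.split()[0]

# apply the function to extract the WELL_HEAD base name from the well
well_names_df['WELL_HEAD'] = well_names_df.apply(base_well_name, axis=1)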

# merge with location data to get lat/lon
locations_df = well_names_df.merge(well_locations_df, how='inner', on='WELL_HEAD')
# merge with the meta data to get other data
locations_df = locations_df.merge(well_meta_df, how='left', on='WELL')
locations_df 

Unnamed: 0,WELL,WELL_HEAD,lat,lon,Drilling Operator,Purpose,Completion Year,Formation
0,15/9-13,15/9-13,58.373878,1.934128,Den norske stats oljeselskap a.s,APPRAISAL,1982.0,ZECHSTEIN GP
1,15/9-15,15/9-15,58.302069,1.922131,Den norske stats oljeselskap a.s,WILDCAT,1982.0,SKAGERRAK FM
2,15/9-17,15/9-17,58.445608,1.948217,Den norske stats oljeselskap a.s,WILDCAT,1983.0,SMITH BANK FM
3,16/1-2,16/1-2,58.935894,2.222239,Esso Exploration and Production Norway A/S,APPRAISAL,1976.0,BASEMENT
...,...,...,...,...,...,...,...,...
104,34/3-3 A,34/3-3,61.795136,2.717883,BG Norge AS,APPRAISAL,2012.0,BURTON FM
105,34/6-1 S,34/6-1,61.582317,2.685472,Norske Conoco A/S,WILDCAT,2002.0,LUNDE FM
106,35/6-2 S,35/6-2,61.533606,3.911311,StatoilHydro Petroleum AS,WILDCAT,2009.0,NO FORMAL NAME
107,35/9-8,35/9-8,61.285269,3.675594,Wintershall Norge AS,APPRAISAL,2013.0,RANNOCH FM


We have data for all 108 wells in the dataset.
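An inner merge silently drops any well whose well head is missing from the meta data, so it is worth checking that nothing was lost. One way to do that, sketched here on toy dataframes (`names` and `locations` are made-up stand-ins for `well_names_df` and `well_locations_df`), is a left merge with pandas' `indicator=True` option:

```python
import pandas as pd

# Toy stand-ins for well_names_df and well_locations_df (values are made up).
names = pd.DataFrame({'WELL': ['34/5-1 A', '34/5-1 S', '99/9-9'],
                      'WELL_HEAD': ['34/5-1', '34/5-1', '99/9-9']})
locations = pd.DataFrame({'WELL_HEAD': ['34/5-1'], 'lat': [61.0], 'lon': [2.5]})

# A left merge keeps every well; the _merge column records whether a match was found.
checked = names.merge(locations, how='left', on='WELL_HEAD', indicator=True)

# wells that found no partner in the location table
unmatched = checked.loc[checked['_merge'] == 'left_only', 'WELL'].tolist()
print('Wells without a location:', unmatched)
```

An empty `unmatched` list on the real data would confirm that every well received a latitude and longitude.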

Now, let's add a column to the well location dataframe that indicates whether the well is a training well or comes from the test dataset.

In [None]:
locations_df.loc[locations_df['WELL'].isin(train_well_names), 'Dataset'] = 'Train'
locations_df.loc[locations_df['WELL'].isin(test_well_names), 'Dataset'] = 'Test'

# save the location and meta data for future use.
locations_df.to_csv('force_2020_meta.csv', index=False)

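Finally, we can use this location data to plot the well locations on a map. I'm going to use Plotly's [`scatter_mapbox`](https://plotly.github.io/plotly.py-docs/generated/plotly.express.scatter_mapbox.html) to accomplish this. There are a lot of Python-based mapping options out there, but in the next notebook I'll use Plotly's Dash framework to build an interactive dashboard where we can put this plot.

For this plot, we'll use Plotly's interface to [mapbox](https://www.mapbox.com/) to view the well locations using latitude and longitude coordinates. This uses the "open-street-map" style, so we don't need an API key. Mapbox provides a number of [cool styles](https://plotly.com/python/mapbox-layers/) to change the look of the plot. Some of these require you to [sign up](https://docs.mapbox.com/help/how-mapbox-works/access-tokens/) for an API key.

The wells are shown as colored dots. Wells in the training dataset are blue, and wells in the test set are red. The meta data associated with each well is displayed when you hover over the well location. This is easily configured using the `hover_data` argument of the `scatter_mapbox` function below.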

In [None]:
fig = px.scatter_mapbox(locations_df, lat="lat", lon="lon",
                        color='Dataset', 
                        zoom=5, height=600,
                        hover_data={'WELL': True,
                                    'lat': False,
                                    'lon': False,
                                    'Dataset': False,
                                    'Drilling Operator': True,
                                    'Purpose': True,
                                    'Completion Year': True,
                                    'Formation': True}
                        )
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":200,"t":20,"l":200,"b":0})
fig.show()

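There appear to be at least three clusters of points in the contest data. There are two clusters in the north and an elongated cluster to the south. Each cluster has at least a few test wells. Perhaps a clustering algorithm (like k-means) could be used to assign a spatial cluster ID, which might be a useful feature for machine learning? Something to test in the near future.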
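As a first experiment with that idea, here is a rough sketch of k-means written in plain NumPy. The `kmeans` helper, the farthest-point initialisation and the coordinates are all my own illustration (the points are made up, not taken from `locations_df`), and it treats lat/lon degrees as plain Euclidean coordinates, which is only an approximation:

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point initialisation."""
    points = np.asarray(points, dtype=float)

    # start from the first point, then repeatedly add the point that is
    # farthest from all centres chosen so far
    centres = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(dists)])
    centres = np.array(centres)

    for _ in range(n_iter):
        # assign each point to its nearest centre
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)

        # move each centre to the mean of its points (keep it if the cluster is empty)
        new_centres = centres.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centres[j] = points[labels == j].mean(axis=0)

        # stop once the centres no longer move
        if np.allclose(new_centres, centres):
            break
        centres = new_centres

    return labels, centres

# made-up (lat, lon) pairs forming three loose groups
latlon = np.array([[58.3, 1.9], [58.4, 1.95], [58.9, 2.2],
                   [61.5, 2.7], [61.6, 2.6],
                   [61.3, 3.7], [61.5, 3.9]])

labels, centres = kmeans(latlon, k=3)
print(labels)
```

On the real data, the resulting labels could be stored as a new column (for example a hypothetical `Cluster` column in `locations_df`) and merged back onto the log data as a categorical feature. A library implementation such as scikit-learn's `KMeans` would be the more usual choice in practice.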

In the next notebook we'll build on this map plot, and add some interactivity using Plotly's Dash framework.  I'll show how to build a tool to select and visualize groups of wells not only based on location, but also what curves each well possesses.

This notebook is open source content. Text is CC-BY-4.0, code is [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

### References

Bormann P., Aursand P., Dilib F., Dischington P., Manral S. (2020) 2020 FORCE Machine Learning Contest. https://github.com/bolgebrygg/Force-2020-Machine-Learning-competition