<a href="https://colab.research.google.com/github/ra2309/AIWorkshop/blob/main/Workshop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This section mounts Google Drive in Colab and reads the well-log data stored there.

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [8]:
import os
# Absolute path, so the cell also works when re-run from another working directory
os.chdir('/content/drive/MyDrive/Welllogs')
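
If the mount did not succeed or the folder name is spelled differently, `os.chdir` raises a `FileNotFoundError` that can be easy to misread. A minimal sketch of a guard is shown below. The helper name `enter_data_dir` is hypothetical, and the path in the comment is assumed from the cells above.

```python
import os

def enter_data_dir(path):
    """Change into `path`, failing with a clearer message if it is missing."""
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"{path!r} not found; is Drive mounted and the folder name correct?"
        )
    os.chdir(path)
    return os.getcwd()

# In Colab this could be called as (path assumed from the cells above):
# enter_data_dir('/content/drive/MyDrive/Welllogs')
```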

The first step after mounting the data is to ingest it. In most industries it is sufficient to read CSV, Excel and similar formats. The oil and gas industry, however, uses many file formats that pandas and numpy cannot read natively. We therefore rely on third-party libraries such as lasio to read LAS files. We start by installing lasio.

In [2]:
!pip install lasio

Collecting lasio
  Downloading lasio-0.31-py2.py3-none-any.whl (47 kB)
Installing collected packages: lasio
Successfully installed lasio-0.31


To test both the mount and the ingestion, we read one sample file.

In [12]:
import lasio
las=lasio.read('1052987184.las')

The lasio library can convert the parsed data into a pandas DataFrame.

In [13]:
df = las.df()

In [14]:
df.head()

DEPT,ABHV,CNPOR,DCAL,DPOR,GR,RHOB,RHOC,RILD,RILM,RLL3,RXORT,SP,TBHV,MEL15,MEL20,MELCAL
0.0,,,,,22.9799,,,100000.0,1.0667,2062.7507,151.6998,-91.4477,,,,
0.5,,,,,24.3601,,,100000.0,4.0241,3293.4761,133.4111,-66.0708,,,,
1.0,,,,,26.1486,,,100000.0,100000.0,3339.4282,132.8695,-55.1717,,,,
1.5,,,,,28.5913,,,100000.0,100000.0,3339.6782,132.8666,-50.0492,,,,
2.0,,,,,31.5497,,,100000.0,100000.0,3339.6782,132.8666,-47.8064,,,,


Now we read the whole data set. We create an empty list and append to it the DataFrame read from each LAS file.

In [17]:
dfs = []
for file in os.listdir():
  print(file)
  las = lasio.read(file)
  df = las.df()
  dfs.append(df)

1053243844.las
1052987184.las
1053318726.las
1053292672.las


KeyError: ignored

We hit an error, which is a natural situation in automation and machine learning workflows. Because each file name is printed before the file is read, the last name printed, 1053292672.las, is the troublesome file. Opening it in a text editor shows the reason for the error. We try once more, this time with try and except. In addition, to tell the wells apart in the combined DataFrame, we record the name of each well in a WELL column.

In [21]:
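dfs = []
for file in os.listdir():
  try:
    las = lasio.read(file)
    df = las.df()
    # Tag every row with the well name (the file name without its extension)
    df['WELL'] = file.split('.las')[0]
    dfs.append(df)
  except Exception as e:
    # Report the file that could not be parsed and continue with the rest
    print(file)
    print(e)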

1053292672.las
'No ~ sections found. Is this a LAS file?'
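
The message above suggests that no line starting with `~` was found; in a LAS file each section header begins with that character. A rough pre-check along those lines, using only the standard library, might look like the sketch below. The helper names `looks_like_las` and `split_las_candidates` are hypothetical, and the check is a heuristic, not lasio's own validation.

```python
from pathlib import Path

def looks_like_las(path):
    """Heuristic: True if any line in the file starts with '~'."""
    with open(path, 'r', errors='replace') as handle:
        return any(line.lstrip().startswith('~') for line in handle)

def split_las_candidates(folder='.'):
    """Return (candidates, rejected) lists of *.las paths in `folder`."""
    candidates, rejected = [], []
    for path in sorted(Path(folder).glob('*.las')):
        (candidates if looks_like_las(path) else rejected).append(path)
    return candidates, rejected
```

Calling `split_las_candidates()` in the data folder would list suspicious files before any parsing is attempted. For each returned path, `path.stem` gives the file name without its extension, which could serve as the well name.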


Now we combine the list of DataFrames into one single DataFrame with pd.concat.

In [22]:
import pandas as pd
big_df = pd.concat(dfs)
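
To illustrate what concatenation does when wells carry different curves, here is a small sketch with two made-up wells. The frames `well_a` and `well_b` and all their values are illustrative assumptions, not data from the workshop files.

```python
import pandas as pd

# Two toy wells with partly different curves (values are made up).
well_a = pd.DataFrame({'GR': [22.9, 24.3], 'SP': [-91.4, -66.0]},
                      index=pd.Index([0.0, 0.5], name='DEPT'))
well_b = pd.DataFrame({'GR': [29.4, 29.5], 'PEF': [3.1, 3.2]},
                      index=pd.Index([0.0, 0.5], name='DEPT'))
well_a['WELL'] = 'A'
well_b['WELL'] = 'B'

combined = pd.concat([well_a, well_b])

# Columns are the union of both inputs; curves a well lacks become NaN.
print(sorted(combined.columns))
# The depth index now repeats, so it does not identify a row on its own.
print(combined.index.is_unique)

# Moving DEPT into a regular column gives a fresh 0..n-1 index.
tidy = combined.reset_index()
```

This is why a combined frame of many wells can show mostly NaN in its first rows: each well only fills the curves it actually recorded.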

In [23]:
big_df.head()

DEPT,CNPOR,DPOR,GR,RHOB,RHOC,RILD,RILM,RLL3,RXORT,SP,...,DPHL,PEF,NPHL,PXND,MINV,MNOR,DPHS,NPHS,DPHD,NPHD
2.5,,,29.4789,,,,,,,,...,,,,,,,,,,
3.0,,,29.5145,,,,,,,,...,,,,,,,,,,
3.5,,,28.8759,,,,,,,,...,,,,,,,,,,
4.0,,,28.5174,,,,,,,,...,,,,,,,,,,
4.5,,,29.2807,,,,,,,,...,,,,,,,,,,


We inspect which columns the combined DataFrame contains.

In [24]:
big_df.columns

Index(['CNPOR', 'DPOR', 'GR', 'RHOB', 'RHOC', 'RILD', 'RILM', 'RLL3', 'RXORT',
       'SP', 'MEL15', 'MEL20', 'WELL', 'ABHV', 'DCAL', 'TBHV', 'MELCAL', 'PE',
       'AVTX', 'BVTX', 'MINMK', 'CILD', 'CNDL', 'CNLS', 'CNSS', 'LSPD', 'LTEN',
       'MCAL', 'MI', 'MN', 'DT', 'ITT', 'SPOR', 'DEVI', 'DTMP', 'NPOR', 'GK1',
       'IA10_2', 'IA20_2', 'IA30_2', 'IA60_2', 'IA90_2', 'CIA90_2', 'RXO_2',
       'RT_2', 'CALI', 'DRHO', 'DPHL', 'PEF', 'NPHL', 'PXND', 'MINV', 'MNOR',
       'DPHS', 'NPHS', 'DPHD', 'NPHD'],
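
With this many curves, it can help to see how many columns there are and how well each one is populated. The sketch below uses a made-up stand-in for `big_df`. The helper name `curve_coverage` and the toy values are assumptions for illustration.

```python
import pandas as pd

def curve_coverage(frame):
    """Fraction of non-missing values per column, highest first."""
    return frame.notna().mean().sort_values(ascending=False)

# Toy stand-in for big_df (values are made up).
toy = pd.DataFrame({
    'GR':  [22.9, 24.3, 29.4, 29.5],
    'SP':  [-91.4, -66.0, None, None],
    'PEF': [None, None, None, 3.2],
})

coverage = curve_coverage(toy)
print(len(toy.columns))  # number of columns
print(coverage)          # share of populated rows per curve
```

Applied to the real data, `curve_coverage(big_df)` would rank the curves by how many rows actually contain a value.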
      dtype='object')