# Module 3 | Loading in Geoscientific data

Hello! In this module we will load a few kinds of geoscience data into Python for future use.

In [19]:
import pandas as pd
import numpy as np

## Reading in a .las file

.las files are common for downhole geophysical measurements (gamma ray, deep resistivity). They are most often used in the oil and gas industry, but also show up in the environmental and mining industries. To open this file type, we will install a package called lasio, which handles the file I/O.

In [20]:
!pip install lasio



In [3]:
import lasio

In [4]:
lasfile = '../1_data/561689E.las'

In [5]:
las = lasio.read(lasfile)

In one line, we can convert the las object to a dataframe!

In [6]:
df = las.df()

In [7]:
df

DEPT,BIT,BVOL,CAL,CNCF,CVOL,DT,GR,M2R1,M2R2,M2R3,...,M2RX,PE,PORA,PORZ,RWAZ,SPDH,TEN,TT,ZCOR,ZDEN
0.0,,,,,,,223.172,,,,...,,,,,,,,,,
0.5,,,,,,,240.508,,,,...,,,,,,,,,,
1.0,,,,,,,251.382,,,,...,,,,,,,,,,
1.5,,,,,,,256.477,,,,...,,,,,,,,,,
2.0,,,,,,,250.439,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8078.0,8.75,,,,,,,,,,...,,,,,,,737.130,,,
8078.5,8.75,,,,,,,,,,...,,,,,,,703.401,,,
8079.0,8.75,,,,,,,,,,...,,,,,,,,,,
8079.5,8.75,,,,,,,,,,...,,,,,,,,,,


Once the well data is loaded into a DataFrame, you can export it to a .csv file, to JSON, or to any other format that pandas can write. If you're interested in using well logs further, check out the tutorial on the GitHub.
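As a rough illustration of the export step, here is a minimal sketch that uses a small stand-in frame in place of the real `las.df()` output. The `logs` name and its `BIT` values are placeholders, not data read from the file above. It writes the frame to CSV and JSON in memory with `to_csv` and `to_json`, then reads the CSV back to check that the `DEPT` index survives the round trip.

```python
import io

import pandas as pd

# Hypothetical stand-in for the frame returned by las.df():
# depth as the index, one column per curve.
logs = pd.DataFrame(
    {
        "GR": [223.172, 240.508, 251.382],
        "BIT": [float("nan"), float("nan"), 8.75],
    },
    index=pd.Index([0.0, 0.5, 1.0], name="DEPT"),
)

# Write to CSV in memory; the DEPT index becomes the first column.
csv_buffer = io.StringIO()
logs.to_csv(csv_buffer)

# Read it back, restoring DEPT as the index.
csv_buffer.seek(0)
roundtrip = pd.read_csv(csv_buffer, index_col="DEPT")

# JSON export; orient="index" stores one record per depth.
json_text = logs.to_json(orient="index")
```

To write to disk instead of memory, pass a file path (for example a hypothetical `'well.csv'`) to `to_csv` or `to_json`.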

## Reading in USGS River Data

The USGS publishes river data for sites across the United States. We will load some USGS data from western Colorado. There are two files, one for discharge and one for temperature. The link to the data is [here](https://waterdata.usgs.gov/co/nwis/inventory/?site_no=09070500). Each file starts with a header in which every line is commented with a # in front of it. There are two ways we can handle this: skip the header rows by number, or tell pandas to ignore lines that start with #. The only other small change I made was to save these files as .txt files.

#### USGS Temperature Data

In [8]:
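# Skip rows 0-34 (the commented header block) and row 36 (the line just
# below the column names), so that row 35 is read as the column names.
rowskip = np.arange(0,35)
rowskip = rowskip.tolist()
rowskip.append(36)
# rowskip # uncomment if you want to qc the output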

In [9]:
folder = '../1_data/'
usgs_temp = 'monthly_temp.txt'

In [10]:
df_temp = pd.read_csv(folder + usgs_temp, skiprows=rowskip, sep = "\t")

In [11]:
df_temp.head()

,agency_cd,site_no,parameter_cd,ts_id,year_nu,month_nu,mean_va
0,USGS,9070500,10,18627,1980,3,4.26
1,USGS,9070500,10,18627,1980,4,8.27
2,USGS,9070500,10,18627,1980,5,9.94
3,USGS,9070500,10,18627,1980,6,12.2
4,USGS,9070500,10,18627,1980,8,17.66


#### USGS Discharge data

Let's load in the discharge data. This time we use the `comment` argument instead of building a list of header rows to skip.

In [12]:
usgs_discharge = '../1_data/monthly_distcharge.txt'

In [13]:
df_dist = pd.read_csv(usgs_discharge, delim_whitespace=True, comment="#", header=0, skiprows=[1])

In [14]:
df_dist.head()

,agency_cd,site_no,parameter_cd,ts_id,year_nu,month_nu,mean_va
0,USGS,9070500,60,18624,1980,2,990.2
1,USGS,9070500,60,18624,1980,3,1020.0
2,USGS,9070500,60,18624,1980,4,1645.0
3,USGS,9070500,60,18624,1980,5,5682.0
4,USGS,9070500,60,18624,1980,6,7134.0
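The two loading strategies used above can be compared side by side on a small in-memory sample. This is only a sketch: the `sample` text is made up to mimic the layout described earlier (commented lines, a column-name row, a column-format row, then data) and is not the real USGS file. One detail worth knowing is that the pandas documentation states that `skiprows` counts physical lines, commented lines included, while `header` ignores them. To avoid depending on a line number, the second approach here drops the format row after parsing instead.

```python
import io

import pandas as pd

# Made-up sample in a USGS-like tab-delimited layout.
sample = (
    "# hypothetical header line 1\n"
    "# hypothetical header line 2\n"
    "agency_cd\tsite_no\tyear_nu\tmonth_nu\tmean_va\n"
    "5s\t15s\t4s\t2s\t12s\n"
    "USGS\t09070500\t1980\t2\t990.2\n"
    "USGS\t09070500\t1980\t3\t1020.0\n"
)

# Approach 1: skip rows by position.
# Rows 0-1 are comments and row 3 is the column-format row.
by_position = pd.read_csv(io.StringIO(sample), sep="\t", skiprows=[0, 1, 3])

# Approach 2: let pandas ignore '#' lines, read everything as text,
# then drop the format row (the first row after the column names).
raw = pd.read_csv(io.StringIO(sample), sep="\t", comment="#", dtype=str)
by_comment = raw.iloc[1:].reset_index(drop=True)

# Convert the numeric columns; site_no stays text so its leading zero is kept.
for col in ["year_nu", "month_nu", "mean_va"]:
    by_comment[col] = pd.to_numeric(by_comment[col])
```

Both frames hold the same rows. The visible difference is `site_no`: read as a number it loses its leading zero, while the text-first approach keeps `09070500` as written.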


# SEGY Headers

SEG-Y is a file format for seismic data. If you have questions about what it is, here is a great blog post: https://agilescientific.com/blog/2014/3/26/what-is-seg-y.html . If you want to check out the headers, you can use obspy.

In [15]:
!pip install obspy


In [16]:
from obspy.core import read
stream = read('https://examples.obspy.org/RJOB20090824.ehz')
stream.write('outfile.ascii', format='SLIST')

In [17]:
segy_df = pd.read_csv('outfile.ascii', sep='\t', skiprows=0)

In [18]:
segy_df

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,"TIMESERIES BW_RJOB__EHZ_D, 6001 samples, 200 sps, 2009-08-24T00:20:03.000000, SLIST, INTEGER,"
288,300.0,292.0,285.0,265.0,287.0
279,250.0,278.0,278.0,268.0,258.0
250,256.0,232.0,232.0,255.0,261.0
250,220.0,227.0,252.0,230.0,209.0
224,229.0,218.0,209.0,208.0,214.0
...,...,...,...,...,...
482,461.0,429.0,450.0,469.0,425.0
423,455.0,455.0,434.0,429.0,457.0
449,432.0,441.0,445.0,432.0,425.0
400,397.0,471.0,426.0,390.0,450.0


# Questions

Using a combination of code and text boxes, please answer the following questions:

#### 0. segy_df can be loaded as a dataframe, but the header is still a bit odd. Improve the loading of the seismic ASCII file.

#### 1. Load in a data file from your research or independent project. Does it require a package (like lasio or obspy)? After you have loaded it in, can you make it into a pandas dataframe easily? If not, why?

#### 2. Do you like using packages to load in data, or would you prefer something else?

#### 3. Does using Python change how you want to create and store data?

#### 4. How is data storage handled in your research group or job? Could it be done better (in the context of Python)?

#### 5. Of the two ways of loading the USGS river data files, which made more sense to you?

#### 6. Bonus. Is there a geo-data format that you wanted to load into Python, but could not?