# Module 3 | Loading in Geoscientific data

Hello! In this module we will load a few common types of geoscience data into Python for later use.

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

## Reading in a .las file

.las (Log ASCII Standard) files are common for downhole geophysical measurements, such as gamma ray and deep resistivity logs. They are most widely used in the oil and gas industry, but they also show up in the environmental and mining industries. To read this file type, we will install a package called lasio (`pip install lasio`), which handles reading and writing LAS files.
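
Before we hand the file to lasio, it helps to see roughly what it is parsing. A LAS 2.0 file is plain text split into sections that start with `~`: `~V` (version), `~W` (well information, including the NULL value), `~C` (curve definitions), and `~A` (the ASCII data table). The sketch below is a deliberately simplified reader written for this notebook, not part of lasio. It runs on a tiny made-up snippet, only pulls the curve mnemonics from `~C`, the NULL value from `~W`, and the numbers from `~A`, and assumes the file is not wrapped. For real files, use lasio, which handles the full standard.

```python
import io

import pandas as pd

# A tiny, made-up LAS 2.0 snippet (illustrative only, not taken from 561689E.las)
LAS_TEXT = """~VERSION INFORMATION
VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
WRAP.   NO  : ONE LINE PER DEPTH STEP
~WELL INFORMATION
NULL.   -999.25 : NULL VALUE
~CURVE INFORMATION
DEPT.M      : DEPTH
GR  .GAPI   : GAMMA RAY
ILD .OHMM   : DEEP RESISTIVITY
~ASCII
100.0  45.2  12.1
100.5  50.3  -999.25
101.0  48.7  11.8
"""


def read_simple_las(text):
    """Parse curve names, the NULL value, and the ~A block from a non-wrapped LAS 2.0 string."""
    section = None
    curves, data_lines = [], []
    null_value = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank and comment lines
        if stripped.startswith("~"):
            section = stripped[1].upper()  # first letter names the section: V, W, C, P, O or A
            continue
        if section == "C":
            # The mnemonic is everything before the first period, e.g. "GR  .GAPI" -> "GR"
            curves.append(stripped.split(".", 1)[0].strip())
        elif section == "W" and stripped.upper().startswith("NULL"):
            # The value sits between the unit period and the colon
            null_value = float(stripped.split(".", 1)[1].split(":", 1)[0])
        elif section == "A":
            data_lines.append(stripped)

    df = pd.read_csv(io.StringIO("\n".join(data_lines)), sep=r"\s+",
                     header=None, names=curves)
    if null_value is not None:
        df = df.replace(null_value, float("nan"))  # NULL placeholders become NaN
    return df.set_index(curves[0])  # first curve (depth) becomes the index


df_demo = read_simple_las(LAS_TEXT)
print(df_demo)
```

lasio does all of this (and much more, like units, wrapped files, and the other header sections) when we call `lasio.read` below.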

In [None]:
import lasio

In [None]:
lasfile = '../1_data/561689E.las'

In [None]:
las = lasio.read(lasfile)

In one line, we can convert the las object to a pandas DataFrame! lasio uses the first curve in the file (usually depth) as the DataFrame index.

In [None]:
df = las.df()

In [None]:
df
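
Because the DataFrame is indexed by depth, you can pull out a depth interval by label with `.loc`. The sketch below uses a made-up stand-in for `las.df()` so it runs on its own; the curve names `GR` and `ILD` and the gamma-ray cutoff are hypothetical, so check `df.columns` for the mnemonics in your own file.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for las.df(): a depth index plus two hypothetical curves
depth = np.arange(100.0, 110.0, 0.5)                 # 20 samples at 0.5 m spacing
df_logs = pd.DataFrame(
    {"GR": np.linspace(40, 120, depth.size),         # made-up gamma ray ramp
     "ILD": np.linspace(20, 2, depth.size)},         # made-up resistivity ramp
    index=pd.Index(depth, name="DEPT"),
)

# Label-based slicing on the depth index is inclusive at both ends
interval = df_logs.loc[102.0:104.0]

# Flag samples above an arbitrary, illustrative gamma-ray cutoff
GR_CUTOFF = 60.0
interval = interval.assign(high_gr=interval["GR"] > GR_CUTOFF)

print(interval)
print(interval[["GR", "ILD"]].describe())
```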

Once the well data is loaded into a DataFrame, you can export it as a .csv, as JSON, or to any of the other formats pandas supports. If you're interested in using well logs further, check out the tutorial on GitHub.
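
Here is a small sketch of both exports on a made-up well-log table. It writes to in-memory buffers so it runs anywhere; with real data you would pass a file name instead (the name in the comment is hypothetical). Reading the CSV back confirms the depth index and the missing value survive the round trip.

```python
import io
import json

import pandas as pd

# A small, made-up well-log table with a depth index and one missing value
df_out = pd.DataFrame({"GR": [45.2, 50.3, 48.7], "ILD": [12.1, None, 11.8]},
                      index=pd.Index([100.0, 100.5, 101.0], name="DEPT"))

# CSV: the depth index is written as the first column
csv_buffer = io.StringIO()
df_out.to_csv(csv_buffer)          # with a real file: df.to_csv("my_well.csv") (hypothetical name)
csv_text = csv_buffer.getvalue()

# JSON: orient="index" keys each record by its depth value; NaN becomes null
json_text = df_out.to_json(orient="index")
records = json.loads(json_text)

# Read the CSV back and use the DEPT column as the index again
df_from_csv = pd.read_csv(io.StringIO(csv_text), index_col="DEPT")

print(csv_text)
print(json_text)
```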

## Reading in USGS River Data

The USGS publishes river data from sites all over the country; here we will load USGS data from a site in western Colorado. There are two files, one for discharge and one for temperature. The data are available [here](https://waterdata.usgs.gov/co/nwis/inventory/?site_no=09070500). Each file starts with a header in which every line is commented out with a # in front of it. We will handle this in two different ways: by listing the rows to skip (temperature), and by telling pandas to ignore commented lines (discharge). The only other change I made was to save these files with a .txt extension.
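
To see both approaches side by side, here is a sketch on a tiny made-up file in the same tab-separated USGS RDB layout: `#` comment lines, a row of column names, a row of column-format codes (like `5s`), then data. The column names are simplified for illustration. One pandas detail matters here: `skiprows` counts every physical line in the file, comments included, while `header` ignores fully commented lines when `comment="#"` is set.

```python
import io

import pandas as pd

# A tiny, made-up file in the USGS RDB layout (tab-separated)
RDB_TEXT = (
    "# U.S. Geological Survey\n"
    "# made-up example header line\n"
    "agency_cd\tsite_no\tyear_nu\tmonth_nu\tmean_va\n"
    "5s\t15s\t4s\t2s\t12s\n"
    "USGS\t09070500\t2020\t1\t10.5\n"
    "USGS\t09070500\t2020\t2\t11.0\n"
)

# Way 1: list the raw line numbers to skip. skiprows counts comment lines too,
# so we skip the two comment lines (0, 1) and the format line (3).
way1 = pd.read_csv(io.StringIO(RDB_TEXT), sep="\t", skiprows=[0, 1, 3])

# Way 2: let pandas ignore '#' lines, read the names row and the format row
# as a two-level header, then keep only the names.
way2 = pd.read_csv(io.StringIO(RDB_TEXT), sep="\t", comment="#", header=[0, 1])
way2.columns = way2.columns.droplevel(1)

print(way1)
print(way1.equals(way2))
```

Way 1 needs you to count lines first; Way 2 keeps working if the length of the comment block changes.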

#### USGS Temperature Data

In [None]:
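# Skip the 35 commented header lines (rows 0-34) and the RDB format line (row 36);
# row 35, which holds the column names, is kept as the header
rowskip = list(range(35)) + [36]
# rowskip  # uncomment if you want to QC the output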

In [None]:
folder = '../1_data/'
usgs_temp = 'monthly_temp.txt'

In [None]:
df_temp = pd.read_csv(folder + usgs_temp, skiprows=rowskip, sep = "\t")

In [None]:
df_temp.head()

#### USGS Discharge Data

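Let's load the discharge data. This time, instead of listing the rows to skip, we tell pandas to ignore every line that starts with #.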

In [None]:
usgs_discharge = '../1_data/monthly_distcharge.txt'

In [None]:
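df_dist = pd.read_csv(usgs_discharge, sep=r'\s+', comment="#", header=[0, 1])  # header rows: column names + RDB format codes
df_dist.columns = df_dist.columns.droplevel(1)  # keep only the column names (skiprows would have counted the # lines too)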

In [None]:
df_dist.head()
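
With both tables loaded, a natural next step is to line up temperature and discharge by month. The sketch below uses made-up stand-ins for `df_temp` and `df_dist`; the column names `year_nu`, `month_nu`, and `mean_va` are an assumption based on typical USGS monthly statistics output, so check `df_temp.columns` and `df_dist.columns` first.

```python
import pandas as pd

# Made-up stand-ins for df_temp and df_dist (column names are assumed)
temp = pd.DataFrame({"year_nu": [2020, 2020, 2020],
                     "month_nu": [1, 2, 3],
                     "mean_va": [1.5, 2.0, 4.5]})
dist = pd.DataFrame({"year_nu": [2020, 2020, 2020],
                     "month_nu": [2, 3, 4],
                     "mean_va": [410.0, 520.0, 900.0]})

# Inner join keeps only months present in both tables;
# suffixes tell the two mean_va columns apart
monthly = temp.merge(dist, on=["year_nu", "month_nu"], how="inner",
                     suffixes=("_temp", "_discharge"))

# Build a real datetime from the year/month columns (day fixed to 1)
monthly["date"] = pd.to_datetime(
    monthly[["year_nu", "month_nu"]]
    .rename(columns={"year_nu": "year", "month_nu": "month"})
    .assign(day=1)
)
monthly = monthly.set_index("date")
print(monthly)
```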

## Reading in a GeoTIFF

This GeoTIFF file was provided by the USGS and is available [here](https://www.sciencebase.gov/catalog/item/53f5a87ae4b09d12e0e8547b).

For GeoTIFFs, the xarray and rioxarray libraries are a powerful combination. `rioxarray.open_rasterio` loads a GeoTIFF as an xarray DataArray with dimensions (band, y, x), where each band is a separate data layer, such as a spectral band, a time step, or a different variable. rioxarray extends xarray by keeping the geospatial metadata, including the coordinate reference system (CRS) and the geotransform, which you need for accurate spatial analysis and mapping. Because the values can be pulled out as NumPy arrays with `.values`, they plug directly into the rest of the scientific Python stack (NumPy, SciPy, scikit-image) for operations like band math, filtering, and image classification. Finally, xarray's labeled dimensions and coordinates let you slice, aggregate, and analyze the data by spatial or temporal coordinate values instead of array positions, which is especially handy in Earth observation and climate studies.
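
Here is a small sketch of what labeled dimensions buy you, using a made-up 3-band DataArray with the same (band, y, x) layout that `open_rasterio` returns. The coordinates are invented UTM-style meters, and the normalized difference of bands 2 and 1 is just an example of band math, not a meaningful index for this data.

```python
import numpy as np
import xarray as xr

# Made-up 3-band raster with dimensions (band, y, x); coordinates are invented
values = np.arange(3 * 4 * 5, dtype=np.float32).reshape(3, 4, 5)
da = xr.DataArray(
    values,
    dims=("band", "y", "x"),
    coords={"band": [1, 2, 3],
            "y": [4500150.0, 4500050.0, 4499950.0, 4499850.0],  # north to south
            "x": [500050.0, 500150.0, 500250.0, 500350.0, 500450.0]},
)

# Select a band by its label instead of its position
band2 = da.sel(band=2)  # 2-D array with dims (y, x)

# Spatial subset by coordinate value; y is descending, so slice from high to low
subset = da.sel(x=slice(500100, 500400), y=slice(4500100, 4499900))

# Band math: normalized difference between bands 2 and 1
# (drop=True removes the scalar band coordinate so the two arrays combine cleanly)
b1 = da.sel(band=1, drop=True)
b2 = da.sel(band=2, drop=True)
norm_diff = (b2 - b1) / (b2 + b1)

# Aggregate over named dimensions: one mean value per band
band_means = da.mean(dim=("y", "x"))
print(band_means.values)
```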

#### Work through the xarray tutorial in notebook 3_2

In [2]:
import rioxarray

# Replace this with the actual path to your file
file_path = "../1_data/FAA_UTM18N_NAD83.tif"

# Open the file
data = rioxarray.open_rasterio(file_path)

# Now you can work with the data
data

In [None]:
# Ensure data is in the correct shape for imshow: (band, y, x) -> (y, x, band)
rgb_data = np.transpose(data.values, (1, 2, 0))

# Normalize each of the three bands to the 0-1 range (min-max scaling)
rgb_data = rgb_data.astype(np.float32)
for i in range(3):
    band = rgb_data[:, :, i]
    min_val, max_val = np.nanmin(band), np.nanmax(band)
    if max_val > min_val:
        rgb_data[:, :, i] = (band - min_val) / (max_val - min_val)
    else:
        rgb_data[:, :, i] = 0.0  # constant band: avoid dividing by zero

# Clip values to 0-1 range
rgb_data = np.clip(rgb_data, 0, 1)

# Create the plot
plt.figure(figsize=(10, 8))
plt.imshow(rgb_data)
plt.title("3-Band GeoTIFF Visualization")
plt.axis('off')  # Turn off axis
plt.show()
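
Min-max scaling is easily thrown off by a few extreme pixels: one very bright value squeezes everything else toward 0 and the image looks dark. A common alternative (swapped in here, not what the cell above does) is a percentile stretch, which maps the 2nd to 98th percentile of each band onto 0-1 and clips the rest. The sketch below runs on a made-up image with one outlier; the 2/98 cutoffs are a conventional but arbitrary choice.

```python
import numpy as np


def percentile_stretch(rgb, low=2.0, high=98.0):
    """Rescale each band of a (y, x, band) array to 0-1 using per-band percentiles.

    Values below the `low` percentile map to 0 and values above the `high`
    percentile map to 1. NaNs are ignored when computing the percentiles.
    """
    rgb = rgb.astype(np.float32)
    out = np.empty_like(rgb)
    for i in range(rgb.shape[2]):
        band = rgb[:, :, i]
        p_low, p_high = np.nanpercentile(band, [low, high])
        if p_high > p_low:
            out[:, :, i] = (band - p_low) / (p_high - p_low)
        else:
            out[:, :, i] = 0.0  # constant band: nothing to stretch
    return np.clip(out, 0.0, 1.0)


# Made-up 3-band image with one very bright outlier pixel
rng = np.random.default_rng(0)
img = rng.uniform(0, 100, size=(50, 50, 3))
img[0, 0, :] = 10_000.0

stretched = percentile_stretch(img)
print(stretched.min(), stretched.max(), np.median(stretched[:, :, 0]))
```

The result can be passed straight to `plt.imshow`, just like `rgb_data`.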

# Questions

Using a combination of code and text boxes, please answer the following questions:

#### 0. When would you use xarray (or rioxarray) compared to pandas?

#### 1. Load a data file from your research or an independent project. Does it require a package (like lasio or obspy)? After you have loaded it, can you easily turn it into a pandas DataFrame? If not, why not?

#### 2. Do you like using packages to load in data, or would you prefer something else?

#### 3. Does using Python change how you want to create and store data?

#### 4. How is data storage handled in your research group or job? Could it be done better (in the context of Python)?

#### 5. Of the two approaches used to load the USGS river data files, which made more sense to you?

#### 6. Bonus: Is there a geoscience data format that you wanted to load into Python but could not?