# Checking the data
## List raw data contents

In [None]:
import zipfile

In [None]:
data = zipfile.ZipFile(r'../data/raw/1113_XYZ.zip', 'r')

In [None]:
data.printdir()

We're interested in the XYZ/1113_MagLine.XYZ and XYZ/1113_MagTie.XYZ files. These seem to be raw and processed aeromagnetic data exported from two Geosoft Oasis montaj database files.
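The memory discussion further down depends on how large these files are, so it can help to check their uncompressed sizes before opening them. Below is a minimal sketch that uses only `zipfile.ZipFile.infolist()`. The helper name `list_members` and the demo archive are hypothetical:

```python
import io
import zipfile


def list_members(zf, suffix='.XYZ'):
    """Return (name, uncompressed size in MB) for archive members ending with `suffix`."""
    return [(info.filename, info.file_size / 1e6)
            for info in zf.infolist()
            if info.filename.upper().endswith(suffix.upper())]


if __name__ == '__main__':
    # Hypothetical in-memory archive standing in for ../data/raw/1113_XYZ.zip
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('XYZ/1113_MagLine.XYZ', 'Line 10\n')
        zf.writestr('README.txt', 'not a data file\n')
    with zipfile.ZipFile(buf) as zf:
        for name, size_mb in list_members(zf):
            print(f'{name}: {size_mb:.6f} MB')
```

With the notebook's archive, the call would be `list_members(data)`.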

If you can read Brazilian Portuguese, the file "1113 - Relatorio Final - Sudeste do Mato Grosso.pdf" is worth checking, since it is the final processing report.

## Checking the file headers
Next, we print the first lines of both files to get a feel for the file format.

In [None]:
n = 15  # number of lines to read

with data.open('XYZ/1113_MagLine.XYZ') as f:
    head = [next(f) for _ in range(n)]

# Decode the bytes into strings and drop the trailing line breaks,
# otherwise '\n'.join would print every line double-spaced
head = [line.decode("utf-8").rstrip() for line in head]

print('\n'.join(head))

In [None]:
with data.open('XYZ/1113_MagTie.XYZ') as f:
    head = [next(f) for _ in range(n)]

# Decode the bytes into strings and drop the trailing line breaks
head = [line.decode("utf-8").rstrip() for line in head]

print('\n'.join(head))
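The two cells above repeat the same steps. A small helper, sketched here under the hypothetical name `read_head`, factors them out. Like the cells, it assumes UTF-8, and it uses `itertools.islice` so that a member shorter than `n` lines stops early instead of raising `StopIteration`:

```python
import io
import itertools
import zipfile


def read_head(zf, member, n=15, encoding='utf-8'):
    """Return the first `n` lines of a zip member as strings, without reading the rest."""
    # TextIOWrapper decodes the compressed byte stream lazily, line by line;
    # closing it also closes the underlying zip member
    with io.TextIOWrapper(zf.open(member), encoding=encoding) as text:
        return [line.rstrip('\r\n') for line in itertools.islice(text, n)]
```

In this notebook, `print('\n'.join(read_head(data, 'XYZ/1113_MagTie.XYZ')))` would reproduce the second cell.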

## Plotting flight lines and tie lines

Now the file structure is clear. Both files are divided into flight lines, each headed by a Tie # or Line # record, and every data row contains the same fields described in the header.
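The "same fields" structure can be made concrete with a sketch that recovers the field names and maps one data row onto them. It assumes, without having checked every export, that the column names sit on the last non-empty `/` comment line before the first `Line`/`Tie` record. Values are left as strings because exports may contain non-numeric dummy markers. `parse_fields` and `parse_row` are hypothetical names:

```python
def parse_fields(lines, comment='/'):
    """Return field names from the header of a Geosoft XYZ export.

    Assumption: the column names are on the last non-empty comment line
    before the first 'Line'/'Tie' record.
    """
    names = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(comment):
            body = stripped.lstrip(comment).strip()
            if body:
                names = body.split()
        elif stripped.startswith(('Line', 'Tie')):
            # The header ends where the first line record begins
            break
    return names


def parse_row(row, names):
    """Map one whitespace-separated data row onto the header field names."""
    return dict(zip(names, row.split()))
```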

These files are too big to read directly with pandas, because [pandas.read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) has a memory peak several times larger than the file size, due to the overhead of parsing and type checking.
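Reading the file line by line avoids that peak, because only the coordinates of the current flight line are held in memory at once. Below is a stdlib-only generator sketch of the idea under the hypothetical name `iter_lines`. Like the notebook's `iter_func`, it assumes X and Y are the first two whitespace-separated columns:

```python
import io


def iter_lines(text_stream, comment='/'):
    """Yield (line_number, xs, ys) for one flight/tie line at a time.

    Peak memory scales with the longest line, not with the whole file.
    """
    line_number, xs, ys = None, [], []
    for raw in text_stream:
        row = raw.strip()
        # Skip blank lines and header comments
        if not row or row.startswith(comment):
            continue
        if row.startswith(('Line', 'Tie')):
            # A new line record: emit the previous line, then start fresh lists
            if line_number is not None:
                yield line_number, xs, ys
            line_number, xs, ys = int(row.split()[1]), [], []
            continue
        fields = row.split()
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    # The last line is not followed by another record, so emit it here
    if line_number is not None:
        yield line_number, xs, ys
```

For example, `list(iter_lines(io.StringIO('Line 10\n1 2\n3 4\n')))` gives `[(10, [1.0, 3.0], [2.0, 4.0])]`.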

So, first we'll write a small function to read the data line by line, since numpy.genfromtxt has the [same problem](https://stackoverflow.com/questions/8956832/python-out-of-memory-on-large-csv-file-numpy) as pandas.read_csv. Next we'll plot a simplified version of the flight lines. This simplification is needed because each line has far too many points to plot efficiently.
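Shapely documents `LineString.simplify` as based on the Douglas-Peucker algorithm, using a topology-preserving variant by default. The plain Ramer-Douglas-Peucker idea is sketched below in pure Python to show what the 100 m tolerance does: a vertex is dropped when it lies within `tolerance` of the chord joining the endpoints kept so far. This illustrates the technique and is not Shapely's implementation. Its recursion is not meant for lines with millions of points:

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, tolerance):
    """Ramer-Douglas-Peucker simplification of a polyline given as (x, y) tuples."""
    if len(points) < 3:
        return list(points)
    # Find the intermediate vertex farthest from the chord joining the endpoints
    a, b = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        # Every intermediate vertex is within tolerance: keep only the endpoints
        return [a, b]
    # Otherwise split at the farthest vertex and simplify both halves
    left = rdp(points[:index + 1], tolerance)
    right = rdp(points[index:], tolerance)
    return left[:-1] + right  # drop the shared vertex once
```

A flight line that zigzags by a few metres around a straight track collapses to its two endpoints at a 100 m tolerance. A genuine turn larger than the tolerance is kept.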

In [None]:
import numpy as np
from shapely.geometry import LineString

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_style('ticks')

In [None]:
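def iter_func(data, filename, comment='/', tolerance=100):
    """Read the X/Y columns of a Geosoft XYZ file inside the zip archive `data`.

    Returns a dictionary {line_number: LineString}, with every path simplified
    using `tolerance` in map units (metres for UTM).
    """
    flight_lines = {}

    def store(line_number, x, y):
        # Build the path and simplify it (100 m by default) to cut the point count
        path = LineString(np.column_stack((x, y)))
        flight_lines[line_number] = path.simplify(tolerance=tolerance)

    with data.open(filename) as f:
        line_number = None
        x = []
        y = []
        for line in f:
            line = line.decode("utf-8").strip()

            # Skip blank lines and header comments
            if not line or line.startswith(comment):
                continue

            # A "Tie #" or "Line #" record starts a new flight line
            if line.startswith(('Tie', 'Line')):
                if line_number is not None:
                    # Store the line we have just finished reading
                    store(line_number, x, y)
                    x = []
                    y = []
                # np.int was removed from NumPy; the built-in int is equivalent
                line_number = int(line.split()[1])
                continue

            # Split on whitespace and keep only the X and Y columns
            fields = line.split()[:2]
            x.append(float(fields[0]))
            y.append(float(fields[1]))

    # The last line is not followed by another record, so store it here
    if line_number is not None:
        store(line_number, x, y)

    return flight_lines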

In [None]:
%%time
# These files are quite big, so reading them may take a couple of minutes.

tie_lines = iter_func(data, 'XYZ/1113_MagTie.XYZ')
flight_lines = iter_func(data, 'XYZ/1113_MagLine.XYZ')

In [None]:
# Use an explicit Axes: in recent Matplotlib, plt.axes() creates a new,
# empty Axes on top of the plot instead of returning the current one
fig, ax = plt.subplots(figsize=(10, 8))

for geom in tie_lines.values():
    x, y = geom.xy
    ax.plot(x, y, 'k', lw=0.5)

for geom in flight_lines.values():
    x, y = geom.xy
    ax.plot(x, y, 'k', lw=0.5)

ax.set_aspect('equal', 'datalim')
ax.set_title('Flight lines and Tie lines')
ax.set_xlabel('easting (m)')
ax.set_ylabel('northing (m)')
fig.tight_layout()

fig.savefig('../reports/lines_geometry.png', dpi=300)
plt.show()
plt.close(fig)

Figure 1 - Flight lines and tie lines of the aeromagnetic survey. Projection UTM 22S/WGS 84 (EPSG:32722).

In [None]:
data.close()

## Writing a shapefile with the geometry

In [None]:
from shapely.geometry import mapping
import fiona
from fiona.crs import from_epsg

In [None]:
schema = {
    'geometry': 'LineString', 
    'properties': {'Tie' : 'int'} # Tie line number
}

# Writing Tie lines shapefile
with fiona.open('../data/processed/tie_geometry.shp', 'w', crs=from_epsg(32722),
                driver='ESRI Shapefile', schema=schema) as output:
    for l in tie_lines:
        prop = {'Tie' : l}
        output.write({'geometry': mapping(tie_lines[l]), 'properties': prop})

In [None]:
schema = {
    'geometry': 'LineString', 
    'properties': {'Line' : 'int'} # Flight line number
}

# Writing the flight lines shapefile (a.k.a. acquisition lines)
with fiona.open('../data/processed/line_geometry.shp', 'w', crs=from_epsg(32722),
                driver='ESRI Shapefile', schema=schema) as output:
    for l in flight_lines:
        prop = {'Line' : l}
        output.write({'geometry': mapping(flight_lines[l]), 'properties': prop})
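Each record passed to `output.write` is a GeoJSON-like mapping with `geometry` and `properties` keys, and `shapely.geometry.mapping` supplies the geometry part as a dictionary with `type` and `coordinates`. The stdlib sketch below, using the hypothetical helper `make_records`, builds the same structure from plain coordinate lists and sorts the records by line number:

```python
def make_records(lines, field):
    """Yield fiona-style records from {line_number: [(x, y), ...]}.

    `field` is the property name used in the schema ('Tie' or 'Line' above).
    """
    for number, coords in sorted(lines.items()):
        yield {
            'geometry': {'type': 'LineString',
                         'coordinates': [tuple(c) for c in coords]},
            'properties': {field: int(number)},
        }
```

Assuming the dictionaries returned by `iter_func`, something like `output.writerecords(make_records({k: list(g.coords) for k, g in tie_lines.items()}, 'Tie'))` should write the same features as the loop above, in line-number order.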