# Notebook to test the layouts for each section
This notebook opens and reads the dbf900.ebc.gz file, then parses the data from each section.<br><br>
The decode, parse, and layout calls are imported so that this notebook uses the same definitions and processes as the main program, which minimizes the number of places where the definitions need to be updated.

## Import, read, and convert file to array of strings

In [1]:
import pandas as pd
import codecs

##Import section to ensure the main directory is in the path
import sys
import os
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

## Importing from the main directory now
from dbf900_main import decode_file, parse_record
from dbf900_layouts import dbf900_layout

### Set the file location and block size, then call the file decoding script

#### from dbf900_main import decode_file
Opens the file, decodes it, and splits the records into an array based on the block_size length.<br><br>
.py file with decode_file def: https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_main.py
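As a rough illustration of what a decode step like this involves, the sketch below decodes EBCDIC bytes and slices the resulting text into fixed-width records. It is not the repository's `decode_file`: the helper name `split_fixed_width`, the `cp037` code page, and the sample bytes are all assumptions made for the example.

```python
import codecs

def split_fixed_width(raw_bytes, block_size, encoding="cp037"):
    """Decode EBCDIC bytes and cut the text into block_size-character records.

    Hypothetical helper. cp037 is an assumed EBCDIC code page; the real
    file may need a different one.
    """
    text = codecs.decode(raw_bytes, encoding)
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]

# Made-up sample: two 5-character records encoded as EBCDIC
sample = "01ABC02DEF".encode("cp037")
records = split_fixed_width(sample, block_size=5)
print(records)
```

Because the code page is single-byte, slicing the decoded text by character count is equivalent to slicing the raw bytes by the same count.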

In [2]:
file = r'C:\PublicData\Texas\TXRRC\index\dbf900.ebc' ##Local storage location
##file origin: ftp://ftpe.rrc.texas.gov/shfwba/dbf900.ebc.gz

block_size  = 247 ##block size for each record in the file
##Unknown if this holds true for all versions of this file or for other files on TXRRC

##file and block size sent to decode and return record array    
split_records = decode_file(file,block_size)
##each record in this array starts with a two-character code that
##determines how it should be split apart and treated.
print('array ready for review and parsing.')

opening C:\PublicData\Texas\TXRRC\index\dbf900.ebc
decoding...
separating records...
returning records...
array ready for review and parsing.


In [3]:
split_records[:50]

['0100100001010106001 19631027000000000000000000000000000 000000000010000100000000000000  NNN00000000Y0 13A19930900000019931022199602L0040230000000199801000000Y 00000000                                                                                ',
 '02O0604411   1  00      000000000000000000000                                          0000000000000000000000000000000000N0000000     00000          0000  NN00000000                                                                                  ',
 '038015988719840112        NNNNN1963120519631027000000000000000000      00000        N0000  00000000                                                                                                                                                    ',
 '0600100000000000000000000000000000S000000        0000000000000000000000000000000000000000000000000000000                                                                                                                                          
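Since every record starts with a two-character type code, a quick tally of those codes can help confirm which sections are present before parsing. The following is a minimal sketch; `count_record_types` is a hypothetical helper and `demo_records` holds shortened stand-ins for entries of `split_records`.

```python
from collections import Counter

def count_record_types(records):
    """Count records by their leading two-character type code."""
    return Counter(record[:2] for record in records)

# Shortened stand-ins for entries of split_records (illustrative only)
demo_records = ["0100100001", "02O0604411", "0380159887", "0600100000", "0100100002"]
type_counts = count_record_types(demo_records)
for code, n in sorted(type_counts.items()):
    print(code, n)
```

On the real data the same call would be `count_record_types(split_records)`.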

### from dbf900_layouts import dbf900_layout
dbf900_layouts.py may require changes to the definitions to ensure correct formatting for each field
(e.g. original definition has API as numeric, but needs to be string to preserve leading zeros)<br>
Definitions from https://www.rrc.texas.gov/media/41906/wba091_well-bore-database.pdf <br><br>
.py file can be found here: https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_layouts.py
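The leading-zero issue mentioned above can be seen with the API root from the first sample record. This small sketch only illustrates why a numeric type loses information; the variable names are chosen for the example.

```python
api_root_text = "00100001"  # WELL-BORE-API-ROOT as it appears in the sample record

as_number = int(api_root_text)  # numeric conversion drops the leading zeros
as_string = api_root_text       # a string keeps all eight characters

print(as_number)
print("42" + as_string)  # the notebook prefixes the API root with '42'
```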

### from dbf900_main import parse_record
parse_record applies the formatting functions from dbf900_formats.py, which are imported using:<br> "from dbf900_formats import pic_yyyymmdd, pic_yyyymm, pic_latlong, pic_coord, pic_numeric, pic_any" <br><br>
.py file with parse_record def: https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_main.py <br><br>
.py file with format defs: https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_formats.py
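To make the idea concrete, here is a minimal sketch of fixed-width parsing driven by a layout table. It is not the repository's `parse_record` or `pic_yyyymmdd`: the helper names, the `(name, start, length, formatter)` layout shape, and the field offsets (inferred from the first sample record) are assumptions for illustration.

```python
from datetime import datetime

def format_yyyymmdd(raw):
    """Return 'MM/DD/YYYY' for an 8-digit date, or None for blank, all-zero, or invalid input."""
    raw = raw.strip()
    if not raw or set(raw) == {"0"}:
        return None
    try:
        return datetime.strptime(raw, "%Y%m%d").strftime("%m/%d/%Y")
    except ValueError:
        return None

def parse_fixed_width(record, layout):
    """Slice a record into named fields; layout rows are (name, start, length, formatter)."""
    return {name: fmt(record[start:start + length]) for name, start, length, fmt in layout}

# Offsets inferred from the first sample record, not copied from dbf900_layouts.py
demo_layout = [
    ("RRC-TAPE-RECORD-ID", 0, 2, str),
    ("WELL-BORE-API-ROOT", 2, 8, str),
    ("WB-NXT-AVAIL-SUFFIX", 10, 2, str),
    ("WB-FIELD-DISTRICT", 14, 2, str),
    ("WB-RES-CNTY-CODE", 16, 3, str),
]
demo_record = "0100100001010106001 19631027"
parsed = parse_fixed_width(demo_record, demo_layout)
print(parsed)
print(format_yyyymmdd("19631027"), format_yyyymmdd("00000000"))
```

Keeping the formatter as part of each layout row means a field's type can be changed in one place, which matches the goal of keeping definitions centralized.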

## Working section to explore the data read from each record

Currently (2020-07-30) working through the formatting that comes from the definitions, ensuring that the dates are formatted correctly and that the fields are aligned.

In [5]:
API = None
api_check = None
ct = 0
check = 0

sample_records = split_records#[35000:75000] ## Used for testing to reduce number of records to run

##The following loops are for testing different things and exploring the formatting.
##Everything past this point is subject to change.
while check < 10 and ct < len(sample_records): ##stop after 10 matches or at the end of the records
#while api_check == API: ##Set to pull all records associated with the first API
    record = sample_records[ct]
    
    if not API:
        api_check = '42'+record[2:10]
    
    if record.startswith('01'):
        API = '42'+record[2:10]

    startval = str(record[0:2])
    layout = dbf900_layout(startval)['layout']
    parsed_vals = parse_record(record, layout)



    if startval =='01': ##currently reviewing results vs. original record. Use 01 through 28 to check results.
        check+=1
        print(ct, API, parsed_vals)
        print(record)
        print('--------------------------------------')
    ct+=1

0 4200100001 {'RRC-TAPE-RECORD-ID': '01', 'WELL-BORE-API-ROOT': '00100001', 'WB-NXT-AVAIL-SUFFIX': '01', 'WB-NXT-AVAIL-HOLE-CHGE-NBR': '01', 'WB-FIELD-DISTRICT': '06', 'WB-RES-CNTY-CODE': '001', 'WB-ORIG-COMPL-DATE': '10/27/1963', 'WB-TOTAL-DEPTH': 0, 'WB-VALID-FLUID-LEVEL': 0, 'WB-CERTIFICATION-REVOKED-DATE': None, 'WB-CERTIFICATION-DENIAL-DATE': None, 'WB-DENIAL-REASON-FLAG': '0', 'WB-ERROR-API-ASSIGN-CODE': '', 'WB-REFER-CORRECT-API-NBR': '00000000', 'WB-DUMMY-API-NUMBER': '00100001', 'WB-DATE-DUMMY-REPLACED': None, 'WB-NEWEST-DRL-PMT-NBR': 0, 'WB-CANCEL-EXPIRE-CODE': '', 'WB-EXCEPT-13-A': 'N', 'WB-FRESH-WATER-FLAG': 'N', 'WB-PLUG-FLAG': 'N', 'WB-PREVIOUS-API-NBR': '00000000', 'WB-COMPLETION-DATA-IND': 'Y', 'WB-HIST-DATE-SOURCE-FLAG': 0, 'WB-EX14B2-COUNT': 13, 'WB-DESIGNATION-HB-1975-FLAG': 'A', 'WB-DESIGNATION-EFFECTIVE-DATE': '09/01/1993', 'WB-DESIGNATION-REVISED-DATE': None, 'WB-DESIGNATION-LETTER-DATE': '10/22/1993', 'WB-CERTIFICATION-EFFECT-DATE': '02/01/1996', 'WB-WATER-LAND-C