Create script that tests the definitions for the files and reads the data to something that can be loaded to SQL. #6

Closed
mlbelobraydi opened this issue Jul 29, 2020 · 7 comments · Fixed by #21

@mlbelobraydi
Owner

Now that the definitions are complete, a notebook needs to be created to test the process of turning the .ebc file into usable data that can be formatted as JSON or SQL tables. This task is to create a prototype of that process in a notebook.
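A minimal sketch of what that prototype step could look like: slicing a fixed-width record by byte offsets and decoding each field from EBCDIC. The field names, offsets, and the `cp037` code page are assumptions for illustration only; the real layouts live in the repository's definition files, and the actual files may use a different EBCDIC variant.

```python
# Hypothetical layout: (field name, start offset, length in bytes).
# These names and widths are illustrative, not the real dbf900 layout.
LAYOUT = [
    ("RECORD_ID", 0, 2),
    ("API_COUNTY", 2, 3),
    ("API_UNIQUE", 5, 5),
]

def parse_record(raw: bytes, layout=LAYOUT, encoding="cp037") -> dict:
    """Slice a fixed-width record by byte offsets and decode each field."""
    return {
        name: raw[start:start + length].decode(encoding).strip()
        for name, start, length in layout
    }

# Stand-in for bytes read from the .ebc file: ASCII text encoded to EBCDIC.
sample = "0100312345".encode("cp037")
record = parse_record(sample)
```

Slicing the raw bytes before decoding keeps the offsets aligned with the byte positions given in a layout, and the resulting dictionary can be passed to `json.dumps` or inserted into a SQL table as a row.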

@mlbelobraydi mlbelobraydi created this issue from a note in TXRRC Mainframe Files (In Progress:) Jul 29, 2020
@mlbelobraydi mlbelobraydi added this to the reading data accurately milestone Jul 29, 2020
@mlbelobraydi mlbelobraydi self-assigned this Jul 29, 2020
@mlbelobraydi mlbelobraydi added good first issue Good for newcomers help wanted Extra attention is needed labels Jul 29, 2020
@mlbelobraydi
Owner Author

Good start on the formatting. The definitions of each section are used to create correctly formatted dictionary results. Everything is captured in the following notebook:
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/Notebooks/Testing%20data%20definitions.ipynb

@mlbelobraydi
Owner Author

QAQC of the formatting definitions is in progress. Need to ensure the fields are being split and formatted correctly before moving on to dataframes and SQL.

@mlbelobraydi
Owner Author

mlbelobraydi commented Aug 5, 2020

Created .py files that work together to test the formatting: https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/WorkingFileForTesting.py

The same data parsing can be found in the notebook: https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/Notebooks/Testing%20data%20definitions.ipynb

@mlbelobraydi mlbelobraydi changed the title Create notebook that tests the definitions for the files and reads the data to something that can be loaded to SQL. Create script that tests the definitions for the files and reads the data to something that can be loaded to SQL. Aug 6, 2020
@mlbelobraydi
Owner Author

Starting to map out the dependencies of unique keys in the different sections. Tracking most changes in https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_layouts.py and will need to move the results to the .txt in the definition file and update the definitions in the Jupyter notebook.

@mlbelobraydi
Owner Author

Sections 1, 4, 5, 7, 12, 13, 23, 24, 25, 26, 27 have passed QAQC. Sections like 24 will need to be formatted into JSON and added to the previous record in 23. Section 22 has a known byte error.

Sections 2, 3, 6, 8, 9, 10, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 28 still need QAQC.

Sections 2, 14, 21, 22 will need an additional subroutine to decode the 'WB-OIL-GAS-INFO' field into the appropriate oil or gas components.
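One way to fold section 24 records into the preceding section 23 record is sketched below. It assumes the records arrive in file order as dictionaries with a `RECORD_ID` key; that key name, the `children` field, and the `nest_children` helper are hypothetical and not taken from the repository.

```python
import json

def nest_children(records, parent_id="23", child_id="24"):
    """Attach each child record to the most recent parent record seen."""
    parents = []
    for rec in records:
        if rec["RECORD_ID"] == parent_id:
            parents.append({**rec, "children": []})
        elif rec["RECORD_ID"] == child_id and parents:
            parents[-1]["children"].append(rec)
    # Serialize the child list so it fits in a single text or JSON column.
    for parent in parents:
        parent["children"] = json.dumps(parent["children"])
    return parents

# Illustrative records only; the real sections carry many more fields.
records = [
    {"RECORD_ID": "23", "value": "a"},
    {"RECORD_ID": "24", "value": "a1"},
    {"RECORD_ID": "24", "value": "a2"},
    {"RECORD_ID": "23", "value": "b"},
]
nested = nest_children(records)
```

Storing the children as a JSON string keeps section 23 as one row per record while still preserving every section 24 entry that followed it.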

@mlbelobraydi
Owner Author

This is still ongoing, but the bytes rewrite is taking priority to capture the full decimal digits for lat-long and coordinates in section 13.
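The thread does not say how the digits were being lost, so the following is only a sketch under assumptions: the coordinate is stored as EBCDIC digit characters with an implied decimal point, and the number of decimal places comes from the layout. Decoding into `decimal.Decimal` instead of `float` keeps every stored digit. The `decode_implied_decimal` helper, the `cp037` code page, and the sample value are hypothetical.

```python
from decimal import Decimal

def decode_implied_decimal(raw: bytes, places: int, encoding="cp037") -> Decimal:
    """Decode EBCDIC digit bytes and shift in an implied decimal point."""
    digits = raw.decode(encoding).strip()
    # scaleb moves the decimal point without passing through binary floats.
    return Decimal(digits).scaleb(-places)

# Stand-in for a 10-byte latitude field with 7 implied decimal places.
raw = "0301234567".encode("cp037")
lat = decode_implied_decimal(raw, places=7)
```

Working from the raw bytes means the field width in the layout, not an intermediate string or float conversion, determines how many digits survive.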

@mlbelobraydi
Owner Author

https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/WorkingFileForTesting.py now working with:
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_main_bytes.py
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_layouts_bytes.py
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_formats_bytes.py

WorkingFileForTesting.py also captures the unique keys and places the values in the appropriate dataframes.

All sections are being read into dataframes and output to CSV files. Format testing is still ongoing, so QAQC of the layouts and formats still needs to be completed.
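The key-carrying step can be sketched as follows. To stay dependency-free, this uses plain dictionaries and the standard `csv` module in place of dataframes, so it is not the code in WorkingFileForTesting.py. The `API_NUMBER` key, the `01` root section, and both helper names are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

def split_by_section(records, key_field="API_NUMBER", root_id="01"):
    """Group records by section, stamping each with the last root record's key."""
    tables = defaultdict(list)
    current_key = None
    for rec in records:
        if rec["RECORD_ID"] == root_id:
            current_key = rec[key_field]
        tables[rec["RECORD_ID"]].append({**rec, key_field: current_key})
    return tables

def to_csv(rows) -> str:
    """Write one section's rows to CSV text with a sorted header."""
    fieldnames = sorted({name for row in rows for name in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Illustrative records only, in the order they would appear in the file.
records = [
    {"RECORD_ID": "01", "API_NUMBER": "00312345"},
    {"RECORD_ID": "13", "LATITUDE": "30.1234567"},
    {"RECORD_ID": "01", "API_NUMBER": "00367890"},
    {"RECORD_ID": "13", "LATITUDE": "31.7654321"},
]
tables = split_by_section(records)
csv_text = to_csv(tables["13"])
```

Because every row carries the key of the root record it followed, each per-section CSV can later be joined back to the root table in SQL.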

@mlbelobraydi mlbelobraydi linked a pull request Oct 1, 2020 that will close this issue