# Tutorial: Converting COBOL Data to JSON using `jsoncons`

This notebook demonstrates how to use the `cobol_to_json` command added in `jsoncons` v1.0.0 to convert traditional fixed-width COBOL data files into structured JSON format.

**Goal:** Transform a text file containing customer records in a fixed-width format, defined by a COBOL-like layout, into a JSON array of objects.

**Prerequisites:**
*   Python 3.7+
*   `jsoncons` version 1.0.0 or later installed (`pip install --upgrade jsoncons`)

## Step 1: Example COBOL Data File

Imagine we have a file (`customer_data.txt`) where each line represents a customer record. The data is fixed-width, meaning each field occupies specific columns.

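The first cell below installs the `jsoncons` 1.0.1 build from TestPyPI that this notebook was run against; the cell after it creates the sample file: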

In [13]:
pip install -i https://test.pypi.org/simple/ jsoncons==1.0.1

Looking in indexes: https://test.pypi.org/simple/
Collecting jsoncons==1.0.1
  Downloading https://test-files.pythonhosted.org/packages/d7/38/a6c645fbbfbb01bbc18340b4cd05e8aac2a562af6d248314ffd481e063b9/jsoncons-1.0.1-py3-none-any.whl.metadata (7.5 kB)
Downloading https://test-files.pythonhosted.org/packages/d7/38/a6c645fbbfbb01bbc18340b4cd05e8aac2a562af6d248314ffd481e063b9/jsoncons-1.0.1-py3-none-any.whl (12 kB)
Installing collected packages: jsoncons
  Attempting uninstall: jsoncons
    Found existing installation: jsoncons 1.0.0
    Uninstalling jsoncons-1.0.0:
      Successfully uninstalled jsoncons-1.0.0
Successfully installed jsoncons-1.0.1
Note: you may need to restart the kernel to use updated packages.


In [14]:
cobol_data = """\
00101Alice Smith              000015075+A1\n\
00102Bob Johnson              000000000{N2\n\
00103Charlie Brown            000123450-R1\n\
00104Diana Davis              000009900+A1\
""" # Note: The last line has no newline character in the string value due to trailing \

# Ensure consistent line endings when writing
with open('customer_data.txt', 'w', newline='') as f:
    f.write(cobol_data)

print("Created customer_data.txt:")
# Using ! prefix to execute shell command in Jupyter
# Use !type on Windows cmd.exe, !cat on Linux/macOS/PowerShell
!type customer_data.txt
# !cat customer_data.txt

Created customer_data.txt:
00101Alice Smith              000015075+A1
00102Bob Johnson              000000000{N2
00103Charlie Brown            000123450-R1
00104Diana Davis              000009900+A1


**Understanding the Data:**
*   Columns 1-5: Customer ID (numeric)
*   Columns 6-30: Customer Name (alphanumeric)
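*   Columns 31-40: Account Balance (signed numeric, 2 implied decimals, 10 characters wide). The sample mixes two sign conventions: records 1, 3 and 4 hold nine digits followed by a separate trailing `+` or `-`, while record 2 ends in `{`, the overpunch character for a positive zero, as in an `S9(8)V99` field with an embedded sign.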
*   Column 41: Status Code (alphanumeric)
*   Column 42: Record Type (numeric)
*   Total expected length should be 42 characters per line.
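
To make the column arithmetic concrete, here is a minimal sketch that cuts one record into raw field strings with plain Python slicing. The `COLUMNS` table and `slice_record` helper are illustrative names, not part of `jsoncons`; no type conversion is attempted.

```python
# Column ranges copied from the list above (1-based, inclusive).
COLUMNS = {
    "customer_id": (1, 5),
    "customer_name": (6, 30),
    "account_balance": (31, 40),
    "status_code": (41, 41),
    "record_type": (42, 42),
}


def slice_record(line):
    """Return the raw text of each field, without any type conversion."""
    # Python slices are 0-based and end-exclusive, so the column range
    # (first, last) becomes line[first - 1:last].
    return {name: line[first - 1:last] for name, (first, last) in COLUMNS.items()}


# Rebuild the first sample record from its parts so the widths are explicit.
record = "00101" + "Alice Smith".ljust(25) + "000015075+" + "A" + "1"
fields = slice_record(record)
print(len(record))  # 42
print(fields)
```

The name field comes back padded to 25 characters, which is why the layout in the next step sets `strip` for it.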

## Step 2: Define the Layout in JSON

We need to tell `jsoncons` how to parse this data. We create a JSON layout file (`layout.json`) that describes each field.

*   `name`: The desired key name in the output JSON.
*   `start_pos`: The **1-based** starting column position in the COBOL data.
*   `length`: The number of characters the field occupies.
*   `type`: The COBOL picture clause category: `PIC 9` (unsigned numeric), `PIC X` (alphanumeric) or `PIC S9` (numeric that may carry a sign).
*   `strip`: (Optional, default `false`) If `true`, remove leading/trailing whitespace (useful for `PIC X`).
*   `decimals`: (Optional, for `PIC 9` or `PIC S9`) Number of implied decimal places.
*   `signed`: (Optional, relevant for `PIC S9`) Set to `true` if the field carries a sign that needs processing.
*   `record_length`: (Optional, a top-level key rather than a per-field key) The total expected length of a valid record line, used for validation.
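
Because `start_pos` is 1-based and every field has an explicit `length`, a layout can be sanity-checked before running the tool. The sketch below is a hypothetical helper, not something `jsoncons` provides: it reports overlapping fields, gaps, and a mismatch with `record_length`. A gap can be legitimate (unused filler columns), so treat its output as hints rather than errors.

```python
def check_layout(layout):
    """Return a list of problems found in a layout dict (empty if none)."""
    problems = []
    next_free = 1  # first column not yet claimed by a field (1-based)
    for field in sorted(layout["fields"], key=lambda f: f["start_pos"]):
        start, length = field["start_pos"], field["length"]
        if start < next_free:
            problems.append(f"{field['name']} overlaps the previous field")
        elif start > next_free:
            problems.append(
                f"gap before {field['name']} (columns {next_free}-{start - 1})"
            )
        next_free = max(next_free, start + length)
    expected = layout.get("record_length")
    if expected is not None and next_free - 1 != expected:
        problems.append(
            f"fields end at column {next_free - 1}, record_length is {expected}"
        )
    return problems


# Same positions and lengths as the customer layout in this step.
layout = {
    "record_length": 42,
    "fields": [
        {"name": "customer_id", "start_pos": 1, "length": 5},
        {"name": "customer_name", "start_pos": 6, "length": 25},
        {"name": "account_balance", "start_pos": 31, "length": 10},
        {"name": "status_code", "start_pos": 41, "length": 1},
        {"name": "record_type", "start_pos": 42, "length": 1},
    ],
}
print(check_layout(layout))  # []
```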

In [None]:
layout_definition = '''
{
  "description": "Simple Customer Record Layout",
  "record_length": 42,
  "fields": [
    {
      "name": "customer_id",
      "start_pos": 1,
      "length": 5,
      "type": "PIC 9"
    },
    {
      "name": "customer_name",
      "start_pos": 6,
      "length": 25,
      "type": "PIC X",
      "strip": true
    },
    {
      "name": "account_balance",
      "start_pos": 31,
      "length": 10,
      "type": "PIC S9",
      "decimals": 2,
      "signed": true
    },
    {
      "name": "status_code",
      "start_pos": 41,
      "length": 1,
      "type": "PIC X"
    },
    {
      "name": "record_type",
      "start_pos": 42,
      "length": 1,
      "type": "PIC 9"
    }
  ]
}
'''

with open('layout.json', 'w', encoding='utf-8') as f:
    f.write(layout_definition)

print("Created layout.json:")
# Using ! prefix to execute shell command in Jupyter
# Use !type on Windows cmd.exe, !cat on Linux/macOS/PowerShell
!type layout.json
# !cat layout.json

Created layout.json:




## Step 3: Run `jsoncons cobol_to_json`

Now we use the command-line tool to perform the conversion. We provide the layout file, the input data file, and specify an output file (`output.json`).

In [None]:
# Run the jsoncons command
# The '!' tells Jupyter to run this as a shell command
!jsoncons cobol_to_json --layout-file layout.json customer_data.txt output.json
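
Outside a notebook, the same command can be launched from plain Python. The sketch below only reuses the arguments shown in the cell above; `build_command` and `run_conversion` are illustrative names, and the error handling assumes (unverified here) that the tool exits with a non-zero status on failure.

```python
import shutil
import subprocess


def build_command(layout_file, input_file, output_file):
    """Assemble the same argument list used in the cell above."""
    return [
        "jsoncons", "cobol_to_json",
        "--layout-file", layout_file,
        input_file, output_file,
    ]


def run_conversion(layout_file, input_file, output_file):
    """Run the conversion; raise if the tool is missing or exits non-zero."""
    if shutil.which("jsoncons") is None:
        raise FileNotFoundError("jsoncons executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(
        build_command(layout_file, input_file, output_file),
        check=True, capture_output=True, text=True,
    )


# Example call (not executed here):
# result = run_conversion("layout.json", "customer_data.txt", "output.json")
# print(result.stdout)
```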

## Step 4: Examine the JSON Output

Let's check the contents of the generated `output.json` file.

In [None]:
import json
import os
from pprint import pprint

output_filename = 'output.json'

if os.path.exists(output_filename):
    with open(output_filename, 'r', encoding='utf-8') as f:
        try:
            # Use object_hook to potentially convert numeric strings back if desired
            # For now, just load as is (Decimals were written as strings)
            json_output = json.load(f)
            print(f"Successfully loaded JSON from {output_filename}:")
            pprint(json_output)
        except json.JSONDecodeError as e:
            print(f"Error reading generated JSON: {e}")
            print("--- File Content ---")
            try:
                with open(output_filename, 'r', encoding='utf-8') as f_raw:
                    print(f_raw.read())
            except Exception as read_err:
                print(f"Could not read raw file: {read_err}")
            print("-------------------")
else:
    print(f"Output file {output_filename} not found. Did the command run correctly?")

# Optional: Clean up generated files
# try:
#     os.remove('customer_data.txt')
#     os.remove('layout.json')
#     os.remove(output_filename)
# except OSError as e:
#     print(f"Error cleaning up files: {e}")
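
The comment in the cell above mentions `object_hook` as a way to turn numeric strings back into numbers. A minimal sketch of that idea follows, assuming the balance is written as a decimal string under the key `account_balance`; the sample record is hypothetical and only shaped like the output this tutorial describes.

```python
import json
from decimal import Decimal

DECIMAL_KEYS = {"account_balance"}  # keys assumed to hold decimal strings


def decimal_hook(obj):
    """Convert selected string values to Decimal while the JSON is decoded."""
    for key in DECIMAL_KEYS.intersection(obj):
        if isinstance(obj[key], str):
            obj[key] = Decimal(obj[key])
    return obj


# Hypothetical record, not actual tool output.
sample = '[{"customer_id": 101, "customer_name": "Alice Smith", "account_balance": "150.75"}]'
records = json.loads(sample, object_hook=decimal_hook)
print(records[0]["account_balance"] + Decimal("0.25"))  # 151.00
```

Using `Decimal` rather than `float` keeps the two implied decimal places exact, which usually matters for monetary fields.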

**Observations:**
*   The output is a JSON array (`[]`).
*   Each element in the array is a JSON object (`{}`) representing one line (record) from the input file.
*   The keys in the JSON objects match the `name` field from our `layout.json`.
*   `PIC 9` fields were converted to integers, so leading zeros are not preserved (`00101` becomes `101`).
*   `PIC X` fields were converted to strings (with whitespace stripped for `customer_name` as specified).
*   The `PIC S9` balance field was correctly converted to a numeric string, handling the implied decimal and the trailing sign.
*   The conversion process successfully transformed the fixed-width data into a structured, usable JSON format.
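
To show what handling the implied decimal and the trailing sign involves, here is a standalone sketch that decodes the four balance values from the sample file. It is an independent illustration, not the `jsoncons` implementation: `decode_signed` is a hypothetical helper that accepts either a separate trailing `+`/`-` or a zoned-decimal overpunch in the last position.

```python
from decimal import Decimal

# Zoned-decimal overpunch: the last character carries both a digit and the sign.
OVERPUNCH = {"{": (0, 1), "}": (0, -1)}
OVERPUNCH.update({c: (i + 1, 1) for i, c in enumerate("ABCDEFGHI")})
OVERPUNCH.update({c: (i + 1, -1) for i, c in enumerate("JKLMNOPQR")})


def decode_signed(raw, decimals=2):
    """Decode a display numeric with a trailing '+'/'-' or an overpunched last digit."""
    last = raw[-1]
    if last in "+-":
        digits, sign = raw[:-1], (-1 if last == "-" else 1)
    elif last in OVERPUNCH:
        digit, sign = OVERPUNCH[last]
        digits = raw[:-1] + str(digit)
    else:
        digits, sign = raw, 1  # plain unsigned digits
    # scaleb shifts the decimal point left by the number of implied decimals.
    return Decimal(sign * int(digits)).scaleb(-decimals)


for raw in ["000015075+", "000000000{", "000123450-", "000009900+"]:
    print(raw, "->", decode_signed(raw))
```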

## Conclusion

The `jsoncons cobol_to_json` command converts legacy fixed-width data files (like those from COBOL systems) into JSON, which makes the data easier to integrate and analyze with modern tools.

By defining the record layout in a separate JSON file, the tool can parse and convert the data according to specified types, signs, and decimal positions.

This example used simplified sign handling; real-world COBOL data might require more sophisticated parsing, potentially integrating dedicated COBOL parsing libraries for complex types like `COMP-3` or varied sign conventions.
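
For context on why `COMP-3` needs separate treatment: it is a packed binary format with two decimal digits per byte and the sign in the final half-byte, so it cannot be sliced as text. The sketch below is a generic illustration of that encoding; `unpack_comp3` is a hypothetical helper and is not something this tutorial's tool is known to provide.

```python
from decimal import Decimal


def unpack_comp3(data, decimals=0):
    """Decode packed-decimal bytes: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign_nibble = nibbles.pop()
    if sign_nibble < 0x0A:
        raise ValueError("last nibble is not a sign")
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid digit nibble in packed field")
    value = int("".join(map(str, nibbles)) or "0")
    # 0xD (and 0xB) mark a negative value; 0xC, 0xF, 0xA, 0xE are non-negative.
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-decimals)


print(unpack_comp3(bytes([0x12, 0x34, 0x5C]), decimals=2))  # 123.45
print(unpack_comp3(bytes([0x01, 0x5D]), decimals=2))        # -0.15
```

Files containing such fields have to be read in binary mode with fixed record lengths, since packed bytes can coincide with newline characters.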