feat: byte slice JSON parser #1415

notJoon · 2023-12-06T10:52:32Z

Description

I implemented a JSON parser that works on byte slices.

State Machine For JSON

Each state transitions to the next when specific conditions are met, representing the process of parsing a JSON structure sequentially. State transitions are defined based on the conditions and actions that occur during the parsing process.

The diagram includes the following states (all states are defined in the internal.gno file):

Parsing starts in the initial state (__) and returns an error if an unexpected token is encountered.
States for handling string (ST), number (MI, ZE, IN), boolean (T1, F1), and null (N1) values.
States for handling the start of objects (co) and arrays (bo), and their ends (ec, cc, bc).
States expecting keys (KE) and values (VA), and for handling commas (cm) and colons (cl).

Each state deals with various scenarios that can occur during JSON parsing, with transitions to the next state determined by the current token and context. Below is a graph depicting how the states transition:

stateDiagram-v2
    [*] --> __: Start
    __ --> ST: String
    __ --> MI: Number
    __ --> ZE: Zero
    __ --> IN: Integer
    __ --> T1: Boolean (true)
    __ --> F1: Boolean (false)
    __ --> N1: Null
    __ --> ec: Empty Object End
    __ --> cc: Object End
    __ --> bc: Array End
    __ --> co: Object Begin
    __ --> bo: Array Begin
    __ --> cm: Comma
    __ --> cl: Colon
    __ --> OK: Success/End
    ST --> OK: String Complete
    MI --> OK: Number Complete
    ZE --> OK: Zero Complete
    IN --> OK: Integer Complete
    T1 --> OK: True Complete
    F1 --> OK: False Complete
    N1 --> OK: Null Complete
    ec --> OK: Empty Object Complete
    cc --> OK: Object Complete
    bc --> OK: Array Complete
    co --> OB: Inside Object
    bo --> AR: Inside Array
    cm --> KE: Expecting New Key
    cm --> VA: Expecting New Value
    cl --> VA: Expecting Value
    OB --> ST: String in Object (Key)
    OB --> ec: Empty Object
    OB --> cc: End Object
    AR --> ST: String in Array
    AR --> bc: End Array
    KE --> ST: String as Key
    VA --> ST: String as Value
    VA --> MI: Number as Value
    VA --> T1: True as Value
    VA --> F1: False as Value
    VA --> N1: Null as Value
    OK --> [*]: End
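
As a rough illustration of the table-driven approach described above, here is a minimal Go sketch that validates only JSON integers. It borrows the state names __, GO, MI, ZE, IN and OK from this PR, but the character classes, the `transitions` table, and the `validInteger` helper are assumptions made for this example and are not the contents of internal.gno.

```go
package main

import "fmt"

// state is a parser state. The names mirror a subset of the states in
// this PR; the numbering here is an assumption of the sketch.
type state int8

const (
	__ state = iota - 1 // error: no valid transition
	GO                  // start
	MI                  // minus sign seen
	ZE                  // leading zero seen
	IN                  // inside integer digits
	OK                  // value complete
)

// class is the category of an input byte (hypothetical for this sketch).
type class uint8

const (
	cSpace class = iota
	cMinus
	cZero
	cDigit // '1' through '9'
	cOther
	numClasses
)

// classify maps a byte to its character class.
func classify(b byte) class {
	switch {
	case b == ' ' || b == '\t' || b == '\n' || b == '\r':
		return cSpace
	case b == '-':
		return cMinus
	case b == '0':
		return cZero
	case b >= '1' && b <= '9':
		return cDigit
	}
	return cOther
}

// transitions[s][c] is the state reached from s on a byte of class c.
var transitions = [OK + 1][numClasses]state{
	GO: {cSpace: GO, cMinus: MI, cZero: ZE, cDigit: IN, cOther: __},
	MI: {cSpace: __, cMinus: __, cZero: ZE, cDigit: IN, cOther: __},
	ZE: {cSpace: OK, cMinus: __, cZero: __, cDigit: __, cOther: __},
	IN: {cSpace: OK, cMinus: __, cZero: IN, cDigit: IN, cOther: __},
	OK: {cSpace: OK, cMinus: __, cZero: __, cDigit: __, cOther: __},
}

// validInteger runs the table over data and reports whether it is a
// JSON integer, optionally surrounded by whitespace.
func validInteger(data []byte) bool {
	s := GO
	for _, b := range data {
		s = transitions[s][classify(b)]
		if s == __ {
			return false
		}
	}
	// Input may end right after the last digit, without trailing whitespace.
	return s == ZE || s == IN || s == OK
}

func main() {
	for _, in := range []string{"-12", " 42 ", "012", "-"} {
		fmt.Printf("%q valid: %v\n", in, validInteger([]byte(in)))
	}
}
```

Keeping the transitions in a table means the parsing loop stays the same as states are added; only the table and the classifier grow.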

Walkthrough The JSON Machine

Gno is not fully compatible with Go, and many standard library functions are not implemented yet. For that reason, this PR adds some files that are not directly related to JSON but are needed to implement it.

Float Value Handler

The strconv package currently provided by gno has injected functions for parsing basic int and uint types, but it has no ParseFloat implementation for parsing floating-point numbers. Therefore, I brought over the implementation of the Eisel-Lemire algorithm from Go's strconv package (./p/demo/json/eisel_lemire).¹

The FormatFloat function is not implemented yet either, so I imported the ryu64 algorithm ² to implement this feature (./p/demo/json/ryu).

I plan to add this code to the strconv package if possible, so that the necessary functions can be written entirely in gno.
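
For reference, this is the behavior the two ports stand in for, shown with Go's standard strconv package rather than the gno code in this PR. The `roundTrip` helper is a hypothetical name used only for this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// roundTrip parses a decimal string into a float64 and formats it back.
// Precision -1 asks FormatFloat for the shortest representation that
// parses back to the same float64 value.
func roundTrip(s string) (string, error) {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatFloat(f, 'g', -1, 64), nil
}

func main() {
	for _, in := range []string{"0.1000", "3.14", "abc"} {
		out, err := roundTrip(in)
		if err != nil {
			fmt.Printf("%q: error: %v\n", in, err)
			continue
		}
		fmt.Printf("%q -> %q\n", in, out)
	}
}
```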

Buffer

buffer.gno manages the internal buffer and its interaction with the state machine during JSON parsing. The buffer processes the JSON string sequentially, interpreting the meaning of each character and deciding the next action through the state machine.

Here, I'll describe the key functions and how they interact with the state machine. The /N suffix after a function name is a notation borrowed from Elixir that indicates the number of parameters:

newBuffer: This function creates a new buffer instance containing the given data. The initial state is set to GO, the state machine's initial state, which marks the start of parsing.
first: Finds the first meaningful (non-whitespace) character. Although the state machine is not yet activated at this stage, the result of this function plays a crucial role in determining the first step of parsing.
current, next, step: These functions manage the current position within the buffer, reading characters or moving to the next one. current returns the character at the current index, next returns the next character, and step only moves to the next position. These movement functions are necessary to decide what input should be processed when the state machine transitions to the next state.
getState: Determines the next state based on the character at the current buffer position. This function evaluates the class (type of character) of the current character and uses a state transition table to decide the next state. This process is central to how the state machine interprets the JSON structure.
numeric/1, string/2, word/1: These functions parse numbers, strings, and specified word tokens, respectively. During parsing, the state machine transitions to the next state based on the current character's type and context, which is essential for accurately interpreting the structure and meaning of JSON data.
skip, skipAny/1: Functions for skipping characters that meet certain conditions, such as moving the buffer index until a specific character or set of tokens is encountered. These functions are mainly used to manage the current state of the state machine while parsing structural elements (e.g., the end of an object or array).

These functions interact closely with the state machine to recognize and interpret the various data types and structures within the JSON string. The current state of the state machine changes based on each character or string the buffer processes, dynamically controlling the parsing process.
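
To make the cursor-style helpers more concrete, here is a simplified Go sketch of such a buffer. The method names follow the list above, but the field names, signatures, and error handling are assumptions for illustration and may differ from buffer.gno; the state machine itself is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errEOF is a hypothetical error used by this sketch.
var errEOF = errors.New("unexpected end of input")

// buffer is a byte slice plus a read position.
type buffer struct {
	data   []byte
	length int
	index  int
}

func newBuffer(data []byte) *buffer {
	return &buffer{data: data, length: len(data)}
}

func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

// first advances to the first non-whitespace byte and returns it.
func (b *buffer) first() (byte, error) {
	for ; b.index < b.length; b.index++ {
		if !isSpace(b.data[b.index]) {
			return b.data[b.index], nil
		}
	}
	return 0, errEOF
}

// current returns the byte at the current index without moving.
func (b *buffer) current() (byte, error) {
	if b.index >= b.length {
		return 0, errEOF
	}
	return b.data[b.index], nil
}

// step moves to the next position without reading it.
func (b *buffer) step() error {
	if b.index+1 < b.length {
		b.index++
		return nil
	}
	return errEOF
}

// next moves to the next position and returns the byte found there.
func (b *buffer) next() (byte, error) {
	if err := b.step(); err != nil {
		return 0, err
	}
	return b.data[b.index], nil
}

// skip advances the index until the byte at the index equals target.
func (b *buffer) skip(target byte) error {
	for ; b.index < b.length; b.index++ {
		if b.data[b.index] == target {
			return nil
		}
	}
	return errEOF
}

func main() {
	b := newBuffer([]byte(`  {"a": 1}`))
	c, _ := b.first()
	fmt.Printf("first: %c\n", c)
	c, _ = b.next()
	fmt.Printf("next: %c\n", c)
	_ = b.skip(':')
	c, _ = b.current()
	fmt.Printf("after skip: %c at index %d\n", c, b.index)
}
```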

Unescape

These functions process JSON strings by managing internal buffer interactions and unescaping characters according to the JSON standard. This involves translating escape sequences like \uXXXX for Unicode characters, as well as simpler escapes like \\, \/, \b, \f, \n, \r, and \t.

Here are some key functions in this file:

Unescape/2: This is the primary function that takes an input byte slice (representing a JSON string with escape sequences) and an output byte slice to write the unescaped version of the input. It processes each escape sequence encountered in the input slice and translates it into the corresponding UTF-8 character in the output slice.
Unquote/2: This function is designed to remove surrounding quotes from a JSON string and unescape its contents. It's useful for processing JSON string values to their literal representations.
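
The following Go sketch shows the kind of work an unescape routine does. It is not the Unescape/2 from this PR: the `unescape` signature (returning a new slice instead of writing into an output slice), the `hex4` helper, and the error value are assumptions for illustration. It uses Go's unicode/utf16 and unicode/utf8 packages to handle surrogate pairs.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

var errBadEscape = errors.New("invalid escape sequence")

// simpleEscapes maps the byte after a backslash to the byte it denotes.
var simpleEscapes = map[byte]byte{
	'"': '"', '\\': '\\', '/': '/',
	'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t',
}

// hex4 decodes the first four bytes of p as hexadecimal digits.
func hex4(p []byte) (rune, bool) {
	if len(p) < 4 {
		return 0, false
	}
	var r rune
	for _, c := range p[:4] {
		switch {
		case c >= '0' && c <= '9':
			c -= '0'
		case c >= 'a' && c <= 'f':
			c = c - 'a' + 10
		case c >= 'A' && c <= 'F':
			c = c - 'A' + 10
		default:
			return 0, false
		}
		r = r<<4 | rune(c)
	}
	return r, true
}

// unescape returns a copy of in with JSON escape sequences replaced by
// the UTF-8 bytes they represent.
func unescape(in []byte) ([]byte, error) {
	out := make([]byte, 0, len(in))
	for i := 0; i < len(in); {
		if in[i] != '\\' {
			out = append(out, in[i])
			i++
			continue
		}
		if i+1 >= len(in) {
			return nil, errBadEscape
		}
		if c, ok := simpleEscapes[in[i+1]]; ok {
			out = append(out, c)
			i += 2
			continue
		}
		if in[i+1] != 'u' {
			return nil, errBadEscape
		}
		r, ok := hex4(in[i+2:])
		if !ok {
			return nil, errBadEscape
		}
		i += 6
		if utf16.IsSurrogate(r) {
			// A surrogate must be completed by a second \uXXXX escape.
			if i+1 >= len(in) || in[i] != '\\' || in[i+1] != 'u' {
				return nil, errBadEscape
			}
			r2, ok := hex4(in[i+2:])
			if !ok {
				return nil, errBadEscape
			}
			r = utf16.DecodeRune(r, r2)
			if r == utf8.RuneError {
				return nil, errBadEscape
			}
			i += 6
		}
		var buf [utf8.UTFMax]byte
		n := utf8.EncodeRune(buf[:], r)
		out = append(out, buf[:n]...)
	}
	return out, nil
}

func main() {
	for _, in := range []string{`a\tb`, `caf\u00e9`, `\ud83d\ude00`, `\x`} {
		out, err := unescape([]byte(in))
		fmt.Printf("%s -> %q, err: %v\n", in, out, err)
	}
}
```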

Node

When a JSON string is decoded, the package converts the data into a Node type.

type Node struct {
    prev     *Node            // prev is the parent node of the current node.
    next     map[string]*Node // next is the child nodes of the current node.
    key      *string          // key holds the key of the current node in the parent node.
    data     []byte           // byte slice of JSON data
    value    interface{}      // value holds the value of the current node.
    nodeType ValueType        // NodeType holds the type of the current node. (Object, Array, String, Number, Boolean, Null)
    index    *int             // index holds the index of the current node in the parent array node.
    borders  [2]int           // borders stores the start and end index of the current node in the data.
    modified bool             // modified indicates the current node is changed or not.
}

This node type allows you to fetch and manipulate specific values from JSON. For example, you can use the GetKey/1 function to retrieve the value stored at a specific key, and you can use Delete to remove a node. This is how the package lets you process JSON data.
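
The sketch below illustrates the parent/child linkage that makes lookups and deletion possible. It uses a reduced, hypothetical `node` type with only the prev, next, key and value fields; `newObject`, `AppendKey`, and the method signatures are assumptions for this example and should not be read as the package's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errKeyNotFound = errors.New("key not found")
	errNoParent    = errors.New("node has no parent")
)

// node is a reduced version of the Node struct above.
type node struct {
	prev  *node            // parent node
	next  map[string]*node // child nodes, by key
	key   *string          // key of this node in its parent
	value interface{}      // decoded value
}

// newObject creates an empty object node.
func newObject() *node {
	return &node{next: map[string]*node{}}
}

// AppendKey adds a child holding value under key and returns it.
func (n *node) AppendKey(key string, value interface{}) *node {
	if n.next == nil {
		n.next = map[string]*node{}
	}
	child := &node{prev: n, key: &key, value: value}
	n.next[key] = child
	return child
}

// GetKey returns the child stored under key.
func (n *node) GetKey(key string) (*node, error) {
	child, ok := n.next[key]
	if !ok {
		return nil, errKeyNotFound
	}
	return child, nil
}

// Delete detaches the node from its parent.
func (n *node) Delete() error {
	if n.prev == nil || n.key == nil {
		return errNoParent
	}
	delete(n.prev.next, *n.key)
	n.prev = nil
	return nil
}

func main() {
	root := newObject()
	root.AppendKey("name", "gno")

	child, err := root.GetKey("name")
	fmt.Println(child.value, err)

	_ = child.Delete()
	_, err = root.GetKey("name")
	fmt.Println(err)
}
```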

1: The Eisel-Lemire algorithm provides a fast way to parse floating-point numbers from strings. The core idea is to perform the string-to-number conversion as quickly as possible while still producing a correctly rounded result. In Go's strconv it serves as a fast path, with a fallback to a slower algorithm for the rare inputs it cannot handle. Eisel-Lemire is particularly useful when dealing with large amounts of numerical data, where it is much faster than traditional parsing methods.

2: The Ryu algorithm focuses on converting floating-point numbers to strings. Ryu generally converts floating-point numbers to the shortest possible string representation accurately, with excellent performance and precision. A key advantage of the Ryu algorithm is that the converted string maintains the minimum length while precisely representing the original number. This helps save storage space and reduces data transmission times over networks.

codecov · 2023-12-06T10:57:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.54%. Comparing base (9336c7f) to head (209a754).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1415   +/-   ##
=======================================
  Coverage   47.54%   47.54%           
=======================================
  Files         388      388           
  Lines       61279    61279           
=======================================
  Hits        29133    29133           
  Misses      29707    29707           
  Partials     2439     2439


thehowl · 2023-12-07T09:17:36Z

Nice :)

Have you considered porting over buger/jsonparser? It is an existing reflection-free JSON implementation.

notJoon · 2023-12-07T10:07:32Z

> Have you considered porting over buger/jsonparser? It is an existing reflection-free JSON implementation.

Yes, I'm currently working with jsonparser as a reference.

@harry-hov

## Description While I was working on the [JSON](gnolang#1415) package, @harry-hov requested an update to the package list. After checking, I noticed that multiple packages were absent from the list, so I included them. However, I omitted the testing package, as it appeared to be managed independently.

zivkovicmilos

Thank you for this phenomenal effort 🙏

The description helped me a lot with reading through and understanding the changes.

I've left minor comments, nothing major. We should be good to merge 🚀

examples/gno.land/p/demo/json/ryu/floatconv.gno

examples/gno.land/p/demo/json/ryu/License

examples/gno.land/p/demo/json/path.gno

examples/gno.land/p/demo/json/node_test.gno

examples/gno.land/p/demo/json/node.gno

examples/gno.land/p/demo/json/encode.gno

examples/gno.land/p/demo/json/decode_test.gno

examples/gno.land/p/demo/json/decode.gno

examples/gno.land/p/demo/json/decode_test.gno

# Description
- Add the `unicode/utf16` package, transferred directly from Go without any changes.
- Register it in the `stdlibWhitelist` (transpiler.go).

In an earlier JSON PR (#1415), I included `unicode/utf16` to handle unescaping and other byte slice operations, but realized that I wasn't using it in that package, which led me to submit a separate PR specifically for it.

numberKind byte slice parser

f1308d3

github-actions bot assigned notJoon Dec 6, 2023

github-actions bot added the 🧾 package/realm Tag used for new Realms or Packages. label Dec 6, 2023

remove unused packages

c30634c

dongwon8247 mentioned this pull request Dec 6, 2023

Onbloc's Builder Journey gnolang/hackerspace#29

Open

add utf16 package

7b0e6c2

github-actions bot added the 📦 🤖 gnovm Issues or PRs gnovm related label Dec 7, 2023

add escape handler

cf0655e

notJoon added 19 commits December 8, 2023 13:16

parse float

dbeb921

basic lexer

d39ad52

update lexer

1d2a600

key position range finder

26f9c70

find multiple key positions in JSON

32760c4

parsing Integer and float values

1de7f0c

number parser refactoring

e19a06d

parse primitive types

9ba8808

create searchKeys

534f0f7

get type value

f12375d

parse array

6868c10

remove Lexer struct

1a1657f

refactor

39bfa64

re-refactor

33ccb14

parse uint value

789abfd

revert

7de39d5

JSON PoC

cc136d5

remove unused errors

7e9ccb8

flatten

b69eadb

harry-hov reviewed Feb 28, 2024

examples/gno.land/p/demo/json/license/LICENSE.txt

harry-hov reviewed Feb 28, 2024

examples/gno.land/p/demo/json/struct.gno

harry-hov reviewed Feb 28, 2024

examples/gno.land/p/demo/json/node_test.gno

notJoon added 10 commits March 2, 2024 13:12

fmt and mv license

de817ba

state machine decoder and some node getters

29224d9

rewrite JSON code to avoid struct

42b4a15

use ryu algorithm to format float and reorganize the structure

03a9d17

tidy

cc4cd3e

fix DeleteIndex

2218981

Improve determining int and float type

faf9890

resolve conflict

33ab79a

Merge branch 'master' into json

a688fc5

remove utf16 package from json

a54cfab

notJoon mentioned this pull request Mar 13, 2024

feat(stdlib): add unicode/utf16 pacakge #1764

Merged

zivkovicmilos added this to the 🏗4️⃣ test4.gno.land milestone Mar 25, 2024

dongwon8247 mentioned this pull request Mar 25, 2024

Onbloc x Gno Core Workshop gnolang/hackerspace#61

Open

notJoon added 3 commits March 28, 2024 18:00

Merge branch 'master' into json

c999121

refactor

88e93cc

some lint

8bffa48

zivkovicmilos approved these changes Mar 28, 2024

notJoon added 5 commits March 29, 2024 12:25

save

aee3526

Merge branch 'master' into json

447718a

refactor and update README

bd14f50

Merge branch 'master' into json

c693ccc

typo

209a754

zivkovicmilos merged commit 6afab42 into gnolang:master Mar 29, 2024
186 of 187 checks passed

notJoon deleted the json branch March 29, 2024 08:14

dongwon8247 mentioned this pull request May 1, 2024

Gno Core Issue Tracker gnoswap-labs/gno#7

Open

18 tasks
