<a href="https://colab.research.google.com/github/meirm7/Python_Project/blob/master/Python_Project_by_Meir_Moshkovitz.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Background:
The JSON data interchange format is increasingly used for data transfer between servers (e.g. in web services), for database storage and exchange, and so on.

The basic syntactic structure of JSON is the following: JSON has two basic data structures, objects and arrays. An object is a set of name:value pairs, and an array is an ordered list of values. In general, objects and arrays may contain other objects and arrays; hence, a general JSON data package has a tree-like structure.
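
As a small illustration of this tree-like structure, the sketch below parses a made-up sample string with Python's standard `json` module and measures how deeply it nests. The sample data and the helper name `depth` are assumptions for illustration only; they are not part of the project code.

```python
import json

# Hypothetical sample: an object whose values include an array and a nested object.
sample = '{"name": "quiz", "tags": ["sport", "maths"], "meta": {"level": 1}}'

def depth(node):
    """Return the nesting depth of a parsed JSON value (scalars have depth 0)."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((depth(v) for v in node), default=0)
    return 0

tree = json.loads(sample)
print(type(tree).__name__)          # dict: a JSON object maps to a Python dict
print(type(tree["tags"]).__name__)  # list: a JSON array maps to a Python list
print(depth(tree))                  # 2
```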

According to the JSON syntax rules (see JSON.org), whitespace characters, and in particular newline delimiters, may appear in a JSON-formatted string between certain elements (but not within string literals), where they are ignored.
However, for various reasons, line delimiters are sometimes absent or even intentionally removed (cleaned) from the stream.
For example, they may occasionally disturb some parsers, especially when streams are processed automatically without human involvement.
In such cases they are also redundant and reduce the efficiency of the payload data transfer. For this discussion, we will call JSON streams without newline delimiters "flat JSON streams/strings".
(Remark: the term 'flat JSON' also has a different meaning, which pertains to the tree structure of the stream; that meaning is not used here.)
Of course, to be human-readable, a flat JSON stream should be processed by slicing and indenting it so that it is presented in its well-known *nested* form.
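
To make the flat/nested distinction concrete, here is a minimal sketch using the standard `json` module (not the project's own class). The sample data is made up for illustration. It shows that the flat and nested forms carry the same tree and differ only in optional whitespace.

```python
import json

# Hypothetical nested (indented) input.
nested = """{
    "quiz": {
        "q1": {"answer": "12"}
    }
}"""

data = json.loads(nested)

# Compact separators drop all optional whitespace, giving a single-line "flat" string.
flat = json.dumps(data, separators=(",", ":"))
print(flat)  # {"quiz":{"q1":{"answer":"12"}}}

# Both forms parse to the same tree.
print(json.loads(flat) == data)  # True

# indent=4 restores a nested presentation.
print(json.dumps(data, indent=4))
```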

## Purpose

In light of all this, the purpose of this project is to take a flat JSON string as input, analyse it, and implement two processing functions:

1) Present it in the human-readable nested form, which clearly demonstrates its underlying tree structure.

2) Extract a requested value embedded in the string, by specifying the 'path' to it in the string's underlying JSON tree structure.
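
For orientation, the second function can be sketched with the standard `json` module as a reference point. This is not the approach taken in this project, which scans the raw string directly. The function name `values_at_path` and the sample data are assumptions for illustration. Arrays are traversed transparently, so a single path may yield several values.

```python
import json

def values_at_path(node, path):
    """Collect every value reached by following `path` (a list of names)."""
    if not path:
        return [node]  # path exhausted: this node is a requested value
    if isinstance(node, list):
        # Apply the remaining path to each array element.
        return [v for item in node for v in values_at_path(item, path)]
    if isinstance(node, dict) and path[0] in node:
        return values_at_path(node[path[0]], path[1:])
    return []  # the path does not exist under this node

doc = json.loads('{"colors": [{"color": "black"}, {"color": "white"}], "count": 2}')
print(values_at_path(doc, ["colors", "color"]))  # ['black', 'white']
print(values_at_path(doc, ["count"]))            # [2]
print(values_at_path(doc, ["missing"]))          # []
```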

## Program organization

All this functionality will be embedded within a class called JSONAnalyzer, which takes the flat JSON string as its input and implements the following two methods:

* print_nested()

* extract_value_by_path(requested path)


Following is the class implementation:

## Class Implementation:



In [0]:

"""
  =========================================================================
  Author:  Meir Moshkovitz
  
  Subject: Analyzer program of flat JSON formatted stream/file.
  Date:    December 2019
  Course:  Data Science.
  
  * Submitted as the final Project in the Python part of the course.
  =========================================================================

  Disclaimer:
  This software was designed to be used only for private learning purposes.
  Any use or copying of this software or part of it, is not allowed.
  This software comes with no warranties of any kind whatsoever.
  =========================================================================
"""
        
class JSONAnalyzer:
    """Class for analyzing flat JSON formatted streams. """

    OBJ_BEGIN =       '{'
    OBJ_END =         '}'
    ARR_BEGIN =       '['
    ARR_END =         ']'
    NAME_SEPARATOR =  ':'
    VALUE_SEPARATOR = ','
    QUOTATION_MARKS = '"'
    
    def __init__(self, json_str):
        self.json_str = json_str

    def __str__(self):
        return self.json_str
    
    @staticmethod
    def indent(n):
        """Indents a single line to the requested position. """
        print(n*' ', end = '')


    def print_nested(self):
        """Prints the input flat JSON string in a human-readable indented form. """
        pStart = 0
        depth = 0
    
        for i in range(len(self.json_str)):
            if self.json_str[i] == JSONAnalyzer.OBJ_BEGIN:
                pStop = i+1
                JSONAnalyzer.indent(depth*4)
                depth += 1
            elif self.json_str[i] == JSONAnalyzer.OBJ_END:
                pStop = i
                JSONAnalyzer.indent(depth*4)
                depth -= 1
            elif self.json_str[i] == JSONAnalyzer.VALUE_SEPARATOR:
                pStop = i+1
                JSONAnalyzer.indent(depth*4)
            elif self.json_str[i] == JSONAnalyzer.ARR_BEGIN:
                pStop = i+1
                JSONAnalyzer.indent(depth*4)
                depth += 1
            elif self.json_str[i] == JSONAnalyzer.ARR_END:
                pStop = i
                JSONAnalyzer.indent(depth*4)
                depth -= 1        
            else:
                continue

            print(self.json_str[pStart:pStop])
            pStart = pStop

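        # Print whatever remains after the last slice (normally the final closing
        # bracket) instead of assuming that the stream always ends with '}'.
        print(self.json_str[pStart:].strip())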

#=============================================================================================


    @staticmethod
    def clear_tags(tags):
        """Clears the auxiliary 'tags' list. """
        for i in range(len(tags)):
            tags[i] = ''


    @staticmethod
    def test_detected_tag(tags, depth, tag, value_tags_path):
        """Upon detection of a new name tag, checks whether the requested path location is met. """
        tags[depth] = tag

        # Early exit (performance only): the path cannot match while any tag along it is still empty.
        for i in range(1, len(value_tags_path)):
            if tags[i] == '':
                return False
    
        return (tags[:len(value_tags_path)] == value_tags_path) and (depth == len(value_tags_path) - 1 )

    
        
    @staticmethod
    def find_value_len(s):
        """Calculates the length of the found value string. """
        i = 0
        nested = True
    
        while s[i] == ' ':
            i += 1
        
        opener = s[i]
        if opener == JSONAnalyzer.QUOTATION_MARKS:
            closer = opener
        elif opener == JSONAnalyzer.OBJ_BEGIN:
            closer = JSONAnalyzer.OBJ_END
        elif opener == JSONAnalyzer.ARR_BEGIN:
            closer = JSONAnalyzer.ARR_END
        else:
            nested = False

        if(nested):    
            nesting = 1
            for i in range(i+1 , len(s)):
                if s[i] == closer:
                    nesting -= 1
                elif s[i] == opener:
                    nesting += 1

                if nesting == 0:
                    return i+1
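        else: # not nested
            for i in range(i+1 , len(s)):
                if s[i] == JSONAnalyzer.VALUE_SEPARATOR or\
                   s[i] == JSONAnalyzer.ARR_END or\
                   s[i] == JSONAnalyzer.OBJ_END:
                    return i

        # No terminator found: treat the rest of the string as the value
        # (avoids implicitly returning None).
        return len(s)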

        
    def extract_value_by_path(self, value_tags_path):
        """Extracts the requested value string from the JSON stream by its tag names path in the JSON tree. """
        pStart = 0
        pStop = 0
        depth = 0
        tag_expected = True
        inside_tag = False
        tag_found = False  # becomes True once a name tag matching the requested path is detected
        keep_looping = True
        value_start = 0
        JSON_MAX_DEPTH = 20
        val_occurrences = 0
        extracted_values =[]

        value_tags_path = [''] + value_tags_path
        tags = ['' for i in range(JSON_MAX_DEPTH)]
  
        JSONAnalyzer.clear_tags(tags)
    
        for i in range(len(self.json_str)):
            if not keep_looping:
                break
        
            if self.json_str[i] == JSONAnalyzer.OBJ_BEGIN:
                pStop = i+1
                depth += 1
                tag_expected = True
            elif self.json_str[i] == JSONAnalyzer.OBJ_END:
                pStop = i
                tags[depth] = ''
                depth -= 1
            elif self.json_str[i] == JSONAnalyzer.VALUE_SEPARATOR:
                pStop = i+1
                tag_expected = True
            elif self.json_str[i] == JSONAnalyzer.QUOTATION_MARKS:
                if tag_expected:
                    tag_start = i+1
                    tag_expected = False
                    inside_tag = True
                elif inside_tag:
                    tag_end = i
                    tag_found = JSONAnalyzer.test_detected_tag(tags, depth, self.json_str[tag_start:tag_end], value_tags_path)
                    inside_tag = False
                continue
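            elif self.json_str[i] == JSONAnalyzer.NAME_SEPARATOR:
                if tag_found:
                    value_start = i+1

                    vl = JSONAnalyzer.find_value_len(self.json_str[value_start:])

                    # strip() drops the whitespace that may follow the ':' separator
                    extracted_values.append(self.json_str[value_start:value_start+vl].strip())
                    val_occurrences += 1

                    # Reset the flag so that a ':' inside a following string value
                    # (e.g. "ISO 8879:1986") is not counted as another occurrence.
                    tag_found = False

                    # keep_looping = False  #i.e. not greedy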
        
        return val_occurrences, extracted_values
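
One limitation worth noting: the scanning loops above compare each character against the structural characters without tracking whether it lies inside a string literal, so a `,` or `:` within a string value can be misread as structure. A minimal sketch of string-aware scanning is shown below. The helper name `structural_positions` is hypothetical and is not part of the class above.

```python
def structural_positions(text):
    """Return indices of structural characters that lie outside string literals."""
    positions = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False      # this character was escaped; ignore it
            elif ch == "\\":
                escaped = True       # the next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
        elif ch == '"':
            in_string = True         # opening quote
        elif ch in "{}[]:,":
            positions.append(i)
    return positions

sample = '{"a": "x,y:{z}", "b": [1, 2]}'
print([sample[i] for i in structural_positions(sample)])
# ['{', ':', ',', ':', '[', ',', ']', '}']
```

Restricting the class's comparisons to such positions would be one way to make both methods robust to punctuation inside values.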



## Class invocation demonstration:

In order to demonstrate the usage and invocation of the class, we will use the following three example flat JSON packages (stored in files):

1) quiz.json - represents the details of a simple school quiz

2) colors.json - represents the defining parameters of some colors

3) glossary.json - represents the attributes of a glossary

First, we'll upload the files to Colab:


In [0]:
import sys

if 'google.colab' in sys.modules:
    from google.colab import files
    uploaded = files.upload()


### **Example #1:   Analysis of *quiz.json* data package:**
First, we'll open the quiz.json file:




In [0]:
with open('quiz.json') as f:
    quiz_stream = f.read()

J = JSONAnalyzer(quiz_stream)



In [0]:
#J = JSONAnalyzer('{"quiz": {"sport": {"q1": {"question": "Which one is correct team name in NBA?","options":["New York Bulls","Los Angeles Kings","Golden State Warriros","Huston Rocket"],"answer": "Huston Rocket"}},"maths": {"q1": {"question": "5 + 7 = ?","options":["10","11","12","13"],"answer": "12"},"q2": {"question": "12 - 8 = ?","options":["1","2","3","4"],"answer": "4"}}}}')

(The string representation of the class object is, naturally, the respective JSON stream itself; so, let's print it:)

In [0]:
print('Flat JSON stream:\n',J)

Now, let's see it in nested form:

In [0]:
print('The respective nested stream is:\n')
J.print_nested()

Now, let's extract various values from it:

In [0]:
value_occurrences, values_extracted = J.extract_value_by_path(['quiz', 'sport' ,'q1', 'options'])

print(f'found {value_occurrences} value occurrences.\n')
print('Found value(s):\n')
for value in values_extracted:
  print(value)

=================================================================================================================



### **Example #2:   Analysis of *colors.json* data package:**
First, we'll open the colors.json file:




In [0]:
with open('colors.json') as f:
    colors_stream = f.read()

J = JSONAnalyzer(colors_stream)



In [0]:
#J = JSONAnalyzer('{"colors": [{"color": "black","category": "hue","type": "primary","code": {"rgba": [255,255,255,1],"hex": "#000"}},{"color": "white","category": "value","code": {"rgba": [0,0,0,1],"hex": "#FFF"}},{"color": "red","category": "hue","type": "primary","code": {"rgba": [255,0,0,1],"hex": "#FF0"}},{"color": "blue","category": "hue","type": "primary","code": {"rgba": [0,0,255,1],"hex": "#00F"}},{"color": "yellow","category": "hue","type": "primary","code": {"rgba": [255,255,0,1],"hex": "#FF0"}},{"color": "green","category": "hue","type": "secondary","code": {"rgba": [0,255,0,1],"hex": "#0F0"}},]}')

(The string representation of the class object is, naturally, the respective JSON stream itself; so, let's print it:)

In [0]:
print('Flat JSON stream:\n',J)

Now, let's see it in nested form:

In [0]:
print('The respective nested stream is:\n')
J.print_nested()

Now, let's extract various values from it:

In [0]:
value_occurrences, values_extracted = J.extract_value_by_path(['colors', 'color'])

print(f'found {value_occurrences} value occurrences.\n')
print('Found value(s):\n')
for value in values_extracted:
  print(value)

=================================================================================================


### **Example #3:   Analysis of *glossary.json* data package:**




First, we'll open the glossary.json file:

In [0]:
with open('glossary.json') as f:
    glossary_stream = f.read()

J = JSONAnalyzer(glossary_stream)



In [0]:
#J = JSONAnalyzer('{"glossary": {"title": "example glossary","GlossDiv": {"title": "S","GlossList": {"GlossEntry": {"ID": "SGML","SortAs": "SGML","GlossTerm": "Standard Generalized Markup Language","Acronym": "SGML","Abbrev": "ISO 8879:1986","GlossDef": {"para": "A meta-markup language; used to create markup languages such as DocBook.","GlossSeeAlso": ["GML", "XML"]},"GlossSee": "markup"}}}}}')

(The string representation of the class object is, naturally, the respective JSON stream itself; so, let's print it:)

In [0]:
print('Flat JSON stream:\n',J)

Now, let's see it in nested form:

In [0]:
print('The respective nested stream is:\n')
J.print_nested()

Now, let's extract various values from it:

In [0]:
value_occurrences, values_extracted = J.extract_value_by_path(['glossary', 'GlossDiv' ,'GlossList', 'GlossEntry', 'GlossDef', 'para'])

print(f'found {value_occurrences} value occurrences.\n')
print('Found value(s):\n')
for value in values_extracted:
  print(value)

==================================================================================================
