## Unicode Reader mini function

### Objective:
- To read Unicode characters from a table in a published Google Doc (HTML) and present the result as readable text

### Requirements:
- Allows the user to enter the URL as an argument

### Components:
- A single function that fetches the HTML table data from the URL
- The same function extracts the table and converts the table data into a dataframe

### Methodology:
- Use Pandas to read the URL, extract the table and format the table data
- Form a blank 2D matrix sized by the max values of both x and y (plus one each) as extracted from the source DOC
- Loop over the rows of the dataframe and extract the x, y and Character values of each row by index
- Replace the element of the matrix at position y,x with the Character value
- Format the output to show rows in the correct sequence, and to remove list separators
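
The steps above can be sketched without Pandas, as a minimal illustration only. The helper name `render_grid` and the `sample_cells` values are assumptions made for this example; the cells are hand-written to resemble the sample output shown later in this notebook, not data read from the document.

```python
def render_grid(cells):
    """Build the character grid from a list of (x, character, y) tuples.

    Returns the rows as strings, ordered from y = max (top) to y = 0 (bottom).
    """
    # Largest coordinate on each axis decides the matrix size
    max_x = max(x for x, _, _ in cells)
    max_y = max(y for _, _, y in cells)

    # Blank matrix: one row per y value, one column per x value
    grid = [[" " for _ in range(max_x + 1)] for _ in range(max_y + 1)]

    # Place each character at its coordinate (row index is y, column index is x)
    for x, char, y in cells:
        grid[y][x] = char

    # Join each row to drop list separators; reverse so y = max is printed first
    return ["".join(row) for row in reversed(grid)]


# Hypothetical input, in the same column order as the table: x, character, y
sample_cells = [
    (0, "█", 0), (0, "█", 1), (0, "█", 2),
    (1, "▀", 1), (1, "▀", 2),
    (2, "▀", 1), (2, "▀", 2),
    (3, "▀", 2),
]

print("\n".join(render_grid(sample_cells)))
```

Because every cell is written by its own coordinate pair, the input rows do not need to be sorted first.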

### Special Considerations:  
- Use pandas to read the HTML DOC from the URL and to extract the max values easily
- Work in rows, handling x,y as paired elements for easy checking
- Avoid the need to sort before looping, i.e. no need to process all x for y=1 before moving on
- y-axis is displayed descending, from top [max] to bottom [0]
- Coordinates start at x=0, y=0, so the matrix needs max + 1 rows and columns to keep every index within range
- The matrix is indexed row first, so x,y from the source becomes [y][x] when accessing the matrix
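
The two indexing points above (max + 1 sizing and the [y][x] order) can be checked with a small sketch. The values of `max_x` and `max_y` below are illustrative assumptions; in the notebook they come from the dataframe columns.

```python
# Illustrative maxima (assumed values, not read from the document)
max_x, max_y = 3, 2

# max + 1 entries per axis, because coordinates start at 0
grid = [[" " for _ in range(max_x + 1)] for _ in range(max_y + 1)]

# Row (y) first, then column (x)
grid[max_y][max_x] = "#"

# Without the + 1, the largest coordinate falls outside the matrix
too_small = [[" " for _ in range(max_x)] for _ in range(max_y)]
try:
    too_small[max_y][max_x] = "#"
    out_of_range = False
except IndexError:
    out_of_range = True

print(len(grid), len(grid[0]), out_of_range)
```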

### Library:
- Pandas

In [33]:
# obtain the required library
import pandas as pd

In [34]:
def unicode_reader(doc_url):
    """Read the first HTML table at doc_url and print its characters as a grid."""
    
    # Open the url, extract the table from HTML, convert HTML table data into dataframe, force utf-8 encoding    
    tables = pd.read_html(doc_url, encoding='utf-8')
    df = tables[0]                 # Select the 1st table extracted by read_html
    
    # Data cleaning for dataframe
    new_header = df.iloc[0]        # Get the first row of table to be the new header
    df.columns = new_header        # Set the new column headers for dataframe
    df = df.iloc[1:]               # Drop row 0 which is now duplicate value row of column names 
    df = df.reset_index(drop=True) # Reset the row index to start from 0 at first row
    df['x-coordinate'] = df['x-coordinate'].astype('int') # change numeric to integer type
    df['y-coordinate'] = df['y-coordinate'].astype('int') # change numeric to integer type

    # Create a blank 2D matrix
    max_valuex = df['x-coordinate'].max() # extract the x-value range
    max_valuey = df['y-coordinate'].max() # extract the y-value range
    grid = [[" " for i in range(max_valuex+1)] for j in range(max_valuey+1)]  


    
    """
    ### This is old code without data cleaning ###

    df = df.iloc[1:]  #drop column names 
    df.iloc[:,0] = df.iloc[:,0].astype('int') # change column numeric to integer type
    df.iloc[:,2] = df.iloc[:,2].astype('int') # change column numeric to integer type
  
      
    # Create a blank 2D matrix
    max_valuex = df[0].max() 
    max_valuey = df[2].max() 
    grid = [[" " for i in range(max_valuex+1)] for j in range(max_valuey+1)] 
  
    ### This is old code without data cleaning ###
    """
    
    
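    # Populate the 2D matrix with the Unicode character of each row
    # Columns are read by position: 0 = x-coordinate, 1 = character, 2 = y-coordinate
    for row in df.itertuples(index=False, name=None):
        rowx, rowy = row[0], row[2]
        grid[rowy][rowx] = row[1]   # matrix is indexed [y][x]
    
    # Print from the top row (y = max) down to the bottom row (y = 0),
    # with no separator between the characters of a row
    for row in reversed(grid):
        for item in row:
            print(item, end='')
        print()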
        

In [53]:
### EXECUTION BLOCK ###

default_url = "https://docs.google.com/document/d/e/2PACX-1vRs1K6ZCuc_GN7kWKRNq5NdQZYNFmGw9l28KYQ9j6Y5F6c1P0VUKsNOmaxkidXc9Ap9UMYSYDQMDAJq/pub"
correct_url = "https://docs.google.com/document/d/e/2PACX-1vRfIq2r6rfrL-LkKkUxLALVxdEvDawTCxlC84KoKc8YLcuN7I-bF8I1LgvuaLMPSH0gfeemE-Ha1RMZ/pub"    

while True:
    
    user_url = input("Enter the URL of the DOC:\n")
    
    if user_url == "":
        print("Invalid entry!")
        continue

    elif user_url == default_url or user_url == "test": # sample test file
        print("Loading Sample Test File!!!\n")
        unicode_reader(default_url)
        print("\nUnicode converted!!!")
        break
            
    elif user_url == correct_url or user_url == "run": # actual run file
        print("Loading Actual Run File!!!\n")
        unicode_reader(correct_url)
        print("\nUnicode converted!!!")
        break

    else: # any other entry is not recognised, so prompt again
        print("Unrecognised entry! Enter a known URL, 'test' or 'run'.")

Enter the URL of the DOC:
 test


Loading Sample Test File!!!

█▀▀▀
█▀▀ 
█   

Unicode converted!!!


### Output Explained:
The Pandas library is used to read the HTML DOC from the URL, to extract the HTML table using `pd.read_html`, and to clean the table data.   
A 2D matrix of single-character blanks is formed from the max values of both x-coordinate and y-coordinate. The resulting image is formed by looping through the rows of the dataframe and replacing the blank at each coordinate pair with the Unicode value of Character for that pair.
The final image is formatted by removing list separators, and by reversing the y-axis loop into descending order from top [y=max] to bottom [y=0].

### Orientation:
An image is also generated when looping from y=0 to y=max, but it is flipped vertically, showing 'M' instead of 'W'.
Calibration against the sample shows that the correct orientation is y descending.
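
The effect of the loop direction can be seen on a small hand-written matrix. This is a sketch only: the helper `to_lines` and the `demo_grid` contents are assumptions for illustration, with row 0 holding y = 0 as in the function above.

```python
# Hypothetical matrix, stored with row index equal to y
demo_grid = [
    list("█   "),  # y = 0
    list("█▀▀ "),  # y = 1
    list("█▀▀▀"),  # y = 2
]


def to_lines(grid, top_is_max_y=True):
    """Return the rows as strings, either y descending or y ascending."""
    rows = reversed(grid) if top_is_max_y else grid
    return ["".join(row) for row in rows]


print("y descending (orientation used in this notebook):")
print("\n".join(to_lines(demo_grid)))
print("y ascending (vertically flipped):")
print("\n".join(to_lines(demo_grid, top_is_max_y=False)))
```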