<a href="https://colab.research.google.com/github/gabrielbrazo/geoprocessing/blob/main/QGIS_Raster_Layer_Unique_Value_Report_Output_HTML_File_reading.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# QGIS Raster Layer Unique Value Report
## Working with the output HTML file to get the total area for each unique value in a given raster layer
<br>

Created by [Gabriel Brazo](mailto:gbrazo@id.uff.br)<br>

This notebook grew out of a QGIS raster workflow: get the slope of the raster layers (assuming they have already been properly reprojected into a CRS that treats the data preferably in meters), group and characterize the slope values, and finally get the total area of each of these predefined groups. Only the last step, getting the total area, is done in this notebook.<br>
For the first steps of this job, the QGIS toolbox provides a tool called Raster Layer Unique Value Report, whose output is an HTML file. This HTML file has a "table" of three columns:


*   value: pixel value
*   count: count of pixels with this value
*   m2: total area in square meters of pixels with this value.


With this file saved, you can import it into your Python environment in whatever way suits you. Since this work was done in Google Colab, the chosen method was "gdown". The HTML file is then converted into a CSV file, where the columns of the HTML "table" become the columns of the final CSV file.<br>
From that CSV file we can finally obtain the complete result, using the predefined groups as the bounds of the intervals over which the area is summed. In this notebook we'll use as an example the slope intervals of 0-3%, 3-6% and 6-12%.<br>
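As a rough illustration of the conversion described above, the sketch below parses a small inline table and writes it out as CSV text. The `SAMPLE_HTML` markup, its numbers, and the helper names `table_to_rows` and `rows_to_csv` are assumptions made for this example; a real QGIS report may be laid out differently, so inspect your own file first.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for a QGIS report. The real file may use different
# markup, column names, and number formatting.
SAMPLE_HTML = """
<html><body>
<table>
<tr><th>Value</th><th>Count</th><th>Area (m²)</th></tr>
<tr><td>1</td><td>10</td><td>9000</td></tr>
<tr><td>2</td><td>5</td><td>4500</td></tr>
</table>
</body></html>
"""


def table_to_rows(html_text):
    """Return the first table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find_all("tr"):
        # Only real cells are collected, so whitespace between tags is ignored.
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, one line per table row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = table_to_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Working on an in-memory string keeps the example independent of any file on disk; the same row extraction applies once the report has been read from a file.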

**Finally, let's jump into the program**

In [None]:
%%capture
# first, we need to install the modules that are not built into Python
# (%%capture has to be the first line of the cell to take effect)
!pip install beautifulsoup4
# if necessary, uncomment the line below
#!pip install pandas

In [None]:
# now, importing what is needed
import pandas as pd
from bs4 import BeautifulSoup
import os
import sys

In [None]:
# building the function responsible for converting html into csv
def go_csv(html_file):
  # html_file is the path to the report WITHOUT the '.html' extension
  path = f'{html_file}.html'

  # encoding="ISO-8859-1" was needed here even though the QGIS html file
  # often declares charset=utf-8; check it against your own file
  with open(path, encoding="ISO-8859-1") as f:
    soup = BeautifulSoup(f, 'html.parser')

  # all the rows of the first table in the file
  rows = soup.find_all("table")[0].find_all("tr")

  # to get the header from the HTML file: the text of each cell in the first row
  list_header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]

  # to get the data: the text of each cell in the remaining rows
  data = []
  for row in rows[1:]:
    data.append([cell.get_text(strip=True) for cell in row.find_all(["th", "td"])])

  # Storing the data into Pandas DataFrame
  dataFrame = pd.DataFrame(data = data, columns = list_header)

  # Converting Pandas DataFrame into CSV file
  dataFrame.to_csv(f'{html_file}.csv')

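The `encoding` argument passed to `open` in `go_csv` matters because the area column header contains the character "²", which is stored as different bytes in UTF-8 and in ISO-8859-1. The sketch below shows what happens when the two are mixed up, plus a small fallback reader. Which encoding a given report really uses depends on the setup that produced it, so this is an illustration under that assumption and not a statement about QGIS itself; `read_text_with_fallback` is a hypothetical helper, not part of any library.

```python
header = "Area (m²)"

# The same header stored in the two encodings discussed above.
utf8_bytes = header.encode("utf-8")
latin1_bytes = header.encode("ISO-8859-1")

# UTF-8 bytes read as ISO-8859-1: no error is raised, but the text is garbled,
# so a later lookup of the column 'Area (m²)' would fail.
misread = utf8_bytes.decode("ISO-8859-1")
print(repr(misread))

# ISO-8859-1 bytes read as UTF-8: decoding fails outright.
try:
    latin1_bytes.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False
print(strict_utf8_ok)


def read_text_with_fallback(raw, encodings=("utf-8", "ISO-8859-1")):
    """Decode raw bytes with the first encoding that does not raise."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


print(read_text_with_fallback(latin1_bytes))
print(read_text_with_fallback(utf8_bytes))
```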
In [None]:
def raster_area(csv_file):
  # csv_file is the path to the CSV file WITHOUT the '.csv' extension
  rpd = pd.read_csv(f'{csv_file}.csv')
  # Note that the column names "Value" and "Area (m²)" may change with the
  # language your QGIS is set to. In these cases you should check the column names

  # for the slope interval of 0-3%
  area3 = rpd[rpd['Value'] <= 3]
  r_area3 = sum(area3['Area (m²)'])
  print(f'Raster Area <=3% = {r_area3:.2f} m²')
  print(f'Raster Area <=3% = {(r_area3/1000000):.2f} km²')
  # for the slope interval of 3-6%
  area6 = rpd[(rpd['Value'] > 3) & (rpd['Value'] <= 6)]
  r_area6 = sum(area6['Area (m²)'])
  print(f'Raster Area 3-6% = {r_area6:.2f} m²')
  print(f'Raster Area 3-6% = {(r_area6/1000000):.2f} km²')
  # for the slope interval of 6-12%
  area12 = rpd[(rpd['Value'] > 6) & (rpd['Value'] <= 12)]
  r_area12 = sum(area12['Area (m²)'])
  print(f'Raster Area 6-12% = {r_area12:.2f} m²')
  print(f'Raster Area 6-12% = {(r_area12/1000000):.2f} km²')

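The three interval blocks in `raster_area` repeat the same filter-and-sum pattern. One possible generalization, shown here as a sketch and not as part of the original workflow, bins the values with `pandas.cut` and sums the area per bin. The `report` data and the helper name `area_by_interval` are made up for illustration. One difference from `raster_area`: values below the first edge (negative values, for example) are left out here, whereas `rpd['Value'] <= 3` includes them.

```python
import pandas as pd

# Made-up report data; the column names follow the ones used in raster_area.
report = pd.DataFrame({
    "Value": [0, 2, 3, 4, 6, 9, 12, 20],
    "Area (m²)": [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0],
})


def area_by_interval(df, edges, value_col="Value", area_col="Area (m²)"):
    """Sum the area per interval; edges [0, 3, 6, 12] give [0,3], (3,6], (6,12]."""
    labels = [f"{lo}-{hi}%" for lo, hi in zip(edges[:-1], edges[1:])]
    # right=True closes each interval on the right, matching the '<=' tests in
    # raster_area; include_lowest=True also keeps values equal to the first edge.
    bins = pd.cut(df[value_col], bins=edges, labels=labels,
                  right=True, include_lowest=True)
    # Rows outside every interval (here the value 20) get no bin and are dropped.
    return df.groupby(bins, observed=False)[area_col].sum()


totals = area_by_interval(report, edges=[0, 3, 6, 12])
for label, area_m2 in totals.items():
    print(f"Raster Area {label} = {area_m2:.2f} m² ({area_m2 / 1_000_000:.2f} km²)")
```

Adding another slope class then only requires another edge in the list, for example `edges=[0, 3, 6, 12, 20]`.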
## Reference
<div class="csl-entry"><i>Convert HTML table into CSV file in python - GeeksforGeeks</i>. (n.d.). Retrieved March 11, 2022, from https://www.geeksforgeeks.org/convert-html-table-into-csv-file-in-python/</div>
