<a href="https://colab.research.google.com/github/lustraka/data-analyst-portfolio-project-2022/blob/main/cs01_cds_methods/20211203_Transform_CSV_HTML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transform CSV to HTML and Vice Versa
## Goal
The purpose of this pattern is to provide an interface for gathering data into a structured form through HTML files in a GitHub repository. GitHub displays HTML files as source code, but the rendered page can be opened by prepending the following prefix to the file's URL:

```
https://htmlpreview.github.io/?
```
> **Note**: To open the link in a new tab, just use `Ctrl`+click
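
As a minimal sketch of how the prefix is applied, the helper below prepends it to a file URL. The function name `preview_url` and the example repository path are hypothetical and serve only as an illustration.

```python
PREVIEW_PREFIX = "https://htmlpreview.github.io/?"


def preview_url(file_url: str) -> str:
    """Return a link that opens a GitHub-hosted HTML file as a rendered page."""
    # The file URL is appended to the prefix unchanged.
    return PREVIEW_PREFIX + file_url


# Hypothetical file location, used only for illustration.
example = "https://github.com/user/repo/blob/main/data.html"
print(preview_url(example))
```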

## Objectives
- Export a DataFrame to an HTML file using `BeautifulSoup`, where:
  - Each row is contained in a `<div class='row'>` element.
  - Other fields are contained in `<span class='field_name'>` elements separated by `<br/>` elements.
  - Fields `term` and `title` are wrapped in an `<h3>` element.
  - The field `url` is wrapped in an `<a href=''>` element.
- Import a DataFrame from an HTML file. The structure of a DataFrame imported from HTML shall be derived from the classes of `<span>` elements in the HTML file.
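
The objectives above could be met roughly as follows. This is a sketch under stated assumptions, not the notebook's own implementation: the function names `df_to_html` and `html_to_df` are hypothetical, column names are assumed to be usable as single-token class names, and every value is written and read back as a string.

```python
from bs4 import BeautifulSoup
import pandas as pd

HEADING_FIELDS = {"term", "title"}  # fields wrapped in <h3>
LINK_FIELD = "url"                  # field wrapped in <a href=''>


def df_to_html(df: pd.DataFrame) -> str:
    """Render every DataFrame row as a <div class='row'> of <span> fields."""
    soup = BeautifulSoup("<div class='maincontent'></div>", "html.parser")
    main = soup.find(class_="maincontent")
    for _, record in df.iterrows():
        row = soup.new_tag("div", attrs={"class": "row"})
        for name, value in record.items():
            # Assumption: the column name is a valid single-token class name.
            span = soup.new_tag("span", attrs={"class": str(name)})
            span.string = str(value)
            if name in HEADING_FIELDS:
                heading = soup.new_tag("h3")
                heading.append(span)
                row.append(heading)
            elif name == LINK_FIELD:
                link = soup.new_tag("a", href=str(value))
                link.append(span)
                row.append(link)
                row.append(soup.new_tag("br"))
            else:
                row.append(span)
                row.append(soup.new_tag("br"))
        main.append(row)
    return str(soup)


def html_to_df(html: str) -> pd.DataFrame:
    """Rebuild a DataFrame whose columns come from the <span> classes."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("div", class_="row"):
        record = {}
        for span in row.find_all("span"):
            classes = span.get("class")
            if classes:  # skip spans that carry no class
                record[classes[0]] = span.get_text()
        records.append(record)
    return pd.DataFrame(records)


# Illustrative data only.
df = pd.DataFrame({
    "term": ["pandas"],
    "url": ["https://pandas.pydata.org"],
    "note": ["data analysis library"],
})
html = df_to_html(df)
restored = html_to_df(html)
print(restored)
```

Because the import step reads only text, numeric or date columns would come back as strings and would need converting after the round trip.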

## Lessons Learned
Using `BeautifulSoup` for extracting data:
  - cs01_cds_methods > [20211202_Search_Web_Resources.ipynb](https://github.com/lustraka/data-analyst-portfolio-project-2022/blob/875001b6ca1896c6a09d7e71a16a2b1bb65da280/cs01_cds_methods/20211202_Search_Web_Resources.ipynb)
  - suitecrm > iim > 20211126-WebScrape-Pytude.ipynb
  - Wrangle_Data > [Compare_CRM_Systems.ipynb](https://github.com/lustraka/Data_Analysis_Workouts/blob/d0585eafc2d5a1439831da275609945e430527ac/Wrangle_Data/Compare_CRM_Systems.ipynb)
  
Using `BeautifulSoup` for reporting data:
  - Tutorials > BeautifulSoup-Compile-Prehled-zmen.ipynb

```python
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
from datetime import date
import os
```

```python
html_template = """<!DOCTYPE html>
<html><head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<title></title>
<style>
body { background-color: #ffffff; color: #000000; margin:10px 10px 10px 10px; font-family: Arial, Helvetica, sans-serif; }
h1 { font-size: 1.75em; color: #003199; }
h2 { font-size: 1.25em; color: #003199; }
h3 { font-size: 1em; color: #003199; }
p, li { font-size: 1em; }
.header { font-size: 2em; color: #000099; }
.maincontent { }
.footer { text-align: center; font-size: 0.675em; }
table, th, td { border: 1px solid black; border-collapse: collapse; }
table { width: 18cm; }
th, td { padding: 2px; }
.highligth { background-color: #ffffe0;}
</style>
</head>
<body>
<div class="header"><span></span></div>
<hr/>
<div class="maincontent">
</div>
<hr/>
<div class="footer">Updated: <span class="update"></span>.</div>
</body></html>"""
soup = BeautifulSoup(html_template, 'html.parser')
print(soup.prettify())
```

```
<!DOCTYPE html>
<html>
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <title>
  </title>
  <style>
   body { background-color: #ffffff; color: #000000; margin:10px 10px 10px 10px; font-family: Arial, Helvetica, sans-serif; }
h1 { font-size: 1.75em; color: #003199; }
h2 { font-size: 1.25em; color: #003199; }
h3 { font-size: 1em; color: #003199; }
p, li { font-size: 1em; }
.header { font-size: 2em; color: #000099; }
.maincontent { }
.footer { text-align: center; font-size: 0.675em; }
table, th, td { border: 1px solid black; border-collapse: collapse; }
table { width: 18cm; }
th, td { padding: 2px; }
.highligth { background-color: #ffffe0;}
  </style>
 </head>
 <body>
  <div class="header">
   <span>
   </span>
  </div>
  <hr/>
  <div class="maincontent">
  </div>
  <hr/>
  <div class="footer">
   Updated:
   <span class="update">
   </span>
   .
  </div>
 </body>
</html>
```


```python
# Stamp the footer placeholder with today's date, then show only the body.
soup.find(class_='update').string = str(date.today())
print(soup.body.prettify())
```

```
<body>
 <div class="header">
  <span>
  </span>
 </div>
 <hr/>
 <div class="maincontent">
 </div>
 <hr/>
 <div class="footer">
  Updated:
  <span class="update">
   2021-12-03
  </span>
  .
 </div>
</body>
```
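
The same assignment to `.string` can plausibly fill the template's other empty placeholders, namely the `<title>` element and the `<span>` inside the header. The sketch below uses a shortened stand-in for `html_template` so that it runs on its own; the variable `page_title` and its value are hypothetical.

```python
from datetime import date
from bs4 import BeautifulSoup

# Reduced stand-in for html_template, kept short so the sketch is self-contained.
template = """<html><head><title></title></head>
<body>
<div class="header"><span></span></div>
<div class="maincontent"></div>
<div class="footer">Updated: <span class="update"></span>.</div>
</body></html>"""

soup = BeautifulSoup(template, "html.parser")
page_title = "Example Page"  # hypothetical title, for illustration only

# Fill the empty <title> and the header <span> with the same text.
soup.title.string = page_title
soup.find(class_="header").span.string = page_title
# Stamp the footer with today's date, as in the cell above.
soup.find(class_="update").string = str(date.today())

print(soup.body.prettify())
```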
