### SAP Machine Learning Embedding in OpenAI - step 01
##### Author: Sergiu Iatco. May, 2023
https://people.sap.com/iatco.sergiu <br>
https://www.linkedin.com/in/sergiuiatco/ <br>

#### Collected URLs.
Blogs:<br>
https://blogs.sap.com/2022/11/07/sap-community-call-sap-hana-cloud-machine-learning-challenge-i-quit-how-to-prevent-employee-churn/ <br>
https://blogs.sap.com/2022/11/28/i-quit-how-to-predict-employee-churn-sap-hana-cloud-machine-learning-challenge/ <br>
https://blogs.sap.com/2022/12/22/sap-hana-cloud-machine-learning-challenge-2022-the-winners-are/ <br>

https://blogs.sap.com/2023/01/09/sap-hana-cloud-machine-learning-challenge-i-quit-understanding-metrics/ <br>

Documentation:<br>
https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.04/en-US/hana_ml.dataframe.html <br>
https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.07/en-US/pal/algorithms/hana_ml.algorithms.pal.trees.HybridGradientBoostingClassifier.html <br>

In [1]:
import urllib.request
import os

class collect_html():
    """Download a page and optionally save it to disk."""

    def __init__(self):
        pass

    def read_save_html(self, url, path_save=None, filename=None, mode=0):
        # mode: 0 - save to disk, 1 - return content, 2 - save and return content

        # Read the raw bytes; the context manager closes the connection.
        with urllib.request.urlopen(url) as response:
            html_file = response.read()

        if mode in (0, 2):
            if filename is None:
                # Strip a trailing "/" first, otherwise basename() returns "".
                filename = os.path.basename(url.rstrip("/"))

            if path_save is not None:
                # Create the target folder if it does not exist yet.
                os.makedirs(path_save, exist_ok=True)
                path_with_filename = os.path.join(path_save, filename)
            else:
                path_with_filename = filename

            with open(path_with_filename, "wb") as file:
                file.write(html_file)

            print(f"Destination {path_with_filename}")
            print("Extraction and save completed!")

        if mode in (1, 2):
            return html_file
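
When `filename` is omitted, the name is taken from the tail of the URL. The blog URLs listed above end in `/` and carry no extension, so a more defensive derivation can help. The sketch below is one possible approach using only the standard library; `filename_from_url` and its `default` parameter are hypothetical names, not part of the notebook.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, default="index.html"):
    """Derive a local filename from a URL (hypothetical helper, not in the notebook)."""
    # Keep only the path component, so query strings and fragments are ignored.
    path = urlparse(url).path
    # Drop trailing slashes, then take the last path segment.
    name = posixpath.basename(path.rstrip("/"))
    if not name:
        # The URL points at the site root; fall back to the default name.
        return default
    # Append ".html" when the segment does not already end in an HTML extension.
    if not name.lower().endswith((".html", ".htm")):
        name += ".html"
    return name
```

For example, `filename_from_url("https://example.com/a/page.html?x=1")` gives `page.html`, and the result could be passed as the `filename` argument of `read_save_html`.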

In [2]:
# # Example 1 # save in the current folder, filename taken from the URL
# url = "https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.04/en-US/hana_ml.dataframe.html"
# dc = collect_html()
# dc.read_save_html(url)

In [3]:
# # Example 2 # save in the current folder with an explicit filename
# url = "https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.04/en-US/hana_ml.dataframe.html"
# dc = collect_html()
# dc.read_save_html(url, filename = 'test2.html')

In [4]:
# # Example 3 # save in a subfolder with an explicit filename
# url = "https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.04/en-US/hana_ml.dataframe.html"
# dc = collect_html()
# dc.read_save_html(url, path_save = 'collect/', filename = 'test3.html')

In [5]:
# # Example 4 # save in a subfolder, filename taken from the URL
# url = "https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.04/en-US/hana_ml.dataframe.html"
# dc = collect_html()
# dc.read_save_html(url, path_save = 'collect/')

In [6]:
url = "https://blogs.sap.com/2023/01/09/sap-hana-cloud-machine-learning-challenge-i-quit-understanding-metrics/"
path_save = 'llama_challenge/html_challenge'
filename = 'understanding_metrics_blog.html'
dc = collect_html()
dc.read_save_html(url, path_save = path_save, filename = filename)

Destination llama_challenge/html_challenge\understanding_metrics_blog.html
Extraction and save completed!


In [7]:
url = "https://blogs.sap.com/2022/11/07/sap-community-call-sap-hana-cloud-machine-learning-challenge-i-quit-how-to-prevent-employee-churn/"
# path_save is reused from In [6]
filename = "challenge_20221107.html"
dc = collect_html()
dc.read_save_html(url, path_save = path_save, filename = filename)

Destination llama_challenge/html_challenge\challenge_20221107.html
Extraction and save completed!


In [8]:
url = "https://blogs.sap.com/2022/11/28/i-quit-how-to-predict-employee-churn-sap-hana-cloud-machine-learning-challenge/"
# path_save is reused from In [6]
filename = "challenge_20221128.html"
dc = collect_html()
dc.read_save_html(url, path_save = path_save, filename = filename)

Destination llama_challenge/html_challenge\challenge_20221128.html
Extraction and save completed!


In [9]:
url = "https://blogs.sap.com/2022/12/22/sap-hana-cloud-machine-learning-challenge-2022-the-winners-are/"
# path_save is reused from In [6]
filename = "challenge_20221222.html"
dc = collect_html()
dc.read_save_html(url, path_save = path_save, filename = filename)

Destination llama_challenge/html_challenge\challenge_20221222.html
Extraction and save completed!


In [10]:
url = "https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.04/en-US/hana_ml.dataframe.html"
dc = collect_html()
dc.read_save_html(url, path_save = path_save)

Destination llama_challenge/html_challenge\hana_ml.dataframe.html
Extraction and save completed!


In [11]:
url = "https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.07/en-US/pal/algorithms/hana_ml.algorithms.pal.trees.HybridGradientBoostingClassifier.html"
dc = collect_html()
dc.read_save_html(url, path_save = path_save)

Destination llama_challenge/html_challenge\hana_ml.algorithms.pal.trees.HybridGradientBoostingClassifier.html
Extraction and save completed!
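
Cells 6 to 11 repeat the same few lines with a different URL and filename each time. A loop over a mapping expresses the same work more compactly. The sketch below is self-contained and uses only the standard library; `save_pages`, `fetch_bytes`, the `fetch` parameter, and the 30-second timeout are assumptions introduced here, not part of the notebook. The injectable `fetch` exists only so the function can be exercised without network access.

```python
import os
import urllib.request


def fetch_bytes(url, timeout=30):
    """Return the raw bytes of a page (hypothetical helper; the timeout is an assumed value)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_pages(targets, path_save, fetch=fetch_bytes):
    """Download every URL in `targets` ({url: filename}) into `path_save`.

    Returns a dict mapping each URL to the saved path, or to None on failure.
    """
    os.makedirs(path_save, exist_ok=True)
    results = {}
    for url, filename in targets.items():
        destination = os.path.join(path_save, filename)
        try:
            content = fetch(url)
        except OSError as error:
            # urllib.error.URLError is a subclass of OSError, so network failures land here.
            print(f"Skipped {url}: {error}")
            results[url] = None
            continue
        with open(destination, "wb") as file:
            file.write(content)
        results[url] = destination
    return results
```

With the notebook's variables, `save_pages({url: "challenge_20221107.html"}, path_save)` would cover the same ground as cell 7, and adding more entries to the mapping would cover the remaining cells. One failed download no longer stops the rest.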


In [12]:
import pathlib

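def list_ipynb(repo_path, extension):
    # Despite the name, this lists files with any given extension, searching subfolders too.
    name_filter = f"**/*.{extension}"
    repo_path_lib = pathlib.Path(repo_path)
    # Recursive glob; matches are printed in the order the filesystem returns them.
    files = list(repo_path_lib.glob(name_filter))
    for file in files:
        print(file)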

In [13]:
# List html files
repo_path = path_save
list_ipynb(repo_path, "html")

llama_challenge\html_challenge\understanding_metrics_blog.html
llama_challenge\html_challenge\challenge_20221107.html
llama_challenge\html_challenge\challenge_20221128.html
llama_challenge\html_challenge\challenge_20221222.html
llama_challenge\html_challenge\hana_ml.dataframe.html
llama_challenge\html_challenge\hana_ml.algorithms.pal.trees.HybridGradientBoostingClassifier.html
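
The listing confirms that the six files exist, but not that they have content. As a small follow-up check, the sketch below returns the matching files in sorted order together with their sizes, so an empty download is easy to spot. `list_files_with_size` is a hypothetical name, not part of the notebook.

```python
import pathlib


def list_files_with_size(repo_path, extension):
    """Return (path, size_in_bytes) for each matching file, sorted by path."""
    pattern = f"**/*.{extension}"
    # sorted() makes the order independent of how the filesystem returns entries.
    files = sorted(pathlib.Path(repo_path).glob(pattern))
    return [(file, file.stat().st_size) for file in files if file.is_file()]
```

Iterating over `list_files_with_size(path_save, "html")` and printing each pair would show any zero-byte file at a glance.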
