In [1]:
import pandas as pd
import requests
import json
import re
from tqdm import tqdm

# load data & clean up

We load the key sentences extracted earlier from the media and patent data. The media sentences were extracted with TextRank, the patent sentences with KeyBERT.

## media

In [2]:
df_media = pd.read_csv('../data/summaries_media.csv')
df_media

Unnamed: 0,summary
0,It said that the facility converts solar energ...


In [3]:
media_summaries = df_media.summary[0].split('\n\n')
len(media_summaries)

26

In [4]:
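# keep only letters, digits, whitespace and the characters . : ? (hyphens, commas, quotes etc. are removed), then drop newlines and empty entries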
media_summary = [re.sub(r'[^a-zA-Z0-9\s\.:\?]+', '', summary).replace('\n', '') for summary in media_summaries]
media_summary = list(filter(None, media_summary))
media_summary

['It said that the facility converts solar energy into green hydrogen through water electrolysis which is then stored in the underground natural gas reservoir in Austria. The Portuguese Ministry of the Environment has approved a new set of solar energy projects to connect to the grid in collaboration with grid operators REN and ERedes.',
 'As of yearend 2021 Louisiana had approximately 200 MW of solar energy installed.  Solar energy is the lowest cost new energy resource across the U.S. and we are pleased to support 1803 Electric Coop s goals to bring lower rates for its members  says Dr. Shawn Qu chairman and CEO of Canadian Solar.',
 'If you know any details drop us a note in the comment thread. Whether or not all 12 gigawatts materialize something is stirring the energy storage pot in Kentucky which does increase the potential for solar energy energy projects in the state.',
 'The Korean government expects private developers will build solar plants on the highways idle sites for a c

In [5]:
len(media_summary)

25
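To make the effect of the cleaning cell above easier to see, here is a minimal sketch that applies the same regular expression and the same `filter(None, ...)` step to made-up input. The function name `clean_summary` and the sample strings are illustrative assumptions, not part of the original notebook.

```python
import re

# Same character whitelist as in the cleaning cell: letters, digits,
# whitespace, and the punctuation . : ?
KEEP_PATTERN = r'[^a-zA-Z0-9\s\.:\?]+'

def clean_summary(text):
    """Remove every character outside the whitelist, then remove newlines."""
    return re.sub(KEEP_PATTERN, '', text).replace('\n', '')

# Made-up sample: hyphen, comma and tilde are dropped, and the newline is
# removed without inserting a space.
sample = "As of year-end 2021, Louisiana had ~200 MW\ninstalled."
cleaned = clean_summary(sample)
print(cleaned)  # As of yearend 2021 Louisiana had 200 MWinstalled.

# filter(None, ...) drops entries that are empty after cleaning
non_empty = list(filter(None, ['first', '', 'second']))
print(non_empty)  # ['first', 'second']
```

Two side effects are worth noting: hyphenated words are fused (`year-end` becomes `yearend`, as in the output above), and words separated only by a newline are joined. The `filter(None, ...)` step is consistent with the count dropping from 26 to 25, which suggests one entry was empty after cleaning.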

## patent

In [6]:
with open('../data/patent_summary.txt', 'r') as f:
    patent_summary = [line.strip() for line in f.readlines()]

print(patent_summary)

['Disclosed is an adaptable DC-AC inverter system and its operation.', 'The energy is recycled, and the method is suitable for different types of buildings in different regions.', 'ence between day and night and various soil types, and has small influence on environment and good practicability.', 'The utility model has reasonable design and obvious effect, and solves the problem of low economic benefit caused by high labor intensity, low speed and lack of effective monitoring of manual fertilization in the prior art.', 'heating pipe is provided with the water tank, the front side of water tank is provided with the circulating pump, the output fixedly connected with output tube of circulating pump, the rear side of water tank is provided with heating device, the right flank fixedly connected with reflector panel of mounting panel.', 'or strengthening the strength of the force transmission column, at least three hydraulic jacks are arranged between each pair of bearing plates and the bot

In [7]:
len(patent_summary)

43
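Several of the patent entries printed above start in the middle of a word or sentence (for example `'ence between day and night ...'`). A minimal sketch of how such fragments could be flagged before they are sent to the model follows. The heuristic and the name `looks_complete` are assumptions for illustration and are not part of the original pipeline.

```python
def looks_complete(sentence):
    """Heuristic (an assumption): a complete sentence starts with an
    uppercase letter or a digit."""
    stripped = sentence.strip()
    return bool(stripped) and (stripped[0].isupper() or stripped[0].isdigit())

# Two entries copied from the printed patent_summary output above
sample = [
    'Disclosed is an adaptable DC-AC inverter system and its operation.',
    'ence between day and night and various soil types, and has small influence on environment and good practicability.',
]

fragments = [s for s in sample if not looks_complete(s)]
print(len(fragments))  # 1
```

Whether to drop, repair, or keep such fragments is a judgment call. The check only makes them visible.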

# generate FAQ entries

To generate the FAQ entries we use llama3, running locally via Ollama. First we define a helper function that takes a key sentence, embeds it in a fixed prompt, sends the prompt to the Ollama API, and returns the generated text.

In [8]:
def llama_faq_call(key_string, model='llama3'):

    # URL of the API
    url = "http://localhost:11434/api/generate"

    topic_prompt = f"""
    generate a question & answer in the style of an FAQ entry based on the following sentence. The sentence was extracted as one of a few key sentences in a large document.
    Respond in Markdown code in the following form, do not include any remarks or anything:

    # Q: [the question here]
    A: [the answer here]

    here is the sentence:
    {key_string}
    """

    # Data to be sent
    data = {
        "model": model,
        "prompt": topic_prompt,
        "stream": False
    }

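    # Send POST request with data; fail early on HTTP errors
    response = requests.post(url, json=data)
    response.raise_for_status()

    # Parse the JSON body and extract the generated text (None if the key is missing)
    response_text = json.loads(response.text).get('response', None)

    return response_text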
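The prompt asks the model to answer in the form `# Q: ...` followed by `A: ...`, but a language model may not always comply. The following is a minimal sketch of how a response could be checked against that format and split into question and answer. `parse_faq_entry` and `FAQ_PATTERN` are hypothetical names introduced here and are not part of the original notebook.

```python
import re

# Hypothetical pattern for one response of the form
# "# Q: <question>\nA: <answer>"
FAQ_PATTERN = re.compile(
    r'^#\s*Q:\s*(?P<question>.+?)\s*\nA:\s*(?P<answer>.+)$',
    re.DOTALL,
)

def parse_faq_entry(entry):
    """Return (question, answer), or None if the response is missing or off-format."""
    if not entry:
        return None
    match = FAQ_PATTERN.match(entry.strip())
    if match is None:
        return None
    return match.group('question'), match.group('answer').strip()

# Example string in the format requested by the prompt
example = ('# Q: As of what year did Louisiana have approximately 200 MW '
           'of solar energy installed?\nA: As of year-end 2021.')
parsed = parse_faq_entry(example)
print(parsed)
```

Responses for which the function returns `None` could be regenerated or reviewed by hand before export.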

## media

In [9]:
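# note: this iterates over the uncleaned media_summaries (26 entries); pass media_summary to use the cleaned list instead
media_faq_entries = [llama_faq_call(key_sentence) for key_sentence in tqdm(media_summaries)]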

media_faq_entries

100%|██████████| 26/26 [00:49<00:00,  1.91s/it]


['# Q: What process does the facility use to convert solar energy into hydrogen?\nA: The facility converts solar energy into green hydrogen through water electrolysis.',
 '# Q: As of what year did Louisiana have approximately 200 MW of solar energy installed?\nA: As of year-end 2021.',
 '# Q: What impact will energy storage have on solar energy projects in Kentucky?\nA: The development of energy storage capacity in Kentucky increases the potential for solar energy projects in the state.',
 '# Q: Will private developers be building solar plants on highways in South Korea?\nA: Yes, according to the Korean government, private developers are expected to build solar plants on idle sites along highways with a combined capacity of 243 MW by 2025.',
 "# Q: What were the results of Japan's latest solar energy project auction?\nA: The final results showed that 273 PV projects with a combined capacity of 268.7MW were selected through the procurement exercise, which was the eleventh and final auct

## patent

In [10]:
patent_faq_entries = [llama_faq_call(key_sentence) for key_sentence in tqdm(patent_summary)]

patent_faq_entries

100%|██████████| 43/43 [01:22<00:00,  1.92s/it]


['# Q: What does "Disclosed" refer to?\nA: An adaptable DC-AC inverter system.',
 '# Q: Is the energy recycling method suitable for all types of buildings?\nA: Yes, the energy is recycled, and the method is suitable for different types of buildings in different regions.',
 '# Q: What factors affect the growth of plants in different environments?\nA: The growth of plants is influenced by a range of factors including the difference between day and night, as well as varying soil types. This has a relatively small impact on the environment and provides good practical applications.',
 '# Q: What are the key features of the utility model?\nA: The utility model has reasonable design and obvious effect, and solves the problem of low economic benefit caused by high labor intensity, low speed and lack of effective monitoring of manual fertilization in the prior art.',
 '# Q: What components are typically found on the front and rear sides of a water tank?\nA: The front side of the water tank is t

# Export to markdown-files

Because the model already responds in Markdown, we write the LLM answers directly to Markdown files, one file per source.

In [11]:
with open('../faq_media.md', 'w') as f:
    for entry in media_faq_entries:
        f.write(entry + '\n\n')

In [12]:
with open('../faq_patent.md', 'w') as f:
    for entry in patent_faq_entries:
        f.write(entry + '\n\n')
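`llama_faq_call` returns `None` when the API response contains no `response` field, and `entry + '\n\n'` would then raise a `TypeError`. The following is a minimal sketch of a more defensive export, assuming a hypothetical helper `write_faq`. The demo entries and the temporary file name are made up for illustration.

```python
import os
import tempfile

def write_faq(entries, path):
    """Write non-empty entries to path, separated by blank lines.

    Returns the number of entries written.
    """
    kept = [entry.strip() for entry in entries if entry]
    with open(path, 'w', encoding='utf-8') as f:
        for entry in kept:
            f.write(entry + '\n\n')
    return len(kept)

# Demo with made-up entries, written to a temporary directory
demo_entries = ['# Q: First question?\nA: First answer.', None, '']
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'faq_demo.md')
    written = write_faq(demo_entries, demo_path)
    with open(demo_path, encoding='utf-8') as f:
        content = f.read()

print(written)  # 1
```

With such a helper, the two export cells above would reduce to `write_faq(media_faq_entries, '../faq_media.md')` and `write_faq(patent_faq_entries, '../faq_patent.md')`.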