### Data generation

For this academic research, texts in the writing style of specific authors are generated using the following LLMs:
- gemini-pro
- mistralai/Mixtral-8x7B-Instruct-v0.1
- gpt-4-turbo
- gpt-3.5-turbo

### Provider

Data is generated using [gpt4free](https://github.com/xtekky/gpt4free/tree/main), which uses reverse-engineering techniques to give free access to the most popular language models.

### Run

Before running the notebook, a few files need to be provided.

**GPT-4**

Create a `hardir` directory in the root of the project and save a `.har` file there, as explained in the `.HAR File for OpenaiChat Provider` section of the [gpt4free](https://github.com/xtekky/gpt4free/tree/main) README.

**Gemini-pro**

Create a `hardir` directory in the root of the project and create a `gemini_cookies.json` file:
```json
{
    "__Secure-1PSID": "your_cookies",
    "__Secure-1PSIDTS": "your_cookies"
}
```
How to find these cookies is explained in the `Cookies` section of the [gpt4free](https://github.com/xtekky/gpt4free/tree/main) README.
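
A missing or empty cookie value is easier to catch before any request is sent. The following is a minimal sketch of such a check, assuming the JSON layout shown above; the helper name `load_gemini_cookies` is hypothetical and not part of gpt4free or this project.

```python
import json
from pathlib import Path

# The two cookie names required by the JSON layout shown above.
REQUIRED_COOKIES = ("__Secure-1PSID", "__Secure-1PSIDTS")


def load_gemini_cookies(path):
    """Hypothetical helper: load the cookie file and fail early on missing values."""
    cookies = json.loads(Path(path).read_text(encoding="utf-8"))
    # Treat absent keys and empty strings the same way: both are unusable.
    missing = [name for name in REQUIRED_COOKIES if not cookies.get(name)]
    if missing:
        raise ValueError(f"Missing cookie values in {path}: {', '.join(missing)}")
    # Return only the required cookies and ignore any extra keys in the file.
    return {name: cookies[name] for name in REQUIRED_COOKIES}
```

The returned dictionary has the same shape as the one passed to `set_cookies` in the cookie setup cell of this notebook.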

In [10]:
%%capture
%pip install -U g4f[all]

In [11]:
from g4f.client import Client
from g4f.cookies import set_cookies
from datetime import datetime
from src import Configuration
import json
import os

In [12]:
res_dir = "res/"
har_dir = "hardir/"
authors_list_filepath = f"{res_dir}books/selected/author_list"  # res_dir already ends with "/"
prompts_filepath = f"{res_dir}prompts2"

Set up the necessary cookies.

In [13]:
with open(Configuration.GEMINI_COOKIES, "r") as f:
    cookies = json.load(f)
    set_cookies(".google.com", {
        "__Secure-1PSID": cookies["__Secure-1PSID"],
        "__Secure-1PSIDTS": cookies["__Secure-1PSIDTS"],
    })

In [14]:
def get_generated_response(system_spec, prompt, model):
    prompt = f"{prompt}. Use at least {Configuration.RESPONSE_LENGTH} words."
    client = Client()
    response = client.chat.completions.create(
        model=model,
        provider=Configuration.PROVIDERS[model],
        messages=[
            {"role": "system", "content": system_spec},
            {"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
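
Requests to unofficial providers may fail intermittently; this is an assumption, not something the notebook measures. A minimal sketch of a generic retry wrapper with exponential backoff follows. The name `call_with_retries` and its parameters are hypothetical and not part of `g4f`.

```python
import time


def call_with_retries(fn, *args, attempts=3, base_delay=1.0, **kwargs):
    """Hypothetical helper: call fn, retrying on failure with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            # KeyboardInterrupt is not a subclass of Exception, so a manual
            # interrupt still stops the run immediately.
            if attempt == attempts:
                raise
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Under these assumptions, the generation call could be wrapped as `call_with_retries(get_generated_response, system_spec, prompt, model)`.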

In [15]:
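def save_data(data):
    # Model names such as "mistralai/Mixtral-8x7B-Instruct-v0.1" contain "/", so flatten them for the directory name.
    model_dir = f"{res_dir}{data['model'].replace('/', '-')}"
    # Second-resolution timestamp, used as both the record id and the file name.
    data['id'] = int(datetime.now().timestamp())
    filename = f"{model_dir}/{data['author']}/{data['id']}.json"
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=4)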

Use the list of the 10 authors with the highest total word count from `book_stats.ipynb`.

In [16]:
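with open(Configuration.SELECTED_AUTHORS_FILEPATH, 'r', encoding='utf-8') as f:
    # Skip blank lines so a trailing newline does not produce an empty author.
    authors = [line.strip() for line in f if line.strip()]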
authors

['Zane Grey',
 'Joseph Conrad',
 'Benjamin Disraeli',
 'Lucy Maud Montgomery',
 'William Henry Hudson']

In [17]:
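with open(Configuration.PROMPTS_FILEPATH, 'r', encoding='utf-8') as f:
    # Skip blank lines so a trailing newline does not produce an empty prompt.
    prompts = [line.strip() for line in f if line.strip()]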
prompts[:5]

["Describe the most peculiar meal you've ever encountered.",
 'Explain the finer points of etiquette for a barnyard gathering.',
 'Convince a grumpy mule to pull a rickety carriage.',
 'Debate the merits of a mustache versus a beard.',
 'Write a love letter from a lovesick cactus to a blooming rose.']

In [18]:
for author in authors:
    system_spec = f"Come up with the answer in {author}'s writing style. Don't use direct references and citations of {author}. Answer in plain text format."

    for model in Configuration.MODELS:
        for prompt in prompts:
            print(f"Generating response for {author}'s response using {model} model.")
            response = get_generated_response(system_spec, prompt, model)

            data = {
                # The requested length is in words; response_length below is in characters.
                "requested_response_length": Configuration.RESPONSE_LENGTH,
                "response_length": len(response),
                "model": model,
                "created_at": datetime.now().isoformat(),
                "author": author,
                "system_spec": system_spec,
                "prompt": prompt,
                "response": response,
            }

            save_data(data)

Generating response for William Henry Hudson's response using gpt-4 model.
Generating response for William Henry Hudson's response using gpt-4 model.


KeyboardInterrupt: 
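
Since a long run can be interrupted, as the `KeyboardInterrupt` above shows, it may help to skip prompts that already have a saved response when the loop is restarted. The following is a minimal sketch, assuming the `<res_dir>/<model>/<author>/<id>.json` layout written by `save_data`; the helper names `completed_prompts` and `pending_prompts` are hypothetical.

```python
import json
from pathlib import Path


def completed_prompts(res_dir, model, author):
    """Hypothetical helper: prompts already saved for one model/author pair."""
    # Mirror save_data: "/" in the model name is replaced with "-".
    author_dir = Path(res_dir) / model.replace("/", "-") / author
    if not author_dir.is_dir():
        return set()
    done = set()
    for path in author_dir.glob("*.json"):
        with open(path, encoding="utf-8") as f:
            done.add(json.load(f)["prompt"])
    return done


def pending_prompts(prompts, res_dir, model, author):
    """Hypothetical helper: prompts without a saved response, in original order."""
    done = completed_prompts(res_dir, model, author)
    return [prompt for prompt in prompts if prompt not in done]
```

In the generation loop, iterating over `pending_prompts(prompts, res_dir, model, author)` instead of `prompts` would then avoid regenerating responses that are already on disk.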