# Scraping prompts generated from ordinary sentences 

This website (https://imageprompt.org/image-prompt-generator) converts ordinary sentences into detailed, refined text-to-image prompts. In this notebook, I feed sentences obtained from Kaggle into the website, wait for it to generate a prompt for each one, then extract the prompts and save them in a pandas DataFrame.

This is the only effective approach I have found so far: none of the generative models I found on HuggingFace are suited to this niche task.

## Importing sentences downloaded from Kaggle

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("data/cv-unique-has-end-punct-sentences.csv").iloc[:, 1:]
df.head()

Unnamed: 0,sentence
0,He was accorded a State funeral and was buried...
1,In American English whilst is considered to be...
2,Once again she is seen performing on a compute...
3,Hippety Hopper returns in McKimsons Pop Im Pop.
4,Today their programs are available on the Inte...


In [3]:
sentences = df.values.flatten()
sentences

array(['He was accorded a State funeral and was buried in Drayton and Toowoomba Cemetery.',
       'In American English whilst is considered to be pretentious or archaic.',
       'Once again she is seen performing on a computergenerated stage.',
       ..., 'Here his attention was drawn to geology.',
       'Every element of Milnor Ktheory can be written as a finite sum of symbols.',
       'The south wing contained the owners private apartments.'],
      dtype=object)

## Web scraping

In [5]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import re
from time import sleep
from tqdm import tqdm

In [6]:
results = []

### Initializing Selenium

In [None]:
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

wait = WebDriverWait(driver, timeout=20)

website = 'https://imageprompt.org/image-prompt-generator'
driver.get(website)

input_box = driver.find_element(By.XPATH,"//textarea")
enter_button = driver.find_element(By.XPATH, "//button[contains(@class, 'text-primary') and contains(@class, 'border-primary')]")
input_box.send_keys("Hello world")
enter_button.click()

wait.until(EC.presence_of_element_located((By.XPATH, "//span[contains(text(), 'Continue Editing')]")))

continue_editing_button = driver.find_elements(By.XPATH, "//button[contains(@class, 'text-primary') and contains(@class, 'border-primary')]")[3]
result_box = driver.find_element(By.XPATH, "//textarea[@placeholder='Your image prompt will show here']")

### Loop through all the sentences 

In [None]:
print("Input (n): n is by index")

for i in tqdm(range(len(sentences))):
    try:
        input_box.clear()
        input_box.send_keys(sentences[i])
    
        enter_button.click()
    
        wait.until(lambda d : continue_editing_button.is_enabled())
        results.append(result_box.text)
        # print(f"Generated prompt for sentence {i}")
        
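    except Exception:
        # Catch Exception rather than using a bare except, so that Ctrl+C can still stop the loop.
        # Append None as a placeholder to keep results aligned with sentences.
        # print(f"An error occurred for sentence {i}! Putting None for now.")
        results.append(None)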

driver.quit()
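
The loop above reads `result_box.text` as soon as the button is enabled again. One possible source of repeated prompts, which is an assumption on my part and not something verified in this notebook, is that the text is read before the website has replaced the previous response. A minimal sketch of a polling helper that waits until the output actually changes is shown below. The names `wait_for_new_text`, `read_text`, and `previous` are hypothetical and not part of Selenium.

```python
import time


def wait_for_new_text(read_text, previous, timeout=20.0, poll=0.5):
    """Poll read_text() until it returns non-empty text that differs from previous.

    Returns the new text, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read_text()
        if current and current != previous:
            return current
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Hypothetical use inside the scraping loop (assumes result_box from the cell above):
#     prompt = wait_for_new_text(lambda: result_box.text, previous)
#     results.append(prompt)
#     previous = prompt or previous
```

A caveat with this design: if the website legitimately returns the same prompt for two consecutive sentences, the helper times out and returns `None`, so that row would have to be retried or dropped later.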

### Saving the refined prompts and the respective sentences into a DataFrame

In [9]:
# Copy the slice so that adding a column does not trigger SettingWithCopyWarning,
# and size it to the number of scraped results (10000 in the original run).
data = df.iloc[:len(results), :].copy()
data['prompts'] = results
data.to_csv("training_data.csv", index=False)


Be sure to clean the data afterwards, as the odds of getting duplicate responses from the website are very high.
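
As a rough sketch of what that cleaning step might look like, the helper below drops rows whose prompt is missing or empty and then removes repeated prompts. The function name `clean_prompts` and the toy data are hypothetical. Keeping the first occurrence assumes that a repeated prompt is a stale copy of an earlier response, which may not hold for every row.

```python
import pandas as pd


def clean_prompts(data: pd.DataFrame, column: str = "prompts") -> pd.DataFrame:
    """Drop rows with a missing or empty prompt, then drop repeated prompts."""
    cleaned = data.dropna(subset=[column])
    cleaned = cleaned[cleaned[column].str.strip() != ""]
    # Keep only the first row for each distinct prompt.
    cleaned = cleaned.drop_duplicates(subset=[column], keep="first")
    return cleaned.reset_index(drop=True)


# Toy data standing in for training_data.csv (not real scraper output).
toy = pd.DataFrame({
    "sentence": ["a", "b", "c", "d"],
    "prompts": ["p1", "p1", None, "p2"],
})
print(clean_prompts(toy))
```

On the real file, this could be applied as `clean_prompts(pd.read_csv("training_data.csv"))`.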