In [2]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import google.generativeai as genai

In [3]:
# Constants
load_dotenv(override=True)
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
HEADERS = {"Content-Type": "application/json"}
MODEL = "gemini-1.5-flash"

genai.configure(api_key=GOOGLE_API_KEY)
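
If the `.env` file is missing or the variable is misspelled, `os.getenv` returns `None` and the failure only surfaces later, at the first API call. A minimal sketch of an early check follows; the helper name `require_api_key` is an assumption for illustration and is not part of `google.generativeai` or `python-dotenv`.

```python
import os


def require_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the named environment variable, or raise if it is missing or blank."""
    # Hypothetical helper: strips surrounding whitespace, which often sneaks
    # in when a key is pasted into a .env file.
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

In this notebook it could be used as `genai.configure(api_key=require_api_key())` after `load_dotenv(override=True)` has run.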

In [4]:
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)

models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-2.0-flash-exp
models/gemini-exp-1206
models/gemini-exp-1121
models/gemini-exp-1114
models/gemini-2.0-flash-thinking-exp-01-21
models/gemini-2.0-flash-thinking-exp
models/gemini-2.0-flash-thinking-exp-1219
models/learnlm-1.5-pro-experimental


In [5]:
model = genai.GenerativeModel(MODEL)

In [6]:
messages = "Describe some of the business applications of Generative AI"

In [7]:
response = model.generate_content(messages, stream=True)

In [8]:
for chunk in response:
    display(Markdown(chunk.text))

Gener

ative AI is rapidly finding its place in numerous business applications, revolutionizing how companies

 operate and interact with customers. Here are some key examples:

**Marketing &

 Sales:**

* **Content Creation:**  Generating marketing copy (website text, ad copy, social media posts), blog posts, product descriptions, email newsletters,

 and even scripts for videos at scale and with speed.  This frees up human marketers to focus on strategy and higher-level tasks.
* **Personalized Experiences

:** Creating customized product recommendations, personalized emails, and targeted advertising based on individual customer data and preferences.  This leads to improved engagement and conversion rates.
* **Chatbots & Virtual Assistants:**  Building more sophisticated and human-like chat

bots capable of handling complex customer inquiries, providing support, and even negotiating sales.
* **Image & Video Generation:** Creating visually appealing marketing materials like images for social media, product demos, and advertising campaigns, quickly and cost-effectively.



**Product Development & Design:**

* **Prototyping & Design:** Generating initial designs for products, websites, and user interfaces, accelerating the design process and enabling rapid iteration.
* **New Material Discovery:**  AI can predict the properties of new materials, drastically reducing the time and cost associated with traditional R&

D.
* **Personalized Product Design:**  Generating customized products tailored to individual customer specifications.  Think personalized clothing, shoes, or even furniture.

**Customer Service:**

* **Automated Response Systems:**  Handling routine customer inquiries and providing immediate support 24/7, reducing the burden on human agents.


* **Sentiment Analysis & Feedback Processing:**  Analyzing customer feedback (reviews, surveys, social media posts) to identify trends, pain points, and areas for improvement.

**Operations & Internal Processes:**

* **Code Generation:** Automating parts of software development by generating code snippets, translating between programming languages, and

 assisting with debugging.
* **Data Analysis & Reporting:** Automating the creation of reports and visualizations from complex datasets, saving time and resources.
* **Process Automation:**  Automating repetitive tasks such as data entry, invoice processing, and report generation.

**Other Applications:**

* **Drug Discovery & Development

:** Accelerating the process of identifying and developing new drugs by generating novel molecule designs and predicting their efficacy.
* **Financial Modeling:**  Creating more accurate and sophisticated financial models by generating synthetic data and simulating various market scenarios.
* **Education:** Creating personalized learning experiences, generating educational content (quizzes, exercises,

 summaries), and assisting with grading.


It's crucial to note that while generative AI offers immense potential,  businesses also need to consider ethical implications, such as potential biases in generated content, the risk of misinformation, and the need for human oversight to ensure accuracy and quality.  The successful implementation of generative AI requires

 a strategic approach and careful consideration of these factors.
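
The output above is fragmented because each streamed chunk is rendered as its own Markdown block, so words and bold markers are split wherever a chunk boundary happens to fall (`Gener` / `ative`). One possible way around this, sketched below, is to join the chunk texts first and render once. The helper name `join_stream` is an assumption; it relies only on each chunk exposing a `.text` attribute, as the loop in the cell above already does, and the demo uses stand-in objects rather than a real API response.

```python
from types import SimpleNamespace
from typing import Iterable, Protocol


class HasText(Protocol):
    """Anything with a .text attribute, such as a streamed response chunk."""
    text: str


def join_stream(chunks: Iterable[HasText]) -> str:
    """Concatenate the .text of every chunk, in order, into one string."""
    return "".join(chunk.text for chunk in chunks)


# Stand-in chunks so the sketch runs without calling the API.
fake_chunks = [SimpleNamespace(text="Gener"), SimpleNamespace(text="ative AI")]
print(join_stream(fake_chunks))  # prints "Generative AI"
```

In the notebook this would become `display(Markdown(join_stream(response)))`, at the cost of waiting for the full response before anything is shown.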


In [9]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
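        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Drop elements that carry no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Pages without a <body> would otherwise raise a TypeError here
            self.text = ""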

In [10]:
ed = Website("https://travis.media/blog/ai-web-scraping-tools/")
print(ed.title)
print(ed)

Best AI-Powered Web Scraping Tools for Data Collection
<__main__.Website object at 0x000002B09593E810>
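
The second line of output is Python's default object representation, because `Website` defines neither `__repr__` nor `__str__`. A minimal sketch of a more readable representation is shown below on a stand-in class; `Page` is hypothetical and holds only already-parsed fields, so it runs without fetching anything. The same `__repr__` method could be added to `Website`.

```python
class Page:
    """Hypothetical stand-in for Website that stores already-parsed fields."""

    def __init__(self, url: str, title: str, text: str):
        self.url = url
        self.title = title
        self.text = text

    def __repr__(self) -> str:
        # Show the character count rather than the full text to keep output short.
        return f"Page(url={self.url!r}, title={self.title!r}, chars={len(self.text)})"


print(Page("https://example.com", "Example", "hello"))
# prints: Page(url='https://example.com', title='Example', chars=5)
```

To inspect the scraped text itself, `print(ed.text)` works without any changes to the class.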


In [20]:
# A function that writes a user prompt asking for a summary of a website:

def user_prompt_for(website):
    system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a detailed summary in Traditional Chinese, ignoring text that might be navigation related. \
Respond in markdown."
    # Separate the instructions from the page description with a blank line
    user_prompt = system_prompt + f"\n\nYou are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a detailed summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    # print(user_prompt)
    return user_prompt
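
Because the prompt appends the full page text, a very long page produces a very long prompt. A minimal sketch of capping the text before it is appended follows. The helper name `truncate_text` and the 20,000-character default are assumptions chosen for illustration; the default is not a documented Gemini limit.

```python
def truncate_text(text: str, max_chars: int = 20_000) -> str:
    """Return text unchanged if short enough, else cut it at a line boundary."""
    if len(text) <= max_chars:
        return text
    # Prefer cutting at the last newline inside the budget so a line is not split.
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars  # no usable newline: fall back to a hard cut
    return text[:cut] + "\n[truncated]"


print(truncate_text("abcdef", max_chars=3))
# prints "abc" followed by "[truncated]" on the next line
```

In `user_prompt_for`, the last append would then read `user_prompt += truncate_text(website.text)`.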

In [18]:
# And now: call the Gemini API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    messages = user_prompt_for(website)
    response = model.generate_content(messages)
    return response.text

In [21]:
url = "https://travis.media/blog/ai-web-scraping-tools/"
display(Markdown(summarize(url)))

## 人工智慧驅動的網頁抓取工具：資料收集的最佳選擇 (繁體中文)

本文探討了當今資料驅動的世界中，人工智慧(AI)驅動的網頁抓取工具如何徹底改變資料收集和分析的方式。傳統的網頁抓取方法耗時且費力，而AI工具則透過機器學習演算法自動導航網站、提取相關資料，並處理動態內容、CAPTCHA和反抓取措施等複雜情況。此文重點介紹了五種最佳的AI網頁抓取工具，並分析了AI網頁抓取的優勢和應用案例。

**五種領先的AI網頁抓取工具：**

1. **Octoparse:**  一個使用者友善的工具，無需編碼經驗即可使用。它具有直覺的點擊式介面，可輕鬆從任何網站提取資料，並能處理動態內容、分頁和AJAX載入的資料。提供免費方案(功能受限)。

2. **ParseHub:**  另一個強大的AI網頁抓取工具，可從複雜網站提取資料。其機器學習引擎分析網站結構並建議最相關的資料進行抓取，輕鬆處理互動元素、無限滾動和巢狀資料結構。支援文字、圖片和檔案的抓取。

3. **Diffbot:**  一個超越傳統網頁抓取的AI資料提取平台。它使用自然語言處理和機器學習自動從網站提取結構化資料，理解網頁的上下文和語義，非常適合建立知識圖譜和分析非結構化資料。提供預建API和客製化API開發。

4. **Scrapy:**  一個開源的網頁抓取框架，需要編碼知識(Python)。它具有高度客製化和擴充性，支援並發請求以加快抓取速度，並與TensorFlow和PyTorch等AI函式庫整合，實現智慧型資料提取。

5. **ScrapeStorm:**  另一個具有視覺化無程式碼介面的AI網頁抓取工具。它使用機器學習自動識別網站上的列表、表格和分頁元素，並支援多種資料匯出選項。


**AI驅動網頁抓取的優勢：**

* **效率提升:** 自動化導航、資料提取和複雜場景處理，快速有效地收集大量資料。
* **準確度提高:**  AI工具可以學習和適應不同的網站佈局，確保更準確可靠的資料提取，減少傳統方法中因動態內容或網站結構不一致造成的錯誤。
* **成本節省:**  減少人工干預，降低資料收集的整體成本。
* **競爭優勢:**  及時準確的資料有助於做出明智的決策，例如監控競爭對手的價格、市場趨勢和客戶行為等。


**AI驅動網頁抓取的應用案例：**

* **價格監控和優化:**  電商企業可以監控競爭對手的價格並優化自身定價策略。
* **潛在客戶開發:**  銷售和市場團隊可以收集聯絡資訊、職位和公司詳情以開發目標潛在客戶。
* **情緒分析:**  提取客戶評論、社群媒體提及和論壇討論，分析客戶情緒並找出改進方向。
* **市場研究:**  收集市場趨勢、消費者行為和行業統計資料，為商業策略提供資訊。


**結論:**

AI驅動的網頁抓取工具正在徹底改變企業收集和分析網站資料的方式。選擇工具時，應考慮易用性、資料處理能力、客製化選項和價格等因素。投資AI驅動的網頁抓取可以顯著增強資料收集工作，推動業務增長。


**(文章中其他與導覽無關的內容已包含於上述摘要中)**
