[ScrapeGraphAI](https://github.com/VinciGit00/Scrapegraph-ai) is a Python web scraping library that uses LLMs and direct graph logic to build scraping pipelines for websites and local documents (XML, HTML, JSON, etc.).

Just say which information you want to extract and the library will do it for you!

### Step 1: Install the necessary Python libraries

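```python
# Install ScrapeGraphAI quietly
!pip -qq install scrapegraphai
# Download the browser binaries Playwright uses to load web pages
!playwright install
# Install the system libraries those browsers depend on
!playwright install-deps
```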
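Before moving on, you can confirm that both packages import in the current kernel. This is a small sketch: `missing_packages` is a hypothetical helper, and it only checks that a top-level module can be found, not which version is installed.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# "scrapegraphai" and "playwright" are the import names of the packages installed above
print("Missing packages:", missing_packages(["scrapegraphai", "playwright"]))
```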

### Step 2: Install Ollama

This notebook runs two models locally through [Ollama](https://ollama.com):

* Llama 3 (`llama3`) as the main LLM
* `nomic-embed-text` as the embedding model

```python
# Download and run the official Ollama install script
!curl -fsSL https://ollama.com/install.sh | sh
```

```python
# Print the CLI usage to confirm that Ollama was installed
!ollama
```
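If you'd rather check from Python, the standard library can tell you whether the `ollama` executable is on `PATH`. This is a minimal sketch; `cli_path` is a hypothetical helper, not part of Ollama.

```python
import shutil


def cli_path(name="ollama"):
    """Return the full path of the executable `name`, or None if it is not on PATH."""
    return shutil.which(name)


path = cli_path()
if path is None:
    print("ollama was not found on PATH; re-run the install script above")
else:
    print("ollama found at", path)
```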

### Step 3: Start the Ollama server

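```python
import os
import subprocess
import time

# Start `ollama serve` as a background process. Its output goes to a log file
# (kept open on purpose) so that an unread pipe cannot fill up and block the server.
log_file = open("ollama.log", "w")
process = subprocess.Popen(["ollama", "serve"],
                           stdout=log_file,
                           stderr=subprocess.STDOUT)
print("Process ID:", process.pid)

# To use a remote Ollama server instead (for example one hosted on fly.io),
# set OLLAMA_HOST so that later `!ollama` commands talk to it:
# os.environ["OLLAMA_HOST"] = "https://ollama-demo.fly.dev:443"

time.sleep(5)  # give the server a few seconds to start listening
```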
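A fixed five-second sleep may be too short on a slow machine. As an alternative sketch, you can poll the server until it answers HTTP requests. This assumes Ollama's default address `http://localhost:11434`, and `wait_for_ollama` is a hypothetical helper name.

```python
import time
import urllib.request


def wait_for_ollama(base_url="http://localhost:11434", timeout=30.0, interval=0.5):
    """Poll the Ollama server until it answers or `timeout` seconds pass.

    Returns True if the server responded with HTTP 200, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused or timed out: the server is not up yet
        time.sleep(interval)
    return False
```

For example, `wait_for_ollama()` could replace the `time.sleep(5)` call above; it returns `False` if the server still isn't responding after 30 seconds.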

```python
# List the models available locally (empty until you pull some)
!ollama list
```
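The same information is available over Ollama's REST API: `GET /api/tags` returns the locally available models as JSON, each with a `name` field such as `llama3:latest`. The sketch below assumes the default address `http://localhost:11434`; the helper names are hypothetical.

```python
import json
import urllib.request


def parse_model_names(payload):
    """Extract model names (e.g. "llama3:latest") from an /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(base_url="http://localhost:11434"):
    """Ask the running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return parse_model_names(json.load(resp))
```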

### Step 4: Pull the models
Next, you can browse the [Ollama model library](https://ollama.com/library) to see all the model families that are currently supported. If you don't specify a tag, Ollama downloads the one tagged `latest`. Each model's page lists more details, such as its size and the quantization used.

You can search the list of tags to find the exact variant you want to run. Most model families come in several sizes, usually with both foundation (base) and instruction-tuned variants.
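Model references follow a `name:tag` pattern (for example `llama3:70b`), and a bare name such as `llama3` means `llama3:latest`. The hypothetical helper below sketches that defaulting rule; it is an illustration, not Ollama's own parser.

```python
def split_model_ref(ref):
    """Split an Ollama model reference into (name, tag); the tag defaults to "latest"."""
    # Keep any namespace prefix (e.g. "library/") and look for the tag only
    # in the last path segment.
    prefix, slash, last = ref.rpartition("/")
    name, colon, tag = last.partition(":")
    return (prefix + slash + name, tag if colon else "latest")


print(split_model_ref("llama3"))      # ('llama3', 'latest')
print(split_model_ref("llama3:70b"))  # ('llama3', '70b')
```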

```python
# Main LLM (the `latest` tag, since none is given)
!ollama pull llama3
# Embedding model
!ollama pull nomic-embed-text
```
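To confirm that both pulls succeeded, you can compare the required names with what Ollama reports (from `ollama list` or the `/api/tags` endpoint). `missing_models` is a hypothetical helper; it treats a bare name as `:latest`, which matches how Ollama lists untagged pulls.

```python
def missing_models(required, available):
    """Return the entries of `required` that are not in `available`."""

    def normalize(ref):
        # Append ":latest" when the last path segment carries no tag
        return ref if ":" in ref.rsplit("/", 1)[-1] else f"{ref}:latest"

    have = {normalize(name) for name in available}
    return [name for name in required if normalize(name) not in have]


print(missing_models(["llama3", "nomic-embed-text"], ["llama3:latest"]))
```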

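### Step 5: Check that Ollama is running
The server started in Step 3 is already running, so there is no need to call `ollama serve` again (it takes no model argument). To check that the model responds, send it a single prompt with `ollama run`:
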
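```python
# Passing a prompt makes `ollama run` print one reply and exit; without it,
# the command opens an interactive chat that would block the cell
!ollama run llama3 "Say hello in one sentence."
```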
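The same check can be made from Python through Ollama's `POST /api/generate` endpoint. Setting `"stream": false` makes the server return a single JSON object whose `response` field holds the whole reply. This sketch assumes the default address `http://localhost:11434`, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(model, prompt, base_url="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url="http://localhost:11434", timeout=120):
    """Send one prompt and return the model's full reply text."""
    req = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

With the server running, `print(generate("llama3", "Say hello in one sentence."))` should print a short greeting.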

### Step 6: Run the scraper

```python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/llama3",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        # "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        # "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "verbose": True,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their descriptions",
    # also accepts a string with the already downloaded HTML code
    source="https://perinim.github.io/projects",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)
```
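`run()` usually returns the extracted data as a Python dictionary, but its keys are chosen by the LLM from your prompt, so they can vary between runs and models. The sketch below assumes the records you want are a list of dictionaries under some top-level key (for a prompt like the one above, perhaps `"projects"`); `first_record_list` and `records_to_csv` are hypothetical helpers, not part of ScrapeGraphAI.

```python
import csv
import io


def first_record_list(result):
    """Return the first non-empty list of dicts among the top-level values of `result`."""
    if isinstance(result, list):
        return [item for item in result if isinstance(item, dict)]
    if not isinstance(result, dict):
        return []  # e.g. a plain-text answer
    for value in result.values():
        if isinstance(value, list) and value and all(isinstance(item, dict) for item in value):
            return value
    return []


def records_to_csv(records):
    """Serialize a list of dicts to CSV text, using the union of their keys as columns."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)  # missing keys become ""
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

After running the scraper, `print(records_to_csv(first_record_list(result)))` turns the extracted records into CSV text that you can save or inspect.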
