Scraper with FastAPI #141

Closed
angelkurten opened this issue Mar 6, 2024 · 2 comments

@angelkurten

Description

I tried to integrate the scraper with FastAPI, but I am getting this error.

Code to Reproduce (Paste main.py)

# Response must come from fastapi itself; fastapi.openapi.models.Response
# is the OpenAPI schema model, not an HTTP response class.
from fastapi import FastAPI, Response

from src import Gmaps
from src.point import point

app = FastAPI()

@app.get("/{category}/{latitude}/{longitude}")
def scrap(category: str, latitude: float, longitude: float, point_id: str):
    point.set_point_id(point_id)
    Gmaps.places(
        queries=[category],
        fields=Gmaps.ALL_FIELDS,
        max=120,
        geo_coordinates=f"{latitude}, {longitude}",
    )
    return Response(status_code=200, content="Success")
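FastAPI only checks that `latitude` and `longitude` parse as floats, so out-of-range values would still reach `Gmaps.places`. A minimal sketch of a range check that produces the same `"lat, lng"` string used for `geo_coordinates` above; `format_geo_coordinates` is a hypothetical helper, not part of the scraper:

```python
def format_geo_coordinates(latitude: float, longitude: float) -> str:
    """Return "lat, lng" in the format passed as geo_coordinates above.

    Hypothetical helper: raises ValueError for out-of-range (or NaN) values.
    """
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    return f"{latitude}, {longitude}"
```

In the endpoint this could replace the inline f-string, with the `ValueError` mapped to a 4xx response.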

### Dockerfile

FROM chetan1111/botasaurus:latest

# Set PYTHONUNBUFFERED so output is not buffered
ENV PYTHONUNBUFFERED=1

# Set the working directory in the container
WORKDIR /app

# Copy only the files needed to install the dependencies first
COPY requirements.txt ./

# Install the project dependencies
RUN python -m pip install --no-cache-dir --upgrade -r requirements.txt

# Copy the rest of the project source code into the container
COPY . .

# Command to run the FastAPI application with Uvicorn
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--reload"]

### Docker Compose

services:
  scrapper:
    platform: linux/arm64/v8
    build: .
    shm_size: 4000m
    ports:
      - "8000:8000"
    volumes:
      - .:/app
    environment:
      - UVICORN_HOST=0.0.0.0
      - UVICORN_PORT=8000
      - UVICORN_RELOAD=True

### Error

INFO: Will watch for changes in these directories: ['/app']
2024-03-06T21:29:20.158071138Z INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024-03-06T21:29:20.158203638Z INFO: Started reloader process [1] using StatReload
2024-03-06T21:29:29.208874753Z INFO: Started server process [11]
2024-03-06T21:29:29.209013420Z INFO: Waiting for application startup.
2024-03-06T21:29:29.211119836Z INFO: Application startup complete.
2024-03-06T21:30:17.209310220Z Running
2024-03-06T21:30:20.983483013Z Chrome failed to launch. Retrying with additional server options. To add server options by default, include '--server' in your launch command.
2024-03-06T21:30:23.443837209Z INFO: 192.168.224.1:54530 - "GET /bar/40.41116/-3.7044?point_id=1 HTTP/1.1" 500 Internal Server Error
2024-03-06T21:30:23.479692709Z ERROR: Exception in ASGI application
2024-03-06T21:30:23.479729375Z Traceback (most recent call last):
2024-03-06T21:30:23.479732750Z File "/usr/local/lib/python3.9/site-packages/botasaurus/create_driver_utils.py", line 236, in create_selenium_driver
2024-03-06T21:30:23.479735000Z driver = AntiDetectDriver(
2024-03-06T21:30:23.479736792Z File "/usr/local/lib/python3.9/site-packages/botasaurus/anti_detect_driver.py", line 33, in __init__
2024-03-06T21:30:23.479739042Z super().__init__(*args, **kwargs)
2024-03-06T21:30:23.479740709Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
2024-03-06T21:30:23.479742625Z super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
2024-03-06T21:30:23.479744375Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py", line 92, in __init__
2024-03-06T21:30:23.479746125Z super().__init__(
2024-03-06T21:30:23.479747750Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 272, in __init__
2024-03-06T21:30:23.479749500Z self.start_session(capabilities, browser_profile)
2024-03-06T21:30:23.479751209Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 364, in start_session
2024-03-06T21:30:23.479753084Z response = self.execute(Command.NEW_SESSION, parameters)
2024-03-06T21:30:23.479754625Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
2024-03-06T21:30:23.479756417Z self.error_handler.check_response(response)
2024-03-06T21:30:23.479758084Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
2024-03-06T21:30:23.479759875Z raise exception_class(message, screen, stacktrace)
2024-03-06T21:30:23.479761542Z selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
2024-03-06T21:30:23.479765959Z (session not created: DevToolsActivePort file doesn't exist)
2024-03-06T21:30:23.479801292Z (The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
2024-03-06T21:30:23.479804584Z Stacktrace:
2024-03-06T21:30:23.479806292Z #0 0x0040007a5f83
2024-03-06T21:30:23.479813667Z #1 0x00400045ecf7
2024-03-06T21:30:23.479815459Z #2 0x00400049660e
2024-03-06T21:30:23.479817084Z #3 0x00400049326e
2024-03-06T21:30:23.479819000Z #4 0x0040004e380c
2024-03-06T21:30:23.479820875Z #5 0x0040004d7e53
2024-03-06T21:30:23.479822459Z #6 0x00400049fdd4
2024-03-06T21:30:23.479824209Z #7 0x0040004a11de
2024-03-06T21:30:23.479825792Z #8 0x00400076a531
2024-03-06T21:30:23.479827334Z #9 0x00400076e455
2024-03-06T21:30:23.479828917Z #10 0x004000756f55
2024-03-06T21:30:23.479830500Z #11 0x00400076f0ef
2024-03-06T21:30:23.479833334Z #12 0x00400073a99f
2024-03-06T21:30:23.479834917Z #13 0x004000793008
2024-03-06T21:30:23.479836500Z #14 0x0040007931d7
2024-03-06T21:30:23.479838125Z #15 0x0040007a5124
2024-03-06T21:30:23.479839667Z #16 0x004002ca1044
2024-03-06T21:30:23.479841209Z
2024-03-06T21:30:23.479842792Z
2024-03-06T21:30:23.479844375Z During handling of the above exception, another exception occurred:
2024-03-06T21:30:23.479846084Z
2024-03-06T21:30:23.479847625Z Traceback (most recent call last):
2024-03-06T21:30:23.479849250Z File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 412, in run_asgi
2024-03-06T21:30:23.479851084Z result = await app( # type: ignore[func-returns-value]
2024-03-06T21:30:23.479852750Z File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
2024-03-06T21:30:23.479854500Z return await self.app(scope, receive, send)
2024-03-06T21:30:23.479856250Z File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 1106, in __call__
2024-03-06T21:30:23.479858084Z await super().__call__(scope, receive, send)
2024-03-06T21:30:23.479860000Z File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
2024-03-06T21:30:23.479870750Z await self.middleware_stack(scope, receive, send)
2024-03-06T21:30:23.479873125Z File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
2024-03-06T21:30:23.479874959Z raise exc
2024-03-06T21:30:23.479876542Z File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
2024-03-06T21:30:23.479888250Z await self.app(scope, receive, _send)
2024-03-06T21:30:23.479891000Z File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
2024-03-06T21:30:23.479892750Z raise exc
2024-03-06T21:30:23.479894334Z File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
2024-03-06T21:30:23.479896042Z await self.app(scope, receive, sender)
2024-03-06T21:30:23.479897709Z File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
2024-03-06T21:30:23.479899500Z raise e
2024-03-06T21:30:23.479901042Z File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
2024-03-06T21:30:23.479914084Z await self.app(scope, receive, send)
2024-03-06T21:30:23.479917084Z File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
2024-03-06T21:30:23.479922125Z await route.handle(scope, receive, send)
2024-03-06T21:30:23.479924042Z File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
2024-03-06T21:30:23.479925792Z await self.app(scope, receive, send)
2024-03-06T21:30:23.479927500Z File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
2024-03-06T21:30:23.479929459Z response = await func(request)
2024-03-06T21:30:23.481045542Z File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 274, in app
2024-03-06T21:30:23.481066209Z raw_response = await run_endpoint_function(
2024-03-06T21:30:23.481068209Z File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
2024-03-06T21:30:23.481070042Z return await run_in_threadpool(dependant.call, **values)
2024-03-06T21:30:23.481071750Z File "/usr/local/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
2024-03-06T21:30:23.481073542Z return await anyio.to_thread.run_sync(func, *args)
2024-03-06T21:30:23.481075334Z File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
2024-03-06T21:30:23.481077084Z return await get_asynclib().run_sync_in_worker_thread(
2024-03-06T21:30:23.481078750Z File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
2024-03-06T21:30:23.481080584Z return await future
2024-03-06T21:30:23.481082584Z File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
2024-03-06T21:30:23.481084334Z result = context.run(func, *args)
2024-03-06T21:30:23.481085959Z File "/app/server.py", line 16, in scrap
2024-03-06T21:30:23.481087625Z Gmaps.places(
2024-03-06T21:30:23.481095125Z File "/app/src/gmaps.py", line 327, in places
2024-03-06T21:30:23.481096917Z places_obj = scraper.scrape_places(place_data, cache = use_cache)
2024-03-06T21:30:23.481098667Z File "/usr/local/lib/python3.9/site-packages/botasaurus/decorators.py", line 650, in wrapper_browser
2024-03-06T21:30:23.481100459Z current_result = run_task(data_item, False, 0)
2024-03-06T21:30:23.481102084Z File "/usr/local/lib/python3.9/site-packages/botasaurus/decorators.py", line 530, in run_task
2024-03-06T21:30:23.481103834Z driver = create_selenium_driver(options, desired_capabilities)
2024-03-06T21:30:23.481105667Z File "/usr/local/lib/python3.9/site-packages/botasaurus/create_driver_utils.py", line 253, in create_selenium_driver
2024-03-06T21:30:23.481107500Z return create_selenium_driver( options, desired_capabilities, attempt_download=False)
2024-03-06T21:30:23.481109250Z File "/usr/local/lib/python3.9/site-packages/botasaurus/create_driver_utils.py", line 236, in create_selenium_driver
2024-03-06T21:30:23.481111000Z driver = AntiDetectDriver(
2024-03-06T21:30:23.481112625Z File "/usr/local/lib/python3.9/site-packages/botasaurus/anti_detect_driver.py", line 33, in __init__
2024-03-06T21:30:23.481114625Z super().__init__(*args, **kwargs)
2024-03-06T21:30:23.481116459Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
2024-03-06T21:30:23.481118167Z super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
2024-03-06T21:30:23.481125209Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py", line 92, in __init__
2024-03-06T21:30:23.481145375Z super().__init__(
2024-03-06T21:30:23.481150917Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 272, in __init__
2024-03-06T21:30:23.481153334Z self.start_session(capabilities, browser_profile)
2024-03-06T21:30:23.481155000Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 364, in start_session
2024-03-06T21:30:23.481157500Z response = self.execute(Command.NEW_SESSION, parameters)
2024-03-06T21:30:23.481159167Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
2024-03-06T21:30:23.481160917Z self.error_handler.check_response(response)
2024-03-06T21:30:23.481162542Z File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
2024-03-06T21:30:23.481168834Z raise exception_class(message, screen, stacktrace)
2024-03-06T21:30:23.481171417Z selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
2024-03-06T21:30:23.481173209Z (session not created: DevToolsActivePort file doesn't exist)
2024-03-06T21:30:23.481174875Z (The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
2024-03-06T21:30:23.481182459Z Stacktrace:
2024-03-06T21:30:23.481187709Z #0 0x0040007a5f83
2024-03-06T21:30:23.481190334Z #1 0x00400045ecf7
2024-03-06T21:30:23.481191959Z #2 0x00400049660e
2024-03-06T21:30:23.481193584Z #3 0x00400049326e
2024-03-06T21:30:23.481195334Z #4 0x0040004e380c
2024-03-06T21:30:23.481197000Z #5 0x0040004d7e53
2024-03-06T21:30:23.481198667Z #6 0x00400049fdd4
2024-03-06T21:30:23.481200292Z #7 0x0040004a11de
2024-03-06T21:30:23.481201875Z #8 0x00400076a531
2024-03-06T21:30:23.481203417Z #9 0x00400076e455
2024-03-06T21:30:23.481205084Z #10 0x004000756f55
2024-03-06T21:30:23.481206709Z #11 0x00400076f0ef
2024-03-06T21:30:23.481208292Z #12 0x00400073a99f
2024-03-06T21:30:23.481210375Z #13 0x004000793008
2024-03-06T21:30:23.481228542Z #14 0x0040007931d7
2024-03-06T21:30:23.481231959Z #15 0x0040007a5124
2024-03-06T21:30:23.481233584Z #16 0x004002ca1044
2024-03-06T21:30:23.481238459Z
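
The Docker timestamp prefix on every line makes the traceback above hard to scan. A minimal sketch, using only the standard library, that strips those prefixes and pulls out the lines naming an exception; the function names are hypothetical and the patterns are tuned only to the log format shown here:

```python
import re

# Docker prefixes each log line with a timestamp such as
# "2024-03-06T21:30:23.479761542Z "; remove it to recover the raw message.
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z ?")

# Matches lines like "package.module.SomeException: message".
_EXCEPTION = re.compile(r"^[\w.]+(?:Exception|Error): ")


def strip_timestamps(lines):
    """Return the lines with any leading Docker timestamp removed."""
    return [_TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def find_exception_lines(lines):
    """Return the de-timestamped lines that name a Python exception."""
    return [line for line in strip_timestamps(lines) if _EXCEPTION.match(line)]
```

Run over the log above, `find_exception_lines` should surface the two `SessionNotCreatedException` lines and skip the uvicorn `INFO:` and `ERROR:` lines.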

Zip and Upload the error_log/ Folder (Optional, if there are errors)

@Chetan11-dev
Contributor

I will be releasing an API for Gmaps, so kindly wait until then.

@Chetan11-dev
Contributor

I have released a new version with API integration. Kindly run this command:

python -m pip install bota botasaurus_api botasaurus_driver bota botasaurus-proxy-authentication botasaurus_server --upgrade

and then run the commands below:
1️⃣ Clone the Magic 🧙‍♀️:

git clone https://github.com/omkarcloud/google-maps-scraper
cd google-maps-scraper

2️⃣ Install Dependencies 📦:

python -m pip install -r requirements.txt && python run.py install

3️⃣ Launch the UI Dashboard 🚀:

python run.py
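
The three numbered steps above can also be scripted. A minimal sketch, assuming `git` is on the PATH and the script is run from the directory where the repository should be cloned; `build_steps` and `run_steps` are hypothetical names, and the initial `pip install ... --upgrade` command is left out:

```python
import subprocess
import sys

REPO_URL = "https://github.com/omkarcloud/google-maps-scraper"
REPO_DIR = "google-maps-scraper"


def build_steps(python=sys.executable):
    """Return (command, working_directory) pairs mirroring steps 1 to 3."""
    return [
        (["git", "clone", REPO_URL], None),                                   # step 1
        ([python, "-m", "pip", "install", "-r", "requirements.txt"], REPO_DIR),  # step 2
        ([python, "run.py", "install"], REPO_DIR),                            # step 2
        ([python, "run.py"], REPO_DIR),                                       # step 3: UI dashboard
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for command, cwd in steps:
        print("$", " ".join(command), f"(cwd={cwd or '.'})")
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    run_steps(build_steps(), dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to compare them with the steps above before running anything.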
