Can a base_url be added to ASCII2D? #115
Appending the request:
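By the way, I should explain why I opened this issue: I suspect ASCII2D is currently being blocked by Cloudflare (CF), but on reflection I can't be completely sure of that. Here is the specific ASCII2D error I'm getting:
Since the symptoms aren't distinctive, I can't yet confirm whether it comes from Cloudflare.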
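One way to narrow this down is to inspect the failing response itself. The sketch below is a rough heuristic, not a definitive test. It assumes that a Cloudflare block usually arrives as HTTP 403 or 503, is served with a `Server: cloudflare` or `CF-RAY` header, and has a body containing challenge-page markers such as `Just a moment...`. The function name `looks_like_cloudflare_block` and the marker list are hypothetical, chosen for illustration.

```python
from typing import Mapping

# Assumed markers seen on Cloudflare challenge/block pages; not a documented
# Cloudflare contract, so treat a match as a hint rather than proof.
CHALLENGE_MARKERS = ("just a moment...", "cf-chl", "attention required! | cloudflare")


def looks_like_cloudflare_block(status_code: int, headers: Mapping[str, str], body: str) -> bool:
    """Return True if a response looks like a Cloudflare block or challenge page."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    served_by_cloudflare = lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    if not served_by_cloudflare:
        return False
    # Blocks and challenges are typically 403 or 503 (assumption).
    if status_code not in (403, 503):
        return False
    text = body.lower()
    return any(marker in text for marker in CHALLENGE_MARKERS)
```

Requiring all three signals keeps false positives down: a normal ascii2d result page served through Cloudflare with status 200 is not flagged.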
Currently
There is indeed currently
So is the entire ascii2d interface unusable because of Cloudflare's WAF?
Still thinking about a solution 🤔
I ran a test: Cloudflare appears to check the TLS fingerprint, so we could consider using
ref: How to issue a web request to simulate browser (Namely the TLS handshake / client hello?)
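To make "checking the TLS fingerprint" concrete: one widely used scheme is JA3. It joins fields from the TLS ClientHello (version, cipher suites, extensions, elliptic curves, point formats) into a string and takes the MD5. Whether ascii2d's Cloudflare setup relies on JA3 specifically, rather than JA4 or other signals, is an assumption. The sketch only builds a JA3 string from already-parsed ClientHello values; the function names are hypothetical. It also shows why changing the `User-Agent` alone does not help: the fingerprint comes from the TLS stack, not from HTTP headers.

```python
import hashlib
from typing import Iterable, List, Tuple


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # JA3 ignores GREASE values and joins the remaining ones with "-".
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3(
    tls_version: int,
    ciphers: List[int],
    extensions: List[int],
    curves: List[int],
    point_formats: List[int],
) -> Tuple[str, str]:
    """Return the JA3 string and its MD5 hex digest for parsed ClientHello fields."""
    ja3_string = ",".join(
        [str(tls_version), _join(ciphers), _join(extensions), _join(curves), _join(point_formats)]
    )
    return ja3_string, hashlib.md5(ja3_string.encode()).hexdigest()
```

Two clients that send the same headers but use different TLS libraries produce different cipher/extension lists, and therefore different JA3 hashes.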
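That approach would mean adding an extra library plus a corresponding refactor, so I don't plan to adopt it. That said, whether this gets triggered depends on the network environment. Adding base_url to all modules is an acceptable solution.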
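A minimal sketch of what adding base_url to every module could look like. It assumes each engine class keeps its official host as a default and builds every endpoint from `self.base_url`. The names `EngineSketch` and `Ascii2DSketch`, the `/search/file` path, and the mirror host are hypothetical illustrations, not the project's actual API.

```python
from typing import Optional


class EngineSketch:
    """Hypothetical base class: every engine accepts an optional base_url."""

    default_base_url = ""  # each subclass sets its official host here

    def __init__(self, base_url: Optional[str] = None) -> None:
        # Fall back to the official host when no mirror / reverse proxy is given.
        self.base_url = (base_url or self.default_base_url).rstrip("/")

    def endpoint(self, path: str) -> str:
        # Build request URLs from base_url so that a mirror replaces the host everywhere.
        return f"{self.base_url}/{path.lstrip('/')}"


class Ascii2DSketch(EngineSketch):
    default_base_url = "https://ascii2d.net"

    def search_url(self) -> str:
        # "/search/file" is an assumed endpoint path, for illustration only.
        return self.endpoint("/search/file")
```

Because every request goes through `endpoint()`, pointing an engine at a self-hosted reverse proxy only means passing a different `base_url`, e.g. `Ascii2DSketch(base_url="https://ascii2d.example.com")`.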
One small convention request: if base_url is adopted, I'd like the final implementation to standardize it as a bare domain with no path, e.g.
No problem.
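Enforcing that convention could be as simple as reducing whatever the user passes to scheme + host (+ port). The helper name `normalize_base_url` and the choice to raise on a missing scheme are assumptions; a real implementation might prefer to warn or prepend `https://` instead.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(base_url: str) -> str:
    """Reduce a user-supplied base_url to scheme://host[:port] with no path."""
    parts = urlsplit(base_url.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"base_url must include a scheme and host, got {base_url!r}")
    # Drop path, query and fragment; keep only the scheme and network location.
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))
```

Normalizing once at construction time means module code can always append paths such as `/search/...` without worrying about double slashes or a leftover route.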
I tried integrating `curl_cffi`:

```python
from collections import namedtuple
from types import TracebackType
from typing import Any, Dict, Optional, Type, Union

# from httpx import AsyncClient, QueryParams
from httpx import QueryParams
from curl_cffi.requests import AsyncSession as AsyncClient  # alias curl_cffi's AsyncSession so existing code keeps working

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/99.0.4844.82 Safari/537.36"
    )
}

RESP = namedtuple("RESP", ["text", "url", "status_code"])


class Network:
    """Manages HTTP client for network operations.

    Attributes:
        internal: Indicates if the object manages its own client lifecycle.
        cookies: Dictionary of parsed cookies, provided in string format upon initialization.
        client: Instance of an HTTP client.
    """

    def __init__(
        self,
        internal: bool = False,
        proxies: Optional[str] = None,
        headers: Optional[Dict[str, str]] = None,
        cookies: Optional[str] = None,
        timeout: float = 30,
        verify_ssl: bool = True,
    ):
        """Initializes Network with configuration for HTTP requests.

        Args:
            internal: If True, Network manages its own HTTP client lifecycle.
            proxies: Proxy URL for the HTTP client.
            headers: Custom headers for the HTTP client.
            cookies: Cookies in string format for the HTTP client.
            timeout: Timeout duration for the HTTP client.
            verify_ssl: If True, verifies SSL certificates.
        """
        self.internal: bool = internal
        headers = {**DEFAULT_HEADERS, **headers} if headers else DEFAULT_HEADERS
        self.cookies: Dict[str, str] = {}
        if cookies:
            for line in cookies.split(";"):
                if not line.strip():
                    continue  # skip empty segments, e.g. from a trailing ";"
                key, value = line.strip().split("=", 1)
                self.cookies[key] = value
        self.client: AsyncClient = AsyncClient(
            headers=headers,
            cookies=self.cookies,
            verify=verify_ssl,
            # curl_cffi expects a scheme -> proxy URL mapping, not a bare string
            proxies={"http": proxies, "https": proxies} if proxies else None,
            timeout=timeout,
            # follow_redirects=True,
            allow_redirects=True,  # renamed to the requests-style parameter
            impersonate="chrome120",  # impersonate Chrome 120 (TLS fingerprint)
        )

    def start(self) -> AsyncClient:
        """Initializes and returns the HTTP client.

        Returns:
            AsyncClient: Initialized HTTP client for network operations.
        """
        return self.client

    async def close(self) -> None:
        """Closes the HTTP client session if managed internally."""
        # await self.client.aclose()
        await self.client.close()  # renamed to the requests-style method

    async def __aenter__(self) -> AsyncClient:
        """Async context manager entry for initializing or returning the HTTP client.

        Returns:
            AsyncClient: The HTTP client instance.
        """
        return self.client

    async def __aexit__(
        self,
        exc_type: Optional[Type[BaseException]] = None,
        exc_val: Optional[BaseException] = None,
        exc_tb: Optional[TracebackType] = None,
    ) -> None:
        """Async context manager exit for closing the HTTP client if managed internally."""
        # await self.client.aclose()
        await self.client.close()  # renamed to the requests-style method

# The rest of the code is unchanged.
```

But the result was about what I expected,
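Similar to Google, where a custom base_url can be used to choose a mirror, I'd like ascii2d to support this as well.
The goal is to run a self-hosted ascii2d reverse proxy in a safe environment and thereby avoid Cloudflare's crawler detection permanently, once and for all.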