url_patterns = ('a[href=https://news.naver.com/main/read.nhn?]',
                'a[href^=https://entertain.naver.com/main/read.nhn?]',
                'a[href^=https://sports.news.naver.com/sports/index.nhn?]',
                'a[href^=https://news.naver.com/sports/index.nhn?]')
...
for pattern in url_patterns:
    article_urls = [link['href'] for link in article_blocks.select(pattern)]
    urls_in_page.update(article_urls)
The variable article_blocks (a bs4.element.Tag object) failed to recognize pattern.
Wrapping the URL inside each pattern string in double quotes (" ") fixed the problem.
Traceback (most recent call last):
File "/home/simonjisu/code/naver_news_search_scraper/naver_news_search_crawler/search_crawler.py", line 97, in _parse_urls_from_page
article_urls = [link['href'] for link in article_blocks.select(pattern)]
File "/home/simonjisu/miniconda3/envs/venv/lib/python3.6/site-packages/bs4/element.py", line 1376, in select
return soupsieve.select(selector, self, namespaces, limit, **kwargs)
File "/home/simonjisu/miniconda3/envs/venv/lib/python3.6/site-packages/soupsieve/__init__.py", line 108, in select
return compile(select, namespaces, flags).select(tag, limit)
File "/home/simonjisu/miniconda3/envs/venv/lib/python3.6/site-packages/soupsieve/__init__.py", line 59, in compile
return cp._cached_css_compile(pattern, namespaces, flags)
File "/home/simonjisu/miniconda3/envs/venv/lib/python3.6/site-packages/soupsieve/css_parser.py", line 192, in _cached_css_compile
CSSParser(pattern, flags).process_selectors(),
File "/home/simonjisu/miniconda3/envs/venv/lib/python3.6/site-packages/soupsieve/css_parser.py", line 894, in process_selectors
return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
File "/home/simonjisu/miniconda3/envs/venv/lib/python3.6/site-packages/soupsieve/css_parser.py", line 744, in parse_selectors
key, m = next(iselector)
File "/home/simonjisu/miniconda3/envs/venv/lib/python3.6/site-packages/soupsieve/css_parser.py", line 881, in selector_iter
raise SyntaxError(msg)
SyntaxError: Malformed attribute selector at position 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "searching_news_comments.py", line 73, in <module>
main()
File "searching_news_comments.py", line 70, in main
crawler.search(query, bd, ed)
File "/home/simonjisu/code/naver_news_search_scraper/naver_news_search_crawler/search_crawler.py", line 131, in search
scrap_date, verbose=self.verbose, debug=self.debug)
File "/home/simonjisu/code/naver_news_search_scraper/naver_news_search_crawler/search_crawler.py", line 26, in get_article_urls
search_result_url, num_articles, verbose, debug)
File "/home/simonjisu/code/naver_news_search_scraper/naver_news_search_crawler/search_crawler.py", line 70, in _extract_urls_from_search_result
urls_in_page = _parse_urls_from_page(search_result_url, page)
File "/home/simonjisu/code/naver_news_search_scraper/naver_news_search_crawler/search_crawler.py", line 100, in _parse_urls_from_page
raise ValueError('Failed to extract urls from page %s' % str(e))
ValueError: Failed to extract urls from page Malformed attribute selector at position 1
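Hello! I ran into an error and, while working through it, decided to open this issue.
Environment:
Script run
Error description
The error occurred in the _parse_urls_from_page function in search_crawler.py: the variable article_blocks (a bs4.element.Tag object) failed to recognize pattern. Wrapping the URL inside each pattern string in double quotes (" ") fixed the problem, for example: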
Before: ('a[href=https://news.naver.com/main/read.nhn?]'
After: ('a[href="https://news.naver.com/main/read.nhn?"]'
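The likely reason quoting helps: in CSS, an unquoted attribute value has to be a plain identifier, and a URL contains characters such as ":", "/" and "?" that an identifier cannot hold, so a strict selector parser like soupsieve rejects it. Below is a minimal sketch of the same fix applied programmatically. The helper name quote_attr_values and its regular expression are my own illustration, not part of the repository or of bs4, and it assumes the values themselves contain no quotes or closing brackets.

```python
import re

# Matches an attribute selector whose value is NOT already quoted,
# e.g. [href^=https://example.com/?]
# Groups: attribute name, operator (=, ^=, $=, *=, ~=, |=), raw value.
_UNQUOTED_ATTR = re.compile(
    r"""\[\s*([\w-]+)\s*([~|^$*]?=)\s*([^"'\]\s][^\]]*?)\s*\]"""
)


def quote_attr_values(selector):
    """Wrap unquoted attribute values in double quotes (hypothetical helper)."""
    def _repl(match):
        name, op, value = match.groups()
        return '[%s%s"%s"]' % (name, op, value)
    return _UNQUOTED_ATTR.sub(_repl, selector)


# The patterns from the issue, exactly as they appear in the failing code.
url_patterns = (
    'a[href=https://news.naver.com/main/read.nhn?]',
    'a[href^=https://entertain.naver.com/main/read.nhn?]',
    'a[href^=https://sports.news.naver.com/sports/index.nhn?]',
    'a[href^=https://news.naver.com/sports/index.nhn?]',
)

# Quoted versions; already-quoted selectors pass through unchanged.
fixed_patterns = tuple(quote_attr_values(p) for p in url_patterns)

for pattern in fixed_patterns:
    print(pattern)
    # In the crawler, each fixed pattern would then be passed to
    # article_blocks.select(pattern) as before.
```

Editing the four strings by hand, as described above, is simpler for a fixed list; a helper like this only pays off if the patterns are generated or come from configuration.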
PS: I'm not sure whether this only happens to me, but if anyone else hits the same error, I hope this helps. :)
Error details