# BeautifulSoup

In [2]:
from bs4 import BeautifulSoup
import requests

url = "https://www.appinn.com/motick-for-iphone/"

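def download(url):
    # Send a browser-like User-Agent; some sites reject the library default.
    header = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36"}
    response = requests.get(url, headers=header, timeout=10)
    # Decode with the encoding detected from the body (the original self-assignment was a no-op).
    response.encoding = response.apparent_encoding
    return response.text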

html = download(url)

soup = BeautifulSoup(html, 'lxml')
body = soup.body

# Select the category links inside the post header.
txt_contents = body.select('div.single_post > header > div > span.thecategory > a')

print(txt_contents)

# Collect the visible text of each matched link.
category_list = [a.get_text() for a in txt_contents]

print(category_list)


ConnectionError: HTTPSConnectionPool(host='www.appinn.com', port=443): Max retries exceeded with url: /motick-for-iphone/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001E79301D9B0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or the connected host has failed to respond.'))
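
The cell above stops with an unhandled `ConnectionError`. A minimal sketch of a more defensive download step follows; `fetch_html` is a hypothetical helper name, and the 10-second timeout is an assumption rather than a value taken from the notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page text, or None if the request fails (hypothetical helper)."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures too
    except requests.RequestException as exc:
        # ConnectionError, Timeout and invalid-URL errors all derive from RequestException.
        print(f"request failed: {exc}")
        return None
    return response.text


# A URL without a scheme is rejected before any network traffic, so this runs offline.
result = fetch_html("not-a-valid-url")
```

Returning `None` lets the caller decide whether to retry or skip parsing instead of crashing the whole cell.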

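## Parsers

| Parser | Usage | Advantages | Disadvantages |
|--|--|--|--|
| Python standard library | BeautifulSoup(markup, "html.parser") | Built into Python; moderate speed; tolerant of malformed documents | Less tolerant of malformed documents in versions before Python 2.7.3 or 3.2.2 |
| lxml HTML parser | BeautifulSoup(markup, "lxml") | Fast; tolerant of malformed documents | Requires an external C library |
| lxml XML parser | BeautifulSoup(markup, "xml") | Fast; the only parser that supports XML | Requires an external C library |
| html5lib | BeautifulSoup(markup, "html5lib") | Most tolerant; parses documents the way a browser does; produces HTML5-style output | Slow; requires an external Python package |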
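
Since lxml is an optional install, one possible pattern is to try it first and fall back to the built-in parser. This is only a sketch: `make_soup` is a hypothetical helper, and it relies on `bs4.FeatureNotFound`, the exception BeautifulSoup raises when the requested parser is not available.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml when it is installed, otherwise fall back to html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p class='title'>Hello</p>")
print(soup.p.string)
```

Different parsers may build slightly different trees from malformed markup, so it is worth sticking to one parser within a project where possible.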

## Basic usage

In [1]:
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
print(soup.title.string)

<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title" name="dromouse">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    <!-- Elsie -->
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ;
and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>
The Dormouse's story


## Tag selectors

### Selecting elements

In [3]:
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.title)
print(type(soup.title))
print(soup.head)
print(soup.p)

<title>The Dormouse's story</title>
<class 'bs4.element.Tag'>
<head><title>The Dormouse's story</title></head>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>


### Getting the tag name

In [3]:
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.title.name)

title


### Getting attributes

In [4]:
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.p.attrs['name'])
print(soup.p['name'])

dromouse
dromouse
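
Indexing a tag raises `KeyError` when the attribute is absent, so `.get()` is often the safer choice. The sketch below uses a small made-up snippet (not the notebook's HTML) and `html.parser` so it runs without lxml; it also shows that `class` comes back as a list.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="title big" name="dromouse">text</p>', "html.parser")
p = soup.p

name = p["name"]         # raises KeyError if the attribute is missing
missing = p.get("id")    # returns None instead of raising
classes = p["class"]     # multi-valued attribute, returned as a list
has_name = p.has_attr("name")

print(name, missing, classes, has_name)
```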


### Getting text content

In [4]:
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.p.string)

The Dormouse's story
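
`.string` works here because the first `<p>` has a single child. As a hedged illustration on made-up markup, the sketch below shows that `.string` is `None` when a tag has several children, that `get_text()` concatenates all text, and that a comment-only tag yields a `Comment` object.

```python
from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup('<p>Hello <b>world</b></p><a><!-- Elsie --></a>', "html.parser")

single = soup.b.string      # one text child: the string itself
mixed = soup.p.string       # text plus a tag: None
full = soup.p.get_text()    # all nested text joined together
comment = soup.a.string     # a Comment, which is a special kind of string

print(single, mixed, full, type(comment).__name__)
```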


### Nested selection

In [6]:
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.head.title.string)

The Dormouse's story


### Children and descendants

In [20]:
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
    <div>
        <p class="story">
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
            <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> 
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
        </p>
        <p class="story">...</p>
    </div>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

for child in soup.div.contents:
    # Skip whitespace-only text nodes between tags; a len() check never filters them out.
    if isinstance(child, str) and not child.strip():
        continue
    print(child)
    print("---")

<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>
---
<p class="story">...</p>
---


In [19]:
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
            <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> 
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
        </p>
        <p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.p.children)
for i, child in enumerate(soup.p.children):
    print(i, child)

<list_iterator object at 0x0000019653A559B0>
0 

1 <a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
2 

3 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
4 

5 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
6 
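
The even-numbered entries above are whitespace text nodes. When only element children are wanted, one option is `find_all(True, recursive=False)`; the sketch below uses a shortened, made-up version of the story markup and `html.parser`.

```python
from bs4 import BeautifulSoup

html = """<p class="story">
  <a id="link1"><span>Elsie</span></a>
  <a id="link2">Lacie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# True matches every tag; recursive=False restricts the search to direct children.
child_ids = [tag["id"] for tag in soup.p.find_all(True, recursive=False)]
# Without recursive=False the nested <span> is included as well.
descendant_names = [tag.name for tag in soup.p.find_all(True)]
# stripped_strings yields text nodes with surrounding whitespace removed, skipping empty ones.
texts = list(soup.p.stripped_strings)

print(child_ids, descendant_names, texts)
```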



In [9]:
html = """
<html> 
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
            <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> 
            and
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
        <p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.p.descendants)
for i, child in enumerate(soup.p.descendants):
    print(i, child)

<generator object descendants at 0x10650e678>
0 
            Once upon a time there were three little sisters; and their names were
            
1 <a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
2 

3 <span>Elsie</span>
4 Elsie
5 

6 

7 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
8 Lacie
9  
            and
            
10 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
11 Tillie
12 
            and they lived at the bottom of a well.
        


### Parent and ancestors

In [10]:
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
            <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> 
            and
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
        <p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.a.parent)

<p class="story">
            Once upon a time there were three little sisters; and their names were
            <a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> 
            and
            <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>


In [11]:
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
            <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> 
            and
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
        <p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(list(enumerate(soup.a.parents)))

[(0, <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> 
            and
            <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>), (1, <body>
<p class="story">
            Once upon a time there were three little sisters; and their names were
            <a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> 
            and
            <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
<p class="story">...</p>
</body>), (2, <html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p c
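
Printing whole ancestors is verbose, because each one contains the previous one. A compact way to see the chain is to print only the tag names; this sketch uses a minimal made-up document and `html.parser`.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><p><a>Elsie</a></p></body></html>", "html.parser")

# .parents walks upward from the direct parent to the BeautifulSoup object itself.
names = [parent.name for parent in soup.a.parents]
print(names)
```

The last entry, `[document]`, is the name of the `BeautifulSoup` object that wraps the whole tree.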

### Siblings

In [12]:
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
            <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> 
            and
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
        <p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(list(enumerate(soup.a.next_siblings)))
print(list(enumerate(soup.a.previous_siblings)))

[(0, '\n'), (1, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>), (2, ' \n            and\n            '), (3, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>), (4, '\n            and they lived at the bottom of a well.\n        ')]
[(0, '\n            Once upon a time there were three little sisters; and their names were\n            ')]
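
As the output shows, siblings include text nodes. A small sketch (made-up markup, `html.parser`) contrasting the raw `.next_sibling` attribute with `find_next_sibling()`, which skips text and returns the next matching tag:

```python
from bs4 import BeautifulSoup

html = '<p><a id="link1">Elsie</a>\n<a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")
first = soup.a

raw = first.next_sibling                  # the newline text node between the links
next_tag = first.find_next_sibling("a")   # the next <a> element

print(repr(raw), next_tag["id"])
```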


## Standard selectors

### find_all( name , attrs , recursive , text , **kwargs )

Searches the document by tag name, attributes, or text content.

#### name

In [13]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.find_all('ul'))
print(type(soup.find_all('ul')[0]))

[<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>, <ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>]
<class 'bs4.element.Tag'>


In [14]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.find_all('ul'):
    print(ul.find_all('li'))

[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]
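
Besides a single tag name, `name` also accepts a list or a compiled regular expression, and `limit` and `recursive` narrow the search. The following is a sketch on a compacted, made-up version of the panel markup, parsed with `html.parser`.

```python
import re

from bs4 import BeautifulSoup

html = ('<div><h4>Hello</h4>'
        '<ul id="list-1"><li>Foo</li><li>Bar</li><li>Jay</li></ul>'
        '<ul id="list-2"><li>Foo</li><li>Bar</li></ul></div>')
soup = BeautifulSoup(html, "html.parser")

by_list = [t.name for t in soup.find_all(["h4", "ul"])]         # any of several names
by_regex = [t.name for t in soup.find_all(re.compile("^u"))]    # names matching a pattern
limited = [t.get_text() for t in soup.find_all("li", limit=2)]  # stop after two matches
direct = soup.div.find_all("li", recursive=False)               # direct children only

print(by_list, by_regex, limited, direct)
```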


#### attrs

In [15]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1" name="elements">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(attrs={'id': 'list-1'}))
print(soup.find_all(attrs={'name': 'elements'}))

[<ul class="list" id="list-1" name="elements">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>]
[<ul class="list" id="list-1" name="elements">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>]


In [16]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(id='list-1'))
print(soup.find_all(class_='element'))

[<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>]
[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>, <li class="element">Foo</li>, <li class="element">Bar</li>]
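
`class_` matches any one of a tag's classes, and attributes whose names are not valid Python keywords (such as `data-*`) have to go through `attrs`. A minimal sketch with made-up markup; the `data-role` attribute is an illustrative assumption and does not appear in the notebook's HTML.

```python
from bs4 import BeautifulSoup

html = ('<ul class="list" id="list-1"></ul>'
        '<ul class="list list-small" id="list-2" data-role="menu"></ul>')
soup = BeautifulSoup(html, "html.parser")

# class_ matches a tag if any of its classes equals the given value.
any_list = [t["id"] for t in soup.find_all("ul", class_="list")]
small = [t["id"] for t in soup.find_all("ul", class_="list-small")]
# data-role cannot be written as a keyword argument, so pass it through attrs.
by_data = [t["id"] for t in soup.find_all(attrs={"data-role": "menu"})]

print(any_list, small, by_data)
```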


#### text

Matching on `text` returns the matching strings themselves, not the tags that contain them.

In [17]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(text='Foo'))

['Foo', 'Foo']
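
Recent Beautiful Soup releases spell this argument `string` (the older `text` name is kept for compatibility). A hedged sketch on made-up markup, assuming a version that supports `string`: it shows regex matching and how combining a tag name with `string` returns tags instead of strings.

```python
import re

from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>Foo</li><li>Bar</li><li>Jay</li></ul>", "html.parser")

# A regular expression matches against each text node.
strings = soup.find_all(string=re.compile("^[BJ]"))
# With a tag name as well, the result is the tags whose .string matches.
tags = soup.find_all("li", string="Foo")

print(strings, tags)
```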


### find( name , attrs , recursive , text , **kwargs )

find() returns the first matching element, or None if nothing matches; find_all() returns all matching elements.

In [18]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.find('ul'))
print(type(soup.find('ul')))
print(soup.find('page'))

<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<class 'bs4.element.Tag'>
None
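
Because `find()` returns `None` when nothing matches, chaining an attribute access onto it raises `AttributeError`. One way to guard against that is sketched below; `text_of` is a hypothetical helper, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<ul id="list-1"><li>Foo</li></ul>', "html.parser")


def text_of(soup, name, default=""):
    """Return the text of the first <name> tag, or a default when it is absent."""
    tag = soup.find(name)
    return tag.get_text() if tag is not None else default


found = text_of(soup, "li")
missing = text_of(soup, "page", default="n/a")
print(found, missing)
```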


### find_parents()  find_parent()

find_parents() returns all ancestor nodes; find_parent() returns the direct parent.

### find_next_siblings()  find_next_sibling()

find_next_siblings() returns all following siblings; find_next_sibling() returns the first following sibling.

### find_previous_siblings()  find_previous_sibling()

find_previous_siblings() returns all preceding siblings; find_previous_sibling() returns the first preceding sibling.

### find_all_next()  find_next()

find_all_next() returns all matching nodes after the current node; find_next() returns the first matching one.

### find_all_previous()  find_previous()

find_all_previous() returns all matching nodes before the current node; find_previous() returns the first matching one.
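
These methods have no example in the notes, so here is a small sketch of how they behave. The markup and ids are made up for illustration, and `html.parser` is used so it runs without lxml.

```python
from bs4 import BeautifulSoup

html = ('<div id="box"><ul id="menu">'
        '<li id="a">A</li><li id="b">B</li><li id="c">C</li>'
        '</ul><p id="note">N</p></div>')
soup = BeautifulSoup(html, "html.parser")
a, b, c = (soup.find(id=i) for i in "abc")

parent_id = b.find_parent("ul")["id"]                            # nearest matching ancestor
ancestor_ids = [t["id"] for t in b.find_parents(["ul", "div"])]  # nearest ancestor first
later = [t["id"] for t in a.find_next_siblings("li")]            # siblings after a
next_one = a.find_next_sibling("li")["id"]                       # first sibling after a
earlier = [t["id"] for t in c.find_previous_siblings("li")]      # nearest sibling first
following = b.find_next("p")["id"]                               # any later element, not only siblings
preceding = [t["id"] for t in c.find_all_previous("li")]         # earlier elements, nearest first

print(parent_id, ancestor_ids, later, next_one, earlier, following, preceding)
```

Note that the "previous" variants return results in reverse document order, starting from the node closest to the current one.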

## CSS selectors

Pass a CSS selector directly to select() to pick out matching elements.

In [5]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.select('.panel .panel-heading'))
print(soup.select('ul li'))
print(soup.select('#list-2 .element'))
print(type(soup.select('ul')[0]))

[<div class="panel-heading">
<h4>Hello</h4>
</div>]
[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>, <li class="element">Foo</li>, <li class="element">Bar</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]
<class 'bs4.element.Tag'>


In [20]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul.select('li'))

[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]
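
`select()` always returns a list. When only the first match is needed, `select_one()` returns a single tag or `None`. The sketch below also tries a child combinator and `:nth-of-type`; it assumes a Beautiful Soup version whose selector support is provided by the Soup Sieve package, and uses a compacted, made-up version of the panel markup.

```python
from bs4 import BeautifulSoup

html = ('<ul class="list" id="list-1"><li>Foo</li><li>Bar</li><li>Jay</li></ul>'
        '<ul class="list list-small" id="list-2"><li>Foo</li><li>Bar</li></ul>')
soup = BeautifulSoup(html, "html.parser")

first = soup.select_one("#list-2 > li").get_text()                       # first direct child <li>
small_items = [li.get_text() for li in soup.select("ul.list-small li")]  # class selector on the <ul>
seconds = [li.get_text() for li in soup.select("li:nth-of-type(2)")]     # second <li> in each list
nothing = soup.select_one(".missing")                                    # no match gives None

print(first, small_items, seconds, nothing)
```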


### Getting attributes

In [21]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul['id'])
    print(ul.attrs['id'])

list-1
list-1
list-2
list-2
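
A common use of this pattern is collecting link targets. As a sketch on made-up markup, an attribute selector such as `a[href]` keeps only tags that actually carry the attribute, so the indexing cannot raise `KeyError`.

```python
from bs4 import BeautifulSoup

html = ('<p><a href="http://example.com/elsie" id="link1">Elsie</a>'
        '<a id="anchor-only">x</a>'
        '<a href="http://example.com/lacie" id="link2">Lacie</a></p>')
soup = BeautifulSoup(html, "html.parser")

# a[href] selects only <a> tags that have an href attribute.
hrefs = [a["href"] for a in soup.select("a[href]")]
# [id^="link"] selects tags whose id starts with "link".
link_ids = [a["id"] for a in soup.select('a[id^="link"]')]

print(hrefs, link_ids)
```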


### Getting text content

In [23]:
html='''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# Print the text of every <li>, matching the output shown below.
for li in soup.select('li'):
    print(li.get_text())

Foo
Bar
Jay
Foo
Bar
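
On real pages the text often carries stray whitespace. `get_text()` accepts a separator and a `strip` flag for this; a minimal sketch on a made-up snippet, parsed with `html.parser`:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p> Hello <b> world </b>!</p>", "html.parser")

# Strip each text fragment and join the non-empty ones with a single space.
clean = soup.p.get_text(" ", strip=True)
# stripped_strings gives the same fragments as a sequence.
parts = list(soup.p.stripped_strings)

print(clean, parts)
```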


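## Summary

* Prefer the lxml parser; fall back to html.parser when necessary
* Tag selection (such as soup.title) offers weak filtering but is fast
* Use find() to match a single result and find_all() to match multiple results
* If you are comfortable with CSS selectors, use select()
* Remember the common ways to get attribute values and text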