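# Searching and Ranking
This chapter walks through building a search engine. We use it to index a set of documents, and we suggest some further improvements along the way.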

Main topics:
- Crawling web pages
- Building the index
- Searching the pages
- Ranking the search results

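## Components of a search engine
Building a search engine takes four steps:
- First, find a way to collect documents. You can crawl web pages (with Scrapy, for example) or collect from a fixed set of documents, such as a company intranet.
- Second, build the index. This is usually a large table that records the documents and the positions of the different words in them.
- Third, answer a query by returning a ranked list of documents. This involves scoring metrics such as PageRank and TF-IDF.
- Fourth (optional), build a neural network to rank the query results. By learning which links people click after a search, the network associates queries with results, and that information is used to reorder the results.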

Create a crawler class whose job is to fetch web pages and create the database.

In [1]:
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import requests

class crawler:
    
    # Initialize the crawler with the name of the database
    def __init__(self,dbname):
        pass
    def __del__(self):
        pass
    def dbcommit(self):
        pass
    
    # Helper that returns the id of an entry, adding it to the database if it does not exist
    def getentryid(self,table,field,value,createnew=True):
        return None
    
    # Index a single page
    def addtoindex(self,url,soup):
        print('Indexing %s'%url)
        
    # Extract the text (without tags) from an HTML page
    def gettextonly(self,soup):
        return None
    
    # Split the text into words on any non-alphanumeric character
    def separatewords(self,text):
        return None
    
    # Return True if this url has already been indexed
    def isindexed(self,url):
        return False
    
    # Add a link between two pages
    def addlinkref(self,urlFrom,urlTo,linkText):
        pass
    
    # Starting from a small set of pages, crawl breadth-first to the given depth, indexing pages as we go
    def crawl(self,pages,depth=2):
        pass
    
    # Create the database tables
    def createindextables(self):
        pass

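Add a crawl method to the crawler class. It loops over the list of pages and calls addtoindex on each one.

It then uses Beautiful Soup to collect every link on the page and adds those links to a set named newpages.

At the end of each pass, newpages is assigned to pages, so the next pass crawls one level deeper.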
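
The level-by-level loop described above can be tried without any network access. The following is a minimal sketch, assuming a hypothetical in-memory `toy_graph` dictionary in place of real pages; the function name `crawl_order` and the `seen` set are illustrative and not part of the crawler class, which relies on isindexed instead.

```python
def crawl_order(link_graph, start_pages, depth=2):
    """Return the pages visited at each level of a breadth-first crawl."""
    pages = set(start_pages)
    seen = set()      # pages already visited (stands in for isindexed)
    levels = []
    for _ in range(depth):
        newpages = set()
        visited = []
        for page in sorted(pages):
            if page in seen:
                continue
            seen.add(page)
            visited.append(page)
            # Collect outgoing links for the next level
            for url in link_graph.get(page, []):
                if url not in seen:
                    newpages.add(url)
        levels.append(visited)
        pages = newpages   # the next pass crawls the newly found pages
    return levels

# Hypothetical link structure: page -> pages it links to
toy_graph = {
    'a': ['b', 'c'],
    'b': ['a', 'd'],
    'c': ['d'],
    'd': [],
}
levels = crawl_order(toy_graph, ['a'], depth=2)
print(levels)
```

With depth=2 the sketch visits `a`, then `b` and `c`. Page `d` is discovered but never visited, because the loop stops after two levels.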

In [2]:
class crawler:
    # Initialize the crawler with the name of the database
    def __init__(self,dbname):
        pass
    def __del__(self):
        pass
    def dbcommit(self):
        pass
    
    # Helper that returns the id of an entry, adding it to the database if it does not exist
    def getentryid(self,table,field,value,createnew=True):
        return None
    
    # Index a single page
    def addtoindex(self,url,soup):
        print('Indexing %s'%url)
        
    # Extract the text (without tags) from an HTML page
    def gettextonly(self,soup):
        return None
    
    # Split the text into words on any non-alphanumeric character
    def separatewords(self,text):
        return None
    
    # Return True if this url has already been indexed
    def isindexed(self,url):
        return False
    
    # Add a link between two pages
    def addlinkref(self,urlFrom,urlTo,linkText):
        pass
    
    # Create the database tables
    def createindextables(self):
        pass

    # Starting from a small set of pages, crawl breadth-first to the given depth, indexing pages as we go
    def crawl(self,pages,depth=2):
        headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
        }
        for i in range(depth):
            newpages=set()
            for page in pages:
                try:
                    # Pass headers by keyword; the second positional argument of requests.get is params
                    c=requests.get(page,headers=headers)
                except requests.RequestException:
                    print("could not open %s"%page)
                    continue
                plain_text=c.text
                soup=BeautifulSoup(plain_text,'html.parser')
                self.addtoindex(page,soup)

                links=soup('a')
                for link in links:
                    if ('href' in link.attrs):
                        url=urljoin(page,link['href'])
                        if url.find("'")!=-1:
                            continue
                        url=url.split('#')[0] # strip the fragment
                        # 'http' also matches the first four characters of 'https'
                        if url[0:4]=='http' and not self.isindexed(url):
                            newpages.add(url)
                        linkText=self.gettextonly(link)
                        self.addlinkref(page,url,linkText)

                self.dbcommit()
            pages=newpages

A quick test against Douban Books shows that it works.

In [3]:
pagelist=['https://book.douban.com/'] # Douban Books
c=crawler('1') # a separate name keeps the instance from shadowing the crawler class
c.crawl(pagelist)

Indexing https://book.douban.com/
Indexing https://read.douban.com/ebook/5083852/?dcs=book-hot&dcm=douban&dct=read-subject
Indexing https://book.douban.com
Indexing https://book.douban.com/subject/27046739/?icn=index-latestbook-subject
Indexing https://book.douban.com/link2/?pre=0&vendor=dangdang&srcpage=bestseller&price=1980&pos=1&url=http%3A%2F%2Funion.dangdang.com%2Ftransfer.php%3Ffrom%3DP-306226-0-s26647769%26backurl%3Dhttp%3A%2F%2Fproduct.dangdang.com%2Fproduct.aspx%3Fproduct_id%3D23761145&srcsubj=&type=bkbuy&subject=26647769
Indexing https://www.douban.com
Indexing https://book.douban.com/tag/?view=type&icn=index-sorttags-all
Indexing https://read.douban.com/ebooks?dcs=book-intro&dcm=douban
Indexing https://movie.douban.com
Indexing https://moment.douban.com
Indexing https://book.douban.com/tag/散文
Indexing https://book.douban.com/review/8583202/
Indexing https://book.douban.com/review/best/
Indexing https://book.douban.com/tag/言情
Indexing https://book.douban.com/subject/1082154/?

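## Building the index
We need a database for the full-text index. The index is a list of all the distinct words, the documents they appear in, and their positions within those documents.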
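
To make the shape of the index concrete before moving it into a database, here is a minimal in-memory sketch. The name `build_index` and the two sample documents are assumptions made for illustration; the notebook itself stores the same information in SQLite tables.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to a list of (document id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index

# Hypothetical documents keyed by id
docs = {
    1: 'functional programming in Python',
    2: 'Python programming',
}
index = build_index(docs)
print(index['python'])        # where 'python' occurs
print(index['programming'])   # where 'programming' occurs
```

Each entry answers the two questions a search needs: which documents contain the word, and where in each document it appears.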

### Setting up the database schema
Add the createindextables() function at the end of the crawler class.

In [4]:
import sqlite3

In [5]:
class crawler:
    # Initialize the crawler with the name of the database
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
        
    def __del__(self):
        self.con.close()
        
    def dbcommit(self):
        self.con.commit()
    
    # Helper that returns the id of an entry, adding it to the database if it does not exist
    def getentryid(self,table,field,value,createnew=True):
        return None
    
    # Index a single page
    def addtoindex(self,url,soup):
        print('Indexing %s'%url)
        
    # Extract the text (without tags) from an HTML page
    def gettextonly(self,soup):
        return None
    
    # Split the text into words on any non-alphanumeric character
    def separatewords(self,text):
        return None
    
    # Return True if this url has already been indexed
    def isindexed(self,url):
        return False
    
    # Add a link between two pages
    def addlinkref(self,urlFrom,urlTo,linkText):
        pass

    # Starting from a small set of pages, crawl breadth-first to the given depth, indexing pages as we go
    def crawl(self,pages,depth=2):
        headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
        }
        for i in range(depth):
            newpages=set()
            for page in pages:
                try:
                    # Pass headers by keyword; the second positional argument of requests.get is params
                    c=requests.get(page,headers=headers)
                except requests.RequestException:
                    print("could not open %s"%page)
                    continue
                plain_text=c.text
                soup=BeautifulSoup(plain_text,'html.parser')
                self.addtoindex(page,soup)

                links=soup('a')
                # print(links)
                for link in links:
                    # print(link.href)
                    if ('href' in link.attrs):
                        url=urljoin(page,link['href'])
                        if url.find("'")!=-1:
                            continue
                        url=url.split('#')[0] # strip the fragment
                        # 'http' also matches the first four characters of 'https'
                        if url[0:4]=='http' and not self.isindexed(url):
                            newpages.add(url)
                        linkText=self.gettextonly(link)
                        self.addlinkref(page,url,linkText)

                self.dbcommit()
            pages=newpages
    
    def createindextables(self):
        self.con.execute('create table urllist(url)')  # urllist: the URLs that have been indexed
        self.con.execute('create table wordlist(word)') # wordlist: the list of words
        self.con.execute('create table wordlocation(urlid,wordid,location)') # wordlocation: positions of words in documents
        self.con.execute('create table link(fromid integer,toid integer)') # link: two URL IDs recording a link from one page to another
        self.con.execute('create table linkwords(wordid,linkid)') # linkwords: which words are actually used in each link
        self.con.execute('create index wordidx on wordlist(word)')
        self.con.execute('create index urlidx on urllist(url)')
        self.con.execute('create index wordurlidx on wordlocation(wordid)')
        self.con.execute('create index urltoidx on link(toid)')
        self.con.execute('create index urlfromidx on link(fromid)')
        self.dbcommit()
        

In [6]:
c=crawler('searchindex.db') # a separate name keeps the instance from shadowing the crawler class
c.createindextables()

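### Finding the words on a page
Files downloaded from the web are HTML, which contains a large number of tags, attributes, and other information that does not belong in the index.

The first step is to extract all of the text from a page.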
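
As a rough illustration of what "text only" means, the sketch below strips tags with the standard library's `html.parser` instead of Beautiful Soup, so it runs without extra packages. The class `TextExtractor`, the helper `extract_text`, and the sample HTML are assumptions for illustration and are not the notebook's gettextonly.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""
    SKIP = {'script', 'style'}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # One text node per line, similar in spirit to joining with '\n'
    return '\n'.join(parser.parts)

sample = ('<html><head><style>p{color:red}</style></head>'
          '<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>')
text = extract_text(sample)
print(text)
```

Putting each text node on its own line keeps words from adjacent tags from running together, which matters for the word-splitting step that follows.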

Complete the gettextonly function:

In [7]:
class crawler:
    # Initialize the crawler with the name of the database
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
        
    def __del__(self):
        self.con.close()
        
    def dbcommit(self):
        self.con.commit()
    
    # Helper that returns the id of an entry, adding it to the database if it does not exist
    def getentryid(self,table,field,value,createnew=True):
        return None
    
    # Index a single page
    def addtoindex(self,url,soup):
        print('Indexing %s'%url)
        
    # Extract the text (without tags) from an HTML page
    def gettextonly(self,soup):
        v=soup.string
        if v is None:
            # No single string here, so recurse into the child nodes
            c=soup.contents
            resulttext=''
            for t in c:
                subtext=self.gettextonly(t)
                resulttext+=subtext+'\n'
            return resulttext
        else:
            return v.strip()
    
    # Split the text into words on any non-alphanumeric character
    def separatewords(self,text):
        return None
    
    # Return True if this url has already been indexed
    def isindexed(self,url):
        return False
    
    # Add a link between two pages
    def addlinkref(self,urlFrom,urlTo,linkText):
        pass
    
    # Starting from a small set of pages, crawl breadth-first to the given depth, indexing pages as we go
    def crawl(self,pages,depth=2):
        headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
        }
        for i in range(depth):
            newpages=set()
            for page in pages:
                try:
                    # Pass headers by keyword; the second positional argument of requests.get is params
                    c=requests.get(page,headers=headers)
                except requests.RequestException:
                    print("could not open %s"%page)
                    continue
                plain_text=c.text
                soup=BeautifulSoup(plain_text,'html.parser')
                self.addtoindex(page,soup)

                links=soup('a')
                # print(links)
                for link in links:
                    # print(link.href)
                    if ('href' in link.attrs):
                        url=urljoin(page,link['href'])
                        if url.find("'")!=-1:
                            continue
                        url=url.split('#')[0] # strip the fragment
                        # 'http' also matches the first four characters of 'https'
                        if url[0:4]=='http' and not self.isindexed(url):
                            newpages.add(url)
                        linkText=self.gettextonly(link)
                        self.addlinkref(page,url,linkText)

                self.dbcommit()
            pages=newpages
    
    # Create the database tables
    def createindextables(self):
        self.con.execute('create table urllist(url)')  # urllist: the URLs that have been indexed
        self.con.execute('create table wordlist(word)') # wordlist: the list of words
        self.con.execute('create table wordlocation(urlid,wordid,location)') # wordlocation: positions of words in documents
        self.con.execute('create table link(fromid integer,toid integer)') # link: two URL IDs recording a link from one page to another
        self.con.execute('create table linkwords(wordid,linkid)') # linkwords: which words are actually used in each link
        self.con.execute('create index wordidx on wordlist(word)')
        self.con.execute('create index urlidx on urllist(url)')
        self.con.execute('create index wordurlidx on wordlocation(wordid)')
        self.con.execute('create index urltoidx on link(toid)')
        self.con.execute('create index urlfromidx on link(fromid)')
        self.dbcommit()

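Next is the separatewords function, which splits a string into a list of separate words so that they can be added to the index.

The idea is to treat any character that is not a letter or a digit as a separator.

That clearly does not work for Chinese, so here I use [jieba](https://github.com/fxsjy/jieba) to segment Chinese text.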
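
The splitting rule is easy to check in isolation. This is a small sketch with a hypothetical standalone function `separate_words`. It also shows why the rule fails for Chinese: Python's `\W` treats Chinese characters as word characters, so a run of Chinese text with no punctuation comes back as one token.

```python
import re

def separate_words(text):
    """Split on runs of non-word characters and lowercase the pieces."""
    return [s.lower() for s in re.split(r'\W+', text) if s]

english = separate_words('Functional programming, in Python 3!')
chinese = separate_words('豆瓣读书 散文')
print(english)
print(chinese)
```

The pattern is `\W+` rather than `\W*`. Since Python 3.7, `re.split` also splits on empty matches, so a pattern that can match the empty string would break the text into single characters.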

In [8]:
import jieba
import re

class crawler:
    # Initialize the crawler with the name of the database
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
        
    def __del__(self):
        self.con.close()
        
    def dbcommit(self):
        self.con.commit()
    
    # Helper that returns the id of an entry, adding it to the database if it does not exist
    def getentryid(self,table,field,value,createnew=True):
        return None
    
    # Index a single page
    def addtoindex(self,url,soup):
        print('Indexing %s'%url)
        
    # Extract the text (without tags) from an HTML page
    def gettextonly(self,soup):
        v=soup.string
        if v is None:
            # No single string here, so recurse into the child nodes
            c=soup.contents
            resulttext=''
            for t in c:
                subtext=self.gettextonly(t)
                resulttext+=subtext+'\n'
            return resulttext
        else:
            return v.strip()
    
    # Split the text into words on any run of non-alphanumeric characters
    def separatewords(self,text):
        # '\W+' instead of '\W*': on Python 3.7+ a pattern that can match the empty string splits between every character
        splitter=re.compile(r'\W+')
        return [s.lower() for s in splitter.split(text) if s!='']
    
    # Chinese word segmentation with jieba
    def separatewords_cn(self,text):
        seg_list = jieba.cut_for_search(text)  # search-engine mode
        return [word for word in seg_list if len(word)>1]
    
    # Return True if this url has already been indexed
    def isindexed(self,url):
        return False
    
    # Add a link between two pages
    def addlinkref(self,urlFrom,urlTo,linkText):
        pass
    
    # Starting from a small set of pages, crawl breadth-first to the given depth, indexing pages as we go
    def crawl(self,pages,depth=2):
        headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
        }
        for i in range(depth):
            newpages=set()
            for page in pages:
                try:
                    # Pass headers by keyword; the second positional argument of requests.get is params
                    c=requests.get(page,headers=headers)
                except requests.RequestException:
                    print("could not open %s"%page)
                    continue
                plain_text=c.text
                soup=BeautifulSoup(plain_text,'html.parser')
                self.addtoindex(page,soup)

                links=soup('a')
                # print(links)
                for link in links:
                    # print(link.href)
                    if ('href' in link.attrs):
                        url=urljoin(page,link['href'])
                        if url.find("'")!=-1:
                            continue
                        url=url.split('#')[0] # strip the fragment
                        # 'http' also matches the first four characters of 'https'
                        if url[0:4]=='http' and not self.isindexed(url):
                            newpages.add(url)
                        linkText=self.gettextonly(link)
                        self.addlinkref(page,url,linkText)
                self.dbcommit()
            pages=newpages
    
    # Create the database tables
    def createindextables(self):
        self.con.execute('create table urllist(url)')  # urllist: the URLs that have been indexed
        self.con.execute('create table wordlist(word)') # wordlist: the list of words
        self.con.execute('create table wordlocation(urlid,wordid,location)') # wordlocation: positions of words in documents
        self.con.execute('create table link(fromid integer,toid integer)') # link: two URL IDs recording a link from one page to another
        self.con.execute('create table linkwords(wordid,linkid)') # linkwords: which words are actually used in each link
        self.con.execute('create index wordidx on wordlist(word)')
        self.con.execute('create index urlidx on urllist(url)')
        self.con.execute('create index wordurlidx on wordlocation(wordid)')
        self.con.execute('create index urltoidx on link(toid)')
        self.con.execute('create index urlfromidx on link(fromid)')
        self.dbcommit()

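### Adding to the index
Here we fill in three methods that were left unimplemented above:
- addtoindex: gets the list of words that appear on the page, then adds the page and all of its words to the index, linking each word to the page and storing its position in the document.
- getentryid: returns the ID of an entry. If the entry does not exist, a new record is created in the database and its ID is returned.
- isindexed: checks whether the page is already in the database and, if so, whether any words are associated with it.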
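
The "look up, otherwise insert" behaviour of getentryid can be tried on an in-memory database. This is a minimal sketch under a few assumptions: the standalone function `get_entry_id` is illustrative, and it binds the value with a `?` placeholder rather than formatting it into the SQL string, so values containing quotes do not break the statement.

```python
import sqlite3

def get_entry_id(con, table, field, value):
    """Return the rowid of value in table.field, inserting it first if missing."""
    # Table and column names cannot be bound as parameters,
    # so they must come from trusted code, never from user input.
    cur = con.execute('select rowid from %s where %s=?' % (table, field), (value,))
    row = cur.fetchone()
    if row is None:
        cur = con.execute('insert into %s (%s) values (?)' % (table, field), (value,))
        return cur.lastrowid
    return row[0]

con = sqlite3.connect(':memory:')
con.execute('create table wordlist(word)')

first = get_entry_id(con, 'wordlist', 'word', "it's")    # inserted
again = get_entry_id(con, 'wordlist', 'word', "it's")    # found, same id
other = get_entry_id(con, 'wordlist', 'word', 'python')  # inserted, new id
print(first, again, other)
```

Calling the function twice with the same value returns the same ID, which is what lets addtoindex reuse one wordlist row for every occurrence of a word.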

In [5]:
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import jieba
import re
import sqlite3
import requests

# Common English words to skip when indexing.
# Assumption: the notebook uses ignorewords without defining it, so this stop list is only a placeholder.
ignorewords={'the','of','to','and','a','in','is','it'}

class crawler:
    # Initialize the crawler with the name of the database
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
        
    def __del__(self):
        self.con.close()
        
    def dbcommit(self):
        self.con.commit()
    
    # Helper that returns the id of an entry, adding it to the database if it does not exist
    def getentryid(self,table,field,value,createnew=True):
        # Bind the value as a parameter so that quotes in it cannot break the SQL
        cur=self.con.execute("select rowid from %s where %s=?"%(table,field),(value,))
        res=cur.fetchone()
        if res is None:
            cur=self.con.execute("insert into %s (%s) values (?)"%(table,field),(value,))
            return cur.lastrowid
        else:
            return res[0]
    
    # Index a single page (English text)
    def addtoindex(self,url,soup):
        if self.isindexed(url):
            return
        print('Indexing'+url)
        
        # Get the individual words
        text=self.gettextonly(soup)
        words=self.separatewords(text)
        
        # Get the id of the URL
        urlid=self.getentryid('urllist','url',url)
        
        # Link each word to this url
        for i in range(len(words)):
            word=words[i] # the i-th word, not always words[1]
            if word in ignorewords:
                continue
            wordid=self.getentryid('wordlist','word',word)
            self.con.execute("Insert into wordlocation(urlid,wordid,location) values(%d,%d,%d)"%(urlid,wordid,i))
    
    # Index a single page (Chinese text)
    def addtoindex_cn(self,url,soup):
        if self.isindexed(url):
            return
        print('Indexing'+url)
        
        # Get the individual words
        text=self.gettextonly(soup)
        words=self.separatewords_cn(text)
        
        # Get the id of the URL
        urlid=self.getentryid('urllist','url',url)
        
        # Link each word to this url
        for i in range(len(words)):
            word=words[i] # the i-th word, not always words[1]
            wordid=self.getentryid('wordlist','word',word)
            self.con.execute("Insert into wordlocation(urlid,wordid,location) values(%d,%d,%d)"%(urlid,wordid,i))
        
    # Extract the text (without tags) from an HTML page
    def gettextonly(self,soup):
        v=soup.string
        if v is None:
            # No single string here, so recurse into the child nodes
            c=soup.contents
            resulttext=''
            for t in c:
                subtext=self.gettextonly(t)
                resulttext+=subtext+'\n'
            return resulttext
        else:
            return v.strip()
    
    # Split the text into words on any run of non-alphanumeric characters
    def separatewords(self,text):
        # '\W+' instead of '\W*': on Python 3.7+ a pattern that can match the empty string splits between every character
        splitter=re.compile(r'\W+')
        return [s.lower() for s in splitter.split(text) if s!='']
    
    # Chinese word segmentation with jieba
    def separatewords_cn(self,text):
        seg_list = jieba.cut_for_search(text)  # search-engine mode
        return [word for word in seg_list if len(word)>1]
    
    # Return True if this url has already been indexed
    def isindexed(self,url):
        u=self.con.execute("select rowid from urllist where url=?",(url,)).fetchone()
        if u is not None:
            # Check that it has actually been crawled, i.e. that words are stored for it
            v=self.con.execute('select * from wordlocation where urlid=%d' %u[0]).fetchone()
            if v is not None:
                return True
        return False
    
    # Add a link between two pages
    def addlinkref(self,urlFrom,urlTo,linkText):
        pass
    
    # Starting from a small set of pages, crawl breadth-first to the given depth, indexing pages as we go
    def crawl(self,pages,depth=2):
        headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
        }
        for i in range(depth):
            newpages=set()
            for page in pages:
                try:
                    # Pass headers by keyword; the second positional argument of requests.get is params
                    c=requests.get(page,headers=headers)
                except requests.RequestException:
                    print("could not open %s"%page)
                    continue
                plain_text=c.text
                soup=BeautifulSoup(plain_text,'html.parser')
                # self.addtoindex(page,soup) # index English text
                self.addtoindex_cn(page,soup) # index Chinese text

                links=soup('a')
                # print(links)
                for link in links:
                    # print(link.href)
                    if ('href' in link.attrs):
                        url=urljoin(page,link['href'])
                        if url.find("'")!=-1:
                            continue
                        url=url.split('#')[0] # strip the fragment
                        # 'http' also matches the first four characters of 'https'
                        if url[0:4]=='http' and not self.isindexed(url):
                            newpages.add(url)
                        linkText=self.gettextonly(link)
                        self.addlinkref(page,url,linkText)

                self.dbcommit()
            pages=newpages
    
    # Create the database tables
    def createindextables(self):
        self.con.execute('create table urllist(url)')  # urllist: the URLs that have been indexed
        self.con.execute('create table wordlist(word)') # wordlist: the list of words
        self.con.execute('create table wordlocation(urlid,wordid,location)') # wordlocation: positions of words in documents
        self.con.execute('create table link(fromid integer,toid integer)') # link: two URL IDs recording a link from one page to another
        self.con.execute('create table linkwords(wordid,linkid)') # linkwords: which words are actually used in each link
        self.con.execute('create index wordidx on wordlist(word)')
        self.con.execute('create index urlidx on urllist(url)')
        self.con.execute('create index wordurlidx on wordlocation(wordid)')
        self.con.execute('create index urltoidx on link(toid)')
        self.con.execute('create index urlfromidx on link(fromid)')
        self.dbcommit()

In [6]:
c=crawler('doubanbook.db') # a separate name keeps the instance from shadowing the crawler class
c.createindextables()

In [7]:
pages=['https://book.douban.com/']
c.crawl(pages=pages)

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ETHANW~1\AppData\Local\Temp\jieba.cache


Indexinghttps://book.douban.com/


Loading model cost 0.796 seconds.
Prefix dict has been built succesfully.


Indexinghttps://book.douban.com/subject/26944962/?icn=index-topchart-subject
Indexinghttps://www.douban.com/accounts/register?source=book
Indexinghttps://book.douban.com/tag/历史
Indexinghttps://book.douban.com/subject/26936410/?icn=index-editionrecommend
Indexinghttps://book.douban.com/subject/27052400/?icn=index-editionrecommend
Indexinghttps://book.douban.com/subject/1027191/?icn=index-book250-subject
Indexinghttps://book.douban.com/tag/美食
Indexinghttps://book.douban.com/subject/1022060/?icn=index-book250-subject
Indexinghttps://book.douban.com/subject/27004926/?icn=index-latestbook-subject
Indexinghttps://book.douban.com/standbyme/2016?source=navigation
Indexinghttps://book.douban.com
Indexinghttps://book.douban.com/tag/漫画
Indexinghttps://book.douban.com/link2/?pre=0&vendor=dangdang&srcpage=bestseller&price=2770&pos=9&url=http%3A%2F%2Funion.dangdang.com%2Ftransfer.php%3Ffrom%3DP-306226-0-s26978921%26backurl%3Dhttp%3A%2F%2Fproduct.dangdang.com%2Fproduct.aspx%3Fproduct_id%3D23932960&sr

## Querying
Now we create the searcher class, which performs full-text search by querying the database.

In [8]:
class searcher:
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
    
    def __del__(self):
        self.con.close()

Add a getmatchrows function. It takes a query string, splits it into words, and builds a SQL query that finds the URLs containing all of those words.
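
The query that getmatchrows assembles joins the wordlocation table with itself once per word. The sketch below isolates that string-building step. The function name `build_match_query` is an assumption for illustration, and it takes word IDs that have already been looked up.

```python
def build_match_query(wordids):
    """Build a self-join over wordlocation, one alias (w0, w1, ...) per word id."""
    fields = ['w0.urlid']
    tables = []
    clauses = []
    for n, wordid in enumerate(wordids):
        fields.append('w%d.location' % n)
        tables.append('wordlocation w%d' % n)
        if n > 0:
            # Every word must occur on the same page as the previous one
            clauses.append('w%d.urlid=w%d.urlid' % (n - 1, n))
        clauses.append('w%d.wordid=%d' % (n, wordid))
    return 'select %s from %s where %s' % (
        ','.join(fields), ','.join(tables), ' and '.join(clauses))

query = build_match_query([145, 19])
print(query)
```

Each result row holds a URL ID followed by one location per query word, so a page appears once for every combination of positions at which the words occur.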

In [9]:
class searcher:
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
    
    def __del__(self):
        self.con.close()
        
    def getmatchrows(self,q):
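        # Pieces of the query string
        fieldlist='w0.urlid'
        tablelist=''
        clauselist=''
        wordids=[]
        
        # Split the query into words on spaces
        words=q.split(' ')  
        tablenumber=0

        for word in words:
            # Look up the word ID (bound as a parameter so quotes in the query cannot break the SQL)
            wordrow=self.con.execute("select rowid from wordlist where word=?",(word,)).fetchone()
            if wordrow is not None: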
                wordid=wordrow[0]
                wordids.append(wordid)
                if tablenumber>0:
                    tablelist+=','
                    clauselist+=' and '
                    clauselist+='w%d.urlid=w%d.urlid and ' % (tablenumber-1,tablenumber)
                fieldlist+=',w%d.location' % tablenumber
                tablelist+='wordlocation w%d' % tablenumber      
                clauselist+='w%d.wordid=%d' % (tablenumber,wordid)
                tablenumber+=1

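        # If none of the words are in the index there is nothing to join
        if not wordids:
            return [],wordids

        # Assemble the full query from its parts
        fullquery='select %s from %s where %s' % (fieldlist,tablelist,clauselist)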
        print(fullquery)
        cur=self.con.execute(fullquery)
        rows=[row for row in cur]

        return rows,wordids

In [10]:
e=searcher('searchindex.db')
e.getmatchrows('functional programming')

select w0.urlid,w0.location,w1.location from wordlocation w0,wordlocation w1 where w0.wordid=145 and w0.urlid=w1.urlid and w1.wordid=19


([(1, 327, 23),
  (1, 327, 162),
  (1, 327, 243),
  (1, 327, 261),
  (1, 327, 269),
  (1, 327, 436),
  (1, 327, 953),
  (1, 327, 1123),
  (1, 327, 1159),
  (1, 327, 1172),
  (1, 327, 1230),
  (1, 327, 1240),
  (1, 327, 1258),
  (1, 327, 1304),
  (1, 327, 1346),
  (1, 327, 1351),
  (1, 327, 1407),
  (1, 327, 1410),
  (1, 327, 1603),
  (1, 327, 1668),
  (1, 327, 1681),
  (1, 327, 1687),
  (1, 327, 1690),
  (1, 327, 1715),
  (1, 327, 1745),
  (1, 327, 1778),
  (1, 327, 1857),
  (1, 327, 1897),
  (1, 327, 1927),
  (1, 327, 1949),
  (1, 327, 2042),
  (1, 327, 2109),
  (1, 327, 2119),
  (1, 327, 2151),
  (1, 327, 2386),
  (1, 327, 2453),
  (1, 327, 2464),
  (1, 327, 2656),
  (1, 327, 2761),
  (1, 327, 2766),
  (1, 327, 2800),
  (1, 327, 2855),
  (1, 327, 2995),
  (1, 327, 3096),
  (1, 327, 3172),
  (1, 327, 3440),
  (1, 327, 3450),
  (1, 327, 3465),
  (1, 327, 3770),
  (1, 327, 3801),
  (1, 327, 3804),
  (1, 327, 3809),
  (1, 327, 3885),
  (1, 327, 3888),
  (1, 327, 4041),
  (1, 327, 4089),


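## Content-based ranking
We need a way to score pages against a given query so that the highest-scoring pages come first in the results.

We will look at several scoring methods that rely only on the query and the page content. There are three:
- Word frequency (term frequency, the TF in TF-IDF). How often the query words appear in a document helps judge how relevant the document is.
- Document location. The main subject of a document tends to appear near its beginning.
- Word distance. If the query contains multiple words, they should appear close together in the document.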
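
To show how the location and distance metrics could be read off the rows that getmatchrows returns, here is a rough sketch. The function names and the sample rows are assumptions for illustration. Both functions return raw values where smaller is better and do not yet scale them to the 0..1 range.

```python
def raw_location_scores(rows):
    """Smallest sum of word positions per URL id (earlier in the page is better)."""
    best = {}
    for row in rows:
        loc = sum(row[1:])
        if row[0] not in best or loc < best[row[0]]:
            best[row[0]] = loc
    return best

def raw_distance_scores(rows):
    """Smallest total gap between consecutive query words per URL id."""
    best = {}
    for row in rows:
        dist = sum(abs(row[i] - row[i - 1]) for i in range(2, len(row)))
        if row[0] not in best or dist < best[row[0]]:
            best[row[0]] = dist
    return best

# Hypothetical rows in the (urlid, location of word 0, location of word 1) format
rows = [(1, 327, 23), (1, 327, 330), (2, 5, 9), (2, 40, 41)]
location = raw_location_scores(rows)
distance = raw_distance_scores(rows)
print(location)
print(distance)
```

In this sample, page 2 wins on both metrics: its query words appear earlier and closer together than on page 1.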

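Early search engines used little more than these methods and still produced fairly good results.

Add a new method to the searcher class that takes a query, puts the matching rows into a dictionary, and prints them as a formatted list.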

In [12]:
class searcher:
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
    
    def __del__(self):
        self.con.close()
        
    def getmatchrows(self,q):
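        # Pieces of the query string
        fieldlist='w0.urlid'
        tablelist=''
        clauselist=''
        wordids=[]
        
        # Split the query into words on spaces
        words=q.split(' ')  
        tablenumber=0

        for word in words:
            # Look up the word ID (bound as a parameter so quotes in the query cannot break the SQL)
            wordrow=self.con.execute("select rowid from wordlist where word=?",(word,)).fetchone()
            if wordrow is not None: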
                wordid=wordrow[0]
                wordids.append(wordid)
                if tablenumber>0:
                    tablelist+=','
                    clauselist+=' and '
                    clauselist+='w%d.urlid=w%d.urlid and ' % (tablenumber-1,tablenumber)
                fieldlist+=',w%d.location' % tablenumber
                tablelist+='wordlocation w%d' % tablenumber      
                clauselist+='w%d.wordid=%d' % (tablenumber,wordid)
                tablenumber+=1

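        # If none of the words are in the index there is nothing to join
        if not wordids:
            return [],wordids

        # Assemble the full query from its parts
        fullquery='select %s from %s where %s' % (fieldlist,tablelist,clauselist)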
        print(fullquery)
        cur=self.con.execute(fullquery)
        rows=[row for row in cur]

        return rows,wordids
    
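    # Combine the weighted scores from each scoring function into one total per URL id
    def getscoredlist(self,rows,wordids):
        totalscores=dict([(row[0],0) for row in rows])
        
        # Scoring functions and their weights (enabled as they are implemented)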
#         weights=[(1.0,self.locationscore(rows)), 
#              (1.0,self.frequencyscore(rows)),
#              (1.0,self.pagerankscore(rows)),
#              (1.0,self.linktextscore(rows,wordids)),
#              (5.0,self.nnscore(rows,wordids))]
        weights=[]
        
        for (weight,scores) in weights:
            for url in totalscores:
                totalscores[url]+=weight*scores[url]
                
        return totalscores
    
    def geturlname(self,id):
        return self.con.execute("select url from urllist where rowid=%d" % id).fetchone()[0]
    
    def query(self,q):
        rows,wordids=self.getmatchrows(q) # run the query
        scores=self.getscoredlist(rows,wordids)
        rankedscores=[(score,url) for (url,score) in scores.items()] # sort by score, highest first
        rankedscores.sort()
        rankedscores.reverse()
        for (score,urlid) in rankedscores[0:10]: # print the top ten results
            print('%f\t%s' % (score,self.geturlname(urlid)))
        
        return wordids,[r[1] for r in rankedscores[0:10]]

The query method has no scoring functions yet, so every result scores 0:

In [13]:
e=searcher('searchindex.db')
e.query('functional programming')

select w0.urlid,w0.location,w1.location from wordlocation w0,wordlocation w1 where w0.wordid=145 and w0.urlid=w1.urlid and w1.wordid=19
0.000000	http://kiwitobes.com/wiki/XSLT.html
0.000000	http://kiwitobes.com/wiki/XQuery.html
0.000000	http://kiwitobes.com/wiki/Unified_Modeling_Language.html
0.000000	http://kiwitobes.com/wiki/SNOBOL.html
0.000000	http://kiwitobes.com/wiki/Procedural_programming.html
0.000000	http://kiwitobes.com/wiki/Miranda_programming_language.html
0.000000	http://kiwitobes.com/wiki/ISWIM.html
0.000000	http://kiwitobes.com/wiki/Smalltalk_programming_language.html
0.000000	http://kiwitobes.com/wiki/Self_programming_language.html
0.000000	http://kiwitobes.com/wiki/MOO_programming_language.html


([145, 19], [437, 436, 419, 389, 373, 372, 370, 365, 364, 361])

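### Normalization function
The normalization function normalizescores takes a dictionary of IDs and scores and returns a new dictionary with the same IDs and scores between 0 and 1.

Each score is scaled according to how close it is to the best result, which always gets a score of 1.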
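
A short worked example may help before reading the method itself. The standalone function `normalize_scores` below is an illustrative sketch of the scaling described above, and the input dictionaries are made up.

```python
def normalize_scores(scores, small_is_better=False):
    """Scale raw scores so that the best one becomes 1.0."""
    vsmall = 0.00001  # guards against division by zero
    if small_is_better:
        minscore = min(scores.values())
        return {u: float(minscore) / max(vsmall, s) for u, s in scores.items()}
    maxscore = max(scores.values())
    if maxscore == 0:
        maxscore = vsmall
    return {u: float(s) / maxscore for u, s in scores.items()}

# Larger is better, e.g. word counts
freq = normalize_scores({1: 10, 2: 5, 3: 0})
# Smaller is better, e.g. distances between words
dist = normalize_scores({1: 2, 2: 4, 3: 8}, small_is_better=True)
print(freq)
print(dist)
```

Because every metric ends up on the same 0..1 scale with 1 as the best value, the weights in getscoredlist can combine metrics whose raw units differ.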

In [14]:
class searcher:
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
    
    def __del__(self):
        self.con.close()
        
    def getmatchrows(self,q):
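        # Pieces of the query string
        fieldlist='w0.urlid'
        tablelist=''
        clauselist=''
        wordids=[]
        
        # Split the query into words on spaces
        words=q.split(' ')  
        tablenumber=0

        for word in words:
            # Look up the word ID (bound as a parameter so quotes in the query cannot break the SQL)
            wordrow=self.con.execute("select rowid from wordlist where word=?",(word,)).fetchone()
            if wordrow is not None: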
                wordid=wordrow[0]
                wordids.append(wordid)
                if tablenumber>0:
                    tablelist+=','
                    clauselist+=' and '
                    clauselist+='w%d.urlid=w%d.urlid and ' % (tablenumber-1,tablenumber)
                fieldlist+=',w%d.location' % tablenumber
                tablelist+='wordlocation w%d' % tablenumber      
                clauselist+='w%d.wordid=%d' % (tablenumber,wordid)
                tablenumber+=1

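        # If none of the words are in the index there is nothing to join
        if not wordids:
            return [],wordids

        # Assemble the full query from its parts
        fullquery='select %s from %s where %s' % (fieldlist,tablelist,clauselist)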
        print(fullquery)
        cur=self.con.execute(fullquery)
        rows=[row for row in cur]

        return rows,wordids
    
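    # Combine the weighted scores from each scoring function into one total per URL id
    def getscoredlist(self,rows,wordids):
        totalscores=dict([(row[0],0) for row in rows])
        
        # Scoring functions and their weights (enabled as they are implemented)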
#         weights=[(1.0,self.locationscore(rows)), 
#              (1.0,self.frequencyscore(rows)),
#              (1.0,self.pagerankscore(rows)),
#              (1.0,self.linktextscore(rows,wordids)),
#              (5.0,self.nnscore(rows,wordids))]
        weights=[]
        
        for (weight,scores) in weights:
            for url in totalscores:
                totalscores[url]+=weight*scores[url]
                
        return totalscores
    
    def geturlname(self,id):
        return self.con.execute("select url from urllist where rowid=%d" % id).fetchone()[0]
    
    def query(self,q):
        rows,wordids=self.getmatchrows(q) # run the query
        scores=self.getscoredlist(rows,wordids)
        rankedscores=[(score,url) for (url,score) in scores.items()] # sort by score, highest first
        rankedscores.sort()
        rankedscores.reverse()
        for (score,urlid) in rankedscores[0:10]: # print the top ten results
            print('%f\t%s' % (score,self.geturlname(urlid)))
        
        return wordids,[r[1] for r in rankedscores[0:10]]
    
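    # Normalize a dictionary of scores; smallIsBetter says whether lower or higher raw values are better
    def normalizescores(self,scores,smallIsBetter=0):
        vsmall=0.00001 # avoids division by zero
        if smallIsBetter:
            minscore=min(scores.values()) # each score becomes minscore/value, so smaller raw values score higher
            return dict([(u,float(minscore)/max(vsmall,l)) for (u,l) in scores.items()])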
        else:
            maxscore=max(scores.values())
            if maxscore==0:
                maxscore=vsmall
            return dict([(u,float(c)/maxscore) for (u,c) in scores.items()])

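### Word frequency
The more often the query words appear on a page, the more relevant that page is likely to be.

Add frequencyscore to the searcher class: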
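
The notebook's frequencyscore is not visible in this excerpt, so what follows is only a plausible sketch: count the rows returned for each URL ID and scale the counts so that the highest becomes 1. The function names and sample rows are assumptions for illustration.

```python
def frequency_counts(rows):
    """Count how many result rows each URL id has."""
    counts = {}
    for row in rows:
        counts[row[0]] = counts.get(row[0], 0) + 1
    return counts

def frequency_score(rows):
    """Scale the counts so the page with the most rows scores 1.0."""
    counts = frequency_counts(rows)
    top = max(counts.values()) if counts else 0
    return {u: (c / top if top else 0.0) for u, c in counts.items()}

# Hypothetical rows in the (urlid, location of word 0, location of word 1) format
rows = [(1, 327, 23), (1, 327, 162), (1, 400, 23), (2, 5, 9)]
counts = frequency_counts(rows)
scores = frequency_score(rows)
print(counts)
print(scores)
```

For a multi-word query the row count is the number of position combinations rather than a plain word count, so pages where several query words repeat are rewarded disproportionately.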

In [15]:
class searcher:
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
    
    def __del__(self):
        self.con.close()
        
    def getmatchrows(self,q):
        # Strings used to build the query
        fieldlist='w0.urlid'
        tablelist=''
        clauselist=''
        wordids=[]
        
        # Split the query into words on spaces
        words=q.split(' ')  
        tablenumber=0

        for word in words:
            # Look up the word's ID (parameterized so quotes in the word cannot break the SQL)
            wordrow=self.con.execute("select rowid from wordlist where word=?",(word,)).fetchone()
            if wordrow is not None:
                wordid=wordrow[0]
                wordids.append(wordid)
                if tablenumber>0:
                    tablelist+=','
                    clauselist+=' and '
                    clauselist+='w%d.urlid=w%d.urlid and ' % (tablenumber-1,tablenumber)
                fieldlist+=',w%d.location' % tablenumber
                tablelist+='wordlocation w%d' % tablenumber      
                clauselist+='w%d.wordid=%d' % (tablenumber,wordid)
                tablenumber+=1

        # Assemble the full query from its parts
        fullquery='select %s from %s where %s' % (fieldlist,tablelist,clauselist)
        print(fullquery)
        cur=self.con.execute(fullquery)
        rows=[row for row in cur]

        return rows,wordids
    
    # Combine the weighted scores from each scoring function into one total per URL id
    def getscoredlist(self,rows,wordids):
        totalscores=dict([(row[0],0) for row in rows])
        
        # Scoring functions, as (weight, scores) pairs
#         weights=[(1.0,self.locationscore(rows)), 
#              (1.0,self.frequencyscore(rows)),
#              (1.0,self.pagerankscore(rows)),
#              (1.0,self.linktextscore(rows,wordids)),
#              (5.0,self.nnscore(rows,wordids))]
        weights=[(1.0,self.frequencyscore(rows))]
        
        for (weight,scores) in weights:
            for url in totalscores:
                totalscores[url]+=weight*scores[url]
                
        return totalscores
    
    def geturlname(self,id):
        return self.con.execute("select url from urllist where rowid=%d" % id).fetchone()[0]
    
    def query(self,q):
        rows,wordids=self.getmatchrows(q) # run the query
        scores=self.getscoredlist(rows,wordids)
        rankedscores=[(score,url) for (url,score) in scores.items()] # sort by score
        rankedscores.sort()
        rankedscores.reverse()
        for (score,urlid) in rankedscores[0:10]: # print the top ten results
            print('%f\t%s' % (score,self.geturlname(urlid)))
        
        return wordids,[r[1] for r in rankedscores[0:10]]
    
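    # Normalize a dict of scores to the range 0..1 (1 is best); smallIsBetter says whether lower raw values are better
    def normalizescores(self,scores,smallIsBetter=0):
        vsmall=0.00001 # avoids division by zero
        if smallIsBetter:
            # score = smallest value / max(value, 0.00001), so smaller raw values score higher
            minscore=min(scores.values())
            return dict([(u,float(minscore)/max(vsmall,l)) for (u,l) in scores.items()])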
        else:
            maxscore=max(scores.values())
            if maxscore==0:
                maxscore=vsmall
            return dict([(u,float(c)/maxscore) for (u,c) in scores.items()])
    
    # Word frequency: count the matching rows for each URL, then normalize
    def frequencyscore(self,rows):
        counts=dict([(row[0],0) for row in rows])
        for row in rows:
            counts[row[0]]+=1
        return self.normalizescores(counts)

In [16]:
e=searcher('searchindex.db')
e.query('functional programming')

select w0.urlid,w0.location,w1.location from wordlocation w0,wordlocation w1 where w0.wordid=145 and w0.urlid=w1.urlid and w1.wordid=19
1.000000	http://kiwitobes.com/wiki/Functional_programming.html
0.262476	http://kiwitobes.com/wiki/Categorical_list_of_programming_languages.html
0.062310	http://kiwitobes.com/wiki/Programming_language.html
0.043976	http://kiwitobes.com/wiki/Lisp_programming_language.html
0.036394	http://kiwitobes.com/wiki/Programming_paradigm.html
0.030880	http://kiwitobes.com/wiki/Multi-paradigm_programming_language.html
0.027295	http://kiwitobes.com/wiki/Perl.html
0.022057	http://kiwitobes.com/wiki/Declarative_programming.html
0.020265	http://kiwitobes.com/wiki/Generic_programming.html
0.019024	http://kiwitobes.com/wiki/Object-oriented_programming.html


([145, 19], [220, 1, 2, 227, 289, 288, 171, 196, 293, 296])
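
The counting step can be tried on its own without a database. The sketch below assumes rows shaped like the output of getmatchrows, that is `(urlid, location_of_word0, location_of_word1)`; the row values are made up for illustration.

```python
from collections import Counter

# Hypothetical rows: page 1 has word0 at positions 3 and 7 and word1 at 4 and 9,
# so the self-join yields 2 x 2 = 4 rows for it; page 2 has one occurrence of each word.
rows = [(1, 3, 4), (1, 3, 9), (1, 7, 4), (1, 7, 9), (2, 5, 6)]

# Count how many rows each URL id contributes
counts = Counter(row[0] for row in rows)

# Normalize so the most frequent page scores 1.0
maxcount = max(counts.values())
scores = {urlid: count / maxcount for urlid, count in counts.items()}

print(scores)  # {1: 1.0, 2: 0.25}
```

Note that for a multi-word query each row is one combination of word locations, so the count for a page is the product of the per-word occurrence counts rather than their sum.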

### Document location
If a page is relevant to the search term, the term is more likely to appear near the beginning of the page.

In [17]:
class searcher:
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
    
    def __del__(self):
        self.con.close()
        
    def getmatchrows(self,q):
        # Strings used to build the query
        fieldlist='w0.urlid'
        tablelist=''
        clauselist=''
        wordids=[]
        
        # Split the query into words on spaces
        words=q.split(' ')  
        tablenumber=0

        for word in words:
            # Look up the word's ID (parameterized so quotes in the word cannot break the SQL)
            wordrow=self.con.execute("select rowid from wordlist where word=?",(word,)).fetchone()
            if wordrow is not None:
                wordid=wordrow[0]
                wordids.append(wordid)
                if tablenumber>0:
                    tablelist+=','
                    clauselist+=' and '
                    clauselist+='w%d.urlid=w%d.urlid and ' % (tablenumber-1,tablenumber)
                fieldlist+=',w%d.location' % tablenumber
                tablelist+='wordlocation w%d' % tablenumber      
                clauselist+='w%d.wordid=%d' % (tablenumber,wordid)
                tablenumber+=1

        # Assemble the full query from its parts
        fullquery='select %s from %s where %s' % (fieldlist,tablelist,clauselist)
        print(fullquery)
        cur=self.con.execute(fullquery)
        rows=[row for row in cur]

        return rows,wordids
    
    # Combine the weighted scores from each scoring function into one total per URL id
    def getscoredlist(self,rows,wordids):
        totalscores=dict([(row[0],0) for row in rows])
        
        # Scoring functions, as (weight, scores) pairs
#         weights=[(1.0,self.locationscore(rows)), 
#              (1.0,self.frequencyscore(rows)),
#              (1.0,self.pagerankscore(rows)),
#              (1.0,self.linktextscore(rows,wordids)),
#              (5.0,self.nnscore(rows,wordids))]
        weights=[(1.0,self.locationscore(rows))]
        
        for (weight,scores) in weights:
            for url in totalscores:
                totalscores[url]+=weight*scores[url]
                
        return totalscores
    
    def geturlname(self,id):
        return self.con.execute("select url from urllist where rowid=%d" % id).fetchone()[0]
    
    def query(self,q):
        rows,wordids=self.getmatchrows(q) # run the query
        scores=self.getscoredlist(rows,wordids)
        rankedscores=[(score,url) for (url,score) in scores.items()] # sort by score
        rankedscores.sort()
        rankedscores.reverse()
        for (score,urlid) in rankedscores[0:10]: # print the top ten results
            print('%f\t%s' % (score,self.geturlname(urlid)))
        
        return wordids,[r[1] for r in rankedscores[0:10]]
    
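    # Normalize a dict of scores to the range 0..1 (1 is best); smallIsBetter says whether lower raw values are better
    def normalizescores(self,scores,smallIsBetter=0):
        vsmall=0.00001 # avoids division by zero
        if smallIsBetter:
            # score = smallest value / max(value, 0.00001), so smaller raw values score higher
            minscore=min(scores.values())
            return dict([(u,float(minscore)/max(vsmall,l)) for (u,l) in scores.items()])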
        else:
            maxscore=max(scores.values())
            if maxscore==0:
                maxscore=vsmall
            return dict([(u,float(c)/maxscore) for (u,c) in scores.items()])
    
    # Word frequency: count the matching rows for each URL, then normalize
    def frequencyscore(self,rows):
        counts=dict([(row[0],0) for row in rows])
        for row in rows:
            counts[row[0]]+=1
        return self.normalizescores(counts)
    
    # Document location: keep the smallest sum of word positions per URL (earlier is better)
    def locationscore(self,rows):
        locations=dict([(row[0],1000000) for row in rows])
        for row in rows:
            loc=sum(row[1:])
            if loc<locations[row[0]]:
                locations[row[0]]=loc
        
        return self.normalizescores(locations,smallIsBetter=1)


In [18]:
e=searcher('searchindex.db')
e.query('functional programming')

select w0.urlid,w0.location,w1.location from wordlocation w0,wordlocation w1 where w0.wordid=145 and w0.urlid=w1.urlid and w1.wordid=19
1.000000	http://kiwitobes.com/wiki/Functional_programming.html
0.150183	http://kiwitobes.com/wiki/Haskell_programming_language.html
0.149635	http://kiwitobes.com/wiki/Opal_programming_language.html
0.149091	http://kiwitobes.com/wiki/Miranda_programming_language.html
0.149091	http://kiwitobes.com/wiki/Joy_programming_language.html
0.149091	http://kiwitobes.com/wiki/Dylan_programming_language.html
0.149091	http://kiwitobes.com/wiki/Charity_programming_language.html
0.149091	http://kiwitobes.com/wiki/Curry_programming_language.html
0.149091	http://kiwitobes.com/wiki/Scheme_programming_language.html
0.148551	http://kiwitobes.com/wiki/Logo_programming_language.html


([145, 19], [220, 225, 232, 372, 226, 223, 221, 141, 125, 253])
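
The location metric can be checked by hand on a few rows. This is a minimal sketch under the assumption that rows look like `(urlid, location_of_word0, location_of_word1)`; the values are invented for illustration.

```python
# Hypothetical rows: page 1 matches twice, pages 2 and 3 once each
rows = [(1, 3, 4), (1, 50, 60), (2, 20, 30), (3, 0, 1)]

# For each URL id keep the smallest sum of word locations
best = {}
for urlid, *locs in rows:
    total = sum(locs)
    if urlid not in best or total < best[urlid]:
        best[urlid] = total

# Smaller is better: score = smallest sum overall / this page's sum
vsmall = 0.00001  # guards against division by zero when a sum is 0
minloc = min(best.values())
scores = {urlid: minloc / max(vsmall, loc) for urlid, loc in best.items()}

print(best)  # {1: 7, 2: 50, 3: 1}
```

Here page 3 scores 1.0, page 1 scores 1/7 and page 2 scores 1/50. Because each score is a ratio to the single smallest sum, one page with the words at the very top can push every other page to a much lower score.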

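### Word distance
When a query contains several words, it is often useful to favor pages where those words appear close to one another.

We add a new method, distancescore.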

In [19]:
class searcher:
    def __init__(self,dbname):
        self.con=sqlite3.connect(dbname)
    
    def __del__(self):
        self.con.close()
        
    def getmatchrows(self,q):
        # Strings used to build the query
        fieldlist='w0.urlid'
        tablelist=''
        clauselist=''
        wordids=[]
        
        # Split the query into words on spaces
        words=q.split(' ')  
        tablenumber=0

        for word in words:
            # Look up the word's ID (parameterized so quotes in the word cannot break the SQL)
            wordrow=self.con.execute("select rowid from wordlist where word=?",(word,)).fetchone()
            if wordrow is not None:
                wordid=wordrow[0]
                wordids.append(wordid)
                if tablenumber>0:
                    tablelist+=','
                    clauselist+=' and '
                    clauselist+='w%d.urlid=w%d.urlid and ' % (tablenumber-1,tablenumber)
                fieldlist+=',w%d.location' % tablenumber
                tablelist+='wordlocation w%d' % tablenumber      
                clauselist+='w%d.wordid=%d' % (tablenumber,wordid)
                tablenumber+=1

        # Assemble the full query from its parts
        fullquery='select %s from %s where %s' % (fieldlist,tablelist,clauselist)
        print(fullquery)
        cur=self.con.execute(fullquery)
        rows=[row for row in cur]

        return rows,wordids
    
    # Combine the weighted scores from each scoring function into one total per URL id
    def getscoredlist(self,rows,wordids):
        totalscores=dict([(row[0],0) for row in rows])
        
        # Scoring functions, as (weight, scores) pairs
#         weights=[(1.0,self.locationscore(rows)), 
#              (1.0,self.frequencyscore(rows)),
#              (1.0,self.pagerankscore(rows)),
#              (1.0,self.linktextscore(rows,wordids)),
#              (5.0,self.nnscore(rows,wordids))]
        weights=[(1.0,self.distancescore(rows))]
        
        for (weight,scores) in weights:
            for url in totalscores:
                totalscores[url]+=weight*scores[url]
                
        return totalscores
    
    def geturlname(self,id):
        return self.con.execute("select url from urllist where rowid=%d" % id).fetchone()[0]
    
    def query(self,q):
        rows,wordids=self.getmatchrows(q) # run the query
        scores=self.getscoredlist(rows,wordids)
        rankedscores=[(score,url) for (url,score) in scores.items()] # sort by score
        rankedscores.sort()
        rankedscores.reverse()
        for (score,urlid) in rankedscores[0:10]: # print the top ten results
            print('%f\t%s' % (score,self.geturlname(urlid)))
        
        return wordids,[r[1] for r in rankedscores[0:10]]
    
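    # Normalize a dict of scores to the range 0..1 (1 is best); smallIsBetter says whether lower raw values are better
    def normalizescores(self,scores,smallIsBetter=0):
        vsmall=0.00001 # avoids division by zero
        if smallIsBetter:
            # score = smallest value / max(value, 0.00001), so smaller raw values score higher
            minscore=min(scores.values())
            return dict([(u,float(minscore)/max(vsmall,l)) for (u,l) in scores.items()])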
        else:
            maxscore=max(scores.values())
            if maxscore==0:
                maxscore=vsmall
            return dict([(u,float(c)/maxscore) for (u,c) in scores.items()])
    
    # Word frequency: count the matching rows for each URL, then normalize
    def frequencyscore(self,rows):
        counts=dict([(row[0],0) for row in rows])
        for row in rows:
            counts[row[0]]+=1
        return self.normalizescores(counts)
    
    # Document location: keep the smallest sum of word positions per URL (earlier is better)
    def locationscore(self,rows):
        locations=dict([(row[0],1000000) for row in rows])
        for row in rows:
            loc=sum(row[1:])
            if loc<locations[row[0]]:
                locations[row[0]]=loc
        
        return self.normalizescores(locations,smallIsBetter=1)
    
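    # Word distance: pages where the query words sit close together score higher
    def distancescore(self,rows):
        # With only one query word, every page gets the same score
        if len(rows[0])<=2: return dict([(row[0],1.0) for row in rows])

        # Initialize the dictionary with a large value for every URL
        mindistance=dict([(row[0],1000000) for row in rows])

        for row in rows:
            # Total distance between consecutive query words in this row
            dist=sum([abs(row[i]-row[i-1]) for i in range(2,len(row))])
            if dist<mindistance[row[0]]: mindistance[row[0]]=dist
        return self.normalizescores(mindistance,smallIsBetter=1) # the smallest total distance per URL is what gets scored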

In [20]:
e=searcher('searchindex.db')
e.query('functional programming')

select w0.urlid,w0.location,w1.location from wordlocation w0,wordlocation w1 where w0.wordid=145 and w0.urlid=w1.urlid and w1.wordid=19
1.000000	http://kiwitobes.com/wiki/XSLT.html
1.000000	http://kiwitobes.com/wiki/XQuery.html
1.000000	http://kiwitobes.com/wiki/Procedural_programming.html
1.000000	http://kiwitobes.com/wiki/Miranda_programming_language.html
1.000000	http://kiwitobes.com/wiki/ISWIM.html
1.000000	http://kiwitobes.com/wiki/Smalltalk_programming_language.html
1.000000	http://kiwitobes.com/wiki/MOO_programming_language.html
1.000000	http://kiwitobes.com/wiki/SuperCollider.html
1.000000	http://kiwitobes.com/wiki/Smalltalk.html
1.000000	http://kiwitobes.com/wiki/Sather_programming_language.html


([145, 19], [437, 436, 373, 372, 370, 365, 361, 352, 345, 342])
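
The distance calculation can also be followed on a few hand-made rows. The sketch below assumes a two-word query with rows shaped like `(urlid, location_of_word0, location_of_word1)`; the values are hypothetical.

```python
# Hypothetical rows: page 1 has the words adjacent in one match and far apart in another
rows = [(1, 3, 4), (1, 3, 40), (2, 10, 15), (3, 8, 2)]

# For each URL id keep the smallest total distance between consecutive query words
mindistance = {}
for row in rows:
    dist = sum(abs(row[i] - row[i - 1]) for i in range(2, len(row)))
    mindistance[row[0]] = min(dist, mindistance.get(row[0], dist))

# Smaller is better: score = smallest distance overall / this page's distance
vsmall = 0.00001  # guards against division by zero
best = min(mindistance.values())
scores = {urlid: best / max(vsmall, d) for urlid, d in mindistance.items()}

print(mindistance)  # {1: 1, 2: 5, 3: 6}
```

Page 1 scores 1.0 because its closest match has the two words adjacent. Any other page with an adjacent match would tie at 1.0 as well, which is one plausible reason the top ten results above all show 1.000000; this also suggests the metric is most useful when combined with the other scores rather than used alone.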