## How do you split a string that contains several different delimiters?

**Call str.split() repeatedly, handling one delimiter at a time.**

In [66]:
s1 = 'ph 15196 0.0 0.0 22652 2872 pts/11 R+ 13:50 0:00 ps aux'
s = 'ab;cd|efg|hi,jkl|mn\topq;rst,uvw\txyz'

In [67]:
s.split()

['ab;cd|efg|hi,jkl|mn', 'opq;rst,uvw', 'xyz']

In [69]:
s1.split()

['ph',
 '15196',
 '0.0',
 '0.0',
 '22652',
 '2872',
 'pts/11',
 'R+',
 '13:50',
 '0:00',
 'ps',
 'aux']

In [70]:
res = s.split(';')

In [71]:
list(map(lambda x:x.split('|'), res))

[['ab'], ['cd', 'efg', 'hi,jkl', 'mn\topq'], ['rst,uvw\txyz']]

In [72]:
t = []
list(map(lambda x:t.extend(x.split('|')), res))

[None, None, None]

In [73]:
t

['ab', 'cd', 'efg', 'hi,jkl', 'mn\topq', 'rst,uvw\txyz']

In [74]:
res = t

In [75]:
t = []

In [76]:
list(map(lambda x:t.extend(x.split(',')), res))

[None, None, None, None, None, None]

In [77]:
t

['ab', 'cd', 'efg', 'hi', 'jkl', 'mn\topq', 'rst', 'uvw\txyz']

**Spot the pattern and turn it into a loop**

In [61]:
s = 'ab;cd|efg|hi,,jkl|mn\topq;rst,uvw\txyz'
def mySplit(s, ds):
    res = [s]

    for d in ds:
        t = []
        for x in res:
            t.extend(x.split(d))  # split every piece on the current delimiter
        res = t
    return [x for x in res if x]  # drop empty strings

print(mySplit(s, ';,|\t'))

['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']


**Use the regular-expression function re.split() to split the string in one pass.**

In [78]:
import re
re.split(r'[,;\t|]+', s)

['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uvw', 'xyz']
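
When the delimiters come from a variable instead of a hand-written pattern, the character class can be built with `re.escape`. This is a minimal sketch; `split_any` is a hypothetical helper name, not something from the original notebook, and it assumes `delimiters` is a non-empty string of single characters.

```python
import re

def split_any(text, delimiters):
    """Split text on runs of any character in delimiters, dropping empty pieces."""
    # re.escape guards characters such as ']' or '^' that are special inside [...]
    pattern = '[' + re.escape(delimiters) + ']+'
    return [part for part in re.split(pattern, text) if part]

s = 'ab;cd|efg|hi,,jkl|mn\topq;rst,uvw\txyz'
print(split_any(s, ';,|\t'))
```

Filtering out empty pieces also covers input that begins or ends with a delimiter, where `re.split` alone would return a leading or trailing `''`.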

## How do you check whether string a starts or ends with string b?

In [79]:
import os, stat
os.listdir('.')

['.ipynb_checkpoints',
 '信息.txt',
 '多线程和多进程.ipynb',
 '字符串处理.ipynb',
 '数据结构.ipynb',
 '数据编码与处理.ipynb',
 '文件IO操作.ipynb',
 '类与对象.ipynb',
 '装饰器.ipynb',
 '迭代器与生成器.ipynb']

In [80]:
s = 'g.sh'
s.endswith('.sh')

True

In [81]:
s.endswith('.py')

False

In [83]:
s.endswith(('.sh', '.py'))    # several suffixes must be passed as a tuple; a list raises TypeError

True
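
If the suffixes arrive as a list, converting them with `tuple()` is enough. A small sketch of both the failure and the fix; the `extensions` list is a made-up example, for instance something read from a config file.

```python
extensions = ['.sh', '.py']  # hypothetical list of suffixes

name = 'g.sh'
try:
    name.endswith(extensions)            # a list is rejected
except TypeError as exc:
    print('TypeError:', exc)

# Converting to a tuple makes the same check work.
print(name.endswith(tuple(extensions)))
```

The same rule applies to `str.startswith()`.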

In [84]:
[name for name in os.listdir('.') if name.endswith(('.ipynb', '.py'))]

['多线程和多进程.ipynb',
 '字符串处理.ipynb',
 '数据结构.ipynb',
 '数据编码与处理.ipynb',
 '文件IO操作.ipynb',
 '类与对象.ipynb',
 '装饰器.ipynb',
 '迭代器与生成器.ipynb']

In [85]:
os.stat('装饰器.ipynb')

os.stat_result(st_mode=33206, st_ino=4785074604138849, st_dev=1076605485, st_nlink=1, st_uid=0, st_gid=0, st_size=555, st_atime=1538578134, st_mtime=1538578134, st_ctime=1538578062)

In [86]:
os.stat('装饰器.ipynb').st_mode

33206

In [87]:
oct(os.stat('装饰器.ipynb').st_mode)

'0o100666'

In [88]:
[name for name in os.listdir('.') if name.startswith('数')]

['数据结构.ipynb', '数据编码与处理.ipynb']

## How do you reformat text inside a string?

In [90]:
log = '2018-10-08 18:40:01 status'

In [91]:
import re
re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log)  # raw string keeps \d from being read as an invalid string escape

'10/08/2018 18:40:01 status'

In [92]:
re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log)

'10/08/2018 18:40:01 status'
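
When the same substitution runs over many log lines, the pattern can be compiled once and wrapped in a function. A sketch under that assumption; `DATE`, `to_us_date` and the second sample line are made up for illustration.

```python
import re

# Compile once when the same pattern is applied to many lines.
DATE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

def to_us_date(line):
    """Rewrite every YYYY-MM-DD date in line as MM/DD/YYYY."""
    return DATE.sub(r'\g<month>/\g<day>/\g<year>', line)

lines = [
    '2018-10-08 18:40:01 status',
    '2018-10-09 07:15:30 rotated from 2018-10-08',  # made-up line with two dates
]
for line in lines:
    print(to_us_date(line))
```

`sub` replaces every non-overlapping match, so both dates on the second line are rewritten.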

## How do you join many small strings into one large string?

In [93]:
pl = ['<0112>', '<32>', '<1024x768>', '<60>', '<1>', '<100.0>', '<500.0>']

In [94]:
s = ''
for p in pl:
    s += p
print(s)

<0112><32><1024x768><60><1><100.0><500.0>


**When there are many strings to combine, str.join() concatenates every string in the list faster than repeated +=**

In [96]:
''.join(pl)

'<0112><32><1024x768><60><1><100.0><500.0>'

In [97]:
l = ['abc', 123, 45, 'xyz']
''.join(str(x) for x in l)

'abc12345xyz'

In [98]:
(str(x) for x in l)

<generator object <genexpr> at 0x00000224738CDE08>
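
The speed difference can be checked with `timeit`. This is only a rough sketch: the sample data is made up, and the timings depend on the interpreter and machine (CPython in particular can optimize some `+=` loops), so no numbers are quoted here.

```python
import timeit

pieces = ['<%d>' % i for i in range(10000)]  # made-up sample data

def concat_with_plus(items):
    result = ''
    for item in items:
        result += item
    return result

def concat_with_join(items):
    return ''.join(items)

# Both approaches must build the same string.
assert concat_with_plus(pieces) == concat_with_join(pieces)

print('+=  :', timeit.timeit(lambda: concat_with_plus(pieces), number=100))
print('join:', timeit.timeit(lambda: concat_with_join(pieces), number=100))
```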

## How do you left-align, right-align, and center a string?

In [109]:
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive'
}

**Use str.ljust(), str.rjust(), and str.center() to left-align, right-align, and center**

In [108]:
s = 'abc'
print(s.ljust(20, '='))
print()
print(s.rjust(20, '-'))
print()
print(s.center(20))

abc=================

-----------------abc

        abc         


**Use format() with a spec such as '<20', '>20', or '^20' to do the same job**

In [107]:
print(format(s, '<20'))
print()
print(format(s, '>20'))
print()
print(format(s, '^20'))

abc                 

                 abc

        abc         
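
The same format specs also work inside f-strings (Python 3.6+), and a fill character can be placed in front of the alignment symbol. A short sketch; the fill characters are arbitrary choices.

```python
s = 'abc'

left = f'{s:=<20}'    # pad on the right with '='
right = f'{s:->20}'   # pad on the left with '-'
middle = f'{s:*^20}'  # pad on both sides with '*'

print(left)
print(right)
print(middle)
```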


In [110]:
headers

{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
 'Accept-Encoding': 'gzip, deflate, br',
 'Accept-Language': 'zh-CN,zh;q=0.9',
 'Cache-Control': 'max-age=0',
 'Connection': 'keep-alive'}

In [114]:
w = max(map(len, headers))  # length of the longest key

In [115]:
for k in headers:
    print(k.ljust(w), ':', headers[k])

Accept          : text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding : gzip, deflate, br
Accept-Language : zh-CN,zh;q=0.9
Cache-Control   : max-age=0
Connection      : keep-alive
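
The column width can also be passed into a format spec at run time by nesting a replacement field. A sketch with a shortened copy of the headers dict; `rows` is a name introduced here for illustration.

```python
headers = {
    'Accept-Encoding': 'gzip, deflate, br',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
}

w = max(map(len, headers))  # width of the longest key

# The nested {w} fills in the column width at run time.
rows = [f'{key:<{w}} : {value}' for key, value in headers.items()]
print('\n'.join(rows))
```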


In [116]:
for k in headers:
    print(k)

Accept
Accept-Encoding
Accept-Language
Cache-Control
Connection


## How do you remove unwanted characters from a string?

In [120]:
s = '    nick2018@gmail.com  '
print(s.strip())
print(s.lstrip())
print(s.rstrip())

nick2018@gmail.com
nick2018@gmail.com  
    nick2018@gmail.com


In [123]:
s = '---abc+++'
s.strip('-+')

'abc'
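
Note that the argument to `strip()` is treated as a set of characters, not as a prefix or suffix. The sketch below shows the difference; `remove_suffix` is a hypothetical helper written with slicing so it runs on any Python 3 (Python 3.9+ also provides `str.removesuffix()`).

```python
def remove_suffix(text, suffix):
    """Return text without a trailing suffix, if it has one."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

name = 'happy.py'
print(name.rstrip('.py'))          # strips any trailing '.', 'p' or 'y' characters
print(remove_suffix(name, '.py'))  # removes only the exact '.py' ending
```

The first call prints `ha`, because every trailing `.`, `p`, and `y` is removed; the second prints `happy`.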

In [124]:
s = 'abc:123'
s[:3] + s[4:]

'abc123'

In [129]:
s = '\tabc\t123\txyz'
print(s)
s

	abc	123	xyz


'\tabc\t123\txyz'

In [130]:
s.replace('\t', '')

'abc123xyz'

In [131]:
s = '\tabc\t123\txyz\ropq'

In [135]:
import re

re.sub('[\t\r]', '', s)  # [...] matches any one of the enclosed characters

'abc123xyzopq'

In [148]:
s = 'abc1230323xyz'

In [139]:
str.maketrans('abcxyz', 'xyzabc')

{97: 120, 98: 121, 99: 122, 120: 97, 121: 98, 122: 99}

In [140]:
s.translate(str.maketrans('abcxyz', 'xyzabc'))

'xyz1230323abc'

In [145]:
s = 'abc\refg\n234\t'
remove = '\r\n\t'
table = str.maketrans('', '', remove)  # the third argument lists the characters to delete
s.translate(table)

'abcefg234'
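
A deletion table can be built once and reused. A minimal sketch that removes all ASCII whitespace using `string.whitespace`; `DROP_WHITESPACE` and `squeeze` are hypothetical names introduced here.

```python
import string

# Deletion table: every character in string.whitespace maps to None.
DROP_WHITESPACE = str.maketrans('', '', string.whitespace)

def squeeze(text):
    """Return text with all ASCII whitespace characters removed."""
    return text.translate(DROP_WHITESPACE)

print(squeeze('abc\refg\n234\t'))
print(squeeze(' a b\tc '))
```

Unlike `strip()`, `translate()` removes the characters wherever they occur, not only at the ends.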