# DataType

## int 

### Base conversion

In [1]:
print(int('10101010', 2))
print(int('252', 8))
print(int('0xAA', 16))
print(bin(170))
print(oct(170))
print(hex(170))

print(bin(0o252))
print(bin(0xaa))


170
170
170
0b10101010
0o252
0xaa
0b10101010
0b10101010



## String

### Basic

A double quoted string literal can contain single quotes without any fuss (e.g. "I didn't do it") and likewise a single quoted string can contain double quotes. A string literal can span multiple lines, but there must be a backslash **\\** at the end of each line to escape the newline. String literals inside triple quotes, **""" or '''**, can span multiple lines of text.

In [2]:
s = 'hi'
print(s[1])  # i
print(len(s))  # 2
print(s + ' there')  # hi there
print('lala\
      haha\
    ')  # lala    haha

print('''
      haha
      lala
      '''
      )  # haha
# lala


i
2
hi there
lala      haha    

      haha
      lala
      


Unlike Java, the '+' does not automatically convert numbers or other types to string form. The str() function converts values to a string form so they can be combined with other strings.

In [3]:

pi = 3.14
# text = 'The value of pi is ' + pi      ## NO, does not work
text = 'The value of pi is ' + str(pi)  # yes
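# ! Avoid naming variables after keywords or built-ins: even if the current file does not define them, the interpreter may still hold state left over from code that ran earlier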
print(text)


The value of pi is 3.14


For numbers, the standard operators, +, /, * work in the usual way. **There is no ++ operator, but +=, -=, etc. work**. If you want integer division, use two slashes: 6 // 5 is 1. (Before Python 3, a single / between two ints also did integer division; in Python 3, / always returns a float, so // is the way to ask for integer division.)

In [4]:
print(3/2)
print(3//2)


1.5
1


The print() function prints one or more Python items followed by a newline (**<font color=MediumTurquoise>pass end='' to suppress the newline; in Python 2, where print was a statement, a trailing comma did this</font>**). A "raw" string literal is prefixed by an 'r' and passes all the chars through without special treatment of backslashes, so r'x\nx' evaluates to the length-4 string 'x\nx'. A 'u' prefix writes a unicode string literal; in Python 3 it is accepted but redundant, because every str is already Unicode.

In Python 2, regular strings are *not* unicode; they are just plain bytes. In Python 3, str holds Unicode text and raw bytes have their own type, bytes.

In Python 2, a unicode string is a different type of object from the regular "str" string, but the two are compatible (they share the common superclass "basestring"), and *the various libraries such as regular expressions work correctly if passed a unicode string instead of a regular string*. Python 3 has no separate "unicode" or "basestring" types.

<font color=MediumTurquoise> In Python 3, the elements of a str object are Unicode characters; in Python 2, the elements of a str object are raw bytes.</font>

To convert a unicode string to bytes with an encoding such as 'utf-8', call the ustring.encode('utf-8') method on the unicode string. Going the other direction, str(s, encoding) or s.decode(encoding) converts encoded plain bytes to a unicode string (in Python 2, use the unicode(s, encoding) function):

In [5]:
raw = r'this\t\n and that'
print(raw)
ustring = u'A unicode 😍 \U0001F602 string \xf1'  # note: a \U escape takes exactly 8 hex digits; shorter code points use \u with 4 digits
print(ustring)  # utf-8
s = ustring.encode('utf-8')
print(s)
t = str(s, 'utf-8')  # ! in Python 2, use unicode() instead of str()
print(ustring == t)


this\t\n and that
A unicode 😍 😂 string ñ
b'A unicode \xf0\x9f\x98\x8d \xf0\x9f\x98\x82 string \xc3\xb1'
True


Python has a *printf()-like facility* to put together a string. The % operator takes a printf-type format string on the left (%d int, %s string, %f/%g floating point), and the matching values in a tuple on the right (*<font color=MediumTurquoise>a tuple is made of values separated by commas, typically grouped inside parentheses</font>*):

A long line built with % cannot simply be split after the '%' as in some other languages, because by default Python <font color=MediumTurquoise>treats each line as a separate statement</font> (on the plus side, this is why we don't need to type semi-colons on each line). To fix this, enclose the whole expression in an outer set of parentheses; the expression is then allowed to span multiple lines. This code-across-lines technique works with the various grouping constructs: **( ), [ ], { }**.

In [6]:
# % operator
text = "%d little pigs come out, or I'll %s, and I'll %s, and I'll blow your %s down." % (
    3, 'huff', 'puff', 'house')
print(text)

# Add parentheses to make the long line work:
text = (
    "%d little pigs come out, or I'll %s, and I'll %s, and I'll blow your %s down."
    % (3, 'huff', 'puff', 'house'))
print(text)

# Split the line into chunks, which are concatenated automatically by Python
text = (
    "%d little pigs come out, "
    "or I'll %s, and I'll %s, "
    "and I'll blow your %s down."
    % (3, 'huff', 'puff', 'house'))


3 little pigs come out, or I'll huff, and I'll puff, and I'll blow your house down.
3 little pigs come out, or I'll huff, and I'll puff, and I'll blow your house down.


### format

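{ [index] [ : [fill] [align] [sign] [#] [width] [,] [.precision] [type] ] }

- index: selects which argument in args the format after the colon applies to; indices start at 0. If omitted, arguments are assigned automatically in order.
- fill: the character used to fill the padding space (a space by default). The comma (,) is not a fill character but a separate grouping option written after width: applied to an integer or float, it prints the number with comma separators (for example, 1000000 prints as 1,000,000).
- align: how the value is aligned
  - <	left-align the value.
  - \>	right-align the value.
  - =	right-align the value and place the sign at the far left of the padding; valid only for numeric types.
  - ^	center the value; this only has a visible effect together with width.

- sign: how the sign of a number is shown
  - \+	plus sign before positive numbers, minus sign before negative numbers.
  - \-	no sign before positive numbers, minus sign before negative numbers.
  - space	a space before positive numbers, minus sign before negative numbers.
- \#: for binary, octal, and hexadecimal numbers, shows the 0b, 0o, or 0x prefix; without it no prefix is shown.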



In [7]:
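# thousands separators (currency style)
print("Currency style: {:,d}".format(1000000))
# scientific notation
print("Scientific notation: {:E}".format(1200.12))
# hexadecimal
print("100 in hexadecimal: {:#x}".format(100))
# percentage
print("0.01 as a percentage: {:.0%}".format(0.01))

Currency style: 1,000,000
Scientific notation: 1.200120E+03
100 in hexadecimal: 0x64
0.01 as a percentage: 1%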
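
The fill, align, sign, and width fields of the format spec can be combined in a single placeholder. The following is a minimal sketch; the sample values and variable names are made up for illustration only.

```python
# Each placeholder follows {index:fill align sign width .precision type}.
left = "{:*<8}".format("ab")        # pad with '*' on the right (left-aligned)
right = "{:>8}".format("ab")        # default fill is a space, right-aligned
center = "{:-^8}".format("ab")      # centered between '-' characters
signed = "{:+d}".format(5)          # force a sign on positive numbers
padded = "{:=+6d}".format(-42)      # '=' puts the padding after the sign
fixed = "{:08.3f}".format(3.14159)  # zero-padded to width 8, 3 decimals

for value in (left, right, center, signed, padded, fixed):
    print(repr(value))
```

Printing with repr() makes the padding spaces visible.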


### String Methods

 Here are some of the most common string methods:

- s.lower(), s.upper() -- returns the lowercase or uppercase version of the string
- s.strip() -- returns a string with whitespace removed from the start and end
- s.isalpha()/s.isdigit()/s.isspace()... -- tests if all the string chars are in the various character classes
- s.startswith('other'), s.endswith('other') -- tests if the string starts or ends with the given other string
- s.find('other') -- searches for the given other string (not a regular expression) within s, and returns the first index where it begins or -1 if not found
- s.replace('old', 'new') -- returns a string where all occurrences of 'old' have been replaced by 'new'
- s.split('delim') -- returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it's just text. 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']. As a convenient special case s.split() (with no arguments) splits on all whitespace chars.
- s.join(list) -- opposite of split(), joins the elements in the given list together using the string as the delimiter. e.g. '---'.join(['aaa', 'bbb', 'ccc']) -> aaa---bbb---ccc

**Python does not have a separate character type**. Instead an expression like s[8] returns a string-length-1 containing the character. With that string-length-1, the operators ==, <=, ... all work as you would expect, so mostly you don't need to know that Python does not have a separate scalar "char" type.
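
The methods listed above can be tried on a small sample. This is only an illustrative sketch; the sample string and the variable names are assumptions, not part of the original notes.

```python
s = '  Hello, World  '
trimmed = s.strip()                   # 'Hello, World'

print(trimmed.lower())                # hello, world
print(trimmed.upper())                # HELLO, WORLD
print(trimmed.startswith('Hello'))    # True
print(trimmed.find('World'))          # 7
print(trimmed.find('xyz'))            # -1 (not found)
print(trimmed.replace('l', 'L'))      # HeLLo, WorLd

parts = 'aaa,bbb,ccc'.split(',')      # ['aaa', 'bbb', 'ccc']
joined = '---'.join(parts)            # aaa---bbb---ccc
print(parts, joined)

# Indexing returns a length-1 string, not a separate char type.
first = trimmed[0]
print(type(first), first == 'H')
```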

#### String Slices

sequence[indexStart : indexEnd : stride]

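Takes the elements of sequence from indexStart up to (but not including) indexEnd, stepping by stride, and builds a new sequence from them.

indexStart: the start index; it may be positive or negative, and the element at this index is included in the new sequence. The index refers to a position in the original sequence, not in the new one.

indexEnd: the end index; it may be positive or negative, and the element at this index is not included in the new sequence. The index refers to a position in the original sequence, not in the new one.

stride: the step; it may be positive or negative and may be omitted, in which case **the default step is +1**.

indexStart may be omitted: with a positive stride the slice then starts at the first element of the sequence; with a negative stride it starts at the last element.

indexEnd may also be omitted: with a positive stride the slice then runs through the last element; with a negative stride it runs through the first element.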


In [8]:
import random
s='Hello'
# ! half-open interval: the start index is included, the end index is excluded
print(s[1:4])
print(s[1:])
print(s[1:100])
print(s[-1])
print(s[:-3])
index=random.randint(-100,100)
print(s[index:]+s[:index]==s)#* True This works even for n negative or out of bounds. 
# Or put another way s[:n] and s[n:] always partition the string into two string parts, conserving all the characters

s_copy=s[:]
print(s_copy)

ell
ello
ello
o
He
True
Hello
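
The stride argument is easiest to see with a few concrete slices. A small sketch, reusing the 'Hello' string; the variable names are chosen here only for illustration.

```python
s = 'Hello'

every_other = s[::2]     # indices 0, 2, 4
odd_positions = s[1::2]  # indices 1, 3
backwards = s[::-1]      # negative stride walks from the end
middle_back = s[3:0:-1]  # indices 3, 2, 1 (index 0 is excluded)
last_two = s[-2:]        # negative start counts from the end

print(every_other)    # Hlo
print(odd_positions)  # el
print(backwards)      # olleH
print(middle_back)    # lle
print(last_two)       # lo
```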


## bytes

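bytes compared with strings (str):
- A string is made up of characters and is operated on character by character; a bytes object is made up of bytes and is operated on byte by byte.
- Apart from the unit of data they operate on, bytes and strings support mostly the same methods.
- Both bytes and strings are immutable sequences, so items cannot be added or removed in place.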

In [9]:
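# create an empty bytes object with the constructor
b1 = bytes()
# create an empty bytes object from an empty literal
b2 = b''
# the b prefix writes a bytes literal (ASCII characters only)
b3 = b'http://c.biancheng.net/python/'
print("b3: ", b3)
print(b3[3])  # indexing a bytes object yields an int
print(b3[7:22])
# pass an encoding to bytes() to convert a string
b4 = bytes('C语言中文网8岁了', encoding='UTF-8')
print("b4: ", b4)
# convert a string to bytes with encode()
b5 = "C语言中文网8岁了".encode('UTF-8')
print("b5: ", b5)

# convert bytes back to a string with decode()
str1 = b5.decode('UTF-8')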
print("str1: ", str1)

b3:  b'http://c.biancheng.net/python/'
112
b'c.biancheng.net'
b4:  b'C\xe8\xaf\xad\xe8\xa8\x80\xe4\xb8\xad\xe6\x96\x87\xe7\xbd\x918\xe5\xb2\x81\xe4\xba\x86'
b5:  b'C\xe8\xaf\xad\xe8\xa8\x80\xe4\xb8\xad\xe6\x96\x87\xe7\xbd\x918\xe5\xb2\x81\xe4\xba\x86'
str1:  C语言中文网8岁了


## list

### basic

Lists work similarly to strings -- use the len() function and square brackets [ ] to access data, with the first element at index 0. 

In [10]:
colors = ['red', 'blue', 'green']
print (colors[0])  # red
print (colors[2])  # green
print (len(colors))  # 3
print(colors+[1,2,3])


red
green
3
['red', 'blue', 'green', 1, 2, 3]


### loop

In [11]:
squares = [1, 4, 9, 16]
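total = 0  # named total so the built-in sum() is not shadowed
for num in squares:
    total += num
print(total)  # 30

# can also use for/in to work on a string

for ch in "hello":
    print(ch, end=' ')  # ! end= sets what is printed after each item instead of a newline

stooges = ['larry', 'curly', 'moe']  # avoid the name list, which would shadow the built-in
if 'curly' in stooges:
    print('yay')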

# Print the multiplication table
for i in range(1, 10):
    for j in range(1, 10):
        print("{:d}*{:d}={:2d}".format(i, j, i*j),end=' ')
    print()
a = range(20)
# Access every 3rd element in a list
i = 0
while i < len(a):
    print(a[i], end=' ')
    i = i + 3
print()


30
h e l l o yay
1*1= 1 1*2= 2 1*3= 3 1*4= 4 1*5= 5 1*6= 6 1*7= 7 1*8= 8 1*9= 9 
2*1= 2 2*2= 4 2*3= 6 2*4= 8 2*5=10 2*6=12 2*7=14 2*8=16 2*9=18 
3*1= 3 3*2= 6 3*3= 9 3*4=12 3*5=15 3*6=18 3*7=21 3*8=24 3*9=27 
4*1= 4 4*2= 8 4*3=12 4*4=16 4*5=20 4*6=24 4*7=28 4*8=32 4*9=36 
5*1= 5 5*2=10 5*3=15 5*4=20 5*5=25 5*6=30 5*7=35 5*8=40 5*9=45 
6*1= 6 6*2=12 6*3=18 6*4=24 6*5=30 6*6=36 6*7=42 6*8=48 6*9=54 
7*1= 7 7*2=14 7*3=21 7*4=28 7*5=35 7*6=42 7*7=49 7*8=56 7*9=63 
8*1= 8 8*2=16 8*3=24 8*4=32 8*5=40 8*6=48 8*7=56 8*8=64 8*9=72 
9*1= 9 9*2=18 9*3=27 9*4=36 9*5=45 9*6=54 9*7=63 9*8=72 9*9=81 
0 3 6 9 12 15 18 


### List Methods

Here are some other common list methods.

- list.append(elem) -- adds a single element to the end of the list. Common error: does not return the new list, just modifies the original.
- list.insert(index, elem) -- inserts the element at the given index, shifting elements to the right.
- list.extend(list2) -- adds the elements in list2 to the end of the list. Using + or += on a list is similar to using extend().
- list.index(elem) -- searches for the given element from the start of the list and returns its index. Throws a ValueError if the element does not appear (use "in" to check without a ValueError).
- list.remove(elem) -- searches for the first instance of the given element and removes it (throws ValueError if not present)
- list.sort() -- sorts the list in place (does not return it). (The sorted() function shown later is preferred.)
- list.reverse() -- reverses the list in place (does not return it)
- list.pop(index) -- removes and returns the element at the given index. Returns the rightmost element if index is omitted (roughly the opposite of append()).

In [12]:
list = ['larry', 'curly', 'moe']
list.append('shemp')  # append elem at end
list.insert(0, 'xxx')  # insert elem at index 0
list.extend(['yyy', 'zzz'])  # add list of elems at end
print (list)  # ['xxx', 'larry', 'curly', 'moe', 'shemp', 'yyy', 'zzz']
print (list.index('curly'))  # 2

list.remove('curly')  # search and remove that element
list.pop(1)  # removes and returns 'larry'
print (list)  # ['xxx', 'moe', 'shemp', 'yyy', 'zzz']


['xxx', 'larry', 'curly', 'moe', 'shemp', 'yyy', 'zzz']
2
['xxx', 'moe', 'shemp', 'yyy', 'zzz']


#### List Build Up

```python
list = []          ## Start as the empty list
list.append('a')   ## Use append() to add elements
list.append('b')

```

#### List Slice

In [13]:
list = ['a', 'b', 'c', 'd']
print (list[1:-1])  # ['b', 'c']
list[0:2] = 'z'  # replace ['a', 'b'] with ['z']
print (list)  # ['z', 'c', 'd']

['b', 'c']
['z', 'c', 'd']


#### sort

In [14]:
list = [1, 2, 3, 2, 6, 7]
list.sort()  # sorts
print(list)
list.sort(reverse=True)
print(list)

print(sorted(list))
print(sorted(list, reverse=True))

strs = ['ccc', 'aaaa', 'd', 'bb']
print (sorted(strs, key=len))  # ['d', 'bb', 'ccc', 'aaaa']


[1, 2, 2, 3, 6, 7]
[7, 6, 3, 2, 2, 1]
[1, 2, 2, 3, 6, 7]
[7, 6, 3, 2, 2, 1]
['d', 'bb', 'ccc', 'aaaa']


#### Custom Sorting With key=

In [15]:
from functools import cmp_to_key
strs = ['ccc', 'aaaa', 'd', 'bb']
print(sorted(strs, key=len))  # ['d', 'bb', 'ccc', 'aaaa']
# "key" argument specifying str.lower function to use for sorting
print(sorted(strs, key=str.lower))  # ['aaaa', 'bb', 'ccc', 'd']

# Say we have a list of strings we want to sort by the last letter of the string.
strs = ['xc', 'zb', 'yd', 'wa']

# Write a little function that takes a string, and returns its last letter.
# *This will be the key function (takes in 1 value, returns 1 value).


def MyFn(s):
    return s[-1]


# *Now pass key=MyFn to sorted() to sort by the last letter:
print(sorted(strs, key=MyFn))  # ['wa', 'zb', 'xc', 'yd']

'''
If the first element of a tuple is odd, sort by that first element;
if the first element is even, compare by the tuple's second element instead.
'''

lst = [(9, 4), (2, 10), (4, 3), (3, 6)]


def cmp(x, y):
    a = x[0] if x[0] % 2 == 1 else x[1]  # ! Python's conditional (ternary) expression
    b = y[0] if y[0] % 2 == 1 else y[1]

    return (a>b)-(a<b)
    # return 1 if a > b else -1 if a < b else 0


lst.sort(key=cmp_to_key(cmp))
print(lst)
# print(sorted(lst, cmp=cmp))
# ! The cmp keyword argument was removed in Python 3, so the call above no longer runs

''' 
def cmp_to_key(mycmp):
    """Convert a cmp= function into a key= function"""
    class K(object):
        __slots__ = ['obj']
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        __hash__ = None
    return K
'''


['d', 'bb', 'ccc', 'aaaa']
['aaaa', 'bb', 'ccc', 'd']
['wa', 'zb', 'xc', 'yd']
[(4, 3), (3, 6), (9, 4), (2, 10)]



#### reverse 

In [16]:

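# Approach 1
# li = range(0, 10)  # Python 2 returns a list; Python 3 returns a range object
# li = list(li)  # the classic way to convert a range to a list
li=[*range(10)] # ! unpacking syntax (Python 3.5+): range -> list
print(type(li))
rli=[*reversed(li)]
print(type(reversed(li))) # * list_reverseiterator -> list
print(rli)

# Approach 2 (sorts in descending order; equals a reversal here only because li is ascending)
print(sorted(li,reverse=True))

# * Approach 3 (the simplest)

print(li[::-1])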

<class 'list'>
<class 'list_reverseiterator'>
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


### List Comprehensions

The syntax is [ expr for var in list ] -- the for var in list looks like a regular for-loop, but without the colon (:). The expr to its left is evaluated once for each element to give the values for the new list. 

In [17]:
li = [n for n in range(10)]
print(li)
print([n*n for n in range(10)])

strs = ['hello', 'and', 'goodbye']

shouting = [s.upper() + '!!!' for s in strs]
print(shouting)

fruits = ['apple', 'cherry', 'banana', 'lemon']
afruits = [s.upper() for s in fruits if 'a' in s]
print(afruits)


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
['HELLO!!!', 'AND!!!', 'GOODBYE!!!']
['APPLE', 'BANANA']


## tuple

### basic

A tuple is a <font color=MediumTurquoise> fixed size grouping of elements</font>, such as an (x, y) co-ordinate. Tuples are like lists, except they are immutable and do not change size (<font color=MediumTurquoise> tuples are not strictly immutable since one of the contained elements could be mutable</font>). Tuples play a sort of "struct" role in Python -- a convenient way to pass around a little logical, fixed size bundle of values. <font color=MediumTurquoise> A function that needs to return multiple values can just return a tuple of the values.</font>

The "empty" tuple is just an empty pair of parentheses. Accessing the elements in a tuple is just like a list -- len(), [ ], for, in, etc. all work the same.

In [18]:
tuple = (1, 2, 'hi')
print(len(tuple))
print(tuple[2])
# tuple[2] = "lala"  # NO, tuples cannot be changed
tuple = (1, 2, [1, 2, 3])  # this works
li = tuple[2]
li += [4]
print(tuple[2])  # ! a mutable element can be modified in place, since the tuple's reference to it does not change

tuple = ('hi',)  # *size-1 tuple

(x, y, z) = (42, 13, "hike")
print (z)  # hike


3
hi
[1, 2, 3, 4]
hike
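
Returning several values from a function is the most common everyday use of tuples. A minimal sketch; min_max is a hypothetical helper invented for this example.

```python
def min_max(values):
    """Return the smallest and largest value as a (low, high) tuple."""
    return min(values), max(values)  # the comma builds the tuple


result = min_max([3, 1, 4, 1, 5])
print(result)          # (1, 5)

# Tuple unpacking assigns each element to its own name.
low, high = result
print(low, high)       # 1 5
```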


### sort

In [19]:
import operator
students = [("zhangsan","A",10),("lisi","C",9),("lisi1","A",9),("lisi2","B",9),("wangwu","B",13)]
students.sort(key=operator.itemgetter(2))
print(students)

[('lisi', 'C', 9), ('lisi1', 'A', 9), ('lisi2', 'B', 9), ('zhangsan', 'A', 10), ('wangwu', 'B', 13)]


### zip

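zip() "zips" several iterables (lists, tuples, dicts, sets, strings, and range() objects) into a single zip object. "Zipping" simply means regrouping the elements at corresponding positions of these iterables into new tuples; it stops at the end of the shortest input.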

In [20]:
my_list = [11, 12, 13]
my_tuple = (21, 22, 23)
print([x for x in zip(my_list, my_tuple)])

my_pychar = "python"
my_shechar = "shell"
print([x for x in zip(my_pychar, my_shechar)])


[(11, 21), (12, 22), (13, 23)]
[('p', 's'), ('y', 'h'), ('t', 'e'), ('h', 'l'), ('o', 'l')]


## Dict Hash Table

### basic  

<font color=MediumTurquoise> Strings, numbers, and tuples work as keys, and any type can be a value. </font>Other types may or may not work correctly as keys (strings and tuples work cleanly since they are immutable). Looking up a value which is not in the dict throws a KeyError -- use "in" to check if the key is in the dict, or use dict.get(key) which returns the value or None if the key is not present (or get(key, not-found) allows you to specify what value to return in the not-found case).

In [21]:
# Can build up a dict by starting with the empty dict {}
# and storing key/value pairs into the dict like this:
# dict[key] = value-for-that-key
dict = {}
dict['a'] = 'alpha'
dict['g'] = 'gamma'
dict['o'] = 'omega'
# Or initialize it directly with a literal:
# ! dict = {'a': 'alpha', 'g': 'gamma', 'o': 'omega'}

print(dict)  # {'a': 'alpha', 'g': 'gamma', 'o': 'omega'}

print(dict['a'])  # Simple lookup, returns 'alpha'
dict['a'] = 6  # Overwrite the value stored under an existing key
print('a' in dict)  # True
# print(dict['z'])                  ## Throws KeyError
if 'z' in dict:
    print(dict['z'])  # Avoid KeyError
print(dict.get('z'))  # None (instead of KeyError)

{'a': 'alpha', 'g': 'gamma', 'o': 'omega'}
alpha
True
None
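
The two-argument form get(key, not-found) is handy for counting, because a missing key can start from 0. A small sketch with made-up sample data:

```python
counts = {}
for word in ['a', 'b', 'a', 'c', 'a']:
    # get() returns 0 the first time a word is seen, so no KeyError occurs.
    counts[word] = counts.get(word, 0) + 1

print(counts)                        # {'a': 3, 'b': 1, 'c': 1}
missing = counts.get('z', 'not found')
print(missing)                       # not found
```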


### loop

In [22]:
dict = {}
dict['a'] = 'alpha'
dict['g'] = 'gamma'
dict['o'] = 'omega'

for key in dict:
    print(key, end=' ')
print()
# Exactly the same as above
for key in dict.keys():
    print(key, end=' ')
print()

print(dict.values())  # dict_values(['alpha', 'gamma', 'omega'])

# Common case -- loop over the keys in sorted order,
# accessing each key/value
for key in sorted(dict.keys()):
    print(key, dict[key])
for k, v in dict.items():
    print(k, '->', v)


a g o 
a g o 
dict_values(['alpha', 'gamma', 'omega'])
a alpha
g gamma
o omega
a -> alpha
g -> gamma
o -> omega


### Dict Formatting

In [23]:
hash={}
hash['word'] = 'garfield'
hash['count'] = 42

s = 'I want %(count)d copies of %(word)s' % hash  # %d for int, %s for string
print(s)


I want 42 copies of garfield


### del

The "del" operator does deletions. In the simplest case, it can remove the definition of a variable, as if that variable had not been defined. <font color=MediumTurquoise> Del can also be used on list elements or slices to delete that part of the list and to delete entries from a dictionary.</font>
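
The cases described above can be sketched as follows; the variable names and values are assumptions chosen only for the example.

```python
nums = [10, 20, 30, 40, 50]
del nums[0]        # delete one element
print(nums)        # [20, 30, 40, 50]
del nums[1:3]      # delete a slice
print(nums)        # [20, 50]

table = {'a': 1, 'b': 2}
del table['b']     # delete a dict entry
print(table)       # {'a': 1}

var = 6
del var            # remove the variable definition itself
try:
    var
    name_gone = False
except NameError:
    name_gone = True   # the name no longer exists
print(name_gone)   # True
```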

## set

### basic 

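Python offers two ways to create a set: writing a literal with {}, or calling the set() function to convert a list, tuple, or other iterable into a set.

As for contents, a set can only store immutable (hashable) data such as integers, floats, strings, and tuples; <font color="MediumTurquoise"> it cannot store mutable types such as lists, dicts, or sets, and trying to do so makes the Python interpreter raise a TypeError</font>.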


In [24]:
a = {1, 'c', 1, (1, 2, 3), 'c'}
print(a)
for ele in a:
    print(ele, end=' ')
del(a)
print()
set0 = set()  # creates an empty set ({} would create an empty dict)
set1 = set("c.biancheng.net")
set2 = set([1, 2, 3, 4, 5])
set3 = set((1, 2, 3, 4, 5))
# * A set is unordered, so its elements may print in a different order on each run.
print("set0:", set0)
print("set1:", set1)
print("set2:", set2)
print("set3:", set3)

{1, (1, 2, 3), 'c'}
1 (1, 2, 3) c 
set0: set()
set1: {'.', 'g', 'n', 'a', 'c', 't', 'h', 'e', 'i', 'b'}
set2: {1, 2, 3, 4, 5}
set3: {1, 2, 3, 4, 5}


In [25]:
a={1,2,3} 
a.add(4)
print(a)
a.remove(1)
print(a)
a.discard(1)  # * discard() does not raise an error when the element is absent

{1, 2, 3, 4}
{2, 3, 4}


### set operation

In [26]:
set1={1,2,3}
set2={3,4,5}
print(set1 & set2)
print(set1 | set2)
print(set1-set2)
print(set1^set2)  # * symmetric difference

{3}
{1, 2, 3, 4, 5}
{1, 2}
{1, 2, 4, 5}


### set method

In [27]:
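print(dir(set))  # a bare dir(...) as the last expression of a cell is displayed one item per line
s = {1, 2, 3, 4}  # not named set, which would shadow the built-in
help(s.pop)
s.pop()
print(s)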



['__and__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__iand__', '__init__', '__init_subclass__', '__ior__', '__isub__', '__iter__', '__ixor__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__rand__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__rsub__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__xor__', 'add', 'clear', 'copy', 'difference', 'difference_update', 'discard', 'intersection', 'intersection_update', 'isdisjoint', 'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference', 'symmetric_difference_update', 'union', 'update']
Help on built-in function pop:

pop(...) method of builtins.set instance
    Remove and return an arbitrary set element.
    Raises KeyError if the set is empty.

{2, 3, 4}


### frozenset

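A frozenset is an immutable set: a program cannot change its elements. None of the set methods that modify the set itself, such as remove(), discard(), and add(), are supported by frozenset; all the set methods that do not modify the set are supported.

Adding a frozenset to a set is fine, because a frozenset is immutable (and therefore hashable).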
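
A short sketch of these properties; the values and variable names are made up for illustration.

```python
fs = frozenset([1, 2, 3])

# Non-mutating operations work and return new frozensets.
union = fs | {4}
common = fs & {2, 3, 5}
print(union, common)

# Mutating methods such as add() do not exist on frozenset.
try:
    fs.add(4)
    add_failed = False
except AttributeError:
    add_failed = True
print(add_failed)   # True

# A frozenset is hashable, so it can be an element of a regular set.
nested = {fs, frozenset([4, 5])}
print(len(nested))  # 2
```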

## file 

### basic 

The code f = open('name', 'r') opens the file into the variable f, ready for reading operations; call f.close() when finished. Instead of 'r', use 'w' for writing, and 'a' for append. <font color=MediumTurquoise> In Python 2, the special mode 'rU' was the "Universal" option for text files, converting different line-endings so they always come through as a simple '\n'; in Python 3 this is the default behavior of text mode, and the 'U' flag was removed in Python 3.11.</font> The standard for-loop works for text files, iterating through the lines of the file (this works only for text files, not binary files).

In [28]:
# Echo the contents of a file
f = open('foo.txt', 'r')  # * universal newlines are on by default in Python 3
for line in f:  # iterates over the lines of the file
    print(line, end='')  # end='' so print does not add an end-of-line char
    # since 'line' already includes the end-of-line.
f.close()


123
456


The f.readlines() method reads the whole file into memory and returns its contents as a list of its lines. The f.read() method reads the whole file into a single string
<font color=MediumTurquoise> The optional size argument sets the maximum number of characters (or bytes) that read() returns per call: read(size)</font>

For writing, the f.write(string) method is the easiest way to write data to an open output file. You can also use print with an open file: in Python 3, print is a regular function with an optional file= argument, "print(string, file=f)" (Python 2 used the awkward statement syntax "print >> f, string").

In [29]:
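f=open('foo.txt','w+')  # * the encoding parameter can set the text encoding
print(f.encoding)# UTF-8
f.write("123\n")
print(456,file=f)
f.flush()  # flush the buffer without closing the file
print(f.tell())  # current position of the file pointer
f.seek(0)  # move the file pointer back to the start of the file
# * seek(offset, whence): in text mode, a nonzero offset relative to the current position or the end of the file raises io.UnsupportedOperation; open the file in binary mode for that.
# * whence is optional and selects the reference point: 0 is the start of the file (default), 1 is the current position, 2 is the end of the file.
print(f.readlines())
f.close()

# open the file in binary mode
f = open("foo.txt",'rb+')
# print the data that was read
print(f.read().decode('utf-8'))  # * reading in binary mode returns bytes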

f.close()


UTF-8
8
['123\n', '456\n']
123
456
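
Reading in fixed-size chunks with read(size), and seeking relative to the end or the current position, can be sketched as below. The example assumes a throwaway temporary file with made-up contents and uses binary mode so that every whence value is allowed.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'0123456789')
    path = tmp.name

f = open(path, 'rb')

# read(size) returns at most size bytes; an empty result means end of file.
chunks = []
while True:
    chunk = f.read(4)
    if not chunk:
        break
    chunks.append(chunk)
print(chunks)        # [b'0123', b'4567', b'89']

f.seek(-3, 2)        # whence=2: 3 bytes before the end of the file
tail = f.read()
print(tail)          # b'789'

f.seek(2, 0)         # whence=0: 2 bytes from the start
f.seek(3, 1)         # whence=1: 3 bytes forward from the current position
position = f.tell()
print(position)      # 5

f.close()
os.remove(path)      # clean up the temporary file
```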



## Regular Expressions

### basic

The Python "re" module provides regular expression support.

In [30]:
import re
s = 'an example word:cat!!'
match = re.search(r'word:\w\w\w', s)
# If-statement after search() tests if it succeeded
if match:
    print('found', match.group())  # 'found word:cat'
else:
    print('did not find')
s = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'[\w.-]+@[\w.-]+', s)  # ! escapes such as \w work inside square brackets too, with the one exception that dot (.) just means a literal dot.
if match:
    print(match.group())  # 'alice-b@google.com'


found word:cat
alice-b@google.com


#### Group Extraction

 On a successful search, match.group(1) is the match text corresponding to the 1st left parenthesis, and match.group(2) is the text corresponding to the 2nd left parenthesis. The plain match.group() is still the whole match text as usual.

In [31]:
s = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'([\w.-]+)@([\w.-]+)', s)
if match:
    print(match.group())  # 'alice-b@google.com' (the whole match)
    print(match.group(1))  # 'alice-b' (the username, group 1)
    print(match.group(2))  # 'google.com' (the host, group 2)


alice-b@google.com
alice-b
google.com


#### findall

In [32]:
# Suppose we have a text with many email addresses
s = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# Here re.findall() returns a list of all the found email strings
# ['alice@google.com', 'bob@abc.com']
emails = re.findall(r'[\w\.-]+@[\w\.-]+', s)
for email in emails:
    # do something with each found email string
    print(email)

tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', s)
print(tuples)  # [('alice', 'google.com'), ('bob', 'abc.com')]
for tuple in tuples:
    print(tuple[0])  # username
    print(tuple[1])  # host


#   # Open file
#   f = open('test.txt', 'r')
#!   # Feed the file text into findall(); it returns a list of all the found strings
#   strings = re.findall(r'some pattern', f.read())

alice@google.com
bob@abc.com
[('alice', 'google.com'), ('bob', 'abc.com')]
alice
google.com
bob
abc.com


#### Greedy vs. Non-Greedy 

Suppose you have text with tags in it: \<b>foo</b> and \<i>so on</i>

Suppose you are trying to match each tag with the pattern '(<.*>)' -- what does it match first?

The result is a little surprising, but the greedy aspect of the .\* causes it to match the whole '\<b>foo\</b> and \<i>so on\</i>' as one big match. The problem is that the .\* goes as far as it can, instead of stopping at the first > (in other words, it is "greedy").

There is an extension to regular expressions where you add a ? at the end, such as .\*? or .+?, changing them to be non-greedy. Now they stop as soon as they can. So the pattern '(<.\*?>)' will get just '\<b>' as the first match, and '\</b>' as the second match, and so on getting each <..> pair in turn. The style is typically that you use a .\*?, and then immediately to its right look for some concrete marker (> in this case) that forces the end of the .\*? run.
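
The difference is easy to check with re.findall() on the sample text from this section. A minimal sketch; the variable names are chosen here for illustration.

```python
import re

text = '<b>foo</b> and <i>so on</i>'

# Greedy: .* runs to the last '>' in the text, giving one big match.
greedy = re.findall(r'<.*>', text)
print(greedy)   # ['<b>foo</b> and <i>so on</i>']

# Non-greedy: .*? stops at the first '>' it can, giving each tag in turn.
lazy = re.findall(r'<.*?>', text)
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```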



#### Substitution

The re.sub(pat, replacement, str) function searches for all the instances of pattern in the given string, and replaces them. The replacement string can include '\1', '\2' which refer to the text from group(1), group(2), and so on from the original matching text.

In [33]:
s = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
# re.sub(pat, replacement, str) -- returns new string with all replacements,
# \1 is group(1), \2 group(2) in the replacement
print(re.sub(r'([\w\.-]+)@([\w\.-]+)', r'\1@yo-yo-dyne.com', s))
# purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher


purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher


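## Mutable and Immutable Types

Python types fall into two groups: mutable and immutable.

Mutable types: lists, dicts, sets.

Immutable types: numbers, strings, tuples, frozensets.

Mutable or immutable here refers to whether the contents (the value) stored in that block of memory can be changed.

In other words, a type is mutable if its data can be modified in place, and immutable otherwise.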

In [1]:
def inc(n):
    print(id(n))  # ! the argument is bound to the parameter, so both names share one object
    n=n+1  # ! this rebinds the parameter n to a new object
    print(id(n))
b=1
print(b)
print(id(b))
inc(b)
print(b)
print(id(b))

1
9788992
9788992
9789024
1
9788992


In [8]:
print(id(1))
a=1
print(id(a))
b=a
print(id(b))

# all three refer to the address of the same object, 1

9788992
9788992
9788992


In [2]:
a=1
b=a+1
print(id(a),id(b))

9788992 9789024


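For the number a = 1, computing a + 1 to get b creates a new object at a new memory address, so the two addresses differ. Mutable types behave differently here: they can be modified in place without changing their address.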
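
For contrast, here is the same experiment with a mutable list. This is a sketch; append_one is a hypothetical helper written for this example.

```python
def append_one(items):
    items.append(1)   # mutates the caller's list in place


nums = [0]
before = id(nums)
append_one(nums)
print(nums)                  # [0, 1]
same_object = id(nums) == before
print(same_object)           # True: modified in place, same address

# Building a new list with + creates a different object.
nums2 = nums + [2]
new_object = id(nums2) != before
print(new_object)            # True
```

Unlike the int passed to inc(), the change made inside the function is visible to the caller, because both names refer to the same list object.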