In [4]:
"""
Strings as sequences of characters

1. For the purposes of extracting characters and substrings, strings can be considered to be sequences of characters, which means that you
   can use index or slice notation.

2. But strings aren’t lists of characters. The most noticeable difference between strings and lists is that unlike lists, strings can’t
   be modified. Attempting to say something like string.append('c') or string[0] = 'H' results in an error. 
"""

x = "Hello"
print(x[0], x[-1], x[1:])

y = "Goodbye\n"
y = y[:-1] # creates a new string; the original is not modified
print(y)

print(len(y))

# string concatenation operator + and multiplication operator *
x = "Hello " + "World"
print(x)
print(x * 7)

H o ello
Goodbye
7
Hello World
Hello WorldHello WorldHello WorldHello WorldHello WorldHello WorldHello World
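
The immutability point in note 2 can be checked directly. The following is a minimal sketch (the variable names are illustrative only) showing that item assignment fails and that slicing plus concatenation builds a new string instead:

```python
s = "Hello"

# str does not support item assignment, so this raises TypeError.
try:
    s[0] = "J"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string from a slice instead; the original is untouched.
t = "J" + s[1:]
print(assignment_failed, s, t)
```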


In [17]:
"""
Special Characters

1. Sequences of characters that start with a backslash and that are used to represent other characters are called escape sequences. Escape
   sequences are generally used to represent special characters—that is, characters (such as tab and newline) that don’t have a standard
   one-character printable representation.

2. Basic escape sequences:  \' \" \\ \a \b \f \n \r \t \v

3. Numeric (octal and hexadecimal) escape sequences: You can include any ASCII character in a string by using an octal (base 8) or
   hexadecimal (base 16) escape sequence corresponding to that character. An octal escape sequence is a backslash followed by one to
   three octal digits; the character whose code is that octal number is substituted for the octal escape sequence.
   A hexadecimal escape sequence begins with backslash x rather than just backslash and, in Python 3, must be followed by exactly two
   hexadecimal digits.

4. Because all strings in Python 3 are Unicode strings, they can also contain almost every character from every language available. And
   the Unicode character set includes the common ASCII characters.
   
5. A string that’s evaluated at the top level of an interactive Python session is shown with its special characters written as escape
   sequences, which makes clear what’s in the string. Meanwhile, the print function passes the string directly to the terminal program,
   which may interpret special characters in special ways.

6. A normal print function also adds a newline to the end of the string. Sometimes (for example, when you have lines from files that
   already end with newlines), you may not want this behavior. Giving the print function an end parameter of "" causes the print function
   to not append the newline.
"""

print("m",  "\155", "\x6D")
print("\n", "\012", "\x0A")

unicode_a = '\N{LATIN SMALL LETTER A}'
unicode_a_with_acute = '\N{LATIN SMALL LETTER A WITH ACUTE}'
print(unicode_a)
print(unicode_a_with_acute)
print("\u00E1")

'a\n\tb'
print('a\n\tb')

print("abc\n")
print("abc\n", end="")

m m m

 
 

a
á
á
a
	b
abc

abc
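
To make notes 3 and 5 concrete, here is a small sketch comparing what the interactive prompt would display (the result of repr) with the string's real contents, and showing that a hexadecimal escape consumes exactly two digits. The variable names are illustrative only.

```python
s = "a\n\tb"

# repr() gives the form the interactive prompt displays: escapes stay visible.
shown = repr(s)
print(shown)

# The string itself holds four characters: a, newline, tab, b.
print(len(s))

# A hexadecimal escape takes exactly two hex digits, so the trailing "1"
# is an ordinary character and not part of the escape.
two = "\x6D1"
print(two)
```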


In [36]:
"""
String Method (1)

1. Most of the Python string methods are built into the standard Python string class, so all string objects have them automatically.
   The standard string module also contains some useful constants. You need only remember that most string methods are attached to the
   string object they operate on by a dot (.), as in x.upper(). That is, they’re prepended with the string object followed by a dot.
   Because strings are immutable, the string methods are used only to obtain their return value and don’t modify the string object
   they’re attached to in any way.

2. String concatenation using + is useful but not efficient for joining large numbers of strings into a single string, because each time
   + is applied, a new string object is created. A better option is to use the join method.

3. The most common use of split is probably as a simple parsing mechanism for string-delimited records stored in text files. By default,
   split splits on any whitespace, not just a single space character, but you can also tell it to split on a particular sequence by passing
   it an optional argument.

4. You can specify how many splits split should perform when it’s generating its result via an optional second argument. If you specify
   n splits, split goes along the input string until it has performed n splits (generating a list with n+1 substrings as elements) or
   until it runs out of string.

5. You can use the functions int and float to convert strings to integer or floating-point numbers, respectively. If they’re passed
   a string that can’t be interpreted as a number of the given type, these functions raise a ValueError exception.
"""

# join() method
print(" ".join(["join", "puts", "spaces", "between", "elements"]))
print("::".join(["separated", "with", "colons"]))
print("".join(["separated", "by", "nothing"]))

# split() method
x = "You\t\t can have tabs\t\n \t and newlines \n\n mixed in"
y = "Mississippi"
z = 'a b c d'
print(x.split())
print(y.split("ss"))
print(z.split(' ', 1))
print(z.split(' ', 2))
print(z.split(' ', 9))

# converting strings to numbers
print(float("123.456"))
# print(float("xxyy"))       # ValueError: not a valid float
# print(int("123.456"))      # ValueError: not a valid integer literal
print(int("3333"))
print(int("101", 2))         # base 2
print(int("100000", 8))      # base 8
print(int("ff", 16))         # base 16
# print(int("123456", 6))    # ValueError: 6 is not a valid digit in base 6

join puts spaces between elements
separated::with::colons
separatedbynothing
['You', 'can', 'have', 'tabs', 'and', 'newlines', 'mixed', 'in']
['Mi', 'i', 'ippi']
['a', 'b c d']
['a', 'b', 'c d']
['a', 'b', 'c', 'd']
123.456
3333
5
32768
255
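
The notes on split and on int can be combined into a small record parser. This is only a sketch: the helper to_int and the sample record are hypothetical, not part of the original notes.

```python
def to_int(text, default=None):
    """Return int(text), or default when text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

# A colon-delimited record; the last field is deliberately not a number.
record = "alice:30:n/a"
name, age, score = record.split(":")
parsed = (name, to_int(age), to_int(score))
print(parsed)

# Limiting the number of splits keeps later delimiters inside the last field.
key, rest = "path=/usr/bin:/bin=x".split("=", 1)
print(key, rest)
```

Catching ValueError at the conversion site keeps one bad field from aborting the whole parse; whether None is the right fallback depends on the data.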


In [41]:
"""
String Method (2)


1. You can use the functions int and float to convert strings to integer or floating-point numbers, respectively. If they’re passed
   a string that can’t be interpreted as a number of the given type, these functions raise a ValueError exception. In addition,
   you may pass int an optional second argument, specifying the numeric base to use when interpreting the input string.

2. The method strip() returns a new string that’s the same as the original string, except that any whitespace at the beginning or end
   of the string has been removed. lstrip() and rstrip() work similarly, except that they remove whitespace only at the left or right end
   of the original string, respectively. And you can change which characters strip, rstrip, and lstrip remove by passing a string
   containing the characters to be removed as an extra parameter.

3. The four basic string-searching methods are similar: find, rfind, index, and rindex. A related method, count, counts how many times
   a substring can be found in another string. 
   
4. find() takes one required argument: the substring being searched for. find() returns the position of the first character of the first
   instance of substring in the string object, or -1 if substring doesn’t occur in the string. find() can also take one or two
   additional, optional arguments. The first of these arguments, if present, is an integer start; it causes find to ignore all characters
   before position start in string when searching for substring. The second optional argument, if present, is an integer end; it causes
   find to ignore characters at or after position end in string.

5. rfind() is almost the same as find(), except that it starts its search at the end of string and so returns the position of the first
   character of the last occurrence of substring in string.

6. index() and rindex() are identical to find and rfind, respectively, except for one difference: If index or rindex fails to find an
   occurrence of substring in string, it doesn’t return -1 but raises a ValueError exception.

7. Other string methods to search strings: startswith and endswith. These methods return a True or False result, depending on whether
   the string they’re used on starts or ends with one of the strings given as parameters. Both startswith and endswith can look for more
   than one string at a time. If the parameter is a tuple of strings, both methods check for all the strings in the tuple and return True
   if any one of them is found.

8. Basic string searches are often stretched beyond what they handle well. For anything more than simple substring matching, you’d
   benefit from the more powerful searching mechanism in the re module.
"""

# get rid of extra whitespace
x = " Hello, World\t\t "
y = "www.python.org"
print(x.strip())
print(x.lstrip())
print(x.rstrip())
print(y.strip("w")) # strip leading and trailing "w" characters
print(y.strip(".gorw")) # strip leading and trailing characters in the set . g o r w

# substring
x = "Mississippi"
print(x.find("ss"), x.find("zz"))
print(x.find("ss", 3), x.find("ss", 0, 3))
print(x.rfind("ss"))
print(x.count("ss"))

# startswith & endswith
print(x.startswith("Miss"))
print(x.endswith("pi"))
print(x.endswith(("i", "u")))

Hello, World
Hello, World		 
 Hello, World
.python.org
python
2 -1
5 -1
5
2
True
True
True
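
The optional start argument of find described in note 4 is enough to collect every occurrence of a substring. The sketch below uses a hypothetical helper, find_all, and also contrasts the -1 result of find with the exception raised by index:

```python
def find_all(text, sub):
    """Return the start positions of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found.
        start = text.find(sub, start + len(sub))
    return positions

hits = find_all("Mississippi", "ss")
print(hits)

# index() signals a miss with an exception instead of returning -1.
try:
    "Mississippi".index("zz")
    raised = False
except ValueError:
    raised = True
print(raised)
```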


In [55]:
"""
String Method (3)

1. Strings are immutable, but string objects have several methods that can operate on that string and return a new string that’s a modified
   version of the original string. This provides much the same effect as direct modification for most purposes. 

2. You can use the replace method to replace occurrences of substring (its first argument) in the string with newstring (its second argument).
   As with the string search methods, the re module offers a much more powerful way to replace substrings.

3. The methods str.maketrans and str.translate may be used together to translate characters in strings into different characters.
   The method maketrans() makes up a translation table from its two string arguments. The two arguments must each contain the same number of
   characters, and a table is made such that looking up the nth character of the first argument in that table gives back the nth character
   of the second argument. Next, the table produced by maketrans() is passed to translate(). Then translate() goes over each of the characters
   in its string object and checks to see whether they can be found in the table given as its argument. If a character can be
   found in the translation table, translate() replaces that character with the corresponding character looked up in the table to produce
   the translated string. You can give maketrans an optional third argument to specify characters that should be removed from the string.

4. Other methods:
   (*) lower converts all alphabetic characters in a string to lowercase, and upper does the opposite.
   (*) capitalize capitalizes the first character of a string, and title capitalizes the first letter of every word in a string.
   (*) swapcase converts lowercase characters to uppercase and uppercase to lowercase in the same string.
   (*) expandtabs gets rid of tab characters in a string by replacing each tab with enough spaces to reach the next tab stop (every
       8 columns by default, or the tab size you pass).
   (*) ljust, rjust, and center pad a string with spaces to justify it in a certain field width.
   (*) zfill left-pads a numeric string with zeros.

5. Because strings are immutable objects, you have no way to manipulate them directly in the same way that you can manipulate lists.
   Although the operations that produce new strings (leaving the original strings unchanged) are useful for many things, sometimes you
   want to be able to manipulate a string as though it were a list of characters. In that case, turn the string into a list of characters,
   do whatever you want, and then turn the resulting list back into a string.
   
6. Remember that strings are sequences of characters, so you can use the convenient Python in operator to test for a character’s
   membership in any of the string module’s constants (such as string.digits), although usually the existing string methods are
   simpler and easier.
"""

# replace() method
x = "Mississippi"
print(x.replace("ss", "+++"))

# maketrans() & translate() method
x = "~x ^ (y % z)"
table = str.maketrans("~^()", "!&[]") # the two arguments must contain the same number of characters
print(x.translate(table))

# other functions
x = "Mississippi"
y = "\t\tHello\t\tWorld\t\t"
z = "  Hello World  "
print(x.lower(), x.upper())
print(x.capitalize(), x.title())
print(x.swapcase())
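print(y.expandtabs(1)) # tab size 1: each tab becomes a single space
print(z.ljust(5)) # width 5 is shorter than the string, so it is returned unchanged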

# Modifying strings with list manipulations
text = "Hello, World"
wordList = list(text)
wordList[6:] = []
wordList.reverse()
text = "".join(wordList) # return to string
print(text)

wordList_tuple = tuple(text)
print(wordList_tuple)

# useful methods to report various characteristics of the string
x = "123"
y = "M"
print(x.isdigit())
print(x.isalpha())
print(y.islower())
print(y.isupper())

# string constants from the string module
import string

print(string.whitespace)
print(string.digits)
print(string.hexdigits)
print(string.octdigits)

# Convert a Unicode string to a bytes object
print(x.encode("utf_8"))

Mi+++i+++ippi
!x & [y % z]
mississippi MISSISSIPPI
Mississippi Mississippi
mISSISSIPPI
  Hello  World  
  Hello World  
,olleH
(',', 'o', 'l', 'l', 'e', 'H')
True
False
False
True
 	

0123456789
0123456789abcdefABCDEF
01234567
b'123'
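
A few of the methods listed in notes 3 and 4 are easier to follow with visible widths. This sketch assumes Python 3, where maketrans is called on str and takes the characters to delete as an optional third argument; the sample strings are made up for illustration.

```python
# Map ~ to ! and ^ to &, and delete both parentheses.
table = str.maketrans("~^", "!&", "()")
result = "~x ^ (y % z)".translate(table)
print(result)

# expandtabs pads to the next tab stop, so tabs expand to different widths.
tabbed = "a\tbc\td".expandtabs(4)
print(repr(tabbed))

# Padding helpers: zfill pads with zeros, center accepts a fill character.
zeros = "42".zfill(5)
centered = "hi".center(6, "*")
print(zeros, centered)
```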


In [63]:
"""
Converting from objects to strings

1. In Python, almost anything can be converted to some sort of string representation by using the built-in repr function.

2. If you try it on each Python data type (dictionaries, tuples, classes, and the like), you’ll see that no matter what type of Python
   object you have, you can get a string that describes something about that object. This is great for debugging programs. If you’re in
   doubt about what’s held in a variable at a certain point in your program, use repr() and print out the contents of that variable.
   
3. The repr() function returns what might be loosely called the formal string representation of a Python object. More specifically,
   repr() tries to return a string from which the original object can be rebuilt; when that isn’t possible (as with functions), it
   returns a description in angle brackets. Python also provides the built-in str function. In contrast to repr(), str() is intended
   to produce printable string representations, and it can be applied to any Python object. str() returns what might be called the
   informal string representation of the object. A string returned by str() need not define an object fully and is intended to be
   read by humans, not by Python code.

4. Until you begin using the object-oriented features of Python, the difference rarely matters. For many built-in objects (numbers, lists,
   functions), str and repr give the same result; strings are a notable exception, because repr adds quotes and escapes special characters.
   When you start defining your own classes, the difference between str() and repr() becomes important.
"""

# get the string representation of a list
l = [1, 2, 3]
l.append(4)
l.append([5, 6])
print("The list of l is ", repr(l))
print(type(repr(l)))
print(repr(len), type(repr(len))) # describe the len() function
print(str(len))

The list of l is  [1, 2, 3, 4, [5, 6]]
<class 'str'>
<built-in function len> <class 'str'>
<built-in function len>
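
Note 4 says the distinction between str() and repr() matters once you define your own classes. A minimal sketch, using a hypothetical Point class that is not part of the original notes:

```python
class Point:
    """A tiny example class with separate formal and informal representations."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Formal form: looks like the expression that would rebuild the object.
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # Informal form: meant for people reading the output.
        return f"({self.x}, {self.y})"

p = Point(1, 2)
informal = str(p)
formal = repr(p)
print(informal, formal)

# Containers use repr() for their elements, even under str() or print().
in_list = str([p])
print(in_list)
```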


In [71]:
"""
Using the format() method

1. You can format strings using the string class’s format method. The format method combines a format string containing replacement
   fields marked with { } with replacement values taken from the arguments passed to format. If you need to include a literal
   { or } in the string, you double it to {{ or }}.

2. Format specifiers let you specify the result of the formatting with even more power and control than the formatting sequences of
   the older style of string formatting. The format specifier lets you control the fill character, alignment, sign, width, precision,
   and type of the data when it’s substituted for the replacement field.
"""

# the numbered replacement
text1 = "{0} is the {1} of {2}".format("Ambrosia", "food", "the gods")
text2 = "{{Ambrosia}} is the {0} of {1}".format("food", "the gods")
print(text1)
print(text2)

# the named replacement
text3 = "{food} is the food of {user}".format(food="Ambrosia", user="the gods")
print(text3)

# both the numbered and named replacement
text4 = "{0} is the food of {user[1]}".format("Ambrosia", user=["men", "the gods", "the others"])
print(text4)

# format specifiers
text5 = "{0:10} is the food of gods".format("Ambrosia")
text6 = "{0:{1}} is the food of gods".format("Ambrosia", 10)
print(text5)
print(text6)

text7 = "{0:>10} is the food of gods".format("Ambrosia")
print(text7)

text8 = "{0:&>10} is the food of gods".format("Ambrosia")
print(text8)

Ambrosia is the food of the gods
{Ambrosia} is the food of the gods
Ambrosia is the food of the gods
Ambrosia is the food of the gods
Ambrosia   is the food of gods
Ambrosia   is the food of gods
  Ambrosia is the food of gods
&&Ambrosia is the food of gods
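
Note 2 lists alignment, sign, width, and precision as things a format specifier controls. The following sketch exercises each of them; the values are arbitrary and chosen only to make the padding visible.

```python
# Left-align in 10 columns, center in 7, right-align a float in 8 with 2 decimals.
row = "{0:<10}|{1:^7}|{2:>8.2f}".format("Ambrosia", "food", 3.14159)
print(row)

# The + sign option shows the sign for positive numbers as well.
signed = "{:+d} {:+d}".format(5, -5)
print(signed)

# A leading 0 before the width pads numbers with zeros.
padded = "{:08.3f}".format(3.14159)
print(padded)
```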


In [76]:
"""
Formatting strings with %

1. The use of % for string formatting is the old style of string formatting. This style of formatting shouldn’t be used in new code.

2. The string modulus operator takes two parts: the left side, which is a string, and the right side, which is a tuple (or a single value
   when there is only one formatting sequence). The string modulus operator scans the left string for special formatting sequences and
   produces a new string by substituting the values on the right side for those formatting sequences, in order.

3. Formatting sequences can specify what should be substituted for them by name rather than by position. When you do this, each formatting
   sequence has a name in parentheses immediately following the initial % of the formatting sequence. In addition, the argument to the right
   of the % operator is no longer given as a single value or tuple of values to be printed, but as a dictionary of values to be printed,
   with each named formatting sequence having a correspondingly named key in the dictionary. 
"""

text1 = "%s is the %s of %s" % ("Ambrosia", "food", "the gods")
print(text1)

# The members of the tuple on the right have str applied to them automatically by %s, so they don’t have to already be strings
x = [1, 2, "three"]
text2 = "The %s contains: %s" % ("list", x)
print(text2)

# %-6.2f sets the field width (total number of characters) of the printed number to six, the number of digits after the decimal point
# to two, and left-justifies the number in its field.
text3 = "Pi is <%-6.2f>" % 3.14159
print(text3)

# named parameters
num_dict = {'e': 2.718, 'pi': 3.14159}
print("%(pi).2f - %(pi).4f - %(e).2f" % num_dict)

# print() with the sep and end parameters
print("a", "b", "c")
print("a", "b", "c", sep="|")
print("a", "b", "c", end="\n\n")

Ambrosia is the food of the gods
The list contains: [1, 2, 'three']
Pi is <3.14  >
3.14 - 3.1416 - 2.72
a b c
a|b|c
a b c
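
Because the right side of % is treated as a tuple of values, formatting a tuple as a single value needs extra care. The sketch below shows that case and, for comparison, one old-style format written again with the format method; the sample values are illustrative.

```python
values = (1, 2)

# A bare tuple would be unpacked into two values, so wrap it in a
# one-element tuple to format it as a single item.
as_one = "%s" % (values,)
print(as_one)

# The same layout in old-style and format-method notation.
old = "%-6.2f|%5d" % (3.14159, 42)
new = "{:<6.2f}|{:5d}".format(3.14159, 42)
print(old)
print(new)
```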



In [78]:
"""
String Interpolation (from Python 3.6)

String interpolation is a way to include the values of Python expressions inside literal strings. These f-strings, as they’re commonly
called because they are prefixed with f, use a syntax similar to that of the format method, but with a little less overhead.
"""

value = 42
message = f"The answer is {value}"
print(message)

pi = 3.1415
print(f"pi is {pi:{10}.{2}}")

The answer is 42
pi is        3.1
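
The precision in the example above has no type code, so it counts significant digits, which is why 3.1 is printed. A short sketch of that difference and of embedding expressions in f-strings (variable names are illustrative):

```python
value = 42
pi = 3.14159
name = "hi"

# Any expression can appear inside the braces.
doubled = f"{value * 2}"

# Without a type code, precision means significant digits.
general = f"{pi:10.2}"

# With the f type code, precision means digits after the decimal point.
fixed = f"{pi:10.2f}"

# !r applies repr() to the value before formatting.
quoted = f"{name!r}"

print(doubled, general, fixed, quoted, sep="|")
```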


In [82]:
"""
Bytes

1. A bytes object is similar to a string object but with an important difference: A string is an immutable sequence of Unicode characters,
   whereas a bytes object is an immutable sequence of integers with values from 0 to 255. Bytes can be necessary when you’re dealing
   with binary data, such as reading from a binary data file.

2. The key thing to remember is that bytes objects may look like strings, but they can’t be used exactly like strings or combined with
   strings.
"""
unicode_a_with_acute = '\N{LATIN SMALL LETTER A WITH ACUTE}'
print(unicode_a_with_acute)

# To convert from a regular (Unicode) string to bytes, call the string’s encode method
xb = unicode_a_with_acute.encode()
print(xb)

# To convert a bytes object back to a string, call that object’s decode method
print(xb.decode())

á
b'\xc3\xa1'
á
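
Both notes can be seen in a few lines: the elements of a bytes object are integers, one character may encode to several bytes, and bytes cannot be combined with str. A minimal sketch, assuming UTF-8 as the encoding:

```python
text = "\N{LATIN SMALL LETTER A WITH ACUTE}"
data = text.encode("utf-8")

# One character, but two bytes in UTF-8.
print(len(text), len(data))

# Iterating over bytes yields integers in the range 0 to 255.
as_ints = list(data)
print(as_ints)

# Mixing bytes and str raises TypeError; decode (or encode) first.
try:
    data + "x"
    mixed = True
except TypeError:
    mixed = False
print(mixed)

round_trip = data.decode("utf-8")
print(round_trip == text)
```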
