# Lecture 6
  - Strings
  - Operations
  - Formatting
  - String Methods

## Unicode
- In the early days, characters were represented in ASCII
- ASCII is a 7-bit code; each character was stored in an 8-bit byte
- Values 0 - 127 were enough to represent the English language:
  - Upper and lower case letters
  - Digits
  - Punctuation
  - Non-printable characters (control characters)
- A letter maps to some bits which you can store on disk or in memory:
  -  A -> 0100 0001
- **Other languages started using codes from 128-255**
- Asian languages have thousands of characters
  - They needed more than one byte per character

In [None]:
bin(ord('A'))  # hex 0x41, 0100 = 0x4  0001 = 0x1 -->  0x41

-  The **Unicode Consortium** came up with a concept called a **code point**, where every character is assigned a number written as U+ followed by hexadecimal digits:
   - For e.g.: Hello --> U+0048 U+0065 U+006C U+006C U+006F
   - English text rarely uses code points above U+00FF
   - The next question is how code points are stored in memory
     - Endian issues:
       00 48 vs 48 00
-  UTF-8 is another system for storing a string of Unicode code points (the U+ numbers) in memory using 8-bit bytes
-  In **UTF-8**, every **code point from 0-127** is stored in a **single byte**
-  Only **code points 128 and above** are stored using **2, 3, or 4 bytes** (the original design allowed up to 6 bytes, but the current standard, RFC 3629, limits UTF-8 to 4)
-  **English text** looks exactly the same in **UTF-8** as it did in **ASCII**
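
The points above can be checked with the built-in `str.encode`. This is a minimal sketch; the sample characters and the variable names are chosen for illustration and are not part of the lecture.

```python
# Sample characters written as escapes: A, micro sign, a CJK character, an emoji.
samples = ["A", "\u00b5", "\u732b", "\U0001F600"]

# Number of bytes each code point needs in UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in utf8_lengths.items():
    print("U+{:04X} -> {} byte(s)".format(ord(ch), n))

# ASCII-only text produces identical bytes in ASCII and in UTF-8.
same_bytes = "Hello".encode("ascii") == "Hello".encode("utf-8")
print("ASCII and UTF-8 bytes equal:", same_bytes)

# Byte order only matters for multi-byte units such as UTF-16.
big_endian = "H".encode("utf-16-be")     # bytes 00 48
little_endian = "H".encode("utf-16-le")  # bytes 48 00
print(big_endian.hex(), little_endian.hex())
```

UTF-8 has no byte-order problem because it works one byte at a time.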




In [None]:
for i in "Hello":    # iterating through a sequence (string)
    codePoint = ord(i)
    print("{} --> {:3d} {} {}".format(i,codePoint,hex(codePoint), bin(codePoint)))

#### Python Strings are Unicode by Default

In [None]:
print(u"hi 猫")    # u prefixed before a string denotes unicode

In [None]:
print("hi 猫")    # unicode by default, u prefix is not needed

In [None]:
print("∑ Ω µ")   # Greek letters and mathematical symbols

## Strings
- Immutable 
  - **Read-only**, values can't be changed
  - They cannot be modified in place
  - **New strings** are formed by string operations
- String variables can be **reassigned** with new values


### Single, Double or Triple Quotes
-  Triple quotes are used when strings have line breaks
   -  For e.g.: you have a paragraph of text you want to assign to variable

In [None]:
# you can use single or double or triple quotes to represent strings
string1 = 'Hello World'  # single quote
string2 = "Hello World"  # double quote
string3 = '''Hello World''' # triple single quote

print ("String 1, 2, 3: ", string1, string2, string3)
print ("Type of String 1, 2, 3: ", type(string1), type(string2), type(string3))

In [None]:
# multi line string (or a comment)
para = '''This is line 1.
this is line 2.
This is line3.'''
print (para)

### Escape Characters
-  Give special meaning to certain characters
-  A character preceded by a backslash \
   -  '\a'  alarm
   -  '\t'  tab
   -  '\n'  newline
   -  '\r'  carriage return

In [None]:
print ('\a')  # you are supposed to hear a beep

In [None]:
print ("foo", "\t", "bar")  # prints a tab between foo and bar

In [None]:
print ("foo\n.")  # prints a new line and then dot
print ("bar\n.")

In [None]:
print ("foo\r.")  # prints foo, then carriage return, then . overwrites f (in a terminal)
print ("bar\r.")  # carriage return, then . overwrites b

In [None]:
stringu = u'Hello World'  #default is unicode in python3
stringr = r'Hello\tWorld\n'  # raw string where escape \ does not mean anything
print ("String u, r: ", stringu, stringr)
print ("Type of String u, r: ", type(stringu), type(stringr))

In [None]:
print ("foo\tbar")   # tab character

In [None]:
print (r"foo\tbar")  # raw string

In [None]:
print ("foo\\bar")  # escape the backslash

In [None]:
print ("foo\\\\bar")  # two escaped backslashes print as \\

#### Raw Strings - Strings prefixed with r or R

In [None]:
r = r"\nh\ni\n"  # raw string suppresses the special meaning of the backslash
print(r)
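
One way to see the difference between a normal and a raw string is to compare lengths. This is a small sketch; the Windows-style path is a made-up example.

```python
normal = "\n"   # one character: a newline
raw = r"\n"     # two characters: a backslash and the letter n
print(len(normal), len(raw))
print(repr(normal), repr(raw))

# A raw string is the same as escaping every backslash by hand.
same = (raw == "\\n")
print(same)

# Raw strings are handy for text with many backslashes (hypothetical path).
path = r"C:\new\table.txt"   # without r, \n and \t would be interpreted
print(path)
```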

### String Variable Reinitialized
-  **New string object is being formed**
-  **String ids are different**
![Reinitialized](images/Lecture-6.002.png)

In [None]:
foo='Foo'
print("foo:", foo, "id:", id(foo))
foo ='Bar'
print("foo:", foo, "id:", id(foo))

## Operations

### Concatenation +
-  str1 + str2
-  str1 + str2 + str3
![Concatenation](images/Lecture-6.003.png)

In [None]:
fooStr = 'Foo'
barStr = 'Bar'
cat = fooStr + barStr
print(cat)

In [None]:
# ids are all different because these are three different string objects
print('id fooStr {}, id barStr {}, id cat {}'.format(id(fooStr), id(barStr), id(cat)))

In [None]:
cat = "Foo" "Bar"   # no plus, strings by themselves
cat

In [None]:
cat = "Foo" "Bar" 'Baz'
cat

### Repetition *
-  str1*3
-  3*str1

![Repetition](images/Lecture-6.004.png)

In [None]:
print("fooStr:", fooStr)
rep = fooStr*3
rep2 = 3*fooStr
print (rep, rep2)

In [None]:
id(rep) == id(rep2)  # different string objects

In [None]:
rep3 = fooStr * -5  # what happens when multiplied by a negative number?
rep3 # an empty string

### Index [ ]
![Index](images/Lecture-6.005.png)

In [None]:
helloWorld = 'Hello Wo'
print ("Length:", len(helloWorld))
print ("Index 0: ", helloWorld[0], "Index 3:", helloWorld[3])

In [None]:
for index, value in enumerate(helloWorld): # enumerate gives index and value of sequence
    print(" Index: {} Value: {}".format(index,value))

In [None]:
helloWorld[8]   # Expect IndexError, there is no element at index 8

![Index](images/Lecture-6.006.png)

In [None]:
print ("Index -1: ", helloWorld[-1], "Index -8:", helloWorld[-8])

### Slice [start:Upto:Skip]
![Slice](images/Lecture-6.007.png)

In [None]:
print ("helloWorld[1:4]", helloWorld[1:4])

![Slice](images/Lecture-6.008.png)

In [None]:
print ("helloWorld[0:7:2]", helloWorld[0:7:2]) # stride or skip 2

![Slice](images/Lecture-6.009.png)

In [None]:
print ("helloWorld[::-1]", helloWorld[::-1]) # reverse slicing
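
Any of the three slice parts can be left out. The sketch below shows the defaults on the same text; the variable name `text` is only for illustration.

```python
text = "Hello Wo"

first_five = text[:5]      # start defaults to 0
from_six = text[6:]        # upto defaults to the end of the string
last_two = text[-2:]       # negative start counts from the end
every_other = text[::2]    # skip of 2 over the whole string

# Unlike indexing, slicing past the end does not raise IndexError.
past_end = text[5:100]
empty = text[100:]

print(first_five, from_six, last_two, every_other)
print(repr(past_end), repr(empty))
```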

### Immutable, Can't Change String

In [None]:
helloWorld[0]  # read character at 0th index

In [None]:
helloWorld[0] = 'h'  # Expect TypeError 
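
Since a character cannot be assigned in place, the usual approach is to build a new string. A minimal sketch, with variable names chosen for illustration:

```python
hello_world = "Hello Wo"

# Assignment to an index fails because str is immutable.
try:
    hello_world[0] = "h"
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# Build a new string from a new first character and a slice of the old one.
lowered = "h" + hello_world[1:]
print(lowered)

# The original string is untouched.
print(hello_world)
```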

### String Methods
- dir(str)
- help(str.casefold)

In [None]:
dir(str)                      # returns the attribute and method names of str as a list

In [None]:
for name in dir(str):
    if name.startswith("__"):   # skip names which start with __
        continue
    print(name)

In [None]:
help(str.casefold)

In [None]:
s_upper = "HELLO"
s_lower = "hello"

s_upper == s_lower  # strings are case sensitive (upper and lower case are different)

#### Casefold

In [None]:
s_upper.casefold() == s_lower.casefold()
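
For plain English text, `casefold()` and `lower()` give the same result. The difference shows up with characters such as the German sharp s. A small sketch, with the example word chosen for illustration:

```python
word_a = "Stra\u00dfe"   # "Strasse" spelled with the sharp s (U+00DF)
word_b = "STRASSE"

# lower() keeps the sharp s, so the two spellings do not match.
equal_with_lower = word_a.lower() == word_b.lower()

# casefold() maps the sharp s to "ss", so they do match.
equal_with_casefold = word_a.casefold() == word_b.casefold()

print(equal_with_lower, equal_with_casefold)
```

This is why `casefold()` is the safer choice for case-insensitive comparison.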

#### Capitalize

In [None]:
"hello world".capitalize()  # capitalizes the first character of the string, lowercases the rest

In [None]:
s="hello world"
s.capitalize()  # capitalizes the first character of the string, lowercases the rest

#### Title

In [None]:
"hello world, greetings".title()    # capitalizes first letter of every word

In [None]:
g="hello world, greetings"
g.title()    # capitalizes first letter of every word
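
`title()` treats every run of letters as a word, so an apostrophe starts a new "word". The sketch below shows this, and one possible alternative using `string.capwords` from the standard library.

```python
import string

phrase = "they're bill's friends"

# title() capitalizes the letter after each apostrophe as well.
titled = phrase.title()
print(titled)

# capwords() splits on whitespace only, so apostrophes are left alone.
cap_words = string.capwords(phrase)
print(cap_words)
```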

#### Upper

In [None]:
"hello world".upper()  # converts to upper case

In [None]:
x = "hello world"
x.upper()  # converts to upper case

#### Lower

In [None]:
'HELLO WORLD'.lower()  # converts to lower case

#### Count

In [None]:
"hello world".count('l')  # there are 3 l in hello world

In [None]:
"hello world".count('Z')  # there is no Z in hello world

#### Strip White Spaces

In [None]:
s = "    hello world   "  # string with whitespace at the start and end

In [None]:
print("s: |{}| s.strip: |{}|".format(s, s.strip()))

In [None]:
s.strip()

#### Strip Leading and Trailing Characters

In [None]:
s = " ;,hello world!?   "  # whitespace and punctuation
s.strip(';,!? ')           # remove characters at start and end of string
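
Two related points, sketched with made-up sample strings: `lstrip()` and `rstrip()` strip one side only, and the argument to `strip()` is a set of characters, not a prefix or suffix.

```python
padded = "   hello world   "

left_only = padded.lstrip()    # removes leading whitespace only
right_only = padded.rstrip()   # removes trailing whitespace only
print("|{}| |{}|".format(left_only, right_only))

# Each character in the argument is stripped, in any order,
# until a character outside the set is reached.
domain = "www.example.com".strip("cmowz.")
print(domain)
```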

#### Split White Spaces

In [None]:
s="The brown fox jumped quickly at the lazy dogs"

In [None]:
s.split()  # breaks apart sentence into list of words
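
`split()` also accepts a separator and a maximum number of splits. A short sketch with made-up sample strings:

```python
# With no argument, runs of whitespace count as one separator.
words = "  a   b  ".split()

# With an explicit separator, empty fields are kept.
fields = "a,b,,c".split(",")

# The second argument limits the number of splits.
key, value = "key=value=more".split("=", 1)

print(words, fields, key, value)
```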

#### Strip and Split lines

In [None]:
s = '''
line 1
line 2
line 3
'''
s    # note \n marks each line break

In [None]:
s.splitlines() # splits into a list of single lines

In [None]:
s.strip().splitlines()

#### Replace

In [None]:
s = "Python Programmers is cool!"
s.replace("is","are")

In [None]:
"Python Programmers is cool!".replace("is","are")

In [None]:
s.replace("cool","COOL")
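
`replace()` matches substrings, not whole words, and it returns a new string. The sketch below uses a made-up sentence to show both points, plus the optional count argument.

```python
sentence = "This is it"

# "is" inside "This" is replaced too.
replaced_all = sentence.replace("is", "was")

# A third argument limits the number of replacements.
replaced_once = sentence.replace("is", "was", 1)

print(replaced_all)
print(replaced_once)
print(sentence)   # the original string is unchanged
```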

#### Join

In [None]:
a2zLetters = "The brown fox jumped quickly at the lazy dogs"
a2zWords = a2zLetters.split()
print(a2zWords) # print list of words

" ".join(a2zWords) # creates a string from the list of words, separated by spaces

In [None]:
s = ''
s.join(a2zWords) # joins the words with no separator

In [None]:
''.join(a2zWords) # same as above, using a string literal

In [None]:
s = '-'
s.join(a2zWords) # joins the words with -

In [None]:
'_'.join(a2zWords) # joins the words with _
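
`join()` only accepts strings, so other values have to be converted first. A minimal sketch with a made-up list of numbers:

```python
numbers = [1, 2, 3]

# Joining integers directly raises TypeError.
try:
    "-".join(numbers)
    join_failed = False
except TypeError as err:
    join_failed = True
    print("TypeError:", err)

# Convert each item to a string before joining.
joined = "-".join(str(n) for n in numbers)
print(joined)

# split() and join() undo each other for single-spaced text.
text = "The brown fox"
round_trip = " ".join(text.split())
print(round_trip == text)
```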

#### Index vs Find
- index, rindex
  - Raise ValueError if the substring is not found
- find, rfind
  - Return -1 if the substring is not found

In [None]:
s = "One Two Three"
s.find("e")  # e from left

In [None]:
s.rfind('e')  # e from right

In [None]:
s.find("Foo")  # returns -1 when substring not found in string

In [None]:
s = "One Two Three"
s.index("e")

In [None]:
s.rindex('e')

In [None]:
s.index("Foo")   # expect ValueError
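
The two styles lead to different calling code. The sketch below compares them; `position_or_none` is a hypothetical helper written for this example, not a built-in.

```python
s = "One Two Three"

# When only a yes/no answer is needed, the in operator is enough.
has_two = "Two" in s

# find() accepts an optional start position for the search.
first_e = s.find("e")
next_e = s.find("e", first_e + 1)
last_e = s.rfind("e")


def position_or_none(text, sub):
    """Hypothetical helper: index() with the ValueError turned into None."""
    try:
        return text.index(sub)
    except ValueError:
        return None


print(has_two, first_e, next_e, last_e)
print(position_or_none(s, "Two"), position_or_none(s, "Foo"))
```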

## String Format Specifier
- A string can be formatted using %
  - "format specifier"  % (arguments) 
     - Note that here % is the string formatting operator, not the modulus operator

| format | remarks |
| --- | --- |
| %d, %i | signed decimal integer |
| %s | string |
| %f | floating point |
| %e, %E | scientific notation |
| %x, %X | hex |

In [None]:
print ("percent d: %d" % (10))  # print a decimal number

In [None]:
print ("percent i: %i" % (-10.23)) # %i and %d are the same; the float is truncated to -10

In [None]:
print ("percent f: %f" % (3.1415)) # print a floating point

In [None]:
print ("percent e/E: %e" % (10000)) # print in scientific notation

In [None]:
print ("percent s: %s" % (10000))   # %s converts the value with str()

In [None]:
print ("percent x: %x" % (65534)) # print integer in hexadecimal

In [None]:
print ("percent X: %X" % (65534)) # %x or %X prints in hexadecimal

In [None]:
print ("percent X: %X %X" % (47710, 47633))
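
The % specifiers also take an optional width and precision. This is a short sketch of the common cases; the values are made up for illustration.

```python
right = "%5d|" % 42          # width 5, right-aligned
left = "%-5d|" % 42          # a minus sign left-aligns
zero_padded = "%05d" % 42    # a leading zero pads with zeros
two_places = "%.2f" % 3.14159    # precision of 2 decimal places
wide_float = "%8.3f" % 3.14159   # width 8, precision 3

# Several values are passed as a tuple; %% prints a literal percent sign.
sentence = "%s scored %d%%" % ("Ada", 95)

print(right, left, zero_padded, two_places, wide_float)
print(sentence)
```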

### String Format method
- ''.format()

In [None]:
'{} {}'.format('one', 'two')

In [None]:
'{} {}'.format(1, 2)

In [None]:
'{1} {0} {1}'.format("one","two")

In [None]:
'{1} {0} {1}'.format(1,2)

In [None]:
'| {0:<10} | {0:^10} | {0:>10} |'.format('Hello') # sufficient width for hello

In [None]:
'| {0:<2} | {0:^2} | {0:>2} |'.format('Hello') # string longer than width provided

In [None]:
'{:10.5}'.format('Hello World')  # truncated to 5 characters, padded to width 10

In [None]:
'{:06.2f}'.format(3.141592653589793)
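
A few more format specifications, sketched with made-up values. The part after the colon follows the pattern fill, align, width, then precision and type.

```python
padded_float = "{:06.2f}".format(3.141592653589793)  # width 6, zero-padded
truncated = "{:10.5}".format("Hello World")          # 5 characters in width 10

binary = "{:08b}".format(65)        # 'A' as 8 binary digits
hexadecimal = "{:#x}".format(65)    # the # adds the 0x prefix
thousands = "{:,}".format(1234567)  # comma as thousands separator

# Fields can also be referred to by name.
named = "{name} is {age}".format(name="Ada", age=36)

print(padded_float, repr(truncated))
print(binary, hexadecimal, thousands)
print(named)
```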

## Recap
- Unicode
- Strings
- Concatenation and Repetition operators
- String indexing, Slicing
- String Format Specifier and Format Method
- String Methods

## Assignments
- String Operations Assignment
- String Operations Writing Assignment

## Quiz
- Quiz 6

## Reference

[Joel On Unicode](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/)

[Strings and Character Data in Python](https://realpython.com/python-strings)
  