# Lecture 6
  - Strings
  - Operations
  - Formatting
  - String Methods

## Unicode
- In the early days, characters were represented in ASCII
- ASCII is a 7-bit code; each character was stored in an 8-bit byte
- Values 0 - 127 were enough to represent the English language:
  - Upper and lower case letters
  - Digits
  - Punctuation
  - Non-printable characters (control characters)
- A letter maps to some bits which you can store on disk or in memory:
  -  A -> 0100 0001
- Other languages started using codes 128 - 255
- Asian languages have thousands of characters
  - They need more than one byte per character

In [123]:
bin(ord('A'))

'0b1000001'

-  The **Unicode Consortium** came up with a concept called the **code point**, where every character is assigned a number written as U+ followed by hexadecimal digits:
   - For example: Hello --> U+0048 U+0065 U+006C U+006C U+006F
   - English text rarely uses code points above U+00FF
   - The next question is how code points are stored in memory
     - Endian issues: with two bytes per code point, U+0048 can be stored as
       00 48 or 48 00
-  UTF-8 is another system for storing a string of Unicode code points, those U+ numbers, in memory using 8-bit bytes
-  In UTF-8, every code point from 0-127 is stored in a single byte
-  Code points 128 and above are stored using 2, 3 or 4 bytes (the original design allowed up to 6)
-  English text looks exactly the same in UTF-8 as it did in ASCII
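
The byte counts and the endian issue described above can be checked directly with `str.encode`. This is a minimal sketch; the sample characters and variable names are chosen here for illustration and are not part of the original lecture.

```python
# Sample characters written as escapes: A, micro sign, the cat character, an emoji.
samples = ["A", "\u00b5", "\u732b", "\U0001F600"]

# Number of bytes each character needs in UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
for ch, n in zip(samples, byte_lengths):
    print("U+{:04X} -> {} byte(s): {}".format(ord(ch), n, ch.encode("utf-8").hex()))

# ASCII-only text produces identical bytes in ASCII and in UTF-8.
ascii_bytes = "Hello".encode("ascii")
utf8_bytes = "Hello".encode("utf-8")
print(ascii_bytes == utf8_bytes)

# The endian issue: code point U+0048 in the two UTF-16 byte orders.
big_endian = "H".encode("utf-16-be")
little_endian = "H".encode("utf-16-le")
print(big_endian.hex(), "vs", little_endian.hex())
```

UTF-8 has no byte-order ambiguity because it is defined as a sequence of single bytes.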




In [124]:
for i in "Hello":
    print("{} --> {}".format(i,hex(ord(i))))

H --> 0x48
e --> 0x65
l --> 0x6c
l --> 0x6c
o --> 0x6f


In [125]:
print("hi 猫")

hi 猫


In [126]:
print(u"hi 猫")  # the u prefix is optional in Python 3: strings are Unicode by default

hi 猫


In [127]:
print("∑ Ω µ")

∑ Ω µ


In [128]:
r = r"\nh\ni\n"  # raw string: backslashes are not treated as escapes
print(r)

\nh\ni\n


## Strings
- Immutable
  - Read-only: the characters of a string cannot be changed in place
  - String operations that appear to modify a string return a new string
- String variables can be reassigned to new values
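
A short sketch of what immutability means in practice: assigning to an index fails, while building a new string works. The variable names are illustrative only.

```python
greeting = "hello"

# Item assignment is not supported on str objects; Python raises TypeError.
try:
    greeting[0] = "J"
except TypeError as err:
    print("TypeError:", err)

# Build a new string instead; the original object is left untouched.
changed = "J" + greeting[1:]
print(greeting, changed)
```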


### Single, Double or Triple Quotes
-  Triple quotes are used when a string contains line breaks
   -  For example, a paragraph of text that you want to assign to a variable

In [129]:
# you can use single or double or triple quotes to represent strings
string1 = 'Hello World'
string2 = "Hello World"
string3 = '''Hello World'''
print ("String 1, 2, 3: ", string1, string2, string3)
print ("Type of String 1, 2, 3: ", type(string1), type(string2), type(string3))

String 1, 2, 3:  Hello World Hello World Hello World
Type of String 1, 2, 3:  <class 'str'> <class 'str'> <class 'str'>


In [132]:
# multi line string (or a comment)
para = '''This is line 1.
this is line 2.
This is line3.'''
print (para)

This is line 1.
this is line 2.
This is line3.


### Escape Characters
-  Give special meaning to certain characters
-  Written as a character preceded by a backslash \
   -  '\a' (bell)
   -  '\t' (tab)
   -  '\n' (newline)
   -  '\r' (carriage return)

In [133]:
print ('\a')  # you are supposed to hear a beep




In [134]:
print ("foo", "\t", "bar")

foo 	 bar


In [135]:
print ("foo\n.")
print ("bar\n.")

foo
.
bar
.


In [136]:
print ("foo\r.")
print ("bar\r.")

foo.
bar.


In [137]:
stringu = u'Hello World'  # the u prefix is redundant in Python 3: str is already Unicode
stringr = r'Hello World'  # raw string: a backslash is not treated as an escape
print ("String u, r: ", stringu, stringr)
print ("Type of String u, r: ", type(stringu), type(stringr))

String u, r:  Hello World Hello World
Type of String u, r:  <class 'str'> <class 'str'>


In [138]:
print ("foo\tbar")   # tab character

foo	bar


In [139]:
print (r"foo\tbar")  # raw string

foo\tbar


In [140]:
print ("foo\\bar")  # escape the backslash

foo\bar


In [141]:
print ("foo\\\\bar")  # escape the backslash

foo\\bar


### String Variable Reinitialized
-  Reassigning the variable creates a new string object
-  The ids of the old and new strings are different
![Reinitialized](images/Lecture-6.002.png)

In [142]:
foo='Foo'
print("foo:", foo, "id:", id(foo))
foo ='Bar'
print("foo:", foo, "id:", id(foo))

foo: Foo id: 140366917016688
foo: Bar id: 140366917044072


## Operations

### Concatenation +
-  str1 + str2
-  str1 + str2 + str3
![Concatenation](images/Lecture-6.003.png)

In [143]:
fooStr = 'Foo'
barStr = 'Bar'
cat = fooStr + barStr
print(cat)

FooBar


In [144]:
cat = "Foo" "Bar"   # no plus, strings by themselves
cat

'FooBar'

In [145]:
cat = "Foo" "Bar" 'Baz'
cat

'FooBarBaz'

### Repetition *
-  str1*3
-  3*str1

![Repetition](images/Lecture-6.004.png)

In [146]:
rep = fooStr*3
rep2 = 2*fooStr
print (rep, rep2)

FooFooFoo FooFoo


### Index []
![Index](images/Lecture-6.005.png)

In [147]:
helloWorld = 'Hello Wo'
print ("Length:", len(helloWorld))
print ("Index 0: ", helloWorld[0], "Index 3:", helloWorld[3])

Length: 8
Index 0:  H Index 3: l


In [148]:
helloWorld[8]

IndexError: string index out of range

![Index](images/Lecture-6.006.png)

In [149]:
print ("Index -1: ", helloWorld[-1], "Index -8:", helloWorld[-8])

Index -1:  o Index -8: H


### Slice [start:Upto:Skip]
![Slice](images/Lecture-6.007.png)

In [150]:
print ("helloWorld[1:4]", helloWorld[1:4])

helloWorld[1:4] ell


![Slice](images/Lecture-6.008.png)

In [151]:
print ("helloWorld[0:7:2]", helloWorld[0:7:2])

helloWorld[0:7:2] HloW


![Slice](images/Lecture-6.009.png)

In [152]:
print ("helloWorld[::-1]", helloWorld[::-1])

helloWorld[::-1] oW olleH
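
The slice examples above always spell out the bounds. As a small sketch, the following shows what happens when bounds are omitted or fall outside the string; `hello_world` is a local name used here with the same value as `helloWorld`.

```python
hello_world = "Hello Wo"  # same value as helloWorld in the cells above

first_five = hello_world[:5]     # start defaults to 0
from_six = hello_world[6:]       # stop defaults to len(hello_world)
beyond_end = hello_world[4:100]  # slices are clipped to the string; no IndexError
every_other = hello_world[::2]   # step of 2 over the whole string

print(first_five, from_six, beyond_end, every_other, sep="|")
```

Unlike indexing with `helloWorld[8]`, an out-of-range slice does not raise an error.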


## String Format specifier
- A string can be formatted using the % operator

| format | remarks |
| --- | --- |
| %d, %i | decimal integer |
| %s | string |
| %f | floating point |
| %e, %E | scientific notation |
| %x, %X | hex |

In [153]:
print ("percent d: %d" % (10))

percent d: 10


In [154]:
print ("percent i: %i" % (-10.23))

percent i: -10


In [155]:
print ("percent f: %f" % (3.1415))

percent f: 3.141500


In [156]:
print ("percent e/E: %e" % (10000))

percent e/E: 1.000000e+04


In [157]:
print ("percent s: %s" % (10000))

percent s: 10000


In [158]:
print ("percent x: %x" % (65534))

percent x: fffe


In [159]:
print ("percent X: %X" % (65534))

percent X: FFFE


In [None]:
print ("percent X: %X %X" % (47710, 47633))
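
The specifiers in the table also accept an optional width and precision between the `%` and the letter. The following is a minimal sketch with illustrative values; it also shows that several values must be passed as a tuple.

```python
pi = 3.1415

two_decimals = "%.2f" % pi        # precision: 2 digits after the decimal point
padded_int = "%5d|" % 42          # width 5, right-aligned with spaces
zero_padded = "%05d" % 42         # width 5, padded with zeros
two_values = "%s has %d characters" % ("Hello", 5)  # several values need a tuple
two_hex = "%X %X" % (47710, 47633)

print(two_decimals, padded_int, zero_padded, two_values, two_hex, sep="\n")
```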

### String Methods
- dir(str)
- help(str.casefold)

In [None]:
dir(str)

In [None]:
help(str.casefold)

In [None]:
s_upper = "HELLO"
s_lower = "hello"

s_upper == s_lower

In [None]:
s_upper.casefold() == s_lower.casefold()
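
For plain English text, `lower()` and `casefold()` give the same result. A sketch of where they differ, using the German sharp s as an illustrative case that is not part of the original lecture:

```python
german_lower = "stra\u00dfe"  # "strasse" spelled with the sharp s character
german_upper = "STRASSE"

# lower() keeps the sharp s, so the two spellings do not match.
lower_match = german_lower.lower() == german_upper.lower()

# casefold() maps the sharp s to "ss", so the comparison succeeds.
casefold_match = german_lower.casefold() == german_upper.casefold()

print(lower_match, casefold_match)
```

This is why `casefold()` is the safer choice for caseless comparisons.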

In [None]:
"hello world".capitalize()

In [None]:
"hello world".title()

In [None]:
"hello world".upper()

In [None]:
'HELLO WORLD'.lower()

In [None]:
"hello world".count('l')  # there are 3 l in hello world

In [None]:
"hello world".count('Z')  # there is no Z in hello world

#### Strip White Spaces

In [None]:
s = "    hello world   "  # whitespaces at start and end

In [None]:
print("s: |{}| s.strip: |{}|".format(s, s.strip()))

In [None]:
s.strip()

#### Strip Leading and Trailing Characters

In [None]:
s = " ;,hello world!?   "  # whitespace and punctuation at both ends
s.strip(';,!? ')           # remove any of these characters from the start and end
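
Two details of `strip` are easy to miss: it only touches the ends of the string, and its argument is a set of characters rather than a prefix or suffix. A small sketch with made-up sample strings:

```python
dashed = "--hello--world--"

both_ends = dashed.strip("-")    # the inner "--" is kept
left_only = dashed.lstrip("-")   # strip from the start only
right_only = dashed.rstrip("-")  # strip from the end only

# "xy" means: remove any run of 'x' or 'y' characters, in any order.
char_set = "xyxhelloyx".strip("xy")

print(both_ends, left_only, right_only, char_set, sep="\n")
```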

#### Split White Spaces

In [None]:
s="The brown fox jumped quickly at the lazy dogs"

In [None]:
s.split()

### Strip and Splitlines

In [None]:
s = '''
line 1
line 2
line 3
'''
s    # note: each \n in the output is a line break

In [None]:
s.splitlines() 

In [None]:
s.strip().splitlines()
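
A sketch of why `strip()` is applied first, and how `splitlines()` differs from splitting on `"\n"`. The string below is the same text as the triple-quoted `s`, written with explicit `\n` escapes.

```python
text = "\nline 1\nline 2\nline 3\n"

raw_lines = text.splitlines()            # leading newline yields an empty first item
clean_lines = text.strip().splitlines()  # strip() removes the outer newlines first
split_newline = text.split("\n")         # also yields an empty item at the end

print(raw_lines)
print(clean_lines)
print(split_newline)
```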

### Replace

In [None]:
s = "Python Programmers is cool!"
s.replace("is","are")

In [None]:
s.replace("cool","COOL")

### Join

In [None]:
a2zLetters = "The brown fox jumped quickly at the lazy dogs"
a2zWords = a2zLetters.split()
" ".join(a2zWords)

s.join
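
`join` is called on the separator, not on the list. A minimal sketch with illustrative names, including the conversion needed when the items are not strings:

```python
words = ["The", "brown", "fox"]

spaced = " ".join(words)  # separator is a single space
dashed = "-".join(words)  # any string can act as the separator

# join() accepts only strings, so numbers must be converted first.
numbers = [1, 2, 3]
comma_separated = ",".join(str(n) for n in numbers)

print(spaced, dashed, comma_separated, sep="\n")
```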

### Index vs Find
- index, rindex
- find, rfind

In [None]:
s = "One Two Three"
s.find("e")

In [None]:
s.rfind('e')

In [None]:
s.find("Foo")

In [None]:
s = "One Two Three"
s.index("e")

In [None]:
s.rindex('e')

In [None]:
s.index("Foo")   # raises ValueError
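
The difference between the two families is what happens when the substring is missing: `find` returns -1, while `index` raises `ValueError`. A sketch using the same string as the cells above:

```python
s = "One Two Three"

first_e = s.find("e")    # position of the first "e"
last_e = s.rfind("e")    # position of the last "e"
missing = s.find("Foo")  # -1 signals "not found"

# index() reports a missing substring with an exception instead.
try:
    s.index("Foo")
    index_raised = False
except ValueError as err:
    index_raised = True
    print("ValueError:", err)

print(first_e, last_e, missing)
```

Use `find` when a missing substring is an expected case, and `index` when it should be treated as an error.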

### String Format method
- ''.format()

In [None]:
'{} {}'.format('one', 'two')

In [None]:
'{} {}'.format(1, 2)

In [None]:
'{1} {0} {1}'.format("one","two")

In [None]:
'{1} {0} {1}'.format(1,2)

In [None]:
'| {0:<10} | {0:^10} | {0:>10} |'.format('Hello') # sufficient width for hello

In [None]:
'| {0:<2} | {0:^2} | {0:>2} |'.format('Hello') # string longer than width provided


In [None]:
'{:10.5}'.format('Hello World')  # only the first 5 characters are kept, padded to width 10

In [None]:
'{:06.2f}'.format(3.141592653589793)
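
The format specifications above follow the pattern `{index:fill/align width .precision type}`. As a sketch, the same expressions are evaluated below with their results checked against the expected strings; the variable names are illustrative.

```python
reordered = "{1} {0} {1}".format("one", "two")  # positional indexes can repeat

# <, ^ and > align left, center and right within a width of 10.
aligned = "| {0:<10} | {0:^10} | {0:>10} |".format("Hello")

# A width smaller than the string never truncates it.
too_narrow = "|{0:<2}|".format("Hello")

# Precision on a string truncates it; width then pads it on the right.
truncated = "{:10.5}".format("Hello World")

# 0 fill, total width 6, 2 digits after the decimal point.
padded_float = "{:06.2f}".format(3.141592653589793)

print(reordered, aligned, too_narrow, repr(truncated), padded_float, sep="\n")
```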

## Recap
- Unicode
- Strings
- Concatenation and Repetition operators
- String indexing, Slicing
- String Format Specifier and Format Method
- String Methods

## Assignments
- String Operations Assignment
- String Operations Writing Assignment

In [None]:
w = "equation"
w1 = w[:3]
w1

In [None]:
"EQUAtion"  

In [None]:
w[:4].upper() + w[4:]

In [None]:
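t = "programming is fun"
o = "Programming IS fun"   # target string
s = t.split()
# join with spaces so the words do not run together
" ".join([s[0].capitalize(), s[1].upper(), s[2]])

# alternative: capitalize the first letter, then upper-case the word "is"
t.capitalize().replace(' is ', ' IS ')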

## Quiz
- Quiz 6

In [None]:
string1 = "  :in the middle ; "
string1.strip(':; ')

## Reference

[Joel On Unicode](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/)

[Strings and Character Data in Python](https://realpython.com/python-strings)
  