# Strings
Strings are sequences of character data  

## String Properties
1. Contain characters (individual letters or symbols)
2. Have a length (the number of characters)
3. *Immutable sequence* (the characters have left-to-right positional order and cannot be changed in place)
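The third property can be checked directly. The sketch below (variable names are illustrative only) shows that item assignment on a string raises `TypeError`, and that the usual workaround is to build a new string:

```python
s = "spam"

# Item assignment is not supported on strings, so this raises TypeError.
try:
    s[0] = "S"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string instead; the original object is left untouched.
t = "S" + s[1:]

print(assignment_failed, s, t, len(s))
```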

## String Literals

A pair of single or double quotes creates a standard string

In [18]:
s1 = 'food'                           # single quote
s2 = "food"                           # double quote

s1, s2

('food', 'food')

A pair of triple single or triple double quotes creates a multiline string

In [17]:
s1 = '''Multiline
        strings'''                      # Multiline strings preserve whitespace
s2 = """Multiline   
        strings"""

s1,s2

('Multiline\n        strings', 'Multiline   \n        strings')

Prefixing the opening quote with `r` creates a raw string

In [15]:
s1 = r"C:\new\directory\file.dat"     # Raw strings suppress escape sequences
s2 = r'D:\path\to\file.bin' 

s1,s2

('C:\\new\\directory\\file.dat', 'D:\\path\\to\\file.bin')

### Escape Sequences in Strings
A backslash character in a string indicates that one or more characters that follow it should be treated specially.

|Escape|Meaning|
|---|---|
|`\newline`|Backslash and newline are ignored (line continuation)|
|`\\`|Backslash|
|`\'`|Single quote|
|`\"`|Double quote|
|`\a`|Bell|
|`\b`|Backspace|
|`\f`|Form feed|
|`\n`|Newline|
|`\r`|Carriage return|
|`\t`|Horizontal tab|
|`\v`|Vertical tab|
|`\xhh`|Character with hex value *hh* (exactly 2 digits)|
|`\ooo`|Character with octal value *ooo* (up to 3 digits)|


In [14]:
print('\n Hello \'world\' \n \t This is Python')


 Hello 'world' 
 	 This is Python
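The numeric escapes are easier to follow with a concrete value. A minimal sketch, using the letter A (code 65) as an arbitrary example, and a raw string for contrast:

```python
hex_a = '\x41'        # hex 41 is 65, the letter A
octal_a = '\101'      # octal 101 is also 65
raw = r'\x41'         # a raw string keeps the backslash and the three characters after it

print(hex_a, octal_a, raw, len(raw))
```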


## Basic String Operations

#### Determine string length using the `len(s)` function

In [25]:
s = "Will my cat eat my eyeballs ?"

len(s)

29

#### Concatenation using `+` operator

In [24]:
s1 = "my name is"
s2 = "lucas"
s3 = "nice to meet you !"
s4 = s1 + " " + s2 + ", " + s3

s4

'my name is lucas, nice to meet you !'

#### Repetition using `*` operator

In [23]:
s = '-'
s = s*10    # repeat 10 times

s

'----------'

#### Iterating over a string with a `for` loop using the `in` keyword

In [26]:
for c in s4:            # Print items in s4 one by one
    print(c,end=' ')

m y   n a m e   i s   l u c a s ,   n i c e   t o   m e e t   y o u   ! 

## String Indexing and Slicing

|S|L|I|C|E| |O|F| |S|T|R|I|N|G|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|
|-15|-14|-13|-12|-11|-10|-9|-8|-7|-6|-5|-4|-3|-2|-1|

Indexing `S[i]` fetches components at offsets:
- First item is at offset 0
- A negative index counts backward from the right end
- `S[0]` fetches the first item
- `S[-2]` fetches the second item from the right

Slicing `S[i:j]` extracts contiguous sections of sequences:
- The upper bound is noninclusive
- Slice boundaries default to 0 and the sequence length if omitted
- `S[1:3]` fetches items at offset 1 up to but not including 3
- `S[1:]` fetches items at offset 1 through the end (the sequence length)
- `S[:3]` fetches items at offset 0 up to but not including 3
- `S[:-1]` fetches items at offset 0 up to but not including the last item
- `S[:]` fetches items at offset 0 through the end, thus making a top level copy of S

Extended slicing `S[i:j:k]` accepts a step (or stride) k, which defaults to 1:
- Allows for skipping items and reversing order

In [29]:
s = "SLICE OF STRING"
item1 = s[0]        # fetches the first item
item2 = s[-1]       # fetches the last item
item3 = s[3]        # fetches the fourth item
item4 = s[-3]       # fetches the third item from the right / end

item1,item2,item3,item4

('S', 'G', 'C', 'I')

In [35]:
s = "@SLICE OF STRING@"
slice1 = s[2:]     # fetches items at offset 2 through the end
slice2 = s[:4]     # fetches item at offset 0 up to but not including 4
slice3 = s[3:8]    # fetches items at offset 3 up to but not including 8
slice4 = s[-4:]    # fetches items at offset -4 through the end
slice5 = s[:-4]    # fetches items at offset 0 up to but not including (items at offset) -4
slice6 = s[-8:-4]  # fetches items at offset -8 up to but not including (items at offset) -4

s_copy = s[:]     # make a shallow copy of s

slice1,slice2,slice3,slice4,slice5,slice6,s_copy


('LICE OF STRING@',
 '@SLI',
 'ICE O',
 'ING@',
 '@SLICE OF STR',
 ' STR',
 '@SLICE OF STRING@')

In [36]:
s = "@@SLICE OF STRING@@"
pos_strided_slice_s = s[1:15:3]       # fetches items at offset 1 up to but not including 15 with stride of 3
neg_strided_slice_s = s[5:1:-1]       # fetches items at offset 2 up to 5 in reverse order (with negative stride,the first two bounds are reversed)
reversed_s = s[::-1]                  # reverse string items order

pos_strided_slice_s,neg_strided_slice_s,reversed_s


('@I  R', 'CILS', '@@GNIRTS FO ECILS@@')

## String methods
***Note:*** string methods do not modify the string in place; they return a new string
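A short sketch of what this means in practice (the variable names are illustrative): calling a method leaves the original string alone, so the result has to be assigned to a name to be kept.

```python
s = "Enterprise"
shouted = s.upper()    # returns a new string
unchanged = s          # s still refers to the original text
s = s.upper()          # rebind the name to keep the result

print(unchanged, shouted, s)
```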

### Converting String Case

`S.lower()`  
Convert string case to lowercase

`S.upper()`  
Convert string case to uppercase

`S.title()`  
Convert string case to titlecase

In [44]:
s1 = 'John Doe'
s1 = s1.lower()

s2 = 'windows'
s2 = s2.upper()

s3 = "can you hear me?"
s3 = s3.title()

s1,s2,s3

('john doe', 'WINDOWS', 'Can You Hear Me?')

### Removing Whitespace from a String

`S.rstrip()`  
Remove trailing whitespace from string

`S.lstrip()`  
Remove leading whitespace from string

`S.strip()`  
Remove trailing and leading whitespace from string

In [46]:
s1 = "Jean-luc Picard    "
s1 = s1.rstrip()

s2 = "     Jean-luc Picard" 
s2 = s2.lstrip()

s3 = "    Jean-luc Picard       "
s3 = s3.strip()

s1,s2,s3

('Jean-luc Picard', 'Jean-luc Picard', 'Jean-luc Picard')

### Determine if a String Starts or Ends with a Particular String

`S.startswith(prefix)`  
Return True if S starts with the specified prefix, False otherwise

`S.endswith(suffix)`  
Return True if S ends with the specified suffix, False otherwise

In [48]:
s = "Enterprise"
s.startswith("Enter"),s.startswith("enter")    # Case sensitive

(True, False)

In [50]:
s.endswith("rise"),s.endswith("riSE")

(True, False)

### Finding a Substring in a String

`S.find(sub[, start[, end]])`  
Return the lowest index in S where the substring `sub` is found within the slice `S[start:end]`, or -1 if it is not found

In [57]:
s = "Where the crawdads sings?"
s.find("crawdads")

10

In [59]:
s.find("enter")             # Return -1 when the searched substring is not found

-1

In [58]:
s.find('crawdads',5, 13)    # Can also be supplied with optional start and end index for where to search

-1
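The optional start index also makes it possible to walk through every match. The helper below, `find_all`, is a hypothetical name and not a built-in; it is a minimal sketch that calls `find()` repeatedly and resumes one position after each hit, so overlapping matches are reported too.

```python
def find_all(text, sub):
    """Return every index at which sub starts in text (hypothetical helper)."""
    positions = []
    start = 0
    while True:
        index = text.find(sub, start)
        if index == -1:          # no further match
            break
        positions.append(index)
        start = index + 1        # resume the search one position later
    return positions

print(find_all("foo goo moo", "oo"))
```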

### Counting a Substring in a String

`s.count()`  
Counts occurrences of a substring in the target string

In [77]:
"foo goo moo".count('oo')

3

In [78]:
"foo goo moo".count('oo', 0, 8)     # Can also be supplied with optional start and end index for where to search

2

### Splitting a String

`S.split(separator)`  
Return a list of the words in the string, using `separator` as the delimiter string

In [60]:
s = "vini,vidi,vici"
l = s.split(',') 

l

['vini', 'vidi', 'vici']

In [62]:
s = "aaa bbbb cccc"
l = s.split()             # Default separator is whitespace

l

['aaa', 'bbbb', 'cccc']

In [63]:
s = "111---2222---333"
l = s.split("---")        # Separator is not limited to single characters 

l

['111', '2222', '333']

### String to Number Conversion

`int(String)`  
Convert string to integer

`float(String)`  
Convert string to float

`ord()`  
Returns the integer Unicode code point of the given character

In [65]:
s1 = "100"
s2 = "3.14159"

i = int(s1)
f = float(s2)

(i,type(i)),(f,type(f)) 

((100, int), (3.14159, float))
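Both conversions raise `ValueError` when the text is not a valid number. The wrapper below, `to_int`, is a hypothetical helper sketched for illustration; it returns a fallback value instead of raising.

```python
def to_int(text, default=None):
    """Convert text to int, returning default if it is not a valid integer (hypothetical helper)."""
    try:
        return int(text)         # surrounding whitespace is accepted
    except ValueError:
        return default

print(to_int("100"), to_int(" 42 "), to_int("3.14"), to_int("abc", default=0))
```

Note that `int("3.14")` fails because the text is not an integer literal; convert with `float()` first if truncation is intended.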

In [67]:
c = 'f'
i = ord(c)     # Return the Unicode code point of the character (same as ASCII for values 0-127)

i,type(i)

(102, int)

### Numbers to String Conversion

`str()`  
Create a new string object from the given object

`chr()`  
Returns a character value for the given integer


In [68]:
i = 200
f = 3.14159

s1 = str(i)
s2 = str(f)

s1,s2

('200', '3.14159')

In [70]:
i = 36
c = chr(i)      # Convert a code point to its character (ASCII for values 0-127)

c, type(c)

('$', str)
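`ord()` and `chr()` are inverses of each other and work on Unicode code points, of which ASCII is only the first 128. A small sketch using the letter é (written as an escape so the example does not depend on file encoding):

```python
c = '\u00e9'             # the letter é
code = ord(c)            # code point above the ASCII range 0-127
back = chr(code)         # chr() reverses ord()

print(code, back == c)
```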

### Joining Strings

`S.join(iterable)`  
Concatenates the strings from an iterable, using S as the separator.

In [73]:
s1 = '-'.join(['021','444777'])          # The string upfront is the separator string.
s2 = '__'.join(['aaa','bbb','ccc'])

s1,s2

('021-444777', 'aaa__bbb__ccc')
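`split()` and `join()` are often used as a pair. The sketch below splits on one separator and joins on another; it also shows that `join()` only accepts strings, so other values are converted with `str()` first.

```python
s = "vini,vidi,vici"
parts = s.split(',')                  # list of three words
rejoined = ','.join(parts)            # same separator gives back the original
spaced = ' '.join(parts)              # a different separator

# join() raises TypeError for non-string items, so convert them first
numbers = ', '.join(str(n) for n in [1, 2, 3])

print(rejoined, spaced, numbers, sep=' | ')
```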

### Character Classification

`s.isalnum()`  
Determines whether the target string consists of alphanumeric (either a letter or a number) characters

`s.isalpha()`  
Determines whether the target string consists of alphabetic characters

`s.isdigit()`  
Determines whether the target string consists of digit characters

In [79]:
'abc123'.isalnum(), 'abc$123'.isalnum()

(True, False)

In [80]:
'ABCabc'.isalpha(), 'abc123'.isalpha()

(True, False)

In [81]:
'123'.isdigit(), '123abc'.isdigit()

(True, False)
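These methods require every character to match and return False for an empty string, which leads to a few results that can be surprising. A brief sketch of such edge cases:

```python
checks = (
    'abc 123'.isalnum(),   # the space is not alphanumeric
    ''.isalpha(),          # an empty string never qualifies
    '-42'.isdigit(),       # the minus sign is not a digit
    '3.14'.isdigit(),      # neither is the decimal point
)

print(checks)
```

For this reason `isdigit()` is not a reliable test for "can be converted with `int()` or `float()`".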

### Formatting

`s.center()`  
Centers a string in a field

In [95]:
'foo'.center(10), 'bar'.center(10, '-')     # If the optional fill argument is specified, it is used as the padding character

('   foo    ', '---bar----')

In [96]:
'foo'.center(2)                             # If the string is already at least as long as the specified width, it is returned unchanged

'foo'

`s.ljust()`  
Left-justifies a string in a field

In [90]:
'foo'.ljust(10), 'foo'.ljust(10, '-')

('foo       ', 'foo-------')

In [97]:
'foo'.ljust(2)

'foo'

`s.rjust()`  
Right-justifies a string in a field

In [93]:
'foo'.rjust(10), 'foo'.rjust(10, '-')

('       foo', '-------foo')

In [94]:
'foo'.rjust(2)

'foo'

`s.zfill()`  
Pads a string on the left with zeros

In [100]:
'foo'.zfill(6)

'000foo'

In [101]:
'42'.zfill(5)                   # .zfill() is most useful for string representations of numbers

'00042'

In [102]:
'+42'.zfill(8), '-42'.zfill(3)  # If s contains a leading sign, it remains at the left edge of the result string after zeros are inserted

('+0000042', '-42')
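The justification methods are typically combined to line up columns of text. A minimal sketch with made-up sample data:

```python
rows = [("apple", 3), ("kiwi", 12), ("banana", 7)]   # hypothetical sample data

lines = []
for name, qty in rows:
    # left-justify the name in 8 columns (dot-filled), right-justify the quantity in 4
    lines.append(name.ljust(8, '.') + str(qty).rjust(4))

print('\n'.join(lines))
```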

### Changing a String

`S.replace()`  
Replaces occurrences of a substring within a string

In [104]:
'foo bar foo baz foo qux'.replace('foo', 'grault')

'grault bar grault baz grault qux'

In [106]:
'foo bar foo baz foo qux'.replace('foo', 'grault', 2)   # If the optional count argument is specified, a maximum of count replacements are performed, starting at the left end of s

'grault bar grault baz foo qux'

Because strings are immutable, the conventional way to change part of a string is to build a new one:

In [109]:
s = "SPAM"
s = 'X' + s[1:]     # Assign a new string built from slicing and concatenation

s

'XPAM'

## String Formatting

### Old Style String Formatting `%`
  
`%[(keyname)][flags][width][.precision]typecode`

Some useful typecodes:
- `%s` = string
- `%c` = character
- `%d` = decimal (base 10) integer
- `%i` = integer (same as `%d`)
- `%o` = octal
- `%x` = hex
- `%e` = floating-point with exponent
- `%f` = floating-point decimal
- `%g` = floating-point, `e` or `f` chosen by magnitude

In [64]:
name = "John"
'Hello, %s' % name

'Hello, John'

Type-specific substitutions (the `%` operator accepts multiple values as a tuple)

In [122]:
'%d %s %g you' % (1, 'spam', 4.0)

'1 spam 4 you'

All types match a %s target

In [66]:
'%s -- %s -- %s' % (42, 3.14159, [1, 2, 3])             

'42 -- 3.14159 -- [1, 2, 3]'

Dealing with floating-point numbers

In [121]:
x = 1.23456789
x, '%e | %f | %g' % (x, x, x)                           

(1.23456789, '1.234568e+00 | 1.234568 | 1.23457')

Dealing with floating-point precision (with `%.*f` the precision is taken from the argument list; here the 4 gives the precision for the last value)

In [120]:
'%f, %.2f, %.*f' % (1/3.0, 1/3.0, 4, 1/3.0)

'0.333333, 0.33, 0.3333'

Dictionary style

In [119]:
'%(qty)d more %(food)s' % {'qty': 1, 'food': 'spam'}    

'1 more spam'
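The `[flags][width]` parts of the specifier can be sketched in the same way. The values below are arbitrary; the flags shown are `-` (left-justify), `0` (zero-pad) and `+` (always show the sign).

```python
padded = '%5d|%-5d|%05d' % (42, 42, 42)    # right-justified, left-justified, zero-padded
floats = '%8.3f|%+d' % (3.14159, 7)        # width 8 with 3 decimals, forced sign

print(padded)
print(floats)
```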

### New Style String Formatting `S.format()`
`[[fill]align][sign][#][0][width][,][.precision][typecode]`

`{[<name>][!<conversion>][:<format_spec>]}`  

#### The \<name\> Component

In [135]:
'{}, {} and {}'.format('spam', 'ham', 'eggs')

'spam, ham and eggs'

In [145]:
x, y, z = 1, 2, 3

'x= {0}, y= {1} and z= {baz}'.format(x, y, baz=z)

'x= 1, y= 2 and z= 3'

In [137]:
a = ['foo', 'bar', 'baz']

'{0[0]}, {0[2]}'.format(a), '{my_list[0]}, {my_list[1]}, {my_list[2]}'.format(my_list=a)

('foo, baz', 'foo, bar, baz')

In [142]:
data = {'name': 'foo', 'age': 40, 'job' : 'mgr'}

'{0[age]}'.format(data), 'name= {my_dict[name]}, job = {my_dict[job]}'.format(my_dict=data)

('40', 'name= foo, job = mgr')

In [144]:
d = {'motto':'spam','pork':'ham','food':'eggs'}
'motto = {motto}, pork = {pork} and food = {food}'.format(**d)

'motto = spam, pork = ham and food = eggs'

#### The \<conversion\> Component

In [146]:
'{0!s}'.format(42)      # Convert with str()

'42'

In [147]:
'{0!r}'.format(42)      # Convert with repr()

'42'

In [149]:
'{0!a}'.format(42)      # Convert with ascii()

'42'
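With an integer such as 42 the three conversions produce the same text. A sketch with a non-ASCII string (the word café, written with an escape) shows where they differ: `!r` adds quotes, and `!a` additionally escapes non-ASCII characters.

```python
text = 'caf\u00e9'                   # the word café

as_str = '{!s}'.format(text)         # str(): the text itself
as_repr = '{!r}'.format(text)        # repr(): the text wrapped in quotes
as_ascii = '{!a}'.format(text)       # ascii(): quoted, with é written as \xe9

print(as_str, as_repr, as_ascii)
```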

#### The \<format_spec\> Component
`:[[<fill>]<align>][<sign>][#][0][<width>][<group>][.<prec>][<type>]`

The \<type\> Subcomponent:  
  
Specifies the presentation type, which is the type of conversion performed on the corresponding argument

In [160]:
'{:d}'.format(42), '{:f}'.format(2.1), '{:s}'.format('foobar')      # Presenting int, float, and string

('42', '2.100000', 'foobar')

In [161]:
'{:x}'.format(31)                                                   # Presenting int in hexadecimal

'1f'

In [162]:
'{:b}'.format(255)                                                  # Presenting int in binary

'11111111'

In [181]:
'{:o}'.format(20)                                                   # Presenting int in octal

'24'

In [152]:
'{:c}'.format(35)

'#'

In [155]:
'{:g}'.format(3.14159), '{:g}'.format(-123456789.8765), '{:G}'.format(-123456789.8765)

('3.14159', '-1.23457e+08', '-1.23457E+08')

The \<fill\> and \<align\> Subcomponents:  
  
Control how formatted output is padded and positioned within the specified field width

In [164]:
'{0:<8s}'.format('foo'), '{0:<8d}'.format(123)      # Left aligned <

('foo     ', '123     ')

In [165]:
'{0:>8s}'.format('foo'), '{0:>8d}'.format(123)      # Right aligned >

('     foo', '     123')

In [169]:
'{0:^8s}'.format('foo'), '{0:^8d}'.format(123)      # Center aligned ^

('  foo   ', '  123   ')

In [170]:
'{0:->8s}'.format('foo'), '{0:#<8d}'.format(123), '{0:*^8s}'.format('foo')      # Fill in extra space 

('-----foo', '123#####', '**foo***')

The \<sign\> Subcomponent:  

In [174]:
'{0:+6d}'.format(123), '{0:+6d}'.format(-123)   # A sign will always be included for both positive and negative values.

('  +123', '  -123')

In [175]:
'{0:-6d}'.format(123), '{0:-6d}'.format(-123)   # Only negative numeric values will include a leading sign

('   123', '  -123')

In [173]:
'{0:*> 6d}'.format(123), '{0:*>6d}'.format(123), '{0:*> 6d}'.format(-123)   # A space sign reserves a leading space for positive values

('** 123', '***123', '**-123')

The \# Subcomponent

In [180]:
'{0:b}, {0:#b}'.format(16), '{0:o}, {0:#o}'.format(16), '{0:x}, {0:#x}'.format(16)

('10000, 0b10000', '20, 0o20', '10, 0x10')

In [182]:
'{0:.0f}, {0:#.0f}'.format(123), '{0:.0e}, {0:#.0e}'.format(123)

('123, 123.', '1e+02, 1.e+02')

The 0 Subcomponent

In [183]:
'{0:05d}'.format(123), '{0:08.1f}'.format(12.3)

('00123', '000012.3')

In [184]:
'{0:>06s}'.format('foo')

'000foo'

In [186]:
'{0:*>05d}'.format(123)     # <fill> overrides 0

'**123'

The \<width\> Subcomponent

In [187]:
'{0:8s}'.format('foo'), '{0:8d}'.format(123)

('foo     ', '     123')

In [188]:
'{0:2s}'.format('foobar')

'foobar'

The \<group\> Subcomponent:  
  
Include a grouping separator character in numeric output

In [190]:
'{0:,d}'.format(1234567), '{0:_d}'.format(1234567), '{0:,.2f}'.format(1234567.89), '{0:_.2f}'.format(1234567.89)

('1,234,567', '1_234_567', '1,234,567.89', '1_234_567.89')

In [192]:
'{0:_b}'.format(0b111010100001), '{0:_x}'.format(0xae123fcc8ab2), '{0:#_x}'.format(0xae123fcc8ab2)

('1110_1010_0001', 'ae12_3fcc_8ab2', '0xae12_3fcc_8ab2')

The .\<prec\> Subcomponent:  
  
Specifies the number of digits after the decimal point for floating point presentation types

In [193]:
'{0:8.2f}'.format(1234.5678), '{0:8.4f}'.format(1.23), '{0:8.2e}'.format(1234.5678), '{0:8.4e}'.format(1.23)

(' 1234.57', '  1.2300', '1.23e+03', '1.2300e+00')

In [194]:
'{:.4s}'.format('foobar')       # For string types, .<prec> specifies the maximum width of the converted output

'foob'

### String Interpolation / f-string

In [117]:
name = 'John'
f'Hello, {name}!'

'Hello, John!'

In [118]:
a = 5
b = 10
f'Five plus ten is {a + b} and not {2 * (a + b)}.'

'Five plus ten is 15 and not 30.'

Dictionaries 

In [87]:
comedian = {'name': 'Eric Idle', 'age': 74}
f"The comedian is {comedian['name']}, aged {comedian['age']}."

'The comedian is Eric Idle, aged 74.'

Braces

In [115]:
f"{{70 + 4}}"

'{70 + 4}'

Multiline f-string

In [88]:
name = "Eric"
profession = "comedian"
affiliation = "Monty Python"
message = (
     f"Hi {name}. "
     f"You are a {profession}. "
     f"You were in {affiliation}.")
message

'Hi Eric. You are a comedian. You were in Monty Python.'

f-String Formatting  
  
f-strings accept the same format specifiers as `.format()`

In [195]:
n = 123
f'{n:=+8}'

'+    123'

In [196]:
n = 0b111010100001
f'{n:#_b}'

'0b1110_1010_0001'
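A few more combinations, sketched with illustrative variable names. The last line assumes that a format specifier inside an f-string may itself contain a replacement field, which lets the width come from a variable.

```python
price = 1234567.891
name = 'foo'
width = 10                            # hypothetical field width chosen at run time

grouped = f'{price:,.2f}'             # grouping separator plus two decimals
centered = f'{name:*^9}'              # fill with * and center in 9 columns
nested = f'{name!r:>{width}}'         # repr() conversion, width taken from a variable

print(grouped, centered, nested, sep='\n')
```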