**Working with strings**

Working with text data requires a solid grasp of Python's string type, `str`.

1. [String Types](#string_types)
2. [String Operations](#string_ops)
3. [String Indexing and Slicing](#string_in_sl)
4. [String Methods](#string_met)
5. [String Formatting](#string_form)

In [1]:
# zen of python
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


String literals follow the lexical grammar below. This version comes from the Python 2 language reference; Python 3 drops the `ur` prefix, adds `f` prefixes for formatted string literals, and describes the `b` prefixes in a separate grammar for bytes literals.

```
stringliteral   ::=  [stringprefix](shortstring | longstring)
stringprefix    ::=  "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
                     | "b" | "B" | "br" | "Br" | "bR" | "BR"
shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::=  shortstringchar | escapeseq
longstringitem  ::=  longstringchar | escapeseq
shortstringchar ::=  <any source character except "\" or newline or the quote>
longstringchar  ::=  <any source character except "\">
escapeseq       ::=  "\" <any ASCII character>
```

## String Types <a class='anchor' id='string_types'></a>

In [2]:
new_string = "This is a String"  # storing a string

print(f'ID: {id(new_string)}')  # shows the object identifier (address)
print(f'Type: {type(new_string)}')  # shows the object type
print(f'Value: {new_string}')  # shows the object value

ID: 2536809233344
Type: <class 'str'>
Value: This is a String


In [3]:
# simple string
simple_string = 'Hello!' + " I'm a simple string"
print(simple_string)

Hello! I'm a simple string


In [4]:
# multi-line string; each line break is stored as a \n (newline) character
multi_line_string = """Hello I'm
a multi-line
string!"""

multi_line_string

"Hello I'm\na multi-line\nstring!"

In [5]:
print(multi_line_string)

Hello I'm
a multi-line
string!


In [6]:
# normal string: \t, \n and \f are read as escape sequences, corrupting the file path
escaped_string = "C:\the_folder\new_dir\file.txt"
print(escaped_string)  # opening a file at this path would fail

C:	he_folder
ew_dirile.txt


In [7]:
# raw string: backslashes are kept literally instead of starting escape sequences
raw_string = r'C:\the_folder\new_dir\file.txt'
print(raw_string)

C:\the_folder\new_dir\file.txt
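
To see what each literal actually stores, the strings can be compared directly. This is a minimal sketch; the name `doubled_backslashes` is illustrative and does not appear in the cells above.

```python
# Three spellings of the same path; only the first one is damaged by escapes.
escaped_string = "C:\the_folder\new_dir\file.txt"          # \t, \n, \f become single characters
raw_string = r'C:\the_folder\new_dir\file.txt'             # raw literal: backslashes kept as-is
doubled_backslashes = 'C:\\the_folder\\new_dir\\file.txt'  # each \\ is one literal backslash

print(raw_string == doubled_backslashes)     # True
print(len(escaped_string), len(raw_string))  # 27 30
print(repr(escaped_string))                  # repr makes the hidden control characters visible
```

The raw literal and the doubled-backslash literal produce the same string, so either form is safe for Windows-style paths.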


In [8]:
# unicode string literals
string_with_unicode = 'H\u00e8llo!'
print(string_with_unicode)

Hèllo!


In [9]:
more_unicode = 'I love Pizza 🍕!  Shall we book a cab 🚕 to get pizza?'
print(more_unicode)

I love Pizza 🍕!  Shall we book a cab 🚕 to get pizza?


In [10]:
print(f"{string_with_unicode} \n{more_unicode}")

Hèllo! 
I love Pizza 🍕!  Shall we book a cab 🚕 to get pizza?


In [11]:
' '.join([string_with_unicode, more_unicode])

'Hèllo! I love Pizza 🍕!  Shall we book a cab 🚕 to get pizza?'

In [12]:
more_unicode[::-1]  # reverses the string

'?azzip teg ot 🚕 bac a koob ew llahS  !🍕 azziP evol I'
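
Indexing, slicing and `len()` on a `str` all operate on Unicode code points rather than bytes. The sketch below illustrates the difference using UTF-8 as an assumed encoding; the name `encoded` is introduced only for this example.

```python
string_with_unicode = 'H\u00e8llo!'

# len() counts code points, not bytes
print(len(string_with_unicode))   # 6

# UTF-8 needs two bytes for 'è', so the encoded bytes object is longer
encoded = string_with_unicode.encode('utf-8')
print(len(encoded))               # 7

# decoding the bytes restores the original text
print(encoded.decode('utf-8') == string_with_unicode)  # True
```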

## String Operations <a class='anchor' id='string_ops'></a>

### Different ways to concatenate strings

In [13]:
'Hello 😊' + ' and welcome ' + 'to Python 🐍!'

'Hello 😊 and welcome to Python 🐍!'

In [14]:
'Hello 😊' ' and welcome ' 'to Python 🐍!'

'Hello 😊 and welcome to Python 🐍!'

In [15]:
# concatenation of variables and literals

In [16]:
s1 = 'Python 💻!'
'Hello 😊 ' + s1

'Hello 😊 Python 💻!'

In [18]:
# repeating strings with * and combining repetition with +

In [19]:
s2 = '--🐍Python🐍--'
s2 * 5

'--🐍Python🐍----🐍Python🐍----🐍Python🐍----🐍Python🐍----🐍Python🐍--'

In [20]:
s1 + s2

'Python 💻!--🐍Python🐍--'

In [21]:
(s1 + s2)*3

'Python 💻!--🐍Python🐍--Python 💻!--🐍Python🐍--Python 💻!--🐍Python🐍--'

In [22]:
# concatenating several strings together in parentheses

In [23]:
s3 = ('This '
      'is another way '
      'to concatenate '
      'several strings!')
s3

'This is another way to concatenate several strings!'
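
When the pieces are already in a list, `str.join` is a common alternative to `+`. A small sketch, with `words` and `joined` as illustrative names:

```python
# Join a list of words with a single space between them
words = ['This', 'is', 'another', 'way', 'to', 'concatenate', 'several', 'strings!']
joined = ' '.join(words)

print(joined)       # This is another way to concatenate several strings!
print(len(joined))  # 51
```

Unlike adjacent literals in parentheses, `join` works on values computed at run time and inserts the separator for you.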

In [24]:
# checking for substrings in a string

In [25]:
'way' in s3

True

In [26]:
'python' in s3

False
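
The `in` operator is case-sensitive, and it only reports whether a match exists. The sketch below shows a case-insensitive check and uses `str.find` to get the position of a match; `s3` is redefined here so the block runs on its own.

```python
s3 = 'This is another way to concatenate several strings!'

print('this' in s3)          # False: the text starts with an uppercase 'T'
print('this' in s3.lower())  # True: compare against a lowercased copy

# find() returns the index of the first match, or -1 when there is none
print(s3.find('way'))        # 16
print(s3.find('python'))     # -1
```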

In [27]:
# computing total length of the string

In [28]:
len(s3)

51

## String Indexing and Slicing <a class='anchor' id='string_in_sl'></a>

In [29]:
# creating a string
s = 'PYTHON'
s, type(s)

('PYTHON', str)

### String indexing

In [30]:
# depicting string indexes
for index, character in enumerate(s):
    print(f'Character -> {character} has index-> {index}')

Character -> P has index-> 0
Character -> Y has index-> 1
Character -> T has index-> 2
Character -> H has index-> 3
Character -> O has index-> 4
Character -> N has index-> 5


In [31]:
s[0], s[1], s[2], s[3], s[4], s[5]

('P', 'Y', 'T', 'H', 'O', 'N')

In [32]:
s[-1], s[-2], s[-3], s[-4], s[-5], s[-6]

('N', 'O', 'H', 'T', 'Y', 'P')
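
Negative indexes count from the end, so `s[-k]` refers to the same character as `s[len(s) - k]`. A short sketch of that relationship, plus what happens with an index that is out of range:

```python
s = 'PYTHON'

# s[-k] and s[len(s) - k] name the same character
for k in range(1, len(s) + 1):
    print(-k, len(s) - k, s[-k], s[-k] == s[len(s) - k])

# indexing past the end raises IndexError
try:
    s[6]
except IndexError as error:
    print(f'IndexError: {error}')
```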

### String slicing

In [33]:
s[:] 

'PYTHON'

In [34]:
s[1:4]

'YTH'

In [35]:
s[:3], s[3:]

('PYT', 'HON')

In [36]:
s[-3:]

'HON'

In [37]:
s[:3] + s[3:]

'PYTHON'

In [38]:
s[:3] + s[-3:]

'PYTHON'

### String slicing with a step

In [39]:
s[::1]  # step of 1 (the default)

'PYTHON'

In [40]:
s[::2]  # take every 2nd character of the string

'PTO'
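
The third value in a slice is the step. It can be combined with start and stop values, and it can be negative to walk backwards. A few more cases as a sketch; `every_second` is an illustrative name for a reusable `slice` object.

```python
s = 'PYTHON'

print(s[1::2])    # 'YHN': every 2nd character, starting at index 1
print(s[::-1])    # 'NOHTYP': a negative step reverses the string
print(s[4:1:-1])  # 'OHT': from index 4 down to (but not including) index 1

# a slice object stores start, stop and step for reuse
every_second = slice(None, None, 2)
print(s[every_second])  # 'PTO'

# unlike indexing, slicing past the end does not raise an error
print(s[2:100])   # 'THON'
```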

### String Immutability

In [41]:
# strings are immutable, so item assignment raises a TypeError
s[0] = 'X'

TypeError: 'str' object does not support item assignment

**Note**: the error raised above stops a "Run All" execution, so run the following code cells manually by clicking the Run button.

In [42]:
print(f'Original String id: {id(s)}')
# creates a new string
s = 'X' + s[1:]
print(s)
print(f'New String id: {id(s)}')

Original String id: 2536804491312
XYTHON
New String id: 2536837197104
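
When several characters need to change, one common approach is to convert the string to a list, edit the list in place, and join it back. A minimal sketch, with `chars` and `modified` as illustrative names:

```python
s = 'PYTHON'

chars = list(s)            # lists are mutable, unlike strings
chars[0] = 'X'             # item assignment works on the list
modified = ''.join(chars)  # build a new string from the edited list

print(modified)  # XYTHON
print(s)         # PYTHON: the original string is unchanged
```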


## String Methods <a class='anchor' id='string_met'></a>

### Case conversions

In [43]:
s = 'python is great'

In [44]:
s.capitalize()

'Python is great'

In [45]:
s.upper()

'PYTHON IS GREAT'

In [46]:
s.title()

'Python Is Great'
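
A few related case methods, shown as a sketch. The German word `'Straße'` is an example chosen here to show how `casefold` differs from `lower`.

```python
title = 'Python Is Great'

print(title.lower())     # 'python is great'
print(title.swapcase())  # 'pYTHON iS gREAT'

# casefold() is a more aggressive lower() meant for caseless comparison
word = 'Straße'
print(word.lower())      # 'straße'
print(word.casefold())   # 'strasse'
print(word.casefold() == 'STRASSE'.casefold())  # True
```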

### String replace

In [47]:
s.replace('python', 'NLP')

'NLP is great'

### Numeric checks

In [48]:
'12345'.isdecimal()

True

In [49]:
'apollo11'.isdecimal()

False
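
`isdecimal` has two close relatives, `isdigit` and `isnumeric`, which accept progressively more characters. None of them accepts a sign or a decimal point. The sketch below compares them; `is_int` is a hypothetical helper, not a built-in, that relies on `int()` instead.

```python
# Compare the three checks on a few sample strings
for text in ['12345', '²', '½', '-5', '3.14']:
    print(repr(text), text.isdecimal(), text.isdigit(), text.isnumeric())


def is_int(text):
    """Return True if int() can parse the text (hypothetical helper)."""
    try:
        int(text)
    except ValueError:
        return False
    return True


print(is_int('-5'))    # True: int() understands the sign
print(is_int('3.14'))  # False: not an integer literal
```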

### Alphabet checks

In [50]:
'python'.isalpha()

True

In [51]:
'number1'.isalpha()

False

### Alphanumeric checks

In [52]:
'total'.isalnum()

True

In [53]:
'abc123'.isalnum()

True

In [54]:
'1+1'.isalnum()

False

### String splitting and joining

In [55]:
s = 'I,am,a,comma,separated,string'
s.split(',')

['I', 'am', 'a', 'comma', 'separated', 'string']

In [56]:
' '.join(s.split(','))

'I am a comma separated string'

In [57]:
# stripping whitespace characters
s = '   I am surrounded by spaces    '
s

'   I am surrounded by spaces    '

In [58]:
s.strip()

'I am surrounded by spaces'

In [59]:
sentences = 'Python is great. NLP is also good.'
sentences.split('.')

['Python is great', ' NLP is also good', '']

In [60]:
print('\n'.join(sentences.split('.')))

Python is great
 NLP is also good



In [61]:
print('\n'.join([sentence.strip() for sentence in sentences.split('.') if sentence]))

Python is great
NLP is also good
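
`split` has a few variants that are useful when cleaning text. The strings below are made-up examples, not data from the cells above.

```python
record = 'key=value=with=equals'

# maxsplit limits the number of splits
print(record.split('=', 1))    # ['key', 'value=with=equals']

# partition splits once and also returns the separator
print(record.partition('='))   # ('key', '=', 'value=with=equals')

# with no argument, split() collapses runs of whitespace
print('a  b   c'.split())      # ['a', 'b', 'c']
print('a  b'.split(' '))       # ['a', '', 'b']

# splitlines() splits on line boundaries
print('line one\nline two'.splitlines())  # ['line one', 'line two']
```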


## String Formatting <a class='anchor' id='string_form'></a>

### Simple string formatting expressions - old style

In [62]:
'Hello %s' % 'Python!'

'Hello Python!'

In [63]:
'Hello %s %s' % ('World!', 'How are you?')

'Hello World! How are you?'

### Formatting expressions with different data types - old style

In [64]:
'We have %d %s containing %.2f gallons of %s' % (2, 'bottles', 2.5, 'milk')

'We have 2 bottles containing 2.50 gallons of milk'

In [65]:
# %d truncates the float 5.21 to the integer 5; %.2f rounds to two decimal places
'We have %d %s containing %.2f gallons of %s' % (5.21, 'jugs', 10.86763, 'juice')

'We have 5 jugs containing 10.87 gallons of juice'
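
Old-style formatting also accepts a dictionary with named placeholders, and it has a known pitfall when the value being formatted is itself a tuple. A sketch with illustrative names (`values`, `by_name`, `pair`):

```python
# named placeholders take their values from a dict
values = {'count': 2, 'container': 'bottles', 'drink': 'milk'}
by_name = 'We have %(count)d %(container)s of %(drink)s' % values
print(by_name)  # We have 2 bottles of milk

# a tuple on the right-hand side is unpacked into the placeholders,
# so formatting a tuple as one value needs a one-element wrapper tuple
pair = (1, 2)
try:
    'Pair: %s' % pair
except TypeError as error:
    print(f'TypeError: {error}')

print('Pair: %s' % (pair,))  # Pair: (1, 2)
```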

### Formatting strings using the format method - new style

In [66]:
'Hello {} {}, it is a great {} to meet you at {}'.format('Mr.', 'Jones', 'pleasure', 5)

'Hello Mr. Jones, it is a great pleasure to meet you at 5'

In [67]:
'Hello {} {}, it is a great {} to meet you at {} o\' clock'.format('Sir', 'Arthur', 'honor', 9)

"Hello Sir Arthur, it is a great honor to meet you at 9 o' clock"

### Formatting strings using f-strings - new style

In [68]:
f'Hello {"Mr."} {"Jones"}, it is a great {"pleasure"} to meet you at {5}'

'Hello Mr. Jones, it is a great pleasure to meet you at 5'
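
In practice the braces usually hold variables or expressions rather than literals, optionally followed by a format specification after a colon. A minimal sketch with made-up variable names:

```python
name = 'Jones'
gallons = 10.86763
count = 5

rounded = f'{gallons:.2f}'  # two decimal places
padded = f'{count:03d}'     # zero-padded to width 3
aligned = f'{name:>10}'     # right-aligned in a field of width 10
quoted = f'{name!r}'        # !r applies repr() to the value
doubled = f'{count * 2}'    # any expression is allowed inside the braces

print(rounded)  # 10.87
print(padded)   # 005
print(aligned)  #      Jones
print(quoted)   # 'Jones'
print(doubled)  # 10
```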

### Alternative ways of using string format

In [69]:
'I have a {food_item} and a {drink_item} with me'.format(drink_item='soda', food_item='sandwich')

'I have a sandwich and a soda with me'

In [70]:
'The {animal} has the following attributes: {attributes}'.format(animal='dog', attributes=['lazy', 'loyal'])

"The dog has the following attributes: ['lazy', 'loyal']"