# String
A string is a data type used in programming, like an integer or a floating-point number, but it represents text rather than numbers. It is a sequence of characters, which can include letters, digits, spaces, and symbols.

## Topics
* String Data Type
* String Indexing & Slicing
* Escape Characters
* String Formatting
* Find in String
* String Constants
* String Functions

### String Data Type
Strings in Python are surrounded by either single or double quotation marks. The two are interchangeable, so `'hello'` is the same as `"hello"`. You can display a string with the `print()` function, just like any other data type.

In [1]:
var = "Hello World"

In [2]:
var = 'Hello World'

Python doesn't have a separate character data type. A single character is simply a string of length one.

In [3]:
type("hello")

str

In [4]:
type('a')

str

#### Quotes
We cannot use an unescaped single quote inside a single-quoted string.

In [5]:
print('Hey it's me.')

SyntaxError: ignored

But we can use a single quote inside a double-quoted string, or vice versa.

In [6]:
print("Hey, it's me.")

Hey, it's me.


**To insert characters that are illegal in a string, use an escape character.**  
An escape character is a backslash `\` followed by the character you want to insert.

In [7]:
print("He said, \"I am busy today.\"")

He said, "I am busy today."


#### Multiline String

We can use `''' '''` or `""" """` to write a multiline string.

In [8]:
multiline = """Hello
World"""
print(multiline)

Hello
World


In [9]:
# Alternative
s = "Hello\nWorld"
print(s)

Hello
World


#### Concatenation
Python allows us to easily concatenate strings using `+`.

In [10]:
name = "Michael"
print("Hello, " + name)

Hello, Michael


In [11]:
print("Hello, " + "Michael")

Hello, Michael


You can only concatenate a string to another string.

In [12]:
"Michael's age is " + 30 # TypeError

TypeError: ignored

**Concatenating Integers to Strings:** In Python, only a string can be concatenated with another string. To concatenate a string with a value of another data type, first convert that value to a string using the `str()` function or some other method. If your goal is only to print, you can pass the values to `print()` separated by commas, without any conversion. `print()` then inserts a space between them.

In [13]:
print("Michael's age is", 30)

Michael's age is 30


In [14]:
age = 30
print("Michael's age is " + str(age))

Michael's age is 30


In [15]:
print("Michael's age is " + str(30))

Michael's age is 30


`str(value)` is a function that returns the string representation of a value, whatever its type (integer, float, and so on).

In [16]:
print("Hello " + str(12731.1273618))

Hello 12731.1273618
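
As a small sketch of the conversion described above, the snippet below applies `str()` to values of a few built-in types before concatenating them. The variable names (`count`, `price`, `in_stock`) are made up for illustration.

```python
# Illustrative values of three different built-in types.
count = 3
price = 9.5
in_stock = True

# Each value is converted with str() so that + joins strings only.
message = "Count: " + str(count) + ", price: " + str(price) + ", in stock: " + str(in_stock)
print(message)  # Count: 3, price: 9.5, in stock: True
```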


#### String Multiplication
If we want to repeat a string, we can use the multiplication operator `*` on that string with the desired number of repeats.

In [17]:
print("ABC" * 3)

ABCABCABC


Since the result is a string, it can be concatenated to other strings as well.

In [18]:
print("p" * 3 + "q" * 4 + "r" * 2)

pppqqqqrr
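
One possible use of string multiplication is drawing a separator whose width depends on other text. The sketch below assumes a hypothetical `title` variable and uses `len()`, which is covered in the next subsection.

```python
title = "Report"  # hypothetical heading text

# Repeat '-' once per character of the title to draw an underline.
underline = "-" * len(title)

print(title)
print(underline)  # ------
```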


#### String Length
The `len()` function tells us how many characters a string contains.

In [None]:
len("hello")

5

In [None]:
s = "Hello World"
len(s)

11

Keep in mind that an escape sequence like `\n` counts as a single character.

In [None]:
len("AB\nCD")

5

`len()` is a function that returns the length of an object.

### String Indexing & Slicing
Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type. A character is simply a string of size one. Python also allows negative indexing, with the last (rightmost) character having index -1, the second-last character having index -2, and so on. Given below is an example for the string "Hello".

| String | H | e | l | l | o |
| --- | --- | --- | --- | --- | --- |
| Index | 0 | 1 | 2 | 3 | 4 |
| Index (-ve) | -5 | -4 | -3 | -2 | -1 |

**Indexing**

In [19]:
"Hello"[0]

'H'

In [20]:
"Hello"[-1] # Negative Indexing

'o'

In [21]:
"Hello"[4]

'o'

In [22]:
s = "hello"
s[len(s) - 1]

'o'

In [23]:
s = "Hello World"

In [24]:
s[0]

'H'

In [25]:
s[7]

'o'

In [26]:
s[-4]

'o'

In [27]:
len(s)

11

The next cells show what happens when an index falls outside the string. The valid range of positive indexes is `0` to `len(s) - 1`, and the valid range of negative indexes is `-1` (rightmost) to `-len(s)` (leftmost).

In [29]:
s[11]

IndexError: ignored

In [30]:
s[-11]

'H'

In [31]:
s[-12]

IndexError: ignored
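
The relationship between the two index ranges can be checked directly. This is a minimal sketch using the same string as above, where `n` is just a helper name for the length.

```python
s = "Hello World"
n = len(s)  # 11

# s[-k] and s[n - k] refer to the same character.
print(s[-4], s[n - 4])    # o o
print(s[-11], s[n - 11])  # H H
```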

**Slicing:** `string_name[start_index:end_index:step]`  
The string is sliced from index `start_index` to `end_index - 1`. To make this easy to follow, we will use the string `"0123456789"`, where the character at each index is the same as its positive index.

In [32]:
s = "0123456789"

In [33]:
# Using only end index
# Here we get from index 0 to index 7 (end_index - 1)
s[:8]

'01234567'

In [34]:
# Using only start index
# Here we get from index 4 (start_index) to the end of the string
s[4:]

'456789'

In [35]:
# Using both start index and end index
# Here we get from index 4 (start_index) to index 7 (end_index - 1)
s[4:8]

'4567'

In [36]:
# If a single : is used without start or end, then we get a copy of the entire string
s[:]

'0123456789'

In [37]:
# Using step
# Step can be used with any of the above configurations by adding another colon and a value
# Here we get from index 1 to index 7 with a step of 2
s[1:8:2]

'1357'

In [39]:
# It may be easier to understand if you can try to think of step as follows
s[1] + s[1 + 2] + s[1 + 2 + 2] + s[1 + 2 + 2 + 2] 
# Any more values will exceed the start_index to (end_index - 1) range

'1357'
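
Once you have learned loops, the same idea can be written out with `range()`, which takes the same start, end, and step values. The sketch below rebuilds the slice one character at a time. The names `start`, `end`, `step`, and `picked` are chosen here for illustration.

```python
s = "0123456789"
start, end, step = 1, 8, 2

# Visit start, start + step, ... while the index stays below end.
picked = ""
for i in range(start, end, step):
    picked += s[i]

print(picked)                       # 1357
print(picked == s[start:end:step])  # True
```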

### Escape Characters 
We use escape characters to add certain characters that cannot be typed on a keyboard or would otherwise be illegal in a string. An escape character consists of a `\` followed by a character. The whole escape character, including the backslash, is counted as a single character.

| Escape Sequence | Meaning |
| :-- | :-- |
| \\\\ | Backslash (\\) |
| \\' | Single quote (') |
| \\" | Double quote (") |
| \a | Bell |
| \b | Backspace |
| \f | Formfeed |
| \n | Linefeed (New Line) |
| \r | Carriage Return |
| \t | Horizontal Tab |
| \v | Vertical Tab |

*A few examples:-*

In [40]:
print("Hello\nWorld")

Hello
World


In [41]:
print("Hello\tWorld")

Hello	World


In [42]:
print("Single: \', Double: \"")

Single: ', Double: "


In [43]:
print("Escape Character for new line: \\n")

Escape Character for new line: \n


In [44]:
print("Backslash: \\")

Backslash: \


In [45]:
len("\\")

1

In [46]:
len("\\n") # Here '\\' is one character and 'n' is the other character

2

### String Formatting
Python allows you to do `%` formatting as in other languages like C, C++, Java, etc. However, once you start using several parameters and longer strings, your code quickly becomes much less readable. This kind of formatting is also verbose and error-prone, for example when displaying tuples or dictionaries. We will discuss two methods for formatting strings: `format()` and f-strings (`f""`).

**format():** The `format()` method formats the specified values and inserts them inside the string's placeholder. The placeholder is defined using curly brackets: `{}`. The `format()` method returns the formatted string.

In [47]:
name = "Michael"
sentence = "My name is {}."
print(sentence.format(name))

My name is Michael.


You can use multiple placeholders. The data type doesn't matter, because the values are automatically converted to strings.

In [48]:
name = "Michael"
age = 30
sentence = "My name is {} and my age is {}."
print(sentence.format(name, age))

My name is Michael and my age is 30.


The placeholders can be identified using a named index (e.g. `{price}`) or a numbered index (e.g. `{0}`).

In [49]:
name = "Michael"
age = 30
print("My name is {person_name} and I am {person_age} years old."
      .format(person_age=age, person_name=name))

My name is Michael and I am 30 years old.


In [50]:
name = "Michael"
age = 30
print("My name is {1} and I am {0} years old.".format(age, name))

My name is Michael and I am 30 years old.
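
A numbered index can also appear more than once, in which case the same argument is inserted at each position. A short sketch, with an illustrative sentence:

```python
name = "Michael"

# {0} appears twice, so the first argument is inserted in both places.
text = "{0} is here. Say hello to {0}, who is {1} years old.".format(name, 30)
print(text)  # Michael is here. Say hello to Michael, who is 30 years old.
```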


**f-Strings:** They are string literals that have an `f` at the beginning and curly braces `{}` containing expressions that will be replaced with their values. The expressions are evaluated at runtime and then formatted. f-strings are generally faster than both `%` formatting and `str.format()`. *A drawback of f-strings is that, before Python 3.12, the `{}` part cannot contain a backslash and therefore cannot contain escape sequences.*

In [51]:
age = 30
print(f"I am {age} years old.")

I am 30 years old.


In [52]:
name = "Michael"
age = 30
print(f"{name} is {age} years old.")

Michael is 30 years old.


In [53]:
name = "Michael"
age = 30
print(f"{name} is {age * 365 * 24 * 3600} seconds old.")

Michael is 946080000 seconds old.


##### Without f-String

In [54]:
name = "Michael"
age = 30
print(name + " is " + str(age * 365 * 24 * 3600) + " seconds old.")

Michael is 946080000 seconds old.


##### Drawback

In [55]:
print(f"Hello {'\n'} World")

SyntaxError: ignored
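
One common workaround for the restriction shown above is to store the escape sequence in a variable first and reference that variable inside the braces. The variable name `newline` is just an illustrative choice.

```python
# Keep the backslash outside the {} part by storing it in a variable.
newline = "\n"
text = f"Hello{newline}World"
print(text)
```

This prints `Hello` and `World` on separate lines, and it works regardless of the Python version.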

**Important:** Keep in mind that the string doesn't get updated if any of the values get changed.

In [56]:
age = 30
text = f"I am {age} years old."
age = 40
print(text)

I am 30 years old.


In [57]:
age = 30
text = "I am {} years old.".format(age)
age = 40
print(text)

I am 30 years old.


#### Formatting Options
| Format | Action |
| :-- | :-- |
| :< | Left align |
| :> | Right align |
| :^ | Center align |
| := | Places the sign at the leftmost position |
| :+ | Show a sign for both +ve and -ve numbers |
| :- | Show a sign only for -ve numbers |
| :(space) | Extra space before +ve and '-' before -ve numbers |
| :, | Use comma as thousand separator |
| :_ | Use underscore as thousand separator |
| :b | Binary format |
| :c | Converts the value to the corresponding Unicode character |
| :d | Decimal format |
| :e | Scientific Format |
| :E | Scientific Format (uppercase) |
| :f | Fixed point number format |
| :F | Fixed point number format (uppercase) |
| :g | General format |
| :G | General format (uppercase) |
| :o | Octal format |
| :x | Hexadecimal format |
| :X | Hexadecimal format (uppercase) |
| :n | Number format |
| :% | Percentage format |

*Examples*

In [58]:
print(f"This item costs Rs. {99:.2f}")
# The value will be printed as a float with 2 places after decimal

This item costs Rs. 99.00


In [59]:
print(f"{'1':>5}")
# :>5 -> The output will be right aligned with a length of 5 (thus 4 spaces)

    1


In [60]:
print(f"{'Name':<10}{'Marks':<10}{'Grade':<10}")

print(f"{'John':<10}{'10':<10}{'A':<10}")
print(f"{'Dave':<10}{'20':<10}{'A':<10}")
print(f"{'Dan':<10}{'30':<10}{'A+':<10}")

Name      Marks     Grade     
John      10        A         
Dave      20        A         
Dan       30        A+        


In [61]:
print(f"A{'B':^5}C")
# 'B' will have a length of 5 and will be center aligned

A  B  C


In [62]:
print(f"{19:b}") # 10011 is the binary representation of 19

10011


In [63]:
print(f"{19:o}") # 23 is the octal representation of 19

23


In [64]:
print(f"{29:x}") # 1d is the hex representation of 29 (in lowercase)

1d


In [65]:
print(f"{29:X}") # 1D is the hex representation of 29 (in uppercase)

1D


In [66]:
print(f"{2113123412:,}")

2,113,123,412


In [67]:
print(f"{2113123412:_}")

2_113_123_412


In [68]:
print(f"{'*' * 1:^9}")
print(f"{'*' * 3:^9}")
print(f"{'*' * 5:^9}")
print(f"{'*' * 7:^9}")
print(f"{'*' * 9:^9}")

    *    
   ***   
  *****  
 ******* 
*********


In [69]:
s = "*" * 5
f"{s:^9}"

'  *****  '

After learning about loops, you can use the following syntax.

In [70]:
for i in range(1, 10, 2): 
    print(f"{'*' * i:^9}")

    *    
   ***   
  *****  
 ******* 
*********


A bit more advanced syntax.

In [71]:
print("\n".join([f"{'*' * i:^9}" for i in range(1, 10, 2)]))

    *    
   ***   
  *****  
 ******* 
*********
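
A few options from the table are not covered by the examples above. The sketch below tries the sign, percentage, and scientific formats, and combines the comma separator with a fixed number of decimals. The values `value`, `ratio`, and `big` are arbitrary.

```python
value = 42
ratio = 0.256
big = 1234567.891

print(f"{value:+}")    # +42
print(f"{-value:+}")   # -42
print(f"{ratio:%}")    # 25.600000%
print(f"{ratio:.1%}")  # 25.6%
print(f"{big:e}")      # 1.234568e+06
print(f"{big:,.2f}")   # 1,234,567.89
```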


### Find in String
You can check whether a substring of any length is present in a string using the keyword `in`. The result is a boolean, which you will learn about in the next file.

In [75]:
"name" in "My name is John"

True

In [76]:
"age" in "My name is John"

False

In [77]:
"name" in "My Name is John" # case sensitive

False
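
If the check should ignore case, one simple approach is to convert the text to lower case before searching. This sketch uses the `lower()` method, which is described in the String Functions section, and assumes the search term is already in lower case.

```python
text = "My Name is John"

# Lower-case the text so that "Name" matches the lower-case search term.
found = "name" in text.lower()
print(found)  # True
```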

### String Constants
The `string` library provides several useful constants. We can import a library with `import library_name`.

In [78]:
import string

In [79]:
string.ascii_letters # A constant in the string library

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [80]:
string.ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

In [81]:
string.ascii_uppercase

'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [82]:
string.digits

'0123456789'

In [83]:
string.hexdigits

'0123456789abcdefABCDEF'

In [84]:
string.octdigits

'01234567'

In [85]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [86]:
string.printable

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

In [87]:
string.whitespace

' \t\n\r\x0b\x0c'
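
These constants are ordinary strings, so they combine naturally with the `in` keyword. As a rough sketch, the snippet below keeps only the digits of a made-up `text` value. It uses `join()` and a loop-style expression, both of which are covered later.

```python
import string

text = "Room 42, Floor 7!"  # illustrative input

# Keep only the characters that appear in string.digits.
digits_only = "".join(ch for ch in text if ch in string.digits)
print(digits_only)  # 427

# Check whether the first character is an ASCII uppercase letter.
print(text[0] in string.ascii_uppercase)  # True
```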

### String Functions
In Python, strings have many useful built-in methods that can be called on a string object. In this section, some of the most useful ones will be discussed. You can check all the available methods and their usage on [this page](https://www.python-ds.com/python-3-string-methods).

The string methods can be used as follows: *string_object.method_name()*  
Examples:  
`s = 'Hello'`  
`s.upper()`  
or, `"Hello".upper()`  
*Output will be `"HELLO"`*


**Note: All string methods return new values. They do not change the original string.**

**capitalize():** Capitalize the first letter of the string and convert the rest to lower case

In [88]:
"my name is John. i am 30 years old".capitalize()

'My name is john. i am 30 years old'

**lower():** Convert string to lower case 

In [89]:
"Hello WORLD".lower()

'hello world'

**upper():** Convert string to upper case

In [90]:
"Hello WORLD".upper()

'HELLO WORLD'

*These functions don't change the original string. They return the modified string.*

In [91]:
s = "Hello World"
print(s.upper())
print(s)

HELLO WORLD
Hello World


**title():** Capitalize first letter of each word

In [92]:
"my name is John".title()

'My Name Is John'

**swapcase():** Toggle the case of each character. (All these string methods can be chained, e.g. `s.upper().swapcase()`. Keep in mind that using `swapcase()` twice may not return the original string.)

In [93]:
"Hello WORLD".swapcase()

'hELLO world'

*Chaining functions*

In [94]:
"my name is John. i am 30 years old".title().swapcase()

'mY nAME iS jOHN. i aM 30 yEARS oLD'
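
The remark that `swapcase()` applied twice may not give back the original string can be seen with characters whose upper-case form has a different length. The sketch below uses the German letter `ß`, whose upper-case form is `SS`. The word itself is only an example.

```python
word = "Straße"  # contains the German letter 'ß'

once = word.swapcase()
twice = once.swapcase()

print(once)           # sTRASSE
print(twice)          # Strasse
print(twice == word)  # False
```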

**isalnum(), isalpha(), isdigit(), islower(), isupper(), etc.**

In [95]:
"2367189".isdigit()

True

In [96]:
"hello".islower()

True

In [97]:
"HELLO".isupper()

True

In [98]:
"Hello world".istitle()

False

You can check out all the 'is' functions in the link given at the beginning of this section.

Check if valid identifier (name of variable)

In [99]:
"temp100".isidentifier() 

True

In [100]:
"123abcd".isidentifier()

False

**replace():** It returns a string where every occurrence of a given substring is replaced with another substring. The substrings can be of any length.

In [101]:
"Hello World".replace("Hello", "Goodbye")

'Goodbye World'

In [102]:
"Hello".replace("l", "*")

'He**o'

In [103]:
s = "Hello World"
print(s.replace('l', '*', 2)) # Maximum times to replace
print(s) # You can see only 2 'l' were replaced with '*'

He**o World
Hello World


**find(), index():** Used to get the index of the first occurrence of a substring.

In [104]:
"Hello World".index("W")

6

In [106]:
"Hello World".index('w') # index() raises ValueError if not found

ValueError: ignored

In [108]:
"Hello World".find("World") # Substring can be of any length

6

In [109]:
"Hello World".find("w") # find() returns -1 if not found

-1
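
Because the two methods report a missing substring differently, the surrounding code looks different too. A minimal sketch: `find()` is checked against `-1`, while `index()` is wrapped in `try`/`except`, a construct not covered in this file.

```python
text = "Hello World"

# find() signals "not found" with -1, so the result can be checked directly.
position = text.find("w")
if position == -1:
    result = "not found"
else:
    result = f"found at index {position}"
print(result)  # not found

# index() raises ValueError instead, so it needs try/except.
try:
    text.index("w")
except ValueError:
    print("index() raised ValueError")
```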

**count():** Counts the number of occurrences of a substring in the string

In [110]:
"Hello World".count("l")

3

In [111]:
"My name is John. My age is 30.".count("is")

2

**split():** The `split()` method breaks up a string at the specified separator and returns a list of strings. (You will learn about lists in a later section.)

It has 2 optional parameters:
* `sep`: The delimiter at which the string is split. If it is not specified, any run of whitespace (space, newline, etc.) acts as a separator.
* `maxsplit`: The maximum number of splits. The default value is -1, meaning no limit on the number of splits.

*It will be very useful later when taking multiple inputs from the user.*

##### Splitting string at whitespaces (Check string.whitespace)

In [112]:
"My name is John.\nMy age is 30.".split()

['My', 'name', 'is', 'John.', 'My', 'age', 'is', '30.']

##### Splitting string at newline

In [113]:
"My name is John.\nMy age is 30.".split('\n')

['My name is John.', 'My age is 30.']

##### In the following code we are splitting the string at 'is'. Whatever is used for splitting is not included in result

In [114]:
"My name is John.\nMy age is 30.".split('is')

['My name ', ' John.\nMy age ', ' 30.']

##### In the following code we make at most 3 splits, so the rest of the string becomes the fourth element.

In [115]:
"1 2 3 4 5 6".split(maxsplit=3)

['1', '2', '3', '4 5 6']
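
The separator and `maxsplit` can also be combined. As an illustrative sketch, the hypothetical `line` below is split only at the first `=`, so the part after it may itself contain `=`.

```python
line = "name=John=Smith"  # hypothetical "key=value" text

# Split only at the first '=' so the value keeps any later '='.
key, value = line.split("=", 1)

print(key)    # name
print(value)  # John=Smith
```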

**join():** The `join()` method is the opposite of the `split()` method. It joins a list or tuple of strings into a single string, placing the given separator between the elements.

In [116]:
" ".join(['My', 'name', 'is', 'John']) # Using space as separator

'My name is John'

In [117]:
"#".join(['A', 'B', 'C', 'D']) # Using '#' as separator

'A#B#C#D'

In [118]:
'is'.join(['My name ', ' John.\nMy age ', ' 30.'])

'My name is John.\nMy age is 30.'
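
`join()` accepts only strings, so other values have to be converted with `str()` first. The sketch below shows that with a made-up list of numbers, then checks that splitting and joining with the same separator restores the original string.

```python
numbers = [1, 2, 3]

# join() only accepts strings, so convert each item with str() first.
joined = ", ".join(str(n) for n in numbers)
print(joined)  # 1, 2, 3

# split() followed by join() with the same separator restores the string.
text = "My name is John"
print(" ".join(text.split(" ")) == text)  # True
```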

**There are many more string functions. Even the functions shown above can make use of optional parameters. You can check them out from the external link given above or from the official Python documentation.**