# Standard Data Types

The data stored in memory can be of many types. For example, a person's age is stored as a numeric value and his or her address is stored as alphanumeric characters. Python has various standard data types that are used to define the operations possible on them and the storage method for each of them.

Python has several standard data types. Five of the most commonly used are:

* Numbers
* String
* List
* Tuple
* Dictionary

## Python Numbers

Number data types store numeric values. Number objects are created when you assign a numeric value to a variable. For example:



```python
var1 = 1
var2 = 10
```

You can also delete the reference to a number object by using the `del` statement. The syntax of the `del` statement is:

```
del var1[, var2[, var3[, ..., varN]]]
```

```python
del var1, var2
```
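A minimal sketch of what `del` actually does: it removes the names from the namespace, so a later use of either name raises `NameError`. The name `still_defined` is introduced only for this illustration.

```python
var1 = 1
var2 = 10

del var1, var2  # remove both names from the current (global) namespace

# The names are no longer present in the global namespace.
still_defined = "var1" in globals() or "var2" in globals()
print("still defined:", still_defined)  # still defined: False

try:
    print(var1)  # using a deleted name fails
except NameError as err:
    print("NameError:", err)  # prints the NameError message
```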

Python 3 supports three built-in numeric types:

* int (signed integers of arbitrary size; they can also be written in octal with a `0o` prefix or in hexadecimal with a `0x` prefix)
* float (floating point real values)
* complex (complex numbers)

Here are some examples of numbers:

| int | float | complex |
|-----|-------|---------|
| 10 | 0.0 | 3.14j |
| 100 | 15.20 | 45.j |
| -786 | -21.9 | 9.322e-36j |
| 51924361 | 32.3e+18 | .876j |
| -0x260 | -90. | -.6545+0J |
| 0o122 | -32.54e100 | 3e+26J |
| 0xDEFABCECBDAECBFBAE | 70.2E-12 | 4.53e-7j |

* Python 2 also had a separate long type for integers too large for int, written with an L suffix (for example 51924361L). Python 3 merged long into int, which has arbitrary precision, so the L suffix is now a syntax error. Likewise, octal literals need the `0o` prefix: a plain leading zero, as in 0122, is rejected.

* A complex number consists of an ordered pair of real floating-point numbers written x + yj, where x is the real part, y is the imaginary part and j is the imaginary unit.
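A small sketch that checks the type of a few literals like those in the table with the built-in `type()`, shows that Python 3 integers are not limited to a fixed size, and reads the parts of a complex number. The variable names are illustrative only.

```python
# A few numeric literals paired with the type name Python reports.
literals = [10, -0x260, 0o122, 0x69, 15.20, -32.54e100, 3.14j, .876j]
for value in literals:
    print(repr(value), "->", type(value).__name__)

# Python 3 ints have arbitrary precision: no overflow and no separate long type.
big = 2 ** 100
print(big, type(big).__name__)  # 1267650600228229401496703205376 int

# A complex number stores its real and imaginary parts as floats.
z = 3 + 4j
print(z.real, z.imag, abs(z))  # 3.0 4.0 5.0
```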

# Working with strings

Strings can be entered with single, double or triple quotes (triple-quoted strings may span several lines):

```python
  'All', "of", '''these''', """are
  valid strings"""
```

**Unicode:** Python 3 strings are Unicode, although for the most part this will be ignored here. If you are working in an editor that supports Unicode, you can use non-ASCII characters in strings (or even in variable names). Alternatively, an escape sequence such as `"\u00B3"` gives the string "³" (superscript three).
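A quick check of the escape sequence mentioned above, using the standard `unicodedata` module to look up the character's official Unicode name. The identifier `π` is only an illustration that non-ASCII letters are allowed in variable names.

```python
import unicodedata

sup3 = "\u00B3"              # escape sequence for code point U+00B3
print(sup3, sup3 == "³")     # ³ True
print(len(sup3), hex(ord(sup3)), unicodedata.name(sup3))  # 1 0xb3 SUPERSCRIPT THREE

π = 3.14159                  # non-ASCII letters are valid in identifiers
print("π =", π)
```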

## The Print Function

As seen previously, the `print()` function prints all of its arguments as strings, separated by spaces and followed by a line break. For example:

* `print("Hello World")`
* `print("Hello", 'World')`
* `print("Hello", <variable>)`, where `<variable>` stands for any variable name

Note that `print` is different in old versions of Python (2.7) where it was a statement and did not need parentheses around its arguments.

```python
print("Hello", "World")
```

```
Hello World
```


`print()` has some optional arguments to control where and how to print: `sep` sets the separator (default: a space), `end` sets the string printed at the end (default: a newline) and `file` selects the stream to write to, such as an open file. Setting `flush=True` forces the output to be written immediately. Without it, Python may buffer the output, which speeds up repeated calls to `print()` but is unhelpful when, for example, you want to see the output immediately while debugging.

```python
print("Hello", "World", sep='...', end='!!', flush=True)
```

```
Hello...World!!
```
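To see `file` and `flush` in action without creating a file on disk, a sketch can print into an in-memory text stream from the standard `io` module. `sys.stderr` is shown as another common target; the message text is arbitrary.

```python
import io
import sys

buffer = io.StringIO()  # in-memory text stream standing in for a real file
print("Hello", "World", sep="...", end="!!\n", file=buffer, flush=True)
text = buffer.getvalue()
print(repr(text))  # 'Hello...World!!\n'

# sys.stderr is often used for diagnostic messages while debugging.
print("debug: reached this point", file=sys.stderr, flush=True)
```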

## String Formatting

Python has many built-in methods for formatting and manipulating strings. Some of them are illustrated here.

String concatenation is the "addition" of two strings. Observe that no space is inserted between the concatenated strings.

```python
string1 = 'World'
string2 = '!'
print('Hello' + " " + string1 + string2)
```

```
Hello World!
```
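Concatenation only works between strings, so a number has to be converted first, for example with the built-in `str()`. A minimal sketch with illustrative variable names:

```python
age = 42
message = "Age: " + str(age)  # convert the int before concatenating
print(message)               # Age: 42

try:
    "Age: " + age            # mixing str and int raises an error
except TypeError as err:
    print("TypeError:", err)
```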


The `%` operator formats a string by inserting the value that follows the operator. The string must contain a format specifier that marks where the value is inserted. The most common format specifiers are:

* `%s`: string
* `%d`: integer
* `%f`: float
* `%o`: octal
* `%x`: hexadecimal
* `%e`: exponential notation

These will be very familiar to anyone who has written a C or Java program, as they follow nearly the same rules as the [`printf()`](https://en.wikipedia.org/wiki/Printf_format_string) function.

```python
print("Hello %s" % string1)
print("Actual Number = %d" % 18)
print("Float of the number = %f" % 18)
print("Octal equivalent of the number = %o" % 18)
print("Hexadecimal equivalent of the number = %x" % 18)
print("Exponential equivalent of the number = %e" % 18)
```

```
Hello World
Actual Number = 18
Float of the number = 18.000000
Octal equivalent of the number = 22
Hexadecimal equivalent of the number = 12
Exponential equivalent of the number = 1.800000e+01
```


To insert several values, put them in parentheses after the `%`. The values are inserted in the order in which they appear inside the parentheses (the parentheses create a tuple; more on tuples in the next section):

```python
print("Hello %s %s. The meaning of life is %d" % (string1, string2, 42))
```

```
Hello World !. The meaning of life is 42
```


We can also specify the width of the field and the number of decimal places to be used. For example:

```python
print('Print width 10: |%10s|' % 'x')
print('Print width 10: |%-10s|' % 'x')  # left justified
print("The number pi = %.2f to 2 decimal places" % 3.1415)
print("More space pi = %10.2f" % 3.1415)
print("Pad pi with 0 = %010.2f" % 3.1415)  # pad with zeros
```

```
Print width 10: |         x|
Print width 10: |x         |
The number pi = 3.14 to 2 decimal places
More space pi =       3.14
Pad pi with 0 = 0000003.14
```


## Other String Methods

Multiplying a string by an integer simply repeats it:

```python
print("Hello World! " * 5)
```

```
Hello World! Hello World! Hello World! Hello World! Hello World! 
```


#### Formatting
Strings can be transformed by a variety of functions that are all methods of a string, that is, they are called by appending a `.` and the method name to the string. They include:

* Upper vs lower case: `upper()`, `lower()`, `capitalize()`, `title()` and `swapcase()`, with mostly the obvious meaning. Note that `capitalize()` makes only the first character of the string upper case (and the rest lower case), while `title()` makes the first letter of every word upper case.
* Padding strings: `center(n)`, `ljust(n)` and `rjust(n)` each place the string into a longer string of length n, padded with spaces (centered, left-justified or right-justified respectively). `zfill(n)` works similarly but pads with leading zeros.
* Stripping strings: often we want to remove spaces. This is achieved with `strip()`, `lstrip()` and `rstrip()`, which remove whitespace from both ends, just the left or just the right respectively. An optional argument lists a set of other characters to remove instead.

```python
s = "heLLo wORLd!"
print(s.capitalize(), "vs", s.title())
print("upper: '%s'" % s.upper(), "lower: '%s'" % s.lower(), "and swapped: '%s'" % s.swapcase())
print('|%s|' % "Hello World".center(30))  # center in 30 characters
print('|%s|' % "     lots of space             ".strip())  # remove leading and trailing whitespace
print('%s without leading/trailing d,h,L or ! = |%s|' % (s, s.strip("dhL!")))
print("Hello World".replace("World", "Class"))
```

```
Hello world! vs Hello World!
upper: 'HELLO WORLD!' lower: 'hello world!' and swapped: 'HEllO WorlD!'
|         Hello World          |
|lots of space|
heLLo wORLd! without leading/trailing d,h,L or ! = |eLLo wOR|
Hello Class
```
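The padding methods and one-sided stripping from the list above are not exercised in that cell, so here is a short sketch; the sample strings are arbitrary and the `|` markers only make the spaces visible.

```python
word = "Hi"
left = word.ljust(6)      # 'Hi    '
right = word.rjust(6)     # '    Hi'
centred = word.center(6)  # '  Hi  '
padded = "42".zfill(5)    # '00042'
print("|%s|%s|%s|%s|" % (left, right, centred, padded))

messy = "  text  "
print("|%s|%s|" % (messy.lstrip(), messy.rstrip()))  # |text  |  text|
print("xxhixx".strip("x"))                           # hi
```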


#### Inspecting Strings
There are also lots of ways to inspect or check strings. Examples of a few of these are given here:

* Checking the start or end of a string: `startswith("string")` and `endswith("string")` check whether the string starts/ends with the string given as argument.
* Capitalisation: there are boolean counterparts for all forms of capitalisation, such as `isupper()`, `islower()` and `istitle()`.
* Character type: does the string contain only the characters
  * 0-9: `isdecimal()`. There are also `isnumeric()` and `isdigit()`, which behave the same except for certain Unicode characters.
  * letters such as a-zA-Z: `isalpha()`, or letters combined with digits: `isalnum()`.
  * printable characters: `isprintable()` accepts anything except '\n' and other control characters.
  * \t\n \r (whitespace characters): `isspace()`.
  * characters forming a valid variable name: `isidentifier()`.
* Finding parts of a string: `s.count(w)` gives the number of non-overlapping occurrences of w in s, while `s.find(w)` and `s.rfind(w)` give the first and last position of w in s (or -1 if w is not found).
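The character-type checks and `rfind()` listed above can be tried out with a short sketch like this one; the sample strings are arbitrary.

```python
checks = {
    "2024": "2024".isdecimal(),         # True: only digits 0-9
    "abc": "abc".isalpha(),             # True: letters only
    "abc123": "abc123".isalnum(),       # True: letters and digits
    " \t\n": " \t\n".isspace(),         # True: whitespace only
    "my_var": "my_var".isidentifier(),  # True: valid variable name
    "2nd": "2nd".isidentifier(),        # False: names cannot start with a digit
    "a\nb": "a\nb".isprintable(),       # False: contains a control character
}
for text, result in checks.items():
    print(repr(text), result)

s = "Hello World"
print(s.find("o"), s.rfind("o"), s.find("z"))  # 4 7 -1
```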


```python
s = "Hello World"
print("The length of '%s' is" % s, len(s), "characters")  # len() gives the length
print(s.startswith("Hello") and s.endswith("World"))  # check start/end
# count occurrences of substrings
print("There are %d 'l's but only %d World in %s" % (s.count('l'), s.count('World'), s))
print('"el" is at index', s.find('el'), "in", s)  # indices count from 0; -1 means not found
```

```
The length of 'Hello World' is 11 characters
True
There are 3 'l's but only 1 World in Hello World
"el" is at index 1 in Hello World
```


## String comparison operations
Strings can be compared in lexicographical order with the usual comparisons. In addition the `in` operator checks for substrings:

```python
'abc' < 'bbc' <= 'bbc'
```

```
True
```

```python
"ABC" in "This is the ABC of Python"
```

```
True
```
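Lexicographic comparison works character by character on Unicode code points (as returned by the built-in `ord()`), which is why every upper-case ASCII letter sorts before every lower-case one. A small sketch with arbitrary words:

```python
print(ord("Z"), ord("a"))                 # 90 97
print("Zebra" < "apple")                  # True: 'Z' (90) < 'a' (97)
print("apple" < "apples")                 # True: a prefix sorts first
print("apple".lower() < "Zebra".lower())  # True: case-insensitive comparison

words = sorted(["banana", "Cherry", "apple"])
print(words)  # ['Cherry', 'apple', 'banana']
```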

## Accessing parts of strings

Strings can be indexed with square brackets, and indexing starts from zero in Python. The `len()` function gives the length of a string:

```python
s = '123456789'
print("The string '%s' is %d characters long" % (s, len(s)))
print('First character of', s, 'is', s[0])
print('Last character of', s, 'is', s[len(s)-1])
```

```
The string '123456789' is 9 characters long
First character of 123456789 is 1
Last character of 123456789 is 9
```


Negative indices count from the end of the string:

```python
print('First character of', s, 'is', s[-len(s)])
print('Last character of', s, 'is', s[-1])
```

```
First character of 123456789 is 1
Last character of 123456789 is 9
```


Finally, a substring (range of characters) can be specified using `s[a:b]`, which selects the characters at indices $a,a+1,\ldots,b-1$. Note that the character at index $b$ is *not* included.

```python
print("First three characters", s[0:3])
print("Next three characters", s[3:6])
```

```
First three characters 123
Next three characters 456
```


Omitting the start or the end of the range means the slice runs from the beginning or to the end of the string:

```python
print("First three characters", s[:3])
print("Last three characters", s[-3:])
```

```
First three characters 123
Last three characters 789
```
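Indices and slices behave differently at the edges: an out-of-range index raises `IndexError`, while a slice is silently clipped to the string, and `s[a:b]` has `b - a` characters when both bounds lie inside the string. A short sketch:

```python
s = '123456789'

print(s[2:5], len(s[2:5]))  # 345 3  (5 - 2 characters)
print(repr(s[5:100]))       # '6789': the end is clipped to len(s)
print(repr(s[20:]))         # '': a start past the end gives an empty string

try:
    s[20]
except IndexError as err:
    print("IndexError:", err)  # IndexError: string index out of range
```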


#### Breaking apart strings
When processing text, the ability to split strings apart is particularly useful.

* `partition(separator)`: breaks a string into three parts at the first occurrence of the separator: the part before it, the separator itself and the part after it
* `split()`: breaks a string into words separated by whitespace (optionally takes a separator as argument)
* `join()`: `separator.join(list)` joins a list of strings, such as the result of a split, with `separator` between the elements

```python
s = "one -> two  ->  three"
print(s.partition("->"))
print(s.split())
print(s.split(" -> "))
print(";".join(s.split(" -> ")))
```

```
('one ', '->', ' two  ->  three')
['one', '->', 'two', '->', 'three']
['one', 'two ', ' three']
one;two ; three
```


## Strings are immutable

It is important to note that strings are constant, immutable values in Python. New strings can easily be created, but an existing string cannot be modified:

```python
s = '012345'
sX = s[:2] + 'X' + s[3:]  # creates a new string with '2' replaced by 'X'
print("creating new string", sX, "OK")
sX = s.replace('2', 'X')  # the same thing
print(sX, "still OK")
s[2] = 'X'  # an error!!!
```

```
creating new string 01X345 OK
01X345 still OK
TypeError: 'str' object does not support item assignment
```
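Since a string cannot be changed in place, one common way to make several single-character changes is to copy it into a list (which is mutable), edit the list and join it back into a new string. A minimal sketch; `sXY` is an illustrative name:

```python
s = '012345'
chars = list(s)       # ['0', '1', '2', '3', '4', '5']
chars[2] = 'X'        # lists support item assignment
chars[4] = 'Y'
sXY = ''.join(chars)  # build a new string from the edited list
print(s, sXY)         # 012345 01X3Y5

# The original string object is untouched.
print(s == '012345')  # True
```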

### Built-in String Methods

**find( )** returns the index at which the given substring is found in the string, or -1 if it is not found. Do not confuse this returned -1 with the negative index -1, which refers to the last character. The examples in this section use the string `String0` defined here:

```python
String0 = 'O Rio de Janeiro é lindo'
print(String0.find('io'))
print(String0.find('in'))
```

```
3
20
```


The returned value is the index of the first character of the match. For 'io' this is the 'i' at index 3:

```python
print(String0[3])
```

```
i
```


**find( )** also accepts optional start and end indices, which limit the search to `String0[start:end]`:

```python
print(String0.find('J', 1))
print(String0.find('J', 1, 3))
```

```
9
-1
```


**capitalize( )** makes the first character of the string upper case (and the rest lower case).

```python
String3 = 'observe the first letter in this sentence.'
print(String3.capitalize())
```

```
Observe the first letter in this sentence.
```


**center( )** centers the string in a field of the given width.

```python
String0.center(70)
```

```
'                       O Rio de Janeiro é lindo                       '
```

An optional second argument sets the fill character used instead of spaces.

```python
String0.center(70, '-')
```

```
'-----------------------O Rio de Janeiro é lindo-----------------------'
```

**zfill( )** pads the string on the left with zeros up to the given field width.

```python
String0.zfill(30)
```

```
'000000O Rio de Janeiro é lindo'
```

**expandtabs( )** replaces each tab character '\t' with spaces, padding up to the next tab stop. Tab stops are every 8 columns by default, or every n columns with `expandtabs(n)`.

```python
s = 'h\te\tl\tl\to'
print(s)
print(s.expandtabs(1))
print(s.expandtabs())
```

```
h	e	l	l	o
h e l l o
h       e       l       l       o
```
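Because `expandtabs()` pads to the next tab stop rather than inserting a fixed number of spaces, the number of spaces per tab varies. A sketch with a tab size of 4 and arbitrary sample text:

```python
row = 'a\tbb\tccc\tdddd'
expanded = row.expandtabs(4)
print(repr(expanded))  # 'a   bb  ccc dddd'

# A tab that starts exactly on a tab stop expands to a full 4 spaces.
print(repr('abcd\te'.expandtabs(4)))  # 'abcd    e'
```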


**index( )** works the same way as **find( )**. The only difference is that **find( )** returns -1 when the substring is not found, while **index( )** raises a ValueError:

```python
print(String0.index('Rio'))
print(String0.index('Janeiro', 0))
print(String0.index('Janeiro', 12, 20))
```

```
2
9
ValueError: substring not found
```
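In practice, `index()` is usually paired with `try`/`except` when a missing substring should be handled explicitly, while `find()` suits code that checks for -1. A sketch; `position_of` is a hypothetical helper written only for this illustration:

```python
String0 = 'O Rio de Janeiro é lindo'

def position_of(text, sub):
    """Return the index of sub in text, or None if absent (hypothetical helper)."""
    try:
        return text.index(sub)
    except ValueError:
        return None

print(position_of(String0, 'Rio'))    # 2
print(position_of(String0, 'Paris'))  # None

# The equivalent check with find() compares against -1 explicitly.
pos = String0.find('Paris')
print(pos if pos != -1 else None)     # None
```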

**endswith( )** checks whether the string ends with the given suffix.

```python
print(String0.endswith('y'))
```

```
False
```


The start and stop index values can also be specified.

```python
print(String0.endswith('o', 0))
print(String0.endswith('R', 0, 3))
```

```
True
True
```


**join( )** inserts a string between the elements of an iterable of strings, such as the characters of a string:

```python
'a'.join('*_-')
```

```
'*a_a-'
```

Here '*_-' is the input string and 'a' is inserted between each pair of its characters.

**join( )** can also be used to convert a list of strings into a single string.

```python
a = list(String0)
print(a)
b = ''.join(a)
print(b)
```

```
['O', ' ', 'R', 'i', 'o', ' ', 'd', 'e', ' ', 'J', 'a', 'n', 'e', 'i', 'r', 'o', ' ', 'é', ' ', 'l', 'i', 'n', 'd', 'o']
O Rio de Janeiro é lindo
```
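`join()` only accepts an iterable of strings, so numbers must be converted first, for instance with `map(str, ...)`. A short sketch with made-up values:

```python
numbers = [3, 1, 4, 1, 5]
csv_line = ','.join(map(str, numbers))  # convert each int to str first
print(csv_line)  # 3,1,4,1,5

try:
    ','.join(numbers)  # ints are not strings
except TypeError as err:
    print("TypeError:", err)
```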


Any string can serve as the separator. Here '/' is inserted between the list elements, and the slice `[33:]` keeps only the tail of the result:

```python
c = '/'.join(a)[33:]
print(c)
```

```
/é/ /l/i/n/d/o
```


**split( )** converts a string back into a list. Think of it as the opposite of **join( )**. The leading '/' in `c` produces the empty first element:

```python
d = c.split('/')
print(d)
```

```
['', 'é', ' ', 'l', 'i', 'n', 'd', 'o']
```


**split( )** also accepts a maximum number of splits. The returned list then has at most that number plus one elements: with a maximum of 3, the string is split at most 3 times, so there are at most 4 pieces, the last one holding the unsplit remainder.

```python
e = c.split('/', 3)
print(e)
print(len(e))
```

```
['', 'é', ' ', 'l/i/n/d/o']
4
```
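The maximum number of splits (the `maxsplit` parameter) is only an upper bound: if the separator runs out first, the list is shorter. A sketch with an arbitrary sample string:

```python
path = 'a/b/c'
print(path.split('/', 1))           # ['a', 'b/c']
print(path.split('/', maxsplit=5))  # ['a', 'b', 'c']: only 2 separators
print(len(path.split('/', 5)))      # 3, not 6
```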


String indexing and slicing work as described earlier in this section, and they apply to `String0` in the same way:

```python
print(String0[4])
print(String0[4:])
```

```
o
o de Janeiro é lindo
```

**lower( )** converts all upper-case letters to lower case.

```python
print(String0)
print(String0.lower())
```

```
O Rio de Janeiro é lindo
o rio de janeiro é lindo
```


**upper( )** converts all lower-case letters to upper case.

```python
String0.upper()
```

```
'O RIO DE JANEIRO É LINDO'
```

**replace( )** replaces every occurrence of a substring with another string.

```python
String0.replace('O Rio de Janeiro', 'São Paulo')
```

```
'São Paulo é lindo'
```

**strip( )** removes unwanted characters from both ends of a string.

```python
f = '    hello      '
```

With no argument, it removes all leading and trailing whitespace:

```python
f.strip()
```

```
'hello'
```

When characters are given as the argument, **strip( )** removes those characters from both ends of the string instead:

```python
f = '   ***----hello---*******     '
f.strip('*')
```

```
'   ***----hello---*******     '
```

The asterisks are not removed. The argument is treated as a *set* of characters: **strip( )** removes characters from each end only as long as they belong to that set, and here both ends start with spaces, which are not in the set. Including the space (and the hyphen) solves this; the order in which the characters are listed does not matter.

```python
print(f.strip(' *'))
print(f.strip(' *-'))
```

```
----hello---
hello
```
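A sketch confirming that the argument is a set: listing the same characters in a different order gives the same result, and matching characters in the middle of the string are never removed. `first_try`, `second_try` and `g` are illustrative names.

```python
f = '   ***----hello---*******     '
first_try = f.strip(' *-')
second_try = f.strip('-* ')  # same characters, different order
print(first_try, second_try, first_try == second_try)  # hello hello True

g = '--a-b--'
print(g.strip('-'))  # a-b: the inner '-' stays
```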


**lstrip( )** and **rstrip( )** work like **strip( )**, except that **lstrip( )** removes characters only from the left end and **rstrip( )** only from the right end.

```python
print(f.lstrip(' *'))
print(f.rstrip(' *'))
```

```
----hello---*******     
   ***----hello---
```


## Advanced string processing
For more advanced string processing there are many libraries available in Python including for example:
* **re** for regular expression based searching and splitting of strings
* **html** for manipulating HTML format text
* **textwrap** for wrapping and filling plain text
* ... and many more
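A small taste of the standard modules listed above. Only a few well-known functions are used (`re.split`, `html.escape`, `textwrap.wrap`), and the sample inputs are arbitrary.

```python
import html
import re
import textwrap

s = "one -> two  ->  three"
# Split on '->' surrounded by any amount of whitespace.
print(re.split(r"\s*->\s*", s))  # ['one', 'two', 'three']

# Escape characters that have a special meaning in HTML.
print(html.escape("<b>Fish & Chips</b>"))  # &lt;b&gt;Fish &amp; Chips&lt;/b&gt;

# Wrap a long line into lines of at most 20 characters.
text = "Python has many libraries for processing text in different ways."
for line in textwrap.wrap(text, width=20):
    print(line)
```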